mirror of
https://github.com/ProgramSnail/Lama.git
synced 2025-12-06 06:48:48 +00:00
Spec for new syntax definitions
parent 4386e6cfd0
commit 86a67568c4
6 changed files with 91 additions and 10 deletions
lama-spec.pdf: binary file not shown.
@ -79,8 +79,8 @@ The following identifiers are reserved for keywords:
after array at before boxed case do elif else
esac eta false fi for fun if import infix
infixl infixr lazy length local od of public repeat
return sexp skip string syntax then true unboxed
until when while
\end{lstlisting}

\subsection{Infix Operators}
@ -73,7 +73,7 @@ Multiple postfixes are allowed, for example
The basic form of expression is \nonterm{primary}. The simplest form of primary is an identifier or a constant. Keywords \lstinline|true| and \lstinline|false|
designate the integer constants 1 and 0, respectively; a character constant is implicitly converted into its ASCII code. String constants designate arrays
of one-byte characters. Infix constants make it possible to reference the functional value associated with the corresponding infix operator (however, the value associated with
the builtin assignment operator "\lstinline|:=|" cannot be taken), and a functional constant (\emph{lambda-expression})
designates an anonymous functional value in the form of a closure.
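
The following fragment is an illustrative sketch, not one of the reference examples: it binds the functional value of the infix operator "\lstinline|+|" and an anonymous function to local names and then calls both like ordinary functions. The names \lstinline|add| and \lstinline|inc| are arbitrary.

\begin{lstlisting}
-- add refers to the functional value of the infix operator "+"
local add = infix +;
-- inc is a closure built from a lambda-expression
local inc = fun (x) {x + 1};

write (add (2, 3));
write (inc (41))
\end{lstlisting}
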
\begin{figure}[h]
|
||||
|
|
|
|||
|
|
@ -131,3 +131,79 @@ is equivalent to
where $x_i$ are fresh variables not free in $e$.

\section{Syntax Definitions}

The syntax definition extension provides an alternative, simplified syntax for parsers written with the standard unit \lstinline|Ostap| (see Section~\ref{sec:ostap}).
The syntax of syntax definition expressions is shown in Fig.~\ref{syntax_expressions}.
\begin{figure}[h]
\[
\begin{array}{rcll}
\defterm{syntaxExpression} & : & \term{syntax}\s\term{(}\s\nonterm{syntaxSeq}\s(\s\term{$\mid$}\s\nonterm{syntaxSeq}\s)^*\s\term{)}&\\
\defterm{syntaxSeq} & : & \nonterm{syntaxBinding}^+\s[\s\term{\{}\s\nonterm{expression}\s\term{\}}\s]&\\
\defterm{syntaxBinding} & : & [\s\term{-}\s]\s[\s\nonterm{pattern}\s\term{=}\s]\s\nonterm{syntaxPostfix}&\\
\defterm{syntaxPostfix} & : & \nonterm{syntaxPrimary}\s[\s\term{*}\s\alt\s\term{+}\s\alt\s\term{?}\s]&\\
\defterm{syntaxPrimary} & : & \token{LIDENT}\s(\s\term{[}\s[\s\nonterm{expression}\s(\s\term{,}\s\nonterm{expression}\s)^*\s]\s\term{]}\s)^*&\alt\\
 & & \term{(}\s\nonterm{syntaxExpression}\s\term{)}&\alt\\
 & & \term{\$(}\s\nonterm{expression}\s\term{)}&
\end{array}
\]
\caption{Syntax definition expressions}
\label{syntax_expressions}
\end{figure}

Syntax expressions can be used wherever ordinary expressions are allowed. Each syntax expression is expanded into a certain combination of \lstinline|Ostap| primitives.
For example,

\begin{lstlisting}
fun sum (str) {
  parseString (
    syntax (l=DECIMAL token["+"] r=DECIMAL eof {
      stringInt (l) + stringInt (r)
    }),
    str
  )
}
\end{lstlisting}

defines a function which parses its argument as an expression \lstinline|"l + r"|, where \lstinline|l| and \lstinline|r| are decimal literals, and computes its value.

A syntax expression itself is a list of alternatives separated by vertical bars, and each alternative is a sequential composition (\nonterm{syntaxSeq}) of primitive parsers equipped with an optional
semantic action (a \emph{general} expression in curly brackets).
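
For instance, the fragment below is a sketch of a parser with two alternatives, each equipped with its own semantic action. It assumes, as in the example above, that \lstinline|token| accepts a string argument; the name \lstinline|answer| is hypothetical.

\begin{lstlisting}
import Ostap;

-- two alternatives; each one has its own semantic action in curly brackets
local answer = syntax (token["yes"] {1} | token["no"] {0});

-- apply the parser to a sample input
parseString (answer, "yes")
\end{lstlisting}
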

A primitive parser is either an l-identifier (possibly supplied with arguments), a \emph{general} expression enclosed in \term{\$(}..\term{)},
or a \emph{syntax} expression enclosed in round brackets. Note that, unlike in general expressions, the arguments of primitive parsers in syntax expressions are enclosed in
\term{[}..\term{]}; thus

\begin{lstlisting}
x ("a")
\end{lstlisting}

means a sequential composition of \lstinline|x| and "\lstinline|a|", not the combinator \lstinline|x| applied to "\lstinline|a|".
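
To contrast the two notations, here is a sketch of two syntax expressions which, under the rules above, are expected to describe the same parser: the first passes "\lstinline|a|" to \lstinline|token| using square brackets, while the second embeds the ordinary call \lstinline|token ("a")| via \term{\$(}..\term{)}. The names \lstinline|p1| and \lstinline|p2| are arbitrary.

\begin{lstlisting}
import Ostap;

-- token applied to "a" in the syntax-expression notation
local p1 = syntax (token["a"] token["b"]);
-- the same application written as a general expression inside $(...)
local p2 = syntax ($(token ("a")) token["b"]);
\end{lstlisting}
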

A primitive parser can be followed by one of the postfix operators "\term{*}", "\term{+}", or "\term{?}", corresponding
to the "\lstinline|rep0|", "\lstinline|rep|", and "\lstinline|opt|" combinators of \lstinline|Ostap|, respectively; for example,

\begin{lstlisting}
token["a"]+
identifier?
\end{lstlisting}
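
Spelled out with the combinators named above, the postfix forms correspond roughly to the calls below. This sketch shows the correspondence only, and the expansion actually produced by the compiler may differ; \lstinline|identifier| is assumed to be a parser defined elsewhere.

\begin{lstlisting}
-- token["a"]* : zero or more occurrences
rep0 (token ("a"))
-- token["a"]+ : one or more occurrences
rep (token ("a"))
-- identifier? : an optional occurrence
opt (identifier)
\end{lstlisting}
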

The value recognized by a primitive parser can be matched against a pattern, for example

\begin{lstlisting}
value=(identifier | constant)
h:tl=item+
\end{lstlisting}

The bindings introduced by pattern matching can be used in semantic actions.
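
For example, the following sketch keeps only the first of one or more recognized items, using the binding \lstinline|h| introduced by the pattern in its semantic action; \lstinline|item| is a hypothetical parser for a single element.

\begin{lstlisting}
-- h and tl are bound by matching the list returned by item+
syntax (h:tl=item+ {h})
\end{lstlisting}
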

Finally, if no semantic action is given, a sequential syntax expression returns a tuple of its components. However, if a parser
in a sequential composition is preceded by "\term{-}", its value is not included into the default result. Thus,

\begin{lstlisting}
parse -eof
\end{lstlisting}

returns what "\lstinline|parse|" recognized: the rest of the input is still matched against "\lstinline|eof|", but the result of "\lstinline|eof|"
is omitted.
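
A typical use of "\term{-}" is to drop delimiters. In the sketch below \lstinline|inner| is a hypothetical parser; since both brackets are excluded from the default result, the expression is expected to return just what \lstinline|inner| recognized, in the same way as \lstinline|parse -eof| above.

\begin{lstlisting}
-- the brackets are recognized but omitted from the default result
syntax (-token["("] inner -token[")"])
\end{lstlisting}
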
@ -1,7 +1,7 @@
\chapter{Driver Options and Separate Compilation}
\label{sec:driver}

The driver is the command-line utility "\texttt{lamac}", which controls the invocation of the compiler. The
general format of its invocation is

\begin{lstlisting}
@ -322,6 +322,7 @@ Return value represents parsing result as per "\lstinline|Ostap|".}
\descr{\lstinline|fun getCol (m)|}{Gets the column number of the current position of matcher "\lstinline|m|".}

\section{Unit \texttt{Ostap}}
\label{sec:ostap}

Unit "\lstinline|Ostap|" implements monadic parser combinators in continuation-passing style with memoization~\cite{MonPC,MemoParsing,Meerkat}.
A parser is a function of the shape
@ -349,6 +350,8 @@ The unit describes some primitive parsers and combinators which allow to constru
\descr{\lstinline|fun empty (k)|}{A parser which recognizes the empty string.}

\descr{\lstinline|fun loc (k)|}{A parser which returns the current position (a pair "\lstinline|[line, col]|") in a stream.}

\descr{\lstinline|fun alt (a, b)|}{A parser combinator which constructs the alternative of "\lstinline|a|" and "\lstinline|b|", that is, a parser which recognizes what either of them recognizes.}

\descr{\lstinline|fun seq (a, b)|}{A parser combinator which constructs a sequential composition of "\lstinline|a|" and "\lstinline|b|". While
@ -393,11 +396,11 @@ parser which behaves exactly as "\lstinline|a|", but additionally applies "\lsti
\begin{lrbox}{\exprbox}
\begin{lstlisting}
{[Left, {[token ("+"), fun (l, op, r) {Add (l, r)}],
         [token ("-"), fun (l, op, r) {Sub (l, r)}]
        }],
 [Left, {[token ("*"), fun (l, op, r) {Mul (l, r)}],
         [token ("/"), fun (l, op, r) {Div (l, r)}]
        }]}
\end{lstlisting}
\end{lrbox}
@ -405,12 +408,14 @@ parser which behaves exactly as "\lstinline|a|", but additionally applies "\lsti
\descr{\lstinline|fun expr (ops, opnd)|}{A super-combinator to generate infix expression parsers. The argument "\lstinline|opnd|" parses a primary operand, "\lstinline|ops|" is
a list of infix operator descriptors. Each element of the list describes one \emph{precedence level}, with precedence increasing from head to tail. A descriptor on
each level is a pair, where the first element describes the associativity at the given level ("\lstinline|Left|", "\lstinline|Right|" or "\lstinline|None|") and
the second element is a list of pairs~--- a parser for an infix operator and the semantic action (a three-argument function accepting the left operand, the value
returned by the infix operator parser, and the right operand). For example,

\usebox\exprbox

specifies two levels of precedence, both left-associative, with infix operators "\lstinline|+|" and "\lstinline|-|" at the first level and
"\lstinline|*|" and "\lstinline|/|" at the second. The semantic actions for these operators construct abstract syntax trees (in this particular example the
second argument of the semantic actions is unused).
}
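
As a further sketch, the descriptor below does make use of the second argument of the semantic actions: both operators share one action which stores the operator value in the tree. It assumes that \lstinline|token| returns the recognized string; the constructors \lstinline|Bin| and \lstinline|Var|, as well as the names \lstinline|opnd|, \lstinline|bin| and \lstinline|arith|, are hypothetical.

\begin{lstlisting}
import Ostap;

-- hypothetical operand parser: the single-letter variables "a" and "b"
local opnd = syntax (token["a"] {Var ("a")} | token["b"] {Var ("b")});

-- one shared semantic action which keeps the operator value
local bin = fun (l, op, r) {Bin (op, l, r)};

-- one left-associative precedence level with "+" and "-"
local arith = expr ({[Left, {[token ("+"), bin], [token ("-"), bin]}]}, opnd);
\end{lstlisting}
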
\section{Unit \texttt{Ref}}