diff --git a/lama-spec.pdf b/lama-spec.pdf index 9997b90fd..3ae1510aa 100644 Binary files a/lama-spec.pdf and b/lama-spec.pdf differ diff --git a/spec/03.01.lexical_structure.tex b/spec/03.01.lexical_structure.tex index 19d8276ce..e96298bcb 100644 --- a/spec/03.01.lexical_structure.tex +++ b/spec/03.01.lexical_structure.tex @@ -79,8 +79,8 @@ The following identifiers are reserved for keywords: after array at before boxed case do elif else esac eta false fi for fun if import infix infixl infixr lazy length local od of public repeat - return sexp skip string string then true unboxed until - when while + return sexp skip string string syntax then true unboxed + until when while \end{lstlisting} \subsection{Infix Operators} diff --git a/spec/03.04.expressions.tex b/spec/03.04.expressions.tex index 9d042d1c3..a89604d15 100644 --- a/spec/03.04.expressions.tex +++ b/spec/03.04.expressions.tex @@ -73,7 +73,7 @@ Multiple postfixes are allowed, for example The basic form of expression is \nonterm{primary}. The simplest form of primary is an identifier or constant. Keywords \lstinline|true| and \lstinline|false| designate integer constants 1 and 0 respectively, character constant is implicitly converted into its ASCII code. String constants designate arrays of one-byte characters. Infix constants allow to reference a functional value associated with corresponding infix operator (however, a value associated with -builtin assognment operator "\lstinline|:=|" can not be taken), and functional constant (\emph{lambda-expression}) +builtin assignment operator "\lstinline|:=|" can not be taken), and functional constant (\emph{lambda-expression}) designates an anonymous functional value in the form of closure. \begin{figure}[h] diff --git a/spec/04.extensions.tex b/spec/04.extensions.tex index 274497681..d87512282 100644 --- a/spec/04.extensions.tex +++ b/spec/04.extensions.tex @@ -131,3 +131,79 @@ is equivalent to where $x_i$~--- fresh variables, not free in $e$. 
+\section{Syntax Definitions} + +The syntax definition extension provides an alternative, simplified syntax for parsers written with the standard unit \lstinline|Ostap| (see Section~\ref{sec:ostap}). +The syntax of syntax definition expressions is shown in Fig.~\ref{syntax_expressions}. + +\begin{figure}[h] + \[ + \begin{array}{rcll} + \defterm{syntaxExpression} & : & \term{syntax}\s\term{(}\s\nonterm{syntaxSeq}\s(\s\term{$\mid$}\s\nonterm{syntaxSeq}\s)^*\s\term{)}&\\ + \defterm{syntaxSeq} & : & \nonterm{syntaxBinding}^+\s[\s\term{\{}\s\nonterm{expression}\s\term{\}}\s]&\\ + \defterm{syntaxBinding} & : & [\s\term{-}\s]\s[\s\nonterm{pattern}\s\term{=}\s]\s\nonterm{syntaxPostfix}&\\ + \defterm{syntaxPostfix} & : & \nonterm{syntaxPrimary}\s[\s\term{*}\s\alt\s\term{+}\s\alt\s\term{?}\s]&\\ + \defterm{syntaxPrimary} & : & \token{LIDENT}\s(\s\term{[}\s[\s\nonterm{expression}\s(\s\term{,}\s\nonterm{expression}\s)^*\s]\s\term{]}\s)^*&\alt\\ + & & \term{(}\s\nonterm{syntaxExpression}\s\term{)}&\alt\\ + & & \term{\$(}\s\nonterm{expression}\s\term{)}& + \end{array} + \] + \caption{Syntax definition expressions} + \label{syntax_expressions} +\end{figure} + +Syntax expressions can be used wherever general expressions are allowed. Each syntax expression is expanded into a combination of \lstinline|Ostap| primitives. +For example, + +\begin{lstlisting} + fun sum (str) { + parseString ( + syntax (l=DECIMAL token["+"] r=DECIMAL eof { + stringInt (l) + stringInt (r) + }), + str + ) + } +\end{lstlisting} + +defines a function which parses its argument as an expression \lstinline|"l + r"|, where \lstinline|l| and \lstinline|r| are decimal literals, and returns its value. + +A syntax expression itself is a sequence of alternatives, and each alternative is a sequential composition (\nonterm{syntaxSeq}) of primitive parsers equipped with an optional +semantic action (a \emph{general} expression in curly brackets). 
+ +A primitive parser is either an l-identifier (possibly supplied with arguments), a \emph{general} expression surrounded by brackets \term{\$(}..\term{)}, +or a \emph{syntax} expression surrounded by round brackets. Note that the arguments of primitive parsers in syntax expressions are enclosed in +\term{[}..\term{]}, unlike in general expressions; thus + +\begin{lstlisting} + x ("a") +\end{lstlisting} + +means a sequential composition of \lstinline|x| and "\lstinline|a|", not the combinator \lstinline|x| applied to "\lstinline|a|". + +A primitive parser can be followed by one of the postfix operators ("\term{*}", "\term{+}", or "\term{?}"), corresponding +to the "\lstinline|rep0|", "\lstinline|rep|", and "\lstinline|opt|" combinators of \lstinline|Ostap| respectively, for example + +\begin{lstlisting} + token["a"]+ + identifier? +\end{lstlisting} + +A value recognized by a primitive parser can be matched against a pattern, for example + +\begin{lstlisting} + value=(identifier | constant) + h:tl=item+ +\end{lstlisting} + +The bindings introduced by pattern matching can be used in semantic actions. + +Finally, if no semantic action is given, a sequential syntax expression returns a tuple of its components. However, if a parser +in a sequential composition is preceded by "\term{-}", its value is not included in the default result. Thus, + +\begin{lstlisting} + parse -eof +\end{lstlisting} + +returns what "\lstinline|parse|" recognized; the rest of the input stream is parsed against "\lstinline|eof|", but the result of "\lstinline|eof|" +is omitted. diff --git a/spec/05.driver_options.tex b/spec/05.driver_options.tex index 2da777e77..9a6c5f42e 100644 --- a/spec/05.driver_options.tex +++ b/spec/05.driver_options.tex @@ -1,7 +1,7 @@ \chapter{Driver Options and Separate Compilation} \label{sec:driver} -Driver is a command-line unitility "\texttt{lamac}" which controls the invocation of the compiler. 
The +Driver is a command-line utility "\texttt{lamac}" which controls the invocation of the compiler. The general format of invocation is \begin{lstlisting} diff --git a/spec/06.standard_library.tex b/spec/06.standard_library.tex index eb0c12b1f..1661103f2 100644 --- a/spec/06.standard_library.tex +++ b/spec/06.standard_library.tex @@ -322,6 +322,7 @@ Return value represents parsing result as per "\lstinline|Ostap|".} \descr{\lstinline|fun getCol (m)|}{Gets a column number for the current position of matcher "\lstinline|m|".} \section{Unit \texttt{Ostap}} +\label{sec:ostap} Unit "\lstinline|Ostap|" implements monadic parser combinators in continuation-passing style with memoization~\cite{MonPC,MemoParsing,Meerkat}. A parser is a function of the shape @@ -349,6 +350,8 @@ The unit describes some primitive parsers and combinators which allow to constru \descr{\lstinline|fun empty (k)|}{A parser which recognizes empty string.} +\descr{\lstinline|fun loc (k)|}{A parser which returns the current position (a pair "\lstinline|[line, col]|") in a stream.} + \descr{\lstinline|fun alt (a, b)|}{A parser combinator which constructs a parser alternating between "\lstinline|a|" and "\lstinline|b|".} \descr{\lstinline|fun seq (a, b)|}{A parser combinator which construct a sequential composition of "\lstinline|a|" and "\lstinline|b|". 
While @@ -393,11 +396,11 @@ parser which behaves exactly as "\lstinline|a|", but additionally applies "\lsti \begin{lrbox}{\exprbox} \begin{lstlisting} - {[Left, {[token ("+"), fun (l, r) {Add (l, r)}], - [token ("-"), fun (l, r) {Sub (l, r)}] + {[Left, {[token ("+"), fun (l, op, r) {Add (l, r)}], + [token ("-"), fun (l, op, r) {Sub (l, r)}] }], - [Left, {[token ("*"), fun (l, r) {Mul (l, r)}], - [token ("/"), fun (l, r) {Div (l, r)}] + [Left, {[token ("*"), fun (l, op, r) {Mul (l, r)}], + [token ("/"), fun (l, op, r) {Div (l, r)}] }]} \end{lstlisting} \end{lrbox} @@ -405,12 +408,14 @@ parser which behaves exactly as "\lstinline|a|", but additionally applies "\lsti \descr{\lstinline|fun expr (ops, opnd)|}{A super-combinator to generate infix expression parsers. The argument "\lstinline|opnd|" parses primary operand, "\lstinline|ops|" is a list of infix operator descriptors. Each element of the list describes one \emph{precedence level} with precedence increasing from head to tail. A descriptor on each level is a pair, where the first element describes the associativity at the given level ("\lstinline|Left|", "\lstinline|Right|" or "\lstinline|None|") and - the second element is a list of pairs~--- a parser for an infix operator and the semantics action (a two-argument function). For example, + the second element is a list of pairs~--- a parser for an infix operator and the semantic action (a three-argument function accepting the left operand, the value returned by the infix + operator parser, and the right operand). For example, \usebox\exprbox specifies two levels of precedence, both left-associative, with infix operators "\lstinline|+|" and "\lstinline|-|" at the first level and - "\lstinline|*|" and "\lstinline|/|" at the second. The semantics for these operators constructs abstract sytax trees. 
The semantics for these operators constructs abstract syntax trees (in this particular example the + second argument of semantics functions is unused). } \section{Unit \texttt{Ref}}