Spec for new syntax definitions

This commit is contained in:
Dmitry Boulytchev 2020-04-13 04:28:43 +03:00
parent 4386e6cfd0
commit 86a67568c4
6 changed files with 91 additions and 10 deletions

Binary file not shown.


@ -79,8 +79,8 @@ The following identifiers are reserved for keywords:
after array at before boxed case do elif else
esac eta false fi for fun if import infix
infixl infixr lazy length local od of public repeat
return sexp skip string string then true unboxed until
when while
return sexp skip string string syntax then true unboxed
until when while
\end{lstlisting}
\subsection{Infix Operators}


@ -73,7 +73,7 @@ Multiple postfixes are allowed, for example
The basic form of expression is \nonterm{primary}. The simplest form of primary is an identifier or constant. Keywords \lstinline|true| and \lstinline|false|
designate integer constants 1 and 0 respectively; a character constant is implicitly converted into its ASCII code. String constants designate arrays
of one-byte characters. Infix constants allow referencing a functional value associated with the corresponding infix operator (however, a value associated with
builtin assognment operator "\lstinline|:=|" can not be taken), and functional constant (\emph{lambda-expression})
the built-in assignment operator "\lstinline|:=|" cannot be taken), and a functional constant (\emph{lambda-expression})
designates an anonymous functional value in the form of closure.
\begin{figure}[h]


@ -131,3 +131,79 @@ is equivalent to
where $x_i$~--- fresh variables, not free in $e$.
\section{Syntax Definitions}
The syntax definition extension provides an alternative, simplified syntax for parsers written using the standard unit \lstinline|Ostap| (see Section~\ref{sec:ostap}).
The syntax of syntax definition expressions is shown in Fig.~\ref{syntax_expressions}.
\begin{figure}[h]
\[
\begin{array}{rcll}
\defterm{syntaxExpression} & : & \term{syntax}\s\term{(}\s\nonterm{syntaxSeq}\s(\s\term{$\mid$}\s\nonterm{syntaxSeq}\s)^*\s\term{)}&\\
\defterm{syntaxSeq} & : & \nonterm{syntaxBinding}^+\s[\s\term{\{}\s\nonterm{expression}\s\term{\}}\s]&\\
\defterm{syntaxBinding} & : & [\s\term{-}\s]\s[\s\nonterm{pattern}\s\term{=}\s]\s\s\nonterm{syntaxPostfix}&\\
\defterm{syntaxPostfix} & : & \nonterm{syntaxPrimary}\s[\s\term{*}\s\alt\s\term{+}\s\alt\s\term{?}\s]&\\
\defterm{syntaxPrimary} & : & \token{LIDENT}\s(\s\term{[}\s[\s\nonterm{expression}\s(\s\term{,}\s\nonterm{expression}\s)^*\s]\s\term{]}\s)^*&\alt\\
& & \term{(}\s\nonterm{syntaxExpression}\s\term{)}&\alt\\
& & \term{\$(}\s\nonterm{expression}\s\term{)}&
\end{array}
\]
\caption{Syntax definition expressions}
\label{syntax_expressions}
\end{figure}
Syntax expressions can be used wherever regular expressions are allowed. Each syntax expression is expanded into a certain combination of \lstinline|Ostap| primitives.
For example,
\begin{lstlisting}
fun sum (str) {
parseString (
syntax (l=DECIMAL token["+"] r=DECIMAL eof {
stringInt (l) + stringInt (r)
}),
str
)
}
\end{lstlisting}
defines a function which parses its argument as an expression of the form \lstinline|"l + r"|, where \lstinline|l| and \lstinline|r| are decimal literals, and computes its value.
A syntax expression itself is a sequence of alternatives, and each alternative is a sequential composition (\nonterm{syntaxSeq}) of primitive parsers equipped with an optional
semantic action (a \emph{general} expression in curly brackets).
A primitive parser is either an l-identifier (possibly supplied with arguments), or a \emph{general} expression, surrounded by brackets \term{\$(}..\term{)},
or a \emph{syntax} expression, surrounded by round brackets. Note that the arguments for primitive parsers in syntax expressions are surrounded by
\term{[}..\term{]}, unlike in general expressions; thus
\begin{lstlisting}
x ("a")
\end{lstlisting}
means a sequential composition of \lstinline|x| and "\lstinline|a|", not a combinator \lstinline|x| applied to "\lstinline|a|".
A primitive parser can be followed by one of the postfix operators ("\term{*}", "\term{+}", or "\term{?}"), which correspond
to the "\lstinline|rep0|", "\lstinline|rep|", and "\lstinline|opt|" combinators of \lstinline|Ostap| respectively, for example
\begin{lstlisting}
token["a"]+
identifier?
\end{lstlisting}
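As a rough illustration (a sketch, assuming that "\lstinline|rep|" and "\lstinline|opt|" each take a single parser as their argument, which is not stated in this section), the two
syntax expressions above would correspond to the following general expressions:
\begin{lstlisting}
   -- assumption: rep and opt are applied to a single parser
   rep (token ("a"))
   opt (identifier)
\end{lstlisting}
Note the round brackets around the arguments here, since these are general expressions rather than syntax expressions.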
A value recognized by a primitive parser can be matched against a pattern, for example
\begin{lstlisting}
value=(identifier | constant)
h:tl=item+
\end{lstlisting}
The bindings provided by pattern-matching can be used in semantic actions.
Finally, if no semantic action is given, a sequential syntax expression returns a tuple of its components. However, if a parser
in a sequential composition is preceded by "\term{-}", then its value is not included in the default result. Thus,
\begin{lstlisting}
parse -eof
\end{lstlisting}
returns what "\lstinline|parse|" recognized; the input stream is parsed against "\lstinline|eof|", but the result of "\lstinline|eof|"
is omitted.
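The same rule extends to longer sequences. As a sketch (\lstinline|key| and \lstinline|value| are hypothetical parsers introduced only for illustration), in
\begin{lstlisting}
   -- key and value are hypothetical parsers, not part of Ostap
   key -token[":"] value
\end{lstlisting}
the default result would be built from what "\lstinline|key|" and "\lstinline|value|" recognized, while the value of "\lstinline|token[":"]|" would be omitted.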


@ -1,7 +1,7 @@
\chapter{Driver Options and Separate Compilation}
\label{sec:driver}
Driver is a command-line unitility "\texttt{lamac}" which controls the invocation of the compiler. The
Driver is a command-line utility "\texttt{lamac}" which controls the invocation of the compiler. The
general format of invocation is
\begin{lstlisting}


@ -322,6 +322,7 @@ Return value represents parsing result as per "\lstinline|Ostap|".}
\descr{\lstinline|fun getCol (m)|}{Gets a column number for the current position of matcher "\lstinline|m|".}
\section{Unit \texttt{Ostap}}
\label{sec:ostap}
Unit "\lstinline|Ostap|" implements monadic parser combinators in continuation-passing style with memoization~\cite{MonPC,MemoParsing,Meerkat}.
A parser is a function of the shape
@ -349,6 +350,8 @@ The unit describes some primitive parsers and combinators which allow to constru
\descr{\lstinline|fun empty (k)|}{A parser which recognizes the empty string.}
\descr{\lstinline|fun loc (k)|}{A parser which returns the current position (a pair "\lstinline|[line, col]|") in a stream.}
\descr{\lstinline|fun alt (a, b)|}{A parser combinator which constructs a parser alternating between "\lstinline|a|" and "\lstinline|b|".}
\descr{\lstinline|fun seq (a, b)|}{A parser combinator which constructs a sequential composition of "\lstinline|a|" and "\lstinline|b|". While
@ -393,11 +396,11 @@ parser which behaves exactly as "\lstinline|a|", but additionally applies "\lsti
\begin{lrbox}{\exprbox}
\begin{lstlisting}
{[Left, {[token ("+"), fun (l, r) {Add (l, r)}],
[token ("-"), fun (l, r) {Sub (l, r)}]
{[Left, {[token ("+"), fun (l, op, r) {Add (l, r)}],
[token ("-"), fun (l, op, r) {Sub (l, r)}]
}],
[Left, {[token ("*"), fun (l, r) {Mul (l, r)}],
[token ("/"), fun (l, r) {Div (l, r)}]
[Left, {[token ("*"), fun (l, op, r) {Mul (l, r)}],
[token ("/"), fun (l, op, r) {Div (l, r)}]
}]}
\end{lstlisting}
\end{lrbox}
@ -405,12 +408,14 @@ parser which behaves exactly as "\lstinline|a|", but additionally applies "\lsti
\descr{\lstinline|fun expr (ops, opnd)|}{A super-combinator to generate infix expression parsers. The argument "\lstinline|opnd|" parses a primary operand, and "\lstinline|ops|" is
a list of infix operator descriptors. Each element of the list describes one \emph{precedence level} with precedence increasing from head to tail. A descriptor on
each level is a pair, where the first element describes the associativity at the given level ("\lstinline|Left|", "\lstinline|Right|" or "\lstinline|None|") and
the second element is a list of pairs~--- a parser for an infix operator and the semantics action (a two-argument function). For example,
the second element is a list of pairs~--- a parser for an infix operator and the semantic action (a three-argument function accepting the left operand, the value
returned by the infix operator parser, and the right operand). For example,
\usebox\exprbox
specifies two levels of precedence, both left-associative, with infix operators "\lstinline|+|" and "\lstinline|-|" at the first level and
"\lstinline|*|" and "\lstinline|/|" at the second. The semantics for these operators constructs abstract sytax trees.
"\lstinline|*|" and "\lstinline|/|" at the second. The semantic actions for these operators construct abstract syntax trees (in this particular example the
second argument of the semantic functions is unused).
}
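When the value returned by the operator parser matters, the second argument of the semantic action can be used. A minimal sketch, assuming that "\lstinline|token|" returns the matched
string and using a hypothetical constructor "\lstinline|Binop|" (neither is confirmed by this section):
\begin{lstlisting}
   -- Binop is a hypothetical constructor;
   -- token is assumed to return the matched string
   {[Left, {[alt (token ("+"), token ("-")),
             fun (l, op, r) {Binop (op, l, r)}]
   }]}
\end{lstlisting}
Here a single descriptor entry covers both operators of one precedence level, and the semantic action distinguishes them by "\lstinline|op|".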
\section{Unit \texttt{Ref}}