lama_byterun/spec/04.extensions.tex
Kakadu 0353e77a26 Add latex magic commands to many files
Signed-off-by: Kakadu <Kakadu@pm.me>
2022-08-23 17:25:52 +03:00


% !TEX TS-program = pdflatex
% !TeX spellcheck = en_US
% !TEX root = lama-spec.tex
\chapter{Extensions}
\label{sec:extensions}
This chapter describes extensions to the core language defined in the previous chapters. These
extensions add syntactic sugar that makes writing programs in \lama a less
painful task.
\section{Custom Infix Operators}
\label{sec:custom_infix}
Besides the set of builtin infix operators (see Fig.~\ref{builtin_infixes}), users may define
custom infix operators. These operators may be declared at any scope level; when defined
at the top level they can be exported as well. However, there are some restrictions regarding the
redefinition of builtin infix operators:
\begin{itemize}
\item redefinitions of builtin infix operators cannot be exported;
\item the assignment operator "\lstinline|:=|" cannot be redefined;
\item infix definitions cannot be mutually recursive (but this can be worked around by
defining infix synonyms for mutually-recursive functions).
\end{itemize}
The syntax for infix operator definition is shown in Fig.~\ref{custom_infix_construct}; a custom infix definition must specify exactly two arguments.
An associativity and a precedence level have to be assigned to each custom infix operator. A precedence level is assigned by specifying at which
position, relative to other known infix operators, the operator being defined is inserted. Three kinds of specifications are allowed: at a given level,
immediately before it, or immediately after it. For example, "\lstinline|at +|" means that the operator is assigned exactly the same
level of precedence as "\lstinline|+|"; "\lstinline|after +|" creates a new precedence level immediately \emph{after} that for
"\lstinline|+|" (but \emph{before} that for "\lstinline|*|"), and "\lstinline|before *|" has exactly the same effect (provided
there were no insertions of precedence levels between those for "\lstinline|+|" and "\lstinline|*|").
When being inserted at an existing precedence level, an infix operator inherits the associativity from that level; hence, only the "\lstinline|infix|"
keyword can be used for such definitions. When a new level is created, an associativity for this level has to be additionally specified
by using the corresponding keyword ("\lstinline|infix|" for non-associative levels, "\lstinline|infixr|" for levels with right
associativity, and "\lstinline|infixl|" for levels with left associativity).
When public infix operators are exported, their relative precedence levels and associativity are exported as well; since not all
custom infix definitions may be made public, some levels may disappear from the export. For example, consider the following definitions:
\begin{lstlisting}
infixl ** before * (x, y) {...}
public infixr *** before ** (x, y) {...}
\end{lstlisting}
Here, in the top scope of the compilation unit, we have two additional precedence levels: one for "\lstinline|**|" and another for "\lstinline|***|".
However, as "\lstinline|**|" is not exported, its precedence level will be forgotten during the import. Thus, only the precedence level for
"\lstinline|***|" will be created during the import, as if it was defined at the level "\lstinline|before *|".
Accordingly, multiple imports of units with custom infix operators will modify the precedence levels in the order of their import. For example,
if there are two units "\lstinline|A|" and "\lstinline|B|" with declarations "\lstinline|infixl ++ before +|" and "\lstinline|infixl +++ before +|"
respectively, then importing "\lstinline|A|" after "\lstinline|B|" will result in "\lstinline|++|" having a \emph{lower} precedence than
"\lstinline|+++|".
\begin{figure}[t]
\[
\begin{array}{rcl}
\defterm{infixDefinition} & : & \nonterm{infixHead}\s\term{(}\s\nonterm{functionArguments}\s\term{)}\s\nonterm{functionBody}\\
\defterm{infixHead} & : & [\s\term{public}\s]\s\nonterm{infixity}\s\token{INFIX}\s\nonterm{level}\\
\defterm{infixity} & : & \term{infix}\alt\term{infixl}\alt\term{infixr}\\
\defterm{level} & : & [\s\term{at}\alt\term{before}\alt\term{after}\s]\s\token{INFIX}
\end{array}
\]
\caption{The Syntax for Infix Operator Definition}
\label{custom_infix_construct}
\end{figure}
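As an illustration of the rules of this section, the following sketch declares two operators; the operator names and their bodies are hypothetical and chosen only for the example:
\begin{lstlisting}
-- joins the existing level of +: the associativity is inherited,
-- hence only the infix keyword is allowed here
infix +++ at + (x, y) {x + y + 1}

-- creates a new left-associative level between those of + and *
infixl ** before * (x, y) {x * y * 2}
\end{lstlisting}
Assuming no other levels were inserted between those for "\lstinline|+|" and "\lstinline|*|", an expression "\lstinline|a +++ b ** c * d|" should then be grouped as
"\lstinline|a +++ (b ** (c * d))|".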
\section{Lazy Values and Eta-expansion}
An expression
\begin{lstlisting}
lazy$\;e$
\end{lstlisting}
where $e$ is a $\nonterm{basicExpression}$, is converted into
\begin{lstlisting}
makeLazy (fun () {$e$})
\end{lstlisting}
where "\lstinline|makeLazy|" is a function from the standard unit "\lstinline|Lazy|" (see Section~\ref{sec:std:lazy}). An import for
"\lstinline|Lazy|" is added implicitly.
An expression
\begin{lstlisting}
eta$\;e$
\end{lstlisting}
where $e$ is a $\nonterm{basicExpression}$, is converted into
\begin{lstlisting}
fun ($x$) {$e$ ($x$)}
\end{lstlisting}
where "$x$" is a fresh variable which does not occur free in "$e$".
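For instance, assuming a one-argument function "\lstinline|f|" and a variable "\lstinline|n|" are in scope (both names are hypothetical), the two conversions
of this section can be instantiated as sketched below, where "\lstinline|x|" stands for the fresh variable:
\begin{lstlisting}
lazy (f (n))   -- becomes: makeLazy (fun () {(f (n))})
eta f          -- becomes: fun (x) {f (x)}
\end{lstlisting}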
\section{Dot Notation}
\label{sec:dot-notation}
A function call
\begin{lstlisting}
$f$ ($e_1$, ..., $e_k$)
\end{lstlisting}
where $f$ is an identifier, can be rewritten as
\begin{lstlisting}
$e_1$.$f$ ($e_2$, ..., $e_k$)
\end{lstlisting}
In particular, a call to a one-argument function $f (e)$ can be rewritten as $e.f$.
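A small sketch of this rewriting, assuming a hypothetical two-argument function "\lstinline|add|", a hypothetical one-argument function "\lstinline|inc|", and
variables "\lstinline|a|" and "\lstinline|b|" in scope; each of the last three lines is a separate expression:
\begin{lstlisting}
fun add (x, y) {x + y}
fun inc (x) {x + 1}

add (a, b)   -- ordinary call
a.add (b)    -- the same call in dot notation
a.inc        -- the same as inc (a)
\end{lstlisting}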
\section{Patterns in Function Arguments}
Patterns can be used in function argument specification: a declaration
\begin{lstlisting}
fun f ($p_1$, ..., $p_k$) { e }
\end{lstlisting}
is equivalent to
\begin{lstlisting}
fun f ($x_1$, ..., $x_k$) {
  case $x_1$ of
    $p_1$ -> case $x_2$ of
               ... -> e
             esac
  esac
}
\end{lstlisting}
where $x_i$ are fresh variables, not free in $e$.
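For example, under this translation a hypothetical function "\lstinline|swap|" taking a two-element array
\begin{lstlisting}
fun swap ([x, y]) {[y, x]}
\end{lstlisting}
can be read as a shorthand for the following definition, where "\lstinline|a|" plays the role of the fresh variable:
\begin{lstlisting}
fun swap (a) {
  case a of
    [x, y] -> [y, x]
  esac
}
\end{lstlisting}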
\section{Syntax Definitions}
The syntax definition extension provides an alternative, simplified syntax for parsers written using the standard unit \lstinline|Ostap| (see Section~\ref{sec:ostap}).
The syntax for syntax definition expressions is shown in Fig.~\ref{syntax_expressions}.
\begin{figure}[h]
\[
\begin{array}{rcll}
\defterm{syntaxExpression} & : & \term{syntax}\s\term{(}\s\nonterm{syntaxSeq}\s(\s\term{$\mid$}\s\nonterm{syntaxSeq}\s)^*\s\term{)}&\\
\defterm{syntaxSeq} & : & \nonterm{syntaxBinding}^+\s[\s\term{\{}\s\nonterm{expression}\s\term{\}}\s]&\\
\defterm{syntaxBinding} & : & [\s\term{-}\s]\s[\s\nonterm{pattern}\s\term{=}\s]\s\s\nonterm{syntaxPostfix}&\\
\defterm{syntaxPostfix} & : & \nonterm{syntaxPrimary}\s[\s\term{*}\s\alt\s\term{+}\s\alt\s\term{?}\s]&\\
\defterm{syntaxPrimary} & : & \token{LIDENT}\s(\s\term{[}\s[\s\nonterm{expression}\s(\s\term{,}\s\nonterm{expression}\s)^*\s]\s\term{]}\s)^*&\alt\\
& & \term{(}\s\nonterm{syntaxExpression}\s\term{)}&\alt\\
& & \term{\$(}\s\nonterm{expression}\s\term{)}&
\end{array}
\]
\caption{Syntax definition expressions}
\label{syntax_expressions}
\end{figure}
Syntax expressions can be used wherever regular expressions are allowed. Each syntax expression is expanded into a certain combination of \lstinline|Ostap| primitives.
For example,
\begin{lstlisting}
fun sum (str) {
  parseString (
    syntax (l=DECIMAL token["+"] r=DECIMAL eof {
      stringInt (l) + stringInt (r)
    }),
    str
  )
}
\end{lstlisting}
defines a function which parses its argument as an expression of the form \lstinline|"l + r"|, where \lstinline|l| and \lstinline|r| are decimal literals, and evaluates its value.
A syntax expression itself is a sequence of alternatives, and each alternative is a sequential composition (\nonterm{syntaxSeq}) of primitive parsers equipped with an optional
semantic action (a \emph{general} expression in curly brackets).
A primitive parser is either an l-identifier (possibly supplied with arguments), or a \emph{general} expression surrounded by brackets \term{\$(}..\term{)},
or a \emph{syntax} expression surrounded by round brackets. Note that, unlike in general expressions, the arguments for primitive parsers in syntax expressions are surrounded by
\term{[}..\term{]}; thus
\begin{lstlisting}
x ("a")
\end{lstlisting}
means a sequential composition of \lstinline|x| and "\lstinline|a|", not a combinator \lstinline|x| applied to "\lstinline|a|".
A primitive parser can be followed by one of the postfix operators ("\term{*}", "\term{+}", or "\term{?}"), corresponding
to the "\lstinline|rep0|", "\lstinline|rep|", or "\lstinline|opt|" combinators of \lstinline|Ostap| respectively, for example
\begin{lstlisting}
token["a"]+
identifier?
\end{lstlisting}
A value recognized by a primitive parser can be matched against a pattern, for example
\begin{lstlisting}
value=(identifier | constant)
h:tl=item+
\end{lstlisting}
The bindings provided by pattern-matching can be used in semantic actions.
Finally, if no semantic action is given, a sequential syntax expression returns a tuple of its components. However, if a parser
in a sequential composition is preceded by "\term{-}", then its value is not included in the default result. Thus,
\begin{lstlisting}
parse -eof
\end{lstlisting}
returns what "\lstinline|parse|" recognized; the input stream is parsed against "\lstinline|eof|", but the result of "\lstinline|eof|"
is omitted.
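Putting these pieces together, a sketch of a parser for a parenthesized item might look as follows; "\lstinline|item|" is a hypothetical parser assumed to be
defined elsewhere, and "\lstinline|token|" is used as in the examples of this section:
\begin{lstlisting}
syntax (-token["("] item -token[")"])
\end{lstlisting}
Since both tokens are preceded by "\term{-}" and no semantic action is given, this expression is expected to return only what "\lstinline|item|" recognized.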