% !TEX TS-program = pdflatex
% !TeX spellcheck = en_US
% !TEX root = lama-spec.tex

\section{Lexical Structure}
\label{sec:lexical_structure}

The character set of the language is case-sensitive \textsc{ASCII}. The lexical definitions below use
the GNU Regexp syntax~\cite{GNULib}.

\subsection{Whitespaces and Comments}

Whitespaces and comments are \textsc{ASCII} sequences which serve as delimiters for other tokens but are otherwise
ignored.

The following characters are treated as whitespaces:

\begin{itemize}
\item blank character "\texttt{ }";
\item newline character "\texttt{\textbackslash n}";
\item carriage return character "\texttt{\textbackslash r}";
\item tabulation character "\texttt{\textbackslash t}".
\end{itemize}

Additionally, two kinds of comments are recognized:

\begin{itemize}
\item the end-of-line comment "\texttt{--}" escapes the rest of the line, including itself;
\item the block comment "\texttt{(*} ... \texttt{*)}" escapes all the text between
"\texttt{(*}" and "\texttt{*)}".
\end{itemize}

There are a number of specific cases that have to be considered explicitly.

First, block comments can be properly nested. Second, occurrences of comment symbols inside string literals (see below) are not
treated as comments.

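The two rules above can be illustrated with a short sketch; the comment text and the string contents are arbitrary and chosen only for this example:

\begin{lstlisting}
(* outer comment (* nested comment *) still inside the outer comment *)

"(* this is a part of a string literal, not a comment *)"
"-- same here"
\end{lstlisting}
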
An end-of-line comment encountered \emph{outside} of a block comment escapes block comment symbols:

\begin{lstlisting}
-- the following symbols are not considered as a block comment: (*
-- same here: *)
\end{lstlisting}

Similarly, an end-of-line comment encountered inside a block comment is escaped:

\begin{lstlisting}
(* Block comment starts here ...
-- and ends here: *)
\end{lstlisting}

\subsection{Identifiers and Constants}

The language distinguishes identifiers, signed decimal literals, string literals, and character literals (see Fig.~\ref{idents_and_consts}). There are
two kinds of identifiers: those beginning with an uppercase character (\token{UIDENT}) and those beginning with a lowercase character (\token{LIDENT}).

String literals cannot span multiple lines; a double quote character (") inside a string literal has to be doubled to prevent it from
being treated as the literal's delimiter.

As a rule, a character literal consists of a single \textsc{ASCII} character; if this character is a quote (') it has to be doubled. Additionally,
the two-character abbreviations "\textbackslash t" and "\textbackslash n" are recognized and converted into their single-character representations.

\begin{figure}[t]
\[
\begin{array}{rcl}
\token{UIDENT} & = &\mbox{\texttt{[A-Z][a-zA-Z\_0-9]*}}\\
\token{LIDENT} & = &\mbox{\texttt{[a-z][a-zA-Z\_0-9]*}}\\
\token{DECIMAL}& = &\mbox{\texttt{-?[0-9]+}}\\
\token{STRING} & = &\mbox{\texttt{"([\^{}\textbackslash"]|"")*"}}\\
\token{CHAR} & = &\mbox{\texttt{'([\^{}']|''|\textbackslash n|\textbackslash t)'}}
\end{array}
\]
\caption{Identifiers and constants}
\label{idents_and_consts}
\end{figure}

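As an informal illustration of the definitions in Fig.~\ref{idents_and_consts}, the following lines list a few lexemes of each kind.
The concrete names and values are arbitrary and chosen only for this example; the lines are lists of lexemes, not a meaningful program:

\begin{lstlisting}
Nil  Cons  Node_2                   -- UIDENT
x  fooBar  list_1                   -- LIDENT
0  42  -7                           -- DECIMAL
""  "abc"  "a ""quoted"" word"      -- STRING
'a'  ''''  '\n'  '\t'               -- CHAR
\end{lstlisting}
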
\subsection{Keywords}

The following identifiers are reserved as keywords:

\begin{lstlisting}
after array at before box case do elif else
esac eta false fi for fun if import infix
infixl infixr lazy od of public sexp skip str
syntax then true val var while
\end{lstlisting}

\subsection{Infix Operators}

Infix operators are defined as follows:

\[
\token{INFIX}=\mbox{\texttt{[+*/\%\$\#@!|\&\^{}~?<>:=\textbackslash-]+}}
\]

There is a predefined set of built-in infix operators (see Fig.~\ref{builtin_infixes}); additionally,
an end user can define custom infix operators (see Section~\ref{sec:custom_infix}). Note that
additional whitespace is sometimes required to disambiguate infix operator applications. For example, if a
custom infix operator "\lstinline|+-|" is defined, then the expression "\lstinline|a +- b|" can no longer be
recognized as "\lstinline|a +(-b)|". Note also that a custom operator containing "\lstinline|--|" cannot be
defined, since "\lstinline|--|" starts an end-of-line comment.

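The whitespace sensitivity described above can be sketched as follows. The sketch assumes that a hypothetical custom operator "\lstinline|+-|" is in scope and that "\lstinline|a|" and "\lstinline|b|" are arbitrary variables; it is an illustration, not an exhaustive list of cases:

\begin{lstlisting}
a +- b     -- an application of the custom operator "+-"
a + -b     -- the whitespace separates "+" from the unary minus: a +(-b)
a --b         here "--" starts an end-of-line comment, so only "a" remains
\end{lstlisting}
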
\subsection{Delimiters}

The following symbols are treated as delimiters:

\begin{lstlisting}
. , ( ) { }
; # -> |
\end{lstlisting}

Note that custom infix operators can coincide with the delimiters "\lstinline|#|", "\lstinline!|!", and "\lstinline|->|", which can
sometimes be misleading.