% !TEX TS-program = pdflatex % !TeX spellcheck = en_US % !TEX root = lama-spec.tex \section{Lexical Structure} \label{sec:lexical_structure} The character set for the language is \textsc{ASCII}, case-sensitive. In the following lexical description we will use the POSIX-Extended Regular Expressions in lexical definitions. \subsection{Whitespaces and Comments} Whitespaces and comments are \textsc{ASCII} sequences which serve as delimiters for other tokens but otherwise are ignored. The following characters are treated as whitespaces: \begin{itemize} \item blank character "\texttt{ }"; \item newline character "\texttt{\textbackslash n}"; \item carriage return character "\texttt{\textbackslash r}"; \item tabulation character "\texttt{\textbackslash t}". \end{itemize} Additionally, two kinds of comments are recognized: \begin{itemize} \item the end-of-line comment "\texttt{--}" escapes the rest of the line, including itself; \item the block comment "\texttt{(*} ... \texttt{*)}" escapes all the text between "\texttt{(*}" and "\texttt{*)}". \end{itemize} There is a number of specific cases which have to be considered explicitly. First, block comments can be properly nested. Then, the occurrences of comment symbols inside string literals (see below) are not considered as comments. End-of-line comment encountered \emph{outside} of a block comment escapes block comment symbols: \begin{lstlisting} -- the following symbols are not considered as a block comment: (* -- same here: *) \end{lstlisting} Similarly, an end-of-line comment encountered inside a block comment is escaped: \begin{lstlisting} (* Block comment starts here ... -- and ends here: *) \end{lstlisting} \subsection{Identifiers and Constants} The language distinguishes identifiers, signed decimal literals, string and character literals (see Fig.~\ref{idents_and_consts}). There are two kinds of identifiers: those beginning with uppercase characters (\token{UIDENT}) and lowercase characters (\token{LIDENT}). String literals cannot span multiple lines; a blockquote character (") inside a string literal has to be doubled to prevent from being considered as this literal's delimiter. Character literals as a rule are comprised of a single \textsc{ASCII} character; if this character is a quote (') it has to be doubled. Additionally two-character abbreviations "\textbackslash t" and "\textbackslash n" are recognized and converted into a single-character representation. \begin{figure}[t] \[ \begin{array}{rcl} \token{UIDENT} & = &\mbox{\texttt{[A-Z][a-zA-Z\_0-9]*}}\\ \token{LIDENT} & = &\mbox{\texttt{[a-z][a-zA-Z\_0-9]*}}\\ \token{DECIMAL}& = &\mbox{\texttt{-?[0-9]+}}\\ \token{STRING} & = &\mbox{\texttt{"([\^{}\textbackslash"]|"")*"}}\\ \token{CHAR} & = &\mbox{\texttt{'([\^{}']|''|\textbackslash n|\textbackslash t)'}} \end{array} \] \caption{Identifiers and constants} \label{idents_and_consts} \end{figure} \subsection{Keywords} The following identifiers are reserved for keywords: \begin{lstlisting} after array at before box case do elif else esac eta false fi for fun if import infix infixl infixr lazy od of public sexp skip str syntax then true val var while let in \end{lstlisting} \subsection{Infix Operators} Infix operators defined as follows: \[ \token{INFIX}=\mbox{\texttt{[+*/\%\$\#@!|\&\^{}~?<>:=\textbackslash-]+}} \] There is a predefined set of built-in infix operators (see Fig.~\ref{builtin_infixes}); additionally an end-user can define custom infix operators (see Section~\ref{sec:custom_infix}). Note, sometimes additional whitespaces are required to disambiguate infix operator applications. For example, if a custom infix operator "\lstinline|+-|" is defined, then the expression "\lstinline|a +- b|" can no longer be recognized as "\lstinline|a +(-b)|". Note also that a custom operator containing "\lstinline|--|" can not be defined due to lexical conventions. \subsection{Delimiters} The following symbols are treated as delimiters: \begin{lstlisting} . , ( ) { } ; # -> | \end{lstlisting} Note, custom infix operators can coincide with delimiters "\lstinline|#|", "\lstinline!|!", and "\lstinline|->|", which can sometimes be misleading.