Unlike parser rules, lexer rules are not called directly by the user. The top level lexer rule,
llkNextToken(), is usually very tedious to write, so LLK synthesizes the llkNextToken() rule implicitly. The
synthesized llkNextToken() rule has a ruleRef() to each public (literal or lexer) rule in the lexer grammar (that is not
annotated with #void) as its alternatives. llkNextToken() also resolves conflicts implicitly, as far as it can.
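For example, if a lexer grammar had public rules named identifier(), number() and spaces() (hypothetical names, used only for illustration), the synthesized rule would conceptually look like the following sketch:
public void llkNextToken() ::
{
	// identifier(), number() and spaces() stand for the grammar's own public
	// lexer rules; this is a conceptual sketch, not actual generated output.
	identifier()
	|
	number()
	|
	spaces()
}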
The user can suppress synthesis of the llkNextToken() rule by specifying a custom
llkNextToken() rule in the lexer grammar:
public void llkNextToken() ::
{
...
}
The llkNextToken() rule works differently from normal lexer rules in a number of ways. In particular, the
llkNextToken() rule uses two global variables (_token and _llkType) to create the
return token. _token holds the token to be returned, and is implicitly initialized to null
before the llkNextToken() rule is invoked. If _token is null, a new token with the type specified by _llkType is created and returned. See also the examples in the sample grammars for detailed usage.
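As a hedged sketch (the spaces() and identifier() rules are hypothetical), a custom llkNextToken() rule typically just lists the lexer rules to try; each invoked rule either fills in _token itself or sets _llkType so that a token can be created on return:
public void llkNextToken() ::
{
	// Sketch only.  spaces() and identifier() are hypothetical lexer rules of
	// this grammar.  Each alternative either creates _token directly or sets
	// _llkType, in which case a new token of that type is created and
	// returned when this rule exits.
	spaces()
	|
	identifier()
}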
Lexer contexts
Since v0.4, LLK supports a context switching construct in the lexer grammar, which is
particularly useful in a custom llkNextToken() rule for implementing a context dependent lexer.
public void llkNextToken() ::
{
switch(CONTEXT) {
case TAG:
tag()
|
case STRING:
string()
|
case TEXT:
text()
}
}
The above declares a lexer context named CONTEXT with four states (CONTEXT_NONE,
CONTEXT_TAG, CONTEXT_STRING and CONTEXT_TEXT). At any one time, only
one state can be active, and thus the three choices in the switch() construct are implicitly mutually
exclusive. The builtin method llkSetContext(int context, int state) sets the state of the
specified context. The state of the lexer context can also be changed from the parser using the llkSetContext(int context, int state, ILLKToken lt0) method, which also takes care of rewinding the
token stream to the end of lt0.
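For instance, the tag() rule above could switch the lexer to the TEXT state once it reaches the end of a tag, and a parser action could switch it back; both calls below are hedged sketches, and lt0 stands for whatever token the parser wants the stream rewound to:
// Hypothetical action inside the tag() lexer rule, executed once the
// closing '>' of a tag has been matched:
llkSetContext(CONTEXT, CONTEXT_TEXT);

// Hypothetical action in a parser rule; lt0 is the token up to whose end
// the token stream should be rewound:
llkSetContext(CONTEXT, CONTEXT_TAG, lt0);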
Keyword rules
The LLK lexer grammar uses special syntax to explicitly specify keywords and literals, which receive special treatment and
optimization during code generation.
Keywords are declared in a %KEYWORDS section. They are entered into a keyword table that can
be looked up with the builtin int llkLookupKeyword(...) methods in the actions of any lexer
rule (a usage sketch follows the example below). Keywords are public and represent valid token types that can be referenced in the parsers.
Multiple string values can be declared for each keyword token. Example:
%KEYWORDS {
PUBLIC = "public";
PROTECTED = "protected";
CONST = "const" | "__const" | "__const__";
case Property {
GET = "get";
SET = "set";
}
...
}
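As a hedged illustration, an identifier rule's action could consult this keyword table; the exact arguments of llkLookupKeyword(...) are not shown in this section and are therefore left elided:
// Hypothetical action inside an identifier lexer rule.  The arguments of
// llkLookupKeyword(...) are left elided here; the returned value is the
// keyword's token type (e.g. PUBLIC, CONST) if the matched text is a keyword.
llkType = llkLookupKeyword(...);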
Keyword contexts
Keywords can optionally be qualified by a keyword context, so that the keyword is only
recognized when the corresponding context is activated. Keyword contexts are deactivated by default.
A keyword context can be activated with the llkEnableKeyword() builtin method and deactivated with llkDisableKeyword(), in parsers (not in the lexer). Multiple keyword contexts can be
activated at any point in time, and they are independent of the lexer rule contexts.
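For example, a parser could activate the Property keyword context from the %KEYWORDS example above while parsing property declarations; the argument form of these builtins is an assumption here:
// Hypothetical parser actions.  The exact parameter of llkEnableKeyword()
// and llkDisableKeyword() (shown here as the context name) is an assumption;
// consult the sample grammars for the actual form.
llkEnableKeyword(Property);
// ... parse declarations in which GET and SET are recognized as keywords ...
llkDisableKeyword(Property);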
Literal rules
Literal rules declare simple string literal tokens. Literal rules can only contain stringRef(),
nothing else (i.e. no actions, no return values, etc.). A literal rule can optionally be qualified by a
context, just like normal lexer rules. Multiple string values can be declared for each literal rule.
Example:
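(The sample grammars contain the actual examples; the following is only a hedged guess at the literal rule syntax, modelled on the other rule declarations in this section.)
// Hypothetical literal rules, each containing only stringRef()s.
public void LBRACE() ::
{
	"{"
}
protected void ASSIGN() ::
{
	"=" | ":="
}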
Since literal rules have fixed length, the code generator can automatically resolve any conflicts between them
by giving the longest match the higher priority. Conflicts with normal lexer rules still need to be resolved
explicitly in the grammar. Literal rules can be public or protected.
Normal lexer rules
Normal lexer rules have the standard LL(k) rule syntax. Lexer rules can be public, protected or
private. A normal lexer rule implicitly returns the token type (via the llkType variable) to its caller,
typically llkNextToken(), unless it is annotated with the #void attribute. Normal lexer rules accept the
following attributes: SPECIAL, IGNORE, IGNORE_CASE and void. If the token type is not TokenType.IGNORE, the
llkNextToken() rule implicitly creates an LLKToken from the token type. A public lexer rule can also explicitly
return an LLKToken (created with the llkCreateToken() builtin methods). In that case, llkNextToken() simply returns that
token. Example: