LLK Terms
- Lookahead set
- Lookahead set at a given depth k is the set of possible tokens at k tokens ahead from the
current token in the alternative. The parser can choose between two alternatives if there is a depth, the
deterministic depth, that the lookahead set for the two alternatives at that depth do not intersect.
- Deterministic depth
- The minimum lookahead depth in which alternatives are have disjoint lookahead set.
- Incomplete depths
- When calculating the lookahead set of a rule reference at a paritcular depth, the end of the
rule may be reached before the required depth. The remaining depth is marked as the incomplete depth. For
example:
void a() : {}
{
b() A B E F G
}
void b() : {}
{
C
}
%LA(5) b() has incomplete depth= 4, since b only has a depth of 1. %LA(5) a() would see lookahead set
{F}.
- A lookahead set usually has a
Set of incomplete depths since there may be more
than one incomplete depth, eg. when alternatives with different length are encountered. The following has
multiple incomplete depths,
void a() : {}
{
b() A B E F G;
}
void b() : {}
{
D C // initial k=5, incomplete depth=3
|
C // initial k=5, incomplete depth=4
}
Here, LOOKAHEAD(5) b() returns a set of incomplete depths {3, 4}. Those are the lookahead depths past the
rule ref b() needed for the local follow.
- Epsilon flag
- Do not confuse the epsilon flag with the incomplete depths (for such purpose, LLK use name
incomplete depths instead of epsilon depths). Incomplete depths are normal condition and are intermediate
results, lookahead set calculation may continue. While epsilon flag indicates a terminating condition (ie.
lookahead set calculation has to stop). For example when requesting the lookahead set at depth beyond depth
of a syntactic lookahead, or there is reference cycle, it return epsilon.
- Greediness
- When greedy is asserted true, ()?, ()+ and ()* loops have priority over the exit path and
suppress conflict warnings.
ILLKToken STRING() :
{
'"'
(
c=ESCAPE()
|
(
'\n'
|
'\r' ( %GREEDY : '\n')?
)
|
~<"\"\r\n\\">
)*
'"'
}
- For non-greedy situation, use semantic lookahead. Example:
void comment(): {}
{
"/*"
(
( '\r' '\n' | '\r' | '\n' )
|
%LA({!(LA(1)=='*' && LA(2)=='/')})
~( '\n' | '\r' )
)*
"*/"
}
- Vocabulary
- Vocabulary is a set of token types. Each grammar has an input vocabulary (ie. token types it
can recognize in the input stream) and an output vocabulary (the token types that it can produce). For
lexer, the input vocabulary is the valid characters, the output vocabulary are the output lexer tokens. The
input vocabulary of a parser is the output vocabulary of its lexer, ... etc.
- Keyword section
- Keywords declared in keyword declarations are not rules. They are simply entered into a
keyword table. Lexer rule, such as IDENTIFIER, has to explicitly lookup the keyword with builtin methods
and return the proper token type.
- LLK provide syntax to support context dependent keywords. Keyword declaration can specify a
context in which the keyword would be enabled. Example:
GET(accessor)="get";
Two builtin methods llkEnableContext(String context) and llkDisableContext(String
context) can be used in parser to enable/disable a particular context. All contexts by default are
enabled at startup.
- In the following example, keyword lookup is preformed after the ID() rule is matched. If the
accessor context is disabled, lookup for "get" and "set" would fail and token type ID would be
returned.
KEYWORDS {
PUBLIC="public";
GET(accessor)="get";
SET(accessor)="set";
}
void ID() : {
int start=llkGetOffset();
}
{
( 'a'..'z' | 'A'..'Z' | '_' | '$' )
( 'a'..'z' | 'A'..'Z' | '_' | '$' | '0'..'9' )*
{
llkType=lkLookupKeyword(llkSource, start, llkGetOffset(), ID);
}
}
- Literal rule
- Literal rule is special simplified syntax in lexer grammar to declare literal string tokens
and thus it accept only string references on the right hand side. Example:
ASSIGN: "=";
EQUAL: "==";
In addition to the simplified syntax, literal rule allows optimization during code generation to take
advantage of the fact that they always have finite deterministic depth. Conflicts among literal rules are
implicitly resolved by giving higher priority to longer match. The Lookahead option for the lexer grammar
is adjusted to accommodate the max. lookahead depth (k) required for the literal rules. Conflicts between
literals rules and normal lexer rules, with the adjusted lookahead depth, are still reported and need to be
resolved explicitly.
|