Lookahead syntax
There are different types of lookaheads:
// Local lookahead options
%LA=n | DEBUG
// Semantic lookahead
%LA([n, [max,]] {...})
// Fixed syntactic lookahead
%LA(n)
// Variable syntactic lookahead
%LA([n, [max,]] ...)
// Mixed syntactic lookahead
%LA([n, [max,]] (... {...})+)
- Local lookahead options
- Local lookahead option specify options for the whole Choices block (instead of just for the
individual alternative as for syntax or semantic lookaheads).
- %LA=n specify the local lookahead depth, k, for the Choices block which override the
global Lookahead=n option. It is like saying %LA(n) for each alternative, but there is a subtle
difference. %LA(n) would suppress conflict warnings while %LA=n would not.
- %LA=DEBUG ask LLK to dump debug information for that Choices block.
- Fixed syntactic lookahead
- Lookahead the specified number of tokens (
n ) from the lookahead set of the
alternative. If matched, commit to the alternative.
- CAUTION: Since LLK lookahead set is a linear approximation, beware of pitfalls when
using fixed lookahead of depth more than 1. Example:
%LA(2)
[ "extern" ] DestructorDeclaration()
|
[ modifiers() ]
(
Type()
...
Here, %LA(2) for the first alternative see lookahead sets at k=1 { EXTERN, TILDE }, k=2
{ TILDE, IDENTIFIER }. This actually included token sequence EXTERN IDENTIFIER in the alternative below.
There is no conflict warning. LLK assumed user is intentionally giving priority to the
DestructorDeclaration() through the explicit lookahead clause, which is incorrect. More specific lookahead
eg. %LA( ["extern"] "~") should be used here.
- Semantic lookahead
- If the semantic predict boolean expression returns true, commit to the alternative. Fixed
syntactic looakhead can be used together with semantic lookahead, example:
(
%LA(1, { !(LA(2) == IDENTIFIER && LA(3) == ASSIGN) })
<COMMA>
Expression()
)*
equivalent to:
(
%LA(0, 1, { LA1()==COMMA && !(LA(2) == IDENTIFIER && LA(3) == ASSIGN) })
<COMMA>
Expression()
)*
would generate code like this:
_loop1 : while (true) {
if ((LA1() == COMMA) && (!(LA(2) == IDENTIFIER && LA(3) == ASSIGN))) {
llkInput.consume();
Expression();
} else {
break _loop1;
}
Note that there is an optional second integer, max , when the fixed looakhead is used with
semantic or syntactic lookaheads. It specify number of tokens that LLK can assume the lookahead has
verified (verified depth ). Typically, it is same as the fixed lookahead, n .
However, if the semantic lookahead expression checked additional tokens, max (> n) can be
specified to inform LLK about that so it may skip some checkings. If neither n or max is specified, both are default to 0.
- CAUTION Don't increase verified depth if the semantic predict is a superset of the
alternative lookahead set. For example,
%LA(1, { LA(2)!=PAREN })
<COMMA> <IDENTIFIER>
|
...
Here, it is still neccessary to check that LA(2) is indeed an IDENTIFIER.
- Variable/Mixed syntactic lookahead
- Lookahead whatever tokens specified in the syntactic predict. Syntactic predict is just like
a rule except that no lookahead and user actions are allowed. LLK generate a seperate method for each
syntactic predict (eg.
_synPredict0() ).
- LLK allow semantic predicts, that looks like a user action, to be embedded inside a syntactic
predict. Predict fail if any embedded semantic predict evaluated as false. For example:
%LA(modifiers() { LA1() != CLASS })
Here, LA1() look at the token that follows modifiers().
- Note that unlike pure semantic predict that is evaluate inline inside the rule method,
embedded semantic predicts are evaluated in the syntactic predict method (eg.
_synPredict0() ).
As such, embedded semantic predicts should not reference any local variables in the rule method.
- Variable syntactic lookahead may reference other rules. That works very much like the normal
ruleRef(). However, LLK generate a separate set of methods (syntactic predict rule methods with name eg.
_syn_modifiers() ) for each rule that is involved in syntactic predicts. Syntactic predict
methods, however, discard all the normal parameters, and user actions in the rule. And so, typically,
ruleRef() inside variable syntactic predict have no arguments even though the normal rule method require
arguments. Example:
void AttributeSection(FormatBuffer buf) #AttributeSection : {}
{
%LA(attributeTarget() <COLON>)
attributeTarget(buf)
...
}
void attributeTarget(FormatBuffer buf) #void : {}
{
...
}
Note that lookahead to attributeTarget() has no argument although the attributeTarget() rule requires a FormatBuffer parameter.
Advanced lookahead topics
- There are some occassions that syntactic lookahead may need to pass variables to the
referenced rules. For example, the callee has semantic predicts that needs reference to the variables.
There are several constructs in LLK to cater for such situations, namely, Inlined Sytactic
Lookahead, Syntactic Lookahead Parameter and Syntactic Lookahead Actions. Here is an
example:
void typeSpecifier(... % int count /* Syntactic Lookahead Parameter */) : {}
{
...
%LA({ count == 0 })
typedefName()
...
}
void declSpecifiers() : {}
%{ int count = 0; } // Syntactic Lookahead Variable Declaration
{
(
...
%LA %(typeSpecifier(count)) // Inlined Syntactic Lookahead
typeSpecifier(count)
%{ ++count; } // Syntactic Lookahead Action
...
)+
}
- Syntactic Lookahead Parameter
- Here
typeSpecifier() required to use parameter count for the
semantic predict. typeSpecifier() is also used in syntactic predicts, so a syntactic predict method
_syn_typeSpecifier() has to be generated. The '%' separator ask LLK code generator to use parameters after
'%' in the syntactic lookahead method, ie. generate _syn_typeSpecifier(int count) instead of
_syn_typeSpecifier() . The normal rule method typeSpecifier(..., int count) has
both the normal parameters and the syntactic lookahead parameters.
- Inlined Syntactic Lookahead
- Typical syntactic lookahead inside %LA() cannot pass arguments to the referenced
rules. Because such predicts are generated in a separate method that take no arguments (eg.
_synPredict0()). In order to pass variables to a syntactic lookahead, the special Inlined Syntactic
Lookahead syntax has to be used to ask LLK to inline the method call so that variables in the caller
method can be used as arguments in the syntactic lookahead. For example, LLK would generate code like this
for the above example:
_synMark() && _synRewind(typeSpecifier(count))
as part of the lookahead expression. Inlined syntactic lookahead syntax only allow a single ruleRef(). For
more complicate syntactic lookahead, package the lookahead into a dedicated rule and call it using the
inlined syntactic lookahead syntax.
- Syntactic Lookahead Parameter is also a normal parameter for the normal rule method. Thus the
count argument is required for typeSpecifier() in both the syntactic lookahead
and the normal rule reference.
- Syntactic Lookahead Action
- Typically, LLK strip all user action from the syntactic lookahead methods. To execute actions
in syntactic lookahead methods, the syntactic lookahead action syntax %{ action(); } is required. As for
the syntactic lookahead parameter, the syntactic lookahead action are generated in both the normal and the
syntactic lookahead methods. For example, %{ ++count; } in the above example is required to mantain the
state of the count variable inside the syntactic lookahead method
_syn_declSpecifiers() as
well as in the normal rule method declSpecifiers() .
- Note that
count is incremented in declSpecifiers() , not in typeSpecifier() . Otherwise, it would be incremented in both the syntactic lookahead method call and
the normal rule method call to typeSpecifier() .
- Return values from Lookahead
- By passing an array or an object to an Inlined Syntactic Lookahead, it is possible to return
values to the caller. Example:
void rule1(): {}
{
...
%LA (MemberName() <LPAREN>)
MethodDeclaration(mod, attrs)
|
%LA(MemberName() <LBRACE>)
PropertyDeclaration(mod, attrs)
...
}
Here the two lookaheads have same prefix MemberName() and require to look into MemberName if it is not a
MethodDeclaration. The second lookahead into MemberName() can be avoided if a flag is returned from the
first lookahead:
void rule1() : {
int[] ret=new int[1];
}
{
...
%LA %(lookaheadMemberName(ret))
MethodDeclaration(mod, attrs)
|
%LA({ ret[0] == OPEN_BRACE })
PropertyDeclaration(mod, attrs)
...
}
void lookaheadMemberName(% int[] ret) {
}
{
MemberName()
%{ ret[0]=LA1(); }
<PAREN>
}
Here, lookaheadMemberName() returns the token after MemberName() and thus a simple semantic predict can be
used thereafter to check what follows MemeberName().
- %LA(1) and %LA(1, {...}) vs %LA(<COMMA>)
- There is subtle difference between the first two and the third. The first two, fixed
lookaheads, look into the LookaheadSet of the alternative while the later looks into the LookaheadSet of
the specified syntactic lookahead (ie. <COMMA>).
- %LA(1, { LA(2)==IDENTIFIER }) vs %LA(<COMMA> { LA1()==IDENTIFIER })
- The two lookaheads are functionally the same, but there is subtle difference between the
semantic predicts in the two lookaheads. The semantic predict in the first lookahead is looking ahead from
offset before the COMMA token while in the second, it lookahead from offset after COMMA.
- Is alternatives checked in order or in parallel?
- LLK generate a switch() statement for alternatives whose lookahead set are mutually
exclusive. If there are conflicts, LLK generate a series of if/else block in order that the alternatives
appear in the grammar.
- Cautions when using %LA
- User has to be careful when adding %LA to resolve conflicts. An alternative with
lookahead means whenever the lookahead is met, commit to the alternative. %LA give the alternative
priority over other conflicting alternatives and suppress any conflict warnings. It is the user's
responsibility to make sure that the lookahead is resolving the conflict correctly. Example:
void ClassMember(): {}
{
%LA(1)
ConstructorDeclaration()
|
MethodDeclaration()
...
}
void ConstructorDeclaration() : {}
{
<IDENTIFIER> <OPEN_PAREN>
...
}
void MethodDeclaration(): {}
{
type() qualifiedIdentifer() <OPEN_PAREN>
...
}
Here ConstructorDeclaration() and MethodDeclaration() has conflict, however, it is not reported.
%LA(1) has given priority to ConstructorDeclaration() whenever an IDENTIFIER is seen although that
lookahead is not sufficient to resolve the conflict. %LA(2) is requried here.
|