LLK

Glossary terms explained

LLK Terms

  • Lookahead set
  • Lookahead set at a given depth k is the set of possible tokens at k tokens ahead from the current token in the alternative. The parser can choose between two alternatives if there is a depth, the deterministic depth, that the lookahead set for the two alternatives at that depth do not intersect.
  • Deterministic depth
  • The minimum lookahead depth in which alternatives are have disjoint lookahead set.
  • Incomplete depths
  • When calculating the lookahead set of a rule reference at a paritcular depth, the end of the rule may be reached before the required depth. The remaining depth is marked as the incomplete depth. For example:
    void a() : {} 
    {
        b() A B E F G
    }
    void b() : {}
    {
        C 
    }
    
    %LA(5) b() has incomplete depth= 4, since b only has a depth of 1. %LA(5) a() would see lookahead set {F}.
  • A lookahead set usually has a Set of incomplete depths since there may be more than one incomplete depth, eg. when alternatives with different length are encountered. The following has multiple incomplete depths,
    void a() : {} 
    {
        b() A B E F G;
    }
    void b() : {} 
    {
        D C  // initial k=5, incomplete depth=3
        | 
        C  // initial k=5, incomplete depth=4
    }
    
    Here, LOOKAHEAD(5) b() returns a set of incomplete depths {3, 4}. Those are the lookahead depths past the rule ref b() needed for the local follow.
  • Epsilon flag
  • Do not confuse the epsilon flag with the incomplete depths (for such purpose, LLK use name incomplete depths instead of epsilon depths). Incomplete depths are normal condition and are intermediate results, lookahead set calculation may continue. While epsilon flag indicates a terminating condition (ie. lookahead set calculation has to stop). For example when requesting the lookahead set at depth beyond depth of a syntactic lookahead, or there is reference cycle, it return epsilon.
  • Greediness
  • When greedy is asserted true, ()?, ()+ and ()* loops have priority over the exit path and suppress conflict warnings.
    ILLKToken STRING() :
    {
        '"' 
        (
            c=ESCAPE()
            |
            (
                '\n'
                | 
                '\r' ( %GREEDY : '\n')?
            )
            |
            ~<"\"\r\n\\">
        )*
        '"'
    }
    
  • For non-greedy situation, use semantic lookahead. Example:
    void comment(): {}
    {
        "/*"
        (
            ( '\r' '\n' | '\r' | '\n' )
            |
            %LA({!(LA(1)=='*' && LA(2)=='/')})
            ~( '\n' | '\r' )
        )*
        "*/"
    }
    
  • Vocabulary
  • Vocabulary is a set of token types. Each grammar has an input vocabulary (ie. token types it can recognize in the input stream) and an output vocabulary (the token types that it can produce). For lexer, the input vocabulary is the valid characters, the output vocabulary are the output lexer tokens. The input vocabulary of a parser is the output vocabulary of its lexer, ... etc.
  • Keyword section
  • Keywords declared in keyword declarations are not rules. They are simply entered into a keyword table. Lexer rule, such as IDENTIFIER, has to explicitly lookup the keyword with builtin methods and return the proper token type.
  • LLK provide syntax to support context dependent keywords. Keyword declaration can specify a context in which the keyword would be enabled. Example:
    GET(accessor)="get";
    
    Two builtin methods llkEnableContext(String context) and llkDisableContext(String context) can be used in parser to enable/disable a particular context. All contexts by default are enabled at startup.
  • In the following example, keyword lookup is preformed after the ID() rule is matched. If the accessor context is disabled, lookup for "get" and "set" would fail and token type ID would be returned.
    KEYWORDS {
        PUBLIC="public";
        GET(accessor)="get";
        SET(accessor)="set";
    }
    
    void ID() : {
        int start=llkGetOffset();
    }
    {
        ( 'a'..'z' | 'A'..'Z' | '_' | '$' )
        ( 'a'..'z' | 'A'..'Z' | '_' | '$' | '0'..'9' )*
        { 
            llkType=lkLookupKeyword(llkSource, start, llkGetOffset(), ID); 
        }
    }
    
  • Literal rule
  • Literal rule is special simplified syntax in lexer grammar to declare literal string tokens and thus it accept only string references on the right hand side. Example:
    ASSIGN: "=";
    EQUAL: "==";
    
    In addition to the simplified syntax, literal rule allows optimization during code generation to take advantage of the fact that they always have finite deterministic depth. Conflicts among literal rules are implicitly resolved by giving higher priority to longer match. The Lookahead option for the lexer grammar is adjusted to accommodate the max. lookahead depth (k) required for the literal rules. Conflicts between literals rules and normal lexer rules, with the adjusted lookahead depth, are still reported and need to be resolved explicitly.