LLK Release 0.4

Change history

LLK grammar syntax changes:

  • REMOVED - GenSwitchThreshold and GenBitSetTestThreshold are now reserved for testing purposes. Normal use should rely on the default values.
  • NEW - The lexer now accepts the IgnoreCase option in the %OPTIONS section. When the option is set to true, the lexer creates an LLKLexerInput that performs lookahead and token matching on the lower-cased input text, so the grammar should specify literals in lower case only. Token text in the created tokens, however, preserves the case of the original text. If the user instantiates the LLKLexerInput for the lexer, use the LLKLexerInput(char[] source, boolean ignorecase) constructor.
  • NEW - The Choices element now accepts the %LA=GREEDY: option to suppress choice-follow conflict warnings and explicitly assert that the default behaviour of matching greedily is correct. There is no NONGREEDY option; use a semantic predicate if non-greedy matching is required.
  • NEW - In addition to the ()? ()+ and ()* EBNF syntax, LLK now also accepts the ():min[,max] syntax to specify the minimum and maximum occurrences of the element. If max is not specified, it is the same as min, e.g. (Hex()):4 matches four occurrences of Hex().
  • NEW - Rule syntax is changed to reduce redundant typing. Rules with an init. action no longer need the ':' before the init. action:
        rule() #void {
            boolean ok = true;
        }
        {
            choices()
        }
    
    Rules without an init. action no longer require the dummy {}:
        rule() #void :
        {
            choices()
        }
    
  • NEW - LLK now supports context switching in the lexer. There is a new construct that works just like a switch statement:
        %CONTEXTS {
            CONTEXT: NONE | PUTBACK;
        }
        rule():
        {
            switch(CONTEXT) {
                case PUTBACK:
                    putback();
                case NONE:
                    ( ... )
            }
        }
    
    where CONTEXT is the context variable and PUTBACK and NONE are its states. All context variables have a default state of NONE. There is also an llkSetContext(int context, int state) method to switch a context variable between states in the lexer; it also takes care of flushing and rewinding the input stream.
  • NEW - Lexer rules can now be prefixed with states of the default context (named CONTEXT):
        <context1, context2> void lexer_rule1() :
        {
            ...
        }
    
    When generating llkNextToken(), the rule is only activated when the default context is in one of the specified states.
  • NEW - Reference elements (charSet(), tokenSet(), treeRef(), stringRef(), ruleRef()) can now be suffixed by * + ? and :min,max just like a ( Choices ) element. It is just a shorthand for the equivalent choices of references, e.g.
        charSet()?
    
    is the same as
        ( charSet() )?
    
  • NEW - charSet() now accepts a string, which is the same as specifying all the characters in the string individually. Example:
        <"\r\n">
    
    is the same as
        <'\r', '\n'>
    
  • NEW - tokenSet() now accepts the following syntax to specify the lookahead token set of the given rule at the given k. Example:
        <%LA(1, Expression())>
    
    specifies the lookahead token set of Expression() at k=1.
  • CHANGED - The syntax for specifying keyword context is changed to group keywords of the same context together:
        <context> {
            keyword1="string1";   
            keyword2="string2";
        }
    
    instead of:
        keyword1(context)="string1";
    
  • CHANGED - Instead of generating a token type only for public lexer rules, all lexer rules now by default generate a token type of the same name as the rule name unless the rule has the #void attribute.
  • CHANGED - Grammar options are no longer case sensitive.
  • CHANGED - Reserved tokens are now named _EOF_, _INVALID_, _IGNORE_ and _NULL_ instead of EOF, INVALID, IGNORE and NULL.
  • CHANGED LLK no longer needs the %HEADER prefix for the header section. An optional header section can simply follow the %LEXER(), %PARSER() or %TREEPARSER() declarations. Example:
        %PARSER(LLKParser) {
            import sf.llk.support.java.*; // header section.
        }
        public class LLKParser extends ... {
            ...
        }
    
  • CHANGED LLK no longer accepts rules with the same name as input tokens, even if the rule does not generate a new token type.
  • CHANGED Removed the NodeType and INodeType options; the types LLKNode and ILLKNode are now always used.
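
The IgnoreCase behaviour described above (matching on lower-cased text while token text keeps the original case) can be pictured with a small standalone sketch. IgnoreCaseInputSketch and its methods are hypothetical names invented for illustration; this is not the LLKLexerInput implementation, only one plausible way to get the same observable behaviour.

```java
// Illustrative sketch only: IgnoreCaseInputSketch is a hypothetical stand-in,
// not the LLKLexerInput implementation.
final class IgnoreCaseInputSketch {
    private final char[] source; // original text, case preserved for token text
    private final char[] folded; // copy used for lookahead and matching

    IgnoreCaseInputSketch(char[] source, boolean ignoreCase) {
        this.source = source.clone();
        this.folded = source.clone();
        if (ignoreCase) {
            for (int i = 0; i < folded.length; i++) {
                folded[i] = Character.toLowerCase(folded[i]);
            }
        }
    }

    /** Returns true if the (lower-case) grammar literal matches at the given offset. */
    boolean matches(int offset, String literal) {
        if (offset < 0 || offset + literal.length() > folded.length) {
            return false;
        }
        for (int i = 0; i < literal.length(); i++) {
            if (folded[offset + i] != literal.charAt(i)) {
                return false;
            }
        }
        return true;
    }

    /** Returns the token text for [start, end), taken from the original text. */
    String text(int start, int end) {
        return new String(source, start, end - start);
    }

    public static void main(String[] args) {
        IgnoreCaseInputSketch in = new IgnoreCaseInputSketch("SeLeCt x".toCharArray(), true);
        System.out.println(in.matches(0, "select")); // true
        System.out.println(in.text(0, 6));           // SeLeCt
    }
}
```

With ignoreCase set to false the same literal would not match "SeLeCt", which is why a grammar using the option should spell its literals in lower case.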

Implementation changes:

  • CHANGED - When the NodeScopeHook option is enabled, calls to llkTree.open() and llkTree.close() are no longer generated; the grammar should call them in the llkOpenNode() and llkCloseNode() methods. llkCloseNode() now also receives the conditional create flag and has the signature llkCloseNode(ILLKNode node, boolean create) instead of llkCloseNode(ILLKNode node), so that llkTree.close(ILLKNode node, boolean create) can be called for a conditional node. In addition, llkCloseNode() is now called for a conditional node even if the condition evaluates to false and no node is created.
  • CHANGED - LLKTree.open() and LLKTree.clearScope() no longer take an ILLKNode parameter.
  • CHANGED - The TokenType XML file is now generated in the output directory instead of the interface/support directory.
  • CHANGED LLKMain now performs AST optimization by default; use -notrim to turn it off (instead of using -trim to turn optimization on).
  • CHANGED ILLKMain.getFilename(), which used to return the full filepath, is renamed to getFilepath(). Added ILLKMain.getSimpleFilename(), which returns the simple filename of the full filepath.
  • CHANGED ILLKNode is massively changed. It now uses simpler method names, e.g. getFirstChild() is renamed to getFirst() and getNextSibling() is renamed to getNext(), etc. There are also a few new methods, e.g. remove(ILLKNode start, ILLKNode end), etc.
  • CHANGED LLKNode now compares with equals() instead of == when searching the children list.
  • CHANGED ILLKParserInput.match*() methods are moved to the parser itself and renamed as _match*() methods to make the interface cleaner.
  • CHANGED ILLKInput, ILLKLexer and ILLKParser now declare the clone() method. ILLKLexerInput, ILLKParserInput and ILLKTreeParserInput now have a clone() method. The clone() method for the lexer, parser and treeparser is, however, not generated, and the grammar must declare one, which may be just a dummy clone() method.
  • CHANGED ILLKNode, LLKNode added Object getValue() and void setValue(Object v) methods.
  • CHANGED Added IDirectiveHandler which by default is not used.
  • CHANGED Added setLocation(ISourceLocation loc) to ISourceLocation interface.
  • CHANGED ILLKToken and LLKToken have changed a lot. The int value field is replaced by an Object data field to store arbitrary data. Also, LLKToken no longer keeps the line and column information; instead it only keeps the start and end offsets.
  • CHANGED Since the LLKToken int value field changed to Object data, the lexer llkCreateToken() method signatures are changed. The following three methods are now provided:
        public ILLKToken llkCreateToken(int type, ISourceLocation start, int end)
        public ILLKToken llkCreateToken(int type, ISourceLocation start, int end, Object data)
        public ILLKToken llkCreateToken(int type, ISourceLocation start, int end, String text, Object data)
    
    instead of
        public ILLKToken llkCreateToken(int type, ISourceLocation start, int end)
        public ILLKToken llkCreateToken(int type, ISourceLocation start, int end, String text)
        public ILLKToken llkCreateToken(int type, ISourceLocation start, int end, int value)
        public ILLKToken llkCreateToken(int type, ISourceLocation start, int end, int value, String text)
    
    where the first two methods create the token with or without text according to the lexer's TokenUseText option.

    NOTE: Please check all llkCreateToken() and new LLKToken() calls to make sure they conform to the new method signatures. An incorrect data and/or text field may cause hard-to-track bugs.

  • CHANGED ILLKNode and LLKNode have changed a lot too. They now keep the end offset instead of the last token.
  • CHANGED Removed the lexer _tokenHead field, which was basically not used.
  • CHANGED LLK generated code no longer uses the _ prefix for private variables; only the llk prefix is used instead.
  • CHANGED There are many changes to ISourceLocator and related code. SourceLocator, an implementation of ISourceLocator, is now part of ILLKMain and is solely responsible for keeping track of the filename, line and column information. There are many related changes to ILLKLexerInput, ILLKParserInput and ILLKTreeParserInput too. Please check the source code of the corresponding files for details.
  • REMOVED - ILLKNode.setName() and ILLKNode.getName(); in most cases setText()/getText() can be used instead.
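
The NOTE about llkCreateToken() above is worth taking literally, because some old-style calls can still compile against the new overloads while filling a different field. The sketch below uses hypothetical stand-in types (CreateTokenSketch, Token, create) that only mirror the shape of the three new signatures; the ISourceLocation start parameter is simplified to an int offset, and the text that the real lexer may fill in according to TokenUseText is simply left null here. It is not LLK code.

```java
// Illustrative sketch only: these are hypothetical stand-ins that mirror the
// shape of the new llkCreateToken() overloads, not the LLK classes.
final class CreateTokenSketch {
    /** Minimal token with separate text and data fields. */
    static final class Token {
        final int type;
        final int start;
        final int end;
        final String text;
        final Object data;

        Token(int type, int start, int end, String text, Object data) {
            this.type = type;
            this.start = start;
            this.end = end;
            this.text = text;
            this.data = data;
        }
    }

    static Token create(int type, int start, int end) {
        return new Token(type, start, end, null, null);
    }

    static Token create(int type, int start, int end, Object data) {
        return new Token(type, start, end, null, data);
    }

    static Token create(int type, int start, int end, String text, Object data) {
        return new Token(type, start, end, text, data);
    }

    public static void main(String[] args) {
        // Old-style (type, start, end, String text) call: still compiles, but
        // Java resolves it to the (..., Object data) overload.
        Token t = create(1, 0, 3, "abc");
        System.out.println(t.text); // null
        System.out.println(t.data); // abc

        // Old-style (type, start, end, int value) call: the int is autoboxed
        // to an Integer and stored as data.
        Token n = create(2, 4, 6, 42);
        System.out.println(n.data); // 42

        // Explicit five-argument form keeps text and data apart.
        Token ok = create(1, 0, 3, "abc", null);
        System.out.println(ok.text); // abc
    }
}
```

Under these assumptions the compiler reports nothing for the first two calls, so a review of call sites is the only way to catch text that has silently ended up in the data field.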

Enhancements:

  • Improved choice-follow conflict warning reporting. The warning message now shows the path to the actual reference that caused the choice-follow conflict.
  • The static keyword lookup generator now supports an unlimited nesting level (default 8) instead of just 2.
  • Generated code no longer depends on blacksun-util.jar.

Bug fixes:

  • NEW The code generator now emits code in the return action for a literal rule in a lexer grammar. Example:
        COMMA : "," | "=>"; { /* return action for literal rule. */ }
    
  • FIXED - NO_TEST_FOR_LAST_CASE debug flag should be false for normal operation. It is now removed.
  • FIXED Error choice is now handled by FixLLKTreeVisitor instead of LLKTreeTrimmerVisitor, which is not invoked when AST optimization is turned off.
  • FIXED Correct code is now generated when the lexer grammar specifies the llkNextToken() rule.
  • FIXED Mixed lookahead should be treated as non-llk and needs to be given priority over the following alternatives.
  • FIXED Many other analyzer and code generator bug fixes.
  • FIXED llkCreateToken(), which used llkGetOffset() instead of the end parameter as the end of the token.
  • FIXED IntList.java bound checks.

Sample grammars:

  • NEW Added CSharp grammar (with generics) and support files. The CSharp parser has some preliminary support for parsing conditional sections into a single AST model. It is a work in progress.
  • NEW llk-examples project has sample grammar to showcase the context dependent lexer and user specified llkNextToken() rule features that are new in LLK v0.4.
  • CHANGED Java grammar is updated to support Java 5 syntax.