LLK grammar syntax changes:
- REMOVED - The GenSwitchThreshold and GenBitSetTestThreshold options are no longer intended for normal use; they
are now reserved for testing purposes. Normal use should rely on the default values.
- NEW - The lexer now accepts the IgnoreCase option in the %OPTIONS section. When the
option is set to true, the lexer creates an LLKLexerInput that performs lookahead and
token matching on the lower-cased input text, so the grammar should be written in lower case only.
The text of created tokens, however, preserves the case of the original input. If the user
instantiates LLKLexerInput for the lexer, use the LLKLexerInput(char[] source, boolean
ignorecase) constructor.
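A minimal sketch of the idea, assuming nothing about LLK internals: the hypothetical CaseInsensitiveInput class keeps the original text for token text and a per-character lower-cased copy for lookahead and matching, so a grammar written in lower case matches input in any case while token text keeps its original case. Folding one character at a time keeps offsets aligned between the two copies.

```java
// Hypothetical stand-in for an ignore-case lexer input; not LLK's LLKLexerInput.
class CaseInsensitiveInput {
    private final char[] original; // used for token text, case preserved
    private final char[] folded;   // used for lookahead and matching

    CaseInsensitiveInput(char[] source, boolean ignoreCase) {
        original = source.clone();
        folded = new char[source.length];
        for (int i = 0; i < source.length; i++) {
            // Fold one char at a time so offsets stay aligned with the original.
            folded[i] = ignoreCase ? Character.toLowerCase(source[i]) : source[i];
        }
    }

    // Lookahead sees the folded character.
    char la(int offset) {
        return folded[offset];
    }

    // Matches a literal written in lower case, as the grammar requires.
    boolean matches(int offset, String lowerLiteral) {
        if (offset + lowerLiteral.length() > folded.length) return false;
        for (int i = 0; i < lowerLiteral.length(); i++) {
            if (folded[offset + i] != lowerLiteral.charAt(i)) return false;
        }
        return true;
    }

    // Token text comes from the original input.
    String text(int start, int end) {
        return new String(original, start, end - start);
    }

    public static void main(String[] args) {
        CaseInsensitiveInput in = new CaseInsensitiveInput("SELECT x".toCharArray(), true);
        System.out.println(in.matches(0, "select") + " " + in.text(0, 6)); // true SELECT
    }
}
```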
- NEW - Choices elements now accept the %LA=GREEDY: option to suppress
choice-follow conflict warnings and explicitly assert that the default behaviour of matching
greedily is correct. There is no NONGREEDY option; use a semantic predicate if non-greedy matching
is required.
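As an illustration of the greedy default, assume a toy rule stmt : "if" "c" stmt ( "else" stmt )?, which has the classic dangling-else choice-follow conflict: the optional else could close either an inner or an outer if. Matching greedily binds it to the innermost one. DanglingElse is a hypothetical hand-written parser, not LLK output.

```java
// Hypothetical hand-written parser for the toy rule
//   stmt : "if" "c" stmt ( "else" stmt )? | SIMPLE ;
// It is not LLK output; it only shows greedy matching of the optional else.
class DanglingElse {
    private final String[] toks;
    private int pos;

    DanglingElse(String src) {
        toks = src.trim().split("\\s+");
    }

    // Returns a bracketed rendering of the statement parsed at pos.
    String stmt() {
        if (toks[pos].equals("if")) {
            pos += 2; // consume "if" and the condition token
            String then = stmt();
            // Greedy: whenever "else" is next, the innermost open if takes it.
            if (pos < toks.length && toks[pos].equals("else")) {
                pos++;
                return "if(" + then + " else " + stmt() + ")";
            }
            return "if(" + then + ")";
        }
        return toks[pos++]; // any other token is a simple statement
    }

    public static void main(String[] args) {
        System.out.println(new DanglingElse("if c if c s1 else s2").stmt());
        // prints if(if(s1 else s2))
    }
}
```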
- NEW - In addition to the ()?, ()+ and ()* EBNF syntax, LLK now also accepts the ():min[,max]
syntax to specify the minimum and maximum number of occurrences of the element. If max is not
specified, it is the same as min, e.g. (Hex()):4 matches four occurrences of Hex().
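The sketch below, written by hand rather than taken from LLK's generated code, shows the assumed semantics of (element):min,max on a character stream: the element is matched greedily up to max times, and the match fails if fewer than min occurrences are found. BoundedRepeat and isHex are hypothetical names; isHex stands in for a Hex() rule that matches one hex digit.

```java
import java.util.function.IntPredicate;

// Hypothetical illustration of (element):min,max; not LLK's generated code.
class BoundedRepeat {
    // Matches element greedily, at most max times, starting at pos.
    // Returns the position after the match, or -1 if fewer than min matched.
    static int match(String input, int pos, IntPredicate element, int min, int max) {
        int count = 0;
        while (count < max && pos < input.length() && element.test(input.charAt(pos))) {
            pos++;
            count++;
        }
        return count >= min ? pos : -1;
    }

    // (element):n behaves like (element):n,n.
    static int match(String input, int pos, IntPredicate element, int n) {
        return match(input, pos, element, n, n);
    }

    // Stand-in for a Hex() rule matching a single hex digit.
    static boolean isHex(int c) {
        return (c >= '0' && c <= '9') || (c >= 'a' && c <= 'f') || (c >= 'A' && c <= 'F');
    }

    public static void main(String[] args) {
        System.out.println(match("1aF9Z", 0, BoundedRepeat::isHex, 4)); // 4
        System.out.println(match("1aF", 0, BoundedRepeat::isHex, 4));   // -1
    }
}
```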
- NEW - Rule syntax is changed to reduce redundant typing. Rules with an init
action no longer need the ':' before the init action:
rule() #void {
boolean ok = true;
}
{
choices()
}
Rules without an init action no longer require the dummy {}:
rule() #void :
{
choices()
}
- NEW - LLK now supports context switching in the lexer. There is a new construct that
works just like a switch statement:
%CONTEXTS {
CONTEXT: NONE | PUTBACK;
}
rule():
{
switch(CONTEXT) {
case PUTBACK:
putback();
case NONE:
( ... )
}
}
where CONTEXT is the context variable and PUTBACK and NONE are its states. All
context variables have a default state of NONE. There is also an llkSetContext(int
context, int state) method to switch a context variable between states in the lexer;
it also takes care of flushing and rewinding the input stream.
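A hand-written sketch of how a context-dependent lexer might behave, assuming a single context variable whose state selects the scanning rule and a one-token lookahead buffer. ContextLexer, State.RAW and setContext() are hypothetical; setContext() only loosely models the flushing and rewinding that llkSetContext() is described as doing, by discarding the buffered lookahead so scanning restarts at the end of the last consumed token.

```java
// Hypothetical sketch, not LLK-generated code: one context variable whose
// state selects the scanning rule, like switch(CONTEXT) in the grammar.
class ContextLexer {
    enum State { NONE, RAW }

    private final String in;
    private int pos;                     // position after the last consumed token
    private State context = State.NONE;  // contexts start in NONE
    private String peeked;               // one-token lookahead buffer
    private int peekEnd;

    ContextLexer(String in) { this.in = in; }

    // Switching state flushes the lookahead buffer; pos was never advanced
    // past the peeked token, so the input is effectively rewound.
    void setContext(State state) {
        context = state;
        peeked = null;
    }

    String peek() {
        if (peeked == null) {
            int save = pos;
            peeked = scan();
            peekEnd = pos;
            pos = save;
        }
        return peeked;
    }

    String next() {
        if (peeked != null) {
            String t = peeked;
            pos = peekEnd;
            peeked = null;
            return t;
        }
        return scan();
    }

    private String scan() {
        switch (context) {
            case RAW: {
                // RAW: the rest of the line (after leading blanks) is one token.
                while (pos < in.length() && in.charAt(pos) == ' ') pos++;
                if (pos >= in.length()) return null;
                int start = pos;
                while (pos < in.length() && in.charAt(pos) != '\n') pos++;
                int end = pos;
                if (pos < in.length()) pos++; // consume the '\n'
                return in.substring(start, end);
            }
            case NONE:
            default: {
                // NONE: whitespace-separated words.
                while (pos < in.length() && Character.isWhitespace(in.charAt(pos))) pos++;
                if (pos >= in.length()) return null;
                int start = pos;
                while (pos < in.length() && !Character.isWhitespace(in.charAt(pos))) pos++;
                return in.substring(start, pos);
            }
        }
    }

    public static void main(String[] args) {
        ContextLexer lx = new ContextLexer("echo hi there\nls");
        System.out.println(lx.next());      // echo
        System.out.println(lx.peek());      // hi (scanned in NONE)
        lx.setContext(State.RAW);           // discards the peeked "hi"
        System.out.println(lx.next());      // hi there
    }
}
```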
- NEW - Lexer rules can now be prefixed with states of the default context (named
CONTEXT):
<context1, context2> void lexer_rule1() :
{
...
}
In the generated llkNextToken(), such a rule is only activated when the default context
is in one of the specified states.
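One way a next-token dispatcher could honour rule prefixes such as <context1, context2> is sketched below, assuming each rule records the set of default-context states in which it is active. StatefulDispatch, LexRule and the demo rules are hypothetical; the sketch also assumes that a rule without a prefix is always active and that the first matching rule wins, either of which may differ from LLK's generated llkNextToken().

```java
import java.util.Arrays;
import java.util.EnumSet;
import java.util.List;

// Hypothetical sketch of state-prefixed lexer rules; not LLK's llkNextToken().
class StatefulDispatch {
    enum State { NONE, IN_STRING }

    // Returns the end offset of the match; equal to pos when nothing matched.
    interface Matcher {
        int match(String in, int pos);
    }

    static final class LexRule {
        final String name;
        final EnumSet<State> activeIn; // empty = no prefix, assumed always active
        final Matcher matcher;

        LexRule(String name, EnumSet<State> activeIn, Matcher matcher) {
            this.name = name;
            this.activeIn = activeIn;
            this.matcher = matcher;
        }
    }

    // Tries only the rules active in the current state; first match wins.
    static String nextToken(List<LexRule> rules, State context, String in, int pos) {
        for (LexRule r : rules) {
            if (!r.activeIn.isEmpty() && !r.activeIn.contains(context)) continue;
            int end = r.matcher.match(in, pos);
            if (end > pos) return r.name + ":" + in.substring(pos, end);
        }
        return null;
    }

    static List<LexRule> demoRules() {
        Matcher quote = (s, p) -> p < s.length() && s.charAt(p) == '"' ? p + 1 : p;
        Matcher word = (s, p) -> {
            int e = p;
            while (e < s.length() && Character.isLetter(s.charAt(e))) e++;
            return e;
        };
        Matcher chars = (s, p) -> {
            int e = p;
            while (e < s.length() && s.charAt(e) != '"') e++;
            return e;
        };
        return Arrays.asList(
            new LexRule("QUOTE", EnumSet.noneOf(State.class), quote),
            new LexRule("WORD", EnumSet.of(State.NONE), word),         // like <NONE>
            new LexRule("CHARS", EnumSet.of(State.IN_STRING), chars)); // like <IN_STRING>
    }

    public static void main(String[] args) {
        List<LexRule> rules = demoRules();
        System.out.println(nextToken(rules, State.NONE, "abc def", 0));        // WORD:abc
        System.out.println(nextToken(rules, State.IN_STRING, "abc def\"", 0)); // CHARS:abc def
    }
}
```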
- NEW - Reference elements (charSet(), tokenSet(), treeRef(), stringRef(), ruleRef())
can now be suffixed by *, +, ? and :min,max just like a ( Choices ) element. This is just
shorthand for the equivalent choices of references, e.g.
charSet()?
is the same as
( charSet() )?
- NEW - charSet() now accepts a string, which is the same as specifying each character
in the string individually, for example:
<"\r\n">
is the same as
<'\r', '\n'>
- NEW - tokenSet() now accepts the following syntax to specify the lookahead token set
of the given rule at the given k. Example:
<%LA(1, Expression())>
specifies the lookahead token set of Expression() at k=1.
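To make "lookahead token set at k=1" concrete: at k=1 it is the FIRST set of the rule, the tokens that can begin it. The sketch below computes FIRST sets by fixpoint iteration over a hypothetical toy grammar with no empty alternatives; it is not LLK's analysis, and a real LL(k) analysis must also handle k > 1 and elements that can match nothing.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration of an LA(1) set; not LLK's lookahead analysis.
// Grammar: nonterminal -> list of alternatives, each a list of symbols.
// Symbols that are not keys of the map are tokens. No empty alternatives.
class FirstSets {
    static Map<String, Set<String>> first(Map<String, List<List<String>>> grammar) {
        Map<String, Set<String>> first = new HashMap<>();
        for (String nt : grammar.keySet()) first.put(nt, new HashSet<>());
        boolean changed = true;
        while (changed) { // iterate until no set grows
            changed = false;
            for (Map.Entry<String, List<List<String>>> e : grammar.entrySet()) {
                Set<String> target = first.get(e.getKey());
                for (List<String> alt : e.getValue()) {
                    // Without empty alternatives only the first symbol matters.
                    String head = alt.get(0);
                    Set<String> add = grammar.containsKey(head) ? first.get(head) : Set.of(head);
                    if (target.addAll(add)) changed = true;
                }
            }
        }
        return first;
    }

    static Map<String, List<List<String>>> demoGrammar() {
        Map<String, List<List<String>>> g = new HashMap<>();
        g.put("Expression", List.of(
            List.of("Term"),
            List.of("Term", "PLUS", "Expression")));
        g.put("Term", List.of(
            List.of("NUM"),
            List.of("LPAREN", "Expression", "RPAREN"),
            List.of("MINUS", "Term")));
        return g;
    }

    public static void main(String[] args) {
        System.out.println(new TreeSet<>(first(demoGrammar()).get("Expression")));
        // prints [LPAREN, MINUS, NUM]
    }
}
```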
-
CHANGED - The syntax for specifying keyword contexts has changed to group keywords of the same
context together:
<context> {
keyword1="string1";
keyword2="string2";
}
instead of:
keyword1(context)="string1";
- CHANGED - Instead of generating a token type only for public lexer rules, all lexer
rules now generate, by default, a token type with the same name as the rule unless the rule
has the #void attribute.
- CHANGED - Grammar options are no longer case sensitive.
- CHANGED - Reserved tokens are now named _EOF_, _INVALID_, _IGNORE_ and _NULL_
instead of EOF, INVALID, IGNORE and NULL.
- CHANGED - LLK no longer needs the %HEADER prefix for the header section. An optional
header section can simply follow the %LEXER(), %PARSER() or %TREEPARSER() declaration.
Example:
%PARSER(LLKParser) {
import sf.llk.support.java.*; // header section.
}
public class LLKParser extends ... {
...
}
- CHANGED - LLK no longer accepts rules with the same name as input tokens, even if the rule
does not generate a new token type.
- CHANGED - Removed the NodeType and INodeType options; the types LLKNode and ILLKNode are
now always used.
Implementation changes:
- CHANGED - When the NodeScopeHook option is enabled, calls to llkTree.open() and
llkTree.close() are no longer generated; the grammar should call them in its llkOpenNode() and
llkCloseNode() methods. llkCloseNode() now also receives the conditional create flag: its
signature is llkCloseNode(ILLKNode node, boolean create) instead of llkCloseNode(ILLKNode node),
so that llkTree.close(ILLKNode node, boolean create) can be called for conditional nodes. In
addition, llkCloseNode() is now called for a conditional node even if the condition evaluates to
false and no node is created.
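A minimal sketch of the hook pattern, using hypothetical stand-ins (SketchNode, NodeStack, HookedParser) rather than the real ILLKNode and LLKTree types: since open()/close() are no longer generated, the hooks make those calls, and the create flag lets close() either build the conditional node or leave its children on the stack. The node parameter of llkOpenNode() is an assumption; only the llkCloseNode(ILLKNode, boolean) signature is given in these notes.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical stand-ins; the real ILLKNode/LLKTree APIs are richer and differ.
class SketchNode {
    final String name;
    final List<SketchNode> children = new ArrayList<>();
    SketchNode(String name) { this.name = name; }
}

// Plays the role of llkTree: a node stack with one mark per open scope.
class NodeStack {
    private final Deque<SketchNode> nodes = new ArrayDeque<>();
    private final Deque<Integer> marks = new ArrayDeque<>();

    void open() { marks.push(nodes.size()); }

    void push(SketchNode n) { nodes.push(n); }

    SketchNode peek() { return nodes.peek(); }

    int size() { return nodes.size(); }

    // Ends the innermost scope. With create == true the nodes pushed since
    // open() become children of node, which replaces them on the stack;
    // with create == false they are left on the stack untouched.
    void close(SketchNode node, boolean create) {
        int mark = marks.pop();
        if (!create) return;
        List<SketchNode> kids = new ArrayList<>();
        while (nodes.size() > mark) kids.add(0, nodes.pop()); // restore push order
        node.children.addAll(kids);
        nodes.push(node);
    }
}

class HookedParser {
    final NodeStack llkTree = new NodeStack();

    // With NodeScopeHook enabled the open()/close() calls are not generated,
    // so the hooks make them. The node parameter here is assumed.
    void llkOpenNode(SketchNode node) { llkTree.open(); }

    // Also reached for a conditional node whose condition is false (create == false).
    void llkCloseNode(SketchNode node, boolean create) { llkTree.close(node, create); }

    public static void main(String[] args) {
        HookedParser p = new HookedParser();
        SketchNode sum = new SketchNode("sum");
        p.llkOpenNode(sum);
        p.llkTree.push(new SketchNode("a"));
        p.llkTree.push(new SketchNode("b"));
        p.llkCloseNode(sum, true);
        System.out.println(sum.children.size()); // 2
    }
}
```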
- CHANGED - LLKTree.open() and LLKTree.clearScope() no longer take an ILLKNode
parameter.
- CHANGED - The TokenType XML file is now generated in the output directory instead of the
interface/support directory.
- CHANGED - LLKMain now performs AST optimization by default; use -notrim to turn it
off (instead of using -trim to turn optimization on).
- CHANGED - ILLKMain.getFilename(), which returned the full file path, is renamed to
getFilepath(). Added ILLKMain.getSimpleFilename(), which returns the simple file name of the
full file path.
- CHANGED - ILLKNode has changed substantially. It now uses simpler method names, e.g.
getFirstChild() is renamed to getFirst() and getNextSibling() to getNext(), etc.
There are also a few new methods, such as remove(ILLKNode start, ILLKNode end).
- CHANGED - LLKNode now compares with equals() instead of == when searching the children
list.
- CHANGED - The ILLKParserInput.match*() methods have moved to the parser itself and are renamed
to _match*() to make the interface cleaner.
- CHANGED - ILLKInput, ILLKLexer and ILLKParser now declare the clone() method.
ILLKLexerInput, ILLKParserInput and ILLKTreeParserInput now have a clone() method. The clone()
method for the lexer, parser and tree parser, however, is not generated; the grammar must
declare one, even if it is just a dummy clone() method.
- CHANGED - ILLKNode and LLKNode have new Object getValue() and void setValue(Object v) methods.
- CHANGED - Added IDirectiveHandler, which is not used by default.
- CHANGED - Added setLocation(ISourceLocation loc) to the ISourceLocation interface.
- CHANGED - ILLKToken and LLKToken have changed substantially. The int value field is replaced
by an Object data field to store arbitrary data. Also, LLKToken no longer keeps line and column
information; it only keeps the start and end offsets.
- CHANGED - Since the LLKToken int value field changed to Object data, the lexer
llkCreateToken() method signatures have changed. The lexer now provides the following three
methods:
public ILLKToken llkCreateToken(int type, ISourceLocation start, int end)
public ILLKToken llkCreateToken(int type, ISourceLocation start, int end, Object data)
public ILLKToken llkCreateToken(int type, ISourceLocation start, int end, String text, Object data)
instead of
public ILLKToken llkCreateToken(int type, ISourceLocation start, int end)
public ILLKToken llkCreateToken(int type, ISourceLocation start, int end, String text)
public ILLKToken llkCreateToken(int type, ISourceLocation start, int end, int value)
public ILLKToken llkCreateToken(int type, ISourceLocation start, int end, int value, String text)
where the first two of the new methods create a token with or without text according to the
lexer's TokenUseText option.
NOTE: Please check all llkCreateToken() and new LLKToken() calls to make sure they
conform to the new method signatures. Incorrect data and/or text fields may cause
hard-to-track bugs.
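The pitfall behind the NOTE can be reproduced with simplified stand-ins. In the hypothetical SketchLexer below, start is an int offset rather than an ISourceLocation, and tokenUseText models the TokenUseText option. An old-style call createToken(type, start, end, text) still compiles, because a String is an Object, but the text silently lands in the data field.

```java
// Hypothetical simplified stand-ins; LLK's real methods take an
// ISourceLocation start, and its token classes differ.
class SketchToken {
    final int type, start, end; // offsets only, no line/column
    final String text;          // null when the lexer keeps no text
    final Object data;          // replaces the old int value field

    SketchToken(int type, int start, int end, String text, Object data) {
        this.type = type;
        this.start = start;
        this.end = end;
        this.text = text;
        this.data = data;
    }
}

class SketchLexer {
    private final String source;
    private final boolean tokenUseText; // models the TokenUseText option

    SketchLexer(String source, boolean tokenUseText) {
        this.source = source;
        this.tokenUseText = tokenUseText;
    }

    SketchToken createToken(int type, int start, int end) {
        return createToken(type, start, end, (Object) null);
    }

    // Text is taken from the input only if tokenUseText is set.
    SketchToken createToken(int type, int start, int end, Object data) {
        String text = tokenUseText ? source.substring(start, end) : null;
        return new SketchToken(type, start, end, text, data);
    }

    // Explicit text and data.
    SketchToken createToken(int type, int start, int end, String text, Object data) {
        return new SketchToken(type, start, end, text, data);
    }

    public static void main(String[] args) {
        SketchLexer lexer = new SketchLexer("let x", true);
        // Old-style call meant to pass text: compiles, but "LET" becomes data.
        SketchToken t = lexer.createToken(1, 0, 3, "LET");
        System.out.println(t.text + " / " + t.data); // let / LET
    }
}
```

Passing explicit text now requires the five-argument form, e.g. createToken(type, start, end, "LET", null).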
- CHANGED - ILLKNode and LLKNode have also changed substantially. A node now keeps the end
offset instead of the last token.
- CHANGED - Removed the lexer _tokenHead field, which was essentially unused.
- CHANGED - LLK-generated code no longer uses the _ prefix for private variables; only the llk
prefix is used.
- CHANGED - There are many changes to the ISourceLocator classes. SourceLocator, an
implementation of ISourceLocator, is now part of ILLKMain and is solely responsible for keeping
track of the filename, line and column information. There are many related changes to
ILLKLexerInput, ILLKParserInput and ILLKTreeParserInput too. Please check the source code of the
corresponding files for details.
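A sketch of the kind of bookkeeping a source locator performs now that tokens keep only offsets: record the offset of every line start once, then map an offset to a 1-based line and column by binary search. OffsetLocator is a hypothetical class, not LLK's SourceLocator, and it treats only '\n' as a line break.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical offset-to-line/column mapper; not LLK's SourceLocator.
class OffsetLocator {
    private final String filename;
    private final int[] lineStarts; // offset of the first char of each line

    OffsetLocator(String filename, CharSequence text) {
        this.filename = filename;
        List<Integer> starts = new ArrayList<>();
        starts.add(0);
        for (int i = 0; i < text.length(); i++) {
            if (text.charAt(i) == '\n') starts.add(i + 1);
        }
        lineStarts = starts.stream().mapToInt(Integer::intValue).toArray();
    }

    // 1-based line containing offset. binarySearch returns
    // -(insertionPoint) - 1 when offset is not itself a line start.
    int line(int offset) {
        int idx = Arrays.binarySearch(lineStarts, offset);
        return idx >= 0 ? idx + 1 : -idx - 1;
    }

    // 1-based column: distance from the start of the line.
    int column(int offset) {
        return offset - lineStarts[line(offset) - 1] + 1;
    }

    String describe(int offset) {
        return filename + ":" + line(offset) + ":" + column(offset);
    }

    public static void main(String[] args) {
        OffsetLocator loc = new OffsetLocator("demo.txt", "ab\ncd");
        System.out.println(loc.describe(4)); // demo.txt:2:2
    }
}
```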
- REMOVED - ILLKNode.setName() and ILLKNode.getName(); in most cases setText()/getText() can be
used instead.
Enhancements:
- Improved choice-follow conflict warning reporting. Warning messages now show the path to the
actual reference that caused the choice-follow conflict.
- The static keyword lookup generator now supports an arbitrary nesting depth (default 8) instead
of just 2.
- Generated code no longer depends on blacksun-util.jar.
Bug fixes:
Sample grammars:
- NEW - Added a CSharp grammar (with generics) and support files. The C# parser has
preliminary support for parsing conditional sections into a single AST model; it is a work
in progress.
- NEW - The llk-examples project has sample grammars that showcase the context-dependent
lexer and user-specified llkNextToken() rule features that are new in LLK v0.4.
- CHANGED - The Java grammar is updated to support Java 5 syntax.