ANTLR4 Study Notes

##5.1 Deriving Grammars from Language Samples

Proper grammar design mirrors functional decomposition or top-down design
in the programming world.

  • Step 1: find the start rule.
    For example, “a comma-separated-value (CSV) file is a sequence of
    rows terminated by newlines.”
    file : «sequence of rows that are terminated by newlines» ;
  • Then work downward step by step, describing the elements identified on the right side of the start rule.
    Continuing with the CSV example:
    file : «sequence of rows that are terminated by newlines» ;
    row : «sequence of fields separated by commas» ;
    field : «number or string» ;
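
The top-down decomposition above can be sketched as a tiny hand-written recognizer in which each function mirrors one rule of the pseudocode grammar. This is only an illustration of the design idea, not what ANTLR generates. The function names are hypothetical, and the sketch assumes that fields never contain commas or newlines (no quoted strings).

```python
# Each function corresponds to one rule of the CSV pseudocode grammar.
# Assumption: fields contain no embedded commas or newlines.

def parse_field(text):
    """field : «number or string» ;"""
    try:
        return int(text)   # a number
    except ValueError:
        return text        # anything else is kept as a string


def parse_row(line):
    """row : «sequence of fields separated by commas» ;"""
    return [parse_field(piece) for piece in line.split(",")]


def parse_file(text):
    """file : «sequence of rows that are terminated by newlines» ;"""
    if text and not text.endswith("\n"):
        raise ValueError("every row must be terminated by a newline")
    # split() leaves one empty piece after the final terminator; drop it.
    return [parse_row(line) for line in text.split("\n")[:-1]]
```

Calling `parse_file("1,abc\n2,3\n")` walks the rules in the same order as the grammar: file, then row, then field.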

Taking Java as another example:

compilationUnit : «optional packageSpec then classDefinitions» ;
packageSpec : 'package' identifier ';' ;
classDefinition :
    'class' «optional superclassSpec optional implementsClause classBody» ;
superclassSpec : 'extends' identifier ;
implementsClause :
    'implements' «one or more identifiers separated by comma» ;
classBody : '{' «zero-or-more members» '}' ;
member : «nested classDefinition or field or method» ;
...

##5.2 Using Existing Grammars as a Guide
##5.3 Recognizing Common Language Patterns with ANTLR Grammars
Common language patterns:

  • sequence
    CSV is again a good example:
    file : (row '\n')* ; // sequence with a '\n' terminator
    row : field (',' field)* ; // sequence with a ',' separator
    field: INT ; // assume fields are just integers
  • choice
    Whenever you find yourself thinking that a language construct can be either this or that, use choice (|):
    type: 'float' | 'int' | 'void' ; // user-defined types
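  • token dependence
    The presence of one token requires its counterpart elsewhere in the phrase, such as matching brackets: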
    vector : '[' INT+ ']' ; // [1], [1 2], [1 2 3], ...
  • nested phrase
    A phrase that can contain itself, expressed with a recursive rule:
    expr: ID '[' expr ']' // a[1], a[b[1]], a[(2*b[1])]
    | '(' expr ')' // (1), (a[1]), (((1))), (2*a[1])
    | INT // 1, 94117
    ;
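
To see how the patterns map onto parsing logic, here is a minimal recursive-descent sketch of the `vector` and `expr` rules above. It is a hand-written illustration, not the code ANTLR generates. The names `tokenize`, `Parser`, and `recognizes` are hypothetical, and the token definitions for `INT` and `ID` are assumptions, since the notes do not give the lexer rules. Only the three `expr` alternatives shown are implemented, so input containing `*` is rejected.

```python
import re

# Assumed lexer: INT is a run of digits, ID is a C-style identifier.
TOKEN = re.compile(r"\s*(?:(?P<INT>\d+)|(?P<ID>[A-Za-z_]\w*)|(?P<OP>[\[\]()]))")


def tokenize(text):
    """Return a list of (kind, text); punctuation uses its own text as kind."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at position {pos}")
        kind, value = m.lastgroup, m.group(m.lastgroup)
        tokens.append((value if kind == "OP" else kind, value))
        pos = m.end()
    return tokens


class Parser:
    def __init__(self, text):
        self.tokens = tokenize(text)
        self.pos = 0

    def peek(self):
        if self.pos < len(self.tokens):
            return self.tokens[self.pos][0]
        return "EOF"

    def match(self, kind):
        if self.peek() != kind:
            raise SyntaxError(f"expected {kind}, found {self.peek()}")
        self.pos += 1

    def vector(self):
        # vector : '[' INT+ ']' ;
        self.match("[")                 # token dependence: '[' needs its ']'
        self.match("INT")               # sequence: at least one INT
        while self.peek() == "INT":
            self.match("INT")
        self.match("]")

    def expr(self):
        # choice: the next token selects the alternative
        if self.peek() == "ID":         # ID '[' expr ']'
            self.match("ID")
            self.match("[")
            self.expr()                 # nested phrase: the rule calls itself
            self.match("]")
        elif self.peek() == "(":        # '(' expr ')'
            self.match("(")
            self.expr()
            self.match(")")
        else:                           # INT
            self.match("INT")


def recognizes(rule, text):
    """True if `text` is a complete phrase of `rule` ('expr' or 'vector')."""
    try:
        parser = Parser(text)
        getattr(parser, rule)()
        parser.match("EOF")
        return True
    except SyntaxError:
        return False
```

For example, `recognizes("expr", "a[b[1]]")` and `recognizes("vector", "[1 2 3]")` succeed, while `recognizes("vector", "[]")` fails because `INT+` requires at least one integer.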