#Chapter 2 The Big Picture
##2.1 Let’s Get Meta!
- interpreter
if an application computes or “executes” sentences, we call that
application an interpreter. Examples include calculators, configuration file
readers, and Python interpreters. - translator
If we’re converting sentences from one language to another, we call that application a translator. Examples include Java to C# converters and compilers. - parsers or syntax analyzers
Programs that recognize (identify the various components in a phrase and differentiate it from other phrases) languages. - Two stages of parsing
- lexical analysis or tokenizing (lexer)
The process of grouping characters into words or symbols (tokens) - tokens
Tokens consist of at least two pieces of information: the token type (identifying the lexical structure) and the text matched for that token by the lexer. - parser
The second stage is the actual parser and feeds off of these tokens to recognize
the sentence structure.
By default, ANTLR-generated parsers build a data structure called a parse tree or syntax
tree that records how the parser recognized the structure of the input sentence
and its component phrases.
- lexical analysis or tokenizing (lexer)
We would specify phrase structure with a set of rules. Here is the grammar rule that corresponds to the first level of the assign subtree from the diagram”:assign : ID '=' expr ';' ; // match an assignment statement like "sp = 100;"
##2.2 Implementing Parsers (之后有时间重新读一遍)
The ANTLR tool generates recursive-descent parsers from grammar rules.
A more general term for this kind of parsing is top-down parsing; recursive-descent parsers are just one kind of top-down parser implementation.
The stat symbol means calling method stat() for the parse tree. To get an idea of what recursive-descent parsers look like, here’s the (slightly cleaned up) method that ANTLR generates for rule assign:
1 | // assign : ID '=' expr ';' ; |
For example, the stat rule that invokes assign
likely has a list of other kinds of statements.
1 | /** Match any kind of statement starting at the current input position */ |
A parsing rule for stat looks like a switch.
1 | void stat() { |
##2.3 You Can’t Put Too Much Water into a Nuclear Reactor (ambiguous phrase)
We have to provide unambiguous grammars.
- duplicate is an obvious ambiguities
- some more subtle ambiguities:Here are the two interpretations of input f(); starting in rule stat:
1
2
3
4
5
6stat: expr ';' // expression statement
| ID '(' ')' ';' // function call statement
;
expr: ID '(' ')'
| INT
;
Ambiguities can occur in the lexer as well as the parser, but ANTLR resolves
them so the rules behave naturally
1 | BEGIN : 'begin' ; // match b-e-g-i-n sequence; ambiguity resolves to BEGIN |
that input beginner would match only to rule ID. The lexer would not match beginner as BEGIN followed by an ID matching input ner.
##2.4 Building Language Applications Using Parse Trees