Context-Free Grammars and Parsing
Context Free Grammar
The parsing process
input —— the sequence of tokens produced by the scanner
output ——the syntax tree
1、语法分析需要一个单词的时候调用一个词法分析。
2、 在单通道编译器中,解析器将合并编译器的所有其他阶段,
parse();
3、 更常见的是,编译器将是多路径的,进一步的过程将使用语法树作为输入。
One problem that is more difficult: the treatment of errors.
1、error in the scanner: generate an error token and consume the offending character.
2、error in the parser:
- report an error message 【报错】
- recover from the error and continue parsing (to find as many errors as possible). 【尽可能多发现错误】
- Sometimes, a parser may perform error repair.【修复错误】
Context-free Grammars
上下文无关语法使用递归,是一种编程语言语法结构的规范。
eg:
exp -> exp op exp | (exp) | number
op -> + | – | *
其中:
number = digit digit*
digit = 0|1|2|3|4|5|6|7|8|9
Derivations
Derivation 派生
语法规则通过派生来确定标记符号的合法字符串。
- 通过语法规则右侧的选项替换结构名称的序列。
- 派生以单个结构名称开始,以标记符号字符串结束。
- 在派生的每个步骤中,使用语法规则中的一个选项进行单个替换。
生成语言 L(G) = { s | exp =>* s }
1、 G表示表达式语法
2、 s表示任意的标记符号字符串(有时称为句子)
3、 符号=>*表示由前面描述的替换序列组成的派生。
4、 语法规则有时被称为产生式,因为它们通过派生“产生”L(G)中的字符串。
符号:
推导规则:->
单步推导:=>
多步推导:=>*
起始符号、非终结符、终结符
起始符号the start symbol:最一般的结构列在语法规则的第一位。
非终结符nonterminals:因为在派生过程中它们总是必须被进一步替换。
终结符terminals:字母表中的符号称为终结符,因为它们终止派生。终端通常是编译器应用程序中的tokens。
left recursive: the nonterminal A appears as the first symbol on the right-hand side of the rule defining A.
right recursive: the nonterminal A appears as the last symbol on the right-hand side of the rule defining A.
Egs:
规则 | 产生式 |
A -> (A) A | ε | |
stmt-sequence -> stmt ; stmt-sequence | stmt
stmt -> s |
L(G)= { s, s;s, s;s;s… } |
stmt-sequence -> stmt ; stmt-sequence | ε
stmt -> s |
L(G’)= { ε, s;, s;s;, s;s;s;,… ) |
stmt-sequence -> nonempty-stmt-sequence |ε
nonempty-stmt-sequence -> stmt ; nonempty-stmt-sequence | stmt stmt -> s |
L(G)= { ε, s, s;s, s;s;s… } |
修改文法的同时需要注意是否改变了原有的句意。
Parse trees and abstract syntax
A parse tree corresponding to a derivation is a labeled tree
- the interior nodes are labeled by nonterminals,
- the leaf nodes are labeled by terminals
- the children of each internal node represent the replacement of the associated nonterminal in one step of the derivation.
Abstract syntax trees
1、 这些树表示实际源代码令牌序列的抽象,并且不能从中恢复令牌序列(与解析树不同)。
2、 解析树是将普通的称为具体语法的结构与抽象语法进行比较时的一种表示。
3、 抽象语法可以用类似BNF的符号给出形式定义,就像具体语法一样。
typedef enum {Plus,Minus,Times} OpKind; typedef enum {Opk,Constk} ExpKind; typedef struct streenode { ExpKind kind; . OpKind op; struct streenode *lchild,*rchild; int val; } STreeNode; typedef STreeNode * SyntaxTree;
typedef enum {ExpK,StmtK} NodeKind; typedef enum {Zero,One} ExpKind; typedef enum {IfK,OtherK) StmtKind; typedef struct streenode { NodeKind kind; ExpKind ekind; . StmtKind skind; struct streenode *test,*thenpart,*elsepart; } STreeNode; typedef STreeNode * SyntaxTree;
使用链表存储孩子。
Ambiguous grammmars
an ambiguous grammar: a grammar that generates a string with two distinct parse trees is called an ambiguous grammar. 一种生成具有两个不同解析树的字符串的语法称为二义语法。
- Such a grammar represents a serious problem for a parser.
- it does not specify precisely the syntactic structure of a program.
两种基本方法:
1、在每种不明确的情况下,指定哪种解析树(或语法树)是正确的。这种规则称为消歧规则 disambiguating rule。
- 优点:它可以纠正歧义,而不会改变语法(也可能使语法复杂化)。
- 缺点:语言的句法结构不再仅仅由语法给出。
2、 将语法更改为强制构造正确解析树的形式。
*在这两种方法中,我们必须首先确定在不明确的情况下哪个树是正确的。
Precedence and associativity
将运算符分组为具有相同优先级的组,对于每个优先级,我们必须编写不同的规则。
例如,乘法优先于加法和减法可以添加到我们的简单表达式语法中,如下所示:
Precedence:
- exp -> exp addop exp | term
- addop -> + | –
- term -> term mulop term| factor
- mulop -> *
- factor -> ( exp ) | number
associativity:
- exp -> exp addop exp | term
- replacing the rule by
- exp -> exp addop term |term (left associative )
- exp -> term addop exp |term (right associative)
The dangling else problem
规则 | 用例 |
statement -> if-stmt | other
if-stmt -> if ( exp ) statement | if ( exp ) statement else statement exp -> 0 | 1 |
if (0) if (1) other else other |
This string has the two parse trees.
修改后不再具有二义性:
规则 | 用例 |
statement -> matched-stmt | unmatched-stmt
matched-stmt -> if ( exp ) matched-stmt else matched-stmt | other unmatched-stmt -> if ( exp ) statement | if ( exp ) matched-stmt else unmatched-stmt exp -> 0 | 1 |
其他解决方法:
- if & end if
- else & end if
EBNF
左右递归:
- A -> β { α } (left recursive)
- A -> { α } β (right recursive)
可选语句:
- statement -> if-stmt | other
- if-stmt -> if ( exp ) statement [ else statement ]
- exp -> 0 | 1
推导中称为句型(sentential form),推导终结称为句子(sentences)。
Top-Down Parsing
Top-down parsing
递归下降句法分析的思想:
- 非终端a作为过程识别a的定义的语法规则;
- A语法的右侧指定此过程的代码结构。
BNF | EBNF | 程序 |
factor →(expr) ∣ number | Procedure factor
BEGIN case token of ( : match( ( ); expr; match( )); number: match(number); else error; end case; END factor |
|
If-stmt → if ( exp ) statement ∣ if ( exp ) statement else statement | If-stmt → if ( exp ) statement [ else statement] | Procedure IfStmt ;
begin match( if); match((); exp; match()); statement; if token = else then match ( else); statement; endif end IfStmt; |
expr → expr addop term∣term | expr → term {addop term} | Procedure exp;
Begin term; while token = + or token = – do match(token); term; End while; End exp; |
(1) It may be difficult to convert a grammar in BNF into EBNF form;
(2) It is difficult to decide when to use the choice A →α and the choice A →β;if both α and β begin with non-terminals. (First Sets.)
(3) It may be necessary to know what token legally coming from the non-terminal A, A→ε. (Follow Sets.)
(4) It requires computing the First and Follow sets in order to detect the errors as early as possible. Such as “)3-2)”, (descend from exp to term to factor) 它需要计算第一组和第二组,以便尽早检测错误。例如“3-2)”,(从exp到term再到factor)
LL(k) grammars
LL(1)解析使用显式堆栈而不是递归调用来执行解析。
0 条评论