第五部分LL1文法及分析程序教学课件-

第五章第五章 LL(1)LL(1)文法及其分析程序文法及其分析程序z5.1 5.1 预测分析预测分析程序程序z5.2 5.2 LL(1)LL(1)文法文法z FIRSTFIRST和和FOLLOWFOLLOW集定义和计集定义和计算算z LL(1) LL(1) 文法定义文法定义z LL(1)LL(1)分析程序的生成分析程序的生成z5.3 5.3 非非LL(1)LL(1)文法的改造文法的改造1自上而下分析算法自上而下分析算法要点：.由根向下构造语法树.构造最左推导.推导出的终结符是否与当前输入符匹配 S aaab A Ba AS ABA aA | B b | bBaaab.S AB S AB aAB A aA aaAB A aA aaaAB A aA aaa B A aaab B b2带回溯的自上而下分析自上而下分析S ABA aA | B b | bBa a a b b.S(1) A. S AB(2) aA. A aA(3) aaA. A aA (4) aaaA. A aA (5) aaa B. A (6) aaab B baaabb.S(1) A. S AB (2) aA. A aA (3) aaA. A aA (4) aaaA. A aA (5) aaa B A (6) aaa b B B bB(7) aaabb B b 3预测分析程序Predictive parser无回溯的自顶向下分析程序特征根据下一个输入符号为当前要处理的非终结符选择产生式要求文法是LL(1)的第一个L 从左到右扫描输入串第二个L 生成的是最左推导 1 向前看一个输入符号（lookahead)预测分析程序的实现技术 1 递归下降子程序 2 表驱动分析程序4PL/0PL/0语言的语言的EBNFEBNFv程序程序=分程序分程序. .v分程序分程序=常量说明部分常量说明部分变量说明部变量说明部分分过程说明部分过程说明部分语句语句v常量说明部分常量说明部分= =CONSTCONST常量定义部分常量定义部分，常，常量量定义定义；v变量说明部分变量说明部分=VARVAR标识符标识符，标识符，标识符；v过程说明部分过程说明部分= = PROCEDURE PROCEDURE 标识符分程序标识符分程序；过程说明部分；过程说明部分；v语句语句= = 标识符：标识符：= =表达式表达式 | |IF IF 条件条件 thenthen语句语句| |CALL|READ|BEGIN CALL|READ|BEGIN 语句语句；语；语句句 END|WHILE|END|WHILE|5begin(*statement*) if sym=ident then (*parsing ass.st.*) begin getsym; if sym=becomes then getsym else error(13); expression(fsys); endelse if sym=readsym then(* parsing read st.*)begin getsym; if symlparen then error(34) else repeat getsym; if sym ident then error(35) else getsym until symcomma; if symrparen then error(33);end6递归下降子程序递归下降子程序program function_listfunction_list function function_list | function FUNC identifier ( parameter_list ) statementvoid ParseFunction()MatchToken(T_FUNC);ParseIdentifier();MatchToken(T_LPAREN);ParseParameterList();MatchToken(T_RPAREN);ParseStatement();7void MatchToken(int expected)if (lookahead != expected) printf(syntax error n);exit(0); else / if match, consume token and move onlookahead = yylex();8例：递归子程序实现例：递归子程序实现表达式的语法分析表达式的语法分析表达式的表达式的EBNF表达式表达式=+|-=+|-项项（+|-+|-）项）项项项=因子因子（*|/*|/）因子）因子因子因子=identident| |numbernumber|（表达式表达式）9zprocedure procedure exprexpr; ;beginbegin if sym in plus, minus then if sym in plus, minus then begin begin getsym getsym; term; ; term; end endelse term;else term; while sym in plus, minus do while sym in plus, minus do begin begin getsym getsym; term; term; end endend;end;10 Procedure term;Procedure term; begin factor; begin factor; while sym in times,slash do while sym in times,slash do begin begin getsymgetsym;factor end;factor endend;end;11 Procedure factor;Procedure factor; begin if sym= begin if sym=identident then then getsymgetsym elseelse if sym=number if sym=number then then getsymgetsym else else if sym=( if sym=( then begin then begin getsym getsym; ; expr expr; ; if sym=) if sym=) then then getsymgetsym else errorelse error end end else error else error end; end; 12表驱动予测分析程序模型予测分析程序模型 Input Input #总控程序总控程序预测分析表预测分析表stack13 带带 a0 a1 a2 a3 a4 a5 a6 a7 a8 an-1 an 有限控制器有限控制器磁头磁头识别程序的数学模型下推自下推自动机动机14上下文无关语言句型分析（识别）程序的数学模型下推自动机下推自动机PdaPda=(K,f,H,h=(K,f,H,h0 0,S,Z),S,Z) H: H:下推栈符号的有穷字母表下推栈符号的有穷字母表 h h0 0 :H :H中的初始符号中的初始符号 f: K f: K （ ) H H K K H H* * PdaPda的一个组态是的一个组态是K K * * H H 中的一个中的一个（k,w,k,w, ) k:) k:当前状当前状态，态，w:w:余留输入串，余留输入串，：栈中符号，最左边的符号在栈顶。：栈中符号，最左边的符号在栈顶。PdaPda的一次移动用组态表示的一次移动用组态表示终止和接受的条件终止和接受的条件: : 1.1.到达输入串结尾时，处在到达输入串结尾时，处在Z Z中的一个状态中的一个状态或或 2. 2.某个动作序列导致栈空时某个动作序列导致栈空时 15例：例：PdaPda P=(A,B,C),a,b,c),f,h,i,i P=(A,B,C),a,b,c),f,h,i,i A, ) A, )f(A,a,i) = (B,h) f(B,a,h) = (B,f(A,a,i) = (B,h) f(B,a,h) = (B,hhhh) ) f(C,b,h) = (C, f(C,b,h) = (C, ) ) f(A,c,i) = (A, f(A,c,i) = (A, ) ) f(B,c,h) = (C,h) f(B,c,h) = (C,h) 接受输入串接受输入串aacbbaacbb的过程的过程(A,aacbb,i) 读a, pop i, push h, goto B (B,acbb,h) 读a, pop h, push hh, goto B (B,cbb,hh) 读c, pop h, push h , goto C (C,bb,hh) 读b, pop h, push , goto C (C,b,h) 读b ,pop h, push , goto C (C, , , ) )16 G E: (1) E TE (2) E +TE (3) E (4) T FT (5) T *FT (6) T (7) F (E) (8) F a 17 a + * ( ) # E (1) (1) E (2) (3) (3) T (4) (4) T (6) (5) (6) (6) F (8) (7)G E: (1) E TE (2) E +TE (3) E (4) T FT (5) T *FT (6) T (7) F (E) (8) F a18分析算法分析算法 zBEGINBEGINz 首先把首先把#然后把文法开始符号推入栈；把第一个输然后把文法开始符号推入栈；把第一个输入符号读进入符号读进b; FLAGb; FLAG：=TRUE=TRUE；zWHILE FLAG DOWHILE FLAG DO BEGIN BEGINz 把栈顶符号上托出去并放在把栈顶符号上托出去并放在中；中；z IF X IF X Vt Vt THEN IF X=b THENTHEN IF X=b THENz 把下一把下一个输入符号读进个输入符号读进a az ELSE ERROR ELSE ERROR ELSE IF X=# THEN ELSE IF X=# THENz IF X=b THEN FLAG:=FALSE IF X=b THEN FLAG:=FALSEz ELSE ERROR ELSE ERRORz ELSE IF ELSE IF X,b=X X,b=X X X1 1X X2 2.X.XK K z THEN THEN 把把X XK K，X X K-1K-1,.,X,.,X1 1一一推进栈一一推进栈 z ELSEELSEERRORERRORz END OF WHILE; END OF WHILE;zSTOP/*STOP/*分析成功，过程完毕分析成功，过程完毕* *zENDEND19分析输入串#a+a#栈内容栈顶符号当前输入余留串 MX,b 1 #E E a +a# E TE2 #ET T a +a# T FT3 #ETF F a +a# F a4 #ETa a a +a#5 # ET T + a# T 6 #E E + a# E +TE7 #ET+ + + a#8 # ET T a # T FT 9 #ETF F a # F a10 #ETa a a #11 #ET T # T 12 #E E # E 13 # # #20FIRSTFIRST集和集和FOLLOWFOLLOW集的定义集的定义设设G=(VG=(VT T,V,VN N,P,P，S)S)是上下文无关文法是上下文无关文法FIRSTFIRST（）=a|=a| =* a a ,a,aV VT T, , , , VV* * 若若 =* 则规定则规定FRISTFRIST（）FOLLOWFOLLOW（A A）=a=aS S =* A A 且且a a FRISTFRIST（），）， VV* *， VV+ + 若若S S =* u A u A , ,且且 =* ，则则#F FOLLOW(A)OLLOW(A)LL (1) LL (1) 文法文法21计算计算FIRSTFIRST集集1.1.若若X X V V , ,则则FIRST(X)=XFIRST(X)=X2.2.若若X X V VN N, ,且有产生式且有产生式X Xa a，则把则把a a加入到加入到FIRST(X)FIRST(X)中中; ;若若X X 也是一条产生式也是一条产生式, ,则把则把也加到也加到FIRST(X)FIRST(X)中中. .3 3. .若若X XY Y是一个产生式且是一个产生式且Y Y V VN N, ,则把则把FIRST(Y)FIRST(Y)中的所有非中的所有非元元素都加到素都加到FIRST(X)FIRST(X)中中; ;若若X X Y Y1 1Y Y2 2Y YK K 是一个产生式是一个产生式, ,Y Y1 1,Y,Y2 2, ,Y,Y(i-1)(i-1)都是非终结符都是非终结符, ,而且而且, ,对于任何对于任何j,1j,1j j i-i-1, 1, FIRST(FIRST(Y Yj j) )都含有都含有 ( (即即Y Y1 1.Y.Y(i-(i-1) 1) =* ), ),则把则把FIRST(FIRST(Y Yj j) )中的所有非中的所有非元素元素都加到都加到FIRST(X)FIRST(X)中中; ;特别是特别是, ,若所有的若所有的FIRST(FIRST(Y Yj j , , j=1,2,K)j=1,2,K)均含有均含有 , ,则把则把加到加到FRIST(X)FRIST(X)中中. . 22计算计算FOLLOWFOLLOW集集1.1.对于文法的开始符号对于文法的开始符号S,S,置置# #于于FOLLOW(S) FOLLOW(S) 中中; ;2.2.若若 B B 是一个产生式是一个产生式, ,则把则把 FIRST(FIRST() 加至加至FOLLOW(B)FOLLOW(B)中中; ;3.3.若若 B B是一个产生式是一个产生式, ,或或 BB是是一个产生式而一个产生式而 =* ( (即即 FIRST(FIRST() )），），则把则把FOLLOWFOLLOW（A A）加至加至FOLLOWFOLLOW（B B）中中23 一个文法一个文法G G是是LLLL（1 1）的，当且仅当对于的，当且仅当对于G G的每一个的每一个非终结符的任何两个不同产生式非终结符的任何两个不同产生式，下面的条件成立：下面的条件成立：FIRSTFIRST（）FIRST(FIRST()=)= , ,也就是也就是和和推导不出以同一个终结符推导不出以同一个终结符a a为首的符号串；它们不为首的符号串；它们不应该都能推出空字应该都能推出空字 . .假若假若 =* ，那么，那么， FIRST FIRST（)FOLLOW)FOLLOW（A A） . . 也就是，也就是，若若 =* . .则则所能推出的串的首符号不应在所能推出的串的首符号不应在FOLLOW(AFOLLOW(A）中中24G E: (1) E TE (2) E +TE (3) E (4) T FT (5) T *FT (6) T (7) F (E) (8) F a各非终结符的FIRST集合如下：FIRST(E)=(,iFIRST(E)=+,FIRST(T)=(,iFIRST(T)=*,FIRST(F)=(,i各非终结符的FOLLOW集合为：FOLLOW(E)=),FOLLOW(E)=)，FOLLOW(T)=，)，FOLLOW(T)=,)，# FOLLOW(F)=*,，),# 25G E: (1) E TE (2) E +TE (3) E (4) T FT (5) T *FT (6) T (7) F (E) (8) F aE +TE | FIRST(+TEFIRST(+TE)=)=+ + FOLLOW(E)= FOLLOW(E)=) )，T *FT | FIRST(*FTFIRST(*FT)=)=* * FOLLOW(T)= FOLLOW(T)=+,)+,)，F (E) | a FIRST( (E)=FIRST( (E)=( ( FIRST( a)= FIRST( a)=a a所以所以GEGE是是LLLL（1 1）的的26予测分析表构造算法予测分析表构造算法1.1.对文法对文法G G的每个产生式的每个产生式执行第二步执行第二步和第三步；和第三步；2.2.对每个终结符对每个终结符a a FIRST(FIRST( ) )，把把加加至至 A,aA,a中，中，3.3.若若 FIRST(FIRST( ) )，则对任何则对任何b b FOLLOW(A)FOLLOW(A) 把把加至加至 A,bA,b中，中，4.4.把所有无定义的把所有无定义的 A,aA,a标上标上“出错标志出错标志”。可以证明，一个文法可以证明，一个文法G G的予测分析表不含多重的予测分析表不含多重入口，当且仅当该文法是入口，当且仅当该文法是LL(1)LL(1)的的27LL(1)LL(1)文法的性质文法的性质： LL(1) LL(1)文法是无二义的文法是无二义的 LL(1)LL(1)文法不含左递归文法不含左递归非非LL(1)LL(1)文法的改造文法的改造消除左递归消除左递归提左公因子提左公因子将产生式将产生式 | 变换为：变换为： B B B B | 28EE+TTTT*FFFi(E)FIRST(E)=(,iFIRST(T)=(,iFIRST(F)=(,i消左递归 E TE E +TE E S S ifif C t S | C t S | ifif C t S e S C t S e SC bC b提左因子提左因子 S S ifif C t S A C t S A A e S | A e S | First First集集 FollowFollow集集S S ifif #,e #,eA e, A e, #, eC b tMA,e=A e S A e S A A 29LL(1)分析中的一种错误处理办法发现错误发现错误1栈顶的终结符与当前输入符不匹配2非终结符A于栈顶，面临的输入符为a，但分析表M的MA,a为空“应急应急”恢复策略恢复策略跳过输入串中的一些符号直至遇到“同步符号”为止。同步符号的选择同步符号的选择1把FOLLOW(A)中的所有符号作为A的同步符号。跳过输入串中的一些符号直至遇到这些“同步符号”，把A从栈中弹出，可使分析继续2把FIRST(A)中的符号加到A的同步符号集，当FIRST(A)中的符号在输入中出现时，可根据A恢复分析30review-parsingThe syntax analysis phase of a compiler verifies that the sequence of tokens returned from the scanner represent valid sentences in the grammar of the programming language. There are two major parsing approaches: top-down and bottom-up. In top-down parsing, you start with the start symbol and apply the productions until you arrive at the desired string. In bottom-up parsing, you start with the string and reduce it to the start symbol by applying the productions backwards. 31In the top-down parsing,we begin with the start symbol and at each step, expand one of the remaining nonterminals by replacing it with the right side of one its productions.We repeat until only terminals remain. The top-down parse prints a leftmost derivation of the sentence.A bottom-up parse works in reverse. We begin with the sentence of terminals and each step applies a production in reverse, replacing a substring that matches the right side with the nonterminal on the left. We continue until we have substituted our way back to the start symbol. If you read from the bottom to top, the bottom-up parse prints out a rightmost derivation of the sentence.32 lookahead symbol The lookahead symbol is the next symbol coming up in the input. backtracking. Based on the information the parser currently has about the input, a decision is made to go with one particular production. If this choice leads to a dead end, the parser would have to backtrack to that decision point, moving backwards through the input, and start again making a different choice and so on until it either found the production that was the appropriate one or ran out of choices.33predictive parser and LL(1)grammarPredictive parser is a non-backtracking top-down parser. A predictive parser is characterized by its ability to choose the production to apply solely on the basis of the next input symbol and the current nonterminal being processed. To enable this, the grammar must take a particular form. We call such a grammar LL(1). The first “L” means we scan the input from left to right; the second “L” means we create a leftmost derivation; and the 1 means one input symbol of lookahead.34recursive-descentThe first technique for implementing a predictive parser is called recursive-descent. A recursive descent parser consists of several small functions(procedures), one for each nonterminal in the grammar. As we parse a sentence, we call the functions (procedures) that correspond to the left side nonterminal of the productions we are applying. If these productions are recursive, we end up calling the functions recursively.35Table-driven LL(1) parsing In a recursive-descent parser, the production information is embedded in the individual parse functions for each nonterminal and the run-time execution stack is keeping track of our progress through the parse. There is another method for implementing a predictive parser that uses a table to store that production along with an explicit stack to keep track of where we are in the parse.36 How a table-driven predictive parser works We push the start symbol on the stack and read the first input token. As the parser works through the input, there are the following possibilities for the top stack symbol X and the input token nonterminal a:1. If X = a and a = end of input (#): parser halts and parse completed successfully2. If X = a and a != #: successful match, pop X and advance to next input token. This is called a match action.3. If X != a and X is a nonterminal, pop X and consult table at X,a to see which production applies, push right side of production on stack. This is called a predict action.4. If none of the preceding cases applies or the table entry from step 3 is blank, there has been a parse error37The first set of a sequence of symbols u, written as First(u ) is the set of terminals whichstart all the sequences of symbols derivable from u. A bit more formally, consider allstrings derivable from u by a leftmost derivation. If u =* v , where v begins with someterminal, that terminal is in First(u). If u =* , then is in First(u ).38The follow set of a nonterminal A is the set of terminal symbols that can appear immediately to the right of A in a valid sentence. A bit more formally, for every valid sentence S =*uAv , where v begins with some terminal, that terminal is in Follow(A).39Computing first To calculate First(u) where u has the form X1X2.Xn, do the following:1. If X1 is a terminal, then add X1 to First(u), otherwise add First(X1) - to First(u ) .2. If X1 is a nullable nonterminal, i.e., X1 =* , add First(X2) - to First(u). Furthermore, if X2 can also go to , then add First(X3) - and so on, through all Xn until the first nonnullable one.3. If X1X2.Xn =* , add to the first set.40Calculating follow sets. For each nonterminal in the grammar, do the following:1. Place# in Follow(S) where S is the start symbol and # is the inputs right endmarker.The endmarker might be end of file, it might be newline, it might be a special symbol, whatever is the expected end of input indication for this grammar. We will typically use # as the endmarker.2. For every production A uBv where u and v are any string of grammar symbols and B is a nonterminal, everything in First(v) except is placed in Follow(B).3. For every production A uB, or a production A u Bv where First(v ) contains (i.e. v is nullable), then everything in Follow(A) is added to Follow(B).41 Constructing the parse table1. For each production A u of the grammar, do steps 2 and 32. For each terminal a in First(u), add A u to MA,a3. If in First(u), (i.e. A is nullable) add A u to MA,b for each terminal b in Follow(A), If in First(u), and # is in Follow(A), add A u to MA,#4. All undefined entries are errors42LL(1) grammarA grammar G is LL(1) iff whenever A u | v are two distinct productions of G, the following conditions hold:- for no terminal a do both u and v derive strings beginning with a (i.e. first sets are disjoint)- at most one of u and v can derive the empty string- if v =* then v does not derive any string beginning with a terminal in Follow(A)43