资源预览内容
第1页 / 共71页
第2页 / 共71页
第3页 / 共71页
第4页 / 共71页
第5页 / 共71页
第6页 / 共71页
第7页 / 共71页
第8页 / 共71页
第9页 / 共71页
第10页 / 共71页
亲,该文档总共71页,到这儿已超出免费预览范围,如果喜欢就下载吧!
资源描述
Chapter 2ScanningInstructor Jianhui YueSoftware College SCUcompilerscu21cn.com,Introduction,The scanning, or lexical analysis phase, reads the source program as a stream of characters and divides them into tokens.This is a special case of pattern matchingNeed to study methods of pattern specification and recognitionRegular expressionsFinite automataThe scanner must operate as efficiently as possible.,Introduction (cont ),Scanner,Parser,Semantic Analyzer,Source Code Optimizer,Code Generator,Target Code Optimizer,Source Code,Target Code,Tokens,Syntax Tree,Annotated Tree,Intermediate code,Target Code,Literal Table,Symbol Table,Error Handler,Tokens,Logical units that the scanner generates.Example: aindex = 4 + 2; a identifier left bracket index identifier right bracket = assignment 4 number + plus sign 2 number,Tokens (cont),The tokens fall into three categoriesReserved words: if , then, etc.Special symbols: +, -, ;, etc. Numbers and identifiers: 3.14, myName, myGrade, etc.Literals: “Hello, world!”, a, b, etc.Tokens are usually defined as an enumerated typetypedef enum IF, THEN, ELSE, PLUS, MINUS, NUM, ID, TokenType;,Token Attributes,The string of characters represented by a token is called its string value or its lexeme.Token must be distinguished from the string of characters it represent.Some tokens have one lexemeReserved words and special charactersPotentially, a token can represent infinitely many lexemes.ID token represents all identifiers.Any value associated with a token is called an attribute.Lexeme, numerical value for a NUM token.,Token Record,Token attributes are collected into a single structured data type.,typedef struct TokenType tokenval; char * stringval; int numval; TokenRecord;,Regular Expressions,Regular expressions represent patterns.A set of strings that matches a regular expression r is called the language generated by r and written as L(r).The set of characters or symbols that are available is called the alphabet. The alphabet is denoted by .A regular expression contains symbols of , but they indicate a pattern.Pattern will be written in boldfaceA regular expression may contain metacharacters or metasymbols,Basic Regular Expressions,These are just the single characters from the alphabet, which match themselves.r = a, L(a)=a denotes an empty string: L()= matches no string: L( ) = .,Token Function,The scanner returns the next token from the input on demand via a function TokenType getToken( );The function computes additional attributes as well.The string of input characters is not passed as a parameter, but is kept in a buffer provided by the system facilities.,Regular Expression Operations,Choice among Alternatives,Let r and s be regular expressions.r|s is a regular expression that matches any string that matched either by r or by s.L(r|s) = L(r) U L(s)Examples: a|b matches either a or b. L(a|b) = a,b.a| matches either a or . L(a|) = a, .L(a|b|c|d) = a, b, c, d,Concatenation,Examples:ab matches the only string ab.(a|b)c matches either ac or bc.Given two sets of string S1 and S2, the concatenated set of strings S1S2 is the set of string of S1 appended by all the strings of S2.Example: S1=aa, b and S2=a, bbS1S2=aaa, aabb, ba, bbbL(rs) = L(r)L(s), where r and s are two regular expressions.,Repetition,Also called (Kleene) closure.r* matches any finite concatenation of strings, each of which matches r.Example: a* matches , a, aa, aaa, aaaa, L(r*) = L(r)*Example: L(a|bc)*) = L(a|bc)* = a, bb* =, a, bb, aa, abb, bbbb, aaa, aabb, abba, abbbb, bbabb, bbbba, bbbbbb,Precedence of Operations,Repetition Concatenation AlternativeExample: a|cb* is interpreted as a|(c(b*)Parentheses are used to indicate the precedence as usual.,Highest,Lowest,Names for Regular Expressions,Often,it is helpful to use regular definitions of names.Example: digit=0|1|2|9The name becomes a metasymbol and should be distinguished from the string (digit in the example).We can use names in regular expressions like digit digit*,Example 2.1, = a, b, cFind a regular expression for the strings of that contain exactly one b.(a|c)*b(a|c)*b does not have to be a center of a matched string.,Example 2.2, = a, b, cFind a regular expression for the strings of that contain at most one b.We can use a solution to the previous example and add another case matching the strings with no b. (a|c)*|(a|c)*b(a|c)* A|B A: no b B : Just only one b Another solution: (a|c)*(b|)(a|c)* (Its structure is similar to 2.1),Example 2.3, = a, bConsider a set of strings consisting of a single b surrounded by the same number of as.S = b, aba, aabaa, aaabaaa, Regular expression a*ba* does not work.This set of strings cannot be described by a regular expression.This can be proved using a famous theorem called the pumping lemma.,
收藏 下载该资源
网站客服QQ:2055934822
金锄头文库版权所有
经营许可证:蜀ICP备13022795号 | 川公网安备 51140202000112号