Python: Learning Regular Expressions with the re Module

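Summary: The re module provides functions for working with regular expressions. They come up often in everyday work, so this post walks through how to use them.

Usage:

I. Functions in the re module

    Function                     Description
    compile(pattern)             Create a pattern object
    search(pattern, string)      Look for the pattern anywhere in the string
    match(pattern, string)       Match the pattern at the start of the string
    split(pattern, string)       Split the string on matches of the pattern
    findall(pattern, string)     Return all matches as a list
    sub(pat, repl, string)       Replace matches of pat with repl
    escape(string)               Escape special characters

1. compile:

    >>> import re
    >>> pat = re.compile('A')
    >>> m = pat.search('CBA')
    >>> print(m)    # matched: prints a match object (truthy)
    >>> m = pat.search('CBD')
    >>> print(m)    # no match: prints None (falsy)
    None

The calls above are equivalent to:

    >>> re.search('A', 'CBA')

The first form is recommended.

Note: compile converts a regular expression into a pattern object. This makes matching more efficient when the pattern is reused, because the other functions perform the same conversion internally.

2. search:

    >>> m = re.search('asd', 'ASDasd')
    >>> print(m)    # matched: prints a match object (truthy)
    >>> m = re.search('asd', 'ASDASD')
    >>> print(m)    # no match: prints None (falsy)
    None

Note: search looks for the first substring of the given string that matches the regular expression. If there are several matches, only the first one is returned.

3. match:

    >>> m = re.match('a', 'Aasd')
    >>> print(m)    # no match: prints None (falsy)
    None
    >>> m = re.match('a', 'aASD')
    >>> print(m)    # matched: prints a match object (truthy)

The compile form works here too:

    >>> pat = re.compile('a')
    >>> print(pat.match('Aasd'))
    None
    >>> print(pat.match('aASD'))    # matched: prints a match object

Note: match applies the regular expression only at the beginning of the given string.

The return values of the functions above can be tested directly in an if statement:

    >>> if pat.search('asd'):       # found, so OK is printed
    ...     print('OK')
    ...
    OK
    >>> if re.search('a', 'ASD'):   # not found, so nothing is printed
    ...     print('OK')
    ...
    >>>

4. split:

    >>> re.split(',', 'a,s,d,asd')
    ['a', 's', 'd', 'asd']          # returns a list
    >>> pat = re.compile(',')
    >>> pat.split('a,s,d,asd')
    ['a', 's', 'd', 'asd']          # returns a list
    >>> re.split('[, ]+', 'a , s ,d ,asd')               # pattern [, ]+ is explained later
    ['a', 's', 'd', 'asd']
    >>> re.split('[, ]+', 'a , s ,d ,asd', maxsplit=2)   # maxsplit: maximum number of splits
    ['a', 's', 'd ,asd']
    >>> pat = re.compile('[, ]+')                        # pattern [, ]+ is explained later
    >>> pat.split('a , s ,d ,asd', maxsplit=2)           # maxsplit: maximum number of splits
    ['a', 's', 'd ,asd']

Note: split cuts the string at every match of the pattern. It resembles the string method split, but str.split only accepts a fixed separator string, whereas re.split allows separators of any length and number.

5. findall:

    >>> re.findall('a', 'ASDaDFGAa')
    ['a', 'a']                      # matches are returned as a list
    >>> pat = re.compile('a')
    >>> pat.findall('ASDaDFGAa')
    ['a', 'a']                      # matches are returned as a list
    >>> pat = re.compile('[A-Z]+')  # pattern [A-Z]+ (runs of uppercase letters) is explained later
    >>> pat.findall('ASDcDFGAa')
    ['ASD', 'DFGA']
    >>> pat = re.compile('[A-Z]')   # pattern [A-Z]: a single uppercase letter
    >>> pat.findall('ASDcDFGAa')
    ['A', 'S', 'D', 'D', 'F', 'G', 'A']
    >>> pat = re.compile('[A-Za-z]')    # pattern [A-Za-z]: a single letter; [A-Za-z]+ would match whole words
    >>> pat.findall('ASDcDFGAa')
    ['A', 'S', 'D', 'c', 'D', 'F', 'G', 'A', 'a']

Note: findall returns every match of the given pattern, as a list of strings.

6. sub:

    >>> re.sub('a', 'A', 'abcasd')  # replace each a with A; combining sub with groups is shown below
    'AbcAsd'
    >>> pat = re.compile('a')
    >>> pat.sub('A', 'abcasd')
    'AbcAsd'

Replacing with the help of groups:

    >>> pat = re.compile(r'www\.(.*)\..{3}')    # regular expression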
    >>> pat.match('www.dxy.com').group(1)
    'dxy'
    >>> # The pattern matches "www.dxy.com"; the text of group 1 then replaces
    >>> # the whole match (www.dxy.com -> dxy).
    >>> pat.sub(r'\1', 'hello,www.dxy.com')
    'hello,dxy'
    >>> pat = re.compile(r'(\w+) (\w+)')        # regular expression
    >>> s = 'hello world ! hello hz !'
    >>> pat.findall('hello world ! hello hz !')
    [('hello', 'world'), ('hello', 'hz')]
    >>> # The pattern captures group 1 (hello) and group 2 (world); sub then
    >>> # writes them back in the order group 2, group 1, swapping the two words.
    >>> pat.sub(r'\2 \1', s)
    'world hello ! hz hello !'

Note: sub replaces the matched substrings with the given content (a -> A).

7. escape:

    >>> re.escape('www.dxy.cn')
    'www\\.dxy\\.cn'                # escaped

Note: escape escapes the special characters in a string.

Of the functions above, only match and search return an object that has a group method; the others do not.

II. Methods of the match object

    >>> dir(m)
    ['__class__', '__copy__', '__deepcopy__', '__delattr__', '__doc__', '__format__',
     '__getattribute__', '__hash__', '__init__', '__new__', '__reduce__', '__reduce_ex__',
     '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', 'end', 'endpos',
     'expand', 'group', 'groupdict', 'groups', 'lastgroup', 'lastindex', 'pos', 're', 'regs',
     'span', 'start', 'string']

A few of them are introduced here:

    Method     Description
    group      Get the text matched by a subpattern (group)
    start      Start position of the given group's match
    end        End position of the given group's match
    span       Start and end positions of the given group's match

    >>> pat = re.compile(r'www\.(.*)\.(.*)')    # each () defines a group; there are 2 groups here
    >>> m = pat.match('www.dxy.com')
    >>> m.group()       # the argument defaults to 0, meaning the whole match
    'www.dxy.com'
    >>> m.group(1)      # the substring matched by group 1
    'dxy'
    >>> m.group(2)
    'com'
    >>> m.start(2)