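Classification number:
Student ID: M201171697
School code: 10487
Security level:

Master's Thesis

基于 OCR 的身份证识别系统
An Identification Card Recognition System Based on OCR

A Thesis Submitted in Partial Fulfillment of the Requirements for the Degree of Master of Engineering

Candidate: Tu Yimin (凃益民)
Major: Communication and Information System
Supervisor: You Xinge (尤新革)
Defense date: February 15, 2014

Huazhong University of Science & Technology
Wuhan 430074, P.R. China
February, 2014

Declaration of Originality

I declare that the thesis submitted here is the result of research I carried out under the guidance of my supervisor. To the best of my knowledge, except for content explicitly cited in the text, this thesis contains no research results that have been published or written by any other individual or group. Individuals and groups that contributed to the research in this thesis have been clearly identified in the text. I am fully aware that I bear the legal consequences of this declaration.

Signature of thesis author:
Date:

Authorization for Use of Thesis Copyright

The author of this thesis fully understands the university's regulations on the retention and use of theses, namely: the university has the right to retain the thesis and to submit paper and electronic copies of it to the relevant national departments or institutions, and to allow the thesis to be consulted and borrowed. I authorize Huazhong University of Science and Technology to include all or part of this thesis in relevant databases for retrieval, and to preserve and compile it by photocopying, reduced-size printing, scanning, or other means of reproduction.

This thesis is:
[ ] Confidential; this authorization applies after declassification in year ____.
[ ] Not confidential.
(Please tick the applicable box above.)

Signature of thesis author:
Date:
Signature of supervisor:
Date:

Abstract (Chinese)

As an effective tool for managing population information, the ID card is used in every aspect of social life, and acquiring the information on it is very important. At present, the personal information on an ID card is mostly entered manually or obtained by equipment that reads the card's magnetic signal. Manual entry is time-consuming and inefficient, and input mistakes easily lead to wrongly recorded information and unnecessary losses. Reading the magnetic signal cannot be widely adopted, because it requires a license from public security agencies and because cards can become demagnetized. If, from an image-processing perspective, a machine could replace people in capturing and automatically recognizing the information in the image, the manual-entry problem could be solved.

Optical character recognition (OCR) has been a popular research direction in recent years. The main problems in applying OCR methods to ID card information recognition are: the ID card image has a complex background; Chinese characters, symbols, and English letters are mixed; and a very large number of Chinese characters is involved. This places very high demands on image preprocessing, character segmentation, and Chinese character feature extraction.

To address these problems, we first perform layout analysis of the ID card and, based on the characteristics of the ID card, enhance the image in the preprocessing stage with a hypothesis selection filter (假设选择滤波器). Second, because the mixed text in the address field is hard to segment, we propose a segmentation method based on the Chinese character period and recognition feedback: the method determines the type of each connected region by analyzing the period of the character spacing, and merges connected regions according to the recognition feedback obtained after merging Chinese character components. Finally, the segmented characters are recognized. This thesis adopts a multi-level recognition framework. Coarse classification starts from the full stroke-crossing feature of Chinese characters. On this basis, and according to the structural characteristics of Chinese characters, we propose a half stroke-crossing feature and combine the full-crossing and half-crossing features as the coarse-classification feature. This addresses the limited discriminating power of the full-crossing feature used alone and reduces the workload of fine classification. For characters that coarse classification cannot separate directly, second-level recognition uses the energy density of the four corners as the feature for fine classification.

Experiments show that the method achieves a high Chinese character recognition rate and that the system performs ID card information recognition well.

Keywords: ID card, optical character recognition, image enhancement, feature selection

Abstract

As an effective management tool for population information, the ID card has been applied to all aspects of social life, and capturing the information on it plays a very important role. At present, the personal information is entered manually or obtained by equipment that reads the card's magnetic signal. Manual entry is not only time-consuming and inefficient but also prone to input errors, resulting in unnecessary losses.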
Magnetic equipment is not widely used, because it requires a license from public security agencies and because ID cards can be degaussed. If computers can capture the identity information and recognize it automatically, the manual input problem can be solved.

Optical Character Recognition (OCR) has been a hot research direction in recent years. Applying OCR methods to ID card recognition faces the following problems. First, the ID card has a complex background. Second, Chinese characters, English characters, and symbols are mixed. Third, the system involves a large number of Chinese characters. These problems place very high demands on image preprocessing, character segmentation, and feature extraction for Chinese characters.

To solve the problems above, we first perform layout analysis of the ID card and propose a set of image enhancement methods for the layout features. Second, we introduce a novel approach based on character periods and recognition feedback to solve the segmentation problem for mixed characters. In this approach, we analyze the character period to decide the type of each connected region, and we merge connected regions according to the recognition feedback obtained for the merged region. Finally, we recognize the characters with a multistage recognition framework. The first, coarse recognition stage relies on the number of times strokes are traversed, which is the full-breakthrough feature. We then introduce the half-breakthrough feature of strokes according to the structure of Chinese characters. These two features are combined to implement the first recognition stage, which raises its efficiency for some Chinese characters and reduces the workload of the later stage. The energy density of the four corners is used in the later stage for Chinese characters that cannot be recognized in the first stage.

Experiments show that the method has a high character recognition rate.
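The system performs well on ID card information recognition.

Keywords: ID Card, Optical Character Recognition, Image Enhancement, Feature Selection

Contents

Abstract (Chinese)
Abstract
1 Introduction
1.1 Research background and significance
1.2 Research status at home and abroad
1.3 Main research content of this thesis
1.4 Organization of this thesis
2 Preprocessing of ID card images
2.1 ID card image preprocessing
2.2 Grayscale conversion of color images
2.3 Skew correction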
