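Undergraduate Thesis

Early-Warning Supervisory System of Taxpayers Based on Data Mining: Implementation of the Data Pre-processing Module and Improvement of the X-Means Algorithm

Name:
Student ID:
School: School of Software
Department: Software Engineering
Major: Software Engineering
Grade:
Supervisor:
Date: Year 2XX, Month X

Abstract (translated from the Chinese)

Every year, many countries and regions lose a large amount of fiscal revenue to tax evasion, and tax inspection departments have long worked to address the problem. Advances in technology have brought databases and information storage tools into use for entering, storing, aggregating, and retrieving tax data. As tax administration has become increasingly computerized, tax departments have accumulated massive volumes of detailed business data that contain a great deal of information valuable for decision-making. Without powerful analytical tools, however, making sense of the data held in these large and numerous databases is far beyond human capacity, and the systems and data are likely to become "information islands" and "data graveyards".

It is therefore necessary to apply data mining to the new field of taxpayer early warning and monitoring: to mine the massive data accumulated in tax systems, extract information valuable for decision-making, bridge the gap between data and information, and turn "data graveyards" into "nuggets" of knowledge.

This thesis first introduces the research background and practical significance of the topic, the state of research in China and abroad and its open problems, and the content and distinguishing features of this work, and briefly outlines the organization of the thesis. It then describes the requirements of the taxpayer early-warning and monitoring system and of the data mining subsystem, explains in detail the raw data used for mining, and analyzes the architecture of the data mining subsystem. Next, it details the implementation of the data pre-processing module, including the design and implementation of the algorithms for data integration and selection, data cleaning, and data transformation. Finally, it presents the idea behind the X-Means algorithm, improves it, and analyzes how the mining results differ across data sources and before and after the improvement.

Mining the pre-processed data with the improved X-Means algorithm clearly separates four groups of taxpayers: those strongly suspected of tax evasion, who purchase electricity but have XSE=0 and SE=0; those suspected of tax evasion, but not seriously; those with excellent tax records whose tax payments exceed the industry average and who should receive appropriate tax support; and those with no seriously abnormal tax indicators, who need only routine administration. These groups account for 1%, 6%, 0%, and 93% of all taxpayers, respectively.

Keywords: data mining; data pre-processing; X-Means algorithm

Abstract

Every year, many countries and regions suffer significant losses of fiscal revenue because of tax evasion by taxpayers. Tax inspection departments have long been committed to solving this problem. Technological development has brought advanced databases and information storage tools into use for the entry, storage, statistical analysis, and retrieval of tax data. As tax informatization has deepened, tax departments have accumulated vast amounts of detailed business data, which contain a great deal of information valuable for decision-making. Without powerful analytical tools, however, it is impossible for people to comprehend the massive data stored in large numbers of different databases, so these systems and data are very likely to turn into information islands and data graveyards. It is therefore necessary to apply data mining technology to the new area of taxpayer early warning and monitoring.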
What we need to do is extract valuable information for decision-making from the vast amounts of data accumulated by the tax departments, bridge the gap between data and information, and turn the data graveyard into "nuggets of knowledge".

First, this thesis presents the background and significance of the research and the status quo and existing problems of related research at home and abroad, followed by the main contents and features of the thesis and its organization. Second, it introduces the requirements of the taxpayer early-warning and monitoring system and of the data mining subsystem, explains in detail the raw data used in the mining process, and analyzes the structure of the data mining subsystem. Third, it fully describes the implementation of the data pre-processing module, including the design and implementation of the data integration and selection, data cleaning, and data transformation algorithms. In addition, the thesis introduces the X-Means algorithm and improves it, and analyzes the difference between mining data before and after pre-processing, as well as the different results produced by the X-Means algorithm before and after the improvement.

Using the improved X-Means algorithm to mine the pre-processed data, we can clearly classify taxpayers into those who need to be focused on, spot-checked, supported, or administered as usual. These types of taxpayers account for 1%, 6%, 0%, and 93% of the total, respectively.

Key words: Data Mining; Data Pre-processing; X-Means Algorithm

Contents
Chapter 1 Introduction
1.1 Research Background and Significance of the Topic
1.2 Research Status and Existing Problems
1.3 Main Research Content and Features
1.4 Organization of the Thesis
Chapter 2 Requirements Analysis and System Structure
2.1 Taxpayer Early-Warning and Monitoring System
2.1.1 System Overview
2.1.2 System Functions
2.1.3 Data Description
2.2 Requirements Analysis of the Data Mining Subsystem
2.2.1 Overview of the Data Mining Subsystem
2.2.2 Requirements of the Data Mining Subsystem