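University of Science and Technology of China
Master's Thesis

Research on TANDEM-Based Discriminative Training of Acoustic Models in Speech Evaluation Systems

Author: Gong Shu (龚澍)
Degree applied for: Master
Major: Signal and Information Processing
Supervisor: Liu Qingfeng (刘庆峰)
Date: 2010-05-18

Abstract (translated from the Chinese)

In recent years, speech evaluation systems, typified by computer-assisted language learning, have been used more and more widely in oral examinations and language teaching. They make scoring fairer and more efficient and keep test results objective, and they also make teaching feedback more timely and accurate and stimulate students' interest in learning.

Mainstream speech evaluation systems currently build their acoustic models with maximum likelihood estimation (MLE) on MFCC features. This approach is mature and reliable, but it is sensitive to incorrect model assumptions and discriminates poorly between patterns, which limits further gains in evaluation performance. This thesis therefore introduces discriminative training and TANDEM features to improve the existing system in two respects: the acoustic model training criterion and the acoustic features.

The thesis is organized as follows.

Chapter 1 outlines the background of speech evaluation technology, explains in some detail the basic principles and implementation of the speech scoring system and the pronunciation error detection system, and focuses on the speech recognition theory that underlies speech evaluation, including acoustic features, acoustic models, and language models.

Chapter 2 first uses Bayesian decision theory to point out the shortcomings of the traditional MLE criterion, and on that basis introduces the idea of discriminative training of acoustic models. It then derives and compares the objective functions and parameter update algorithms of the various discriminative training criteria and brings them into a single unified training framework. Finally, it analyzes how the scores measured by a speech evaluation system correspond to the objective functions of the different discriminative training criteria, which gives a theoretical basis for applying discriminatively trained models in speech evaluation systems.

Chapter 3 first analyzes the respective strengths and weaknesses of the HMM/GMM and HMM/ANN frameworks, then proposes the TANDEM method, a feature-transformation front end that combines the advantages of both, and applies it to a Mandarin pronunciation error detection system. TANDEM uses a discriminatively trained neural network to estimate phone-level posterior probabilities. After a series of post-processing steps, the original MFCC features are converted into TANDEM features, which serve as the input to an evaluation system based on HMM statistical models that then performs scoring or error detection. Experiments show that TANDEM considerably improves the system's error detection performance, and that the gain is larger still when it is combined with adaptation methods such as MLLR.

Chapter 4 first analyzes the feasibility of combining TANDEM features with discriminative training, then introduces the architecture, scoring features, and performance measures of the English scoring system. Finally, four systems are built (MFCC-MLE, TANDEM-MLE, MFCC-MPE, and TANDEM-MPE) and tested on the Child and Middle test sets under different configurations. The results show that TANDEM-based discriminative training of acoustic models is an effective and practical way to improve the performance of current English pronunciation evaluation systems.

Chapter 5 summarizes the thesis and points out its shortcomings and directions for improvement.

Keywords: speech evaluation system, speech error detection, speech scoring, discriminative training, minimum phone error, TANDEM, multi-layer perceptron

ABSTRACT

In recent years, speech assessment and evaluation systems, represented by computer-assisted language learning systems, have been applied more and more widely in oral exams and language learning activities. These systems not only help teachers score oral tests more objectively and efficiently, but also give students immediate and accurate evaluations of their pronunciation proficiency. Most current speech assessment and evaluation systems use maximum likelihood estimation (MLE) to estimate the parameters of models based on MFCC features. This popular statistical method has some disadvantages: when models are confusable or the training data is limited, it is unlikely to reach an optimal solution.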
To solve this problem, this thesis proposes discriminative training criteria and TANDEM features, which can improve the performance of current speech evaluation systems. The thesis is organized as follows.

Chapter 1 briefly summarizes the development and background of speech evaluation, then explains the basic principles and system structures of the speech scoring system and the speech error detection system. Finally, it introduces the speech recognition concepts that underlie speech evaluation, such as acoustic features, acoustic models, and language models.

Chapter 2 first gives an overview of Bayesian decision theory. To overcome the weaknesses of MLE, discriminative training methods for hidden Markov models are brought into the speech evaluation system. Four typical discriminative training criteria and several methods for updating acoustic model parameters are introduced and then defined within a unified framework. Finally, the chapter analyzes the relationship between the target of the speech evaluation task and the objective function of each discriminative training criterion. The thesis proposes that the choice of discriminative criterion must be consistent with the measure used for pronunciation evaluation.

Chapter 3 first compares the HMM/ANN framework with the HMM/GMM framework. HMM/ANN has an advantage over HMM/GMM in discriminative training ability, but incremental enhancements such as speaker adaptation and discriminative parameter estimation are not easily implemented in it. This work applies the TANDEM approach, which combines neural-net discriminative feature processing with Gaussian-mixture distribution modeling, to a Mandarin speech error detection system. An MLP network is trained to estimate phone posterior probability distributions, and the error detection system based on the HMM/GMM framework then uses transformations of these estimates as its input features.
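As a rough illustration of the transformation step described above, the following Python sketch turns frame-level phone posteriors into decorrelated features suitable for a GMM-based back end. It assumes the commonly used TANDEM recipe of taking log posteriors and decorrelating them with PCA; the abstract does not state the exact post-processing used in the thesis. The function name `tandem_features`, the random stand-in posteriors, and the dimensions are hypothetical and chosen only for illustration.

```python
import numpy as np

def tandem_features(posteriors, n_components, eps=1e-10):
    """Map phone posteriors (n_frames x n_phones) to TANDEM-style features.

    Assumption: the posteriors come from an already trained MLP; the
    log + PCA post-processing is the common recipe, not necessarily
    the exact one used in the thesis.
    """
    # The log compresses the heavily skewed posteriors towards a more
    # Gaussian-like shape, which suits GMM modeling better.
    log_post = np.log(np.clip(posteriors, eps, 1.0))
    # PCA: centre the data, then project onto the eigenvectors of the
    # covariance matrix with the largest eigenvalues.
    centered = log_post - log_post.mean(axis=0)
    cov = np.cov(centered, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1][:n_components]
    return centered @ eigvecs[:, order]

# Stand-in for MLP outputs: softmax over random scores (illustrative only).
rng = np.random.default_rng(0)
logits = rng.normal(size=(200, 8))
posteriors = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

feats = tandem_features(posteriors, n_components=4)
```

The projected dimensions are mutually uncorrelated on the data used to estimate the PCA, which is convenient for the diagonal-covariance Gaussians typically used in HMM/GMM systems. In practice the PCA basis would be estimated on training data and reused unchanged at test time.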
The Chapter 3 experiments show a large improvement in error detection performance, especially when maximum likelihood linear regression (MLLR) adaptation is used.

Chapter 4 analyzes the feasibility of combining TANDEM features with discriminative training, then introduces the system structure, scoring features, and performance measures of the English speech scoring system. Finally, four systems are designed and built: MFCC-MLE, TANDEM-MLE, MFCC-MPE, and TANDEM-MPE. They are tested on the Child and Middle data sets. The results show that discriminative training based on TANDEM achieves the best evaluation performance and significantly outperforms MLE based on MFCC.

Chapter 5 concludes the thesis and discusses possible improvements.

Key Words: Speech Evaluation System, Speech Error Detection, Speech Scoring, Discriminative Training, Minimum Phone Error, TANDEM, Multi-Layer Perceptron

List of Figures

Figure 1.1 Structure of the speech scoring system
Figure 1.2 Structure of the pronunciation error detection system