Speech Recognition Based on RBF Neural Network

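Hebei University of Technology, Master's Thesis
Speech Recognition Based on RBF Neural Network
Author: Zheng Xiaoxia
Degree applied for: Master
Major: Communication and Information System
Supervisor: Wang Xia
Date: 20070101

Abstract

Speech recognition has attracted wide attention because of its important theoretical value and broad application prospects. To date, most speech recognition research has been based on linear system theory. As the research has deepened, it has become clear that a breakthrough in speech recognition requires nonlinear methods. Since the 1980s, as research on and applications of nonlinear theories such as artificial neural networks have matured, applying these theories to speech recognition has become possible.

The RBF (Radial Basis Function) neural network provides a novel and effective learning method for multilayer feedforward neural networks, and its research and applications have developed rapidly in recent years. Based on the RBF neural network, this thesis carries out computational verification, performance analysis and evaluation of results for each stage of speech recognition: preprocessing, feature extraction and the recognition algorithm.

The basic RBF neural network is a three-layer feedforward network. It converges much faster than an ordinary BP network, and its topology can be determined by the training algorithm itself. The main design problems are determining the number of hidden-layer neurons and their centers and radii, and training the network weights. This thesis constructs the network as follows: a hybrid algorithm combining competitive learning and clustering dynamically selects the number of hidden-layer neurons; gradient descent finds the weights that minimize the cost function; and, to save resources, Akaike's Final Prediction Error (FPE) criterion is used to delete nodes that contribute little to the network output, balancing network accuracy against complexity. Pruning stops once the FPE no longer decreases, and the optimal network weights are then computed, yielding a reasonable network. In addition, RBF networks trained with the commonly used iterative method and with randomly selected fixed centers, as well as a probabilistic neural network (PNN), are implemented.

Endpoints are detected with the double-threshold method, and speech features are extracted as Mel-frequency cepstral coefficients (MFCC). After dynamic time warping (DTW), the features are fed into the constructed RBF network, which is trained on the training data; once training is complete, the test samples are input to the trained network for recognition. Recognition results on clean and noisy speech from the speech corpus show that the improved RBF network achieves significant gains in both recognition rate and recognition speed.

Keywords: speech recognition, radial basis function neural network, competitive learning algorithm, gradient descent, feature extraction, deletion strategy

SPEECH RECOGNITION BASED ON RBF NEURAL NETWORK

ABSTRACT

Speech recognition has received increasing attention recently because of its important theoretical significance and practical value. Up to now, most speech recognition has been based on conventional linear system theory; as research on speech recognition deepens, nonlinear system methods must be introduced. Since the 1980s, with the development of nonlinear system theories such as artificial neural networks (ANN), it has become possible to apply these theories to speech recognition. The Radial Basis Function Neural Network (RBFNN) offers a novel and effective learning approach for feedforward multilayer networks, and its study and application have developed rapidly in recent years. This thesis studies speech recognition based on the RBFNN. Computational validation, performance analysis and assessment of results are carried out for each part of the speech recognition process, including preprocessing, feature extraction and the recognition algorithm. A basic RBFNN is a feedforward network with three layers.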
Its convergence is much faster than that of a general back-propagation (BP) network, and its structure can be determined within the training algorithm. The main problems in designing an RBFNN are determining the number of hidden-layer nodes, the centers and widths of the basis functions, and the linear output weights. This thesis constructs the RBFNN as follows: a combination of competitive learning and clustering trains the hidden-layer parameters dynamically; gradient descent trains the weights that minimize the cost function; and, to save resources, Akaike's Final Prediction Error (FPE) criterion is used to delete nodes that contribute little to the network output, balancing precision against network complexity. When the FPE no longer decreases, a set of final optimal weights and a reasonable network have been obtained. In addition, this thesis constructs RBFNNs with the iterative method and with randomly selected fixed centers, as well as a Probabilistic Neural Network (PNN). Voice activity detection (VAD) is used to detect speech endpoints, Mel-Frequency Cepstrum Coefficients (MFCC) are extracted as the speech feature parameters, and Dynamic Time Warping (DTW) is finally used to adjust the parameters. The RBFNN is then trained with the training data, and finally the test data are input to the trained network for recognition. Recognition results on clean and noisy speech samples show that the improved RBFNN achieves excellent performance in terms of recognition rate and recognition speed.
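KEY WORDS: speech recognition, RBF neural network, competitive learning algorithm, gradient descent, feature extraction, deletion strategy

Chapter 1 Introduction

1-1 Current State of Speech Recognition Research

As an interdisciplinary field, speech recognition draws on signal processing, statistical pattern recognition, artificial intelligence, computer science, linguistics, cognitive science and many other disciplines. With the rapid development of these disciplines, speech recognition technology has made great progress over the past few decades. Research on speech recognition began in the early 1950s, AT […]

[…]

$$ g_{qi} = \exp\left(-\frac{M}{d_{\max}^{2}}\,\lVert x_q - c_i \rVert^{2}\right), \quad q = 1, 2, \ldots, N;\; i = 1, 2, \ldots, M \qquad (3.12) $$

where $x_q$ is the $q$-th input vector of the $N$ training samples, $c_i$ is the $i$-th of the $M$ centers, and $d_{\max}$ is the maximum distance between the chosen centers.

3-4-2 Self-Organized Selection of Centers

The main drawback of the fixed-center method is that it needs a very large training set to reach satisfactory performance. One way to overcome this is a hybrid learning process with two distinct stages: a self-organized learning stage, whose purpose is to estimate suitable locations for the centers of the hidden-layer RBFs, and a supervised learning stage, which completes the network design by estimating the output-layer weights. Both stages can be carried out in batch mode, but an adaptive (iterative) approach is preferable.

For the self-organized stage, a clustering algorithm is first needed to partition the given data points into several groups, with the data in each group as homogeneous as possible. One such algorithm is k-means clustering, which places the RBF centers in those regions of the input space where significant data points lie. Let $M$ denote the number of RBFs; a suitable value must be determined experimentally. Let $\{c_i(n)\}_{i=1}^{M}$ denote the RBF centers at iteration $n$. The k-means algorithm then proceeds as follows:

(1) Initialization. Choose random initial values $c_i(0)$ for the centers, with every initial value distinct. It may be preferable to keep the Euclidean norms of the centers small.

(2) Sampling. Draw a sample vector $x$ from the input space with some probability; it serves as the input vector $x(n)$ at iteration $n$.

(3) Similarity matching. Let $i(x)$ denote the index of the best-matching (competition-winning) center for input vector $x$. At iteration $n$, $i(x)$ is determined by the minimum-Euclidean-distance criterion:

$$ i(x) = \arg\min_{i} \lVert x(n) - c_i(n) \rVert, \quad i = 1, 2, \ldots, M \qquad (3.13) $$

where $c_i(n)$ is the center of the $i$-th radial basis function at iteration $n$.

(4) Updating. Adjust the RBF centers with the rule

$$ c_i(n+1) = \begin{cases} c_i(n) + \eta\,[x(n) - c_i(n)], & i = i(x) \\ c_i(n), & \text{otherwise} \end{cases} \qquad (3.14) $$

where $\eta$ is the learning rate, $0 < \eta < 1$.

[…] = 1.5d (where d is the average distance between the centers at that point), […] 0.1

After 300 iterations, the final number of cluster centers was M = 95. Each vector of X was assigned to its nearest cluster, the weights were then trained by gradient descent, and the saved weights were used for classification, with fairly good results. Table 5.3 lists the iteration steps and the corresponding weights.

Table 5.3 Iteration steps and corresponding weights
1 2 3 4 5 6 480 0.4508 -2.9711 -0.58428 4.8116 2.7145 2.0002 0.045264 4.6078 3.5058 -2.3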
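The k-means steps (1) to (4) of section 3-4-2 can be sketched in a few lines of NumPy. This is a minimal illustration, not the thesis's code: the function name `kmeans_online`, its defaults, and the choice of initializing the centers from randomly picked training vectors are assumptions. The number of centers M is fixed here; the dynamic selection of M by the thesis's competitive/clustering hybrid is not reproduced.

```python
import numpy as np

def kmeans_online(X, M, eta=0.1, n_iter=1000, seed=0):
    """Online k-means for RBF centers, following steps (1)-(4), eqs. (3.13)-(3.14).
    X: (N, D) training vectors; M: number of RBF centers. Returns (M, D) centers.
    Hypothetical helper for illustration only."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # (1) Initialization: M training vectors chosen without replacement
    #     (assumption: the text only asks for distinct random initial values).
    C = X[rng.choice(len(X), size=M, replace=False)].copy()
    for _ in range(n_iter):
        # (2) Sampling: draw one input vector x(n) at random
        x = X[rng.integers(len(X))]
        # (3) Similarity matching, eq. (3.13): index of the nearest (winning) center
        i_win = np.argmin(np.linalg.norm(C - x, axis=1))
        # (4) Updating, eq. (3.14): move only the winner toward x; others unchanged
        C[i_win] += eta * (x - C[i_win])
    return C
```

With two well-separated groups of points and M = 2, each center should settle inside one group; because only the winner moves, a center that starts far from all data may never win, which is one reason the text suggests careful initialization.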
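The hidden layer of equation (3.12) and the supervised stage that trains the output weights by gradient descent can be sketched as follows. Assumptions: the Gaussian uses the width $\sigma = d_{\max}/\sqrt{2M}$ implied by the reconstructed (3.12), the cost is the mean squared output error, and training is plain batch gradient descent; the helper names `design_matrix` and `train_output_weights` are hypothetical, and the thesis may use an online update or a different cost scaling.

```python
import numpy as np

def design_matrix(X, C):
    """Hidden-layer outputs g_qi = exp(-M / d_max**2 * ||x_q - c_i||**2), eq. (3.12).
    X: (N, D) inputs; C: (M, D) centers (at least two distinct)."""
    M = len(C)
    # d_max: largest Euclidean distance between any two centers
    d_max = np.sqrt(((C[:, None, :] - C[None, :, :]) ** 2).sum(axis=-1)).max()
    if d_max == 0:
        raise ValueError("need at least two distinct centers")
    sq_dist = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=-1)  # (N, M)
    return np.exp(-M / d_max ** 2 * sq_dist)

def train_output_weights(G, T, eta=0.5, n_epochs=2000):
    """Batch gradient descent on E(W) = 1/(2N) * sum_q ||t_q - G_q W||^2.
    G: (N, M) hidden outputs; T: (N, K) targets. eta must stay below
    2N / (largest singular value of G)**2 for the iteration to converge."""
    N, M = G.shape
    W = np.zeros((M, T.shape[1]))
    for _ in range(n_epochs):
        err = T - G @ W            # (N, K) output errors
        W += eta * G.T @ err / N   # W <- W - eta * dE/dW
    return W
```

Placing one center on each training input turns the problem into exact interpolation, a convenient sanity check: after training, `G @ W` should reproduce the targets.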
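The FPE-based node deletion described in the abstract (remove nodes that contribute little to the output until the FPE stops decreasing) might look like the greedy loop below. This is a sketch under several assumptions: Akaike's criterion is taken in its common form $\mathrm{FPE} = \frac{N+p}{N-p}\cdot\mathrm{MSE}$ with $p$ equal to the number of retained hidden nodes, a node's contribution is measured as the norm of its output term, and the output weights are refitted by least squares (`numpy.linalg.lstsq`) in place of the thesis's gradient-descent training. `fpe` and `prune_by_fpe` are hypothetical names.

```python
import numpy as np

def fpe(G, T, W):
    """Akaike's final prediction error FPE = (N + p) / (N - p) * MSE.
    N = number of training samples; p = number of retained hidden nodes
    (assumption: one free parameter per node and output)."""
    N, p = G.shape
    if p >= N:
        raise ValueError("FPE needs more samples than parameters")
    mse = np.mean((T - G @ W) ** 2)
    return (N + p) / (N - p) * mse

def prune_by_fpe(G, T):
    """Greedy backward pruning of hidden nodes.
    G: (N, M) hidden-layer outputs; T: (N, K) targets.
    Returns the indices of the kept nodes and their refitted output weights."""
    keep = list(range(G.shape[1]))
    # Least-squares refit stands in for gradient-descent training (assumption).
    W = np.linalg.lstsq(G, T, rcond=None)[0]
    best = fpe(G, T, W)
    while len(keep) > 1:
        Gk = G[:, keep]
        # Contribution of node j to the output: norm of the term g_j * w_j
        contrib = [np.linalg.norm(np.outer(Gk[:, j], W[j])) for j in range(len(keep))]
        j_min = int(np.argmin(contrib))
        trial = keep[:j_min] + keep[j_min + 1:]
        W_trial = np.linalg.lstsq(G[:, trial], T, rcond=None)[0]
        score = fpe(G[:, trial], T, W_trial)
        if score >= best:  # FPE no longer decreases: stop pruning
            break
        keep, W, best = trial, W_trial, score
    return keep, W
```

The factor $(N+p)/(N-p)$ penalizes every extra node, so a node survives only if removing it raises the training error by more than the penalty saved, which is the accuracy/complexity balance the abstract describes.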