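HUNAN UNIVERSITY

Graduation Thesis

Thesis title: Speech Interaction and Software Design for a Humanoid Robot
Student name: Chen Ming (陈明)
Student ID: 201208070103
Major and class: Intelligence Science and Technology, Class 1
School: College of Information Science and Engineering
Supervisor: Li Renfa (李仁发)
Dean: Li Renfa (李仁发)
Date: June 1, 2016

Hunan University Undergraduate Graduation Project (Thesis)

Speech Interaction and Software Design for a Humanoid Robot

Abstract (translated from the Chinese original)

This thesis describes speech recognition research carried out on the NAO robot and also covers common behavior interactions for the robot. Speech recognition is a comprehensive technology that draws on phonetics, acoustics, linguistics, signal processing, artificial intelligence, and other disciplines, and its range of applications keeps widening. The NAO robot serves as a standard robot platform in competitions, education, and scientific research, so research based on NAO is in line with current trends.

The first part of the thesis introduces basic methods and background knowledge in speech recognition and briefly describes the structure and functions of the NAO robot. In the theory part, Chapter 3 presents GMM-HMM, that is, the Gaussian mixture model and the hidden Markov model. Both models are applied to speech recognition in the experiments.

In the experimental part on speech recognition, the audio stream captured by the NAO robot is processed in several steps: separating the audio channels, filtering, framing, applying a Hamming window, extracting speech features, and training on sample audio streams with machine learning methods. Once the necessary processing is complete, the result is sent from the MATLAB client on the local computer to the server side in Choregraphe, the NAO robot's control software. The robot then performs the behavior that corresponds to the returned recognition result.

Besides speech recognition, the thesis designs several behavior interaction functions for the NAO robot: directed movement, a multi-task parallel dance, and conversation on a given topic. Directed movement lets the robot move with a specified angle and direction of rotation. The multi-task parallel dance in effect aggregates several tasks at the behavior layer. These tasks include frame-by-frame motion design for the head, feet, and arms, color changes of the LED groups, and dialogue design on a given topic following the QiChat Syntax section of the official Aldebaran Robotics documentation.

In summary, the project and the thesis consist of two main parts: speech recognition and behavior interaction design. For speech recognition, the NAO robot captures the target audio stream, which is transferred to the local computer over FTP for further processing. Behavior design is carried out in Choregraphe, the NAO robot's top-level control software, where multiple behaviors are designed. Specific behaviors among them are triggered according to the speech recognition result, which completes the specified design tasks.

Keywords: Gaussian mixture model; hidden Markov model; directed movement; multi-task parallel dance; audio stream framing; window function; speech feature extraction; TCP/IP communication

An Approach to Speech Interaction and Software Design for Humanoid Robots

Abstract

This thesis presents research on speech recognition and on common behavior interactions using a NAO robot. Speech recognition is a comprehensive technique that draws on acoustics, phonetics, linguistics, signal processing, and artificial intelligence, among other fields. Speech recognition techniques are currently applied in an increasing number of fields. NAO robots serve as a standard platform in areas such as competitions, elementary and tertiary education, and scientific research, so studies based on NAO are in line with mainstream research.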
The thesis begins with a brief discussion of basic approaches and background knowledge in speech recognition, followed by an introduction to the structure and functions of the NAO robot. For the applied theory, Chapter 3 focuses on the GMM (Gaussian Mixture Model) and the HMM (Hidden Markov Model). Both models are applied in the experiments.

In the practical experiments on speech recognition, the NAO robot first captures the audio stream, and the local computer then downloads the target stream using FTP commands in the MATLAB command window. Processing of the target audio stream is divided into a series of operations: separating the audio tracks (4 tracks are captured by NAO's microphones), filtering the audio waveform, framing the target audio, applying a Hamming window function, extracting audio features, and training on the audio data set with machine learning methods. After these steps, the processed result is transferred to the NAO robot's socket server in Choregraphe over TCP/IP. Once the result reaches NAO, the robot performs the pre-designed behavior that corresponds to it.

In addition to speech recognition, several behaviors of the NAO robot are studied and designed. The designed behaviors are orientation-moving, multi-task dancing, and talking on a given topic. The orientation-moving behavior enables the robot to move in a specific direction with an accurate distance and angle of rotation. Multi-task dancing is built on the concept of parallel processing, and it needs the convergence of the
interactive levels of the head, arms, legs, LED sets, sound, and music. For the talking behavior on a given topic, the designed conversations have to follow QiChat Syntax, which the Aldebaran documentation divides into nine main categories and describes in detail.

To conclude, the project addresses speech recognition by analyzing the audio stream captured from the NAO robot's microphones, and it enables the robot to perform pre-designed behaviors according to the recognition result transferred from the local computer through TCP/IP.

Key Words: Gaussian Mixture Model; Hidden Markov Model; multi-task dancing; audio stream framing; window function; speech feature extraction; TCP/IP communication

Contents

Declaration of Originality and Copyright Authorization for the Graduation Project (Thesis)
Abstract (Chinese)
Abstract
List of Figures
Chapter 1 Introduction
  1.1 Foreword
  1.2 Development of speech recognition technology
    1.2.1 Template-based approaches
    1.2.2 Knowledge-based approaches
    1.2.3 Connectionist approaches
    1.2.4 Statistical approaches
  1.3 Applications of speech recognition in robots
    1.3.1 Intelligent wheelchair robots
    1.3.2 Voice chat robots
  1.4 Main research content of this thesis
Chapter 2 Structure and Functions of the NAO Robot
  2.1 Overview of the NAO robot
  2.2 General characteristics of the NAO robot
  2.3 Configuration of the NAO robot
  2.4 Vision system
  2.5 Audio system
  2.6 Official software development platform: Choregraphe
Chapter 3 Basic Theory of GMM-HMM
  3.1 GMM (Gaussian mixture model)
  3.2 HMM (hidden Markov model)
    3.2.1 Markov chains
    3.2.2 Definition of the hidden Markov model
Chapter 4 Behavior Design for the NAO Robot
