IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, VOL. 10, NO. 7, OCTOBER 2002, p. 495

Improved Audio Coding Using a Psychoacoustic Model Based on a Cochlear Filter Bank

Frank Baumgarte

Abstract: Perceptual audio coders use an estimated masked threshold for the determination of the maximum permissible just-inaudible noise level introduced by quantization. This estimate is derived from a psychoacoustic model mimicking the properties of masking. Most psychoacoustic models for coding applications use a uniform (equal bandwidth) spectral decomposition as a first step to approximate the frequency selectivity of the human auditory system. However, the equal filter properties of the uniform subbands do not match the nonuniform characteristics of cochlear filters and reduce the precision of psychoacoustic modeling. Even so, uniform filter banks are applied because they are computationally efficient. This paper presents a psychoacoustic model based on an efficient nonuniform cochlear filter bank and a simple masked threshold estimation. The novel filter-bank structure employs cascaded low-order IIR filters and appropriate down-sampling to increase efficiency. The filter responses are optimized for the modeling of auditory masking effects. Results of the new psychoacoustic model applied to audio coding show better performance in terms of bit rate and/or quality of the new model in comparison with other state-of-the-art models using a uniform spectral decomposition. The low delay of the new model is particularly suitable for low-delay coders.

Index Terms: Audio coding, filter bank, masked threshold, model of masking, perceptual model.

I. INTRODUCTION

In perceptual audio coding [1], the audio signal is treated as a masker for distortions introduced by lossy data compression. For this purpose, the masked threshold for the distortions is approximated by a psychoacoustic model. The masked threshold is the time and frequency-dependent maximum level that marks the boundary for distortions being inaudible if superimposed to the audio signal. The initial audio signal processing within the psychoacoustic model consists of a spectral decomposition to account for the frequency selectivity of the auditory system. However, the auditory system performs a nonuniform (nonequal bandwidths) spectral decomposition of the acoustic signal in the cochlea. This first stage of cochlear sound processing already determines basic properties of masking, e.g., the frequency spread of masking which is related to the frequency response of the human cochlear filters. Above 1 kHz, the cochlear filter bandwidths increase almost proportionally to the center frequency. These bandwidths determine both the spectral width of energy integration associated with a band and the range of spectral components that can interact within a band, e.g., two sinusoids creating a beating effect. This interaction plays a crucial role in the perception of whether a sound is noise-like, which in turn corresponds to a significantly more efficient masking compared with a tone-like signal [2]. The noise or tone-like character is basically determined by the amount of envelope fluctuations at the cochlear filter outputs which widely depend on the interaction of the spectral components in the pass-band of the filter.

Many existing psychoacoustic models, e.g., [1], [3], and [4], employ an FFT-based transform to derive a spectral decomposition of the audio signal into uniform subbands with equal bandwidths. The nonuniform spectral resolution of the auditory system is taken into account by summing up the energies of the appropriate number of neighboring FFT frequency subbands. Consequently, the phase relation between the spectral components of the different subbands within a cochlear filter band is not taken into account. Since the cochlear filter slopes are less steep than the subband slopes, they must be approximated by spreading the subband energies across several bands. This way of mapping the uniform subbands to cochlear filter bands produces envelopes of the output signal that are different from those measured at the output of the cochlea. The temporal resolution of the spectral decomposition is determined by the transform size, i.e., FFT length, and thus, is constant across all center frequencies. For high center frequencies this results in a significantly lower temporal resolution in comparison with that of the corresponding cochlear filters. All the described mismatches contribute to an inaccurate modeling of masking that causes suboptimal coder compression performance.

To overcome the mismatch between uniform filter banks and the spectral decomposition of the cochlea, a linear nonuni

Manuscript received June 20, 2001; revised July 18, 2002. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Peter Vary.
The author is with the Media Signal Processing Research Department, Agere Systems, Berkeley Heights, NJ 07922 USA (e-mail: fb@agere.com).
Digital Object Identifier 10.1109/TSA.2002.804536
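
The remark that summing FFT subband energies ignores the phase relation between spectral components can be illustrated with a small sketch. It is not taken from the paper: the frame length, the bin indices, the band edges, and all function names are hypothetical choices, and the envelope is obtained from a DFT-based analytic signal rather than from a cochlear filter. Two three-tone signals with identical magnitude spectra but different carrier phase are compared.

```python
import cmath
import math

N = 256                 # frame length in samples (hypothetical value)
BAND = range(36, 45)    # FFT bins lumped into one band (hypothetical edges)

def dft(x, sign=-1):
    """Direct O(N^2) DFT; sign=-1 is the forward transform, sign=+1 the unscaled inverse."""
    n = len(x)
    return [sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def band_energy(x):
    """Sum of |X[k]|^2 over the band: all that an energy-summing model retains."""
    spectrum = dft(x)
    return sum(abs(spectrum[k]) ** 2 for k in BAND)

def envelope(x):
    """Magnitude of the analytic signal (negative-frequency bins set to zero)."""
    n = len(x)
    spectrum = dft(x)
    one_sided = [spectrum[k] * (1 if k in (0, n // 2) else 2 if k < n // 2 else 0)
                 for k in range(n)]
    return [abs(z) / n for z in dft(one_sided, sign=+1)]

def three_tones(carrier_phase):
    """Carrier at bin 40 plus two half-amplitude sidebands at bins 39 and 41."""
    return [math.cos(2 * math.pi * 40 * t / N + carrier_phase)
            + 0.5 * math.cos(2 * math.pi * 39 * t / N)
            + 0.5 * math.cos(2 * math.pi * 41 * t / N)
            for t in range(N)]

def fluctuation(x):
    """Range (max minus min) of the envelope over the frame."""
    env = envelope(x)
    return max(env) - min(env)

beating = three_tones(0.0)            # components in phase: strong beating
smooth = three_tones(math.pi / 2)     # carrier shifted by 90 degrees: flatter envelope

energy_beating, energy_smooth = band_energy(beating), band_energy(smooth)
fluct_beating, fluct_smooth = fluctuation(beating), fluctuation(smooth)

print(f"band energy:    {energy_beating:.1f} vs {energy_smooth:.1f}")
print(f"envelope range: {fluct_beating:.3f} vs {fluct_smooth:.3f}")
```

Both signals give the same band energy, yet the envelope of the first swings between 0 and 2 while the second only varies between 1 and about 1.41. A model that keeps only summed bin energies cannot tell these two cases apart, which is the limitation the text attributes to FFT-based decompositions.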