Investigations of
Formant and Wavelet Representations
for Speech Movement Planning

Dave Johnson
Boston University
Department of Cognitive and Neural Systems
January 1998

(Full text of Dissertation in PDF, 982KB)

PhD Dissertation Abstract

Models of speech production have historically assumed that phoneme targets are represented as points in a planning space of vocal tract constrictions. However, recent evidence suggests that speakers may use an acoustic-like space when planning articulator movements for vowel and consonant production. This thesis demonstrates the feasibility of using acoustic planning spaces for the production of vowel sounds in DIVA, a computational neural model of motor-equivalent vowel production. Two acoustic planning spaces are studied: one based on formant frequencies and one based on a wavelet decomposition of the speech spectrum. The successful implementation of the formant planning space shows that articulatory control based on formants is practical and does not suffer from inherent nonlinearities in the inverse kinematic map between desired formant movements and the corresponding movements of the vocal articulators. However, formant-based models leave a number of issues unresolved, and several researchers have suggested that the gross shape of the vowel spectrum may correlate more closely with vowel perception data. Recent physiological studies suggest that the peripheral auditory system computes the log magnitude spectrum of a steady vowel, and that the representation in primary auditory cortex is similar to a wavelet decomposition of this spectrum. Based on these psychophysical and physiological results, the thesis proposes a model of vowel production based on a wavelet expansion of the log magnitude spectrum of the target vowel. The model employs an orthonormal set of wavelet basis functions that spans the space of possible vowel spectra. The dimensions of the wavelet-auditory planning space correspond to the coefficients in the wavelet expansion of the spectrum, and vowel targets are assumed to be connected regions in this space.
In addition to its support from the physiological literature, this model has a number of advantages over formant-based vowel production models, including simpler computation of the spectral parameters and better approximation of gross spectral shape. The wavelet-auditory planning space is also used to explain the spectral center of gravity effect, in which formant clusters are averaged into a single formant peak during vowel perception.
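
The idea of a wavelet expansion of a log magnitude spectrum can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the dissertation: the Haar basis stands in for whatever orthonormal wavelet family the thesis uses, the toy spectrum is built from three hypothetical resonance peaks, and the box-shaped target region is only one simple example of a connected region in coefficient space.

```python
import numpy as np


def haar_decompose(signal):
    """Orthonormal Haar wavelet transform of a vector whose length is a power of two.

    Returns coefficients ordered from coarsest to finest scale.
    """
    approx = np.asarray(signal, dtype=float)
    n = approx.size
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    details = []
    while approx.size > 1:
        even, odd = approx[0::2], approx[1::2]
        details.append((even - odd) / np.sqrt(2.0))  # detail at this scale
        approx = (even + odd) / np.sqrt(2.0)         # smoothed, half-length signal
    return np.concatenate([approx] + details[::-1])


def haar_reconstruct(coeffs):
    """Inverse of haar_decompose."""
    coeffs = np.asarray(coeffs, dtype=float)
    approx = coeffs[:1]
    pos = 1
    while pos < coeffs.size:
        detail = coeffs[pos:pos + approx.size]
        pos += approx.size
        out = np.empty(2 * approx.size)
        out[0::2] = (approx + detail) / np.sqrt(2.0)
        out[1::2] = (approx - detail) / np.sqrt(2.0)
        approx = out
    return approx


def in_target(coeffs, center, half_width):
    """Hypothetical target test: a box around `center` on the leading coefficients."""
    k = len(center)
    return bool(np.all(np.abs(coeffs[:k] - center) <= half_width))


# Toy log magnitude spectrum: three resonance peaks at assumed frequencies (Hz).
freqs = np.linspace(0.0, 4000.0, 64, endpoint=False)
assumed_formants = (700.0, 1200.0, 2500.0)
bandwidth = 100.0
magnitude = sum(1.0 / (1.0 + ((freqs - f) / bandwidth) ** 2) for f in assumed_formants)
log_spectrum = np.log(magnitude)

# Planning-space coordinates are the wavelet coefficients of the log spectrum.
coeffs = haar_decompose(log_spectrum)

# Keeping only the coarsest coefficients approximates the gross spectral shape.
truncated = coeffs.copy()
truncated[8:] = 0.0
coarse = haar_reconstruct(truncated)

# A hypothetical vowel target defined on the 8 coarsest coefficients.
target_center = coeffs[:8].copy()
print(in_target(coeffs, target_center, half_width=0.5))
```

Because the basis is orthonormal, the transform preserves the energy of the spectrum and is exactly invertible, so distances measured between coefficient vectors equal distances between the spectra themselves. Truncating to the coarsest coefficients discards fine spectral detail while keeping the overall shape, which is the kind of representation the abstract contrasts with formant peaks.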