A computational model using formant space planning of articulator movements for vowel production.

Frank H. Guenther
(Department of Cognitive and Neural Systems,
Boston University, 111 Cummington St., Room 244, Boston, MA 02215)

Dave Johnson
(Department of Cognitive and Neural Systems, Boston University)

It is often hypothesized that articulator movements are planned within a coordinate frame whose variables correspond to key vocal tract constrictions [e.g., E. Saltzman and K. G. Munhall, Ecol. Psych. 1, 333-382 (1989)]. However, recent evidence suggests that speakers may use a more acoustic-like space for planning vowel movements [J. Perkell, M. Matthies, M. Svirsky, and M. Jordan, J. Acoust. Soc. Am. 93, 2948-2961 (1993)]. Previous work has shown that a computational speech production model called DIVA can explain a wide range of experimental data using a constriction planning space. The current work extends the model to allow formant-space planning of vowel movements. During a babbling cycle, the model learns, for each vowel, a target region in the space of the first two formant frequencies (F1 and F2). It also learns a mapping from desired formant changes to the articulator movements that achieve them. After babbling, the model successfully reaches all vowel targets from any initial vocal tract configuration, even under constraints such as a blocked jaw, and the resulting synthesized vowels are easily recognizable. Although the vowel targets specify only formant ranges and carry no articulatory information, the articulator configurations the model uses to produce vowels are similar to those used by humans.
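The babble-then-reach idea described above can be illustrated with a minimal sketch. This is not the DIVA implementation. Every name and number here is an assumption made for illustration: the "vocal tract" is a hypothetical linear map from three unbounded articulator variables (jaw, tongue, lips) to (F1, F2), the vowel target regions are given by hand rather than learned, and the learned mapping is a plain least-squares fit. As a further simplification, the desired formant change points at the center of the target region, and movement stops as soon as the formants enter the region.

```python
import numpy as np

rng = np.random.default_rng(0)

# ASSUMPTION: toy linear forward model, not a real vocal tract.
# Columns: jaw, tongue, lips. Rows: F1, F2 (Hz per unit of articulator movement).
A = np.array([[300.0, -150.0, -100.0],
              [-200.0, 600.0, -400.0]])
REST = np.array([500.0, 1500.0])  # formants at the neutral configuration

# ASSUMPTION: hand-picked (low, high) target regions for F1 and F2, in Hz.
VOWELS = {
    "i": (np.array([250.0, 2100.0]), np.array([350.0, 2400.0])),
    "a": (np.array([650.0, 1100.0]), np.array([800.0, 1300.0])),
}


def formants(x):
    """Forward model: articulator vector(s) -> (F1, F2). Accepts (3,) or (n, 3)."""
    return REST + x @ A.T


def babble(n=2000, step=0.05):
    """Random small movements; fit a linear map from formant change to movement."""
    x0 = rng.uniform(-1.0, 1.0, size=(n, 3))   # random starting configurations
    dx = rng.normal(scale=step, size=(n, 3))   # random articulator movements
    df = formants(x0 + dx) - formants(x0)      # resulting formant changes
    # Least-squares fit of dx ~ df @ W, so W has shape (2, 3).
    W, *_ = np.linalg.lstsq(df, dx, rcond=None)
    return W


def reach(W, x, lo, hi, blocked=(), gain=0.5, max_steps=500):
    """Move toward the target region, keeping the `blocked` articulators fixed."""
    x = np.array(x, dtype=float)
    center = (lo + hi) / 2.0
    for _ in range(max_steps):
        f = formants(x)
        if np.all(f >= lo) and np.all(f <= hi):
            break                              # inside the target region: stop
        dx = gain * ((center - f) @ W)         # desired formant change -> movement
        for i in blocked:
            dx[i] = 0.0                        # e.g. index 0 = blocked jaw
        x += dx
    return x
```

Calling `W = babble()` and then `reach(W, x0, *VOWELS["i"], blocked=(0,))` keeps the jaw variable at its initial value while the tongue and lip variables compensate. This works in the toy model only because the two free articulators still span the two-dimensional formant space.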

[Partially supported by AFOSR F49620-92-J-0499.]