Mr. SAM Sethserey, doctoral candidate under joint supervision between MICA and LIG Grenoble, brilliantly defended his thesis in Grenoble on 7 June 2011 and thereby obtained the title of Doctor of Science. This thesis was a true joint supervision (co-tutelle) between LIG Grenoble and the MICA Centre in Hanoi.
Title: Towards autonomous adaptation of multilingual acoustic models for automatic speech processing
Thesis co-supervisor (LIG): Mr. Laurent BESACIER
Thesis co-supervisor (MICA): Mr. Eric CASTELLI
Jury members:
Mr. Christian BOITET | Chair | DE, PRE | UJF Grenoble
Mr. Hervé GLOTIN | Reviewer (rapporteur) | PR | USTV, Toulon
Mr. Christophe CERISARA | Reviewer (rapporteur) | DR, HDR | LORIA, Vandœuvre-lès-Nancy
Ms. Martine ADDA-DECKER | Examiner | DR | LIMSI-CNRS, Paris
Mr. Eric CASTELLI | Co-supervisor | HDR, MCF | MICA, CNRS/UMI-2954
Mr. Laurent BESACIER | Co-supervisor | PR | UJF Grenoble (LIG)
Abstract:
Automatic speech recognition technologies are now integrated into many systems. The performance of speech recognition systems for non-native speakers, however, continues to suffer from high error rates, due to the mismatch between non-native speech and models trained on native speech. Collecting large quantities of non-native speech recordings that represent all speaker origins is a very difficult and impractical task.
This thesis focuses on improving multilingual acoustic models for the automatic phonetic transcription of "multilingual meeting" speech. Such speech poses several challenges: 1) conversations may involve both native and non-native speakers; 2) there is not a single language spoken by one non-native speaker, but several languages spoken by speakers of different origins; 3) it is difficult to collect sufficient data to bootstrap the transcription systems.

To meet these challenges, we propose a process for adapting multilingual acoustic models called "autonomous adaptation". In autonomous adaptation, we study several approaches for adapting multilingual acoustic models in an unsupervised way (the spoken languages and the speakers' origins are not known in advance), and no additional data are used during the adaptation process. The approaches studied decompose into two modules. The first module, the "language observer", recovers linguistic information (spoken languages and speakers' origins) about the segments to be decoded. The second module adapts the multilingual acoustic models based on the knowledge provided by the language observer.

To evaluate the usefulness of autonomous adaptation of multilingual acoustic models, we use test data extracted from multilingual meeting corpora, containing native and non-native speech in three languages: English (EN), French (FR) and Vietnamese (VN). Experimental results show that autonomous adaptation is promising for non-native speech, but degrades performance on native speech very slightly. To improve the overall performance of transcription systems for both native and non-native speech, we therefore study several approaches for detecting non-native speech, and propose cascading such a detector with our autonomous adaptation process.
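The two-module cascade described above (a language observer feeding an interpolation-based model adaptation) can be sketched as follows. This is a minimal illustration only: the model parameters are reduced to toy mean vectors, and the observer's heuristic, the function names, and the non-nativeness score are illustrative assumptions, not the thesis implementation.

```python
import numpy as np

# Hypothetical per-language model parameters, reduced to plain mean
# vectors for illustration: a "native" model and a "multilingual" model.
NATIVE_MEANS = {"EN": np.array([0.2, 1.0]), "FR": np.array([0.5, 0.8])}
MULTI_MEANS = {"EN": np.array([0.6, 0.4]), "FR": np.array([0.9, 0.1])}

def language_observer(segment_features):
    """Toy stand-in for the 'language observer': guesses the spoken
    language and a non-nativeness score in [0, 1] for a segment.
    A real observer would run language/accent identification."""
    score = float(np.clip(segment_features.mean(), 0.0, 1.0))
    lang = "EN" if segment_features[0] > 0.5 else "FR"
    return lang, score

def adapt_model(segment_features):
    """Interpolate native and multilingual model means, weighted by the
    observer's non-nativeness score (0 = pure native model)."""
    lang, w = language_observer(segment_features)
    means = (1.0 - w) * NATIVE_MEANS[lang] + w * MULTI_MEANS[lang]
    return lang, means

lang, means = adapt_model(np.array([0.7, 0.3]))
```

In this sketch a segment judged more non-native pulls the adapted model toward the multilingual parameters; a segment judged native keeps the native model, which mirrors the cascade's goal of not degrading native-speech performance.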
The results obtained so far are the best among all experiments conducted on our multilingual meeting corpus.
Keywords: Non-native speech recognition, autonomous adaptation of multilingual acoustic models, language observer, interpolation, discrimination between native and non-native speech.