Keynote Lecture (EURASIP Seminar): 14 May 2014, 10:10-11:10
Abstract: Speech-to-speech translation technology enables natural oral communication between people who speak different languages. Many research projects have addressed speech-to-speech translation (S2ST) technology, such as ATR, VERBMOBIL, C-STAR, NESPOLE!, BABYLON, GALE, and EU-bridge. An S2ST system is normally composed of automatic speech recognition (ASR), machine translation (MT), and speech synthesis (TTS) modules, all of which are corpus-based, statistical model-based systems. This talk will introduce new challenges toward real-time multimodal speech-to-speech translation.
Biography: Satoshi Nakamura is a Professor at the Graduate School of Information Science, Nara Institute of Science and Technology, Japan, an Honorarprofessor of the Karlsruhe Institute of Technology, Germany, and an ATR Fellow. He was Director of the ATR Spoken Language Communication Research Laboratories from 2000 to 2008 and Vice President of ATR from 2007 to 2008. From 2009 to 2010 he was Director General of the Keihanna Research Laboratories and Executive Director of the Knowledge Creating Communication Research Center, National Institute of Information and Communications Technology, Japan. He is currently Director of the Augmented Human Communication Laboratory and a full professor at the Graduate School of Information Science, Nara Institute of Science and Technology. His research interests include modeling and systems for speech-to-speech translation and speech recognition. He is one of the leaders of speech-to-speech translation research and has served in various speech-to-speech translation research projects worldwide, including C-STAR, IWSLT, and A-STAR. He was project leader of the world's first network-based commercial speech-to-speech translation service for 3G mobile phones in 2007 and of the VoiceTra project for the iPhone in 2010. He received the Yamashita Research Award, the Kiyasu Award from the Information Processing Society of Japan, the Telecom System Award, the AAMT Nagao Award, the Docomo Mobile Science Award in 2007, and the ASJ Award for Distinguished Achievements in Acoustics. He received the Commendation for Science and Technology from the Minister of Education, Culture, Sports, Science and Technology, and the Commendation for Science and Technology in Information Technology from the Minister of Internal Affairs and Communications. He was also awarded the Antonio Zampolli Prize by the ELRA Association. He organized the International Workshop on Spoken Language Translation (IWSLT 2006) and Oriental COCOSDA 2008 as general chair, and served as program chair of INTERSPEECH 2010.
He is currently an elected Board Member of the International Speech Communication Association (ISCA), an elected member of the IEEE SPS Spoken Language Technology Technical Committee, and a member of the IEEE Signal Processing Magazine Editorial Board since April 2012.
Keynote Lecture: 15 May 2014, 09:20-10:20
Abstract: Recently there has been increased interest in Automatic Speech Recognition (ASR) and Keyword Spotting (KWS) systems for low-resource languages. One of the driving forces for this research direction is the IARPA Babel project. This talk describes some of the research funded by this project at Cambridge University, as part of the Lorelei team co-ordinated by IBM. A range of topics are discussed, including: deep neural network based acoustic models; data augmentation; and zero acoustic model resource systems. Performance for all approaches is evaluated using the Limited (approximately 10 hours) and/or Full (approximately 80 hours) language packs distributed by IARPA. Both KWS and ASR performance figures are given. Though absolute performance varies from language to language and with the keyword list, the approaches described show consistent trends over the languages investigated to date. Using comparable systems over the five Option Period 1 languages indicates a strong correlation between ASR performance and KWS performance.
Biography: Mark Gales is a Professor of Information Engineering in the Machine Intelligence Laboratory (formerly the Speech Vision and Robotics (SVR) group) and a Fellow of Emmanuel College. He is a member of the Speech Research Group. Mark Gales studied for the B.A. in Electrical and Information Sciences at the University of Cambridge from 1985 to 1988. Following graduation he worked as a consultant at Roke Manor Research Ltd. In 1991 he took up a position as a Research Associate in the Speech Vision and Robotics group in the Engineering Department at Cambridge University. In 1995 he completed his doctoral thesis, "Model-Based Techniques for Robust Speech Recognition", supervised by Professor Steve Young. From 1995 to 1997 he was a Research Fellow at Emmanuel College, Cambridge. He was then a Research Staff Member in the Speech Group at the IBM T. J. Watson Research Center until 1999, when he returned to the Cambridge University Engineering Department as a University Lecturer. He was appointed Reader in Information Engineering in 2004. He is currently a Professor of Information Engineering (appointed 2012) and a Professorial College Lecturer and Official Fellow of Emmanuel College. Mark Gales is a Fellow of the IEEE and was a member of the Speech Technical Committee from 2001 to 2004. He was an associate editor for IEEE Signal Processing Letters from 2009 to 2011 and is currently an associate editor for IEEE Transactions on Audio, Speech and Language Processing. He is also on the Editorial Board of Computer Speech and Language. Mark Gales was awarded a 1997 IEEE Young Author Paper Award for his paper on parallel model combination and a 2002 IEEE Paper Award for his paper on semi-tied covariance matrices.