Selected Papers Interspeech 2019 Monday
Written by Nickolay Shmyrev
Overall, things are going pretty well. There are many very good papers, diarization is joining forces with decoding, and everything is moving in the right direction.
RadioTalk: a large-scale corpus of talk radio transcripts
Doug Beeferman (MIT Media Lab), William Brannon (MIT Media Lab), Deb Roy (MIT Media Lab)
A 248,000-hour dataset.
https://www.isca-speech.org/archive/Interspeech_2019/pdfs/2714.pdf
Automatic lyric transcription from Karaoke vocal tracks: Resources and a Baseline System
Gerardo Roa (University of Sheffield), Jon Barker (University of Sheffield)
https://www.isca-speech.org/archive/Interspeech_2019/pdfs/2378.pdf
https://github.com/groadabike/Kaldi-Dsing-task
Speaker Diarization with Lexical Information
Tae Jin Park, Kyu J. Han, Jing Huang, Xiaodong He, Bowen Zhou, Panayiotis Georgiou, Shrikanth Narayanan
https://www.isca-speech.org/archive/Interspeech_2019/pdfs/1947.pdf
Full-Sentence Correlation: a Method to Handle Unpredictable Noise for Robust Speech Recognition
Ming Ji (Queen’s University Belfast), Danny Crookes (Queen’s University Belfast)
https://www.isca-speech.org/archive/Interspeech_2019/pdfs/2127.pdf
Untranscribed Web Audio for Low Resource Speech Recognition
Andrea Carmantini, Peter Bell, Steve Renals
https://www.isca-speech.org/archive/Interspeech_2019/pdfs/2623.pdf
Building Large-Vocabulary ASR Systems for Languages Without Any Audio Training Data
Manasa Prasad, Daan van Esch, Sandy Ritchie, Jonas Fromseier Mortensen
https://www.isca-speech.org/archive/Interspeech_2019/pdfs/1775.pdf
How to annotate 100 hours in 45 minutes
Per Fallgren (KTH Royal Institute of Technology), Zofia Malisz (KTH, Stockholm), Jens Edlund (KTH Speech, Music and Hearing)
https://www.isca-speech.org/archive/Interspeech_2019/pdfs/1648.pdf
Exploiting semi-supervised training through a dropout regularization in end-to-end speech recognition
https://www.isca-speech.org/archive/Interspeech_2019/pdfs/3246.pdf
High quality, lightweight and adaptable TTS using LPCNet
Zvi Kons (IBM Haifa research lab), Slava Shechtman (Speech Technologies, IBM Research AI), Alexander Sorin (IBM Research - Haifa), Carmel Rabinovitz (IBM Research - Haifa), Ron Hoory (IBM Haifa Research Lab)
https://www.isca-speech.org/archive/Interspeech_2019/pdfs/1705.pdf
http://srv-wtts.haifa.il.ibm.com/TTS-voice-conversion-IS2019/
Very nice quality.
Attention-Enhanced Connectionist Temporal Classification for Discrete Speech Emotion Recognition
Ziping Zhao, Zhongtian Bao, Zixing Zhang, Nicholas Cummins, Haishuai Wang, Björn W. Schuller
https://www.isca-speech.org/archive/Interspeech_2019/pdfs/1649.pdf
Large-Scale Mixed-Bandwidth Deep Neural Network Acoustic Modeling for Automatic Speech Recognition
Khoi-Nguyen Mac (University of Illinois at Urbana-Champaign), Xiaodong Cui (IBM T. J. Watson Research Center), Wei Zhang (IBM T. J. Watson Research Center), Michael Picheny (IBM T. J. Watson Research Center)
https://www.isca-speech.org/archive/Interspeech_2019/pdfs/2641.pdf
An Investigation into On-Device Personalization of End-to-End Automatic Speech Recognition Models
Khe Chai Sim, Petr Zadrazil, Françoise Beaufays
https://www.isca-speech.org/archive/Interspeech_2019/pdfs/1752.pdf