Selected Papers from Interspeech 2019, Monday

Overall, it is going pretty well. Many very good papers; diarization is merging with decoding, and everything is moving in the right direction.

RadioTalk: a large-scale corpus of talk radio transcripts Doug Beeferman (MIT Media Lab), William Brannon (MIT Media Lab), Deb Roy (MIT Media Lab)

A 248,000-hour dataset.

https://www.isca-speech.org/archive/Interspeech_2019/pdfs/2714.pdf

Automatic lyric transcription from Karaoke vocal tracks: Resources and a Baseline System Gerardo Roa (University of Sheffield), Jon Barker (University of Sheffield)

https://www.isca-speech.org/archive/Interspeech_2019/pdfs/2378.pdf

https://github.com/groadabike/Kaldi-Dsing-task

Speaker Diarization with Lexical Information Tae Jin Park, Kyu J. Han, Jing Huang, Xiaodong He, Bowen Zhou, Panayiotis Georgiou, Shrikanth Narayanan

https://www.isca-speech.org/archive/Interspeech_2019/pdfs/1947.pdf

Full-Sentence Correlation: a Method to Handle Unpredictable Noise for Robust Speech Recognition Ji Ming (Queen’s University Belfast), Danny Crookes (Queen’s University Belfast)

https://www.isca-speech.org/archive/Interspeech_2019/pdfs/2127.pdf

Untranscribed Web Audio for Low Resource Speech Recognition Andrea Carmantini, Peter Bell, Steve Renals

https://www.isca-speech.org/archive/Interspeech_2019/pdfs/2623.pdf

Building Large-Vocabulary ASR Systems for Languages Without Any Audio Training Data Manasa Prasad, Daan van Esch, Sandy Ritchie, Jonas Fromseier Mortensen

https://www.isca-speech.org/archive/Interspeech_2019/pdfs/1775.pdf

How to annotate 100 hours in 45 minutes Per Fallgren (KTH Royal Institute of Technology), Zofia Malisz (KTH, Stockholm), Jens Edlund (KTH Speech, Music and Hearing)

https://www.isca-speech.org/archive/Interspeech_2019/pdfs/1648.pdf

Exploiting semi-supervised training through a dropout regularization in end-to-end speech recognition

https://www.isca-speech.org/archive/Interspeech_2019/pdfs/3246.pdf

High quality, lightweight and adaptable TTS using LPCNet Zvi Kons (IBM Haifa Research Lab), Slava Shechtman (Speech Technologies, IBM Research AI), Alexander Sorin (IBM Research - Haifa), Carmel Rabinovitz (IBM Research - Haifa), Ron Hoory (IBM Haifa Research Lab)

https://www.isca-speech.org/archive/Interspeech_2019/pdfs/1705.pdf

http://srv-wtts.haifa.il.ibm.com/TTS-voice-conversion-IS2019/

Very nice quality in the demo samples.

Attention-Enhanced Connectionist Temporal Classification for Discrete Speech Emotion Recognition Ziping Zhao, Zhongtian Bao, Zixing Zhang, Nicholas Cummins, Haishuai Wang, Björn W. Schuller

https://www.isca-speech.org/archive/Interspeech_2019/pdfs/1649.pdf

Large-Scale Mixed-Bandwidth Deep Neural Network Acoustic Modeling for Automatic Speech Recognition Khoi-Nguyen Mac (University of Illinois at Urbana-Champaign), Xiaodong Cui (IBM T. J. Watson Research Center), Wei Zhang (IBM T. J. Watson Research Center), Michael Picheny (IBM T. J. Watson Research Center)

https://www.isca-speech.org/archive/Interspeech_2019/pdfs/2641.pdf

An Investigation into On-Device Personalization of End-to-End Automatic Speech Recognition Models Khe Chai Sim, Petr Zadrazil, Françoise Beaufays

https://www.isca-speech.org/archive/Interspeech_2019/pdfs/1752.pdf