Written by Nickolay Shmyrev
on September 16, 2019

Selected Papers from Interspeech 2019: Monday

Overall, it is going pretty well. There are many very good papers, diarization is being joined with decoding, and everything is moving in the right direction.

RadioTalk: a large-scale corpus of talk radio transcripts Doug Beeferman (MIT Media Lab), William Brannon (MIT Media Lab), Deb Roy (MIT Media Lab). A dataset of 248,000 hours.

https://www.isca-speech.org/archive/Interspeech_2019/pdfs/2714.pdf

Automatic lyric transcription from Karaoke vocal tracks: Resources and a Baseline System Gerardo Roa (University of Sheffield), Jon Barker (University of Sheffield)

https://www.isca-speech.org/archive/Interspeech_2019/pdfs/2378.pdf

https://github.com/groadabike/Kaldi-Dsing-task

Speaker Diarization with Lexical Information Tae Jin Park, Kyu J. Han, Jing Huang, Xiaodong He, Bowen Zhou, Panayiotis Georgiou, Shrikanth Narayanan

https://www.isca-speech.org/archive/Interspeech_2019/pdfs/1947.pdf

Full-Sentence Correlation: a Method to Handle Unpredictable Noise for Robust Speech Recognition Ming Ji (Queen’s University Belfast), Danny Crookes (Queen’s University Belfast)

https://www.isca-speech.org/archive/Interspeech_2019/pdfs/2127.pdf

Untranscribed Web Audio for Low Resource Speech Recognition Andrea Carmantini, Peter Bell, Steve Renals

https://www.isca-speech.org/archive/Interspeech_2019/pdfs/2623.pdf

Building Large-Vocabulary ASR Systems for Languages Without Any Audio Training Data Manasa Prasad, Daan van Esch, Sandy Ritchie, Jonas Fromseier Mortensen

https://www.isca-speech.org/archive/Interspeech_2019/pdfs/1775.pdf

How to annotate 100 hours in 45 minutes Per Fallgren (KTH Royal Institute of Technology), Zofia Malisz (KTH, Stockholm), Jens Edlund (KTH Speech, Music and Hearing)

https://www.isca-speech.org/archive/Interspeech_2019/pdfs/1648.pdf

Exploiting semi-supervised training through a dropout regularization in end-to-end speech recognition

https://www.isca-speech.org/archive/Interspeech_2019/pdfs/3246.pdf

High quality, lightweight and adaptable TTS using LPCNet Zvi Kons (IBM Haifa research lab), Slava Shechtman (Speech Technologies, IBM Research AI), Alexander Sorin (IBM Research - Haifa), Carmel Rabinovitz (IBM Research - Haifa), Ron Hoory (IBM Haifa Research Lab)

https://www.isca-speech.org/archive/Interspeech_2019/pdfs/1705.pdf

http://srv-wtts.haifa.il.ibm.com/TTS-voice-conversion-IS2019/

The demo samples are of very nice quality.

Attention-Enhanced Connectionist Temporal Classification for Discrete Speech Emotion Recognition Ziping Zhao, Zhongtian Bao, Zixing Zhang, Nicholas Cummins, Haishuai Wang, Björn W. Schuller

https://www.isca-speech.org/archive/Interspeech_2019/pdfs/1649.pdf

Large-Scale Mixed-Bandwidth Deep Neural Network Acoustic Modeling for Automatic Speech Recognition Khoi-Nguyen Mac (University of Illinois at Urbana-Champaign), Xiaodong Cui (IBM T. J. Watson Research Center), Wei Zhang (IBM T. J. Watson Research Center), Michael Picheny (IBM T. J. Watson Research Center)

https://www.isca-speech.org/archive/Interspeech_2019/pdfs/2641.pdf

An Investigation into On-Device Personalization of End-to-End Automatic Speech Recognition Models Khe Chai Sim, Petr Zadrazil, Françoise Beaufays

https://www.isca-speech.org/archive/Interspeech_2019/pdfs/1752.pdf
