Written by Nickolay Shmyrev
Selected Papers Interspeech 2019 Wednesday
A Highly Efficient Distributed Deep Learning System for Automatic Speech Recognition
Wei Zhang, Xiaodong Cui, Ulrich Finkler, George Saon, Abdullah Kayi, Alper Buyuktosunoglu, Brian Kingsbury, David Kung, Michael Picheny
https://www.isca-speech.org/archive/Interspeech_2019/pdfs/2700.pdf
Cool merge graphs
Detection and Recovery of OOVs for Improved English Broadcast News Captioning
Samuel Thomas (IBM Research AI), Kartik Audhkhasi (IBM Research AI), Zoltan Tuske (IBM Research AI), Yinghui Huang (IBM Research AI), Michael Picheny (IBM Research AI)
https://www.isca-speech.org/archive/Interspeech_2019/pdfs/2793.pdf
Nothing new but still important
Disfluencies and Human Speech Transcription Errors
Vicky Zayats (University of Washington), Trang Tran (University of Washington), Courtney Mansfield (University of Washington), Richard Wright (University of Washington), Mari Ostendorf (University of Washington)
https://www.isca-speech.org/archive/Interspeech_2019/pdfs/3134.pdf
Robust Sound Recognition: A Neuromorphic Approach
Jibin Wu (National University of Singapore), Zihan Pan, Malu Zhang, Rohan Kumar Das, Yansong Chua, Haizhou Li
https://www.isca-speech.org/archive/Interspeech_2019/pdfs/8032.pdf
Spiking neural networks
Neural Named Entity Recognition from Subword Units
Abdalghani Abujabal (Max Planck Institute for Informatics), Judith Gaspers (Amazon)
https://www.isca-speech.org/archive/Interspeech_2019/pdfs/1305.pdf
Name recognition is still important
Unsupervised Acoustic Segmentation and Clustering using Siamese Network Embeddings
Saurabhchand Bhati (The Johns Hopkins University), Shekhar Nayak (Indian Institute of Technology Hyderabad), Sri Rama Murty Kodukula (IIT Hyderabad), Najim Dehak (Johns Hopkins University)
https://www.isca-speech.org/archive/Interspeech_2019/pdfs/2981.pdf
Acoustic Model Bootstrapping Using Semi-Supervised Learning
Langzhou Chen (Amazon Cambridge office), Volker Leutnant (Amazon Aachen office)
https://www.isca-speech.org/archive/Interspeech_2019/pdfs/2818.pdf
Bandwidth Embeddings for Mixed-bandwidth Speech Recognition
Gautam Mantena (Apple Inc.), Ozlem Kalinli (Apple Inc.), Ossama Abdel-Hamid (Apple Inc.), Don McAllaster (Apple Inc.)
https://www.isca-speech.org/archive/Interspeech_2019/pdfs/2589.pdf
Towards Debugging Deep Neural Networks by Generating Speech Utterances
Bilal Soomro (University of Eastern Finland), Anssi Kanervisto (University of Eastern Finland), Trung Ngo Trong (University of Eastern Finland), Ville Hautamaki (University of Eastern Finland)
https://www.isca-speech.org/archive/Interspeech_2019/pdfs/2339.pdf
Debugging is a very nice idea
A Study for Improving Device-Directed Speech Detection toward Frictionless Human-Machine Interaction
Che-Wei Huang (Amazon), Roland Maas (Amazon.com), Sri Harish Mallidi (Amazon, USA), Bjorn Hoffmeister (Amazon.com)
https://www.isca-speech.org/archive/Interspeech_2019/pdfs/2840.pdf
Nice idea, we covered that before
Deep Learning for Orca Call Type Identification — A Fully Unsupervised Approach
Christian Bergler, Manuel Schmitt, Rachael Xi Cheng, Andreas Maier, Volker Barth, Elmar Nöth
https://www.isca-speech.org/archive/Interspeech_2019/pdfs/1857.pdf
Kinda cool
The STC ASR System for the VOiCES from a Distance Challenge 2019
Ivan Medennikov (STC-innovations Ltd), Yuri Khokhlov (STC-innovations Ltd), Aleksei Romanenko (ITMO University), Ivan Sorokin (STC), Anton Mitrofanov (STC-innovations Ltd), Vladimir Bataev (Speech Technology Center Ltd), Andrei Andrusenko (STC-innovations Ltd), Tatiana Prisyach (STC-innovations Ltd), Mariya Korenevskaya (STC-innovations Ltd), Oleg Petrov (ITMO University), Alexander Zatvornitskiy (Speech Technology Center)
https://www.isca-speech.org/archive/Interspeech_2019/pdfs/1574.pdf
Kaggle-style system with cool tricks (char-based LM); congrats to STC
Continuous Emotion Recognition in Speech – Do We Need Recurrence?
Maximilian Schmitt (ZD.B Chair of Embedded Intelligence for Health Care and Wellbeing, University of Augsburg), Nicholas Cummins (University of Augsburg), Björn Schuller (University of Augsburg / Imperial College London)
https://www.isca-speech.org/archive/Interspeech_2019/pdfs/2710.pdf
Self-supervised speaker embeddings
Themos Stafylakis (Omilia - Conversational Intelligence), Johan Rohdin (Brno University of Technology), Oldrich Plchot (Brno University of Technology), Petr Mizera (Czech Technical University in Prague), Lukas Burget (Brno University of Technology)
https://www.isca-speech.org/archive/Interspeech_2019/pdfs/2842.pdf
"Self-supervised" is the word of the year
Better morphology prediction for better speech systems
Dravyansh Sharma (Carnegie Mellon University), Melissa Wilson (Google LLC), Antoine Bruguier (Google LLC)
https://www.isca-speech.org/archive/Interspeech_2019/pdfs/3207.pdf
Connecting and Comparing Language Model Interpolation Techniques
Ernest Pusateri, Christophe Van Gysel, Rami Botros, Sameer Badaskar, Mirko Hannemann, Youssef Oualil, Ilya Oparin
https://www.isca-speech.org/archive/Interspeech_2019/pdfs/1822.pdf
Worth a reminder
Articulation rate as a metric in spoken language assessment
Calbert Graham (University of Cambridge), Francis Nolan (University of Cambridge)
https://www.isca-speech.org/archive/Interspeech_2019/pdfs/2098.pdf