Written by Nickolay Shmyrev
on September 17, 2019

Selected Papers: Interspeech 2019, Wednesday

A Highly Efficient Distributed Deep Learning System for Automatic Speech Recognition Wei Zhang, Xiaodong Cui, Ulrich Finkler, George Saon, Abdullah Kayi, Alper Buyuktosunoglu, Brian Kingsbury, David Kung, Michael Picheny

https://www.isca-speech.org/archive/Interspeech_2019/pdfs/2700.pdf

Cool merge graphs

Detection and Recovery of OOVs for Improved English Broadcast News Captioning Samuel Thomas (IBM Research AI), Kartik Audhkhasi (IBM Research AI), Zoltan Tuske (IBM Research AI), Yinghui Huang (IBM Research AI), Michael Picheny (IBM Research AI)

https://www.isca-speech.org/archive/Interspeech_2019/pdfs/2793.pdf

Nothing new, but still important

Disfluencies and Human Speech Transcription Errors Vicky Zayats (University of Washington), Trang Tran (University of Washington), Courtney Mansfield (University of Washington), Richard Wright (University of Washington), Mari Ostendorf (University of Washington)

https://www.isca-speech.org/archive/Interspeech_2019/pdfs/3134.pdf

Robust Sound Recognition: A Neuromorphic Approach Jibin Wu (National University of Singapore), Zihan Pan, Malu Zhang, Rohan Kumar Das, Yansong Chua, Haizhou Li

https://www.isca-speech.org/archive/Interspeech_2019/pdfs/8032.pdf

Spiking neural networks

Neural Named Entity Recognition from Subword Units Abdalghani Abujabal (Max Planck Institute for Informatics), Judith Gaspers (Amazon)

https://www.isca-speech.org/archive/Interspeech_2019/pdfs/1305.pdf

Name recognition is still important

Unsupervised Acoustic Segmentation and Clustering using Siamese Network Embeddings Saurabhchand Bhati (The Johns Hopkins University), Shekhar Nayak (Indian Institute of Technology Hyderabad), Sri Rama Murty Kodukula (IIT Hyderabad), Najim Dehak (Johns Hopkins University)

https://www.isca-speech.org/archive/Interspeech_2019/pdfs/2981.pdf

Acoustic Model Bootstrapping Using Semi-Supervised Learning Langzhou Chen (Amazon Cambridge office), Volker Leutnant (Amazon Aachen office)

https://www.isca-speech.org/archive/Interspeech_2019/pdfs/2818.pdf

Bandwidth Embeddings for Mixed-bandwidth Speech Recognition Gautam Mantena (Apple Inc.), Ozlem Kalinli (Apple Inc.), Ossama Abdel-Hamid (Apple Inc.), Don McAllaster (Apple Inc.)

https://www.isca-speech.org/archive/Interspeech_2019/pdfs/2589.pdf

Towards Debugging Deep Neural Networks by Generating Speech Utterances Bilal Soomro (University of Eastern Finland), Anssi Kanervisto (University of Eastern Finland), Trung Ngo Trong (University of Eastern Finland), Ville Hautamaki (University of Eastern Finland)

https://www.isca-speech.org/archive/Interspeech_2019/pdfs/2339.pdf

Debugging is a very nice idea

A Study for Improving Device-Directed Speech Detection toward Frictionless Human-Machine Interaction Che-Wei Huang (Amazon), Roland Maas (Amazon.com), Sri Harish Mallidi (Amazon, USA), Bjorn Hoffmeister (Amazon.com)

https://www.isca-speech.org/archive/Interspeech_2019/pdfs/2840.pdf

Nice idea; we covered that before

Deep Learning for Orca Call Type Identification — A Fully Unsupervised Approach Christian Bergler, Manuel Schmitt, Rachael Xi Cheng, Andreas Maier, Volker Barth, Elmar Nöth

https://www.isca-speech.org/archive/Interspeech_2019/pdfs/1857.pdf

Kinda cool

The STC ASR System for the VOiCES from a Distance Challenge 2019 Ivan Medennikov (STC-innovations Ltd), Yuri Khokhlov (STC-innovations Ltd), Aleksei Romanenko (ITMO University), Ivan Sorokin (STC), Anton Mitrofanov (STC-innovations Ltd), Vladimir Bataev (Speech Technology Center Ltd), Andrei Andrusenko (STC-innovations Ltd), Tatiana Prisyach (STC-innovations Ltd), Mariya Korenevskaya (STC-innovations Ltd), Oleg Petrov (ITMO University), Alexander Zatvornitskiy (Speech Technology Center)

https://www.isca-speech.org/archive/Interspeech_2019/pdfs/1574.pdf

Kaggle-style system with cool tricks (char-based LM), congrats to STC

Continuous Emotion Recognition in Speech – Do We Need Recurrence? Maximilian Schmitt (ZD.B Chair of Embedded Intelligence for Health Care and Wellbeing, University of Augsburg), Nicholas Cummins (University of Augsburg), Björn Schuller (University of Augsburg / Imperial College London)

https://www.isca-speech.org/archive/Interspeech_2019/pdfs/2710.pdf

Self-supervised speaker embeddings Themos Stafylakis (Omilia - Conversational Intelligence), Johan Rohdin (Brno University of Technology), Oldrich Plchot (Brno University of Technology), Petr Mizera (Czech Technical University in Prague), Lukas Burget (Brno University of Technology)

https://www.isca-speech.org/archive/Interspeech_2019/pdfs/2842.pdf

"Self-supervised" is the word of the year

Better morphology prediction for better speech systems Dravyansh Sharma (Carnegie Mellon University), Melissa Wilson (Google LLC), Antoine Bruguier (Google LLC)

https://www.isca-speech.org/archive/Interspeech_2019/pdfs/3207.pdf

Connecting and Comparing Language Model Interpolation Techniques Ernest Pusateri, Christophe Van Gysel, Rami Botros, Sameer Badaskar, Mirko Hannemann, Youssef Oualil, Ilya Oparin

https://www.isca-speech.org/archive/Interspeech_2019/pdfs/1822.pdf

Worth a reminder

Articulation rate as a metric in spoken language assessment Calbert Graham (University of Cambridge), Francis Nolan (University of Cambridge)

https://www.isca-speech.org/archive/Interspeech_2019/pdfs/2098.pdf
