-
DEXTER: Deep Encoding of External Knowledge for Named Entity Recognition in Virtual Assistants
Authors:
Deepak Muralidharan,
Joel Ruben Antony Moniz,
Weicheng Zhang,
Stephen Pulman,
Lin Li,
Megan Barnes,
Jingjing Pan,
Jason Williams,
Alex Acero
Abstract:
Named entity recognition (NER) is usually developed and tested on text from well-written sources. However, in intelligent voice assistants, where NER is an important component, input to NER may be noisy because of user or speech recognition error. In applications, entity labels may change frequently, and non-textual properties like topicality or popularity may be needed to choose among alternatives.
We describe a NER system intended to address these problems. We test and train this system on a proprietary user-derived dataset. We compare with a baseline text-only NER system; the baseline enhanced with external gazetteers; and the baseline enhanced with the search and indirect labelling techniques we describe below. The final configuration gives around 6% reduction in NER error rate. We also show that this technique improves related tasks, such as semantic parsing, with an improvement of up to 5% in error rate.
Submitted 14 August, 2021;
originally announced August 2021.
-
Noise Robust Named Entity Understanding for Voice Assistants
Authors:
Deepak Muralidharan,
Joel Ruben Antony Moniz,
Sida Gao,
Xiao Yang,
Justine Kao,
Stephen Pulman,
Atish Kothari,
Ray Shen,
Yinying Pan,
Vivek Kaul,
Mubarak Seyed Ibrahim,
Gang Xiang,
Nan Dun,
Yidan Zhou,
Andy O,
Yuan Zhang,
Pooja Chitkara,
Xuan Wang,
Alkesh Patel,
Kushal Tayal,
Roger Zheng,
Peter Grasch,
Jason D. Williams,
Lin Li
Abstract:
Named Entity Recognition (NER) and Entity Linking (EL) play an essential role in voice assistant interaction, but are challenging due to the special difficulties associated with spoken user queries. In this paper, we propose a novel architecture that jointly solves the NER and EL tasks by combining them in a joint reranking module. We show that our proposed framework improves NER accuracy by up to 3.13% and EL accuracy by up to 3.6% in F1 score. The features used also lead to better accuracies in other natural language understanding tasks, such as domain classification and semantic parsing.
Submitted 10 August, 2021; v1 submitted 29 May, 2020;
originally announced May 2020.
-
Leveraging User Engagement Signals For Entity Labeling in a Virtual Assistant
Authors:
Deepak Muralidharan,
Justine Kao,
Xiao Yang,
Lin Li,
Lavanya Viswanathan,
Mubarak Seyed Ibrahim,
Kevin Luikens,
Stephen Pulman,
Ashish Garg,
Atish Kothari,
Jason Williams
Abstract:
Personal assistant AI systems such as Siri, Cortana, and Alexa have become widely used as a means to accomplish tasks through natural language commands. However, components in these systems generally rely on supervised machine learning algorithms that require large amounts of hand-annotated training data, which is expensive and time-consuming to collect. The ability to incorporate unsupervised, weakly supervised, or distantly supervised data holds significant promise in overcoming this bottleneck. In this paper, we describe a framework that leverages user engagement signals (user behaviors that demonstrate a positive or negative response to content) to automatically create granular entity labels for training data augmentation. Strategies such as multi-task learning and validation using an external knowledge base are employed to incorporate the engagement-annotated data and to boost the model's accuracy on a sequence labeling task. Our results show that learning from data automatically labeled by user engagement signals achieves significant accuracy gains in a production deep learning system, when measured both on the sequence labeling task and on user-facing results produced by the system end-to-end. We believe this is the first use of user engagement signals to help generate training data for a sequence labeling task on a large scale, and that it can be applied in practical settings to speed up new feature deployment when little human-annotated data is available.
Submitted 18 September, 2019;
originally announced September 2019.