Showing 1–15 of 15 results for author: Bakhturina, E

  1. arXiv:2310.03025  [pdf, other]

    cs.CL cs.AI cs.IR cs.LG

    Retrieval meets Long Context Large Language Models

    Authors: Peng Xu, Wei Ping, Xianchao Wu, Lawrence McAfee, Chen Zhu, Zihan Liu, Sandeep Subramanian, Evelina Bakhturina, Mohammad Shoeybi, Bryan Catanzaro

    Abstract: Extending the context window of large language models (LLMs) is getting popular recently, while the solution of augmenting LLMs with retrieval has existed for years. The natural questions are: i) Retrieval-augmentation versus long context window, which one is better for downstream tasks? ii) Can both methods be combined to get the best of both worlds? In this work, we answer these questions by stu…

    Submitted 23 January, 2024; v1 submitted 4 October, 2023; originally announced October 2023.

    Comments: Published at ICLR 2024

  2. arXiv:2310.02943  [pdf, other]

    cs.CL

    LibriSpeech-PC: Benchmark for Evaluation of Punctuation and Capitalization Capabilities of end-to-end ASR Models

    Authors: Aleksandr Meister, Matvei Novikov, Nikolay Karpov, Evelina Bakhturina, Vitaly Lavrukhin, Boris Ginsburg

    Abstract: Traditional automatic speech recognition (ASR) models output lower-cased words without punctuation marks, which reduces readability and necessitates a subsequent text processing model to convert ASR transcripts into a proper format. Simultaneously, the development of end-to-end ASR models capable of predicting punctuation and capitalization presents several challenges, primarily due to limited dat…

    Submitted 4 October, 2023; originally announced October 2023.

  3. arXiv:2309.13426  [pdf, other]

    cs.CL cs.AI

    A Chat About Boring Problems: Studying GPT-based text normalization

    Authors: Yang Zhang, Travis M. Bartley, Mariana Graterol-Fuenmayor, Vitaly Lavrukhin, Evelina Bakhturina, Boris Ginsburg

    Abstract: Text normalization - the conversion of text from written to spoken form - is traditionally assumed to be an ill-formed task for language models. In this work, we argue otherwise. We empirically show the capacity of Large-Language Models (LLM) for text normalization in few-shot scenarios. Combining self-consistency reasoning with linguistic-informed prompt engineering, we find LLM based text normal…

    Submitted 17 January, 2024; v1 submitted 23 September, 2023; originally announced September 2023.

    Comments: Accepted to ICASSP 2024

  4. arXiv:2306.02317  [pdf, other]

    cs.CL cs.AI cs.SD eess.AS

    SpellMapper: A non-autoregressive neural spellchecker for ASR customization with candidate retrieval based on n-gram mappings

    Authors: Alexandra Antonova, Evelina Bakhturina, Boris Ginsburg

    Abstract: Contextual spelling correction models are an alternative to shallow fusion to improve automatic speech recognition (ASR) quality given user vocabulary. To deal with large user vocabularies, most of these models include candidate retrieval mechanisms, usually based on minimum edit distance between fragments of ASR hypothesis and user phrases. However, the edit-distance approach is slow, non-trainab…

    Submitted 4 June, 2023; originally announced June 2023.

    Comments: Accepted by INTERSPEECH 2023

  5. arXiv:2302.14523  [pdf, other]

    cs.CL

    Automatic Heteronym Resolution Pipeline Using RAD-TTS Aligners

    Authors: Jocelyn Huang, Evelina Bakhturina, Oktai Tatanov

    Abstract: Grapheme-to-phoneme (G2P) transduction is part of the standard text-to-speech (TTS) pipeline. However, G2P conversion is difficult for languages that contain heteronyms -- words that have one spelling but can be pronounced in multiple ways. G2P datasets with annotated heteronyms are limited in size and expensive to create, as human labeling remains the primary method for heteronym disambiguation.…

    Submitted 28 February, 2023; originally announced February 2023.

  6. arXiv:2208.00064  [pdf]

    cs.CL cs.AI

    Thutmose Tagger: Single-pass neural model for Inverse Text Normalization

    Authors: Alexandra Antonova, Evelina Bakhturina, Boris Ginsburg

    Abstract: Inverse text normalization (ITN) is an essential post-processing step in automatic speech recognition (ASR). It converts numbers, dates, abbreviations, and other semiotic classes from the spoken form generated by ASR to their written forms. One can consider ITN as a Machine Translation task and use neural sequence-to-sequence models to solve it. Unfortunately, such neural models are prone to hallu…

    Submitted 29 July, 2022; originally announced August 2022.

  7. arXiv:2203.15917  [pdf, ps, other]

    cs.CL

    Shallow Fusion of Weighted Finite-State Transducer and Language Model for Text Normalization

    Authors: Evelina Bakhturina, Yang Zhang, Boris Ginsburg

    Abstract: Text normalization (TN) systems in production are largely rule-based using weighted finite-state transducers (WFST). However, WFST-based systems struggle with ambiguous input when the normalized form is context-dependent. On the other hand, neural text normalization systems can take context into account but they suffer from unrecoverable errors and require labeled normalization datasets, which are…

    Submitted 29 March, 2022; originally announced March 2022.

  8. arXiv:2108.09889  [pdf, other]

    cs.CL

    A Unified Transformer-based Framework for Duplex Text Normalization

    Authors: Tuan Manh Lai, Yang Zhang, Evelina Bakhturina, Boris Ginsburg, Heng Ji

    Abstract: Text normalization (TN) and inverse text normalization (ITN) are essential preprocessing and postprocessing steps for text-to-speech synthesis and automatic speech recognition, respectively. Many methods have been proposed for either TN or ITN, ranging from weighted finite-state transducers to neural networks. Despite their impressive performance, these methods aim to tackle only one of the two ta…

    Submitted 22 August, 2021; originally announced August 2021.

    Comments: Under Review

  9. arXiv:2105.08049  [pdf, other]

    cs.CL cs.LG

    SGD-QA: Fast Schema-Guided Dialogue State Tracking for Unseen Services

    Authors: Yang Zhang, Vahid Noroozi, Evelina Bakhturina, Boris Ginsburg

    Abstract: Dialogue state tracking is an essential part of goal-oriented dialogue systems, while most of these state tracking models often fail to handle unseen services. In this paper, we propose SGD-QA, a simple and extensible model for schema-guided dialogue state tracking based on a question answering approach. The proposed multi-pass model shares a single encoder between the domain information and dialo…

    Submitted 17 May, 2021; originally announced May 2021.

  10. arXiv:2104.05055  [pdf, other]

    cs.CL cs.SD eess.AS

    NeMo Inverse Text Normalization: From Development To Production

    Authors: Yang Zhang, Evelina Bakhturina, Kyle Gorman, Boris Ginsburg

    Abstract: Inverse text normalization (ITN) converts spoken-domain automatic speech recognition (ASR) output into written-domain text to improve the readability of the ASR output. Many state-of-the-art ITN systems use hand-written weighted finite-state transducer (WFST) grammars since this task has extremely low tolerance to unrecoverable errors. We introduce an open-source Python WFST-based library for ITN w…

    Submitted 17 May, 2021; v1 submitted 11 April, 2021; originally announced April 2021.

  11. arXiv:2104.04896  [pdf]

    eess.AS cs.CL cs.SD

    A Toolbox for Construction and Analysis of Speech Datasets

    Authors: Evelina Bakhturina, Vitaly Lavrukhin, Boris Ginsburg

    Abstract: Automatic Speech Recognition and Text-to-Speech systems are primarily trained in a supervised fashion and require high-quality, accurately labeled speech datasets. In this work, we examine common problems with speech data and introduce a toolbox for the construction and interactive error analysis of speech datasets. The construction tool is based on Kürzinger et al. work, and, to the best of our k…

    Submitted 6 January, 2022; v1 submitted 10 April, 2021; originally announced April 2021.

  12. arXiv:2104.01497  [pdf, other]

    eess.AS

    Hi-Fi Multi-Speaker English TTS Dataset

    Authors: Evelina Bakhturina, Vitaly Lavrukhin, Boris Ginsburg, Yang Zhang

    Abstract: This paper introduces a new multi-speaker English dataset for training text-to-speech models. The dataset is based on LibriVox audiobooks and Project Gutenberg texts, both in the public domain. The new dataset contains about 292 hours of speech from 10 speakers with at least 17 hours per speaker sampled at 44.1 kHz. To select speech samples with high quality, we considered audio recordings with a…

    Submitted 14 June, 2021; v1 submitted 3 April, 2021; originally announced April 2021.

  13. arXiv:2010.06060  [pdf, other]

    cs.CL

    BioMegatron: Larger Biomedical Domain Language Model

    Authors: Hoo-Chang Shin, Yang Zhang, Evelina Bakhturina, Raul Puri, Mostofa Patwary, Mohammad Shoeybi, Raghav Mani

    Abstract: There has been an influx of biomedical domain-specific language models, showing language models pre-trained on biomedical text perform better on biomedical domain benchmarks than those trained on general domain text corpora such as Wikipedia and Books. Yet, most works do not study the factors affecting each domain language application deeply. Additionally, the study of model size on domain-specifi…

    Submitted 13 October, 2020; v1 submitted 12 October, 2020; originally announced October 2020.

    Comments: Accepted for publication at EMNLP 2020

  14. arXiv:2008.12335  [pdf, other]

    cs.LG stat.ML

    A Fast and Robust BERT-based Dialogue State Tracker for Schema-Guided Dialogue Dataset

    Authors: Vahid Noroozi, Yang Zhang, Evelina Bakhturina, Tomasz Kornuta

    Abstract: Dialog State Tracking (DST) is one of the most crucial modules for goal-oriented dialogue systems. In this paper, we introduce FastSGT (Fast Schema Guided Tracker), a fast and robust BERT-based model for state tracking in goal-oriented dialogue systems. The proposed model is designed for the Schema-Guided Dialogue (SGD) dataset which contains natural language descriptions for all the entities incl…

    Submitted 27 August, 2020; originally announced August 2020.

    Comments: Accepted to the Workshop on Conversational Systems Towards Mainstream Adoption at KDD 2020

  15. arXiv:1712.00725  [pdf, other]

    cs.CL cs.AI cs.CV cs.LG stat.ML

    Sentiment Classification using Images and Label Embeddings

    Authors: Laura Graesser, Abhinav Gupta, Lakshay Sharma, Evelina Bakhturina

    Abstract: In this project we analysed how much semantic information images carry, and how much value image data can add to sentiment analysis of the text associated with the images. To better understand the contribution from images, we compared models which only made use of image data, models which only made use of text data, and models which combined both data types. We also analysed if this approach could…

    Submitted 3 December, 2017; originally announced December 2017.

    Comments: 13 pages, 3 figures, 9 tables. Technical report for Statistical Natural Language Processing Project (NYU CS - Fall 2016)