Showing 1–15 of 15 results for author: Hrinchuk, O

  1. arXiv:2406.19954  [pdf, other]

    cs.CL cs.HC cs.SD eess.AS

    BESTOW: Efficient and Streamable Speech Language Model with the Best of Two Worlds in GPT and T5

    Authors: Zhehuai Chen, He Huang, Oleksii Hrinchuk, Krishna C. Puvvada, Nithin Rao Koluguri, Piotr Żelasko, Jagadeesh Balam, Boris Ginsburg

    Abstract: Incorporating speech understanding capabilities into pretrained large-language models has become a vital research direction (SpeechLLM). The previous architectures can be categorized as: i) GPT-style, prepend speech prompts to the text prompts as a sequence of LLM inputs like a decoder-only model; ii) T5-style, introduce speech cross-attention to each layer of the pretrained LLMs. We propose BESTO…

    Submitted 28 June, 2024; originally announced June 2024.

    MSC Class: 68T10 ACM Class: I.2.7

  2. arXiv:2406.19674  [pdf, other]

    cs.CL cs.LG cs.SD eess.AS

    Less is More: Accurate Speech Recognition & Translation without Web-Scale Data

    Authors: Krishna C. Puvvada, Piotr Żelasko, He Huang, Oleksii Hrinchuk, Nithin Rao Koluguri, Kunal Dhawan, Somshubra Majumdar, Elena Rastorgueva, Zhehuai Chen, Vitaly Lavrukhin, Jagadeesh Balam, Boris Ginsburg

    Abstract: Recent advances in speech recognition and translation rely on hundreds of thousands of hours of Internet speech data. We argue that state-of-the-art accuracy can be reached without relying on web-scale data. Canary, a multilingual ASR and speech translation model, outperforms current state-of-the-art models (Whisper, OWSM, and Seamless-M4T) on English, French, Spanish, and German languages, while b…

    Submitted 28 June, 2024; originally announced June 2024.

    Comments: Accepted at Interspeech-2024

  3. arXiv:2310.09424  [pdf, other]

    cs.CL cs.HC cs.SD eess.AS

    SALM: Speech-augmented Language Model with In-context Learning for Speech Recognition and Translation

    Authors: Zhehuai Chen, He Huang, Andrei Andrusenko, Oleksii Hrinchuk, Krishna C. Puvvada, Jason Li, Subhankar Ghosh, Jagadeesh Balam, Boris Ginsburg

    Abstract: We present a novel Speech Augmented Language Model (SALM) with multitask and in-context learning capabilities. SALM comprises a frozen text LLM, an audio encoder, a modality adapter module, and LoRA layers to accommodate speech input and associated task instructions. The unified SALM not only achieves performance on par with task-specific Conformer baselines for Automatic Speech Recogni…

    Submitted 13 October, 2023; originally announced October 2023.

    Comments: Submitted to ICASSP 2024

    MSC Class: 68T10 ACM Class: I.2.7

  4. arXiv:2305.06155  [pdf, other]

    cs.CL cs.AI cs.LG

    Leveraging Synthetic Targets for Machine Translation

    Authors: Sarthak Mittal, Oleksii Hrinchuk, Oleksii Kuchaiev

    Abstract: In this work, we provide a recipe for training machine translation models in a limited resource setting by leveraging synthetic target data generated using a large pre-trained model. We show that consistently across different benchmarks in bilingual, multilingual, and speech translation setups, training models on synthetic targets outperforms training on the actual ground-truth data. This performa…

    Submitted 7 May, 2023; originally announced May 2023.

  5. arXiv:2305.05084  [pdf, other]

    eess.AS cs.SD

    Fast Conformer with Linearly Scalable Attention for Efficient Speech Recognition

    Authors: Dima Rekesh, Nithin Rao Koluguri, Samuel Kriman, Somshubra Majumdar, Vahid Noroozi, He Huang, Oleksii Hrinchuk, Krishna Puvvada, Ankur Kumar, Jagadeesh Balam, Boris Ginsburg

    Abstract: Conformer-based models have become the dominant end-to-end architecture for speech processing tasks. With the objective of enhancing the Conformer architecture for efficient training and inference, we carefully redesigned Conformer with a novel downsampling schema. The proposed model, named Fast Conformer (FC), is 2.8x faster than the original Conformer, supports scaling to billion parameters witho…

    Submitted 30 September, 2023; v1 submitted 8 May, 2023; originally announced May 2023.

    Comments: Accepted at ASRU 2023

  6. arXiv:2206.01137  [pdf, other]

    cs.CL cs.LG

    Finding the Right Recipe for Low Resource Domain Adaptation in Neural Machine Translation

    Authors: Virginia Adams, Sandeep Subramanian, Mike Chrzanowski, Oleksii Hrinchuk, Oleksii Kuchaiev

    Abstract: General translation models often still struggle to generate accurate translations in specialized domains. To guide machine translation practitioners and characterize the effectiveness of domain adaptation methods under different data availability scenarios, we conduct an in-depth empirical exploration of monolingual and parallel data approaches to domain adaptation of pre-trained, third-party, NMT…

    Submitted 2 June, 2022; originally announced June 2022.

  7. arXiv:2111.08634  [pdf, other]

    cs.CL cs.LG

    NVIDIA NeMo Neural Machine Translation Systems for English-German and English-Russian News and Biomedical Tasks at WMT21

    Authors: Sandeep Subramanian, Oleksii Hrinchuk, Virginia Adams, Oleksii Kuchaiev

    Abstract: This paper provides an overview of NVIDIA NeMo's neural machine translation systems for the constrained data track of the WMT21 News and Biomedical Shared Translation Tasks. Our news task submissions for English-German (En-De) and English-Russian (En-Ru) are built on top of a baseline transformer-based sequence-to-sequence model. Specifically, we use a combination of 1) checkpoint averaging 2) mod…

    Submitted 16 November, 2021; originally announced November 2021.

    Comments: WMT'21 news and biomedical shared task submission

  8. arXiv:1910.10697  [pdf, other]

    cs.CL cs.SD eess.AS

    Correction of Automatic Speech Recognition with Transformer Sequence-to-sequence Model

    Authors: Oleksii Hrinchuk, Mariya Popova, Boris Ginsburg

    Abstract: In this work, we introduce a simple yet efficient post-processing model for automatic speech recognition (ASR). Our model has a Transformer-based encoder-decoder architecture which "translates" ASR model output into grammatically and semantically correct text. We investigate different strategies for regularizing and optimizing the model and show that extensive data augmentation and the initializatio…

    Submitted 23 October, 2019; originally announced October 2019.

  9. arXiv:1909.09577  [pdf, other]

    cs.LG cs.CL cs.SD eess.AS

    NeMo: a toolkit for building AI applications using Neural Modules

    Authors: Oleksii Kuchaiev, Jason Li, Huyen Nguyen, Oleksii Hrinchuk, Ryan Leary, Boris Ginsburg, Samuel Kriman, Stanislav Beliaev, Vitaly Lavrukhin, Jack Cook, Patrice Castonguay, Mariya Popova, Jocelyn Huang, Jonathan M. Cohen

    Abstract: NeMo (Neural Modules) is a Python framework-agnostic toolkit for creating AI applications through re-usability, abstraction, and composition. NeMo is built around neural modules, conceptual blocks of neural networks that take typed inputs and produce typed outputs. Such modules typically represent data layers, encoders, decoders, language models, loss functions, or methods of combining activations…

    Submitted 13 September, 2019; originally announced September 2019.

    Comments: 6 pages plus references

  10. arXiv:1905.11286  [pdf, other]

    cs.LG stat.ML

    Stochastic Gradient Methods with Layer-wise Adaptive Moments for Training of Deep Networks

    Authors: Boris Ginsburg, Patrice Castonguay, Oleksii Hrinchuk, Oleksii Kuchaiev, Vitaly Lavrukhin, Ryan Leary, Jason Li, Huyen Nguyen, Yang Zhang, Jonathan M. Cohen

    Abstract: We propose NovoGrad, an adaptive stochastic gradient descent method with layer-wise gradient normalization and decoupled weight decay. In our experiments on neural networks for image classification, speech recognition, machine translation, and language modeling, it performs on par with or better than well-tuned SGD with momentum and Adam or AdamW. Additionally, NovoGrad (1) is robust to the choice of l…

    Submitted 6 February, 2020; v1 submitted 27 May, 2019; originally announced May 2019.

    Comments: Preprint, under review

  11. arXiv:1903.00027  [pdf, other]

    cs.LG stat.ML

    Catalyst.RL: A Distributed Framework for Reproducible RL Research

    Authors: Sergey Kolesnikov, Oleksii Hrinchuk

    Abstract: Despite the recent progress in the field of deep reinforcement learning (RL), and arguably because of it, a large body of work remains to be done in reproducing and carefully comparing different RL algorithms. We present catalyst.RL, an open source framework for RL research with a focus on reproducibility and flexibility. Main features of our library include large-scale asynchronous distributed training…

    Submitted 28 February, 2019; originally announced March 2019.

  12. arXiv:1902.02441  [pdf, other]

    cs.LG cs.RO stat.ML

    Artificial Intelligence for Prosthetics - challenge solutions

    Authors: Łukasz Kidziński, Carmichael Ong, Sharada Prasanna Mohanty, Jennifer Hicks, Sean F. Carroll, Bo Zhou, Hongsheng Zeng, Fan Wang, Rongzhong Lian, Hao Tian, Wojciech Jaśkowski, Garrett Andersen, Odd Rune Lykkebø, Nihat Engin Toklu, Pranav Shyam, Rupesh Kumar Srivastava, Sergey Kolesnikov, Oleksii Hrinchuk, Anton Pechenko, Mattias Ljungström, Zhen Wang, Xu Hu, Zehong Hu, Minghui Qiu, Jun Huang, et al. (25 additional authors not shown)

    Abstract: In the NeurIPS 2018 Artificial Intelligence for Prosthetics challenge, participants were tasked with building a controller for a musculoskeletal model with a goal of matching a given time-varying velocity vector. Top participants were invited to describe their algorithms. In this work, we describe the challenge and present thirteen solutions that used deep reinforcement learning approaches. Many s…

    Submitted 6 February, 2019; originally announced February 2019.

  13. arXiv:1901.10801  [pdf, other]

    cs.LG stat.ML

    Generalized Tensor Models for Recurrent Neural Networks

    Authors: Valentin Khrulkov, Oleksii Hrinchuk, Ivan Oseledets

    Abstract: Recurrent Neural Networks (RNNs) are very successful at solving challenging problems with sequential data. However, this observed efficiency is not yet entirely explained by theory. It is known that a certain class of multiplicative RNNs enjoys the property of depth efficiency: a shallow network of exponentially large width is necessary to realize the same score function as computed by such an…

    Submitted 30 January, 2019; originally announced January 2019.

    Comments: Accepted as a conference paper at ICLR 2019

  14. arXiv:1901.10787  [pdf, other]

    cs.CL cs.LG

    Tensorized Embedding Layers for Efficient Model Compression

    Authors: Oleksii Hrinchuk, Valentin Khrulkov, Leyla Mirvakhabova, Elena Orlova, Ivan Oseledets

    Abstract: The embedding layers transforming input words into real vectors are the key components of deep neural networks used in natural language processing. However, when the vocabulary is large, the corresponding weight matrices can be enormous, which precludes their deployment in a limited resource setting. We introduce a novel way of parametrizing embedding layers based on the Tensor Train (TT) decompos…

    Submitted 19 February, 2020; v1 submitted 30 January, 2019; originally announced January 2019.

  15. arXiv:1704.08059  [pdf, other]

    cs.CL

    Riemannian Optimization for Skip-Gram Negative Sampling

    Authors: Alexander Fonarev, Oleksii Hrinchuk, Gleb Gusev, Pavel Serdyukov, Ivan Oseledets

    Abstract: The Skip-Gram Negative Sampling (SGNS) word embedding model, well known for its implementation in the "word2vec" software, is usually optimized by stochastic gradient descent. However, the optimization of the SGNS objective can be viewed as a problem of searching for a good matrix with a low-rank constraint. The most standard way to solve this type of problem is to apply the Riemannian optimization framework to…

    Submitted 26 April, 2017; originally announced April 2017.

    Comments: 9 pages, 4 figures, ACL 2017