
Showing 1–24 of 24 results for author: Kulikov, I

Searching in archive cs.
  1. arXiv:2406.17744 [pdf, other]

    cs.CL

    Following Length Constraints in Instructions

    Authors: Weizhe Yuan, Ilia Kulikov, Ping Yu, Kyunghyun Cho, Sainbayar Sukhbaatar, Jason Weston, Jing Xu

    Abstract: Aligned instruction following models can better fulfill user requests than their unaligned counterparts. However, it has been shown that there is a length bias in evaluation of such models, and that training algorithms tend to exploit this bias by learning longer responses. In this work we show how to train models that can be controlled at inference time with instructions containing desired length…

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: 13 pages

  2. arXiv:2406.16838 [pdf, other]

    cs.CL cs.LG

    From Decoding to Meta-Generation: Inference-time Algorithms for Large Language Models

    Authors: Sean Welleck, Amanda Bertsch, Matthew Finlayson, Hailey Schoelkopf, Alex Xie, Graham Neubig, Ilia Kulikov, Zaid Harchaoui

    Abstract: One of the most striking findings in modern research on large language models (LLMs) is that scaling up compute during training leads to better results. However, less attention has been given to the benefits of scaling compute during inference. This survey focuses on these inference-time approaches. We explore three areas under a unified mathematical formalism: token-level generation algorithms, m…

    Submitted 24 June, 2024; originally announced June 2024.

  3. arXiv:2406.02733 [pdf, other]

    cs.CL cs.SD eess.AS

    Textless Acoustic Model with Self-Supervised Distillation for Noise-Robust Expressive Speech-to-Speech Translation

    Authors: Min-Jae Hwang, Ilia Kulikov, Benjamin Peloquin, Hongyu Gong, Peng-Jen Chen, Ann Lee

    Abstract: In this paper, we propose a textless acoustic model with a self-supervised distillation strategy for noise-robust expressive speech-to-speech translation (S2ST). Recently proposed expressive S2ST systems have achieved impressive expressivity preservation performances by cascading unit-to-speech (U2S) generator to the speech-to-unit translation model. However, these systems are vulnerable to the pr…

    Submitted 4 June, 2024; originally announced June 2024.

    Comments: Accepted to ACL 2024 (findings)

  4. arXiv:2403.12408 [pdf, other]

    cs.CL cs.SD eess.AS

    MSLM-S2ST: A Multitask Speech Language Model for Textless Speech-to-Speech Translation with Speaker Style Preservation

    Authors: Yifan Peng, Ilia Kulikov, Yilin Yang, Sravya Popuri, Hui Lu, Changhan Wang, Hongyu Gong

    Abstract: There has been emerging research interest and advances in speech-to-speech translation (S2ST), translating utterances from one language to another. This work proposes Multitask Speech Language Model (MSLM), which is a decoder-only speech language model trained in a multitask setting. Without reliance on text training data, our model is able to support multilingual S2ST with speaker style preserve…

    Submitted 18 March, 2024; originally announced March 2024.

  5. arXiv:2403.12402 [pdf, other]

    cs.CL cs.SD eess.AS

    An Empirical Study of Speech Language Models for Prompt-Conditioned Speech Synthesis

    Authors: Yifan Peng, Ilia Kulikov, Yilin Yang, Sravya Popuri, Hui Lu, Changhan Wang, Hongyu Gong

    Abstract: Speech language models (LMs) are promising for high-quality speech synthesis through in-context learning. A typical speech LM takes discrete semantic units as content and a short utterance as prompt, and synthesizes speech which preserves the content's semantics but mimics the prompt's style. However, there is no systematic understanding on how the synthesized audio is controlled by the prompt and…

    Submitted 18 March, 2024; originally announced March 2024.

  6. arXiv:2312.05187 [pdf, other]

    cs.CL cs.SD eess.AS

    Seamless: Multilingual Expressive and Streaming Speech Translation

    Authors: Seamless Communication, Loïc Barrault, Yu-An Chung, Mariano Coria Meglioli, David Dale, Ning Dong, Mark Duppenthaler, Paul-Ambroise Duquenne, Brian Ellis, Hady Elsahar, Justin Haaheim, John Hoffman, Min-Jae Hwang, Hirofumi Inaguma, Christopher Klaiber, Ilia Kulikov, Pengwei Li, Daniel Licht, Jean Maillard, Ruslan Mavlyutov, Alice Rakotoarison, Kaushik Ram Sadagopan, Abinesh Ramakrishnan, Tuan Tran, Guillaume Wenzek, et al. (40 additional authors not shown)

    Abstract: Large-scale automatic speech translation systems today lack key features that help machine-mediated communication feel seamless when compared to human-to-human dialogue. In this work, we introduce a family of models that enable end-to-end expressive and multilingual translations in a streaming fashion. First, we contribute an improved version of the massively multilingual and multimodal SeamlessM4…

    Submitted 8 December, 2023; originally announced December 2023.

  7. arXiv:2310.02720 [pdf, other]

    cs.SD eess.AS

    Multi-resolution HuBERT: Multi-resolution Speech Self-Supervised Learning with Masked Unit Prediction

    Authors: Jiatong Shi, Hirofumi Inaguma, Xutai Ma, Ilia Kulikov, Anna Sun

    Abstract: Existing Self-Supervised Learning (SSL) models for speech typically process speech signals at a fixed resolution of 20 milliseconds. This approach overlooks the varying informational content present at different resolutions in speech signals. In contrast, this paper aims to incorporate multi-resolution information into speech self-supervised representation learning. We introduce an SSL model that l…

    Submitted 30 January, 2024; v1 submitted 4 October, 2023; originally announced October 2023.

    Comments: Accepted at ICLR 2024 as a spotlight

  8. arXiv:2308.11596 [pdf, other]

    cs.CL

    SeamlessM4T: Massively Multilingual & Multimodal Machine Translation

    Authors: Seamless Communication, Loïc Barrault, Yu-An Chung, Mariano Coria Meglioli, David Dale, Ning Dong, Paul-Ambroise Duquenne, Hady Elsahar, Hongyu Gong, Kevin Heffernan, John Hoffman, Christopher Klaiber, Pengwei Li, Daniel Licht, Jean Maillard, Alice Rakotoarison, Kaushik Ram Sadagopan, Guillaume Wenzek, Ethan Ye, Bapi Akula, Peng-Jen Chen, Naji El Hachem, Brian Ellis, Gabriel Mejia Gonzalez, Justin Haaheim, et al. (43 additional authors not shown)

    Abstract: What does it take to create the Babel Fish, a tool that can help individuals translate speech between any two languages? While recent breakthroughs in text-based models have pushed machine translation coverage beyond 200 languages, unified speech-to-speech translation models have yet to achieve similar strides. More specifically, conventional speech-to-speech translation systems rely on cascaded s…

    Submitted 24 October, 2023; v1 submitted 22 August, 2023; originally announced August 2023.

    ACM Class: I.2.7

  9. arXiv:2212.08055 [pdf, other]

    cs.CL cs.SD eess.AS

    UnitY: Two-pass Direct Speech-to-speech Translation with Discrete Units

    Authors: Hirofumi Inaguma, Sravya Popuri, Ilia Kulikov, Peng-Jen Chen, Changhan Wang, Yu-An Chung, Yun Tang, Ann Lee, Shinji Watanabe, Juan Pino

    Abstract: Direct speech-to-speech translation (S2ST), in which all components can be optimized jointly, is advantageous over cascaded approaches to achieve fast inference with a simplified pipeline. We present a novel two-pass direct S2ST architecture, UnitY, which first generates textual representations and predicts discrete acoustic units subsequently. We enhance the model performance by subword predictio…

    Submitted 26 May, 2023; v1 submitted 15 December, 2022; originally announced December 2022.

    Comments: ACL 2023 (main conference)

  10. arXiv:2210.14514 [pdf, other]

    cs.CL cs.SD eess.AS

    Improving Speech-to-Speech Translation Through Unlabeled Text

    Authors: Xuan-Phi Nguyen, Sravya Popuri, Changhan Wang, Yun Tang, Ilia Kulikov, Hongyu Gong

    Abstract: Direct speech-to-speech translation (S2ST) is among the most challenging problems in the translation paradigm due to the significant scarcity of S2ST data. While effort has been made to increase the data size from unlabeled speech by cascading pretrained speech recognition (ASR), machine translation (MT) and text-to-speech (TTS) models; unlabeled text has remained relatively under-utilized to impr…

    Submitted 26 October, 2022; originally announced October 2022.

  11. arXiv:2210.11981 [pdf, other]

    cs.CL

    Named Entity Detection and Injection for Direct Speech Translation

    Authors: Marco Gaido, Yun Tang, Ilia Kulikov, Rongqing Huang, Hongyu Gong, Hirofumi Inaguma

    Abstract: In a sentence, certain words are critical for its semantics. Among them, named entities (NEs) are notoriously challenging for neural models. Despite their importance, their accurate handling has been neglected in speech-to-text (S2T) translation research, and recent work has shown that S2T models perform poorly for locations and notably person names, whose spelling is challenging unless known in ad…

    Submitted 11 March, 2023; v1 submitted 21 October, 2022; originally announced October 2022.

    Comments: © 2022 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works

  12. arXiv:2210.10191 [pdf, other]

    cs.CL cs.SD eess.AS

    Simple and Effective Unsupervised Speech Translation

    Authors: Changhan Wang, Hirofumi Inaguma, Peng-Jen Chen, Ilia Kulikov, Yun Tang, Wei-Ning Hsu, Michael Auli, Juan Pino

    Abstract: The amount of labeled data to train models for speech tasks is limited for most languages; however, the data scarcity is exacerbated for speech translation which requires labeled data covering two different languages. To address this issue, we study a simple and effective approach to build speech translation systems without labeled data by leveraging recent advances in unsupervised speech recognit…

    Submitted 18 October, 2022; originally announced October 2022.

  13. arXiv:2204.00471 [pdf, other]

    cs.CL

    Uncertainty Determines the Adequacy of the Mode and the Tractability of Decoding in Sequence-to-Sequence Models

    Authors: Felix Stahlberg, Ilia Kulikov, Shankar Kumar

    Abstract: In many natural language processing (NLP) tasks the same input (e.g. source sentence) can have multiple possible outputs (e.g. translations). To analyze how this ambiguity (also known as intrinsic uncertainty) shapes the distribution learned by neural sequence models we measure sentence-level uncertainty by computing the degree of overlap between references in multi-reference test sets from two di…

    Submitted 1 April, 2022; originally announced April 2022.

    Comments: ACL 2022 paper

  14. arXiv:2112.08914 [pdf, other]

    cs.LG cs.CL

    Characterizing and addressing the issue of oversmoothing in neural autoregressive sequence modeling

    Authors: Ilia Kulikov, Maksim Eremeev, Kyunghyun Cho

    Abstract: Neural autoregressive sequence models smear the probability among many possible sequences including degenerate ones, such as empty or repetitive sequences. In this work, we tackle one specific case where the model assigns a high probability to unreasonably short sequences. We define the oversmoothing rate to quantify this issue. After confirming the high degree of oversmoothing in neural machine t…

    Submitted 22 December, 2021; v1 submitted 16 December, 2021; originally announced December 2021.

    Comments: Ilia Kulikov and Maksim Eremeev contributed equally

  15. arXiv:2106.05459 [pdf, other]

    cs.LG stat.ML

    Mode recovery in neural autoregressive sequence modeling

    Authors: Ilia Kulikov, Sean Welleck, Kyunghyun Cho

    Abstract: Despite its wide use, recent studies have revealed unexpected and undesirable properties of neural autoregressive sequence models trained with maximum likelihood, such as an unreasonably high affinity to short sequences after training and to infinitely long sequences at decoding time. We propose to study these phenomena by investigating how the modes, or local maxima, of a distribution are maintai…

    Submitted 9 June, 2021; originally announced June 2021.

    Comments: ACL-IJCNLP 2021 5th Workshop on Structured Prediction for NLP

  16. arXiv:2002.02492 [pdf, other]

    cs.LG cs.CL stat.ML

    Consistency of a Recurrent Language Model With Respect to Incomplete Decoding

    Authors: Sean Welleck, Ilia Kulikov, Jaedeok Kim, Richard Yuanzhe Pang, Kyunghyun Cho

    Abstract: Despite strong performance on a variety of tasks, neural sequence models trained with maximum likelihood have been shown to exhibit issues such as length bias and degenerate repetition. We study the related issue of receiving infinite-length sequences from a recurrent language model when using common decoding algorithms. To analyze this issue, we first define inconsistency of a decoding algorithm,…

    Submitted 2 October, 2020; v1 submitted 6 February, 2020; originally announced February 2020.

    Comments: EMNLP 2020

  17. arXiv:1911.03860 [pdf, other]

    cs.CL

    Don't Say That! Making Inconsistent Dialogue Unlikely with Unlikelihood Training

    Authors: Margaret Li, Stephen Roller, Ilia Kulikov, Sean Welleck, Y-Lan Boureau, Kyunghyun Cho, Jason Weston

    Abstract: Generative dialogue models currently suffer from a number of problems which standard maximum likelihood training does not address. They tend to produce generations that (i) rely too much on copying from the context, (ii) contain repetitions within utterances, (iii) overuse frequent words, and (iv) at a deeper level, contain logical flaws. In this work we show how all of these problems can be addre…

    Submitted 6 May, 2020; v1 submitted 10 November, 2019; originally announced November 2019.

  18. arXiv:1908.08485 [pdf]

    cs.RO cs.AI cs.CY eess.SY

    Simulation Model of Two-Robot Cooperation in Common Operating Environment

    Authors: V. Ya. Vilisov, B. Yu. Murashkin, A. I. Kulikov

    Abstract: The article considers a simulation modelling problem related to the chess game process occurring between two three-tier manipulators. The objective of the game construction lies in developing the procedure of effective control of the autonomous manipulator robots located in a common operating environment. The simulation model is a preliminary stage of building a natural complex that would provide…

    Submitted 22 August, 2019; originally announced August 2019.

    Comments: 6 pages, 6 figures

  19. arXiv:1908.04319 [pdf, other]

    cs.LG cs.CL stat.ML

    Neural Text Generation with Unlikelihood Training

    Authors: Sean Welleck, Ilia Kulikov, Stephen Roller, Emily Dinan, Kyunghyun Cho, Jason Weston

    Abstract: Neural text generation is a key tool in natural language applications, but it is well known there are major problems at its core. In particular, standard likelihood training and decoding leads to dull and repetitive outputs. While some post-hoc fixes have been proposed, in particular top-k and nucleus sampling, they do not address the fact that the token-level probabilities predicted by the mode…

    Submitted 26 September, 2019; v1 submitted 12 August, 2019; originally announced August 2019.

    Comments: Sean Welleck and Ilia Kulikov contributed equally

  20. arXiv:1906.00141 [pdf, other]

    cs.CL cs.LG

    Multi-Turn Beam Search for Neural Dialogue Modeling

    Authors: Ilia Kulikov, Jason Lee, Kyunghyun Cho

    Abstract: In neural dialogue modeling, a neural network is trained to predict the next utterance, and at inference time, an approximate decoding algorithm is used to generate next utterances given previous ones. While this autoregressive framework allows us to model the whole conversation during training, inference is highly suboptimal, as a wrong utterance can affect future utterances. While beam search yi…

    Submitted 12 November, 2019; v1 submitted 31 May, 2019; originally announced June 2019.

  21. Evaluation of Intel Memory Drive Technology Performance for Scientific Applications

    Authors: Vladimir Mironov, Andrey Kudryavtsev, Yuri Alexeev, Alexander Moskovsky, Igor Kulikov, Igor Chernykh

    Abstract: In this paper, we present benchmark data for Intel Memory Drive Technology (IMDT), which is a new generation of Software-defined Memory (SDM) based on Intel ScaleMP collaboration and using 3D XPoint™ based Intel Solid-State Drives (SSDs) called Optane. We studied IMDT performance for synthetic benchmarks, scientific kernels, and applications. We chose these benchmarks to represent different patte…

    Submitted 26 November, 2018; originally announced November 2018.

    Journal ref: In Proceedings of the Workshop on Memory Centric High Performance Computing (MCHPC'18). ACM, New York, NY, USA, 14-21, 2018

  22. arXiv:1811.00907 [pdf, other]

    cs.CL cs.LG

    Importance of Search and Evaluation Strategies in Neural Dialogue Modeling

    Authors: Ilia Kulikov, Alexander H. Miller, Kyunghyun Cho, Jason Weston

    Abstract: We investigate the impact of search strategies in neural dialogue modeling. We first compare two standard search algorithms, greedy and beam search, as well as our newly proposed iterative beam search which produces a more diverse set of candidate responses. We evaluate these strategies in realistic full conversations with humans and propose a model-based Bayesian calibration to address annotator…

    Submitted 3 November, 2019; v1 submitted 2 November, 2018; originally announced November 2018.

    Comments: iNLG 2019 camera ready version

  23. arXiv:1804.07178 [pdf, other]

    cs.MA cs.AI

    Vehicle Communication Strategies for Simulated Highway Driving

    Authors: Cinjon Resnick, Ilya Kulikov, Kyunghyun Cho, Jason Weston

    Abstract: Interest in emergent communication has recently surged in Machine Learning. The focus of this interest has largely been either on investigating the properties of the learned protocol or on utilizing emergent communication to better solve problems that already have a viable solution. Here, we consider self-driving cars coordinating with each other and focus on how communication influences the agent…

    Submitted 14 August, 2018; v1 submitted 19 April, 2018; originally announced April 2018.

    Comments: NIPS 2017 Workshop on Emergent Communication

  24. arXiv:1608.00895 [pdf, other]

    cs.LG cs.CL cs.NE

    RETURNN: The RWTH Extensible Training framework for Universal Recurrent Neural Networks

    Authors: Patrick Doetsch, Albert Zeyer, Paul Voigtlaender, Ilya Kulikov, Ralf Schlüter, Hermann Ney

    Abstract: In this work we release our extensible and easily configurable neural network training software. It provides a rich set of functional layers with a particular focus on efficient training of recurrent neural network topologies on multiple GPUs. The source of the software package is public and freely available for academic research purposes and can be used as a framework or as a standalone tool whic…

    Submitted 10 January, 2017; v1 submitted 2 August, 2016; originally announced August 2016.