Showing 1–20 of 20 results for author: Sharifi, M

Searching in archive eess.
  1. arXiv:2404.13626

    cs.RO eess.SY

    Safe Force/Position Tracking Control via Control Barrier Functions for Floating Base Mobile Manipulator Systems

    Authors: Maryam Sharifi, Shahab Heshmati-Alamdari

    Abstract: This paper introduces a safe force/position tracking control strategy designed for Free-Floating Mobile Manipulator Systems (MMSs) engaging in compliant contact with planar surfaces. The strategy uniquely integrates the Control Barrier Function (CBF) to manage operational limitations and safety concerns. It effectively addresses safety-critical aspects in the kinematic as well as dynamic level, su…

    Submitted 21 April, 2024; originally announced April 2024.

    Comments: Accepted for presentation at the European Control Conference (ECC) 2024, Stockholm, Sweden

  2. arXiv:2306.12925

    cs.CL cs.AI cs.SD eess.AS stat.ML

    AudioPaLM: A Large Language Model That Can Speak and Listen

    Authors: Paul K. Rubenstein, Chulayuth Asawaroengchai, Duc Dung Nguyen, Ankur Bapna, Zalán Borsos, Félix de Chaumont Quitry, Peter Chen, Dalia El Badawy, Wei Han, Eugene Kharitonov, Hannah Muckenhirn, Dirk Padfield, James Qin, Danny Rozenberg, Tara Sainath, Johan Schalkwyk, Matt Sharifi, Michelle Tadmor Ramanovich, Marco Tagliasacchi, Alexandru Tudor, Mihajlo Velimirović, Damien Vincent, Jiahui Yu, Yongqiang Wang, Vicky Zayats, et al. (5 additional authors not shown)

    Abstract: We introduce AudioPaLM, a large language model for speech understanding and generation. AudioPaLM fuses text-based and speech-based language models, PaLM-2 [Anil et al., 2023] and AudioLM [Borsos et al., 2022], into a unified multimodal architecture that can process and generate text and speech with applications including speech recognition and speech-to-speech translation. AudioPaLM inherits the…

    Submitted 22 June, 2023; originally announced June 2023.

    Comments: Technical report

  3. arXiv:2305.09636

    cs.SD cs.LG eess.AS

    SoundStorm: Efficient Parallel Audio Generation

    Authors: Zalán Borsos, Matt Sharifi, Damien Vincent, Eugene Kharitonov, Neil Zeghidour, Marco Tagliasacchi

    Abstract: We present SoundStorm, a model for efficient, non-autoregressive audio generation. SoundStorm receives as input the semantic tokens of AudioLM, and relies on bidirectional attention and confidence-based parallel decoding to generate the tokens of a neural audio codec. Compared to the autoregressive generation approach of AudioLM, our model produces audio of the same quality and with higher consist…

    Submitted 16 May, 2023; originally announced May 2023.

  4. arXiv:2304.10892

    cs.LG cs.DC eess.SY

    Reconciling High Accuracy, Cost-Efficiency, and Low Latency of Inference Serving Systems

    Authors: Mehran Salmani, Saeid Ghafouri, Alireza Sanaee, Kamran Razavi, Max Mühlhäuser, Joseph Doyle, Pooyan Jamshidi, Mohsen Sharifi

    Abstract: The use of machine learning (ML) inference for various applications is growing drastically. ML inference services engage with users directly, requiring fast and accurate responses. Moreover, these services face dynamic workloads of requests, imposing changes in their computing resources. Failing to right-size computing resources results in either latency service level objectives (SLOs) violations…

    Submitted 24 April, 2023; v1 submitted 21 April, 2023; originally announced April 2023.

  5. arXiv:2302.03540

    cs.SD eess.AS

    Speak, Read and Prompt: High-Fidelity Text-to-Speech with Minimal Supervision

    Authors: Eugene Kharitonov, Damien Vincent, Zalán Borsos, Raphaël Marinier, Sertan Girgin, Olivier Pietquin, Matt Sharifi, Marco Tagliasacchi, Neil Zeghidour

    Abstract: We introduce SPEAR-TTS, a multi-speaker text-to-speech (TTS) system that can be trained with minimal supervision. By combining two types of discrete speech representations, we cast TTS as a composition of two sequence-to-sequence tasks: from text to high-level semantic tokens (akin to "reading") and from semantic tokens to low-level acoustic tokens ("speaking"). Decoupling these two tasks enables…

    Submitted 7 February, 2023; originally announced February 2023.

  6. arXiv:2301.11325

    cs.SD cs.LG eess.AS

    MusicLM: Generating Music From Text

    Authors: Andrea Agostinelli, Timo I. Denk, Zalán Borsos, Jesse Engel, Mauro Verzetti, Antoine Caillon, Qingqing Huang, Aren Jansen, Adam Roberts, Marco Tagliasacchi, Matt Sharifi, Neil Zeghidour, Christian Frank

    Abstract: We introduce MusicLM, a model generating high-fidelity music from text descriptions such as "a calming violin melody backed by a distorted guitar riff". MusicLM casts the process of conditional music generation as a hierarchical sequence-to-sequence modeling task, and it generates music at 24 kHz that remains consistent over several minutes. Our experiments show that MusicLM outperforms previous s…

    Submitted 26 January, 2023; originally announced January 2023.

    Comments: Supplementary material at https://google-research.github.io/seanet/musiclm/examples and https://kaggle.com/datasets/googleai/musiccaps

  7. arXiv:2209.03143

    cs.SD cs.LG eess.AS

    AudioLM: a Language Modeling Approach to Audio Generation

    Authors: Zalán Borsos, Raphaël Marinier, Damien Vincent, Eugene Kharitonov, Olivier Pietquin, Matt Sharifi, Dominik Roblek, Olivier Teboul, David Grangier, Marco Tagliasacchi, Neil Zeghidour

    Abstract: We introduce AudioLM, a framework for high-quality audio generation with long-term consistency. AudioLM maps the input audio to a sequence of discrete tokens and casts audio generation as a language modeling task in this representation space. We show how existing audio tokenizers provide different trade-offs between reconstruction quality and long-term structure, and we propose a hybrid tokenizati…

    Submitted 25 July, 2023; v1 submitted 7 September, 2022; originally announced September 2022.

  8. arXiv:2202.07273

    cs.SD cs.LG eess.AS

    SpeechPainter: Text-conditioned Speech Inpainting

    Authors: Zalán Borsos, Matt Sharifi, Marco Tagliasacchi

    Abstract: We propose SpeechPainter, a model for filling in gaps of up to one second in speech samples by leveraging an auxiliary textual input. We demonstrate that the model performs speech inpainting with the appropriate content, while maintaining speaker identity, prosody and recording environment conditions, and generalizing to unseen speakers. Our approach significantly outperforms baselines constructed…

    Submitted 30 March, 2022; v1 submitted 15 February, 2022; originally announced February 2022.

    Comments: Submitted to Interspeech 2022

  9. arXiv:2109.13832

    eess.SY

    Compositional Construction of Abstractions for Infinite Networks of Switched Systems

    Authors: Maryam Sharifi, Abdalla Swikir, Navid Noroozi, Majid Zamani

    Abstract: We construct compositional continuous approximations for an interconnection of infinitely many discrete-time switched systems. An approximation (known as abstraction) is itself a continuous-space system, which can be used as a replacement of the original (known as concrete) system in a controller design process. Having synthesized a controller for the abstract system, the controller is refined to…

    Submitted 28 September, 2021; originally announced September 2021.

    Comments: arXiv admin note: substantial text overlap with arXiv:2101.08873

  10. arXiv:2109.07121

    cs.RO eess.SY

    Enhancing Data-Driven Reachability Analysis using Temporal Logic Side Information

    Authors: Amr Alanwar, Frank J. Jiang, Maryam Sharifi, Dimos V. Dimarogonas, Karl H. Johansson

    Abstract: This paper presents algorithms for performing data-driven reachability analysis under temporal logic side information. In certain scenarios, the data-driven reachable sets of a robot can be prohibitively conservative due to the inherent noise in the robot's historical measurement data. In the same scenarios, we often have side information about the robot's expected motion (e.g., limits on how much…

    Submitted 30 March, 2022; v1 submitted 15 September, 2021; originally announced September 2021.

    Comments: Accepted at the IEEE International Conference on Robotics and Automation (ICRA 2022)

  11. arXiv:2103.15604

    eess.SY

    Higher Order Convergent Control Barrier Functions for Leader-Follower Multi-Agent Systems under STL Tasks

    Authors: Maryam Sharifi, Dimos V. Dimarogonas

    Abstract: This paper presents control strategies based on time-varying convergent higher order control barrier functions for a class of leader-follower multi-agent systems under signal temporal logic (STL) tasks. Each agent is assigned a local STL task which may be dependent on the behavior of agents involved in other tasks. The leader has knowledge on the associated tasks and controls the performance of th…

    Submitted 9 October, 2021; v1 submitted 29 March, 2021; originally announced March 2021.

  12. arXiv:2103.00986

    eess.SY

    Fixed-Time Convergent Control Barrier Functions for Coupled Multi-Agent Systems Under STL Tasks

    Authors: Maryam Sharifi, Dimos V. Dimarogonas

    Abstract: This paper presents a control strategy based on a new notion of time-varying fixed-time convergent control barrier functions (TFCBFs) for a class of coupled multi-agent systems under signal temporal logic (STL) tasks. In this framework, each agent is assigned a local STL task regardless of the tasks of other agents. Each task may be dependent on the behavior of other agents which may cause conflic…

    Submitted 29 March, 2021; v1 submitted 1 March, 2021; originally announced March 2021.

    Comments: Accepted in ECC 2021

  13. arXiv:2101.10627

    eess.SY

    Robust Finite-Time Consensus Subject to Unknown Communication Time Delays Based on Delay-Dependent Criteria

    Authors: Maryam Sharifi

    Abstract: In this paper, robust finite-time consensus of a group of nonlinear multi-agent systems in the presence of communication time delays is considered. In particular, appropriate delay-dependent strategies which are less conservative are suggested. Sufficient conditions for finite-time consensus in the presence of deterministic and stochastic disturbances are presented. The communication delays don't…

    Submitted 26 January, 2021; originally announced January 2021.

  14. arXiv:2101.08873

    eess.SY

    Compositional Construction of Abstractions for Infinite Networks of Discrete-Time Switched Systems

    Authors: Maryam Sharifi, Abdalla Swikir, Navid Noroozi, Majid Zamani

    Abstract: In this paper, we develop a compositional scheme for the construction of continuous approximations for interconnections of infinitely many discrete-time switched systems. An approximation (also known as abstraction) is itself a continuous-space system, which can be used as a replacement of the original (also known as concrete) system in a controller design process. Having designed a controller for…

    Submitted 21 January, 2021; originally announced January 2021.

  15. arXiv:2005.13101

    eess.SY

    State Estimation-Based Robust Optimal Control of Influenza Epidemics in an Interactive Human Society

    Authors: Vahid Azimi, Mojtaba Sharifi, Seyed Fakoorian, Thang Tien Nguyen, Van Van Huynh

    Abstract: This paper presents a state estimation-based robust optimal control strategy for influenza epidemics in an interactive human society in the presence of modeling uncertainties. Interactive society is influenced by the random entrance of individuals from other human societies whose effects can be modeled as a non-Gaussian noise. Since only the number of exposed and infected humans can be measured, s…

    Submitted 11 November, 2020; v1 submitted 26 May, 2020; originally announced May 2020.

  16. arXiv:2002.01322

    eess.AS cs.LG cs.SD stat.ML

    Training Keyword Spotters with Limited and Synthesized Speech Data

    Authors: James Lin, Kevin Kilgour, Dominik Roblek, Matthew Sharifi

    Abstract: With the rise of low power speech-enabled devices, there is a growing demand to quickly produce models for recognizing arbitrary sets of keywords. As with many machine learning tasks, one of the most challenging parts in the model creation process is obtaining a sufficient amount of training data. In this paper, we explore the effectiveness of synthesized speech data in training small, spoken term…

    Submitted 31 January, 2020; originally announced February 2020.

  17. arXiv:1910.11664

    eess.AS cs.LG cs.SD

    SPICE: Self-supervised Pitch Estimation

    Authors: Beat Gfeller, Christian Frank, Dominik Roblek, Matt Sharifi, Marco Tagliasacchi, Mihajlo Velimirović

    Abstract: We propose a model to estimate the fundamental frequency in monophonic audio, often referred to as pitch estimation. We acknowledge the fact that obtaining ground truth annotations at the required temporal and frequency resolution is a particularly daunting task. Therefore, we propose to adopt a self-supervised learning technique, which is able to estimate pitch without any form of supervision. Th…

    Submitted 4 September, 2020; v1 submitted 25 October, 2019; originally announced October 2019.

    Comments: Accepted to IEEE Transactions on Audio, Speech and Language Processing

    Journal ref: in IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 28, pp. 1118-1128, 2020

  18. arXiv:1812.08466

    eess.AS cs.SD

    Fréchet Audio Distance: A Metric for Evaluating Music Enhancement Algorithms

    Authors: Kevin Kilgour, Mauricio Zuluaga, Dominik Roblek, Matthew Sharifi

    Abstract: We propose the Fréchet Audio Distance (FAD), a novel, reference-free evaluation metric for music enhancement algorithms. We demonstrate how typical evaluation metrics for speech enhancement and blind source separation can fail to accurately measure the perceived effect of a wide variety of distortions. As an alternative, we propose adapting the Fréchet Inception Distance (FID) metric used to evalu…

    Submitted 17 January, 2019; v1 submitted 20 December, 2018; originally announced December 2018.

  19. arXiv:1811.00006

    eess.AS cs.LG cs.SD stat.ML

    Low-Dimensional Bottleneck Features for On-Device Continuous Speech Recognition

    Authors: David B. Ramsay, Kevin Kilgour, Dominik Roblek, Matthew Sharifi

    Abstract: Low power digital signal processors (DSPs) typically have a very limited amount of memory in which to cache data. In this paper we develop efficient bottleneck feature (BNF) extractors that can be run on a DSP, and retrain a baseline large-vocabulary continuous speech recognition (LVCSR) system to use these BNFs with only a minimal loss of accuracy. The small BNFs allow the DSP chip to cache more…

    Submitted 31 October, 2018; originally announced November 2018.

    Comments: Submitted to ICASSP 2019

  20. arXiv:1711.10958

    cs.SD cs.AI eess.AS

    Now Playing: Continuous low-power music recognition

    Authors: Blaise Agüera y Arcas, Beat Gfeller, Ruiqi Guo, Kevin Kilgour, Sanjiv Kumar, James Lyon, Julian Odell, Marvin Ritter, Dominik Roblek, Matthew Sharifi, Mihajlo Velimirović

    Abstract: Existing music recognition applications require a connection to a server that performs the actual recognition. In this paper we present a low-power music recognizer that runs entirely on a mobile device and automatically recognizes music without user interaction. To reduce battery consumption, a small music detector runs continuously on the mobile device's DSP chip and wakes up the main applicatio…

    Submitted 29 November, 2017; originally announced November 2017.

    Comments: Authors are listed in alphabetical order by last name