Showing 1–30 of 30 results for author: Vyas, A

  1. arXiv:2406.06251

    eess.AS cs.CL

    Learning Fine-Grained Controllability on Speech Generation via Efficient Fine-Tuning

    Authors: Chung-Ming Chien, Andros Tjandra, Apoorv Vyas, Matt Le, Bowen Shi, Wei-Ning Hsu

    Abstract: As the scale of generative models continues to grow, efficient reuse and adaptation of pre-trained models have become crucial considerations. In this work, we propose Voicebox Adapter, a novel approach that integrates fine-grained conditions into a pre-trained Voicebox speech generation model using a cross-attention module. To ensure a smooth integration of newly added modules with pre-trained one…

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: Accepted by InterSpeech 2024

  2. arXiv:2312.15821

    cs.SD cs.LG eess.AS

    Audiobox: Unified Audio Generation with Natural Language Prompts

    Authors: Apoorv Vyas, Bowen Shi, Matthew Le, Andros Tjandra, Yi-Chiao Wu, Baishan Guo, Jiemin Zhang, Xinyue Zhang, Robert Adkins, William Ngan, Jeff Wang, Ivan Cruz, Bapi Akula, Akinniyi Akinyemi, Brian Ellis, Rashel Moritz, Yael Yungster, Alice Rakotoarison, Liang Tan, Chris Summers, Carleigh Wood, Joshua Lane, Mary Williamson, Wei-Ning Hsu

    Abstract: Audio is an essential part of our life, but creating it often requires expertise and is time-consuming. Research communities have made great progress over the past year advancing the performance of large scale audio generative models for a single modality (speech, sound, or music) through adopting more powerful generative models and scaling data. However, these models lack controllability in sever…

    Submitted 25 December, 2023; originally announced December 2023.

  3. arXiv:2312.14556

    cs.CV

    CaptainCook4D: A dataset for understanding errors in procedural activities

    Authors: Rohith Peddi, Shivvrat Arya, Bharath Challa, Likhitha Pallapothula, Akshay Vyas, Jikai Wang, Qifan Zhang, Vasundhara Komaragiri, Eric Ragan, Nicholas Ruozzi, Yu Xiang, Vibhav Gogate

    Abstract: Following step-by-step procedures is an essential component of various activities carried out by individuals in their daily lives. These procedures serve as a guiding framework that helps to achieve goals efficiently, whether it is assembling furniture or preparing a recipe. However, the complexity and duration of procedural activities inherently increase the likelihood of making errors. Understan…

    Submitted 22 December, 2023; originally announced December 2023.

    Comments: Accepted to the 2023 International Conference on Machine Learning (ICML) workshop on Data-centric Machine Learning Research (DMLR). Project page: https://captaincook4d.github.io/captain-cook/

  4. arXiv:2310.16338

    eess.AS cs.CL cs.LG cs.SD

    Generative Pre-training for Speech with Flow Matching

    Authors: Alexander H. Liu, Matt Le, Apoorv Vyas, Bowen Shi, Andros Tjandra, Wei-Ning Hsu

    Abstract: Generative models have gained more and more attention in recent years for their remarkable success in tasks that require estimating and sampling data distribution to generate high-fidelity synthetic data. In speech, text-to-speech synthesis and neural vocoder are good examples where generative models have shined. While generative models have been applied to different applications in speech, there…

    Submitted 25 March, 2024; v1 submitted 24 October, 2023; originally announced October 2023.

    Comments: ICLR 2024

  5. arXiv:2306.15687

    eess.AS cs.CL cs.LG cs.SD

    Voicebox: Text-Guided Multilingual Universal Speech Generation at Scale

    Authors: Matthew Le, Apoorv Vyas, Bowen Shi, Brian Karrer, Leda Sari, Rashel Moritz, Mary Williamson, Vimal Manohar, Yossi Adi, Jay Mahadeokar, Wei-Ning Hsu

    Abstract: Large-scale generative models such as GPT and DALL-E have revolutionized the research community. These models not only generate high fidelity outputs, but are also generalists which can solve tasks not explicitly taught. In contrast, speech generative models are still primitive in terms of scale and task generalization. In this paper, we present Voicebox, the most versatile text-guided generative…

    Submitted 19 October, 2023; v1 submitted 23 June, 2023; originally announced June 2023.

    Comments: Accepted to NeurIPS 2023

  6. arXiv:2305.13516

    cs.CL cs.SD eess.AS

    Scaling Speech Technology to 1,000+ Languages

    Authors: Vineel Pratap, Andros Tjandra, Bowen Shi, Paden Tomasello, Arun Babu, Sayani Kundu, Ali Elkahky, Zhaoheng Ni, Apoorv Vyas, Maryam Fazel-Zarandi, Alexei Baevski, Yossi Adi, Xiaohui Zhang, Wei-Ning Hsu, Alexis Conneau, Michael Auli

    Abstract: Expanding the language coverage of speech technology has the potential to improve access to information for many more people. However, current speech technology is restricted to about one hundred languages which is a small fraction of the over 7,000 languages spoken around the world. The Massively Multilingual Speech (MMS) project increases the number of supported languages by 10-40x, depending on…

    Submitted 22 May, 2023; originally announced May 2023.

  7. arXiv:2304.08389

    math.OC cs.LG

    Beyond first-order methods for non-convex non-concave min-max optimization

    Authors: Abhijeet Vyas, Brian Bullins

    Abstract: We propose a study of structured non-convex non-concave min-max problems which goes beyond standard first-order approaches. Inspired by the tight understanding established in recent works [Adil et al., 2022, Lin and Jordan, 2022b], we develop a suite of higher-order methods which show the improvements attainable beyond the monotone and Minty condition settings. Specifically, we provide a new under…

    Submitted 17 April, 2023; originally announced April 2023.

  8. arXiv:2301.13331

    cs.AI cs.LG physics.comp-ph

    Neural Operator: Is data all you need to model the world? An insight into the impact of Physics Informed Machine Learning

    Authors: Hrishikesh Viswanath, Md Ashiqur Rahman, Abhijeet Vyas, Andrey Shor, Beatriz Medeiros, Stephanie Hernandez, Suhas Eswarappa Prameela, Aniket Bera

    Abstract: Numerical approximations of partial differential equations (PDEs) are routinely employed to formulate the solution of physics, engineering and mathematical problems involving functions of several variables, such as the propagation of heat or sound, fluid flow, elasticity, electrostatics, electrodynamics, and more. While this has led to solving many complex phenomena, there are some limitations. Co…

    Submitted 18 September, 2023; v1 submitted 30 January, 2023; originally announced January 2023.

  9. arXiv:2209.01349

    cs.NI cs.CR

    Towards the Age of Intelligent Vehicular Networks for Connected and Autonomous Vehicles in 6G

    Authors: Van-Linh Nguyen, Ren-Hung Hwang, Po-Ching Lin, Abhishek Vyas, Van-Tao Nguyen

    Abstract: Twenty-two years after the advent of the first-generation vehicular network, i.e., dedicated short-range communications (DSRC) standard/IEEE 802.11p, the vehicular technology market has become very competitive with a new player, Cellular Vehicle-to-Everything (C-V2X). Currently, C-V2X technology likely dominates the race because of the big advantages of comprehensive coverage and high throughput/r…

    Submitted 3 September, 2022; originally announced September 2022.

  10. arXiv:2208.03505

    cs.CR

    "All of them claim to be the best": Multi-perspective study of VPN users and VPN providers

    Authors: Reethika Ramesh, Anjali Vyas, Roya Ensafi

    Abstract: As more users adopt VPNs for a variety of reasons, it is important to develop empirical knowledge of their needs and mental models of what a VPN offers. Moreover, studying VPN users alone is not enough because, by using a VPN, a user essentially transfers trust, say from their network provider, onto the VPN provider. To that end, we are the first to study the VPN ecosystem from both the users' and…

    Submitted 28 September, 2022; v1 submitted 6 August, 2022; originally announced August 2022.

    Comments: Accepted to appear at USENIX Security Symposium 2023 (32nd USENIX Security Symposium, 2023)

  11. arXiv:2205.14232

    math.OC cs.LG

    Competitive Gradient Optimization

    Authors: Abhijeet Vyas, Kamyar Azizzadenesheli

    Abstract: We study the problem of convergence to a stationary point in zero-sum games. We propose competitive gradient optimization (CGO), a gradient-based method that incorporates the interactions between the two players in zero-sum games for optimization updates. We provide continuous-time analysis of CGO and its convergence properties while showing that in the continuous limit, CGO predecessors degenera…

    Submitted 27 May, 2022; originally announced May 2022.

  12. arXiv:2204.11934

    cs.LG cs.SD eess.AS

    On-demand compute reduction with stochastic wav2vec 2.0

    Authors: Apoorv Vyas, Wei-Ning Hsu, Michael Auli, Alexei Baevski

    Abstract: Squeeze and Efficient Wav2vec (SEW) is a recently proposed architecture that squeezes the input to the transformer encoder for compute efficient pre-training and inference with wav2vec 2.0 (W2V2) models. In this work, we propose stochastic compression for on-demand compute reduction for W2V2 models. As opposed to using a fixed squeeze factor, we sample it uniformly during training. We further intr…

    Submitted 25 April, 2022; originally announced April 2022.

    Comments: submitted to Interspeech, 2022

  13. arXiv:2104.02558

    cs.SD cs.LG eess.AS

    Comparing CTC and LFMMI for out-of-domain adaptation of wav2vec 2.0 acoustic model

    Authors: Apoorv Vyas, Srikanth Madikeri, Hervé Bourlard

    Abstract: In this work, we investigate if the wav2vec 2.0 self-supervised pretraining helps mitigate the overfitting issues with connectionist temporal classification (CTC) training to reduce its performance gap with flat-start lattice-free MMI (E2E-LFMMI) for automatic speech recognition with limited training data. Towards that objective, we use the pretrained wav2vec 2.0 BASE model and fine-tune it on thr…

    Submitted 6 April, 2021; originally announced April 2021.

  14. arXiv:2012.14252

    cs.LG cs.SD eess.AS

    Lattice-Free MMI Adaptation Of Self-Supervised Pretrained Acoustic Models

    Authors: Apoorv Vyas, Srikanth Madikeri, Hervé Bourlard

    Abstract: In this work, we propose lattice-free MMI (LFMMI) for supervised adaptation of self-supervised pretrained acoustic models. We pretrain a Transformer model on a thousand hours of untranscribed Librispeech data followed by supervised adaptation with LFMMI on three different datasets. Our results show that fine-tuning with LFMMI, we consistently obtain relative WER improvements of 10% and 35.3% on the c…

    Submitted 6 April, 2021; v1 submitted 28 December, 2020; originally announced December 2020.

  15. arXiv:2012.03506

    cs.LG cs.AI cs.SI

    Dynamic Structure Learning through Graph Neural Network for Forecasting Soil Moisture in Precision Agriculture

    Authors: Anoushka Vyas, Sambaran Bandyopadhyay

    Abstract: Soil moisture is an important component of precision agriculture as it directly impacts the growth and quality of vegetation. Forecasting soil moisture is essential to schedule the irrigation and optimize the use of water. Physics based soil moisture models need rich features and heavy computation which is not scalable. In recent literature, conventional machine learning models have been applied f…

    Submitted 16 May, 2022; v1 submitted 7 December, 2020; originally announced December 2020.

    Comments: Accepted for publication in IJCAI 2022

  16. arXiv:2011.11972

    cs.RO

    Foundations of the Socio-physical Model of Activities (SOMA) for Autonomous Robotic Agents

    Authors: Daniel Beßler, Robert Porzel, Mihai Pomarlan, Abhijit Vyas, Sebastian Höffner, Michael Beetz, Rainer Malaka, John Bateman

    Abstract: In this paper, we present foundations of the Socio-physical Model of Activities (SOMA). SOMA represents both the physical as well as the social context of everyday activities. Such tasks seem to be trivial for humans, however, they pose severe problems for artificial agents. For starters, a natural language command requesting something will leave many pieces of information necessary for performing…

    Submitted 24 November, 2020; originally announced November 2020.

  17. arXiv:2010.03466

    eess.AS cs.SD

    Pkwrap: a PyTorch Package for LF-MMI Training of Acoustic Models

    Authors: Srikanth Madikeri, Sibo Tong, Juan Zuluaga-Gomez, Apoorv Vyas, Petr Motlicek, Hervé Bourlard

    Abstract: We present a simple wrapper that is useful to train acoustic models in PyTorch using Kaldi's LF-MMI training framework. The wrapper, called pkwrap (short form of PyTorch kaldi wrapper), enables the user to utilize the flexibility provided by PyTorch in designing model architectures. It exposes the LF-MMI cost function as an autograd function. Other capabilities of Kaldi have also been ported to Py…

    Submitted 7 October, 2020; originally announced October 2020.

  18. arXiv:2010.01200

    cs.NE

    FPGA Implementation of Simplified Spiking Neural Network

    Authors: Shikhar Gupta, Arpan Vyas, Gaurav Trivedi

    Abstract: Spiking Neural Networks (SNN) are third-generation Artificial Neural Networks (ANN) which are close to the biological neural system. In recent years SNN has become popular in the area of robotics and embedded applications, therefore, it has become imperative to explore its real-time and energy-efficient implementations. SNNs are more powerful than their predecessors because they encode temporal in…

    Submitted 2 October, 2020; originally announced October 2020.

  19. arXiv:2007.04825

    cs.LG stat.ML

    Fast Transformers with Clustered Attention

    Authors: Apoorv Vyas, Angelos Katharopoulos, François Fleuret

    Abstract: Transformers have been proven a successful model for a variety of tasks in sequence modeling. However, computing the attention matrix, which is their key component, has quadratic complexity with respect to the sequence length, thus making them prohibitively expensive for large sequences. To address this, we propose clustered attention, which instead of computing the attention for every query, grou…

    Submitted 29 September, 2020; v1 submitted 9 July, 2020; originally announced July 2020.

  20. arXiv:2007.02871

    cs.CL

    DART: Open-Domain Structured Data Record to Text Generation

    Authors: Linyong Nan, Dragomir Radev, Rui Zhang, Amrit Rau, Abhinand Sivaprasad, Chiachun Hsieh, Xiangru Tang, Aadit Vyas, Neha Verma, Pranav Krishna, Yangxiaokang Liu, Nadia Irwanto, Jessica Pan, Faiaz Rahman, Ahmad Zaidi, Mutethia Mutuma, Yasin Tarabar, Ankit Gupta, Tao Yu, Yi Chern Tan, Xi Victoria Lin, Caiming Xiong, Richard Socher, Nazneen Fatema Rajani

    Abstract: We present DART, an open domain structured DAta Record to Text generation dataset with over 82k instances (DARTs). Data-to-Text annotations can be a costly process, especially when dealing with tables which are the major source of structured data and contain nontrivial structures. To this end, we propose a procedure of extracting semantic triples from tables that encodes their structures by exploi…

    Submitted 12 April, 2021; v1 submitted 6 July, 2020; originally announced July 2020.

    Comments: NAACL 2021

  21. arXiv:2006.16236

    cs.LG stat.ML

    Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention

    Authors: Angelos Katharopoulos, Apoorv Vyas, Nikolaos Pappas, François Fleuret

    Abstract: Transformers achieve remarkable performance in several tasks but due to their quadratic complexity, with respect to the input's length, they are prohibitively slow for very long sequences. To address this limitation, we express the self-attention as a linear dot-product of kernel feature maps and make use of the associativity property of matrix products to reduce the complexity from…

    Submitted 31 August, 2020; v1 submitted 29 June, 2020; originally announced June 2020.

    Comments: ICML 2020, project at https://linear-transformers.com/

  22. arXiv:2006.03316

    cs.IR cs.SI

    Gandhipedia: A one-stop AI-enabled portal for browsing Gandhian literature, life-events and his social network

    Authors: Sayantan Adak, Atharva Vyas, Animesh Mukherjee, Heer Ambavi, Pritam Kadasi, Mayank Singh, Shivam Patel

    Abstract: We introduce an AI-enabled portal that presents an excellent visualization of Mahatma Gandhi's life events by constructing temporal and spatial social networks from the Gandhian literature. Applying an ensemble of methods drawn from NLTK, Polyglot and Spacy we extract the key persons and places that find mentions in Gandhi's written works. We visualize these entities and connections between them b…

    Submitted 5 June, 2020; originally announced June 2020.

  23. arXiv:2005.00730

    cs.CL cs.LG

    ESPRIT: Explaining Solutions to Physical Reasoning Tasks

    Authors: Nazneen Fatema Rajani, Rui Zhang, Yi Chern Tan, Stephan Zheng, Jeremy Weiss, Aadit Vyas, Abhijit Gupta, Caiming Xiong, Richard Socher, Dragomir Radev

    Abstract: Neural networks lack the ability to reason about qualitative physics and so cannot generalize to scenarios and tasks unseen during training. We propose ESPRIT, a framework for commonsense reasoning about qualitative physics in natural language that generates interpretable descriptions of physical events. We use a two-step approach of first identifying the pivotal physical events in an environment…

    Submitted 13 May, 2020; v1 submitted 2 May, 2020; originally announced May 2020.

    Comments: ACL 2020

  24. arXiv:1912.06214

    cs.CL

    Encoding Knowledge Graph Entity Aliases in Attentive Neural Network for Wikidata Entity Linking

    Authors: Isaiah Onando Mulang, Kuldeep Singh, Akhilesh Vyas, Saeedeh Shekarpour, Maria Esther Vidal, Jens Lehmann, Soren Auer

    Abstract: The collaborative knowledge graphs such as Wikidata excessively rely on the crowd to author the information. Since the crowd is not bound to a standard protocol for assigning entity titles, the knowledge graph is populated by non-standard, noisy, long or even sometimes awkward titles. The issue of long, implicit, and nonstandard entity representations is a challenge in Entity Linking (EL) approach…

    Submitted 26 September, 2020; v1 submitted 12 December, 2019; originally announced December 2019.

    Comments: 15 pages

    Journal ref: WISE 2020 (21st International Conference on Web Information Systems Engineering)

  25. Security, Privacy and Safety Risk Assessment for Virtual Reality Learning Environment Applications

    Authors: Aniket Gulhane, Akhil Vyas, Reshmi Mitra, Roland Oruche, Gabriela Hoefer, Samaikya Valluripally, Prasad Calyam, Khaza Anuarul Hoque

    Abstract: Social Virtual Reality based Learning Environments (VRLEs) such as vSocial render instructional content in a three-dimensional immersive computer experience for training youth with learning impediments. There are limited prior works that explored attack vulnerability in VR technology, and hence there is a need for systematic frameworks to quantify risks corresponding to security, privacy, and safe…

    Submitted 29 November, 2018; originally announced November 2018.

    Comments: To appear in the CCNC 2019 Conference

  26. arXiv:1809.03576

    cs.LG cs.CV stat.ML

    Out-of-Distribution Detection Using an Ensemble of Self Supervised Leave-out Classifiers

    Authors: Apoorv Vyas, Nataraj Jammalamadaka, Xia Zhu, Dipankar Das, Bharat Kaul, Theodore L. Willke

    Abstract: As deep learning methods form a critical part in commercially important applications such as autonomous driving and medical diagnostics, it is important to reliably detect out-of-distribution (OOD) inputs while employing these algorithms. In this work, we propose an OOD detection algorithm which comprises an ensemble of classifiers. We train each classifier in a self-supervised manner by leavin…

    Submitted 4 September, 2018; originally announced September 2018.

  27. arXiv:1803.07288   

    cs.CV

    Face Recognition Techniques: A Survey

    Authors: Raunak Dave, Ankit Vyas, Nikita P Desai

    Abstract: Nowadays, research has expanded to extracting auxiliary information from various biometric techniques like fingerprints, face, iris, palm and voice. This information contains some major features like gender, age, beard, mustache, scars, height, hair, skin color, glasses, weight, facial marks and tattoos. All this information contributes strongly to the identification of humans. The major challenges tha…

    Submitted 30 January, 2021; v1 submitted 20 March, 2018; originally announced March 2018.

    Comments: Work in progress

  28. arXiv:1802.04538

    cs.DL cs.IR

    Automated Early Leaderboard Generation From Comparative Tables

    Authors: Mayank Singh, Rajdeep Sarkar, Atharva Vyas, Pawan Goyal, Animesh Mukherjee, Soumen Chakrabarti

    Abstract: A leaderboard is a tabular presentation of performance scores of the best competing techniques that address a specific scientific problem. Manually maintained leaderboards take time to emerge, which induces a latency in performance discovery and meaningful comparison. This can delay dissemination of best practices to non-experts and practitioners. Regarding papers as proxies for techniques, we pre…

    Submitted 19 February, 2019; v1 submitted 13 February, 2018; originally announced February 2018.

    Comments: Accepted at ECIR 2019

  29. arXiv:1708.03951

    stat.ML cs.AI q-bio.QM

    Optimization of Ensemble Supervised Learning Algorithms for Increased Sensitivity, Specificity, and AUC of Population-Based Colorectal Cancer Screenings

    Authors: Anirudh Kamath, Aditya Singh, Raj Ramnani, Ayush Vyas, Jay Shenoy

    Abstract: Over 150,000 new people in the United States are diagnosed with colorectal cancer each year. Nearly a third die from it (American Cancer Society). The only approved noninvasive diagnosis tools currently involve fecal blood count tests (FOBTs) or stool DNA tests. Fecal blood count tests take only five minutes and are available over the counter for as low as …

    Submitted 14 August, 2017; v1 submitted 13 August, 2017; originally announced August 2017.

    Comments: 7 pages, 3 figures

  30. arXiv:1202.6677

    cs.DB

    Trajectory and Policy Aware Sender Anonymity in Location Based Services

    Authors: Alin Deutsch, Richard Hull, Avinash Vyas, Kevin Keliang Zhao

    Abstract: We consider Location-based Service (LBS) settings, where a LBS provider logs the requests sent by mobile device users over a period of time and later wants to publish/share these logs. Log sharing can be extremely valuable for advertising, data mining research and network management, but it poses a serious threat to the privacy of LBS users. Sender anonymity solutions prevent a malicious attacker…

    Submitted 29 February, 2012; originally announced February 2012.