Skip to main content

Showing 1–25 of 25 results for author: Gulati, A

Searching in archive cs. Search in all archives.
.
  1. arXiv:2406.06555  [pdf, other

    cs.LG cs.AI cs.CL cs.PL

    An Evaluation Benchmark for Autoformalization in Lean4

    Authors: Aryan Gulati, Devanshu Ladsaria, Shubhra Mishra, Jasdeep Sidhu, Brando Miranda

    Abstract: Large Language Models (LLMs) hold the potential to revolutionize autoformalization. The introduction of Lean4, a mathematical programming language, presents an unprecedented opportunity to rigorously assess the autoformalization capabilities of LLMs. This paper introduces a novel evaluation benchmark designed for Lean4, applying it to test the abilities of state-of-the-art LLMs, including GPT-3.5,… ▽ More

    Submitted 1 June, 2024; originally announced June 2024.

    Comments: To appear at ICLR 2024 as part of the Tiny Papers track

  2. arXiv:2404.05227  [pdf, ps, other

    quant-ph cs.CR

    A Note on the Common Haar State Model

    Authors: Prabhanjan Ananth, Aditya Gulati, Yao-Ting Lin

    Abstract: Common random string model is a popular model in classical cryptography with many constructions proposed in this model. We study a quantum analogue of this model called the common Haar state model, which was also studied in an independent work by Chen, Coladangelo and Sattath (arXiv 2024). In this model, every party in the cryptographic system receives many copies of one or more i.i.d Haar states.… ▽ More

    Submitted 8 April, 2024; originally announced April 2024.

    Comments: 17 pages, 1 figure. arXiv admin note: text overlap with arXiv:2311.18566 by other authors

  3. arXiv:2403.05530  [pdf, other

    cs.CL cs.AI

    Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

    Authors: Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love , et al. (1092 additional authors not shown)

    Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February… ▽ More

    Submitted 14 June, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

  4. arXiv:2402.18032  [pdf, other

    cs.CV

    Human Shape and Clothing Estimation

    Authors: Aayush Gupta, Aditya Gulati, Himanshu, Lakshya LNU

    Abstract: Human shape and clothing estimation has gained significant prominence in various domains, including online shop**, fashion retail, augmented reality (AR), virtual reality (VR), and gaming. The visual representation of human shape and clothing has become a focal point for computer vision researchers in recent years. This paper presents a comprehensive survey of the major works in the field, focus… ▽ More

    Submitted 27 February, 2024; originally announced February 2024.

  5. arXiv:2312.11805  [pdf, other

    cs.CL cs.AI cs.CV

    Gemini: A Family of Highly Capable Multimodal Models

    Authors: Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Millican, David Silver, Melvin Johnson, Ioannis Antonoglou, Julian Schrittwieser, Amelia Glaese, Jilin Chen, Emily Pitler, Timothy Lillicrap, Angeliki Lazaridou, Orhan Firat, James Molloy, Michael Isard, Paul R. Barham, Tom Hennigan, Benjamin Lee , et al. (1325 additional authors not shown)

    Abstract: This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultr… ▽ More

    Submitted 17 June, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

  6. arXiv:2311.02901  [pdf, other

    quant-ph cs.CC cs.CR

    Pseudorandom Isometries

    Authors: Prabhanjan Ananth, Aditya Gulati, Fatih Kaleoglu, Yao-Ting Lin

    Abstract: We introduce a new notion called ${\cal Q}$-secure pseudorandom isometries (PRI). A pseudorandom isometry is an efficient quantum circuit that maps an $n$-qubit state to an $(n+m)$-qubit state in an isometric manner. In terms of security, we require that the output of a $q$-fold PRI on $ρ$, for $ ρ\in {\cal Q}$, for any polynomial $q$, should be computationally indistinguishable from the output of… ▽ More

    Submitted 10 November, 2023; v1 submitted 6 November, 2023; originally announced November 2023.

  7. arXiv:2304.06277  [pdf

    cs.LG cs.AI cs.CV

    Optimizing Multi-Domain Performance with Active Learning-based Improvement Strategies

    Authors: Anand Gokul Mahalingam, Aayush Shah, Akshay Gulati, Royston Mascarenhas, Rakshitha Panduranga

    Abstract: Improving performance in multiple domains is a challenging task, and often requires significant amounts of data to train and test models. Active learning techniques provide a promising solution by enabling models to select the most informative samples for labeling, thus reducing the amount of labeled data required to achieve high performance. In this paper, we present an active learning-based fram… ▽ More

    Submitted 13 April, 2023; originally announced April 2023.

    Comments: 13 pages, 20 figures, draft work previously published as a medium story

  8. arXiv:2304.00171  [pdf, other

    cs.CL cs.SD eess.AS

    Practical Conformer: Optimizing size, speed and flops of Conformer for on-Device and cloud ASR

    Authors: Rami Botros, Anmol Gulati, Tara N. Sainath, Krzysztof Choromanski, Ruoming Pang, Trevor Strohman, Weiran Wang, Jiahui Yu

    Abstract: Conformer models maintain a large number of internal states, the vast majority of which are associated with self-attention layers. With limited memory bandwidth, reading these from memory at each inference step can slow down inference. In this paper, we design an optimized conformer that is small enough to meet on-device restrictions and has fast inference on TPUs. We explore various ideas to impr… ▽ More

    Submitted 31 March, 2023; originally announced April 2023.

  9. arXiv:2211.01444  [pdf, other

    quant-ph cs.CC cs.CR

    Pseudorandom (Function-Like) Quantum State Generators: New Definitions and Applications

    Authors: Prabhanjan Ananth, Aditya Gulati, Luowen Qian, Henry Yuen

    Abstract: Pseudorandom quantum states (PRS) are efficiently constructible states that are computationally indistinguishable from being Haar-random, and have recently found cryptographic applications. We explore new definitions, new properties and applications of pseudorandom states, and present the following contributions: 1. New Definitions: We study variants of pseudorandom function-like state (PRFS) ge… ▽ More

    Submitted 9 June, 2023; v1 submitted 2 November, 2022; originally announced November 2022.

  10. arXiv:2210.01122  [pdf, other

    cs.HC cs.AI

    BIASeD: Bringing Irrationality into Automated System Design

    Authors: Aditya Gulati, Miguel Angel Lozano, Bruno Lepri, Nuria Oliver

    Abstract: Human perception, memory and decision-making are impacted by tens of cognitive biases and heuristics that influence our actions and decisions. Despite the pervasiveness of such biases, they are generally not leveraged by today's Artificial Intelligence (AI) systems that model human behavior and interact with humans. In this theoretical paper, we claim that the future of human-machine collaboration… ▽ More

    Submitted 1 December, 2023; v1 submitted 30 September, 2022; originally announced October 2022.

    Comments: 14 pages, 1 figure; Accepted for presentation at the AAAI Fall Symposium 2022 on Thinking Fast and Slow and Other Cognitive Theories in AI. Corrected typos; v3: Updated figure 1, added table 4

  11. arXiv:2207.02107  [pdf, other

    cs.MA

    EasyABM: a lightweight and easy to use heterogeneous agent-based modelling tool written in Julia

    Authors: Renu Solanki, Monisha Khanna, Shailly Anand, Anita Gulati, Prateek Kumar, Munendra Kumar, Dushyant Kumar

    Abstract: Agent based modelling is a computational approach that aims to understand the behaviour of complex systems through simplified interactions of programmable objects in computer memory called agents. Agent based models (ABMs) are predominantly used in fields of biology, ecology, social sciences and economics where the systems of interest often consist of several interacting entities. In this work, we… ▽ More

    Submitted 5 July, 2022; originally announced July 2022.

    Comments: 18 pages, 7 figures

  12. arXiv:2110.10329  [pdf, other

    cs.CL cs.LG

    SLAM: A Unified Encoder for Speech and Language Modeling via Speech-Text Joint Pre-Training

    Authors: Ankur Bapna, Yu-an Chung, Nan Wu, Anmol Gulati, Ye Jia, Jonathan H. Clark, Melvin Johnson, Jason Riesa, Alexis Conneau, Yu Zhang

    Abstract: Unsupervised pre-training is now the predominant approach for both text and speech understanding. Self-attention models pre-trained on large amounts of unannotated data have been hugely successful when fine-tuned on downstream tasks from a variety of domains and languages. This paper takes the universality of unsupervised language pre-training one step further, by unifying speech and text pre-trai… ▽ More

    Submitted 19 October, 2021; originally announced October 2021.

  13. arXiv:2109.13226  [pdf, other

    eess.AS cs.CL cs.LG cs.SD

    BigSSL: Exploring the Frontier of Large-Scale Semi-Supervised Learning for Automatic Speech Recognition

    Authors: Yu Zhang, Daniel S. Park, Wei Han, James Qin, Anmol Gulati, Joel Shor, Aren Jansen, Yuanzhong Xu, Yan** Huang, Shibo Wang, Zongwei Zhou, Bo Li, Min Ma, William Chan, Jiahui Yu, Yongqiang Wang, Liangliang Cao, Khe Chai Sim, Bhuvana Ramabhadran, Tara N. Sainath, Françoise Beaufays, Zhifeng Chen, Quoc V. Le, Chung-Cheng Chiu, Ruoming Pang , et al. (1 additional authors not shown)

    Abstract: We summarize the results of a host of efforts using giant automatic speech recognition (ASR) models pre-trained using large, diverse unlabeled datasets containing approximately a million hours of audio. We find that the combination of pre-training, self-training and scaling up model size greatly increases data efficiency, even for extremely large tasks with tens of thousands of hours of labeled da… ▽ More

    Submitted 21 July, 2022; v1 submitted 27 September, 2021; originally announced September 2021.

    Comments: 14 pages, 7 figures, 13 tables; v2: minor corrections, reference baselines and bibliography updated; v3: corrections based on reviewer feedback, bibliography updated

  14. arXiv:2104.14830  [pdf, other

    cs.CL cs.SD eess.AS

    Scaling End-to-End Models for Large-Scale Multilingual ASR

    Authors: Bo Li, Ruoming Pang, Tara N. Sainath, Anmol Gulati, Yu Zhang, James Qin, Parisa Haghani, W. Ronny Huang, Min Ma, Junwen Bai

    Abstract: Building ASR models across many languages is a challenging multi-task learning problem due to large variations and heavily unbalanced data. Existing work has shown positive transfer from high resource to low resource languages. However, degradations on high resource languages are commonly observed due to interference from the heterogeneous multilingual data and reduction in per-language capacity.… ▽ More

    Submitted 11 September, 2021; v1 submitted 30 April, 2021; originally announced April 2021.

    Comments: ASRU 2021

  15. arXiv:2101.06914  [pdf, other

    cs.CY cs.SI

    Capitol (Pat)riots: A comparative study of Twitter and Parler

    Authors: Hitkul, Avinash Prabhu, Dipanwita Guhathakurta, Jivitesh jain, Mallika Subramanian, Manvith Reddy, Shradha Sehgal, Tanvi Karandikar, Amogh Gulati, Udit Arora, Rajiv Ratn Shah, Ponnurangam Kumaraguru

    Abstract: On 6 January 2021, a mob of right-wing conservatives stormed the USA Capitol Hill interrupting the session of congress certifying 2020 Presidential election results. Immediately after the start of the event, posts related to the riots started to trend on social media. A social media platform which stood out was a free speech endorsing social media platform Parler; it is being claimed as the platfo… ▽ More

    Submitted 18 January, 2021; originally announced January 2021.

  16. arXiv:2011.10978  [pdf, ps, other

    math.NT cs.CC

    On algorithms to find p-ordering

    Authors: Aditya Gulati, Sayak Chakrabarti, Rajat Mittal

    Abstract: The concept of p-ordering for a prime p was introduced by Manjul Bhargava (in his PhD thesis) to develop a generalized factorial function over an arbitrary subset of integers. This notion of p-ordering provides a representation of polynomials modulo prime powers, and has been used to prove properties of roots sets modulo prime powers. We focus on the complexity of finding a p-ordering given a prim… ▽ More

    Submitted 22 November, 2020; originally announced November 2020.

    Comments: 26 pages

  17. arXiv:2011.10798  [pdf, other

    eess.AS cs.SD

    A Better and Faster End-to-End Model for Streaming ASR

    Authors: Bo Li, Anmol Gulati, Jiahui Yu, Tara N. Sainath, Chung-Cheng Chiu, Arun Narayanan, Shuo-Yiin Chang, Ruoming Pang, Yanzhang He, James Qin, Wei Han, Qiao Liang, Yu Zhang, Trevor Strohman, Yonghui Wu

    Abstract: End-to-end (E2E) models have shown to outperform state-of-the-art conventional models for streaming speech recognition [1] across many dimensions, including quality (as measured by word error rate (WER)) and endpointer latency [2]. However, the model still tends to delay the predictions towards the end and thus has much higher partial latency compared to a conventional ASR model. To address this i… ▽ More

    Submitted 11 February, 2021; v1 submitted 21 November, 2020; originally announced November 2020.

    Comments: Accepted in ICASSP 2021

  18. Interleaving Fast and Slow Decision Making

    Authors: Aditya Gulati, Sarthak Soni, Shrisha Rao

    Abstract: The "Thinking, Fast and Slow" paradigm of Kahneman proposes that we use two different styles of thinking -- a fast and intuitive System 1 for certain tasks, along with a slower but more analytical System 2 for others. While the idea of using this two-system style of thinking is gaining popularity in AI and robotics, our work considers how to interleave the two styles of decision-making, i.e., how… ▽ More

    Submitted 26 March, 2021; v1 submitted 30 October, 2020; originally announced October 2020.

    Comments: 7 pages, 11 figures; typos corrected, references added

  19. arXiv:2010.11148  [pdf, other

    eess.AS cs.AI cs.CL cs.LG cs.SD

    FastEmit: Low-latency Streaming ASR with Sequence-level Emission Regularization

    Authors: Jiahui Yu, Chung-Cheng Chiu, Bo Li, Shuo-yiin Chang, Tara N. Sainath, Yanzhang He, Arun Narayanan, Wei Han, Anmol Gulati, Yonghui Wu, Ruoming Pang

    Abstract: Streaming automatic speech recognition (ASR) aims to emit each hypothesized word as quickly and accurately as possible. However, emitting fast without degrading quality, as measured by word error rate (WER), is highly challenging. Existing approaches including Early and Late Penalties and Constrained Alignments penalize emission delay by manipulating per-token or per-frame probability prediction i… ▽ More

    Submitted 3 February, 2021; v1 submitted 21 October, 2020; originally announced October 2020.

    Comments: Accepted in ICASSP 2021

  20. arXiv:2010.06030  [pdf, other

    cs.CL cs.AI cs.LG cs.SD eess.AS

    Dual-mode ASR: Unify and Improve Streaming ASR with Full-context Modeling

    Authors: Jiahui Yu, Wei Han, Anmol Gulati, Chung-Cheng Chiu, Bo Li, Tara N. Sainath, Yonghui Wu, Ruoming Pang

    Abstract: Streaming automatic speech recognition (ASR) aims to emit each hypothesized word as quickly and accurately as possible, while full-context ASR waits for the completion of a full speech utterance before emitting completed hypotheses. In this work, we propose a unified framework, Dual-mode ASR, to train a single end-to-end ASR model with shared weights for both streaming and full-context speech reco… ▽ More

    Submitted 27 January, 2021; v1 submitted 12 October, 2020; originally announced October 2020.

    Comments: Accepted in ICLR 2021

  21. arXiv:2009.05566  [pdf, other

    cs.CR cs.LG

    Accelerating 2PC-based ML with Limited Trusted Hardware

    Authors: Muqsit Nawaz, Aditya Gulati, Kunlong Liu, Vishwajeet Agrawal, Prabhanjan Ananth, Trinabh Gupta

    Abstract: This paper describes the design, implementation, and evaluation of Otak, a system that allows two non-colluding cloud providers to run machine learning (ML) inference without knowing the inputs to inference. Prior work for this problem mostly relies on advanced cryptography such as two-party secure computation (2PC) protocols that provide rigorous guarantees but suffer from high resource overhead.… ▽ More

    Submitted 11 September, 2020; originally announced September 2020.

    Comments: 19 pages

  22. arXiv:2005.10627  [pdf, other

    eess.AS cs.CL cs.LG cs.SD

    Dynamic Sparsity Neural Networks for Automatic Speech Recognition

    Authors: Zhaofeng Wu, Ding Zhao, Qiao Liang, Jiahui Yu, Anmol Gulati, Ruoming Pang

    Abstract: In automatic speech recognition (ASR), model pruning is a widely adopted technique that reduces model size and latency to deploy neural network models on edge devices with resource constraints. However, multiple models with different sparsity levels usually need to be separately trained and deployed to heterogeneous target hardware with different resource specifications and for applications that h… ▽ More

    Submitted 8 February, 2021; v1 submitted 16 May, 2020; originally announced May 2020.

    Comments: ICASSP 2021. (c) 2021 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works

  23. arXiv:2005.08100  [pdf, other

    eess.AS cs.LG cs.SD

    Conformer: Convolution-augmented Transformer for Speech Recognition

    Authors: Anmol Gulati, James Qin, Chung-Cheng Chiu, Niki Parmar, Yu Zhang, Jiahui Yu, Wei Han, Shibo Wang, Zhengdong Zhang, Yonghui Wu, Ruoming Pang

    Abstract: Recently Transformer and Convolution neural network (CNN) based models have shown promising results in Automatic Speech Recognition (ASR), outperforming Recurrent neural networks (RNNs). Transformer models are good at capturing content-based global interactions, while CNNs exploit local features effectively. In this work, we achieve the best of both worlds by studying how to combine convolution ne… ▽ More

    Submitted 16 May, 2020; originally announced May 2020.

    Comments: Submitted to Interspeech 2020

  24. arXiv:2005.05513  [pdf, other

    cs.CL cs.CY cs.SI

    Psychometric Analysis and Coupling of Emotions Between State Bulletins and Twitter in India during COVID-19 Infodemic

    Authors: Baani Leen Kaur Jolly, Palash Aggrawal, Amogh Gulati, Amarjit Singh Sethi, Ponnurangam Kumaraguru, Tavpritesh Sethi

    Abstract: COVID-19 infodemic has been spreading faster than the pandemic itself. The misinformation riding upon the infodemic wave poses a major threat to people's health and governance systems. Since social media is the largest source of information, managing the infodemic not only requires mitigating of misinformation but also an early understanding of psychological patterns resulting from it. During the… ▽ More

    Submitted 13 May, 2020; v1 submitted 11 May, 2020; originally announced May 2020.

  25. arXiv:2005.03191  [pdf, other

    eess.AS cs.CL cs.LG cs.SD

    ContextNet: Improving Convolutional Neural Networks for Automatic Speech Recognition with Global Context

    Authors: Wei Han, Zhengdong Zhang, Yu Zhang, Jiahui Yu, Chung-Cheng Chiu, James Qin, Anmol Gulati, Ruoming Pang, Yonghui Wu

    Abstract: Convolutional neural networks (CNN) have shown promising results for end-to-end speech recognition, albeit still behind other state-of-the-art methods in performance. In this paper, we study how to bridge this gap and go beyond with a novel CNN-RNN-transducer architecture, which we call ContextNet. ContextNet features a fully convolutional encoder that incorporates global context information into… ▽ More

    Submitted 15 May, 2020; v1 submitted 6 May, 2020; originally announced May 2020.

    Comments: Submitted to Interspeech 2020