Showing 1–21 of 21 results for author: Diamos, G

  1. arXiv:2406.17642  [pdf, other]

    cs.CL cs.AI

    Banishing LLM Hallucinations Requires Rethinking Generalization

    Authors: Johnny Li, Saksham Consul, Eda Zhou, James Wong, Naila Farooqui, Yuxin Ye, Nithyashree Manohar, Zhuxiaona Wei, Tian Wu, Ben Echols, Sharon Zhou, Gregory Diamos

    Abstract: Despite their powerful chat, coding, and reasoning abilities, Large Language Models (LLMs) frequently hallucinate. Conventional wisdom suggests that hallucinations are a consequence of a balance between creativity and factuality, which can be mitigated, but not eliminated, by grounding the LLM in external knowledge sources. Through extensive systematic experiments, we show that these traditional a…

    Submitted 25 June, 2024; originally announced June 2024.

  2. arXiv:2207.10062  [pdf, other]

    cs.LG

    DataPerf: Benchmarks for Data-Centric AI Development

    Authors: Mark Mazumder, Colby Banbury, Xiaozhe Yao, Bojan Karlaš, William Gaviria Rojas, Sudnya Diamos, Greg Diamos, Lynn He, Alicia Parrish, Hannah Rose Kirk, Jessica Quaye, Charvi Rastogi, Douwe Kiela, David Jurado, David Kanter, Rafael Mosquera, Juan Ciro, Lora Aroyo, Bilge Acun, Lingjiao Chen, Mehul Smriti Raje, Max Bartolo, Sabri Eyuboglu, Amirata Ghorbani, Emmett Goodman, et al. (20 additional authors not shown)

    Abstract: Machine learning research has long focused on models rather than datasets, and prominent datasets are used for common ML tasks without regard to the breadth, difficulty, and faithfulness of the underlying problems. Neglecting the fundamental importance of data has given rise to inaccuracy, bias, and fragility in real-world applications, and research is hindered by saturation across existing datase…

    Submitted 13 October, 2023; v1 submitted 20 July, 2022; originally announced July 2022.

    Comments: NeurIPS 2023 Datasets and Benchmarks Track

  3. arXiv:2111.09344  [pdf, other]

    cs.LG stat.ML

    The People's Speech: A Large-Scale Diverse English Speech Recognition Dataset for Commercial Usage

    Authors: Daniel Galvez, Greg Diamos, Juan Ciro, Juan Felipe Cerón, Keith Achorn, Anjali Gopi, David Kanter, Maximilian Lam, Mark Mazumder, Vijay Janapa Reddi

    Abstract: The People's Speech is a free-to-download 30,000-hour and growing supervised conversational English speech recognition dataset licensed for academic and commercial usage under CC-BY-SA (with a CC-BY subset). The data is collected via searching the Internet for appropriately licensed audio data with existing transcriptions. We describe our data collection methodology and release our data collection…

    Submitted 17 November, 2021; originally announced November 2021.

    Comments: Part of 2021 Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks

  4. arXiv:2110.01406  [pdf]

    cs.LG cs.DC cs.PF cs.SE

    MedPerf: Open Benchmarking Platform for Medical Artificial Intelligence using Federated Evaluation

    Authors: Alexandros Karargyris, Renato Umeton, Micah J. Sheller, Alejandro Aristizabal, Johnu George, Srini Bala, Daniel J. Beutel, Victor Bittorf, Akshay Chaudhari, Alexander Chowdhury, Cody Coleman, Bala Desinghu, Gregory Diamos, Debo Dutta, Diane Feddema, Grigori Fursin, Junyi Guo, Xinyuan Huang, David Kanter, Satyananda Kashyap, Nicholas Lane, Indranil Mallick, Pietro Mascagni, Virendra Mehta, Vivek Natarajan, et al. (17 additional authors not shown)

    Abstract: Medical AI has tremendous potential to advance healthcare by supporting the evidence-based practice of medicine, personalizing patient treatment, reducing costs, and improving provider and patient experience. We argue that unlocking this potential requires a systematic way to measure the performance of medical AI models on large-scale heterogeneous data. To meet this need, we are building MedPerf,…

    Submitted 28 December, 2021; v1 submitted 29 September, 2021; originally announced October 2021.

  5. arXiv:2102.11447  [pdf, other]

    cs.LG

    Data Engineering for Everyone

    Authors: Vijay Janapa Reddi, Greg Diamos, Pete Warden, Peter Mattson, David Kanter

    Abstract: Data engineering is one of the fastest-growing fields within machine learning (ML). As ML becomes more common, the appetite for data grows more ravenous. But ML requires more data than individual teams of data engineers can readily produce, which presents a severe challenge to ML deployment at scale. Much like the software-engineering revolution, where mass adoption of open-source software replace…

    Submitted 22 February, 2021; originally announced February 2021.

  6. arXiv:1911.02549  [pdf, other]

    cs.LG cs.PF stat.ML

    MLPerf Inference Benchmark

    Authors: Vijay Janapa Reddi, Christine Cheng, David Kanter, Peter Mattson, Guenther Schmuelling, Carole-Jean Wu, Brian Anderson, Maximilien Breughe, Mark Charlebois, William Chou, Ramesh Chukka, Cody Coleman, Sam Davis, Pan Deng, Greg Diamos, Jared Duke, Dave Fick, J. Scott Gardner, Itay Hubara, Sachin Idgunji, Thomas B. Jablin, Jeff Jiao, Tom St. John, Pankaj Kanwar, David Lee, et al. (22 additional authors not shown)

    Abstract: Machine-learning (ML) hardware and software system demand is burgeoning. Driven by ML applications, the number of different ML inference systems has exploded. Over 100 organizations are building ML inference chips, and the systems that incorporate existing models span at least three orders of magnitude in power consumption and five orders of magnitude in performance; they range from embedded devic…

    Submitted 9 May, 2020; v1 submitted 6 November, 2019; originally announced November 2019.

    Comments: ISCA 2020

  7. arXiv:1910.01500  [pdf, other]

    cs.LG cs.PF stat.ML

    MLPerf Training Benchmark

    Authors: Peter Mattson, Christine Cheng, Cody Coleman, Greg Diamos, Paulius Micikevicius, David Patterson, Hanlin Tang, Gu-Yeon Wei, Peter Bailis, Victor Bittorf, David Brooks, Dehao Chen, Debojyoti Dutta, Udit Gupta, Kim Hazelwood, Andrew Hock, Xinyuan Huang, Atsushi Ike, Bill Jia, Daniel Kang, David Kanter, Naveen Kumar, Jeffery Liao, Guokai Ma, Deepak Narayanan, et al. (12 additional authors not shown)

    Abstract: Machine learning (ML) needs industry-standard performance benchmarks to support design and competitive evaluation of the many emerging software and hardware solutions for ML. But ML training presents three unique benchmarking challenges absent from other domains: optimizations that improve training throughput can increase the time to solution, training is stochastic and time to solution exhibits h…

    Submitted 2 March, 2020; v1 submitted 2 October, 2019; originally announced October 2019.

    Comments: MLSys 2020

  8. arXiv:1909.01736  [pdf, other]

    cs.LG

    Beyond Human-Level Accuracy: Computational Challenges in Deep Learning

    Authors: Joel Hestness, Newsha Ardalani, Greg Diamos

    Abstract: Deep learning (DL) research yields accuracy and product improvements from both model architecture changes and scale: larger data sets and models, and more computation. For hardware design, it is difficult to predict DL model changes. However, recent prior work shows that as dataset sizes grow, DL model accuracy and model size grow predictably. This paper leverages the prior work to project the dat…

    Submitted 3 September, 2019; originally announced September 2019.

  9. arXiv:1907.04648  [pdf, other]

    cs.LG

    EPNAS: Efficient Progressive Neural Architecture Search

    Authors: Yanqi Zhou, Peng Wang, Sercan Arik, Haonan Yu, Syed Zawad, Feng Yan, Greg Diamos

    Abstract: In this paper, we propose Efficient Progressive Neural Architecture Search (EPNAS), a neural architecture search (NAS) that efficiently handles large search space through a novel progressive search policy with performance prediction based on REINFORCE (Williams, 1992). EPNAS is designed to search target networks in parallel, which is more scalable on parallel systems such as GPU/TPU cluster…

    Submitted 6 July, 2019; originally announced July 2019.

  10. arXiv:1902.10162  [pdf, other]

    cs.AI cs.DM cs.LG

    Coloring Big Graphs with AlphaGoZero

    Authors: Jiayi Huang, Mostofa Patwary, Gregory Diamos

    Abstract: We show that recent innovations in deep reinforcement learning can effectively color very large graphs -- a well-known NP-hard problem with clear commercial applications. Because the Monte Carlo Tree Search with Upper Confidence Bound algorithm used in AlphaGoZero can improve the performance of a given heuristic, our approach allows deep neural networks trained using high performance computing (HP…

    Submitted 8 November, 2019; v1 submitted 26 February, 2019; originally announced February 2019.

  11. arXiv:1810.10045  [pdf, other]

    cs.CL

    Language Modeling at Scale

    Authors: Mostofa Patwary, Milind Chabbi, Heewoo Jun, Jiaji Huang, Gregory Diamos, Kenneth Church

    Abstract: We show how Zipf's Law can be used to scale up language modeling (LM) to take advantage of more training data and more GPUs. LM plays a key role in many important natural language applications such as speech recognition and machine translation. Scaling up LM is important since it is widely accepted by the community that there is no data like more data. Eventually, we would like to train on terabyt…

    Submitted 23 October, 2018; originally announced October 2018.

  12. arXiv:1808.06719  [pdf, other]

    cs.SD cs.LG eess.AS

    Fast Spectrogram Inversion using Multi-head Convolutional Neural Networks

    Authors: Sercan O. Arik, Heewoo Jun, Gregory Diamos

    Abstract: We propose the multi-head convolutional neural network (MCNN) architecture for waveform synthesis from spectrograms. Nonlinear interpolation in MCNN is employed with transposed convolution layers in parallel heads. MCNN achieves more than an order of magnitude higher compute intensity than commonly-used iterative algorithms like Griffin-Lim, yielding efficient utilization for modern multi-core pro…

    Submitted 5 November, 2018; v1 submitted 20 August, 2018; originally announced August 2018.

  13. arXiv:1806.07912  [pdf, other]

    cs.NE cs.AI

    Resource-Efficient Neural Architect

    Authors: Yanqi Zhou, Siavash Ebrahimi, Sercan Ö. Arık, Haonan Yu, Hairong Liu, Greg Diamos

    Abstract: Neural Architecture Search (NAS) is a laborious process. Prior work on automated NAS targets mainly on improving accuracy, but lacks consideration of computational resource use. We propose the Resource-Efficient Neural Architect (RENA), an efficient resource-constrained NAS using reinforcement learning with network embedding. RENA uses a policy network to process the network embeddings to generate…

    Submitted 12 June, 2018; originally announced June 2018.

  14. arXiv:1712.00409  [pdf, other]

    cs.LG stat.ML

    Deep Learning Scaling is Predictable, Empirically

    Authors: Joel Hestness, Sharan Narang, Newsha Ardalani, Gregory Diamos, Heewoo Jun, Hassan Kianinejad, Md. Mostofa Ali Patwary, Yang Yang, Yanqi Zhou

    Abstract: Deep learning (DL) creates impactful advances following a virtuous recipe: model architecture search, creating large training data sets, and scaling computation. It is widely believed that growing training sets and models should improve accuracy and result in better products. As DL application domains grow, we would like a deeper understanding of the relationships between training set size, comput…

    Submitted 1 December, 2017; originally announced December 2017.

    Comments: 19 pages, 11 figures

  15. arXiv:1711.02782  [pdf, other]

    cs.LG cs.AI stat.ML

    Block-Sparse Recurrent Neural Networks

    Authors: Sharan Narang, Eric Undersander, Gregory Diamos

    Abstract: Recurrent Neural Networks (RNNs) are used in state-of-the-art models in domains such as speech recognition, machine translation, and language modelling. Sparsity is a technique to reduce compute and memory requirements of deep learning models. Sparse RNNs are easier to deploy on devices and high-end server processors. Even though sparse operations need less compute and memory relative to their den…

    Submitted 7 November, 2017; originally announced November 2017.

  16. arXiv:1710.03740  [pdf, other]

    cs.AI cs.LG stat.ML

    Mixed Precision Training

    Authors: Paulius Micikevicius, Sharan Narang, Jonah Alben, Gregory Diamos, Erich Elsen, David Garcia, Boris Ginsburg, Michael Houston, Oleksii Kuchaiev, Ganesh Venkatesh, Hao Wu

    Abstract: Deep neural networks have enabled progress in a wide variety of applications. Growing the size of the neural network typically results in improved accuracy. As model sizes grow, the memory and compute requirements for training these models also increase. We introduce a technique to train deep neural networks using half precision floating point numbers. In our technique, weights, activations and g…

    Submitted 15 February, 2018; v1 submitted 10 October, 2017; originally announced October 2017.

    Comments: Published as a conference paper at ICLR 2018

  17. arXiv:1705.08947  [pdf, other]

    cs.CL

    Deep Voice 2: Multi-Speaker Neural Text-to-Speech

    Authors: Sercan Arik, Gregory Diamos, Andrew Gibiansky, John Miller, Kainan Peng, Wei Ping, Jonathan Raiman, Yanqi Zhou

    Abstract: We introduce a technique for augmenting neural text-to-speech (TTS) with low-dimensional trainable speaker embeddings to generate different voices from a single model. As a starting point, we show improvements over the two state-of-the-art approaches for single-speaker neural TTS: Deep Voice 1 and Tacotron. We introduce Deep Voice 2, which is based on a similar pipeline with Deep Voice 1, but constr…

    Submitted 20 September, 2017; v1 submitted 24 May, 2017; originally announced May 2017.

    Comments: Accepted in NIPS 2017

  18. arXiv:1704.05119  [pdf, other]

    cs.LG cs.CL

    Exploring Sparsity in Recurrent Neural Networks

    Authors: Sharan Narang, Erich Elsen, Gregory Diamos, Shubho Sengupta

    Abstract: Recurrent Neural Networks (RNN) are widely used to solve a variety of problems and as the quantity of data and the amount of available compute have increased, so have model sizes. The number of parameters in recent state-of-the-art networks makes them hard to deploy, especially on mobile phones and embedded devices. The challenge is due to both the size of the model and the time it takes to evalua…

    Submitted 6 November, 2017; v1 submitted 17 April, 2017; originally announced April 2017.

    Comments: Published as a conference paper at ICLR 2017

  19. arXiv:1702.07825  [pdf, other]

    cs.CL cs.LG cs.NE cs.SD

    Deep Voice: Real-time Neural Text-to-Speech

    Authors: Sercan O. Arik, Mike Chrzanowski, Adam Coates, Gregory Diamos, Andrew Gibiansky, Yongguo Kang, Xian Li, John Miller, Andrew Ng, Jonathan Raiman, Shubho Sengupta, Mohammad Shoeybi

    Abstract: We present Deep Voice, a production-quality text-to-speech system constructed entirely from deep neural networks. Deep Voice lays the groundwork for truly end-to-end neural speech synthesis. The system comprises five major building blocks: a segmentation model for locating phoneme boundaries, a grapheme-to-phoneme conversion model, a phoneme duration prediction model, a fundamental frequency predi…

    Submitted 7 March, 2017; v1 submitted 24 February, 2017; originally announced February 2017.

    Comments: Submitted to ICML 2017

  20. arXiv:1512.02595  [pdf, other]

    cs.CL

    Deep Speech 2: End-to-End Speech Recognition in English and Mandarin

    Authors: Dario Amodei, Rishita Anubhai, Eric Battenberg, Carl Case, Jared Casper, Bryan Catanzaro, Jingdong Chen, Mike Chrzanowski, Adam Coates, Greg Diamos, Erich Elsen, Jesse Engel, Linxi Fan, Christopher Fougner, Tony Han, Awni Hannun, Billy Jun, Patrick LeGresley, Libby Lin, Sharan Narang, Andrew Ng, Sherjil Ozair, Ryan Prenger, Jonathan Raiman, Sanjeev Satheesh, et al. (9 additional authors not shown)

    Abstract: We show that an end-to-end deep learning approach can be used to recognize either English or Mandarin Chinese speech--two vastly different languages. Because it replaces entire pipelines of hand-engineered components with neural networks, end-to-end learning allows us to handle a diverse variety of speech including noisy environments, accents and different languages. Key to our approach is our app…

    Submitted 8 December, 2015; originally announced December 2015.

  21. arXiv:1412.5567  [pdf, other]

    cs.CL cs.LG cs.NE

    Deep Speech: Scaling up end-to-end speech recognition

    Authors: Awni Hannun, Carl Case, Jared Casper, Bryan Catanzaro, Greg Diamos, Erich Elsen, Ryan Prenger, Sanjeev Satheesh, Shubho Sengupta, Adam Coates, Andrew Y. Ng

    Abstract: We present a state-of-the-art speech recognition system developed using end-to-end deep learning. Our architecture is significantly simpler than traditional speech systems, which rely on laboriously engineered processing pipelines; these traditional systems also tend to perform poorly when used in noisy environments. In contrast, our system does not need hand-designed components to model backgroun…

    Submitted 19 December, 2014; v1 submitted 17 December, 2014; originally announced December 2014.