Showing 1–16 of 16 results for author: Noroozi, V

  1. arXiv:2406.12946

    eess.AS cs.AI cs.CL cs.LG

    Instruction Data Generation and Unsupervised Adaptation for Speech Language Models

    Authors: Vahid Noroozi, Zhehuai Chen, Somshubra Majumdar, Steve Huang, Jagadeesh Balam, Boris Ginsburg

    Abstract: In this paper, we propose three methods for generating synthetic samples to train and evaluate multimodal large language models capable of processing both text and speech inputs. Addressing the scarcity of samples containing both modalities, synthetic data generation emerges as a crucial strategy to enhance the performance of such systems and facilitate the modeling of cross-modal relationships be…

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: Accepted for Interspeech 2024

  2. arXiv:2406.11704

    cs.CL cs.AI cs.LG

    Nemotron-4 340B Technical Report

    Authors: Nvidia: Bo Adler, Niket Agarwal, Ashwath Aithal, Dong H. Anh, Pallab Bhattacharya, Annika Brundyn, Jared Casper, Bryan Catanzaro, Sharon Clay, Jonathan Cohen, Sirshak Das, Ayush Dattagupta, Olivier Delalleau, Leon Derczynski, Yi Dong, Daniel Egert, Ellie Evans, Aleksander Ficek, Denys Fridman, Shaona Ghosh, Boris Ginsburg, Igor Gitman, Tomasz Grzegorzek, et al. (58 additional authors not shown)

    Abstract: We release the Nemotron-4 340B model family, including Nemotron-4-340B-Base, Nemotron-4-340B-Instruct, and Nemotron-4-340B-Reward. Our models are open access under the NVIDIA Open Model License Agreement, a permissive model license that allows distribution, modification, and use of the models and its outputs. These models perform competitively to open access models on a wide range of evaluation be…

    Submitted 17 June, 2024; originally announced June 2024.

  3. arXiv:2312.17279

    cs.CL eess.AS

    Stateful Conformer with Cache-based Inference for Streaming Automatic Speech Recognition

    Authors: Vahid Noroozi, Somshubra Majumdar, Ankur Kumar, Jagadeesh Balam, Boris Ginsburg

    Abstract: In this paper, we propose an efficient and accurate streaming speech recognition model based on the FastConformer architecture. We adapted the FastConformer architecture for streaming applications through: (1) constraining both the look-ahead and past contexts in the encoder, and (2) introducing an activation caching mechanism to enable the non-autoregressive encoder to operate autoregressively du…

    Submitted 2 May, 2024; v1 submitted 27 December, 2023; originally announced December 2023.

    Comments: Shorter version accepted to ICASSP 2024

  4. arXiv:2309.09950

    eess.AS cs.SD

    Investigating End-to-End ASR Architectures for Long Form Audio Transcription

    Authors: Nithin Rao Koluguri, Samuel Kriman, Georgy Zelenfroind, Somshubra Majumdar, Dima Rekesh, Vahid Noroozi, Jagadeesh Balam, Boris Ginsburg

    Abstract: This paper presents an overview and evaluation of some of the end-to-end ASR models on long-form audio. We study three categories of Automatic Speech Recognition (ASR) models based on their core architecture: (1) convolutional, (2) convolutional with squeeze-and-excitation, and (3) convolutional models with attention. We selected one ASR model from each category and evaluated Word Error Rate, maxim…

    Submitted 20 September, 2023; v1 submitted 18 September, 2023; originally announced September 2023.

    Comments: PrePrint. Submitted to ICASSP 2024

  5. arXiv:2305.05084

    eess.AS cs.SD

    Fast Conformer with Linearly Scalable Attention for Efficient Speech Recognition

    Authors: Dima Rekesh, Nithin Rao Koluguri, Samuel Kriman, Somshubra Majumdar, Vahid Noroozi, He Huang, Oleksii Hrinchuk, Krishna Puvvada, Ankur Kumar, Jagadeesh Balam, Boris Ginsburg

    Abstract: Conformer-based models have become the dominant end-to-end architecture for speech processing tasks. With the objective of enhancing the Conformer architecture for efficient training and inference, we carefully redesigned Conformer with a novel downsampling schema. The proposed model, named Fast Conformer (FC), is 2.8x faster than the original Conformer, supports scaling to billion parameters witho…

    Submitted 30 September, 2023; v1 submitted 8 May, 2023; originally announced May 2023.

    Comments: Accepted at ASRU 2023

  6. arXiv:2105.08049

    cs.CL cs.LG

    SGD-QA: Fast Schema-Guided Dialogue State Tracking for Unseen Services

    Authors: Yang Zhang, Vahid Noroozi, Evelina Bakhturina, Boris Ginsburg

    Abstract: Dialogue state tracking is an essential part of goal-oriented dialogue systems, while most of these state tracking models often fail to handle unseen services. In this paper, we propose SGD-QA, a simple and extensible model for schema-guided dialogue state tracking based on a question answering approach. The proposed multi-pass model shares a single encoder between the domain information and dialo…

    Submitted 17 May, 2021; originally announced May 2021.

  7. arXiv:2104.02609

    eess.IV cs.CV

    I-ODA, Real-World Multi-modal Longitudinal Data for Ophthalmic Applications

    Authors: Nooshin Mojab, Vahid Noroozi, Abdullah Aleem, Manoj P. Nallabothula, Joseph Baker, Dimitri T. Azar, Mark Rosenblatt, RV Paul Chan, Darvin Yi, Philip S. Yu, Joelle A. Hallak

    Abstract: Data from clinical real-world settings is characterized by variability in quality, machine-type, setting, and source. One of the primary goals of medical computer vision is to develop and validate artificial intelligence (AI) based algorithms on real-world data enabling clinical translations. However, despite the exponential growth in AI based applications in healthcare, specifically in ophthalmol…

    Submitted 29 March, 2021; originally announced April 2021.

  8. arXiv:2104.02014

    cs.CL eess.AS

    SPGISpeech: 5,000 hours of transcribed financial audio for fully formatted end-to-end speech recognition

    Authors: Patrick K. O'Neill, Vitaly Lavrukhin, Somshubra Majumdar, Vahid Noroozi, Yuekai Zhang, Oleksii Kuchaiev, Jagadeesh Balam, Yuliya Dovzhenko, Keenan Freyberg, Michael D. Shulman, Boris Ginsburg, Shinji Watanabe, Georg Kucsko

    Abstract: In the English speech-to-text (STT) machine learning task, acoustic models are conventionally trained on uncased Latin characters, and any necessary orthography (such as capitalization, punctuation, and denormalization of non-standard words) is imputed by separate post-processing models. This adds complexity and limits performance, as many formatting tasks benefit from semantic information present…

    Submitted 6 April, 2021; v1 submitted 5 April, 2021; originally announced April 2021.

    Comments: 5 pages, 1 figure. Submitted to INTERSPEECH 2021

  9. arXiv:2008.12335

    cs.LG stat.ML

    A Fast and Robust BERT-based Dialogue State Tracker for Schema-Guided Dialogue Dataset

    Authors: Vahid Noroozi, Yang Zhang, Evelina Bakhturina, Tomasz Kornuta

    Abstract: Dialog State Tracking (DST) is one of the most crucial modules for goal-oriented dialogue systems. In this paper, we introduce FastSGT (Fast Schema Guided Tracker), a fast and robust BERT-based model for state tracking in goal-oriented dialogue systems. The proposed model is designed for the Schema-Guided Dialogue (SGD) dataset which contains natural language descriptions for all the entities incl…

    Submitted 27 August, 2020; originally announced August 2020.

    Comments: Accepted to the Workshop on Conversational Systems Towards Mainstream Adoption at KDD 2020

  10. arXiv:2007.12672

    cs.CV

    Real-World Multi-Domain Data Applications for Generalizations to Clinical Settings

    Authors: Nooshin Mojab, Vahid Noroozi, Darvin Yi, Manoj Prabhakar Nallabothula, Abdullah Aleem, Phillip S. Yu, Joelle A. Hallak

    Abstract: With promising results of machine learning based models in computer vision, applications on medical imaging data have been increasing exponentially. However, generalizations to complex real-world clinical data is a persistent problem. Deep learning models perform well when trained on standardized datasets from artificial settings, such as clinical trials. However, real-world data is different and…

    Submitted 24 July, 2020; originally announced July 2020.

  11. arXiv:1912.13230

    cs.LG cs.AI stat.ML

    Leveraging Semi-Supervised Learning for Fairness using Neural Networks

    Authors: Vahid Noroozi, Sara Bahaadini, Samira Sheikhi, Nooshin Mojab, Philip S. Yu

    Abstract: There has been a growing concern about the fairness of decision-making systems based on machine learning. The shortage of labeled data has always been a challenging problem facing machine learning based systems. In such scenarios, semi-supervised learning has been shown to be an effective way of exploiting unlabeled data to improve upon the performance of the model. Notably, unlabeled data do not contain l…

    Submitted 31 December, 2019; originally announced December 2019.

    Comments: 6 pages, 5 figures, accepted to ICMLA 2019

  12. arXiv:1811.04480

    cs.LG stat.ML

    Semi-supervised Deep Representation Learning for Multi-View Problems

    Authors: Vahid Noroozi, Sara Bahaadini, Lei Zheng, Sihong Xie, Weixiang Shao, Philip S. Yu

    Abstract: While neural networks for learning representation of multi-view data have been previously proposed as one of the state-of-the-art multi-view dimension reduction techniques, how to make the representation discriminative with only a small amount of labeled data is not well-studied. We introduce a semi-supervised neural network model, named Multi-view Discriminative Neural Network (MDNN), for multi-v…

    Submitted 11 November, 2018; originally announced November 2018.

    Comments: Accepted to IEEE Big Data 2018. 9 Pages

  13. arXiv:1805.07037

    cs.IR

    MARS: Memory Attention-Aware Recommender System

    Authors: Lei Zheng, Chun-Ta Lu, Lifang He, Sihong Xie, Vahid Noroozi, He Huang, Philip S. Yu

    Abstract: In this paper, we study the problem of modeling users' diverse interests. Previous methods usually learn a fixed user representation, which has a limited ability to represent distinct interests of a user. In order to model users' various interests, we propose a Memory Attention-aware Recommender System (MARS). MARS utilizes a memory component and a novel attentional mechanism to learn deep …

    Submitted 17 May, 2018; originally announced May 2018.

  14. arXiv:1805.02296

    cs.LG stat.ML

    DIRECT: Deep Discriminative Embedding for Clustering of LIGO Data

    Authors: Sara Bahaadini, Vahid Noroozi, Neda Rohani, Scott Coughlin, Michael Zevin, Aggelos K. Katsaggelos

    Abstract: In this paper, benefiting from the strong ability of deep neural network in estimating non-linear functions, we propose a discriminative embedding function to be used as a feature extractor for clustering tasks. The trained embedding function transfers knowledge from the domain of a labeled set of morphologically-distinct images, known as classes, to a new domain within which new classes can poten…

    Submitted 6 May, 2018; originally announced May 2018.

    Comments: This work has been accepted to be presented in the 25th IEEE International Conference on Image Processing (ICIP)

  15. arXiv:1706.03692

    cs.LG stat.ML

    SEVEN: Deep Semi-supervised Verification Networks

    Authors: Vahid Noroozi, Lei Zheng, Sara Bahaadini, Sihong Xie, Philip S. Yu

    Abstract: Verification determines whether two samples belong to the same class or not, and has important applications such as face and fingerprint verification, where thousands or millions of categories are present but each category has scarce labeled examples, presenting two major challenges for existing deep learning models. We propose a deep semi-supervised model named SEmi-supervised VErification Networ…

    Submitted 14 June, 2017; v1 submitted 12 June, 2017; originally announced June 2017.

    Comments: 7 pages, 2 figures, accepted to the 2017 International Joint Conference on Artificial Intelligence (IJCAI-17)

  16. arXiv:1701.04783

    cs.LG cs.IR

    Joint Deep Modeling of Users and Items Using Reviews for Recommendation

    Authors: Lei Zheng, Vahid Noroozi, Philip S. Yu

    Abstract: A large amount of information exists in reviews written by users. This source of information has been ignored by most of the current recommender systems while it can potentially alleviate the sparsity problem and improve the quality of recommendations. In this paper, we present a deep model to learn item properties and user behaviors jointly from review text. The proposed model, named Deep Coopera…

    Submitted 17 January, 2017; originally announced January 2017.

    Comments: WSDM 2017