Showing 1–38 of 38 results for author: Sim, C

  1. arXiv:2404.09173  [pdf, other]

    cs.LG cs.AI cs.CL

    TransformerFAM: Feedback attention is working memory

    Authors: Dongseong Hwang, Weiran Wang, Zhuoyuan Huo, Khe Chai Sim, Pedro Moreno Mengibar

    Abstract: While Transformers have revolutionized deep learning, their quadratic attention complexity hinders their ability to process infinitely long inputs. We propose Feedback Attention Memory (FAM), a novel Transformer architecture that leverages a feedback loop to enable the network to attend to its own latent representations. This design fosters the emergence of working memory within the Transformer, a…

    Submitted 7 May, 2024; v1 submitted 14 April, 2024; originally announced April 2024.

    Comments: 26 pages, 12 figures, 14 tables

  2. arXiv:2403.19709  [pdf, other]

    eess.AS cs.AI cs.CL cs.LG cs.NE

    Hierarchical Recurrent Adapters for Efficient Multi-Task Adaptation of Large Speech Models

    Authors: Tsendsuren Munkhdalai, Youzheng Chen, Khe Chai Sim, Fadi Biadsy, Tara Sainath, Pedro Moreno Mengibar

    Abstract: Parameter efficient adaptation methods have become a key mechanism to train large pre-trained models for downstream tasks. However, their per-task parameter overhead is considered still high when the number of downstream tasks to adapt for is large. We introduce an adapter module that has a better efficiency in large scale multi-task adaptation scenario. Our adapter is hierarchical in terms of how…

    Submitted 25 March, 2024; originally announced March 2024.

    Comments: 5 pages, 3 figures, 5 tables

  3. arXiv:2403.05530  [pdf, other]

    cs.CL cs.AI

    Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

    Authors: Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love, et al. (1092 additional authors not shown)

    Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February…

    Submitted 14 June, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

  4. arXiv:2310.04627  [pdf, other]

    cs.LG

    Profit: Benchmarking Personalization and Robustness Trade-off in Federated Prompt Tuning

    Authors: Liam Collins, Shanshan Wu, Sewoong Oh, Khe Chai Sim

    Abstract: In many applications of federated learning (FL), clients desire models that are personalized using their local data, yet are also robust in the sense that they retain general global knowledge. However, the presence of data heterogeneity across clients induces a fundamental trade-off between personalization (i.e., adaptation to a local distribution) and robustness (i.e., not forgetting previously l…

    Submitted 6 October, 2023; originally announced October 2023.

  5. arXiv:2310.00178  [pdf, other]

    cs.CL eess.AS

    Contextual Biasing with the Knuth-Morris-Pratt Matching Algorithm

    Authors: Weiran Wang, Zelin Wu, Diamantino Caseiro, Tsendsuren Munkhdalai, Khe Chai Sim, Pat Rondon, Golan Pundak, Gan Song, Rohit Prabhavalkar, Zhong Meng, Ding Zhao, Tara Sainath, Pedro Moreno Mengibar

    Abstract: Contextual biasing refers to the problem of biasing the automatic speech recognition (ASR) systems towards rare entities that are relevant to the specific user or application scenarios. We propose algorithms for contextual biasing based on the Knuth-Morris-Pratt algorithm for pattern matching. During beam search, we boost the score of a token extension if it extends matching into a set of biasing…

    Submitted 29 September, 2023; originally announced October 2023.

  6. arXiv:2309.12963  [pdf, ps, other]

    eess.AS cs.SD

    Massive End-to-end Models for Short Search Queries

    Authors: Weiran Wang, Rohit Prabhavalkar, Dongseong Hwang, Qiujia Li, Khe Chai Sim, Bo Li, James Qin, Xingyu Cai, Adam Stooke, Zhong Meng, CJ Zheng, Yanzhang He, Tara Sainath, Pedro Moreno Mengibar

    Abstract: In this work, we investigate two popular end-to-end automatic speech recognition (ASR) models, namely Connectionist Temporal Classification (CTC) and RNN-Transducer (RNN-T), for offline recognition of voice search queries, with up to 2B model parameters. The encoders of our models use the neural architecture of Google's universal speech model (USM), with additional funnel pooling layers to signifi…

    Submitted 22 September, 2023; originally announced September 2023.

  7. arXiv:2309.09996  [pdf, other]

    eess.AS cs.CL cs.LG cs.SD

    Improving Speech Recognition for African American English With Audio Classification

    Authors: Shefali Garg, Zhouyuan Huo, Khe Chai Sim, Suzan Schwartz, Mason Chua, Alëna Aksënova, Tsendsuren Munkhdalai, Levi King, Darryl Wright, Zion Mengesha, Dongseong Hwang, Tara Sainath, Françoise Beaufays, Pedro Moreno Mengibar

    Abstract: Automatic speech recognition (ASR) systems have been shown to have large quality disparities between the language varieties they are intended or expected to recognize. One way to mitigate this is to train or fine-tune models with more representative datasets. But this approach can be hindered by limited in-domain data for training and evaluation. We propose a new way to improve the robustness of a…

    Submitted 16 September, 2023; originally announced September 2023.

  8. Effectiveness and predictability of in-network storage cache for scientific workflows

    Authors: Caitlin Sim, Kesheng Wu, Alex Sim, Inder Monga, Chin Guok, Frank Wurthwein, Diego Davila, Harvey Newman, Justas Balcas

    Abstract: Large scientific collaborations often have multiple scientists accessing the same set of files while doing different analyses, which create repeated accesses to the large amounts of shared data located far away. These data accesses have long latency due to distance and occupy the limited bandwidth available over the wide-area network. To reduce the wide-area network traffic and the data access lat…

    Submitted 20 July, 2023; originally announced July 2023.

  9. arXiv:2306.01789  [pdf, other]

    cs.SD cs.CL eess.AS

    Edit Distance based RL for RNNT decoding

    Authors: Dongseong Hwang, Changwan Ryu, Khe Chai Sim

    Abstract: RNN-T is currently considered the industry standard in ASR due to its exceptional WERs in various benchmark tests and its ability to support seamless streaming and longform transcription. However, its biggest drawback lies in the significant discrepancy between its training and inference objectives. During training, RNN-T maximizes all alignment probabilities by teacher forcing, while during infer…

    Submitted 14 July, 2023; v1 submitted 31 May, 2023; originally announced June 2023.

    Comments: 5 pages, 2 figures

  10. arXiv:2302.01496  [pdf, ps, other]

    cs.CL cs.LG cs.SD eess.AS

    Efficient Domain Adaptation for Speech Foundation Models

    Authors: Bo Li, Dongseong Hwang, Zhouyuan Huo, Junwen Bai, Guru Prakash, Tara N. Sainath, Khe Chai Sim, Yu Zhang, Wei Han, Trevor Strohman, Francoise Beaufays

    Abstract: Foundation models (FMs), that are trained on broad data at scale and are adaptable to a wide range of downstream tasks, have brought large interest in the research community. Benefiting from the diverse data sources such as different modalities, languages and application domains, foundation models have demonstrated strong generalization and knowledge transfer capabilities. In this paper, we presen…

    Submitted 2 February, 2023; originally announced February 2023.

  11. arXiv:2211.08215  [pdf, ps, other]

    math.OC

    Superlinear Convergence of an Interior Point Algorithm on Linear Semi-definite Feasibility Problems with Application to Linear Matrix Inequalities

    Authors: Chee-Khian Sim

    Abstract: In the literature, besides the assumption of strict complementarity, superlinear convergence of implementable polynomial-time interior point algorithms using known search directions, namely, the HKM direction, its dual or the NT direction, to solve semi-definite programs (SDPs) is shown by (i) assuming that the given SDP is nondegenerate and making modifications to these algorithms [10], or (ii) c…

    Submitted 12 January, 2024; v1 submitted 15 November, 2022; originally announced November 2022.

    Comments: This replacement is a corrected version of the submission arXiv2211.08215 with different title and nontrivial changes

    MSC Class: 90C22; 90C51

  12. arXiv:2211.02712  [pdf, other]

    cs.LG cs.SD eess.AS

    Resource-Efficient Transfer Learning From Speech Foundation Model Using Hierarchical Feature Fusion

    Authors: Zhouyuan Huo, Khe Chai Sim, Bo Li, Dongseong Hwang, Tara N. Sainath, Trevor Strohman

    Abstract: Self-supervised pre-training of a speech foundation model, followed by supervised fine-tuning, has shown impressive quality improvements on automatic speech recognition (ASR) tasks. Fine-tuning separate foundation models for many downstream tasks are expensive since the foundation model is usually very big. Parameter-efficient fine-tuning methods (e.g. adapter, sparse update methods) offer an alte…

    Submitted 4 November, 2022; originally announced November 2022.

  13. arXiv:2210.05793  [pdf, other]

    cs.LG cs.CL cs.SD eess.AS

    Comparison of Soft and Hard Target RNN-T Distillation for Large-scale ASR

    Authors: Dongseong Hwang, Khe Chai Sim, Yu Zhang, Trevor Strohman

    Abstract: Knowledge distillation is an effective machine learning technique to transfer knowledge from a teacher model to a smaller student model, especially with unlabeled data. In this paper, we focus on knowledge distillation for the RNN-T model, which is widely used in state-of-the-art (SoTA) automatic speech recognition (ASR). Specifically, we compared using soft and hard target distillation to train l…

    Submitted 28 October, 2022; v1 submitted 11 October, 2022; originally announced October 2022.

    Comments: 8 pages, 2 figures

  14. arXiv:2208.03067  [pdf, ps, other]

    cs.CL cs.SD eess.AS

    Large vocabulary speech recognition for languages of Africa: multilingual modeling and self-supervised learning

    Authors: Sandy Ritchie, You-Chi Cheng, Mingqing Chen, Rajiv Mathews, Daan van Esch, Bo Li, Khe Chai Sim

    Abstract: Almost none of the 2,000+ languages spoken in Africa have widely available automatic speech recognition systems, and the required data is also only available for a few languages. We have experimented with two techniques which may provide pathways to large vocabulary speech recognition for African languages: multilingual modeling and self-supervised learning. We gathered available open source data…

    Submitted 4 October, 2022; v1 submitted 5 August, 2022; originally announced August 2022.

  15. arXiv:2207.00706  [pdf, other]

    eess.AS cs.CL cs.LG

    UserLibri: A Dataset for ASR Personalization Using Only Text

    Authors: Theresa Breiner, Swaroop Ramaswamy, Ehsan Variani, Shefali Garg, Rajiv Mathews, Khe Chai Sim, Kilol Gupta, Mingqing Chen, Lara McConnaughey

    Abstract: Personalization of speech models on mobile devices (on-device personalization) is an active area of research, but more often than not, mobile devices have more text-only data than paired audio-text data. We explore training a personalized language model on text-only data, used during inference to improve speech recognition performance for that user. We experiment on a user-clustered LibriSpeech co…

    Submitted 1 July, 2022; originally announced July 2022.

    Comments: Accepted for publication in Interspeech 2022. 9 total pages with appendix, 9 total tables, 5 total figures

  16. arXiv:2203.12668  [pdf, other]

    cs.LG cs.CL

    Pseudo Label Is Better Than Human Label

    Authors: Dongseong Hwang, Khe Chai Sim, Zhouyuan Huo, Trevor Strohman

    Abstract: State-of-the-art automatic speech recognition (ASR) systems are trained with tens of thousands of hours of labeled speech data. Human transcription is expensive and time consuming. Factors such as the quality and consistency of the transcription can greatly affect the performance of the ASR models trained with these data. In this paper, we show that we can train a strong teacher model to produce h…

    Submitted 1 July, 2022; v1 submitted 21 March, 2022; originally announced March 2022.

    Comments: 6 pages, 2 figures, 9 tables, Proceedings of INTERSPEECH 2022

  17. arXiv:2111.08137  [pdf, other]

    cs.CL cs.LG cs.SD eess.AS

    Joint Unsupervised and Supervised Training for Multilingual ASR

    Authors: Junwen Bai, Bo Li, Yu Zhang, Ankur Bapna, Nikhil Siddhartha, Khe Chai Sim, Tara N. Sainath

    Abstract: Self-supervised training has shown promising gains in pretraining models and facilitating the downstream finetuning for speech recognition, like multilingual ASR. Most existing methods adopt a 2-stage scheme where the self-supervised loss is optimized in the first pretraining stage, and the standard supervised finetuning resumes in the second stage. In this paper, we propose an end-to-end (E2E) Jo…

    Submitted 15 November, 2021; originally announced November 2021.

  18. arXiv:2111.04177  [pdf, other]

    math.OC

    Solution to a Monotone Inclusion Problem using the Relaxed Peaceman-Rachford Splitting Method: Convergence and its Rates

    Authors: Chee Khian Sim

    Abstract: We consider the convergence behavior using the relaxed Peaceman-Rachford splitting method to solve the monotone inclusion problem $0 \in (A + B)(u)$, where $A, B: \Re^n \rightrightarrows \Re^n$ are maximal $β$-strongly monotone operators, $n \geq 1$ and $β> 0$. Under a technical assumption, convergence of iterates using the method on the problem is proved when either $A$ or $B$ is single-valued, a…

    Submitted 13 November, 2022; v1 submitted 7 November, 2021; originally announced November 2021.

    Comments: 23 pages, 1 figure, 1 table

    MSC Class: 90C25; 90C06

  19. arXiv:2110.02220  [pdf, other]

    eess.AS cs.AI cs.CL cs.LG cs.NE

    Fast Contextual Adaptation with Neural Associative Memory for On-Device Personalized Speech Recognition

    Authors: Tsendsuren Munkhdalai, Khe Chai Sim, Angad Chandorkar, Fan Gao, Mason Chua, Trevor Strohman, Françoise Beaufays

    Abstract: Fast contextual adaptation has shown to be effective in improving Automatic Speech Recognition (ASR) of rare words and when combined with an on-device personalized training, it can yield an even better recognition result. However, the traditional re-scoring approaches based on an external language model is prone to diverge during the personalized training. In this work, we introduce a model-based…

    Submitted 6 October, 2021; v1 submitted 4 October, 2021; originally announced October 2021.

    Comments: 5 pages, 3 figures, 3 tables

  20. arXiv:2110.00165  [pdf, other]

    eess.AS cs.CL cs.LG cs.SD

    Large-scale ASR Domain Adaptation using Self- and Semi-supervised Learning

    Authors: Dongseong Hwang, Ananya Misra, Zhouyuan Huo, Nikhil Siddhartha, Shefali Garg, David Qiu, Khe Chai Sim, Trevor Strohman, Françoise Beaufays, Yanzhang He

    Abstract: Self- and semi-supervised learning methods have been actively investigated to reduce labeled training data or enhance the model performance. However, the approach mostly focus on in-domain performance for public datasets. In this study, we utilize the combination of self- and semi-supervised learning methods to solve unseen domain adaptation problem in a large-scale production setting for online A…

    Submitted 15 February, 2022; v1 submitted 30 September, 2021; originally announced October 2021.

    Comments: ICASSP 2022 accepted, 5 pages, 2 figures, 5 tables

  21. arXiv:2110.00155  [pdf, other]

    cs.SD cs.LG eess.AS

    Incremental Layer-wise Self-Supervised Learning for Efficient Speech Domain Adaptation On Device

    Authors: Zhouyuan Huo, Dongseong Hwang, Khe Chai Sim, Shefali Garg, Ananya Misra, Nikhil Siddhartha, Trevor Strohman, Françoise Beaufays

    Abstract: Streaming end-to-end speech recognition models have been widely applied to mobile devices and show significant improvement in efficiency. These models are typically trained on the server using transcribed speech data. However, the server data distribution can be very different from the data distribution on user devices, which could affect the model performance. There are two main challenges for on…

    Submitted 30 September, 2021; originally announced October 2021.

    Comments: 5 pages

  22. arXiv:2109.13226  [pdf, other]

    eess.AS cs.CL cs.LG cs.SD

    BigSSL: Exploring the Frontier of Large-Scale Semi-Supervised Learning for Automatic Speech Recognition

    Authors: Yu Zhang, Daniel S. Park, Wei Han, James Qin, Anmol Gulati, Joel Shor, Aren Jansen, Yuanzhong Xu, Yanping Huang, Shibo Wang, Zongwei Zhou, Bo Li, Min Ma, William Chan, Jiahui Yu, Yongqiang Wang, Liangliang Cao, Khe Chai Sim, Bhuvana Ramabhadran, Tara N. Sainath, Françoise Beaufays, Zhifeng Chen, Quoc V. Le, Chung-Cheng Chiu, Ruoming Pang, et al. (1 additional authors not shown)

    Abstract: We summarize the results of a host of efforts using giant automatic speech recognition (ASR) models pre-trained using large, diverse unlabeled datasets containing approximately a million hours of audio. We find that the combination of pre-training, self-training and scaling up model size greatly increases data efficiency, even for extremely large tasks with tens of thousands of hours of labeled da…

    Submitted 21 July, 2022; v1 submitted 27 September, 2021; originally announced September 2021.

    Comments: 14 pages, 7 figures, 13 tables; v2: minor corrections, reference baselines and bibliography updated; v3: corrections based on reviewer feedback, bibliography updated

  23. arXiv:2106.10259  [pdf, other]

    eess.AS cs.CL cs.LG cs.SD

    On-Device Personalization of Automatic Speech Recognition Models for Disordered Speech

    Authors: Katrin Tomanek, Françoise Beaufays, Julie Cattiau, Angad Chandorkar, Khe Chai Sim

    Abstract: While current state-of-the-art Automatic Speech Recognition (ASR) systems achieve high accuracy on typical speech, they suffer from significant performance degradation on disordered speech and other atypical speech patterns. Personalization of ASR models, a commonly applied solution to this problem, is usually performed in a server-based training environment posing problems around data privacy, de…

    Submitted 18 June, 2021; originally announced June 2021.

  24. arXiv:2008.09911  [pdf, ps, other]

    math.OC

    A FISTA-Type First Order Algorithm on Composite Optimization Problems that is Adaptable to the Convex Situation

    Authors: Chee-Khian Sim

    Abstract: In this note, we propose a FISTA-type first order algorithm, VAR-FISTA, to solve a composite optimization problem. A distinctive feature of VAR-FISTA is its ability to exploit the convexity of the function in the problem, resulting in an improved iteration complexity when the function is convex compared to when it is nonconvex. The iteration complexity result for the convex and nonconvex case obta…

    Submitted 22 August, 2020; originally announced August 2020.

    Comments: 13 pages, no figures

    MSC Class: 90C26; 90C25

  25. arXiv:2007.07087  [pdf]

    astro-ph.IM astro-ph.EP

    The case for a multi-channel polarization sensitive LIDAR for investigation of insolation-driven ices and atmospheres

    Authors: Adrian J. Brown, Gorden Videen, Evgenij Zubko, Nicholas Heavens, Nicole-Jeanne Schlegel, Patricio Becerra, Young-Jun Choi, Colin R. Meyer, Tanya N. Harrison, Paul Hayne, Rachel W. Obbard, Tim Michaels, Michael J. Wolff, Scott Guzewich, Yongxiang Hu, Claire Newman, Christian J. Grund, Chae Kyung Sim, Peter B. Buhler, Margaret E. Landis, Timothy J. Stubbs, Aymeric Spiga, Devanshu Jha

    Abstract: All LIDAR instruments are not the same, and advancement of LIDAR technology requires an ongoing interest and demand from the community to foster further development of the required components. The purpose of this paper is to make the community aware of the need for further technical development, and the potential payoff of investing experimental time, money and thought into the next generation of…

    Submitted 11 July, 2020; originally announced July 2020.

    Comments: 12 pages, 1 figure. arXiv admin note: substantial text overlap with arXiv:1406.0030

  26. arXiv:2001.08885  [pdf, other]

    eess.AS cs.LG cs.SD stat.ML

    Low-rank Gradient Approximation For Memory-Efficient On-device Training of Deep Neural Network

    Authors: Mary Gooneratne, Khe Chai Sim, Petr Zadrazil, Andreas Kabel, Françoise Beaufays, Giovanni Motta

    Abstract: Training machine learning models on mobile devices has the potential of improving both privacy and accuracy of the models. However, one of the major obstacles to achieving this goal is the memory limitation of mobile devices. Reducing training memory enables models with high-dimensional weight matrices, like automatic speech recognition (ASR) models, to be trained on-device. In this paper, we prop…

    Submitted 24 January, 2020; originally announced January 2020.

  27. arXiv:1912.09251  [pdf, other]

    eess.AS cs.LG cs.SD stat.ML

    Personalization of End-to-end Speech Recognition On Mobile Devices For Named Entities

    Authors: Khe Chai Sim, Françoise Beaufays, Arnaud Benard, Dhruv Guliani, Andreas Kabel, Nikhil Khare, Tamar Lucassen, Petr Zadrazil, Harry Zhang, Leif Johnson, Giovanni Motta, Lillian Zhou

    Abstract: We study the effectiveness of several techniques to personalize end-to-end speech models and improve the recognition of proper names relevant to the user. These techniques differ in the amounts of user effort required to provide supervision, and are evaluated on how they impact speech recognition performance. We propose using keyword-dependent precision and recall metrics to measure vocabulary acq…

    Submitted 14 December, 2019; originally announced December 2019.

  28. arXiv:1909.06678  [pdf, other]

    eess.AS cs.LG cs.SD stat.ML

    An Investigation Into On-device Personalization of End-to-end Automatic Speech Recognition Models

    Authors: Khe Chai Sim, Petr Zadrazil, Françoise Beaufays

    Abstract: Speaker-independent speech recognition systems trained with data from many users are generally robust against speaker variability and work well for a large population of speakers. However, these systems do not always generalize well for users with very different speech characteristics. This issue can be addressed by building personalized systems that are designed to work well for each specific use…

    Submitted 14 September, 2019; originally announced September 2019.

  29. arXiv:1907.10351  [pdf, other]

    math.NA

    Energy-preserving multi-symplectic Runge-Kutta methods for Hamiltonian wave equations

    Authors: Chuchu Chen, Jialin Hong, Chol Sim, Kwang Sonwu

    Abstract: It is well-known that a numerical method which is at the same time geometric structure-preserving and physical property-preserving cannot exist in general for Hamiltonian partial differential equations. In this paper, we present a novel class of parametric multi-symplectic Runge-Kutta methods for Hamiltonian wave equations, which can also conserve energy simultaneously in a weaker sense with a sui…

    Submitted 24 July, 2019; originally announced July 2019.

    Comments: 26 pages, 6 figures

  30. arXiv:1905.07010  [pdf, ps, other]

    math.OC

    A FISTA-type accelerated gradient algorithm for solving smooth nonconvex composite optimization problems

    Authors: Jiaming Liang, Renato D. C. Monteiro, Chee-Khian Sim

    Abstract: In this paper, we describe and establish iteration-complexity of two accelerated composite gradient (ACG) variants to solve a smooth nonconvex composite optimization problem whose objective function is the sum of a nonconvex differentiable function $ f $ with a Lipschitz continuous gradient and a simple nonsmooth closed convex function $ h $. When $f$ is convex, the first ACG variant reduces to th…

    Submitted 5 March, 2021; v1 submitted 16 May, 2019; originally announced May 2019.

    Comments: 28 pages

  31. arXiv:1811.06621  [pdf, other]

    cs.CL

    Streaming End-to-end Speech Recognition For Mobile Devices

    Authors: Yanzhang He, Tara N. Sainath, Rohit Prabhavalkar, Ian McGraw, Raziel Alvarez, Ding Zhao, David Rybach, Anjuli Kannan, Yonghui Wu, Ruoming Pang, Qiao Liang, Deepti Bhatia, Yuan Shangguan, Bo Li, Golan Pundak, Khe Chai Sim, Tom Bagby, Shuo-yiin Chang, Kanishka Rao, Alexander Gruenstein

    Abstract: End-to-end (E2E) models, which directly predict output character sequences given input speech, are good candidates for on-device speech recognition. E2E models, however, present numerous challenges: In order to be truly useful, such models must decode speech utterances in a streaming fashion, in real time; they must be robust to the long tail of use cases; they must be able to leverage user-specif…

    Submitted 15 November, 2018; originally announced November 2018.

  32. arXiv:1810.01945  [pdf, ps, other]

    cs.NI cs.CR

    Generating Labeled Flow Data from MAWILab Traces for Network Intrusion Detection

    Authors: Jinoh Kim, Caitlin Sim, Jinhwan Choi

    Abstract: A growing issue in the modern cyberspace world is the direct identification of malicious activity over network connections. The boom of the machine learning industry in the past few years has led to the increasing usage of machine learning technologies, which are especially prevalent in the network intrusion detection research community. When utilizing these fairly contemporary techniques, the com…

    Submitted 3 October, 2018; originally announced October 2018.

    Comments: 4 pages

  33. arXiv:1808.05312  [pdf, other]

    cs.CL eess.AS

    Toward domain-invariant speech recognition via large scale training

    Authors: Arun Narayanan, Ananya Misra, Khe Chai Sim, Golan Pundak, Anshuman Tripathi, Mohamed Elfeky, Parisa Haghani, Trevor Strohman, Michiel Bacchiani

    Abstract: Current state-of-the-art automatic speech recognition systems are trained to work in specific `domains', defined based on factors like application, sampling rate and codec. When such recognizers are used in conditions that do not match the training domain, performance significantly drops. This work explores the idea of building a single domain-invariant model for varied use-cases by combining larg…

    Submitted 15 August, 2018; originally announced August 2018.

  34. arXiv:1802.03816  [pdf, other]

    cs.CL

    Understanding Recurrent Neural State Using Memory Signatures

    Authors: Skanda Koppula, Khe Chai Sim, Kean Chin

    Abstract: We demonstrate a network visualization technique to analyze the recurrent state inside the LSTMs/GRUs used commonly in language and acoustic models. Interpreting intermediate state and network activations inside end-to-end models remains an open challenge. Our method allows users to understand exactly how much and what history is encoded inside recurrent state in grapheme sequence models. Our proc…

    Submitted 11 February, 2018; originally announced February 2018.

    Comments: Accepted to 2018 IEEE International Conference on Acoustics, Speech and Signal Processing

  35. arXiv:1712.01541  [pdf, other]

    eess.AS cs.SD

    Multi-Dialect Speech Recognition With A Single Sequence-To-Sequence Model

    Authors: Bo Li, Tara N. Sainath, Khe Chai Sim, Michiel Bacchiani, Eugene Weinstein, Patrick Nguyen, Zhifeng Chen, Yonghui Wu, Kanishka Rao

    Abstract: Sequence-to-sequence models provide a simple and elegant solution for building speech recognition systems by folding separate components of a typical system, namely acoustic (AM), pronunciation (PM) and language (LM) models into a single neural network. In this work, we look at one such sequence-to-sequence model, namely listen, attend and spell (LAS), and explore the possibility of training a sin…

    Submitted 5 December, 2017; originally announced December 2017.

    Comments: submitted to ICASSP 2018

  36. arXiv:1611.03567  [pdf, other]

    math.OC

    Complexity of the relaxed Peaceman-Rachford splitting method for the sum of two maximal strongly monotone operators

    Authors: Renato D. C. Monteiro, Chee-Khian Sim

    Abstract: This paper considers the relaxed Peaceman-Rachford (PR) splitting method for finding an approximate solution of a monotone inclusion whose underlying operator consists of the sum of two maximal strongly monotone operators. Using general results obtained in the setting of a non-Euclidean hybrid proximal extragradient framework, we extend a previous convergence result on the iterates generated by th…

    Submitted 5 November, 2017; v1 submitted 10 November, 2016; originally announced November 2016.

    Comments: 26 pages, 2 figures

  37. Medium Resolution Near-Infrared Spectra of the Host Galaxies of Nearby Quasars

    Authors: Huynh Anh N. Le, Soojong Pak, Myungshin Im, Minjin Kim, Chae Kyung Sim, Luis C. Ho

    Abstract: We present medium resolution near-infrared host galaxy spectra of low redshift quasars, PG 0844 + 349 (z=0.064), PG 1226 + 023 (z=0.158), and PG 1426+015 (z=0.086). The observations were done by using the Infrared Camera and Spectrograph (IRCS) at the Subaru 8.2 m telescope. The full width at half maximum of the point spread function was about 0.3 arcsec by operations of an adaptive optics system,…

    Submitted 9 June, 2014; v1 submitted 20 May, 2014; originally announced May 2014.

    Comments: 16 pages, 5 figures

  38. arXiv:1310.2771  [pdf, ps, other]

    cond-mat.mes-hall

    Asymmetry in effective fields of spin-orbit torques in Pt/Co/Pt stacks

    Authors: Cheow Hin Sim, Jian Cheng Huang, Michael Tran, Kwaku Eason

    Abstract: Measurements of switching via spin-orbit coupling (SOC) mechanisms are discussed for a pair of inverted Pt/Co/Pt stacks with asymmetrical Pt thicknesses. Taking into account the planar Hall effect contribution, effective fields of spin-orbit torques (SOT) are evaluated using lock-in measurements of the first and second harmonics of the Hall voltage. Reversing the stack structure leads to significa…

    Submitted 10 October, 2013; originally announced October 2013.