Showing 1–18 of 18 results for author: Wong, J

Searching in archive eess.
  1. arXiv:2406.02963  [pdf, other]

    cs.SD eess.AS

    Dataset-Distillation Generative Model for Speech Emotion Recognition

    Authors: Fabian Ritter-Gutierrez, Kuan-Po Huang, Jeremy H. M Wong, Dianwen Ng, Hung-yi Lee, Nancy F. Chen, Eng Siong Chng

    Abstract: Deep learning models for speech rely on large datasets, presenting computational challenges. Yet, performance hinges on training data size. Dataset Distillation (DD) aims to learn a smaller dataset without much performance degradation when training with it. DD has been investigated in computer vision but not yet in speech. This paper presents the first approach for DD to speech targeting Speech Em…

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: Accepted at Interspeech 2024

  2. arXiv:2312.12153  [pdf, other]

    cs.SD eess.AS

    Noise robust distillation of self-supervised speech models via correlation metrics

    Authors: Fabian Ritter-Gutierrez, Kuan-Po Huang, Dianwen Ng, Jeremy H. M. Wong, Hung-yi Lee, Eng Siong Chng, Nancy F. Chen

    Abstract: Compared to large speech foundation models, small distilled models exhibit degraded noise robustness. The student's robustness can be improved by introducing noise at the inputs during pre-training. Despite this, using the standard distillation loss still yields a student with degraded performance. Thus, this paper proposes improving student robustness via distillation with correlation metrics. Te…

    Submitted 19 December, 2023; originally announced December 2023.

    Comments: 6 pages

  3. arXiv:2306.02719  [pdf, ps, other]

    cs.CL cs.LG cs.SD eess.AS

    Multiple output samples per input in a single-output Gaussian process

    Authors: Jeremy H. M. Wong, Huayun Zhang, Nancy F. Chen

    Abstract: The standard Gaussian Process (GP) only considers a single output sample per input in the training set. Datasets for subjective tasks, such as spoken language assessment, may be annotated with output labels from multiple human raters per input. This paper proposes to generalise the GP to allow for these multiple output samples in the training set, and thus make use of available output uncertainty…

    Submitted 25 January, 2024; v1 submitted 5 June, 2023; originally announced June 2023.

    Comments: This paper is presented in the "Symposium for Celebrating 40 Years of Bayesian Learning in Speech and Language Processing and Beyond", which is a satellite event of the ASRU workshop, on 20 December 2023. https://bayesian40.github.io/

  4. arXiv:2305.18881  [pdf, other]

    eess.AS

    MERLIon CCS Challenge: A English-Mandarin code-switching child-directed speech corpus for language identification and diarization

    Authors: Victoria Y. H. Chua, Hexin Liu, Leibny Paola Garcia Perera, Fei Ting Woon, Jinyi Wong, Xiangyu Zhang, Sanjeev Khudanpur, Andy W. H. Khong, Justin Dauwels, Suzy J. Styles

    Abstract: To enhance the reliability and robustness of language identification (LID) and language diarization (LD) systems for heterogeneous populations and scenarios, there is a need for speech processing models to be trained on datasets that feature diverse language registers and speech patterns. We present the MERLIon CCS challenge, featuring a first-of-its-kind Zoom video call dataset of parent-child sh…

    Submitted 30 May, 2023; originally announced May 2023.

    Comments: Accepted by Interspeech 2023, 5 pages, 2 figures, 3 tables

  5. arXiv:2210.11923  [pdf, other]

    cs.CR eess.SY

    RollBack: A New Time-Agnostic Replay Attack Against the Automotive Remote Keyless Entry Systems

    Authors: Levente Csikor, Hoon Wei Lim, Jun Wen Wong, Soundarya Ramesh, Rohini Poolat Parameswarath, Mun Choon Chan

    Abstract: Today's RKE systems implement disposable rolling codes, making every key fob button press unique, effectively preventing simple replay attacks. However, a prior attack called RollJam was proven to break all rolling code-based systems in general. By a careful sequence of signal jamming, capturing, and replaying, an attacker can become aware of the subsequent valid unlock signal that has not been us…

    Submitted 14 September, 2022; originally announced October 2022.

    Comments: 24 pages, 5 figures Under submission to a journal

    Journal ref: ACM Transactions on Cyber-Physical Systems, 2024

  6. arXiv:2210.01158  [pdf, other]

    eess.SP

    An Analysis of RF Transfer Learning Behavior Using Synthetic Data

    Authors: Lauren J. Wong, Sean McPherson, Alan J. Michaels

    Abstract: Transfer learning (TL) techniques, which leverage prior knowledge gained from data with different distributions to achieve higher performance and reduced training time, are often used in computer vision (CV) and natural language processing (NLP), but have yet to be fully utilized in the field of radio frequency machine learning (RFML). This work systematically evaluates how radio frequency (RF) TL…

    Submitted 3 October, 2022; originally announced October 2022.

    Comments: arXiv admin note: substantial text overlap with arXiv:2206.08329

  7. arXiv:2206.08329  [pdf, other]

    eess.SP

    Assessing the Value of Transfer Learning Metrics for RF Domain Adaptation

    Authors: Lauren J. Wong, Sean McPherson, Alan J. Michaels

    Abstract: The use of transfer learning (TL) techniques has become common practice in fields such as computer vision (CV) and natural language processing (NLP). Leveraging prior knowledge gained from data with different distributions, TL offers higher performance and reduced training time, but has yet to be fully utilized in applications of machine learning (ML) and deep learning (DL) techniques to applicati…

    Submitted 16 June, 2022; originally announced June 2022.

  8. arXiv:2203.11903  [pdf]

    cs.LG cs.CV eess.IV

    Enabling faster and more reliable sonographic assessment of gestational age through machine learning

    Authors: Chace Lee, Angelica Willis, Christina Chen, Marcin Sieniek, Akib Uddin, Jonny Wong, Rory Pilgrim, Katherine Chou, Daniel Tse, Shravya Shetty, Ryan G. Gomes

    Abstract: Fetal ultrasounds are an essential part of prenatal care and can be used to estimate gestational age (GA). Accurate GA assessment is important for providing appropriate prenatal care throughout pregnancy and identifying complications such as fetal growth disorders. Since derivation of GA from manual fetal biometry measurements (head, abdomen, femur) are operator-dependent and time-consuming, there…

    Submitted 22 March, 2022; originally announced March 2022.

  9. arXiv:2203.10139  [pdf]

    cs.LG cs.AI cs.CV eess.IV

    AI system for fetal ultrasound in low-resource settings

    Authors: Ryan G. Gomes, Bellington Vwalika, Chace Lee, Angelica Willis, Marcin Sieniek, Joan T. Price, Christina Chen, Margaret P. Kasaro, James A. Taylor, Elizabeth M. Stringer, Scott Mayer McKinney, Ntazana Sindano, George E. Dahl, William Goodnight III, Justin Gilmer, Benjamin H. Chi, Charles Lau, Terry Spitz, T Saensuksopa, Kris Liu, Jonny Wong, Rory Pilgrim, Akib Uddin, Greg Corrado, Lily Peng, et al. (4 additional authors not shown)

    Abstract: Despite considerable progress in maternal healthcare, maternal and perinatal deaths remain high in low-to-middle income countries. Fetal ultrasound is an important component of antenatal care, but shortage of adequately trained healthcare workers has limited its adoption. We developed and validated an artificial intelligence (AI) system that uses novice-acquired "blind sweep" ultrasound videos to…

    Submitted 18 March, 2022; originally announced March 2022.

  10. arXiv:2109.10598  [pdf, other]

    cs.LG cs.CL cs.SD eess.AS

    Diarisation using location tracking with agglomerative clustering

    Authors: Jeremy H. M. Wong, Igor Abramovski, Xiong Xiao, Yifan Gong

    Abstract: Previous works have shown that spatial location information can be complementary to speaker embeddings for a speaker diarisation task. However, the models used often assume that speakers are fairly stationary throughout a meeting. This paper proposes to relax this assumption, by explicitly modelling the movements of speakers within an Agglomerative Hierarchical Clustering (AHC) diarisation framewo…

    Submitted 23 September, 2021; v1 submitted 22 September, 2021; originally announced September 2021.

  11. arXiv:2105.01644  [pdf, other]

    eess.SY econ.GN

    Market Potential for CO$_2$ Removal and Sequestration from Renewable Natural Gas Production in California

    Authors: Jun Wong, Jonathan Santoso, Marjorie Went, Daniel Sanchez

    Abstract: Bioenergy with Carbon Capture and Sequestration (BECCS) is critical for stringent climate change mitigation, but is commercially and technologically immature and resource-intensive. In California, state and federal fuel and climate policies can drive first-markets for BECCS. We develop a spatially explicit optimization model to assess niche markets for renewable natural gas (RNG) production with c…

    Submitted 4 May, 2021; originally announced May 2021.

    Comments: 25 pages, 6 figures

  12. arXiv:2101.01239  [pdf, other]

    eess.SP

    Explainable Neural Network-based Modulation Classification via Concept Bottleneck Models

    Authors: Lauren J. Wong, Sean McPherson

    Abstract: While RFML is expected to be a key enabler of future wireless standards, a significant challenge to the widespread adoption of RFML techniques is the lack of explainability in deep learning models. This work investigates the use of CB models as a means to provide inherent decision explanations in the context of DL-based AMC. Results show that the proposed approach not only meets the performance of…

    Submitted 4 January, 2021; originally announced January 2021.

  13. arXiv:2010.00432  [pdf, other]

    eess.SP

    The RFML Ecosystem: A Look at the Unique Challenges of Applying Deep Learning to Radio Frequency Applications

    Authors: Lauren J. Wong, William H. Clark IV, Bryse Flowers, R. Michael Buehrer, Alan J. Michaels, William C. Headley

    Abstract: While deep machine learning technologies are now pervasive in state-of-the-art image recognition and natural language processing applications, only in recent years have these technologies started to sufficiently mature in applications related to wireless communications. In particular, recent research has shown deep machine learning to be an enabling technology for cognitive radio applications as w…

    Submitted 1 October, 2020; originally announced October 2020.

  14. arXiv:2009.08563  [pdf, other]

    eess.IV cs.CV cs.LG

    SCREENet: A Multi-view Deep Convolutional Neural Network for Classification of High-resolution Synthetic Mammographic Screening Scans

    Authors: Saeed Seyyedi, Margaret J. Wong, Debra M. Ikeda, Curtis P. Langlotz

    Abstract: Purpose: To develop and evaluate the accuracy of a multi-view deep learning approach to the analysis of high-resolution synthetic mammograms from digital breast tomosynthesis screening cases, and to assess the effect on accuracy of image resolution and training set size. Materials and Methods: In a retrospective study, 21,264 screening digital breast tomosynthesis (DBT) exams obtained at our insti…

    Submitted 25 September, 2020; v1 submitted 17 September, 2020; originally announced September 2020.

  15. arXiv:2008.04874  [pdf, other]

    eess.SP

    Classification of Radio Signals Using Truncated Gaussian Discriminant Analysis of Convolutional Neural Network-Derived Features

    Authors: J. B. Persons, Lauren J. Wong, W. Chris Headley, Michael C. Fowler

    Abstract: To improve the utility and scalability of distributed radio frequency (RF) sensor and communication networks, reduce the need for convolutional neural network (CNN) retraining, and efficiently share learned information about signals, we examined a supervised bootstrapping approach for RF modulation classification. We show that CNN-bootstrapped features of new and existing modulation classes can be…

    Submitted 11 August, 2020; originally announced August 2020.

    Comments: Under peer review as of 11 August 2020. 11 pages, 13 figures

  16. arXiv:2003.07482  [pdf, other]

    eess.AS cs.CL cs.SD

    High-Accuracy and Low-Latency Speech Recognition with Two-Head Contextual Layer Trajectory LSTM Model

    Authors: Jinyu Li, Rui Zhao, Eric Sun, Jeremy H. M. Wong, Amit Das, Zhong Meng, Yifan Gong

    Abstract: While the community keeps promoting end-to-end models over conventional hybrid models, which usually are long short-term memory (LSTM) models trained with a cross entropy criterion followed by a sequence discriminative training criterion, we argue that such conventional hybrid models can still be significantly improved. In this paper, we detail our recent efforts to improve conventional hybrid LST…

    Submitted 16 March, 2020; originally announced March 2020.

    Comments: Accepted by ICASSP 2020

  17. arXiv:1808.02369  [pdf, other]

    eess.SP

    Emitter Identification Using CNN IQ Imbalance Estimators

    Authors: Lauren J. Wong, William C. Headley, Alan J. Michaels

    Abstract: Specific Emitter Identification is the association of a received signal to a unique emitter, and is made possible by the naturally occurring and unintentional characteristics an emitter imparts onto each transmission, known as its radio frequency fingerprint. This work presents an approach for identifying emitters using Convolutional Neural Networks to estimate the IQ imbalance parameters of each…

    Submitted 7 August, 2018; originally announced August 2018.

  18. arXiv:1802.00254  [pdf, ps, other]

    cs.SD cs.CL eess.AS

    Phonetic and Graphemic Systems for Multi-Genre Broadcast Transcription

    Authors: Yu Wang, Xie Chen, Mark Gales, Anton Ragni, Jeremy Wong

    Abstract: State-of-the-art English automatic speech recognition systems typically use phonetic rather than graphemic lexicons. Graphemic systems are known to perform less well for English as the mapping from the written form to the spoken form is complicated. However, in recent years the representational power of deep-learning based acoustic models has improved, raising interest in graphemic acoustic models…

    Submitted 1 February, 2018; originally announced February 2018.

    Comments: 5 pages, 6 tables, to appear in 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2018)