Showing 1–50 of 63 results for author: Ma, M

  1. arXiv:2405.18739  [pdf, other]

    cs.NI eess.SP

    FlocOff: Data Heterogeneity Resilient Federated Learning with Communication-Efficient Edge Offloading

    Authors: Mulei Ma, Chenyu Gong, Liekang Zeng, Yang Yang, Liantao Wu

    Abstract: Federated Learning (FL) has emerged as a fundamental learning paradigm to harness massive data scattered at geo-distributed edge devices in a privacy-preserving way. Given the heterogeneous deployment of edge devices, however, their data are usually Non-IID, introducing significant challenges to FL including degraded training accuracy, intensive communication costs, and high computing complexity.… ▽ More

    Submitted 28 May, 2024; originally announced May 2024.

  2. arXiv:2404.06674  [pdf, other]

    cs.SD cs.AI eess.AS

    VoiceShop: A Unified Speech-to-Speech Framework for Identity-Preserving Zero-Shot Voice Editing

    Authors: Philip Anastassiou, Zhenyu Tang, Kainan Peng, Dongya Jia, Jiaxin Li, Ming Tu, Yuping Wang, Yuxuan Wang, Mingbo Ma

    Abstract: We present VoiceShop, a novel speech-to-speech framework that can modify multiple attributes of speech, such as age, gender, accent, and speech style, in a single forward pass while preserving the input speaker's timbre. Previous works have been constrained to specialized models that can only edit these attributes individually and suffer from the following pitfalls: the magnitude of the conversion… ▽ More

    Submitted 11 April, 2024; v1 submitted 9 April, 2024; originally announced April 2024.

  3. arXiv:2404.04904  [pdf, other]

    cs.SD cs.AI eess.AS

    Cross-Domain Audio Deepfake Detection: Dataset and Analysis

    Authors: Yuang Li, Min Zhang, Mengxin Ren, Miaomiao Ma, Daimeng Wei, Hao Yang

    Abstract: Audio deepfake detection (ADD) is essential for preventing the misuse of synthetic voices that may infringe on personal rights and privacy. Recent zero-shot text-to-speech (TTS) models pose higher risks as they can clone voices with a single utterance. However, the existing ADD datasets are outdated, leading to suboptimal generalization of detection models. In this paper, we construct a new cross-… ▽ More

    Submitted 7 April, 2024; originally announced April 2024.

  4. arXiv:2403.02039  [pdf, other]

    eess.SY

    A Frequency-Domain Approach for Enhanced Performance and Task Flexibility in Finite-Time ILC

    Authors: Max van Haren, Kentaro Tsurumoto, Masahiro Mae, Lennart Blanken, Wataru Ohnishi, Tom Oomen

    Abstract: Iterative learning control (ILC) is capable of improving the tracking performance of repetitive control systems by utilizing data from past iterations. The aim of this paper is to achieve both task flexibility, which is often achieved by ILC with basis functions, and the performance of frequency-domain ILC, with an intuitive design procedure. The cost function of norm-optimal ILC is determined tha… ▽ More

    Submitted 4 March, 2024; originally announced March 2024.

  5. arXiv:2402.12482  [pdf, other]

    cs.SD cs.IR cs.LG eess.AS

    SECP: A Speech Enhancement-Based Curation Pipeline For Scalable Acquisition Of Clean Speech

    Authors: Adam Sabra, Cyprian Wronka, Michelle Mao, Samer Hijazi

    Abstract: As more speech technologies rely on a supervised deep learning approach with clean speech as the ground truth, a methodology to onboard said speech at scale is needed. However, this approach needs to minimize the dependency on human listening and annotation, only requiring a human-in-the-loop when needed. In this paper, we address this issue by outlining Speech Enhancement-based Curation Pipeline… ▽ More

    Submitted 19 February, 2024; originally announced February 2024.

    Comments: Accepted to the International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024

  6. arXiv:2311.07613  [pdf]

    eess.SY cs.LG math.DS

    A Physics-informed Machine Learning-based Control Method for Nonlinear Dynamic Systems with Highly Noisy Measurements

    Authors: Mason Ma, Jiajie Wu, Chase Post, Tony Shi, Jingang Yi, Tony Schmitz, Hong Wang

    Abstract: This study presents a physics-informed machine learning-based control method for nonlinear dynamic systems with highly noisy measurements. Existing data-driven control methods that use machine learning for system identification cannot effectively cope with highly noisy measurements, resulting in unstable control performance. To address this challenge, the present study extends current physics-info… ▽ More

    Submitted 11 November, 2023; originally announced November 2023.

  7. arXiv:2311.00332  [pdf, other]

    q-bio.TO cs.CV eess.IV

    SDF4CHD: Generative Modeling of Cardiac Anatomies with Congenital Heart Defects

    Authors: Fanwei Kong, Sascha Stocker, Perry S. Choi, Michael Ma, Daniel B. Ennis, Alison Marsden

    Abstract: Congenital heart disease (CHD) encompasses a spectrum of cardiovascular structural abnormalities, often requiring customized treatment plans for individual patients. Computational modeling and analysis of these unique cardiac anatomies can improve diagnosis and treatment planning and may ultimately lead to improved outcomes. Deep learning (DL) methods have demonstrated the potential to enable effi… ▽ More

    Submitted 8 November, 2023; v1 submitted 1 November, 2023; originally announced November 2023.

  8. arXiv:2310.15407  [pdf, ps, other]

    eess.SY eess.SP

    Finite-Time Adaptive Fuzzy Tracking Control for Nonlinear State Constrained Pure-Feedback Systems

    Authors: Ju Wu, Tong Wang, Min Ma

    Abstract: This paper investigates the finite-time adaptive fuzzy tracking control problem for a class of pure-feedback system with full-state constraints. With the help of Mean-Value Theorem, the pure-feedback nonlinear system is transformed into strict-feedback case. By employing finite-time-stable like function and state transformation for output tracking error, the output tracking error converges to a pr… ▽ More

    Submitted 23 October, 2023; originally announced October 2023.

  9. arXiv:2310.08804  [pdf, other]

    eess.SP

    Spiking Semantic Communication for Feature Transmission with HARQ

    Authors: Mengyang Wang, Jiahui Li, Mengyao Ma, Xiaopeng Fan

    Abstract: In Collaborative Intelligence (CI), the Artificial Intelligence (AI) model is divided between the edge and the cloud, with intermediate features being sent from the edge to the cloud for inference. Several deep learning-based Semantic Communication (SC) models have been proposed to reduce feature transmission overhead and mitigate channel noise interference. Previous research has demonstrated that… ▽ More

    Submitted 12 October, 2023; originally announced October 2023.

  10. arXiv:2309.10567  [pdf, other]

    cs.CL cs.LG cs.SD eess.AS

    Multimodal Modeling For Spoken Language Identification

    Authors: Shikhar Bharadwaj, Min Ma, Shikhar Vashishth, Ankur Bapna, Sriram Ganapathy, Vera Axelrod, Siddharth Dalmia, Wei Han, Yu Zhang, Daan van Esch, Sandy Ritchie, Partha Talukdar, Jason Riesa

    Abstract: Spoken language identification refers to the task of automatically predicting the spoken language in a given utterance. Conventionally, it is modeled as a speech-based language identification task. Prior techniques have been constrained to a single modality; however in the case of video data there is a wealth of other metadata that may be beneficial for this task. In this work, we propose MuSeLI,… ▽ More

    Submitted 19 September, 2023; originally announced September 2023.

  11. arXiv:2308.01317  [pdf]

    cs.CV eess.IV

    ELIXR: Towards a general purpose X-ray artificial intelligence system through alignment of large language models and radiology vision encoders

    Authors: Shawn Xu, Lin Yang, Christopher Kelly, Marcin Sieniek, Timo Kohlberger, Martin Ma, Wei-Hung Weng, Atilla Kiraly, Sahar Kazemzadeh, Zakkai Melamed, Jungyeon Park, Patricia Strachan, Yun Liu, Chuck Lau, Preeti Singh, Christina Chen, Mozziyar Etemadi, Sreenivasa Raju Kalidindi, Yossi Matias, Katherine Chou, Greg S. Corrado, Shravya Shetty, Daniel Tse, Shruthi Prabhakara, Daniel Golden , et al. (3 additional authors not shown)

    Abstract: In this work, we present an approach, which we call Embeddings for Language/Image-aligned X-Rays, or ELIXR, that leverages a language-aligned image encoder combined or grafted onto a fixed LLM, PaLM 2, to perform a broad range of chest X-ray tasks. We train this lightweight adapter architecture using images paired with corresponding free-text radiology reports from the MIMIC-CXR dataset. ELIXR ach… ▽ More

    Submitted 7 September, 2023; v1 submitted 2 August, 2023; originally announced August 2023.

  12. arXiv:2308.00393  [pdf, other]

    cs.LG eess.SP

    A Survey of Time Series Anomaly Detection Methods in the AIOps Domain

    Authors: Zhenyu Zhong, Qiliang Fan, Jiacheng Zhang, Minghua Ma, Shenglin Zhang, Yongqian Sun, Qingwei Lin, Yuzhi Zhang, Dan Pei

    Abstract: Internet-based services have seen remarkable success, generating vast amounts of monitored key performance indicators (KPIs) as univariate or multivariate time series. Monitoring and analyzing these time series are crucial for researchers, service operators, and on-call engineers to detect outliers or anomalies indicating service failures or significant events. Numerous advanced anomaly detection… ▽ More

    Submitted 1 August, 2023; originally announced August 2023.

  13. arXiv:2307.10982  [pdf, other]

    cs.SD cs.CL cs.LG eess.AS

    MASR: Multi-label Aware Speech Representation

    Authors: Anjali Raj, Shikhar Bharadwaj, Sriram Ganapathy, Min Ma, Shikhar Vashishth

    Abstract: In the recent years, speech representation learning is constructed primarily as a self-supervised learning (SSL) task, using the raw audio signal alone, while ignoring the side-information that is often available for a given speech recording. In this paper, we propose MASR, a Multi-label Aware Speech Representation learning framework, which addresses the aforementioned limitations. MASR enables th… ▽ More

    Submitted 25 September, 2023; v1 submitted 20 July, 2023; originally announced July 2023.

    Comments: Accepted at ASRU 2023

  14. arXiv:2307.04327  [pdf]

    cs.RO eess.SY

    Legal Decision-making for Highway Automated Driving

    Authors: Xiaohan Ma, Wenhao Yu, Chengxiang Zhao, Changjun Wang, Wenhui Zhou, Guangming Zhao, Mingyue Ma, Weida Wang, Lin Yang, Rui Mu, Hong Wang, Jun Li

    Abstract: Compliance with traffic laws is a fundamental requirement for human drivers on the road, and autonomous vehicles must adhere to traffic laws as well. However, current autonomous vehicles prioritize safety and collision avoidance primarily in their decision-making and planning, which will lead to misunderstandings and distrust from human drivers and may even result in accidents in mixed traffic flo… ▽ More

    Submitted 9 July, 2023; originally announced July 2023.

    Comments: 14 pages, 17 figures

  15. arXiv:2306.17697  [pdf, ps, other]

    eess.SP

    Analysis of Oversampling in Uplink Massive MIMO-OFDM with Low-Resolution ADCs

    Authors: Mengyuan Ma, Nhan Thanh Nguyen, Italo Atzeni, Markku Juntti

    Abstract: Low-resolution analog-to-digital converters (ADCs) have emerged as an efficient solution for massive multiple-input multiple-output (MIMO) systems to reap high data rates with reasonable power consumption and hardware complexity. In this paper, we analyze the performance of oversampling in uplink massive MIMO orthogonal frequency-division multiplexing (MIMO-OFDM) systems with low-resolution ADCs.… ▽ More

    Submitted 30 June, 2023; originally announced June 2023.

    Comments: 5 pages, 5 figures, to appear in IEEE SPAWC2023

  16. arXiv:2306.10232   

    cs.NI eess.SP

    Multi-Task Offloading via Graph Neural Networks in Heterogeneous Multi-access Edge Computing

    Authors: Mulei Ma

    Abstract: In the rapidly evolving field of Heterogeneous Multi-access Edge Computing (HMEC), efficient task offloading plays a pivotal role in optimizing system throughput and resource utilization. However, existing task offloading methods often fall short of adequately modeling the dependency topology relationships between offloaded tasks, which limits their effectiveness in capturing the complex interdepe… ▽ More

    Submitted 30 May, 2024; v1 submitted 16 June, 2023; originally announced June 2023.

    Comments: Insufficient completion, there are some errors in the current version

  17. arXiv:2306.04374  [pdf, other]

    cs.CL cs.LG cs.SD eess.AS

    Label Aware Speech Representation Learning For Language Identification

    Authors: Shikhar Vashishth, Shikhar Bharadwaj, Sriram Ganapathy, Ankur Bapna, Min Ma, Wei Han, Vera Axelrod, Partha Talukdar

    Abstract: Speech representation learning approaches for non-semantic tasks such as language recognition have either explored supervised embedding extraction methods using a classifier model or self-supervised representation learning approaches using raw data. In this paper, we propose a novel framework of combining self-supervised representation learning with the language label information for the pre-train… ▽ More

    Submitted 7 June, 2023; originally announced June 2023.

    Comments: Accepted at Interspeech 2023

  18. arXiv:2305.15719  [pdf, other]

    cs.SD cs.AI cs.LG eess.AS

    Efficient Neural Music Generation

    Authors: Max W. Y. Lam, Qiao Tian, Tang Li, Zongyu Yin, Siyuan Feng, Ming Tu, Yuliang Ji, Rui Xia, Mingbo Ma, Xuchen Song, Jitong Chen, Yuping Wang, Yuxuan Wang

    Abstract: Recent progress in music generation has been remarkably advanced by the state-of-the-art MusicLM, which comprises a hierarchy of three LMs, respectively, for semantic, coarse acoustic, and fine acoustic modelings. Yet, sampling with the MusicLM requires processing through these LMs one by one to obtain the fine-grained acoustic tokens, making it computationally expensive and prohibitive for a real… ▽ More

    Submitted 25 May, 2023; originally announced May 2023.

  19. Beam Squint Analysis and Mitigation via Hybrid Beamforming Design in THz Communications

    Authors: Mengyuan Ma, Nhan Thanh Nguyen, Markku Juntti

    Abstract: We investigate the beam squint effect in uniform planar arrays (UPAs) and propose an efficient hybrid beamforming (HBF) design to mitigate the beam squint in multiple-input multiple-output orthogonal frequency-division multiplexing (MIMO-OFDM) systems operating at terahertz band. We first analyze the array gain and derive the closed-form beam squint ratio that characterizes the severity of the bea… ▽ More

    Submitted 22 March, 2023; originally announced March 2023.

    Comments: 6 pages, 7 figures, to appear in IEEE ICC2023

  20. arXiv:2303.03470  [pdf, other]

    cs.CR eess.SY

    Partial-Information, Longitudinal Cyber Attacks on LiDAR in Autonomous Vehicles

    Authors: R. Spencer Hallyburton, Qingzhao Zhang, Z. Morley Mao, Miroslav Pajic

    Abstract: What happens to an autonomous vehicle (AV) if its data are adversarially compromised? Prior security studies have addressed this question through mostly unrealistic threat models, with limited practical relevance, such as white-box adversarial learning or nanometer-scale laser aiming and spoofing. With growing evidence that cyber threats pose real, imminent danger to AVs and cyber-physical systems… ▽ More

    Submitted 8 December, 2023; v1 submitted 6 March, 2023; originally announced March 2023.

  21. arXiv:2303.01723  [pdf, other]

    cs.IT cs.AI eess.SP

    AI-Empowered Hybrid MIMO Beamforming

    Authors: Nir Shlezinger, Mengyuan Ma, Ortal Lavi, Nhan Thanh Nguyen, Yonina C. Eldar, Markku Juntti

    Abstract: Hybrid multiple-input multiple-output (MIMO) is an attractive technology for realizing extreme massive MIMO systems envisioned for future wireless communications in a scalable and power-efficient manner. However, the fact that hybrid MIMO systems implement part of their beamforming in analog and part in digital makes the optimization of their beampattern notably more challenging compared with conv… ▽ More

    Submitted 3 March, 2023; originally announced March 2023.

    Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

  22. arXiv:2302.12041  [pdf, other]

    cs.IT eess.SP

    Deep Unfolding Hybrid Beamforming Designs for THz Massive MIMO Systems

    Authors: Nhan Thanh Nguyen, Mengyuan Ma, Nir Shlezinger, Yonina C. Eldar, A. L. Swindlehurst, Markku Juntti

    Abstract: Hybrid beamforming (HBF) is a key enabler for wideband terahertz (THz) massive multiple-input multiple-output (mMIMO) communications systems. A core challenge with designing HBF systems stems from the fact their application often involves a non-convex, highly complex optimization of large dimensions. In this paper, we propose HBF schemes that leverage data to enable efficient designs for both the… ▽ More

    Submitted 23 February, 2023; originally announced February 2023.

    Comments: This paper has been submitted to IEEE Transaction on Signal Processing

  23. arXiv:2212.05751  [pdf, other]

    eess.AS

    Zero-Shot Accent Conversion using Pseudo Siamese Disentanglement Network

    Authors: Dongya Jia, Qiao Tian, Kainan Peng, Jiaxin Li, Yuanzhe Chen, Mingbo Ma, Yuping Wang, Yuxuan Wang

    Abstract: The goal of accent conversion (AC) is to convert the accent of speech into the target accent while preserving the content and speaker identity. AC enables a variety of applications, such as language learning, speech content creation, and data augmentation. Previous methods rely on reference utterances in the inference phase or are unable to preserve speaker identity. To address these issues, we pr… ▽ More

    Submitted 10 August, 2023; v1 submitted 12 December, 2022; originally announced December 2022.

    Comments: Accepted by INTERSPEECH 2023

  24. arXiv:2210.06890  [pdf, ps, other]

    eess.SP

    Switch-based Hybrid Beamforming Transceiver Design for Wideband Communications with Beam Squint

    Authors: Mengyuan Ma, Nhan Thanh Nguyen, Markku Juntti

    Abstract: Hybrid beamforming (HBF) transceiver architectures based on frequency-independent phase shifters (PS-HBF) are sensitive to the phases and physical directions with limited capability to compensate for the detrimental effects of the beam squint. Motivated by the fact that switches are phase-independent and more power/cost efficient than PSs, we consider the switch-based HBF (SW-HBF) for wideband lar… ▽ More

    Submitted 20 November, 2023; v1 submitted 13 October, 2022; originally announced October 2022.

    Comments: 15 pages, 15 figures

  25. arXiv:2210.06836  [pdf, other]

    eess.SP

    SNN-SC: A Spiking Semantic Communication Framework for Feature Transmission

    Authors: Mengyang Wang, Jiahui Li, Mengyao Ma, Xiaopeng Fan, Yonghong Tian

    Abstract: In Collaborative Intelligence (CI), Artificial Intelligence (AI) models are split between edge devices and cloud. Features extracted from input on edge devices are transmitted to the cloud for subsequent tasks. Extracting task-related and compact information is critical when transmission bandwidth is limited. In this paper, we propose a task-oriented Semantic Communication (SC) framework (SNN-SC)… ▽ More

    Submitted 17 April, 2023; v1 submitted 13 October, 2022; originally announced October 2022.

  26. arXiv:2210.06747  [pdf, other]

    eess.IV cs.CV

    DCANet: Differential Convolution Attention Network for RGB-D Semantic Segmentation

    Authors: Lizhi Bai, Jun Yang, Chunqi Tian, Yaoru Sun, Maoyu Mao, Yanjun Xu, Weirong Xu

    Abstract: Combining RGB images and the corresponding depth maps in semantic segmentation proves the effectiveness in the past few years. Existing RGB-D modal fusion methods either lack the non-linear feature fusion ability or treat both modal images equally, regardless of the intrinsic distribution gap or information loss. Here we find that depth maps are suitable to provide intrinsic fine-grained patterns… ▽ More

    Submitted 13 October, 2022; originally announced October 2022.

  27. arXiv:2208.02792  [pdf]

    cs.RO eess.SY

    A Cooperative Perception Environment for Traffic Operations and Control

    Authors: Hanlin Chen, Brian Liu, Xumiao Zhang, Feng Qian, Z. Morley Mao, Yiheng Feng

    Abstract: Existing data collection methods for traffic operations and control usually rely on infrastructure-based loop detectors or probe vehicle trajectories. Connected and automated vehicles (CAVs) not only can report data about themselves but also can provide the status of all detected surrounding vehicles. Integration of perception data from multiple CAVs as well as infrastructure sensors (e.g., LiDAR)… ▽ More

    Submitted 4 August, 2022; originally announced August 2022.

  28. Constellation Design for Deep Joint Source-Channel Coding

    Authors: Mengyang Wang, Jiahui Li, Mengyao Ma, Xiaopeng Fan

    Abstract: Deep learning-based joint source-channel coding (JSCC) has shown excellent performance in image and feature transmission. However, the output values of the JSCC encoder are continuous, which makes the constellation of modulation complex and dense. It is hard and expensive to design radio frequency chains for transmitting such full-resolution constellation points. In this paper, two methods of mapp… ▽ More

    Submitted 7 June, 2022; originally announced June 2022.

  29. arXiv:2205.12446  [pdf, other]

    cs.CL cs.LG cs.SD eess.AS

    FLEURS: Few-shot Learning Evaluation of Universal Representations of Speech

    Authors: Alexis Conneau, Min Ma, Simran Khanuja, Yu Zhang, Vera Axelrod, Siddharth Dalmia, Jason Riesa, Clara Rivera, Ankur Bapna

    Abstract: We introduce FLEURS, the Few-shot Learning Evaluation of Universal Representations of Speech benchmark. FLEURS is an n-way parallel speech dataset in 102 languages built on top of the machine translation FLoRes-101 benchmark, with approximately 12 hours of speech supervision per language. FLEURS can be used for a variety of speech tasks, including Automatic Speech Recognition (ASR), Speech Languag… ▽ More

    Submitted 24 May, 2022; originally announced May 2022.

  30. arXiv:2205.03524  [pdf, other]

    eess.IV cs.CV

    Dual Adversarial Adaptation for Cross-Device Real-World Image Super-Resolution

    Authors: Xiaoqian Xu, Pengxu Wei, Weikai Chen, Mingzhi Mao, Liang Lin, Guanbin Li

    Abstract: Due to the sophisticated imaging process, an identical scene captured by different cameras could exhibit distinct imaging patterns, introducing distinct proficiency among the super-resolution (SR) models trained on images from different devices. In this paper, we investigate a novel and practical task coded cross-device SR, which strives to adapt a real-world SR model trained on the paired images… ▽ More

    Submitted 6 May, 2022; originally announced May 2022.

  31. arXiv:2205.03122  [pdf]

    physics.med-ph eess.IV physics.optics

    Ultrathin, high-speed, all-optical photoacoustic endomicroscopy probe for guiding minimally invasive surgery

    Authors: Tianrui Zhao, Truc Thuy Pham, Christian Baker, Michelle T. Ma, Sebastien Ourselin, Tom Vercauteren, Edward Zhang, Paul C. Beard, Wenfeng Xia

    Abstract: Photoacoustic (PA) endoscopy has shown significant potential for clinical diagnosis and surgical guidance. Multimode fibres (MMFs) are becoming increasing attractive for the development of miniature endoscopy probes owing to ultrathin size, low cost and diffraction-limited spatial resolution enabled by wavefront shaping. However, current MMF-based PA endomicroscopy probes are either limited by a b… ▽ More

    Submitted 6 May, 2022; originally announced May 2022.

  32. arXiv:2203.09690  [pdf, other]

    eess.AS cs.CL cs.SD

    A$^3$T: Alignment-Aware Acoustic and Text Pretraining for Speech Synthesis and Editing

    Authors: He Bai, Renjie Zheng, Junkun Chen, Xintong Li, Mingbo Ma, Liang Huang

    Abstract: Recently, speech representation learning has improved many speech-related tasks such as speech recognition, speech classification, and speech-to-text translation. However, all the above tasks are in the direction of speech understanding, but for the inverse direction, speech synthesis, the potential of representation learning is yet to be realized, due to the challenging nature of generating high-… ▽ More

    Submitted 18 June, 2022; v1 submitted 17 March, 2022; originally announced March 2022.

    Comments: Accepted by ICML 2022, 12 pages, 10 figures

  33. arXiv:2111.01544  [pdf]

    eess.IV cs.CV physics.med-ph

    Comprehensive and Clinically Accurate Head and Neck Organs at Risk Delineation via Stratified Deep Learning: A Large-scale Multi-Institutional Study

    Authors: Dazhou Guo, Jia Ge, Xianghua Ye, Senxiang Yan, Yi Xin, Yuchen Song, Bing-shen Huang, Tsung-Min Hung, Zhuotun Zhu, Ling Peng, Yanping Ren, Rui Liu, Gong Zhang, Mengyuan Mao, Xiaohua Chen, Zhongjie Lu, Wenxiang Li, Yuzhen Chen, Lingyun Huang, Jing Xiao, Adam P. Harrison, Le Lu, Chien-Yu Lin, Dakai Jin, Tsung-Ying Ho

    Abstract: Accurate organ at risk (OAR) segmentation is critical to reduce the radiotherapy post-treatment complications. Consensus guidelines recommend a set of more than 40 OARs in the head and neck (H&N) region, however, due to the predictable prohibitive labor-cost of this task, most institutions choose a substantially simplified protocol by delineating a smaller subset of OARs and neglecting the dose di… ▽ More

    Submitted 1 November, 2021; originally announced November 2021.

  34. arXiv:2110.15561  [pdf, ps, other]

    cs.CV eess.IV

    Exposing Deepfake with Pixel-wise AR and PPG Correlation from Faint Signals

    Authors: Maoyu Mao, Jun Yang

    Abstract: Deepfake poses a serious threat to the reliability of judicial evidence and intellectual property protection. In spite of an urgent need for Deepfake identification, existing pixel-level detection methods are increasingly unable to resist the growing realism of fake videos and lack generalization. In this paper, we propose a scheme to expose Deepfake through faint signals hidden in face videos. Th… ▽ More

    Submitted 29 October, 2021; originally announced October 2021.

  35. arXiv:2110.09744  [pdf, ps, other]

    eess.IV cs.CV cs.LG eess.SP

    Spectral Variability Augmented Sparse Unmixing of Hyperspectral Images

    Authors: Ge Zhang, Shaohui Mei, Mingyang Ma, Yan Feng, Qian Du

    Abstract: Spectral unmixing (SU) expresses the mixed pixels existed in hyperspectral images as the product of endmember and abundance, which has been widely used in hyperspectral imagery analysis. However, the influence of light, acquisition conditions and the inherent properties of materials, results in that the identified endmembers can vary spectrally within a given image (construed as spectral variabili… ▽ More

    Submitted 21 October, 2021; v1 submitted 19 October, 2021; originally announced October 2021.

  36. arXiv:2110.06301  [pdf, ps, other]

    eess.SP

    Switch-based Hybrid Beamforming for Wideband Multi-carrier Communications

    Authors: Mengyuan Ma, Nhan Thanh Nguyen, Markku Juntti

    Abstract: Switch-based hybrid beamforming (SW-HBF) architectures are promising for realizing massive multiple-input multiple-output (MIMO) communications systems because of their low cost and low power consumption. In this paper, we study the performance of SW-HBF in a wideband multi-carrier MIMO communication system considering the beam squint effect. We aim at designing the switch-based combiner that maxi… ▽ More

    Submitted 21 November, 2021; v1 submitted 12 October, 2021; originally announced October 2021.

    Comments: 6 pages, 8 figures, to appear in the Proceedings of the 25th International ITG Workshop on Smart Antennas (WSA 2021)

  37. arXiv:2109.13226  [pdf, other]

    eess.AS cs.CL cs.LG cs.SD

    BigSSL: Exploring the Frontier of Large-Scale Semi-Supervised Learning for Automatic Speech Recognition

    Authors: Yu Zhang, Daniel S. Park, Wei Han, James Qin, Anmol Gulati, Joel Shor, Aren Jansen, Yuanzhong Xu, Yanping Huang, Shibo Wang, Zongwei Zhou, Bo Li, Min Ma, William Chan, Jiahui Yu, Yongqiang Wang, Liangliang Cao, Khe Chai Sim, Bhuvana Ramabhadran, Tara N. Sainath, Françoise Beaufays, Zhifeng Chen, Quoc V. Le, Chung-Cheng Chiu, Ruoming Pang , et al. (1 additional authors not shown)

    Abstract: We summarize the results of a host of efforts using giant automatic speech recognition (ASR) models pre-trained using large, diverse unlabeled datasets containing approximately a million hours of audio. We find that the combination of pre-training, self-training and scaling up model size greatly increases data efficiency, even for extremely large tasks with tens of thousands of hours of labeled da… ▽ More

    Submitted 21 July, 2022; v1 submitted 27 September, 2021; originally announced September 2021.

    Comments: 14 pages, 7 figures, 13 tables; v2: minor corrections, reference baselines and bibliography updated; v3: corrections based on reviewer feedback, bibliography updated

  38. arXiv:2108.06691  [pdf, ps, other]

    eess.SP

    Closed-Form Hybrid Beamforming Solution for Spectral Efficiency Upper Bound Maximization in mmWave MIMO-OFDM Systems

    Authors: Mengyuan Ma, Nhan Thanh Nguyen, Markku Juntti

    Abstract: Hybrid beamforming is considered a key enabler to realize millimeter wave (mmWave) multiple-input multiple-output (MIMO) communications due to its capability of considerably reducing the number of costly and power-hungry radio frequency chains in the transceiver. However, in mmWave MIMO orthogonal frequency-division multiplexing (MIMO-OFDM) systems, hybrid beamforming design is challenging because… ▽ More

    Submitted 24 August, 2021; v1 submitted 15 August, 2021; originally announced August 2021.

    Comments: 5 pages, 5 figures, to appear in the proceedings of VTC2021-Fall

  39. Federated Learning for Internet of Things: A Federated Learning Framework for On-device Anomaly Data Detection

    Authors: Tuo Zhang, Chaoyang He, Tianhao Ma, Lei Gao, Mark Ma, Salman Avestimehr

    Abstract: Federated learning can be a promising solution for enabling IoT cybersecurity (i.e., anomaly detection in the IoT environment) while preserving data privacy and mitigating the high communication/storage overhead (e.g., high-frequency data from time-series sensors) of centralized over-the-cloud approaches. In this paper, to further push forward this direction with a comprehensive study in both algo… ▽ More

    Submitted 18 October, 2021; v1 submitted 15 June, 2021; originally announced June 2021.

    Journal ref: Proceedings of the 19th ACM Conference on Embedded Networked Sensor Systems, November 2021, Pages 413-419

  40. arXiv:2106.07098  [pdf, other]

    cs.CR cs.LG eess.SY

    Security Analysis of Camera-LiDAR Fusion Against Black-Box Attacks on Autonomous Vehicles

    Authors: R. Spencer Hallyburton, Yupei Liu, Yulong Cao, Z. Morley Mao, Miroslav Pajic

    Abstract: To enable safe and reliable decision-making, autonomous vehicles (AVs) feed sensor data to perception algorithms to understand the environment. Sensor fusion with multi-frame tracking is becoming increasingly popular for detecting 3D objects. Thus, in this work, we perform an analysis of camera-LiDAR fusion, in the AV context, under LiDAR spoofing attacks. Recently, LiDAR-only perception was shown… ▽ More

    Submitted 21 February, 2022; v1 submitted 13 June, 2021; originally announced June 2021.

  41. arXiv:2106.06636  [pdf, other]

    cs.CL cs.SD eess.AS

    Direct Simultaneous Speech-to-Text Translation Assisted by Synchronized Streaming ASR

    Authors: Junkun Chen, Mingbo Ma, Renjie Zheng, Liang Huang

    Abstract: Simultaneous speech-to-text translation is widely useful in many scenarios. The conventional cascaded approach uses a pipeline of streaming ASR followed by simultaneous MT, but suffers from error propagation and extra latency. To alleviate these issues, recent efforts attempt to directly translate the source speech into target text simultaneously, but this is much harder due to the combination of…

    Submitted 11 June, 2021; originally announced June 2021.

    Comments: accepted by Findings of ACL 2021

  42. arXiv:2104.14830  [pdf, other]

    cs.CL cs.SD eess.AS

    Scaling End-to-End Models for Large-Scale Multilingual ASR

    Authors: Bo Li, Ruoming Pang, Tara N. Sainath, Anmol Gulati, Yu Zhang, James Qin, Parisa Haghani, W. Ronny Huang, Min Ma, Junwen Bai

    Abstract: Building ASR models across many languages is a challenging multi-task learning problem due to large variations and heavily unbalanced data. Existing work has shown positive transfer from high resource to low resource languages. However, degradations on high resource languages are commonly observed due to interference from the heterogeneous multilingual data and reduction in per-language capacity.…

    Submitted 11 September, 2021; v1 submitted 30 April, 2021; originally announced April 2021.

    Comments: ASRU 2021

  43. arXiv:2104.04993  [pdf, other]

    eess.AS

    The DKU System Description for The Interspeech 2021 Auto-KWS Challenge

    Authors: Yechen Wang, Yan Jia, Murong Ma, Zexin Cai, Ming Li

    Abstract: This paper introduces the system submitted by the DKU-SMIIP team for the Auto-KWS 2021 Challenge. Our implementation consists of a two-stage keyword spotting system based on query-by-example spoken term detection and a speaker verification system. We employ two different detection algorithms in our proposed keyword spotting system. The first stage adopts subsequence dynamic time warping for templa…

    Submitted 11 April, 2021; originally announced April 2021.

    Comments: 5 pages, 1 figures, submitted to INTERSPEECH

  44. arXiv:2102.03357  [pdf, other]

    eess.SP cs.AI cs.LG eess.SY

    Machine Learning for Electronic Design Automation: A Survey

    Authors: Guyue Huang, Jingbo Hu, Yifan He, Jialong Liu, Mingyuan Ma, Zhaoyang Shen, Juejian Wu, Yuanfan Xu, Hengrui Zhang, Kai Zhong, Xuefei Ning, Yuzhe Ma, Haoyu Yang, Bei Yu, Huazhong Yang, Yu Wang

    Abstract: With the down-scaling of CMOS technology, the design complexity of very large-scale integrated (VLSI) is increasing. Although the application of machine learning (ML) techniques in electronic design automation (EDA) can trace its history back to the 90s, the recent breakthrough of ML and the increasing complexity of EDA tasks have aroused more interests in incorporating ML to solve EDA tasks. In t…

    Submitted 8 March, 2021; v1 submitted 10 January, 2021; originally announced February 2021.

    Comments: Accepted by TODAES. The first 10 authors are ordered alphabetically

  45. arXiv:2102.00202  [pdf, other]

    eess.SP eess.IV

    SNR-adaptive deep joint source-channel coding for wireless image transmission

    Authors: Mingze Ding, Jiahui Li, Mengyao Ma, Xiaopeng Fan

    Abstract: Considering the problem of joint source-channel coding (JSCC) for multi-user transmission of images over noisy channels, an autoencoder-based novel deep joint source-channel coding scheme is proposed in this paper. In the proposed JSCC scheme, the decoder can estimate the signal-to-noise ratio (SNR) and use it to adaptively decode the transmitted image. Experiments demonstrate that the proposed sc…

    Submitted 2 February, 2021; v1 submitted 30 January, 2021; originally announced February 2021.

    Comments: Accepted in IEEE ICASSP 2021

  46. arXiv:2011.01460  [pdf, other]

    cs.LG cs.SD eess.AS

    Training Wake Word Detection with Synthesized Speech Data on Confusion Words

    Authors: Yan Jia, Zexin Cai, Murong Ma, Zeqing Zhao, Xuyang Wang, Junjie Wang, Ming Li

    Abstract: Confusing-words are commonly encountered in real-life keyword spotting applications, which causes severe degradation of performance due to complex spoken terms and various kinds of words that sound similar to the predefined keywords. To enhance the wake word detection system's robustness on such scenarios, we investigate two data augmentation setups for training end-to-end KWS systems. One is invo…

    Submitted 2 November, 2020; originally announced November 2020.

    Comments: Submitted to ICASSP 2021

  47. Predictive Monitoring with Logic-Calibrated Uncertainty for Cyber-Physical Systems

    Authors: Meiyi Ma, John Stankovic, Ezio Bartocci, Lu Feng

    Abstract: Predictive monitoring -- making predictions about future states and monitoring if the predicted states satisfy requirements -- offers a promising paradigm in supporting the decision making of Cyber-Physical Systems (CPS). Existing works of predictive monitoring mostly focus on monitoring individual predictions rather than sequential predictions. We develop a novel approach for monitoring sequentia…

    Submitted 24 July, 2021; v1 submitted 31 October, 2020; originally announced November 2020.

    Comments: This article appears as part of the ESWEEK-TECS special issue and was presented in the International Conference on Embedded Software (EMSOFT), 2021

    Journal ref: In 2020 ACM/IEEE 11th International Conference on Cyber-Physical Systems (ICCPS) (pp. 51-62). IEEE

  48. arXiv:2010.12096  [pdf, other]

    cs.SD cs.CL eess.AS

    Improving Streaming Automatic Speech Recognition With Non-Streaming Model Distillation On Unsupervised Data

    Authors: Thibault Doutre, Wei Han, Min Ma, Zhiyun Lu, Chung-Cheng Chiu, Ruoming Pang, Arun Narayanan, Ananya Misra, Yu Zhang, Liangliang Cao

    Abstract: Streaming end-to-end automatic speech recognition (ASR) models are widely used on smart speakers and on-device applications. Since these models are expected to transcribe speech with minimal latency, they are constrained to be causal with no future context, compared to their non-streaming counterparts. Consequently, streaming models usually perform worse than non-streaming models. We propose a nov…

    Submitted 21 February, 2021; v1 submitted 22 October, 2020; originally announced October 2020.

  49. arXiv:2010.04753  [pdf, other]

    eess.SY

    Impact Evaluation of Falsified Data Attacks on Connected Vehicle Based Traffic Signal Control

    Authors: Shihong Ed Huang, Wai Wong, Yiheng Feng, Qi Alfred Chen, Z. Morley Mao, Henry X. Liu

    Abstract: Connected vehicle (CV) technology enables data exchange between vehicles and transportation infrastructure and therefore has great potentials to improve current traffic signal control systems. However, this connectivity might also bring cyber security concerns. As the first step in investigating the cyber security of CV-based traffic signal control (CV-TSC) systems, potential cyber threats need to…

    Submitted 9 October, 2020; originally announced October 2020.

  50. arXiv:2007.03724  [pdf, other]

    cs.LG cs.CR cs.IT eess.SY

    Learning while Respecting Privacy and Robustness to Distributional Uncertainties and Adversarial Data

    Authors: Alireza Sadeghi, Gang Wang, Meng Ma, Georgios B. Giannakis

    Abstract: Data used to train machine learning models can be adversarial--maliciously constructed by adversaries to fool the model. Challenge also arises by privacy, confidentiality, or due to legal constraints when data are geographically gathered and stored across multiple learners, some of which may hold even an "anonymized" or unreliable dataset. In this context, the distributionally robust optimization…

    Submitted 7 July, 2020; originally announced July 2020.

    Comments: 14 pages, 5 figures