Showing 1–50 of 103 results for author: Liu, A

Searching in archive eess.
  1. arXiv:2406.04997  [pdf, ps, other]

    eess.AS cs.LG

    On the social bias of speech self-supervised models

    Authors: Yi-Cheng Lin, Tzu-Quan Lin, Hsi-Che Lin, Andy T. Liu, Hung-yi Lee

    Abstract: Self-supervised learning (SSL) speech models have achieved remarkable performance in various tasks, yet the biased outcomes, especially affecting marginalized groups, raise significant concerns. Social bias refers to the phenomenon where algorithms potentially amplify disparate properties between social groups present in the data used for training. Bias in SSL models can perpetuate injustice by au…

    Submitted 7 June, 2024; originally announced June 2024.

    Comments: Accepted by INTERSPEECH 2024

  2. arXiv:2405.16791  [pdf, ps, other]

    cs.IT eess.SP

    Joint Node Selection and Resource Allocation Optimization for Cooperative Sensing with a Shared Wireless Backhaul

    Authors: Mingxin Chen, Ming-Min Zhao, An Liu, Min Li, Qingjiang Shi

    Abstract: In this paper, we consider a cooperative sensing framework in the context of future multi-functional network with both communication and sensing ability, where one base station (BS) serves as a sensing transmitter and several nearby BSs serve as sensing receivers. Each receiver receives the sensing signal reflected by the target and communicates with the fusion center (FC) through a wireless multi…

    Submitted 26 May, 2024; originally announced May 2024.

    Comments: 13 pages, 10 figures

  3. arXiv:2405.04027  [pdf, other]

    eess.SP

    Joint Visibility Region Detection and Channel Estimation for XL-MIMO Systems via Alternating MAP

    Authors: Wenkang Xu, An Liu, Min-jian Zhao

    Abstract: We investigate a joint visibility region (VR) detection and channel estimation problem in extremely large-scale multiple-input-multiple-output (XL-MIMO) systems, where near-field propagation and spatial non-stationary effects exist. In this case, each scatterer can only see a subset of antennas, i.e., it has a certain VR over the antennas. Because of the spatial correlation among adjacent sub-arra…

    Submitted 21 May, 2024; v1 submitted 7 May, 2024; originally announced May 2024.

    Comments: 13 pages, 14 figures, submitted to IEEE TSP

  4. arXiv:2405.01200  [pdf, other]

    eess.SY cs.LG

    Learning-to-solve unit commitment based on few-shot physics-guided spatial-temporal graph convolution network

    Authors: Mei Yang, Gao Qiu, Junyong Liu, Kai Liu

    Abstract: This letter proposes a few-shot physics-guided spatial temporal graph convolutional network (FPG-STGCN) to fast solve unit commitment (UC). Firstly, STGCN is tailored to parameterize UC. Then, few-shot physics-guided learning scheme is proposed. It exploits few typical UC solutions yielded via commercial optimizer to escape from local minimum, and leverages the augmented Lagrangian method for cons…

    Submitted 2 May, 2024; originally announced May 2024.

  5. arXiv:2404.09385  [pdf, other]

    eess.AS cs.CL eess.SP

    A Large-Scale Evaluation of Speech Foundation Models

    Authors: Shu-wen Yang, Heng-Jui Chang, Zili Huang, Andy T. Liu, Cheng-I Lai, Haibin Wu, Jiatong Shi, Xuankai Chang, Hsiang-Sheng Tsai, Wen-Chin Huang, Tzu-hsun Feng, Po-Han Chi, Yist Y. Lin, Yung-Sung Chuang, Tzu-Hsien Huang, Wei-Cheng Tseng, Kushal Lakhotia, Shang-Wen Li, Abdelrahman Mohamed, Shinji Watanabe, Hung-yi Lee

    Abstract: The foundation model paradigm leverages a shared foundation model to achieve state-of-the-art (SOTA) performance for various tasks, requiring minimal downstream-specific modeling and data annotation. This approach has proven crucial in the field of Natural Language Processing (NLP). However, the speech processing community lacks a similar setup to explore the paradigm systematically. In this work,…

    Submitted 29 May, 2024; v1 submitted 14 April, 2024; originally announced April 2024.

    Comments: The extended journal version for SUPERB and SUPERB-SG. Published in IEEE/ACM TASLP. The Arxiv version is preferred

  6. Design and Optimization of Cooperative Sensing With Limited Backhaul Capacity

    Authors: Wenrui Li, Min Li, An Liu, Tony Xiao Han

    Abstract: This paper introduces a cooperative sensing framework designed for integrated sensing and communication cellular networks. The framework comprises one base station (BS) functioning as the sensing transmitter, while several nearby BSs act as sensing receivers. The primary objective is to facilitate cooperative target localization by enabling each receiver to share specific information with a fusion…

    Submitted 4 April, 2024; originally announced April 2024.

    Comments: This paper has been published in 2023 IEEE 98th Vehicular Technology Conference (VTC2023-Fall)

  7. arXiv:2403.11974  [pdf, other]

    eess.IV cs.CV

    OUCopula: Bi-Channel Multi-Label Copula-Enhanced Adapter-Based CNN for Myopia Screening Based on OU-UWF Images

    Authors: Yang Li, Qiuyi Huang, Chong Zhong, Danjuan Yang, Meiyan Li, A. H. Welsh, Aiyi Liu, Bo Fu, Catherien C. Liu, Xingtao Zhou

    Abstract: Myopia screening using cutting-edge ultra-widefield (UWF) fundus imaging is potentially significant for ophthalmic outcomes. Current multidisciplinary research between ophthalmology and deep learning (DL) concentrates primarily on disease classification and diagnosis using single-eye images, largely ignoring joint modeling and prediction for Oculus Uterque (OU, both eyes). Inspired by the complex…

    Submitted 18 March, 2024; originally announced March 2024.

  8. arXiv:2402.13236  [pdf, other]

    eess.AS cs.SD

    Towards audio language modeling -- an overview

    Authors: Haibin Wu, Xuanjun Chen, Yi-Cheng Lin, Kai-wei Chang, Ho-Lam Chung, Alexander H. Liu, Hung-yi Lee

    Abstract: Neural audio codecs are initially introduced to compress audio data into compact codes to reduce transmission latency. Researchers recently discovered the potential of codecs as suitable tokenizers for converting continuous audio into discrete codes, which can be employed to develop audio language models (LMs). Numerous high-performance neural audio codecs and codec-based LMs have been developed.…

    Submitted 20 February, 2024; originally announced February 2024.

  9. arXiv:2402.13071  [pdf, other]

    eess.AS cs.SD

    Codec-SUPERB: An In-Depth Analysis of Sound Codec Models

    Authors: Haibin Wu, Ho-Lam Chung, Yi-Cheng Lin, Yuan-Kuei Wu, Xuanjun Chen, Yu-Chi Pai, Hsiu-Hsuan Wang, Kai-Wei Chang, Alexander H. Liu, Hung-yi Lee

    Abstract: The sound codec's dual roles in minimizing data transmission latency and serving as tokenizers underscore its critical importance. Recent years have witnessed significant developments in codec models. The ideal sound codec should preserve content, paralinguistics, speakers, and audio information. However, the question of which codec achieves optimal sound information preservation remains unanswere…

    Submitted 7 June, 2024; v1 submitted 20 February, 2024; originally announced February 2024.

    Comments: Github: https://github.com/voidful/Codec-SUPERB

  10. arXiv:2401.13947  [pdf, other]

    eess.SY cs.LG cs.MA

    Networked Multiagent Reinforcement Learning for Peer-to-Peer Energy Trading

    Authors: Chen Feng, Andrew L. Liu

    Abstract: Utilizing distributed renewable and energy storage resources in local distribution networks via peer-to-peer (P2P) energy trading has long been touted as a solution to improve energy systems' resilience and sustainability. Consumers and prosumers (those who have energy generation resources), however, do not have the expertise to engage in repeated P2P trading, and the zero-marginal costs of renewa…

    Submitted 27 January, 2024; v1 submitted 25 January, 2024; originally announced January 2024.

  11. arXiv:2401.13914  [pdf, ps, other]

    eess.SP

    Analog Beamforming for In-Band Full-Duplex Phased Arrays with Quantized Phase Shifters under a Per-Antenna Received Power Constraint

    Authors: Ao Liu, Ian P. Roberts, Taneli Riihonen, Weixing Sheng

    Abstract: This letter develops a novel transmit beamforming (BF) design for canceling self-interference (SI) in analog in-band full-duplex phased arrays. Our design maximizes transmit BF gain in a desired direction while simultaneously reducing SI power to below a specified threshold on per-antenna basis to avoid saturating receive-chain components, such as LNAs. Core to our approach is that it accounts for…

    Submitted 24 January, 2024; originally announced January 2024.

    Comments: This paper has been submitted to the IEEE for review; copyright may change without notice

  12. arXiv:2401.08833  [pdf, other]

    eess.AS cs.CL cs.SD

    Revisiting Self-supervised Learning of Speech Representation from a Mutual Information Perspective

    Authors: Alexander H. Liu, Sung-Lin Yeh, James Glass

    Abstract: Existing studies on self-supervised speech representation learning have focused on developing new training methods and applying pre-trained models for different applications. However, the quality of these models is often measured by the performance of different downstream tasks. How well the representations access the information of interest is less studied. In this work, we take a closer look int…

    Submitted 16 January, 2024; originally announced January 2024.

    Comments: ICASSP 2024

  13. arXiv:2401.00429  [pdf, other]

    cs.NI eess.SP

    Deeper and Wider Networks for Performance Metrics Prediction in Communication Networks

    Authors: Aijia Liu, Shiqing Liu, Xiaobing Pei

    Abstract: In today's era, users have increasingly high expectations regarding the performance and efficiency of communication networks. Network operators aspire to achieve efficient network planning, operation, and optimization through Digital Twin Networks (DTN). The effectiveness of DTN heavily relies on the network model, with graph neural networks (GNN) playing a crucial role in network modeling. Howeve…

    Submitted 31 December, 2023; originally announced January 2024.

  14. arXiv:2310.16338  [pdf, other]

    eess.AS cs.CL cs.LG cs.SD

    Generative Pre-training for Speech with Flow Matching

    Authors: Alexander H. Liu, Matt Le, Apoorv Vyas, Bowen Shi, Andros Tjandra, Wei-Ning Hsu

    Abstract: Generative models have gained more and more attention in recent years for their remarkable success in tasks that required estimating and sampling data distribution to generate high-fidelity synthetic data. In speech, text-to-speech synthesis and neural vocoder are good examples where generative models have shined. While generative models have been applied to different applications in speech, there…

    Submitted 25 March, 2024; v1 submitted 24 October, 2023; originally announced October 2023.

    Comments: ICLR 2024

  15. arXiv:2310.08851  [pdf, ps, other]

    eess.SP

    A Two-Stage 2D Channel Extrapolation Scheme for TDD 5G NR Systems

    Authors: Yubo Wan, An Liu

    Abstract: Recently, channel extrapolation has been widely investigated in frequency division duplex (FDD) massive MIMO systems. However, in time division duplex (TDD) fifth generation (5G) new radio (NR) systems, the channel extrapolation problem also arises due to the hopping uplink pilot pattern, which has not been fully researched yet. This paper addresses this gap by formulating a channel extrapolation…

    Submitted 13 October, 2023; originally announced October 2023.

  16. arXiv:2310.05382  [pdf, other]

    eess.SP

    A Stochastic Particle Variational Bayesian Inference Inspired Deep-Unfolding Network for Non-Convex Parameter Estimation

    Authors: Zhixiang Hu, An Liu, Minjian Zhao

    Abstract: Future wireless networks are envisioned to provide ubiquitous sensing services, which also gives rise to a substantial demand for high-dimensional non-convex parameter estimation, i.e., the associated likelihood function is non-convex and contains numerous local optima. Variational Bayesian inference (VBI) provides a powerful tool for modeling complex estimation problems and reasoning with prior i…

    Submitted 8 October, 2023; originally announced October 2023.

  17. arXiv:2309.14405  [pdf, other]

    cs.SD cs.AI eess.AS

    Joint Audio and Speech Understanding

    Authors: Yuan Gong, Alexander H. Liu, Hongyin Luo, Leonid Karlinsky, James Glass

    Abstract: Humans are surrounded by audio signals that include both speech and non-speech sounds. The recognition and understanding of speech and non-speech audio events, along with a profound comprehension of the relationship between them, constitute fundamental cognitive capabilities. For the first time, we build a machine learning model, called LTU-AS, that has a conceptually similar universal audio perce…

    Submitted 10 December, 2023; v1 submitted 25 September, 2023; originally announced September 2023.

    Comments: Accepted at ASRU 2023. Code, dataset, and pretrained models are at https://github.com/yuangongnd/ltu. Interactive demo at https://huggingface.co/spaces/yuangongfdu/ltu-2

  18. arXiv:2309.04171  [pdf, other]

    cs.CV cs.IR cs.IT eess.IV

    PRISTA-Net: Deep Iterative Shrinkage Thresholding Network for Coded Diffraction Patterns Phase Retrieval

    Authors: Aoxu Liu, Xiaohong Fan, Yin Yang, Jianping Zhang

    Abstract: The problem of phase retrieval (PR) involves recovering an unknown image from limited amplitude measurement data and is a challenging nonlinear inverse problem in computational imaging and image processing. However, many of the PR methods are based on black-box network models that lack interpretability and plug-and-play (PnP) frameworks that are computationally complex and require careful parameter…

    Submitted 8 September, 2023; originally announced September 2023.

    Comments: 12 pages

  19. arXiv:2309.03815  [pdf, other]

    cs.CV cs.MM eess.IV

    T2IW: Joint Text to Image & Watermark Generation

    Authors: An-An Liu, Guokai Zhang, Yuting Su, Ning Xu, Yongdong Zhang, Lanjun Wang

    Abstract: Recent developments in text-conditioned image generative models have revolutionized the production of realistic results. Unfortunately, this has also led to an increase in privacy violations and the spread of false information, which requires the need for traceability, privacy protection, and other security measures. However, existing text-to-image paradigms lack the technical capabilities to link…

    Submitted 7 September, 2023; originally announced September 2023.

  20. arXiv:2308.03027  [pdf, other]

    cs.LG cs.CV eess.SP

    Causal Disentanglement Hidden Markov Model for Fault Diagnosis

    Authors: Rihao Chang, Yongtao Ma, Weizhi Nie, Jie Nie, An-an Liu

    Abstract: In modern industries, fault diagnosis has been widely applied with the goal of realizing predictive maintenance. The key issue for the fault diagnosis system is to extract representative characteristics of the fault signal and then accurately predict the fault type. In this paper, we propose a Causal Disentanglement Hidden Markov model (CDHM) to learn the causality in the bearing fault mechanism a…

    Submitted 6 August, 2023; originally announced August 2023.

  21. arXiv:2307.09149  [pdf, other]

    eess.SP

    Successive Linear Approximation VBI for Joint Sparse Signal Recovery and Dynamic Grid Parameters Estimation

    Authors: Wenkang Xu, An Liu, Bingpeng Zhou, Minjian Zhao

    Abstract: For many practical applications in wireless communications, we need to recover a structured sparse signal from a linear observation model with dynamic grid parameters in the sensing matrix. Conventional expectation maximization (EM)-based compressed sensing (CS) methods, such as turbo compressed sensing (Turbo-CS) and turbo variational Bayesian inference (Turbo-VBI), have double-loop iterations, w…

    Submitted 12 November, 2023; v1 submitted 18 July, 2023; originally announced July 2023.

    Comments: 14 pages, 15 figures, submitted to IEEE Transactions on Wireless Communications

  22. arXiv:2306.08256  [pdf, other]

    eess.SP cs.LG

    Data Augmentation for Seizure Prediction with Generative Diffusion Model

    Authors: Kai Shu, Yuchang Zhao, Le Wu, Aiping Liu, Ruobing Qian, Xun Chen

    Abstract: Objective: Seizure prediction is of great importance to improve the life of patients. The focal point is to distinguish preictal states from interictal ones. With the development of machine learning, seizure prediction methods have achieved significant progress. However, the severe imbalance problem between preictal and interictal data still poses a great challenge, restricting the performance of…

    Submitted 14 June, 2023; originally announced June 2023.

    Comments: 12 pages, 6 figures

  23. arXiv:2306.02436  [pdf, ps, other]

    cs.IT eess.SP

    Joint Activity Detection and Channel Estimation in Massive Machine-Type Communications with Low-Resolution ADC

    Authors: Ye Xue, An Liu, Yang Li, Qingjiang Shi, Vincent Lau

    Abstract: In massive machine-type communications, data transmission is usually considered sporadic, and thus inherently has a sparse structure. This paper focuses on the joint activity detection (AD) and channel estimation (CE) problems in massive-connected communication systems with low-resolution analog-to-digital converters. To further exploit the sparse structure in transmission, we propose a maximum po…

    Submitted 4 June, 2023; originally announced June 2023.

    Comments: This paper has been accepted by ICC 2023 as a regular paper

  24. arXiv:2306.01232  [pdf, other]

    eess.IV cs.CV

    Deep Reinforcement Learning Framework for Thoracic Diseases Classification via Prior Knowledge Guidance

    Authors: Weizhi Nie, Chen Zhang, Dan Song, Lina Zhao, Yunpeng Bai, Keliang Xie, Anan Liu

    Abstract: The chest X-ray is often utilized for diagnosing common thoracic diseases. In recent years, many approaches have been proposed to handle the problem of automatic diagnosis based on chest X-rays. However, the scarcity of labeled data for related diseases still poses a huge challenge to an accurate diagnosis. In this paper, we focus on the thorax disease diagnostic problem and propose a novel deep r…

    Submitted 1 June, 2023; originally announced June 2023.

  25. arXiv:2305.14997  [pdf, other]

    eess.SP

    3GPP-Like GBSM THz Channel Modeling for Indoor Office and Urban Microcellular Scenarios

    Authors: Zhaowei Chang, Jianhua Zhang, Pan Tang, Lei Tian, Hao Jiang, Ximan Liu, Guangyi Liu

    Abstract: Terahertz (THz) communication is envisioned as one of the possible technologies for the sixth-generation (6G) communication system due to its rich spectrum. To evaluate the performance of THz communication, it is essential to propose THz channel models within the common framework of the geometry-based stochastic model (GBSM) in the 3rd Generation Partnership Project (3GPP). This paper focuses on 3…

    Submitted 22 April, 2024; v1 submitted 24 May, 2023; originally announced May 2023.

  26. arXiv:2305.12072  [pdf, other]

    eess.IV cs.CV

    Chest X-ray Image Classification: A Causal Perspective

    Authors: Weizhi Nie, Chen Zhang, Dan Song, Lina Zhao, Yunpeng Bai, Keliang Xie, Anan Liu

    Abstract: The chest X-ray (CXR) is one of the most common and easy-to-get medical tests used to diagnose common diseases of the chest. Recently, many deep learning-based methods have been proposed that are capable of effectively classifying CXRs. Even though these techniques have worked quite well, it is difficult to establish whether what these algorithms actually learn is the cause-and-effect link between…

    Submitted 19 May, 2023; originally announced May 2023.

  27. arXiv:2305.12070  [pdf, other]

    eess.IV cs.CV

    Instrumental Variable Learning for Chest X-ray Classification

    Authors: Weizhi Nie, Chen Zhang, Dan Song, Yunpeng Bai, Keliang Xie, Anan Liu

    Abstract: The chest X-ray (CXR) is commonly employed to diagnose thoracic illnesses, but the challenge of achieving accurate automatic diagnosis through this method persists due to the complex relationship between pathology. In recent years, various deep learning-based approaches have been suggested to tackle this problem but confounding factors such as image resolution or noise problems often damage model…

    Submitted 19 May, 2023; originally announced May 2023.

  28. arXiv:2305.11072  [pdf, other]

    cs.CL eess.AS

    Self-supervised Fine-tuning for Improved Content Representations by Speaker-invariant Clustering

    Authors: Heng-Jui Chang, Alexander H. Liu, James Glass

    Abstract: Self-supervised speech representation models have succeeded in various tasks, but improving them for content-related problems using unlabeled data is challenging. We propose speaker-invariant clustering (Spin), a novel self-supervised learning method that clusters speech representations and performs swapped prediction between the original and speaker-perturbed utterances. Spin disentangles speaker…

    Submitted 18 May, 2023; originally announced May 2023.

    Comments: Accepted to Interspeech 2023

  29. arXiv:2305.10790  [pdf, other]

    eess.AS cs.SD

    Listen, Think, and Understand

    Authors: Yuan Gong, Hongyin Luo, Alexander H. Liu, Leonid Karlinsky, James Glass

    Abstract: The ability of artificial intelligence (AI) systems to perceive and comprehend audio signals is crucial for many applications. Although significant progress has been made in this area since the development of AudioSet, most existing models are designed to map audio inputs to pre-defined, discrete sound label sets. In contrast, humans possess the ability to not only classify sounds into general cat…

    Submitted 19 February, 2024; v1 submitted 18 May, 2023; originally announced May 2023.

    Comments: Accepted at ICLR 2024. Code, dataset, and models are available at https://github.com/YuanGongND/ltu. The interactive demo is at https://huggingface.co/spaces/yuangongfdu/ltu

  30. arXiv:2305.07774  [pdf, other]

    cs.CV eess.IV

    PanFlowNet: A Flow-Based Deep Network for Pan-sharpening

    Authors: Gang Yang, Xiangyong Cao, Wenzhe Xiao, Man Zhou, Aiping Liu, Xun Chen, Deyu Meng

    Abstract: Pan-sharpening aims to generate a high-resolution multispectral (HRMS) image by integrating the spectral information of a low-resolution multispectral (LRMS) image with the texture details of a high-resolution panchromatic (PAN) image. It essentially inherits the ill-posed nature of the super-resolution (SR) task that diverse HRMS images can degrade into an LRMS image. However, existing deep learn…

    Submitted 16 May, 2023; v1 submitted 12 May, 2023; originally announced May 2023.

  31. arXiv:2303.07742  [pdf, other]

    cs.LG cs.HC eess.SP

    ForDigitStress: A multi-modal stress dataset employing a digital job interview scenario

    Authors: Alexander Heimerl, Pooja Prajod, Silvan Mertes, Tobias Baur, Matthias Kraus, Ailin Liu, Helen Risack, Nicolas Rohleder, Elisabeth André, Linda Becker

    Abstract: We present a multi-modal stress dataset that uses digital job interviews to induce stress. The dataset provides multi-modal data of 40 participants including audio, video (motion capturing, facial recognition, eye tracking) as well as physiological information (photoplethysmography, electrodermal activity). In addition to that, the dataset contains time-continuous annotations for stress and occurr…

    Submitted 14 March, 2023; originally announced March 2023.

  32. arXiv:2302.11334  [pdf, other]

    eess.SY

    Stabilization with Prescribed Instant via Lyapunov Method

    Authors: Jiyuan Kuang, Yabin Gao, Yizhuo Sun, Jiahui Wang, Aohua Liu, Yue Zhao, Jianxing Liu

    Abstract: This letter investigates the prescribed-instant stabilization problem for high-order integrator systems. In other words, the settling time under the presented controller is independent of the initial conditions and equals the prescribed time instant. The controller is designed with the concept of backstepping. A strict proof based on the Lyapunov method is presented to clamp the settling time to…

    Submitted 22 February, 2023; originally announced February 2023.

  33. arXiv:2302.02587  [pdf, other]

    eess.SP

    Joint Scattering Environment Sensing and Channel Estimation Based on Non-stationary Markov Random Field

    Authors: Wenkang Xu, Yongbo Xiao, An Liu, Ming Lei, Minjian Zhao

    Abstract: This paper considers an integrated sensing and communication system, where some radar targets also serve as communication scatterers. A location domain channel modeling method is proposed based on the position of targets and scatterers in the scattering environment, and the resulting radar and communication channels exhibit a two-dimensional (2-D) joint burst sparsity. We propose a joint scatterin…

    Submitted 18 July, 2023; v1 submitted 6 February, 2023; originally announced February 2023.

    Comments: 15 pages, 13 figures, submitted to IEEE Transactions on Wireless Communications

  34. arXiv:2302.01619  [pdf, other]

    eess.SP

    Joint Scattering Environment Sensing and Channel Estimation for Integrated Sensing and Communication

    Authors: Wenkang Xu, Yongbo Xiao, An Liu, Minjian Zhao

    Abstract: This paper considers an integrated sensing and communication system, where some radar targets also serve as communication scatterers. A location domain channel modeling method is proposed based on the position of targets and scatterers in the scattering environment, and the resulting radar and communication channels exhibit a partially common sparsity. By exploiting this, we propose a joint scatte…

    Submitted 3 February, 2023; originally announced February 2023.

  35. arXiv:2211.14313  [pdf, other]

    eess.IV cs.AI cs.CY

    AICOM-MP: an AI-based Monkeypox Detector for Resource-Constrained Environments

    Authors: Tim Tianyi Yang, Tom Tianze Yang, Andrew Liu, Jie Tang, Na An, Shaoshan Liu, Xue Liu

    Abstract: Under the Autonomous Mobile Clinics (AMCs) initiative, we are developing, open sourcing, and standardizing health AI technologies to enable healthcare access in least developed countries (LDCs). We deem AMCs as the next generation of health care delivery platforms, whereas health AI engines are applications on these platforms, similar to how various applications expand the usage scenarios of smart…

    Submitted 21 November, 2022; originally announced November 2022.

  36. arXiv:2211.11749  [pdf]

    eess.IV cs.CV physics.med-ph

    Towards Automatic Prediction of Outcome in Treatment of Cerebral Aneurysms

    Authors: Ashutosh Jadhav, Satyananda Kashyap, Hakan Bulu, Ronak Dholakia, Amon Y. Liu, Tanveer Syeda-Mahmood, William R. Patterson, Hussain Rangwala, Mehdi Moradi

    Abstract: Intrasaccular flow disruptors treat cerebral aneurysms by diverting the blood flow from the aneurysm sac. Residual flow into the sac after the intervention is a failure that could be due to the use of an undersized device, or to vascular anatomy and clinical condition of the patient. We report a machine learning model based on over 100 clinical and imaging features that predict the outcome of wide…

    Submitted 18 November, 2022; originally announced November 2022.

    Comments: 10 pages

    Report number: https://s4.goeshow.com/amia/annual/2022/schedule_at_a_glance.cfm?session_key=1965BCBD-A832-92DD-9D05-FB2CB132FADB&session_date=

    Journal ref: AMAI 2022 Annual Symposium

  37. arXiv:2210.07839  [pdf, other]

    cs.MM cs.CV cs.SD eess.AS

    Contrastive Audio-Visual Masked Autoencoder

    Authors: Yuan Gong, Andrew Rouditchenko, Alexander H. Liu, David Harwath, Leonid Karlinsky, Hilde Kuehne, James Glass

    Abstract: In this paper, we first extend the recent Masked Auto-Encoder (MAE) model from a single modality to audio-visual multi-modalities. Subsequently, we propose the Contrastive Audio-Visual Masked Auto-Encoder (CAV-MAE) by combining contrastive learning and masked data modeling, two major self-supervised learning frameworks, to learn a joint and coordinated audio-visual representation. Our experiments…

    Submitted 11 April, 2023; v1 submitted 2 October, 2022; originally announced October 2022.

    Comments: Accepted at ICLR 2023 as a notable top 25% paper. Code and pretrained models are at https://github.com/yuangongnd/cav-mae

  38. arXiv:2210.01032  [pdf]

    cs.LG eess.IV

    A New Hip Fracture Risk Index Derived from FEA-Computed Proximal Femur Fracture Loads and Energies-to-Failure

    Authors: Xuewei Cao, Joyce H Keyak, Sigurdur Sigurdsson, Chen Zhao, Weihua Zhou, Anqi Liu, Thomas Lang, Hong-Wen Deng, Vilmundur Gudnason, Qiuying Sha

    Abstract: Hip fracture risk assessment is an important but challenging task. Quantitative CT-based patient-specific finite element analysis (FEA) computes the force (fracture load) to break the proximal femur in a particular loading condition. It provides different structural information about the proximal femur that can influence a subject's overall fracture risk. To obtain a more robust measure of fracture…

    Submitted 18 November, 2022; v1 submitted 3 October, 2022; originally announced October 2022.

    Comments: 27 pages, 4 figures

  39. arXiv:2209.14505  [pdf, other]

    eess.SY econ.GN math.OC

    Optimal Retail Tariff Design with Prosumers: Pursuing Equity at the Expenses of Economic Efficiencies?

    Authors: Yihsu Chen, Andrew L. Liu, Makoto Tanaka, Ryuta Takashima

    Abstract: Distributed renewable resources owned by prosumers can be an effective way of fortifying grid resilience and enhancing sustainability. However, prosumers serve their own interests and their objectives are unlikely to align with that of society. This paper develops a bilevel model to study the optimal design of retail electricity tariffs considering the balance between economic efficiency and energ…

    Submitted 28 September, 2022; originally announced September 2022.

  40. arXiv:2209.07773  [pdf, ps, other]

    eess.SY

    Event-Triggered Extended State Observer Based Distributed Control of Nonlinear Vehicle Platoons

    Authors: Anquan Liu, Tao Li, Yu Gu

    Abstract: We study the platoon control of vehicles with third-order nonlinear dynamics under the constant spacing policy. We consider a vehicle model with parameter uncertainties and external disturbances and propose a distributed control law based on an event-triggered extended state observer (ESO). First, an event-triggered ESO is designed to estimate the unmodeled dynamics in the vehicle model. Then base…

    Submitted 16 September, 2022; originally announced September 2022.

  41. Model-Guided Multi-Contrast Deep Unfolding Network for MRI Super-resolution Reconstruction

    Authors: Gang Yang, Li Zhang, Man Zhou, Aiping Liu, Xun Chen, Zhiwei Xiong, Feng Wu

    Abstract: Magnetic resonance imaging (MRI) with high resolution (HR) provides more detailed information for accurate diagnosis and quantitative image analysis. Despite the significant advances, most existing super-resolution (SR) reconstruction network for medical images has two flaws: 1) All of them are designed in a black-box principle, thus lacking sufficient interpretability and further limiting their p…

    Submitted 14 September, 2022; originally announced September 2022.

    Comments: Accepted to ACMMM 2022, 9 pages

  42. arXiv:2208.00061  [pdf, other]

    cs.CV cs.MM cs.SD eess.AS

    UAVM: Towards Unifying Audio and Visual Models

    Authors: Yuan Gong, Alexander H. Liu, Andrew Rouditchenko, James Glass

    Abstract: Conventional audio-visual models have independent audio and video branches. In this work, we unify the audio and visual branches by designing a Unified Audio-Visual Model (UAVM). The UAVM achieves a new state-of-the-art audio-visual event classification accuracy of 65.8% on VGGSound. More interestingly, we also find a few intriguing properties of UAVM that the modality-independent counterparts do…

    Submitted 15 February, 2023; v1 submitted 29 July, 2022; originally announced August 2022.

    Comments: Published in Signal Processing Letters. Code at https://github.com/YuanGongND/uavm

    Journal ref: IEEE Signal Processing Letters, vol. 29, pp. 2437-2441, 2022

  43. arXiv:2207.10427  [pdf, other]

    eess.SP

    A Two-stage Multiband WiFi Sensing Scheme via Stochastic Particle-Based Variational Bayesian Inference

    Authors: Zhixiang Hu, An Liu, Yubo Wan, Tony Xiao Han, Minjian Zhao

    Abstract: Multiband fusion enhances WiFi sensing by jointly utilizing signals from multiple non-contiguous frequency bands. However, in the multi-band WiFi sensing signal model, there are many local optimums in the associated likelihood function due to the existence of high frequency component and phase distortion factors, posing challenges for high-accuracy parameter estimation. To address this, we propose…

    Submitted 9 October, 2023; v1 submitted 21 July, 2022; originally announced July 2022.

  44. arXiv:2207.10306  [pdf, ps, other]

    eess.SP

    Fundamental Limits and Optimization of Multiband Sensing

    Authors: Yubo Wan, An Liu, Rui Du, Tony Xiao Han

    Abstract: Multiband sensing is a promising technology that utilizes multiple non-contiguous frequency bands to achieve high-resolution target sensing. In this paper, we investigate the fundamental limits and optimization of multiband sensing, focusing on the fundamental limits associated with time delay. We first derive a Fisher information matrix (FIM) with a compact form using the Dirichlet kernel and the…

    Submitted 31 January, 2023; v1 submitted 21 July, 2022; originally announced July 2022.

  45. arXiv:2207.08123  [pdf, ps, other]

    eess.SP

    Latency Minimization for mmWave D2D Mobile Edge Computing Systems: Joint Task Allocation and Hybrid Beamforming Design

    Authors: Yanzhen Liu, Yunlong Cai, An Liu, Minjian Zhao, Lajos Hanzo

    Abstract: Mobile edge computing (MEC) and millimeter wave (mmWave) communications are capable of significantly reducing the network's delay and enhancing its capacity. In this paper we investigate a mmWave and device-to-device (D2D) assisted MEC system, in which user A carries out some computational tasks and shares the results with user B with the aid of a base station (BS). We propose a novel two-timescal…

    Submitted 17 July, 2022; originally announced July 2022.

  46. arXiv:2206.09751  [pdf, ps, other]

    eess.SP

    Multiband Delay Estimation for Localization Using a Two-Stage Global Estimation Scheme

    Authors: Yubo Wan, An Liu, Qiyu Hu, Mianyi Zhang, Yunlong Cai

    Abstract: The time of arrival (TOA)-based localization techniques, which need to estimate the delay of the line-of-sight (LoS) path, have been widely employed in location-aware networks. To achieve a high-accuracy delay estimation, a number of multiband-based algorithms have been proposed recently, which exploit the channel state information (CSI) measurements over multiple non-contiguous frequency bands. H…

    Submitted 20 June, 2022; originally announced June 2022.

  47. Parallel Synthesis for Autoregressive Speech Generation

    Authors: Po-chun Hsu, Da-rong Liu, Andy T. Liu, Hung-yi Lee

    Abstract: Autoregressive neural vocoders have achieved outstanding performance in speech synthesis tasks such as text-to-speech and voice conversion. An autoregressive vocoder predicts a sample at some time step conditioned on those at previous time steps. Though it synthesizes natural human speech, the iterative generation inevitably makes the synthesis time proportional to the utterance length, leading to…

    Submitted 5 June, 2024; v1 submitted 25 April, 2022; originally announced April 2022.

    Comments: IEEE/ACM Transactions on Audio, Speech, and Language Processing

  48. arXiv:2204.02524  [pdf, other]

    cs.SD cs.CL eess.AS

    Simple and Effective Unsupervised Speech Synthesis

    Authors: Alexander H. Liu, Cheng-I Jeff Lai, Wei-Ning Hsu, Michael Auli, Alexei Baevski, James Glass

    Abstract: We introduce the first unsupervised speech synthesis system based on a simple, yet effective recipe. The framework leverages recent work in unsupervised speech recognition as well as existing neural-based speech synthesis. Using only unlabeled speech audio and unlabeled text as well as a lexicon, our method enables speech synthesis without the need for a human-labeled corpus. Experiments demonstra…

    Submitted 20 April, 2022; v1 submitted 5 April, 2022; originally announced April 2022.

    Comments: preprint, equal contribution from first two authors

  49. arXiv:2204.02492  [pdf, other]

    cs.CL cs.SD eess.AS

    Towards End-to-end Unsupervised Speech Recognition

    Authors: Alexander H. Liu, Wei-Ning Hsu, Michael Auli, Alexei Baevski

    Abstract: Unsupervised speech recognition has shown great potential to make Automatic Speech Recognition (ASR) systems accessible to every language. However, existing methods still heavily rely on hand-crafted pre-processing. Similar to the trend of making supervised speech recognition end-to-end, we introduce wav2vec-U 2.0 which does away with all audio-side pre-processing and improves accuracy through bet…

    Submitted 15 June, 2022; v1 submitted 5 April, 2022; originally announced April 2022.

    Comments: Preprint

  50. arXiv:2203.06849  [pdf, other]

    cs.CL cs.SD eess.AS

    SUPERB-SG: Enhanced Speech processing Universal PERformance Benchmark for Semantic and Generative Capabilities

    Authors: Hsiang-Sheng Tsai, Heng-Jui Chang, Wen-Chin Huang, Zili Huang, Kushal Lakhotia, Shu-wen Yang, Shuyan Dong, Andy T. Liu, Cheng-I Jeff Lai, Jiatong Shi, Xuankai Chang, Phil Hall, Hsuan-Jui Chen, Shang-Wen Li, Shinji Watanabe, Abdelrahman Mohamed, Hung-yi Lee

    Abstract: Transfer learning has proven to be crucial in advancing the state of speech and natural language processing research in recent years. In speech, a model pre-trained by self-supervised learning transfers remarkably well on multiple tasks. However, the lack of a consistent evaluation methodology is limiting towards a holistic understanding of the efficacy of such models. SUPERB was a step towards in…

    Submitted 14 March, 2022; originally announced March 2022.

    Comments: ACL 2022 main conference