Showing 1–50 of 54 results for author: Cao, L

  1. arXiv:2406.05913  [pdf, other]

    cs.NI eess.SP

    Revisiting Multi-User Downlink in IEEE 802.11ax: A Designers Guide to MU-MIMO

    Authors: Liu Cao, Lyutianyang Zhang, Sumit Roy, Sian Jin

    Abstract: Downlink (DL) Multi-User (MU) Multiple Input Multiple Output (MU-MIMO) is a key technology that allows multiple concurrent data transmissions from an Access Point (AP) to a selected sub-set of clients for higher network efficiency in IEEE 802.11ax. However, the DL MU-MIMO feature is typically turned off as the default setting in AP vendors' products, that is, turning on the DL MU-MIMO may not help inc…

    Submitted 9 June, 2024; originally announced June 2024.

    Comments: This work has been submitted to the IEEE for possible publication. 7 pages, 6 figures, magazine paper

  2. arXiv:2406.00085  [pdf, other]

    eess.IV cs.LG q-bio.NC

    Augmentation-based Unsupervised Cross-Domain Functional MRI Adaptation for Major Depressive Disorder Identification

    Authors: Yunling Ma, Chaojun Zhang, Xiaochuan Wang, Qianqian Wang, Liang Cao, Limei Zhang, Mingxia Liu

    Abstract: Major depressive disorder (MDD) is a common mental disorder that typically affects a person's mood, cognition, behavior, and physical health. Resting-state functional magnetic resonance imaging (rs-fMRI) data are widely used for computer-aided diagnosis of MDD. While multi-site fMRI data can provide more data for training reliable diagnostic models, significant cross-site data heterogeneity would…

    Submitted 6 June, 2024; v1 submitted 31 May, 2024; originally announced June 2024.

  3. arXiv:2405.11115  [pdf]

    eess.IV physics.optics

    Ptychographic non-line-of-sight imaging for depth-resolved visualization of hidden objects

    Authors: Pengming Song, Qianhao Zhao, Ruihai Wang, Ninghe Liu, Yingqi Qiang, Tianbo Wang, Xincheng Zhang, Yi Zhang, Liangcai Cao, Guoan Zheng

    Abstract: Non-line-of-sight (NLOS) imaging enables the visualization of objects hidden from direct view, with applications in surveillance, remote sensing, and light detection and ranging. Here, we introduce a NLOS imaging technique termed ptychographic NLOS (pNLOS), which leverages coded ptychography for depth-resolved imaging of obscured objects. Our approach involves scanning a laser spot on a wall to il…

    Submitted 17 May, 2024; originally announced May 2024.

  4. arXiv:2405.08745  [pdf, other]

    eess.IV cs.CV cs.MM

    Enhancing Blind Video Quality Assessment with Rich Quality-aware Features

    Authors: Wei Sun, Haoning Wu, Zicheng Zhang, Jun Jia, Zhichao Zhang, Linhan Cao, Qiubo Chen, Xiongkuo Min, Weisi Lin, Guangtao Zhai

    Abstract: In this paper, we present a simple but effective method to enhance blind video quality assessment (BVQA) models for social media videos. Motivated by previous research that leverages pre-trained features extracted from various computer vision models as the feature representation for BVQA, we further explore rich quality-aware features from pre-trained blind image quality assessment (BIQA) and BVQ…

    Submitted 14 May, 2024; originally announced May 2024.

  5. arXiv:2404.11313  [pdf, other]

    eess.IV cs.AI

    NTIRE 2024 Challenge on Short-form UGC Video Quality Assessment: Methods and Results

    Authors: Xin Li, Kun Yuan, Yajing Pei, Yiting Lu, Ming Sun, Chao Zhou, Zhibo Chen, Radu Timofte, Wei Sun, Haoning Wu, Zicheng Zhang, Jun Jia, Zhichao Zhang, Linhan Cao, Qiubo Chen, Xiongkuo Min, Weisi Lin, Guangtao Zhai, Jianhui Sun, Tianyi Wang, Lei Li, Han Kong, Wenxuan Wang, Bing Li, Cheng Luo, et al. (43 additional authors not shown)

    Abstract: This paper reviews the NTIRE 2024 Challenge on Shortform UGC Video Quality Assessment (S-UGC VQA), where various excellent solutions are submitted and evaluated on the collected dataset KVQ from popular short-form video platform, i.e., Kuaishou/Kwai Platform. The KVQ database is divided into three parts, including 2926 videos for training, 420 videos for validation, and 854 videos for testing. The…

    Submitted 17 April, 2024; originally announced April 2024.

    Comments: Accepted by CVPR2024 Workshop. The challenge report for CVPR NTIRE2024 Short-form UGC Video Quality Assessment Challenge

  6. arXiv:2404.11278  [pdf, other]

    physics.ins-det eess.IV

    Study on the static detection of ICF target based on muonic X-ray sphere encoded imaging

    Authors: Dikai Li, Jian Yu, Qian Chen, Chunhui Zhang, Xiangyu Wan, Leifeng Cao

    Abstract: Muon Induced X-ray Emission (MIXE) was discovered by Chinese physicist Zhang Wenyu as early as 1947, and it can conduct non-destructive elemental analysis inside samples. Research has shown that MIXE can retain the high efficiency of direct imaging while benefiting from the low noise of pinhole imaging through encoding holes. The related technology significantly improves the counting rate while ma…

    Submitted 17 April, 2024; v1 submitted 17 April, 2024; originally announced April 2024.

  7. arXiv:2404.01164  [pdf, ps, other]

    eess.SY

    Unified Predefined-time Stability Conditions of Nonlinear Systems with Lyapunov Analysis

    Authors: Bing Xiao, Haichao Zhang, Shijie Zhao, Lu Cao

    Abstract: This brief gives a set of unified Lyapunov stability conditions to guarantee the predefined-time/finite-time stability of a dynamical system. The derived Lyapunov theorem for autonomous systems establishes equivalence with existing theorems on predefined-time/finite-time stability. The findings proposed herein develop a nonsingular sliding mode control framework for an Euler-Lagrange system to an…

    Submitted 1 April, 2024; originally announced April 2024.

  8. arXiv:2312.11460  [pdf, other]

    cs.RO cs.AI cs.CV cs.LG eess.SY

    Hybrid Internal Model: Learning Agile Legged Locomotion with Simulated Robot Response

    Authors: Junfeng Long, Zirui Wang, Quanyi Li, Jiawei Gao, Liu Cao, Jiangmiao Pang

    Abstract: Robust locomotion control depends on accurate state estimations. However, the sensors of most legged robots can only provide partial and noisy observations, making the estimation particularly challenging, especially for external states like terrain frictions and elevation maps. Inspired by the classical Internal Model Control principle, we consider these external states as disturbances and introdu…

    Submitted 1 January, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

    Comments: Use 1 hour to train a quadruped robot capable of traversing any terrain under any disturbances in the open world, Project Page: https://github.com/OpenRobotLab/HIMLoco

  9. arXiv:2311.03679  [pdf, other]

    cs.CV eess.IV

    Unsupervised convolutional neural network fusion approach for change detection in remote sensing images

    Authors: Weidong Yan, Pei Yan, Li Cao

    Abstract: With the rapid development of deep learning, a variety of change detection methods based on deep learning have emerged in recent years. However, these methods usually require a large number of training samples to train the network model, so it is very expensive. In this paper, we introduce a completely unsupervised shallow convolutional neural network (USCNN) fusion approach for change detection.…

    Submitted 6 November, 2023; originally announced November 2023.

  10. arXiv:2311.02447  [pdf, other]

    cs.IT eess.SP

    Quantized-but-uncoded Distributed Detection (QDD) with Unreliable Reporting Channels

    Authors: Lei Cao, Ramanarayanan Viswanathan

    Abstract: Distributed detection primarily centers around two approaches: Unquantized Distributed Detection (UDD), where each sensor reports its complete observation to the fusion center (FC), and quantized-and-Coded DD (CDD), where each sensor first partitions the observation space and then reports to the FC a codeword. In this paper, we introduce Quantized-but-uncoded DD (QDD), where each sensor, after qua…

    Submitted 4 November, 2023; originally announced November 2023.

    Comments: 11 pages, 8 figures, submitted to IEEE T-IT

  11. arXiv:2310.16137  [pdf, other]

    cs.IT eess.SP

    Codebook-based Uplink Transmission Enhancement in 5G Advanced: Sub-band Precoding

    Authors: Liu Cao, Yahia Shabara, Parisa Cheraghi

    Abstract: The transformative enhancements of fifth-generation (5G) mobile devices bring about new challenges to achieve better uplink (UL) performance. Particularly, in codebook-based transmission, the wide-band (WB) precoding and the legacy UL codebook may become main bottlenecks for higher efficient data transmission. In this paper, we investigate the codebook-based UL single-layer transmission performanc…

    Submitted 29 October, 2023; v1 submitted 24 October, 2023; originally announced October 2023.

    Comments: This work has been accepted by IEEE VCC 2023. 5 pages, 7 figures

  12. arXiv:2310.05368  [pdf, other]

    cs.AI cs.MA cs.SD eess.AS

    Measuring Acoustics with Collaborative Multiple Agents

    Authors: Yinfeng Yu, Changan Chen, Lele Cao, Fangkai Yang, Fuchun Sun

    Abstract: As humans, we hear sound every second of our life. The sound we hear is often affected by the acoustics of the environment surrounding us. For example, a spacious hall leads to more reverberation. Room Impulse Responses (RIR) are commonly used to characterize environment acoustics as a function of the scene geometry, materials, and source/receiver locations. Traditionally, RIRs are measured by set…

    Submitted 8 October, 2023; originally announced October 2023.

    Comments: Main paper (9 pages and 5 figures and 2 tables) and appendix (16 pages and 13 figures and 10 tables). Accepted for publication by IJCAI 2023

  13. arXiv:2309.16680  [pdf, other]

    cs.NI eess.SY

    Semi-Persistent Scheduling in NR Sidelink Mode 2: MAC Packet Reception Ratio Model and Validation

    Authors: Liu Cao, Sumit Roy, Collin Brady

    Abstract: 5G NR Sidelink (SL) has demonstrated the promising capability for infrastructure-less cellular coverage. Understanding the fundamentals of the NR SL channel access mechanism, Semi-Persistent Scheduling (SPS), which is specified by the 3rd Generation Partnership Project (3GPP), is a necessity to enhance the NR SL Packet Reception Ratio (PRR). However, most existing works fail to account for the new…

    Submitted 26 July, 2023; originally announced September 2023.

    Comments: This work has been submitted to the IEEE for possible publication. 13 pages, 21 figures

  14. arXiv:2309.09843  [pdf, other]

    cs.CL cs.LG cs.SD eess.AS

    Instruction-Following Speech Recognition

    Authors: Cheng-I Jeff Lai, Zhiyun Lu, Liangliang Cao, Ruoming Pang

    Abstract: Conventional end-to-end Automatic Speech Recognition (ASR) models primarily focus on exact transcription tasks, lacking flexibility for nuanced user interactions. With the advent of Large Language Models (LLMs) in speech processing, more organic, text-prompt-based interactions have become possible. However, the mechanisms behind these models' speech understanding and "reasoning" capabilities remai…

    Submitted 18 September, 2023; originally announced September 2023.

  15. arXiv:2308.03263  [pdf, other]

    eess.SP

    Prototyping and real-world field trials of RIS-aided wireless communications

    Authors: Xilong Pei, Haifan Yin, Li Tan, Lin Cao, Taorui Yang

    Abstract: Reconfigurable intelligent surface (RIS) is a promising technology that has the potential to change the way we interact with the wireless propagating environment. In this paper, we design and fabricate an RIS system that can be used in the fifth generation (5G) mobile communication networks. We also propose a practical two-step spatial-oversampling codebook algorithm for the beamforming of RIS, wh…

    Submitted 6 August, 2023; originally announced August 2023.

    Comments: 10 pages, 21 figures

  16. arXiv:2307.02297  [pdf, other]

    eess.SP

    RIS with insufficient phase shifting capability: Modeling, beamforming, and experimental validations

    Authors: Lin Cao, Haifan Yin, Li Tan, Xilong Pei

    Abstract: Most research works on reconfigurable intelligent surfaces (RIS) rely on idealized models of the reflection coefficients, i.e., uniform reflection amplitude for any phase and sufficient phase shifting capability. In practice however, such models are oversimplified. This paper introduces a realistic reflection coefficient model for RIS based on measurements. The reflection coefficients are modeled…

    Submitted 16 April, 2024; v1 submitted 5 July, 2023; originally announced July 2023.

    Comments: 13 pages, 11 figures

  17. arXiv:2303.12693  [pdf, other]

    eess.SY cs.AI

    Resilient Output Containment Control of Heterogeneous Multiagent Systems Against Composite Attacks: A Digital Twin Approach

    Authors: Yukang Cui, Lingbo Cao, Michael V. Basin, Jun Shen, Tingwen Huang, Xin Gong

    Abstract: This paper studies the distributed resilient output containment control of heterogeneous multiagent systems against composite attacks, including denial-of-services (DoS) attacks, false-data injection (FDI) attacks, camouflage attacks, and actuation attacks. Inspired by digital twins, a twin layer (TL) with higher security and privacy is used to decouple the above problem into two tasks: defense pr…

    Submitted 22 March, 2023; originally announced March 2023.

  18. arXiv:2303.02938  [pdf, other]

    eess.SP

    RIS-aided Wireless Communications: Can RIS Beat Metal Plate?

    Authors: Jiangfeng Hu, Haifan Yin, Li Tan, Lin Cao, Xilong Pei

    Abstract: Reconfigurable Intelligent Surface (RIS) has recently been regarded as a paradigm-shifting technology beyond 5G, for its flexibility on smartly adjusting the response to the impinging electromagnetic (EM) waves. Usually, RIS can be implemented by properly reconfiguring the adjustable parameters of each RIS unit to align the signal phase on the receiver side. And it is believed that the phase align…

    Submitted 6 March, 2023; originally announced March 2023.

    Comments: 5 pages, 5 figures

  19. Learning Informative Representation for Fairness-aware Multivariate Time-series Forecasting: A Group-based Perspective

    Authors: Hui He, Qi Zhang, Shoujin Wang, Kun Yi, Zhendong Niu, Longbing Cao

    Abstract: Performance unfairness among variables widely exists in multivariate time series (MTS) forecasting models since such models may attend/bias to certain (advantaged) variables. Addressing this unfairness problem is important for equally attending to all variables and avoiding vulnerable model biases/risks. However, fair MTS forecasting is challenging and has been less studied in the literature. To b…

    Submitted 23 October, 2023; v1 submitted 26 January, 2023; originally announced January 2023.

    Comments: 13 pages, 5 figures, accepted by IEEE Transactions on Knowledge and Data Engineering (TKDE)

    MSC Class: 68Txx ACM Class: I.2.6

  20. arXiv:2301.02784  [pdf, other]

    eess.SY

    Active Fault Isolation for Discrete Event Systems

    Authors: Lin Cao, Shaolong Shu, Feng Lin

    Abstract: In practice, we can not only disable some events, but also enforce the occurrence of some events prior to the occurrence of other events by external control. In this paper, we combine these two control mechanisms to synthesize a more powerful supervisor. Here our control goal is to design an isolation supervisor which ensures in the closed-loop system, faults are isolatable in the sense that after…

    Submitted 7 January, 2023; originally announced January 2023.

    MSC Class: 93B99 ACM Class: G.2; H.4

  21. arXiv:2301.00656  [pdf, other]

    eess.AS cs.CL cs.LG

    TriNet: stabilizing self-supervised learning from complete or slow collapse on ASR

    Authors: Lixin Cao, Jun Wang, Ben Yang, Dan Su, Dong Yu

    Abstract: Self-supervised learning (SSL) models confront challenges of abrupt informational collapse or slow dimensional collapse. We propose TriNet, which introduces a novel triple-branch architecture for preventing collapse and stabilizing the pre-training. TriNet learns the SSL latent embedding space and incorporates it to a higher level space for predicting pseudo target vectors generated by a frozen te…

    Submitted 14 March, 2023; v1 submitted 12 December, 2022; originally announced January 2023.

    Comments: Accepted by ICASSP 2023

  22. arXiv:2210.13740  [pdf, other]

    cs.NI eess.SY

    Latency-aware End-to-end Multi-path Data Transmission for URLLC Services

    Authors: Liu Cao, Abbas Kiani, Amanda Xiang, Kaippallimalil John, Tony Saboorian

    Abstract: 5th Generation Mobile Communication Technology (5G) utilizes the Access Traffic Steering, Switching, and Splitting (ATSSS) rule to enable multi-path data transmission, which is currently being standardized. Recently, the 3rd Generation Partnership Project (3GPP) SA1 and SA2 have been working on the multi-path solution for possible improvement from different perspectives. However, the existing 3GPP…

    Submitted 21 October, 2023; v1 submitted 24 October, 2022; originally announced October 2022.

    Comments: This work has been submitted to the IEEE for possible publication. 5 pages, 6 figures

  23. arXiv:2210.01353  [pdf, other]

    cs.SD cs.AI eess.AS

    Pay Self-Attention to Audio-Visual Navigation

    Authors: Yinfeng Yu, Lele Cao, Fuchun Sun, Xiaohong Liu, Liejun Wang

    Abstract: Audio-visual embodied navigation, as a hot research topic, aims to train a robot to reach an audio target using egocentric visual (from the sensors mounted on the robot) and audio (emitted from the target) input. The audio-visual information fusion strategy is naturally important to the navigation performance, but the state-of-the-art methods still simply concatenate the visual and audio features,…

    Submitted 5 October, 2022; v1 submitted 3 October, 2022; originally announced October 2022.

    Comments: Main paper (10 pages and 7 figures) and appendix (21 figures and 4 tables). Accepted for publication by BMVC 2022. For data and code, see https://yyf17.github.io/FSAAVN/index.html

  24. arXiv:2209.02944  [pdf, other]

    cs.IT eess.SP

    Architecture-Algorithmic Trade-offs in Multi-path Channel Estimation for mmWAVE Systems

    Authors: Lyutianyang Zhang, Sumit Roy, Liu Cao

    Abstract: 5G mmWave massive MIMO systems are likely to be deployed in dense urban scenarios, where increasing network capacity is the primary objective. A key component in mmWave transceiver design is channel estimation which is challenging due to the very large signal bandwidths (order of GHz) implying significant resolved spatial multipath, coupled with large # of Tx/Rx antennas for large-scale MIMO. This…

    Submitted 7 September, 2022; originally announced September 2022.

  25. arXiv:2206.12046  [pdf, other]

    cs.CV cs.LG eess.IV

    Bilateral Network with Channel Splitting Network and Transformer for Thermal Image Super-Resolution

    Authors: Bo Yan, Leilei Cao, Fengliang Qi, Hongbin Wang

    Abstract: In recent years, the Thermal Image Super-Resolution (TISR) problem has become an attractive research topic. TISR can be used in a wide range of fields, including military, medical, agricultural and animal ecology. Due to the success of PBVS-2020 and PBVS-2021 workshop challenge, the result of TISR keeps improving and attracts more researchers to sign up for PBVS-2022 challenge. In this paper,…

    Submitted 23 June, 2022; originally announced June 2022.

    Comments: The second place solution for CVPR2022 PBVS-TISR challenge

  26. Multi-Access Point Coordination for Next-Gen Wi-Fi Networks Aided by Deep Reinforcement Learning

    Authors: Lyutianyang Zhang, Hao Yin, Sumit Roy, Liu Cao

    Abstract: Wi-Fi in the enterprise - characterized by overlapping Wi-Fi cells - constitutes the design challenge for next-generation networks. Standardization for recently started IEEE 802.11be (Wi-Fi 7) Working Groups has focused on significant medium access control layer changes that emphasize the role of the access point (AP) in radio resource management (RRM) for coordinating channel access due to the hi…

    Submitted 22 June, 2022; originally announced June 2022.

    Comments: To appear in IEEE Systems Journal. 12 pages, 13 figures

  27. arXiv:2205.10897  [pdf, other]

    eess.SP

    Efficient PHY Layer Abstraction under Imperfect Channel Estimation

    Authors: Liu Cao, Lyutianyang Zhang, Sian Jin, Sumit Roy

    Abstract: As most existing work investigates the PHY layer abstraction under an assumption of perfect channel estimation, it may become unreliable if there exists channel estimation error in a real communication system. This letter improves an efficient PHY layer method, EESM-log-SGN PHY layer abstraction, by considering the presence of channel estimation error. We develop two methods for implementing the EE…

    Submitted 8 October, 2022; v1 submitted 22 May, 2022; originally announced May 2022.

    Comments: Submitted to IEEE Wireless Communications Letters. 5 pages, 7 figures

  28. arXiv:2204.12736  [pdf]

    cs.CV cs.LG eess.IV

    A Multi-Head Convolutional Neural Network With Multi-path Attention improves Image Denoising

    Authors: Jiahong Zhang, Meijun Qu, Ye Wang, Lihong Cao

    Abstract: Recently, convolutional neural networks (CNNs) and attention mechanisms have been widely used in image denoising and achieved satisfactory performance. However, the previous works mostly use a single head to receive the noisy image, limiting the richness of extracted features. Therefore, a novel CNN with multiple heads (MH) named MHCNN is proposed in this paper, whose heads will receive the input…

    Submitted 3 November, 2022; v1 submitted 27 April, 2022; originally announced April 2022.

  29. arXiv:2204.06746  [pdf, other]

    eess.IV cs.CV

    Information fusion approach for biomass estimation in a plateau mountainous forest using a synergistic system comprising UAS-based digital camera and LiDAR

    Authors: Rong Huang, Wei Yao, Zhong Xu, Lin Cao, Xin Shen

    Abstract: Forest land plays a vital role in global climate, ecosystems, farming and human living environments. Therefore, forest biomass estimation methods are necessary to monitor changes in the forest structure and function, which are key data in natural resources research. Although accurate forest biomass measurements are important in forest inventory and assessments, high-density measurements that invol…

    Submitted 14 April, 2022; originally announced April 2022.

  30. arXiv:2203.02507  [pdf]

    eess.IV physics.optics

    Parallel Fourier Ptychography reconstruction

    Authors: Guocheng Zhou, Shaohui Zhang, Yao Hu, Lei Cao, Yong Huang, Qun Hao

    Abstract: Fourier ptychography has attracted a wide range of focus for its ability of large space-bandwidth product and quantitative phase measurement. It is a typical computational imaging technique which refers to optimizing both the imaging hardware and reconstruction algorithms simultaneously. The data redundancy and inverse problem algorithms are the sources of FPM's excellent performance. But at the sa…

    Submitted 3 March, 2022; originally announced March 2022.

    Comments: 12 pages with 11 figures

  31. arXiv:2203.00008  [pdf]

    physics.med-ph eess.IV physics.optics

    Learned end-to-end high-resolution lensless fiber imaging toward intraoperative real-time cancer diagnosis

    Authors: Jiachen Wu, Tijue Wang, Ortrud Uckermann, Roberta Galli, Gabriele Schackert, Liangcai Cao, Jürgen Czarske, Robert Kuschmierz

    Abstract: Endomicroscopy is indispensable for minimally invasive diagnostics in clinical practice. For optical keyhole monitoring of surgical interventions, high-resolution fiber endoscopic imaging is considered to be very promising, especially in combination with label-free imaging techniques to realize in vivo diagnosis. However, the inherent honeycomb-artifacts of coherent fiber bundles (CFB) reduce the…

    Submitted 28 February, 2022; originally announced March 2022.

  32. arXiv:2202.10239  [pdf, other]

    physics.optics eess.IV

    Fourier ptychography multi-parameter neural network with composite physical priori optimization

    Authors: Delong Yang, Shaohui Zhang, Chuanjian Zheng, Guocheng Zhou, Lei Cao, Yao Hu, Qun Hao

    Abstract: Fourier ptychography microscopy (FP) is a recently developed computational imaging approach for microscopic super-resolution imaging. By turning on each light-emitting-diode (LED) located at different positions on the LED array sequentially and acquiring the corresponding images that contain different spatial frequency components, high spatial resolution and quantitative phase imaging can be achieve…

    Submitted 17 February, 2022; originally announced February 2022.

    Comments: 13 pages, 12 figures, solving inverse problem of computational imaging by neural network

  33. arXiv:2112.12055  [pdf]

    physics.optics eess.IV physics.bio-ph q-bio.QM

    Quantitative phase imaging through an ultra-thin lensless fiber endoscope

    Authors: Jiawei Sun, Jiachen Wu, Song Wu, Liangcai Cao, Ruchi Goswami, Salvatore Girardo, Jochen Guck, Nektarios Koukourakis, Juergen W. Czarske

    Abstract: Quantitative phase imaging (QPI) is a label-free technique providing both morphology and quantitative biophysical information in biomedicine. However, applying such a powerful technique to in vivo pathological diagnosis remains challenging. Multi-core fiber bundles (MCFs) enable ultra-thin probes for in vivo imaging, but current MCF imaging techniques are limited to amplitude imaging modalities. W…

    Submitted 6 July, 2022; v1 submitted 22 December, 2021; originally announced December 2021.

    Comments: 16 pages, 6 figures

  34. arXiv:2111.12758  [pdf]

    physics.optics cs.AI eess.IV physics.bio-ph

    Lensless multicore-fiber microendoscope for real-time tailored light field generation with phase encoder neural network (CoreNet)

    Authors: Jiawei Sun, Jiachen Wu, Nektarios Koukourakis, Robert Kuschmierz, Liangcai Cao, Juergen Czarske

    Abstract: The generation of tailored light with multi-core fiber (MCF) lensless microendoscopes is widely used in biomedicine. However, the computer-generated holograms (CGHs) used for such applications are typically generated by iterative algorithms, which demand high computation effort, limiting advanced applications like in vivo optogenetic stimulation and fiber-optic cell manipulation. The random and di…

    Submitted 24 November, 2021; originally announced November 2021.

  35. arXiv:2110.03841  [pdf, ps, other]

    eess.AS cs.CL

    Input Length Matters: Improving RNN-T and MWER Training for Long-form Telephony Speech Recognition

    Authors: Zhiyun Lu, Yanwei Pan, Thibault Doutre, Parisa Haghani, Liangliang Cao, Rohit Prabhavalkar, Chao Zhang, Trevor Strohman

    Abstract: End-to-end models have achieved state-of-the-art results on several automatic speech recognition tasks. However, they perform poorly when evaluated on long-form data, e.g., minutes long conversational telephony audio. One reason the model fails on long-form speech is that it has only seen short utterances during training. In this paper we study the effect of training utterance length on the word e…

    Submitted 1 April, 2022; v1 submitted 7 October, 2021; originally announced October 2021.

    Comments: submitted to INTERSPEECH 2022

  36. arXiv:2110.03327  [pdf, other]

    eess.AS cs.LG

    Improving Confidence Estimation on Out-of-Domain Data for End-to-End Speech Recognition

    Authors: Qiujia Li, Yu Zhang, David Qiu, Yanzhang He, Liangliang Cao, Philip C. Woodland

    Abstract: As end-to-end automatic speech recognition (ASR) models reach promising performance, various downstream tasks rely on good confidence estimators for these systems. Recent research has shown that model-based confidence estimators have a significant advantage over using the output softmax probabilities. If the input data to the speech recogniser is from mismatched acoustic and linguistic conditions,…

    Submitted 2 March, 2022; v1 submitted 7 October, 2021; originally announced October 2021.

    Comments: Accepted as a conference paper at ICASSP 2022

  37. arXiv:2109.13226  [pdf, other]

    eess.AS cs.CL cs.LG cs.SD

    BigSSL: Exploring the Frontier of Large-Scale Semi-Supervised Learning for Automatic Speech Recognition

    Authors: Yu Zhang, Daniel S. Park, Wei Han, James Qin, Anmol Gulati, Joel Shor, Aren Jansen, Yuanzhong Xu, Yanping Huang, Shibo Wang, Zongwei Zhou, Bo Li, Min Ma, William Chan, Jiahui Yu, Yongqiang Wang, Liangliang Cao, Khe Chai Sim, Bhuvana Ramabhadran, Tara N. Sainath, Françoise Beaufays, Zhifeng Chen, Quoc V. Le, Chung-Cheng Chiu, Ruoming Pang, et al. (1 additional author not shown)

    Abstract: We summarize the results of a host of efforts using giant automatic speech recognition (ASR) models pre-trained using large, diverse unlabeled datasets containing approximately a million hours of audio. We find that the combination of pre-training, self-training and scaling up model size greatly increases data efficiency, even for extremely large tasks with tens of thousands of hours of labeled da…

    Submitted 21 July, 2022; v1 submitted 27 September, 2021; originally announced September 2021.

    Comments: 14 pages, 7 figures, 13 tables; v2: minor corrections, reference baselines and bibliography updated; v3: corrections based on reviewer feedback, bibliography updated

  38. arXiv:2109.05496  [pdf]

    eess.IV cs.CV

    A Complex Constrained Total Variation Image Denoising Algorithm with Application to Phase Retrieval

    Authors: Yunhui Gao, Liangcai Cao

    Abstract: This paper considers the constrained total variation (TV) denoising problem for complex-valued images. We extend the definition of TV seminorms for real-valued images to dealing with complex-valued ones. In particular, we introduce two types of complex TV in both isotropic and anisotropic forms. To solve the constrained denoising problem, we adopt a dual approach and derive an accelerated gradient…

    Submitted 12 September, 2021; originally announced September 2021.

    Comments: 11 pages, 7 figures

  39. arXiv:2104.14346  [pdf, other]

    cs.CL cs.SD eess.AS

    Bridging the gap between streaming and non-streaming ASR systems by distilling ensembles of CTC and RNN-T models

    Authors: Thibault Doutre, Wei Han, Chung-Cheng Chiu, Ruoming Pang, Olivier Siohan, Liangliang Cao

    Abstract: Streaming end-to-end automatic speech recognition (ASR) systems are widely used in everyday applications that require transcribing speech to text in real-time. Their minimal latency makes them suitable for such tasks. Unlike their non-streaming counterparts, streaming models are constrained to be causal with no future context and suffer from higher word error rates (WER). To improve streaming mode…

    Submitted 25 April, 2021; originally announced April 2021.

  40. arXiv:2104.12870  [pdf, other]

    eess.AS cs.CL cs.LG cs.SD

    Multi-Task Learning for End-to-End ASR Word and Utterance Confidence with Deletion Prediction

    Authors: David Qiu, Yanzhang He, Qiujia Li, Yu Zhang, Liangliang Cao, Ian McGraw

    Abstract: Confidence scores are very useful for downstream applications of automatic speech recognition (ASR) systems. Recent works have proposed using neural networks to learn word or utterance confidence scores for end-to-end ASR. In those studies, word confidence by itself does not model deletions, and utterance confidence does not take advantage of word-level training signals. This paper proposes to joi…

    Submitted 26 April, 2021; originally announced April 2021.

    Comments: Submitted to Interspeech 2021

  41. arXiv:2104.02757  [pdf, other]

    eess.AS cs.LG cs.SD

    Exploring Targeted Universal Adversarial Perturbations to End-to-end ASR Models

    Authors: Zhiyun Lu, Wei Han, Yu Zhang, Liangliang Cao

    Abstract: Although end-to-end automatic speech recognition (e2e ASR) models are widely deployed in many applications, there have been very few studies to understand models' robustness against adversarial perturbations. In this paper, we explore whether a targeted universal perturbation vector exists for e2e ASR models. Our goal is to find perturbations that can mislead the models to predict the given target…

    Submitted 6 April, 2021; originally announced April 2021.

    Comments: Submitted to INTERSPEECH 2021

  42. arXiv:2103.14152  [pdf, other]

    eess.AS cs.CL cs.LG cs.SD

    Residual Energy-Based Models for End-to-End Speech Recognition

    Authors: Qiujia Li, Yu Zhang, Bo Li, Liangliang Cao, Philip C. Woodland

    Abstract: End-to-end models with auto-regressive decoders have shown impressive results for automatic speech recognition (ASR). These models formulate the sequence-level probability as a product of the conditional probabilities of all individual tokens given their histories. However, the performance of locally normalised models can be sub-optimal because of factors such as exposure bias. Consequently, the m…

    Submitted 23 June, 2021; v1 submitted 25 March, 2021; originally announced March 2021.

    Comments: To appear in Proc. Interspeech 2021

  43. arXiv:2103.06716  [pdf, other]

    eess.AS cs.CL cs.LG

    Learning Word-Level Confidence For Subword End-to-End ASR

    Authors: David Qiu, Qiujia Li, Yanzhang He, Yu Zhang, Bo Li, Liangliang Cao, Rohit Prabhavalkar, Deepti Bhatia, Wei Li, Ke Hu, Tara N. Sainath, Ian McGraw

    Abstract: We study the problem of word-level confidence estimation in subword-based end-to-end (E2E) models for automatic speech recognition (ASR). Although prior works have proposed training auxiliary confidence models for ASR systems, they do not extend naturally to systems that operate on word-pieces (WP) as their vocabulary. In particular, ground truth WP correctness labels are needed for training confi…

    Submitted 11 March, 2021; originally announced March 2021.

    Comments: To appear in ICASSP 2021

  44. RIS-Aided Wireless Communications: Prototyping, Adaptive Beamforming, and Indoor/Outdoor Field Trials

    Authors: Xilong Pei, Haifan Yin, Li Tan, Lin Cao, Zhanpeng Li, Kai Wang, Kun Zhang, Emil Björnson

    Abstract: The prospects of using a Reconfigurable Intelligent Surface (RIS) to aid wireless communication systems have recently received much attention from academia and industry. Most papers make theoretical studies based on elementary models, while the prototyping of RIS-aided wireless communication and real-world field trials are scarce. In this paper, we describe a new RIS prototype consisting of 1100 c…

    Submitted 31 July, 2021; v1 submitted 28 February, 2021; originally announced March 2021.

    Comments: 13 pages, 18 figures, submitted

  45. arXiv:2012.02381  [pdf, other]

    cs.CV eess.IV

    Generator Pyramid for High-Resolution Image Inpainting

    Authors: Leilei Cao, Tong Yang, Yixu Wang, Bo Yan, Yandong Guo

    Abstract: Inpainting high-resolution images with large holes challenges existing deep learning based image inpainting methods. We present a novel framework, PyramidFill, for the high-resolution image inpainting task, which explicitly disentangles content completion and texture synthesis. PyramidFill attempts to complete the content of unknown regions in a lower-resolution image, and synthesize the textures of u…

    Submitted 3 December, 2020; originally announced December 2020.

    Comments: Under review

  46. arXiv:2010.12096  [pdf, other]

    cs.SD cs.CL eess.AS

    Improving Streaming Automatic Speech Recognition With Non-Streaming Model Distillation On Unsupervised Data

    Authors: Thibault Doutre, Wei Han, Min Ma, Zhiyun Lu, Chung-Cheng Chiu, Ruoming Pang, Arun Narayanan, Ananya Misra, Yu Zhang, Liangliang Cao

    Abstract: Streaming end-to-end automatic speech recognition (ASR) models are widely used on smart speakers and on-device applications. Since these models are expected to transcribe speech with minimal latency, they are constrained to be causal with no future context, compared to their non-streaming counterparts. Consequently, streaming models usually perform worse than non-streaming models. We propose a nov…

    Submitted 21 February, 2021; v1 submitted 22 October, 2020; originally announced October 2020.

  47. arXiv:2010.11428  [pdf, other]

    eess.AS cs.CL cs.LG

    Confidence Estimation for Attention-based Sequence-to-sequence Models for Speech Recognition

    Authors: Qiujia Li, David Qiu, Yu Zhang, Bo Li, Yanzhang He, Philip C. Woodland, Liangliang Cao, Trevor Strohman

    Abstract: For various speech-related tasks, confidence scores from a speech recogniser are a useful measure to assess the quality of transcriptions. In traditional hidden Markov model-based automatic speech recognition (ASR) systems, confidence scores can be reliably obtained from word posteriors in decoding lattices. However, for an ASR system with an auto-regressive decoder, such as an attention-based seq…

    Submitted 23 October, 2020; v1 submitted 22 October, 2020; originally announced October 2020.

    Comments: Submitted to ICASSP 2021

  48. arXiv:2007.05927  [pdf, other]

    cs.RO cs.HC eess.SY

    A Three-limb Teleoperated Robotic System with Foot Control for Flexible Endoscopic Surgery

    Authors: Yanpei Huang, Wenjie Lai, Lin Cao, Jiajun Liu, Xiaoguo Li, Etienne Burdet, Soo Jay Phee

    Abstract: Flexible endoscopy requires high skills to manipulate both the endoscope and associated instruments. In most robotic flexible endoscopic systems, the endoscope and instruments are controlled separately by two operators, which may result in communication errors and inefficient operation. We present a novel teleoperation robotic endoscopic system that can be commanded by a surgeon alone. This 13 deg…

    Submitted 12 July, 2020; originally announced July 2020.

    Comments: 9 pages, 11 figures

  49. arXiv:2005.03271  [pdf, other]

    eess.AS cs.CL

    RNN-T Models Fail to Generalize to Out-of-Domain Audio: Causes and Solutions

    Authors: Chung-Cheng Chiu, Arun Narayanan, Wei Han, Rohit Prabhavalkar, Yu Zhang, Navdeep Jaitly, Ruoming Pang, Tara N. Sainath, Patrick Nguyen, Liangliang Cao, Yonghui Wu

    Abstract: In recent years, all-neural end-to-end approaches have obtained state-of-the-art results on several challenging automatic speech recognition (ASR) tasks. However, most existing works focus on building ASR models where train and test data are drawn from the same domain. This results in poor generalization characteristics on mismatched-domains: e.g., end-to-end models trained on short segments perfo…

    Submitted 23 December, 2020; v1 submitted 7 May, 2020; originally announced May 2020.

    Comments: SLT camera-ready version

  50. arXiv:1911.09762  [pdf, other]

    cs.CL cs.LG eess.AS

    Speech Sentiment Analysis via Pre-trained Features from End-to-end ASR Models

    Authors: Zhiyun Lu, Liangliang Cao, Yu Zhang, Chung-Cheng Chiu, James Fan

    Abstract: In this paper, we propose to use pre-trained features from end-to-end ASR models to solve speech sentiment analysis as a down-stream task. We show that end-to-end ASR features, which integrate both acoustic and text information from speech, achieve promising results. We use RNN with self-attention as the sentiment classifier, which also provides an easy visualization through attention weights to h…

    Submitted 4 March, 2020; v1 submitted 21 November, 2019; originally announced November 2019.