Showing 1–50 of 56 results for author: Gao, C

  1. arXiv:2405.03905  [pdf, other]

    cs.AR cs.CV cs.SD eess.AS

    A 65nm 36nJ/Decision Bio-inspired Temporal-Sparsity-Aware Digital Keyword Spotting IC with 0.6V Near-Threshold SRAM

    Authors: Qinyu Chen, Kwantae Kim, Chang Gao, Sheng Zhou, Taekwang Jang, Tobi Delbruck, Shih-Chii Liu

    Abstract: This paper introduces, to the best of the authors' knowledge, the first fine-grained temporal sparsity-aware keyword spotting (KWS) IC leveraging temporal similarities between neighboring feature vectors extracted from input frames and network hidden states, eliminating unnecessary operations and memory accesses. This KWS IC, featuring a bio-inspired delta-gated recurrent neural network (ΔRNN) cla…

    Submitted 6 May, 2024; originally announced May 2024.

  2. arXiv:2404.15364  [pdf, other]

    eess.SP cs.AI cs.CV cs.LG

    MP-DPD: Low-Complexity Mixed-Precision Neural Networks for Energy-Efficient Digital Predistortion of Wideband Power Amplifiers

    Authors: Yizhuo Wu, Ang Li, Mohammadreza Beikmirza, Gagan Deep Singh, Qinyu Chen, Leo C. N. de Vreede, Morteza Alavi, Chang Gao

    Abstract: Digital Pre-Distortion (DPD) enhances signal quality in wideband RF power amplifiers (PAs). As signal bandwidths expand in modern radio systems, DPD's energy consumption increasingly impacts overall system efficiency. Deep Neural Networks (DNNs) offer promising advancements in DPD, yet their high complexity hinders their practical deployment. This paper introduces open-source mixed-precision (MP)…

    Submitted 18 April, 2024; originally announced April 2024.

    Comments: Accepted to IEEE Microwave and Wireless Technology Letters (MWTL)

  3. arXiv:2404.01479  [pdf, other]

    physics.optics eess.SP

    Information Processing in Hybrid Photonic Electrical Reservoir Computing

    Authors: Prabhav Gaur, Chengkuan Gao, Karl Johnson, Shimon Rubin, Yeshaiahu Fainman, Tzu-Chien Hsueh

    Abstract: Physical Reservoir Computing (PRC) is a recently developed variant of Neuromorphic Computing, where a pertinent physical system effectively projects information encoded in the input signal into a higher-dimensional space. While various physical hardware has demonstrated promising results for Reservoir Computing (RC), systems allowing tunability of their dynamical regimes have not received much att…

    Submitted 1 April, 2024; originally announced April 2024.

  4. arXiv:2403.18992  [pdf]

    eess.IV

    Tractography with T1-weighted MRI and associated anatomical constraints on clinical quality diffusion MRI

    Authors: Tian Yu, Yunhe Li, Michael E. Kim, Chenyu Gao, Qi Yang, Leon Y. Cai, Susane M. Resnick, Lori L. Beason-Held, Daniel C. Moyer, Kurt G. Schilling, Bennett A. Landman

    Abstract: Diffusion MRI (dMRI) streamline tractography, the gold standard for in vivo estimation of brain white matter (WM) pathways, has long been considered indicative of macroscopic relationships with WM microstructure. However, recent advances in tractography demonstrated that convolutional recurrent neural networks (CoRNN) trained with a teacher-student framework have the ability to learn and propagate…

    Submitted 27 March, 2024; originally announced March 2024.

  5. arXiv:2403.05937  [pdf, other]

    cs.CV eess.IV

    Wavelet-Like Transform-Based Technology in Response to the Call for Proposals on Neural Network-Based Image Coding

    Authors: Cunhui Dong, Haichuan Ma, Haotian Zhang, Changsheng Gao, Li Li, Dong Liu

    Abstract: Neural network-based image coding has been developing rapidly since its birth. By 2022, its performance had surpassed that of the best-performing traditional image coding framework -- H.266/VVC. Witnessing such success, the IEEE 1857.11 working subgroup initializes a neural network-based image coding standard project and issues a corresponding call for proposals (CfP). In response to the CfP, t…

    Submitted 9 March, 2024; originally announced March 2024.

  6. arXiv:2402.09424  [pdf, other]

    eess.SP cs.CV cs.LG cs.NE

    Epilepsy Seizure Detection and Prediction using an Approximate Spiking Convolutional Transformer

    Authors: Qinyu Chen, Congyi Sun, Chang Gao, Shih-Chii Liu

    Abstract: Epilepsy is a common disease of the nervous system. Timely prediction of seizures and intervention treatment can significantly reduce the accidental injury of patients and protect the life and health of patients. This paper presents a neuromorphic Spiking Convolutional Transformer, named Spiking Conformer, to detect and predict epileptic seizure segments from scalped long-term electroencephalogram…

    Submitted 21 January, 2024; originally announced February 2024.

    Comments: To be published at the 2024 IEEE International Symposium on Circuits and Systems (ISCAS), Singapore

  7. arXiv:2402.00803  [pdf, other]

    cs.LG eess.SP

    Signal Quality Auditing for Time-series Data

    Authors: Chufan Gao, Nicholas Gisolfi, Artur Dubrawski

    Abstract: Signal quality assessment (SQA) is required for monitoring the reliability of data acquisition systems, especially in AI-driven Predictive Maintenance (PMx) application contexts. SQA is vital for addressing "silent failures" of data acquisition hardware and software, which when unnoticed, misinform the users of data, creating the risk for incorrect decisions with unintended or even catastrophic co…

    Submitted 1 February, 2024; originally announced February 2024.

  8. Post-Training Embedding Alignment for Decoupling Enrollment and Runtime Speaker Recognition Models

    Authors: Chenyang Gao, Brecht Desplanques, Chelsea J. -T. Ju, Aman Chadha, Andreas Stolcke

    Abstract: Automated speaker identification (SID) is a crucial step for the personalization of a wide range of speech-enabled services. Typical SID systems use a symmetric enrollment-verification framework with a single model to derive embeddings both offline for voice profiles extracted from enrollment utterances, and online from runtime utterances. Due to the distinct circumstances of enrollment and runtim…

    Submitted 22 January, 2024; originally announced January 2024.

    Comments: Accepted to ICASSP 2024

  9. arXiv:2401.08318  [pdf, other]

    cs.LG eess.SP

    OpenDPD: An Open-Source End-to-End Learning & Benchmarking Framework for Wideband Power Amplifier Modeling and Digital Pre-Distortion

    Authors: Yizhuo Wu, Gagan Deep Singh, Mohammadreza Beikmirza, Leo C. N. de Vreede, Morteza Alavi, Chang Gao

    Abstract: With the rise in communication capacity, deep neural networks (DNN) for digital pre-distortion (DPD) to correct non-linearity in wideband power amplifiers (PAs) have become prominent. Yet, there is a void in open-source and measurement-setup-independent platforms for fast DPD exploration and objective DPD model comparison. This paper presents an open-source framework, OpenDPD, crafted in PyTorch,…

    Submitted 24 January, 2024; v1 submitted 16 January, 2024; originally announced January 2024.

    Comments: To be published at the 2024 IEEE International Symposium on Circuits and Systems (ISCAS), Singapore

  10. arXiv:2401.06798  [pdf]

    q-bio.NC eess.IV

    Evaluation of Mean Shift, ComBat, and CycleGAN for Harmonizing Brain Connectivity Matrices Across Sites

    Authors: Hanliang Xu, Nancy R. Newlin, Michael E. Kim, Chenyu Gao, Praitayini Kanakaraj, Aravind R. Krishnan, Lucas W. Remedios, Nazirah Mohd Khairi, Kimberly Pechman, Derek Archer, Timothy J. Hohman, Angela L. Jefferson, The BIOCARD Study Team, Ivana Isgum, Yuankai Huo, Daniel Moyer, Kurt G. Schilling, Bennett A. Landman

    Abstract: Connectivity matrices derived from diffusion MRI (dMRI) provide an interpretable and generalizable way of understanding the human brain connectome. However, dMRI suffers from inter-site and between-scanner variation, which impedes analysis across datasets to improve robustness and reproducibility of results. To evaluate different harmonization approaches on connectivity matrices, we compared graph…

    Submitted 24 January, 2024; v1 submitted 8 January, 2024; originally announced January 2024.

    Comments: 11 pages, 5 figures, to be published in SPIE Medical Imaging 2024: Image Processing

  11. arXiv:2312.16987  [pdf]

    cs.CV cs.GR eess.IV

    Image Quality, Uniformity and Computation Improvement of Compressive Light Field Displays with U-Net

    Authors: Chen Gao, Haifeng Li, Xu Liu, Xiaodi Tan

    Abstract: We apply the U-Net model for compressive light field synthesis. Compared to methods based on stacked CNN and iterative algorithms, this method offers better image quality, uniformity and less computation.

    Submitted 28 December, 2023; originally announced December 2023.

    Comments: 4 pages, 6 figures, conference

    MSC Class: 78-06 ACM Class: I.3.7

  12. arXiv:2312.05187  [pdf, other]

    cs.CL cs.SD eess.AS

    Seamless: Multilingual Expressive and Streaming Speech Translation

    Authors: Seamless Communication, Loïc Barrault, Yu-An Chung, Mariano Coria Meglioli, David Dale, Ning Dong, Mark Duppenthaler, Paul-Ambroise Duquenne, Brian Ellis, Hady Elsahar, Justin Haaheim, John Hoffman, Min-Jae Hwang, Hirofumi Inaguma, Christopher Klaiber, Ilia Kulikov, Pengwei Li, Daniel Licht, Jean Maillard, Ruslan Mavlyutov, Alice Rakotoarison, Kaushik Ram Sadagopan, Abinesh Ramakrishnan, Tuan Tran, Guillaume Wenzek, et al. (40 additional authors not shown)

    Abstract: Large-scale automatic speech translation systems today lack key features that help machine-mediated communication feel seamless when compared to human-to-human dialogue. In this work, we introduce a family of models that enable end-to-end expressive and multilingual translations in a streaming fashion. First, we contribute an improved version of the massively multilingual and multimodal SeamlessM4…

    Submitted 8 December, 2023; originally announced December 2023.

  13. arXiv:2312.03284  [pdf]

    eess.SP

    Adaptive Multi-band Modulation for Robust and Low-complexity Faster-than-Nyquist Non-Orthogonal FDM IM-DD System

    Authors: Peiji Song, Zhouyi Hu, Yizhan Dai, Yuan Liu, Chao Gao, Chun-Kit Chan

    Abstract: Faster-than-Nyquist non-orthogonal frequency-division multiplexing (FTN-NOFDM) is robust against the steep frequency roll-off by saving signal bandwidth. Among the FTN-NOFDM techniques, the non-orthogonal matrix precoding (NOM-p) based FTN has high compatibility with the conventional orthogonal frequency division multiplexing (OFDM), in terms of the advanced digital signal processing already used…

    Submitted 5 December, 2023; originally announced December 2023.

  14. arXiv:2311.12199  [pdf, other]

    cs.SD cs.LG eess.AS

    Improving Label Assignments Learning by Dynamic Sample Dropout Combined with Layer-wise Optimization in Speech Separation

    Authors: Chenyang Gao, Yue Gu, Ivan Marsic

    Abstract: In supervised speech separation, permutation invariant training (PIT) is widely used to handle label ambiguity by selecting the best permutation to update the model. Despite its success, previous studies showed that PIT is plagued by excessive label assignment switching in adjacent epochs, impeding the model from learning better label assignments. To address this issue, we propose a novel training stra…

    Submitted 20 November, 2023; originally announced November 2023.

    Comments: Accepted by INTERSPEECH 2023

  15. arXiv:2311.03500  [pdf]

    eess.IV cs.CV q-bio.NC

    Predicting Age from White Matter Diffusivity with Residual Learning

    Authors: Chenyu Gao, Michael E. Kim, Ho Hin Lee, Qi Yang, Nazirah Mohd Khairi, Praitayini Kanakaraj, Nancy R. Newlin, Derek B. Archer, Angela L. Jefferson, Warren D. Taylor, Brian D. Boyd, Lori L. Beason-Held, Susan M. Resnick, The BIOCARD Study Team, Yuankai Huo, Katherine D. Van Schaik, Kurt G. Schilling, Daniel Moyer, Ivana Išgum, Bennett A. Landman

    Abstract: Imaging findings inconsistent with those expected at specific chronological age ranges may serve as early indicators of neurological disorders and increased mortality risk. Estimation of chronological age, and deviations from expected results, from structural MRI data has become an important task for developing biomarkers that are sensitive to such deviations. Complementary to structural analysis,…

    Submitted 21 January, 2024; v1 submitted 6 November, 2023; originally announced November 2023.

    Comments: SPIE Medical Imaging: Image Processing. San Diego, CA. February 2024 (accepted as poster presentation)

  16. arXiv:2311.02842  [pdf, other]

    eess.IV eess.SP

    An invariant feature extraction for multi-modal images matching

    Authors: Chenzhong Gao, Wei Li

    Abstract: This paper aims at providing an effective multi-modal images invariant feature extraction and matching algorithm for the application of multi-source data analysis. Focusing on the differences and correlation of multi-modal images, a feature-based matching algorithm is implemented. The key technologies include phase congruency (PC) and Shi-Tomasi feature point for keypoints detection, LogGabor filt…

    Submitted 5 November, 2023; originally announced November 2023.

  17. arXiv:2310.17190  [pdf, other]

    cs.CV eess.IV

    Lookup Table meets Local Laplacian Filter: Pyramid Reconstruction Network for Tone Mapping

    Authors: Feng Zhang, Ming Tian, Zhiqiang Li, Bin Xu, Qingbo Lu, Changxin Gao, Nong Sang

    Abstract: Tone mapping aims to convert high dynamic range (HDR) images to low dynamic range (LDR) representations, a critical task in the camera imaging pipeline. In recent years, 3-Dimensional LookUp Table (3D LUT) based methods have gained attention due to their ability to strike a favorable balance between enhancement performance and computational efficiency. However, these methods often fail to deliver…

    Submitted 3 January, 2024; v1 submitted 26 October, 2023; originally announced October 2023.

    Comments: 12 pages, 6 figures, accepted by NeurIPS 2023

  18. arXiv:2310.09071  [pdf, other]

    cs.LG eess.SY

    Online Relocating and Matching of Ride-Hailing Services: A Model-Based Modular Approach

    Authors: Chang Gao, Xi Lin, Fang He, Xindi Tang

    Abstract: This study proposes an innovative model-based modular approach (MMA) to dynamically optimize order matching and vehicle relocation in a ride-hailing platform. MMA utilizes a two-layer and modular modeling structure. The upper layer determines the spatial transfer patterns of vehicle flow within the system to maximize the total revenue of the current and future stages. With the guidance provided by…

    Submitted 13 October, 2023; originally announced October 2023.

  19. arXiv:2309.12953  [pdf]

    eess.IV cs.CV

    Inter-vendor harmonization of Computed Tomography (CT) reconstruction kernels using unpaired image translation

    Authors: Aravind R. Krishnan, Kaiwen Xu, Thomas Li, Chenyu Gao, Lucas W. Remedios, Praitayini Kanakaraj, Ho Hin Lee, Shunxing Bao, Kim L. Sandler, Fabien Maldonado, Ivana Isgum, Bennett A. Landman

    Abstract: The reconstruction kernel in computed tomography (CT) generation determines the texture of the image. Consistency in reconstruction kernels is important as the underlying CT texture can impact measurements during quantitative image analysis. Harmonization (i.e., kernel conversion) minimizes differences in measurements due to inconsistent reconstruction kernels. Existing methods investigate harmoni…

    Submitted 26 January, 2024; v1 submitted 22 September, 2023; originally announced September 2023.

    Comments: 10 pages, 6 figures, 1 table, Submitted to SPIE Medical Imaging : Image Processing. San Diego, CA. February 2024

  20. arXiv:2307.16508  [pdf, other]

    cs.CV cs.MM eess.IV

    Towards General Low-Light Raw Noise Synthesis and Modeling

    Authors: Feng Zhang, Bin Xu, Zhiqiang Li, Xinran Liu, Qingbo Lu, Changxin Gao, Nong Sang

    Abstract: Modeling and synthesizing low-light raw noise is a fundamental problem for computational photography and image processing applications. Although most recent works have adopted physics-based models to synthesize noise, the signal-independent noise in low-light conditions is far more complicated and varies dramatically across camera sensors, which is beyond the description of these models. To addres…

    Submitted 17 August, 2023; v1 submitted 31 July, 2023; originally announced July 2023.

    Comments: 11 pages, 7 figures. Accepted by ICCV 2023

    Journal ref: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2023, pp. 10820-10830

  21. arXiv:2307.02953  [pdf, other]

    eess.IV cs.CV cs.LG

    SegNetr: Rethinking the local-global interactions and skip connections in U-shaped networks

    Authors: Junlong Cheng, Chengrui Gao, Fengjie Wang, Min Zhu

    Abstract: Recently, U-shaped networks have dominated the field of medical image segmentation due to their simple and easily tuned structure. However, existing U-shaped segmentation networks: 1) mostly focus on designing complex self-attention modules to compensate for the lack of long-term dependence based on convolution operation, which increases the overall number of parameters and computational complexit…

    Submitted 21 July, 2023; v1 submitted 6 July, 2023; originally announced July 2023.

  22. arXiv:2306.07505  [pdf]

    q-bio.TO eess.IV

    Deep learning radiomics for assessment of gastroesophageal varices in people with compensated advanced chronic liver disease

    Authors: Lan Wang, Ruiling He, Lili Zhao, Jia Wang, Zhengzi Geng, Tao Ren, Guo Zhang, Peng Zhang, Kaiqiang Tang, Chaofei Gao, Fei Chen, Liting Zhang, Yonghe Zhou, Xin Li, Fanbin He, Hui Huan, Wenjuan Wang, Yunxiao Liang, Juan Tang, Fang Ai, Tingyu Wang, Liyun Zheng, Zhongwei Zhao, Jiansong Ji, Wei Liu, et al. (22 additional authors not shown)

    Abstract: Objective: Bleeding from gastroesophageal varices (GEV) is a medical emergency associated with high mortality. We aim to construct an artificial intelligence-based model of two-dimensional shear wave elastography (2D-SWE) of the liver and spleen to precisely assess the risk of GEV and high-risk gastroesophageal varices (HRV). Design: A prospective multicenter study was conducted in patients with…

    Submitted 12 June, 2023; originally announced June 2023.

  23. arXiv:2304.11316  [pdf, other]

    physics.optics eess.IV

    Iterative fluctuation ghost imaging

    Authors: Huan Zhao, Xiao-Qian Wang, Chao Gao, Zhuo Yu, Hong Wang, Yu Wang, Li-Dan Gou, Zhi-Hai Yao

    Abstract: We present a new technique, iterative fluctuation ghost imaging (IFGI), which dramatically enhances the resolution of ghost imaging (GI). It is shown that, by the fluctuation characteristics of the second-order correlation function, imaging information with a narrower point spread function (PSF) than the original information can be obtained. The effects arising from the PSF and the iteration times…

    Submitted 22 April, 2023; originally announced April 2023.

  24. arXiv:2303.17867  [pdf, other]

    cs.CV cs.LG eess.IV

    CAP-VSTNet: Content Affinity Preserved Versatile Style Transfer

    Authors: Linfeng Wen, Chengying Gao, Changqing Zou

    Abstract: Content affinity loss including feature and pixel affinity is a main problem which leads to artifacts in photorealistic and video style transfer. This paper proposes a new framework named CAP-VSTNet, which consists of a new reversible residual network and an unbiased linear transform module, for versatile style transfer. This reversible residual network can not only preserve content affinity but n…

    Submitted 31 March, 2023; originally announced March 2023.

    Comments: CVPR 2023

  25. arXiv:2303.04255  [pdf, other]

    cs.SD cs.LG eess.AS

    Self-supervised speech representation learning for keyword-spotting with light-weight transformers

    Authors: Chenyang Gao, Yue Gu, Francesco Caliva, Yuzong Liu

    Abstract: Self-supervised speech representation learning (S3RL) is revolutionizing the way we leverage the ever-growing availability of data. While S3RL related studies typically use large models, we employ light-weight networks to comply with tight memory of compute-constrained devices. We demonstrate the effectiveness of S3RL on a keyword-spotting (KS) problem by using transformers with 330k parameters an…

    Submitted 7 March, 2023; originally announced March 2023.

  26. arXiv:2302.13222  [pdf, other]

    cs.CL cs.SD eess.AS

    Speech Corpora Divergence Based Unsupervised Data Selection for ASR

    Authors: Changfeng Gao, Gaofeng Cheng, Pengyuan Zhang, Yonghong Yan

    Abstract: Selecting application scenarios matching data is important for the automatic speech recognition (ASR) training, but it is difficult to measure the matching degree of the training corpus. This study proposes an unsupervised target-aware data selection method based on speech corpora divergence (SCD), which can measure the similarity between two speech corpora. We first use the self-supervised Hubert…

    Submitted 25 February, 2023; originally announced February 2023.

  27. arXiv:2209.09244  [pdf, other]

    eess.IV cs.CV cs.LG

    Flexible Neural Image Compression via Code Editing

    Authors: Chenjian Gao, Tongda Xu, Dailan He, Hongwei Qin, Yan Wang

    Abstract: Neural image compression (NIC) has outperformed traditional image codecs in rate-distortion (R-D) performance. However, it usually requires a dedicated encoder-decoder pair for each point on R-D curve, which greatly hinders its practical deployment. While some recent works have enabled bitrate control via conditional coding, they impose strong prior during training and provide limited flexibility.…

    Submitted 19 September, 2022; originally announced September 2022.

    Comments: NeurIPS 2022

  28. arXiv:2208.00693  [pdf, other]

    cs.AR cs.SD eess.AS

    A 23 $μ$W Keyword Spotting IC with Ring-Oscillator-Based Time-Domain Feature Extraction

    Authors: Kwantae Kim, Chang Gao, Rui Graça, Ilya Kiselev, Hoi-Jun Yoo, Tobi Delbruck, Shih-Chii Liu

    Abstract: This article presents the first keyword spotting (KWS) IC which uses a ring-oscillator-based time-domain processing technique for its analog feature extractor (FEx). Its extensive usage of time-encoding schemes allows the analog audio signal to be processed in a fully time-domain manner except for the voltage-to-time conversion stage of the analog front-end. Benefiting from fundamental building bl…

    Submitted 1 August, 2022; originally announced August 2022.

    Comments: 14 pages, 21 figures, 2 tables

  29. arXiv:2206.07219  [pdf, ps, other]

    eess.IV cs.AI cs.CV cs.LG

    A Projection-Based K-space Transformer Network for Undersampled Radial MRI Reconstruction with Limited Training Subjects

    Authors: Chang Gao, Shu-Fu Shih, J. Paul Finn, Xiaodong Zhong

    Abstract: The recent development of deep learning combined with compressed sensing enables fast reconstruction of undersampled MR images and has achieved state-of-the-art performance for Cartesian k-space trajectories. However, non-Cartesian trajectories such as the radial trajectory need to be transformed onto a Cartesian grid in each iteration of the network training, slowing down the training process and…

    Submitted 25 July, 2022; v1 submitted 14 June, 2022; originally announced June 2022.

    Comments: Accepted at MICCAI 2022

  30. arXiv:2206.06127  [pdf, other]

    eess.IV cs.CV cs.LG

    SyntheX: Scaling Up Learning-based X-ray Image Analysis Through In Silico Experiments

    Authors: Cong Gao, Benjamin D. Killeen, Yicheng Hu, Robert B. Grupp, Russell H. Taylor, Mehran Armand, Mathias Unberath

    Abstract: Artificial intelligence (AI) now enables automated interpretation of medical images for clinical use. However, AI's potential use for interventional images (versus those involved in triage or diagnosis), such as for guidance during surgery, remains largely untapped. This is because surgical AI systems are currently trained using post hoc analysis of data collected during live surgeries, which has…

    Submitted 13 June, 2022; originally announced June 2022.

  31. arXiv:2205.14501  [pdf, other]

    eess.IV

    PO-ELIC: Perception-Oriented Efficient Learned Image Coding

    Authors: Dailan He, Ziming Yang, Hongjiu Yu, Tongda Xu, Jixiang Luo, Yuan Chen, Chenjian Gao, Xinjie Shi, Hongwei Qin, Yan Wang

    Abstract: In the past years, learned image compression (LIC) has achieved remarkable performance. The recent LIC methods outperform VVC in both PSNR and MS-SSIM. However, the low bit-rate reconstructions of LIC suffer from artifacts such as blurring, color drifting and texture missing. Moreover, those varied artifacts make image quality metrics correlate badly with human perceptual quality. In this paper, w…

    Submitted 28 May, 2022; originally announced May 2022.

    Comments: CVPR2022 Workshop, 5-th CLIC Image Compression Track

  32. Two-Stream Graph Convolutional Network for Intra-oral Scanner Image Segmentation

    Authors: Yue Zhao, Lingming Zhang, Yang Liu, Deyu Meng, Zhiming Cui, Chenqiang Gao, Xinbo Gao, Chunfeng Lian, Dinggang Shen

    Abstract: Precise segmentation of teeth from intra-oral scanner images is an essential task in computer-aided orthodontic surgical planning. The state-of-the-art deep learning-based methods often simply concatenate the raw geometric attributes (i.e., coordinates and normal vectors) of mesh cells to train a single-stream network for automatic intra-oral scanner image segmentation. However, since different ra…

    Submitted 19 April, 2022; originally announced April 2022.

    Comments: 11 pages, 6 figures. arXiv admin note: text overlap with arXiv:2012.13697

    Journal ref: IEEE Transactions on Medical Imaging, 41(4): 826-835, 2022

  33. MS-HLMO: Multi-scale Histogram of Local Main Orientation for Remote Sensing Image Registration

    Authors: Chenzhong Gao, Wei Li, Ran Tao, Qian Du

    Abstract: Multi-source image registration is challenging due to intensity, rotation, and scale differences among the images. Considering the characteristics and differences of multi-source remote sensing images, a feature-based registration algorithm named Multi-scale Histogram of Local Main Orientation (MS-HLMO) is proposed. Harris corner detection is first adopted to generate feature points. The HLMO feat…

    Submitted 1 April, 2022; originally announced April 2022.

  34. arXiv:2202.10916  [pdf, other]

    cs.LG cs.AI eess.SP

    Remaining Useful Life Prediction Using Temporal Deep Degradation Network for Complex Machinery with Attention-based Feature Extraction

    Authors: Yuwen Qin, Ningbo Cai, Chen Gao, Yadong Zhang, Yonghong Cheng, Xin Chen

    Abstract: The precise estimate of remaining useful life (RUL) is vital for the prognostic analysis and predictive maintenance that can significantly reduce failure rate and maintenance costs. The degradation-related features extracted from the sensor streaming data with neural networks can dramatically improve the accuracy of the RUL prediction. The Temporal deep degradation network (TDDN) model is proposed…

    Submitted 21 February, 2022; originally announced February 2022.

  35. arXiv:2202.06707  [pdf, other]

    eess.SP cs.CV cs.LG cs.SD eess.AS eess.SY

    Spiking Cochlea with System-level Local Automatic Gain Control

    Authors: Ilya Kiselev, Chang Gao, Shih-Chii Liu

    Abstract: Including local automatic gain control (AGC) circuitry into a silicon cochlea design has been challenging because of transistor mismatch and model complexity. To address this, we present an alternative system-level algorithm that implements channel-specific AGC in a silicon spiking cochlea by measuring the output spike activity of individual channels. The bandpass filter gain of a channel is adapt…

    Submitted 14 February, 2022; originally announced February 2022.

    Comments: Accepted for publication at the IEEE Transactions on Circuits and Systems I - Regular Papers, 2022

  36. arXiv:2112.12522  [pdf, other]

    cs.SD cs.CL eess.AS

    Multi-Variant Consistency based Self-supervised Learning for Robust Automatic Speech Recognition

    Authors: Changfeng Gao, Gaofeng Cheng, Pengyuan Zhang

    Abstract: Automatic speech recognition (ASR) has shown rapid advances in recent years but still degrades significantly in far-field and noisy environments. The recent development of self-supervised learning (SSL) technology can improve the ASR performance by pre-training the model with additional unlabeled speech and the SSL pre-trained model has achieved the state-of-the-art result on several speech benchm…

    Submitted 4 May, 2022; v1 submitted 23 December, 2021; originally announced December 2021.

    Comments: 6 pages, 3 figures

  37. arXiv:2112.01766  [pdf, other]

    cs.CV eess.IV

    Unsupervised Low-Light Image Enhancement via Histogram Equalization Prior

    Authors: Feng Zhang, Yuanjie Shao, Yishi Sun, Kai Zhu, Changxin Gao, Nong Sang

    Abstract: Deep learning-based methods for low-light image enhancement typically require enormous paired training data, which are impractical to capture in real-world scenarios. Recently, unsupervised approaches have been explored to eliminate the reliance on paired training data. However, they perform erratically in diverse real-world scenarios due to the absence of priors. To address this issue, we propose…

    Submitted 3 December, 2021; originally announced December 2021.

    Comments: submitted to IEEE Transactions on Image Processing

  38. arXiv:2110.14484  [pdf]

    eess.IV cs.CV cs.LG

    PL-Net: Progressive Learning Network for Medical Image Segmentation

    Authors: Junlong Cheng, Chengrui Gao, Hongchun Lu, Zhangqiang Ming, Yong Yang, Min Zhu

    Abstract: In recent years, segmentation methods based on deep convolutional neural networks (CNNs) have made state-of-the-art achievements for many medical analysis tasks. However, most of these approaches improve performance by optimizing the structure or adding new functional modules of the U-Net, which ignores the complementation and fusion of the coarse-grained and fine-grained semantic information. To…

    Submitted 29 August, 2022; v1 submitted 27 October, 2021; originally announced October 2021.

  39. arXiv:2110.10593  [pdf]

    cs.SD cs.LG eess.AS

    Progressive Learning for Stabilizing Label Selection in Speech Separation with Mapping-based Method

    Authors: Chenyang Gao, Yue Gu, Ivan Marsic

    Abstract: Speech separation has been studied in time domain because of lower latency and higher performance compared to time-frequency domain. The masking-based method has been mostly used in time domain, and the other common method (mapping-based) has been inadequately studied. We investigate the use of the mapping-based method in the time domain and show that it can perform better on a large training set…

    Submitted 21 March, 2022; v1 submitted 20 October, 2021; originally announced October 2021.

    Comments: Submitted to Interspeech 2022

  40. arXiv:2108.06054  [pdf, other]

    eess.IV

    Local Patch Network with Global Attention for Infrared Small Target Detection

    Authors: Fang Chen, Chenqiang Gao, Fangcen Liu, Yue Zhao, Yuxi Zhou, Deyu Meng, Wangmeng Zuo

    Abstract: Infrared small target detection plays an important role in the infrared search and tracking applications. In recent years, deep learning techniques were introduced to this task and achieved noteworthy effects. Following general object segmentation methods, existing deep learning methods usually processed the image from the global view. However, the imaging locality of small targets and extreme cla…

    Submitted 29 September, 2021; v1 submitted 13 August, 2021; originally announced August 2021.

    Comments: 11 pages, 7 figures

  41. arXiv:2107.05254  [pdf, ps, other]

    eess.SP

    Asymptotic analysis of V-BLAST MIMO for coherent optical wireless communications in Gamma-Gamma turbulence

    Authors: Yiming Li, Chao Gao, Mark S. Leeson, Xiaofeng Li

    Abstract: This paper investigates the asymptotic BER performance of coherent optical wireless communication systems in Gamma-Gamma turbulence when applying the V-BLAST MIMO scheme. A new method is proposed to quantify the performance of the system and mathematical solutions for asymptotic BER performance are derived. Counterintuitive results are shown since the diversity gain of the V-BLAST MIMO system is e…

    Submitted 12 July, 2021; originally announced July 2021.

  42. arXiv:2104.12572  [pdf, other]

    eess.IV

    Multi-scale PIIFD for Registration of Multi-source Remote Sensing Images

    Authors: Chenzhong Gao, Wei Li

    Abstract: This paper aims at providing multi-source remote sensing images registered in geometric space for image fusion. Focusing on the characteristics and differences of multi-source remote sensing images, a feature-based registration algorithm is implemented. The key technologies include image scale-space for implementing multi-scale properties, Harris corner detection for keypoints extraction, and part…

    Submitted 26 April, 2021; originally announced April 2021.

  43. arXiv:2009.02528  [pdf, other]

    stat.AP eess.SP

    Structured Sparsity Modeling for Improved Multivariate Statistical Analysis based Fault Isolation

    Authors: Wei Chen, Jiusun Zeng, Xiaobin Xu, Shihua Luo, Chuanhou Gao

    Abstract: In order to improve the fault diagnosis capability of multivariate statistical methods, this article introduces a fault isolation framework based on structured sparsity modeling. The developed method relies on the reconstruction based contribution analysis and the process structure information can be incorporated into the reconstruction objective function in the form of structured sparsity regular…

    Submitted 21 December, 2020; v1 submitted 5 September, 2020; originally announced September 2020.

    Comments: 36 pages, 12 figures

  44. A Learning-based Method for Online Adjustment of C-arm Cone-Beam CT Source Trajectories for Artifact Avoidance

    Authors: Mareike Thies, Jan-Nico Zäch, Cong Gao, Russell Taylor, Nassir Navab, Andreas Maier, Mathias Unberath

    Abstract: During spinal fusion surgery, screws are placed close to critical nerves suggesting the need for highly accurate screw placement. Verifying screw placement on high-quality tomographic imaging is essential. C-arm Cone-beam CT (CBCT) provides intraoperative 3D tomographic imaging which would allow for immediate verification and, if needed, revision. However, the reconstruction quality attainable wit…

    Submitted 14 August, 2020; originally announced August 2020.

    Comments: 12 pages

    Journal ref: Int. J. CARS 15 (2020) 1787-1796

  45. arXiv:2006.01435  [pdf, other]

    cs.CV eess.IV

    Recapture as You Want

    Authors: Chen Gao, Si Liu, Ran He, Shuicheng Yan, Bo Li

    Abstract: With the increasing prevalence and more powerful camera systems of mobile devices, people can conveniently take photos in their daily life, which naturally brings the demand for more intelligent photo post-processing techniques, especially on those portrait photos. In this paper, we present a portrait recapture method enabling users to easily edit their portrait to desired posture/view, body figur…

    Submitted 2 June, 2020; originally announced June 2020.

    Comments: 14 pages

  46. arXiv:2002.03197  [pdf, other]

    eess.SY cs.RO eess.SP

    Recurrent Neural Network Control of a Hybrid Dynamic Transfemoral Prosthesis with EdgeDRNN Accelerator

    Authors: Chang Gao, Rachel Gehlhar, Aaron D. Ames, Shih-Chii Liu, Tobi Delbruck

    Abstract: Lower leg prostheses could improve the life quality of amputees by increasing comfort and reducing energy to locomote, but currently control methods are limited in modulating behaviors based upon the human's experience. This paper describes the first steps toward learning complex controllers for dynamical robotic assistive devices. We provide the first example of behavioral cloning to control a po…

    Submitted 28 July, 2020; v1 submitted 8 February, 2020; originally announced February 2020.

    Comments: Accepted at 2020 International Conference on Robotics and Automation (ICRA 2020)

    Journal ref: 2020 IEEE International Conference on Robotics and Automation (ICRA)

  47. arXiv:2001.08290  [pdf, other]

    eess.AS cs.LG cs.NE cs.SD stat.ML

    Transformer-based Online CTC/attention End-to-End Speech Recognition Architecture

    Authors: Haoran Miao, Gaofeng Cheng, Changfeng Gao, Pengyuan Zhang, Yonghong Yan

    Abstract: Recently, Transformer has gained success in automatic speech recognition (ASR) field. However, it is challenging to deploy a Transformer-based end-to-end (E2E) model for online speech recognition. In this paper, we propose the Transformer-based online CTC/attention E2E ASR architecture, which contains the chunk self-attention encoder (chunk-SAE) and the monotonic truncated attention (MTA) based se…

    Submitted 11 February, 2020; v1 submitted 15 January, 2020; originally announced January 2020.

    Comments: Accepted by ICASSP 2020

  48. EdgeDRNN: Enabling Low-latency Recurrent Neural Network Edge Inference

    Authors: Chang Gao, Antonio Rios-Navarro, Xi Chen, Tobi Delbruck, Shih-Chii Liu

    Abstract: This paper presents a Gated Recurrent Unit (GRU) based recurrent neural network (RNN) accelerator called EdgeDRNN designed for portable edge computing. EdgeDRNN adopts the spiking neural network inspired delta network algorithm to exploit temporal sparsity in RNNs. It reduces off-chip memory access by a factor of up to 10x with tolerable accuracy loss. Experimental results on a 10 million paramete…

    Submitted 28 July, 2020; v1 submitted 22 December, 2019; originally announced December 2019.

    Comments: This paper has been accepted for publication at the IEEE International Conference on Artificial Intelligence Circuits and Systems (AICAS), Genoa, 2020

    Journal ref: 2020 2nd IEEE International Conference on Artificial Intelligence Circuits and Systems (AICAS)

  49. arXiv:1912.02037  [pdf, other]

    cs.CV eess.IV

    AdversarialNAS: Adversarial Neural Architecture Search for GANs

    Authors: Chen Gao, Yunpeng Chen, Si Liu, Zhenxiong Tan, Shuicheng Yan

    Abstract: Neural Architecture Search (NAS) that aims to automate the procedure of architecture design has achieved promising results in many computer vision fields. In this paper, we propose an AdversarialNAS method specially tailored for Generative Adversarial Networks (GANs) to search for a superior generative model on the task of unconditional image generation. The AdversarialNAS is the first method that…

    Submitted 8 April, 2020; v1 submitted 4 December, 2019; originally announced December 2019.

    Comments: Accepted to CVPR 2020

  50. Automatic Annotation of Hip Anatomy in Fluoroscopy for Robust and Efficient 2D/3D Registration

    Authors: Robert Grupp, Mathias Unberath, Cong Gao, Rachel Hegeman, Ryan Murphy, Clayton Alexander, Yoshito Otake, Benjamin McArthur, Mehran Armand, Russell Taylor

    Abstract: Fluoroscopy is the standard imaging modality used to guide hip surgery and is therefore a natural sensor for computer-assisted navigation. In order to efficiently solve the complex registration problems presented during navigation, human-assisted annotations of the intraoperative image are typically required. This manual initialization interferes with the surgical workflow and diminishes any advan…

    Submitted 18 March, 2020; v1 submitted 16 November, 2019; originally announced November 2019.

    Comments: Revised article to address reviewer comments. Accepted to IPCAI 2020. Supplementary video at https://youtu.be/5AwGlNkcp9o and dataset/code at https://github.com/rg2/DeepFluoroLabeling-IPCAI2020

    Journal ref: International Journal of Computer Assisted Radiology and Surgery 15 (2020) 759-769