
Showing 1–50 of 52 results for author: Long, Y

Searching in archive eess.
  1. arXiv:2406.16967  [pdf, other]

    eess.SP eess.SY

    Remaining useful life prediction of rolling bearings based on refined composite multi-scale attention entropy and dispersion entropy

    Authors: Yunchong Long, Qinkang Pang, Guangjie Zhu, Junxian Cheng, Xiangshun Li

    Abstract: Remaining useful life (RUL) prediction based on vibration signals is crucial for ensuring the safe operation and effective health management of rotating machinery. Existing studies often extract health indicators (HI) from time domain and frequency domain features to analyze complex vibration signals, but these features may not accurately capture the degradation process. In this study, we propose…

    Submitted 22 June, 2024; originally announced June 2024.

    Comments: 12 pages, 9 figures

  2. arXiv:2404.15339  [pdf, other]

    eess.IV

    Efficient EndoNeRF Reconstruction and Its Application for Data-driven Surgical Simulation

    Authors: Yuehao Wang, Bingchen Gong, Yonghao Long, Siu Hin Fan, Qi Dou

    Abstract: The healthcare industry has a growing need for realistic modeling and efficient simulation of surgical scenes. With effective models of deformable surgical scenes, clinicians are able to conduct surgical planning and surgery training on scenarios close to real-world cases. However, a significant challenge in achieving such a goal is the scarcity of high-quality soft tissue models with accurate sha…

    Submitted 10 April, 2024; originally announced April 2024.

    Comments: 14 pages, 4 figures. Accepted by International Journal of Computer Assisted Radiology and Surgery

  3. arXiv:2401.03623  [pdf]

    eess.IV

    A Video Coding Method Based on Neural Network for CLIC2024

    Authors: Zhengang Li, Jingchi Zhang, Yonghua Wang, Xing Zeng, Zhen Zhang, Yunlin Long, Menghu Jia, Ning Wang

    Abstract: This paper presents a video coding scheme that combines traditional optimization methods with deep learning methods based on the Enhanced Compression Model (ECM). In this paper, the traditional optimization methods adaptively adjust the quantization parameter (QP). The key frame QP offset is set according to the video content characteristics, and the coding tree unit (CTU) level QP of all frames i…

    Submitted 7 January, 2024; originally announced January 2024.

  4. arXiv:2311.12071  [pdf, other]

    eess.IV cs.CV cs.LG

    Enhancing Low-dose CT Image Reconstruction by Integrating Supervised and Unsupervised Learning

    Authors: Ling Chen, Zhishen Huang, Yong Long, Saiprasad Ravishankar

    Abstract: Traditional model-based image reconstruction (MBIR) methods combine forward and noise models with simple object priors. Recent application of deep learning methods for image reconstruction provides a successful data-driven approach to addressing the challenges when reconstructing images with undersampled measurements or various types of noise. In this work, we propose a hybrid supervised-unsupervi…

    Submitted 19 November, 2023; originally announced November 2023.

    Comments: submitted to IEEE Transactions on Medical Imaging

  5. arXiv:2311.08829  [pdf, other]

    cs.SD eess.AS

    Autoencoder with Group-based Decoder and Multi-task Optimization for Anomalous Sound Detection

    Authors: Yifan Zhou, Dongxing Xu, Haoran Wei, Yanhua Long

    Abstract: In industry, machine anomalous sound detection (ASD) is in great demand. However, collecting enough abnormal samples is difficult due to the high cost, which boosts the rapid development of unsupervised ASD algorithms. Autoencoder (AE) based methods have been widely used for unsupervised ASD, but suffer from problems including 'shortcut', poor anti-noise ability and sub-optimal quality of features…

    Submitted 15 November, 2023; originally announced November 2023.

    Comments: Submitted to the 2024 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2024)

  6. arXiv:2308.12526  [pdf, other]

    eess.AS cs.LG cs.SD

    UNISOUND System for VoxCeleb Speaker Recognition Challenge 2023

    Authors: Yu Zheng, Yajun Zhang, Chuanying Niu, Yibin Zhan, Yanhua Long, Dongxing Xu

    Abstract: This report describes the UNISOUND submission for Track1 and Track2 of VoxCeleb Speaker Recognition Challenge 2023 (VoxSRC 2023). We submit the same system on Track 1 and Track 2, which is trained with only VoxCeleb2-dev. Large-scale ResNet and RepVGG architectures are developed for the challenge. We propose a consistency-aware score calibration method, which leverages the stability of audio voice…

    Submitted 23 August, 2023; originally announced August 2023.

  7. arXiv:2306.11309  [pdf, other]

    cs.SD cs.CL eess.AS eess.SP

    Multi-pass Training and Cross-information Fusion for Low-resource End-to-end Accented Speech Recognition

    Authors: Xuefei Wang, Yanhua Long, Yijie Li, Haoran Wei

    Abstract: Low-resource accented speech recognition is one of the important challenges faced by current ASR technology in practical applications. In this study, we propose a Conformer-based architecture, called Aformer, to leverage both the acoustic information from large non-accented and limited accented training data. Specifically, a general encoder and an accent encoder are designed in the Aformer to extr…

    Submitted 20 June, 2023; originally announced June 2023.

  8. arXiv:2305.10055  [pdf, other]

    cs.IT eess.SP

    Optimized Joint Beamforming for Wireless Powered Over-the-Air Computation

    Authors: Siyao Zhang, Xinmin Li, Yin Long, Jie Xu, Shuguang Cui

    Abstract: This correspondence studies the wireless powered over-the-air computation (AirComp) for achieving sustainable wireless data aggregation (WDA) by integrating AirComp and wireless power transfer (WPT) into a joint design. In particular, we consider that a multi-antenna hybrid access point (HAP) employs the transmit energy beamforming to charge multiple single-antenna low-power wireless devices (WDs)…

    Submitted 17 May, 2023; originally announced May 2023.

    Comments: 3 figures

  9. arXiv:2303.02388  [pdf, other]

    cs.CV eess.IV

    Graph-based Representation for Image based on Granular-ball

    Authors: Xia Shuyin, Dai Dawei, Yang Long, Zhany Li, Lan Danf, Zhu hao, Wang Guoy

    Abstract: Current image processing methods usually operate on the finest-granularity unit; that is, the pixel, which leads to challenges in terms of efficiency, robustness, and understandability in deep learning models. We present an improved granular-ball computing method to represent the image as a graph, in which each node expresses a structural block in the image and each edge represents the association…

    Submitted 4 March, 2023; originally announced March 2023.

    Comments: 9 pages, 5 figures

  10. arXiv:2301.10056  [pdf]

    cs.CR cs.CV cs.MM cs.SD eess.AS

    Side Eye: Characterizing the Limits of POV Acoustic Eavesdropping from Smartphone Cameras with Rolling Shutters and Movable Lenses

    Authors: Yan Long, Pirouz Naghavi, Blas Kojusner, Kevin Butler, Sara Rampazzi, Kevin Fu

    Abstract: Our research discovers how the rolling shutter and movable lens structures widely found in smartphone cameras modulate structure-borne sounds onto camera images, creating a point-of-view (POV) optical-acoustic side channel for acoustic eavesdropping. The movement of smartphone camera hardware leaks acoustic information because images unwittingly modulate ambient sound as imperceptible distortions.…

    Submitted 26 January, 2023; v1 submitted 24 January, 2023; originally announced January 2023.

    Journal ref: 2023 IEEE Symposium on Security and Privacy

  11. arXiv:2211.12097  [pdf, other]

    eess.AS

    Dynamic Acoustic Compensation and Adaptive Focal Training for Personalized Speech Enhancement

    Authors: Xiaofeng Ge, Jiangyu Han, Haixin Guan, Yanhua Long

    Abstract: Recently, more and more personalized speech enhancement systems (PSE) with excellent performance have been proposed. However, two critical issues still limit the performance and generalization ability of the model: 1) Acoustic environment mismatch between the test noisy speech and target speaker enrollment speech; 2) Hard sample mining and learning. In this paper, dynamic acoustic compensation (DA…

    Submitted 22 November, 2022; originally announced November 2022.

  12. arXiv:2211.01571  [pdf, other]

    eess.AS cs.SD

    Phonetic-assisted Multi-Target Units Modeling for Improving Conformer-Transducer ASR system

    Authors: Li Li, Dongxing Xu, Haoran Wei, Yanhua Long

    Abstract: Exploiting effective target modeling units is very important and has always been a concern in end-to-end automatic speech recognition (ASR). In this work, we propose a phonetic-assisted multi target units (PMU) modeling approach, to enhance the Conformer-Transducer ASR system in a progressive representation learning manner. Specifically, PMU first uses the pronunciation-assisted subword modeling (…

    Submitted 7 July, 2023; v1 submitted 2 November, 2022; originally announced November 2022.

    Comments: Accepted by Interspeech 2023

  13. arXiv:2211.01266  [pdf, other]

    cs.LG cs.AI eess.SY

    Knowing the Past to Predict the Future: Reinforcement Virtual Learning

    Authors: Peng Zhang, Yawen Huang, Bingzhang Hu, Shizheng Wang, Haoran Duan, Noura Al Moubayed, Yefeng Zheng, Yang Long

    Abstract: Reinforcement Learning (RL)-based control system has received considerable attention in recent decades. However, in many real-world problems, such as Batch Process Control, the environment is uncertain, which requires expensive interaction to acquire the state and reward values. In this paper, we present a cost-efficient framework, such that the RL model can evolve for itself in a Virtual Space us…

    Submitted 2 November, 2022; originally announced November 2022.

  14. arXiv:2210.17189   

    eess.AS cs.SD

    DiaCorrect: End-to-end error correction for speaker diarization

    Authors: Jiangyu Han, Yuhang Cao, Heng Lu, Yanhua Long

    Abstract: In recent years, speaker diarization has attracted widespread attention. To achieve better performance, some studies propose to diarize speech in multiple stages. Although these methods might bring additional benefits, most of them are quite complex. Motivated by spelling correction in automatic speech recognition (ASR), in this paper, we propose an end-to-end error correction framework, termed Di…

    Submitted 18 September, 2023; v1 submitted 31 October, 2022; originally announced October 2022.

    Comments: This paper has been superseded by arXiv:2309.08377 (merged from arXiv:2210.17189)

  15. arXiv:2205.09587  [pdf, other]

    eess.IV

    Combining Deep Learning and Adaptive Sparse Modeling for Low-dose CT Reconstruction

    Authors: Ling Chen, Zhishen Huang, Yong Long, Saiprasad Ravishankar

    Abstract: Traditional model-based image reconstruction (MBIR) methods combine forward and noise models with simple object priors. Recent application of deep learning methods for image reconstruction provides a successful data-driven approach to addressing the challenges when reconstructing images with measurement undersampling or various types of noise. In this work, we propose a hybrid supervised-unsupervi…

    Submitted 19 May, 2022; originally announced May 2022.

  16. arXiv:2205.04821  [pdf, other]

    eess.IV cs.CV

    Self-supervised regression learning using domain knowledge: Applications to improving self-supervised denoising in imaging

    Authors: Il Yong Chun, Dongwon Park, Xuehang Zheng, Se Young Chun, Yong Long

    Abstract: Regression that predicts continuous quantity is a central part of applications using computational imaging and computer vision technologies. Yet, studying and understanding self-supervised learning for regression tasks - except for a particular regression task, image denoising - have lagged behind. This paper proposes a general self-supervised regression learning (SSRL) framework that enables lear…

    Submitted 10 May, 2022; originally announced May 2022.

    Comments: 17 pages, 16 figures, 2 tables, submitted to IEEE T-IP

  17. arXiv:2204.11032  [pdf, other]

    eess.AS cs.SD

    Heterogeneous Separation Consistency Training for Adaptation of Unsupervised Speech Separation

    Authors: Jiangyu Han, Yanhua Long

    Abstract: Recently, supervised speech separation has made great progress. However, limited by the nature of supervised training, most existing separation methods require ground-truth sources and are trained on synthetic datasets. This ground-truth reliance is problematic, because the ground-truth signals are usually unavailable in real conditions. Moreover, in many industry scenarios, the real acoustic char…

    Submitted 6 August, 2022; v1 submitted 23 April, 2022; originally announced April 2022.

  18. arXiv:2203.11565  [pdf, other]

    eess.IV cs.CV

    Multi-layer Clustering-based Residual Sparsifying Transform for Low-dose CT Image Reconstruction

    Authors: Xikai Yang, Zhishen Huang, Yong Long, Saiprasad Ravishankar

    Abstract: The recently proposed sparsifying transform models incur low computational cost and have been applied to medical imaging. Meanwhile, deep models with nested network structure reveal great potential for learning features in different layers. In this study, we propose a network-structured sparsifying transform learning approach for X-ray computed tomography (CT), which we refer to as multi-layer clu…

    Submitted 22 March, 2022; originally announced March 2022.

    Comments: 19 pages, 12 figures, submitted to the Medical Physics

  19. arXiv:2203.02263  [pdf, other]

    eess.AS cs.SD

    PercepNet+: A Phase and SNR Aware PercepNet for Real-Time Speech Enhancement

    Authors: Xiaofeng Ge, Jiangyu Han, Yanhua Long, Haixin Guan

    Abstract: PercepNet, a recent extension of the RNNoise, an efficient, high-quality and real-time full-band speech enhancement technique, has shown promising performance in various public deep noise suppression tasks. This paper proposes a new approach, named PercepNet+, to further extend the PercepNet with four significant improvements. First, we introduce a phase-aware structure to leverage the phase infor…

    Submitted 4 March, 2022; originally announced March 2022.

    Comments: This article was submitted to Interspeech 2022

  20. arXiv:2203.02191  [pdf, other]

    eess.AS cs.SD

    Selective Pseudo-labeling and Class-wise Discriminative Fusion for Sound Event Detection

    Authors: Yunhao Liang, Yanhua Long, Yijie Li, Jiaen Liang

    Abstract: In recent years, exploring effective sound separation (SSep) techniques to improve overlapping sound event detection (SED) attracts more and more attention. Creating accurate separation signals to avoid the catastrophic error accumulation during SED model training is very important and challenging. In this study, we first propose a novel selective pseudo-labeling approach, termed SPL, to produce h…

    Submitted 4 March, 2022; originally announced March 2022.

    Comments: This article was submitted to Interspeech 2022

  21. Bi-level Volt/VAR Optimization in Distribution Networks with Smart PV Inverters

    Authors: Yao Long, Daniel S. Kirschen

    Abstract: Optimal Volt/VAR control (VVC) in distribution networks relies on an effective coordination between the conventional utility-owned mechanical devices and the smart residential photovoltaic (PV) inverters. Typically, a central controller carries out a periodic optimization and sends setpoints to the local controller of each device. However, instead of tracking centrally dispatched setpoints, smart…

    Submitted 13 January, 2022; originally announced January 2022.

  22. arXiv:2112.13520  [pdf, other]

    eess.AS

    DPCCN: Densely-Connected Pyramid Complex Convolutional Network for Robust Speech Separation And Extraction

    Authors: Jiangyu Han, Yanhua Long, Lukas Burget, Jan Cernocky

    Abstract: In recent years, a number of time-domain speech separation methods have been proposed. However, most of them are very sensitive to the environments and wide domain coverage tasks. In this paper, from the time-frequency domain perspective, we propose a densely-connected pyramid complex convolutional network, termed DPCCN, to improve the robustness of speech separation under complicated conditions.…

    Submitted 29 January, 2022; v1 submitted 27 December, 2021; originally announced December 2021.

    Comments: accepted by ICASSP 2022

  23. Adaptive Coalition Formation-Based Coordinated Voltage Regulation in Distribution Networks

    Authors: Yao Long, Ryan T. Elliott, Daniel S. Kirschen

    Abstract: High penetrations of photovoltaic (PV) systems can cause severe voltage quality problems in distribution networks. This paper proposes a distributed control strategy based on the dynamic formation of coalitions to coordinate a large number of PV inverters for voltage regulation. In this strategy, a rule-based coalition formation scheme deals with the zonal voltage difference caused by the uneven i…

    Submitted 28 October, 2021; originally announced October 2021.

  24. arXiv:2110.03912  [pdf, other]

    cs.CV cs.AI eess.IV

    Stereo Dense Scene Reconstruction and Accurate Localization for Learning-Based Navigation of Laparoscope in Minimally Invasive Surgery

    Authors: Ruofeng Wei, Bin Li, Hangjie Mo, Bo Lu, Yonghao Long, Bohan Yang, Qi Dou, Yunhui Liu, Dong Sun

    Abstract: Objective: The computation of anatomical information and laparoscope position is a fundamental block of surgical navigation in Minimally Invasive Surgery (MIS). Recovering a dense 3D structure of surgical scene using visual cues remains a challenge, and the online laparoscopic tracking primarily relies on external sensors, which increases system complexity. Methods: Here, we propose a learning-dri…

    Submitted 27 November, 2022; v1 submitted 8 October, 2021; originally announced October 2021.

    Journal ref: IEEE Transactions on Biomedical Engineering 2022

  25. arXiv:2109.14956  [pdf]

    eess.IV cs.CV cs.LG

    Comparative Validation of Machine Learning Algorithms for Surgical Workflow and Skill Analysis with the HeiChole Benchmark

    Authors: Martin Wagner, Beat-Peter Müller-Stich, Anna Kisilenko, Duc Tran, Patrick Heger, Lars Mündermann, David M Lubotsky, Benjamin Müller, Tornike Davitashvili, Manuela Capek, Annika Reinke, Tong Yu, Armine Vardazaryan, Chinedu Innocent Nwoye, Nicolas Padoy, Xinyang Liu, Eung-Joo Lee, Constantin Disch, Hans Meine, Tong Xia, Fucang Jia, Satoshi Kondo, Wolfgang Reiter, Yueming Jin, Yonghao Long, et al. (16 additional authors not shown)

    Abstract: PURPOSE: Surgical workflow and skill analysis are key technologies for the next generation of cognitive surgical assistance systems. These systems could increase the safety of the operation through context-sensitive warnings and semi-autonomous robotic assistance or improve training of surgeons via data-driven feedback. In surgical workflow analysis up to 91% average precision has been reported fo…

    Submitted 30 September, 2021; originally announced September 2021.

  26. arXiv:2108.01997  [pdf, other]

    eess.IV cs.CV cs.LG

    DuCN: Dual-children Network for Medical Diagnosis and Similar Case Recommendation towards COVID-19

    Authors: Chengtao Peng, Yunfei Long, Senhua Zhu, Dandan Tu, Bin Li

    Abstract: Early detection of the coronavirus disease 2019 (COVID-19) helps to treat patients timely and increase the cure rate, thus further suppressing the spread of the disease. In this study, we propose a novel deep learning based detection and similar case recommendation network to help control the epidemic. Our proposed network contains two stages: the first one is a lung region segmentation step and i…

    Submitted 3 August, 2021; originally announced August 2021.

  27. arXiv:2106.07564  [pdf]

    cs.CV cs.LG eess.IV

    An optimized Capsule-LSTM model for facial expression recognition with video sequences

    Authors: Siwei Liu, Yuanpeng Long, Gao Xu, Lijia Yang, Shimei Xu, Xiaoming Yao, Kunxian Shu

    Abstract: To overcome the limitations of convolutional neural network in the process of facial expression recognition, a facial expression recognition model Capsule-LSTM based on video frame sequence is proposed. This model is composed of three networks including capsule encoders, capsule decoders and LSTM network. The capsule encoder extracts the spatial information of facial expressions in video frames. Ca…

    Submitted 27 May, 2021; originally announced June 2021.

    Comments: 14 pages, 4 figures

  28. arXiv:2106.07563  [pdf]

    cs.CV cs.LG eess.IV

    BPLF: A Bi-Parallel Linear Flow Model for Facial Expression Generation from Emotion Set Images

    Authors: Gao Xu, Yuanpeng Long, Siwei Liu, Lijia Yang, Shimei Xu, Xiaoming Yao, Kunxian Shu

    Abstract: The flow-based generative model is a deep learning generative model, which obtains the ability to generate data by explicitly learning the data distribution. Theoretically its ability to restore data is stronger than other generative models. However, its implementation has many limitations, including limited model design, too many model parameters and tedious calculation. In this paper, a bi-paral…

    Submitted 27 May, 2021; originally announced June 2021.

    Comments: 20 pages, 10 figures

  29. arXiv:2106.03113  [pdf, other]

    eess.AS

    Improving Channel Decorrelation for Multi-Channel Target Speech Extraction

    Authors: Jiangyu Han, Wei Rao, Yannan Wang, Yanhua Long

    Abstract: Target speech extraction has attracted widespread attention. When microphone arrays are available, the additional spatial information can be helpful in extracting the target speech. We have recently proposed a channel decorrelation (CD) mechanism to extract the inter-channel differential information to enhance the reference channel encoder representation. Although the proposed mechanism has shown…

    Submitted 6 June, 2021; originally announced June 2021.

    Comments: accepted to Interspeech 2021. arXiv admin note: text overlap with arXiv:2010.09191

  30. arXiv:2103.14297  [pdf, other]

    eess.AS cs.SD

    CNN-based Discriminative Training for Domain Compensation in Acoustic Event Detection with Frame-wise Classifier

    Authors: Tiantian Tang, Xinyuan Zhou, Yanhua Long, Yijie Li, Jiaen Liang

    Abstract: Domain mismatch is a noteworthy issue in acoustic event detection tasks, as the target domain data is difficult to access in most real applications. In this study, we propose a novel CNN-based discriminative training framework as a domain compensation method to handle this issue. It uses a parallel CNN-based discriminator to learn a pair of high-level intermediate acoustic representations. Togethe…

    Submitted 26 March, 2021; originally announced March 2021.

  31. EfficientTDNN: Efficient Architecture Search for Speaker Recognition

    Authors: Rui Wang, Zhihua Wei, Haoran Duan, Shouling Ji, Yang Long, Zhen Hong

    Abstract: Convolutional neural networks (CNNs), such as the time-delay neural network (TDNN), have shown their remarkable capability in learning speaker embedding. However, they meanwhile bring a huge computational cost in storage size, processing, and memory. Discovering the specialized CNN that meets a specific constraint requires a substantial effort of human experts. Compared with hand-designed approach…

    Submitted 18 June, 2022; v1 submitted 24 March, 2021; originally announced March 2021.

    Comments: 13 pages, 12 figures, accepted to TASLP

  32. Joint framework with deep feature distillation and adaptive focal loss for weakly supervised audio tagging and acoustic event detection

    Authors: Yunhao Liang, Yanhua Long, Yijie Li, Jiaen Liang, Yuping Wang

    Abstract: A good joint training framework is very helpful to improve the performances of weakly supervised audio tagging (AT) and acoustic event detection (AED) simultaneously. In this study, we propose three methods to improve the best teacher-student framework in the IEEE AASP Challenge on Detection and Classification of Acoustic Scenes and Events (DCASE) 2019 Task 4 for both audio tagging and acoustic ev…

    Submitted 12 February, 2022; v1 submitted 23 March, 2021; originally announced March 2021.

    Comments: Updated, please refer to "https://sciencedirect.53yu.com/science/article/abs/pii/S105120042200063X"

  33. arXiv:2012.01986  [pdf, other]

    eess.IV cs.CV physics.med-ph

    An Improved Iterative Neural Network for High-Quality Image-Domain Material Decomposition in Dual-Energy CT

    Authors: Zhipeng Li, Yong Long, Il Yong Chun

    Abstract: Dual-energy computed tomography (DECT) has been widely used in many applications that need material decomposition. Image-domain methods directly decompose material images from high- and low-energy attenuation images, and thus, are susceptible to noise and artifacts on attenuation images. The purpose of this study is to develop an improved iterative neural network (INN) for high-quality image-domai…

    Submitted 21 January, 2022; v1 submitted 2 December, 2020; originally announced December 2020.

  34. arXiv:2011.00428  [pdf, other]

    eess.IV cs.CV cs.LG eess.SP

    Two-layer clustering-based sparsifying transform learning for low-dose CT reconstruction

    Authors: Xikai Yang, Yong Long, Saiprasad Ravishankar

    Abstract: Achieving high-quality reconstructions from low-dose computed tomography (LDCT) measurements is of much importance in clinical settings. Model-based image reconstruction methods have been proven to be effective in removing artifacts in LDCT. In this work, we propose an approach to learn a rich two-layer clustering-based sparsifying transform model (MCST2), where image patches and their subsequent…

    Submitted 1 November, 2020; originally announced November 2020.

    Comments: 5 pages, 3 figures, submitted to ISBI2021

  35. arXiv:2010.10923  [pdf, other]

    eess.AS cs.SD

    Attention-based scaling adaptation for target speech extraction

    Authors: Jiangyu Han, Wei Rao, Yanhua Long, Jiaen Liang

    Abstract: The target speech extraction has attracted widespread attention in recent years. In this work, we focus on investigating the dynamic interaction between different mixtures and the target speaker to exploit the discriminative target speaker clues. We propose a special attention mechanism without introducing any additional parameters in a scaling adaptation layer to better adapt the network towards…

    Submitted 18 October, 2021; v1 submitted 18 October, 2020; originally announced October 2020.

    Comments: 5 pages, 2 figures. Accepted by ASRU 2021

  36. arXiv:2010.09191  [pdf, other]

    eess.AS

    Multi-channel target speech extraction with channel decorrelation and target speaker adaptation

    Authors: Jiangyu Han, Xinyuan Zhou, Yanhua Long, Yijie Li

    Abstract: The end-to-end approaches for single-channel target speech extraction have attracted widespread attention. However, the studies for end-to-end multi-channel target speech extraction are still relatively limited. In this work, we propose two methods for exploiting the multi-channel spatial information to extract the target speech. The first one is using a target speech adaptation layer in a paralle…

    Submitted 21 October, 2020; v1 submitted 18 October, 2020; originally announced October 2020.

    Comments: 5 pages, 3 figures. Submitted to ICASSP 2021

  37. arXiv:2010.06144  [pdf, other]

    eess.IV cs.LG eess.SP

    Multi-layer Residual Sparsifying Transform (MARS) Model for Low-dose CT Image Reconstruction

    Authors: Xikai Yang, Yong Long, Saiprasad Ravishankar

    Abstract: Signal models based on sparse representations have received considerable attention in recent years. On the other hand, deep models consisting of a cascade of functional layers, commonly known as deep neural networks, have been highly successful for the task of object classification and have been recently introduced to image reconstruction. In this work, we develop a new image reconstruction approa…

    Submitted 28 May, 2021; v1 submitted 10 October, 2020; originally announced October 2020.

    Comments: 28 pages, 12 figures, accepted by Medical Physics. arXiv admin note: text overlap with arXiv:2005.03825

  38. Unified Supervised-Unsupervised (SUPER) Learning for X-ray CT Image Reconstruction

    Authors: Siqi Ye, Zhipeng Li, Michael T. McCann, Yong Long, Saiprasad Ravishankar

    Abstract: Traditional model-based image reconstruction (MBIR) methods combine forward and noise models with simple object priors. Recent machine learning methods for image reconstruction typically involve supervised learning or unsupervised learning, both of which have their advantages and disadvantages. In this work, we propose a unified supervised-unsupervised (SUPER) learning framework for X-ray computed…

    Submitted 8 April, 2021; v1 submitted 6 October, 2020; originally announced October 2020.

    Comments: 18 pages, 21 figures, submitted journal paper

    Journal ref: IEEE Transactions on Medical Imaging, vol. 40, no. 11, pp. 2986-3001, Nov. 2021

  39. arXiv:2007.13401  [pdf, ps, other]

    eess.SP

    IEEE 802.11be-Wi-Fi 7: New Challenges and Opportunities

    Authors: Cailian Deng, Xuming Fang, Xiao Han, Xianbin Wang, Li Yan, Rong He, Yan Long, Yuchen Guo

    Abstract: With the emergence of 4k/8k video, the throughput requirement of video delivery will keep growing to tens of Gbps. Other new high-throughput and low-latency video applications including augmented reality (AR), virtual reality (VR), and online gaming, are also proliferating. Due to the related stringent requirements, supporting these applications over wireless local area network (WLAN) is far beyond t…

    Submitted 3 August, 2020; v1 submitted 27 July, 2020; originally announced July 2020.

    Comments: Accepted for publication in IEEE Communications Surveys and Tutorials

  40. arXiv:2006.10414  [pdf, other]

    eess.AS cs.SD

    Multi-Encoder-Decoder Transformer for Code-Switching Speech Recognition

    Authors: Xinyuan Zhou, Emre Yılmaz, Yanhua Long, Yijie Li, Haizhou Li

    Abstract: Code-switching (CS) occurs when a speaker alternates words of two or more languages within a single sentence or across sentences. Automatic speech recognition (ASR) of CS speech has to deal with two or more languages at the same time. In this study, we propose a Transformer-based architecture with two symmetric language-specific encoders to capture the individual language attributes, that improve…

    Submitted 18 June, 2020; originally announced June 2020.

  41. arXiv:2006.10407  [pdf, other]

    eess.AS cs.SD

    Self-and-Mixed Attention Decoder with Deep Acoustic Structure for Transformer-based LVCSR

    Authors: Xinyuan Zhou, Grandee Lee, Emre Yılmaz, Yanhua Long, Jiaen Liang, Haizhou Li

    Abstract: The Transformer has shown impressive performance in automatic speech recognition. It uses the encoder-decoder structure with self-attention to learn the relationship between the high-level representation of the source inputs and embedding of the target outputs. In this paper, we propose a novel decoder structure that features a self-and-mixed attention decoder (SMAD) with a deep acoustic structure…

    Submitted 15 September, 2020; v1 submitted 18 June, 2020; originally announced June 2020.

    Comments: Accepted by INTERSPEECH 2020

  42. arXiv:2005.03825  [pdf, other]

    eess.IV cs.LG stat.ML

    Learned Multi-layer Residual Sparsifying Transform Model for Low-dose CT Reconstruction

    Authors: Xikai Yang, Xuehang Zheng, Yong Long, Saiprasad Ravishankar

    Abstract: Signal models based on sparse representation have received considerable attention in recent years. Compared to synthesis dictionary learning, sparsifying transform learning involves highly efficient sparse coding and operator update steps. In this work, we propose a Multi-layer Residual Sparsifying Transform (MRST) learning model wherein the transform domain residuals are jointly sparsified over l…

    Submitted 7 May, 2020; originally announced May 2020.

  43. arXiv:2004.08498  [pdf, other]

    physics.atom-ph eess.IV

    Enhanced principle component method for fringe removal in cold atom images

    Authors: Feng Xiong, Yun Long, Colin V. Parker

    Abstract: Many powerful imaging techniques for cold atoms are based on determining the optical density by comparing a beam image having passed through the atom cloud to a reference image taken under similar conditions with no atoms. In practice the beam profile typically contains interference fringes whose phase is not stable between camera exposures. To reduce the error of these fringes in the computed opt…

    Submitted 17 April, 2020; originally announced April 2020.

  44. arXiv:2002.12018  [pdf, other]

    eess.IV cs.LG eess.SP

    Momentum-Net for Low-Dose CT Image Reconstruction

    Authors: Siqi Ye, Yong Long, Il Yong Chun

    Abstract: This paper applies the recent fast iterative neural network framework, Momentum-Net, using appropriate models to low-dose X-ray computed tomography (LDCT) image reconstruction. At each layer of the proposed Momentum-Net, the model-based image reconstruction module solves the majorized penalized weighted least-square problem, and the image refining module uses a four-layer convolutional neural netw…

    Submitted 8 September, 2020; v1 submitted 27 February, 2020; originally announced February 2020.

    Comments: Five-page conference paper. Accepted by 2020 Asilomar Conference on Signals, Systems, and Computers

  45. arXiv:1910.12024  [pdf, other]

    cs.LG cs.CV eess.IV eess.SP stat.ML

    SUPER Learning: A Supervised-Unsupervised Framework for Low-Dose CT Image Reconstruction

    Authors: Zhipeng Li, Siqi Ye, Yong Long, Saiprasad Ravishankar

    Abstract: Recent years have witnessed growing interest in machine learning-based models and techniques for low-dose X-ray CT (LDCT) imaging tasks. The methods can typically be categorized into supervised learning methods and unsupervised or model-based learning methods. Supervised learning methods have recently shown success in image restoration tasks. However, they often rely on large training sets. Model-…

    Submitted 26 October, 2019; originally announced October 2019.

    Comments: Accepted to International Conference on Computer Vision (ICCV) - Learning for Computational Imaging (LCI) Workshop, 2019

  46. arXiv:1908.01287  [pdf, other]

    eess.IV cs.CV cs.LG stat.ML

    BCD-Net for Low-dose CT Reconstruction: Acceleration, Convergence, and Generalization

    Authors: Il Yong Chun, Xuehang Zheng, Yong Long, Jeffrey A. Fessler

    Abstract: Obtaining accurate and reliable images from low-dose computed tomography (CT) is challenging. Regression convolutional neural network (CNN) models that are learned from training data are increasingly gaining attention in low-dose CT reconstruction. This paper modifies the architecture of an iterative regression CNN, BCD-Net, for fast, stable, and accurate low-dose CT reconstruction, and presents t…

    Submitted 4 August, 2019; originally announced August 2019.

    Comments: Accepted to MICCAI 2019, and the authors indicated by asterisks (*) equally contributed to this work

  47. arXiv:1906.00165  [pdf, other]

    eess.IV cs.LG stat.ML

    Two-layer Residual Sparsifying Transform Learning for Image Reconstruction

    Authors: Xuehang Zheng, Saiprasad Ravishankar, Yong Long, Marc Louis Klasky, Brendt Wohlberg

    Abstract: Signal models based on sparsity, low-rank and other properties have been exploited for image reconstruction from limited and corrupted data in medical imaging and other computational imaging applications. In particular, sparsifying transform models have shown promise in various applications, and offer numerous advantages such as efficiencies in sparse coding and learning. This work investigates pr…

    Submitted 7 January, 2020; v1 submitted 1 June, 2019; originally announced June 2019.

    Comments: Accepted to IEEE ISBI 2020

  48. arXiv:1901.00106  [pdf, other]

    eess.IV cs.LG stat.ML

    DECT-MULTRA: Dual-Energy CT Image Decomposition With Learned Mixed Material Models and Efficient Clustering

    Authors: Zhipeng Li, Saiprasad Ravishankar, Yong Long, Jeffrey A. Fessler

    Abstract: Dual energy computed tomography (DECT) imaging plays an important role in advanced imaging applications due to its material decomposition capability. Image-domain decomposition operates directly on CT images using linear matrix inversion, but the decomposed material images can be severely degraded by noise and artifacts. This paper proposes a new method dubbed DECT-MULTRA for image-domain DECT mat…

    Submitted 18 August, 2019; v1 submitted 1 January, 2019; originally announced January 2019.

  49. arXiv:1810.12126  [pdf, other]

    eess.IV cs.CV

    ActionXPose: A Novel 2D Multi-view Pose-based Algorithm for Real-time Human Action Recognition

    Authors: Federico Angelini, Zeyu Fu, Yang Long, Ling Shao, Syed Mohsen Naqvi

    Abstract: We present ActionXPose, a novel 2D pose-based algorithm for posture-level Human Action Recognition (HAR). The proposed approach exploits 2D human poses provided by the OpenPose detector from RGB videos. ActionXPose processes the pose data for input to a Long Short-Term Memory Neural Network and to a 1D Convolutional Neural Network, which solve the classification problem. ActionXPose is one of…

    Submitted 29 October, 2018; originally announced October 2018.

  50. arXiv:1808.08791  [pdf, other]

    eess.SP eess.IV math.OC physics.med-ph

    SPULTRA: Low-Dose CT Image Reconstruction with Joint Statistical and Learned Image Models

    Authors: Siqi Ye, Saiprasad Ravishankar, Yong Long, Jeffrey A. Fessler

    Abstract: Low-dose CT image reconstruction has been a popular research topic in recent years. A typical reconstruction method based on post-log measurements is called penalized weighted-least squares (PWLS). Due to the underlying limitations of the post-log statistical model, the PWLS reconstruction quality is often degraded in low-dose scans. This paper investigates a shifted-Poisson (SP) model based likel…

    Submitted 12 August, 2019; v1 submitted 27 August, 2018; originally announced August 2018.

    Comments: Accepted to IEEE Transactions on Medical Imaging