Showing 1–50 of 68 results for author: Qin, J

Searching in archive eess.
  1. arXiv:2406.19043  [pdf]

    eess.IV cs.AI cs.CV cs.DB

    CMRxRecon2024: A Multi-Modality, Multi-View K-Space Dataset Boosting Universal Machine Learning for Accelerated Cardiac MRI

    Authors: Zi Wang, Fanwen Wang, Chen Qin, Jun Lyu, Ouyang Cheng, Shuo Wang, Yan Li, Mengyao Yu, Haoyu Zhang, Kunyuan Guo, Zhang Shi, Qirong Li, Ziqiang Xu, Yajing Zhang, Hao Li, Sha Hua, Binghua Chen, Longyu Sun, Mengting Sun, Qin Li, Ying-Hua Chu, Wenjia Bai, Jing Qin, Xiahai Zhuang, Claudia Prieto, et al. (7 additional authors not shown)

    Abstract: Cardiac magnetic resonance imaging (MRI) has emerged as a clinically gold-standard technique for diagnosing cardiac diseases, thanks to its ability to provide diverse information with multiple modalities and anatomical views. Accelerated cardiac MRI is highly expected to achieve time-efficient and patient-friendly imaging, and then advanced image reconstruction approaches are required to recover h…

    Submitted 27 June, 2024; originally announced June 2024.

    Comments: 19 pages, 3 figures, 2 tables

  2. arXiv:2406.18950  [pdf, other]

    eess.IV cs.CV

    MMR-Mamba: Multi-Contrast MRI Reconstruction with Mamba and Spatial-Frequency Information Fusion

    Authors: Jing Zou, Lanqing Liu, Qi Chen, Shujun Wang, Xiaohan Xing, Jing Qin

    Abstract: Multi-contrast MRI acceleration has become prevalent in MR imaging, enabling the reconstruction of high-quality MR images from under-sampled k-space data of the target modality, using guidance from a fully-sampled auxiliary modality. The main crux lies in efficiently and comprehensively integrating complementary information from the auxiliary modality. Existing methods either suffer from quadratic…

    Submitted 27 June, 2024; originally announced June 2024.

    Comments: 10 pages, 5 figures

  3. arXiv:2406.14534  [pdf, other]

    eess.IV cs.CV

    Epicardium Prompt-guided Real-time Cardiac Ultrasound Frame-to-volume Registration

    Authors: Long Lei, Jun Zhou, Jialun Pei, Baoliang Zhao, Yueming Jin, Yuen-Chun Jeremy Teoh, Jing Qin, Pheng-Ann Heng

    Abstract: A comprehensive guidance view for cardiac interventional surgery can be provided by the real-time fusion of the intraoperative 2D images and preoperative 3D volume based on the ultrasound frame-to-volume registration. However, cardiac ultrasound images are characterized by a low signal-to-noise ratio and small differences between adjacent frames, coupled with significant dimension variations betwe…

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: This paper has been accepted by MICCAI 2024

  4. arXiv:2405.09752  [pdf, other]

    eess.SP math.NA math.OC

    Time-Varying Graph Signal Recovery Using High-Order Smoothness and Adaptive Low-rankness

    Authors: Weihong Guo, Yifei Lou, Jing Qin, Ming Yan

    Abstract: Time-varying graph signal recovery has been widely used in many applications, including climate change, environmental hazard monitoring, and epidemic studies. It is crucial to choose appropriate regularizations to describe the characteristics of the underlying signals, such as the smoothness of the signal over the graph domain and the low-rank structure of the spatial-temporal signal modeled in a…

    Submitted 15 May, 2024; originally announced May 2024.

  5. arXiv:2405.07648  [pdf, other]

    cs.CV eess.IV

    CDFormer: When Degradation Prediction Embraces Diffusion Model for Blind Image Super-Resolution

    Authors: Qingguo Liu, Chenyi Zhuang, Pan Gao, Jie Qin

    Abstract: Existing Blind image Super-Resolution (BSR) methods focus on estimating either kernel or degradation information, but have long overlooked the essential content details. In this paper, we propose a novel BSR approach, Content-aware Degradation-driven Transformer (CDFormer), to capture both degradation and content representations. However, low-resolution images cannot provide enough content details…

    Submitted 13 May, 2024; originally announced May 2024.

  6. arXiv:2404.01082  [pdf, other]

    eess.IV

    The state-of-the-art in Cardiac MRI Reconstruction: Results of the CMRxRecon Challenge in MICCAI 2023

    Authors: Jun Lyu, Chen Qin, Shuo Wang, Fanwen Wang, Yan Li, Zi Wang, Kunyuan Guo, Cheng Ouyang, Michael Tänzer, Meng Liu, Longyu Sun, Mengting Sun, Qin Li, Zhang Shi, Sha Hua, Hao Li, Zhensen Chen, Zhenlin Zhang, Bingyu Xin, Dimitris N. Metaxas, George Yiasemis, Jonas Teuwen, Liping Zhang, Weitian Chen, Yidong Zhao, et al. (25 additional authors not shown)

    Abstract: Cardiac MRI, crucial for evaluating heart structure and function, faces limitations like slow imaging and motion artifacts. Undersampling reconstruction, especially data-driven algorithms, has emerged as a promising solution to accelerate scans and enhance imaging performance using highly under-sampled data. Nevertheless, the scarcity of publicly available cardiac k-space datasets and evaluation p…

    Submitted 16 April, 2024; v1 submitted 1 April, 2024; originally announced April 2024.

    Comments: 25 pages, 17 figures

  7. arXiv:2403.06197  [pdf, other]

    eess.IV cs.CV cs.LG

    DrFuse: Learning Disentangled Representation for Clinical Multi-Modal Fusion with Missing Modality and Modal Inconsistency

    Authors: Wenfang Yao, Kejing Yin, William K. Cheung, Jia Liu, Jing Qin

    Abstract: The combination of electronic health records (EHR) and medical images is crucial for clinicians in making diagnoses and forecasting prognosis. Strategically fusing these two data modalities has great potential to improve the accuracy of machine learning models in clinical prediction tasks. However, the asynchronous and complementary nature of EHR and medical images presents unique challenges. Miss…

    Submitted 10 March, 2024; originally announced March 2024.

    Comments: Accepted by AAAI-24

  8. arXiv:2402.00773  [pdf, other]

    eess.SY math.OC

    On the Choice of Loss Function in Learning-based Optimal Power Flow

    Authors: Ge Chen, Junjie Qin

    Abstract: We analyze and contrast two ways to train machine learning models for solving AC optimal power flow (OPF) problems, distinguished with the loss functions used. The first trains a mapping from the loads to the optimal dispatch decisions, utilizing mean square error (MSE) between predicted and optimal dispatch decisions as the loss function. The other intends to learn the same mapping, but directly…

    Submitted 1 February, 2024; originally announced February 2024.

    Comments: 5 pages, Accepted by PESGM2024

  9. arXiv:2402.00772  [pdf, other]

    math.OC eess.SY

    Neural Risk Limiting Dispatch in Power Networks: Formulation and Generalization Guarantees

    Authors: Ge Chen, Junjie Qin

    Abstract: Risk limiting dispatch (RLD) has been proposed as an approach that effectively trades off economic costs with operational risks for power dispatch under uncertainty. However, how to solve the RLD problem with provably near-optimal performance still remains an open problem. This paper presents a learning-based solution to this challenge. We first design a data-driven formulation for the RLD problem…

    Submitted 1 February, 2024; originally announced February 2024.

    Comments: 10 pages

  10. arXiv:2401.12789  [pdf, other]

    cs.CL cs.SD eess.AS

    Multilingual and Fully Non-Autoregressive ASR with Large Language Model Fusion: A Comprehensive Study

    Authors: W. Ronny Huang, Cyril Allauzen, Tongzhou Chen, Kilol Gupta, Ke Hu, James Qin, Yu Zhang, Yongqiang Wang, Shuo-Yiin Chang, Tara N. Sainath

    Abstract: In the era of large models, the autoregressive nature of decoding often results in latency serving as a significant bottleneck. We propose a non-autoregressive LM-fused ASR system that effectively leverages the parallelization capabilities of accelerator hardware. Our approach combines the Universal Speech Model (USM) and the PaLM 2 language model in per-segment scoring mode, achieving an average…

    Submitted 23 January, 2024; originally announced January 2024.

    Comments: ICASSP 2024

  11. arXiv:2401.07206  [pdf, other]

    stat.ML cs.LG eess.SY

    Probabilistic Reduced-Dimensional Vector Autoregressive Modeling with Oblique Projections

    Authors: Yanfang Mo, S. Joe Qin

    Abstract: In this paper, we propose a probabilistic reduced-dimensional vector autoregressive (PredVAR) model to extract low-dimensional dynamics from high-dimensional noisy data. The model utilizes an oblique projection to partition the measurement space into a subspace that accommodates the reduced-dimensional dynamics and a complementary static subspace. An optimal oblique decomposition is derived for th…

    Submitted 14 January, 2024; originally announced January 2024.

    Comments: 16 pages, 5 figures

  12. arXiv:2312.06995  [pdf, other]

    cs.CV eess.IV

    Transformer-based No-Reference Image Quality Assessment via Supervised Contrastive Learning

    Authors: Jinsong Shi, Pan Gao, Jie Qin

    Abstract: Image Quality Assessment (IQA) has long been a research hotspot in the field of image processing, especially No-Reference Image Quality Assessment (NR-IQA). Due to the powerful feature extraction ability, existing Convolution Neural Network (CNN) and Transformers based NR-IQA methods have achieved considerable progress. However, they still exhibit limited capability when facing unknown authentic d…

    Submitted 12 December, 2023; originally announced December 2023.

    Comments: Accepted by AAAI24

  13. arXiv:2311.15216  [pdf, other]

    eess.SY cs.LG

    Solve Large-scale Unit Commitment Problems by Physics-informed Graph Learning

    Authors: Jingtao Qin, Nanpeng Yu

    Abstract: Unit commitment (UC) problems are typically formulated as mixed-integer programs (MIP) and solved by the branch-and-bound (B&B) scheme. The recent advances in graph neural networks (GNN) enable it to enhance the B&B algorithm in modern MIP solvers by learning to dive and branch. Existing GNN models that tackle MIP problems are mostly constructed from mathematical formulation, which is computationa…

    Submitted 26 November, 2023; originally announced November 2023.

  14. arXiv:2310.07255  [pdf, other]

    cs.CV eess.IV

    ADASR: An Adversarial Auto-Augmentation Framework for Hyperspectral and Multispectral Data Fusion

    Authors: Jinghui Qin, Lihuang Fang, Ruitao Lu, Liang Lin, Yukai Shi

    Abstract: Deep learning-based hyperspectral image (HSI) super-resolution, which aims to generate high spatial resolution HSI (HR-HSI) by fusing hyperspectral image (HSI) and multispectral image (MSI) with deep neural networks (DNNs), has attracted lots of attention. However, neural networks require large amounts of training data, hindering their application in real-world scenarios. In this letter, we propos…

    Submitted 11 October, 2023; originally announced October 2023.

    Comments: This paper has been accepted by IEEE Geoscience and Remote Sensing Letters. Code is released at https://github.com/fangfang11-plog/ADASR

  15. arXiv:2309.17136  [pdf, other]

    eess.SY

    Latent Dynamic Networked System Identification with High-Dimensional Networked Data

    Authors: Jiaxin Yu, Yanfang Mo, S. Joe Qin

    Abstract: Networked dynamic systems are ubiquitous in various domains, such as industrial processes, social networks, and biological systems. These systems produce high-dimensional data that reflect the complex interactions among the network nodes with rich sensor measurements. In this paper, we propose a novel algorithm for latent dynamic networked system identification that leverages the network structure…

    Submitted 29 September, 2023; originally announced September 2023.

  16. arXiv:2309.12963  [pdf, ps, other]

    eess.AS cs.SD

    Massive End-to-end Models for Short Search Queries

    Authors: Weiran Wang, Rohit Prabhavalkar, Dongseong Hwang, Qiujia Li, Khe Chai Sim, Bo Li, James Qin, Xingyu Cai, Adam Stooke, Zhong Meng, CJ Zheng, Yanzhang He, Tara Sainath, Pedro Moreno Mengibar

    Abstract: In this work, we investigate two popular end-to-end automatic speech recognition (ASR) models, namely Connectionist Temporal Classification (CTC) and RNN-Transducer (RNN-T), for offline recognition of voice search queries, with up to 2B model parameters. The encoders of our models use the neural architecture of Google's universal speech model (USM), with additional funnel pooling layers to signifi…

    Submitted 22 September, 2023; originally announced September 2023.

  17. arXiv:2309.01161  [pdf, other]

    math.OC eess.SY stat.ME

    Probabilistic Reduced-Dimensional Vector Autoregressive Modeling for Dynamics Prediction and Reconstruction with Oblique Projections

    Authors: Yanfang Mo, Jiaxin Yu, S. Joe Qin

    Abstract: In this paper, we propose a probabilistic reduced-dimensional vector autoregressive (PredVAR) model with oblique projections. This model partitions the measurement space into a dynamic subspace and a static subspace that do not need to be orthogonal. The partition allows us to apply an oblique projection to extract dynamic latent variables (DLVs) from high-dimensional data with maximized predictab…

    Submitted 3 September, 2023; originally announced September 2023.

  18. arXiv:2306.12925  [pdf, other]

    cs.CL cs.AI cs.SD eess.AS stat.ML

    AudioPaLM: A Large Language Model That Can Speak and Listen

    Authors: Paul K. Rubenstein, Chulayuth Asawaroengchai, Duc Dung Nguyen, Ankur Bapna, Zalán Borsos, Félix de Chaumont Quitry, Peter Chen, Dalia El Badawy, Wei Han, Eugene Kharitonov, Hannah Muckenhirn, Dirk Padfield, James Qin, Danny Rozenberg, Tara Sainath, Johan Schalkwyk, Matt Sharifi, Michelle Tadmor Ramanovich, Marco Tagliasacchi, Alexandru Tudor, Mihajlo Velimirović, Damien Vincent, Jiahui Yu, Yongqiang Wang, Vicky Zayats, et al. (5 additional authors not shown)

    Abstract: We introduce AudioPaLM, a large language model for speech understanding and generation. AudioPaLM fuses text-based and speech-based language models, PaLM-2 [Anil et al., 2023] and AudioLM [Borsos et al., 2022], into a unified multimodal architecture that can process and generate text and speech with applications including speech recognition and speech-to-speech translation. AudioPaLM inherits the…

    Submitted 22 June, 2023; originally announced June 2023.

    Comments: Technical report

  19. arXiv:2306.08131  [pdf, other]

    eess.AS cs.SD

    Efficient Adapters for Giant Speech Models

    Authors: Nanxin Chen, Izhak Shafran, Yu Zhang, Chung-Cheng Chiu, Hagen Soltau, James Qin, Yonghui Wu

    Abstract: Large pre-trained speech models are widely used as the de-facto paradigm, especially in scenarios when there is a limited amount of labeled data available. However, finetuning all parameters from the self-supervised learned model can be computationally expensive, and becomes infeasible as the size of the model and the number of downstream tasks scales. In this paper, we propose a novel approach c…

    Submitted 13 June, 2023; originally announced June 2023.

  20. arXiv:2306.04730  [pdf, other]

    eess.SP cs.LG math.NA math.OC stat.ML

    Stochastic Natural Thresholding Algorithms

    Authors: Rachel Grotheer, Shuang Li, Anna Ma, Deanna Needell, Jing Qin

    Abstract: Sparse signal recovery is one of the most fundamental problems in various applications, including medical imaging and remote sensing. Many greedy algorithms based on the family of hard thresholding operators have been developed to solve the sparse signal recovery problem. More recently, Natural Thresholding (NT) has been proposed with improved computational efficiency. This paper proposes and disc…

    Submitted 7 June, 2023; originally announced June 2023.

  21. arXiv:2304.05723  [pdf, ps, other]

    eess.SY cs.RO

    Distributed Coverage Control of Constrained Constant-Speed Unicycle Multi-Agent Systems

    Authors: Qingchen Liu, Zengjie Zhang, Nhan Khanh Le, Jiahu Qin, Fangzhou Liu, Sandra Hirche

    Abstract: This paper proposes a novel distributed coverage controller for a multi-agent system with constant-speed unicycle robots (CSUR). The work is motivated by the limitation of the conventional method that does not ensure the satisfaction of hard state- and input-dependent constraints and leads to feasibility issues for multi-CSUR systems. In this paper, we solve these problems by designing a novel cov…

    Submitted 14 March, 2024; v1 submitted 12 April, 2023; originally announced April 2023.

  22. arXiv:2303.09704  [pdf, other]

    eess.SY

    Mobile Energy Storage in Power Network: Marginal Value and Optimal Operation

    Authors: Utkarsha Agwan, Junjie Qin, Kameshwar Poolla, Pravin Varaiya

    Abstract: This paper examines the marginal value of mobile energy storage, i.e., energy storage units that can be efficiently relocated to other locations in the power network. In particular, we formulate and analyze the joint problem for operating the power grid and a fleet of mobile storage units. We use two different storage models: rapid storage, which disregards travel time and power constraints, and g…

    Submitted 16 March, 2023; originally announced March 2023.

    Comments: 14 pages, submitted to IEEE Transactions on Smart Grids

  23. arXiv:2303.08113  [pdf, other]

    eess.IV cs.CV

    Learning Homeomorphic Image Registration via Conformal-Invariant Hyperelastic Regularisation

    Authors: Jing Zou, Noémie Debroux, Lihao Liu, Jing Qin, Carola-Bibiane Schönlieb, Angelica I Aviles-Rivero

    Abstract: Deformable image registration is a fundamental task in medical image analysis and plays a crucial role in a wide range of clinical applications. Recently, deep learning-based approaches have been widely studied for deformable medical image registration and achieved promising results. However, existing deep learning image registration techniques do not theoretically guarantee topology-preserving tr…

    Submitted 30 June, 2023; v1 submitted 14 March, 2023; originally announced March 2023.

    Comments: 13 pages, 3 figures

  24. arXiv:2303.01037  [pdf, other]

    cs.CL cs.SD eess.AS

    Google USM: Scaling Automatic Speech Recognition Beyond 100 Languages

    Authors: Yu Zhang, Wei Han, James Qin, Yongqiang Wang, Ankur Bapna, Zhehuai Chen, Nanxin Chen, Bo Li, Vera Axelrod, Gary Wang, Zhong Meng, Ke Hu, Andrew Rosenberg, Rohit Prabhavalkar, Daniel S. Park, Parisa Haghani, Jason Riesa, Ginger Perng, Hagen Soltau, Trevor Strohman, Bhuvana Ramabhadran, Tara Sainath, Pedro Moreno, Chung-Cheng Chiu, Johan Schalkwyk, et al. (2 additional authors not shown)

    Abstract: We introduce the Universal Speech Model (USM), a single large model that performs automatic speech recognition (ASR) across 100+ languages. This is achieved by pre-training the encoder of the model on a large unlabeled multilingual dataset of 12 million (M) hours spanning over 300 languages, and fine-tuning on a smaller labeled dataset. We use multilingual pre-training with random-projection quant…

    Submitted 24 September, 2023; v1 submitted 2 March, 2023; originally announced March 2023.

    Comments: 20 pages, 7 figures, 8 tables

  25. arXiv:2303.00155  [pdf, other]

    eess.SY

    Exponential Consensus of Multiple Agents over Dynamic Network Topology: Controllability, Connectivity, and Compactness

    Authors: Qichao Ma, Jiahu Qin, Brian D. O. Anderson, Long Wang

    Abstract: This paper investigates the problem of securing exponentially fast consensus (exponential consensus for short) for identical agents with finite-dimensional linear system dynamics over dynamic network topology. Our aim is to find the weakest possible conditions that guarantee exponentially fast consensus using a Lyapunov function consisting of a sum of terms of the same functional form. We first in…

    Submitted 28 February, 2023; originally announced March 2023.

  26. arXiv:2212.02198  [pdf, other]

    cs.CV eess.IV

    Rethinking Generative Methods for Image Restoration in Physics-based Vision: A Theoretical Analysis from the Perspective of Information

    Authors: Xudong Kang, Haoran Xie, Man-Leung Wong, Jing Qin

    Abstract: End-to-end generative methods are considered a more promising solution for image restoration in physics-based vision compared with the traditional deconstructive methods based on handcrafted composition models. However, existing generative methods still have plenty of room for improvement in quantitative performance. More crucially, these methods are considered black boxes due to weak interpretabi…

    Submitted 8 December, 2022; v1 submitted 5 December, 2022; originally announced December 2022.

  27. arXiv:2209.06461  [pdf, other]

    eess.SY

    Electric Vehicle Battery Sharing Game for Mobile Energy Storage Provision in Power Networks

    Authors: Utkarsha Agwan, Junjie Qin, Kameshwar Poolla, Pravin Varaiya

    Abstract: Electric vehicles (EVs) equipped with a bidirectional charger can provide valuable grid services as mobile energy storage. However, proper financial incentives need to be in place to enlist EV drivers to provide services to the grid. In this paper, we consider two types of EV drivers who may be willing to provide mobile storage service using their EVs: commuters taking a fixed route, and on-demand…

    Submitted 16 December, 2022; v1 submitted 14 September, 2022; originally announced September 2022.

    Comments: 10 pages, IEEE CDC 2022

  28. arXiv:2209.04789  [pdf, other]

    math.OC eess.SY

    Optimal Ordering Policies for Multi-Echelon Supply Networks

    Authors: Jose I. Caiza, Ian Walter, Jitesh H. Panchal, Junjie Qin, Philip E. Pare

    Abstract: In this paper, we formulate an optimal ordering policy as a stochastic control problem where each firm decides the amount of input goods to order from their upstream suppliers based on the current inventory level of its output good. For this purpose, we provide a closed-form solution for the optimal request of the raw materials given a fixed production policy. We implement the proposed policy…

    Submitted 11 September, 2022; originally announced September 2022.

  29. arXiv:2207.00769  [pdf, other]

    eess.IV cs.CV

    Test-time Adaptation with Calibration of Medical Image Classification Nets for Label Distribution Shift

    Authors: Wenao Ma, Cheng Chen, Shuang Zheng, Jing Qin, Huimao Zhang, Qi Dou

    Abstract: Class distribution plays an important role in learning deep classifiers. When the proportion of each class in the test set differs from the training set, the performance of classification nets usually degrades. Such a label distribution shift problem is common in medical diagnosis since the prevalence of disease varies over location and time. In this paper, we propose the first method to tackle labe…

    Submitted 9 July, 2022; v1 submitted 2 July, 2022; originally announced July 2022.

    Comments: This paper has been accepted by MICCAI 2022

  30. arXiv:2207.00141  [pdf, other]

    eess.IV cs.CV

    A New Dataset and A Baseline Model for Breast Lesion Detection in Ultrasound Videos

    Authors: Zhi Lin, Junhao Lin, Lei Zhu, Huazhu Fu, Jing Qin, Liansheng Wang

    Abstract: Breast lesion detection in ultrasound is critical for breast cancer diagnosis. Existing methods mainly rely on individual 2D ultrasound images or combine unlabeled video and labeled 2D images to train models for breast lesion detection. In this paper, we first collect and annotate an ultrasound video dataset (188 videos) for breast lesion detection. Moreover, we propose a clip-level and video-leve…

    Submitted 30 June, 2022; originally announced July 2022.

    Comments: 11 pages, 4 figures

    Report number: MICCAI-1016

    Journal ref: Medical Image Computing and Computer Assisted Interventions 2022

  31. arXiv:2206.04249  [pdf, other]

    eess.SY cs.LG

    An Optimization Method-Assisted Ensemble Deep Reinforcement Learning Algorithm to Solve Unit Commitment Problems

    Authors: Jingtao Qin, Yuanqi Gao, Mikhail Bragin, Nanpeng Yu

    Abstract: Unit commitment (UC) is a fundamental problem in the day-ahead electricity market, and it is critical to solve UC problems efficiently. Mathematical optimization techniques like dynamic programming, Lagrangian relaxation, and mixed-integer quadratic programming (MIQP) are commonly adopted for UC problems. However, the calculation time of these methods increases at an exponential rate with the amou…

    Submitted 8 June, 2022; originally announced June 2022.

  32. arXiv:2206.02609  [pdf, other]

    cs.CV cs.LG eess.IV

    Real-World Image Super-Resolution by Exclusionary Dual-Learning

    Authors: Hao Li, Jinghui Qin, Zhijing Yang, Pengxu Wei, Jinshan Pan, Liang Lin, Yukai Shi

    Abstract: Real-world image super-resolution is a practical image restoration problem that aims to obtain high-quality images from in-the-wild input, and has recently received considerable attention with regard to its tremendous application potential. Although deep learning-based methods have achieved promising restoration quality on real-world image super-resolution datasets, they ignore the relationship betwe…

    Submitted 6 June, 2022; originally announced June 2022.

    Comments: IEEE TMM 2022; Considering large volume of RealSR datasets, a multi-dataset sampling scheme is developed

  33. arXiv:2204.02663  [pdf, other]

    eess.IV cs.CV

    Towards An End-to-End Framework for Flow-Guided Video Inpainting

    Authors: Zhen Li, Cheng-Ze Lu, Jianhua Qin, Chun-Le Guo, Ming-Ming Cheng

    Abstract: Optical flow, which captures motion information across frames, is exploited in recent video inpainting methods through propagating pixels along its trajectories. However, the hand-crafted flow-based processes in these methods are applied separately to form the whole inpainting pipeline. Thus, these methods are less efficient and rely heavily on the intermediate results from earlier stages. In this…

    Submitted 7 April, 2022; v1 submitted 6 April, 2022; originally announced April 2022.

    Comments: Accepted to CVPR 2022

  34. arXiv:2204.01954  [pdf, other]

    physics.comp-ph cs.SD eess.AS

    Application of a Spectral Method to Simulate Quasi-Three-Dimensional Underwater Acoustic Fields

    Authors: Houwang Tu, Yongxian Wang, Wei Liu, Chunmei Yang, Jixing Qin, Shuqing Ma, Xiaodong Wang

    Abstract: The calculation of a three-dimensional underwater acoustic field has always been a key problem in computational ocean acoustics. Traditionally, this solution is usually obtained by directly solving the acoustic Helmholtz equation using a finite difference or finite element algorithm. Solving the three-dimensional Helmholtz equation directly is computationally expensive. For quasi-three-dimensional…

    Submitted 10 November, 2022; v1 submitted 4 April, 2022; originally announced April 2022.

    Comments: 31 pages, 22 figures. arXiv admin note: text overlap with arXiv:2112.13602

  35. arXiv:2203.13963  [pdf, other]

    eess.IV cs.CV

    Transformer-empowered Multi-scale Contextual Matching and Aggregation for Multi-contrast MRI Super-resolution

    Authors: Guangyuan Li, Jun Lv, Yapeng Tian, Qi Dou, Chengyan Wang, Chenliang Xu, Jing Qin

    Abstract: Magnetic resonance imaging (MRI) can present multi-contrast images of the same anatomical structures, enabling multi-contrast super-resolution (SR) techniques. Compared with SR reconstruction using a single-contrast, multi-contrast SR reconstruction is promising to yield SR images with higher quality by leveraging diverse yet complementary information embedded in different imaging modalities. Howe…

    Submitted 25 March, 2022; originally announced March 2022.

    Comments: CVPR 2022 accepted

  36. arXiv:2202.01855  [pdf, other]

    cs.CL cs.SD eess.AS

    Self-supervised Learning with Random-projection Quantizer for Speech Recognition

    Authors: Chung-Cheng Chiu, James Qin, Yu Zhang, Jiahui Yu, Yonghui Wu

    Abstract: We present a simple and effective self-supervised learning approach for speech recognition. The approach learns a model to predict the masked speech signals, in the form of discrete labels generated with a random-projection quantizer. In particular the quantizer projects speech inputs with a randomly initialized matrix, and does a nearest-neighbor lookup in a randomly-initialized codebook. Neither…

    Submitted 29 June, 2022; v1 submitted 3 February, 2022; originally announced February 2022.

    Comments: ICML 2022

  37. arXiv:2112.02743  [pdf, other]

    eess.IV cs.CV cs.LG

    Separated Contrastive Learning for Organ-at-Risk and Gross-Tumor-Volume Segmentation with Limited Annotation

    Authors: Jiacheng Wang, Xiaomeng Li, Yiming Han, Jing Qin, Liansheng Wang, Zhou Qichao

    Abstract: Automatic delineation of organ-at-risk (OAR) and gross-tumor-volume (GTV) is of great significance for radiotherapy planning. However, it is a challenging task to learn powerful representations for accurate delineation under limited pixel (voxel)-wise annotations. Contrastive learning at pixel-level can alleviate the dependency on annotations by learning dense representations from unlabeled data.…

    Submitted 20 April, 2022; v1 submitted 5 December, 2021; originally announced December 2021.

    Comments: Accepted in AAAI-22 (Oral)

  38. arXiv:2111.04733  [pdf, other]

    eess.IV cs.CV

    Real-time landmark detection for precise endoscopic submucosal dissection via shape-aware relation network

    Authors: Jiacheng Wang, Yueming Jin, Shuntian Cai, Hongzhi Xu, Pheng-Ann Heng, Jing Qin, Liansheng Wang

    Abstract: We propose a novel shape-aware relation network for accurate and real-time landmark detection in endoscopic submucosal dissection (ESD) surgery. This task is of great clinical significance but extremely challenging due to bleeding, lighting reflection, and motion blur in the complicated surgical environment. Compared with existing solutions, which either neglect geometric relationships among targe…

    Submitted 8 November, 2021; originally announced November 2021.

  39. Boundary-aware Transformers for Skin Lesion Segmentation

    Authors: Jiacheng Wang, Lan Wei, Liansheng Wang, Qichao Zhou, Lei Zhu, Jing Qin

    Abstract: Skin lesion segmentation from dermoscopy images is of great importance for improving the quantitative analysis of skin cancer. However, the automatic segmentation of melanoma is a very challenging task owing to the large variation of melanoma and ambiguous boundaries of lesion areas. While convolutional neural networks (CNNs) have achieved remarkable progress in this task, most of existing soluti…

    Submitted 7 October, 2021; originally announced October 2021.

    Journal ref: Medical Image Computing and Computer Assisted Intervention 2021

  40. Efficient Global-Local Memory for Real-time Instrument Segmentation of Robotic Surgical Video

    Authors: Jiacheng Wang, Yueming Jin, Liansheng Wang, Shuntian Cai, Pheng-Ann Heng, Jing Qin

    Abstract: Performing a real-time and accurate instrument segmentation from videos is of great significance for improving the performance of robotic-assisted surgery. We identify two important clues for surgical instrument perception, including local temporal dependency from adjacent frames and global semantic correlation in long-range duration. However, most existing works perform segmentation purely using…

    Submitted 28 September, 2021; originally announced September 2021.

    Journal ref: Medical Image Computing and Computer Assisted Intervention, 2021

  41. arXiv:2109.13226  [pdf, other]

    eess.AS cs.CL cs.LG cs.SD

    BigSSL: Exploring the Frontier of Large-Scale Semi-Supervised Learning for Automatic Speech Recognition

    Authors: Yu Zhang, Daniel S. Park, Wei Han, James Qin, Anmol Gulati, Joel Shor, Aren Jansen, Yuanzhong Xu, Yanping Huang, Shibo Wang, Zongwei Zhou, Bo Li, Min Ma, William Chan, Jiahui Yu, Yongqiang Wang, Liangliang Cao, Khe Chai Sim, Bhuvana Ramabhadran, Tara N. Sainath, Françoise Beaufays, Zhifeng Chen, Quoc V. Le, Chung-Cheng Chiu, Ruoming Pang, et al. (1 additional author not shown)

    Abstract: We summarize the results of a host of efforts using giant automatic speech recognition (ASR) models pre-trained using large, diverse unlabeled datasets containing approximately a million hours of audio. We find that the combination of pre-training, self-training and scaling up model size greatly increases data efficiency, even for extremely large tasks with tens of thousands of hours of labeled da…

    Submitted 21 July, 2022; v1 submitted 27 September, 2021; originally announced September 2021.

    Comments: 14 pages, 7 figures, 13 tables; v2: minor corrections, reference baselines and bibliography updated; v3: corrections based on reviewer feedback, bibliography updated

  42. arXiv:2108.06209  [pdf, other]

    cs.LG cs.SD eess.AS

    W2v-BERT: Combining Contrastive Learning and Masked Language Modeling for Self-Supervised Speech Pre-Training

    Authors: Yu-An Chung, Yu Zhang, Wei Han, Chung-Cheng Chiu, James Qin, Ruoming Pang, Yonghui Wu

    Abstract: Motivated by the success of masked language modeling (MLM) in pre-training natural language processing models, we propose w2v-BERT that explores MLM for self-supervised speech representation learning. w2v-BERT is a framework that combines contrastive learning and MLM, where the former trains the model to discretize input continuous speech signals into a finite set of discriminative speech tokens,…

    Submitted 13 September, 2021; v1 submitted 7 August, 2021; originally announced August 2021.

  43. Optimal Online Algorithms for Peak-Demand Reduction Maximization with Energy Storage

    Authors: Yanfang Mo, Qiulin Lin, Minghua Chen, Si-Zhao Joe Qin

    Abstract: The high proportions of demand charges in electric bills motivate large-power customers to leverage energy storage for reducing the peak procurement from the outer grid. Given limited energy storage, we expect to maximize the peak-demand reduction in an online fashion, challenged by the highly uncertain demands and renewable injections, the non-cumulative nature of peak consumption, and the coupli…

    Submitted 10 May, 2021; originally announced May 2021.

    Comments: 12 pages, 6 figures

  44. arXiv:2104.14830  [pdf, other]

    cs.CL cs.SD eess.AS

    Scaling End-to-End Models for Large-Scale Multilingual ASR

    Authors: Bo Li, Ruoming Pang, Tara N. Sainath, Anmol Gulati, Yu Zhang, James Qin, Parisa Haghani, W. Ronny Huang, Min Ma, Junwen Bai

    Abstract: Building ASR models across many languages is a challenging multi-task learning problem due to large variations and heavily unbalanced data. Existing work has shown positive transfer from high resource to low resource languages. However, degradations on high resource languages are commonly observed due to interference from the heterogeneous multilingual data and reduction in per-language capacity.…

    Submitted 11 September, 2021; v1 submitted 30 April, 2021; originally announced April 2021.

    Comments: ASRU 2021

  45. arXiv:2103.09407  [pdf, ps, other]

    eess.SY

    Model-Free Design of Stochastic LQR Controller from Reinforcement Learning and Primal-Dual Optimization Perspective

    Authors: Man Li, Jiahu Qin, Wei Xing Zheng, Yaonan Wang, Yu Kang

    Abstract: To further understand the underlying mechanism of various reinforcement learning (RL) algorithms and also to better use the optimization theory to make further progress in RL, many researchers begin to revisit the linear-quadratic regulator (LQR) problem, whose setting is simple and yet captures the characteristics of RL. Inspired by this, this work is concerned with the model-free design of stoch…

    Submitted 16 March, 2021; originally announced March 2021.

  46. arXiv:2011.11373  [pdf, other]

    eess.SY

    Optimal Power Control for DoS Attack over Fading Channel: A Game-Theoretic Approach

    Authors: Jie Wang, Jiahu Qin, Menglin Li, Yang Shi

    Abstract: In this paper, we investigate remote state estimation against an intelligent denial-of-service (DoS) attack over a vulnerable wireless network whose channel undergoes attenuation and distortion caused by fading. We use the sensor to observe system states and transmit its local state estimates to the remote center. Meanwhile, the attacker injects a jamming signal to destroy the packet accepted by t…

    Submitted 23 November, 2020; originally announced November 2020.

  47. arXiv:2011.10798  [pdf, other]

    eess.AS cs.SD

    A Better and Faster End-to-End Model for Streaming ASR

    Authors: Bo Li, Anmol Gulati, Jiahui Yu, Tara N. Sainath, Chung-Cheng Chiu, Arun Narayanan, Shuo-Yiin Chang, Ruoming Pang, Yanzhang He, James Qin, Wei Han, Qiao Liang, Yu Zhang, Trevor Strohman, Yonghui Wu

    Abstract: End-to-end (E2E) models have shown to outperform state-of-the-art conventional models for streaming speech recognition [1] across many dimensions, including quality (as measured by word error rate (WER)) and endpointer latency [2]. However, the model still tends to delay the predictions towards the end and thus has much higher partial latency compared to a conventional ASR model. To address this i…

    Submitted 11 February, 2021; v1 submitted 21 November, 2020; originally announced November 2020.

    Comments: Accepted in ICASSP 2021

  48. arXiv:2010.10504  [pdf, other]

    eess.AS cs.LG cs.SD

    Pushing the Limits of Semi-Supervised Learning for Automatic Speech Recognition

    Authors: Yu Zhang, James Qin, Daniel S. Park, Wei Han, Chung-Cheng Chiu, Ruoming Pang, Quoc V. Le, Yonghui Wu

    Abstract: We employ a combination of recent developments in semi-supervised learning for automatic speech recognition to obtain state-of-the-art results on LibriSpeech utilizing the unlabeled audio of the Libri-Light dataset. More precisely, we carry out noisy student training with SpecAugment using giant Conformer models pre-trained using wav2vec 2.0 pre-training. By doing so, we are able to achieve word-e…

    Submitted 20 July, 2022; v1 submitted 20 October, 2020; originally announced October 2020.

    Comments: 11 pages, 3 figures, 5 tables. Accepted to NeurIPS SAS 2020 Workshop; v2: minor errors corrected

  49. arXiv:2008.13093  [pdf, other]

    eess.AS cs.CL

    Parallel Rescoring with Transformer for Streaming On-Device Speech Recognition

    Authors: Wei Li, James Qin, Chung-Cheng Chiu, Ruoming Pang, Yanzhang He

    Abstract: Recent advances of end-to-end models have outperformed conventional models through employing a two-pass model. The two-pass model provides better speed-quality trade-offs for on-device speech recognition, where a 1st-pass model generates hypotheses in a streaming fashion, and a 2nd-pass model re-scores the hypotheses with full audio sequence context. The 2nd-pass model plays a key role in the qual…

    Submitted 2 September, 2020; v1 submitted 30 August, 2020; originally announced August 2020.

    Comments: Proceedings of Interspeech, 2020

  50. Two-stage short-term wind power forecasting algorithm using different feature-learning models

    Authors: Jiancheng Qin, Jin Yang, Ying Chen, Qiang Ye, Hua Li

    Abstract: Two-stage ensemble-based forecasting methods have been studied extensively in the wind power forecasting field. However, deep learning-based wind power forecasting studies have not investigated two aspects. In the first stage, different learning structures considering multiple inputs and multiple outputs have not been discussed. In the second stage, the model extrapolation issue has not been inves…

    Submitted 28 June, 2021; v1 submitted 30 May, 2020; originally announced June 2020.

    Journal ref: Fundamental Research, 2021