Showing 1–50 of 70 results for author: Gu, J

  1. arXiv:2406.04680

    eess.IV cs.CV

    MTS-Net: Dual-Enhanced Positional Multi-Head Self-Attention for 3D CT Diagnosis of May-Thurner Syndrome

    Authors: Yixin Huang, Yiqi Jin, Ke Tao, Kaijian Xia, Jianfeng Gu, Lei Yu, Lan Du, Cunjian Chen

    Abstract: May-Thurner Syndrome (MTS), also known as iliac vein compression syndrome or Cockett's syndrome, is a condition potentially impacting over 20 percent of the population, leading to an increased risk of iliofemoral deep venous thrombosis. In this paper, we present a 3D-based deep learning approach called MTS-Net for diagnosing May-Thurner Syndrome using CT scans. To effectively capture the spatial-t…

    Submitted 7 June, 2024; originally announced June 2024.

  2. arXiv:2406.02014

    q-bio.NC cs.LG cs.SD eess.AS

    Understanding Auditory Evoked Brain Signal via Physics-informed Embedding Network with Multi-Task Transformer

    Authors: Wanli Ma, Xuegang Tang, Jin Gu, Ying Wang, Yuling Xia

    Abstract: In the fields of brain-computer interaction and cognitive neuroscience, effective decoding of auditory signals from task-based functional magnetic resonance imaging (fMRI) is key to understanding how the brain processes complex auditory information. Although existing methods have enhanced decoding capabilities, limitations remain in information utilization and model representation. To overcome the…

    Submitted 4 June, 2024; originally announced June 2024.

  3. arXiv:2404.15370

    eess.SP cs.AI cs.LG cs.NI

    Self-Supervised Learning for User Localization

    Authors: Ankan Dash, Jingyi Gu, Guiling Wang, Nirwan Ansari

    Abstract: Machine learning techniques have shown remarkable accuracy in localization tasks, but their dependency on vast amounts of labeled data, particularly Channel State Information (CSI) and corresponding coordinates, remains a bottleneck. Self-supervised learning techniques alleviate the need for labeled data, a potential that remains largely untapped and underexplored in existing research. Addressing…

    Submitted 19 April, 2024; originally announced April 2024.

  4. arXiv:2403.10146

    cs.SD cs.IR eess.AS

    Multiscale Matching Driven by Cross-Modal Similarity Consistency for Audio-Text Retrieval

    Authors: Qian Wang, Jia-Chen Gu, Zhen-Hua Ling

    Abstract: Audio-text retrieval (ATR), which retrieves a relevant caption given an audio clip (A2T) and vice versa (T2A), has recently attracted much research attention. Existing methods typically aggregate information from each modality into a single vector for matching, but this sacrifices local details and can hardly capture intricate relationships within and between modalities. Furthermore, current ATR d…

    Submitted 15 March, 2024; originally announced March 2024.

    Comments: 5 pages, accepted to ICASSP2024

  5. arXiv:2403.05247

    cs.CV eess.IV

    Hide in Thicket: Generating Imperceptible and Rational Adversarial Perturbations on 3D Point Clouds

    Authors: Tianrui Lou, Xiaojun Jia, Jindong Gu, Li Liu, Siyuan Liang, Bangyan He, Xiaochun Cao

    Abstract: Adversarial attack methods based on point manipulation for 3D point cloud classification have revealed the fragility of 3D models, yet the adversarial examples they produce are easily perceived or defended against. The trade-off between the imperceptibility and adversarial strength leads most point attack methods to inevitably introduce easily detectable outlier points upon a successful attack. An…

    Submitted 8 March, 2024; originally announced March 2024.

    Comments: Accepted by CVPR 2024

  6. arXiv:2403.03631

    cs.LG eess.SY

    Tackling Missing Values in Probabilistic Wind Power Forecasting: A Generative Approach

    Authors: Honglin Wen, Pierre Pinson, Jie Gu, Zhijian Jin

    Abstract: Machine learning techniques have been successfully used in probabilistic wind power forecasting. However, the issue of missing values within datasets due to sensor failure, for instance, has been overlooked for a long time. Although it is natural to consider addressing this issue by imputing missing values before model estimation and forecasting, we suggest treating missing values and forecasting…

    Submitted 6 March, 2024; originally announced March 2024.

    Comments: 8 pages, to be presented at Power Systems Computation Conference (PSCC) 2024

  7. arXiv:2403.02601

    eess.IV cs.CV

    Low-Res Leads the Way: Improving Generalization for Super-Resolution by Self-Supervised Learning

    Authors: Haoyu Chen, Wenbo Li, Jinjin Gu, Jingjing Ren, Haoze Sun, Xueyi Zou, Zhensong Zhang, Youliang Yan, Lei Zhu

    Abstract: For image super-resolution (SR), bridging the gap between the performance on synthetic datasets and real-world degradation scenarios remains a challenge. This work introduces a novel "Low-Res Leads the Way" (LWay) training framework, merging Supervised Pre-training with Self-supervised Learning to enhance the adaptability of SR models to real-world images. Our approach utilizes a low-resolution (L…

    Submitted 4 March, 2024; originally announced March 2024.

    Comments: Accepted to CVPR 2024

  8. arXiv:2401.17450

    quant-ph cs.AR eess.SY

    Qplacer: Frequency-Aware Component Placement for Superconducting Quantum Computers

    Authors: Junyao Zhang, Hanrui Wang, Qi Ding, Jiaqi Gu, Reouven Assouly, William D. Oliver, Song Han, Kenneth R. Brown, Hai "Helen" Li, Yiran Chen

    Abstract: Noisy Intermediate-Scale Quantum (NISQ) computers face a critical limitation in qubit numbers, hindering their progression towards large-scale and fault-tolerant quantum computing. A significant challenge impeding scaling is crosstalk, characterized by unwanted interactions among neighboring components on quantum chips, including qubits, resonators, and substrate. We motivate a general approach to…

    Submitted 8 May, 2024; v1 submitted 30 January, 2024; originally announced January 2024.

  9. arXiv:2311.07596

    cs.SI cs.LG eess.SP

    Graph GOSPA metric: a metric to measure the discrepancy between graphs of different sizes

    Authors: Jinhao Gu, Ángel F. García-Fernández, Robert E. Firth, Lennart Svensson

    Abstract: This paper proposes a metric to measure the dissimilarity between graphs that may have a different number of nodes. The proposed metric extends the generalised optimal subpattern assignment (GOSPA) metric, which is a metric for sets, to graphs. The proposed graph GOSPA metric includes costs associated with node attribute errors for properly assigned nodes, missed and false nodes and edge mismatche…

    Submitted 10 November, 2023; originally announced November 2023.

  10. arXiv:2309.01273

    cs.AR eess.SY

    WindMill: A Parameterized and Pluggable CGRA Implemented by DIAG Design Flow

    Authors: Haojia Hui, Jiangyuan Gu, Xunbo Hu, Yang Hu, Leibo Liu, Shaojun Wei, Shouyi Yin

    Abstract: With the cross-fertilization of applications and the ever-increasing scale of models, the efficiency and productivity of hardware computing architectures have become inadequate. This inadequacy further exacerbates issues in design flexibility, design complexity, development cycle, and development costs (4-d problems) in divergent scenarios. To address these challenges, this paper proposed a flexib…

    Submitted 3 September, 2023; originally announced September 2023.

    Comments: 7 pages, 10 figures

  11. arXiv:2308.02263

    cs.SD cs.CL eess.AS

    Efficient Monaural Speech Enhancement using Spectrum Attention Fusion

    Authors: Jinyu Long, Jetic Gū, Binhao Bai, Zhibo Yang, ** Wei, Junli Li

    Abstract: Speech enhancement is a demanding task in automated speech processing pipelines, focusing on separating clean speech from noisy channels. Transformer-based models have recently bested RNN and CNN models in speech enhancement; however, they are much more computationally expensive and require much more high-quality training data, which is always hard to come by. In this paper, we pre…

    Submitted 4 August, 2023; originally announced August 2023.

  12. arXiv:2305.18107

    cs.CV eess.IV

    Crafting Training Degradation Distribution for the Accuracy-Generalization Trade-off in Real-World Super-Resolution

    Authors: Ruofan Zhang, Jinjin Gu, Haoyu Chen, Chao Dong, Yulun Zhang, Wenming Yang

    Abstract: Super-resolution (SR) techniques designed for real-world applications commonly encounter two primary challenges: generalization performance and restoration accuracy. We demonstrate that when methods are trained using complex, large-range degradations to enhance generalization, a decline in accuracy is inevitable. However, since the degradation in a certain real-world applications typically exhibit…

    Submitted 1 June, 2023; v1 submitted 29 May, 2023; originally announced May 2023.

    Comments: This paper has been accepted to ICML 2023

  13. arXiv:2305.13770

    cs.CV eess.IV

    MIPI 2023 Challenge on Nighttime Flare Removal: Methods and Results

    Authors: Yuekun Dai, Chongyi Li, Shangchen Zhou, Ruicheng Feng, Qingpeng Zhu, Qianhui Sun, Wenxiu Sun, Chen Change Loy, Jinwei Gu

    Abstract: Developing and integrating advanced image sensors with novel algorithms in camera systems are prevalent with the increasing demand for computational photography and imaging on mobile platforms. However, the lack of high-quality data for research and the rare opportunity for in-depth exchange of views from industry and academia constrain the development of mobile intelligent photography and imaging…

    Submitted 23 May, 2023; originally announced May 2023.

    Comments: CVPR 2023 Mobile Intelligent Photography and Imaging (MIPI) Workshop--Nighttime Flare Removal Challenge Report. Website: https://mipi-challenge.org/MIPI2023/

  14. Proportional Fair Scheduling Using Water-Filling Technique for SC-FDMA Based D2D Communication

    Authors: Syed Tariq Shah, Jaheon Gu, Syed Faraz Hasan, Min Young Chung

    Abstract: The resource allocation in SC-FDMA is constrained by the condition that multiple subchannels should be allocated to a single user only if they are adjacent. Therefore, the scheduling scheme of a D2D-cellular system that uses SC-FDMA must also conform to the so-called adjacency constraint. This paper proposes a heuristic algorithm with low computational complexity that applies proportional fair (PF…

    Submitted 2 June, 2023; v1 submitted 13 May, 2023; originally announced May 2023.

  15. arXiv:2304.10551

    eess.IV cs.CV

    MIPI 2023 Challenge on RGBW Remosaic: Methods and Results

    Authors: Qianhui Sun, Qingyu Yang, Chongyi Li, Shangchen Zhou, Ruicheng Feng, Yuekun Dai, Wenxiu Sun, Qingpeng Zhu, Chen Change Loy, Jinwei Gu

    Abstract: Developing and integrating advanced image sensors with novel algorithms in camera systems are prevalent with the increasing demand for computational photography and imaging on mobile platforms. However, the lack of high-quality data for research and the rare opportunity for an in-depth exchange of views from industry and academia constrain the development of mobile intelligent photography and imag…

    Submitted 20 April, 2023; originally announced April 2023.

    Comments: CVPR 2023 Mobile Intelligent Photography and Imaging (MIPI) Workshop--RGBW Sensor Remosaic Challenge Report. Website: https://mipi-challenge.org/MIPI2023/. arXiv admin note: substantial text overlap with arXiv:2209.08471, arXiv:2209.07060, arXiv:2209.07530, arXiv:2304.10089

  16. arXiv:2304.10089

    eess.IV cs.CV

    MIPI 2023 Challenge on RGBW Fusion: Methods and Results

    Authors: Qianhui Sun, Qingyu Yang, Chongyi Li, Shangchen Zhou, Ruicheng Feng, Yuekun Dai, Wenxiu Sun, Qingpeng Zhu, Chen Change Loy, Jinwei Gu

    Abstract: Developing and integrating advanced image sensors with novel algorithms in camera systems are prevalent with the increasing demand for computational photography and imaging on mobile platforms. However, the lack of high-quality data for research and the rare opportunity for an in-depth exchange of views from industry and academia constrain the development of mobile intelligent photography and imag…

    Submitted 24 April, 2023; v1 submitted 20 April, 2023; originally announced April 2023.

    Comments: CVPR 2023 Mobile Intelligent Photography and Imaging (MIPI) Workshop--RGBW Sensor Fusion Challenge Report. Website: https://mipi-challenge.org/MIPI2023/. arXiv admin note: substantial text overlap with arXiv:2209.07530, arXiv:2209.08471, arXiv:2209.07060

  17. arXiv:2304.06019

    cs.CV eess.IV

    Generating Aligned Pseudo-Supervision from Non-Aligned Data for Image Restoration in Under-Display Camera

    Authors: Ruicheng Feng, Chongyi Li, Huaijin Chen, Shuai Li, Jinwei Gu, Chen Change Loy

    Abstract: Due to the difficulty in collecting large-scale and perfectly aligned paired training data for Under-Display Camera (UDC) image restoration, previous methods resort to monitor-based image systems or simulation-based methods, sacrificing the realness of the data and introducing domain gaps. In this work, we revisit the classic stereo setup for training data collection -- capturing two images of the…

    Submitted 12 April, 2023; originally announced April 2023.

    Comments: Accepted by CVPR 2023

  18. arXiv:2303.03640

    cs.LG cs.DC eess.SY

    AHPA: Adaptive Horizontal Pod Autoscaling Systems on Alibaba Cloud Container Service for Kubernetes

    Authors: Zhiqiang Zhou, Chaoli Zhang, Lingna Ma, Jing Gu, Huajie Qian, Qingsong Wen, Liang Sun, Peng Li, Zhimin Tang

    Abstract: The existing resource allocation policy for application instances in Kubernetes cannot adjust dynamically to business requirements, which causes an enormous waste of resources during fluctuations. Moreover, the emergence of new cloud services imposes higher resource management requirements. This paper discusses horizontal POD resources management in Alibaba Cloud Container Servic…

    Submitted 6 March, 2023; originally announced March 2023.

  19. arXiv:2210.11511

    cs.CV cs.LG eess.IV

    Overexposure Mask Fusion: Generalizable Reverse ISP Multi-Step Refinement

    Authors: Jinha Kim, Jun Jiang, Jinwei Gu

    Abstract: With the advent of deep learning methods replacing the ISP in transforming sensor RAW readings into RGB images, numerous methodologies solidified into real-life applications. Equally potent is the task of inverting this process which will have applications in enhancing computational photography tasks that are conducted in the RAW domain, addressing lack of available RAW data while reaping from the…

    Submitted 20 October, 2022; originally announced October 2022.

    Comments: 15 pages, 8 figures, ECCV

  20. arXiv:2210.05960

    eess.IV cs.CV

    Efficient Image Super-Resolution using Vast-Receptive-Field Attention

    Authors: Lin Zhou, Haoming Cai, Jinjin Gu, Zheyuan Li, Yingqi Liu, Xiangyu Chen, Yu Qiao, Chao Dong

    Abstract: The attention mechanism plays a pivotal role in designing advanced super-resolution (SR) networks. In this work, we design an efficient SR network by improving the attention mechanism. We start from a simple pixel attention module and gradually modify it to achieve better super-resolution performance with reduced parameters. The specific approaches include: (1) increasing the receptive field of th…

    Submitted 12 October, 2022; originally announced October 2022.

  21. arXiv:2210.04198

    eess.IV cs.CV

    Super-Resolution by Predicting Offsets: An Ultra-Efficient Super-Resolution Network for Rasterized Images

    Authors: Jinjin Gu, Haoming Cai, Chenyu Dong, Ruofan Zhang, Yulun Zhang, Wenming Yang, Chun Yuan

    Abstract: Rendering high-resolution (HR) graphics brings substantial computational costs. Efficient graphics super-resolution (SR) methods may achieve HR rendering with small computing resources and have attracted extensive research interests in industry and research communities. We present a new method for real-time SR for computer graphics, namely Super-Resolution by Predicting Offsets (SRPO). Our algorit…

    Submitted 9 October, 2022; originally announced October 2022.

    Comments: This article has been accepted by ECCV2022

  22. arXiv:2209.08471

    cs.CV eess.IV

    MIPI 2022 Challenge on RGBW Sensor Re-mosaic: Dataset and Report

    Authors: Qingyu Yang, Guang Yang, Jun Jiang, Chongyi Li, Ruicheng Feng, Shangchen Zhou, Wenxiu Sun, Qingpeng Zhu, Chen Change Loy, Jinwei Gu

    Abstract: Developing and integrating advanced image sensors with novel algorithms in camera systems are prevalent with the increasing demand for computational photography and imaging on mobile platforms. However, the lack of high-quality data for research and the rare opportunity for in-depth exchange of views from industry and academia constrain the development of mobile intelligent photography and imaging…

    Submitted 15 September, 2022; originally announced September 2022.

    Comments: ECCV 2022 Mobile Intelligent Photography and Imaging (MIPI) Workshop--RGBW Sensor Re-mosaic Challenge Report. MIPI workshop website: http://mipi-challenge.org/. arXiv admin note: substantial text overlap with arXiv:2209.07060, arXiv:2209.07530, arXiv:2209.07057

  23. arXiv:2209.07530

    eess.IV cs.CV

    MIPI 2022 Challenge on RGBW Sensor Fusion: Dataset and Report

    Authors: Qingyu Yang, Guang Yang, Jun Jiang, Chongyi Li, Ruicheng Feng, Shangchen Zhou, Wenxiu Sun, Qingpeng Zhu, Chen Change Loy, Jinwei Gu

    Abstract: Developing and integrating advanced image sensors with novel algorithms in camera systems are prevalent with the increasing demand for computational photography and imaging on mobile platforms. However, the lack of high-quality data for research and the rare opportunity for in-depth exchange of views from industry and academia constrain the development of mobile intelligent photography and imaging…

    Submitted 27 September, 2022; v1 submitted 15 September, 2022; originally announced September 2022.

    Comments: ECCV 2022 Mobile Intelligent Photography and Imaging (MIPI) Workshop--RGBW Sensor Fusion Challenge Report. MIPI workshop website: http://mipi-challenge.org/. arXiv admin note: substantial text overlap with arXiv:2209.07060

  24. arXiv:2209.07060

    eess.IV cs.CV

    MIPI 2022 Challenge on Quad-Bayer Re-mosaic: Dataset and Report

    Authors: Qingyu Yang, Guang Yang, Jun Jiang, Chongyi Li, Ruicheng Feng, Shangchen Zhou, Wenxiu Sun, Qingpeng Zhu, Chen Change Loy, Jinwei Gu

    Abstract: Developing and integrating advanced image sensors with novel algorithms in camera systems are prevalent with the increasing demand for computational photography and imaging on mobile platforms. However, the lack of high-quality data for research and the rare opportunity for in-depth exchange of views from industry and academia constrain the development of mobile intelligent photography and imaging…

    Submitted 15 September, 2022; originally announced September 2022.

    Comments: ECCV 2022 Mobile Intelligent Photography and Imaging (MIPI) Workshop--Quad-Bayer Re-mosaic Challenge Report. MIPI workshop website: http://mipi-challenge.org/

  25. arXiv:2209.07052

    eess.IV cs.CV

    MIPI 2022 Challenge on Under-Display Camera Image Restoration: Methods and Results

    Authors: Ruicheng Feng, Chongyi Li, Shangchen Zhou, Wenxiu Sun, Qingpeng Zhu, Jun Jiang, Qingyu Yang, Chen Change Loy, Jinwei Gu

    Abstract: Developing and integrating advanced image sensors with novel algorithms in camera systems are prevalent with the increasing demand for computational photography and imaging on mobile platforms. However, the lack of high-quality data for research and the rare opportunity for in-depth exchange of views from industry and academia constrain the development of mobile intelligent photography and imaging…

    Submitted 23 October, 2022; v1 submitted 15 September, 2022; originally announced September 2022.

    Comments: ECCV 2022 Mobile Intelligent Photography and Imaging (MIPI) Workshop--Under-display Camera Image Restoration Challenge Report. MIPI workshop website: http://mipi-challenge.org/

  26. arXiv:2207.13434

    cs.SD cs.CV cs.MM eess.AS

    End-To-End Audiovisual Feature Fusion for Active Speaker Detection

    Authors: Fiseha B. Tesema, Zheyuan Lin, Shiqiang Zhu, Wei Song, Jason Gu, Hong Wu

    Abstract: Active speaker detection plays a vital role in human-machine interaction. Recently, a few end-to-end audiovisual frameworks emerged. However, the inference time of these models was not explored, and they are not applicable to real-time applications due to their complexity and large input size. In addition, they explored a similar feature extraction strategy that employs the ConvNet on audio and visual inputs…

    Submitted 27 July, 2022; originally announced July 2022.

    Comments: To appear in the proceedings of the Fourteenth International Conference on Digital Image Processing (ICDIP 2022), May 20-23, Wuhan, China, 8 pages, 3 figures

    Journal ref: Proceedings Volume 12342, Fourteenth International Conference on Digital Image Processing (ICDIP 2022); 123422A (2022)

  27. arXiv:2207.12391

    cs.CV eess.IV

    SegPGD: An Effective and Efficient Adversarial Attack for Evaluating and Boosting Segmentation Robustness

    Authors: Jindong Gu, Hengshuang Zhao, Volker Tresp, Philip Torr

    Abstract: Deep neural network-based image classifications are vulnerable to adversarial perturbations. The image classifications can be easily fooled by adding artificial small and imperceptible perturbations to input images. As one of the most effective defense strategies, adversarial training was proposed to address the vulnerability of classification models, where the adversarial examples are created and…

    Submitted 14 August, 2023; v1 submitted 25 July, 2022; originally announced July 2022.

    Journal ref: European Conference on Computer Vision (ECCV), 2022

  28. arXiv:2206.11695

    cs.CV eess.IV

    NTIRE 2022 Challenge on Perceptual Image Quality Assessment

    Authors: Jinjin Gu, Haoming Cai, Chao Dong, Jimmy S. Ren, Radu Timofte

    Abstract: This paper reports on the NTIRE 2022 challenge on perceptual image quality assessment (IQA), held in conjunction with the New Trends in Image Restoration and Enhancement (NTIRE) workshop at CVPR 2022. This challenge is held to address the emerging challenge of IQA by perceptual image processing algorithms. The output images of these algorithms have completely different characteristics fro…

    Submitted 23 June, 2022; originally announced June 2022.

    Comments: This report has been published in CVPR 2022 NTIRE workshop. arXiv admin note: text overlap with arXiv:2105.03072

  29. arXiv:2206.02433

    eess.SY cs.LG stat.AP

    Continuous and Distribution-free Probabilistic Wind Power Forecasting: A Conditional Normalizing Flow Approach

    Authors: Honglin Wen, Pierre Pinson, Jinghuan Ma, Jie Gu, Zhijian Jin

    Abstract: We present a data-driven approach for probabilistic wind power forecasting based on conditional normalizing flow (CNF). In contrast with existing methods, this approach is distribution-free (as for non-parametric and quantile-based approaches) and can directly yield continuous probability densities, hence avoiding quantile crossing. It relies on a base distribution and a set of bijective mappings. Bot…

    Submitted 6 June, 2022; originally announced June 2022.

    Comments: The second revision to IEEE Transactions on Sustainable Energy

  30. arXiv:2205.07019

    cs.CV eess.IV

    Evaluating the Generalization Ability of Super-Resolution Networks

    Authors: Yihao Liu, Hengyuan Zhao, Jinjin Gu, Yu Qiao, Chao Dong

    Abstract: Performance and generalization ability are two important aspects to evaluate the deep learning models. However, research on the generalization ability of Super-Resolution (SR) networks is currently absent. Assessing the generalization ability of deep models not only helps us to understand their intrinsic mechanisms, but also allows us to quantitatively measure their applicability boundaries, which…

    Submitted 3 September, 2023; v1 submitted 14 May, 2022; originally announced May 2022.

    Comments: Accepted by TPAMI

  31. arXiv:2205.05996

    cs.CV eess.IV

    Blueprint Separable Residual Network for Efficient Image Super-Resolution

    Authors: Zheyuan Li, Yingqi Liu, Xiangyu Chen, Haoming Cai, Jinjin Gu, Yu Qiao, Chao Dong

    Abstract: Recent advances in single image super-resolution (SISR) have achieved extraordinary performance, but the computational cost is too heavy to apply in edge devices. To alleviate this problem, many novel and effective solutions have been proposed. Convolutional neural network (CNN) with the attention mechanism has attracted increasing attention due to its efficiency and effectiveness. However, there…

    Submitted 12 May, 2022; originally announced May 2022.

    Comments: Accepted to CVPR Workshops

  32. arXiv:2204.02967

    cs.CL cs.SD eess.AS

    Enhanced Direct Speech-to-Speech Translation Using Self-supervised Pre-training and Data Augmentation

    Authors: Sravya Popuri, Peng-Jen Chen, Changhan Wang, Juan Pino, Yossi Adi, Jiatao Gu, Wei-Ning Hsu, Ann Lee

    Abstract: Direct speech-to-speech translation (S2ST) models suffer from data scarcity issues as there exists little parallel S2ST data, compared to the amount of data available for conventional cascaded systems that consist of automatic speech recognition (ASR), machine translation (MT), and text-to-speech (TTS) synthesis. In this work, we explore self-supervised pre-training with unlabeled speech data and…

    Submitted 13 September, 2022; v1 submitted 6 April, 2022; originally announced April 2022.

    Comments: Accepted to be published in the Proceedings of Interspeech 2022

  33. Wind energy forecasting with missing values within a fully conditional specification framework

    Authors: Honglin Wen, Pierre Pinson, Jie Gu, Zhijian Jin

    Abstract: Wind power forecasting is essential to power system operation and electricity markets. As abundant data became available thanks to the deployment of measurement infrastructures and the democratization of meteorological modelling, extensive data-driven approaches have been developed within both point and probabilistic forecasting frameworks. These models usually assume that the dataset at hand is c…

    Submitted 22 October, 2022; v1 submitted 15 March, 2022; originally announced March 2022.

    Comments: revision to International Journal of Forecasting

  34. arXiv:2203.02118

    cs.RO eess.SY

    OmniWheg: An Omnidirectional Wheel-Leg Transformable Robot

    Authors: Ruixiang Cao, Jun Gu, Chen Yu, Andre Rosendo

    Abstract: This paper presents the design, analysis, and performance evaluation of an omnidirectional transformable wheel-leg robot called OmniWheg. We design a novel mechanism consisting of a separable omni-wheel and 4-bar linkages, allowing the robot to transform between omni-wheeled and legged modes smoothly. In wheeled mode, the robot can move in all directions and efficiently adjust the relative positio…

    Submitted 25 July, 2022; v1 submitted 3 March, 2022; originally announced March 2022.

    Comments: 6 pages, 10 figures, IROS

  35. arXiv:2202.02172

    cs.SI cs.CY eess.SY

    Facebook's Architecture Undermines Vaccine Misinformation Removal Efforts

    Authors: David A. Broniatowski, Jiayan Gu, Amelia M. Jamison, Joseph R. Simons, Lorien C. Abroms

    Abstract: Misinformation promotes distrust in science, undermines public health, and may drive civil unrest. Vaccine misinformation, in particular, has stalled efforts to overcome the COVID-19 pandemic, prompting social media platforms' attempts to reduce it. Some have questioned whether "soft" content moderation remedies -- e.g., flagging and downranking misinformation -- were successful, suggesting that t…

    Submitted 11 August, 2022; v1 submitted 4 February, 2022; originally announced February 2022.

  36. arXiv:2112.08352

    cs.CL cs.AI cs.LG eess.AS

    Textless Speech-to-Speech Translation on Real Data

    Authors: Ann Lee, Hongyu Gong, Paul-Ambroise Duquenne, Holger Schwenk, Peng-Jen Chen, Changhan Wang, Sravya Popuri, Yossi Adi, Juan Pino, Jiatao Gu, Wei-Ning Hsu

    Abstract: We present a textless speech-to-speech translation (S2ST) system that can translate speech from one language into another language and can be built without the need of any text data. Different from existing work in the literature, we tackle the challenge in modeling multi-speaker target speech and train the systems with real-world S2ST data. The key to our approach is a self-supervised unit-based…

    Submitted 4 May, 2022; v1 submitted 15 December, 2021; originally announced December 2021.

    Comments: Accepted to NAACL 2022 (long paper)

  37. arXiv:2111.11368

    cs.CV eess.IV

    Adversarial Examples on Segmentation Models Can be Easy to Transfer

    Authors: Jindong Gu, Hengshuang Zhao, Volker Tresp, Philip Torr

    Abstract: Deep neural network-based image classification can be misled by adversarial examples with small and quasi-imperceptible perturbations. Furthermore, the adversarial examples created on one classification model can also fool another different model. The transferability of the adversarial examples has recently attracted a growing interest since it makes black-box attacks on classification models feas…

    Submitted 22 November, 2021; originally announced November 2021.

  38. arXiv:2111.10659

    cs.CV eess.IV

    Are Vision Transformers Robust to Patch Perturbations?

    Authors: Jindong Gu, Volker Tresp, Yao Qin

    Abstract: Recent advances in Vision Transformer (ViT) have demonstrated its impressive performance in image classification, which makes it a promising alternative to Convolutional Neural Network (CNN). Unlike CNNs, ViT represents an input image as a sequence of image patches. The patch-based input image representation makes the following question interesting: How does ViT perform when individual input image…

    Submitted 18 July, 2022; v1 submitted 20 November, 2021; originally announced November 2021.

    Journal ref: European Conference on Computer Vision (ECCV), 2022

  39. arXiv:2109.06912

    eess.AS cs.CL cs.SD

    fairseq S^2: A Scalable and Integrable Speech Synthesis Toolkit

    Authors: Changhan Wang, Wei-Ning Hsu, Yossi Adi, Adam Polyak, Ann Lee, Peng-Jen Chen, Jiatao Gu, Juan Pino

    Abstract: This paper presents fairseq S^2, a fairseq extension for speech synthesis. We implement a number of autoregressive (AR) and non-AR text-to-speech models, and their multi-speaker variants. To enable training speech synthesis models with less curated data, a number of preprocessing tools are built and their importance is shown empirically. To facilitate faster iteration of development and analysis,…

    Submitted 14 September, 2021; originally announced September 2021.

    Comments: Accepted to EMNLP 2021 Demo

  40. arXiv:2109.04760

    eess.IV cs.CV

    ReconfigISP: Reconfigurable Camera Image Processing Pipeline

    Authors: Ke Yu, Zexian Li, Yue Peng, Chen Change Loy, **wei Gu

    Abstract: Image Signal Processor (ISP) is a crucial component in digital cameras that transforms sensor signals into images for us to perceive and understand. Existing ISP designs always adopt a fixed architecture, e.g., several sequential modules connected in a rigid order. Such a fixed ISP architecture may be suboptimal for real-world applications, where camera sensors, scenes and tasks are diverse. In th…

    Submitted 10 September, 2021; originally announced September 2021.

    Comments: ICCV 2021

  41. Efficient Medical Image Segmentation Based on Knowledge Distillation

    Authors: Dian Qin, Jiajun Bu, Zhe Liu, Xin Shen, Sheng Zhou, **gjun Gu, Zhijua Wang, Lei Wu, Huifen Dai

    Abstract: Recent advances have been made in applying convolutional neural networks to achieve more precise prediction results for medical image segmentation problems. However, the success of existing methods has highly relied on huge computational complexity and massive storage, which is impractical in the real-world scenario. To deal with this problem, we propose an efficient architecture by distilling kno…

    Submitted 23 August, 2021; originally announced August 2021.

    Comments: Accepted by IEEE TMI, Code Available

  42. arXiv:2108.05563  [pdf, other]

    cs.CV eess.IV

    Deep Camera Obscura: An Image Restoration Pipeline for Lensless Pinhole Photography

    Authors: Joshua D. Rego, Huai** Chen, Shuai Li, **wei Gu, Suren Jayasuriya

    Abstract: The lensless pinhole camera is perhaps the earliest and simplest form of an imaging system using only a pinhole-sized aperture in place of a lens. They can capture an infinite depth-of-field and offer greater freedom from optical distortion over their lens-based counterparts. However, the inherent limitations of a pinhole system result in lower sharpness from blur caused by optical diffraction and…

    Submitted 12 August, 2021; originally announced August 2021.

    Comments: 11 pages, 10 figures

  43. arXiv:2107.05604  [pdf, other]

    cs.CL cs.LG eess.AS

    Direct speech-to-speech translation with discrete units

    Authors: Ann Lee, Peng-Jen Chen, Changhan Wang, Jiatao Gu, Sravya Popuri, Xutai Ma, Adam Polyak, Yossi Adi, Qing He, Yun Tang, Juan Pino, Wei-Ning Hsu

    Abstract: We present a direct speech-to-speech translation (S2ST) model that translates speech from one language to speech in another language without relying on intermediate text generation. We tackle the problem by first applying a self-supervised discrete speech encoder on the target speech and then training a sequence-to-sequence speech-to-unit translation (S2UT) model to predict the discrete representa…

    Submitted 21 March, 2022; v1 submitted 12 July, 2021; originally announced July 2021.

    Comments: Accepted to ACL 2022 (long paper)

  44. arXiv:2105.03072  [pdf, other]

    eess.IV cs.CV

    NTIRE 2021 Challenge on Perceptual Image Quality Assessment

    Authors: **** Gu, Haoming Cai, Chao Dong, Jimmy S. Ren, Yu Qiao, Shuhang Gu, Radu Timofte, Manri Cheon, Sungjun Yoon, Byungyeon Kang, Junwoo Lee, Qing Zhang, Haiyang Guo, Yi Bin, Yuqing Hou, Hengliang Luo, **gyu Guo, Zirui Wang, Hai Wang, Wenming Yang, Qingyan Bai, Shuwei Shi, Weihao Xia, Mingdeng Cao, Jiahao Wang , et al. (25 additional authors not shown)

    Abstract: This paper reports on the NTIRE 2021 challenge on perceptual image quality assessment (IQA), held in conjunction with the New Trends in Image Restoration and Enhancement (NTIRE) workshop at CVPR 2021. As a new type of image processing technology, perceptual image processing algorithms based on Generative Adversarial Networks (GAN) have produced images with more realistic textures. These o…

    Submitted 28 June, 2021; v1 submitted 7 May, 2021; originally announced May 2021.

  45. arXiv:2104.09556  [pdf, other]

    cs.CV eess.IV

    Removing Diffraction Image Artifacts in Under-Display Camera via Dynamic Skip Connection Network

    Authors: Ruicheng Feng, Chongyi Li, Huai** Chen, Shuai Li, Chen Change Loy, **wei Gu

    Abstract: Recent development of Under-Display Camera (UDC) systems provides a true bezel-less and notch-free viewing experience on smartphones (and TV, laptops, tablets), while allowing images to be captured from the selfie camera embedded underneath. In a typical UDC system, the microstructure of the semi-transparent organic light-emitting diode (OLED) pixel array attenuates and diffracts the incident ligh…

    Submitted 19 April, 2021; originally announced April 2021.

    Comments: CVPR 2021 camera-ready version

  46. arXiv:2103.10982  [pdf, other]

    eess.IV cs.CV

    HDR Video Reconstruction with Tri-Exposure Quad-Bayer Sensors

    Authors: Yitong Jiang, Inchang Choi, Jun Jiang, **wei Gu

    Abstract: We propose a novel high dynamic range (HDR) video reconstruction method with new tri-exposure quad-bayer sensors. Thanks to the larger number of exposure sets and their spatially uniform deployment over a frame, they are more robust to noise and spatial artifacts than previous spatially varying exposure (SVE) HDR video methods. Nonetheless, the motion blur from longer exposures, the noise from sho…

    Submitted 19 March, 2021; originally announced March 2021.

  47. iToF2dToF: A Robust and Flexible Representation for Data-Driven Time-of-Flight Imaging

    Authors: Felipe Gutierrez-Barragan, Huai** Chen, Mohit Gupta, Andreas Velten, **wei Gu

    Abstract: Indirect Time-of-Flight (iToF) cameras are a promising depth sensing technology. However, they are prone to errors caused by multi-path interference (MPI) and low signal-to-noise ratio (SNR). Traditional methods, after denoising, mitigate MPI by estimating a transient image that encodes depths. Recently, data-driven methods that jointly denoise and mitigate MPI have become state-of-the-art without…

    Submitted 21 December, 2021; v1 submitted 11 March, 2021; originally announced March 2021.

    Comments: 35 pages

  48. arXiv:2011.15002  [pdf, other]

    eess.IV cs.CV

    Image Quality Assessment for Perceptual Image Restoration: A New Dataset, Benchmark and Metric

    Authors: **** Gu, Haoming Cai, Haoyu Chen, Xiaoxing Ye, Jimmy Ren, Chao Dong

    Abstract: Image quality assessment (IQA) is the key factor for the fast development of image restoration (IR) algorithms. The most recent perceptual IR algorithms based on generative adversarial networks (GANs) have brought in significant improvement on visual performance, but also pose great challenges for quantitative evaluation. Notably, we observe an increasing inconsistency between perceptual quality a…

    Submitted 30 November, 2020; originally announced November 2020.

    Comments: arXiv admin note: substantial text overlap with arXiv:2007.12142

  49. arXiv:2011.04994  [pdf, other]

    cs.CV eess.IV

    AIM 2020 Challenge on Learned Image Signal Processing Pipeline

    Authors: Andrey Ignatov, Radu Timofte, Zhilu Zhang, Ming Liu, Haolin Wang, Wangmeng Zuo, Jiawei Zhang, Ruimao Zhang, Zhanglin Peng, Sijie Ren, Linhui Dai, Xiaohong Liu, Chengqi Li, Jun Chen, Yuichi Ito, Bhavya Vasudeva, Puneesh Deora, Umapada Pal, Zhenyu Guo, Yu Zhu, Tian Liang, Chenghua Li, Cong Leng, Zhihong Pan, Baopu Li , et al. (14 additional authors not shown)

    Abstract: This paper reviews the second AIM learned ISP challenge and provides the description of the proposed solutions and results. The participating teams were solving a real-world RAW-to-RGB mapping problem, where the goal was to map the original low-quality RAW images captured by the Huawei P20 device to the same photos obtained with the Canon 5D DSLR camera. The considered task embraced a number of com…

    Submitted 10 November, 2020; originally announced November 2020.

    Comments: Published in ECCV 2020 Workshops (Advances in Image Manipulation), https://data.vision.ee.ethz.ch/cvl/aim20/

  50. arXiv:2011.00747  [pdf, other]

    cs.CL cs.SD eess.AS

    Dual-decoder Transformer for Joint Automatic Speech Recognition and Multilingual Speech Translation

    Authors: Hang Le, Juan Pino, Changhan Wang, Jiatao Gu, Didier Schwab, Laurent Besacier

    Abstract: We introduce dual-decoder Transformer, a new model architecture that jointly performs automatic speech recognition (ASR) and multilingual speech translation (ST). Our models are based on the original Transformer architecture (Vaswani et al., 2017) but consist of two decoders, each responsible for one task (ASR or ST). Our major contribution lies in how these decoders interact with each other: one…

    Submitted 1 November, 2020; originally announced November 2020.

    Comments: Accepted at COLING 2020 (Oral)

    Journal ref: The 28th International Conference on Computational Linguistics (COLING 2020)