
Showing 1–50 of 61 results for author: Chao, F

  1. arXiv:2404.12903

    cs.MM

    ConCLVD: Controllable Chinese Landscape Video Generation via Diffusion Model

    Authors: Dingming Liu, Shaowei Li, Ruoyan Zhou, Lili Liang, Yongguan Hong, Fei Chao, Rongrong Ji

    Abstract: Chinese landscape painting is a gem of Chinese cultural and artistic heritage that showcases the splendor of nature through the deep observations and imaginations of its painters. Limited by traditional techniques, these artworks were confined to static imagery in ancient times, leaving the dynamism of landscapes and the subtleties of artistic sentiment to the viewer's imagination. Recently, emerg…

    Submitted 19 April, 2024; originally announced April 2024.

  2. arXiv:2404.11064

    cs.CV cs.AI

    Rethinking 3D Dense Caption and Visual Grounding in A Unified Framework through Prompt-based Localization

    Authors: Yongdong Luo, Haojia Lin, Xiawu Zheng, Yigeng Jiang, Fei Chao, Jie Hu, Guannan Jiang, Songan Zhang, Rongrong Ji

    Abstract: 3D Visual Grounding (3DVG) and 3D Dense Captioning (3DDC) are two crucial tasks in various 3D applications, which require both shared and complementary information in localization and visual-language relationships. Therefore, existing approaches adopt the two-stage "detect-then-describe/discriminate" pipeline, which relies heavily on the performance of the detector, resulting in suboptimal perform…

    Submitted 17 April, 2024; originally announced April 2024.

  3. arXiv:2404.07847

    cs.CV

    The Effectiveness of a Simplified Model Structure for Crowd Counting

    Authors: Lei Chen, Xinghang Gao, Fei Chao, Xiang Chang, Chih Min Lin, Xingen Gao, Shaopeng Lin, Hongyi Zhang, Juqiang Lin

    Abstract: In the field of crowd counting research, many recent deep learning based methods have demonstrated robust capabilities for accurately estimating crowd sizes. However, the enhancement in their performance often arises from an increase in the complexity of the model structure. This paper discusses how to construct high-performance crowd counting models using only simple structures. We propose the F…

    Submitted 18 June, 2024; v1 submitted 11 April, 2024; originally announced April 2024.

  4. arXiv:2404.07575

    cs.SD cs.AI eess.AS

    An Effective Automated Speaking Assessment Approach to Mitigating Data Scarcity and Imbalanced Distribution

    Authors: Tien-Hong Lo, Fu-An Chao, Tzu-I Wu, Yao-Ting Sung, Berlin Chen

    Abstract: Automated speaking assessment (ASA) typically involves automatic speech recognition (ASR) and hand-crafted feature extraction from the ASR transcript of a learner's speech. Recently, self-supervised learning (SSL) has shown stellar performance compared to traditional methods. However, SSL-based ASA systems are faced with at least three data-related challenges: limited annotated data, uneven distri…

    Submitted 11 April, 2024; v1 submitted 11 April, 2024; originally announced April 2024.

    Comments: Accepted to NAACL 2024 Findings

  5. arXiv:2403.12544

    cs.LG

    AffineQuant: Affine Transformation Quantization for Large Language Models

    Authors: Yuexiao Ma, Huixia Li, Xiawu Zheng, Feng Ling, Xuefeng Xiao, Rui Wang, Shilei Wen, Fei Chao, Rongrong Ji

    Abstract: The significant resource requirements associated with Large-scale Language Models (LLMs) have generated considerable interest in the development of techniques aimed at compressing and accelerating neural networks. Among these techniques, Post-Training Quantization (PTQ) has emerged as a subject of considerable interest due to its noteworthy compression efficiency and cost-effectiveness in the cont…

    Submitted 19 March, 2024; originally announced March 2024.

    Comments: ICLR 2024

  6. arXiv:2402.12419

    cs.LG cs.AI cs.CL

    EBFT: Effective and Block-Wise Fine-Tuning for Sparse LLMs

    Authors: Song Guo, Fan Wu, Lei Zhang, Xiawu Zheng, Shengchuan Zhang, Fei Chao, Yiyu Shi, Rongrong Ji

    Abstract: Existing methods for fine-tuning sparse LLMs often suffer from resource-intensive requirements and high retraining costs. Additionally, many fine-tuning methods often rely on approximations or heuristic optimization strategies, which may lead to suboptimal solutions. To address these issues, we propose an efficient and fast framework for fine-tuning sparse LLMs based on minimizing reconstruction e…

    Submitted 19 February, 2024; originally announced February 2024.

  7. arXiv:2401.13221

    cs.CV

    Unified-Width Adaptive Dynamic Network for All-In-One Image Restoration

    Authors: Yimin Xu, Nanxi Gao, Zhongyun Shan, Fei Chao, Rongrong Ji

    Abstract: In contrast to traditional image restoration methods, all-in-one image restoration techniques are gaining increased attention for their ability to restore images affected by diverse and unknown corruption types and levels. However, contemporary all-in-one image restoration methods omit task-wise difficulties and employ the same networks to reconstruct images afflicted by diverse degradations. This…

    Submitted 23 January, 2024; originally announced January 2024.

  8. arXiv:2401.02719

    cs.CV cs.AI

    Learning Image Demoireing from Unpaired Real Data

    Authors: Yunshan Zhong, Yuyao Zhou, Yuxin Zhang, Fei Chao, Rongrong Ji

    Abstract: This paper focuses on addressing the issue of image demoireing. Unlike the large volume of existing studies that rely on learning from paired real data, we attempt to learn a demoireing model from unpaired real data, i.e., moire images associated with irrelevant clean images. The proposed method, referred to as Unpaired Demoireing (UnDeM), synthesizes pseudo moire images from unpaired datasets, ge…

    Submitted 5 January, 2024; originally announced January 2024.

    Comments: AAAI2024

  9. arXiv:2312.05598

    cs.LG

    Boosting the Cross-Architecture Generalization of Dataset Distillation through an Empirical Study

    Authors: Lirui Zhao, Yuxin Zhang, Fei Chao, Rongrong Ji

    Abstract: The poor cross-architecture generalization of dataset distillation greatly weakens its practical significance. This paper attempts to mitigate this issue through an empirical study, which suggests that the synthetic datasets undergo an inductive bias towards the distillation model. Therefore, the evaluation model is strictly confined to having similar architectures of the distillation model. We pr…

    Submitted 26 June, 2024; v1 submitted 9 December, 2023; originally announced December 2023.

  10. arXiv:2309.10438

    cs.CV

    AutoDiffusion: Training-Free Optimization of Time Steps and Architectures for Automated Diffusion Model Acceleration

    Authors: Lijiang Li, Huixia Li, Xiawu Zheng, Jie Wu, Xuefeng Xiao, Rui Wang, Min Zheng, Xin Pan, Fei Chao, Rongrong Ji

    Abstract: Diffusion models are emerging expressive generative models, in which a large number of time steps (inference steps) are required for a single image generation. To accelerate such a tedious process, reducing steps uniformly is considered as an undisputed principle of diffusion models. We consider that such a uniform assumption is not the optimal solution in practice; i.e., we can find different optim…

    Submitted 23 September, 2023; v1 submitted 19 September, 2023; originally announced September 2023.
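
    The uniform step-reduction baseline that this abstract questions can be pictured as keeping evenly spaced time steps from a long sampling schedule. The sketch below assumes a 1000-step schedule; the helper name `uniform_timesteps` is hypothetical, and the code is not AutoDiffusion's training-free search.

```python
def uniform_timesteps(total_steps: int, num_steps: int) -> list[int]:
    """Pick `num_steps` evenly spaced time steps out of `total_steps`.

    Returned from noisiest to cleanest, the order a sampler visits them.
    """
    if not 0 < num_steps <= total_steps:
        raise ValueError("num_steps must be between 1 and total_steps")
    stride = total_steps // num_steps
    # 0, stride, 2*stride, ...; truncate in case the range yields extra steps.
    steps = list(range(0, total_steps, stride))[:num_steps]
    return steps[::-1]


print(uniform_timesteps(1000, 10))
# [900, 800, 700, 600, 500, 400, 300, 200, 100, 0]
```

    A non-uniform schedule would instead return an irregular subset of the same indices; the abstract argues that such choices can outperform the uniform one.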

  11. arXiv:2308.11887

    cs.CV

    A Unified Framework for 3D Point Cloud Visual Grounding

    Authors: Haojia Lin, Yongdong Luo, Xiawu Zheng, Lijiang Li, Fei Chao, Taisong Jin, Donghao Luo, Yan Wang, Liujuan Cao, Rongrong Ji

    Abstract: Thanks to its precise spatial referencing, 3D point cloud visual grounding is essential for deep understanding and dynamic interaction in 3D environments, encompassing 3D Referring Expression Comprehension (3DREC) and Segmentation (3DRES). We argue that 3DREC and 3DRES should be unified in one framework, which is also a natural progression in the community. To explain, 3DREC helps 3DRES locate the…

    Submitted 20 November, 2023; v1 submitted 22 August, 2023; originally announced August 2023.

  12. arXiv:2306.05612

    cs.CV

    Spatial Re-parameterization for N:M Sparsity

    Authors: Yuxin Zhang, Mingbao Lin, Yunshan Zhong, Mengzhao Chen, Fei Chao, Rongrong Ji

    Abstract: This paper presents a Spatial Re-parameterization (SpRe) method for the N:M sparsity in CNNs. SpRe stems from an observation regarding the restricted variety in spatial sparsity present in N:M sparsity compared with unstructured sparsity. Particularly, N:M sparsity exhibits a fixed sparsity rate within the spatial domains due to its distinctive pattern that mandates N non-zero components amon…

    Submitted 8 June, 2023; originally announced June 2023.

    Comments: 11 pages, 4 figures

  13. arXiv:2305.18146

    eess.AS cs.SD eess.SP

    A Hierarchical Context-aware Modeling Approach for Multi-aspect and Multi-granular Pronunciation Assessment

    Authors: Fu-An Chao, Tien-Hong Lo, Tzu-I Wu, Yao-Ting Sung, Berlin Chen

    Abstract: Automatic Pronunciation Assessment (APA) plays a vital role in Computer-assisted Pronunciation Training (CAPT) when evaluating a second language (L2) learner's speaking proficiency. However, an apparent downside of most de facto methods is that they parallelize the modeling process throughout different speech granularities without accounting for the hierarchical and local contextual relationships…

    Submitted 7 June, 2023; v1 submitted 29 May, 2023; originally announced May 2023.

    Comments: Accepted to Interspeech 2023

  14. arXiv:2305.17997

    cs.CV

    DiffRate : Differentiable Compression Rate for Efficient Vision Transformers

    Authors: Mengzhao Chen, Wenqi Shao, Peng Xu, Mingbao Lin, Kaipeng Zhang, Fei Chao, Rongrong Ji, Yu Qiao, Ping Luo

    Abstract: Token compression aims to speed up large-scale vision transformers (e.g. ViTs) by pruning (dropping) or merging tokens. It is an important but challenging task. Although recent advanced approaches achieved great success, they need to carefully handcraft a compression rate (i.e. number of tokens to remove), which is tedious and leads to sub-optimal performance. To tackle this problem, we propose Di…

    Submitted 29 May, 2023; originally announced May 2023.

    Comments: 16 pages, 8 figures, 13 tables
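
    To make the two operations named in this abstract concrete, the sketch below drops the lowest-scoring tokens for a hand-picked number of tokens to remove (the compression rate DiffRate aims to learn instead), plus a merging variant that averages the dropped tokens into one. Where the per-token scores come from (for example class-token attention) is an assumption left to the caller, and `prune_tokens` and `merge_tokens` are hypothetical names, not DiffRate's API.

```python
from typing import Sequence

Token = list[float]


def prune_tokens(tokens: Sequence[Token], scores: Sequence[float],
                 num_remove: int) -> list[Token]:
    """Drop the `num_remove` lowest-scoring tokens, keeping sequence order."""
    if not 0 <= num_remove <= len(tokens):
        raise ValueError("num_remove out of range")
    order = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)
    kept = sorted(order[: len(tokens) - num_remove])  # restore original order
    return [list(tokens[i]) for i in kept]


def merge_tokens(tokens: Sequence[Token], scores: Sequence[float],
                 num_remove: int) -> list[Token]:
    """Like prune_tokens, but average the removed tokens into one extra token."""
    kept = prune_tokens(tokens, scores, num_remove)
    order = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)
    dropped = [tokens[i] for i in order[len(tokens) - num_remove:]]
    if dropped:
        kept.append([sum(col) / len(dropped) for col in zip(*dropped)])
    return kept


tokens = [[1.0, 0.0], [2.0, 1.0], [3.0, 0.0], [4.0, 1.0]]
scores = [0.9, 0.1, 0.5, 0.3]
print(prune_tokens(tokens, scores, 2))  # [[1.0, 0.0], [3.0, 0.0]]
print(merge_tokens(tokens, scores, 2))  # [[1.0, 0.0], [3.0, 0.0], [3.0, 1.0]]
```

    Merging keeps some information from the removed tokens at the cost of one extra token; in both functions `num_remove` is exactly the knob the abstract says is tedious to hand-tune.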

  15. arXiv:2305.08117

    cs.CV

    MBQuant: A Novel Multi-Branch Topology Method for Arbitrary Bit-width Network Quantization

    Authors: Yunshan Zhong, Yuyao Zhou, Fei Chao, Rongrong Ji

    Abstract: Arbitrary bit-width network quantization has received significant attention due to its high adaptability to various bit-width requirements during runtime. However, in this paper, we investigate existing methods and observe a significant accumulation of quantization errors caused by switching weight and activation bit-widths, leading to limited performance. To address this issue, we propose MBQuan…

    Submitted 2 June, 2024; v1 submitted 14 May, 2023; originally announced May 2023.

  16. arXiv:2305.05888

    cs.CV

    Distribution-Flexible Subset Quantization for Post-Quantizing Super-Resolution Networks

    Authors: Yunshan Zhong, Mingbao Lin, Jingjing Xie, Yuxin Zhang, Fei Chao, Rongrong Ji

    Abstract: This paper introduces Distribution-Flexible Subset Quantization (DFSQ), a post-training quantization method for super-resolution networks. Our motivation for developing DFSQ is based on the distinctive activation distributions of current super-resolution models, which exhibit significant variance across samples and channels. To address this issue, DFSQ conducts channel-wise normalization of the ac…

    Submitted 12 May, 2023; v1 submitted 10 May, 2023; originally announced May 2023.

  17. arXiv:2303.11906

    cs.CV

    Solving Oscillation Problem in Post-Training Quantization Through a Theoretical Perspective

    Authors: Yuexiao Ma, Huixia Li, Xiawu Zheng, Xuefeng Xiao, Rui Wang, Shilei Wen, Xin Pan, Fei Chao, Rongrong Ji

    Abstract: Post-training quantization (PTQ) is widely regarded as one of the most efficient compression methods practically, benefitting from its data privacy and low computation costs. We argue that an overlooked problem of oscillation is in the PTQ methods. In this paper, we take the initiative to explore and present a theoretical proof to explain why such a problem is essential in PTQ. And then, we try to…

    Submitted 4 April, 2023; v1 submitted 21 March, 2023; originally announced March 2023.

    Comments: Accepted by CVPR 2023
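
    For readers new to PTQ, the sketch below shows the simplest post-training quantizer: an asymmetric min-max uniform quantizer calibrated once from the values themselves, with no retraining. It is a generic baseline, not the oscillation-aware method of this paper, and the function names are hypothetical.

```python
def quantize_minmax(values: list[float], num_bits: int = 8):
    """Asymmetric uniform quantization calibrated from min/max of `values`.

    Returns integer codes in [0, 2**num_bits - 1], the scale and the zero point.
    """
    # Widen the range to include 0 so that 0.0 maps exactly to an integer code.
    lo, hi = min(min(values), 0.0), max(max(values), 0.0)
    qmax = 2 ** num_bits - 1
    scale = (hi - lo) / qmax if hi > lo else 1.0
    zero_point = round(-lo / scale)
    codes = [min(max(round(v / scale) + zero_point, 0), qmax) for v in values]
    return codes, scale, zero_point


def dequantize(codes: list[int], scale: float, zero_point: int) -> list[float]:
    return [(c - zero_point) * scale for c in codes]


weights = [-1.0, -0.3, 0.0, 0.45, 1.0]
for bits in (8, 4, 2):
    codes, scale, zp = quantize_minmax(weights, bits)
    restored = dequantize(codes, scale, zp)
    err = max(abs(w - r) for w, r in zip(weights, restored))
    print(f"{bits}-bit: scale={scale:.4f}, max abs error={err:.4f}")
```

    The only calibration here is reading a min and a max, which is why PTQ is cheap; the growing error at low bit-widths is the accuracy gap that more elaborate PTQ methods try to close.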

  18. arXiv:2302.06058

    cs.CV

    Bi-directional Masks for Efficient N:M Sparse Training

    Authors: Yuxin Zhang, Yiting Luo, Mingbao Lin, Yunshan Zhong, Jingjing Xie, Fei Chao, Rongrong Ji

    Abstract: We focus on addressing the dense backward propagation issue for training efficiency of N:M fine-grained sparsity that preserves at most N out of M consecutive weights and achieves practical speedups supported by the N:M sparse tensor core. Therefore, we present a novel method of Bi-directional Masks (Bi-Mask) with its two central innovations in: 1) Separate sparse masks in the two directions of fo…

    Submitted 12 February, 2023; originally announced February 2023.

    Comments: 10 pages, 4 figures

  19. arXiv:2302.02184

    cs.CV

    Real-Time Image Demoireing on Mobile Devices

    Authors: Yuxin Zhang, Mingbao Lin, Xunchao Li, Han Liu, Guozhi Wang, Fei Chao, Shuai Ren, Yafei Wen, Xiaoxin Chen, Rongrong Ji

    Abstract: Moire patterns appear frequently when taking photos of digital screens, drastically degrading the image quality. Despite the advance of CNNs in image demoireing, existing networks have heavy designs, causing redundant computation burden for mobile devices. In this paper, we launch the first study on accelerating demoireing networks and propose a dynamic demoireing acceleration method (DDA) towa…

    Submitted 4 February, 2023; originally announced February 2023.

    Comments: To appear in the eleventh International Conference on Learning Representations (ICLR 2023)

  20. arXiv:2212.14169

    cs.CV

    Discriminator-Cooperated Feature Map Distillation for GAN Compression

    Authors: Tie Hu, Mingbao Lin, Lizhou You, Fei Chao, Rongrong Ji

    Abstract: Despite excellent performance in image generation, Generative Adversarial Networks (GANs) are notorious for their requirements of enormous storage and intensive computation. As an awesome "performance maker", knowledge distillation is demonstrated to be particularly efficacious in exploring low-priced GANs. In this paper, we investigate the irreplaceability of teacher discriminator and present an…

    Submitted 28 December, 2022; originally announced December 2022.

  21. arXiv:2212.12977

    cs.CV

    SMMix: Self-Motivated Image Mixing for Vision Transformers

    Authors: Mengzhao Chen, Mingbao Lin, ZhiHang Lin, Yuxin Zhang, Fei Chao, Rongrong Ji

    Abstract: CutMix is a vital augmentation strategy that determines the performance and generalization ability of vision transformers (ViTs). However, the inconsistency between the mixed images and the corresponding labels harms its efficacy. Existing CutMix variants tackle this problem by generating more consistent mixed images or more precise mixed labels, but inevitably introduce heavy training overhead or…

    Submitted 16 March, 2023; v1 submitted 25 December, 2022; originally announced December 2022.
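
    The standard CutMix recipe that SMMix starts from can be sketched on single-channel images stored as nested lists: draw λ from Beta(α, α), cut a box covering roughly 1 - λ of the image from one image, paste it into the other, and mix the labels by the area actually pasted. This is plain CutMix, not SMMix's self-motivated mixing, and the `cutmix` signature is hypothetical.

```python
import math
import random


def cutmix(img_a, img_b, label_a, label_b, alpha=1.0, rng=None):
    """Paste a random box from img_b into a copy of img_a; mix labels by area.

    Images are equal-sized 2-D lists (H x W); labels are equal-length lists.
    """
    if rng is None:
        rng = random.Random()
    h, w = len(img_a), len(img_a[0])
    lam = rng.betavariate(alpha, alpha)
    # Box side lengths chosen so the box covers roughly (1 - lam) of the image.
    cut_ratio = math.sqrt(1.0 - lam)
    cut_h, cut_w = int(h * cut_ratio), int(w * cut_ratio)
    cy, cx = rng.randrange(h), rng.randrange(w)
    y0, y1 = max(cy - cut_h // 2, 0), min(cy + cut_h // 2, h)
    x0, x1 = max(cx - cut_w // 2, 0), min(cx + cut_w // 2, w)
    mixed = [row[:] for row in img_a]  # leave the input untouched
    for y in range(y0, y1):
        mixed[y][x0:x1] = img_b[y][x0:x1]
    # Recompute the mixing weight from the area actually pasted after clipping.
    lam_adj = 1.0 - (y1 - y0) * (x1 - x0) / (h * w)
    label = [lam_adj * a + (1.0 - lam_adj) * b for a, b in zip(label_a, label_b)]
    return mixed, label


a = [[0] * 8 for _ in range(8)]
b = [[1] * 8 for _ in range(8)]
mixed, label = cutmix(a, b, [1.0, 0.0], [0.0, 1.0], rng=random.Random(0))
print(label)
```

    Because the label weight depends only on the pasted area, a box that happens to cover background still shifts the label toward the other class; this is one way to read the image/label inconsistency the abstract mentions.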

  22. arXiv:2212.11091

    cs.CV

    Exploring Content Relationships for Distilling Efficient GANs

    Authors: Lizhou You, Mingbao Lin, Tie Hu, Fei Chao, Rongrong Ji

    Abstract: This paper proposes a content relationship distillation (CRD) to tackle the over-parameterized generative adversarial networks (GANs) for the serviceability in cutting-edge devices. In contrast to traditional instance-level distillation, we design a novel GAN compression oriented knowledge by slicing the contents of teacher outputs into multiple fine-grained granularities, such as row/column strip…

    Submitted 21 December, 2022; originally announced December 2022.

  23. arXiv:2212.04108

    cs.CV

    Shadow Removal by High-Quality Shadow Synthesis

    Authors: Yunshan Zhong, Lizhou You, Yuxin Zhang, Fei Chao, Yonghong Tian, Rongrong Ji

    Abstract: Most shadow removal methods rely on the invasion of training images associated with laborious and lavish shadow region annotations, leading to the increasing popularity of shadow image synthesis. However, the poor performance also stems from these synthesized images since they are often shadow-inauthentic and details-impaired. In this paper, we present a novel generation framework, referred to as…

    Submitted 8 July, 2023; v1 submitted 8 December, 2022; originally announced December 2022.

  24. arXiv:2211.14462

    cs.CV

    Meta Architecture for Point Cloud Analysis

    Authors: Haojia Lin, Xiawu Zheng, Lijiang Li, Fei Chao, Shanshan Wang, Yan Wang, Yonghong Tian, Rongrong Ji

    Abstract: Recent advances in 3D point cloud analysis bring a diverse set of network architectures to the field. However, the lack of a unified framework to interpret those networks makes any systematic comparison, contrast, or analysis challenging, and practically limits healthy development of the field. In this paper, we take the initiative to explore and propose a unified framework called PointMeta, to wh…

    Submitted 13 March, 2023; v1 submitted 25 November, 2022; originally announced November 2022.

  25. arXiv:2211.08544

    cs.CV

    Exploiting the Partly Scratch-off Lottery Ticket for Quantization-Aware Training

    Authors: Yunshan Zhong, Gongrui Nan, Yuxin Zhang, Fei Chao, Rongrong Ji

    Abstract: Quantization-aware training (QAT) receives extensive popularity as it well retains the performance of quantized networks. In QAT, the contemporary experience is that all quantized weights are updated for an entire training process. In this paper, this experience is challenged based on an interesting phenomenon we observed. Specifically, a large portion of quantized weights reaches the optimal quan…

    Submitted 25 July, 2023; v1 submitted 12 November, 2022; originally announced November 2022.

  26. arXiv:2208.13039

    cs.CV

    LAB-Net: LAB Color-Space Oriented Lightweight Network for Shadow Removal

    Authors: Hong Yang, Gongrui Nan, Mingbao Lin, Fei Chao, Yunhang Shen, Ke Li, Rongrong Ji

    Abstract: This paper focuses on the limitations of current over-parameterized shadow removal models. We present a novel lightweight deep neural network that processes shadow images in the LAB color space. The proposed network, termed "LAB-Net", is motivated by the following three observations: First, the LAB color space can well separate the luminance information and color properties. Second, sequentially-st…

    Submitted 4 September, 2022; v1 submitted 27 August, 2022; originally announced August 2022.

    Comments: 10 pages, 6 figures, 29 references
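
    The first observation, that LAB separates luminance (L*) from color (a*, b*), can be checked with the standard sRGB to CIELAB conversion under a D65 white point, sketched below with the commonly published sRGB-to-XYZ matrix. This is only the color-space transform that LAB-Net's input lives in, not the network itself; `srgb_to_lab` is a hypothetical helper.

```python
def _linearize(c: float) -> float:
    """Undo the sRGB transfer curve (input in [0, 1])."""
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def _f(t: float) -> float:
    """CIELAB companding function."""
    delta = 6 / 29
    return t ** (1 / 3) if t > delta ** 3 else t / (3 * delta ** 2) + 4 / 29


def srgb_to_lab(r: float, g: float, b: float) -> tuple[float, float, float]:
    """Convert one sRGB pixel (components in [0, 1]) to CIELAB (D65)."""
    rl, gl, bl = _linearize(r), _linearize(g), _linearize(b)
    x = 0.4124564 * rl + 0.3575761 * gl + 0.1804375 * bl
    y = 0.2126729 * rl + 0.7151522 * gl + 0.0721750 * bl
    z = 0.0193339 * rl + 0.1191920 * gl + 0.9503041 * bl
    xn, yn, zn = 0.95047, 1.0, 1.08883  # D65 reference white
    fx, fy, fz = _f(x / xn), _f(y / yn), _f(z / zn)
    return 116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz)


# Grays of different brightness differ almost only in L*; a* and b* stay near 0.
for v in (0.2, 0.5, 0.8):
    L, a, b = srgb_to_lab(v, v, v)
    print(f"gray {v}: L*={L:6.2f} a*={a:+.3f} b*={b:+.3f}")
print("red:", [round(c, 1) for c in srgb_to_lab(1.0, 0.0, 0.0)])
```

    A shadow mostly darkens a region, so under this reading it shows up largely in the L* channel while a* and b* change less, which is the property the abstract's first observation relies on.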

  27. arXiv:2208.09110

    cs.SD eess.AS eess.SP

    3M: An Effective Multi-view, Multi-granularity, and Multi-aspect Modeling Approach to English Pronunciation Assessment

    Authors: Fu-An Chao, Tien-Hong Lo, Tzu-I Wu, Yao-Ting Sung, Berlin Chen

    Abstract: As an indispensable ingredient of computer-assisted pronunciation training (CAPT), automatic pronunciation assessment (APA) plays a pivotal role in aiding self-directed language learners by providing multi-aspect and timely feedback. However, there are at least two potential obstacles that might hinder its performance for practical use. On one hand, most of the studies focus exclusively on leverag…

    Submitted 11 September, 2022; v1 submitted 18 August, 2022; originally announced August 2022.

    Comments: Accepted to APSIPA ASC 2022

  28. arXiv:2206.06662

    cs.LG cs.CV

    Learning Best Combination for Efficient N:M Sparsity

    Authors: Yuxin Zhang, Mingbao Lin, Zhihang Lin, Yiting Luo, Ke Li, Fei Chao, Yongjian Wu, Rongrong Ji

    Abstract: By forcing at most N out of M consecutive weights to be non-zero, the recent N:M network sparsity has received increasing attention for its two attractive advantages: 1) Promising performance at a high sparsity. 2) Significant speedups on NVIDIA A100 GPUs. Recent studies require an expensive pre-training phase or a heavy dense-gradient computation. In this paper, we show that the N:M learning can…

    Submitted 7 October, 2022; v1 submitted 14 June, 2022; originally announced June 2022.

    Comments: Accepted by 36th Conference on Neural Information Processing Systems (NeurIPS 2022)
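
    The N:M constraint itself is easy to state in code. The sketch below builds a mask by keeping the N largest-magnitude weights in every group of M consecutive weights (2:4 by default). Selecting by magnitude is an assumption for illustration only, since this paper's point is to learn the best combination rather than pick by magnitude; `nm_mask` is a hypothetical name.

```python
def nm_mask(weights: list[float], n: int = 2, m: int = 4) -> list[int]:
    """Return a 0/1 mask keeping at most `n` of every `m` consecutive weights."""
    if len(weights) % m != 0:
        raise ValueError("length must be a multiple of m")
    mask: list[int] = []
    for start in range(0, len(weights), m):
        group = weights[start:start + m]
        # Indices (within the group) of the n largest magnitudes.
        keep = set(sorted(range(m), key=lambda i: abs(group[i]), reverse=True)[:n])
        mask.extend(1 if i in keep else 0 for i in range(m))
    return mask


w = [0.1, -0.9, 0.3, 0.05, 2.0, -0.2, 0.0, 1.5]
print(nm_mask(w))  # [0, 1, 1, 0, 1, 0, 0, 1]
```

    Every group of four ends up with exactly two non-zeros, the fixed pattern that hardware such as the A100 sparse tensor cores mentioned in the abstract can exploit.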

  29. arXiv:2205.04908

    cs.CV cs.MM

    Shadow-Aware Dynamic Convolution for Shadow Removal

    Authors: Yimin Xu, Mingbao Lin, Hong Yang, Fei Chao, Rongrong Ji

    Abstract: With a wide range of shadows in many collected images, shadow removal has aroused increasing attention since uncontaminated images are of vital importance for many downstream multimedia tasks. Current methods consider the same convolution operations for both shadow and non-shadow regions while ignoring the large gap between the color mappings for the shadow region and the non-shadow region, leadin…

    Submitted 29 August, 2022; v1 submitted 10 May, 2022; originally announced May 2022.

  30. arXiv:2203.03844

    eess.IV cs.CV

    Dynamic Dual Trainable Bounds for Ultra-low Precision Super-Resolution Networks

    Authors: Yunshan Zhong, Mingbao Lin, Xunchao Li, Ke Li, Yunhang Shen, Fei Chao, Yongjian Wu, Rongrong Ji

    Abstract: Light-weight super-resolution (SR) models have received considerable attention for their serviceability in mobile devices. Many efforts employ network quantization to compress SR models. However, these methods suffer from severe performance degradation when quantizing the SR models to ultra-low precision (e.g., 2-bit and 3-bit) with the low-cost layer-wise quantizer. In this paper, we identify tha…

    Submitted 3 July, 2022; v1 submitted 7 March, 2022; originally announced March 2022.

    Comments: ECCV2022

  31. arXiv:2203.03821

    cs.CV

    CF-ViT: A General Coarse-to-Fine Method for Vision Transformer

    Authors: Mengzhao Chen, Mingbao Lin, Ke Li, Yunhang Shen, Yongjian Wu, Fei Chao, Rongrong Ji

    Abstract: Vision Transformers (ViT) have made many breakthroughs in computer vision tasks. However, considerable redundancy arises in the spatial dimension of an input image, leading to massive computational costs. Therefore, we propose a coarse-to-fine vision transformer (CF-ViT) to relieve computational burden while retaining performance in this paper. Our proposed CF-ViT is motivated by two important obs…

    Submitted 21 November, 2022; v1 submitted 7 March, 2022; originally announced March 2022.

    Comments: Accepted by AAAI 2023

  32. arXiv:2201.12826

    cs.CV

    OptG: Optimizing Gradient-driven Criteria in Network Sparsity

    Authors: Yuxin Zhang, Mingbao Lin, Mengzhao Chen, Fei Chao, Rongrong Ji

    Abstract: Network sparsity receives popularity mostly due to its capability to reduce the network complexity. Extensive studies excavate gradient-driven sparsity. Typically, these methods are constructed upon the premise of weight independence, which, however, is contrary to the fact that weights are mutually influenced. Thus, their performance remains to be improved. In this paper, we propose to optimize gradie…

    Submitted 30 November, 2022; v1 submitted 30 January, 2022; originally announced January 2022.

    Comments: 11 pages, 4 figures
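
    As an illustration of what "weight independence" means for a gradient-driven criterion, the sketch below uses a common first-order score, |w · ∂L/∂w|, which estimates the loss change from removing each weight as if it were the only one removed. This generic saliency is an assumption chosen for illustration, not OptG's optimized criterion; the gradients are supplied by the caller and `saliency_mask` is a hypothetical name.

```python
def saliency_mask(weights: list[float], grads: list[float],
                  sparsity: float) -> list[int]:
    """Prune the fraction `sparsity` of weights with the smallest |w * g|.

    Each score is computed in isolation, i.e. it ignores how removing one
    weight would change the gradients (and therefore scores) of the others.
    """
    scores = [abs(w * g) for w, g in zip(weights, grads)]
    num_prune = int(len(weights) * sparsity)
    pruned = set(sorted(range(len(weights)), key=lambda i: scores[i])[:num_prune])
    return [0 if i in pruned else 1 for i in range(len(weights))]


print(saliency_mask([0.5, -2.0, 1.0, 0.1], [1.0, 0.1, -0.5, 3.0], 0.5))
# [1, 0, 1, 0]
```

    Note that the largest weight (-2.0) is pruned because its gradient is small; once some weights are removed, the remaining gradients change, which is exactly the interaction an independent per-weight score cannot see.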

  33. Uncovering the Over-smoothing Challenge in Image Super-Resolution: Entropy-based Quantification and Contrastive Optimization

    Authors: Tianshuo Xu, Lijiang Li, Peng Mi, Xiawu Zheng, Fei Chao, Rongrong Ji, Yonghong Tian, Qiang Shen

    Abstract: PSNR-oriented models are a critical class of super-resolution models with applications across various fields. However, these models tend to generate over-smoothed images, a problem that has been analyzed previously from the perspectives of models or loss functions, but without taking into account the impact of data properties. In this paper, we present a novel phenomenon that we term the center-or…

    Submitted 15 March, 2024; v1 submitted 4 January, 2022; originally announced January 2022.

    Comments: Accepted in IEEE Transactions on Pattern Analysis and Machine Intelligence

  34. arXiv:2110.14439

    cs.CV

    Revisiting Discriminator in GAN Compression: A Generator-discriminator Cooperative Compression Scheme

    Authors: Shaojie Li, Jie Wu, Xuefeng Xiao, Fei Chao, Xudong Mao, Rongrong Ji

    Abstract: Recently, a series of algorithms have been explored for GAN compression, which aims to reduce tremendous computational overhead and memory usage when deploying GANs on resource-constrained edge devices. However, most of the existing GAN compression work only focuses on how to compress the generator, while failing to take the discriminator into account. In this work, we revisit the role of discrimin…

    Submitted 9 November, 2021; v1 submitted 27 October, 2021; originally announced October 2021.

    Comments: Accepted by NeurIPS2021 (The 35th Conference on Neural Information Processing Systems)

  35. arXiv:2109.04186

    cs.CV

    Fine-grained Data Distribution Alignment for Post-Training Quantization

    Authors: Yunshan Zhong, Mingbao Lin, Mengzhao Chen, Ke Li, Yunhang Shen, Fei Chao, Yongjian Wu, Rongrong Ji

    Abstract: While post-training quantization receives popularity mostly due to its evasion in accessing the original complete training dataset, its poor performance also stems from scarce images. To alleviate this limitation, in this paper, we leverage the synthetic data introduced by zero-shot quantization with calibration dataset and propose a fine-grained data distribution alignment (FDDA) method to boost…

    Submitted 4 July, 2022; v1 submitted 9 September, 2021; originally announced September 2021.

    Comments: ECCV2022

  36. arXiv:2109.02517

    cs.LG

    Error Controlled Actor-Critic

    Authors: Xingen Gao, Fei Chao, Changle Zhou, Zhen Ge, Chih-Min Lin, Longzhi Yang, Xiang Chang, Changjing Shang

    Abstract: Error in the value function inevitably causes an overestimation phenomenon and has a negative impact on the convergence of the algorithms. To mitigate the negative effects of the approximation error, we propose Error Controlled Actor-critic which ensures confining the approximation error in value function. We present an analysis of how the approximation error can hinder the optimization process of…

    Submitted 6 September, 2021; v1 submitted 6 September, 2021; originally announced September 2021.
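
    The overestimation this abstract refers to can be reproduced with a small Monte Carlo experiment: if every action's true value is 0 but the estimates carry zero-mean noise, taking the maximum over the estimates is biased upward, and the bias grows with the number of actions. This illustrates the general effect only, not the ECAC algorithm; the Gaussian noise model and the name `mean_max_estimate` are assumptions.

```python
import random
import statistics


def mean_max_estimate(num_actions: int, noise_std: float = 1.0,
                      trials: int = 10_000, seed: int = 0) -> float:
    """Average of max_a Q_hat(a) when every true Q(a) is 0 and Q_hat = Q + noise."""
    rng = random.Random(seed)
    maxima = []
    for _ in range(trials):
        estimates = [rng.gauss(0.0, noise_std) for _ in range(num_actions)]
        maxima.append(max(estimates))
    return statistics.fmean(maxima)


# The true maximum is 0 in every case; anything above 0 is overestimation.
for k in (1, 2, 10):
    print(k, round(mean_max_estimate(k), 3))
```

    With a single action the noise averages out, but as soon as a max is taken the positive errors are selected preferentially, which is why bounding the approximation error matters for actor-critic targets.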

  37. arXiv:2108.13816

    eess.AS

    Maximum F1-score training for end-to-end mispronunciation detection and diagnosis of L2 English speech

    Authors: Bi-Cheng Yan, Shao-Wei Fan Jiang, Fu-An Chao, Berlin Chen

    Abstract: End-to-end (E2E) neural models are increasingly attracting attention as a promising modeling approach for mispronunciation detection and diagnosis (MDD). Typically, these models are trained by optimizing a cross-entropy criterion, which corresponds to improving the log-likelihood of the training data. However, there is a discrepancy between the objectives of model training and the MDD evaluation,…

    Submitted 9 July, 2022; v1 submitted 31 August, 2021; originally announced August 2021.

    Comments: Accepted by IEEE International Conference on Multimedia and Expo (ICME 2022)
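
    The discrepancy described here is between a likelihood criterion such as cross-entropy and an F1-based evaluation. One generic way to train toward F1 is to replace hard counts with probabilities, giving a differentiable "soft" F1, sketched below for binary mispronunciation labels (1 = mispronounced). This surrogate is an assumption for illustration and may differ from the paper's exact maximum F1-score criterion; `soft_f1_loss` is a hypothetical name.

```python
def soft_f1_loss(probs: list[float], labels: list[int], eps: float = 1e-8) -> float:
    """1 - soft F1, with TP/FP/FN as expected counts under the predicted probabilities."""
    tp = sum(p * y for p, y in zip(probs, labels))
    fp = sum(p * (1 - y) for p, y in zip(probs, labels))
    fn = sum((1 - p) * y for p, y in zip(probs, labels))
    return 1.0 - 2 * tp / (2 * tp + fp + fn + eps)


labels = [1, 0, 0, 0, 1]
probs = [0.9, 0.1, 0.2, 0.1, 0.8]
print(round(soft_f1_loss(probs, labels), 4))  # 0.1707
```

    Here TP = 1.7, FP = 0.4 and FN = 0.3, so the soft F1 is 3.4 / 4.1. Unlike a per-frame cross-entropy, every term couples all predictions through the shared denominator, which is what lets the training objective track the evaluation metric.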

  38. arXiv:2108.11627

    cs.MM cs.SD eess.AS

    Towards Robust Mispronunciation Detection and Diagnosis for L2 English Learners with Accent-Modulating Methods

    Authors: Shao-Wei Fan Jiang, Bi-Cheng Yan, Tien-Hong Lo, Fu-An Chao, Berlin Chen

    Abstract: With the acceleration of globalization, more and more people are willing or required to learn second languages (L2). One of the major remaining challenges facing current mispronunciation detection and diagnosis (MDD) models for use in computer-assisted pronunciation training (CAPT) is to handle speech from L2 learners with a diverse set of accents. In this paper, we set out to mitigate the adverse effects o…

    Submitted 3 October, 2021; v1 submitted 26 August, 2021; originally announced August 2021.

    Comments: Accepted by ASRU 2021

  39. arXiv:2108.11598

    eess.AS cs.MM cs.SD eess.SP

    Cross-domain Single-channel Speech Enhancement Model with Bi-projection Fusion Module for Noise-robust ASR

    Authors: Fu-An Chao, Jeih-weih Hung, Berlin Chen

    Abstract: In recent decades, many studies have suggested that phase information is crucial for speech enhancement (SE), and time-domain single-channel speech enhancement techniques have shown promise in noise suppression and robust automatic speech recognition (ASR). This paper presents a continuation of the above lines of research and explores two effective SE methods that consider phase information in tim…

    Submitted 26 August, 2021; originally announced August 2021.

    Comments: 6 pages, 3 figures, Accepted by ICME 2021

  40. arXiv:2107.06916

    cs.CV cs.AI

    Training Compact CNNs for Image Classification using Dynamic-coded Filter Fusion

    Authors: Mingbao Lin, Bohong Chen, Fei Chao, Rongrong Ji

    Abstract: The mainstream approach for filter pruning is usually either to force a hard-coded importance estimation upon a computation-heavy pretrained model to select "important" filters, or to impose a hyperparameter-sensitive sparse constraint on the loss objective to regularize the network training. In this paper, we present a novel filter pruning method, dubbed dynamic-coded filter fusion (DCFF), to der…

    Submitted 18 December, 2022; v1 submitted 14 July, 2021; originally announced July 2021.

  41. arXiv:2107.01531

    eess.AS cs.SD eess.SP

    TENET: A Time-reversal Enhancement Network for Noise-robust ASR

    Authors: Fu-An Chao, Shao-Wei Fan Jiang, Bi-Cheng Yan, Jeih-weih Hung, Berlin Chen

    Abstract: Due to the unprecedented breakthroughs brought about by deep learning, speech enhancement (SE) techniques have been developed rapidly and play an important role prior to acoustic modeling to mitigate noise effects on speech. To increase the perceptual quality of speech, current state-of-the-art in the SE field adopts adversarial training by connecting an objective metric to the discriminator. Howe…

    Submitted 14 September, 2021; v1 submitted 3 July, 2021; originally announced July 2021.

    Comments: Accepted to ASRU 2021

  42. arXiv:2106.06922

    cs.CL eess.AS

    Cross-utterance Reranking Models with BERT and Graph Convolutional Networks for Conversational Speech Recognition

    Authors: Shih-Hsuan Chiu, Tien-Hong Lo, Fu-An Chao, Berlin Chen

    Abstract: How to effectively incorporate cross-utterance information cues into a neural language model (LM) has emerged as one of the intriguing issues for automatic speech recognition (ASR). Existing research efforts on improving contextualization of an LM typically regard previous utterances as a sequence of additional input and may fail to capture complex global structural dependencies among these uttera…

    Submitted 1 October, 2021; v1 submitted 13 June, 2021; originally announced June 2021.

    Comments: 6 pages, 5 figures, Accepted to APSIPA ASC 2021

  43. arXiv:2106.02435  [pdf, other]

    cs.CL

    You Only Compress Once: Towards Effective and Elastic BERT Compression via Exploit-Explore Stochastic Nature Gradient

    Authors: Shaokun Zhang, Xiawu Zheng, Chenyi Yang, Yuchao Li, Yan Wang, Fei Chao, Mengdi Wang, Shen Li, Jun Yang, Rongrong Ji

    Abstract: Despite their superior performance on various natural language processing tasks, pre-trained models such as BERT are difficult to deploy on resource-constrained devices. Most existing model compression approaches require re-compression or fine-tuning across diverse constraints to accommodate various hardware deployments. This practically limits the further application of model compression. Moreover,…

    Submitted 4 June, 2021; originally announced June 2021.

    Comments: 12 pages, 3 figures

  44. arXiv:2105.14713  [pdf, other]

    cs.CV cs.AI

    1xN Pattern for Pruning Convolutional Neural Networks

    Authors: Mingbao Lin, Yuxin Zhang, Yuchao Li, Bohong Chen, Fei Chao, Mengdi Wang, Shen Li, Yonghong Tian, Rongrong Ji

    Abstract: Though network pruning has gained popularity for reducing the complexity of convolutional neural networks (CNNs), it remains an open issue to maintain model accuracy while also achieving significant speedups on general CPUs. In this paper, we propose a novel 1xN pruning pattern to break this limitation. In particular, consecutive N output kernels with the same input channel index are group…

    Submitted 15 October, 2022; v1 submitted 31 May, 2021; originally announced May 2021.

    Comments: Accepted by IEEE TPAMI, 2022

  45. arXiv:2105.11228  [pdf, other]

    cs.CV cs.AI

    Towards Compact CNNs via Collaborative Compression

    Authors: Yuchao Li, Shaohui Lin, Jianzhuang Liu, Qixiang Ye, Mengdi Wang, Fei Chao, Fan Yang, **cheng Ma, Qi Tian, Rongrong Ji

    Abstract: Channel pruning and tensor decomposition have received extensive attention in convolutional neural network compression. However, these two techniques are traditionally deployed in isolation, leading to a significant accuracy drop when pursuing high compression rates. In this paper, we propose a Collaborative Compression (CC) scheme, which joins channel pruning and tensor decomposition to c…

    Submitted 24 May, 2021; originally announced May 2021.

    Comments: This paper is published in CVPR 2021

  46. arXiv:2104.08700  [pdf, other]

    cs.CV

    Lottery Jackpots Exist in Pre-trained Models

    Authors: Yuxin Zhang, Mingbao Lin, Yunshan Zhong, Fei Chao, Rongrong Ji

    Abstract: Network pruning is an effective approach to reduce network complexity with acceptable performance compromise. Existing studies achieve the sparsity of neural networks via time-consuming weight training or complex searching on networks with expanded width, which greatly limits the applications of network pruning. In this paper, we show that high-performing and sparse sub-networks without the involv…

    Submitted 2 September, 2023; v1 submitted 17 April, 2021; originally announced April 2021.

    Comments: IEEE Transactions on Pattern Analysis and Machine Intelligence (IEEE TPAMI)

  47. arXiv:2104.04221  [pdf]

    eess.AS eess.SP

    The NTNU Taiwanese ASR System for Formosa Speech Recognition Challenge 2020

    Authors: Fu-An Chao, Tien-Hong Lo, Shi-Yan Weng, Shih-Hsuan Chiu, Yao-Ting Sung, Berlin Chen

    Abstract: This paper describes the NTNU ASR system participating in the Formosa Speech Recognition Challenge 2020 (FSR-2020) supported by the Formosa Speech in the Wild project (FSW). FSR-2020 aims at fostering the development of Taiwanese speech recognition. Apart from the issues of tonal and dialectal variations of the Taiwanese language, speech artificially contaminated with different types of real-wor…

    Submitted 9 July, 2021; v1 submitted 9 April, 2021; originally announced April 2021.

    Comments: 17 pages, 3 figures, Accepted for publication in IJCLCLP

  48. arXiv:2102.07981  [pdf, other]

    cs.CV cs.AI

    SiMaN: Sign-to-Magnitude Network Binarization

    Authors: Mingbao Lin, Rongrong Ji, Zihan Xu, Baochang Zhang, Fei Chao, Chia-Wen Lin, Ling Shao

    Abstract: Binary neural networks (BNNs) have attracted broad research interest due to their efficient storage and computational ability. Nevertheless, a significant challenge of BNNs lies in handling discrete constraints while ensuring bit entropy maximization, which typically makes their weight optimization very difficult. Existing methods relax the learning using the sign function, which simply encodes po…

    Submitted 4 October, 2022; v1 submitted 16 February, 2021; originally announced February 2021.

    Comments: Accepted by IEEE TPAMI, 2022

  49. arXiv:2011.08382  [pdf, other]

    cs.CV

    Learning Efficient GANs for Image Translation via Differentiable Masks and co-Attention Distillation

    Authors: Shaojie Li, Mingbao Lin, Yan Wang, Fei Chao, Ling Shao, Rongrong Ji

    Abstract: Generative Adversarial Networks (GANs) have been widely used in image translation, but their high computation and storage costs impede their deployment on mobile devices. Prevalent methods for CNN compression cannot be directly applied to GANs due to the peculiarities of GAN tasks and the unstable adversarial training. To solve these issues, in this paper, we introduce a novel GAN compression method, termed…

    Submitted 2 March, 2022; v1 submitted 16 November, 2020; originally announced November 2020.

    Comments: Accepted by IEEE Transactions on Multimedia (IEEE TMM)

  50. arXiv:2007.00437  [pdf, other]

    stat.AP

    Levels and trends in the sex ratio at birth in seven provinces of Nepal between 1980 and 2016 with probabilistic projections to 2050: a Bayesian modeling approach

    Authors: Fengqing Chao, Samir KC, Hernando Ombao

    Abstract: The sex ratio at birth (SRB; ratio of male to female births) in Nepal has been reported without imbalance on the national level. However, the national SRB could mask disparities within the country. Given the demographic and cultural heterogeneities in Nepal, it is crucial to model Nepal's SRB on the subnational level. Prior studies on subnational SRB in Nepal are mostly based on reporting observed…

    Submitted 30 August, 2020; v1 submitted 1 July, 2020; originally announced July 2020.

    MSC Class: 62P25 (Primary) 91D20; 62F15; 62M10 (Secondary)

    Journal ref: BMC Public Health 2022, Vol. 22, No. 1, 358