Showing 1–50 of 76 results for author: Gong, R

Searching in archive cs.
  1. arXiv:2406.09041  [pdf, other]

    cs.CL cs.AI cs.LG

    ME-Switch: A Memory-Efficient Expert Switching Framework for Large Language Models

    Authors: Jing Liu, Ruihao Gong, Mingyang Zhang, Yefei He, Jianfei Cai, Bohan Zhuang

    Abstract: The typical process for developing LLMs involves pre-training a general foundation model on massive data, followed by fine-tuning on task-specific data to create specialized experts. Serving these experts poses challenges, as loading all experts onto devices is impractical, and frequent switching between experts in response to user requests incurs substantial I/O costs, increasing latency and expe…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: Tech report

  2. arXiv:2406.07256  [pdf, ps, other]

    cs.SD cs.AI eess.AS

    AS-70: A Mandarin stuttered speech dataset for automatic speech recognition and stuttering event detection

    Authors: Rong Gong, Hongfei Xue, Lezhi Wang, Xin Xu, Qisheng Li, Lei Xie, Hui Bu, Shaomei Wu, Jiaming Zhou, Yong Qin, Binbin Zhang, Jun Du, Jia Bin, Ming Li

    Abstract: The rapid advancements in speech technologies over the past two decades have led to human-level performance in tasks like automatic speech recognition (ASR) for fluent speech. However, the efficacy of these models diminishes when applied to atypical speech, such as stuttering. This paper introduces AS-70, the first publicly available Mandarin stuttered speech dataset, which stands out as the large…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: Accepted by Interspeech 2024

  3. arXiv:2406.05915  [pdf, other]

    cs.CV eess.IV

    Bits-to-Photon: End-to-End Learned Scalable Point Cloud Compression for Direct Rendering

    Authors: Yueyu Hu, Ran Gong, Yao Wang

    Abstract: Point cloud is a promising 3D representation for volumetric streaming in emerging AR/VR applications. Despite recent advances in point cloud compression, decoding and rendering high-quality images from lossy compressed point clouds is still challenging in terms of quality and complexity, making it a major roadblock to achieve real-time 6-Degree-of-Freedom video streaming. In this paper, we address…

    Submitted 9 June, 2024; originally announced June 2024.

  4. Selective Focus: Investigating Semantics Sensitivity in Post-training Quantization for Lane Detection

    Authors: Yunqian Fan, Xiuying Wei, Ruihao Gong, Yuqing Ma, Xiangguo Zhang, Qi Zhang, Xianglong Liu

    Abstract: Lane detection (LD) plays a crucial role in enhancing the L2+ capabilities of autonomous driving, capturing widespread attention. Post-Training Quantization (PTQ) could facilitate the practical application of LD models, enabling fast speeds and limited memories without labeled data. However, prior PTQ methods do not consider the complex LD outputs that contain physical semantics, such as off…

    Submitted 10 May, 2024; originally announced May 2024.

    Comments: Accepted by AAAI-24

    Journal ref: AAAI 2024, 38, 11936-11943

  5. arXiv:2405.06001  [pdf, other]

    cs.LG cs.AI cs.CL

    LLM-QBench: A Benchmark Towards the Best Practice for Post-training Quantization of Large Language Models

    Authors: Ruihao Gong, Yang Yong, Shiqiao Gu, Yushi Huang, Yunchen Zhang, Xianglong Liu, Dacheng Tao

    Abstract: Recent advancements in large language models (LLMs) are propelling us toward artificial general intelligence, thanks to their remarkable emergent abilities and reasoning capabilities. However, the substantial computational and memory requirements of LLMs limit their widespread adoption. Quantization, a key compression technique, offers a viable solution to mitigate these demands by compressing a…

    Submitted 9 May, 2024; originally announced May 2024.

  6. arXiv:2405.05808  [pdf, other]

    cs.CV

    Fast and Controllable Post-training Sparsity: Learning Optimal Sparsity Allocation with Global Constraint in Minutes

    Authors: Ruihao Gong, Yang Yong, Zining Wang, Jinyang Guo, Xiuying Wei, Yuqing Ma, Xianglong Liu

    Abstract: Neural network sparsity has attracted many research interests due to its similarity to biological schemes and high energy efficiency. However, existing methods depend on long-time training or fine-tuning, which prevents large-scale applications. Recently, some works focusing on post-training sparsity (PTS) have emerged. They get rid of the high training cost but usually suffer from distinct accura…

    Submitted 9 May, 2024; originally announced May 2024.

  7. arXiv:2403.07153  [pdf, other]

    cs.CV

    2023 Low-Power Computer Vision Challenge (LPCVC) Summary

    Authors: Leo Chen, Benjamin Boardley, Ping Hu, Yiru Wang, Yifan Pu, Xin Jin, Yongqiang Yao, Ruihao Gong, Bo Li, Gao Huang, Xianglong Liu, Zifu Wan, Xinwang Chen, Ning Liu, Ziyi Zhang, Dongping Liu, Ruijie Shan, Zhengping Che, Fachao Zhang, Xiaofeng Mou, Jian Tang, Maxim Chuprov, Ivan Malofeev, Alexander Goncharenko, Andrey Shcherbin, et al. (5 additional authors not shown)

    Abstract: This article describes the 2023 IEEE Low-Power Computer Vision Challenge (LPCVC). Since 2015, LPCVC has been an international competition devoted to tackling the challenge of computer vision (CV) on edge devices. Most CV researchers focus on improving accuracy, at the expense of ever-growing sizes of machine models. LPCVC balances accuracy with resource requirements. Winners must achieve high accu…

    Submitted 11 March, 2024; originally announced March 2024.

    Comments: LPCVC 2023, website: https://lpcv.ai/

  8. arXiv:2403.00833  [pdf, other]

    cs.AI

    Position Paper: Agent AI Towards a Holistic Intelligence

    Authors: Qiuyuan Huang, Naoki Wake, Bidipta Sarkar, Zane Durante, Ran Gong, Rohan Taori, Yusuke Noda, Demetri Terzopoulos, Noboru Kuno, Ade Famoti, Ashley Llorens, John Langford, Hoi Vo, Li Fei-Fei, Katsu Ikeuchi, Jianfeng Gao

    Abstract: Recent advancements in large foundation models have remarkably enhanced our understanding of sensory information in open-world environments. In leveraging the power of foundation models, it is crucial for AI research to pivot away from excessive reductionism and toward an emphasis on systems that function as cohesive wholes. Specifically, we emphasize developing Agent AI -- an embodied system that…

    Submitted 28 February, 2024; originally announced March 2024.

    Comments: 22 pages, 4 figures. arXiv admin note: substantial text overlap with arXiv:2401.03568

  9. arXiv:2402.19270  [pdf, other]

    cs.CV

    Learning Intra-view and Cross-view Geometric Knowledge for Stereo Matching

    Authors: Rui Gong, Weide Liu, Zaiwang Gu, Xulei Yang, Jun Cheng

    Abstract: Geometric knowledge has been shown to be beneficial for the stereo matching task. However, prior attempts to integrate geometric insights into stereo matching algorithms have largely focused on geometric knowledge from single images while crucial cross-view factors such as occlusion and matching uniqueness have been overlooked. To address this gap, we propose a novel Intra-view and Cross-view Geom…

    Submitted 6 March, 2024; v1 submitted 29 February, 2024; originally announced February 2024.

    Comments: Accepted to CVPR2024

  10. arXiv:2402.13485  [pdf, other]

    cs.LG cs.CL

    ProPD: Dynamic Token Tree Pruning and Generation for LLM Parallel Decoding

    Authors: Shuzhang Zhong, Zebin Yang, Meng Li, Ruihao Gong, Runsheng Wang, Ru Huang

    Abstract: Recent advancements in generative large language models (LLMs) have significantly boosted the performance in natural language processing tasks. However, their efficiency is hampered by the inherent limitations in autoregressive token generation. While parallel decoding with token tree verification, e.g., Medusa, has been proposed to improve decoding parallelism and efficiency, it often struggles w…

    Submitted 20 February, 2024; originally announced February 2024.

  11. arXiv:2402.07066  [pdf, other]

    cs.CR cs.LG stat.ME

    Differentially Private Range Queries with Correlated Input Perturbation

    Authors: Prathamesh Dharangutte, Jie Gao, Ruobin Gong, Guanyang Wang

    Abstract: This work proposes a class of locally differentially private mechanisms for linear queries, in particular range queries, that leverages correlated input perturbation to simultaneously achieve unbiasedness, consistency, statistical transparency, and control over utility requirements in terms of accuracy targets expressed either in certain query margins or as implied by the hierarchical database str…

    Submitted 10 February, 2024; originally announced February 2024.

    Comments: 26 pages, 8 figures

  12. arXiv:2402.05929  [pdf, other]

    cs.AI cs.LG cs.RO

    An Interactive Agent Foundation Model

    Authors: Zane Durante, Bidipta Sarkar, Ran Gong, Rohan Taori, Yusuke Noda, Paul Tang, Ehsan Adeli, Shrinidhi Kowshika Lakshmikanth, Kevin Schulman, Arnold Milstein, Demetri Terzopoulos, Ade Famoti, Noboru Kuno, Ashley Llorens, Hoi Vo, Katsu Ikeuchi, Li Fei-Fei, Jianfeng Gao, Naoki Wake, Qiuyuan Huang

    Abstract: The development of artificial intelligence systems is transitioning from creating static, task-specific models to dynamic, agent-based systems capable of performing well in a wide range of applications. We propose an Interactive Agent Foundation Model that uses a novel multi-task agent training paradigm for training AI agents across a wide range of domains, datasets, and tasks. Our training paradi…

    Submitted 17 June, 2024; v1 submitted 8 February, 2024; originally announced February 2024.

  13. arXiv:2401.03568  [pdf, other]

    cs.AI cs.HC cs.LG

    Agent AI: Surveying the Horizons of Multimodal Interaction

    Authors: Zane Durante, Qiuyuan Huang, Naoki Wake, Ran Gong, Jae Sung Park, Bidipta Sarkar, Rohan Taori, Yusuke Noda, Demetri Terzopoulos, Yejin Choi, Katsushi Ikeuchi, Hoi Vo, Li Fei-Fei, Jianfeng Gao

    Abstract: Multi-modal AI systems will likely become a ubiquitous presence in our everyday lives. A promising approach to making these systems more interactive is to embody them as agents within physical and virtual environments. At present, systems leverage existing foundation models as the basic building blocks for the creation of embodied agents. Embedding agents within such environments facilitates the a…

    Submitted 25 January, 2024; v1 submitted 7 January, 2024; originally announced January 2024.

  14. arXiv:2311.16503  [pdf, other]

    cs.CV cs.AI cs.LG

    TFMQ-DM: Temporal Feature Maintenance Quantization for Diffusion Models

    Authors: Yushi Huang, Ruihao Gong, Jing Liu, Tianlong Chen, Xianglong Liu

    Abstract: The Diffusion model, a prevalent framework for image generation, encounters significant challenges in terms of broad applicability due to its extended inference times and substantial memory requirements. Efficient Post-training Quantization (PTQ) is pivotal for addressing these issues in traditional models. Different from traditional models, diffusion models heavily depend on the time-step $t$ to…

    Submitted 11 March, 2024; v1 submitted 27 November, 2023; originally announced November 2023.

  15. arXiv:2310.13513  [pdf, other]

    cs.PF

    Exploring the Potential of Flexible 8-bit Format: Design and Algorithm

    Authors: Zhuoyi Zhang, Yunchen Zhang, Gonglei Shi, Yu Shen, Ruihao Gong, Xiaoxu Xia, Qi Zhang, Lewei Lu, Xianglong Liu

    Abstract: Neural network quantization is widely used to reduce model inference complexity in real-world deployments. However, traditional integer quantization suffers from accuracy degradation when adapting to various dynamic ranges. Recent research has focused on a new 8-bit format, FP8, with hardware support for both training and inference of neural networks but lacks guidance for hardware design. In this…

    Submitted 26 October, 2023; v1 submitted 20 October, 2023; originally announced October 2023.

  16. arXiv:2310.08041  [pdf, other]

    cs.CL cs.AI cs.LG

    QLLM: Accurate and Efficient Low-Bitwidth Quantization for Large Language Models

    Authors: Jing Liu, Ruihao Gong, Xiuying Wei, Zhiwei Dong, Jianfei Cai, Bohan Zhuang

    Abstract: Large Language Models (LLMs) excel in NLP, but their demands hinder their widespread deployment. While Quantization-Aware Training (QAT) offers a solution, its extensive training costs make Post-Training Quantization (PTQ) a more practical approach for LLMs. In existing studies, activation outliers in particular channels are identified as the bottleneck to PTQ accuracy. They propose to transform t…

    Submitted 6 April, 2024; v1 submitted 12 October, 2023; originally announced October 2023.

    Comments: ICLR 2024 camera ready; Code is available at https://github.com/ziplab/QLLM and https://github.com/ModelTC/QLLM

  17. arXiv:2309.09971  [pdf, other]

    cs.AI cs.HC cs.MA

    MindAgent: Emergent Gaming Interaction

    Authors: Ran Gong, Qiuyuan Huang, Xiaojian Ma, Hoi Vo, Zane Durante, Yusuke Noda, Zilong Zheng, Song-Chun Zhu, Demetri Terzopoulos, Li Fei-Fei, Jianfeng Gao

    Abstract: Large Language Models (LLMs) have the capacity of performing complex scheduling in a multi-agent system and can coordinate these agents into completing sophisticated tasks that require extensive collaboration. However, despite the introduction of numerous gaming frameworks, the community has insufficient benchmarks towards building general multi-agents collaboration infrastructure that encompass b…

    Submitted 19 September, 2023; v1 submitted 18 September, 2023; originally announced September 2023.

    Comments: The first three authors contributed equally. 28 pages

  18. arXiv:2308.04269  [pdf, other]

    cs.CV cs.AI

    Lossy and Lossless (L$^2$) Post-training Model Size Compression

    Authors: Yumeng Shi, Shihao Bai, Xiuying Wei, Ruihao Gong, Jianlei Yang

    Abstract: Deep neural networks have delivered remarkable performance and have been widely used in various visual tasks. However, their huge size causes significant inconvenience for transmission and storage. Many previous studies have explored model size compression. However, these studies often approach various lossy and lossless compression methods in isolation, leading to challenges in achieving high com…

    Submitted 8 August, 2023; originally announced August 2023.

  19. arXiv:2308.00937  [pdf, other]

    cs.RO cs.AI cs.MA

    LEMMA: Learning Language-Conditioned Multi-Robot Manipulation

    Authors: Ran Gong, Xiaofeng Gao, Qiaozi Gao, Suhaila Shakiah, Govind Thattai, Gaurav S. Sukhatme

    Abstract: Complex manipulation tasks often require robots with complementary capabilities to collaborate. We introduce a benchmark for LanguagE-Conditioned Multi-robot MAnipulation (LEMMA) focused on task allocation and long-horizon object manipulation based on human language instructions in a tabletop setting. LEMMA features 8 types of procedurally generated tasks with varying degree of complexity, some of…

    Submitted 16 September, 2023; v1 submitted 2 August, 2023; originally announced August 2023.

    Comments: 8 pages, 3 figures, accepted by RA-L

    Journal ref: IEEE Robotics and Automation Letters, vol. 8, no. 10, pp. 6835-6842, Oct. 2023

  20. arXiv:2307.02138  [pdf, other]

    cs.CV

    Prompting Diffusion Representations for Cross-Domain Semantic Segmentation

    Authors: Rui Gong, Martin Danelljan, Han Sun, Julio Delgado Mangas, Luc Van Gool

    Abstract: While originally designed for image generation, diffusion models have recently shown to provide excellent pretrained feature representations for semantic segmentation. Intrigued by this result, we set out to explore how well diffusion-pretrained representations generalize to new domains, a crucial ability for any representation. We find that diffusion-pretraining achieves extraordinary domain gene…

    Submitted 5 July, 2023; originally announced July 2023.

    Comments: 17 pages, 3 figures, 11 tables

  21. arXiv:2307.00280  [pdf, other]

    cs.LG cs.AI cs.CV

    SysNoise: Exploring and Benchmarking Training-Deployment System Inconsistency

    Authors: Yan Wang, Yuhang Li, Ruihao Gong, Aishan Liu, Yanfei Wang, Jian Hu, Yongqiang Yao, Yunchen Zhang, Tianzi Xiao, Fengwei Yu, Xianglong Liu

    Abstract: Extensive studies have shown that deep learning models are vulnerable to adversarial and natural noises, yet little is known about model robustness on noises caused by different system implementations. In this paper, we for the first time introduce SysNoise, a frequently occurred but often overlooked noise in the deep learning training-deployment cycle. In particular, SysNoise happens when the sou…

    Submitted 1 July, 2023; originally announced July 2023.

    Comments: Proceedings of Machine Learning and Systems. 2023 Mar 18

    Journal ref: Proceedings of Machine Learning and Systems 2023

  22. arXiv:2306.04385  [pdf, other]

    cs.CV

    SF-FSDA: Source-Free Few-Shot Domain Adaptive Object Detection with Efficient Labeled Data Factory

    Authors: Han Sun, Rui Gong, Konrad Schindler, Luc Van Gool

    Abstract: Domain adaptive object detection aims to leverage the knowledge learned from a labeled source domain to improve the performance on an unlabeled target domain. Prior works typically require the access to the source domain data for adaptation, and the availability of sufficient data on the target domain. However, these assumptions may not hold due to data privacy and rare data collection. In this pa…

    Submitted 7 June, 2023; originally announced June 2023.

  23. arXiv:2305.00970  [pdf, other]

    cs.CV

    ArK: Augmented Reality with Knowledge Interactive Emergent Ability

    Authors: Qiuyuan Huang, Jae Sung Park, Abhinav Gupta, Paul Bennett, Ran Gong, Subhojit Som, Baolin Peng, Owais Khan Mohammed, Chris Pal, Yejin Choi, Jianfeng Gao

    Abstract: Despite the growing adoption of mixed reality and interactive AI agents, it remains challenging for these systems to generate high quality 2D/3D scenes in unseen environments. The common practice requires deploying an AI agent to collect large amounts of data for model training for every new task. This process is costly, or even impossible, for many domains. In this study, we develop an infinite a…

    Submitted 1 May, 2023; originally announced May 2023.

    Report number: EFI-94-11

  24. arXiv:2304.09145  [pdf, other]

    cs.CL

    Outlier Suppression+: Accurate quantization of large language models by equivalent and optimal shifting and scaling

    Authors: Xiuying Wei, Yunchen Zhang, Yuhang Li, Xiangguo Zhang, Ruihao Gong, Jinyang Guo, Xianglong Liu

    Abstract: Post-training quantization (PTQ) of transformer language models faces significant challenges due to the existence of detrimental outliers in activations. We observe that these outliers are concentrated in specific channels and are asymmetric across channels. To address this issue, we propose the Outlier Suppression+ (OS+) framework, which contains the channel-wise shifting for asymmetry and channe…

    Submitted 23 October, 2023; v1 submitted 18 April, 2023; originally announced April 2023.

    Comments: Accepted to EMNLP23 (main)

  25. arXiv:2304.04321  [pdf, other]

    cs.AI cs.CL cs.CV cs.RO

    ARNOLD: A Benchmark for Language-Grounded Task Learning With Continuous States in Realistic 3D Scenes

    Authors: Ran Gong, Jiangyong Huang, Yizhou Zhao, Haoran Geng, Xiaofeng Gao, Qingyang Wu, Wensi Ai, Ziheng Zhou, Demetri Terzopoulos, Song-Chun Zhu, Baoxiong Jia, Siyuan Huang

    Abstract: Understanding the continuous states of objects is essential for task learning and planning in the real world. However, most existing task learning benchmarks assume discrete (e.g., binary) object goal states, which poses challenges for the learning of complex tasks and transferring learned policy from simulated environments to the real world. Furthermore, state discretization limits a robot's abil…

    Submitted 11 September, 2023; v1 submitted 9 April, 2023; originally announced April 2023.

    Comments: The first two authors contributed equally; 20 pages; 17 figures; project available: https://arnold-benchmark.github.io/; ICCV 2023

  26. arXiv:2212.07292  [pdf, other]

    cs.CV

    One-Shot Domain Adaptive and Generalizable Semantic Segmentation with Class-Aware Cross-Domain Transformers

    Authors: Rui Gong, Qin Wang, Dengxin Dai, Luc Van Gool

    Abstract: Unsupervised sim-to-real domain adaptation (UDA) for semantic segmentation aims to improve the real-world test performance of a model trained on simulated data. It can save the cost of manually labeling data in real-world applications such as robot vision and autonomous driving. Traditional UDA often assumes that there are abundant unlabeled real-world data samples available during training for th…

    Submitted 14 December, 2022; originally announced December 2022.

    Comments: 15 pages, 6 figures, 10 Tables

  27. arXiv:2212.00936  [pdf, other]

    cs.CR stat.AP

    Integer Subspace Differential Privacy

    Authors: Prathamesh Dharangutte, Jie Gao, Ruobin Gong, Fang-Yi Yu

    Abstract: We propose new differential privacy solutions for when external invariants and integer constraints are simultaneously enforced on the data product. These requirements arise in real world applications of private data curation, including the public release of the 2020 U.S. Decennial Census. They pose a great challenge to the production of provably private data products with adequate st…

    Submitted 1 December, 2022; originally announced December 2022.

    Comments: Accepted to AAAI 2023

  28. arXiv:2210.05990  [pdf, other]

    cs.CV

    GGViT: Multistream Vision Transformer Network in Face2Face Facial Reenactment Detection

    Authors: Haotian Wu, Peipei Wang, Xin Wang, Ji Xiang, Rui Gong

    Abstract: Detecting manipulated facial images and videos on social networks has been an urgent problem to be solved. The compression of videos on social media has destroyed some pixel details that could be used to detect forgeries. Hence, it is crucial to detect manipulated faces in videos of different quality. We propose a new multi-stream network architecture named GGViT, which utilizes global information…

    Submitted 12 October, 2022; originally announced October 2022.

    Comments: 6 pages, 4 figures, to be published in ICPR2022

  29. arXiv:2209.14105  [pdf, other]

    cs.LG

    Exploring the Relationship between Architecture and Adversarially Robust Generalization

    Authors: Aishan Liu, Shiyu Tang, Siyuan Liang, Ruihao Gong, Boxi Wu, Xianglong Liu, Dacheng Tao

    Abstract: Adversarial training has been demonstrated to be one of the most effective remedies for defending adversarial examples, yet it often suffers from the huge robustness generalization gap on unseen testing adversaries, deemed as the adversarially robust generalization problem. Despite the preliminary understandings devoted to adversarially robust generalization, little is known from the architectural…

    Submitted 10 March, 2023; v1 submitted 28 September, 2022; originally announced September 2022.

  30. arXiv:2209.13325  [pdf, other]

    cs.LG

    Outlier Suppression: Pushing the Limit of Low-bit Transformer Language Models

    Authors: Xiuying Wei, Yunchen Zhang, Xiangguo Zhang, Ruihao Gong, Shanghang Zhang, Qi Zhang, Fengwei Yu, Xianglong Liu

    Abstract: Transformer architecture has become the fundamental element of the widespread natural language processing (NLP) models. With the trends of large NLP models, the increasing memory and computation costs hinder their efficient deployment on resource-limited devices. Therefore, transformer quantization attracts wide research interest. Recent work recognizes that structured outliers are the critical bo…

    Submitted 21 February, 2023; v1 submitted 27 September, 2022; originally announced September 2022.

    Comments: Accepted by NeurIPS (spotlight) 2022

  31. arXiv:2203.13919  [pdf]

    eess.AS cs.AI

    Spatial Processing Front-End For Distant ASR Exploiting Self-Attention Channel Combinator

    Authors: Dushyant Sharma, Rong Gong, James Fosburgh, Stanislav Yu. Kruchinin, Patrick A. Naylor, Ljubomir Milanovic

    Abstract: We present a novel multi-channel front-end based on channel shortening with the Weighted Prediction Error (WPE) method followed by a fixed MVDR beamformer used in combination with a recently proposed self-attention-based channel combination (SACC) scheme, for tackling the distant ASR problem. We show that the proposed system used as part of a ContextNet based end-to-end (E2E) ASR system outperforms…

    Submitted 25 March, 2022; originally announced March 2022.

    Comments: to be presented at ICASSP 2022

  32. arXiv:2203.05740  [pdf, other]

    cs.CV cs.AI

    QDrop: Randomly Dropping Quantization for Extremely Low-bit Post-Training Quantization

    Authors: Xiuying Wei, Ruihao Gong, Yuhang Li, Xianglong Liu, Fengwei Yu

    Abstract: Recently, post-training quantization (PTQ) has driven much attention to produce efficient neural networks without long-time retraining. Despite its low cost, current PTQ works tend to fail under the extremely low-bit setting. In this study, we pioneeringly confirm that properly incorporating activation quantization into the PTQ reconstruction benefits the final accuracy. To deeply understand the i…

    Submitted 21 February, 2023; v1 submitted 10 March, 2022; originally announced March 2022.

    Comments: Accepted by ICLR 2022

  33. DialFRED: Dialogue-Enabled Agents for Embodied Instruction Following

    Authors: Xiaofeng Gao, Qiaozi Gao, Ran Gong, Kaixiang Lin, Govind Thattai, Gaurav S. Sukhatme

    Abstract: Language-guided Embodied AI benchmarks requiring an agent to navigate an environment and manipulate objects typically allow one-way communication: the human user gives a natural language command to the agent, and the agent can only follow the command passively. We present DialFRED, a dialogue-enabled embodied instruction following benchmark based on the ALFRED benchmark. DialFRED allows an agent t…

    Submitted 15 August, 2022; v1 submitted 27 February, 2022; originally announced February 2022.

    Comments: 8 pages, 5 figures, accepted by RA-L

    Journal ref: IEEE Robotics and Automation Letters, vol. 7, no. 4, pp. 10049-10056, Oct. 2022

  34. arXiv:2111.03759  [pdf, other]

    cs.LG cs.CV

    MQBench: Towards Reproducible and Deployable Model Quantization Benchmark

    Authors: Yuhang Li, Mingzhu Shen, Jian Ma, Yan Ren, Mingxin Zhao, Qi Zhang, Ruihao Gong, Fengwei Yu, Junjie Yan

    Abstract: Model quantization has emerged as an indispensable technique to accelerate deep learning inference. While researchers continue to push the frontier of quantization algorithms, existing quantization work is often unreproducible and undeployable. This is because researchers do not choose consistent training pipelines and ignore the requirements for hardware deployments. In this work, we propose Mode…

    Submitted 25 January, 2022; v1 submitted 5 November, 2021; originally announced November 2021.

    Comments: Accepted by 35th Conference on Neural Information Processing Systems (NeurIPS 2021) Track on Datasets and Benchmarks

  35. arXiv:2109.12338  [pdf, other]

    cs.CV

    Distribution-sensitive Information Retention for Accurate Binary Neural Network

    Authors: Haotong Qin, Xiangguo Zhang, Ruihao Gong, Yifu Ding, Yi Xu, Xianglong Liu

    Abstract: Model binarization is an effective method of compressing neural networks and accelerating their inference process. However, a significant performance gap still exists between the 1-bit model and the 32-bit one. The empirical study shows that binarization causes a great loss of information in the forward and backward propagation. We present a novel Distribution-sensitive Information Retention Netwo…

    Submitted 23 September, 2022; v1 submitted 25 September, 2021; originally announced September 2021.

    Journal ref: INTERNATIONAL JOURNAL OF COMPUTER VISION, 2022

  36. arXiv:2109.05211  [pdf, other]

    cs.CV

    RobustART: Benchmarking Robustness on Architecture Design and Training Techniques

    Authors: Shiyu Tang, Ruihao Gong, Yan Wang, Aishan Liu, Jiakai Wang, Xinyun Chen, Fengwei Yu, Xianglong Liu, Dawn Song, Alan Yuille, Philip H. S. Torr, Dacheng Tao

    Abstract: Deep neural networks (DNNs) are vulnerable to adversarial noises, which motivates the benchmark of model robustness. Existing benchmarks mainly focus on evaluating defenses, but there are no comprehensive studies of how architecture design and training techniques affect robustness. Comprehensively benchmarking their relationships is beneficial for better understanding and developing robust DNNs. T…

    Submitted 13 January, 2022; v1 submitted 11 September, 2021; originally announced September 2021.

  37. arXiv:2109.04813  [pdf, other]

    cs.CV

    TACS: Taxonomy Adaptive Cross-Domain Semantic Segmentation

    Authors: Rui Gong, Martin Danelljan, Dengxin Dai, Danda Pani Paudel, Ajad Chhatkuli, Fisher Yu, Luc Van Gool

    Abstract: Traditional domain adaptive semantic segmentation addresses the task of adapting a model to a novel target domain under limited or no additional supervision. While tackling the input domain gap, the standard domain adaptation settings assume no domain change in the output space. In semantic prediction tasks, different datasets are often labeled according to different semantic taxonomies. In many r…

    Submitted 28 July, 2022; v1 submitted 10 September, 2021; originally announced September 2021.

    Comments: Accepted by ECCV 2022

  38. arXiv:2109.04783  [pdf, other]

    cs.SD cs.AI cs.CL cs.LG eess.AS

    Self-Attention Channel Combinator Frontend for End-to-End Multichannel Far-field Speech Recognition

    Authors: Rong Gong, Carl Quillen, Dushyant Sharma, Andrew Goderre, José Laínez, Ljubomir Milanović

    Abstract: When a sufficiently large far-field training data is presented, jointly optimizing a multichannel frontend and an end-to-end (E2E) Automatic Speech Recognition (ASR) backend shows promising results. Recent literature has shown traditional beamformer designs, such as MVDR (Minimum Variance Distortionless Response) or fixed beamformers can be successfully integrated as the frontend into an E2E ASR s…

    Submitted 10 September, 2021; originally announced September 2021.

    Comments: In Proceedings of Interspeech 2021

  39. arXiv:2109.00864  [pdf]

    cs.CV

    Real World Robustness from Systematic Noise

    Authors: Yan Wang, Yuhang Li, Ruihao Gong

    Abstract: Systematic error, which is not determined by chance, often refers to the inaccuracy (involving either the observation or measurement process) inherent to a system. In this paper, we exhibit some long-neglected but frequent-happening adversarial examples caused by systematic error. More specifically, we find the trained neural network classifier can be fooled by inconsistent implementations of imag…

    Submitted 2 September, 2021; originally announced September 2021.

  40. arXiv:2108.11527  [pdf, other]

    cs.CR stat.AP

    Subspace Differential Privacy

    Authors: Jie Gao, Ruobin Gong, Fang-Yi Yu

    Abstract: Many data applications have certain invariant constraints due to practical needs. Data curators who employ differential privacy need to respect such constraints on the sanitized data product as a primary utility requirement. Invariants challenge the formulation, implementation, and interpretation of privacy guarantees. We propose subspace differential privacy, to honestly characterize the depend…

    Submitted 29 April, 2022; v1 submitted 25 August, 2021; originally announced August 2021.

    Comments: 25 pages, 3 figures; Published in AAAI'22

  41. arXiv:2108.07582  [pdf, other]

    cs.CV

    Self-Supervised Pretraining and Controlled Augmentation Improve Rare Wildlife Recognition in UAV Images

    Authors: Xiaochen Zheng, Benjamin Kellenberger, Rui Gong, Irena Hajnsek, Devis Tuia

    Abstract: Automated animal censuses with aerial imagery are a vital ingredient towards wildlife conservation. Recent models are generally based on deep learning and thus require vast amounts of training data. Due to their scarcity and minuscule size, annotating animals in aerial imagery is a highly tedious process. In this project, we present a methodology to reduce the amount of required training data by r…

    Submitted 17 August, 2021; originally announced August 2021.

    Comments: accepted by 2021 IEEE/CVF International Conference on Computer Vision (ICCV) Workshops

  42. A Plant Root System Algorithm Based on Swarm Intelligence for One-dimensional Biomedical Signal Feature Engineering

    Authors: Rui Gong, Kazunori Hase

    Abstract: To date, very few biomedical signals have transitioned from research applications to clinical applications. This is largely due to the lack of trust in the diagnostic ability of non-stationary signals. To reach the level of clinical diagnostic application, classification using high-quality signal features is necessary. While there has been considerable progress in machine learning in recent years,…

    Submitted 31 July, 2021; originally announced August 2021.

  43. arXiv:2106.06984  [pdf, other]

    cs.LG

    A Free Lunch From ANN: Towards Efficient, Accurate Spiking Neural Networks Calibration

    Authors: Yuhang Li, Shikuang Deng, Xin Dong, Ruihao Gong, Shi Gu

    Abstract: Spiking Neural Network (SNN) has been recognized as one of the next generation of neural networks. Conventionally, SNN can be converted from a pre-trained ANN by only replacing the ReLU activation to spike activation while keeping the parameters intact. Perhaps surprisingly, in this work we show that a proper way to calibrate the parameters during the conversion of ANN to SNN can bring significant…

    Submitted 13 June, 2021; originally announced June 2021.

  44. arXiv:2105.04165  [pdf, other]

    cs.CL cs.AI cs.CV cs.FL

    Inter-GPS: Interpretable Geometry Problem Solving with Formal Language and Symbolic Reasoning

    Authors: Pan Lu, Ran Gong, Shibiao Jiang, Liang Qiu, Siyuan Huang, Xiaodan Liang, Song-Chun Zhu

    Abstract: Geometry problem solving has attracted much attention in the NLP community recently. The task is challenging as it requires abstract problem understanding and symbolic reasoning with axiomatic knowledge. However, current datasets are either small in scale or not publicly available. Thus, we construct a new large-scale benchmark, Geometry3K, consisting of 3,002 geometry problems with dense annotati…

    Submitted 20 July, 2021; v1 submitted 10 May, 2021; originally announced May 2021.

    Comments: Accepted to ACL 2021, 13 pages, 6 figures

  45. arXiv:2103.01049  [pdf, other]

    cs.CV

    Diversifying Sample Generation for Accurate Data-Free Quantization

    Authors: Xiangguo Zhang, Haotong Qin, Yifu Ding, Ruihao Gong, Qinghua Yan, Renshuai Tao, Yuhang Li, Fengwei Yu, Xianglong Liu

    Abstract: Quantization has emerged as one of the most prevalent approaches to compress and accelerate neural networks. Recently, data-free quantization has been widely studied as a practical and promising solution. It synthesizes data for calibrating the quantized model according to the batch normalization (BN) statistics of FP32 ones and significantly relieves the heavy dependency on real training data in…

    Submitted 1 December, 2021; v1 submitted 1 March, 2021; originally announced March 2021.

  46. arXiv:2102.05426  [pdf, other]

    cs.LG cs.CV

    BRECQ: Pushing the Limit of Post-Training Quantization by Block Reconstruction

    Authors: Yuhang Li, Ruihao Gong, Xu Tan, Yang Yang, Peng Hu, Qi Zhang, Fengwei Yu, Wei Wang, Shi Gu

    Abstract: We study the challenging task of neural network quantization without end-to-end retraining, called Post-training Quantization (PTQ). PTQ usually requires a small subset of training data but produces less powerful quantized models than Quantization-Aware Training (QAT). In this work, we propose a novel PTQ framework, dubbed BRECQ, which pushes the limits of bitwidth in PTQ down to INT2 for the firs…

    Submitted 25 July, 2021; v1 submitted 10 February, 2021; originally announced February 2021.

  47. arXiv:2101.06228  [pdf]

    eess.IV cs.CV

    Task-driven Self-supervised Bi-channel Networks for Diagnosis of Breast Cancers with Mammography

    Authors: Ronglin Gong, Jun Wang, Jun Shi

    Abstract: Deep learning can promote the mammography-based computer-aided diagnosis (CAD) for breast cancers, but it generally suffers from the small sample size problem. Self-supervised learning (SSL) has shown its effectiveness in medical image analysis with limited training samples. However, the network model sometimes cannot be well pre-trained in the conventional SSL framework due to the limitation of t…

    Submitted 30 August, 2021; v1 submitted 15 January, 2021; originally announced January 2021.

  48. arXiv:2012.14011  [pdf, other]

    cs.CL cs.AI

    SMART: A Situation Model for Algebra Story Problems via Attributed Grammar

    Authors: Yining Hong, Qing Li, Ran Gong, Daniel Ciao, Siyuan Huang, Song-Chun Zhu

    Abstract: Solving algebra story problems remains a challenging task in artificial intelligence, which requires a detailed understanding of real-world situations and a strong mathematical reasoning capability. Previous neural solvers of math word problems directly translate problem texts into equations, lacking an explicit interpretation of the situations, and often fail to handle more sophisticated situatio…

    Submitted 27 December, 2020; originally announced December 2020.

    Journal ref: AAAI2021

  49. arXiv:2012.08385  [pdf, other]

    cs.CV

    mDALU: Multi-Source Domain Adaptation and Label Unification with Partial Datasets

    Authors: Rui Gong, Dengxin Dai, Yuhua Chen, Wen Li, Luc Van Gool

    Abstract: One challenge of object recognition is to generalize to new domains, to more classes and/or to new modalities. This necessitates methods to combine and reuse existing datasets that may belong to different domains, have partial annotations, and/or have different data modalities. This paper formulates this as a multi-source domain adaptation and label unification problem, and proposes a novel method…

    Submitted 27 September, 2021; v1 submitted 15 December, 2020; originally announced December 2020.

    Comments: ICCV 2021 Camera-Ready

  50. arXiv:2012.08278  [pdf, other]

    cs.CV

    Cluster, Split, Fuse, and Update: Meta-Learning for Open Compound Domain Adaptive Semantic Segmentation

    Authors: Rui Gong, Yuhua Chen, Danda Pani Paudel, Yawei Li, Ajad Chhatkuli, Wen Li, Dengxin Dai, Luc Van Gool

    Abstract: Open compound domain adaptation (OCDA) is a domain adaptation setting, where target domain is modeled as a compound of multiple unknown homogeneous domains, which brings the advantage of improved generalization to unseen domains. In this work, we propose a principled meta-learning based approach to OCDA for semantic segmentation, MOCDA, by modeling the unlabeled target domain continuously. Our app…

    Submitted 15 December, 2020; originally announced December 2020.

    Comments: 18 pages, 8 figures, 8 tables