Showing 1–50 of 110 results for author: Zou, W

Searching in archive cs.
  1. arXiv:2405.15177  [pdf, other]

    cs.LG cs.AI

    Diffusion Actor-Critic with Entropy Regulator

    Authors: Yinuo Wang, Likun Wang, Yuxuan Jiang, Wenjun Zou, Tong Liu, Xujie Song, Wenxuan Wang, Liming Xiao, Jiang Wu, Jingliang Duan, Shengbo Eben Li

    Abstract: Reinforcement learning (RL) has proven highly effective in addressing complex decision-making and control tasks. However, in most traditional RL algorithms, the policy is typically parameterized as a diagonal Gaussian distribution with learned mean and variance, which constrains their capability to acquire complex policies. In response to this problem, we propose an online RL algorithm termed diff…

    Submitted 15 June, 2024; v1 submitted 23 May, 2024; originally announced May 2024.

  2. arXiv:2405.13923  [pdf, other]

    cs.CL

    Why Not Transform Chat Large Language Models to Non-English?

    Authors: Xiang Geng, Ming Zhu, Jiahuan Li, Zhejian Lai, Wei Zou, Shuaijie She, Jiaxin Guo, Xiaofeng Zhao, Yinglu Li, Yuang Li, Chang Su, Yanqing Zhao, Xinglin Lyu, Min Zhang, Jiajun Chen, Hao Yang, Shujian Huang

    Abstract: The scarcity of non-English data limits the development of non-English large language models (LLMs). Transforming English-centric LLMs to non-English has been identified as an effective and resource-efficient method. Previous works start from base LLMs and perform knowledge distillation (KD) with data generated by stronger LLMs, e.g. GPT-4. Compared to base LLMs, chat LLMs are further optimized fo…

    Submitted 31 May, 2024; v1 submitted 22 May, 2024; originally announced May 2024.

  3. arXiv:2405.05497  [pdf, other]

    cs.CV

    Multi-Level Feature Fusion Network for Lightweight Stereo Image Super-Resolution

    Authors: Yunxiang Li, Wenbin Zou, Qiaomu Wei, Feng Huang, Jing Wu

    Abstract: Stereo image super-resolution utilizes the cross-view complementary information brought by the disparity effect of left and right perspective images to reconstruct higher-quality images. Cascading feature extraction modules and cross-view feature interaction modules to make use of the information from stereo images is the focus of numerous methods. However, this adds a great deal of network parame…

    Submitted 8 May, 2024; originally announced May 2024.

    Comments: 10 pages, 7 figures, CVPRWorkshop NTIRE2024

  4. arXiv:2404.14248  [pdf, other]

    cs.CV

    NTIRE 2024 Challenge on Low Light Image Enhancement: Methods and Results

    Authors: Xiaoning Liu, Zongwei Wu, Ao Li, Florin-Alexandru Vasluianu, Yulun Zhang, Shuhang Gu, Le Zhang, Ce Zhu, Radu Timofte, Zhi Jin, Hongjun Wu, Chenxi Wang, Haitao Ling, Yuanhao Cai, Hao Bian, Yuxin Zheng, Jing Lin, Alan Yuille, Ben Shao, Jin Guo, Tianli Liu, Mohao Wu, Yixu Feng, Shuo Hou, Haotian Lin, et al. (87 additional authors not shown)

    Abstract: This paper reviews the NTIRE 2024 low light image enhancement challenge, highlighting the proposed solutions and results. The aim of this challenge is to discover an effective network design or solution capable of generating brighter, clearer, and visually appealing results when dealing with a variety of conditions, including ultra-high resolution (4K and beyond), non-uniform illumination, backlig…

    Submitted 22 April, 2024; originally announced April 2024.

    Comments: NTIRE 2024 Challenge Report

  5. arXiv:2404.08938  [pdf, other]

    cs.CL

    Enforcing Paraphrase Generation via Controllable Latent Diffusion

    Authors: Wei Zou, Ziyuan Zhuang, Shujian Huang, Jia Liu, Jiajun Chen

    Abstract: Paraphrase generation aims to produce high-quality and diverse utterances of a given text. Though state-of-the-art generation via the diffusion model reconciles generation quality and diversity, textual diffusion suffers from a truncation issue that hinders efficiency and quality control. In this work, we propose Latent Diffusion Paraphraser (LDP), a novel paraphrase gen…

    Submitted 13 April, 2024; originally announced April 2024.

  6. arXiv:2404.08631  [pdf, other]

    cs.CR

    FCert: Certifiably Robust Few-Shot Classification in the Era of Foundation Models

    Authors: Yanting Wang, Wei Zou, Jinyuan Jia

    Abstract: Few-shot classification with foundation models (e.g., CLIP, DINOv2, PaLM-2) enables users to build an accurate classifier with a few labeled training samples (called support samples) for a classification task. However, an attacker could perform data poisoning attacks by manipulating some support samples such that the classifier makes the attacker-desired, arbitrary prediction for a testing input.…

    Submitted 12 April, 2024; originally announced April 2024.

    Comments: To appear in IEEE Symposium on Security and Privacy, 2024

  7. arXiv:2403.19080  [pdf, other]

    cs.CV cs.CR

    MMCert: Provable Defense against Adversarial Attacks to Multi-modal Models

    Authors: Yanting Wang, Hongye Fu, Wei Zou, Jinyuan Jia

    Abstract: Different from a unimodal model whose input is from a single modality, the input (called multi-modal input) of a multi-modal model is from multiple modalities such as image, 3D points, audio, text, etc. Similar to unimodal models, many existing studies show that a multi-modal model is also vulnerable to adversarial perturbation, where an attacker could add small perturbation to all modalities of a…

    Submitted 1 April, 2024; v1 submitted 27 March, 2024; originally announced March 2024.

    Comments: To appear in CVPR'24

  8. arXiv:2403.16059  [pdf, other]

    stat.ML cs.LG math.OC

    Manifold Regularization Classification Model Based On Improved Diffusion Map

    Authors: Hongfu Guo, Wencheng Zou, Zeyu Zhang, Shuishan Zhang, Ruitong Wang, Jintao Zhang

    Abstract: Manifold regularization model is a semi-supervised learning model that leverages the geometric structure of a dataset, comprising a small number of labeled samples and a large number of unlabeled samples, to generate classifiers. However, the original manifold norm limits the performance of models to local regions. To address this limitation, this paper proposes an approach to improve manifold reg…

    Submitted 24 March, 2024; originally announced March 2024.

    Comments: 20 pages, 24 figures

  9. arXiv:2403.12847  [pdf, other]

    cs.LG

    Policy Bifurcation in Safe Reinforcement Learning

    Authors: Wenjun Zou, Yao Lyu, Jie Li, Yujie Yang, Shengbo Eben Li, Jingliang Duan, Xianyuan Zhan, Jingjing Liu, Yaqin Zhang, Keqiang Li

    Abstract: Safe reinforcement learning (RL) offers advanced solutions to constrained optimal control problems. Existing studies in safe RL implicitly assume continuity in policy functions, where policies map states to actions in a smooth, uninterrupted manner; however, our research finds that in some scenarios, the feasible policy should be discontinuous or multi-valued, interpolating between discontinuous l…

    Submitted 28 March, 2024; v1 submitted 19 March, 2024; originally announced March 2024.

  10. arXiv:2403.03145  [pdf, other]

    cs.CV cs.LG cs.MM cs.SD eess.AS

    Dual Mean-Teacher: An Unbiased Semi-Supervised Framework for Audio-Visual Source Localization

    Authors: Yuxin Guo, Shijie Ma, Hu Su, Zhiqing Wang, Yuhao Zhao, Wei Zou, Siyang Sun, Yun Zheng

    Abstract: Audio-Visual Source Localization (AVSL) aims to locate sounding objects within video frames given the paired audio clips. Existing methods predominantly rely on self-supervised contrastive learning of audio-visual correspondence. Without any bounding-box annotations, they struggle to achieve precise localization, especially for small objects, and suffer from blurry boundaries and false positives.…

    Submitted 5 March, 2024; originally announced March 2024.

    Comments: Accepted to NeurIPS2023

  11. arXiv:2403.03095  [pdf, other]

    cs.CV cs.MM cs.SD eess.AS

    Cross Pseudo-Labeling for Semi-Supervised Audio-Visual Source Localization

    Authors: Yuxin Guo, Shijie Ma, Yuhao Zhao, Hu Su, Wei Zou

    Abstract: Audio-Visual Source Localization (AVSL) is the task of identifying specific sounding objects in the scene given audio cues. In our work, we focus on semi-supervised AVSL with pseudo-labeling. To address the issues with vanilla hard pseudo-labels including bias accumulation, noise sensitivity, and instability, we propose a novel method named Cross Pseudo-Labeling (XPL), wherein two models learn fro…

    Submitted 5 March, 2024; originally announced March 2024.

    Comments: Accepted To ICASSP2024

  12. arXiv:2402.17456  [pdf, other]

    cs.HC cs.AI cs.CL

    A Piece of Theatre: Investigating How Teachers Design LLM Chatbots to Assist Adolescent Cyberbullying Education

    Authors: Michael A. Hedderich, Natalie N. Bazarova, Wenting Zou, Ryun Shim, Xinda Ma, Qian Yang

    Abstract: Cyberbullying harms teenagers' mental health, and teaching them upstanding intervention is crucial. Wizard-of-Oz studies show chatbots can scale up personalized and interactive cyberbullying education, but implementing such chatbots is a challenging and delicate task. We created a no-code chatbot design tool for K-12 teachers. Using large language models and prompt chaining, our tool allows teache…

    Submitted 27 February, 2024; originally announced February 2024.

  13. arXiv:2402.07867  [pdf, other]

    cs.CR cs.LG

    PoisonedRAG: Knowledge Poisoning Attacks to Retrieval-Augmented Generation of Large Language Models

    Authors: Wei Zou, Runpeng Geng, Binghui Wang, Jinyuan Jia

    Abstract: Large language models (LLMs) have achieved remarkable success due to their exceptional generative capabilities. Despite their success, they also have inherent limitations such as a lack of up-to-date knowledge and hallucination. Retrieval-Augmented Generation (RAG) is a state-of-the-art technique to mitigate those limitations. In particular, given a question, RAG retrieves relevant knowledge from…

    Submitted 12 February, 2024; originally announced February 2024.

    Comments: Code is available at https://github.com/sleeepeer/PoisonedRAG

  14. arXiv:2401.16820  [pdf, other]

    cs.CR

    Provably Robust Multi-bit Watermarking for AI-generated Text via Error Correction Code

    Authors: Wenjie Qu, Dong Yin, Zixin He, Wei Zou, Tianyang Tao, Jinyuan Jia, Jiaheng Zhang

    Abstract: Large Language Models (LLMs) have been widely deployed for their remarkable capability to generate texts resembling human language. However, they could be misused by criminals to create deceptive content, such as fake news and phishing emails, which raises ethical concerns. Watermarking is a key technique to mitigate the misuse of LLMs, which embeds a watermark (e.g., a bit string) into a text gen…

    Submitted 15 April, 2024; v1 submitted 30 January, 2024; originally announced January 2024.

  15. arXiv:2401.06838  [pdf, other]

    cs.CL

    MAPO: Advancing Multilingual Reasoning through Multilingual Alignment-as-Preference Optimization

    Authors: Shuaijie She, Wei Zou, Shujian Huang, Wenhao Zhu, Xiang Liu, Xiang Geng, Jiajun Chen

    Abstract: Though reasoning abilities are considered language-agnostic, existing LLMs exhibit inconsistent reasoning abilities across different languages, e.g., reasoning in the dominant language like English is superior to other languages due to the imbalance of multilingual training data. To enhance reasoning abilities in non-dominant languages, we propose a Multilingual-Alignment-as-Preference Optimizatio…

    Submitted 13 April, 2024; v1 submitted 12 January, 2024; originally announced January 2024.

    Comments: The project is available at https://github.com/NJUNLP/MAPO

  16. arXiv:2401.02777  [pdf, other]

    cs.CL cs.AI

    From LLM to Conversational Agent: A Memory Enhanced Architecture with Fine-Tuning of Large Language Models

    Authors: Na Liu, Liangyu Chen, Xiaoyu Tian, Wei Zou, Kaijiang Chen, Ming Cui

    Abstract: This paper introduces RAISE (Reasoning and Acting through Scratchpad and Examples), an advanced architecture enhancing the integration of Large Language Models (LLMs) like GPT-4 into conversational agents. RAISE, an enhancement of the ReAct framework, incorporates a dual-component memory system, mirroring human short-term and long-term memory, to maintain context and continuity in conversations. I…

    Submitted 30 January, 2024; v1 submitted 5 January, 2024; originally announced January 2024.

  17. arXiv:2312.15614  [pdf, other]

    cs.SE cs.AI cs.CL

    A Comprehensive Evaluation of Parameter-Efficient Fine-Tuning on Software Engineering Tasks

    Authors: Wentao Zou, Qi Li, Jidong Ge, Chuanyi Li, Xiaoyu Shen, Liguo Huang, Bin Luo

    Abstract: Pre-trained models (PTMs) have achieved great success in various Software Engineering (SE) downstream tasks following the "pre-train then fine-tune" paradigm. As fully fine-tuning all parameters of PTMs can be computationally expensive, a widely used solution is parameter-efficient fine-tuning (PEFT), which freezes PTMs while introducing extra parameters. Though work has been done to test PEFT m…

    Submitted 25 December, 2023; originally announced December 2023.

  18. arXiv:2312.08606  [pdf, other]

    cs.CV

    VQCNIR: Clearer Night Image Restoration with Vector-Quantized Codebook

    Authors: Wenbin Zou, Hongxia Gao, Tian Ye, Liang Chen, Weipeng Yang, Shasha Huang, Hongsheng Chen, Sixiang Chen

    Abstract: Night photography often struggles with challenges like low light and blurring, stemming from dark environments and prolonged exposures. Current methods either disregard priors and directly fitting end-to-end networks, leading to inconsistent illumination, or rely on unreliable handcrafted priors to constrain the network, thereby bringing the greater error to the final result. We believe in the str…

    Submitted 16 December, 2023; v1 submitted 13 December, 2023; originally announced December 2023.

    Comments: This paper is accepted by AAAI2024

  19. arXiv:2311.09721  [pdf, other]

    cs.CL

    On Evaluating the Integration of Reasoning and Action in LLM Agents with Database Question Answering

    Authors: Linyong Nan, Ellen Zhang, Weijin Zou, Yilun Zhao, Wenfei Zhou, Arman Cohan

    Abstract: This study introduces a new long-form database question answering dataset designed to evaluate how Large Language Models (LLMs) interact with a SQL interpreter. The task necessitates LLMs to strategically generate multiple SQL queries to retrieve sufficient data from a database, to reason with the acquired context, and to synthesize them into a comprehensive analytical narrative. Our findings high…

    Submitted 16 November, 2023; originally announced November 2023.

  20. arXiv:2311.08896  [pdf, other]

    cs.CL

    HeLM: Highlighted Evidence augmented Language Model for Enhanced Table-to-Text Generation

    Authors: Junyi Bian, Xiaolei Qin, Wuhe Zou, Mengzuo Huang, Congyi Luo, Ke Zhang, Weidong Zhang

    Abstract: Large models have demonstrated significant progress across various domains, particularly in tasks related to text generation. In the domain of Table to Text, many Large Language Model (LLM)-based methods currently resort to modifying prompts to invoke public APIs, incurring potential costs and information leaks. With the advent of open-source large models, fine-tuning LLMs has become feasible. In…

    Submitted 27 April, 2024; v1 submitted 15 November, 2023; originally announced November 2023.

  21. arXiv:2310.18075  [pdf, other]

    cs.CL cs.AI

    DUMA: a Dual-Mind Conversational Agent with Fast and Slow Thinking

    Authors: Xiaoyu Tian, Liangyu Chen, Na Liu, Yaxuan Liu, Wei Zou, Kaijiang Chen, Ming Cui

    Abstract: Inspired by the dual-process theory of human cognition, we introduce DUMA, a novel conversational agent framework that embodies a dual-mind mechanism through the utilization of two generative Large Language Models (LLMs) dedicated to fast and slow thinking respectively. The fast thinking model serves as the primary interface for external interactions and initial response generation, evaluating the…

    Submitted 24 November, 2023; v1 submitted 27 October, 2023; originally announced October 2023.

  22. arXiv:2309.14282  [pdf, other]

    cs.CV

    Calibration-based Dual Prototypical Contrastive Learning Approach for Domain Generalization Semantic Segmentation

    Authors: Muxin Liao, Shishun Tian, Yuhang Zhang, Guoguang Hua, Wenbin Zou, Xia Li

    Abstract: Prototypical contrastive learning (PCL) has been widely used to learn class-wise domain-invariant features recently. These methods are based on the assumption that the prototypes, which are represented as the central value of the same class in a certain domain, are domain-invariant. Since the prototypes of different domains have discrepancies as well, the class-wise domain-invariant features learn…

    Submitted 25 September, 2023; originally announced September 2023.

    Comments: Accepted by ACM MM'23

  23. arXiv:2308.10855  [pdf, other]

    cs.CL

    LatEval: An Interactive LLMs Evaluation Benchmark with Incomplete Information from Lateral Thinking Puzzles

    Authors: Shulin Huang, Shirong Ma, Yinghui Li, Mengzuo Huang, Wuhe Zou, Weidong Zhang, Hai-Tao Zheng

    Abstract: With the continuous evolution and refinement of LLMs, they are endowed with impressive logical reasoning or vertical thinking capabilities. But can they think out of the box? Do they possess proficient lateral thinking abilities? Following the setup of Lateral Thinking Puzzles, we propose a novel evaluation benchmark, LatEval, which assesses the model's lateral thinking within an interactive frame…

    Submitted 17 March, 2024; v1 submitted 21 August, 2023; originally announced August 2023.

    Comments: Accepted by LREC-COLING 2024

  24. arXiv:2307.15290  [pdf, other]

    cs.CL

    ChatHome: Development and Evaluation of a Domain-Specific Language Model for Home Renovation

    Authors: Cheng Wen, Xianghui Sun, Shuaijiang Zhao, Xiaoquan Fang, Liangyu Chen, Wei Zou

    Abstract: This paper presents the development and evaluation of ChatHome, a domain-specific language model (DSLM) designed for the intricate field of home renovation. Considering the proven competencies of large language models (LLMs) like GPT-4 and the escalating fascination with home renovation, this study endeavors to reconcile these aspects by generating a dedicated model that can yield high-fidelity, p…

    Submitted 28 July, 2023; originally announced July 2023.

    Comments: ChatHome, DSLM for home renovation

  25. arXiv:2305.14303  [pdf, other]

    cs.CL

    QTSumm: Query-Focused Summarization over Tabular Data

    Authors: Yilun Zhao, Zhenting Qi, Linyong Nan, Boyu Mi, Yixin Liu, Weijin Zou, Simeng Han, Ruizhe Chen, Xiangru Tang, Yumo Xu, Dragomir Radev, Arman Cohan

    Abstract: People primarily consult tables to conduct data analysis or answer specific questions. Text generation systems that can provide accurate table summaries tailored to users' information needs can facilitate more efficient access to relevant data insights. Motivated by this, we define a new query-focused table summarization task, where text generation models have to perform human-like reasoning and a…

    Submitted 6 November, 2023; v1 submitted 23 May, 2023; originally announced May 2023.

    Comments: Accepted at EMNLP 2023

  26. arXiv:2305.12586  [pdf, other]

    cs.CL

    Enhancing Few-shot Text-to-SQL Capabilities of Large Language Models: A Study on Prompt Design Strategies

    Authors: Linyong Nan, Yilun Zhao, Weijin Zou, Narutatsu Ri, Jaesung Tae, Ellen Zhang, Arman Cohan, Dragomir Radev

    Abstract: In-context learning (ICL) has emerged as a new approach to various natural language processing tasks, utilizing large language models (LLMs) to make predictions based on context that has been supplemented with a few examples or task-specific instructions. In this paper, we aim to extend this method to question answering tasks that utilize structured knowledge sources, and improve Text-to-SQL syste…

    Submitted 21 May, 2023; originally announced May 2023.

  27. arXiv:2305.08459  [pdf, other]

    cond-mat.dis-nn cond-mat.stat-mech cs.LG q-bio.NC

    Introduction to dynamical mean-field theory of randomly connected neural networks with bidirectionally correlated couplings

    Authors: Wenxuan Zou, Haiping Huang

    Abstract: Dynamical mean-field theory is a powerful physics tool used to analyze the typical behavior of neural networks, where neurons can be recurrently connected, or multiple layers of neurons can be stacked. However, it is not easy for beginners to access the essence of this tool and the underlying physics. Here, we give a pedagogical introduction of this method in a particular example of random neural…

    Submitted 7 October, 2023; v1 submitted 15 May, 2023; originally announced May 2023.

    Comments: 27 pages, 5 figures, 44 references, revised version for SciPost Physics Lecture Notes

    Journal ref: SciPost Phys. Lect. Notes 79 (2024)

  28. arXiv:2304.06236  [pdf, other]

    cs.CV

    Cross-View Hierarchy Network for Stereo Image Super-Resolution

    Authors: Wenbin Zou, Hongxia Gao, Liang Chen, Yunchen Zhang, Mingchao Jiang, Zhongxin Yu, Ming Tan

    Abstract: Stereo image super-resolution aims to improve the quality of high-resolution stereo image pairs by exploiting complementary information across views. To attain superior performance, many methods have prioritized designing complex modules to fuse similar information across views, yet overlooking the importance of intra-view information for high-resolution reconstruction. It also leads to problems o…

    Submitted 12 April, 2023; originally announced April 2023.

    Comments: 10 pages, 7 figures, CVPRW, NTIRE2023

  29. arXiv:2303.09773  [pdf, other]

    eess.IV cs.CV

    Progressive Content-aware Coded Hyperspectral Compressive Imaging

    Authors: Xuanyu Zhang, Bin Chen, Wenzhen Zou, Shuai Liu, Yongbing Zhang, Ruiqin Xiong, Jian Zhang

    Abstract: Hyperspectral imaging plays a pivotal role in a wide range of applications, like remote sensing, medicine, and cytology. By acquiring 3D hyperspectral images (HSIs) via 2D sensors, the coded aperture snapshot spectral imaging (CASSI) has achieved great success due to its hardware-friendly implementation and fast imaging speed. However, for some less spectrally sparse scenes, single snapshot and un…

    Submitted 17 March, 2023; originally announced March 2023.

    Comments: a novel hyperspectral snapshot compressive imaging and restoration framework

  30. arXiv:2212.14154  [pdf, other]

    cs.CV

    A Class-wise Non-salient Region Generalized Framework for Video Semantic Segmentation

    Authors: Yuhang Zhang, Shishun Tian, Muxin Liao, Zhengyu Zhang, Wenbin Zou, Chen Xu

    Abstract: Video semantic segmentation (VSS) is beneficial for dealing with dynamic scenes due to the continuous property of the real-world environment. On the one hand, some methods alleviate the predicted inconsistent problem between continuous frames. On the other hand, other methods employ the previous frame as the prior information to assist in segmenting the current frame. Although the previous methods…

    Submitted 28 December, 2022; originally announced December 2022.

  31. arXiv:2212.06369  [pdf, other]

    cs.CL

    Technical Report -- Competition Solution for Prompt Tuning using Pretrained Language Model

    Authors: Jiang-Long Song, Wu-He Zou, Feng Li, Xiao-Lei Qin, Wei-Dong Zhang

    Abstract: Prompt tuning recently becomes a hot-spot in the applications of large pretrained language models on specific downstream tasks. Regarding the Language Model as a Service (LMaaS), black-box tuning using derivative-free optimization (DFO) provides a novel approach to expand the practical scenarios of pretrained models and enrich the researches of few-shot learning. In this report, we present our sol…

    Submitted 20 December, 2022; v1 submitted 12 December, 2022; originally announced December 2022.

  32. arXiv:2212.02846  [pdf, other]

    cond-mat.stat-mech cond-mat.dis-nn cs.LG q-bio.NC

    Statistical mechanics of continual learning: variational principle and mean-field potential

    Authors: Chan Li, Zhenye Huang, Wenxuan Zou, Haiping Huang

    Abstract: An obstacle to artificial general intelligence is set by continual learning of multiple tasks of different nature. Recently, various heuristic tricks, both from machine learning and from neuroscience angles, were proposed, but they lack a unified theory ground. Here, we focus on continual learning in single-layered and multi-layered neural networks of binary weights. A variational Bayesian learnin…

    Submitted 20 June, 2023; v1 submitted 6 December, 2022; originally announced December 2022.

    Comments: 48 pages, 8 figures, final version to Phys Rev E

    Journal ref: Phys. Rev. E 108, 014309 (2023)

  33. arXiv:2210.11237  [pdf, other]

    cs.CR cs.AI cs.LG

    Emerging Threats in Deep Learning-Based Autonomous Driving: A Comprehensive Survey

    Authors: Hui Cao, Wenlong Zou, Yinkun Wang, Ting Song, Mengjun Liu

    Abstract: Since the 2004 DARPA Grand Challenge, the autonomous driving technology has witnessed nearly two decades of rapid development. Particularly, in recent years, with the application of new sensors and deep learning technologies extending to the autonomous field, the development of autonomous driving technology has continued to make breakthroughs. Thus, many carmakers and high-tech giants dedicated to…

    Submitted 19 October, 2022; originally announced October 2022.

    Comments: 28 pages, 10 figures

  34. arXiv:2210.07553  [pdf, other]

    cs.RO cs.LG eess.SY

    Safe Model-Based Reinforcement Learning with an Uncertainty-Aware Reachability Certificate

    Authors: Dongjie Yu, Wenjun Zou, Yujie Yang, Haitong Ma, Shengbo Eben Li, Jingliang Duan, Jianyu Chen

    Abstract: Safe reinforcement learning (RL) that solves constraint-satisfactory policies provides a promising way to the broader safety-critical applications of RL in real-world problems such as robotics. Among all safe RL approaches, model-based methods reduce training time violations further due to their high sample efficiency. However, lacking safety robustness against the model uncertainties remains an i…

    Submitted 14 October, 2022; originally announced October 2022.

    Comments: 12 pages, 6 figures

  35. arXiv:2209.15203  [pdf, other]

    cs.LG cs.DC

    Downlink Compression Improves TopK Sparsification

    Authors: William Zou, Hans De Sterck, Jun Liu

    Abstract: Training large neural networks is time consuming. To speed up the process, distributed training is often used. One of the largest bottlenecks in distributed training is communicating gradients across different nodes. Different gradient compression techniques have been proposed to alleviate the communication bottleneck, including topK gradient sparsification, which truncates the gradient to the lar…

    Submitted 29 September, 2022; originally announced September 2022.

  36. arXiv:2208.11184  [pdf, other]

    eess.IV cs.CV

    AIM 2022 Challenge on Super-Resolution of Compressed Image and Video: Dataset, Methods and Results

    Authors: Ren Yang, Radu Timofte, Xin Li, Qi Zhang, Lin Zhang, Fanglong Liu, Dongliang He, Fu li, He Zheng, Weihang Yuan, Pavel Ostyakov, Dmitry Vyal, Magauiya Zhussip, Xueyi Zou, Youliang Yan, Lei Li, Jingzhu Tang, Ming Chen, Shijie Zhao, Yu Zhu, Xiaoran Qin, Chenghua Li, Cong Leng, Jian Cheng, Claudio Rota, et al. (28 additional authors not shown)

    Abstract: This paper reviews the Challenge on Super-Resolution of Compressed Image and Video at AIM 2022. This challenge includes two tracks. Track 1 aims at the super-resolution of compressed image, and Track 2 targets the super-resolution of compressed video. In Track 1, we use the popular dataset DIV2K as the training, validation and test sets. In Track 2, we propose the LDV 3.0 dataset, which contains 3…

    Submitted 25 August, 2022; v1 submitted 23 August, 2022; originally announced August 2022.

    Comments: Camera-ready version

  37. arXiv:2208.08509  [pdf, other]

    cs.CL cs.LG cs.SD eess.AS

    Analyzing Robustness of End-to-End Neural Models for Automatic Speech Recognition

    Authors: Goutham Rajendran, Wei Zou

    Abstract: We investigate robustness properties of pre-trained neural models for automatic speech recognition. Real life data in machine learning is usually very noisy and almost never clean, which can be attributed to various factors depending on the domain, e.g. outliers, random noise and adversarial noise. Therefore, the models we develop for various tasks should be robust to such kinds of noisy data, whi…

    Submitted 17 August, 2022; originally announced August 2022.

    Comments: 5 pages, 14 figures

  38. arXiv:2208.00817  [pdf, other]

    cs.CV

    DSLA: Dynamic smooth label assignment for efficient anchor-free object detection

    Authors: Hu Su, Yonghao He, Rui Jiang, Jiabin Zhang, Wei Zou, Bin Fan

    Abstract: Anchor-free detectors basically formulate object detection as dense classification and regression. For popular anchor-free detectors, it is common to introduce an individual prediction branch to estimate the quality of localization. The following inconsistencies are observed when we delve into the practices of classification and quality estimation. Firstly, for some adjacent samples which are assi…

    Submitted 29 September, 2022; v1 submitted 1 August, 2022; originally announced August 2022.

    Comments: single column, 33 pages, 7 figures, accepted by Pattern Recognition

  39. arXiv:2206.13329  [pdf, other]

    cs.CV

    Prior-Guided One-shot Neural Architecture Search

    Authors: Peijie Dong, Xin Niu, Lujun Li, Linzhen Xie, Wenbin Zou, Tian Ye, Zimian Wei, Hengyue Pan

    Abstract: Neural architecture search methods seek optimal candidates with efficient weight-sharing supernet training. However, recent studies indicate poor ranking consistency about the performance between stand-alone architectures and shared-weight networks. In this paper, we present Prior-Guided One-shot NAS (PGONAS) to strengthen the ranking correlation of supernets. Specifically, we first explore the ef…

    Submitted 27 June, 2022; originally announced June 2022.

    Comments: Official 3rd Place Solution for the Second workshop Neural Architecture Search Second lightweight NAS Challenge 2022 - Track1 Supernet Track. Official leaderboard: https://aistudio.baidu.com/aistudio/competition/detail/149/0/leaderboard CVPR 2022 Workshop: https://cvpr-nas.com/competition

  40. arXiv:2205.12467  [pdf, other]

    cs.CL

    R2D2: Robust Data-to-Text with Replacement Detection

    Authors: Linyong Nan, Lorenzo Jaime Yu Flores, Yilun Zhao, Yixin Liu, Luke Benson, Weijin Zou, Dragomir Radev

    Abstract: Unfaithful text generation is a common problem for text generation systems. In the case of Data-to-Text (D2T) systems, the factuality of the generated text is particularly crucial for any real-world applications. We introduce R2D2, a training framework that addresses unfaithful Data-to-Text generation by training a system both as a generator and a faithfulness discriminator with additional replace…

    Submitted 24 May, 2022; originally announced May 2022.

  41. arXiv:2205.05675  [pdf, other]

    cs.CV eess.IV

    NTIRE 2022 Challenge on Efficient Super-Resolution: Methods and Results

    Authors: Yawei Li, Kai Zhang, Radu Timofte, Luc Van Gool, Fangyuan Kong, Mingxi Li, Songwei Liu, Zongcai Du, Ding Liu, Chenhui Zhou, Jingyi Chen, Qingrui Han, Zheyuan Li, Yingqi Liu, Xiangyu Chen, Haoming Cai, Yu Qiao, Chao Dong, Long Sun, Jinshan Pan, Yi Zhu, Zhikai Zong, Xiaoxiao Liu, Zheng Hui, Tao Yang, et al. (86 additional authors not shown)

    Abstract: This paper reviews the NTIRE 2022 challenge on efficient single image super-resolution with focus on the proposed solutions and results. The task of the challenge was to super-resolve an input image with a magnification factor of $\times$4 based on pairs of low and corresponding high resolution images. The aim was to design a network for single image super-resolution that achieved improvement of e…

    Submitted 11 May, 2022; originally announced May 2022.

    Comments: Validation code of the baseline model is available at https://github.com/ofsoundof/IMDN. Validation of all submitted models is available at https://github.com/ofsoundof/NTIRE2022_ESR

  42. arXiv:2204.08913  [pdf, other

    cs.CV

    Self-Calibrated Efficient Transformer for Lightweight Super-Resolution

    Authors: Wenbin Zou, Tian Ye, Weixin Zheng, Yunchen Zhang, Liang Chen, Yi Wu

    Abstract: Recently, deep learning has been successfully applied to the single-image super-resolution (SISR) with remarkable performance. However, most existing methods focus on building a more complex network with a large number of layers, which can entail heavy computational costs and memory storage. To address this problem, we present a lightweight Self-Calibrated Efficient Transformer (SCET) network to s…

    Submitted 19 April, 2022; originally announced April 2022.

    Comments: 10 pages, 3 figures, CVPRWorkshop

  43. arXiv:2204.08720  [pdf, other

    eess.AS cs.CR cs.SD

    Audio Deep Fake Detection System with Neural Stitching for ADD 2022

    Authors: Rui Yan, Cheng Wen, Shuran Zhou, Tingwei Guo, Wei Zou, Xiangang Li

    Abstract: This paper describes our best system and methodology for ADD 2022: The First Audio Deep Synthesis Detection Challenge. The very same system was used for both rounds of evaluation in Track 3.2 with a similar training methodology. The first round of Track 3.2 data is generated from Text-to-Speech (TTS) or voice conversion (VC) algorithms, while the second round of data consists of…

    Submitted 19 April, 2022; v1 submitted 19 April, 2022; originally announced April 2022.

    Comments: Accepted to ICASSP 2022

  44. arXiv:2204.08692  [pdf, other

    eess.AS cs.CR cs.SD

    Time Domain Adversarial Voice Conversion for ADD 2022

    Authors: Cheng Wen, Tingwei Guo, Xingjun Tan, Rui Yan, Shuran Zhou, Chuandong Xie, Wei Zou, Xiangang Li

    Abstract: In this paper, we describe our speech generation system for the first Audio Deep Synthesis Detection Challenge (ADD 2022). Firstly, we build an any-to-many voice conversion (VC) system to convert source speech with arbitrary language content into the target speaker’s fake speech. Then the converted speech generated from VC is post-processed in the time domain to improve the deception ability.…

    Submitted 19 April, 2022; v1 submitted 19 April, 2022; originally announced April 2022.

    Comments: Accepted to ICASSP 2022

  45. arXiv:2204.08686  [pdf, ps, other

    cs.SD eess.AS

    Audio-Visual Wake Word Spotting System For MISP Challenge 2021

    Authors: Yanguang Xu, Jianwei Sun, Yang Han, Shuaijiang Zhao, Chaoyang Mei, Tingwei Guo, Shuran Zhou, Chuandong Xie, Wei Zou, Xiangang Li

    Abstract: This paper presents the details of our system designed for the Task 1 of Multimodal Information Based Speech Processing (MISP) Challenge 2021. The purpose of Task 1 is to leverage both audio and video information to improve the environmental robustness of far-field wake word spotting. In the proposed system, firstly, we take advantage of speech enhancement algorithms such as beamforming and weight…

    Submitted 19 April, 2022; v1 submitted 19 April, 2022; originally announced April 2022.

    Comments: Accepted to ICASSP 2022

  46. arXiv:2203.08667  [pdf, other

    cs.CV

    Graph Flow: Cross-layer Graph Flow Distillation for Dual Efficient Medical Image Segmentation

    Authors: Wenxuan Zou, Muyi Sun

    Abstract: With the development of deep convolutional neural networks, medical image segmentation has achieved a series of breakthroughs in recent years. However, the high-performance convolutional neural networks always mean numerous parameters and high computation costs, which will hinder the applications in clinical scenarios. Meanwhile, the scarceness of large-scale annotated medical image datasets furth…

    Submitted 29 August, 2022; v1 submitted 16 March, 2022; originally announced March 2022.

  47. arXiv:2112.12322  [pdf

    cs.ET physics.optics

    High-order tensor flow processing using integrated photonic circuits

    Authors: Shaofu Xu, Jing Wang, Sicheng Yi, Weiwen Zou

    Abstract: Tensor analytics lays mathematical basis for the prosperous promotion of multiway signal processing. To increase computing throughput, mainstream processors transform tensor convolutions to matrix multiplications to enhance parallelism of computing. However, such order-reducing transformation produces data duplicates and consumes additional memory. Here, we demonstrate an integrated photonic tenso…

    Submitted 22 December, 2021; originally announced December 2021.

  48. arXiv:2110.05803  [pdf, other

    eess.IV cs.CV

    SDWNet: A Straight Dilated Network with Wavelet Transformation for Image Deblurring

    Authors: Wenbin Zou, Mingchao Jiang, Yunchen Zhang, Liang Chen, Zhiyong Lu, Yi Wu

    Abstract: Image deblurring is a classical computer vision problem that aims to recover a sharp image from a blurred image. To solve this problem, existing methods apply the Encode-Decode architecture to design the complex networks to make a good performance. However, most of these methods use repeated up-sampling and down-sampling structures to expand the receptive field, which results in texture informatio…

    Submitted 12 October, 2021; originally announced October 2021.

    Comments: 10 pages, 7 figures, ICCVW 2021

  49. arXiv:2109.12418  [pdf

    cs.ET physics.optics

    Comb-based photonic neural population for parallel and nonlinear processing

    Authors: Bowen Ma, Junfeng Zhang, Weiwen Zou

    Abstract: It is believed that neural information representation and processing relies on the neural population instead of a single neuron. In neuromorphic photonics, photonic neurons in the form of nonlinear responses have been extensively studied in single devices and temporal nodes. However, to construct a photonic neural population (PNP), the process of scaling up and massive interconnections remain chal…

    Submitted 25 September, 2021; originally announced September 2021.

    Comments: 8 pages, 8 figures

  50. arXiv:2109.03555  [pdf, other

    cs.SE

    BLESER: Bug Localization Based on Enhanced Semantic Retrieval

    Authors: Weiqin Zou, Enming Li, Chunrong Fang

    Abstract: Static bug localization techniques that locate bugs at method granularity have gained much attention from both researchers and practitioners. For a static method-level bug localization technique, a key but challenging step is to fully retrieve the semantics of methods and bug reports. Currently, existing studies mainly use the same bag-of-word space to represent the semantics of methods and bug re…

    Submitted 8 September, 2021; originally announced September 2021.