Showing 1–50 of 136 results for author: Wen, S

Searching in archive cs.
  1. arXiv:2406.14797  [pdf, other]

    cs.CV cs.AI

    Camera-Invariant Meta-Learning Network for Single-Camera-Training Person Re-identification

    Authors: Jiangbo Pei, Zhuqing Jiang, Aidong Men, Haiying Wang, Haiyong Luo, Shiping Wen

    Abstract: Single-camera-training person re-identification (SCT re-ID) aims to train a re-ID model using SCT datasets where each person appears in only one camera. The main challenge of SCT re-ID is to learn camera-invariant feature representations without cross-camera same-person (CCSP) data as supervision. Previous methods address it by assuming that the most similar person should be found in another camer…

    Submitted 20 June, 2024; originally announced June 2024.

  2. arXiv:2406.14217  [pdf, other]

    cs.LG cs.CR

    Defending Against Sophisticated Poisoning Attacks with RL-based Aggregation in Federated Learning

    Authors: Yujing Wang, Hainan Zhang, Sijia Wen, Wangjie Qiu, Binghui Guo

    Abstract: Federated learning is highly susceptible to model poisoning attacks, especially those meticulously crafted for servers. Traditional defense methods mainly focus on updating assessments or robust aggregation against manually crafted myopic attacks. When facing advanced attacks, their defense stability is notably insufficient. Therefore, it is imperative to develop adaptive defenses against such adv…

    Submitted 20 June, 2024; originally announced June 2024.

  3. arXiv:2406.11422  [pdf, other]

    cs.LG

    Cross-domain Open-world Discovery

    Authors: Shuo Wen, Maria Brbic

    Abstract: In many real-world applications, test data may commonly exhibit categorical shifts, characterized by the emergence of novel classes, as well as distribution shifts arising from feature distributions different from the ones the model was trained on. However, existing methods either discover novel classes in the open-world setting or assume domain shifts without the ability to discover novel classes…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: 18 pages, 6 figures, 24 tables

  4. arXiv:2406.06305  [pdf, other]

    cs.CV cs.AI

    NeuroMoCo: A Neuromorphic Momentum Contrast Learning Method for Spiking Neural Networks

    Authors: Yuqi Ma, Huamin Wang, Hangchi Shen, Xuemei Chen, Shukai Duan, Shiping Wen

    Abstract: Recently, brain-inspired spiking neural networks (SNNs) have attracted great research attention owing to their inherent bio-interpretability, event-triggered properties and powerful perception of spatiotemporal information, which is beneficial to handling event-based neuromorphic datasets. In contrast to conventional static image datasets, event-based neuromorphic datasets present heightened compl…

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: 32 pages, 4 figures, 4 tables

  5. arXiv:2406.02630  [pdf, other]

    cs.CR cs.AI

    AI Agents Under Threat: A Survey of Key Security Challenges and Future Pathways

    Authors: Zehang Deng, Yongjian Guo, Changzhou Han, Wanlun Ma, Junwu Xiong, Sheng Wen, Yang Xiang

    Abstract: An Artificial Intelligence (AI) agent is a software entity that autonomously performs tasks or makes decisions based on pre-defined objectives and data inputs. AI agents, capable of perceiving user inputs, reasoning and planning tasks, and executing actions, have seen remarkable advancements in algorithm development and task performance. However, the security challenges they pose remain under-expl…

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: ACM Computing Survey

  6. arXiv:2406.01196  [pdf, other]

    cs.CV cs.AI

    3D WholeBody Pose Estimation based on Semantic Graph Attention Network and Distance Information

    Authors: Sihan Wen, Xiantan Zhu, Zhiming Tan

    Abstract: In recent years, a plethora of diverse methods have been proposed for 3D pose estimation. Among these, self-attention mechanisms and graph convolutions have both been proven to be effective and practical methods. Recognizing the strengths of those two techniques, we have developed a novel Semantic Graph Attention Network which can benefit from the ability of self-attention to capture global contex…

    Submitted 3 June, 2024; originally announced June 2024.

  7. arXiv:2405.21050  [pdf, other]

    cs.CV cs.LG

    Spectrum-Aware Parameter Efficient Fine-Tuning for Diffusion Models

    Authors: Xinxi Zhang, Song Wen, Ligong Han, Felix Juefei-Xu, Akash Srivastava, Junzhou Huang, Hao Wang, Molei Tao, Dimitris N. Metaxas

    Abstract: Adapting large-scale pre-trained generative models in a parameter-efficient manner is gaining traction. Traditional methods like low rank adaptation achieve parameter efficiency by imposing constraints but may not be optimal for tasks requiring high representation capacity. We propose a novel spectrum-aware adaptation framework for generative models. Our method adjusts both singular values and the…

    Submitted 31 May, 2024; originally announced May 2024.

  8. arXiv:2405.15258  [pdf, other]

    cs.CR

    Leakage-Resilient and Carbon-Neutral Aggregation Featuring the Federated AI-enabled Critical Infrastructure

    Authors: Zehang Deng, Ruoxi Sun, Minhui Xue, Sheng Wen, Seyit Camtepe, Surya Nepal, Yang Xiang

    Abstract: AI-enabled critical infrastructures (ACIs) integrate artificial intelligence (AI) technologies into various essential systems and services that are vital to the functioning of society, offering significant implications for efficiency, security and resilience. While adopting decentralized AI approaches (such as federated learning technology) in ACIs is plausible, private and sensitive data are stil…

    Submitted 24 May, 2024; originally announced May 2024.

  9. arXiv:2405.14660  [pdf, other]

    cs.LG cs.AI cs.CL

    Implicit In-context Learning

    Authors: Zhuowei Li, Zihao Xu, Ligong Han, Yunhe Gao, Song Wen, Di Liu, Hao Wang, Dimitris N. Metaxas

    Abstract: In-context Learning (ICL) empowers large language models (LLMs) to adapt to unseen tasks during inference by prefixing a few demonstration examples prior to test queries. Despite its versatility, ICL incurs substantial computational and memory overheads compared to zero-shot learning and is susceptible to the selection and order of demonstration examples. In this work, we introduce Implicit In-con…

    Submitted 23 May, 2024; originally announced May 2024.

  10. arXiv:2405.12706  [pdf, other]

    cs.IR

    Disentangled Representation with Cross Experts Covariance Loss for Multi-Domain Recommendation

    Authors: Zhutian Lin, Junwei Pan, Haibin Yu, Xi Xiao, Ximei Wang, Zhixiang Feng, Shifeng Wen, Shudong Huang, Lei Xiao, Jie Jiang

    Abstract: Multi-domain learning (MDL) has emerged as a prominent research area aimed at enhancing the quality of personalized services. The key challenge in MDL lies in striking a balance between learning commonalities across domains while preserving the distinct characteristics of each domain. However, this gives rise to a challenging dilemma. On one hand, a model needs to leverage domain-specific modules,…

    Submitted 21 May, 2024; originally announced May 2024.

  11. arXiv:2405.04324  [pdf, other]

    cs.AI cs.CL cs.SE

    Granite Code Models: A Family of Open Foundation Models for Code Intelligence

    Authors: Mayank Mishra, Matt Stallone, Gaoyuan Zhang, Yikang Shen, Aditya Prasad, Adriana Meza Soria, Michele Merler, Parameswaran Selvam, Saptha Surendran, Shivdeep Singh, Manish Sethi, Xuan-Hong Dang, Pengyuan Li, Kun-Lung Wu, Syed Zawad, Andrew Coleman, Matthew White, Mark Lewis, Raju Pavuluri, Yan Koyfman, Boris Lublinsky, Maximilien de Bayser, Ibrahim Abdelaziz, Kinjal Basu, Mayank Agarwal, et al. (21 additional authors not shown)

    Abstract: Large Language Models (LLMs) trained on code are revolutionizing the software development process. Increasingly, code LLMs are being integrated into software development environments to improve the productivity of human programmers, and LLM-based agents are beginning to show promise for handling complex tasks autonomously. Realizing the full potential of code LLMs requires a wide range of capabili…

    Submitted 7 May, 2024; originally announced May 2024.

    Comments: Corresponding Authors: Rameswar Panda, Ruchir Puri; Equal Contributors: Mayank Mishra, Matt Stallone, Gaoyuan Zhang

  12. arXiv:2404.11313  [pdf, other]

    eess.IV cs.AI

    NTIRE 2024 Challenge on Short-form UGC Video Quality Assessment: Methods and Results

    Authors: Xin Li, Kun Yuan, Yajing Pei, Yiting Lu, Ming Sun, Chao Zhou, Zhibo Chen, Radu Timofte, Wei Sun, Haoning Wu, Zicheng Zhang, Jun Jia, Zhichao Zhang, Linhan Cao, Qiubo Chen, Xiongkuo Min, Weisi Lin, Guangtao Zhai, Jianhui Sun, Tianyi Wang, Lei Li, Han Kong, Wenxuan Wang, Bing Li, Cheng Luo, et al. (43 additional authors not shown)

    Abstract: This paper reviews the NTIRE 2024 Challenge on Shortform UGC Video Quality Assessment (S-UGC VQA), where various excellent solutions are submitted and evaluated on the collected dataset KVQ from popular short-form video platform, i.e., Kuaishou/Kwai Platform. The KVQ database is divided into three parts, including 2926 videos for training, 420 videos for validation, and 854 videos for testing. The…

    Submitted 17 April, 2024; originally announced April 2024.

    Comments: Accepted by CVPR2024 Workshop. The challenge report for CVPR NTIRE2024 Short-form UGC Video Quality Assessment Challenge

  13. arXiv:2404.04458  [pdf, other]

    cs.CV

    JRDB-Social: A Multifaceted Robotic Dataset for Understanding of Context and Dynamics of Human Interactions Within Social Groups

    Authors: Simindokht Jahangard, Zhixi Cai, Shiki Wen, Hamid Rezatofighi

    Abstract: Understanding human social behaviour is crucial in computer vision and robotics. Micro-level observations like individual actions fall short, necessitating a comprehensive approach that considers individual behaviour, intra-group dynamics, and social group levels for a thorough understanding. To address dataset limitations, this paper introduces JRDB-Social, an extension of JRDB. Designed to fill…

    Submitted 5 April, 2024; originally announced April 2024.

    Comments: Accepted by CVPR 2024. Project page: https://jrdb.erc.monash.edu/dataset/social

  14. arXiv:2403.12544  [pdf, other]

    cs.LG

    AffineQuant: Affine Transformation Quantization for Large Language Models

    Authors: Yuexiao Ma, Huixia Li, Xiawu Zheng, Feng Ling, Xuefeng Xiao, Rui Wang, Shilei Wen, Fei Chao, Rongrong Ji

    Abstract: The significant resource requirements associated with Large-scale Language Models (LLMs) have generated considerable interest in the development of techniques aimed at compressing and accelerating neural networks. Among these techniques, Post-Training Quantization (PTQ) has emerged as a subject of considerable interest due to its noteworthy compression efficiency and cost-effectiveness in the cont…

    Submitted 19 March, 2024; originally announced March 2024.

    Comments: ICLR 2024

  15. arXiv:2401.10061  [pdf, other]

    cs.CV cs.AI

    DiffusionGPT: LLM-Driven Text-to-Image Generation System

    Authors: Jie Qin, Jie Wu, Weifeng Chen, Yuxi Ren, Huixia Li, Hefeng Wu, Xuefeng Xiao, Rui Wang, Shilei Wen

    Abstract: Diffusion models have opened up new avenues for the field of image generation, resulting in the proliferation of high-quality models shared on open-source platforms. However, a major challenge persists: current text-to-image systems are often unable to handle diverse inputs, or are limited to single model results. Current unified attempts often fall into two orthogonal aspects: i) parse Diverse…

    Submitted 18 January, 2024; originally announced January 2024.

  16. arXiv:2311.02995  [pdf, other]

    cs.CV cs.GR

    Zero-Shot Enhancement of Low-Light Image Based on Retinex Decomposition

    Authors: Wenchao Li, Bangshu Xiong, Qiaofeng Ou, Xiaoyun Long, Jinhao Zhu, Jiabao Chen, Shuyuan Wen

    Abstract: Two difficulties make low-light image enhancement a challenging task. First, it needs to consider not only luminance restoration but also image contrast, image denoising and color distortion issues simultaneously. Second, the effectiveness of existing low-light enhancement methods depends on paired or unpaired training data with poor generalization performance. To solve these difficult pr…

    Submitted 6 November, 2023; originally announced November 2023.

    Comments: 16 pages, 66 figures, TCSVT

  17. arXiv:2310.10962  [pdf, other]

    cs.CL

    Large Language Models can Contrastively Refine their Generation for Better Sentence Representation Learning

    Authors: Huiming Wang, Zhaodonghui Li, Liying Cheng, Soh De Wen, Lidong Bing

    Abstract: Recently, large language models (LLMs) have emerged as a groundbreaking technology and their unparalleled text generation capabilities have sparked interest in their application to the fundamental sentence representation learning task. Existing methods have explored utilizing LLMs as data annotators to generate synthesized data for training contrastive learning based sentence embedding models such…

    Submitted 17 May, 2024; v1 submitted 16 October, 2023; originally announced October 2023.

    Comments: NAACL 2024

  18. arXiv:2310.10462  [pdf, other]

    cs.LG

    Adaptive Neural Ranking Framework: Toward Maximized Business Goal for Cascade Ranking Systems

    Authors: Yunli Wang, Zhiqiang Wang, Jian Yang, Shiyang Wen, Dongying Kong, Han Li, Kun Gai

    Abstract: Cascade ranking is widely used for large-scale top-k selection problems in online advertising and recommendation systems, and learning-to-rank is an important way to optimize the models in cascade ranking. Previous works on learning-to-rank usually focus on letting the model learn the complete order or top-k order, and adopt the corresponding rank metrics (e.g. OPA and NDCG@k) as optimization targ…

    Submitted 21 February, 2024; v1 submitted 16 October, 2023; originally announced October 2023.

    Comments: 12 pages, accepted by WWW 2024

  19. arXiv:2310.06311  [pdf, other]

    cs.CV cs.MM

    Improving Compositional Text-to-image Generation with Large Vision-Language Models

    Authors: Song Wen, Guian Fang, Renrui Zhang, Peng Gao, Hao Dong, Dimitris Metaxas

    Abstract: Recent advancements in text-to-image models, particularly diffusion models, have shown significant promise. However, compositional text-to-image models frequently encounter difficulties in generating high-quality images that accurately align with input texts describing multiple objects, variable attributes, and intricate spatial relationships. To address this limitation, we employ large vision-lan…

    Submitted 10 October, 2023; originally announced October 2023.

  20. arXiv:2310.02529  [pdf, other]

    cs.SI cs.AI cs.HC

    MIDDAG: Where Does Our News Go? Investigating Information Diffusion via Community-Level Information Pathways

    Authors: Mingyu Derek Ma, Alexander K. Taylor, Nuan Wen, Yanchen Liu, Po-Nien Kung, Wenna Qin, Shicheng Wen, Azure Zhou, Diyi Yang, Xuezhe Ma, Nanyun Peng, Wei Wang

    Abstract: We present MIDDAG, an intuitive, interactive system that visualizes the information propagation paths on social media triggered by COVID-19-related news articles accompanied by comprehensive insights, including user/community susceptibility level, as well as events and popular opinions raised by the crowd while propagating the information. Besides discovering information flow patterns among users,…

    Submitted 20 February, 2024; v1 submitted 3 October, 2023; originally announced October 2023.

    Comments: To appear at AAAI'24. System demo video and more info: info-pathways.github.io

  21. arXiv:2309.03905  [pdf, other]

    cs.MM cs.CL cs.CV cs.LG cs.SD eess.AS

    ImageBind-LLM: Multi-modality Instruction Tuning

    Authors: Jiaming Han, Renrui Zhang, Wenqi Shao, Peng Gao, Peng Xu, Han Xiao, Kaipeng Zhang, Chris Liu, Song Wen, Ziyu Guo, Xudong Lu, Shuai Ren, Yafei Wen, Xiaoxin Chen, Xiangyu Yue, Hongsheng Li, Yu Qiao

    Abstract: We present ImageBind-LLM, a multi-modality instruction tuning method of large language models (LLMs) via ImageBind. Existing works mainly focus on language and image instruction tuning, different from which, our ImageBind-LLM can respond to multi-modality conditions, including audio, 3D point clouds, video, and their embedding-space arithmetic by only image-text alignment training. During training…

    Submitted 11 September, 2023; v1 submitted 7 September, 2023; originally announced September 2023.

    Comments: Code is available at https://github.com/OpenGVLab/LLaMA-Adapter

  22. SHAPFUZZ: Efficient Fuzzing via Shapley-Guided Byte Selection

    Authors: Kunpeng Zhang, Xiaogang Zhu, Xi Xiao, Minhui Xue, Chao Zhang, Sheng Wen

    Abstract: Mutation-based fuzzing is popular and effective in discovering unseen code and exposing bugs. However, only a few studies have concentrated on quantifying the importance of input bytes, which refers to the degree to which a byte contributes to the discovery of new code. They often focus on obtaining the relationship between input bytes and path constraints, ignoring the fact that not all constrain…

    Submitted 22 October, 2023; v1 submitted 17 August, 2023; originally announced August 2023.

    Journal ref: Network and Distributed System Security (NDSS) Symposium 2024, 26 February - 1 March 2024, San Diego, CA, USA

  23. arXiv:2308.03684  [pdf, other]

    eess.AS cs.SD

    Active Noise Control based on the Momentum Multichannel Normalized Filtered-x Least Mean Square Algorithm

    Authors: Dongyuan Shi, Woon-Seng Gan, Bhan Lam, Shulin Wen, Xiaoyi Shen

    Abstract: Multichannel active noise control (MCANC) is widely utilized to achieve significant noise cancellation area in the complicated acoustic field. Meanwhile, the filter-x least mean square (FxLMS) algorithm gradually becomes the benchmark solution for the implementation of MCANC due to its low computational complexity. However, its slow convergence speed more or less undermines the performance of deal…

    Submitted 7 August, 2023; originally announced August 2023.

    Comments: Conference: INTER-NOISE and NOISE-CON Congress and Conference Proceedings 2020 At Korea Volume: 261

  24. arXiv:2308.00591  [pdf, other]

    cs.CV

    Visibility Enhancement for Low-light Hazy Scenarios

    Authors: Chaoqun Zhuang, Yunfei Liu, Sijia Wen, Feng Lu

    Abstract: Low-light hazy scenes commonly appear at dusk and early morning. The visual enhancement for low-light hazy images is an ill-posed problem. Even though numerous methods have been proposed for image dehazing and low-light enhancement respectively, simply integrating them cannot deliver pleasing results for this particular task. In this paper, we present a novel method to enhance visibility for low-l…

    Submitted 1 August, 2023; originally announced August 2023.

  25. NEON: Living Needs Prediction System in Meituan

    Authors: Xiaochong Lan, Chen Gao, Shiqi Wen, Xiuqi Chen, Yingge Che, Han Zhang, Huazhou Wei, Hengliang Luo, Yong Li

    Abstract: Living needs refer to the various needs in human's daily lives for survival and well-being, including food, housing, entertainment, etc. On life service platforms that connect users to service providers, such as Meituan, the problem of living needs prediction is fundamental as it helps understand users and boost various downstream applications such as personalized recommendation. However, the prob…

    Submitted 31 July, 2023; originally announced July 2023.

  26. arXiv:2307.00828  [pdf, other]

    eess.SY cs.LG math.OC

    Model-Assisted Probabilistic Safe Adaptive Control With Meta-Bayesian Learning

    Authors: Shengbo Wang, Ke Li, Yin Yang, Yuting Cao, Tingwen Huang, Shiping Wen

    Abstract: Breaking safety constraints in control systems can lead to potential risks, resulting in unexpected costs or catastrophic damage. Nevertheless, uncertainty is ubiquitous, even among similar tasks. In this paper, we develop a novel adaptive safe control framework that integrates meta learning, Bayesian models, and control barrier function (CBF) method. Specifically, with the help of CBF method, we…

    Submitted 13 July, 2023; v1 submitted 3 July, 2023; originally announced July 2023.

  27. arXiv:2306.05414  [pdf, other]

    cs.CV

    Improving Tuning-Free Real Image Editing with Proximal Guidance

    Authors: Ligong Han, Song Wen, Qi Chen, Zhixing Zhang, Kunpeng Song, Mengwei Ren, Ruijiang Gao, Anastasis Stathopoulos, Xiaoxiao He, Yuxiao Chen, Di Liu, Qilong Zhangli, Jindong Jiang, Zhaoyang Xia, Akash Srivastava, Dimitris Metaxas

    Abstract: DDIM inversion has revealed the remarkable potential of real image editing within diffusion-based methods. However, the accuracy of DDIM reconstruction degrades as larger classifier-free guidance (CFG) scales being used for enhanced editing. Null-text inversion (NTI) optimizes null embeddings to align the reconstruction and inversion trajectories with larger CFG scales, enabling real image editing…

    Submitted 5 July, 2023; v1 submitted 8 June, 2023; originally announced June 2023.

    Comments: Added inversion guidance, and fixed typos

  28. arXiv:2305.15591  [pdf, other]

    cs.LG

    Lightweight Learner for Shared Knowledge Lifelong Learning

    Authors: Yunhao Ge, Yuecheng Li, Di Wu, Ao Xu, Adam M. Jones, Amanda Sofie Rios, Iordanis Fostiropoulos, Shixian Wen, Po-Hsuan Huang, Zachary William Murdock, Gozde Sahin, Shuo Ni, Kiran Lekkala, Sumedh Anand Sontakke, Laurent Itti

    Abstract: In Lifelong Learning (LL), agents continually learn as they encounter new conditions and tasks. Most current LL is limited to a single agent that learns tasks sequentially. Dedicated LL machinery is then deployed to mitigate the forgetting of old tasks as new tasks are learned. This is inherently slow. We propose a new Shared Knowledge Lifelong Learning (SKILL) challenge, which deploys a decentral…

    Submitted 24 May, 2023; originally announced May 2023.

    Comments: Transactions on Machine Learning Research (TMLR) paper

  29. arXiv:2305.13122  [pdf, other]

    cs.LG

    Policy Representation via Diffusion Probability Model for Reinforcement Learning

    Authors: Long Yang, Zhixiong Huang, Fenghao Lei, Yucun Zhong, Yiming Yang, Cong Fang, Shiting Wen, Binbin Zhou, Zhouchen Lin

    Abstract: Popular reinforcement learning (RL) algorithms tend to produce a unimodal policy distribution, which weakens the expressiveness of complicated policy and decays the ability of exploration. The diffusion probability model is powerful to learn complicated multimodal distributions, which has shown promising and potential applications to RL. In this paper, we formally build a theoretical foundation of…

    Submitted 22 May, 2023; originally announced May 2023.

  30. arXiv:2305.12747  [pdf, other]

    cs.CR

    The "code" of Ethics: A Holistic Audit of AI Code Generators

    Authors: Wanlun Ma, Yiliao Song, Minhui Xue, Sheng Wen, Yang Xiang

    Abstract: AI-powered programming language generation (PLG) models have gained increasing attention due to their ability to generate source code of programs in a few seconds with a plain program description. Despite their remarkable performance, many concerns are raised over the potential risks of their development and deployment, such as legal issues of copyright infringement induced by training usage of li…

    Submitted 22 May, 2023; originally announced May 2023.

  31. arXiv:2303.17225  [pdf, other]

    cs.CV

    FreeSeg: Unified, Universal and Open-Vocabulary Image Segmentation

    Authors: Jie Qin, Jie Wu, Pengxiang Yan, Ming Li, Ren Yuxi, Xuefeng Xiao, Yitong Wang, Rui Wang, Shilei Wen, Xin Pan, Xingang Wang

    Abstract: Recently, open-vocabulary learning has emerged to accomplish segmentation for arbitrary categories of text-based descriptions, which popularizes the segmentation system to more general-purpose application scenarios. However, existing methods devote to designing specialized architectures or parameters for specific segmentation tasks. These customized design paradigms lead to fragmentation between v…

    Submitted 30 March, 2023; originally announced March 2023.

    Comments: Accepted by CVPR 2023; camera-ready version

  32. arXiv:2303.15718  [pdf, other]

    cs.CV cs.AI

    MeMaHand: Exploiting Mesh-Mano Interaction for Single Image Two-Hand Reconstruction

    Authors: Congyi Wang, Feida Zhu, Shilei Wen

    Abstract: Existing methods proposed for hand reconstruction tasks usually parameterize a generic 3D hand model or predict hand mesh positions directly. The parametric representations consisting of hand shapes and rotational poses are more stable, while the non-parametric methods can predict more accurate mesh positions. In this paper, we propose to reconstruct meshes and estimate MANO parameters of two hand…

    Submitted 16 April, 2023; v1 submitted 28 March, 2023; originally announced March 2023.

  33. arXiv:2303.11906  [pdf, other]

    cs.CV

    Solving Oscillation Problem in Post-Training Quantization Through a Theoretical Perspective

    Authors: Yuexiao Ma, Huixia Li, Xiawu Zheng, Xuefeng Xiao, Rui Wang, Shilei Wen, Xin Pan, Fei Chao, Rongrong Ji

    Abstract: Post-training quantization (PTQ) is widely regarded as one of the most efficient compression methods practically, benefitting from its data privacy and low computation costs. We argue that an overlooked problem of oscillation is in the PTQ methods. In this paper, we take the initiative to explore and present a theoretical proof to explain why such a problem is essential in PTQ. And then, we try to…

    Submitted 4 April, 2023; v1 submitted 21 March, 2023; originally announced March 2023.

    Comments: Accepted by CVPR 2023

  34. arXiv:2303.04627  [pdf, other]

    cs.DB

    Fairness-driven Skilled Task Assignment with Extra Budget in Spatial Crowdsourcing

    Authors: Yunjun Zhou, Shuhan Wan, Detian Zhang, Shiting Wen

    Abstract: With the prevalence of mobile devices and ubiquitous wireless networks, spatial crowdsourcing has attracted much attention from both academic and industry communities. On spatial crowdsourcing platforms, task requesters can publish spatial tasks and workers need to move to destinations to perform them. In this paper, we formally define the Skilled Task Assignment with Extra Budget (STAEB), which a…

    Submitted 8 March, 2023; originally announced March 2023.

  35. arXiv:2303.03667  [pdf, other]

    cs.CV

    Run, Don't Walk: Chasing Higher FLOPS for Faster Neural Networks

    Authors: Jierun Chen, Shiu-hong Kao, Hao He, Weipeng Zhuo, Song Wen, Chul-Ho Lee, S. -H. Gary Chan

    Abstract: To design fast neural networks, many works have been focusing on reducing the number of floating-point operations (FLOPs). We observe that such reduction in FLOPs, however, does not necessarily lead to a similar level of reduction in latency. This mainly stems from inefficiently low floating-point operations per second (FLOPS). To achieve faster networks, we revisit popular operators and demonstra…

    Submitted 21 May, 2023; v1 submitted 7 March, 2023; originally announced March 2023.

    Comments: Accepted to CVPR 2023

  36. arXiv:2302.05637  [pdf, other]

    cs.CV

    Dual Relation Knowledge Distillation for Object Detection

    Authors: Zhenliang Ni, Fukui Yang, Shengzhao Wen, Gang Zhang

    Abstract: Knowledge distillation is an effective method for model compression. However, it is still a challenging topic to apply knowledge distillation to detection tasks. There are two key points resulting in poor distillation performance for detection tasks. One is the serious imbalance between foreground and background features, another one is that small object lacks enough feature representation. To sol…

    Submitted 1 June, 2023; v1 submitted 11 February, 2023; originally announced February 2023.

    Comments: Accepted by IJCAI-2023

  37. arXiv:2212.11761  [pdf]

    cs.NI

    Optical Bar Code for Internet Access Application based on Optical camera communication and Bluetooth Control

    Authors: Shangsheng Wen, Manxi Liu, Yanyi Chen, Yirong Chen, Futong An, Yingcong Chen, Weipeng Guan

    Abstract: We demonstrate an internet access application based on optical camera communication and Bluetooth. The app will access the website while the camera in the phone receives the optical signal. © 2022 The Author(s)

    Submitted 31 October, 2022; originally announced December 2022.

    Comments: 3 pages, 1 figure

  38. arXiv:2212.07896  [pdf]

    cs.NI

    Modern Location-based Service Technologies: Visible Light Positioning

    Authors: Shangsheng Wen, Yingcong Chen

    Abstract: With the development of wireless communications and the increasing computing power of a variety of mobile devices, LBS (Location Based Service) technologies are getting more and more attention, as they can provide flexibility and convenience in modern people's lives. For this survey, we will first give a comprehensive introduction about LBS, including definition, advantages, application, and potential pr…

    Submitted 5 November, 2022; originally announced December 2022.

    Comments: 5 pages, 2 figures, 1 table

  39. arXiv:2212.00089  [pdf, other]

    cs.AR cs.ET

    Ferroelectric FET based Context-Switching FPGA Enabling Dynamic Reconfiguration for Adaptive Deep Learning Machines

    Authors: Yixin Xu, Zijian Zhao, Yi Xiao, Tongguang Yu, Halid Mulaosmanovic, Dominik Kleimaier, Stefan Duenkel, Sven Beyer, Xiao Gong, Rajiv Joshi, X. Sharon Hu, Shixian Wen, Amanda Sofie Rios, Kiran Lekkala, Laurent Itti, Eric Homan, Sumitha George, Vijaykrishnan Narayanan, Kai Ni

    Abstract: Field Programmable Gate Array (FPGA) is widely used in acceleration of deep learning applications because of its reconfigurability, flexibility, and fast time-to-market. However, conventional FPGA suffers from the tradeoff between chip area and reconfiguration latency, making efficient FPGA accelerations that require switching between multiple configurations still elusive. In this paper, we perfor…

    Submitted 30 November, 2022; originally announced December 2022.

    Comments: 54 pages, 15 figures

  40. arXiv:2211.08071  [pdf, other]

    cs.CV

    Knowledge Distillation for Detection Transformer with Consistent Distillation Points Sampling

    Authors: Yu Wang, Xin Li, Shengzhao Wen, Fukui Yang, Wanping Zhang, Gang Zhang, Haocheng Feng, Junyu Han, Errui Ding

    Abstract: DETR is a novel end-to-end transformer architecture object detector, which significantly outperforms classic detectors when scaling up the model size. In this paper, we focus on the compression of DETR with knowledge distillation. While knowledge distillation has been well-studied in classic detectors, there is a lack of research on how to make it work effectively on DETR. We first provide exper…

    Submitted 15 November, 2022; v1 submitted 15 November, 2022; originally announced November 2022.

  41. arXiv:2209.11715  [pdf, other]

    cs.CR cs.AI

    The "Beatrix" Resurrections: Robust Backdoor Detection via Gram Matrices

    Authors: Wanlun Ma, Derui Wang, Ruoxi Sun, Minhui Xue, Sheng Wen, Yang Xiang

    Abstract: Deep Neural Networks (DNNs) are susceptible to backdoor attacks during training. The model corrupted in this way functions normally, but when triggered by certain patterns in the input, produces a predefined target label. Existing defenses usually rely on the assumption of the universal backdoor setting in which poisoned samples share the same uniform trigger. However, recent advanced backdoor att…

    Submitted 18 December, 2022; v1 submitted 23 September, 2022; originally announced September 2022.

    Comments: 18 pages, 23 figures. Accepted to NDSS 2023. Camera-ready version. Code availability: https://github.com/wanlunsec/Beatrix

  42. arXiv:2208.05706  [pdf]

    cs.RO

    A Cooperative Positioning Flamework for Robot and Smart Phone Based on Visible Light Communication

    Authors: Junye Chen, Fangdi Li, Futong An, Chen Yang, Hongzhan Song, Shangsheng Wen, Weipeng Guan

    Abstract: A cooperative positioning framework for humans and robots based on visible light communication (VLC) is proposed. Using an experimental system, we demonstrate that it is feasible and achieves high accuracy and real-time performance.

    Submitted 20 October, 2022; v1 submitted 11 August, 2022; originally announced August 2022.

    Comments: high accuracy, cooperative positioning system

  43. arXiv:2206.14597  [pdf, other]

    cs.LG cs.AI eess.SP

    Generative Anomaly Detection for Time Series Datasets

    Authors: Zhuangwei Kang, Ayan Mukhopadhyay, Aniruddha Gokhale, Shijie Wen, Abhishek Dubey

    Abstract: Traffic congestion anomaly detection is of paramount importance in intelligent traffic systems. The goals of transportation agencies are two-fold: to monitor the general traffic conditions in the area of interest and to locate road segments under abnormal congestion states. Modeling congestion patterns can achieve these goals for citywide roadways, which amounts to learning the distribution of mul…

    Submitted 28 June, 2022; originally announced June 2022.

    Comments: A shorter version of the paper was accepted at the ITSC 2022

  44. Rotated Object Detection via Scale-invariant Mahalanobis Distance in Aerial Images

    Authors: Siyang Wen, Wei Guo, Yi Liu, Ruijie Wu

    Abstract: Rotated object detection in aerial images is a meaningful yet challenging task as objects are densely arranged and have arbitrary orientations. The eight-parameter (coordinates of box vectors) methods in rotated object detection usually use ln-norm losses (L1 loss, L2 loss, and smooth L1 loss) as loss functions. As ln-norm losses are mainly based on non-scale-invariant Minkowski distance, using ln…

    Submitted 7 April, 2022; v1 submitted 2 April, 2022; originally announced April 2022.

    Comments: 5 pages, 7 figures

  45. arXiv:2203.16000  [pdf, other]

    cs.CV cs.CR

    StyleFool: Fooling Video Classification Systems via Style Transfer

    Authors: Yuxin Cao, Xi Xiao, Ruoxi Sun, Derui Wang, Minhui Xue, Sheng Wen

    Abstract: Video classification systems are vulnerable to adversarial attacks, which can create severe security problems in video verification. Current black-box attacks need a large number of queries to succeed, resulting in high computational overhead in the process of attack. On the other hand, attacks with restricted perturbations are ineffective against defenses such as denoising or adversarial training…

    Submitted 1 April, 2024; v1 submitted 29 March, 2022; originally announced March 2022.

    Comments: 18 pages, 9 figures. Accepted to S&P 2023

  46. arXiv:2203.15380  [pdf, other]

    cs.CV

    SepViT: Separable Vision Transformer

    Authors: Wei Li, Xing Wang, Xin Xia, Jie Wu, Jiashi Li, Xuefeng Xiao, Min Zheng, Shi** Wen

    Abstract: Vision Transformers have witnessed prevailing success in a series of vision tasks. However, these Transformers often rely on extensive computational costs to achieve high performance, which is burdensome to deploy on resource-constrained devices. To alleviate this issue, we draw lessons from depthwise separable convolution and imitate its ideology to design an efficient Transformer backbone, i.e.,…

    Submitted 15 June, 2023; v1 submitted 29 March, 2022; originally announced March 2022.

  47. arXiv:2203.14683  [pdf, other]

    cs.IR cs.AI

    AMCAD: Adaptive Mixed-Curvature Representation based Advertisement Retrieval System

    Authors: Zhirong Xu, Shiyang Wen, Junshan Wang, Guojun Liu, Liang Wang, Zhi Yang, Lei Ding, Yan Zhang, Di Zhang, Jian Xu, Bo Zheng

    Abstract: Graph embedding based retrieval has become one of the most popular techniques in the information retrieval community and search engine industry. The classical paradigm mainly relies on the flat Euclidean geometry. In recent years, hyperbolic (negative curvature) and spherical (positive curvature) representation methods have shown their superiority to capture hierarchical and cyclic data structures…

    Submitted 28 March, 2022; originally announced March 2022.

    Comments: To appear in ICDE 2022

  48. arXiv:2203.10726  [pdf, other]

    eess.IV cs.CV

    TransFusion: Multi-view Divergent Fusion for Medical Image Segmentation with Transformers

    Authors: Di Liu, Yunhe Gao, Qilong Zhangli, Ligong Han, Xiaoxiao He, Zhaoyang Xia, Song Wen, Qi Chang, Zhennan Yan, Mu Zhou, Dimitris Metaxas

    Abstract: Combining information from multi-view images is crucial to improve the performance and robustness of automated methods for disease diagnosis. However, due to the non-alignment characteristics of multi-view images, building correlation and data fusion across views largely remain an open problem. In this study, we present TransFusion, a Transformer-based architecture to merge divergent multi-view im…

    Submitted 5 September, 2022; v1 submitted 21 March, 2022; originally announced March 2022.

  49. arXiv:2203.02846  [pdf, other]

    cs.CV

    Region Proposal Rectification Towards Robust Instance Segmentation of Biological Images

    Authors: Qilong Zhangli, **gru Yi, Di Liu, Xiaoxiao He, Zhaoyang Xia, Qi Chang, Ligong Han, Yunhe Gao, Song Wen, Haiming Tang, He Wang, Mu Zhou, Dimitris Metaxas

    Abstract: Top-down instance segmentation framework has shown its superiority in object detection compared to the bottom-up framework. While it is efficient in addressing over-segmentation, top-down instance segmentation suffers from over-crop problem. However, a complete segmentation mask is crucial for biological image analysis as it delivers important morphological properties such as shapes and volumes. I…

    Submitted 3 November, 2022; v1 submitted 5 March, 2022; originally announced March 2022.

  50. arXiv:2202.04261  [pdf, other]

    cs.SD cs.AI eess.AS

    The Volcspeech system for the ICASSP 2022 multi-channel multi-party meeting transcription challenge

    Authors: Chen Shen, Yi Liu, Wenzhi Fan, Bin Wang, Shixue Wen, Yao Tian, Jun Zhang, **gsheng Yang, Zejun Ma

    Abstract: This paper describes our submission to ICASSP 2022 Multi-channel Multi-party Meeting Transcription (M2MeT) Challenge. For Track 1, we propose several approaches to empower the clustering-based speaker diarization system to handle overlapped speech. Front-end dereverberation and the direction-of-arrival (DOA) estimation are used to improve the accuracy of speaker diarization. Multi-channel combinat…

    Submitted 9 February, 2022; v1 submitted 8 February, 2022; originally announced February 2022.