Showing 1–50 of 110 results for author: Ye, T

  1. arXiv:2407.01191  [pdf, other]

    cs.RO cs.AI cs.CV

    MARS: Multimodal Active Robotic Sensing for Articulated Characterization

    Authors: Hongliang Zeng, ** Zhang, Chengjiong Wu, Jiahua Wang, Tingyu Ye, Fang Li

    Abstract: Precise perception of articulated objects is vital for empowering service robots. Recent studies mainly focus on point cloud, a single-modal approach, often neglecting vital texture and lighting details and assuming ideal conditions like optimal viewpoints, unrepresentative of real-world scenarios. To address these limitations, we introduce MARS, a novel framework for articulated object characteri…

    Submitted 1 July, 2024; originally announced July 2024.

  2. arXiv:2406.17342  [pdf, other]

    cs.CV cs.AI

    Masked Generative Extractor for Synergistic Representation and 3D Generation of Point Clouds

    Authors: Hongliang Zeng, ** Zhang, Fang Li, Jiahua Wang, Tingyu Ye, Pengteng Guo

    Abstract: In the field of 2D image generation modeling and representation learning, Masked Generative Encoder (MAGE) has demonstrated the synergistic potential between generative modeling and representation learning. Inspired by this, we propose Point-MAGE to extend this concept to point cloud data. Specifically, this framework first utilizes a Vector Quantized Variational Autoencoder (VQVAE) to reconstruct…

    Submitted 25 June, 2024; originally announced June 2024.

  3. arXiv:2406.11935  [pdf, other]

    cs.PL cs.AI cs.SE

    Iterative or Innovative? A Problem-Oriented Perspective for Code Optimization

    Authors: Tong Ye, Tengfei Ma, Lingfei Wu, Xuhong Zhang, Shouling Ji, Wenhai Wang

    Abstract: Large language models (LLMs) have demonstrated strong capabilities in solving a wide range of programming tasks. However, LLMs have rarely been explored for code optimization. In this paper, we explore code optimization with a focus on performance enhancement, specifically aiming to optimize code for minimal execution time. The recently proposed first PIE dataset for performance optimization const…

    Submitted 17 June, 2024; originally announced June 2024.

  4. arXiv:2406.11247  [pdf, other]

    cs.CV

    STEVE Series: Step-by-Step Construction of Agent Systems in Minecraft

    Authors: Zhonghan Zhao, Wenhao Chai, Xuan Wang, Ke Ma, Kewei Chen, Dongxu Guo, Tian Ye, Yanting Zhang, Hongwei Wang, Gaoang Wang

    Abstract: Building an embodied agent system with a large language model (LLM) as its core is a promising direction. Due to the significant costs and uncontrollable factors associated with deploying and training such agents in the real world, we have decided to begin our exploration within the Minecraft environment. Our STEVE Series agents can complete basic tasks in a virtual environment and more challengin…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: CVPR 2024 Embodied AI Workshop

  5. arXiv:2406.07801  [pdf, other]

    cs.CL cs.SD eess.AS

    PolySpeech: Exploring Unified Multitask Speech Models for Competitiveness with Single-task Models

    Authors: Runyan Yang, Huibao Yang, Xiqing Zhang, Tiantian Ye, Ying Liu, Yingying Gao, Shilei Zhang, Chao Deng, Junlan Feng

    Abstract: Recently, there have been attempts to integrate various speech processing tasks into a unified model. However, few previous works directly demonstrated that joint optimization of diverse tasks in multitask speech models has positive influence on the performance of individual tasks. In this paper we present a multitask speech model -- PolySpeech, which supports speech recognition, speech synthesis,…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: 5 pages, 2 figures

  6. arXiv:2405.16133  [pdf, other]

    cs.SE cs.AI

    Uncovering LLM-Generated Code: A Zero-Shot Synthetic Code Detector via Code Rewriting

    Authors: Tong Ye, Yangkai Du, Tengfei Ma, Lingfei Wu, Xuhong Zhang, Shouling Ji, Wenhai Wang

    Abstract: Large Language Models (LLMs) have exhibited remarkable proficiency in generating code. However, the misuse of LLM-generated (Synthetic) code has prompted concerns within both educational and industrial domains, highlighting the imperative need for the development of synthetic code detectors. Existing methods for detecting LLM-generated content are primarily tailored for general text and often stru…

    Submitted 29 May, 2024; v1 submitted 25 May, 2024; originally announced May 2024.

    Comments: Previously submitted to EMNLP2023

  7. arXiv:2405.05811  [pdf, other]

    cs.CV

    Parallel Cross Strip Attention Network for Single Image Dehazing

    Authors: Lihan Tong, Yun Liu, Tian Ye, Weijia Li, Liyuan Chen, Erkang Chen

    Abstract: The objective of single image dehazing is to restore hazy images and produce clear, high-quality visuals. Traditional convolutional models struggle with long-range dependencies due to their limited receptive field size. While Transformers excel at capturing such dependencies, their quadratic computational complexity in relation to feature map resolution makes them less suitable for pixel-to-pixel…

    Submitted 9 May, 2024; originally announced May 2024.

    Comments: 10 pages, 4 figures, CTISC'24

    Report number: C052

  8. arXiv:2404.17747  [pdf, other]

    cs.CV

    MMA-UNet: A Multi-Modal Asymmetric UNet Architecture for Infrared and Visible Image Fusion

    Authors: Jingxue Huang, Xilai Li, Tianshu Tan, Xiaosong Li, Tao Ye

    Abstract: Multi-modal image fusion (MMIF) maps useful information from various modalities into the same representation space, thereby producing an informative fused image. However, the existing fusion algorithms tend to symmetrically fuse the multi-modal images, causing the loss of shallow information or bias towards a single modality in certain regions of the fusion results. In this study, we analyzed the…

    Submitted 26 April, 2024; originally announced April 2024.

  9. arXiv:2404.17176  [pdf, other]

    cs.CV

    MovieChat+: Question-aware Sparse Memory for Long Video Question Answering

    Authors: Enxin Song, Wenhao Chai, Tian Ye, Jenq-Neng Hwang, Xi Li, Gaoang Wang

    Abstract: Recently, integrating video foundation models and large language models to build a video understanding system can overcome the limitations of specific pre-defined vision tasks. Yet, existing methods either employ complex spatial-temporal modules or rely heavily on additional perception models to extract temporal features for video understanding, and they only perform well on short videos. For long…

    Submitted 26 April, 2024; originally announced April 2024.

  10. arXiv:2404.14248  [pdf, other]

    cs.CV

    NTIRE 2024 Challenge on Low Light Image Enhancement: Methods and Results

    Authors: Xiaoning Liu, Zongwei Wu, Ao Li, Florin-Alexandru Vasluianu, Yulun Zhang, Shuhang Gu, Le Zhang, Ce Zhu, Radu Timofte, Zhi Jin, Hongjun Wu, Chenxi Wang, Haitao Ling, Yuanhao Cai, Hao Bian, Yuxin Zheng, Jing Lin, Alan Yuille, Ben Shao, ** Guo, Tianli Liu, Mohao Wu, Yixu Feng, Shuo Hou, Haotian Lin, et al. (87 additional authors not shown)

    Abstract: This paper reviews the NTIRE 2024 low light image enhancement challenge, highlighting the proposed solutions and results. The aim of this challenge is to discover an effective network design or solution capable of generating brighter, clearer, and visually appealing results when dealing with a variety of conditions, including ultra-high resolution (4K and beyond), non-uniform illumination, backlig…

    Submitted 22 April, 2024; originally announced April 2024.

    Comments: NTIRE 2024 Challenge Report

  11. arXiv:2404.03225  [pdf, other]

    cs.CV cs.LG

    FACTUAL: A Novel Framework for Contrastive Learning Based Robust SAR Image Classification

    Authors: Xu Wang, Tian Ye, Rajgopal Kannan, Viktor Prasanna

    Abstract: Deep Learning (DL) Models for Synthetic Aperture Radar (SAR) Automatic Target Recognition (ATR), while delivering improved performance, have been shown to be quite vulnerable to adversarial attacks. Existing works improve robustness by training models on adversarial samples. However, by focusing mostly on attacks that manipulate images randomly, they neglect the real-world feasibility of such atta…

    Submitted 4 April, 2024; originally announced April 2024.

    Comments: 2024 IEEE Radar Conference

  12. arXiv:2403.18493  [pdf, other]

    cs.CV

    VersaT2I: Improving Text-to-Image Models with Versatile Reward

    Authors: Jianshu Guo, Wenhao Chai, Jie Deng, Hsiang-Wei Huang, Tian Ye, Yichen Xu, Jiawei Zhang, Jenq-Neng Hwang, Gaoang Wang

    Abstract: Recent text-to-image (T2I) models have benefited from large-scale and high-quality data, demonstrating impressive performance. However, these T2I models still struggle to produce images that are aesthetically pleasing, geometrically accurate, faithful to text, and of good low-level quality. We present VersaT2I, a versatile training framework that can boost the performance with multiple rewards of…

    Submitted 27 March, 2024; originally announced March 2024.

  13. arXiv:2403.18318  [pdf, other]

    cs.CV

    Uncertainty-Aware SAR ATR: Defending Against Adversarial Attacks via Bayesian Neural Networks

    Authors: Tian Ye, Rajgopal Kannan, Viktor Prasanna, Carl Busart

    Abstract: Adversarial attacks have demonstrated the vulnerability of Machine Learning (ML) image classifiers in Synthetic Aperture Radar (SAR) Automatic Target Recognition (ATR) systems. An adversarial attack can deceive the classifier into making incorrect predictions by perturbing the input SAR images, for example, with a few scatterers attached to the on-ground objects. Therefore, it is critical to devel…

    Submitted 27 March, 2024; originally announced March 2024.

  14. arXiv:2403.12056  [pdf, other]

    cs.CV physics.optics

    Enhancing Digital Hologram Reconstruction Using Reverse-Attention Loss for Untrained Physics-Driven Deep Learning Models with Uncertain Distance

    Authors: Xiwen Chen, Hao Wang, Zhao Zhang, Zhenmin Li, Huayu Li, Tong Ye, Abolfazl Razi

    Abstract: Untrained Physics-based Deep Learning (DL) methods for digital holography have gained significant attention due to their benefits, such as not requiring an annotated training dataset, and providing interpretability since utilizing the governing laws of hologram formation. However, they are sensitive to the hard-to-obtain precise object distance from the imaging plane, posing the…

    Submitted 10 January, 2024; originally announced March 2024.

  15. arXiv:2403.10340  [pdf, other]

    cs.CV cs.RO

    Thermal-NeRF: Neural Radiance Fields from an Infrared Camera

    Authors: Tianxiang Ye, Qi Wu, Junyuan Deng, Guoqing Liu, Liu Liu, Songpengcheng Xia, Liang Pang, Wenxian Yu, Ling Pei

    Abstract: In recent years, Neural Radiance Fields (NeRFs) have demonstrated significant potential in encoding highly-detailed 3D geometry and environmental appearance, positioning themselves as a promising alternative to traditional explicit representation for 3D scene reconstruction. However, the predominant reliance on RGB imaging presupposes ideal lighting conditions: a premise frequently unmet in roboti…

    Submitted 15 March, 2024; originally announced March 2024.

  16. arXiv:2403.08282  [pdf, other]

    cs.CV

    Hierarchical Auto-Organizing System for Open-Ended Multi-Agent Navigation

    Authors: Zhonghan Zhao, Kewei Chen, Dongxu Guo, Wenhao Chai, Tian Ye, Yanting Zhang, Gaoang Wang

    Abstract: Due to the dynamic and unpredictable open-world setting, navigating complex environments in Minecraft poses significant challenges for multi-agent systems. Agents must interact with the environment and coordinate their actions with other agents to achieve common objectives. However, traditional approaches often struggle to efficiently manage inter-agent communication and task distribution, crucial…

    Submitted 18 March, 2024; v1 submitted 13 March, 2024; originally announced March 2024.

    Comments: ICLR 2024 Workshop on LLM Agents

  17. arXiv:2403.06144  [pdf, other]

    cs.CY

    Simulating Family Conversations using LLMs: Demonstration of Parenting Styles

    Authors: Frank Tian-fang Ye, Xiaozi Gao

    Abstract: This study presents a framework for conducting psychological and linguistic research through simulated conversations using large language models (LLMs). The proposed methodology offers significant advantages, particularly for simulating human interactions involving potential unethical language or behaviors that would be impermissible in traditional experiments with human participants. As a demonst…

    Submitted 10 March, 2024; originally announced March 2024.

  18. arXiv:2402.16043  [pdf, other]

    cs.CR cs.SE

    LuaTaint: A Static Taint Analysis System for Web Interface Framework Vulnerability of IoT Devices

    Authors: Jiahui Xiang, Wenhai Wang, Tong Ye, Peiyu Liu

    Abstract: IoT devices are currently facing continuous malicious attacks due to their widespread use. Among these IoT devices, web vulnerabilities are also widely exploited because of their inherent characteristics, such as improper permission controls and insecure interfaces. Recently, the embedded system web interface framework has become highly diverse, and specific vulnerabilities can arise if developers…

    Submitted 25 February, 2024; originally announced February 2024.

  19. arXiv:2401.13560  [pdf, other]

    cs.CV

    SegMamba: Long-range Sequential Modeling Mamba For 3D Medical Image Segmentation

    Authors: Zhaohu Xing, Tian Ye, Yijun Yang, Guang Liu, Lei Zhu

    Abstract: The Transformer architecture has shown a remarkable ability in modeling global relationships. However, it poses a significant computational challenge when processing high-dimensional medical images. This hinders its development and widespread adoption in this task. Mamba, as a State Space Model (SSM), recently emerged as a notable manner for long-range dependencies in sequential modeling, excellin…

    Submitted 25 February, 2024; v1 submitted 24 January, 2024; originally announced January 2024.

    Comments: Code has been released

  20. arXiv:2401.03692  [pdf, other]

    math.OC cs.LG

    Boosting Column Generation with Graph Neural Networks for Joint Rider Trip Planning and Crew Shift Scheduling

    Authors: Jiawei Lu, Tinghan Ye, Wenbo Chen, Pascal Van Hentenryck

    Abstract: Optimizing service schedules is pivotal to the reliable, efficient, and inclusive on-demand mobility. This pressing challenge is further exacerbated by the increasing needs of an aging population, the over-subscription of existing services, and the lack of effective solution methods. This study addresses the intricacies of service scheduling, by jointly optimizing rider trip planning and crew sche…

    Submitted 8 January, 2024; originally announced January 2024.

  21. arXiv:2312.13470  [pdf, ps, other]

    cs.MM cs.NI

    Coffee: Cost-Effective Edge Caching for 360 Degree Live Video Streaming

    Authors: Chen Li, Tingwei Ye, Tongyu Zong, Liyang Sun, Houwei Cao, Yong Liu

    Abstract: While live 360 degree video streaming delivers immersive viewing experience, it poses significant bandwidth and latency challenges for content delivery networks. Edge servers are expected to play an important role in facilitating live streaming of 360 degree videos. In this paper, we propose a novel predictive edge caching algorithm (Coffee) for live 360 degree video that employs collaborative FoV…

    Submitted 27 January, 2024; v1 submitted 20 December, 2023; originally announced December 2023.

  22. arXiv:2312.08874  [pdf, other]

    cs.CV

    Agent Attention: On the Integration of Softmax and Linear Attention

    Authors: Dongchen Han, Tianzhu Ye, Yizeng Han, Zhuofan Xia, Shiji Song, Gao Huang

    Abstract: The attention module is the key component in Transformers. While the global attention mechanism offers high expressiveness, its excessive computational cost restricts its applicability in various scenarios. In this paper, we propose a novel attention paradigm, Agent Attention, to strike a favorable balance between computational efficiency and representation power. Specifically, the Agent Attention…

    Submitted 22 December, 2023; v1 submitted 14 December, 2023; originally announced December 2023.

  23. arXiv:2312.08606  [pdf, other]

    cs.CV

    VQCNIR: Clearer Night Image Restoration with Vector-Quantized Codebook

    Authors: Wenbin Zou, Hongxia Gao, Tian Ye, Liang Chen, Weipeng Yang, Shasha Huang, Hongsheng Chen, Sixiang Chen

    Abstract: Night photography often struggles with challenges like low light and blurring, stemming from dark environments and prolonged exposures. Current methods either disregard priors and directly fit end-to-end networks, leading to inconsistent illumination, or rely on unreliable handcrafted priors to constrain the network, thereby bringing greater error to the final result. We believe in the str…

    Submitted 16 December, 2023; v1 submitted 13 December, 2023; originally announced December 2023.

    Comments: This paper is accepted by AAAI2024

  24. Benchmarking Deep Learning Classifiers for SAR Automatic Target Recognition

    Authors: Jacob Fein-Ashley, Tian Ye, Rajgopal Kannan, Viktor Prasanna, Carl Busart

    Abstract: Synthetic Aperture Radar (SAR) Automatic Target Recognition (ATR) is a key technique of remote-sensing image recognition, which can be supported by deep neural networks. The existing works of SAR ATR mostly focus on improving the accuracy of the target recognition while ignoring the system's performance in terms of speed and storage, which is critical to real-world applications of SAR ATR. For decision-mak…

    Submitted 11 December, 2023; originally announced December 2023.

    Comments: 6 Pages

  25. arXiv:2312.03775  [pdf, other]

    cs.CV

    FAAC: Facial Animation Generation with Anchor Frame and Conditional Control for Superior Fidelity and Editability

    Authors: Linze Li, Sunqi Fan, Hengjun Pu, Zhaodong Bing, Yao Tang, Tianzhu Ye, Tong Yang, Liangyu Chen, Jiajun Liang

    Abstract: Over recent years, diffusion models have facilitated significant advancements in video generation. Yet, the creation of face-related videos still confronts issues such as low facial fidelity, lack of frame consistency, limited editability and uncontrollable human poses. To address these challenges, we introduce a facial animation generation method that enhances both face identity fidelity and edit…

    Submitted 20 December, 2023; v1 submitted 5 December, 2023; originally announced December 2023.

  26. arXiv:2312.02912  [pdf, other]

    cs.CV

    Realistic Scatterer Based Adversarial Attacks on SAR Image Classifiers

    Authors: Tian Ye, Rajgopal Kannan, Viktor Prasanna, Carl Busart, Lance Kaplan

    Abstract: Adversarial attacks have highlighted the vulnerability of classifiers based on machine learning for Synthetic Aperture Radar (SAR) Automatic Target Recognition (ATR) tasks. An adversarial attack perturbs SAR images of on-ground targets such that the classifiers are misled into making incorrect predictions. However, many existing attacking techniques rely on arbitrary manipulation of SAR images whi…

    Submitted 5 December, 2023; originally announced December 2023.

  27. arXiv:2311.18173  [pdf]

    eess.IV cs.CE cs.CV

    Quantification of cardiac capillarization in single-immunostained myocardial slices using weakly supervised instance segmentation

    Authors: Zhao Zhang, Xiwen Chen, William Richardson, Bruce Z. Gao, Abolfazl Razi, Tong Ye

    Abstract: Decreased myocardial capillary density has been reported as an important histopathological feature associated with various heart disorders. Quantitative assessment of cardiac capillarization typically involves double immunostaining of cardiomyocytes (CMs) and capillaries in myocardial slices. In contrast, single immunostaining of basement membrane components is a straightforward approach to simult…

    Submitted 29 November, 2023; originally announced November 2023.

  28. arXiv:2311.15209  [pdf, other]

    cs.AI

    See and Think: Embodied Agent in Virtual Environment

    Authors: Zhonghan Zhao, Wenhao Chai, Xuan Wang, Li Boyi, Shengyu Hao, Shidong Cao, Tian Ye, Jenq-Neng Hwang, Gaoang Wang

    Abstract: Large language models (LLMs) have achieved impressive progress on several open-world tasks. Recently, using LLMs to build embodied agents has been a hotspot. In this paper, we propose STEVE, a comprehensive and visionary embodied agent in the Minecraft virtual environment. STEVE consists of three key components: vision perception, language instruction, and code action. Vision perception involves t…

    Submitted 2 December, 2023; v1 submitted 26 November, 2023; originally announced November 2023.

    Comments: Preprint. First three authors contribute equally to this work. Project Website https://rese1f.github.io/STEVE/

  29. arXiv:2311.12358  [pdf, other]

    cs.LG cs.DC

    Federated Learning via Consensus Mechanism on Heterogeneous Data: A New Perspective on Convergence

    Authors: Shu Zheng, Tiandi Ye, Xiang Li, Ming Gao

    Abstract: Federated learning (FL) on heterogeneous data (non-IID data) has recently received great attention. Most existing methods focus on studying the convergence guarantees for the global objective. While these methods can guarantee the decrease of the global objective in each communication round, they fail to ensure risk decrease for each client. In this paper, to address the problem, we propose FedCOME…

    Submitted 21 November, 2023; originally announced November 2023.

  30. arXiv:2311.11638  [pdf, other]

    cs.CV

    Reti-Diff: Illumination Degradation Image Restoration with Retinex-based Latent Diffusion Model

    Authors: Chunming He, Chengyu Fang, Yulun Zhang, Tian Ye, Kai Li, Longxiang Tang, Zhenhua Guo, Xiu Li, Sina Farsiu

    Abstract: Illumination degradation image restoration (IDIR) techniques aim to improve the visibility of degraded images and mitigate the adverse effects of deteriorated illumination. Among these algorithms, diffusion model (DM)-based methods have shown promising performance but are often burdened by heavy computational demands and pixel misalignment issues when predicting the image-level distribution. To ta…

    Submitted 9 March, 2024; v1 submitted 20 November, 2023; originally announced November 2023.

    Comments: 20 pages, 11 figures, 11 tables

  31. arXiv:2311.01886  [pdf, other]

    cs.CV

    Bridging the Gap between Multi-focus and Multi-modal: A Focused Integration Framework for Multi-modal Image Fusion

    Authors: Xilai Li, Xiaosong Li, Tao Ye, Xiaoqi Cheng, Wuyang Liu, Haishu Tan

    Abstract: Multi-modal image fusion (MMIF) integrates valuable information from different modality images into a fused one. However, the fusion of multiple visible images with different focal regions and infrared images is an unprecedented challenge in real MMIF applications. This is because of the limited depth of the focus of visible optical lenses, which impedes the simultaneous capture of the focal inform…

    Submitted 31 January, 2024; v1 submitted 3 November, 2023; originally announced November 2023.

    Comments: Accepted to IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) 2024

  32. arXiv:2310.16853  [pdf, other]

    cs.PL cs.AI

    CP-BCS: Binary Code Summarization Guided by Control Flow Graph and Pseudo Code

    Authors: Tong Ye, Lingfei Wu, Tengfei Ma, Xuhong Zhang, Yangkai Du, Peiyu Liu, Shouling Ji, Wenhai Wang

    Abstract: Automatically generating function summaries for binaries is an extremely valuable but challenging task, since it involves translating the execution behavior and semantics of the low-level language (assembly code) into human-readable natural language. However, most current works on understanding assembly code are oriented towards generating function names, which involve numerous abbreviations that…

    Submitted 24 October, 2023; originally announced October 2023.

    Comments: EMNLP 2023 Main Conference

  33. arXiv:2310.16102  [pdf, other]

    eess.IV cs.CV physics.optics

    Learned, Uncertainty-driven Adaptive Acquisition for Photon-Efficient Multiphoton Microscopy

    Authors: Cassandra Tong Ye, Jiashu Han, Kunzan Liu, Anastasios Angelopoulos, Linda Griffith, Kristina Monakhova, Sixian You

    Abstract: Multiphoton microscopy (MPM) is a powerful imaging tool that has been a critical enabler for live tissue imaging. However, since most multiphoton microscopy platforms rely on point scanning, there is an inherent trade-off between acquisition time, field of view (FOV), phototoxicity, and image quality, often resulting in noisy measurements when fast, large FOV, and/or gentle imaging is needed. Deep…

    Submitted 24 October, 2023; originally announced October 2023.

  34. arXiv:2310.16002  [pdf, other]

    cs.CV

    Integrating View Conditions for Image Synthesis

    Authors: Jinbin Bai, Zhen Dong, Aosong Feng, Xiao Zhang, Tian Ye, Kaicheng Zhou

    Abstract: In the field of image processing, applying intricate semantic modifications within existing images remains an enduring challenge. This paper introduces a pioneering framework that integrates viewpoint information to enhance the control of image editing tasks, especially for interior design scenes. By surveying existing object editing methodologies, we distill three essential criteria -- consistenc…

    Submitted 8 May, 2024; v1 submitted 24 October, 2023; originally announced October 2023.

    Comments: Accepted by IJCAI 2024

  35. arXiv:2310.15196  [pdf, other]

    cs.LG cs.AI

    Efficient Meta Neural Heuristic for Multi-Objective Combinatorial Optimization

    Authors: Jinbiao Chen, Jiahai Wang, Zizhen Zhang, Zhiguang Cao, Te Ye, Siyuan Chen

    Abstract: Recently, neural heuristics based on deep reinforcement learning have exhibited promise in solving multi-objective combinatorial optimization problems (MOCOPs). However, they are still struggling to achieve high learning efficiency and solution quality. To tackle this issue, we propose an efficient meta neural heuristic (EMNH), in which a meta-model is first trained and then fine-tuned with a few…

    Submitted 22 October, 2023; originally announced October 2023.

    Comments: Accepted at NeurIPS 2023

  36. arXiv:2310.15195  [pdf, other]

    cs.LG cs.AI cs.NE

    Neural Multi-Objective Combinatorial Optimization with Diversity Enhancement

    Authors: Jinbiao Chen, Zizhen Zhang, Zhiguang Cao, Yaoxin Wu, Yining Ma, Te Ye, Jiahai Wang

    Abstract: Most existing neural methods for multi-objective combinatorial optimization (MOCO) problems solely rely on decomposition, which often leads to repetitive solutions for the respective subproblems, thus a limited Pareto set. Beyond decomposition, we propose a novel neural heuristic with diversity enhancement (NHDE) to produce more Pareto solutions from two perspectives. On the one hand, to hinder…

    Submitted 22 October, 2023; originally announced October 2023.

    Comments: Accepted at NeurIPS 2023

  37. arXiv:2308.14153  [pdf, other]

    cs.CV

    Sparse Sampling Transformer with Uncertainty-Driven Ranking for Unified Removal of Raindrops and Rain Streaks

    Authors: Sixiang Chen, Tian Ye, Jinbin Bai, Erkang Chen, Jun Shi, Lei Zhu

    Abstract: In the real world, image degradations caused by rain often exhibit a combination of rain streaks and raindrops, thereby increasing the challenges of recovering the underlying clean image. Note that the rain streaks and raindrops have diverse shapes, sizes, and locations in the captured image, and thus modeling the correlation relationship between irregular degradations caused by rain artifacts is…

    Submitted 27 August, 2023; originally announced August 2023.

    Comments: Accepted by ICCV'23

  38. arXiv:2308.01006  [pdf, other]

    cs.CV cs.AI cs.RO

    FusionAD: Multi-modality Fusion for Prediction and Planning Tasks of Autonomous Driving

    Authors: Tengju Ye, Wei Jing, Chunyong Hu, Shikun Huang, Lingping Gao, Fangzhen Li, Jingke Wang, Ke Guo, Wencong Xiao, Weibo Mao, Hang Zheng, Kun Li, Junbo Chen, Kaicheng Yu

    Abstract: Building a multi-modality multi-task neural network toward accurate and robust performance is a de-facto standard in perception tasks of autonomous driving. However, leveraging such data from multiple sensors to jointly optimize the prediction and planning tasks remains largely unexplored. In this paper, we present FusionAD, to the best of our knowledge, the first unified framework that fuses the in…

    Submitted 14 August, 2023; v1 submitted 2 August, 2023; originally announced August 2023.

  39. arXiv:2307.16449  [pdf, other]

    cs.CV

    MovieChat: From Dense Token to Sparse Memory for Long Video Understanding

    Authors: Enxin Song, Wenhao Chai, Guanhong Wang, Yucheng Zhang, Haoyang Zhou, Feiyang Wu, Haozhe Chi, Xun Guo, Tian Ye, Yanting Zhang, Yan Lu, Jenq-Neng Hwang, Gaoang Wang

    Abstract: Recently, integrating video foundation models and large language models to build a video understanding system can overcome the limitations of specific pre-defined vision tasks. Yet, existing systems can only handle videos with very few frames. For long videos, the computation complexity, memory cost, and long-term temporal connection impose additional challenges. Taking advantage of the Atkinson-S…

    Submitted 9 March, 2024; v1 submitted 31 July, 2023; originally announced July 2023.

    Comments: CVPR 2024. First three authors contribute equally to this work. Project Website https://rese1f.github.io/MovieChat/

  40. arXiv:2307.15994  [pdf, other]

    cs.LG cs.AI

    UPFL: Unsupervised Personalized Federated Learning towards New Clients

    Authors: Tiandi Ye, Cen Chen, Yinggui Wang, Xiang Li, Ming Gao

    Abstract: Personalized federated learning has gained significant attention as a promising approach to address the challenge of data heterogeneity. In this paper, we address a relatively unexplored problem in federated learning. When a federated model has been trained and deployed, and an unlabeled new client joins, providing a personalized model for the new client becomes a highly challenging task. To addre…

    Submitted 29 July, 2023; originally announced July 2023.

  41. arXiv:2307.15971  [pdf, other]

    cs.CR cs.AI

    You Can Backdoor Personalized Federated Learning

    Authors: Tiandi Ye, Cen Chen, Yinggui Wang, Xiang Li, Ming Gao

    Abstract: Existing research primarily focuses on backdoor attacks and defenses within the generic federated learning scenario, where all clients collaborate to train a single global model. A recent study conducted by Qin et al. (2023) marks the initial exploration of backdoor attacks within the personalized federated learning (pFL) scenario, where each client constructs a personalized model based on its loc…

    Submitted 18 September, 2023; v1 submitted 29 July, 2023; originally announced July 2023.

    Comments: Submitted to TKDD

    Report number: 2024

    Journal ref: ACM Trans. Knowl. Discov. Data 2024

  42. arXiv:2306.17201  [pdf, other]

    cs.CV

    MPM: A Unified 2D-3D Human Pose Representation via Masked Pose Modeling

    Authors: Zhenyu Zhang, Wenhao Chai, Zhongyu Jiang, Tian Ye, Mingli Song, Jenq-Neng Hwang, Gaoang Wang

    Abstract: Estimating 3D human poses only from a 2D human pose sequence has been thoroughly explored in recent years. Yet, prior to this, no such work has attempted to unify 2D and 3D pose representations in the shared feature space. In this paper, we propose MPM, a unified 2D-3D human pose representation framework via masked pose modeling. We treat 2D and 3D poses as two different modalities like vision and langu…

    Submitted 29 June, 2023; originally announced June 2023.

    Comments: Codes and model checkpoints are available at https://github.com/vvirgooo2/MPM

  43. arXiv:2306.12276  [pdf, other]

    cs.CV

    Wildfire Detection Via Transfer Learning: A Survey

    Authors: Ziliang Hong, Emadeldeen Hamdan, Yifei Zhao, Tianxiao Ye, Hongyi Pan, A. Enis Cetin

    Abstract: This paper surveys different publicly available neural network models used for detecting wildfires using regular visible-range cameras which are placed on hilltops or forest lookout towers. The neural network models are pre-trained on ImageNet-1K and fine-tuned on a custom wildfire dataset. The performance of these models is evaluated on a diverse set of wildfire images, and the survey provides us…

    Submitted 21 June, 2023; originally announced June 2023.

  44. arXiv:2305.11074  [pdf, other]

    cs.AI

    Tram: A Token-level Retrieval-augmented Mechanism for Source Code Summarization

    Authors: Tong Ye, Lingfei Wu, Tengfei Ma, Xuhong Zhang, Yangkai Du, Peiyu Liu, Shouling Ji, Wenhai Wang

    Abstract: Automatically generating human-readable text describing the functionality of a program is the intent of source code summarization. Although neural language models achieve significant performance in this field, they are limited by their inability to access external knowledge. To address this limitation, an emerging trend is combining neural models with external knowledge through retrieval methods.…

    Submitted 30 March, 2024; v1 submitted 18 May, 2023; originally announced May 2023.

    Comments: NAACL 2024 Findings

  45. arXiv:2305.09533  [pdf, other]

    cs.CV

    NightHazeFormer: Single Nighttime Haze Removal Using Prior Query Transformer

    Authors: Yun Liu, Zhongsheng Yan, Sixiang Chen, Tian Ye, Wenqi Ren, Erkang Chen

    Abstract: Nighttime image dehazing is a challenging task due to the presence of multiple types of adverse degrading effects including glow, haze, blurry, noise, color distortion, and so on. However, most previous studies mainly focus on daytime image dehazing or partial degradations presented in nighttime hazy scenes, which may lead to unsatisfactory restoration results. In this paper, we propose an end-to-…

    Submitted 13 August, 2023; v1 submitted 16 May, 2023; originally announced May 2023.

    Comments: 10 pages, 11 figures

  46. arXiv:2305.08824  [pdf, other]

    cs.CV

    Five A$^{+}$ Network: You Only Need 9K Parameters for Underwater Image Enhancement

    Authors: Jingxia Jiang, Tian Ye, Jinbin Bai, Sixiang Chen, Wenhao Chai, Shi Jun, Yun Liu, Erkang Chen

    Abstract: A lightweight underwater image enhancement network is of great significance for resource-constrained platforms, but balancing model size, computational efficiency, and enhancement performance has proven difficult for previous approaches. In this work, we propose the Five A$^{+}$ Network (FA$^{+}$Net), a highly efficient and lightweight real-time underwater image enhancement network with only…

    Submitted 15 May, 2023; originally announced May 2023.

  47. arXiv:2304.10716  [pdf, other]

    cs.CV

    Joint Token Pruning and Squeezing Towards More Aggressive Compression of Vision Transformers

    Authors: Siyuan Wei, Tianzhu Ye, Shen Zhang, Yao Tang, Jiajun Liang

    Abstract: Although vision transformers (ViTs) have shown promising results in various computer vision tasks recently, their high computational cost limits their practical applications. Previous approaches that prune redundant tokens have demonstrated a good trade-off between performance and computation costs. Nevertheless, errors caused by pruning strategies can lead to significant information loss. Our qua…

    Submitted 20 April, 2023; originally announced April 2023.

    Comments: Accepted to CVPR2023

  48. arXiv:2304.04237  [pdf, other]

    cs.CV

    Slide-Transformer: Hierarchical Vision Transformer with Local Self-Attention

    Authors: Xuran Pan, Tianzhu Ye, Zhuofan Xia, Shiji Song, Gao Huang

    Abstract: Self-attention mechanism has been a key factor in the recent progress of Vision Transformer (ViT), which enables adaptive feature extraction from global contexts. However, existing self-attention methods either adopt sparse global attention or window attention to reduce the computation complexity, which may compromise the local feature learning or subject to some handcrafted designs. In contrast,…

    Submitted 9 April, 2023; originally announced April 2023.

    Comments: Accepted to CVPR2023

  49. arXiv:2304.03935  [pdf, other]

    cs.LG

    Last-Layer Fairness Fine-tuning is Simple and Effective for Neural Networks

    Authors: Yuzhen Mao, Zhun Deng, Huaxiu Yao, Ting Ye, Kenji Kawaguchi, James Zou

    Abstract: As machine learning has been deployed ubiquitously across applications in modern data science, algorithmic fairness has become a great concern. Among them, imposing fairness constraints during learning, i.e. in-processing fair training, has been a popular type of training method because they don't require accessing sensitive attributes during test time in contrast to post-processing methods. While…

    Submitted 14 July, 2023; v1 submitted 8 April, 2023; originally announced April 2023.

    Comments: Published at the ICML 2023 Workshop on Spurious Correlations, Invariance, and Stability

  50. arXiv:2303.08606  [pdf, other]

    cs.CL cs.AI

    On the Calibration and Uncertainty with Pólya-Gamma Augmentation for Dialog Retrieval Models

    Authors: Tong Ye, Shijing Si, Jianzong Wang, Ning Cheng, Zhitao Li, Jing Xiao

    Abstract: Deep neural retrieval models have amply demonstrated their power but estimating the reliability of their predictions remains challenging. Most dialog response retrieval models output a single score for a response on how relevant it is to a given question. However, the bad calibration of deep neural network results in various uncertainty for the single score such that the unreliable predictions alw…

    Submitted 15 March, 2023; originally announced March 2023.

    Comments: Accepted by AAAI 2023