Showing 1–50 of 312 results for author: Pan, X

Searching in archive cs.

  1. arXiv:2406.19976  [pdf, other]

    cs.LG math.OC

    ScaleBiO: Scalable Bilevel Optimization for LLM Data Reweighting

    Authors: Rui Pan, Jipeng Zhang, Xingyuan Pan, Renjie Pi, Xiaoyu Wang, Tong Zhang

    Abstract: Bilevel optimization has shown its utility across various machine learning settings, yet most algorithms in practice require second-order information, making it challenging to scale them up. Only recently, a paradigm of first-order algorithms emerged, capable of effectively addressing bilevel optimization problems. Nevertheless, the practical efficiency of this paradigm remains unverified, particu…

    Submitted 28 June, 2024; originally announced June 2024.

  2. arXiv:2406.16860  [pdf, other]

    cs.CV

    Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs

    Authors: Shengbang Tong, Ellis Brown, Penghao Wu, Sanghyun Woo, Manoj Middepogu, Sai Charitha Akula, Jihan Yang, Shusheng Yang, Adithya Iyer, Xichen Pan, Austin Wang, Rob Fergus, Yann LeCun, Saining Xie

    Abstract: We introduce Cambrian-1, a family of multimodal LLMs (MLLMs) designed with a vision-centric approach. While stronger language models can enhance multimodal capabilities, the design choices for vision components are often insufficiently explored and disconnected from visual representation learning research. This gap hinders accurate sensory grounding in real-world scenarios. Our study uses LLMs and…

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: Website at https://cambrian-mllm.github.io

  3. arXiv:2406.12442  [pdf, other]

    cs.CL cs.AI

    Abstraction-of-Thought Makes Language Models Better Reasoners

    Authors: Ruixin Hong, Hongming Zhang, Xiaoman Pan, Dong Yu, Changshui Zhang

    Abstract: Abstract reasoning, the ability to reason from the abstract essence of a problem, serves as a key to generalization in human reasoning. However, eliciting language models to perform reasoning with abstraction remains unexplored. This paper seeks to bridge this gap by introducing a novel structured reasoning format called Abstraction-of-Thought (AoT). The uniqueness of AoT lies in its explicit requ…

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: Work in Progress

  4. arXiv:2406.08924  [pdf, other]

    cs.GR cs.CV cs.LG

    Learning Images Across Scales Using Adversarial Training

    Authors: Krzysztof Wolski, Adarsh Djeacoumar, Alireza Javanmardi, Hans-Peter Seidel, Christian Theobalt, Guillaume Cordonnier, Karol Myszkowski, George Drettakis, Xingang Pan, Thomas Leimkühler

    Abstract: The real world exhibits rich structure and detail across many scales of observation. It is difficult, however, to capture and represent a broad spectrum of scales using ordinary images. We devise a novel paradigm for learning a representation that captures an orders-of-magnitude variety of scales from an unstructured collection of ordinary images. We treat this collection as a distribution of scal…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: SIGGRAPH 2024; project page: https://scalespacegan.mpi-inf.mpg.de/

  5. arXiv:2406.02924  [pdf, other]

    cs.LG cs.CL cs.NE

    Pruner-Zero: Evolving Symbolic Pruning Metric from scratch for Large Language Models

    Authors: Peijie Dong, Lujun Li, Zhenheng Tang, Xiang Liu, Xinglin Pan, Qiang Wang, Xiaowen Chu

    Abstract: Despite the remarkable capabilities, Large Language Models (LLMs) face deployment challenges due to their extensive size. Pruning methods drop a subset of weights to accelerate, but many of them require retraining, which is prohibitively expensive and computationally demanding. Recently, post-training pruning approaches introduced novel metrics, enabling the pruning of LLMs without retraining. How…

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: Accepted by ICML2024, 29 pages, 4 figures

  6. arXiv:2405.19730  [pdf]

    cs.AI cs.CV cs.LG

    Research on Foundation Model for Spatial Data Intelligence: China's 2024 White Paper on Strategic Development of Spatial Data Intelligence

    Authors: Shaohua Wang, Xing Xie, Yong Li, Danhuai Guo, Zhi Cai, Yu Liu, Yang Yue, Xiao Pan, Feng Lu, Huayi Wu, Zhipeng Gui, Zhiming Ding, Bolong Zheng, Fuzheng Zhang, Tao Qin, Jingyuan Wang, Chuang Tao, Zhengchao Chen, Hao Lu, Jiayi Li, Hongyang Chen, Peng Yue, Wenhao Yu, Yao Yao, Leilei Sun, et al. (9 additional authors not shown)

    Abstract: This report focuses on spatial data intelligent large models, delving into the principles, methods, and cutting-edge applications of these models. It provides an in-depth discussion on the definition, development history, current status, and trends of spatial data intelligent large models, as well as the challenges they face. The report systematically elucidates the key technologies of spatial dat…

    Submitted 29 June, 2024; v1 submitted 30 May, 2024; originally announced May 2024.

    Comments: in Chinese language

  7. arXiv:2405.16537  [pdf, other]

    cs.CV

    I2VEdit: First-Frame-Guided Video Editing via Image-to-Video Diffusion Models

    Authors: Wenqi Ouyang, Yi Dong, Lei Yang, Jianlou Si, Xingang Pan

    Abstract: The remarkable generative capabilities of diffusion models have motivated extensive research in both image and video editing. Compared to video editing which faces additional challenges in the time dimension, image editing has witnessed the development of more diverse, high-quality approaches and more capable software like Photoshop. In light of this gap, we introduce a novel and generic solution…

    Submitted 26 May, 2024; originally announced May 2024.

    Comments: 19 pages

  8. arXiv:2405.14880  [pdf, other]

    cs.CV cs.AI

    Dissecting Query-Key Interaction in Vision Transformers

    Authors: Xu Pan, Aaron Philip, Ziqian Xie, Odelia Schwartz

    Abstract: Self-attention in vision transformers is often thought to perform perceptual grouping where tokens attend to other tokens with similar embeddings, which could correspond to semantically similar features of an object. However, attending to dissimilar tokens can be beneficial by providing contextual information. We propose to use the Singular Value Decomposition to dissect the query-key interaction…

    Submitted 26 May, 2024; v1 submitted 4 April, 2024; originally announced May 2024.

  9. arXiv:2405.14864  [pdf, other]

    cs.CV

    Video Diffusion Models are Training-free Motion Interpreter and Controller

    Authors: Zeqi Xiao, Yifan Zhou, Shuai Yang, Xingang Pan

    Abstract: Video generation primarily aims to model authentic and customized motion across frames, making understanding and controlling the motion a crucial topic. Most diffusion-based studies on video motion focus on motion customization with training-based paradigms, which, however, demands substantial training resources and necessitates retraining for diverse models. Crucially, these approaches do not exp…

    Submitted 23 May, 2024; originally announced May 2024.

    Comments: Project Page: https://xizaoqu.github.io/moft/

  10. arXiv:2405.13089  [pdf, other]

    cs.LG

    SEGAN: semi-supervised learning approach for missing data imputation

    Authors: Xiaohua Pan, Weifeng Wu, Peiran Liu, Zhen Li, Peng Lu, Peijian Cao, Jianfeng Zhang, Xianfei Qiu, YangYang Wu

    Abstract: In many practical real-world applications, data missing is a very common phenomenon, making the development of data-driven artificial intelligence theory and technology increasingly difficult. Data completion is an important method for missing data preprocessing. Most existing missing data completion models directly use the known information in the missing data set but ignore the impact of the da…

    Submitted 12 June, 2024; v1 submitted 21 May, 2024; originally announced May 2024.

  11. arXiv:2405.12915  [pdf, other]

    cs.CL

    G-DIG: Towards Gradient-based DIverse and hiGh-quality Instruction Data Selection for Machine Translation

    Authors: Xingyuan Pan, Luyang Huang, Liyan Kang, Zhicheng Liu, Yu Lu, Shanbo Cheng

    Abstract: Large Language Models (LLMs) have demonstrated remarkable abilities in general scenarios. Instruction finetuning empowers them to align with humans in various tasks. Nevertheless, the Diversity and Quality of the instruction data remain two main challenges for instruction finetuning. With regard to this, in this paper, we propose a novel gradient-based method to automatically select high-quality a…

    Submitted 21 May, 2024; originally announced May 2024.

    Comments: Accepted to ACL 2024 main conference

  12. arXiv:2405.05722  [pdf, other]

    cs.LG cond-mat.mtrl-sci physics.chem-ph

    A Framework of SO(3)-equivariant Non-linear Representation Learning and its Application to Electronic-Structure Hamiltonian Prediction

    Authors: Shi Yin, Xinyang Pan, Fengyan Wang, Feng Wu, Lixin He

    Abstract: We present both a theoretical and a methodological framework that addresses a critical challenge in applying deep learning to physical systems: the reconciliation of non-linear expressiveness with SO(3)-equivariance in predictions of SO(3)-equivariant quantities. Inspired by covariant theory in physics, we address this problem by exploring the mathematical relationships between SO(3)-invariant and…

    Submitted 18 June, 2024; v1 submitted 9 May, 2024; originally announced May 2024.

  13. arXiv:2405.02859  [pdf, other]

    cs.CV

    MVIP-NeRF: Multi-view 3D Inpainting on NeRF Scenes via Diffusion Prior

    Authors: Honghua Chen, Chen Change Loy, Xingang Pan

    Abstract: Despite the emergence of successful NeRF inpainting methods built upon explicit RGB and depth 2D inpainting supervisions, these methods are inherently constrained by the capabilities of their underlying 2D inpainters. This is due to two key reasons: (i) independently inpainting constituent images results in view-inconsistent imagery, and (ii) 2D inpainters struggle to ensure high-quality geometry…

    Submitted 5 May, 2024; originally announced May 2024.

    Comments: 14 pages, 10 figures, conference

  14. arXiv:2404.16205  [pdf, other]

    cs.CV cs.MM

    AIS 2024 Challenge on Video Quality Assessment of User-Generated Content: Methods and Results

    Authors: Marcos V. Conde, Saman Zadtootaghaj, Nabajeet Barman, Radu Timofte, Chenlong He, Qi Zheng, Ruoxi Zhu, Zhengzhong Tu, Haiqiang Wang, Xiangguang Chen, Wenhui Meng, Xiang Pan, Huiying Shi, Han Zhu, Xiaozhong Xu, Lei Sun, Zhenzhong Chen, Shan Liu, Zicheng Zhang, Haoning Wu, Yingjie Zhou, Chunyi Li, Xiaohong Liu, Weisi Lin, Guangtao Zhai, et al. (11 additional authors not shown)

    Abstract: This paper reviews the AIS 2024 Video Quality Assessment (VQA) Challenge, focused on User-Generated Content (UGC). The aim of this challenge is to gather deep learning-based methods capable of estimating the perceptual quality of UGC videos. The user-generated videos from the YouTube UGC Dataset include diverse content (sports, games, lyrics, anime, etc.), quality and resolutions. The proposed met…

    Submitted 24 April, 2024; originally announced April 2024.

    Comments: CVPR 2024 Workshop -- AI for Streaming (AIS) Video Quality Assessment Challenge

  15. arXiv:2404.11313  [pdf, other]

    eess.IV cs.AI

    NTIRE 2024 Challenge on Short-form UGC Video Quality Assessment: Methods and Results

    Authors: Xin Li, Kun Yuan, Yajing Pei, Yiting Lu, Ming Sun, Chao Zhou, Zhibo Chen, Radu Timofte, Wei Sun, Haoning Wu, Zicheng Zhang, Jun Jia, Zhichao Zhang, Linhan Cao, Qiubo Chen, Xiongkuo Min, Weisi Lin, Guangtao Zhai, Jianhui Sun, Tianyi Wang, Lei Li, Han Kong, Wenxuan Wang, Bing Li, Cheng Luo, et al. (43 additional authors not shown)

    Abstract: This paper reviews the NTIRE 2024 Challenge on Shortform UGC Video Quality Assessment (S-UGC VQA), where various excellent solutions are submitted and evaluated on the collected dataset KVQ from popular short-form video platform, i.e., Kuaishou/Kwai Platform. The KVQ database is divided into three parts, including 2926 videos for training, 420 videos for validation, and 854 videos for testing. The…

    Submitted 17 April, 2024; originally announced April 2024.

    Comments: Accepted by CVPR2024 Workshop. The challenge report for CVPR NTIRE2024 Short-form UGC Video Quality Assessment Challenge

  16. arXiv:2404.10407  [pdf]

    cs.CV

    Comprehensive Survey of Model Compression and Speed up for Vision Transformers

    Authors: Feiyang Chen, Ziqian Luo, Lisang Zhou, Xueting Pan, Ying Jiang

    Abstract: Vision Transformers (ViT) have marked a paradigm shift in computer vision, outperforming state-of-the-art models across diverse tasks. However, their practical deployment is hampered by high computational and memory demands. This study addresses the challenge by evaluating four primary model compression techniques: quantization, low-rank approximation, knowledge distillation, and pruning. We metho…

    Submitted 16 April, 2024; originally announced April 2024.

    Journal ref: Journal of Information, Technology and Policy (2024): 1-12

  17. arXiv:2404.06769  [pdf]

    cs.NE

    Solving the Food-Energy-Water Nexus Problem via Intelligent Optimization Algorithms

    Authors: Qi Deng, Zheng Fan, Zhi Li, Xinna Pan, Qi Kang, MengChu Zhou

    Abstract: The application of evolutionary algorithms (EAs) to multi-objective optimization problems has been widespread. However, the EA research community has not paid much attention to large-scale multi-objective optimization problems arising from real-world applications. Especially, Food-Energy-Water systems are intricately linked among food, energy and water that impact each other. They usually involve…

    Submitted 10 April, 2024; originally announced April 2024.

  18. arXiv:2404.03893  [pdf, other]

    cs.AI

    KGExplainer: Towards Exploring Connected Subgraph Explanations for Knowledge Graph Completion

    Authors: Tengfei Ma, Xiang Song, Wen Tao, Mufei Li, Jiani Zhang, Xiaoqin Pan, Jianxin Lin, Bosheng Song, Xiangxiang Zeng

    Abstract: Knowledge graph completion (KGC) aims to alleviate the inherent incompleteness of knowledge graphs (KGs), which is a critical task for various applications, such as recommendations on the web. Although knowledge graph embedding (KGE) models have demonstrated superior predictive performance on KGC tasks, these models infer missing links in a black-box manner that lacks transparency and accountabili…

    Submitted 5 April, 2024; originally announced April 2024.

    Comments: 13 pages, 7 figures, 11 tables. Under Review

  19. arXiv:2403.17357  [pdf, other]

    cs.SE cs.AI

    MESIA: Understanding and Leveraging Supplementary Nature of Method-level Comments for Automatic Comment Generation

    Authors: Xinglu Pan, Chenxiao Liu, Yanzhen Zou, Tao Xie, Bing Xie

    Abstract: Code comments are important for developers in program comprehension. In scenarios of comprehending and reusing a method, developers expect code comments to provide supplementary information beyond the method signature. However, the extent of such supplementary information varies a lot in different code comments. In this paper, we raise the awareness of the supplementary nature of method-level comm…

    Submitted 25 March, 2024; originally announced March 2024.

    Comments: In 32nd IEEE/ACM International Conference on Program Comprehension (ICPC'24)

  20. Navigating the Landscape of Distributed File Systems: Architectures, Implementations, and Considerations

    Authors: Xueting Pan, Ziqian Luo, Lisang Zhou

    Abstract: Distributed File Systems (DFS) have emerged as sophisticated solutions for efficient file storage and management across interconnected computer nodes. The main objective of DFS is to achieve flexible, scalable, and resilient file storage management by dispersing file data across multiple interconnected computer nodes, enabling users to seamlessly access and manipulate files distributed across dive…

    Submitted 22 March, 2024; originally announced March 2024.

  21. arXiv:2403.12409  [pdf, other]

    cs.CV

    ComboVerse: Compositional 3D Assets Creation Using Spatially-Aware Diffusion Guidance

    Authors: Yongwei Chen, Tengfei Wang, Tong Wu, Xingang Pan, Kui Jia, Ziwei Liu

    Abstract: Generating high-quality 3D assets from a given image is highly desirable in various applications such as AR/VR. Recent advances in single-image 3D generation explore feed-forward models that learn to infer the 3D model of an object without optimization. Though promising results have been achieved in single object generation, these methods often struggle to model complex 3D assets that inherently c…

    Submitted 18 March, 2024; originally announced March 2024.

    Comments: https://cyw-3d.github.io/ComboVerse/

  22. arXiv:2403.12100  [pdf, other]

    cs.IR cs.AI cs.LG

    Learning Time Slot Preferences via Mobility Tree for Next POI Recommendation

    Authors: Tianhao Huang, Xuan Pan, Xiangrui Cai, Ying Zhang, Xiaojie Yuan

    Abstract: Next Point-of-Interests (POIs) recommendation task aims to provide a dynamic ranking of POIs based on users' current check-in trajectories. The recommendation performance of this task is contingent upon a comprehensive understanding of users' personalized behavioral patterns through Location-based Social Networks (LBSNs) data. While prior studies have adeptly captured sequential patterns and trans…

    Submitted 17 March, 2024; originally announced March 2024.

  23. arXiv:2403.12019  [pdf, other]

    cs.CV

    LN3Diff: Scalable Latent Neural Fields Diffusion for Speedy 3D Generation

    Authors: Yushi Lan, Fangzhou Hong, Shuai Yang, Shangchen Zhou, Xuyi Meng, Bo Dai, Xingang Pan, Chen Change Loy

    Abstract: The field of neural rendering has witnessed significant progress with advancements in generative models and differentiable rendering techniques. Though 2D diffusion has achieved success, a unified 3D diffusion pipeline remains unsettled. This paper introduces a novel framework called LN3Diff to address this gap and enable fast, high-quality, and generic conditional 3D generation. Our approach harn…

    Submitted 18 March, 2024; originally announced March 2024.

    Comments: project webpage: https://nirvanalan.github.io/projects/ln3diff/

  24. arXiv:2403.11125  [pdf]

    stat.ML cs.LG math.PR

    Machine learning-based system reliability analysis with Gaussian Process Regression

    Authors: Lisang Zhou, Ziqian Luo, Xueting Pan

    Abstract: Machine learning-based reliability analysis methods have shown great advancements for their computational efficiency and accuracy. Recently, many efficient learning strategies have been proposed to enhance the computational performance. However, few of them explore the theoretical optimal learning strategy. In this article, we propose several theorems that facilitate such exploration. Specifical…

    Submitted 20 April, 2024; v1 submitted 17 March, 2024; originally announced March 2024.

  25. arXiv:2403.06424  [pdf, other]

    stat.ML cs.CV cs.LG

    Bridging Domains with Approximately Shared Features

    Authors: Ziliang Samuel Zhong, Xiang Pan, Qi Lei

    Abstract: Multi-source domain adaptation aims to reduce performance degradation when applying machine learning models to unseen domains. A fundamental challenge is devising the optimal strategy for feature selection. Existing literature is somewhat paradoxical: some advocate for learning invariant features from source domains, while others favor more diverse features. To address the challenge, we propose a…

    Submitted 11 March, 2024; originally announced March 2024.

  26. arXiv:2403.05753  [pdf, other]

    eess.IV cs.CV

    UDCR: Unsupervised Aortic DSA/CTA Rigid Registration Using Deep Reinforcement Learning and Overlap Degree Calculation

    Authors: Wentao Liu, Bowen Liang, Wei** Xu, Tong Tian, Qingsheng Lu, Xipeng Pan, Haoyuan Li, Siyu Tian, Huihua Yang, Ruisheng Su

    Abstract: The rigid registration of aortic Digital Subtraction Angiography (DSA) and Computed Tomography Angiography (CTA) can provide 3D anatomical details of the vasculature for the interventional surgical treatment of conditions such as aortic dissection and aortic aneurysms, holding significant value for clinical research. However, the current methods for 2D/3D image registration are dependent on manual…

    Submitted 8 March, 2024; originally announced March 2024.

  27. arXiv:2403.05748  [pdf, other]

    cs.RO

    Image-Guided Autonomous Guidewire Navigation in Robot-Assisted Endovascular Interventions using Reinforcement Learning

    Authors: Wentao Liu, Tong Tian, Wei** Xu, Bowen Liang, Qingsheng Lu, Xipeng Pan, Wenyi Zhao, Huihua Yang, Ruisheng Su

    Abstract: Autonomous robots in endovascular interventions possess the potential to navigate guidewires with safety and reliability, while reducing human error and shortening surgical time. However, current methods of guidewire navigation based on Reinforcement Learning (RL) depend on manual demonstration data or magnetic guidance. In this work, we propose an Image-guided Autonomous Guidewire Navigation (IAG…

    Submitted 8 March, 2024; originally announced March 2024.

  28. arXiv:2402.17124  [pdf, other]

    cs.CL

    Fact-and-Reflection (FaR) Improves Confidence Calibration of Large Language Models

    Authors: Xinran Zhao, Hongming Zhang, Xiaoman Pan, Wenlin Yao, Dong Yu, Tongshuang Wu, Jianshu Chen

    Abstract: For an LLM to be trustworthy, its confidence level should be well-calibrated with its actual performance. While it is now common sense that LLM performances are greatly impacted by prompts, the confidence calibration in prompting LLMs has yet to be thoroughly explored. In this paper, we explore how different prompting strategies influence LLM confidence calibration and how it could be improved. We…

    Submitted 26 February, 2024; originally announced February 2024.

    Comments: 17 pages, 10 figures

  29. arXiv:2402.15328  [pdf, other]

    cs.LG

    Towards Principled Task Grouping for Multi-Task Learning

    Authors: Chenguang Wang, Xuanhao Pan, Tianshu Yu

    Abstract: This paper presents a novel approach to task grouping in Multitask Learning (MTL), advancing beyond existing methods by addressing key theoretical and practical limitations. Unlike prior studies, our approach offers a more theoretically grounded method that does not rely on restrictive assumptions for constructing transfer gains. We also propose a flexible mathematical programming formulation whic…

    Submitted 23 February, 2024; originally announced February 2024.

  30. arXiv:2402.14034  [pdf, other]

    cs.MA cs.AI

    AgentScope: A Flexible yet Robust Multi-Agent Platform

    Authors: Dawei Gao, Zitao Li, Xuchen Pan, Weirui Kuang, Zhijian Ma, Bingchen Qian, Fei Wei, Wenhao Zhang, Yuexiang Xie, Daoyuan Chen, Liuyi Yao, Hongyi Peng, Zeyu Zhang, Lin Zhu, Chen Cheng, Hongzhu Shi, Yaliang Li, Bolin Ding, Jingren Zhou

    Abstract: With the rapid advancement of Large Language Models (LLMs), significant progress has been made in multi-agent applications. However, the complexities in coordinating agents' cooperation and LLMs' erratic performance pose notable challenges in developing robust and efficient multi-agent applications. To tackle these challenges, we propose AgentScope, a developer-centric multi-agent platform with me…

    Submitted 20 May, 2024; v1 submitted 20 February, 2024; originally announced February 2024.

    Comments: We have released code on https://github.com/modelscope/agentscope

  31. arXiv:2402.13349  [pdf, other]

    cs.CV cs.AI cs.HC

    Aria Everyday Activities Dataset

    Authors: Zhaoyang Lv, Nicholas Charron, Pierre Moulon, Alexander Gamino, Cheng Peng, Chris Sweeney, Edward Miller, Huixuan Tang, Jeff Meissner, Jing Dong, Kiran Somasundaram, Luis Pesqueira, Mark Schwesinger, Omkar Parkhi, Qiao Gu, Renzo De Nardi, Shangyi Cheng, Steve Saarinen, Vijay Baiyya, Yuyang Zou, Richard Newcombe, Jakob Julian Engel, Xiaqing Pan, Carl Ren

    Abstract: We present Aria Everyday Activities (AEA) Dataset, an egocentric multimodal open dataset recorded using Project Aria glasses. AEA contains 143 daily activity sequences recorded by multiple wearers in five geographically diverse indoor locations. Each of the recordings contains multimodal sensor data recorded through the Project Aria glasses. In addition, AEA provides machine perception data includi…

    Submitted 21 February, 2024; v1 submitted 20 February, 2024; originally announced February 2024.

    Comments: Dataset website: https://www.projectaria.com/datasets/aea/

  32. arXiv:2402.10207  [pdf, other]

    cs.LG cs.AI cs.CL

    Rewards-in-Context: Multi-objective Alignment of Foundation Models with Dynamic Preference Adjustment

    Authors: Rui Yang, Xiaoman Pan, Feng Luo, Shuang Qiu, Han Zhong, Dong Yu, Jianshu Chen

    Abstract: We consider the problem of multi-objective alignment of foundation models with human preferences, which is a critical step towards helpful and harmless AI systems. However, it is generally costly and unstable to fine-tune large foundation models using reinforcement learning (RL), and the multi-dimensionality, heterogeneity, and conflicting nature of human preferences further complicate the alignme…

    Submitted 5 June, 2024; v1 submitted 15 February, 2024; originally announced February 2024.

    Comments: Accepted by ICML 2024

  33. arXiv:2402.10184  [pdf, other]

    cs.LG cs.AI cs.CL cs.DM

    Reward Generalization in RLHF: A Topological Perspective

    Authors: Tianyi Qiu, Fanzhi Zeng, Jiaming Ji, Dong Yan, Kaile Wang, Jiayi Zhou, Yang Han, Josef Dai, Xuehai Pan, Yaodong Yang

    Abstract: Existing alignment methods share a common topology of information flow, where reward information is collected from humans, modeled with preference learning, and used to tune language models. However, this shared topology has not been systematically characterized, nor have its alternatives been thoroughly explored, leaving the problems of low data efficiency and unreliable generalization unaddresse…

    Submitted 16 June, 2024; v1 submitted 15 February, 2024; originally announced February 2024.

  34. arXiv:2402.10010  [pdf, other]

    physics.med-ph cs.CV eess.IV

    Enhancing signal detectability in learning-based CT reconstruction with a model observer inspired loss function

    Authors: Megan Lantz, Emil Y. Sidky, Ingrid S. Reiser, Xiaochuan Pan, Gregory Ongie

    Abstract: Deep neural networks used for reconstructing sparse-view CT data are typically trained by minimizing a pixel-wise mean-squared error or similar loss function over a set of training images. However, networks trained with such pixel-wise losses are prone to wipe out small, low-contrast features that are critical for screening and diagnosis. To remedy this issue, we introduce a novel training loss in…

    Submitted 15 February, 2024; originally announced February 2024.

  35. arXiv:2402.02416  [pdf, other]

    cs.CL cs.AI cs.LG

    Aligner: Efficient Alignment by Learning to Correct

    Authors: Jiaming Ji, Boyuan Chen, Hantao Lou, Donghai Hong, Borong Zhang, Xuehai Pan, Juntao Dai, Tianyi Qiu, Yaodong Yang

    Abstract: With the rapid development of large language models (LLMs) and ever-evolving practical requirements, finding an efficient and effective alignment method has never been more critical. However, the tension between the complexity of current alignment methods and the need for rapid iteration in deployment scenarios necessitates the development of a model-agnostic alignment approach that can operate un…

    Submitted 24 June, 2024; v1 submitted 4 February, 2024; originally announced February 2024.

  36. arXiv:2402.02105  [pdf, other]

    cs.CV

    ParZC: Parametric Zero-Cost Proxies for Efficient NAS

    Authors: Peijie Dong, Lujun Li, Xinglin Pan, Zimian Wei, Xiang Liu, Qiang Wang, Xiaowen Chu

    Abstract: Recent advancements in Zero-shot Neural Architecture Search (NAS) highlight the efficacy of zero-cost proxies in various NAS benchmarks. Several studies propose the automated design of zero-cost proxies to achieve SOTA performance but require tedious searching progress. Furthermore, we identify a critical issue with current zero-cost proxies: they aggregate node-wise zero-cost statistics without c…

    Submitted 3 February, 2024; originally announced February 2024.

  37. arXiv:2402.00518  [pdf, other]

    cs.LG cs.AI cs.CL

    EE-Tuning: An Economical yet Scalable Solution for Tuning Early-Exit Large Language Models

    Authors: Xuchen Pan, Yanxi Chen, Yaliang Li, Bolin Ding, Jingren Zhou

    Abstract: This work introduces EE-Tuning, a lightweight and economical solution to training/tuning early-exit large language models (LLMs). In contrast to the common approach of full-parameter pre-training, EE-Tuning augments any pre-trained (and possibly fine-tuned) standard LLM with additional early-exit layers that are tuned in a parameter-efficient manner, which requires significantly less computational…

    Submitted 1 February, 2024; originally announced February 2024.

  38. Gland Segmentation Via Dual Encoders and Boundary-Enhanced Attention

    Authors: Huadeng Wang, Jiejiang Yu, Bingbing Li, Xipeng Pan, Zhenbing Liu, Rushi Lan, Xiaonan Luo

    Abstract: Accurate and automated gland segmentation on pathological images can assist pathologists in diagnosing the malignancy of colorectal adenocarcinoma. However, due to various gland shapes, severe deformation of malignant glands, and overlapping adhesions between glands, gland segmentation has always been very challenging. To address these problems, we propose a DEA model. This model consists of two b…

    Submitted 9 May, 2024; v1 submitted 29 January, 2024; originally announced January 2024.

    Comments: Published in: ICASSP 2024

    Journal ref: ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Seoul, Republic of Korea, 2024, pp. 2345-2349.

  39. arXiv:2401.13959  [pdf, other]

    eess.IV cs.CV

    Conditional Neural Video Coding with Spatial-Temporal Super-Resolution

    Authors: Henan Wang, Xiaohan Pan, Runsen Feng, Zongyu Guo, Zhibo Chen

    Abstract: This document is an expanded version of a one-page abstract originally presented at the 2024 Data Compression Conference. It describes our proposed method for the video track of the Challenge on Learned Image Compression (CLIC) 2024. Our scheme follows the typical hybrid coding framework with some novel techniques. Firstly, we adopt Spynet network to produce accurate motion vectors for motion esti…

    Submitted 25 January, 2024; originally announced January 2024.

    Comments: Accepted by the 2024 Data Compression Conference (DCC) for presentation as a poster

  40. arXiv:2401.11202  [pdf, other]

    cs.LG cs.DC cs.PL

    PartIR: Composing SPMD Partitioning Strategies for Machine Learning

    Authors: Sami Alabed, Daniel Belov, Bart Chrzaszcz, Juliana Franco, Dominik Grewe, Dougal Maclaurin, James Molloy, Tom Natan, Tamara Norman, Xiaoyue Pan, Adam Paszke, Norman A. Rink, Michael Schaarschmidt, Timur Sitdikov, Agnieszka Swietlik, Dimitrios Vytiniotis, Joel Wee

    Abstract: Training of modern large neural networks (NN) requires a combination of parallelization strategies encompassing data, model, or optimizer sharding. When strategies increase in complexity, it becomes necessary for partitioning tools to be 1) expressive, allowing the composition of simpler strategies, and 2) predictable to estimate performance analytically. We present PartIR, our design for a NN par…

    Submitted 3 March, 2024; v1 submitted 20 January, 2024; originally announced January 2024.

  41. arXiv:2401.03115

    cs.CV cs.MM eess.IV

    Transferable Learned Image Compression-Resistant Adversarial Perturbations

    Authors: Yang Sui, Zhuohang Li, Ding Ding, Xiang Pan, Xiaozhong Xu, Shan Liu, Zhenzhong Chen

    Abstract: Adversarial attacks can readily disrupt image classification systems, revealing the vulnerability of DNN-based recognition tasks. While existing adversarial perturbations are primarily applied to uncompressed images or to images compressed by traditional methods such as JPEG, limited studies have investigated the robustness of image classification models in the context of D…

    Submitted 5 January, 2024; originally announced January 2024.

    Comments: Accepted as poster at Data Compression Conference 2024 (DCC 2024)

  42. arXiv:2401.01702

    cs.GR cs.CV

    Image Sculpting: Precise Object Editing with 3D Geometry Control

    Authors: Jiraphon Yenphraphai, Xichen Pan, Sainan Liu, Daniele Panozzo, Saining Xie

    Abstract: We present Image Sculpting, a new framework for editing 2D images by incorporating tools from 3D geometry and graphics. This approach differs markedly from existing methods, which are confined to 2D spaces and typically rely on textual instructions, leading to ambiguity and limited control. Image Sculpting converts 2D objects into 3D, enabling direct interaction with their 3D geometry. Post-editin…

    Submitted 2 January, 2024; originally announced January 2024.

    Comments: Code and project page: https://image-sculpting.github.io

  43. arXiv:2401.00744

    physics.comp-ph cond-mat.mtrl-sci cs.LG

    Towards Harmonization of SO(3)-Equivariance and Expressiveness: a Hybrid Deep Learning Framework for Electronic-Structure Hamiltonian Prediction

    Authors: Shi Yin, Xinyang Pan, Xudong Zhu, Tianyu Gao, Haochong Zhang, Feng Wu, Lixin He

    Abstract: Deep learning for predicting the electronic-structure Hamiltonian of quantum systems must satisfy covariance laws; among these, achieving SO(3)-equivariance without sacrificing the non-linear expressive capability of networks remains unsolved. To reconcile equivariance and expressiveness, we propose a deep learning method synergizing two distinct categories o…

    Submitted 21 June, 2024; v1 submitted 1 January, 2024; originally announced January 2024.

  44. arXiv:2401.00616

    cs.CV

    GD^2-NeRF: Generative Detail Compensation via GAN and Diffusion for One-shot Generalizable Neural Radiance Fields

    Authors: Xiao Pan, Zongxin Yang, Shuai Bai, Yi Yang

    Abstract: In this paper, we focus on the One-shot Novel View Synthesis (O-NVS) task, which targets synthesizing photo-realistic novel views given only one reference image per scene. Previous One-shot Generalizable Neural Radiance Fields (OG-NeRF) methods solve this task without inference-time finetuning, yet suffer from blurry outputs due to an encoder-only architecture that relies heavily on the limi…

    Submitted 29 March, 2024; v1 submitted 31 December, 2023; originally announced January 2024.

    Comments: Submitted to Journal

  45. arXiv:2312.13449

    cs.CV

    Building Lane-Level Maps from Aerial Images

    Authors: Jiawei Yao, Xiaochao Pan, Tong Wu, Xiaofeng Zhang

    Abstract: Detecting lane lines from sensors is becoming an increasingly significant part of autonomous driving systems. However, less progress has been made on high-definition lane-level mapping based on aerial images, which could automatically build and update offline maps for autonomous driving systems. To this end, our work focuses on extracting fine-grained lane lines together with their topological…

    Submitted 20 December, 2023; originally announced December 2023.

    Comments: Accepted at ICASSP 2024. Project page: https://github.com/Jiawei-Yao0812/AerialLaneNet

  46. arXiv:2312.10103

    cs.CV

    GSVA: Generalized Segmentation via Multimodal Large Language Models

    Authors: Zhuofan Xia, Dongchen Han, Yizeng Han, Xuran Pan, Shiji Song, Gao Huang

    Abstract: Generalized Referring Expression Segmentation (GRES) extends the scope of classic RES to referring to multiple objects in one expression or identifying empty targets absent from the image. GRES poses challenges in modeling the complex spatial relationships of the instances in the image and identifying non-existing referents. Multimodal Large Language Models (MLLMs) have recently shown tremendous progre…

    Submitted 21 March, 2024; v1 submitted 14 December, 2023; originally announced December 2023.

    Comments: Accepted by CVPR 2024 (19 pages, 9 figures, 11 tables)

  47. arXiv:2312.09494

    cs.CR cs.CL

    No-Skim: Towards Efficiency Robustness Evaluation on Skimming-based Language Models

    Authors: Shengyao Zhang, Mi Zhang, Xudong Pan, Min Yang

    Abstract: To reduce the computation cost and energy consumption of large language models (LLMs), skimming-based acceleration dynamically drops unimportant tokens of the input sequence progressively along the layers of the LLM while preserving tokens of semantic importance. However, our work reveals for the first time that such acceleration may be vulnerable to Denial-of-Service (DoS) attacks. In this paper, we…

    Submitted 17 December, 2023; v1 submitted 14 December, 2023; originally announced December 2023.

  48. arXiv:2312.08618

    cs.CL

    Zebra: Extending Context Window with Layerwise Grouped Local-Global Attention

    Authors: Kaiqiang Song, Xiaoyang Wang, Sangwoo Cho, Xiaoman Pan, Dong Yu

    Abstract: This paper introduces a novel approach to enhance the capabilities of Large Language Models (LLMs) in processing and understanding extensive text sequences, a critical aspect in applications requiring deep comprehension and synthesis of large volumes of information. Recognizing the inherent challenges in extending the context window of LLMs, which are primarily built on the Transformer architecture, we propose…

    Submitted 13 December, 2023; originally announced December 2023.

  49. arXiv:2312.07409

    cs.CV

    DiffMorpher: Unleashing the Capability of Diffusion Models for Image Morphing

    Authors: Kaiwen Zhang, Yifan Zhou, Xudong Xu, Xingang Pan, Bo Dai

    Abstract: Diffusion models have achieved remarkable image generation quality surpassing previous generative models. However, a notable limitation of diffusion models, in comparison to GANs, is their difficulty in smoothly interpolating between two image samples, due to their highly unstructured latent space. Such a smooth interpolation is intriguing as it naturally serves as a solution for the image morphin…

    Submitted 12 December, 2023; originally announced December 2023.

  50. arXiv:2312.04916

    cs.LG cs.AI cs.DC

    EE-LLM: Large-Scale Training and Inference of Early-Exit Large Language Models with 3D Parallelism

    Authors: Yanxi Chen, Xuchen Pan, Yaliang Li, Bolin Ding, Jingren Zhou

    Abstract: We present EE-LLM, a framework for large-scale training and inference of early-exit large language models (LLMs). While recent works have shown preliminary evidence for the efficacy of early exiting in accelerating LLM inference, EE-LLM makes a foundational step towards scaling up early-exit LLMs by supporting their training and inference with massive 3D parallelism. Built upon Megatron-LM, EE-LLM…

    Submitted 16 June, 2024; v1 submitted 8 December, 2023; originally announced December 2023.

    Comments: ICML 2024 camera-ready version