Showing 1–50 of 214 results for author: Lu, P

Searching in archive cs.
  1. arXiv:2406.13444  [pdf, other]

    cs.CL cs.CV

    VDebugger: Harnessing Execution Feedback for Debugging Visual Programs

    Authors: Xueqing Wu, Zongyu Lin, Songyan Zhao, Te-Lin Wu, Pan Lu, Nanyun Peng, Kai-Wei Chang

    Abstract: Visual programs are executable code generated by large language models to address visual reasoning problems. They decompose complex questions into multiple reasoning steps and invoke specialized models for each step to solve the problems. However, these programs are prone to logic errors, with our preliminary evaluation showing that 58% of the total errors are caused by program logic errors. Debug…

    Submitted 27 June, 2024; v1 submitted 19 June, 2024; originally announced June 2024.

    Comments: update reference

  2. arXiv:2406.12799  [pdf, ps, other]

    cs.DS

    Sample-Based Matroid Prophet Inequalities

    Authors: Hu Fu, Pinyan Lu, Zhihao Gavin Tang, Hongxun Wu, Jinzhao Wu, Qianfan Zhang

    Abstract: We study matroid prophet inequalities when distributions are unknown and accessible only through samples. While single-sample prophet inequalities for special matroids are known, no constant-factor competitive algorithm with even a sublinear number of samples was known for general matroids. Adding more to the stake, the single-sample version of the question for general matroids has close (two-way)…

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: To appear at EC'24

  3. arXiv:2406.09411  [pdf, other]

    cs.CV cs.AI cs.CL

    MuirBench: A Comprehensive Benchmark for Robust Multi-image Understanding

    Authors: Fei Wang, Xingyu Fu, James Y. Huang, Zekun Li, Qin Liu, Xiaogeng Liu, Mingyu Derek Ma, Nan Xu, Wenxuan Zhou, Kai Zhang, Tianyi Lorena Yan, Wenjie Jacky Mo, Hsiang-Hui Liu, Pan Lu, Chunyuan Li, Chaowei Xiao, Kai-Wei Chang, Dan Roth, Sheng Zhang, Hoifung Poon, Muhao Chen

    Abstract: We introduce MuirBench, a comprehensive benchmark that focuses on robust multi-image understanding capabilities of multimodal LLMs. MuirBench consists of 12 diverse multi-image tasks (e.g., scene understanding, ordering) that involve 10 categories of multi-image relations (e.g., multiview, temporal relations). Comprising 11,264 images and 2,600 multiple-choice questions, MuirBench is created in a…

    Submitted 1 July, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

    Comments: typos corrected, references added, Project Page: https://muirbench.github.io/

  4. arXiv:2406.08552  [pdf, other]

    cs.CV

    DiTFastAttn: Attention Compression for Diffusion Transformer Models

    Authors: Zhihang Yuan, Pu Lu, Hanling Zhang, Xuefei Ning, Linfeng Zhang, Tianchen Zhao, Shengen Yan, Guohao Dai, Yu Wang

    Abstract: Diffusion Transformers (DiT) excel at image and video generation but face computational challenges due to self-attention's quadratic complexity. We propose DiTFastAttn, a novel post-training compression method to alleviate DiT's computational bottleneck. We identify three key redundancies in the attention computation during DiT inference: 1. spatial redundancy, where many attention heads focus on…

    Submitted 12 June, 2024; originally announced June 2024.

  5. arXiv:2405.19716  [pdf, other]

    cs.CV cs.CL

    Enhancing Large Vision Language Models with Self-Training on Image Comprehension

    Authors: Yihe Deng, Pan Lu, Fan Yin, Ziniu Hu, Sheng Shen, James Zou, Kai-Wei Chang, Wei Wang

    Abstract: Large vision language models (LVLMs) integrate large language models (LLMs) with pre-trained vision encoders, thereby activating the perception capability of the model to understand image inputs for different queries and conduct subsequent reasoning. Improving this capability requires high-quality vision-language data, which is costly and labor-intensive to acquire. Self-training approaches have b…

    Submitted 30 May, 2024; originally announced May 2024.

    Comments: 19 pages, 14 figures, 6 tables

  6. arXiv:2405.13872  [pdf, other]

    cs.AI cs.CL cs.CV

    Image-of-Thought Prompting for Visual Reasoning Refinement in Multimodal Large Language Models

    Authors: Qiji Zhou, Ruochen Zhou, Zike Hu, Panzhong Lu, Siyang Gao, Yue Zhang

    Abstract: Recent advancements in Chain-of-Thought (CoT) and related rationale-based works have significantly improved the performance of Large Language Models (LLMs) in complex reasoning tasks. With the evolution of Multimodal Large Language Models (MLLMs), enhancing their capability to tackle complex multimodal reasoning problems is a crucial frontier. However, incorporating multimodal rationales in CoT ha…

    Submitted 28 May, 2024; v1 submitted 22 May, 2024; originally announced May 2024.

    Comments: Correct the case title

  7. arXiv:2405.13089  [pdf, other]

    cs.LG

    SEGAN: semi-supervised learning approach for missing data imputation

    Authors: Xiaohua Pan, Weifeng Wu, Peiran Liu, Zhen Li, Peng Lu, Peijian Cao, Jianfeng Zhang, Xianfei Qiu, YangYang Wu

    Abstract: In many practical real-world applications, data missing is a very common phenomenon, making the development of data-driven artificial intelligence theory and technology increasingly difficult. Data completion is an important method for missing data preprocessing. Most existing missing data completion models directly use the known information in the missing data set but ignore the impact of the da…

    Submitted 12 June, 2024; v1 submitted 21 May, 2024; originally announced May 2024.

  8. arXiv:2404.10696  [pdf, other]

    cs.CL

    Integrating knowledge bases to improve coreference and bridging resolution for the chemical domain

    Authors: Pengcheng Lu, Massimo Poesio

    Abstract: Resolving coreference and bridging relations in chemical patents is important for better understanding the precise chemical process, where chemical domain knowledge is very critical. We proposed an approach incorporating external knowledge into a multi-task learning model for both coreference and bridging resolution in the chemical domain. The results show that integrating external knowledge can b…

    Submitted 16 April, 2024; originally announced April 2024.

    Comments: working in progress

  9. arXiv:2404.06252  [pdf, other]

    cs.GT

    Design and Characterization of Strategy-Proof Mechanisms for Two-Facility Game on a Line

    Authors: Pinyan Lu, Zihan Luo, Jialin Zhang

    Abstract: We focus on the problem of placing two facilities along a linear space to serve a group of agents. Each agent is committed to minimizing the distance between her location and the closest facility. A mechanism is an algorithm that maps the reported agent locations to the facility locations. We are interested in mechanisms without money that are deterministic, strategy-proof, and provide a bounded a…

    Submitted 9 April, 2024; v1 submitted 9 April, 2024; originally announced April 2024.

  10. arXiv:2403.19484  [pdf, other]

    cs.NE

    Improved Genetic Algorithm Based on Greedy and Simulated Annealing Ideas for Vascular Robot Ordering Strategy

    Authors: Zixi Wang, Yubo Huang, Yukai Zhang, Yifei Sheng, Xin Lai, Peng Lu

    Abstract: This study presents a comprehensive approach for optimizing the acquisition, utilization, and maintenance of ABLVR vascular robots in healthcare settings. Medical robotics, particularly in vascular treatments, necessitates precise resource allocation and optimization due to the complex nature of robot and operator maintenance. Traditional heuristic methods, though intuitive, often fail to achieve…

    Submitted 2 July, 2024; v1 submitted 28 March, 2024; originally announced March 2024.

    Comments: 17 pages

  11. arXiv:2403.14624  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    MathVerse: Does Your Multi-modal LLM Truly See the Diagrams in Visual Math Problems?

    Authors: Renrui Zhang, Dongzhi Jiang, Yichi Zhang, Haokun Lin, Ziyu Guo, Pengshuo Qiu, Aojun Zhou, Pan Lu, Kai-Wei Chang, Peng Gao, Hongsheng Li

    Abstract: The remarkable progress of Multi-modal Large Language Models (MLLMs) has garnered unparalleled attention, due to their superior performance in visual contexts. However, their capabilities in visual math problem-solving remain insufficiently evaluated and understood. We investigate current benchmarks to incorporate excessive visual content within textual questions, which potentially assist MLLMs in…

    Submitted 21 March, 2024; originally announced March 2024.

    Comments: 46 Pages, Work in Progress, Benchmark Project Page: https://mathverse-cuhk.github.io

  12. arXiv:2403.05225  [pdf, other]

    cs.HC

    Trust Recognition in Human-Robot Cooperation Using EEG

    Authors: Caiyue Xu, Changming Zhang, Yanmin Zhou, Zhipeng Wang, Ping Lu, Bin He

    Abstract: Collaboration between humans and robots is becoming increasingly crucial in our daily life. In order to accomplish efficient cooperation, trust recognition is vital, empowering robots to predict human behaviors and make trust-aware decisions. Consequently, there is an urgent need for a generalized approach to recognize human-robot trust. This study addresses this need by introducing an EEG-based m…

    Submitted 8 March, 2024; originally announced March 2024.

    Comments: Accepted at IEEE International Conference on Robotics and Automation (ICRA) 2024

  13. arXiv:2403.03740  [pdf, other]

    cs.CV cs.MM

    Self-supervised Photographic Image Layout Representation Learning

    Authors: Zhaoran Zhao, Peng Lu, Xujun Peng, Wenhao Guo

    Abstract: In the domain of image layout representation learning, the critical process of translating image layouts into succinct vector forms is increasingly significant across diverse applications, such as image retrieval, manipulation, and generation. Most approaches in this area heavily rely on costly labeled datasets and notably lack in adapting their modeling and learning methods to the specific nuance…

    Submitted 6 March, 2024; originally announced March 2024.

  14. arXiv:2403.00071  [pdf, other]

    cs.CL cs.AI

    Resonance RoPE: Improving Context Length Generalization of Large Language Models

    Authors: Suyuchen Wang, Ivan Kobyzev, Peng Lu, Mehdi Rezagholizadeh, Bang Liu

    Abstract: This paper addresses the challenge of train-short-test-long (TSTL) scenarios in Large Language Models (LLMs) equipped with Rotary Position Embedding (RoPE), where models pre-trained on shorter sequences face difficulty with out-of-distribution (OOD) token positions in longer sequences. We introduce Resonance RoPE, a novel approach designed to narrow the generalization gap in TSTL scenarios by refi…

    Submitted 10 June, 2024; v1 submitted 29 February, 2024; originally announced March 2024.

    Comments: 13 pages, 4 figures, accepted at ACL 2024 Findings

  15. arXiv:2402.17644  [pdf, other]

    cs.CL cs.AI

    Are LLMs Capable of Data-based Statistical and Causal Reasoning? Benchmarking Advanced Quantitative Reasoning with Data

    Authors: Xiao Liu, Zirui Wu, Xueqing Wu, Pan Lu, Kai-Wei Chang, Yansong Feng

    Abstract: Quantitative reasoning is a critical skill to analyze data, yet the assessment of such ability remains limited. To address this gap, we introduce the Quantitative Reasoning with Data (QRData) benchmark, aiming to evaluate Large Language Models' capability in statistical and causal reasoning with real-world data. The benchmark comprises a carefully constructed dataset of 411 questions accompanied b…

    Submitted 9 June, 2024; v1 submitted 27 February, 2024; originally announced February 2024.

    Comments: Findings of ACL 2024. Project website: https://xxxiaol.github.io/QRData/

  16. arXiv:2402.05935  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models

    Authors: Dongyang Liu, Renrui Zhang, Longtian Qiu, Siyuan Huang, Weifeng Lin, Shitian Zhao, Shijie Geng, Ziyi Lin, Peng Jin, Kaipeng Zhang, Wenqi Shao, Chao Xu, Conghui He, Junjun He, Hao Shao, Pan Lu, Hongsheng Li, Yu Qiao, Peng Gao

    Abstract: We propose SPHINX-X, an extensive Multimodality Large Language Model (MLLM) series developed upon SPHINX. To improve the architecture and training efficiency, we modify the SPHINX framework by removing redundant visual encoders, bypassing fully-padded sub-images with skip tokens, and simplifying multi-stage training into a one-stage all-in-one paradigm. To fully unleash the potential of MLLMs, we…

    Submitted 26 June, 2024; v1 submitted 8 February, 2024; originally announced February 2024.

    Comments: Accepted by ICML 2024. Code and models are released at https://github.com/Alpha-VLLM/LLaMA2-Accessory

  17. Local Feature Matching Using Deep Learning: A Survey

    Authors: Shibiao Xu, Shunpeng Chen, Rongtao Xu, Changwei Wang, Peng Lu, Li Guo

    Abstract: Local feature matching enjoys wide-ranging applications in the realm of computer vision, encompassing domains such as image retrieval, 3D reconstruction, and object recognition. However, challenges persist in improving the accuracy and robustness of matching due to factors like viewpoint and lighting variations. In recent years, the introduction of deep learning models has sparked widespread explo…

    Submitted 10 March, 2024; v1 submitted 30 January, 2024; originally announced January 2024.

    Comments: Accepted by Information Fusion 2024. Project page: https://github.com/vignywang/Awesome-Local-Feature-Matching

  18. arXiv:2401.04700  [pdf, other]

    cs.CL

    Model Editing Harms General Abilities of Large Language Models: Regularization to the Rescue

    Authors: Jia-Chen Gu, Hao-Xiang Xu, Jun-Yu Ma, Pan Lu, Zhen-Hua Ling, Kai-Wei Chang, Nanyun Peng

    Abstract: Model editing is a technique that edits the large language models (LLMs) with updated knowledge to alleviate hallucinations without resource-intensive retraining. While current model editing methods can effectively modify a model's behavior within a specific area of interest, they often overlook the potential unintended side effects on the general abilities of LLMs such as reasoning, natural langu…

    Submitted 16 June, 2024; v1 submitted 9 January, 2024; originally announced January 2024.

    Comments: Propose a new regularization method

  19. arXiv:2312.11911  [pdf, other]

    cs.CV cs.RO

    EVI-SAM: Robust, Real-time, Tightly-coupled Event-Visual-Inertial State Estimation and 3D Dense Mapping

    Authors: Weipeng Guan, Peiyu Chen, Huibin Zhao, Yu Wang, Peng Lu

    Abstract: Event cameras are bio-inspired, motion-activated sensors that demonstrate substantial potential in handling challenging situations, such as motion blur and high-dynamic range. In this paper, we proposed EVI-SAM to tackle the problem of 6 DoF pose tracking and 3D reconstruction using monocular event camera. A novel event-based hybrid tracking framework is designed to estimate the pose, leveraging t…

    Submitted 23 May, 2024; v1 submitted 19 December, 2023; originally announced December 2023.

  20. arXiv:2312.08743  [pdf, other]

    cs.RO eess.SY

    FAPP: Fast and Adaptive Perception and Planning for UAVs in Dynamic Cluttered Environments

    Authors: Minghao Lu, Xiyu Fan, Han Chen, Peng Lu

    Abstract: Obstacle avoidance for Unmanned Aerial Vehicles (UAVs) in cluttered environments is significantly challenging. Existing obstacle avoidance for UAVs either focuses on fully static environments or static environments with only a few dynamic objects. In this paper, we take the initiative to consider the obstacle avoidance of UAVs in dynamic cluttered environments in which dynamic objects are the domi…

    Submitted 14 December, 2023; originally announced December 2023.

  21. arXiv:2312.07526  [pdf, other]

    cs.CV

    RTMO: Towards High-Performance One-Stage Real-Time Multi-Person Pose Estimation

    Authors: Peng Lu, Tao Jiang, Yining Li, Xiangtai Li, Kai Chen, Wenming Yang

    Abstract: Real-time multi-person pose estimation presents significant challenges in balancing speed and precision. While two-stage top-down methods slow down as the number of people in the image increases, existing one-stage methods often fail to simultaneously deliver high accuracy and real-time performance. This paper introduces RTMO, a one-stage pose estimation framework that seamlessly integrates coordi…

    Submitted 8 April, 2024; v1 submitted 12 December, 2023; originally announced December 2023.

    Comments: Accepted at CVPR 2024. Project page: https://github.com/open-mmlab/mmpose/tree/main/projects/rtmo

  22. arXiv:2312.00949  [pdf, other]

    cs.CL math.OC

    Hyperparameter Optimization for Large Language Model Instruction-Tuning

    Authors: Christophe Tribes, Sacha Benarroch-Lelong, Peng Lu, Ivan Kobyzev

    Abstract: The fine-tuning of Large Language Models (LLMs) has enabled them to recently achieve milestones in natural language processing applications. The emergence of ever larger LLMs has paved the way for more efficient fine-tuning methods. Among these, the Low-Rank Adaptation (LoRA) method keeps most of the weights of the pre-trained LLM frozen while introducing a low-rank decomposition of the weight mat…

    Submitted 30 January, 2024; v1 submitted 1 December, 2023; originally announced December 2023.

  23. arXiv:2312.00111  [pdf, other]

    cs.LG cond-mat.mtrl-sci

    Multimodal Learning for Materials

    Authors: Viggo Moro, Charlotte Loh, Rumen Dangovski, Ali Ghorashi, Andrew Ma, Zhuo Chen, Samuel Kim, Peter Y. Lu, Thomas Christensen, Marin Soljačić

    Abstract: Artificial intelligence is transforming computational materials science, improving the prediction of material properties, and accelerating the discovery of novel materials. Recently, publicly available material data repositories have grown rapidly. This growth encompasses not only more materials, but also a greater variety and quantity of their associated properties. Existing machine learning effo…

    Submitted 12 April, 2024; v1 submitted 30 November, 2023; originally announced December 2023.

    Comments: 11 pages, 4 figures

  24. arXiv:2311.02327  [pdf, other]

    cs.RO cs.DB

    ECMD: An Event-Centric Multisensory Driving Dataset for SLAM

    Authors: Peiyu Chen, Weipeng Guan, Feng Huang, Yihan Zhong, Weisong Wen, Li-Ta Hsu, Peng Lu

    Abstract: Leveraging multiple sensors enhances complex environmental perception and increases resilience to varying luminance conditions and high-speed motion patterns, achieving precise localization and mapping. This paper proposes, ECMD, an event-centric multisensory dataset containing 81 sequences and covering over 200 km of various challenging driving scenarios including high-speed motion, repetitive sc…

    Submitted 4 November, 2023; originally announced November 2023.

  25. arXiv:2310.11954  [pdf, other]

    cs.CL cs.MM eess.AS

    MusicAgent: An AI Agent for Music Understanding and Generation with Large Language Models

    Authors: Dingyao Yu, Kaitao Song, Peiling Lu, Tianyu He, Xu Tan, Wei Ye, Shikun Zhang, Jiang Bian

    Abstract: AI-empowered music processing is a diverse field that encompasses dozens of tasks, ranging from generation tasks (e.g., timbre synthesis) to comprehension tasks (e.g., music classification). For developers and amateurs, it is very difficult to grasp all of these task to satisfy their requirements in music processing, especially considering the huge differences in the representations of music data…

    Submitted 25 October, 2023; v1 submitted 18 October, 2023; originally announced October 2023.

  26. arXiv:2310.10967  [pdf, other]

    cs.CL cs.AI cs.HC

    EXMODD: An EXplanatory Multimodal Open-Domain Dialogue dataset

    Authors: Hang Yin, Pinren Lu, Ziang Li, Bin Sun, Kan Li

    Abstract: The need for high-quality data has been a key issue hindering the research of dialogue tasks. Recent studies try to build datasets through manual, web crawling, and large pre-trained models. However, man-made data is expensive and data collected from the internet often includes generic responses, meaningless statements, and toxic dialogues. Automatic data generation through large models is a cost-…

    Submitted 16 October, 2023; originally announced October 2023.

  27. arXiv:2310.02995   

    cs.LG

    IBCL: Zero-shot Model Generation for Task Trade-offs in Continual Learning

    Authors: Pengyuan Lu, Michele Caprio, Eric Eaton, Insup Lee

    Abstract: Like generic multi-task learning, continual learning has the nature of multi-objective optimization, and therefore faces a trade-off between the performance of different tasks. That is, to optimize for the current task distribution, it may need to compromise performance on some previous tasks. This means that there exist multiple models that are Pareto-optimal at different times, each addressing a…

    Submitted 9 October, 2023; v1 submitted 4 October, 2023; originally announced October 2023.

    Comments: This should be a replacement for arXiv:2305.14782. I falsely submitted a new paper

  28. arXiv:2310.02255  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    MathVista: Evaluating Mathematical Reasoning of Foundation Models in Visual Contexts

    Authors: Pan Lu, Hritik Bansal, Tony Xia, Jiacheng Liu, Chunyuan Li, Hannaneh Hajishirzi, Hao Cheng, Kai-Wei Chang, Michel Galley, Jianfeng Gao

    Abstract: Large Language Models (LLMs) and Large Multimodal Models (LMMs) exhibit impressive problem-solving skills in many tasks and domains, but their ability in mathematical reasoning in visual contexts has not been systematically studied. To bridge this gap, we present MathVista, a benchmark designed to combine challenges from diverse mathematical and visual tasks. It consists of 6,141 examples, derived…

    Submitted 20 January, 2024; v1 submitted 3 October, 2023; originally announced October 2023.

    Comments: 116 pages, 120 figures. Accepted to ICLR 2024

  29. arXiv:2309.15414  [pdf, other]

    cs.GT

    Competitive Auctions with Imperfect Predictions

    Authors: Pinyan Lu, Zongqi Wan, Jialin Zhang

    Abstract: The competitive auction was first proposed by Goldberg, Hartline, and Wright. In their paper, they introduce the competitive analysis framework of online algorithm designing into the traditional revenue-maximizing auction design problem. While the competitive analysis framework only cares about the worst-case bound, a growing body of work in the online algorithm community studies the learning-augm…

    Submitted 18 June, 2024; v1 submitted 27 September, 2023; originally announced September 2023.

    Comments: Accepted by EC 2024. Improved the error-tolerant results

  30. arXiv:2309.04735  [pdf, other]

    cs.CC

    Two-State Spin Systems with Negative Interactions

    Authors: Yumou Fei, Leslie Ann Goldberg, Pinyan Lu

    Abstract: We study the approximability of computing the partition functions of two-state spin systems. The problem is parameterized by a $2\times 2$ symmetric matrix. Previous results on this problem were restricted either to the case where the matrix has non-negative entries, or to the case where the diagonal entries are equal, i.e. Ising models. In this paper, we study the generalization to arbitrary…

    Submitted 21 November, 2023; v1 submitted 9 September, 2023; originally announced September 2023.

  31. arXiv:2309.00894  [pdf, other]

    cs.LG cs.AI

    Regularly Truncated M-estimators for Learning with Noisy Labels

    Authors: Xiaobo Xia, Pengqian Lu, Chen Gong, Bo Han, Jun Yu, Jun Yu, Tongliang Liu

    Abstract: The sample selection approach is very popular in learning with noisy labels. As deep networks learn pattern first, prior methods built on sample selection share a similar training procedure: the small-loss examples can be regarded as clean examples and used for helping generalization, while the large-loss examples are treated as mislabeled ones and excluded from network parameter updates. However,…

    Submitted 2 September, 2023; originally announced September 2023.

    Comments: 16 pages, 11 tables, 9 figures

  32. arXiv:2308.07611  [pdf]

    eess.IV cs.CV q-bio.NC

    GAMER-MRIL identifies Disability-Related Brain Changes in Multiple Sclerosis

    Authors: Po-Jui Lu, Benjamin Odry, Muhamed Barakovic, Matthias Weigel, Robin Sandkühler, Reza Rahmanzadeh, Xinjie Chen, Mario Ocampo-Pineda, Jens Kuhle, Ludwig Kappos, Philippe Cattin, Cristina Granziera

    Abstract: Objective: Identifying disability-related brain changes is important for multiple sclerosis (MS) patients. Currently, there is no clear understanding about which pathological features drive disability in single MS patients. In this work, we propose a novel comprehensive approach, GAMER-MRIL, leveraging whole-brain quantitative MRI (qMRI), convolutional neural network (CNN), and an interpretability…

    Submitted 15 August, 2023; originally announced August 2023.

  33. arXiv:2308.04639  [pdf, ps, other]

    cs.AI

    A Hierarchical Destroy and Repair Approach for Solving Very Large-Scale Travelling Salesman Problem

    Authors: Zhang-Hua Fu, Sipeng Sun, Jintong Ren, Tianshu Yu, Haoyu Zhang, Yuanyuan Liu, Lingxiao Huang, Xiang Yan, Pinyan Lu

    Abstract: For prohibitively large-scale Travelling Salesman Problems (TSPs), existing algorithms face big challenges in terms of both computational efficiency and solution quality. To address this issue, we propose a hierarchical destroy-and-repair (HDR) approach, which attempts to improve an initial solution by applying a series of carefully designed destroy-and-repair operations. A key innovative concept…

    Submitted 8 August, 2023; originally announced August 2023.

  34. arXiv:2307.10635  [pdf, other]

    cs.CL cs.AI cs.LG

    SciBench: Evaluating College-Level Scientific Problem-Solving Abilities of Large Language Models

    Authors: Xiaoxuan Wang, Ziniu Hu, Pan Lu, Yanqiao Zhu, Jieyu Zhang, Satyen Subramaniam, Arjun R. Loomba, Shichang Zhang, Yizhou Sun, Wei Wang

    Abstract: Most of the existing Large Language Model (LLM) benchmarks on scientific problem reasoning focus on problems grounded in high-school subjects and are confined to elementary algebraic operations. To systematically examine the reasoning capabilities required for solving complex scientific problems, we introduce an expansive benchmark suite SciBench for LLMs. SciBench contains a carefully curated dat…

    Submitted 28 June, 2024; v1 submitted 20 July, 2023; originally announced July 2023.

    Comments: To appear at ICML 2024

  35. arXiv:2307.08209  [pdf, other]

    cs.CV

    Ada3D : Exploiting the Spatial Redundancy with Adaptive Inference for Efficient 3D Object Detection

    Authors: Tianchen Zhao, Xuefei Ning, Ke Hong, Zhongyuan Qiu, Pu Lu, Yali Zhao, Linfeng Zhang, Lipu Zhou, Guohao Dai, Huazhong Yang, Yu Wang

    Abstract: Voxel-based methods have achieved state-of-the-art performance for 3D object detection in autonomous driving. However, their significant computational and memory costs pose a challenge for their application to resource-constrained vehicles. One reason for this high resource consumption is the presence of a large number of redundant background points in Lidar point clouds, resulting in spatial redu…

    Submitted 8 August, 2023; v1 submitted 16 July, 2023; originally announced July 2023.

    Comments: Accepted at ICCV2023

  36. arXiv:2307.04302  [pdf, ps, other]

    cs.GT

    Auction Design for Value Maximizers with Budget and Return-on-spend Constraints

    Authors: Pinyan Lu, Chenyang Xu, Ruilong Zhang

    Abstract: The paper designs revenue-maximizing auction mechanisms for agents who aim to maximize their total obtained values rather than the classical quasi-linear utilities. Several models have been proposed to capture the behaviors of such agents in the literature. In the paper, we consider the model where agents are subject to budget and return-on-spend constraints. The budget constraint of an agent limi…

    Submitted 9 July, 2023; originally announced July 2023.

    Comments: 29 pages

  37. arXiv:2307.01229  [pdf, other]

    cs.SD cs.AI cs.LG cs.MM eess.AS

    EmoGen: Eliminating Subjective Bias in Emotional Music Generation

    Authors: Chenfei Kang, Peiling Lu, Botao Yu, Xu Tan, Wei Ye, Shikun Zhang, Jiang Bian

    Abstract: Music is used to convey emotions, and thus generating emotional music is important in automatic music generation. Previous work on emotional music generation directly uses annotated emotion labels as control signals, which suffers from subjective bias: different people may annotate different emotions on the same music, and one person may feel different emotions under different situations. Therefor…

    Submitted 3 July, 2023; originally announced July 2023.

    Comments: 12 pages, 7 pages

  38. arXiv:2306.08954  [pdf, other]

    cs.LG

    Re-Benchmarking Pool-Based Active Learning for Binary Classification

    Authors: Po-Yi Lu, Chun-Liang Li, Hsuan-Tien Lin

    Abstract: Active learning is a paradigm that significantly enhances the performance of machine learning models when acquiring labeled data is expensive. While several benchmarks exist for evaluating active learning strategies, their findings exhibit some misalignment. This discrepancy motivates us to develop a transparent and reproducible benchmark for the community. Our efforts result in an open-sourced im…

    Submitted 23 September, 2023; v1 submitted 15 June, 2023; originally announced June 2023.

  39. arXiv:2306.07207  [pdf, other]

    cs.CV cs.AI cs.CL

    Valley: Video Assistant with Large Language model Enhanced abilitY

    Authors: Ruipu Luo, Ziwang Zhao, Min Yang, Junwei Dong, Da Li, Pengcheng Lu, Tao Wang, Linmei Hu, Minghui Qiu, Zhongyu Wei

    Abstract: Large language models (LLMs), with their remarkable conversational capabilities, have demonstrated impressive performance across various applications and have emerged as formidable AI assistants. In view of this, it raises an intuitive question: Can we harness the power of LLMs to build multimodal AI assistants for visual applications? Recently, several multi-modal models have been developed for t…

    Submitted 8 October, 2023; v1 submitted 12 June, 2023; originally announced June 2023.

  40. arXiv:2306.06955  [pdf, other]

    cs.LG

    A Brief Review of Hypernetworks in Deep Learning

    Authors: Vinod Kumar Chauhan, Jiandong Zhou, Ping Lu, Soheila Molaei, David A. Clifton

    Abstract: Hypernetworks, or hypernets in short, are neural networks that generate weights for another neural network, known as the target network. They have emerged as a powerful deep learning technique that allows for greater flexibility, adaptability, dynamism, faster training, information sharing, and model compression etc. Hypernets have shown promising results in a variety of deep learning problems, in…

    Submitted 10 August, 2023; v1 submitted 12 June, 2023; originally announced June 2023.

    Comments: revised categorisation, added new Section '5 When can we use Hypernets?', and other corrections(2 figures and 2 tables) (under review)

  41. arXiv:2306.01665  [pdf, other]

    cs.SE cs.AI

    SourceP: Detecting Ponzi Schemes on Ethereum with Source Code

    Authors: Pengcheng Lu, Liang Cai, Keting Yin

    Abstract: As blockchain technology becomes more and more popular, a typical financial scam, the Ponzi scheme, has also emerged in the blockchain platform Ethereum. This Ponzi scheme deployed through smart contracts, also known as the smart Ponzi scheme, has caused a lot of economic losses and negative impacts. Existing methods for detecting smart Ponzi schemes on Ethereum mainly rely on bytecode features, o…

    Submitted 29 February, 2024; v1 submitted 2 June, 2023; originally announced June 2023.

    Comments: 12 pages, 5 figures, 4 tables

  42. arXiv:2306.01187  [pdf, other]

    cs.LG math.DS

    Training neural operators to preserve invariant measures of chaotic attractors

    Authors: Ruoxi Jiang, Peter Y. Lu, Elena Orlova, Rebecca Willett

    Abstract: Chaotic systems make long-horizon forecasts difficult because small perturbations in initial conditions cause trajectories to diverge at an exponential rate. In this setting, neural operators trained to minimize squared error losses, while capable of accurate short-term forecasts, often fail to reproduce statistical or structural properties of the dynamics over longer time horizons and can yield d…

    Submitted 16 April, 2024; v1 submitted 1 June, 2023; originally announced June 2023.

    Comments: Accepted at NeurIPS 2023

  43. arXiv:2306.00110  [pdf, other]

    cs.SD cs.AI cs.CL cs.LG cs.MM eess.AS

    MuseCoco: Generating Symbolic Music from Text

    Authors: Peiling Lu, Xin Xu, Chenfei Kang, Botao Yu, Chengyi Xing, Xu Tan, Jiang Bian

    Abstract: Generating music from text descriptions is a user-friendly mode since the text is a relatively easy interface for user engagement. While some approaches utilize texts to control music audio generation, editing musical elements in generated audio is challenging for users. In contrast, symbolic music offers ease of editing, making it more accessible for users to manipulate specific musical elements.…

    Submitted 31 May, 2023; originally announced June 2023.

  44. arXiv:2305.19685  [pdf, other]

    cs.LG quant-ph stat.ML

    Deep Stochastic Mechanics

    Authors: Elena Orlova, Aleksei Ustimenko, Ruoxi Jiang, Peter Y. Lu, Rebecca Willett

    Abstract: This paper introduces a novel deep-learning-based approach for numerical simulation of a time-evolving Schrödinger equation inspired by stochastic mechanics and generative diffusion models. Unlike existing approaches, which exhibit computational complexity that scales exponentially in the problem dimension, our method allows us to adapt to the latent low-dimensional structure of the wave function…

    Submitted 4 June, 2024; v1 submitted 31 May, 2023; originally announced May 2023.

  45. arXiv:2305.14782  [pdf, other]

    cs.LG

    IBCL: Zero-shot Model Generation for Task Trade-offs in Continual Learning

    Authors: Pengyuan Lu, Michele Caprio, Eric Eaton, Insup Lee

    Abstract: Like generic multi-task learning, continual learning has the nature of multi-objective optimization, and therefore faces a trade-off between the performance of different tasks. That is, to optimize for the current task distribution, it may need to compromise performance on some previous tasks. This means that there exist multiple models that are Pareto-optimal at different times, each addressing a…

    Submitted 9 October, 2023; v1 submitted 24 May, 2023; originally announced May 2023.

    Comments: We falsely submitted this as a new article at 2310.02995. That article is withdrawn. This is the correct submission (to replace a previous version)

  46. arXiv:2305.12524  [pdf, other]

    cs.CL cs.AI

    TheoremQA: A Theorem-driven Question Answering dataset

    Authors: Wenhu Chen, Ming Yin, Max Ku, Pan Lu, Yixin Wan, Xueguang Ma, Jianyu Xu, Xinyi Wang, Tony Xia

    Abstract: The recent LLMs like GPT-4 and PaLM-2 have made tremendous progress in solving fundamental math problems like GSM8K by achieving over 90% accuracy. However, their capabilities to solve more challenging math problems which require domain-specific knowledge (i.e. theorem) have yet to be investigated. In this paper, we introduce TheoremQA, the first theorem-driven question-answering dataset designed…

    Submitted 5 December, 2023; v1 submitted 21 May, 2023; originally announced May 2023.

    Comments: Accepted to Main Conference of EMNLP 2023

  47. arXiv:2305.10841  [pdf, other]

    cs.SD cs.LG cs.MM eess.AS

    GETMusic: Generating Any Music Tracks with a Unified Representation and Diffusion Framework

    Authors: Ang Lv, Xu Tan, Peiling Lu, Wei Ye, Shikun Zhang, Jiang Bian, Rui Yan

    Abstract: Symbolic music generation aims to create musical notes, which can help users compose music, such as generating target instrument tracks based on provided source tracks. In practical scenarios where there's a predefined ensemble of tracks and various composition needs, an efficient and effective generative model that can generate any target tracks based on the other tracks becomes crucial. However,…

    Submitted 29 September, 2023; v1 submitted 18 May, 2023; originally announced May 2023.

    Comments: 13 pages, 4 figures

  48. arXiv:2305.08178  [pdf, other]

    cs.RO cs.AI

    Path Planning for Air-Ground Robot Considering Modal Switching Point Optimization

    Authors: Xiaoyu Wang, Kangyao Huang, Xinyu Zhang, Honglin Sun, Wenzhuo Liu, Hua** Liu, Jun Li, **** Lu

    Abstract: An innovative sort of mobility platform that can both drive and fly is the air-ground robot. The need for an agile flight cannot be satisfied by traditional path planning techniques for air-ground robots. Prior studies had mostly focused on improving the energy efficiency of paths, seldom taking the seeking speed and optimizing take-off and landing places into account. A robot for the field applic…

    Submitted 14 May, 2023; originally announced May 2023.

  49. arXiv:2305.04971  [pdf, other]

    cs.LG cs.CL

    LABO: Towards Learning Optimal Label Regularization via Bi-level Optimization

    Authors: Peng Lu, Ahmad Rashid, Ivan Kobyzev, Mehdi Rezagholizadeh, Philippe Langlais

    Abstract: Regularization techniques are crucial to improving the generalization performance and training efficiency of deep neural networks. Many deep learning algorithms rely on weight decay, dropout, batch/layer normalization to converge faster and generalize. Label Smoothing (LS) is another simple, versatile and efficient regularization which can be applied to various supervised classification tasks. Con…

    Submitted 8 May, 2023; originally announced May 2023.

    Comments: Accepted at ACL 2023 (Findings)

  50. arXiv:2305.01795  [pdf, other]

    cs.CL

    Multimodal Procedural Planning via Dual Text-Image Prompting

    Authors: Yujie Lu, Pan Lu, Zhiyu Chen, Wanrong Zhu, Xin Eric Wang, William Yang Wang

    Abstract: Embodied agents have achieved prominent performance in following human instructions to complete tasks. However, the potential of providing instructions informed by texts and images to assist humans in completing tasks remains underexplored. To uncover this capability, we present the multimodal procedural planning (MPP) task, in which models are given a high-level goal and generate plans of paired…

    Submitted 2 May, 2023; originally announced May 2023.