Skip to main content

Showing 1–50 of 500 results for author: He, H

Searching in archive cs. Search in all archives.
.
  1. arXiv:2407.01067  [pdf, other

    cs.AI cs.CL cs.CV cs.HC cs.LG

    Human-like object concept representations emerge naturally in multimodal large language models

    Authors: Changde Du, Kaicheng Fu, Bincheng Wen, Yi Sun, Jie Peng, Wei Wei, Ying Gao, Shengpei Wang, Chuncheng Zhang, **peng Li, Shuang Qiu, Le Chang, Huiguang He

    Abstract: The conceptualization and categorization of natural objects in the human mind have long intrigued cognitive scientists and neuroscientists, offering crucial insights into human perception and cognition. Recently, the rapid development of Large Language Models (LLMs) has raised the attractive question of whether these models can also develop human-like object representations through exposure to vas… ▽ More

    Submitted 1 July, 2024; originally announced July 2024.

  2. arXiv:2406.16087  [pdf, other

    cs.RO cs.AI cs.CV cs.LG

    Imperative Learning: A Self-supervised Neural-Symbolic Learning Framework for Robot Autonomy

    Authors: Chen Wang, Kaiyi Ji, Junyi Geng, Zhongqiang Ren, Taimeng Fu, Fan Yang, Yifan Guo, Haonan He, Xiangyu Chen, Zitong Zhan, Qiwei Du, Shaoshu Su, Bowen Li, Yuheng Qiu, Yi Du, Qihang Li, Yifan Yang, Xiao Lin, Zhipeng Zhao

    Abstract: Data-driven methods such as reinforcement and imitation learning have achieved remarkable success in robot autonomy. However, their data-centric nature still hinders them from generalizing well to ever-changing environments. Moreover, collecting large datasets for robotic tasks is often impractical and expensive. To overcome these challenges, we introduce a new self-supervised neural-symbolic (NeS… ▽ More

    Submitted 23 June, 2024; originally announced June 2024.

  3. arXiv:2406.14136  [pdf, other

    cs.RO

    One Fling to Goal: Environment-aware Dynamics for Goal-conditioned Fabric Flinging

    Authors: Linhan Yang, Lei Yang, Haoran Sun, Zeqing Zhang, Haibin He, Fang Wan, Chaoyang Song, Jia Pan

    Abstract: Fabric manipulation dynamically is commonly seen in manufacturing and domestic settings. While dynamically manipulating a fabric piece to reach a target state is highly efficient, this task presents considerable challenges due to the varying properties of different fabrics, complex dynamics when interacting with environments, and meeting required goal conditions. To address these challenges, we pr… ▽ More

    Submitted 20 June, 2024; originally announced June 2024.

  4. arXiv:2406.14132  [pdf, other

    cs.AI

    Enhancing Monotonic Modeling with Spatio-Temporal Adaptive Awareness in Diverse Marketing

    Authors: Bin Li, Jiayan Pei, Feiyang Xiao, Yifan Zhao, Zhixing Zhang, Diwei Liu, HengXu He, Jia Jia

    Abstract: In the mobile internet era, the Online Food Ordering Service (OFOS) emerges as an integral component of inclusive finance owing to the convenience it brings to people. OFOS platforms offer dynamic allocation incentives to users and merchants through diverse marketing campaigns to encourage payments while maintaining the platforms' budget efficiency. Despite significant progress, the marketing doma… ▽ More

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: 7 pages

  5. arXiv:2406.13583  [pdf, other

    cs.CV

    Low-Rank Mixture-of-Experts for Continual Medical Image Segmentation

    Authors: Qian Chen, Lei Zhu, Hangzhou He, Xinliang Zhang, Shuang Zeng, Qiushi Ren, Yanye Lu

    Abstract: The primary goal of continual learning (CL) task in medical image segmentation field is to solve the "catastrophic forgetting" problem, where the model totally forgets previously learned features when it is extended to new categories (class-level) or tasks (task-level). Due to the privacy protection, the historical data labels are inaccessible. Prevalent continual learning methods primarily focus… ▽ More

    Submitted 19 June, 2024; originally announced June 2024.

  6. arXiv:2406.13358  [pdf, other

    cs.CV eess.IV

    Multi-scale Restoration of Missing Data in Optical Time-series Images with Masked Spatial-Temporal Attention Network

    Authors: Zaiyan Zhang, **ing Yan, Yuanqi Liang, Jiaxin Feng, Haixu He, Wei Han

    Abstract: Due to factors such as thick cloud cover and sensor limitations, remote sensing images often suffer from significant missing data, resulting in incomplete time-series information. Existing methods for imputing missing values in remote sensing images do not fully exploit spatio-temporal auxiliary information, leading to limited accuracy in restoration. Therefore, this paper proposes a novel deep le… ▽ More

    Submitted 19 June, 2024; originally announced June 2024.

  7. arXiv:2406.12158  [pdf, other

    cs.CL cs.AI

    LLMs Are Prone to Fallacies in Causal Inference

    Authors: Nitish Joshi, Abulhair Saparov, Yixin Wang, He He

    Abstract: Recent work shows that causal facts can be effectively extracted from LLMs through prompting, facilitating the creation of causal graphs for causal inference tasks. However, it is unclear if this success is limited to explicitly-mentioned causal facts in the pretraining data which the model can memorize. Thus, this work investigates: Can LLMs infer causal relations from other relational data in te… ▽ More

    Submitted 17 June, 2024; originally announced June 2024.

  8. arXiv:2406.07529  [pdf, other

    cs.LG

    MAP: Low-compute Model Merging with Amortized Pareto Fronts via Quadratic Approximation

    Authors: Lu Li, Tianyu Zhang, Zhiqi Bu, Suyuchen Wang, Huan He, Jie Fu, Yonghui Wu, Jiang Bian, Yong Chen, Yoshua Bengio

    Abstract: Model merging has emerged as an effective approach to combine multiple single-task models, fine-tuned from the same pre-trained model, into a multitask model. This process typically involves computing a weighted average of the model parameters without any additional training. Existing model-merging methods focus on enhancing average task accuracy. However, interference and conflicts between the ob… ▽ More

    Submitted 18 June, 2024; v1 submitted 11 June, 2024; originally announced June 2024.

  9. arXiv:2406.05452  [pdf, other

    eess.SP cs.IT

    Near-Field Channel Estimation for Extremely Large-Scale Terahertz Communications

    Authors: Songjie Yang, Yizhou Peng, Wanting Lyu, Ya Li, Hongjun He, Zhongpei Zhang, Chau Yuen

    Abstract: Future Terahertz communications exhibit significant potential in accommodating ultra-high-rate services. Employing extremely large-scale array antennas is a key approach to realize this potential, as they can harness substantial beamforming gains to overcome the severe path loss and leverage the electromagnetic advantages in the near field. This paper proposes novel estimation methods designed to… ▽ More

    Submitted 8 June, 2024; originally announced June 2024.

  10. arXiv:2406.03262  [pdf, other

    cs.CV

    ADer: A Comprehensive Benchmark for Multi-class Visual Anomaly Detection

    Authors: Jiangning Zhang, Haoyang He, Zhenye Gan, Qingdong He, Yuxuan Cai, Zhucun Xue, Yabiao Wang, Chengjie Wang, Lei Xie, Yong Liu

    Abstract: Visual anomaly detection aims to identify anomalous regions in images through unsupervised learning paradigms, with increasing application demand and value in fields such as industrial inspection and medical lesion detection. Despite significant progress in recent years, there is a lack of comprehensive benchmarks to adequately evaluate the performance of various mainstream methods across differen… ▽ More

    Submitted 6 June, 2024; v1 submitted 5 June, 2024; originally announced June 2024.

  11. arXiv:2406.02578  [pdf, other

    cs.LG

    Pretrained Mobility Transformer: A Foundation Model for Human Mobility

    Authors: Xinhua Wu, Haoyu He, Yanchao Wang, Qi Wang

    Abstract: Ubiquitous mobile devices are generating vast amounts of location-based service data that reveal how individuals navigate and utilize urban spaces in detail. In this study, we utilize these extensive, unlabeled sequences of user trajectories to develop a foundation model for understanding urban space and human mobility. We introduce the \textbf{P}retrained \textbf{M}obility \textbf{T}ransformer (P… ▽ More

    Submitted 28 May, 2024; originally announced June 2024.

  12. arXiv:2406.02213  [pdf, other

    cs.LG

    Rectifying Reinforcement Learning for Reward Matching

    Authors: Haoran He, Emmanuel Bengio, Qingpeng Cai, Ling Pan

    Abstract: The Generative Flow Network (GFlowNet) is a probabilistic framework in which an agent learns a stochastic policy and flow functions to sample objects with probability proportional to an unnormalized reward function. GFlowNets share a strong resemblance to reinforcement learning (RL), that typically aims to maximize reward, due to their sequential decision-making processes. Recent works have studie… ▽ More

    Submitted 4 June, 2024; originally announced June 2024.

  13. arXiv:2406.01150  [pdf, other

    cs.LG

    Looking Backward: Retrospective Backward Synthesis for Goal-Conditioned GFlowNets

    Authors: Haoran He, Can Chang, Huazhe Xu, Ling Pan

    Abstract: Generative Flow Networks (GFlowNets) are amortized sampling methods for learning a stochastic policy to sequentially generate compositional objects with probabilities proportional to their rewards. GFlowNets exhibit a remarkable ability to generate diverse sets of high-reward objects, in contrast to standard return maximization reinforcement learning approaches, which often converge to a single op… ▽ More

    Submitted 3 June, 2024; originally announced June 2024.

  14. arXiv:2406.00050  [pdf, other

    cs.CL cs.AI

    An Empirical Analysis on Large Language Models in Debate Evaluation

    Authors: Xinyi Liu, Pinxin Liu, Hangfeng He

    Abstract: In this study, we investigate the capabilities and inherent biases of advanced large language models (LLMs) such as GPT-3.5 and GPT-4 in the context of debate evaluation. We discover that LLM's performance exceeds humans and surpasses the performance of state-of-the-art methods fine-tuned on extensive datasets in debate evaluation. We additionally explore and analyze biases present in LLMs, includ… ▽ More

    Submitted 4 June, 2024; v1 submitted 28 May, 2024; originally announced June 2024.

    Comments: Accepted to ACL 2024 main

  15. arXiv:2405.20763  [pdf, other

    cs.LG math.OC stat.ML

    Improving Generalization and Convergence by Enhancing Implicit Regularization

    Authors: Mingze Wang, Haotian He, **bo Wang, Zilin Wang, Guanhua Huang, Feiyu Xiong, Zhiyu Li, Weinan E, Lei Wu

    Abstract: In this work, we propose an Implicit Regularization Enhancement (IRE) framework to accelerate the discovery of flat solutions in deep learning, thereby improving generalization and convergence. Specifically, IRE decouples the dynamics of flat and sharp directions, which boosts the sharpness reduction along flat directions while maintaining the training stability in sharp directions. We show that I… ▽ More

    Submitted 31 May, 2024; originally announced May 2024.

    Comments: 35 pages

  16. arXiv:2405.20600  [pdf, other

    cs.AI

    Multi-label Class Incremental Emotion Decoding with Augmented Emotional Semantics Learning

    Authors: Kaicheng Fu, Changde Du, Xiaoyu Chen, Jie Peng, Huiguang He

    Abstract: Emotion decoding plays an important role in affective human-computer interaction. However, previous studies ignored the dynamic real-world scenario, where human experience a blend of multiple emotions which are incrementally integrated into the model, leading to the multi-label class incremental learning (MLCIL) problem. Existing methods have difficulty in solving MLCIL issue due to notorious cata… ▽ More

    Submitted 30 May, 2024; originally announced May 2024.

  17. arXiv:2405.19678  [pdf, other

    cs.CV cs.AI

    View-Consistent Hierarchical 3D SegmentationUsing Ultrametric Feature Fields

    Authors: Haodi He, Colton Stearns, Adam W. Harley, Leonidas J. Guibas

    Abstract: Large-scale vision foundation models such as Segment Anything (SAM) demonstrate impressive performance in zero-shot image segmentation at multiple levels of granularity. However, these zero-shot predictions are rarely 3D-consistent. As the camera viewpoint changes in a scene, so do the segmentation predictions, as well as the characterizations of ``coarse" or ``fine" granularity. In this work, we… ▽ More

    Submitted 30 May, 2024; originally announced May 2024.

  18. arXiv:2405.19586  [pdf, other

    cs.CV cs.LG cs.RO

    SAM-E: Leveraging Visual Foundation Model with Sequence Imitation for Embodied Manipulation

    Authors: Junjie Zhang, Chenjia Bai, Haoran He, Wenke Xia, Zhigang Wang, Bin Zhao, Xiu Li, Xuelong Li

    Abstract: Acquiring a multi-task imitation policy in 3D manipulation poses challenges in terms of scene understanding and action prediction. Current methods employ both 3D representation and multi-view 2D representation to predict the poses of the robot's end-effector. However, they still require a considerable amount of high-quality robot trajectories, and suffer from limited generalization in unseen tasks… ▽ More

    Submitted 29 May, 2024; originally announced May 2024.

    Comments: ICML 2024. Project page: https://sam-embodied.github.io

  19. arXiv:2405.18726  [pdf, other

    cs.SD cs.CV cs.MM eess.AS

    Reverse the auditory processing pathway: Coarse-to-fine audio reconstruction from fMRI

    Authors: Che Liu, Changde Du, Xiaoyu Chen, Huiguang He

    Abstract: Drawing inspiration from the hierarchical processing of the human auditory system, which transforms sound from low-level acoustic features to high-level semantic understanding, we introduce a novel coarse-to-fine audio reconstruction method. Leveraging non-invasive functional Magnetic Resonance Imaging (fMRI) data, our approach mimics the inverse pathway of auditory processing. Initially, we utili… ▽ More

    Submitted 28 May, 2024; originally announced May 2024.

  20. arXiv:2405.17976  [pdf

    cs.AI cs.CL

    Yuan 2.0-M32: Mixture of Experts with Attention Router

    Authors: Shaohua Wu, Jiangang Luo, Xi Chen, Lingjun Li, Xudong Zhao, Tong Yu, Chao Wang, Yue Wang, Fei Wang, Weixu Qiao, Houbo He, Zeru Zhang, Zeyu Sun, Junxiong Mao, Chong Shen

    Abstract: Yuan 2.0-M32, with a similar base architecture as Yuan-2.0 2B, uses a mixture-of-experts architecture with 32 experts of which 2 experts are active. A new router network, Attention Router, is proposed and adopted for a more efficient selection of experts, which improves the accuracy compared to the model with classical router network. Yuan 2.0-M32 is trained with 2000B tokens from scratch, and the… ▽ More

    Submitted 29 May, 2024; v1 submitted 28 May, 2024; originally announced May 2024.

    Comments: 14 pages,3 figures, 7 tables

  21. arXiv:2405.17414  [pdf, other

    cs.CV cs.GR

    Collaborative Video Diffusion: Consistent Multi-video Generation with Camera Control

    Authors: Zhengfei Kuang, Shengqu Cai, Hao He, Yinghao Xu, Hongsheng Li, Leonidas Guibas, Gordon Wetzstein

    Abstract: Research on video generation has recently made tremendous progress, enabling high-quality videos to be generated from text prompts or images. Adding control to the video generation process is an important goal moving forward and recent approaches that condition video generation models on camera trajectories make strides towards it. Yet, it remains challenging to generate a video of the same scene… ▽ More

    Submitted 27 May, 2024; originally announced May 2024.

  22. arXiv:2405.16730  [pdf, other

    cs.LG cs.AI stat.AP

    Latent Energy-Based Odyssey: Black-Box Optimization via Expanded Exploration in the Energy-Based Latent Space

    Authors: Peiyu Yu, Dinghuai Zhang, Hengzhi He, Xiaojian Ma, Ruiyao Miao, Yifan Lu, Yasi Zhang, Deqian Kong, Ruiqi Gao, Jianwen Xie, Guang Cheng, Ying Nian Wu

    Abstract: Offline Black-Box Optimization (BBO) aims at optimizing a black-box function using the knowledge from a pre-collected offline dataset of function values and corresponding input designs. However, the high-dimensional and highly-multimodal input design space of black-box function pose inherent challenges for most existing methods that model and operate directly upon input designs. These issues inclu… ▽ More

    Submitted 26 May, 2024; originally announced May 2024.

  23. arXiv:2405.15525  [pdf, other

    cs.CL

    Sparse Matrix in Large Language Model Fine-tuning

    Authors: Haoze He, Juncheng Billy Li, Xuan Jiang, Heather Miller

    Abstract: LoRA and its variants have become popular parameter-efficient fine-tuning (PEFT) methods due to their ability to avoid excessive computational costs. However, an accuracy gap often exists between PEFT methods and full fine-tuning (FT), and this gap has yet to be systematically studied. In this work, we introduce a method for selecting sparse sub-matrices that aim to minimize the performance gap be… ▽ More

    Submitted 29 May, 2024; v1 submitted 24 May, 2024; originally announced May 2024.

    Comments: 14 pages

  24. arXiv:2405.15214  [pdf, other

    cs.CV

    PointRWKV: Efficient RWKV-Like Model for Hierarchical Point Cloud Learning

    Authors: Qingdong He, Jiangning Zhang, **long Peng, Haoyang He, Yabiao Wang, Chengjie Wang

    Abstract: Transformers have revolutionized the point cloud learning task, but the quadratic complexity hinders its extension to long sequence and makes a burden on limited computational resources. The recent advent of RWKV, a fresh breed of deep sequence models, has shown immense potential for sequence modeling in NLP tasks. In this paper, we present PointRWKV, a model of linear complexity derived from the… ▽ More

    Submitted 24 May, 2024; originally announced May 2024.

  25. arXiv:2405.14018  [pdf, other

    cs.CR cs.LG stat.AP

    Watermarking Generative Tabular Data

    Authors: Hengzhi He, Peiyu Yu, Junpeng Ren, Ying Nian Wu, Guang Cheng

    Abstract: In this paper, we introduce a simple yet effective tabular data watermarking mechanism with statistical guarantees. We show theoretically that the proposed watermark can be effectively detected, while faithfully preserving the data fidelity, and also demonstrates appealing robustness against additive noise attack. The general idea is to achieve the watermarking through a strategic embedding based… ▽ More

    Submitted 22 May, 2024; originally announced May 2024.

  26. arXiv:2405.11739  [pdf

    cs.LG cs.AI cs.CY

    Contactless Polysomnography: What Radio Waves Tell Us about Sleep

    Authors: Hao He, Chao Li, Wolfgang Ganglberger, Kaileigh Gallagher, Rumen Hristov, Michail Ouroutzoglou, Haoqi Sun, Jimeng Sun, Brandon Westover, Dina Katabi

    Abstract: The ability to assess sleep at home, capture sleep stages, and detect the occurrence of apnea (without on-body sensors) simply by analyzing the radio waves bouncing off people's bodies while they sleep is quite powerful. Such a capability would allow for longitudinal data collection in patients' homes, informing our understanding of sleep and its interaction with various diseases and their therape… ▽ More

    Submitted 19 May, 2024; originally announced May 2024.

    Comments: The first two authors contributed equally to this work

  27. arXiv:2405.11389  [pdf, other

    cs.LG

    Adjacent Leader Decentralized Stochastic Gradient Descent

    Authors: Haoze He, **g Wang, Anna Choromanska

    Abstract: This work focuses on the decentralized deep learning optimization framework. We propose Adjacent Leader Decentralized Gradient Descent (AL-DSGD), for improving final model performance, accelerating convergence, and reducing the communication overhead of decentralized deep learning optimizers. AL-DSGD relies on two main ideas. Firstly, to increase the influence of the strongest learners on the lear… ▽ More

    Submitted 18 May, 2024; originally announced May 2024.

    Comments: 9 pages of main paper, and 12 pages of appendix

  28. arXiv:2405.11021  [pdf, other

    cs.CV

    Enhanced 3D Urban Scene Reconstruction and Point Cloud Densification using Gaussian Splatting and Google Earth Imagery

    Authors: Kyle Gao, Dening Lu, Hongjie He, Linlin Xu, Jonathan Li

    Abstract: 3D urban scene reconstruction and modelling is a crucial research area in remote sensing with numerous applications in academia, commerce, industry, and administration. Recent advancements in view synthesis models have facilitated photorealistic 3D reconstruction solely from 2D images. Leveraging Google Earth imagery, we construct a 3D Gaussian Splatting model of the Waterloo region centered on th… ▽ More

    Submitted 1 June, 2024; v1 submitted 17 May, 2024; originally announced May 2024.

    ACM Class: I.4; I.3

  29. arXiv:2405.10096  [pdf, other

    cs.LG cs.CR cs.DC

    The Effect of Quantization in Federated Learning: A Rényi Differential Privacy Perspective

    Authors: Tianqu Kang, Lumin Liu, Hengtao He, Jun Zhang, S. H. Song, Khaled B. Letaief

    Abstract: Federated Learning (FL) is an emerging paradigm that holds great promise for privacy-preserving machine learning using distributed data. To enhance privacy, FL can be combined with Differential Privacy (DP), which involves adding Gaussian noise to the model weights. However, FL faces a significant challenge in terms of large communication overhead when transmitting these model weights. To address… ▽ More

    Submitted 16 May, 2024; originally announced May 2024.

    Comments: 6 pages, 5 figures, submitted to 2024 IEEE MeditCom

  30. arXiv:2405.09514  [pdf, other

    eess.SP cs.IT cs.LG

    Tackling Distribution Shifts in Task-Oriented Communication with Information Bottleneck

    Authors: Hongru Li, Jiawei Shao, Hengtao He, Shenghui Song, Jun Zhang, Khaled B. Letaief

    Abstract: Task-oriented communication aims to extract and transmit task-relevant information to significantly reduce the communication overhead and transmission latency. However, the unpredictable distribution shifts between training and test data, including domain shift and semantic shift, can dramatically undermine the system performance. In order to tackle these challenges, it is crucial to ensure that t… ▽ More

    Submitted 15 May, 2024; originally announced May 2024.

    Comments: 13 pages, 8 figures, submitted to IEEE for potential publication

  31. arXiv:2405.07840  [pdf, other

    cs.HC cs.CL

    Open-vocabulary Auditory Neural Decoding Using fMRI-prompted LLM

    Authors: Xiaoyu Chen, Changde Du, Che Liu, Yizhe Wang, Huiguang He

    Abstract: Decoding language information from brain signals represents a vital research area within brain-computer interfaces, particularly in the context of deciphering the semantic information from the fMRI signal. However, many existing efforts concentrate on decoding small vocabulary sets, leaving space for the exploration of open vocabulary continuous text decoding. In this paper, we introduce a novel m… ▽ More

    Submitted 13 May, 2024; originally announced May 2024.

  32. arXiv:2405.06707  [pdf, other

    cs.CL cs.AI

    Hypothesis Testing Prompting Improves Deductive Reasoning in Large Language Models

    Authors: Yitian Li, Jidong Tian, Hao He, Yaohui **

    Abstract: Combining different forms of prompts with pre-trained large language models has yielded remarkable results on reasoning tasks (e.g. Chain-of-Thought prompting). However, along with testing on more complex reasoning, these methods also expose problems such as invalid reasoning and fictional reasoning paths. In this paper, we develop \textit{Hypothesis Testing Prompting}, which adds conclusion assum… ▽ More

    Submitted 9 May, 2024; originally announced May 2024.

  33. arXiv:2405.04872  [pdf, other

    cs.CL cs.AI cs.LO

    Logical Negation Augmenting and Debiasing for Prompt-based Methods

    Authors: Yitian Li, Jidong Tian, Hao He, Yaohui **

    Abstract: Prompt-based methods have gained increasing attention on NLP and shown validity on many downstream tasks. Many works have focused on mining these methods' potential for knowledge extraction, but few explore their ability to make logical reasoning. In this work, we focus on the effectiveness of the prompt-based methods on first-order logical reasoning and find that the bottleneck lies in logical ne… ▽ More

    Submitted 8 May, 2024; originally announced May 2024.

  34. arXiv:2405.03280  [pdf, other

    cs.CV cs.AI

    Animate Your Thoughts: Decoupled Reconstruction of Dynamic Natural Vision from Slow Brain Activity

    Authors: Yizhuo Lu, Changde Du, Chong Wang, Xuanliu Zhu, Liuyun Jiang, Huiguang He

    Abstract: Reconstructing human dynamic vision from brain activity is a challenging task with great scientific significance. The difficulty stems from two primary issues: (1) vision-processing mechanisms in the brain are highly intricate and not fully revealed, making it challenging to directly learn a map** between fMRI and video; (2) the temporal resolution of fMRI is significantly lower than that of nat… ▽ More

    Submitted 6 May, 2024; originally announced May 2024.

  35. arXiv:2404.19733  [pdf, other

    cs.CL cs.AI

    Iterative Reasoning Preference Optimization

    Authors: Richard Yuanzhe Pang, Weizhe Yuan, Kyunghyun Cho, He He, Sainbayar Sukhbaatar, Jason Weston

    Abstract: Iterative preference optimization methods have recently been shown to perform well for general instruction tuning tasks, but typically make little improvement on reasoning tasks (Yuan et al., 2024, Chen et al., 2024). In this work we develop an iterative approach that optimizes the preference between competing generated Chain-of-Thought (CoT) candidates by optimizing for winning vs. losing reasoni… ▽ More

    Submitted 25 June, 2024; v1 submitted 30 April, 2024; originally announced April 2024.

  36. arXiv:2404.16019  [pdf, other

    cs.CL

    The PRISM Alignment Project: What Participatory, Representative and Individualised Human Feedback Reveals About the Subjective and Multicultural Alignment of Large Language Models

    Authors: Hannah Rose Kirk, Alexander Whitefield, Paul Röttger, Andrew Bean, Katerina Margatina, Juan Ciro, Rafael Mosquera, Max Bartolo, Adina Williams, He He, Bertie Vidgen, Scott A. Hale

    Abstract: Human feedback plays a central role in the alignment of Large Language Models (LLMs). However, open questions remain about the methods (how), domains (where), people (who) and objectives (to what end) of human feedback collection. To navigate these questions, we introduce PRISM, a new dataset which maps the sociodemographics and stated preferences of 1,500 diverse participants from 75 countries, t… ▽ More

    Submitted 24 April, 2024; originally announced April 2024.

  37. arXiv:2404.11209  [pdf, ps, other

    cs.AI cs.CV cs.MM

    Prompt-Guided Generation of Structured Chest X-Ray Report Using a Pre-trained LLM

    Authors: Hongzhao Li, Hongyu Wang, Xia Sun, Hua He, Jun Feng

    Abstract: Medical report generation automates radiology descriptions from images, easing the burden on physicians and minimizing errors. However, current methods lack structured outputs and physician interactivity for clear, clinically relevant reports. Our method introduces a prompt-guided approach to generate structured chest X-ray reports using a pre-trained large language model (LLM). First, we identify… ▽ More

    Submitted 17 April, 2024; originally announced April 2024.

    Comments: Accepted by IEEE Conference on Multimedia Expo 2024

  38. arXiv:2404.09932  [pdf, other

    cs.LG cs.AI cs.CL cs.CY

    Foundational Challenges in Assuring Alignment and Safety of Large Language Models

    Authors: Usman Anwar, Abulhair Saparov, Javier Rando, Daniel Paleka, Miles Turpin, Peter Hase, Ekdeep Singh Lubana, Erik Jenner, Stephen Casper, Oliver Sourbut, Benjamin L. Edelman, Zhaowei Zhang, Mario Günther, Anton Korinek, Jose Hernandez-Orallo, Lewis Hammond, Eric Bigelow, Alexander Pan, Lauro Langosco, Tomasz Korbak, Heidi Zhang, Ruiqi Zhong, Seán Ó hÉigeartaigh, Gabriel Recchia, Giulio Corsi , et al. (13 additional authors not shown)

    Abstract: This work identifies 18 foundational challenges in assuring the alignment and safety of large language models (LLMs). These challenges are organized into three different categories: scientific understanding of LLMs, development and deployment methods, and sociotechnical challenges. Based on the identified challenges, we pose $200+$ concrete research questions.

    Submitted 15 April, 2024; originally announced April 2024.

  39. arXiv:2404.07443  [pdf

    physics.optics cs.ET cs.LG

    1-bit Quantized On-chip Hybrid Diffraction Neural Network Enabled by Authentic All-optical Fully-connected Architecture

    Authors: Yu Shao, Haiqi Gao, Yipeng Chen, Yujie liu, Junren Wen, Haidong He, Yuchuan Shao, Yueguang Zhang, Weidong Shen, Chenying Yang

    Abstract: Optical Diffraction Neural Networks (DNNs), a subset of Optical Neural Networks (ONNs), show promise in mirroring the prowess of electronic networks. This study introduces the Hybrid Diffraction Neural Network (HDNN), a novel architecture that incorporates matrix multiplication into DNNs, synergizing the benefits of conventional ONNs with those of DNNs to surmount the modulation limitations inhere… ▽ More

    Submitted 10 April, 2024; originally announced April 2024.

  40. arXiv:2404.06687  [pdf, other

    cs.RO eess.SY

    Fast and Accurate Relative Motion Tracking for Two Industrial Robots

    Authors: Honglu He, Chen-lung Lu, Glenn Saunders, **hai Yang, Jeffrey Schoonover, John Wason, Santiago Paternain, Agung Julius, John T. Wen

    Abstract: Industrial robotic applications such as spraying, welding, and additive manufacturing frequently require fast, accurate, and uniform motion along a 3D spatial curve. To increase process throughput, some manufacturers propose a dual-robot setup to overcome the speed limitation of a single robot. Industrial robot motion is programmed through waypoints connected by motion primitives (Cartesian linear… ▽ More

    Submitted 9 April, 2024; originally announced April 2024.

  41. arXiv:2404.06564  [pdf, other

    cs.CV

    MambaAD: Exploring State Space Models for Multi-class Unsupervised Anomaly Detection

    Authors: Haoyang He, Yuhu Bai, Jiangning Zhang, Qingdong He, Hongxu Chen, Zhenye Gan, Chengjie Wang, Xiangtai Li, Guanzhong Tian, Lei Xie

    Abstract: Recent advancements in anomaly detection have seen the efficacy of CNN- and transformer-based approaches. However, CNNs struggle with long-range dependencies, while transformers are burdened by quadratic computational complexity. Mamba-based models, with their superior long-range modeling and linear efficiency, have garnered substantial attention. This study pioneers the application of Mamba to mu… ▽ More

    Submitted 14 April, 2024; v1 submitted 9 April, 2024; originally announced April 2024.

  42. arXiv:2404.04920  [pdf, other

    cs.LG

    Regularized Conditional Diffusion Model for Multi-Task Preference Alignment

    Authors: Xudong Yu, Chenjia Bai, Haoran He, Changhong Wang, Xuelong Li

    Abstract: Sequential decision-making is desired to align with human intents and exhibit versatility across various tasks. Previous methods formulate it as a conditional generation process, utilizing return-conditioned diffusion models to directly model trajectory distributions. Nevertheless, the return-conditioned paradigm relies on pre-defined reward functions, facing challenges when applied in multi-task… ▽ More

    Submitted 7 April, 2024; originally announced April 2024.

  43. arXiv:2404.03384  [pdf, other

    cs.CV

    LongVLM: Efficient Long Video Understanding via Large Language Models

    Authors: Yuetian Weng, Mingfei Han, Haoyu He, Xiaojun Chang, Bohan Zhuang

    Abstract: Empowered by Large Language Models (LLMs), recent advancements in VideoLLMs have driven progress in various video understanding tasks. These models encode video representations through pooling or query aggregation over a vast number of visual tokens, making computational and memory costs affordable. Despite successfully providing an overall comprehension of video content, existing VideoLLMs still… ▽ More

    Submitted 10 April, 2024; v1 submitted 4 April, 2024; originally announced April 2024.

  44. arXiv:2404.03176  [pdf, other

    cs.LG cs.IT

    Information-Theoretic Generalization Bounds for Deep Neural Networks

    Authors: Haiyun He, Christina Lee Yu, Ziv Goldfeld

    Abstract: Deep neural networks (DNNs) exhibit an exceptional capacity for generalization in practical applications. This work aims to capture the effect and benefits of depth for supervised learning via information-theoretic generalization bounds. We first derive two hierarchical bounds on the generalization error in terms of the Kullback-Leibler (KL) divergence or the 1-Wasserstein distance between the tra… ▽ More

    Submitted 3 April, 2024; originally announced April 2024.

    Comments: 25 pages, 5 figures

  45. arXiv:2404.02101  [pdf, other

    cs.CV

    CameraCtrl: Enabling Camera Control for Text-to-Video Generation

    Authors: Hao He, Yinghao Xu, Yuwei Guo, Gordon Wetzstein, Bo Dai, Hongsheng Li, Ceyuan Yang

    Abstract: Controllability plays a crucial role in video generation since it allows users to create desired content. However, existing models largely overlooked the precise control of camera pose that serves as a cinematic language to express deeper narrative nuances. To alleviate this issue, we introduce CameraCtrl, enabling accurate camera pose control for text-to-video(T2V) models. After precisely paramet… ▽ More

    Submitted 2 April, 2024; originally announced April 2024.

    Comments: Project page: https://hehao13.github.io/projects-CameraCtrl/ Code: https://github.com/hehao13/CameraCtrl

  46. arXiv:2404.01453  [pdf, other

    cs.CL cs.AI

    Unveiling Divergent Inductive Biases of LLMs on Temporal Data

    Authors: Sindhu Kishore, Hangfeng He

    Abstract: Unraveling the intricate details of events in natural language necessitates a subtle understanding of temporal dynamics. Despite the adeptness of Large Language Models (LLMs) in discerning patterns and relationships from data, their inherent comprehension of temporal dynamics remains a formidable challenge. This research meticulously explores these intrinsic challenges within LLMs, with a specific… ▽ More

    Submitted 1 April, 2024; originally announced April 2024.

  47. arXiv:2404.00309  [pdf, other

    cs.IT eess.SP

    Model-Driven Deep Learning for Distributed Detection with Binary Quantization

    Authors: Wei Guo, Meng He, Chuan Huang, Hengtao He, Shenghui Song, Jun Zhang, Khaled B. Letaief

    Abstract: Within the realm of rapidly advancing wireless sensor networks (WSNs), distributed detection assumes a significant role in various practical applications. However, critical challenge lies in maintaining robust detection performance while operating within the constraints of limited bandwidth and energy resources. This paper introduces a novel approach that combines model-driven deep learning (DL) w… ▽ More

    Submitted 30 March, 2024; originally announced April 2024.

  48. arXiv:2404.00246  [pdf, other

    cs.CL cs.AI cs.HC

    Your Co-Workers Matter: Evaluating Collaborative Capabilities of Language Models in Blocks World

    Authors: Guande Wu, Chen Zhao, Claudio Silva, He He

    Abstract: Language agents that interact with the world on their own have great potential for automating digital tasks. While large language model (LLM) agents have made progress in understanding and executing tasks such as textual games and webpage control, many real-world tasks also require collaboration with humans or other LLMs in equal roles, which involves intent understanding, task coordination, and c… ▽ More

    Submitted 30 March, 2024; originally announced April 2024.

  49. arXiv:2404.00057  [pdf, other

    cs.HC cs.AI cs.CR cs.OS

    PerOS: Personalized Self-Adapting Operating Systems in the Cloud

    Authors: Hongyu Hè

    Abstract: Operating systems (OSes) are foundational to computer systems, managing hardware resources and ensuring secure environments for diverse applications. However, despite their enduring importance, the fundamental design objectives of OSes have seen minimal evolution over decades. Traditionally prioritizing aspects like speed, memory efficiency, security, and scalability, these objectives often overlo… ▽ More

    Submitted 26 March, 2024; originally announced April 2024.

    Comments: 29 pages, 3 figures

  50. arXiv:2403.13250  [pdf, other

    cs.CL

    Facilitating Pornographic Text Detection for Open-Domain Dialogue Systems via Knowledge Distillation of Large Language Models

    Authors: Huachuan Qiu, Shuai Zhang, Hongliang He, Anqi Li, Zhenzhong Lan

    Abstract: Pornographic content occurring in human-machine interaction dialogues can cause severe side effects for users in open-domain dialogue systems. However, research on detecting pornographic language within human-machine interaction dialogues is an important subject that is rarely studied. To advance in this direction, we introduce CensorChat, a dialogue monitoring dataset aimed at detecting whether t… ▽ More

    Submitted 19 March, 2024; originally announced March 2024.

    Comments: Accepted to CSCWD 2024 (27th International Conference on Computer Supported Cooperative Work in Design). arXiv admin note: text overlap with arXiv:2309.09749