Showing 1–50 of 349 results for author: Xiao, T

Searching in archive cs.
  1. arXiv:2406.16148  [pdf, other]

    cs.SD cs.AI cs.LG eess.AS

    Towards Open Respiratory Acoustic Foundation Models: Pretraining and Benchmarking

    Authors: Yuwei Zhang, Tong Xia, Jing Han, Yu Wu, Georgios Rizos, Yang Liu, Mohammed Mosuily, Jagmohan Chauhan, Cecilia Mascolo

    Abstract: Respiratory audio, such as coughing and breathing sounds, has predictive power for a wide range of healthcare applications, yet is currently under-explored. The main problem for those applications arises from the difficulty in collecting large labeled task-specific data for model development. Generalizable respiratory acoustic foundation models pretrained with unlabeled data would offer appealing…

    Submitted 23 June, 2024; originally announced June 2024.

  2. arXiv:2406.15846  [pdf, other]

    cs.CL eess.AS

    Revisiting Interpolation Augmentation for Speech-to-Text Generation

    Authors: Chen Xu, Jie Wang, Xiaoqian Liu, Qianqian Dong, Chunliang Zhang, Tong Xiao, Jingbo Zhu, Dapeng Man, Wu Yang

    Abstract: Speech-to-text (S2T) generation systems frequently face challenges in low-resource scenarios, primarily due to the lack of extensive labeled datasets. One emerging solution is constructing virtual training samples by interpolating inputs and labels, which has notably enhanced system generalization in other domains. Despite its potential, this technique's application in S2T tasks has remained under…

    Submitted 22 June, 2024; originally announced June 2024.

    Comments: ACL 2024 Findings

  3. arXiv:2406.15178  [pdf, other]

    cs.CL

    Hybrid Alignment Training for Large Language Models

    Authors: Chenglong Wang, Hang Zhou, Kaiyan Chang, Bei Li, Yongyu Mu, Tong Xiao, Tongran Liu, Jingbo Zhu

    Abstract: Alignment training is crucial for enabling large language models (LLMs) to cater to human intentions and preferences. It is typically performed based on two stages with different objectives: instruction-following alignment and human-preference alignment. However, aligning LLMs with these objectives in sequence suffers from an inherent problem: the objectives may conflict, and the LLMs cannot guara…

    Submitted 21 June, 2024; originally announced June 2024.

    Comments: accepted by ACL (Findings) 2024

  4. arXiv:2406.14250  [pdf, other]

    cs.CV cs.HC

    E-ANT: A Large-Scale Dataset for Efficient Automatic GUI NavigaTion

    Authors: Ke Wang, Tianyu Xia, Zhangxuan Gu, Yi Zhao, Shuheng Shen, Changhua Meng, Weiqiang Wang, Ke Xu

    Abstract: Online GUI navigation on mobile devices has drawn a lot of attention in recent years since it contributes to many real-world applications. With the rapid development of large language models (LLM), multimodal large language models (MLLM) have tremendous potential on this task. However, existing MLLMs need high-quality data to improve their ability to make the correct navigation decisions according…

    Submitted 1 July, 2024; v1 submitted 20 June, 2024; originally announced June 2024.

    Comments: 9 pages, 5 figures, Under review

  5. arXiv:2406.13542  [pdf, other]

    cs.CL cs.AI cs.LG

    Self-play with Execution Feedback: Improving Instruction-following Capabilities of Large Language Models

    Authors: Guanting Dong, Keming Lu, Chengpeng Li, Tingyu Xia, Bowen Yu, Chang Zhou, Jingren Zhou

    Abstract: One core capability of large language models (LLMs) is to follow natural language instructions. However, the issue of automatically constructing high-quality training data to enhance the complex instruction-following abilities of LLMs without manual annotation remains unresolved. In this paper, we introduce AutoIF, the first scalable and reliable method for automatically generating instruction-fol…

    Submitted 19 June, 2024; originally announced June 2024.

  6. arXiv:2406.12297  [pdf, other]

    cs.LG cs.AI

    Faithful Density-Peaks Clustering via Matrix Computations on MPI Parallelization System

    Authors: Ji Xu, Tianlong Xiao, Jinye Yang, Panpan Zhu

    Abstract: Density peaks clustering (DP) has the ability of detecting clusters of arbitrary shape and clustering non-Euclidean space data, but its quadratic complexity in both computing and storage makes it difficult to scale for big data. Various approaches have been proposed in this regard, including MapReduce based distribution computing, multi-core parallelism, presentation transformation (e.g., kd-tree,…

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: This paper presents a novel approach, FaithPDP, that takes advantage of both hardware (the multi-core architecture of CPUs) and modern programming languages (Python or Matlab, for efficient vector and matrix computation) to achieve clustering results identical to the vanilla DP algorithm, while the computing complexity is reduced to pseudo-linear

  7. arXiv:2406.10808  [pdf, other]

    cs.LG

    Diffusion Model With Optimal Covariance Matching

    Authors: Zijing Ou, Mingtian Zhang, Andi Zhang, Tim Z. Xiao, Yingzhen Li, David Barber

    Abstract: The probabilistic diffusion model has become highly effective across various domains. Typically, sampling from a diffusion model involves using a denoising distribution characterized by a Gaussian with a learned mean and either fixed or learned covariances. In this paper, we leverage the recently proposed full covariance moment matching technique and introduce a novel method for learning covarianc…

    Submitted 16 June, 2024; originally announced June 2024.

  8. FLea: Addressing Data Scarcity and Label Skew in Federated Learning via Privacy-preserving Feature Augmentation

    Authors: Tong Xia, Abhirup Ghosh, Xinchi Qiu, Cecilia Mascolo

    Abstract: Federated Learning (FL) enables model development by leveraging data distributed across numerous edge devices without transferring local data to a central server. However, existing FL methods still face challenges when dealing with scarce and label-skewed data across devices, resulting in local model overfitting and drift, consequently hindering the performance of the global model. In response to…

    Submitted 18 June, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

    Comments: This work was intended as a replacement of arXiv:2312.02327 and any subsequent updates will appear there

  9. arXiv:2406.09246  [pdf, other]

    cs.RO cs.LG

    OpenVLA: An Open-Source Vision-Language-Action Model

    Authors: Moo Jin Kim, Karl Pertsch, Siddharth Karamcheti, Ted Xiao, Ashwin Balakrishna, Suraj Nair, Rafael Rafailov, Ethan Foster, Grace Lam, Pannag Sanketi, Quan Vuong, Thomas Kollar, Benjamin Burchfiel, Russ Tedrake, Dorsa Sadigh, Sergey Levine, Percy Liang, Chelsea Finn

    Abstract: Large policies pretrained on a combination of Internet-scale vision-language data and diverse robot demonstrations have the potential to change how we teach robots new skills: rather than training new behaviors from scratch, we can fine-tune such vision-language-action (VLA) models to obtain robust, generalizable policies for visuomotor control. Yet, widespread adoption of VLAs for robotics has be…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: Website: https://openvla.github.io/

  10. arXiv:2406.09196  [pdf, other]

    cs.CV cs.LG

    Adaptive Slot Attention: Object Discovery with Dynamic Slot Number

    Authors: Ke Fan, Zechen Bai, Tianjun Xiao, Tong He, Max Horn, Yanwei Fu, Francesco Locatello, Zheng Zhang

    Abstract: Object-centric learning (OCL) extracts the representation of objects with slots, offering an exceptional blend of flexibility and interpretability for abstracting low-level perceptual features. A widely adopted method within OCL is slot attention, which utilizes attention mechanisms to iteratively refine slot representations. However, a major drawback of most object-centric models, including slot…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: CVPR 2024

  11. arXiv:2406.07168  [pdf, other]

    cs.CL

    Teaching Language Models to Self-Improve by Learning from Language Feedback

    Authors: Chi Hu, Yimin Hu, Hang Cao, Tong Xiao, Jingbo Zhu

    Abstract: Aligning Large Language Models (LLMs) with human intentions and values is crucial yet challenging. Current methods primarily rely on human preferences, which are costly and insufficient in capturing nuanced feedback expressed in natural language. In this paper, we present Self-Refinement Tuning (SRT), a method that leverages model feedback for alignment, thereby reducing reliance on human annotati…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: Findings of ACL 2024

  12. arXiv:2406.04344  [pdf, other]

    cs.LG cs.CL cs.CV

    Verbalized Machine Learning: Revisiting Machine Learning with Language Models

    Authors: Tim Z. Xiao, Robert Bamler, Bernhard Schölkopf, Weiyang Liu

    Abstract: Motivated by the large progress made by large language models (LLMs), we introduce the framework of verbalized machine learning (VML). In contrast to conventional machine learning models that are typically optimized over a continuous parameter space, VML constrains the parameter space to be human-interpretable natural language. Such a constraint leads to a new perspective of function approximation…

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: Technical Report v1 (92 pages, 15 figures)

  13. arXiv:2406.01276  [pdf, other]

    cs.CL

    EduNLP: Towards a Unified and Modularized Library for Educational Resources

    Authors: Zhenya Huang, Yuting Ning, Longhu Qin, Shiwei Tong, Shangzi Xue, Tong Xiao, Xin Lin, Jiayu Liu, Qi Liu, Enhong Chen, Shijing Wang

    Abstract: Educational resource understanding is vital to online learning platforms, which have demonstrated growing applications recently. However, researchers and developers always struggle with using existing general natural language toolkits or domain-specific models. The issue raises a need to develop an effective and easy-to-use one that benefits AI education-related research and applications. To bridg…

    Submitted 4 June, 2024; v1 submitted 3 June, 2024; originally announced June 2024.

  14. arXiv:2406.00497  [pdf, ps, other]

    cs.SD cs.AI cs.CL eess.AS

    Recent Advances in End-to-End Simultaneous Speech Translation

    Authors: Xiaoqian Liu, Guoqiang Hu, Yangfan Du, Erfeng He, YingFeng Luo, Chen Xu, Tong Xiao, Jingbo Zhu

    Abstract: Simultaneous speech translation (SimulST) is a demanding task that involves generating translations in real-time while continuously processing speech input. This paper offers a comprehensive overview of the recent developments in SimulST research, focusing on four major challenges. Firstly, the complexities associated with processing lengthy and continuous speech streams pose significant hurdles.…

    Submitted 1 June, 2024; originally announced June 2024.

  15. arXiv:2405.16030  [pdf, other]

    cs.LG

    Constrained Ensemble Exploration for Unsupervised Skill Discovery

    Authors: Chenjia Bai, Rushuai Yang, Qiaosheng Zhang, Kang Xu, Yi Chen, Ting Xiao, Xuelong Li

    Abstract: Unsupervised Reinforcement Learning (RL) provides a promising paradigm for learning useful behaviors via reward-free pre-training. Existing methods for unsupervised RL mainly conduct empowerment-driven skill discovery or entropy-based exploration. However, empowerment often leads to static skills, and pure exploration only maximizes the state coverage rather than learning useful behaviors. In this…

    Submitted 24 May, 2024; originally announced May 2024.

    Comments: Accepted by ICML 2024

  16. arXiv:2405.14804  [pdf, other]

    cs.CL

    Can LLMs Solve longer Math Word Problems Better?

    Authors: Xin Xu, Tong Xiao, Zitong Chao, Zhenya Huang, Can Yang, Yang Wang

    Abstract: Math Word Problems (MWPs) are crucial for evaluating the capability of Large Language Models (LLMs), with current research primarily focusing on questions with concise contexts. However, as real-world math problems often involve complex circumstances, LLMs' ability to solve long MWPs is vital for their applications in these scenarios, yet remains under-explored. This study pioneers the exploration…

    Submitted 23 May, 2024; originally announced May 2024.

  17. arXiv:2405.13409  [pdf, other]

    cs.GR

    Specular Polynomials

    Authors: Zhimin Fan, Jie Guo, Yiming Wang, Tianyu Xiao, Hao Zhang, Chenxi Zhou, Zhenyu Chen, Pengpei Hong, Yanwen Guo, Ling-Qi Yan

    Abstract: Finding valid light paths that involve specular vertices in Monte Carlo rendering requires solving many non-linear, transcendental equations in high-dimensional space. Existing approaches heavily rely on Newton iterations in path space, which are limited to obtaining at most a single solution each time and easily diverge when initialized with improper seeds. We propose specular polynomials, a Ne…

    Submitted 22 May, 2024; originally announced May 2024.

    Comments: 13 pages, 13 figures, accepted by SIGGRAPH 2024

    ACM Class: I.3.3

  18. arXiv:2405.12609  [pdf, other]

    eess.AS cs.SD

    Mamba in Speech: Towards an Alternative to Self-Attention

    Authors: Xiangyu Zhang, Qiquan Zhang, Hexin Liu, Tianyi Xiao, Xinyuan Qian, Beena Ahmed, Eliathamby Ambikairajah, Haizhou Li, Julien Epps

    Abstract: Transformer and its derivatives have achieved success in diverse tasks across computer vision, natural language processing, and speech processing. To reduce the complexity of computations within the multi-head self-attention mechanism in Transformer, Selective State Space Models (i.e., Mamba) were proposed as an alternative. Mamba exhibited its effectiveness in natural language processing and comp…

    Submitted 30 June, 2024; v1 submitted 21 May, 2024; originally announced May 2024.

  19. arXiv:2405.12213  [pdf, other]

    cs.RO cs.LG

    Octo: An Open-Source Generalist Robot Policy

    Authors: Octo Model Team, Dibya Ghosh, Homer Walke, Karl Pertsch, Kevin Black, Oier Mees, Sudeep Dasari, Joey Hejna, Tobias Kreiman, Charles Xu, Jianlan Luo, You Liang Tan, Lawrence Yunliang Chen, Pannag Sanketi, Quan Vuong, Ted Xiao, Dorsa Sadigh, Chelsea Finn, Sergey Levine

    Abstract: Large policies pretrained on diverse robot datasets have the potential to transform robotic learning: instead of training new policies from scratch, such generalist robot policies may be finetuned with only a little in-domain data, yet generalize broadly. However, to be widely applicable across a range of robotic learning scenarios, environments, and tasks, such policies need to handle diverse sen…

    Submitted 26 May, 2024; v1 submitted 20 May, 2024; originally announced May 2024.

    Comments: Project website: https://octo-models.github.io

  20. arXiv:2405.10516  [pdf, other]

    cs.CL cs.AI

    Language Models can Evaluate Themselves via Probability Discrepancy

    Authors: Tingyu Xia, Bowen Yu, Yuan Wu, Yi Chang, Chang Zhou

    Abstract: In this paper, we initiate our discussion by demonstrating how Large Language Models (LLMs), when tasked with responding to queries, display a more even probability distribution in their answers if they are more adept, as opposed to their less skilled counterparts. Expanding on this foundational insight, we propose a new self-evaluation method ProbDiff for assessing the efficacy of various LLMs. T…

    Submitted 16 May, 2024; originally announced May 2024.

    Comments: ACL 2024 Findings

  21. arXiv:2405.06232  [pdf, other]

    cs.AI

    Learning to Solve Geometry Problems via Simulating Human Dual-Reasoning Process

    Authors: Tong Xiao, Jiayu Liu, Zhenya Huang, Jinze Wu, Jing Sha, Shijin Wang, Enhong Chen

    Abstract: Geometry Problem Solving (GPS), which is a classic and challenging math problem, has attracted much attention in recent years. It requires a solver to comprehensively understand both text and diagram, master essential geometry knowledge, and appropriately apply it in reasoning. However, existing works follow a paradigm of neural machine translation and only focus on enhancing the capability of enc…

    Submitted 9 May, 2024; originally announced May 2024.

    Comments: IJCAI 2024 Accepted

  22. arXiv:2405.05941  [pdf, other]

    cs.RO cs.CV cs.LG

    Evaluating Real-World Robot Manipulation Policies in Simulation

    Authors: Xuanlin Li, Kyle Hsu, Jiayuan Gu, Karl Pertsch, Oier Mees, Homer Rich Walke, Chuyuan Fu, Ishikaa Lunawat, Isabel Sieh, Sean Kirmani, Sergey Levine, Jiajun Wu, Chelsea Finn, Hao Su, Quan Vuong, Ted Xiao

    Abstract: The field of robotics has made significant advances towards generalist robot manipulation policies. However, real-world evaluation of such policies is not scalable and faces reproducibility challenges, which are likely to worsen as policies broaden the spectrum of tasks they can perform. We identify control and visual disparities between real and simulated environments as key challenges for reliab…

    Submitted 9 May, 2024; originally announced May 2024.

  23. arXiv:2405.01649  [pdf, other]

    cs.CL

    Improving Complex Reasoning over Knowledge Graph with Logic-Aware Curriculum Tuning

    Authors: Tianle Xia, Liang Ding, Guojia Wan, Yibing Zhan, Bo Du, Dacheng Tao

    Abstract: Answering complex queries over incomplete knowledge graphs (KGs) is a challenging job. Most previous works have focused on learning entity/relation embeddings and simulating first-order logic operators with various neural networks. However, they are bottlenecked by the inability to share world knowledge to improve logical reasoning, thus resulting in suboptimal performance. In this paper, we propo…

    Submitted 8 May, 2024; v1 submitted 2 May, 2024; originally announced May 2024.

  24. arXiv:2404.18930  [pdf, other]

    cs.CV

    Hallucination of Multimodal Large Language Models: A Survey

    Authors: Zechen Bai, Pichao Wang, Tianjun Xiao, Tong He, Zongbo Han, Zheng Zhang, Mike Zheng Shou

    Abstract: This survey presents a comprehensive analysis of the phenomenon of hallucination in multimodal large language models (MLLMs), also known as Large Vision-Language Models (LVLMs), which have demonstrated significant advancements and remarkable abilities in multimodal tasks. Despite these promising developments, MLLMs often generate outputs that are inconsistent with the visual content, a challenge k…

    Submitted 29 April, 2024; originally announced April 2024.

    Comments: 140 references

  25. arXiv:2404.13885  [pdf, other]

    cs.CY cs.AI cs.CL cs.LG

    Surveying Attitudinal Alignment Between Large Language Models Vs. Humans Towards 17 Sustainable Development Goals

    Authors: Qingyang Wu, Ying Xu, Tingsong Xiao, Yunze Xiao, Yitong Li, Tianyang Wang, Yichi Zhang, Shanghai Zhong, Yuwei Zhang, Wei Lu, Yifan Yang

    Abstract: Large Language Models (LLMs) have emerged as potent tools for advancing the United Nations' Sustainable Development Goals (SDGs). However, the attitudinal disparities between LLMs and humans towards these goals can pose significant challenges. This study conducts a comprehensive review and analysis of the existing literature on the attitudes of LLMs towards the 17 SDGs, emphasizing the comparison…

    Submitted 22 April, 2024; originally announced April 2024.

  26. arXiv:2404.08679  [pdf, other]

    cs.CL cs.AI cs.LG stat.ML

    Your Finetuned Large Language Model is Already a Powerful Out-of-distribution Detector

    Authors: Andi Zhang, Tim Z. Xiao, Weiyang Liu, Robert Bamler, Damon Wischik

    Abstract: We revisit the likelihood ratio between a pretrained large language model (LLM) and its finetuned variant as a criterion for out-of-distribution (OOD) detection. The intuition behind such a criterion is that, the pretrained LLM has the prior knowledge about OOD data due to its large amount of training data, and once finetuned with the in-distribution data, the LLM has sufficient knowledge to disti…

    Submitted 7 April, 2024; originally announced April 2024.

  27. Direct May Not Be the Best: An Incremental Evolution View of Pose Generation

    Authors: Yuelong Li, Tengfei Xiao, Lei Geng, Jianming Wang

    Abstract: Pose diversity is an inherent representative characteristic of 2D images. Due to the 3D to 2D projection mechanism, there is evident content discrepancy among distinct pose images. This is the main obstacle bothering pose transformation related researches. To deal with this challenge, we propose a fine-grained incremental evolution centered pose generation framework, rather than traditional direct…

    Submitted 15 April, 2024; v1 submitted 12 April, 2024; originally announced April 2024.

    Comments: Accepted at AAAI2024

  28. arXiv:2404.01077  [pdf, other]

    cs.CL

    Efficient Prompting Methods for Large Language Models: A Survey

    Authors: Kaiyan Chang, Songcheng Xu, Chenglong Wang, Yingfeng Luo, Tong Xiao, Jingbo Zhu

    Abstract: Prompting has become a mainstream paradigm for adapting large language models (LLMs) to specific natural language processing tasks. While this approach opens the door to in-context learning of LLMs, it brings the additional computational burden of model inference and human effort of manual-designed prompts, particularly when using lengthy and complex prompts to guide and control the behavior of LL…

    Submitted 1 April, 2024; originally announced April 2024.

  29. arXiv:2404.00978  [pdf, other]

    cs.CL

    Prior Constraints-based Reward Model Training for Aligning Large Language Models

    Authors: Hang Zhou, Chenglong Wang, Yimin Hu, Tong Xiao, Chunliang Zhang, Jingbo Zhu

    Abstract: Reinforcement learning with human feedback for aligning large language models (LLMs) trains a reward model typically using ranking loss with comparison pairs. However, the training procedure suffers from an inherent problem: the uncontrolled scaling of reward scores during reinforcement learning due to the lack of constraints while training the reward model. This paper proposes a Prior Constraints-b…

    Submitted 1 April, 2024; originally announced April 2024.

  30. arXiv:2403.19899  [pdf, other]

    cs.IR

    Inclusive Design Insights from a Preliminary Image-Based Conversational Search Systems Evaluation

    Authors: Yue Zheng, Lei Yu, Junmian Chen, Tianyu Xia, Yuanyuan Yin, Shan Wang, Haiming Liu

    Abstract: The digital realm has witnessed the rise of various search modalities, among which the Image-Based Conversational Search System stands out. This research delves into the design, implementation, and evaluation of this specific system, juxtaposing it against its text-based and mixed counterparts. A diverse participant cohort ensures a broad evaluation spectrum. Advanced tools facilitate emotion anal…

    Submitted 28 March, 2024; originally announced March 2024.

  31. arXiv:2403.12945  [pdf, other]

    cs.RO

    DROID: A Large-Scale In-The-Wild Robot Manipulation Dataset

    Authors: Alexander Khazatsky, Karl Pertsch, Suraj Nair, Ashwin Balakrishna, Sudeep Dasari, Siddharth Karamcheti, Soroush Nasiriany, Mohan Kumar Srirama, Lawrence Yunliang Chen, Kirsty Ellis, Peter David Fagan, Joey Hejna, Masha Itkina, Marion Lepert, Yecheng Jason Ma, Patrick Tree Miller, Jimmy Wu, Suneel Belkhale, Shivin Dass, Huy Ha, Arhan Jain, Abraham Lee, Youngwoon Lee, Marius Memmel, Sungjae Park, et al. (74 additional authors not shown)

    Abstract: The creation of large, diverse, high-quality robot manipulation datasets is an important stepping stone on the path toward more capable and robust robotic manipulation policies. However, creating such datasets is challenging: collecting robot manipulation data in diverse environments poses logistical and safety challenges and requires substantial investments in hardware and human labour. As a resu…

    Submitted 19 March, 2024; originally announced March 2024.

    Comments: Project website: https://droid-dataset.github.io/

  32. arXiv:2403.12373  [pdf, other]

    cs.CL

    RankPrompt: Step-by-Step Comparisons Make Language Models Better Reasoners

    Authors: Chi Hu, Yuan Ge, Xiangnan Ma, Hang Cao, Qiang Li, Yonghua Yang, Tong Xiao, Jingbo Zhu

    Abstract: Large Language Models (LLMs) have achieved impressive performance across various reasoning tasks. However, even state-of-the-art LLMs such as ChatGPT are prone to logical errors during their reasoning processes. Existing solutions, such as deploying task-specific verifiers or voting over multiple reasoning paths, either require extensive human annotations or fail in scenarios with inconsistent res…

    Submitted 22 March, 2024; v1 submitted 18 March, 2024; originally announced March 2024.

    Comments: LREC-Coling 2024 Long Paper

  33. arXiv:2403.10572  [pdf, other]

    cs.LG cs.SI

    Discovering Invariant Neighborhood Patterns for Heterophilic Graphs

    Authors: Ruihao Zhang, Zhengyu Chen, Teng Xiao, Yueyang Wang, Kun Kuang

    Abstract: This paper studies the problem of distribution shifts on non-homophilous graphs. Most existing graph neural network methods rely on the homophilous assumption that nodes from the same class are more likely to be linked. However, such assumptions of homophily do not always hold in real-world graphs, which leads to more complex distribution shifts unaccounted for in previous methods. The distribut…

    Submitted 14 March, 2024; originally announced March 2024.

    Comments: 11 pages, 11 figures

  34. arXiv:2403.09605  [pdf, other]

    cs.CV cs.AI

    Counterfactual contrastive learning: robust representations via causal image synthesis

    Authors: Melanie Roschewitz, Fabio De Sousa Ribeiro, Tian Xia, Galvin Khara, Ben Glocker

    Abstract: Contrastive pretraining is well-known to improve downstream task performance and model generalisation, especially in limited label settings. However, it is sensitive to the choice of augmentation pipeline. Positive pairs should preserve semantic information while destroying domain-specific information. Standard augmentation pipelines emulate domain-specific changes with pre-defined photometric tra…

    Submitted 14 March, 2024; originally announced March 2024.

    Comments: Code available at https://github.com/biomedia-mira/counterfactual-contrastive

  35. arXiv:2403.09422  [pdf, other]

    cs.CV cs.AI

    Mitigating attribute amplification in counterfactual image generation

    Authors: Tian Xia, Mélanie Roschewitz, Fabio De Sousa Ribeiro, Charles Jones, Ben Glocker

    Abstract: Causal generative modelling is gaining interest in medical imaging due to its ability to answer interventional and counterfactual queries. Most work focuses on generating counterfactual images that look plausible, using auxiliary classifiers to enforce effectiveness of simulated interventions. We investigate pitfalls in this approach, discovering the issue of attribute amplification, where unrelat…

    Submitted 14 March, 2024; originally announced March 2024.

  36. arXiv:2403.09073  [pdf, other]

    cs.CL

    Large Language Models are Parallel Multilingual Learners

    Authors: Yongyu Mu, Peinan Feng, Zhiquan Cao, Yuzhang Wu, Bei Li, Chenglong Wang, Tong Xiao, Kai Song, Tongran Liu, Chunliang Zhang, Jingbo Zhu

    Abstract: In this study, we reveal an in-context learning (ICL) capability of multilingual large language models (LLMs): by translating the input to several languages, we provide Parallel Input in Multiple Languages (PiM) to LLMs, which significantly enhances their comprehension abilities. To test this capability, we design extensive experiments encompassing 8 typical datasets, 7 languages and 8 state-of-th…

    Submitted 13 March, 2024; originally announced March 2024.

    Comments: Work in progress

  37. arXiv:2403.08167  [pdf, other]

    cs.LG cs.CL q-bio.QM

    MolBind: Multimodal Alignment of Language, Molecules, and Proteins

    Authors: Teng Xiao, Chao Cui, Huaisheng Zhu, Vasant G. Honavar

    Abstract: Recent advancements in biology and chemistry have leveraged multi-modal learning, integrating molecules and their natural language descriptions to enhance drug discovery. However, current pre-training frameworks are limited to two modalities, and designing a unified network to process different modalities (e.g., natural language, 2D molecular graphs, 3D molecular conformations, and 3D proteins) re…

    Submitted 2 April, 2024; v1 submitted 12 March, 2024; originally announced March 2024.

  38. arXiv:2403.07179  [pdf, other]

    cs.LG cs.CL q-bio.BM

    3M-Diffusion: Latent Multi-Modal Diffusion for Text-Guided Generation of Molecular Graphs

    Authors: Huaisheng Zhu, Teng Xiao, Vasant G Honavar

    Abstract: Generating molecules with desired properties is a critical task with broad applications in drug discovery and materials design. Inspired by recent advances in large language models, there is a growing interest in using natural language descriptions of molecules to generate molecules with the desired properties. Most existing methods focus on generating molecules that precisely match the text descr…

    Submitted 11 March, 2024; originally announced March 2024.

  39. arXiv:2403.05110  [pdf, other]

    cs.RO cs.AI cs.LG

    Efficient Data Collection for Robotic Manipulation via Compositional Generalization

    Authors: Jensen Gao, Annie Xie, Ted Xiao, Chelsea Finn, Dorsa Sadigh

    Abstract: Data collection has become an increasingly important problem in robotic manipulation, yet there still lacks much understanding of how to effectively collect data to facilitate broad generalization. Recent works on large-scale robotic data collection typically vary many environmental factors of variation (e.g., object types, table textures) during data collection, to cover a diverse range of scenar…

    Submitted 21 May, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

    Comments: RSS 2024

  40. arXiv:2403.05054  [pdf, other]

    math.OC cs.LG

    A Sinkhorn-type Algorithm for Constrained Optimal Transport

    Authors: Xun Tang, Holakou Rahmanian, Michael Shavlovsky, Kiran Koshy Thekumparampil, Tesi Xiao, Lexing Ying

    Abstract: Entropic optimal transport (OT) and the Sinkhorn algorithm have made it practical for machine learning practitioners to perform the fundamental task of calculating transport distance between statistical distributions. In this work, we focus on a general class of OT problems under a combination of equality and inequality constraints. We derive the corresponding entropy regularization formulation an…

    Submitted 8 March, 2024; originally announced March 2024.

  41. arXiv:2403.03950  [pdf, other]

    cs.LG cs.AI stat.ML

    Stop Regressing: Training Value Functions via Classification for Scalable Deep RL

    Authors: Jesse Farebrother, Jordi Orbay, Quan Vuong, Adrien Ali Taïga, Yevgen Chebotar, Ted Xiao, Alex Irpan, Sergey Levine, Pablo Samuel Castro, Aleksandra Faust, Aviral Kumar, Rishabh Agarwal

    Abstract: Value functions are a central component of deep reinforcement learning (RL). These functions, parameterized by neural networks, are trained using a mean squared error regression objective to match bootstrapped target values. However, scaling value-based RL methods that use regression to large networks, such as high-capacity Transformers, has proven challenging. This difficulty is in stark contrast…

    Submitted 6 March, 2024; originally announced March 2024.

  42. arXiv:2403.02709  [pdf, other]

    cs.RO

    RT-Sketch: Goal-Conditioned Imitation Learning from Hand-Drawn Sketches

    Authors: Priya Sundaresan, Quan Vuong, Jiayuan Gu, Peng Xu, Ted Xiao, Sean Kirmani, Tianhe Yu, Michael Stark, Ajinkya Jain, Karol Hausman, Dorsa Sadigh, Jeannette Bohg, Stefan Schaal

    Abstract: Natural language and images are commonly used as goal representations in goal-conditioned imitation learning (IL). However, natural language can be ambiguous and images can be over-specified. In this work, we propose hand-drawn sketches as a modality for goal specification in visual imitation learning. Sketches are easy for users to provide on the fly like language, but similar to images they can…

    Submitted 5 March, 2024; originally announced March 2024.

  43. arXiv:2403.02332  [pdf, other]

    cs.CV

    UniCtrl: Improving the Spatiotemporal Consistency of Text-to-Video Diffusion Models via Training-Free Unified Attention Control

    Authors: Xuweiyi Chen, Tian Xia, Sihan Xu

    Abstract: Video Diffusion Models have been developed for video generation, usually integrating text and image conditioning to enhance control over the generated content. Despite the progress, ensuring consistency across frames remains a challenge, particularly when using text prompts as control conditions. To address this problem, we introduce UniCtrl, a novel, plug-and-play method that is universally appli…

    Submitted 6 March, 2024; v1 submitted 4 March, 2024; originally announced March 2024.

    Comments: Github: https://github.com/XuweiyiChen/UniCtrl Website: https://unified-attention-control.github.io/

  44. arXiv:2403.01968  [pdf, other]

    cs.CV

    Explicit Motion Handling and Interactive Prompting for Video Camouflaged Object Detection

    Authors: Xin Zhang, Tao Xiao, Gepeng Ji, Xuan Wu, Keren Fu, Qijun Zhao

    Abstract: Camouflage poses challenges in distinguishing a static target, whereas any movement of the target can break this disguise. Existing video camouflaged object detection (VCOD) approaches take noisy motion estimation as input or model motion implicitly, restricting detection performance in complex dynamic scenes. In this paper, we propose a novel Explicit Motion handling and Interactive Prompting fra…

    Submitted 4 March, 2024; originally announced March 2024.

    Comments: 9 pages, 6 figures

  45. arXiv:2403.01823  [pdf, other]

    cs.RO cs.AI

    RT-H: Action Hierarchies Using Language

    Authors: Suneel Belkhale, Tianli Ding, Ted Xiao, Pierre Sermanet, Quan Vuong, Jonathan Tompson, Yevgen Chebotar, Debidatta Dwibedi, Dorsa Sadigh

    Abstract: Language provides a way to break down complex concepts into digestible pieces. Recent works in robot imitation learning use language-conditioned policies that predict actions given visual observations and the high-level task specified in language. These methods leverage the structure of natural language to share data between semantically similar tasks (e.g., "pick coke can" and "pick an apple") in…

    Submitted 31 May, 2024; v1 submitted 4 March, 2024; originally announced March 2024.

  46. arXiv:2403.01548  [pdf, other]

    cs.CL cs.AI cs.LG

    In-Context Sharpness as Alerts: An Inner Representation Perspective for Hallucination Mitigation

    Authors: Shiqi Chen, Miao Xiong, Junteng Liu, Zhengxuan Wu, Teng Xiao, Siyang Gao, Junxian He

    Abstract: Large language models (LLMs) frequently hallucinate and produce factual errors, yet our understanding of why they make these errors remains limited. In this study, we delve into the underlying mechanisms of LLM hallucinations from the perspective of inner representations, and discover a salient pattern associated with hallucinations: correct generations tend to have sharper context activations in…

    Submitted 12 March, 2024; v1 submitted 3 March, 2024; originally announced March 2024.

    Comments: code repo is available at: https://github.com/hkust-nlp/Activation_decoding.git

  47. arXiv:2402.18191  [pdf, other]

    cs.CL

    Clustering and Ranking: Diversity-preserved Instruction Selection through Expert-aligned Quality Estimation

    Authors: Yuan Ge, Yilun Liu, Chi Hu, Weibin Meng, Shimin Tao, Xiaofeng Zhao, Hongxia Ma, Li Zhang, Hao Yang, Tong Xiao

    Abstract: With contributions from the open-source community, a vast amount of instruction tuning (IT) data has emerged. Given the significant resource allocation required by training and evaluating models, it is advantageous to have an efficient method for selecting high-quality IT data. However, existing methods for instruction data selection have limitations such as relying on fragile external APIs, being…

    Submitted 28 February, 2024; originally announced February 2024.

  48. arXiv:2402.15813  [pdf, other]

    cs.CL cs.GT

    Measuring Bargaining Abilities of LLMs: A Benchmark and A Buyer-Enhancement Method

    Authors: Tian Xia, Zhiwei He, Tong Ren, Yibo Miao, Zhuosheng Zhang, Yang Yang, Rui Wang

    Abstract: Bargaining is an important and unique part of negotiation between humans. As LLM-driven agents learn to negotiate and act like real humans, how to evaluate agents' bargaining abilities remains an open problem. For the first time, we formally described the Bargaining task as an asymmetric incomplete information game, defining the gains of the Buyer and Seller in multiple bargaining processes. It al…

    Submitted 4 June, 2024; v1 submitted 24 February, 2024; originally announced February 2024.

    Comments: Accepted by ACL 2024 Findings. The dataset AmazonHistoryPrice and our code are available at https://github.com/TianXiaSJTU/AmazonPriceHistory

  49. arXiv:2402.11450  [pdf, other]

    cs.RO

    Learning to Learn Faster from Human Feedback with Language Model Predictive Control

    Authors: Jacky Liang, Fei Xia, Wenhao Yu, Andy Zeng, Montserrat Gonzalez Arenas, Maria Attarian, Maria Bauza, Matthew Bennice, Alex Bewley, Adil Dostmohamed, Chuyuan Kelly Fu, Nimrod Gileadi, Marissa Giustina, Keerthana Gopalakrishnan, Leonard Hasenclever, Jan Humplik, Jasmine Hsu, Nikhil Joshi, Ben Jyenis, Chase Kew, Sean Kirmani, Tsang-Wei Edward Lee, Kuang-Huei Lee, Assaf Hurwitz Michaely, Joss Moore, et al. (25 additional authors not shown)

    Abstract: Large language models (LLMs) have been shown to exhibit a wide range of capabilities, such as writing robot code from language commands -- enabling non-experts to direct robot behaviors, modify them based on feedback, or compose them to perform new tasks. However, these capabilities (driven by in-context learning) are limited to short-term interactions, where users' feedback remains relevant for o…

    Submitted 31 May, 2024; v1 submitted 17 February, 2024; originally announced February 2024.

  50. arXiv:2402.10350  [pdf, other]

    cs.LG cs.AI

    Large Language Models for Forecasting and Anomaly Detection: A Systematic Literature Review

    Authors: Jing Su, Chufeng Jiang, Xin Jin, Yuxin Qiao, Tingsong Xiao, Hongda Ma, Rong Wei, Zhi Jing, Jiajun Xu, Junhong Lin

    Abstract: This systematic literature review comprehensively examines the application of Large Language Models (LLMs) in forecasting and anomaly detection, highlighting the current state of research, inherent challenges, and prospective future directions. LLMs have demonstrated significant potential in parsing and analyzing extensive datasets to identify patterns, predict future events, and detect anomalous…

    Submitted 15 February, 2024; originally announced February 2024.