Showing 1–50 of 1,667 results for author: Xu, H

Searching in archive cs.
  1. arXiv:2407.01458  [pdf, other]

    cs.LG cs.AI cs.GT econ.TH

    Contractual Reinforcement Learning: Pulling Arms with Invisible Hands

    Authors: Jibang Wu, Siyu Chen, Mengdi Wang, Huazheng Wang, Haifeng Xu

    Abstract: The agency problem emerges in today's large scale machine learning tasks, where the learners are unable to direct content creation or enforce data collection. In this work, we propose a theoretical framework for aligning economic interests of different stakeholders in the online learning problems through contract design. The problem, termed \emph{contractual reinforcement learning}, naturally aris…

    Submitted 1 July, 2024; originally announced July 2024.

  2. arXiv:2407.01358  [pdf, other]

    cs.CL

    Evaluating Knowledge-based Cross-lingual Inconsistency in Large Language Models

    Authors: Xiaolin Xing, Zhiwei He, Haoyu Xu, Xing Wang, Rui Wang, Yu Hong

    Abstract: This paper investigates the cross-lingual inconsistencies observed in Large Language Models (LLMs), such as ChatGPT, Llama, and Baichuan, which have shown exceptional performance in various Natural Language Processing (NLP) tasks. Despite their successes, these models often exhibit significant inconsistencies when processing the same concepts across different languages. This study focuses on three…

    Submitted 1 July, 2024; originally announced July 2024.

  3. arXiv:2407.00631  [pdf, other]

    cs.LG cs.AI

    TrialBench: Multi-Modal Artificial Intelligence-Ready Clinical Trial Datasets

    Authors: Jintai Chen, Yaojun Hu, Yue Wang, Yingzhou Lu, Xu Cao, Miao Lin, Hongxia Xu, Jian Wu, Cao Xiao, Jimeng Sun, Lucas Glass, Kexin Huang, Marinka Zitnik, Tianfan Fu

    Abstract: Clinical trials are pivotal for developing new medical treatments, yet they typically pose some risks such as patient mortality, adverse events, and enrollment failure that waste immense efforts spanning over a decade. Applying artificial intelligence (AI) to forecast or simulate key events in clinical trials holds great potential for providing insights to guide trial designs. However, complex dat…

    Submitted 30 June, 2024; originally announced July 2024.

  4. arXiv:2407.00326  [pdf, other]

    cs.DC cs.AI cs.NI

    Teola: Towards End-to-End Optimization of LLM-based Applications

    Authors: Xin Tan, Yimin Jiang, Yitao Yang, Hong Xu

    Abstract: Large language model (LLM)-based applications consist of both LLM and non-LLM components, each contributing to the end-to-end latency. Despite great efforts to optimize LLM inference, end-to-end workflow optimization has been overlooked. Existing frameworks employ coarse-grained orchestration with task modules, which confines optimizations to within each module and yields suboptimal scheduling dec…

    Submitted 29 June, 2024; originally announced July 2024.

  5. arXiv:2406.19949  [pdf, other]

    cs.CL

    Calibrating LLMs with Preference Optimization on Thought Trees for Generating Rationale in Science Question Scoring

    Authors: Jiazheng Li, Hainiu Xu, Zhaoyue Sun, Yuxiang Zhou, David West, Cesare Aloisi, Yulan He

    Abstract: Generating rationales that justify scoring decisions has been a promising way to facilitate explainability in automated scoring systems. However, existing methods do not match the accuracy of classifier-based methods. Plus, the generated rationales often contain hallucinated information. To address these issues, we propose a novel framework capable of generating more faithful rationales and, more…

    Submitted 28 June, 2024; originally announced June 2024.

  6. arXiv:2406.19770  [pdf, other]

    cs.LG cs.AI

    Self-Supervised Spatial-Temporal Normality Learning for Time Series Anomaly Detection

    Authors: Yutong Chen, Hongzuo Xu, Guansong Pang, Hezhe Qiao, Yuan Zhou, Mingsheng Shang

    Abstract: Time Series Anomaly Detection (TSAD) finds widespread applications across various domains such as financial markets, industrial production, and healthcare. Its primary objective is to learn the normal patterns of time series data, thereby identifying deviations in test samples. Most existing TSAD methods focus on modeling data from the temporal dimension, while ignoring the semantic information in…

    Submitted 28 June, 2024; originally announced June 2024.

    Comments: 18 pages, 4 figures, accepted in ECML PKDD2024

  7. arXiv:2406.18049  [pdf]

    cs.CL cs.AI

    Improving Entity Recognition Using Ensembles of Deep Learning and Fine-tuned Large Language Models: A Case Study on Adverse Event Extraction from Multiple Sources

    Authors: Yiming Li, Deepthi Viswaroopan, William He, Jianfu Li, Xu Zuo, Hua Xu, Cui Tao

    Abstract: Adverse event (AE) extraction following COVID-19 vaccines from text data is crucial for monitoring and analyzing the safety profiles of immunizations. Traditional deep learning models are adept at learning intricate feature representations and dependencies in sequential data, but often require extensive labeled data. In contrast, large language models (LLMs) excel in understanding contextual infor…

    Submitted 25 June, 2024; originally announced June 2024.

  8. arXiv:2406.16374  [pdf, other]

    cs.CL

    KEHRL: Learning Knowledge-Enhanced Language Representations with Hierarchical Reinforcement Learning

    Authors: Dongyang Li, Taolin Zhang, Longtao Huang, Chengyu Wang, Xiaofeng He, Hui Xue

    Abstract: Knowledge-enhanced pre-trained language models (KEPLMs) leverage relation triples from knowledge graphs (KGs) and integrate these external data sources into language models via self-supervised learning. Previous works treat knowledge enhancement as two independent operations, i.e., knowledge injection and knowledge integration. In this paper, we propose to learn Knowledge-Enhanced language represe…

    Submitted 24 June, 2024; originally announced June 2024.

  9. arXiv:2406.16372  [pdf, other]

    cs.CL

    UniPSDA: Unsupervised Pseudo Semantic Data Augmentation for Zero-Shot Cross-Lingual Natural Language Understanding

    Authors: Dongyang Li, Taolin Zhang, Jiali Deng, Longtao Huang, Chengyu Wang, Xiaofeng He, Hui Xue

    Abstract: Cross-lingual representation learning transfers knowledge from resource-rich data to resource-scarce ones to improve the semantic understanding abilities of different languages. However, previous works rely on shallow unsupervised data generated by token surface matching, regardless of the global context-aware semantics of the surrounding text tokens. In this paper, we propose an Unsupervised Pseu…

    Submitted 24 June, 2024; originally announced June 2024.

  10. arXiv:2406.16367  [pdf, other]

    cs.IR

    On the Role of Long-tail Knowledge in Retrieval Augmented Large Language Models

    Authors: Dongyang Li, Junbing Yan, Taolin Zhang, Chengyu Wang, Xiaofeng He, Longtao Huang, Hui Xue, Jun Huang

    Abstract: Retrieval augmented generation (RAG) exhibits outstanding performance in promoting the knowledge capabilities of large language models (LLMs) with retrieved documents related to user queries. However, RAG only focuses on improving the response quality of LLMs via enhancing queries indiscriminately with retrieved information, paying little attention to what type of knowledge LLMs really need to ans…

    Submitted 24 June, 2024; originally announced June 2024.

  11. arXiv:2406.16203  [pdf, other]

    cs.CL

    LLMs' Classification Performance is Overclaimed

    Authors: Hanzi Xu, Renze Lou, Jiangshu Du, Vahid Mahzoon, Elmira Talebianaraki, Zhuoan Zhou, Elizabeth Garrison, Slobodan Vucetic, Wenpeng Yin

    Abstract: In many classification tasks designed for AI or human to solve, gold labels are typically included within the label space by default, often posed as "which of the following is correct?" This standard setup has traditionally highlighted the strong performance of advanced AI, particularly top-performing Large Language Models (LLMs), in routine classification tasks. However, when the gold label is in…

    Submitted 29 June, 2024; v1 submitted 23 June, 2024; originally announced June 2024.

  12. arXiv:2406.16005  [pdf, other]

    cs.DC

    A Tale of Two Paths: Toward a Hybrid Data Plane for Efficient Far-Memory Applications

    Authors: Lei Chen, Shi Liu, Chenxi Wang, Haoran Ma, Yifan Qiao, Zhe Wang, Chenggang Wu, Youyou Lu, Xiaobing Feng, Huimin Cui, Shan Lu, Harry Xu

    Abstract: With rapid advances in network hardware, far memory has gained a great deal of traction due to its ability to break the memory capacity wall. Existing far memory systems fall into one of two data paths: one that uses the kernel's paging system to transparently access far memory at the page granularity, and a second that bypasses the kernel, fetching data at the object granularity. While it is gene…

    Submitted 23 June, 2024; originally announced June 2024.

  13. arXiv:2406.15762  [pdf, other]

    cs.LG stat.ML

    Rethinking the Diffusion Models for Numerical Tabular Data Imputation from the Perspective of Wasserstein Gradient Flow

    Authors: Zhichao Chen, Haoxuan Li, Fangyikang Wang, Odin Zhang, Hu Xu, Xiaoyu Jiang, Zhihuan Song, Eric H. Wang

    Abstract: Diffusion models (DMs) have gained attention in Missing Data Imputation (MDI), but there remain two long-neglected issues to be addressed: (1). Inaccurate Imputation, which arises from inherently sample-diversification-pursuing generative process of DMs. (2). Difficult Training, which stems from intricate design required for the mask matrix in model training stage. To address these concerns within…

    Submitted 22 June, 2024; originally announced June 2024.

  14. arXiv:2406.15534  [pdf, other]

    cs.LG cs.AI cs.CL q-bio.QM

    Geneverse: A collection of Open-source Multimodal Large Language Models for Genomic and Proteomic Research

    Authors: Tianyu Liu, Yijia Xiao, Xiao Luo, Hua Xu, W. Jim Zheng, Hongyu Zhao

    Abstract: The applications of large language models (LLMs) are promising for biomedical and healthcare research. Despite the availability of open-source LLMs trained using a wide range of biomedical data, current research on the applications of LLMs to genomics and proteomics is still limited. To fill this gap, we propose a collection of finetuned LLMs and multimodal LLMs (MLLMs), known as Geneverse, for th…

    Submitted 21 June, 2024; originally announced June 2024.

    Comments: 8 pages

  15. arXiv:2406.15459  [pdf, other]

    cs.GT cs.CE cs.LG

    Large-Scale Contextual Market Equilibrium Computation through Deep Learning

    Authors: Yunxuan Ma, Yide Bian, Hao Xu, Weitao Yang, Jingshu Zhao, Zhijian Duan, Feng Wang, Xiaotie Deng

    Abstract: Market equilibrium is one of the most fundamental solution concepts in economics and social optimization analysis. Existing works on market equilibrium computation primarily focus on settings with a relatively small number of buyers. Motivated by this, our paper investigates the computation of market equilibrium in scenarios with a large-scale buyer population, where buyers and goods are represent…

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: 22 pages

  16. arXiv:2406.14773  [pdf, other]

    cs.CR

    Mitigating the Privacy Issues in Retrieval-Augmented Generation (RAG) via Pure Synthetic Data

    Authors: Shenglai Zeng, Jiankun Zhang, Pengfei He, Jie Ren, Tianqi Zheng, Hanqing Lu, Han Xu, Hui Liu, Yue Xing, Jiliang Tang

    Abstract: Retrieval-augmented generation (RAG) enhances the outputs of language models by integrating relevant information retrieved from external knowledge sources. However, when the retrieval process involves private data, RAG systems may face severe privacy risks, potentially leading to the leakage of sensitive information. To address this issue, we propose using synthetic data as a privacy-preserving al…

    Submitted 20 June, 2024; originally announced June 2024.

  17. arXiv:2406.13413  [pdf, other]

    eess.IV cs.CV

    Recurrent Inference Machine for Medical Image Registration

    Authors: Yi Zhang, Yidong Zhao, Hui Xue, Peter Kellman, Stefan Klein, Qian Tao

    Abstract: Image registration is essential for medical image applications where alignment of voxels across multiple images is needed for qualitative or quantitative analysis. With recent advancements in deep neural networks and parallel computing, deep learning-based medical image registration methods become competitive with their flexible modelling and fast inference capabilities. However, compared to tradi…

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: Preprint

  18. arXiv:2406.13123  [pdf, other]

    cs.AI cs.CV

    ViLCo-Bench: VIdeo Language COntinual learning Benchmark

    Authors: Tianqi Tang, Shohreh Deldari, Hao Xue, Celso De Melo, Flora D. Salim

    Abstract: Video language continual learning involves continuously adapting to information from video and text inputs, enhancing a model's ability to handle new tasks while retaining prior knowledge. This field is a relatively under-explored area, and establishing appropriate datasets is crucial for facilitating communication and research in this field. In this study, we present the first dedicated benchmark…

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: 14 pages, 4 figures, 8 tables, under review

  19. arXiv:2406.12913  [pdf, other]

    cs.LG cs.AI

    T-JEPA: A Joint-Embedding Predictive Architecture for Trajectory Similarity Computation

    Authors: Lihuan Li, Hao Xue, Yang Song, Flora Salim

    Abstract: Trajectory similarity computation is an essential technique for analyzing moving patterns of spatial data across various applications such as traffic management, wildlife tracking, and location-based services. Modern methods often apply deep learning techniques to approximate heuristic metrics but struggle to learn more robust and generalized representations from the vast amounts of unlabeled traj…

    Submitted 13 June, 2024; originally announced June 2024.

  20. arXiv:2406.12709  [pdf, other]

    cs.LG cs.AI

    Enhancing Spatio-temporal Quantile Forecasting with Curriculum Learning: Lessons Learned

    Authors: Du Yin, Jinliang Deng, Shuang Ao, Zechen Li, Hao Xue, Arian Prabowo, Renhe Jiang, Xuan Song, Flora Salim

    Abstract: Training models on spatio-temporal (ST) data poses an open problem due to the complicated and diverse nature of the data itself, and it is challenging to ensure the model's performance directly trained on the original ST data. While limiting the variety of training data can make training easier, it can also lead to a lack of knowledge and information for the model, resulting in a decrease in perfo…

    Submitted 18 June, 2024; originally announced June 2024.

  21. arXiv:2406.12693  [pdf, other]

    cs.LG cs.AI

    XXLTraffic: Expanding and Extremely Long Traffic Dataset for Ultra-Dynamic Forecasting Challenges

    Authors: Du Yin, Hao Xue, Arian Prabowo, Shuang Ao, Flora Salim

    Abstract: Traffic forecasting is crucial for smart cities and intelligent transportation initiatives, where deep learning has made significant progress in modeling complex spatio-temporal patterns in recent years. However, current public datasets have limitations in reflecting the ultra-dynamic nature of real-world scenarios, characterized by continuously evolving infrastructures, varying temporal distribut…

    Submitted 18 June, 2024; originally announced June 2024.

  22. arXiv:2406.12651  [pdf, other]

    cs.RO cs.AI cs.CL cs.HC

    Transforming Surgical Interventions with Embodied Intelligence for Ultrasound Robotics

    Authors: Huan Xu, Jinlin Wu, Guanglin Cao, Zhen Chen, Zhen Lei, Hongbin Liu

    Abstract: Ultrasonography has revolutionized non-invasive diagnostic methodologies, significantly enhancing patient outcomes across various medical domains. Despite its advancements, integrating ultrasound technology with robotic systems for automated scans presents challenges, including limited command understanding and dynamic execution capabilities. To address these challenges, this paper introduces a no…

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: This work has been accepted by MICCAI 2024

  23. arXiv:2406.12516  [pdf, other]

    cs.CR cs.DC cs.LG

    Update Selective Parameters: Federated Machine Unlearning Based on Model Explanation

    Authors: Heng Xu, Tianqing Zhu, Lefeng Zhang, Wanlei Zhou, Philip S. Yu

    Abstract: Federated learning is a promising privacy-preserving paradigm for distributed machine learning. In this context, there is sometimes a need for a specialized process called machine unlearning, which is required when the effect of some specific training samples needs to be removed from a learning model due to privacy, security, usability, and/or legislative factors. However, problems arise when curr…

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: Accepted by IEEE Transactions on Big Data

  24. arXiv:2406.12195  [pdf, other]

    quant-ph cs.LG

    Quantum Compiling with Reinforcement Learning on a Superconducting Processor

    Authors: Z. T. Wang, Qiuhao Chen, Yuxuan Du, Z. H. Yang, Xiaoxia Cai, Kaixuan Huang, Jingning Zhang, Kai Xu, Jun Du, Yinan Li, Yuling Jiao, Xingyao Wu, Wu Liu, Xiliang Lu, Huikai Xu, Yirong Jin, Ruixia Wang, Haifeng Yu, S. P. Zhao

    Abstract: To effectively implement quantum algorithms on noisy intermediate-scale quantum (NISQ) processors is a central task in modern quantum technology. NISQ processors feature tens to a few hundreds of noisy qubits with limited coherence times and gate operations with errors, so NISQ algorithms naturally require employing circuits of short lengths via quantum compilation. Here, we develop a reinforcemen…

    Submitted 17 June, 2024; originally announced June 2024.

  25. arXiv:2406.11931  [pdf, other]

    cs.SE cs.AI cs.LG

    DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence

    Authors: DeepSeek-AI, Qihao Zhu, Daya Guo, Zhihong Shao, Dejian Yang, Peiyi Wang, Runxin Xu, Y. Wu, Yukun Li, Huazuo Gao, Shirong Ma, Wangding Zeng, Xiao Bi, Zihui Gu, Hanwei Xu, Damai Dai, Kai Dong, Liyue Zhang, Yishi Piao, Zhibin Gou, Zhenda Xie, Zhewen Hao, Bingxuan Wang, Junxiao Song, Deli Chen, et al. (15 additional authors not shown)

    Abstract: We present DeepSeek-Coder-V2, an open-source Mixture-of-Experts (MoE) code language model that achieves performance comparable to GPT4-Turbo in code-specific tasks. Specifically, DeepSeek-Coder-V2 is further pre-trained from an intermediate checkpoint of DeepSeek-V2 with additional 6 trillion tokens. Through this continued pre-training, DeepSeek-Coder-V2 substantially enhances the coding and mathe…

    Submitted 17 June, 2024; originally announced June 2024.

  26. arXiv:2406.10954  [pdf, other]

    cs.LG cs.CR

    Towards Efficient Target-Level Machine Unlearning Based on Essential Graph

    Authors: Heng Xu, Tianqing Zhu, Lefeng Zhang, Wanlei Zhou, Wei Zhao

    Abstract: Machine unlearning is an emerging technology that has come to attract widespread attention. A number of factors, including regulations and laws, privacy, and usability concerns, have resulted in this need to allow a trained model to forget some of its training data. Existing studies of machine unlearning mainly focus on unlearning requests that forget a cluster of instances or all instances from o…

    Submitted 16 June, 2024; originally announced June 2024.

  27. arXiv:2406.10953  [pdf, other]

    cs.CR

    Really Unlearned? Verifying Machine Unlearning via Influential Sample Pairs

    Authors: Heng Xu, Tianqing Zhu, Lefeng Zhang, Wanlei Zhou

    Abstract: Machine unlearning enables pre-trained models to eliminate the effects of partial training samples. Previous research has mainly focused on proposing efficient unlearning strategies. However, the verification of machine unlearning, or in other words, how to guarantee that a sample has been successfully unlearned, has been overlooked for a long time. Existing verification schemes typically rely on…

    Submitted 16 June, 2024; originally announced June 2024.

  28. arXiv:2406.10951  [pdf, other]

    cs.CR

    Don't Forget Too Much: Towards Machine Unlearning on Feature Level

    Authors: Heng Xu, Tianqing Zhu, Wanlei Zhou, Wei Zhao

    Abstract: Machine unlearning enables pre-trained models to remove the effect of certain portions of training data. Previous machine unlearning schemes have mainly focused on unlearning a cluster of instances or all instances belonging to a specific class. These types of unlearning might have a significant impact on the model utility; and they may be inadequate for situations where we only need to unlearn fe…

    Submitted 16 June, 2024; originally announced June 2024.

  29. arXiv:2406.10794  [pdf, other]

    cs.CL

    Towards Understanding Jailbreak Attacks in LLMs: A Representation Space Analysis

    Authors: Yuping Lin, Pengfei He, Han Xu, Yue Xing, Makoto Yamada, Hui Liu, Jiliang Tang

    Abstract: Large language models (LLMs) are susceptible to a type of attack known as jailbreaking, which misleads LLMs to output harmful contents. Although there are diverse jailbreak attack strategies, there is no unified understanding on why some methods succeed and others fail. This paper explores the behavior of harmful and harmless prompts in the LLM's representation space to investigate the intrinsic p…

    Submitted 26 June, 2024; v1 submitted 15 June, 2024; originally announced June 2024.

  30. arXiv:2406.10746  [pdf, other]

    cs.CL cs.IR

    SparseCL: Sparse Contrastive Learning for Contradiction Retrieval

    Authors: Haike Xu, Zongyu Lin, Yizhou Sun, Kai-Wei Chang, Piotr Indyk

    Abstract: Contradiction retrieval refers to identifying and extracting documents that explicitly disagree with or refute the content of a query, which is important to many downstream applications like fact checking and data cleaning. To retrieve contradiction argument to the query from large document corpora, existing methods such as similarity search and crossencoder models exhibit significant limitations.…

    Submitted 15 June, 2024; originally announced June 2024.

  31. arXiv:2406.10671  [pdf]

    cs.CL

    Augmenting Biomedical Named Entity Recognition with General-domain Resources

    Authors: Yu Yin, Hyunjae Kim, Xiao Xiao, Chih Hsuan Wei, Jaewoo Kang, Zhiyong Lu, Hua Xu, Meng Fang, Qingyu Chen

    Abstract: Training a neural network-based biomedical named entity recognition (BioNER) model usually requires extensive and costly human annotations. While several studies have employed multi-task learning with multiple BioNER datasets to reduce human effort, this approach does not consistently yield performance improvements and may introduce label ambiguity in different biomedical corpora. We aim to tackle…

    Submitted 18 June, 2024; v1 submitted 15 June, 2024; originally announced June 2024.

    Comments: We make data, codes, and models publicly available via https://github.com/qingyu-qc/bioner_gerbera

  32. arXiv:2406.10517  [pdf, other]

    cs.IR cs.AI cs.LG

    ADSNet: Cross-Domain LTV Prediction with an Adaptive Siamese Network in Advertising

    Authors: Ruize Wang, Hui Xu, Ying Cheng, Qi He, Xing Zhou, Rui Feng, Wei Xu, Lei Huang, Jie Jiang

    Abstract: Advertising platforms have evolved in estimating Lifetime Value (LTV) to better align with advertisers' true performance metric. However, the sparsity of real-world LTV data presents a significant challenge to LTV predictive models (i.e., pLTV), severely limiting their capabilities. Therefore, we propose to utilize external data, in addition to the internal data of advertising platform, to expan…

    Submitted 15 June, 2024; originally announced June 2024.

    Comments: Accepted to KDD 2024

  33. arXiv:2406.10459  [pdf, other]

    cs.CL

    CancerLLM: A Large Language Model in Cancer Domain

    Authors: Mingchen Li, Anne Blaes, Steven Johnson, Hongfang Liu, Hua Xu, Rui Zhang

    Abstract: Medical Large Language Models (LLMs) such as ClinicalCamel 70B and Llama3-OpenBioLLM 70B have demonstrated impressive performance on a wide variety of medical NLP tasks. However, there is still no large language model (LLM) specifically designed for the cancer domain. Moreover, these LLMs typically have billions of parameters, making them computationally expensive for healthcare systems. Thus, in this stu…

    Submitted 14 June, 2024; originally announced June 2024.

  34. arXiv:2406.10160  [pdf, other]

    cs.SD cs.AI eess.AS

    One-pass Multiple Conformer and Foundation Speech Systems Compression and Quantization Using An All-in-one Neural Model

    Authors: Zhaoqing Li, Haoning Xu, Tianzi Wang, Shoukang Hu, Zengrui Jin, Shujie Hu, Jiajun Deng, Mingyu Cui, Mengzhe Geng, Xunying Liu

    Abstract: We propose a novel one-pass multiple ASR systems joint compression and quantization approach using an all-in-one neural model. A single compression cycle allows multiple nested systems with varying Encoder depths, widths, and quantization precision settings to be simultaneously constructed without the need to train and store individual target systems separately. Experiments consistently demonstrat…

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: Accepted by Interspeech 2024

  35. arXiv:2406.10157  [pdf, other]

    cs.RO cs.AI

    RoboGolf: Mastering Real-World Minigolf with a Reflective Multi-Modality Vision-Language Model

    Authors: Hantao Zhou, Tianying Ji, Jianwei Zhang, Fuchun Sun, Huazhe Xu

    Abstract: Minigolf, a game with countless court layouts and complex ball motion, constitutes a compelling real-world testbed for the study of embodied intelligence, as it not only challenges spatial and kinodynamic reasoning but also requires reflective and corrective capacities to address erroneously designed courses. We introduce RoboGolf, a framework that perceives dual-camera visual inputs with nested…

    Submitted 23 June, 2024; v1 submitted 14 June, 2024; originally announced June 2024.

    Comments: Project page: https://jity16.github.io/RoboGolf/

  36. arXiv:2406.10093  [pdf, other]

    cs.RO cs.LG

    BiKC: Keypose-Conditioned Consistency Policy for Bimanual Robotic Manipulation

    Authors: Dongjie Yu, Hang Xu, Yizhou Chen, Yi Ren, Jia Pan

    Abstract: Bimanual manipulation tasks typically involve multiple stages which require efficient interactions between two arms, posing step-wise and stage-wise challenges for imitation learning systems. Specifically, failure and delay of one step will broadcast through time, hinder success and efficiency of each sub-stage task, and thereby overall task performance. Although recent works have made strides in…

    Submitted 14 June, 2024; originally announced June 2024.

  37. arXiv:2406.09656  [pdf, other]

    cs.CV cs.AI cs.LG eess.IV

    RSEND: Retinex-based Squeeze and Excitation Network with Dark Region Detection for Efficient Low Light Image Enhancement

    Authors: Jingcheng Li, Ye Qiao, Haocheng Xu, Sitao Huang

    Abstract: Images captured under low-light scenarios often suffer from low quality. Previous CNN-based deep learning methods often involve using Retinex theory. Nevertheless, most of them cannot perform well in more complicated datasets like LOL-v2 while consuming too much computational resources. Besides, some of these methods require sophisticated training at different stages, making the procedure even mor…

    Submitted 13 June, 2024; originally announced June 2024.

  38. arXiv:2406.08990  [pdf, other]

    cs.LG

    BTS: Building Timeseries Dataset: Empowering Large-Scale Building Analytics

    Authors: Arian Prabowo, Xiachong Lin, Imran Razzak, Hao Xue, Emily W. Yap, Matthew Amos, Flora D. Salim

    Abstract: Buildings play a crucial role in human well-being, influencing occupant comfort, health, and safety. Additionally, they contribute significantly to global energy consumption, accounting for one-third of total energy usage, and carbon emissions. Optimizing building performance presents a vital opportunity to combat climate change and promote human flourishing. However, research in building analytic…

    Submitted 18 June, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

    Comments: 21 pages, 2 figures, 9 tables, under review

  39. arXiv:2406.07936  [pdf, other]

    cs.SE cs.PL

    Characterizing Unsafe Code Encapsulation In Real-world Rust Systems

    Authors: Zihao Rao, Yiran Yang, Hui Xu

    Abstract: Interior unsafe is an essential design paradigm advocated by the Rust community in system software development. However, there is little official guidance or few best practices regarding how to encapsulate unsafe code and achieve interior unsafe. The problem is critical because the Rust compiler is incapable of verifying the soundness of a safe function containing unsafe code. Falsely declaring an…

    Submitted 12 June, 2024; originally announced June 2024.

  40. arXiv:2406.07528  [pdf, other]

    cs.LG

    QuickLLaMA: Query-aware Inference Acceleration for Large Language Models

    Authors: Jingyao Li, Han Shi, Xin Jiang, Zhenguo Li, Hong Xu, Jiaya Jia

    Abstract: The capacity of Large Language Models (LLMs) to comprehend and reason over long contexts is pivotal for advancements in diverse fields. Yet, they still struggle with capturing long-distance dependencies within sequences to deeply understand semantics. To address this issue, we introduce Query-aware Inference for LLMs (Q-LLM), a system designed to process extensive sequences akin to human cognition.…

    Submitted 11 June, 2024; originally announced June 2024.

  41. arXiv:2406.07381  [pdf, other]

    cs.AI cs.LG

    World Models with Hints of Large Language Models for Goal Achieving

    Authors: Zeyuan Liu, Ziyu Huan, Xiyao Wang, Jiafei Lyu, Jian Tao, Xiu Li, Furong Huang, Huazhe Xu

    Abstract: Reinforcement learning struggles in the face of long-horizon tasks and sparse goals due to the difficulty in manual reward specification. While existing methods address this by adding intrinsic rewards, they may fail to provide meaningful guidance in long-horizon decision-making tasks with large state and action spaces, lacking purposeful exploration. Inspired by human cognition, we propose a new…

    Submitted 11 June, 2024; originally announced June 2024.

  42. arXiv:2406.07256  [pdf, ps, other]

    cs.SD cs.AI eess.AS

    AS-70: A Mandarin stuttered speech dataset for automatic speech recognition and stuttering event detection

    Authors: Rong Gong, Hongfei Xue, Lezhi Wang, Xin Xu, Qisheng Li, Lei Xie, Hui Bu, Shaomei Wu, Jiaming Zhou, Yong Qin, Binbin Zhang, Jun Du, Jia Bin, Ming Li

    Abstract: The rapid advancements in speech technologies over the past two decades have led to human-level performance in tasks like automatic speech recognition (ASR) for fluent speech. However, the efficacy of these models diminishes when applied to atypical speech, such as stuttering. This paper introduces AS-70, the first publicly available Mandarin stuttered speech dataset, which stands out as the large…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: Accepted by Interspeech 2024

  43. arXiv:2406.07091  [pdf, other]

    cs.CV

    AutoTVG: A New Vision-language Pre-training Paradigm for Temporal Video Grounding

    Authors: Xing Zhang, Jiaxi Gu, Haoyu Zhao, Shicong Wang, Hang Xu, Renjing Pei, Songcen Xu, Zuxuan Wu, Yu-Gang Jiang

    Abstract: Temporal Video Grounding (TVG) aims to localize a moment from an untrimmed video given the language description. Since the annotation of TVG is labor-intensive, TVG under limited supervision has attracted attention in recent years. The great success of vision-language pre-training guides TVG to follow the traditional "pre-training + fine-tuning" paradigm; however, the pre-training process would suf…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: Technique Report

  44. arXiv:2406.06220  [pdf, other]

    eess.AS cs.AI cs.CL cs.LG cs.SD

    Label-Looping: Highly Efficient Decoding for Transducers

    Authors: Vladimir Bataev, Hainan Xu, Daniel Galvez, Vitaly Lavrukhin, Boris Ginsburg

    Abstract: This paper introduces a highly efficient greedy decoding algorithm for Transducer inference. We propose a novel data structure using CUDA tensors to represent partial hypotheses in a batch that supports parallelized hypothesis manipulations. During decoding, our algorithm maximizes GPU parallelism by adopting a nested-loop design, where the inner loop consumes all blank predictions, while non-blan…

    Submitted 10 June, 2024; originally announced June 2024.

  45. arXiv:2406.05871  [pdf, other]

    cs.CV cs.LG

    OmniControlNet: Dual-stage Integration for Conditional Image Generation

    Authors: Yilin Wang, Haiyang Xu, Xiang Zhang, Zeyuan Chen, Zhizhou Sha, Zirui Wang, Zhuowen Tu

    Abstract: We provide a two-way integration for the widely adopted ControlNet by integrating external condition generation algorithms into a single dense prediction method and incorporating its individually trained image generation processes into a single model. Despite its tremendous success, the ControlNet of a two-stage pipeline bears limitations in being not self-contained (e.g. calls the external condit…

    Submitted 9 June, 2024; originally announced June 2024.

    Comments: Accepted to CVPR 2024 Workshop: Generative Models for Computer Vision

  46. arXiv:2406.05835  [pdf, other]

    cs.CV

    Mamba YOLO: SSMs-Based YOLO For Object Detection

    Authors: Zeyu Wang, Chen Li, Huiying Xu, Xinzhong Zhu

    Abstract: Propelled by the rapid advancement of deep learning technologies, the YOLO series has set a new benchmark for real-time object detectors. Researchers have continuously explored innovative applications of reparameterization, efficient layer aggregation networks, and anchor-free techniques on the foundation of YOLO. To further enhance detection performance, Transformer-based structures have been int…

    Submitted 9 June, 2024; originally announced June 2024.

  47. arXiv:2406.05187  [pdf, other]

    cs.GT cs.AI cs.HC cs.LG

    How to Strategize Human Content Creation in the Era of GenAI?

    Authors: Seyed A. Esmaeili, Kshipra Bhawalkar, Zhe Feng, Di Wang, Haifeng Xu

    Abstract: Generative AI (GenAI) will have a significant impact on content creation platforms. In this paper, we study the dynamic competition between a GenAI and a human contributor. Unlike the human, the GenAI's content only improves when more content is created by humans over time; however, GenAI has the advantage of generating content at a lower cost. We study the algorithmic problem in this dynamic c…

    Submitted 7 June, 2024; originally announced June 2024.

  48. arXiv:2406.04727  [pdf, other]

    cs.LG cond-mat.soft cs.AI

    Predicting Polymer Properties Based on Multimodal Multitask Pretraining

    Authors: Fanmeng Wang, Wentao Guo, Minjie Cheng, Shen Yuan, Hongteng Xu, Zhifeng Gao

    Abstract: In the past few decades, polymers, high-molecular-weight compounds formed by bonding numerous identical or similar monomers covalently, have played an essential role in various scientific fields. In this context, accurate prediction of their properties is becoming increasingly crucial. Typically, the properties of a polymer, such as plasticity, conductivity, bio-compatibility, and so on, are highl…

    Submitted 7 June, 2024; originally announced June 2024.

  49. arXiv:2406.03791  [pdf, other]

    cs.LG

    Speed of Light Exact Greedy Decoding for RNN-T Speech Recognition Models on GPU

    Authors: Daniel Galvez, Vladimir Bataev, Hainan Xu, Tim Kaldewey

    Abstract: The vast majority of inference time for RNN Transducer (RNN-T) models today is spent on decoding. Current state-of-the-art RNN-T decoding implementations leave the GPU idle ~80% of the time. Leveraging a new CUDA 12.4 feature, CUDA graph conditional nodes, we present an exact GPU-based implementation of greedy decoding for RNN-T models that eliminates this idle time. Our optimizations speed up a 1…

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: Interspeech 2024 Proceedings

  50. arXiv:2406.03757  [pdf, other]

    cs.RO cs.LG

    RoboCoder: Robotic Learning from Basic Skills to General Tasks with Large Language Models

    Authors: Jingyao Li, Pengguang Chen, Sitong Wu, Chuanyang Zheng, Hong Xu, Jiaya Jia

    Abstract: The emergence of Large Language Models (LLMs) has improved the prospects for robotic tasks. However, existing benchmarks are still limited to single tasks with limited generalization capabilities. In this work, we introduce a comprehensive benchmark and an autonomous learning framework, RoboCoder, aimed at enhancing the generalization capabilities of robots in complex environments. Unlike tradition…

    Submitted 6 June, 2024; originally announced June 2024.