Showing 1–50 of 318 results for author: Dai, X

Searching in archive cs.
  1. arXiv:2406.13149

    cs.CV

    High-Fidelity Facial Albedo Estimation via Texture Quantization

    Authors: Zimin Ran, Xingyu Ren, Xiang An, Kaicheng Yang, Xiangzi Dai, Ziyong Feng, Jia Guo, Linchao Zhu, Jiankang Deng

    Abstract: Recent 3D face reconstruction methods have made significant progress in shape estimation, but high-fidelity facial albedo reconstruction remains challenging. Existing methods depend on expensive light-stage captured data to learn facial albedo maps. However, a lack of diversity in subjects limits their ability to recover high-fidelity results. In this paper, we present a novel facial albedo recons…

    Submitted 18 June, 2024; originally announced June 2024.

  2. arXiv:2406.11100

    cs.CV

    An Analysis on Quantizing Diffusion Transformers

    Authors: Yuewei Yang, Jialiang Wang, Xiaoliang Dai, Peizhao Zhang, Hongbo Zhang

    Abstract: Diffusion Models (DMs) utilize an iterative denoising process to transform random noise into synthetic data. Initially proposed with a UNet structure, DMs excel at producing images that are virtually indistinguishable with or without conditioned text prompts. Later, a transformer-only structure was combined with DMs to achieve better performance. Though Latent Diffusion Models (LDMs) reduce the computa…

    Submitted 16 June, 2024; originally announced June 2024.

    Comments: CVPR T4V workshop

  3. arXiv:2406.10252

    cs.IR cs.AI cs.CL

    AutoSurvey: Large Language Models Can Automatically Write Surveys

    Authors: Yidong Wang, Qi Guo, Wen** Yao, Hongbo Zhang, Xin Zhang, Zhen Wu, Meishan Zhang, Xinyu Dai, Min Zhang, Qingsong Wen, Wei Ye, Shikun Zhang, Yue Zhang

    Abstract: This paper introduces AutoSurvey, a speedy and well-organized methodology for automating the creation of comprehensive literature surveys in rapidly evolving fields like artificial intelligence. Traditional survey paper creation faces challenges due to the vast volume and complexity of information, prompting the need for efficient survey methods. While large language models (LLMs) offer promise in…

    Submitted 17 June, 2024; v1 submitted 10 June, 2024; originally announced June 2024.

  4. arXiv:2406.04583

    cs.CL

    Extroversion or Introversion? Controlling The Personality of Your Large Language Models

    Authors: Yanquan Chen, Zhen Wu, Junjie Guo, Shujian Huang, Xinyu Dai

    Abstract: Large language models (LLMs) exhibit robust capabilities in text generation and comprehension, mimicking human behavior and exhibiting synthetic personalities. However, some LLMs have displayed offensive personality, propagating toxic discourse. Existing literature neglects the origin and evolution of LLM personalities, as well as the effective personality control. To fill these gaps, our study em…

    Submitted 6 June, 2024; originally announced June 2024.

  5. arXiv:2406.04374

    cs.IR cs.GT cs.LG stat.ML

    Dynamic Online Recommendation for Two-Sided Market with Bayesian Incentive Compatibility

    Authors: Yuantong Li, Guang Cheng, Xiaowu Dai

    Abstract: Recommender systems play a crucial role in internet economies by connecting users with relevant products or services. However, designing effective recommender systems faces two key challenges: (1) the exploration-exploitation tradeoff in balancing new product exploration against exploiting known preferences, and (2) dynamic incentive compatibility in accounting for users' self-interested behaviors…

    Submitted 4 June, 2024; originally announced June 2024.

  6. arXiv:2406.04334

    cs.CV

    DeepStack: Deeply Stacking Visual Tokens is Surprisingly Simple and Effective for LMMs

    Authors: Lingchen Meng, Jianwei Yang, Rui Tian, Xiyang Dai, Zuxuan Wu, Jianfeng Gao, Yu-Gang Jiang

    Abstract: Most large multimodal models (LMMs) are implemented by feeding visual tokens as a sequence into the first layer of a large language model (LLM). The resulting architecture is simple but significantly increases computation and memory costs, as it has to handle a large number of additional tokens in its input layer. This paper presents a new architecture DeepStack for LMMs. Considering $N$ layers in…

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: Project Page: https://deepstack-vl.github.io/

  7. arXiv:2406.03868

    cs.DC

    PALM: A Efficient Performance Simulator for Tiled Accelerators with Large-scale Model Training

    Authors: Jiahao Fang, Huizheng Wang, Qize Yang, Dehao Kong, Xu Dai, **yi Deng, Yang Hu, Shouyi Yin

    Abstract: Deep learning (DL) models are piquing high interest and scaling at an unprecedented rate. To this end, a handful of tiled accelerators have been proposed to support such large-scale training tasks. However, these accelerators often incorporate numerous cores or tiles even extending to wafer-scale, substantial on-chip bandwidth, and distributed memory systems. This results in an exceedingly complex…

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: 11 pages

  8. arXiv:2406.02368

    cs.IR cs.CL

    Large Language Models Make Sample-Efficient Recommender Systems

    Authors: Jianghao Lin, Xinyi Dai, Rong Shan, Bo Chen, Ruiming Tang, Yong Yu, Weinan Zhang

    Abstract: Large language models (LLMs) have achieved remarkable progress in the field of natural language processing (NLP), demonstrating remarkable abilities in producing text that resembles human language for various tasks. This opens up new opportunities for employing them in recommender systems (RSs). In this paper, we specifically examine the sample efficiency of LLM-enhanced recommender systems, which…

    Submitted 4 June, 2024; originally announced June 2024.

    Comments: Accepted by Frontier of Computer Science

  9. arXiv:2406.01062

    cs.CV

    SceneTextGen: Layout-Agnostic Scene Text Image Synthesis with Diffusion Models

    Authors: Qilong Zhangli, **dong Jiang, Di Liu, Licheng Yu, Xiaoliang Dai, Ankit Ramchandani, Guan Pang, Dimitris N. Metaxas, Praveen Krishnan

    Abstract: While diffusion models have significantly advanced the quality of image generation, their capability to accurately and coherently render text within these images remains a substantial challenge. Conventional diffusion-based methods for scene text generation are typically limited by their reliance on an intermediate layout output. This dependency often results in a constrained diversity of text sty…

    Submitted 10 June, 2024; v1 submitted 3 June, 2024; originally announced June 2024.

  10. arXiv:2406.00011

    cs.IR cs.AI

    DisCo: Towards Harmonious Disentanglement and Collaboration between Tabular and Semantic Space for Recommendation

    Authors: Kounianhua Du, Jizheng Chen, Jianghao Lin, Yunjia Xi, Hangyu Wang, Xinyi Dai, Bo Chen, Ruiming Tang, Weinan Zhang

    Abstract: Recommender systems play important roles in various applications such as e-commerce, social media, etc. Conventional recommendation methods usually model the collaborative signals within the tabular representation space. Despite the personalization modeling and the efficiency, the latent semantic dependencies are omitted. Methods that introduce semantics into recommendation then emerge, injecting…

    Submitted 4 June, 2024; v1 submitted 20 May, 2024; originally announced June 2024.

  11. arXiv:2405.18015

    cs.CL

    MultiADE: A Multi-domain Benchmark for Adverse Drug Event Extraction

    Authors: Xiang Dai, Sarvnaz Karimi, Abeed Sarker, Ben Hachey, Cecile Paris

    Abstract: Objective. Active adverse event surveillance monitors Adverse Drug Events (ADE) from different data sources, such as electronic health records, medical literature, social media and search engine logs. Over the years, many datasets have been created, and shared tasks organised, to facilitate active adverse event surveillance. However, most, if not all, datasets or shared tasks focus on extracting ADEs from…

    Submitted 28 May, 2024; originally announced May 2024.

    Comments: Under review; feedback welcome

  12. arXiv:2405.16587

    cs.LG cs.AI cs.HC

    Cost-Effective Online Multi-LLM Selection with Versatile Reward Models

    Authors: Xiangxiang Dai, ** Li, Xutong Liu, Anqi Yu, John C. S. Lui

    Abstract: With the rapid advancement of large language models (LLMs), the diversity of multi-LLM tasks and the variability in their pricing structures have become increasingly important, as costs can vary greatly between different LLMs. To tackle these challenges, we introduce C2MAB-V, a Cost-effective Combinatorial Multi-armed Bandit with …

    Submitted 26 May, 2024; originally announced May 2024.

    Comments: 29 pages, 12 figures, conference

  13. arXiv:2405.14129

    cs.CL cs.AI cs.CV

    AlignGPT: Multi-modal Large Language Models with Adaptive Alignment Capability

    Authors: Fei Zhao, Taotian Pang, Chunhui Li, Zhen Wu, Junjie Guo, Shangyu Xing, Xinyu Dai

    Abstract: Multimodal Large Language Models (MLLMs) are widely regarded as crucial in the exploration of Artificial General Intelligence (AGI). The core of MLLMs lies in their capability to achieve cross-modal alignment. To attain this goal, current MLLMs typically follow a two-phase training paradigm: the pre-training phase and the instruction-tuning phase. Despite their success, there are shortcomings in t…

    Submitted 22 May, 2024; originally announced May 2024.

    Comments: Code and models are available at https://aligngpt-vl.github.io/

  14. arXiv:2405.07038

    cs.GT cs.LG stat.ML

    Conformal Online Auction Design

    Authors: Jiale Han, Xiaowu Dai

    Abstract: This paper proposes the conformal online auction design (COAD), a novel mechanism for maximizing revenue in online auctions by quantifying the uncertainty in bidders' values without relying on assumptions about value distributions. COAD incorporates both the bidder and item features and leverages historical data to provide an incentive-compatible mechanism for online auctions. Unlike traditional m…

    Submitted 11 May, 2024; originally announced May 2024.

  15. arXiv:2405.03798

    cs.IT

    Update Rate, Accuracy, and Age of Information in a Wireless Sensor Network

    Authors: Xinlu Dai, Cyril Leung

    Abstract: Age of Information (AoI), namely the time that has elapsed since the most recently delivered packet was generated, is receiving increasing attention with the emergence of many real-time applications that rely on the exchange of time-sensitive information. AoI captures the freshness of the information from the perspective of the destination. The term "accuracy of information" is used to assess how…

    Submitted 6 May, 2024; originally announced May 2024.

  16. arXiv:2405.03501

    cs.LG cs.AI cs.CV

    Boosting Single Positive Multi-label Classification with Generalized Robust Loss

    Authors: Yanxi Chen, Chunxiao Li, Xinyang Dai, **huan Li, Weiyu Sun, Yiming Wang, Renyuan Zhang, Tinghe Zhang, Bo Wang

    Abstract: Multi-label learning (MLL) requires comprehensive multi-semantic annotations that are hard to fully obtain, thus often resulting in missing-label scenarios. In this paper, we investigate Single Positive Multi-label Learning (SPML), where each image is associated with merely one positive label. Existing SPML methods only focus on designing losses using mechanisms such as hard pseudo-labeling and ro…

    Submitted 6 May, 2024; originally announced May 2024.

    Comments: 14 pages, 5 figures, 6 tables

  17. arXiv:2405.03373

    cs.CV

    Knowledge-aware Text-Image Retrieval for Remote Sensing Images

    Authors: Li Mi, Xianjie Dai, Javiera Castillo-Navarro, Devis Tuia

    Abstract: Image-based retrieval in large Earth observation archives is challenging because one needs to navigate across thousands of candidate matches only with the query image as a guide. By using text as information supporting the visual query, the retrieval system gains in usability, but at the same time faces difficulties due to the diversity of visual signals that cannot be summarized by a short captio…

    Submitted 6 May, 2024; originally announced May 2024.

    Comments: Under review

  18. arXiv:2404.16484

    cs.CV eess.IV

    Real-Time 4K Super-Resolution of Compressed AVIF Images. AIS 2024 Challenge Survey

    Authors: Marcos V. Conde, Zhijun Lei, Wen Li, Cosmin Stejerean, Ioannis Katsavounidis, Radu Timofte, Kihwan Yoon, Ganzorig Gankhuyag, Jiangtao Lv, Long Sun, **shan Pan, Jiangxin Dong, **hui Tang, Zhiyuan Li, Hao Wei, Chenyang Ge, Dongyang Zhang, Tianle Liu, Huaian Chen, Yi **, Menghan Zhou, Yiqiang Yan, Si Gao, Biao Wu, Shaoli Liu, et al. (50 additional authors not shown)

    Abstract: This paper introduces a novel benchmark as part of the AIS 2024 Real-Time Image Super-Resolution (RTSR) Challenge, which aims to upscale compressed images from 540p to 4K resolution (4x factor) in real-time on commercial GPUs. For this, we use a diverse test set containing a variety of 4K images ranging from digital art to gaming and photography. The images are compressed using the modern AVIF cod…

    Submitted 25 April, 2024; originally announced April 2024.

    Comments: CVPR 2024, AI for Streaming (AIS) Workshop

  19. arXiv:2404.14248

    cs.CV

    NTIRE 2024 Challenge on Low Light Image Enhancement: Methods and Results

    Authors: Xiaoning Liu, Zongwei Wu, Ao Li, Florin-Alexandru Vasluianu, Yulun Zhang, Shuhang Gu, Le Zhang, Ce Zhu, Radu Timofte, Zhi **, Hongjun Wu, Chenxi Wang, Haitao Ling, Yuanhao Cai, Hao Bian, Yuxin Zheng, **g Lin, Alan Yuille, Ben Shao, ** Guo, Tianli Liu, Mohao Wu, Yixu Feng, Shuo Hou, Haotian Lin, et al. (87 additional authors not shown)

    Abstract: This paper reviews the NTIRE 2024 low light image enhancement challenge, highlighting the proposed solutions and results. The aim of this challenge is to discover an effective network design or solution capable of generating brighter, clearer, and visually appealing results when dealing with a variety of conditions, including ultra-high resolution (4K and beyond), non-uniform illumination, backlig…

    Submitted 22 April, 2024; originally announced April 2024.

    Comments: NTIRE 2024 Challenge Report

  20. arXiv:2404.14219

    cs.CL cs.AI

    Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone

    Authors: Marah Abdin, Sam Ade Jacobs, Ammar Ahmad Awan, Jyoti Aneja, Ahmed Awadallah, Hany Awadalla, Nguyen Bach, Amit Bahree, Arash Bakhtiari, Jianmin Bao, Harkirat Behl, Alon Benhaim, Misha Bilenko, Johan Bjorck, Sébastien Bubeck, Qin Cai, Martin Cai, Caio César Teodoro Mendes, Weizhu Chen, Vishrav Chaudhary, Dong Chen, Dongdong Chen, Yen-Chun Chen, Yi-Ling Chen, Parul Chopra, et al. (90 additional authors not shown)

    Abstract: We introduce phi-3-mini, a 3.8 billion parameter language model trained on 3.3 trillion tokens, whose overall performance, as measured by both academic benchmarks and internal testing, rivals that of models such as Mixtral 8x7B and GPT-3.5 (e.g., phi-3-mini achieves 69% on MMLU and 8.38 on MT-bench), despite being small enough to be deployed on a phone. The innovation lies entirely in our dataset…

    Submitted 23 May, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

    Comments: 19 pages

  21. S4TP: Social-Suitable and Safety-Sensitive Trajectory Planning for Autonomous Vehicles

    Authors: Xiao Wang, Ke Tang, Xingyuan Dai, **tao Xu, Quancheng Du, Rui Ai, Yuxiao Wang, Weihao Gu

    Abstract: On public roads, autonomous vehicles (AVs) face the challenge of frequent interactions with human-driven vehicles (HDVs), which render uncertain driving behavior due to varying social characteristics among humans. To effectively assess the risks prevailing in the vicinity of AVs in social interactive traffic scenarios and achieve safe autonomous driving, this article proposes a social-suitable and…

    Submitted 18 April, 2024; originally announced April 2024.

    Comments: 12 pages, 4 figures, published in IEEE Transactions on Intelligent Vehicles

  22. arXiv:2404.09778

    cs.CV

    The Devil is in the Few Shots: Iterative Visual Knowledge Completion for Few-shot Learning

    Authors: Yaohui Li, Qifeng Zhou, Haoxing Chen, Jianbing Zhang, Xinyu Dai, Hao Zhou

    Abstract: Contrastive Language-Image Pre-training (CLIP) has shown powerful zero-shot learning performance. Few-shot learning aims to further enhance the transfer capability of CLIP by giving few images in each class, aka 'few shots'. Most existing methods either implicitly learn from the few shots by incorporating learnable prompts or adapters, or explicitly embed them in a cache model for inference. Howev…

    Submitted 18 April, 2024; v1 submitted 15 April, 2024; originally announced April 2024.

  23. arXiv:2404.00702

    cs.IR

    Tired of Plugins? Large Language Models Can Be End-To-End Recommenders

    Authors: Wenlin Zhang, Chuhan Wu, Xiangyang Li, Yuhao Wang, Kuicai Dong, Yichao Wang, Xinyi Dai, Xiangyu Zhao, Huifeng Guo, Ruiming Tang

    Abstract: Recommender systems aim to predict user interest based on historical behavioral data. They are mainly designed in sequential pipelines, requiring lots of data to train different sub-systems, and are hard to scale to new domains. Recently, Large Language Models (LLMs) have demonstrated remarkable generalized capabilities, enabling a singular model to tackle diverse recommendation tasks across vario…

    Submitted 7 April, 2024; v1 submitted 31 March, 2024; originally announced April 2024.

  24. arXiv:2403.20289

    cs.CL cs.SD eess.AS

    Emotion-Anchored Contrastive Learning Framework for Emotion Recognition in Conversation

    Authors: Fangxu Yu, Junjie Guo, Zhen Wu, Xinyu Dai

    Abstract: Emotion Recognition in Conversation (ERC) involves detecting the underlying emotion behind each utterance within a conversation. Effectively generating representations for utterances remains a significant challenge in this task. Recent works propose various models to address this issue, but they still struggle with differentiating similar emotions such as excitement and happiness. To alleviate thi…

    Submitted 29 March, 2024; originally announced March 2024.

    Comments: Accepted by Findings of NAACL 2024

  25. arXiv:2403.19967

    cs.CV

    Rewrite the Stars

    Authors: Xu Ma, Xiyang Dai, Yue Bai, Yizhou Wang, Yun Fu

    Abstract: Recent studies have drawn attention to the untapped potential of the "star operation" (element-wise multiplication) in network design. While intuitive explanations abound, the foundational rationale behind its application remains largely unexplored. Our study attempts to reveal the star operation's ability to map inputs into high-dimensional, non-linear feature spaces -- akin to kernel tricks -- w…

    Submitted 29 March, 2024; originally announced March 2024.

    Comments: Accepted by CVPR 2024. Code is publicly available at https://github.com/ma-xu/Rewrite-the-Stars

  26. arXiv:2403.19963

    cs.CV

    Efficient Modulation for Vision Networks

    Authors: Xu Ma, Xiyang Dai, Jianwei Yang, Bin Xiao, Yinpeng Chen, Yun Fu, Lu Yuan

    Abstract: In this work, we present efficient modulation, a novel design for efficient vision networks. We revisit the modulation mechanism, which operates input through convolutional context modeling and feature projection layers, and fuses features via element-wise multiplication and an MLP block. We demonstrate that the modulation mechanism is particularly well suited for efficient networks and further ta…

    Submitted 28 March, 2024; originally announced March 2024.

    Comments: Accepted by ICLR 2024. Code is publicly available at https://github.com/ma-xu/EfficientMod

  27. arXiv:2403.16210

    cs.CV cs.AI cs.GR

    Frankenstein: Generating Semantic-Compositional 3D Scenes in One Tri-Plane

    Authors: Han Yan, Yang Li, Zhennan Wu, Shenzhou Chen, Weixuan Sun, Taizhang Shang, Weizhe Liu, Tian Chen, Xiaqiang Dai, Chao Ma, Hongdong Li, Pan Ji

    Abstract: We present Frankenstein, a diffusion-based framework that can generate semantic-compositional 3D scenes in a single pass. Unlike existing methods that output a single, unified 3D shape, Frankenstein simultaneously generates multiple separated shapes, each corresponding to a semantically meaningful part. The 3D scene information is encoded in one single tri-plane tensor, from which multiple Signed…

    Submitted 24 March, 2024; originally announced March 2024.

    Comments: Video: https://youtu.be/lRn-HqyCrLI

  28. arXiv:2403.16037

    cs.IR

    Knowledge-aware Dual-side Attribute-enhanced Recommendation

    Authors: Taotian Pang, Xingyu Lou, Fei Zhao, Zhen Wu, Kuiyao Dong, Qiuying Peng, Yue Qi, Xinyu Dai

    Abstract: Knowledge-aware recommendation methods (KGR) based on graph neural networks (GNNs) and contrastive learning (CL) have achieved promising performance. However, they fall short in modeling fine-grained user preferences and further fail to leverage the preference-attribute connection to make predictions, leading to sub-optimal performance. To address the issue, we…

    Submitted 24 March, 2024; originally announced March 2024.

  29. arXiv:2403.13263

    cs.CV

    SC-Tune: Unleashing Self-Consistent Referential Comprehension in Large Vision Language Models

    Authors: Tongtian Yue, Jie Cheng, Longteng Guo, Xingyuan Dai, Zijia Zhao, Xingjian He, Gang Xiong, Yisheng Lv, **g Liu

    Abstract: Recent trends in Large Vision Language Models (LVLMs) research have been increasingly focusing on advancing beyond general image understanding towards more nuanced, object-level referential comprehension. In this paper, we present and delve into the self-consistency capability of LVLMs, a crucial aspect that reflects the models' ability to both generate informative captions for specific objects an…

    Submitted 19 March, 2024; originally announced March 2024.

    Comments: Accepted by CVPR2024

  30. arXiv:2403.12995

    q-bio.BM cs.CE cs.LG

    ESM All-Atom: Multi-scale Protein Language Model for Unified Molecular Modeling

    Authors: Kangjie Zheng, Siyu Long, Tianyu Lu, Junwei Yang, Xinyu Dai, Ming Zhang, Zaiqing Nie, Wei-Ying Ma, Hao Zhou

    Abstract: Protein language models have demonstrated significant potential in the field of protein engineering. However, current protein language models primarily operate at the residue scale, which limits their ability to provide information at the atom level. This limitation prevents us from fully exploiting the capabilities of protein language models for applications involving both proteins and small mole…

    Submitted 12 June, 2024; v1 submitted 5 March, 2024; originally announced March 2024.

    Comments: ICML2024 camera-ready, update some experimental results, add github url, fix some typos

  31. arXiv:2403.12393

    cs.CL

    Dr3: Ask Large Language Models Not to Give Off-Topic Answers in Open Domain Multi-Hop Question Answering

    Authors: Yuan Gao, Yiheng Zhu, Yuanbin Cao, Yinzhi Zhou, Zhen Wu, Yujie Chen, Shenglan Wu, Haoyuan Hu, Xinyu Dai

    Abstract: Open Domain Multi-Hop Question Answering (ODMHQA) plays a crucial role in Natural Language Processing (NLP) by aiming to answer complex questions through multi-step reasoning over retrieved information from external knowledge sources. Recently, Large Language Models (LLMs) have demonstrated remarkable performance in solving ODMHQA owing to their capabilities including planning, reasoning, and util…

    Submitted 18 March, 2024; originally announced March 2024.

    Comments: LREC-COLING 2024, Long Paper

  32. arXiv:2403.10953

    cs.CV

    Ctrl123: Consistent Novel View Synthesis via Closed-Loop Transcription

    Authors: Hongxiang Zhao, Xili Dai, Jianan Wang, Shengbang Tong, **gyuan Zhang, Weida Wang, Lei Zhang, Yi Ma

    Abstract: Large image diffusion models have demonstrated zero-shot capability in novel view synthesis (NVS). However, existing diffusion-based NVS methods struggle to generate novel views that are accurately consistent with the corresponding ground truth poses and appearances, even on the training set. This consequently limits the performance of downstream tasks, such as image-to-multiview generation and 3D…

    Submitted 21 June, 2024; v1 submitted 16 March, 2024; originally announced March 2024.

  33. arXiv:2403.10413

    cs.CV

    Real-Time Image Segmentation via Hybrid Convolutional-Transformer Architecture Search

    Authors: Hongyuan Yu, Cheng Wan, Mengchen Liu, Dongdong Chen, Bin Xiao, Xiyang Dai

    Abstract: Image segmentation is one of the most fundamental problems in computer vision and has drawn a lot of attention due to its vast applications in image understanding and autonomous driving. However, designing effective and efficient segmentation neural architectures is a labor-intensive process that may require lots of trials by human experts. In this paper, we address the challenge of integrating m…

    Submitted 15 March, 2024; originally announced March 2024.

    Comments: 8 pages, 3 figures, submitted to IROS 2024

  34. arXiv:2403.09997

    cs.CL

    Identifying Health Risks from Family History: A Survey of Natural Language Processing Techniques

    Authors: Xiang Dai, Sarvnaz Karimi, Nathan O'Callaghan

    Abstract: Electronic health records include information on patients' status and medical history, which could cover the history of diseases and disorders that could be hereditary. One important use of family history information is in precision health, where the goal is to keep the population healthy with preventative measures. Natural Language Processing (NLP) and machine learning techniques can assist with…

    Submitted 14 March, 2024; originally announced March 2024.

    Comments: Under Review

  35. arXiv:2403.04134

    cs.RO

    An Adaptable, Safe, and Portable Robot-Assisted Feeding System

    Authors: Ethan Kroll Gordon, Rajat Kumar Jenamani, Amal Nanavati, Ziang Liu, Haya Bolotski, Raida Karim, Daniel Stabile, Atharva Kashyap, Bernie Hao Zhu, Xilai Dai, Tyler Schrenk, Jonathan Ko, Taylor Kessler Faulkner, Tapomayukh Bhattacharjee, Siddhartha Srinivasa

    Abstract: We demonstrate a robot-assisted feeding system that enables people with mobility impairments to feed themselves. Our system design embodies Safety, Portability, and User Control, with comprehensive full-stack safety checks, the ability to be mounted on and powered by any powered wheelchair, and a custom web-app allowing care-recipients to leverage their own assistive devices for robot control. For…

    Submitted 6 March, 2024; originally announced March 2024.

    Comments: HRI 2024 Demo; Corrected inaccurate author ordering in ACM DL which occurred due to formatting issues

  36. arXiv:2402.17257

    cs.LG cs.AI cs.RO

    RIME: Robust Preference-based Reinforcement Learning with Noisy Preferences

    Authors: Jie Cheng, Gang Xiong, Xingyuan Dai, Qinghai Miao, Yisheng Lv, Fei-Yue Wang

    Abstract: Preference-based Reinforcement Learning (PbRL) circumvents the need for reward engineering by harnessing human preferences as the reward signal. However, current PbRL methods excessively depend on high-quality feedback from domain experts, which results in a lack of robustness. In this paper, we present RIME, a robust PbRL algorithm for effective reward learning from noisy preferences. Our method…

    Submitted 30 May, 2024; v1 submitted 27 February, 2024; originally announced February 2024.

    Comments: Accepted by ICML2024

  37. arXiv:2402.12846

    cs.CV cs.AI

    ConVQG: Contrastive Visual Question Generation with Multimodal Guidance

    Authors: Li Mi, Syrielle Montariol, Javiera Castillo-Navarro, Xianjie Dai, Antoine Bosselut, Devis Tuia

    Abstract: Asking questions about visual environments is a crucial way for intelligent agents to understand rich multi-faceted scenes, raising the importance of Visual Question Generation (VQG) systems. Apart from being grounded to the image, existing VQG systems can use textual constraints, such as expected answers or knowledge triplets, to generate focused questions. These constraints allow VQG systems to…

    Submitted 20 February, 2024; originally announced February 2024.

    Comments: AAAI 2024. Project page at https://limirs.github.io/ConVQG

  38. arXiv:2402.11542

    cs.CL cs.AI

    Question Answering Over Spatio-Temporal Knowledge Graph

    Authors: Xinbang Dai, Huiying Li, Guilin Qi

    Abstract: Spatio-temporal knowledge graphs (STKGs) extend the concept of knowledge graphs (KGs) by incorporating time and location information. While the research community has focused on Knowledge Graph Question Answering (KGQA), the field of answering questions incorporating both spatio-temporal information based on STKGs remains largely unexplored. Furthermore, a lack of comprehensive datasets also has hinde…

    Submitted 18 February, 2024; originally announced February 2024.

    Comments: 11 pages, 4 figures

    ACM Class: I.2.4; I.2.7

  39. arXiv:2402.11541

    cs.CL cs.AI

    Large Language Models Can Better Understand Knowledge Graphs Than We Thought

    Authors: Xinbang Dai, Yuncheng Hua, Tongtong Wu, Yang Sheng, Qiu Ji, Guilin Qi

    Abstract: As the parameter scale of large language models (LLMs) grows, jointly training knowledge graph (KG) embeddings with model parameters to enhance LLM capabilities becomes increasingly costly. Consequently, the community has shown interest in developing prompt strategies that effectively integrate KG information into LLMs. However, the format for incorporating KGs into LLMs lacks standardization; for…

    Submitted 16 June, 2024; v1 submitted 18 February, 2024; originally announced February 2024.

    Comments: 15 pages

    ACM Class: I.2.4; I.2.7

  40. arXiv:2402.11273

    cs.CV cs.AI

    Semi-supervised Medical Image Segmentation Method Based on Cross-pseudo Labeling Leveraging Strong and Weak Data Augmentation Strategies

    Authors: Yifei Chen, Chenyan Zhang, Yifan Ke, Yiyu Huang, Xuezhou Dai, Feiwei Qin, Yongquan Zhang, Xiaodong Zhang, Changmiao Wang

    Abstract: Traditional supervised learning methods have historically encountered certain constraints in medical image segmentation due to the challenging collection process, high labeling cost, low signal-to-noise ratio, and complex features characterizing biomedical images. This paper proposes a semi-supervised model, DFCPS, which innovatively incorporates the Fixmatch concept. This significantly enhances t…

    Submitted 17 February, 2024; originally announced February 2024.

    Comments: 5 pages, 2 figures, accepted at ISBI 2024

    Journal ref: ISBI 2024

  41. arXiv:2402.10487

    cs.LG cs.AI

    RPMixer: Shaking Up Time Series Forecasting with Random Projections for Large Spatial-Temporal Data

    Authors: Chin-Chia Michael Yeh, Yujie Fan, Xin Dai, Uday Singh Saini, Vivian Lai, Prince Osei Aboagye, Junpeng Wang, Huiyuan Chen, Yan Zheng, Zhongfang Zhuang, Liang Wang, Wei Zhang

    Abstract: Spatial-temporal forecasting systems play a crucial role in addressing numerous real-world challenges. In this paper, we investigate the potential of addressing spatial-temporal forecasting problems using general time series forecasting models, i.e., models that do not leverage the spatial relationships among the nodes. We propose an all-Multi-Layer Perceptron (all-MLP) time series forecasting arch…

    Submitted 12 June, 2024; v1 submitted 16 February, 2024; originally announced February 2024.

  42. arXiv:2402.09801  [pdf, other]

    cs.CL cs.CV

    EFUF: Efficient Fine-grained Unlearning Framework for Mitigating Hallucinations in Multimodal Large Language Models

    Authors: Shangyu Xing, Fei Zhao, Zhen Wu, Tuo An, Weihao Chen, Chunhui Li, Jianbing Zhang, Xinyu Dai

    Abstract: Multimodal large language models (MLLMs) have attracted increasing attention in the past few years, but they may still generate descriptions that include objects not present in the corresponding images, a phenomenon known as object hallucination. To eliminate hallucinations, existing methods manually annotate paired responses with and without hallucinations, and then employ various alignment algor…

    Submitted 23 June, 2024; v1 submitted 15 February, 2024; originally announced February 2024.

  43. arXiv:2402.03174  [pdf, ps, other]

    eess.SY cs.LG

    Decentralized Event-Triggered Online Learning for Safe Consensus of Multi-Agent Systems with Gaussian Process Regression

    Authors: Xiaobing Dai, Zewen Yang, Mengtian Xu, Fangzhou Liu, Georges Hattab, Sandra Hirche

    Abstract: Consensus control in multi-agent systems has received significant attention and practical implementation across various domains. However, managing consensus control under unknown dynamics remains a significant challenge for control design due to system uncertainties and environmental disturbances. This paper presents a novel learning-based distributed control law, augmented by an auxiliary dynamic…

    Submitted 5 February, 2024; originally announced February 2024.

  44. arXiv:2402.03048  [pdf, other]

    cs.MA cs.LG eess.SY

    Cooperative Learning with Gaussian Processes for Euler-Lagrange Systems Tracking Control under Switching Topologies

    Authors: Zewen Yang, Songbo Dong, Armin Lederer, Xiaobing Dai, Siyu Chen, Stefan Sosnowski, Georges Hattab, Sandra Hirche

    Abstract: This work presents an innovative learning-based approach to tackle the tracking control problem of Euler-Lagrange multi-agent systems with partially unknown dynamics operating under switching communication topologies. The approach leverages a correlation-aware cooperative algorithm framework built upon Gaussian process regression, which adeptly captures inter-agent correlations for uncertainty pre…

    Submitted 5 February, 2024; originally announced February 2024.

    Comments: 8 pages

  45. arXiv:2402.03014  [pdf, other]

    cs.LG cs.AI

    Whom to Trust? Elective Learning for Distributed Gaussian Process Regression

    Authors: Zewen Yang, Xiaobing Dai, Akshat Dubey, Sandra Hirche, Georges Hattab

    Abstract: This paper introduces an innovative approach to enhance distributed cooperative learning using Gaussian process (GP) regression in multi-agent systems (MASs). The key contribution of this work is the development of an elective learning algorithm, namely prior-aware elective distributed GP (Pri-GP), which empowers agents with the capability to selectively request predictions from neighboring agents…

    Submitted 5 February, 2024; originally announced February 2024.

    Comments: 9 pages, conference preprint

  46. arXiv:2402.02012  [pdf, other]

    cs.CV

    Precise Knowledge Transfer via Flow Matching

    Authors: Shitong Shao, Zhiqiang Shen, Linrui Gong, Huanran Chen, Xu Dai

    Abstract: In this paper, we propose a novel knowledge transfer framework that introduces continuous normalizing flows for progressive knowledge transformation and leverages multi-step sampling strategies to achieve precise knowledge transfer. We name this framework Knowledge Transfer with Flow Matching (FM-KT), which can be integrated with a metric-based distillation method of any form (e.g., va…

    Submitted 2 February, 2024; originally announced February 2024.

  47. arXiv:2402.01001  [pdf, other]

    cs.DC

    Ensuring Data Privacy in AC Optimal Power Flow with a Distributed Co-Simulation Framework

    Authors: Xinliang Dai, Alexander Kocher, Jovana Kovačević, Burak Dindar, Yuning Jiang, Colin N. Jones, Hüseyin Çakmak, Veit Hagenmeyer

    Abstract: During the energy transition, the significance of collaborative management among institutions is rising, confronting challenges posed by data privacy concerns. Prevailing research on distributed approaches, as an alternative to centralized management, often lacks numerical convergence guarantees or is limited to single-machine numerical simulation. To address this, we present a distributed approac…

    Submitted 15 March, 2024; v1 submitted 1 February, 2024; originally announced February 2024.

  48. arXiv:2401.06952  [pdf, other]

    cs.LG cs.AI

    Reinforcement Learning for Scalable Train Timetable Rescheduling with Graph Representation

    Authors: Peng Yue, Yaochu Jin, Xuewu Dai, Zhenhua Feng, Dongliang Cui

    Abstract: Train timetable rescheduling (TTR) aims to promptly restore the original operation of trains after unexpected disturbances or disruptions. Currently, this work is still done manually by train dispatchers, which makes it challenging to maintain performance across varied problem instances. To mitigate this issue, this study proposes a reinforcement learning-based approach to TTR, which makes the following…

    Submitted 12 January, 2024; originally announced January 2024.

  49. arXiv:2401.06706  [pdf, other]

    cs.CL

    Multi-Candidate Speculative Decoding

    Authors: Sen Yang, Shujian Huang, Xinyu Dai, Jiajun Chen

    Abstract: Large language models have shown impressive capabilities across a variety of NLP tasks, yet generating text autoregressively is time-consuming. One way to speed them up is speculative decoding, which generates candidate segments (sequences of tokens) from a fast draft model; these are then verified in parallel by the target model. However, the acceptance rate of candidate tokens receives limit…

    Submitted 12 January, 2024; originally announced January 2024.

  50. arXiv:2401.02987  [pdf, other]

    cs.CL cs.AI

    Has Your Pretrained Model Improved? A Multi-head Posterior Based Approach

    Authors: Prince Aboagye, Yan Zheng, Junpeng Wang, Uday Singh Saini, Xin Dai, Michael Yeh, Yujie Fan, Zhongfang Zhuang, Shubham Jain, Liang Wang, Wei Zhang

    Abstract: The emergence of pre-trained models has significantly impacted fields ranging from Natural Language Processing (NLP) and Computer Vision to relational datasets. Traditionally, these models are assessed through fine-tuned downstream tasks. However, this raises the question of how to evaluate these models more efficiently and more effectively. In this study, we explore a novel approach where we leverage the meta-featur…

    Submitted 14 February, 2024; v1 submitted 2 January, 2024; originally announced January 2024.