Showing 1–50 of 215 results for author: Bian, J

  1. arXiv:2407.00128  [pdf, other]

    cs.IR cs.AI cs.LG

    When Search Engine Services meet Large Language Models: Visions and Challenges

    Authors: Haoyi Xiong, Jiang Bian, Yuchen Li, Xuhong Li, Mengnan Du, Shuaiqiang Wang, Dawei Yin, Sumi Helal

    Abstract: Combining Large Language Models (LLMs) with search engine services marks a significant shift in the field of services computing, opening up new possibilities to enhance how we search for and retrieve information, understand content, and interact with internet services. This paper conducts an in-depth examination of how integrating LLMs with search engines can mutually benefit both technologies. We… ▽ More

    Submitted 27 June, 2024; originally announced July 2024.

    Comments: Under Review

  2. arXiv:2406.12738  [pdf, other]

    cs.CL cs.AI

    Large Language Model as a Universal Clinical Multi-task Decoder

    Authors: Yujiang Wu, Hongjian Song, Jiawen Zhang, Xumeng Wen, Shun Zheng, Jiang Bian

    Abstract: The development of effective machine learning methodologies for enhancing the efficiency and accuracy of clinical systems is crucial. Despite significant research efforts, managing a plethora of diversified clinical tasks and adapting to emerging new tasks remain significant challenges. This paper presents a novel paradigm that employs a pre-trained large language model as a universal clinical mul… ▽ More

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: Work in progress

  3. arXiv:2406.08096  [pdf, other]

    cs.CV

    Make Your Actor Talk: Generalizable and High-Fidelity Lip Sync with Motion and Appearance Disentanglement

    Authors: Runyi Yu, Tianyu He, Ailing Zhang, Yuchi Wang, Junliang Guo, Xu Tan, Chang Liu, Jie Chen, Jiang Bian

    Abstract: We aim to edit the lip movements in talking video according to the given speech while preserving the personal identity and visual details. The task can be decomposed into two sub-problems: (1) speech-driven lip motion generation and (2) visual appearance synthesis. Current solutions handle the two sub-problems within a single generative model, resulting in a challenging trade-off between lip-sync… ▽ More

    Submitted 16 June, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

    Comments: 14 pages of main text, 23 pages in total, 9 figures

  4. arXiv:2406.07529  [pdf, other]

    cs.LG

    MAP: Low-compute Model Merging with Amortized Pareto Fronts via Quadratic Approximation

    Authors: Lu Li, Tianyu Zhang, Zhiqi Bu, Suyuchen Wang, Huan He, Jie Fu, Yonghui Wu, Jiang Bian, Yong Chen, Yoshua Bengio

    Abstract: Model merging has emerged as an effective approach to combine multiple single-task models, fine-tuned from the same pre-trained model, into a multitask model. This process typically involves computing a weighted average of the model parameters without any additional training. Existing model-merging methods focus on enhancing average task accuracy. However, interference and conflicts between the ob… ▽ More

    Submitted 18 June, 2024; v1 submitted 11 June, 2024; originally announced June 2024.

  5. arXiv:2406.06572  [pdf, other]

    cs.CL cs.AI cs.IR

    Graph Neural Network Enhanced Retrieval for Question Answering of LLMs

    Authors: Zijian Li, Qingyan Guo, Jiawei Shao, Lei Song, Jiang Bian, Jun Zhang, Rui Wang

    Abstract: Retrieval augmented generation has revolutionized large language model (LLM) outputs by providing factual supports. Nevertheless, it struggles to capture all the necessary knowledge for complex reasoning questions. Existing retrieval methods typically divide reference documents into passages, treating them in isolation. These passages, however, are often interrelated, such as passages that are con… ▽ More

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: Under review

  6. arXiv:2406.03503  [pdf, other]

    cs.AI cs.LG

    Position: Rethinking Post-Hoc Search-Based Neural Approaches for Solving Large-Scale Traveling Salesman Problems

    Authors: Yifan Xia, Xianliang Yang, Zichuan Liu, Zhihao Liu, Lei Song, Jiang Bian

    Abstract: Recent advancements in solving large-scale traveling salesman problems (TSP) utilize the heatmap-guided Monte Carlo tree search (MCTS) paradigm, where machine learning (ML) models generate heatmaps, indicating the probability distribution of each edge being part of the optimal solution, to guide MCTS in solution finding. However, our theoretical and experimental analysis raises doubts about the ef… ▽ More

    Submitted 2 June, 2024; originally announced June 2024.

    Comments: Accepted by International Conference on Machine Learning (ICML 2024)

  7. arXiv:2406.01597  [pdf, other]

    cs.CV cs.GR

    End-to-End Rate-Distortion Optimized 3D Gaussian Representation

    Authors: Henan Wang, Hanxin Zhu, Tianyu He, Runsen Feng, Jiajun Deng, Jiang Bian, Zhibo Chen

    Abstract: 3D Gaussian Splatting (3DGS) has become an emerging technique with remarkable potential in 3D representation and image rendering. However, the substantial storage overhead of 3DGS significantly impedes its practical applications. In this work, we formulate the compact 3D Gaussian learning as an end-to-end Rate-Distortion Optimization (RDO) problem and propose RDO-Gaussian that can achieve flexible… ▽ More

    Submitted 9 April, 2024; originally announced June 2024.

  8. arXiv:2405.15758  [pdf, other]

    cs.CV cs.AI

    InstructAvatar: Text-Guided Emotion and Motion Control for Avatar Generation

    Authors: Yuchi Wang, Junliang Guo, Jianhong Bai, Runyi Yu, Tianyu He, Xu Tan, Xu Sun, Jiang Bian

    Abstract: Recent talking avatar generation models have made strides in achieving realistic and accurate lip synchronization with the audio, but often fall short in controlling and conveying detailed expressions and emotions of the avatar, making the generated video less vivid and controllable. In this paper, we propose a novel text-guided approach for generating emotionally expressive 2D avatars, offering f… ▽ More

    Submitted 24 May, 2024; originally announced May 2024.

    Comments: Project page: https://wangyuchi369.github.io/InstructAvatar/

  9. arXiv:2405.15200  [pdf, other]

    cs.LG cs.IT

    Indexed Minimum Empirical Divergence-Based Algorithms for Linear Bandits

    Authors: Jie Bian, Vincent Y. F. Tan

    Abstract: The Indexed Minimum Empirical Divergence (IMED) algorithm is a highly effective approach that offers a stronger theoretical guarantee of the asymptotic optimality compared to the Kullback--Leibler Upper Confidence Bound (KL-UCB) algorithm for the multi-armed bandit problem. Additionally, it has been observed to empirically outperform UCB-based algorithms and Thompson Sampling. Despite its effectiv… ▽ More

    Submitted 24 May, 2024; originally announced May 2024.

    Comments: Accepted to the Transactions on Machine Learning Research (TMLR)

  10. arXiv:2405.10255  [pdf, other]

    cs.CV cs.RO

    When LLMs step into the 3D World: A Survey and Meta-Analysis of 3D Tasks via Multi-modal Large Language Models

    Authors: Xianzheng Ma, Yash Bhalgat, Brandon Smart, Shuai Chen, Xinghui Li, Jian Ding, Jindong Gu, Dave Zhenyu Chen, Songyou Peng, Jia-Wang Bian, Philip H Torr, Marc Pollefeys, Matthias Nießner, Ian D Reid, Angel X. Chang, Iro Laina, Victor Adrian Prisacariu

    Abstract: As large language models (LLMs) evolve, their integration with 3D spatial data (3D-LLMs) has seen rapid progress, offering unprecedented capabilities for understanding and interacting with physical spaces. This survey provides a comprehensive overview of the methodologies enabling LLMs to process, understand, and generate 3D data. Highlighting the unique advantages of LLMs, such as in-context lear… ▽ More

    Submitted 16 May, 2024; originally announced May 2024.

  11. arXiv:2404.18922  [pdf, other]

    cs.LG cs.AI cs.CL stat.ML

    DPO Meets PPO: Reinforced Token Optimization for RLHF

    Authors: Han Zhong, Guhao Feng, Wei Xiong, Li Zhao, Di He, Jiang Bian, Liwei Wang

    Abstract: In the classical Reinforcement Learning from Human Feedback (RLHF) framework, Proximal Policy Optimization (PPO) is employed to learn from sparse, sentence-level rewards -- a challenging scenario in traditional deep reinforcement learning. Despite the great successes of PPO in the alignment of state-of-the-art closed-source large language models (LLMs), its open-source implementation is still larg… ▽ More

    Submitted 29 April, 2024; originally announced April 2024.

  12. arXiv:2404.18886  [pdf, other]

    cs.LG cs.AI

    A Survey on Diffusion Models for Time Series and Spatio-Temporal Data

    Authors: Yiyuan Yang, Ming Jin, Haomin Wen, Chaoli Zhang, Yuxuan Liang, Lintao Ma, Yi Wang, Chenghao Liu, Bin Yang, Zenglin Xu, Jiang Bian, Shirui Pan, Qingsong Wen

    Abstract: The study of time series is crucial for understanding trends and anomalies over time, enabling predictive insights across various sectors. Spatio-temporal data, on the other hand, is vital for analyzing phenomena in both space and time, providing a dynamic perspective on complex system interactions. Recently, diffusion models have seen widespread application in time series and spatio-temporal data… ▽ More

    Submitted 11 June, 2024; v1 submitted 29 April, 2024; originally announced April 2024.

    Comments: Ongoing work & Under review; 27 pages, 8 figures, 2 tables; Github Repo: https://github.com/yyysjz1997/Awesome-TimeSeries-SpatioTemporal-Diffusion-Model

  13. arXiv:2404.13968  [pdf, other]

    cs.CL cs.AI cs.CR

    Protecting Your LLMs with Information Bottleneck

    Authors: Zichuan Liu, Zefan Wang, Linjie Xu, Jinyu Wang, Lei Song, Tianchun Wang, Chunlin Chen, Wei Cheng, Jiang Bian

    Abstract: The advent of large language models (LLMs) has revolutionized the field of natural language processing, yet they might be attacked to produce harmful content. Despite efforts to ethically align LLMs, these are often fragile and can be circumvented by jailbreaking attacks through optimized or manual adversarial prompts. To address this, we introduce the Information Bottleneck Protector (IBProtector… ▽ More

    Submitted 16 May, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

    Comments: 23 pages, 7 figures, 8 tables

  14. arXiv:2404.11962  [pdf, other]

    cs.AI cs.CR cs.CV cs.LG

    ©Plug-in Authorization for Human Content Copyright Protection in Text-to-Image Model

    Authors: Chao Zhou, Huishuai Zhang, Jiang Bian, Weiming Zhang, Nenghai Yu

    Abstract: This paper addresses the contentious issue of copyright infringement in images generated by text-to-image models, sparking debates among AI developers, content creators, and legal entities. State-of-the-art models create high-quality content without crediting original creators, causing concern in the artistic community. To mitigate this, we propose the ©Plug-in Authorization framework, introducing… ▽ More

    Submitted 18 April, 2024; originally announced April 2024.

    Comments: 20 pages, 6 figures

  15. arXiv:2404.11276  [pdf, other]

    cs.AI q-fin.GN

    RD2Bench: Toward Data-Centric Automatic R&D

    Authors: Haotian Chen, Xinjie Shen, Zeqi Ye, Xiao Yang, Xu Yang, Weiqing Liu, Jiang Bian

    Abstract: The progress of humanity is driven by those successful discoveries accompanied by countless failed experiments. Researchers often seek the potential research directions by reading and then verifying them through experiments. The process imposes a significant burden on researchers. In the past decade, the data-driven black-box deep learning method demonstrates its effectiveness in a wide range of r… ▽ More

    Submitted 17 April, 2024; originally announced April 2024.

    Comments: 17 pages, 5 figures

  16. arXiv:2404.11027  [pdf, other]

    cs.AI

    Empowering Large Language Models on Robotic Manipulation with Affordance Prompting

    Authors: Guangran Cheng, Chuheng Zhang, Wenzhe Cai, Li Zhao, Changyin Sun, Jiang Bian

    Abstract: While large language models (LLMs) are successful in completing various language processing tasks, they easily fail to interact with the physical world by generating control sequences properly. We find that the main reason is that LLMs are not grounded in the physical world. Existing LLM-based approaches circumvent this problem by relying on additional pre-defined skills or pre-trained sub-policie… ▽ More

    Submitted 16 April, 2024; originally announced April 2024.

  17. arXiv:2404.09715  [pdf, other]

    cs.LG cs.AI cs.MA

    Higher Replay Ratio Empowers Sample-Efficient Multi-Agent Reinforcement Learning

    Authors: Linjie Xu, Zichuan Liu, Alexander Dockhorn, Diego Perez-Liebana, Jinyu Wang, Lei Song, Jiang Bian

    Abstract: One of the notorious issues for Reinforcement Learning (RL) is poor sample efficiency. Compared to single agent RL, the sample efficiency for Multi-Agent Reinforcement Learning (MARL) is more challenging because of its inherent partial observability, non-stationary training, and enormous strategy space. Although much effort has been devoted to developing new methods and enhancing sample efficiency… ▽ More

    Submitted 15 April, 2024; originally announced April 2024.

  18. arXiv:2404.05694  [pdf, other]

    cs.CL cs.AI cs.LG

    Comprehensive Study on German Language Models for Clinical and Biomedical Text Understanding

    Authors: Ahmad Idrissi-Yaghir, Amin Dada, Henning Schäfer, Kamyar Arzideh, Giulia Baldini, Jan Trienes, Max Hasin, Jeanette Bewersdorff, Cynthia S. Schmidt, Marie Bauer, Kaleb E. Smith, Jiang Bian, Yonghui Wu, Jörg Schlötterer, Torsten Zesch, Peter A. Horn, Christin Seifert, Felix Nensa, Jens Kleesiek, Christoph M. Friedrich

    Abstract: Recent advances in natural language processing (NLP) can be largely attributed to the advent of pre-trained language models such as BERT and RoBERTa. While these models demonstrate remarkable performance on general datasets, they can struggle in specialized domains such as medicine, where unique domain-specific terminologies, domain-specific abbreviations, and varying document structures are commo… ▽ More

    Submitted 8 May, 2024; v1 submitted 8 April, 2024; originally announced April 2024.

    Comments: Accepted at LREC-COLING 2024

  19. arXiv:2404.00466  [pdf, other]

    cs.LG cs.DC

    Computation and Communication Efficient Lightweighting Vertical Federated Learning

    Authors: Heqiang Wang, Jieming Bian, Lei Wang

    Abstract: The exploration of computational and communication efficiency within Federated Learning (FL) has emerged as a prominent and crucial field of study. While most existing efforts to enhance these efficiencies have focused on Horizontal FL, the distinct processes and model structures of Vertical FL preclude the direct application of Horizontal FL-based techniques. In response, we introduce the concept… ▽ More

    Submitted 30 March, 2024; originally announced April 2024.

  20. arXiv:2403.13089  [pdf]

    cs.CL

    Automatic Summarization of Doctor-Patient Encounter Dialogues Using Large Language Model through Prompt Tuning

    Authors: Mengxian Lyu, Cheng Peng, Xiaohan Li, Patrick Balian, Jiang Bian, Yonghui Wu

    Abstract: Automatic text summarization (ATS) is an emerging technology to assist clinicians in providing continuous and coordinated care. This study presents an approach to summarize doctor-patient dialogues using generative large language models (LLMs). We developed prompt-tuning algorithms to instruct generative LLMs to summarize clinical text. We examined the prompt-tuning strategies, the size of soft pr… ▽ More

    Submitted 19 March, 2024; originally announced March 2024.

  21. arXiv:2403.12374  [pdf]

    cs.CL

    Improving Generalizability of Extracting Social Determinants of Health Using Large Language Models through Prompt-tuning

    Authors: Cheng Peng, Zehao Yu, Kaleb E Smith, Wei-Hsuan Lo-Ciganic, Jiang Bian, Yonghui Wu

    Abstract: The progress in natural language processing (NLP) using large language models (LLMs) has greatly improved patient information extraction from clinical narratives. However, most methods based on the fine-tuning strategy have limited transfer learning ability for cross-domain applications. This study proposed a novel approach that employs a soft prompt-based learning architecture, which introduces t… ▽ More

    Submitted 18 March, 2024; originally announced March 2024.

  22. arXiv:2403.11425  [pdf]

    cs.LG cs.CL

    Narrative Feature or Structured Feature? A Study of Large Language Models to Identify Cancer Patients at Risk of Heart Failure

    Authors: Ziyi Chen, Mengyuan Zhang, Mustafa Mohammed Ahmed, Yi Guo, Thomas J. George, Jiang Bian, Yonghui Wu

    Abstract: Cancer treatments are known to introduce cardiotoxicity, negatively impacting outcomes and survivorship. Identifying cancer patients at risk of heart failure (HF) is critical to improving cancer treatment outcomes and safety. This study examined machine learning (ML) models to identify cancer patients at risk of HF using electronic health records (EHRs), including traditional ML, Time-Aware long s… ▽ More

    Submitted 17 March, 2024; originally announced March 2024.

    Comments: 9 pages, 2 figures, 5 tables

  23. arXiv:2403.09048  [pdf, other]

    cs.LG cs.CV

    Taming Cross-Domain Representation Variance in Federated Prototype Learning with Heterogeneous Data Domains

    Authors: Lei Wang, Jieming Bian, Letian Zhang, Chen Chen, Jie Xu

    Abstract: Federated learning (FL) allows collaborative machine learning training without sharing private data. While most FL methods assume identical data domains across clients, real-world scenarios often involve heterogeneous data domains. Federated Prototype Learning (FedPL) addresses this issue, using mean feature vectors as prototypes to enhance model generalization. However, existing FedPL methods cre… ▽ More

    Submitted 13 March, 2024; originally announced March 2024.

    Comments: 16 pages

  24. arXiv:2403.08733  [pdf, other]

    cs.CV

    GaussCtrl: Multi-View Consistent Text-Driven 3D Gaussian Splatting Editing

    Authors: Jing Wu, Jia-Wang Bian, Xinghui Li, Guangrun Wang, Ian Reid, Philip Torr, Victor Adrian Prisacariu

    Abstract: We propose GaussCtrl, a text-driven method to edit a 3D scene reconstructed by the 3D Gaussian Splatting (3DGS). Our method first renders a collection of images by using the 3DGS and edits them by using a pre-trained 2D diffusion model (ControlNet) based on the input prompt, which is then used to optimise the 3D model. Our key contribution is multi-view consistent editing, which enables editin… ▽ More

    Submitted 25 April, 2024; v1 submitted 13 March, 2024; originally announced March 2024.

    Comments: Our Project Website: https://gaussctrl.active.vision/

  25. arXiv:2403.05751  [pdf, other]

    cs.LG cs.AI

    MG-TSD: Multi-Granularity Time Series Diffusion Models with Guided Learning Process

    Authors: Xinyao Fan, Yueying Wu, Chang Xu, Yuhao Huang, Weiqing Liu, Jiang Bian

    Abstract: Recently, diffusion probabilistic models have attracted attention in generative time series forecasting due to their remarkable capacity to generate high-fidelity samples. However, the effective utilization of their strong modeling ability in the probabilistic time series forecasting task remains an open question, partially due to the challenge of instability arising from their stochastic nature.… ▽ More

    Submitted 15 March, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

    Comments: International Conference on Learning Representations (ICLR) 2024

  26. arXiv:2403.03100  [pdf, other]

    eess.AS cs.AI cs.CL cs.LG cs.SD

    NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models

    Authors: Zeqian Ju, Yuancheng Wang, Kai Shen, Xu Tan, Detai Xin, Dongchao Yang, Yanqing Liu, Yichong Leng, Kaitao Song, Siliang Tang, Zhizheng Wu, Tao Qin, Xiang-Yang Li, Wei Ye, Shikun Zhang, Jiang Bian, Lei He, Jinyu Li, Sheng Zhao

    Abstract: While recent large-scale text-to-speech (TTS) models have achieved significant progress, they still fall short in speech quality, similarity, and prosody. Considering speech intricately encompasses various attributes (e.g., content, prosody, timbre, and acoustic details) that pose significant challenges for generation, a natural idea is to factorize speech into individual subspaces representing di… ▽ More

    Submitted 23 April, 2024; v1 submitted 5 March, 2024; originally announced March 2024.

    Comments: Achieving human-level quality and naturalness on multi-speaker datasets (e.g., LibriSpeech) in a zero-shot way

  27. arXiv:2403.00758  [pdf, other]

    cs.CL cs.AI cs.LG

    Mitigating Reversal Curse in Large Language Models via Semantic-aware Permutation Training

    Authors: Qingyan Guo, Rui Wang, Junliang Guo, Xu Tan, Jiang Bian, Yujiu Yang

    Abstract: While large language models (LLMs) have achieved impressive performance across diverse tasks, recent studies showcase that causal LLMs suffer from the "reversal curse". It is a typical example that the model knows "A's father is B", but is unable to reason "B's child is A". This limitation poses a challenge to the advancement of artificial general intelligence (AGI), as it suggests a gap in the mo… ▽ More

    Submitted 20 March, 2024; v1 submitted 1 March, 2024; originally announced March 2024.

  28. arXiv:2402.15858  [pdf, other]

    cs.CV cs.DC

    FedMM: Federated Multi-Modal Learning with Modality Heterogeneity in Computational Pathology

    Authors: Yuanzhe Peng, Jieming Bian, Jie Xu

    Abstract: The fusion of complementary multimodal information is crucial in computational pathology for accurate diagnostics. However, existing multimodal learning approaches necessitate access to users' raw data, posing substantial privacy risks. While Federated Learning (FL) serves as a privacy-preserving alternative, it falls short in addressing the challenges posed by heterogeneous (yet possibly overlapp… ▽ More

    Submitted 24 February, 2024; originally announced February 2024.

    Journal ref: 2024 International Conference on Acoustics, Speech and Signal Processing (ICASSP 2024)

  29. arXiv:2402.15515  [pdf]

    cs.AI q-bio.QM stat.AP

    Feasibility of Identifying Factors Related to Alzheimer's Disease and Related Dementia in Real-World Data

    Authors: Aokun Chen, Qian Li, Yu Huang, Yongqiu Li, Yu-neng Chuang, Xia Hu, Serena Guo, Yonghui Wu, Yi Guo, Jiang Bian

    Abstract: A comprehensive view of factors associated with AD/ADRD will significantly aid in studies to develop new treatments for AD/ADRD and identify high-risk populations and patients for prevention efforts. In our study, we summarized the risk factors for AD/ADRD by reviewing existing meta-analyses and review articles on risk and preventive factors for AD/ADRD. In total, we extracted 477 risk factors in… ▽ More

    Submitted 3 February, 2024; originally announced February 2024.

  30. arXiv:2402.13185  [pdf, other]

    cs.CV

    UniEdit: A Unified Tuning-Free Framework for Video Motion and Appearance Editing

    Authors: Jianhong Bai, Tianyu He, Yuchi Wang, Junliang Guo, Haoji Hu, Zuozhu Liu, Jiang Bian

    Abstract: Recent advances in text-guided video editing have showcased promising results in appearance editing (e.g., stylization). However, video motion editing in the temporal dimension (e.g., from eating to waving), which distinguishes video editing from image editing, is underexplored. In this work, we present UniEdit, a tuning-free framework that supports both video motion and appearance editing by harn… ▽ More

    Submitted 7 April, 2024; v1 submitted 20 February, 2024; originally announced February 2024.

    Comments: Project page: https://jianhongbai.github.io/UniEdit/

  31. arXiv:2402.12749  [pdf]

    cs.CL cs.AI

    Me LLaMA: Foundation Large Language Models for Medical Applications

    Authors: Qianqian Xie, Qingyu Chen, Aokun Chen, Cheng Peng, Yan Hu, Fongci Lin, Xueqing Peng, Jimin Huang, Jeffrey Zhang, Vipina Keloth, Xinyu Zhou, Huan He, Lucila Ohno-Machado, Yonghui Wu, Hua Xu, Jiang Bian

    Abstract: Recent advancements in large language models (LLMs) such as ChatGPT and LLaMA have hinted at their potential to revolutionize medical applications, yet their application in clinical settings often reveals limitations due to a lack of specialized training on medical-specific data. In response to this challenge, this study introduces Me-LLaMA, a novel medical LLM family that includes foundation mode… ▽ More

    Submitted 11 April, 2024; v1 submitted 20 February, 2024; originally announced February 2024.

    Comments: 21 pages, 3 figures, 8 tables

  32. arXiv:2402.02258  [pdf, other]

    cs.LG cs.AI

    XTSFormer: Cross-Temporal-Scale Transformer for Irregular Time Event Prediction

    Authors: Tingsong Xiao, Zelin Xu, Wenchong He, Jim Su, Yupu Zhang, Raymond Opoku, Ronald Ison, Jason Petho, Jiang Bian, Patrick Tighe, Parisa Rashidi, Zhe Jiang

    Abstract: Event prediction aims to forecast the time and type of a future event based on a historical event sequence. Despite its significance, several challenges exist, including the irregularity of time intervals between consecutive events, the existence of cycles, periodicity, and multi-scale event interactions, as well as the high computational costs for long event sequences. Existing neural temporal po… ▽ More

    Submitted 3 February, 2024; originally announced February 2024.

  33. arXiv:2401.16777  [pdf, other]

    cs.LG

    Addressing Distribution Shift in Time Series Forecasting with Instance Normalization Flows

    Authors: Wei Fan, Shun Zheng, Pengyang Wang, Rui Xie, Jiang Bian, Yanjie Fu

    Abstract: Due to non-stationarity of time series, the distribution shift problem largely hinders the performance of time series forecasting. Existing solutions either fail for the shifts beyond simple statistics or the limited compatibility with forecasting models. In this paper, we propose a general decoupled formulation for time series forecasting, with no reliance on fixed statistics and no restriction o… ▽ More

    Submitted 30 January, 2024; originally announced January 2024.

    Comments: 17 pages

  34. arXiv:2401.00644  [pdf, other]

    cs.CE

    DEWP: Deep Expansion Learning for Wind Power Forecasting

    Authors: Wei Fan, Yanjie Fu, Shun Zheng, Jiang Bian, Yuanchun Zhou, Hui Xiong

    Abstract: Wind is one kind of high-efficient, environmentally-friendly and cost-effective energy source. Wind power, as one of the largest renewable energy in the world, has been playing a more and more important role in supplying electricity. Though growing dramatically in recent years, the amount of generated wind power can be directly or latently affected by multiple uncertain factors, such as wind speed… ▽ More

    Submitted 31 December, 2023; originally announced January 2024.

    Comments: Accepted by TKDD

  35. arXiv:2312.15268  [pdf, other]

    cs.CV

    MGDepth: Motion-Guided Cost Volume For Self-Supervised Monocular Depth In Dynamic Scenarios

    Authors: Kaichen Zhou, Jia-Xing Zhong, Jia-Wang Bian, Qian Xie, Jian-Qing Zheng, Niki Trigoni, Andrew Markham

    Abstract: Despite advancements in self-supervised monocular depth estimation, challenges persist in dynamic scenarios due to the dependence on assumptions about a static world. In this paper, we present MGDepth, a Motion-Guided Cost Volume Depth Net, to achieve precise depth estimation for both dynamic objects and static backgrounds, all while maintaining computational efficiency. To tackle the challenges p… ▽ More

    Submitted 23 December, 2023; originally announced December 2023.

  36. arXiv:2312.10324  [pdf, other]

    cs.LG cs.CV

    Federated Learning with Instance-Dependent Noisy Label

    Authors: Lei Wang, Jieming Bian, Jie Xu

    Abstract: Federated learning (FL) with noisy labels poses a significant challenge. Existing methods designed for handling noisy labels in centralized learning tend to lose their effectiveness in the FL setting, mainly due to the small dataset size and the heterogeneity of client data. While some attempts have been made to tackle FL with noisy labels, they primarily focused on scenarios involving class-condi… ▽ More

    Submitted 9 January, 2024; v1 submitted 16 December, 2023; originally announced December 2023.

    Comments: Accepted by ICASSP 2024

  37. arXiv:2312.07899  [pdf]

    q-bio.QM cs.AI cs.CV cs.LG

    Morphological Profiling for Drug Discovery in the Era of Deep Learning

    Authors: Qiaosi Tang, Ranjala Ratnayake, Gustavo Seabra, Zhe Jiang, Ruogu Fang, Lina Cui, Yousong Ding, Tamer Kahveci, Jiang Bian, Chenglong Li, Hendrik Luesch, Yanjun Li

    Abstract: Morphological profiling is a valuable tool in phenotypic drug discovery. The advent of high-throughput automated imaging has enabled the capturing of a wide range of morphological features of cells or organisms in response to perturbations at the single-cell resolution. Concurrently, significant advances in machine learning and deep learning, especially in computer vision, have led to substantial… ▽ More

    Submitted 15 January, 2024; v1 submitted 13 December, 2023; originally announced December 2023.

    Comments: 44 pages, 5 figures, 5 tables

  38. arXiv:2312.06099  [pdf]

    cs.CL

    Generative Large Language Models Are All-purpose Text Analytics Engines: Text-to-text Learning Is All Your Need

    Authors: Cheng Peng, Xi Yang, Aokun Chen, Zehao Yu, Kaleb E Smith, Anthony B Costa, Mona G Flores, Jiang Bian, Yonghui Wu

    Abstract: Objective To solve major clinical natural language processing (NLP) tasks using a unified text-to-text learning architecture based on a generative large language model (LLM) via prompt tuning. Methods We formulated 7 key clinical NLP tasks as text-to-text learning and solved them using one unified generative clinical LLM, GatorTronGPT, developed using GPT-3 architecture and trained with up to 20 b… ▽ More

    Submitted 10 December, 2023; originally announced December 2023.

  39. arXiv:2311.15230  [pdf, other]

    cs.CV cs.MM

    GAIA: Zero-shot Talking Avatar Generation

    Authors: Tianyu He, Junliang Guo, Runyi Yu, Yuchi Wang, Jialiang Zhu, Kaikai An, Leyi Li, Xu Tan, Chunyu Wang, Han Hu, HsiangTao Wu, Sheng Zhao, Jiang Bian

    Abstract: Zero-shot talking avatar generation aims at synthesizing natural talking videos from speech and a single portrait image. Previous methods have relied on domain-specific heuristics such as warping-based motion representation and 3D Morphable Models, which limit the naturalness and diversity of the generated avatars. In this work, we introduce GAIA (Generative AI for Avatar), which eliminates the do… ▽ More

    Submitted 14 March, 2024; v1 submitted 26 November, 2023; originally announced November 2023.

    Comments: ICLR 2024. Project page: https://microsoft.github.io/GAIA/

  40. arXiv:2311.08896  [pdf, other]

    cs.CL

    HeLM: Highlighted Evidence augmented Language Model for Enhanced Table-to-Text Generation

    Authors: Junyi Bian, Xiaolei Qin, Wuhe Zou, Mengzuo Huang, Congyi Luo, Ke Zhang, Weidong Zhang

    Abstract: Large models have demonstrated significant progress across various domains, particularly in tasks related to text generation. In the domain of Table to Text, many Large Language Model (LLM)-based methods currently resort to modifying prompts to invoke public APIs, incurring potential costs and information leaks. With the advent of open-source large models, fine-tuning LLMs has become feasible. In… ▽ More

    Submitted 27 April, 2024; v1 submitted 15 November, 2023; originally announced November 2023.

  41. arXiv:2311.03615  [pdf, other]

    cs.LG cs.DC

    CAFE: Carbon-Aware Federated Learning in Geographically Distributed Data Centers

    Authors: Jieming Bian, Lei Wang, Shaolei Ren, Jie Xu

    Abstract: Training large-scale artificial intelligence (AI) models demands significant computational power and energy, leading to increased carbon footprint with potential environmental repercussions. This paper delves into the challenges of training AI models across geographically distributed (geo-distributed) data centers, emphasizing the balance between learning performance and carbon footprint. We consi…

    Submitted 5 February, 2024; v1 submitted 6 November, 2023; originally announced November 2023.

    Comments: Preprint, Experiments Updated

  42. arXiv:2311.01797  [pdf, other]

    cs.LG stat.ML

    On the Generalization Properties of Diffusion Models

    Authors: Puheng Li, Zhong Li, Huishuai Zhang, Jiang Bian

    Abstract: Diffusion models are a class of generative models that serve to establish a stochastic transport map between an empirically observed, yet unknown, target distribution and a known prior. Despite their remarkable success in real-world applications, a theoretical understanding of their generalization capabilities remains underdeveloped. This work embarks on a comprehensive theoretical exploration of…

    Submitted 12 January, 2024; v1 submitted 3 November, 2023; originally announced November 2023.

    Comments: 42 pages, 11 figures

  43. arXiv:2310.14714  [pdf, other]

    cs.LG cs.AI

    BatteryML: An Open-source platform for Machine Learning on Battery Degradation

    Authors: Han Zhang, Xiaofan Gui, Shun Zheng, Ziheng Lu, Yuqi Li, Jiang Bian

    Abstract: Battery degradation remains a pivotal concern in the energy storage domain, with machine learning emerging as a potent tool to drive forward insights and solutions. However, this intersection of electrochemical science and machine learning poses complex challenges. Machine learning experts often grapple with the intricacies of battery science, while battery researchers face hurdles in adapting int…

    Submitted 3 April, 2024; v1 submitted 23 October, 2023; originally announced October 2023.

    MSC Class: 68T05

    Journal ref: International Conference on Learning Representations (ICLR) 2024

  44. arXiv:2310.11954  [pdf, other]

    cs.CL cs.MM eess.AS

    MusicAgent: An AI Agent for Music Understanding and Generation with Large Language Models

    Authors: Dingyao Yu, Kaitao Song, Peiling Lu, Tianyu He, Xu Tan, Wei Ye, Shikun Zhang, Jiang Bian

    Abstract: AI-empowered music processing is a diverse field that encompasses dozens of tasks, ranging from generation tasks (e.g., timbre synthesis) to comprehension tasks (e.g., music classification). For developers and amateurs, it is very difficult to grasp all of these tasks to satisfy their requirements in music processing, especially considering the huge differences in the representations of music data…

    Submitted 25 October, 2023; v1 submitted 18 October, 2023; originally announced October 2023.

  45. arXiv:2310.11249  [pdf, other]

    cs.AI q-fin.GN

    Leveraging Large Language Model for Automatic Evolving of Industrial Data-Centric R&D Cycle

    Authors: Xu Yang, Xiao Yang, Weiqing Liu, Jinhui Li, Peng Yu, Zeqi Ye, Jiang Bian

    Abstract: In the wake of relentless digital transformation, data-driven solutions are emerging as powerful tools to address multifarious industrial tasks such as forecasting, anomaly detection, planning, and even complex decision-making. Although data-centric R&D has been pivotal in harnessing these solutions, it often comes with significant costs in terms of human, computational, and time resources. This p…

    Submitted 17 October, 2023; originally announced October 2023.

    Comments: 29 pages, 11 figures

  46. arXiv:2310.07449  [pdf, other]

    cs.CV

    PoRF: Pose Residual Field for Accurate Neural Surface Reconstruction

    Authors: Jia-Wang Bian, Wenjing Bian, Victor Adrian Prisacariu, Philip Torr

    Abstract: Neural surface reconstruction is sensitive to camera pose noise, even if state-of-the-art pose estimators like COLMAP or ARKit are used. More importantly, existing Pose-NeRF joint optimisation methods have struggled to improve pose accuracy in challenging real-world scenarios. To overcome these challenges, we introduce the pose residual field (PoRF), a novel implicit representation that uses an…

    Submitted 12 March, 2024; v1 submitted 11 October, 2023; originally announced October 2023.

    Comments: Accepted to ICLR 2024. Find the project page at https://porf.active.vision/

  47. arXiv:2310.07446  [pdf, other]

    cs.LG

    ProbTS: Benchmarking Point and Distributional Forecasting across Diverse Prediction Horizons

    Authors: Jiawen Zhang, Xumeng Wen, Zhenwei Zhang, Shun Zheng, Jia Li, Jiang Bian

    Abstract: Delivering precise point and distributional forecasts across a spectrum of prediction horizons represents a significant and enduring challenge in the application of time-series forecasting within various industries. Prior research on developing deep learning models for time-series forecasting has often concentrated on isolated aspects, such as long-term point forecasting or short-term probabilisti…

    Submitted 17 June, 2024; v1 submitted 11 October, 2023; originally announced October 2023.

    Comments: Preprint

  48. arXiv:2310.07402  [pdf, other]

    cs.LG cs.AI

    NuTime: Numerically Multi-Scaled Embedding for Large-Scale Time Series Pretraining

    Authors: Chenguo Lin, Xumeng Wen, Wei Cao, Congrui Huang, Jiang Bian, Stephen Lin, Zhirong Wu

    Abstract: Recent research on time-series self-supervised models shows great promise in learning semantic representations. However, it has been limited to small-scale datasets, e.g., thousands of temporal sequences. In this work, we make key technical contributions that are tailored to the numerical properties of time-series data and allow the model to scale to large datasets, e.g., millions of temporal sequ…

    Submitted 12 October, 2023; v1 submitted 11 October, 2023; originally announced October 2023.

  49. arXiv:2310.07338  [pdf, other]

    cs.LG

    Towards Foundation Models for Learning on Tabular Data

    Authors: Han Zhang, Xumeng Wen, Shun Zheng, Wei Xu, Jiang Bian

    Abstract: Learning on tabular data underpins numerous real-world applications. Despite considerable efforts in developing effective learning models for tabular data, current transferable tabular models remain in their infancy, limited by either the lack of support for direct instruction following in new tasks or the neglect of acquiring foundational knowledge and capabilities from diverse tabular datasets.…

    Submitted 22 October, 2023; v1 submitted 11 October, 2023; originally announced October 2023.

  50. arXiv:2310.07321  [pdf, other]

    cs.CL cs.AI cs.LG

    On the Impact of Cross-Domain Data on German Language Models

    Authors: Amin Dada, Aokun Chen, Cheng Peng, Kaleb E Smith, Ahmad Idrissi-Yaghir, Constantin Marc Seibold, Jianning Li, Lars Heiliger, Xi Yang, Christoph M. Friedrich, Daniel Truhn, Jan Egger, Jiang Bian, Jens Kleesiek, Yonghui Wu

    Abstract: Traditionally, large language models have been trained on either general web crawls or domain-specific data. However, recent successes of generative large language models have shed light on the benefits of cross-domain datasets. To examine the significance of prioritizing data diversity over quality, we present a German dataset comprising texts from five domains, along with another dataset aimed…

    Submitted 13 October, 2023; v1 submitted 11 October, 2023; originally announced October 2023.

    Comments: 13 pages, 1 figure, accepted at Findings of the Association for Computational Linguistics: EMNLP 2023