Skip to main content

Showing 1–50 of 4,633 results for author: Li, Z

Searching in archive cs. Search in all archives.
.
  1. arXiv:2406.19328  [pdf, other

    cs.SD cs.LG eess.AS

    Subtractive Training for Music Stem Insertion using Latent Diffusion Models

    Authors: Ivan Villa-Renteria, Mason L. Wang, Zachary Shah, Zhe Li, Soohyun Kim, Neelesh Ramachandran, Mert Pilanci

    Abstract: We present Subtractive Training, a simple and novel method for synthesizing individual musical instrument stems given other instruments as context. This method pairs a dataset of complete music mixes with 1) a variant of the dataset lacking a specific stem, and 2) LLM-generated instructions describing how the missing stem should be reintroduced. We then fine-tune a pretrained text-to-audio diffusi… ▽ More

    Submitted 27 June, 2024; originally announced June 2024.

  2. arXiv:2406.19008  [pdf, other

    cs.DS

    VertiMRF: Differentially Private Vertical Federated Data Synthesis

    Authors: Fangyuan Zhao, Zitao Li, Xuebin Ren, Bolin Ding, Shusen Yang, Yaliang Li

    Abstract: Data synthesis is a promising solution to share data for various downstream analytic tasks without exposing raw data. However, without a theoretical privacy guarantee, a synthetic dataset would still leak some sensitive information. Differential privacy is thus widely adopted to safeguard data synthesis by strictly limiting the released information. This technique is advantageous yet presents sign… ▽ More

    Submitted 27 June, 2024; originally announced June 2024.

  3. arXiv:2406.18742  [pdf, other

    cs.CV cs.RO

    3D Feature Distillation with Object-Centric Priors

    Authors: Georgios Tziafas, Yucheng Xu, Zhibin Li, Hamidreza Kasaei

    Abstract: Grounding natural language to the physical world is a ubiquitous topic with a wide range of applications in computer vision and robotics. Recently, 2D vision-language models such as CLIP have been widely popularized, due to their impressive capabilities for open-vocabulary grounding in 2D images. Recent works aim to elevate 2D CLIP features to 3D via feature distillation, but either learn neural f… ▽ More

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: Submitted CoRL-24

  4. arXiv:2406.18202  [pdf, other

    physics.geo-ph cs.DB

    GlobalTomo: A global dataset for physics-ML seismic wavefield modeling and FWI

    Authors: Shiqian Li, Zhi Li, Zhancun Mu, Shiji Xin, Zhixiang Dai, Kuangdai Leng, Ruihua Zhang, Xiaodong Song, Yixin Zhu

    Abstract: Global seismic tomography, taking advantage of seismic waves from natural earthquakes, provides essential insights into the earth's internal dynamics. Advanced Full-waveform Inversion (FWI) techniques, whose aim is to meticulously interpret every detail in seismograms, confront formidable computational demands in forward modeling and adjoint simulations on a global scale. Recent advancements in Ma… ▽ More

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: 36 pages

  5. arXiv:2406.18082  [pdf, other

    cs.CL cs.HC

    Octo-planner: On-device Language Model for Planner-Action Agents

    Authors: Wei Chen, Zhiyuan Li, Zhen Guo, Yikang Shen

    Abstract: AI agents have become increasingly significant in various domains, enabling autonomous decision-making and problem-solving. To function effectively, these agents require a planning process that determines the best course of action and then executes the planned actions. In this paper, we present an efficient on-device Planner-Action framework that separates planning and action execution into two di… ▽ More

    Submitted 26 June, 2024; originally announced June 2024.

  6. arXiv:2406.17706  [pdf, other

    cs.LG cs.CL cs.DC

    FedBiOT: LLM Local Fine-tuning in Federated Learning without Full Model

    Authors: Feijie Wu, Zitao Li, Yaliang Li, Bolin Ding, **g Gao

    Abstract: Large language models (LLMs) show amazing performance on many domain-specific tasks after fine-tuning with some appropriate data. However, many domain-specific data are privately distributed across multiple owners. Thus, this dilemma raises the interest in how to perform LLM fine-tuning in federated learning (FL). However, confronted with limited computation and communication capacities, FL client… ▽ More

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: KDD 2024

  7. arXiv:2406.17697  [pdf, other

    cs.LG cs.AI cs.CV

    HGTDP-DTA: Hybrid Graph-Transformer with Dynamic Prompt for Drug-Target Binding Affinity Prediction

    Authors: Xi Xiao, Wentao Wang, Jiacheng Xie, Li**g Zhu, Gaofei Chen, Zhengji Li, Tianyang Wang, Min Xu

    Abstract: Drug target binding affinity (DTA) is a key criterion for drug screening. Existing experimental methods are time-consuming and rely on limited structural and domain information. While learning-based methods can model sequence and structural information, they struggle to integrate contextual data and often lack comprehensive modeling of drug-target interactions. In this study, we propose a novel DT… ▽ More

    Submitted 25 June, 2024; originally announced June 2024.

  8. arXiv:2406.17442  [pdf, other

    cs.CV

    Mamba24/8D: Enhancing Global Interaction in Point Clouds via State Space Model

    Authors: Zhuoyuan Li, Yubo Ai, Jiahao Lu, ChuXin Wang, Jiacheng Deng, Hanzhi Chang, Yanzhe Liang, Wenfei Yang, Shifeng Zhang, Tianzhu Zhang

    Abstract: Transformers have demonstrated impressive results for 3D point cloud semantic segmentation. However, the quadratic complexity of transformer makes computation cost high, limiting the number of points that can be processed simultaneously and impeding the modeling of long-range dependencies. Drawing inspiration from the great potential of recent state space models (SSM) for long sequence modeling, w… ▽ More

    Submitted 25 June, 2024; originally announced June 2024.

  9. arXiv:2406.17325  [pdf, other

    cs.SE

    AI Tool Use and Adoption in Software Development by Individuals and Organizations: A Grounded Theory Study

    Authors: Ze Shi Li, Nowshin Nawar Arony, Ahmed Musa Awon, Daniela Damian, Bowen Xu

    Abstract: AI assistance tools such as ChatGPT, Copilot, and Gemini have dramatically impacted the nature of software development in recent years. Numerous studies have studied the positive benefits that practitioners have achieved from using these tools in their work. While there is a growing body of knowledge regarding the usability aspects of leveraging AI tools, we still lack concrete details on the issu… ▽ More

    Submitted 25 June, 2024; originally announced June 2024.

  10. arXiv:2406.16981  [pdf

    eess.IV cs.AI cs.LG eess.SP

    Research on Feature Extraction Data Processing System For MRI of Brain Diseases Based on Computer Deep Learning

    Authors: Lingxi Xiao, **xin Hu, Yutian Yang, Yinqiu Feng, Zichao Li, Zexi Chen

    Abstract: Most of the existing wavelet image processing techniques are carried out in the form of single-scale reconstruction and multiple iterations. However, processing high-quality fMRI data presents problems such as mixed noise and excessive computation time. This project proposes the use of matrix operations by combining mixed noise elimination methods with wavelet analysis to replace traditional itera… ▽ More

    Submitted 23 June, 2024; originally announced June 2024.

  11. arXiv:2406.16817  [pdf, other

    cs.CV

    GPT-4V Explorations: Mining Autonomous Driving

    Authors: Zixuan Li

    Abstract: This paper explores the application of the GPT-4V(ision) large visual language model to autonomous driving in mining environments, where traditional systems often falter in understanding intentions and making accurate decisions during emergencies. GPT-4V introduces capabilities for visual question answering and complex scene comprehension, addressing challenges in these specialized settings.Our ev… ▽ More

    Submitted 24 June, 2024; originally announced June 2024.

  12. arXiv:2406.16793  [pdf, other

    cs.LG cs.AI

    Adam-mini: Use Fewer Learning Rates To Gain More

    Authors: Yushun Zhang, Congliang Chen, Ziniu Li, Tian Ding, Chenwei Wu, Yinyu Ye, Zhi-Quan Luo, Ruoyu Sun

    Abstract: We propose Adam-mini, an optimizer that achieves on-par or better performance than AdamW with 45% to 50% less memory footprint. Adam-mini reduces memory by cutting down the learning rate resources in Adam (i.e., $1/\sqrt{v}$). We find that $\geq$ 90% of these learning rates in $v$ could be harmlessly removed if we (1) carefully partition the parameters into blocks following our proposed principle… ▽ More

    Submitted 26 June, 2024; v1 submitted 24 June, 2024; originally announced June 2024.

  13. arXiv:2406.16777  [pdf, other

    cs.CL cs.AI

    Blending LLMs into Cascaded Speech Translation: KIT's Offline Speech Translation System for IWSLT 2024

    Authors: Sai Koneru, Thai-Binh Nguyen, Ngoc-Quan Pham, Danni Liu, Zhaolin Li, Alexander Waibel, Jan Niehues

    Abstract: Large Language Models (LLMs) are currently under exploration for various tasks, including Automatic Speech Recognition (ASR), Machine Translation (MT), and even End-to-End Speech Translation (ST). In this paper, we present KIT's offline submission in the constrained + LLM track by incorporating recently proposed techniques that can be added to any cascaded speech translation. Specifically, we inte… ▽ More

    Submitted 24 June, 2024; originally announced June 2024.

  14. arXiv:2406.16722  [pdf, other

    cs.CL

    Venturing into Uncharted Waters: The Navigation Compass from Transformer to Mamba

    Authors: Yuchen Zou, Yineng Chen, Zuchao Li, Lefei Zhang, Hai Zhao

    Abstract: Transformer, a deep neural network architecture, has long dominated the field of natural language processing and beyond. Nevertheless, the recent introduction of Mamba challenges its supremacy, sparks considerable interest among researchers, and gives rise to a series of Mamba-based models that have exhibited notable potential. This survey paper orchestrates a comprehensive discussion, diving into… ▽ More

    Submitted 24 June, 2024; originally announced June 2024.

  15. arXiv:2406.16441  [pdf, other

    cs.CL

    UniCoder: Scaling Code Large Language Model via Universal Code

    Authors: Tao Sun, Linzheng Chai, Jian Yang, Yuwei Yin, Hongcheng Guo, Jiaheng Liu, Bing Wang, Liqun Yang, Zhoujun Li

    Abstract: Intermediate reasoning or acting steps have successfully improved large language models (LLMs) for handling various downstream natural language processing (NLP) tasks. When applying LLMs for code generation, recent works mainly focus on directing the models to articulate intermediate natural-language reasoning steps, as in chain-of-thought (CoT) prompting, and then output code with the natural lan… ▽ More

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: Accepted by ACL 2024 (Main)

  16. arXiv:2406.16079  [pdf, other

    cs.CL cs.AI

    EERPD: Leveraging Emotion and Emotion Regulation for Improving Personality Detection

    Authors: Zheng Li, Dawei Zhu, Qilong Ma, Weimin Xiong, Sujian Li

    Abstract: Personality is a fundamental construct in psychology, reflecting an individual's behavior, thinking, and emotional patterns. Previous researches have made some progress in personality detection, primarily by utilizing the whole text to predict personality. However, these studies generally tend to overlook psychological knowledge: they rarely apply the well-established correlations between emotion… ▽ More

    Submitted 23 June, 2024; originally announced June 2024.

  17. arXiv:2406.16072  [pdf, other

    cs.CV

    DV-3DLane: End-to-end Multi-modal 3D Lane Detection with Dual-view Representation

    Authors: Yueru Luo, Shuguang Cui, Zhen Li

    Abstract: Accurate 3D lane estimation is crucial for ensuring safety in autonomous driving. However, prevailing monocular techniques suffer from depth loss and lighting variations, hampering accurate 3D lane detection. In contrast, LiDAR points offer geometric cues and enable precise localization. In this paper, we present DV-3DLane, a novel end-to-end Dual-View multi-modal 3D Lane detection framework that… ▽ More

    Submitted 23 June, 2024; originally announced June 2024.

    Comments: Accepted at ICLR2024

  18. arXiv:2406.16069  [pdf, other

    cs.CL cs.AI

    FastMem: Fast Memorization of Prompt Improves Context Awareness of Large Language Models

    Authors: Junyi Zhu, Shuochen Liu, Yu Yu, Bo Tang, Yibo Yan, Zhiyu Li, Feiyu Xiong, Tong Xu, Matthew B. Blaschko

    Abstract: Large language models (LLMs) excel in generating coherent text, but they often struggle with context awareness, leading to inaccuracies in tasks requiring faithful adherence to provided information. We introduce FastMem, a novel method designed to enhance instruction fine-tuned LLMs' context awareness through fast memorization of the prompt. FastMem maximizes the likelihood of the prompt before in… ▽ More

    Submitted 23 June, 2024; originally announced June 2024.

  19. arXiv:2406.15885  [pdf, other

    cs.SD cs.AI eess.AS

    The Music Maestro or The Musically Challenged, A Massive Music Evaluation Benchmark for Large Language Models

    Authors: Jiajia Li, Lu Yang, Mingni Tang, Cong Chen, Zuchao Li, ** Wang, Hai Zhao

    Abstract: Benchmark plays a pivotal role in assessing the advancements of large language models (LLMs). While numerous benchmarks have been proposed to evaluate LLMs' capabilities, there is a notable absence of a dedicated benchmark for assessing their musical abilities. To address this gap, we present ZIQI-Eval, a comprehensive and large-scale music benchmark specifically designed to evaluate the music-rel… ▽ More

    Submitted 22 June, 2024; originally announced June 2024.

    Comments: Accepted to ACL-Findings 2024

  20. arXiv:2406.15778  [pdf, other

    cs.CV cs.AI

    ObjectNLQ @ Ego4D Episodic Memory Challenge 2024

    Authors: Yisen Feng, Haoyu Zhang, Yuquan Xie, Zai**g Li, Meng Liu, Liqiang Nie

    Abstract: In this report, we present our approach for the Natural Language Query track and Goal Step track of the Ego4D Episodic Memory Benchmark at CVPR 2024. Both challenges require the localization of actions within long video sequences using textual queries. To enhance localization accuracy, our method not only processes the temporal information of videos but also identifies fine-grained objects spatial… ▽ More

    Submitted 22 June, 2024; originally announced June 2024.

    Comments: The solution for the Natural Language Query track and Goal Step track at CVPR EgoVis Workshop 2024

  21. arXiv:2406.15771  [pdf, other

    cs.CV cs.AI

    HCQA @ Ego4D EgoSchema Challenge 2024

    Authors: Haoyu Zhang, Yuquan Xie, Yisen Feng, Zai**g Li, Meng Liu, Liqiang Nie

    Abstract: In this report, we present our champion solution for Ego4D EgoSchema Challenge in CVPR 2024. To deeply integrate the powerful egocentric captioning model and question reasoning model, we propose a novel Hierarchical Comprehension scheme for egocentric video Question Answering, named HCQA. It consists of three stages: Fine-grained Caption Generation, Context-driven Summarization, and Inference-guid… ▽ More

    Submitted 22 June, 2024; originally announced June 2024.

    Comments: The champion solution for Ego4D EgoSchema Challenge in CVPR EgoVis Workshop 2024

  22. arXiv:2406.15560  [pdf, other

    cs.DC cs.PF

    How to Rent GPUs on a Budget

    Authors: Zhouzi Li, Benjamin Berg, Arpan Mukhopadhyay, Mor Harchol-Balter

    Abstract: The explosion in Machine Learning (ML) over the past ten years has led to a dramatic increase in demand for GPUs to train ML models. Because it is prohibitively expensive for most users to build and maintain a large GPU cluster, large cloud providers (Microsoft Azure, Amazon AWS, Google Cloud) have seen explosive growth in demand for renting cloud-based GPUs. In this cloud-computing paradigm, a us… ▽ More

    Submitted 21 June, 2024; originally announced June 2024.

  23. arXiv:2406.15222  [pdf

    eess.IV cs.AI cs.CV

    Rapid and Accurate Diagnosis of Acute Aortic Syndrome using Non-contrast CT: A Large-scale, Retrospective, Multi-center and AI-based Study

    Authors: Yujian Hu, Yilang Xiang, Yan-Jie Zhou, Yangyan He, Shifeng Yang, Xiaolong Du, Chunlan Den, Youyao Xu, Gaofeng Wang, Zhengyao Ding, **gyong Huang, Wenjun Zhao, Xuejun Wu, Donglin Li, Qianqian Zhu, Zhenjiang Li, Chenyang Qiu, Ziheng Wu, Yunjun He, Chen Tian, Yihui Qiu, Zuodong Lin, Xiaolong Zhang, Yuan He, Zhenpeng Yuan , et al. (15 additional authors not shown)

    Abstract: Chest pain symptoms are highly prevalent in emergency departments (EDs), where acute aortic syndrome (AAS) is a catastrophic cardiovascular emergency with a high fatality rate, especially when timely and accurate treatment is not administered. However, current triage practices in the ED can cause up to approximately half of patients with AAS to have an initially missed diagnosis or be misdiagnosed… ▽ More

    Submitted 24 June, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

    Comments: under peer review

  24. arXiv:2406.14952  [pdf, other

    cs.CL

    ESC-Eval: Evaluating Emotion Support Conversations in Large Language Models

    Authors: Haiquan Zhao, Lingyu Li, Shisong Chen, Shuqi Kong, Jiaan Wang, Kexin Huang, Tianle Gu, Yixu Wang, Dandan Liang, Zhixu Li, Yan Teng, Yanghua Xiao, Yingchun Wang

    Abstract: Emotion Support Conversation (ESC) is a crucial application, which aims to reduce human stress, offer emotional guidance, and ultimately enhance human mental and physical well-being. With the advancement of Large Language Models (LLMs), many researchers have employed LLMs as the ESC models. However, the evaluation of these LLM-based ESCs remains uncertain. Inspired by the awesome development of ro… ▽ More

    Submitted 24 June, 2024; v1 submitted 21 June, 2024; originally announced June 2024.

    Comments: Pre-print

  25. arXiv:2406.14910  [pdf, ps, other

    cs.LG cs.DC math.OC

    Towards Dynamic Resource Allocation and Client Scheduling in Hierarchical Federated Learning: A Two-Phase Deep Reinforcement Learning Approach

    Authors: Xiao**g Chen, Zhenyuan Li, Wei Ni, Xin Wang, Shunqing Zhang, Yanzan Sun, Shugong Xu, Qingqi Pei

    Abstract: Federated learning (FL) is a viable technique to train a shared machine learning model without sharing data. Hierarchical FL (HFL) system has yet to be studied regrading its multiple levels of energy, computation, communication, and client scheduling, especially when it comes to clients relying on energy harvesting to power their operations. This paper presents a new two-phase deep deterministic p… ▽ More

    Submitted 21 June, 2024; originally announced June 2024.

  26. arXiv:2406.14840  [pdf

    cs.AI

    Automated architectural space layout planning using a physics-inspired generative design framework

    Authors: Zhipeng Li, Sichao Li, Geoff Hinchcliffe, Noam Maitless, Nick Birbilis

    Abstract: The determination of space layout is one of the primary activities in the schematic design stage of an architectural project. The initial layout planning defines the shape, dimension, and circulation pattern of internal spaces; which can also affect performance and cost of the construction. When carried out manually, space layout planning can be complicated, repetitive and time consuming. In this… ▽ More

    Submitted 20 June, 2024; originally announced June 2024.

  27. arXiv:2406.14485   

    cs.AI cs.HC cs.MM cs.SD eess.AS

    Proceedings of The second international workshop on eXplainable AI for the Arts (XAIxArts)

    Authors: Nick Bryan-Kinns, Corey Ford, Shuoyang Zheng, Helen Kennedy, Alan Chamberlain, Makayla Lewis, Drew Hemment, Zi** Li, Qiong Wu, Lanxi Xiao, Gus Xia, Jeba Rezwana, Michael Clemens, Gabriel Vigliensoni

    Abstract: This second international workshop on explainable AI for the Arts (XAIxArts) brought together a community of researchers in HCI, Interaction Design, AI, explainable AI (XAI), and digital arts to explore the role of XAI for the Arts. Workshop held at the 16th ACM Conference on Creativity and Cognition (C&C 2024), Chicago, USA.

    Submitted 22 June, 2024; v1 submitted 20 June, 2024; originally announced June 2024.

  28. arXiv:2406.14482  [pdf, other

    cs.CV

    Visible-Thermal Tiny Object Detection: A Benchmark Dataset and Baselines

    Authors: Xinyi Ying, Chao Xiao, Ruo**g Li, Xu He, Boyang Li, Zhaoxu Li, Yingqian Wang, Mingyuan Hu, Qingyu Xu, Zai** Lin, Miao Li, Shilin Zhou, Wei An, Weidong Sheng, Li Liu

    Abstract: Small object detection (SOD) has been a longstanding yet challenging task for decades, with numerous datasets and algorithms being developed. However, they mainly focus on either visible or thermal modality, while visible-thermal (RGBT) bimodality is rarely explored. Although some RGBT datasets have been developed recently, the insufficient quantity, limited category, misaligned images and large t… ▽ More

    Submitted 20 June, 2024; originally announced June 2024.

  29. arXiv:2406.14393  [pdf, other

    cs.LG cs.CL

    Jailbreaking as a Reward Misspecification Problem

    Authors: Zhihui Xie, Jiahui Gao, Lei Li, Zhenguo Li, Qi Liu, Lingpeng Kong

    Abstract: The widespread adoption of large language models (LLMs) has raised concerns about their safety and reliability, particularly regarding their vulnerability to adversarial attacks. In this paper, we propose a novel perspective that attributes this vulnerability to reward misspecification during the alignment process. We introduce a metric ReGap to quantify the extent of reward misspecification and d… ▽ More

    Submitted 20 June, 2024; originally announced June 2024.

  30. arXiv:2406.14191  [pdf, other

    cs.CL cs.AI cs.LG

    Temporal Knowledge Graph Question Answering: A Survey

    Authors: Miao Su, ZiXuan Li, Zhuo Chen, Long Bai, Xiaolong **, Jiafeng Guo

    Abstract: Knowledge Base Question Answering (KBQA) has been a long-standing field to answer questions based on knowledge bases. Recently, the evolving dynamics of knowledge have attracted a growing interest in Temporal Knowledge Graph Question Answering (TKGQA), an emerging task to answer temporal questions. However, this field grapples with ambiguities in defining temporal questions and lacks a systematic… ▽ More

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: 8 pages, 3 figures

  31. arXiv:2406.14171  [pdf, other

    cs.AI cs.CL

    Ranking LLMs by compression

    Authors: Peijia Guo, Ziguang Li, Haibo Hu, Chao Huang, Ming Li, Rui Zhang

    Abstract: We conceptualize the process of understanding as information compression, and propose a method for ranking large language models (LLMs) based on lossless data compression. We demonstrate the equivalence of compression length under arithmetic coding with cumulative negative log probabilities when using a large language model as a prior, that is, the pre-training phase of the model is essentially th… ▽ More

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: 7 pages, 4 tables

  32. arXiv:2406.14066  [pdf, other

    cs.AI cs.PF

    Optimizing Speculative Decoding for Serving Large Language Models Using Goodput

    Authors: Xiaoxuan Liu, Cade Daniel, Langxiang Hu, Woosuk Kwon, Zhuohan Li, Xiangxi Mo, Alvin Cheung, Zhijie Deng, Ion Stoica, Hao Zhang

    Abstract: Reducing the inference latency of large language models (LLMs) is crucial, and speculative decoding (SD) stands out as one of the most effective techniques. Rather than letting the LLM generate all tokens directly, speculative decoding employs effective proxies to predict potential outputs, which are then verified by the LLM without compromising the generation quality. Yet, deploying SD in real on… ▽ More

    Submitted 25 June, 2024; v1 submitted 20 June, 2024; originally announced June 2024.

  33. arXiv:2406.14004  [pdf, other

    cs.IR cs.LG

    Do Not Wait: Learning Re-Ranking Model Without User Feedback At Serving Time in E-Commerce

    Authors: Yuan Wang, Zhiyu Li, Changshuo Zhang, Sirui Chen, Xiao Zhang, Jun Xu, Quan Lin

    Abstract: Recommender systems have been widely used in e-commerce, and re-ranking models are playing an increasingly significant role in the domain, which leverages the inter-item influence and determines the final recommendation lists. Online learning methods keep updating a deployed model with the latest available samples to capture the shifting of the underlying data distribution in e-commerce. However,… ▽ More

    Submitted 20 June, 2024; originally announced June 2024.

  34. arXiv:2406.13988  [pdf, other

    cs.CV

    LGmap: Local-to-Global Map** Network for Online Long-Range Vectorized HD Map Construction

    Authors: Kuang Wu, Sulei Nian, Can Shen, Chuan Yang, Zhanbin Li

    Abstract: This report introduces the first-place winning solution for the Autonomous Grand Challenge 2024 - Mapless Driving. In this report, we introduce a novel online map** pipeline LGmap, which adept at long-range temporal model. Firstly, we propose symmetric view transformation(SVT), a hybrid view transformation module. Our approach overcomes the limitations of forward sparse feature representation an… ▽ More

    Submitted 20 June, 2024; originally announced June 2024.

  35. arXiv:2406.13856  [pdf, other

    cs.DB

    Kishu: Time-Traveling for Computational Notebooks

    Authors: Zhaoheng Li, Supawit Chockchowwat, Ribhav Sahu, Areet Sheth, Yongjoo Park

    Abstract: Computational notebooks (e.g., Jupyter, Google Colab) are widely used by data scientists. A key feature of notebooks is the interactive computing model of iteratively executing cells (i.e., a set of statements) and observing the result (e.g., model or plot). Unfortunately, existing notebook systems do not offer time-traveling to past states: when the user executes a cell, the notebook session stat… ▽ More

    Submitted 19 June, 2024; originally announced June 2024.

  36. arXiv:2406.13705  [pdf, other

    eess.IV cs.AI cs.CV

    EndoUIC: Promptable Diffusion Transformer for Unified Illumination Correction in Capsule Endoscopy

    Authors: Long Bai, Qiaozhi Tan, Tong Chen, Wan Jun Nah, Yanheng Li, Zhicheng He, Sishen Yuan, Zhen Chen, **lin Wu, Mobarakol Islam, Zhen Li, Hongbin Liu, Hongliang Ren

    Abstract: Wireless Capsule Endoscopy (WCE) is highly valued for its non-invasive and painless approach, though its effectiveness is compromised by uneven illumination from hardware constraints and complex internal dynamics, leading to overexposed or underexposed images. While researchers have discussed the challenges of low-light enhancement in WCE, the issue of correcting for different exposure levels rema… ▽ More

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: To appear in MICCAI 2024. Code and dataset availability: https://github.com/longbai1006/EndoUIC

  37. arXiv:2406.13674  [pdf, other

    eess.IV cs.CV

    Rethinking Abdominal Organ Segmentation (RAOS) in the clinical scenario: A robustness evaluation benchmark with challenging cases

    Authors: Xiangde Luo, Zihan Li, Shaoting Zhang, Wenjun Liao, Guotai Wang

    Abstract: Deep learning has enabled great strides in abdominal multi-organ segmentation, even surpassing junior oncologists on common cases or organs. However, robustness on corner cases and complex organs remains a challenging open problem for clinical adoption. To investigate model robustness, we collected and annotated the RAOS dataset comprising 413 CT scans ($\sim$80k 2D images, $\sim$8k 3D organ annot… ▽ More

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: 10 pages, 1 figure, 6 tables, Early Accept to MICCAI 2024

  38. arXiv:2406.13625  [pdf

    cs.CV cs.AI physics.med-ph

    Enhance the Image: Super Resolution using Artificial Intelligence in MRI

    Authors: Ziyu Li, Zihan Li, Haoxiang Li, Qiuyun Fan, Karla L. Miller, Wenchuan Wu, Akshay S. Chaudhari, Qiyuan Tian

    Abstract: This chapter provides an overview of deep learning techniques for improving the spatial resolution of MRI, ranging from convolutional neural networks, generative adversarial networks, to more advanced models including transformers, diffusion models, and implicit neural representations. Our exploration extends beyond the methodologies to scrutinize the impact of super-resolved images on clinical an… ▽ More

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: A book chapter in Machine Learning in MRI: From methods to clinical translation. Copyright may be transferred without notice, after which this version may no longer be accessible

  39. arXiv:2406.13536  [pdf, other

    cs.CV

    Image Distillation for Safe Data Sharing in Histopathology

    Authors: Zhe Li, Bernhard Kainz

    Abstract: Histopathology can help clinicians make accurate diagnoses, determine disease prognosis, and plan appropriate treatment strategies. As deep learning techniques prove successful in the medical domain, the primary challenges become limited data availability and concerns about data sharing and privacy. Federated learning has addressed this challenge by training models locally and updating parameters… ▽ More

    Submitted 25 June, 2024; v1 submitted 19 June, 2024; originally announced June 2024.

    Comments: accepted at MICCAI 2024

  40. arXiv:2406.13527  [pdf, other

    cs.CV

    4K4DGen: Panoramic 4D Generation at 4K Resolution

    Authors: Renjie Li, Panwang Pan, Bangbang Yang, Dejia Xu, Shijie Zhou, Xuanyang Zhang, Zeming Li, Achuta Kadambi, Zhangyang Wang, Zhiwen Fan

    Abstract: The blooming of virtual reality and augmented reality (VR/AR) technologies has driven an increasing demand for the creation of high-quality, immersive, and dynamic environments. However, existing generative techniques either focus solely on dynamic objects or perform outpainting from a single perspective image, failing to meet the needs of VR/AR applications. In this work, we tackle the challengin… ▽ More

    Submitted 19 June, 2024; originally announced June 2024.

  41. arXiv:2406.13408  [pdf, other

    cs.CL

    SQLFixAgent: Towards Semantic-Accurate SQL Generation via Multi-Agent Collaboration

    Authors: Jipeng Cen, Jiaxin Liu, Zhixu Li, **g**g Wang

    Abstract: While fine-tuned large language models (LLMs) excel in generating grammatically valid SQL in Text-to-SQL parsing, they often struggle to ensure semantic accuracy in queries, leading to user confusion and diminished system usability. To tackle this challenge, we introduce SQLFixAgent, an innovative multi-agent collaborative framework designed for detecting and repairing erroneous SQL. Our framework… ▽ More

    Submitted 19 June, 2024; originally announced June 2024.

  42. arXiv:2406.13361  [pdf, other

    cs.CL cs.LG

    Improving Zero-Shot Cross-Lingual Transfer via Progressive Code-Switching

    Authors: Zhuoran Li, Chunming Hu, Junfan Chen, Zhijun Chen, Xiaohui Guo, Richong Zhang

    Abstract: Code-switching is a data augmentation scheme mixing words from multiple languages into source lingual text. It has achieved considerable generalization performance of cross-lingual transfer tasks by aligning cross-lingual contextual word representations. However, uncontrolled and over-replaced code-switching would augment dirty samples to model training. In other words, the excessive code-switchin… ▽ More

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: 9 pages, 5 figures, 6 tables. Accepted by International Joint Conference on Artificial Intelligence (IJCAI 2024)

  43. arXiv:2406.13217  [pdf, other

    cs.CL

    Bridging Law and Data: Augmenting Reasoning via a Semi-Structured Dataset with IRAC methodology

    Authors: Xiaoxi Kang, Lizhen Qu, Lay-Ki Soon, Zhuang Li, Adnan Trakic

    Abstract: The effectiveness of Large Language Models (LLMs) in legal reasoning is often limited due to the unique legal terminologies and the necessity for highly specialized knowledge. These limitations highlight the need for high-quality data tailored for complex legal reasoning tasks. This paper introduces LEGALSEMI, a benchmark specifically curated for legal scenario analysis. LEGALSEMI comprises 54 leg… ▽ More

    Submitted 19 June, 2024; originally announced June 2024.

  44. arXiv:2406.13170  [pdf, other

    cs.AI cs.CL

    Amphista: Accelerate LLM Inference with Bi-directional Multiple Drafting Heads in a Non-autoregressive Style

    Authors: Ze** Li, Xinlong Yang, Ziheng Gao, Ji Liu, Zhuang Liu, Dong Li, **zhang Peng, Lu Tian, Emad Barsoum

    Abstract: Large Language Models (LLMs) inherently use autoregressive decoding, which lacks parallelism in inference and results in significantly slow inference speeds, especially when hardware parallel accelerators and memory bandwidth are not fully utilized. In this work, we propose Amphista, a speculative decoding algorithm that adheres to a non-autoregressive decoding paradigm. Owing to the increased par… ▽ More

    Submitted 18 June, 2024; originally announced June 2024.

  45. arXiv:2406.13161  [pdf, other

    cs.AI cs.CL cs.LG cs.PL

    APPL: A Prompt Programming Language for Harmonious Integration of Programs and Large Language Model Prompts

    Authors: Honghua Dong, Qidong Su, Yubo Gao, Zhaoyu Li, Yangjun Ruan, Gennady Pekhimenko, Chris J. Maddison, Xujie Si

    Abstract: Large Language Models (LLMs) have become increasingly capable of handling diverse tasks with the aid of well-crafted prompts and integration of external tools, but as task complexity rises, the workflow involving LLMs can be complicated and thus challenging to implement and maintain. To address this challenge, we propose APPL, A Prompt Programming Language that acts as a bridge between computer pr… ▽ More

    Submitted 18 June, 2024; originally announced June 2024.

  46. arXiv:2406.13057  [pdf, other

    cs.LG cs.AI

    Informed along the road: roadway capacity driven graph convolution network for network-wide traffic prediction

    Authors: Zilin Bian, **gqin Gao, Kaan Ozbay, Fan Zuo, Dachuan Zuo, Zhenning Li

    Abstract: While deep learning has shown success in predicting traffic states, most methods treat it as a general prediction task without considering transportation aspects. Recently, graph neural networks have proven effective for this task, but few incorporate external factors that impact roadway capacity and traffic flow. This study introduces the Roadway Capacity Driven Graph Convolution Network (RCDGCN)… ▽ More

    Submitted 18 June, 2024; originally announced June 2024.

  47. arXiv:2406.13038  [pdf, other

    cs.AI eess.SP

    Traffic Prediction considering Multiple Levels of Spatial-temporal Information: A Multi-scale Graph Wavelet-based Approach

    Authors: Zilin Bian, **gqin Gao, Kaan Ozbay, Zhenning Li

    Abstract: Although traffic prediction has been receiving considerable attention with a number of successes in the context of intelligent transportation systems, the prediction of traffic states over a complex transportation network that contains different road types has remained a challenge. This study proposes a multi-scale graph wavelet temporal convolution network (MSGWTCN) to predict the traffic states… ▽ More

    Submitted 18 June, 2024; originally announced June 2024.

  48. arXiv:2406.12784  [pdf, other

    cs.CL

    UBENCH: Benchmarking Uncertainty in Large Language Models with Multiple Choice Questions

    Authors: Xunzhi Wang, Zhuowei Zhang, Qiongyu Li, Gaonan Chen, Mengting Hu, Zhiyu li, Bitong Luo, Hang Gao, Zhixin Han, Haotian Wang

    Abstract: The rapid development of large language models (LLMs) has shown promising practical results. However, their low interpretability often leads to errors in unforeseen circumstances, limiting their utility. Many works have focused on creating comprehensive evaluation systems, but previous benchmarks have primarily assessed problem-solving abilities while neglecting the response's uncertainty, which m… ▽ More

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: Under review

  49. arXiv:2406.12709  [pdf, other

    cs.LG cs.AI

    Enhancing Spatio-temporal Quantile Forecasting with Curriculum Learning: Lessons Learned

    Authors: Du Yin, **liang Deng, Shuang Ao, Zechen Li, Hao Xue, Arian Prabowo, Renhe Jiang, Xuan Song, Flora Salim

    Abstract: Training models on spatio-temporal (ST) data poses an open problem due to the complicated and diverse nature of the data itself, and it is challenging to ensure the model's performance directly trained on the original ST data. While limiting the variety of training data can make training easier, it can also lead to a lack of knowledge and information for the model, resulting in a decrease in perfo… ▽ More

    Submitted 18 June, 2024; originally announced June 2024.

  50. arXiv:2406.12459  [pdf, other

    cs.CV

    HumanSplat: Generalizable Single-Image Human Gaussian Splatting with Structure Priors

    Authors: Panwang Pan, Zhuo Su, Chenguo Lin, Zhen Fan, Yongjie Zhang, Zeming Li, Tingting Shen, Yadong Mu, Yebin Liu

    Abstract: Despite recent advancements in high-fidelity human reconstruction techniques, the requirements for densely captured images or time-consuming per-instance optimization significantly hinder their applications in broader scenarios. To tackle these issues, we present HumanSplat which predicts the 3D Gaussian Splatting properties of any human from a single input image in a generalizable manner. In part… ▽ More

    Submitted 18 June, 2024; originally announced June 2024.