Showing 1–50 of 2,299 results for author: li, L

Searching in archive cs.
  1. arXiv:2407.02530  [pdf, ps, other]

    quant-ph cs.DS

    Unifying quantum spatial search, state transfer and uniform sampling on graphs: simple and exact

    Authors: Qingwen Wang, Ying Jiang, Lvzhou Li

    Abstract: This article presents a novel and succinct algorithmic framework via alternating quantum walks, unifying quantum spatial search, state transfer and uniform sampling on a large class of graphs. Using the framework, we can achieve exact uniform sampling over all vertices and perfect state transfer between any two vertices, provided that eigenvalues of Laplacian matrix of the graph are all integers.…

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: This manuscript has some overlap with arXiv:2307.16133. More precisely, it is an advanced version of arXiv:2307.16133, which not only modifies the paper structure and some results but also adds several new results

  2. arXiv:2407.01942  [pdf, other]

    cs.AI cs.CL cs.CV

    Certainly Uncertain: A Benchmark and Metric for Multimodal Epistemic and Aleatoric Awareness

    Authors: Khyathi Raghavi Chandu, Linjie Li, Anas Awadalla, Ximing Lu, Jae Sung Park, Jack Hessel, Lijuan Wang, Yejin Choi

    Abstract: The ability to acknowledge the inevitable uncertainty in their knowledge and reasoning is a prerequisite for AI systems to be truly truthful and reliable. In this paper, we present a taxonomy of uncertainty specific to vision-language AI systems, distinguishing between epistemic uncertainty (arising from a lack of information) and aleatoric uncertainty (due to inherent unpredictability), and furth…

    Submitted 2 July, 2024; originally announced July 2024.

    Comments: 26 pages

  3. arXiv:2407.01220  [pdf, other]

    cs.CV

    Fast and Efficient: Mask Neural Fields for 3D Scene Segmentation

    Authors: Zihan Gao, Lingling Li, Licheng Jiao, Fang Liu, Xu Liu, Wenping Ma, Yuwei Guo, Shuyuan Yang

    Abstract: Understanding 3D scenes is a crucial challenge in computer vision research with applications spanning multiple domains. Recent advancements in distilling 2D vision-language foundation models into neural fields, like NeRF and 3DGS, enables open-vocabulary segmentation of 3D scenes from 2D multi-view images without the need for precise 3D annotations. While effective, however, the per-pixel distilla…

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: 16 pages, 7 figures

  4. arXiv:2407.00943  [pdf, other]

    cs.DC cs.LG

    FedEx: Expediting Federated Learning over Heterogeneous Mobile Devices by Overlapping and Participant Selection

    Authors: Jiaxiang Geng, Boyu Li, Xiaoqi Qin, Yixuan Li, Liang Li, Yanzhao Hou, Miao Pan

    Abstract: Training latency is critical for the success of numerous intrigued applications ignited by federated learning (FL) over heterogeneous mobile devices. By revolutionarily overlapping local gradient transmission with continuous local computing, FL can remarkably reduce its training latency over homogeneous clients, yet encounter severe model staleness, model drifts, memory cost and straggler issues i…

    Submitted 2 July, 2024; v1 submitted 30 June, 2024; originally announced July 2024.

    Comments: 21 pages, 10 figures, Submitted to Sensys2024

  5. arXiv:2407.00928  [pdf, other]

    cs.LG cs.CL

    FoldGPT: Simple and Effective Large Language Model Compression Scheme

    Authors: Songwei Liu, Chao Zeng, Lianqiang Li, Chenqian Yan, Lean Fu, Xing Mei, Fangmin Chen

    Abstract: The demand for deploying large language models(LLMs) on mobile devices continues to increase, driven by escalating data security concerns and cloud costs. However, network bandwidth and memory limitations pose challenges for deploying billion-level models on mobile devices. In this study, we investigate the outputs of different layers across various scales of LLMs and found that the outputs of mos…

    Submitted 30 June, 2024; originally announced July 2024.

  6. arXiv:2407.00280  [pdf, other]

    eess.IV cs.CV

    IVCA: Inter-Relation-Aware Video Complexity Analyzer

    Authors: Junqi Liao, Yao Li, Zhuoyuan Li, Li Li, Dong Liu

    Abstract: To meet the real-time analysis requirements of video streaming applications, we propose an inter-relation-aware video complexity analyzer (IVCA) as an extension to VCA. The IVCA addresses the limitation of VCA by considering inter-frame relations, namely motion and reference structure. First, we enhance the accuracy of temporal features by introducing feature-domain motion estimation into the IVCA…

    Submitted 28 June, 2024; originally announced July 2024.

    Comments: The report for the solution of second prize winner in ICIP 2024 Grand Challenge on Video Complexity (Team: USTC-iVC_Team1, USTC-iVC_Team2)

  7. arXiv:2406.19922  [pdf, other]

    cs.CV

    Parallax-tolerant Image Stitching via Segmentation-guided Multi-homography Warping

    Authors: Tianli Liao, Ce Wang, Lei Li, Guangen Liu, Nan Li

    Abstract: Large parallax between images is an intractable issue in image stitching. Various warping-based methods are proposed to address it, yet the results are unsatisfactory. In this paper, we propose a novel image stitching method using multi-homography warping guided by image segmentation. Specifically, we leverage the Segment Anything Model to segment the target image into numerous contents and partit…

    Submitted 28 June, 2024; originally announced June 2024.

    Comments: 11 pages, 9 figures

  8. arXiv:2406.19400  [pdf, other]

    cs.CV

    Deep Convolutional Neural Networks Meet Variational Shape Compactness Priors for Image Segmentation

    Authors: Kehui Zhang, Lingfeng Li, Hao Liu, Jing Yuan, Xue-Cheng Tai

    Abstract: Shape compactness is a key geometrical property to describe interesting regions in many image segmentation tasks. In this paper, we propose two novel algorithms to solve the introduced image segmentation problem that incorporates a shape-compactness prior. Existing algorithms for such a problem often suffer from computational inefficiency, difficulty in reaching a local minimum, and the need to fi…

    Submitted 23 May, 2024; originally announced June 2024.

    Comments: 28 pages

  9. arXiv:2406.18572  [pdf, other]

    cs.CV cs.LG

    GeoReasoner: Geo-localization with Reasoning in Street Views using a Large Vision-Language Model

    Authors: Ling Li, Yu Ye, Bingchuan Jiang, Wei Zeng

    Abstract: This work tackles the problem of geo-localization with a new paradigm using a large vision-language model (LVLM) augmented with human inference knowledge. A primary challenge here is the scarcity of data for training the LVLM - existing street-view datasets often contain numerous low-quality images lacking visual clues, and lack any reasoning inference. To address the data-quality issue, we devise…

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: ICML 2024

  10. arXiv:2406.18532  [pdf, other]

    cs.CL cs.AI cs.LG

    Symbolic Learning Enables Self-Evolving Agents

    Authors: Wangchunshu Zhou, Yixin Ou, Shengwei Ding, Long Li, Jialong Wu, Tiannan Wang, Jiamin Chen, Shuai Wang, Xiaohua Xu, Ningyu Zhang, Huajun Chen, Yuchen Eleanor Jiang

    Abstract: The AI community has been exploring a pathway to artificial general intelligence (AGI) by developing "language agents", which are complex large language models (LLMs) pipelines involving both prompting techniques and tool usage methods. While language agents have demonstrated impressive capabilities for many real-world tasks, a fundamental limitation of current language agents research is that the…

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: Code available at https://github.com/aiwaves-cn/agents

  11. arXiv:2406.17534  [pdf, other]

    cs.CL

    Retrieval-style In-Context Learning for Few-shot Hierarchical Text Classification

    Authors: Huiyao Chen, Yu Zhao, Zulong Chen, Mengjia Wang, Liangyue Li, Meishan Zhang, Min Zhang

    Abstract: Hierarchical text classification (HTC) is an important task with broad applications, while few-shot HTC has gained increasing interest recently. While in-context learning (ICL) with large language models (LLMs) has achieved significant success in few-shot learning, it is not as effective for HTC because of the expansive hierarchical label sets and extremely-ambiguous labels. In this work, we intro…

    Submitted 29 June, 2024; v1 submitted 25 June, 2024; originally announced June 2024.

    Comments: 17 pages

  12. arXiv:2406.17526  [pdf, other]

    cs.CL cs.IR

    LumberChunker: Long-Form Narrative Document Segmentation

    Authors: André V. Duarte, João Marques, Miguel Graça, Miguel Freire, Lei Li, Arlindo L. Oliveira

    Abstract: Modern NLP tasks increasingly rely on dense retrieval methods to access up-to-date and relevant contextual information. We are motivated by the premise that retrieval benefits from segments that can vary in size such that a content's semantic independence is better captured. We propose LumberChunker, a method leveraging an LLM to dynamically segment documents, which iteratively prompts the LLM to…

    Submitted 25 June, 2024; originally announced June 2024.

    ACM Class: I.2

  13. arXiv:2406.16144  [pdf, other]

    cs.CL

    Chain-of-Probe: Examing the Necessity and Accuracy of CoT Step-by-Step

    Authors: Zezhong Wang, Xingshan Zeng, Weiwen Liu, Yufei Wang, Liangyou Li, Yasheng Wang, Lifeng Shang, Xin Jiang, Qun Liu, Kam-Fai Wong

    Abstract: Current research found the issue of Early Answering in large language models (LLMs), where the models already have an answer before generating the Chain-of-Thought (CoT). This phenomenon suggests a potential lack of necessary dependency between the predicted answer and the reasoning process. Consequently, two important questions arise: (1) Is CoT still necessary if the model already has an answer?…

    Submitted 23 June, 2024; originally announced June 2024.

  14. arXiv:2406.15000  [pdf, other]

    cs.CL cs.AI

    Unveiling the Impact of Multi-Modal Interactions on User Engagement: A Comprehensive Evaluation in AI-driven Conversations

    Authors: Lichao Zhang, Jia Yu, Shuai Zhang, Long Li, Yangyang Zhong, Guanbao Liang, Yuming Yan, Qing Ma, Fangsheng Weng, Fayu Pan, Jing Li, Renjun Xu, Zhenzhong Lan

    Abstract: Large Language Models (LLMs) have significantly advanced user-bot interactions, enabling more complex and coherent dialogues. However, the prevalent text-only modality might not fully exploit the potential for effective user engagement. This paper explores the impact of multi-modal interactions, which incorporate images and audio alongside text, on user engagement in chatbot conversations. We cond…

    Submitted 21 June, 2024; originally announced June 2024.

  15. arXiv:2406.14952  [pdf, other]

    cs.CL

    ESC-Eval: Evaluating Emotion Support Conversations in Large Language Models

    Authors: Haiquan Zhao, Lingyu Li, Shisong Chen, Shuqi Kong, Jiaan Wang, Kexin Huang, Tianle Gu, Yixu Wang, Dandan Liang, Zhixu Li, Yan Teng, Yanghua Xiao, Yingchun Wang

    Abstract: Emotion Support Conversation (ESC) is a crucial application, which aims to reduce human stress, offer emotional guidance, and ultimately enhance human mental and physical well-being. With the advancement of Large Language Models (LLMs), many researchers have employed LLMs as the ESC models. However, the evaluation of these LLM-based ESCs remains uncertain. Inspired by the awesome development of ro…

    Submitted 24 June, 2024; v1 submitted 21 June, 2024; originally announced June 2024.

    Comments: Pre-print

  16. arXiv:2406.14777  [pdf, other]

    cs.LG math.OC

    Learning to Cover: Online Learning and Optimization with Irreversible Decisions

    Authors: Alexandre Jacquillat, Michael Lingzhi Li

    Abstract: We define an online learning and optimization problem with irreversible decisions contributing toward a coverage target. At each period, a decision-maker selects facilities to open, receives information on the success of each one, and updates a machine learning model to guide future decisions. The goal is to minimize costs across a finite horizon under a chance constraint reflecting the coverage t…

    Submitted 20 June, 2024; originally announced June 2024.

  17. arXiv:2406.14393  [pdf, other]

    cs.LG cs.CL

    Jailbreaking as a Reward Misspecification Problem

    Authors: Zhihui Xie, Jiahui Gao, Lei Li, Zhenguo Li, Qi Liu, Lingpeng Kong

    Abstract: The widespread adoption of large language models (LLMs) has raised concerns about their safety and reliability, particularly regarding their vulnerability to adversarial attacks. In this paper, we propose a novel perspective that attributes this vulnerability to reward misspecification during the alignment process. We introduce a metric ReGap to quantify the extent of reward misspecification and d…

    Submitted 20 June, 2024; originally announced June 2024.

  18. arXiv:2406.14185  [pdf, other]

    cs.DC cs.AI

    Failure-Resilient Distributed Inference with Model Compression over Heterogeneous Edge Devices

    Authors: Li Wang, Liang Li, Lianming Xu, Xian Peng, Aiguo Fei

    Abstract: The distributed inference paradigm enables the computation workload to be distributed across multiple devices, facilitating the implementations of deep learning based intelligent services on extremely resource-constrained Internet of Things (IoT) scenarios. Yet it raises great challenges to perform complicated inference tasks relying on a cluster of IoT devices that are heterogeneous in their comp…

    Submitted 20 June, 2024; originally announced June 2024.

  19. arXiv:2406.14118  [pdf, other]

    eess.IV cs.CV

    Prediction and Reference Quality Adaptation for Learned Video Compression

    Authors: Xihua Sheng, Li Li, Dong Liu, Houqiang Li

    Abstract: Temporal prediction is one of the most important technologies for video compression. Various prediction coding modes are designed in traditional video codecs. Traditional video codecs will adaptively to decide the optimal coding mode according to the prediction quality and reference quality. Recently, learned video codecs have made great progress. However, they ignore the prediction and reference…

    Submitted 20 June, 2024; originally announced June 2024.

  20. arXiv:2406.14017  [pdf, other]

    cs.IR

    EAGER: Two-Stream Generative Recommender with Behavior-Semantic Collaboration

    Authors: Ye Wang, Jiahao Xun, Minjie Hong, Jieming Zhu, Tao Jin, Wang Lin, Haoyuan Li, Linjun Li, Yan Xia, Zhou Zhao, Zhenhua Dong

    Abstract: Generative retrieval has recently emerged as a promising approach to sequential recommendation, framing candidate item retrieval as an autoregressive sequence generation problem. However, existing generative methods typically focus solely on either behavioral or semantic aspects of item information, neglecting their complementary nature and thus resulting in limited effectiveness. To address this…

    Submitted 3 July, 2024; v1 submitted 20 June, 2024; originally announced June 2024.

    Comments: Accepted by KDD 2024. Code available at https://reczoo.github.io/EAGER

  21. arXiv:2406.13982  [pdf, other]

    cs.SD eess.AS

    Improved Remixing Process for Domain Adaptation-Based Speech Enhancement by Mitigating Data Imbalance in Signal-to-Noise Ratio

    Authors: Li Li, Shogo Seki

    Abstract: RemixIT and Remixed2Remixed are domain adaptation-based speech enhancement (DASE) methods that use a teacher model trained in full supervision to generate pseudo-paired data by remixing the outputs of the teacher model. The student model for enhancing real-world recorded signals is trained using the pseudo-paired data without ground truth. Since the noisy signals are recorded in natural environmen…

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: Accepted at Interspeech2024

  22. arXiv:2406.13943  [pdf, ps, other]

    cs.IT

    New QEC codes and EAQEC codes from repeated-root cyclic codes of length $2^rp^s$

    Authors: Lanqiang Li, Ziwen Cao, Tingting Wu, Li Liu

    Abstract: Let $p$ be an odd prime and $r,s,m$ be positive integers. In this study, we initiate our exploration by delving into the intricate structure of all repeated-root cyclic codes and their duals with a length of $2^rp^s$ over the finite field $\mathbb{F}_{p^m}$. Through the utilization of CSS and Steane's constructions, a series of new quantum error-correcting (QEC) codes are constructed with paramete…

    Submitted 19 June, 2024; originally announced June 2024.

    MSC Class: 94B15 (Primary) 94B05; 11T71(Secondary)

  23. arXiv:2406.13905  [pdf, other]

    cs.CL

    Persuasiveness of Generated Free-Text Rationales in Subjective Decisions: A Case Study on Pairwise Argument Ranking

    Authors: Mohamed Elaraby, Diane Litman, Xiang Lorraine Li, Ahmed Magooda

    Abstract: Generating free-text rationales is among the emergent capabilities of Large Language Models (LLMs). These rationales have been found to enhance LLM performance across various NLP tasks. Recently, there has been growing interest in using these rationales to provide insights for various important downstream tasks. In this paper, we analyze generated free-text rationales in tasks with subjective answ…

    Submitted 19 June, 2024; originally announced June 2024.

  24. arXiv:2406.13869  [pdf, other]

    cs.LG q-bio.BM

    Global Human-guided Counterfactual Explanations for Molecular Properties via Reinforcement Learning

    Authors: Danqing Wang, Antonis Antoniades, Kha-Dinh Luong, Edwin Zhang, Mert Kosan, Jiachen Li, Ambuj Singh, William Yang Wang, Lei Li

    Abstract: Counterfactual explanations of Graph Neural Networks (GNNs) offer a powerful way to understand data that can naturally be represented by a graph structure. Furthermore, in many domains, it is highly desirable to derive data-driven global explanations or rules that can better explain the high-level properties of the models and data in question. However, evaluating global counterfactual explanations…

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: Accepted by KDD 2024

  25. arXiv:2406.13652  [pdf, other]

    cs.AI

    Stability and Generalizability in SDE Diffusion Models with Measure-Preserving Dynamics

    Authors: Weitong Zhang, Chengqi Zang, Liu Li, Sarah Cechnicka, Cheng Ouyang, Bernhard Kainz

    Abstract: Inverse problems describe the process of estimating the causal factors from a set of measurements or data. Mapping of often incomplete or degraded data to parameters is ill-posed, thus data-driven iterative solutions are required, for example when reconstructing clean images from poor signals. Diffusion models have shown promise as potent generative tools for solving inverse problems due to their…

    Submitted 19 June, 2024; originally announced June 2024.

  26. arXiv:2406.13558   

    cs.AI

    Enhancing Travel Choice Modeling with Large Language Models: A Prompt-Learning Approach

    Authors: Xuehao Zhai, Hanlin Tian, Lintong Li, Tianyu Zhao

    Abstract: Travel choice analysis is crucial for understanding individual travel behavior to develop appropriate transport policies and recommendation systems in Intelligent Transportation Systems (ITS). Despite extensive research, this domain faces two critical challenges: a) modeling with limited survey data, and b) simultaneously achieving high model explainability and accuracy. In this paper, we introduc…

    Submitted 22 June, 2024; v1 submitted 19 June, 2024; originally announced June 2024.

    Comments: We currently do not have a replacement version available. We request withdrawal due to a significant methodological error affecting the paper's validity, specifically a miscalculation in data preprocessing. We are working on corrections, but this will take time. We believe an interim withdrawal is necessary to prevent the dissemination of incorrect information.

  27. arXiv:2406.13372  [pdf, other]

    cs.AI

    Thread: A Logic-Based Data Organization Paradigm for How-To Question Answering with Retrieval Augmented Generation

    Authors: Kaikai An, Fangkai Yang, Liqun Li, Junting Lu, Sitao Cheng, Lu Wang, Pu Zhao, Lele Cao, Qingwei Lin, Saravan Rajmohan, Dongmei Zhang, Qi Zhang

    Abstract: Current question answering systems leveraging retrieval augmented generation perform well in answering factoid questions but face challenges with non-factoid questions, particularly how-to queries requiring detailed step-by-step instructions and explanations. In this paper, we introduce Thread, a novel data organization paradigm that transforms documents into logic units based on their inter-conne…

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: 21 pages, 4 figures

  28. arXiv:2406.13227  [pdf, other]

    cs.CV

    Controllable and Gradual Facial Blemishes Retouching via Physics-Based Modelling

    Authors: Chenhao Shuai, Rizhao Cai, Bandara Dissanayake, Amanda Newman, Dayan Guan, Dennis Sng, Ling Li, Alex Kot

    Abstract: Face retouching aims to remove facial blemishes, such as pigmentation and acne, and still retain fine-grain texture details. Nevertheless, existing methods just remove the blemishes but focus little on realism of the intermediate process, limiting their use more to beautifying facial images on social media rather than being effective tools for simulating changes in facial pigmentation and ance. Mo…

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: 7 pages, 6 figures. The paper has been accepted by the IEEE Conference on Multimedia Expo 2024

  29. arXiv:2406.12913  [pdf, other]

    cs.LG cs.AI

    T-JEPA: A Joint-Embedding Predictive Architecture for Trajectory Similarity Computation

    Authors: Lihuan Li, Hao Xue, Yang Song, Flora Salim

    Abstract: Trajectory similarity computation is an essential technique for analyzing moving patterns of spatial data across various applications such as traffic management, wildlife tracking, and location-based services. Modern methods often apply deep learning techniques to approximate heuristic metrics but struggle to learn more robust and generalized representations from the vast amounts of unlabeled traj…

    Submitted 13 June, 2024; originally announced June 2024.

  30. arXiv:2406.12534  [pdf, other]

    cs.CL

    Unified Active Retrieval for Retrieval Augmented Generation

    Authors: Qinyuan Cheng, Xiaonan Li, Shimin Li, Qin Zhu, Zhangyue Yin, Yunfan Shao, Linyang Li, Tianxiang Sun, Hang Yan, Xipeng Qiu

    Abstract: In Retrieval-Augmented Generation (RAG), retrieval is not always helpful and applying it to every instruction is sub-optimal. Therefore, determining whether to retrieve is crucial for RAG, which is usually referred to as Active Retrieval. However, existing active retrieval methods face two challenges: 1. They usually rely on a single criterion, which struggles with handling various types of instru…

    Submitted 27 June, 2024; v1 submitted 18 June, 2024; originally announced June 2024.

  31. arXiv:2406.12168  [pdf, other]

    cs.LG cs.AI cs.CL

    BPO: Supercharging Online Preference Learning by Adhering to the Proximity of Behavior LLM

    Authors: Wenda Xu, Jiachen Li, William Yang Wang, Lei Li

    Abstract: Direct alignment from preferences (DAP) has emerged as a promising paradigm for aligning large language models (LLMs) to human desiderata from pre-collected, offline preference datasets. While recent studies indicate that existing offline DAP methods can directly benefit from online training samples, we highlight the need to develop specific online DAP algorithms to fully harness the power of onli…

    Submitted 19 June, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

    Comments: Wenda Xu and Jiachen Li contributed equally

  32. arXiv:2406.11980  [pdf, other]

    cs.AI cs.CY

    Prompt Design Matters for Computational Social Science Tasks but in Unpredictable Ways

    Authors: Shubham Atreja, Joshua Ashkinaze, Lingyao Li, Julia Mendelsohn, Libby Hemphill

    Abstract: Manually annotating data for computational social science tasks can be costly, time-consuming, and emotionally draining. While recent work suggests that LLMs can perform such annotation tasks in zero-shot settings, little is known about how prompt design impacts LLMs' compliance and accuracy. We conduct a large-scale multi-prompt experiment to test how model selection (ChatGPT, PaLM2, and Falcon7b…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: under review

  33. arXiv:2406.11882  [pdf]

    cs.AI cs.LG

    Applications of Explainable artificial intelligence in Earth system science

    Authors: Feini Huang, Shijie Jiang, Lu Li, Yongkun Zhang, Ye Zhang, Ruqing Zhang, Qingliang Li, Danxi Li, Wei Shangguan, Yongjiu Dai

    Abstract: In recent years, artificial intelligence (AI) rapidly accelerated its influence and is expected to promote the development of Earth system science (ESS) if properly harnessed. In application of AI to ESS, a significant hurdle lies in the interpretability conundrum, an inherent problem of black-box nature arising from the complexity of AI algorithms. To address this, explainable AI (XAI) offers a s…

    Submitted 12 June, 2024; originally announced June 2024.

  34. arXiv:2406.11445  [pdf, other]

    cs.CV

    Solving the Inverse Problem of Electrocardiography for Cardiac Digital Twins: A Survey

    Authors: Lei Li, Julia Camps, Blanca Rodriguez, Vicente Grau

    Abstract: Cardiac digital twins are personalized virtual representations used to understand complex heart mechanisms. Solving the ECG inverse problem is crucial for accurate virtual heart modelling, enabling the derivation of internal electrical activity information from recorded surface potentials. Despite challenges from cardiac complexity, noisy ECG data, and computational efficiency, recent advancements…

    Submitted 3 July, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

  35. arXiv:2406.11267  [pdf, other]

    cs.CL

    Mitigating Large Language Model Hallucination with Faithful Finetuning

    Authors: Minda Hu, Bowei He, Yufei Wang, Liangyou Li, Chen Ma, Irwin King

    Abstract: Large language models (LLMs) have demonstrated remarkable performance on various natural language processing tasks. However, they are prone to generating fluent yet untruthful responses, known as "hallucinations". Hallucinations can lead to the spread of misinformation and cause harm in critical applications. Mitigating hallucinations is challenging as they arise from factors such as noisy data, m…

    Submitted 17 June, 2024; originally announced June 2024.

  36. arXiv:2406.10956  [pdf, other]

    cs.SD cs.LG eess.AS

    Robust Channel Learning for Large-Scale Radio Speaker Verification

    Authors: Wenhao Yang, Jianguo Wei, Wenhuan Lu, Lei Li, Xugang Lu

    Abstract: Recent research in speaker verification has increasingly focused on achieving robust and reliable recognition under challenging channel conditions and noisy environments. Identifying speakers in radio communications is particularly difficult due to inherent limitations such as constrained bandwidth and pervasive noise interference. To address this issue, we present a Channel Robust Speaker Learnin…

    Submitted 16 June, 2024; originally announced June 2024.

    Comments: 12 pages, 11 figures

  37. arXiv:2406.10744   

    cs.CV

    Technique Report of CVPR 2024 PBDL Challenges

    Authors: Ying Fu, Yu Li, Shaodi You, Boxin Shi, Jose Alvarez, Coert van Gemeren, Linwei Chen, Yunhao Zou, Zichun Wang, Yichen Li, Yuze Han, Yingkai Zhang, Jianan Wang, Qinglin Liu, Wei Yu, Xiaoqian Lv, Jianing Li, Shengping Zhang, Xiangyang Ji, Yuanpei Chen, Yuhan Zhang, Weihang Peng, Liwen Zhang, Zhe Xu, Dingyong Gou, et al. (77 additional authors not shown)

    Abstract: The intersection of physics-based vision and deep learning presents an exciting frontier for advancing computer vision technologies. By leveraging the principles of physics to inform and enhance deep learning models, we can develop more robust and accurate vision systems. Physics-based vision aims to invert the processes to recover scene properties such as shape, reflectance, light distribution, a…

    Submitted 27 June, 2024; v1 submitted 15 June, 2024; originally announced June 2024.

    Comments: The author list and contents need to be verified by all authors

  38. arXiv:2406.10594  [pdf, other]

    cs.CL

    BlockPruner: Fine-grained Pruning for Large Language Models

    Authors: Longguang Zhong, Fanqi Wan, Ruijun Chen, Xiaojun Quan, Liangzhi Li

    Abstract: With the rapid growth in the size and complexity of large language models (LLMs), the costs associated with their training and inference have escalated significantly. Research indicates that certain layers in LLMs harbor substantial redundancy, and pruning these layers has minimal impact on the overall performance. While various layer pruning methods have been developed based on this insight, they…

    Submitted 20 June, 2024; v1 submitted 15 June, 2024; originally announced June 2024.

  39. arXiv:2406.10462  [pdf, other]

    cs.CV

    CoMM: A Coherent Interleaved Image-Text Dataset for Multimodal Understanding and Generation

    Authors: Wei Chen, Lin Li, Yongqi Yang, Bin Wen, Fan Yang, Tingting Gao, Yu Wu, Long Chen

    Abstract: Interleaved image-text generation has emerged as a crucial multimodal task, aiming at creating sequences of interleaved visual and textual content given a query. Despite notable advancements in recent multimodal large language models (MLLMs), generating integrated image-text sequences that exhibit narrative coherence and entity and style consistency remains challenging due to poor training data qu…

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: 22 pages

  40. arXiv:2406.10313  [pdf, ps, other]

    cs.CL cs.CV

    CNVSRC 2023: The First Chinese Continuous Visual Speech Recognition Challenge

    Authors: Chen Chen, Zehua Liu, Xiaolou Li, Lantian Li, Dong Wang

    Abstract: The first Chinese Continuous Visual Speech Recognition Challenge aimed to probe the performance of Large Vocabulary Continuous Visual Speech Recognition (LVC-VSR) on two tasks: (1) Single-speaker VSR for a particular speaker and (2) Multi-speaker VSR for a set of registered speakers. The challenge yielded highly successful results, with the best submission significantly outperforming the baseline,…

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: Accepted by INTERSPEECH 2024

  41. arXiv:2406.10227  [pdf, other]

    cs.CV cs.AI

    VideoGUI: A Benchmark for GUI Automation from Instructional Videos

    Authors: Kevin Qinghong Lin, Linjie Li, Difei Gao, Qinchen WU, Mingyi Yan, Zhengyuan Yang, Lijuan Wang, Mike Zheng Shou

    Abstract: Graphical User Interface (GUI) automation holds significant promise for enhancing human productivity by assisting with computer tasks. Existing task formulations primarily focus on simple tasks that can be specified by a single, language-only instruction, such as "Insert a new slide." In this work, we introduce VideoGUI, a novel multi-modal benchmark designed to evaluate GUI assistants on visual-c…

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: 24 pages, 16 tables, 17 figures

  42. DurLAR: A High-fidelity 128-channel LiDAR Dataset with Panoramic Ambient and Reflectivity Imagery for Multi-modal Autonomous Driving Applications

    Authors: Li Li, Khalid N. Ismail, Hubert P. H. Shum, Toby P. Breckon

    Abstract: We present DurLAR, a high-fidelity 128-channel 3D LiDAR dataset with panoramic ambient (near infrared) and reflectivity imagery, as well as a sample benchmark task using depth estimation for autonomous driving applications. Our driving platform is equipped with a high resolution 128 channel LiDAR, a 2MPix stereo camera, a lux meter and a GNSS/INS system. Ambient and reflectivity images are made av…

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: Accepted by 3DV 2021; 13 pages, 14 figures; Dataset at https://github.com/l1997i/durlar

    Journal ref: Proc. Int. Conf. on 3D Vision (3DV 2021)

  43. arXiv:2406.09205  [pdf, other]

    cs.CL cs.AI

    ReadCtrl: Personalizing text generation with readability-controlled instruction learning

    Authors: Hieu Tran, Zonghai Yao, Lingxi Li, Hong Yu

    Abstract: Content generation conditioning on users' readability is an important application for personalization. In an era of large language models (LLMs), readability-controlled text generation based on LLMs has become increasingly important. This paper introduces a novel methodology called "Readability-Controlled Instruction Learning (ReadCtrl)," which aims to instruction-tune LLMs to tailor users' reada…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: 9 pages

  44. arXiv:2406.08453  [pdf, other]

    cs.HC

    ORES-Inspect: A technology probe for machine learning audits on enwiki

    Authors: Zachary Levonian, Lauren Hagen, Lu Li, Jada Lilleboe, Solvejg Wastvedt, Aaron Halfaker, Loren Terveen

    Abstract: Auditing the machine learning (ML) models used on Wikipedia is important for ensuring that vandalism-detection processes remain fair and effective. However, conducting audits is challenging because stakeholders have diverse priorities and assembling evidence for a model's [in]efficacy is technically complex. We designed an interface to enable editors to learn about and audit the performance of the…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: Wiki Workshop 2024

    ACM Class: K.4.2

  45. arXiv:2406.08407  [pdf, other]

    cs.CV cs.AI cs.CL

    MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos

    Authors: Xuehai He, Weixi Feng, Kaizhi Zheng, Yujie Lu, Wanrong Zhu, Jiachen Li, Yue Fan, Jianfeng Wang, Linjie Li, Zhengyuan Yang, Kevin Lin, William Yang Wang, Lijuan Wang, Xin Eric Wang

    Abstract: Multimodal Large Language Models (MLLMs) demonstrate the emerging abilities of "world models" -- interpreting and reasoning about complex real-world dynamics. To assess these abilities, we posit videos are the ideal medium, as they encapsulate rich representations of real-world dynamics and causalities. To this end, we introduce MMWorld, a new benchmark for multi-discipline, multi-faceted multi…

    Submitted 13 June, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

  46. arXiv:2406.08203  [pdf, other]

    eess.AS cs.SD

    LAFMA: A Latent Flow Matching Model for Text-to-Audio Generation

    Authors: Wenhao Guan, Kaidi Wang, Wangjin Zhou, Yang Wang, Feng Deng, Hui Wang, Lin Li, Qingyang Hong, Yong Qin

    Abstract: Recently, the application of diffusion models has facilitated the significant development of speech and audio generation. Nevertheless, the quality of samples generated by diffusion models still needs improvement. And the effectiveness of the method is accompanied by the extensive number of sampling steps, leading to an extended synthesis time necessary for generating high-quality audio. Previous…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: Accepted at Interspeech2024

  47. arXiv:2406.07854  [pdf, other]

    cs.SD cs.MM eess.AS

    Zero-Shot Fake Video Detection by Audio-Visual Consistency

    Authors: Xiaolou Li, Zehua Liu, Chen Chen, Lantian Li, Li Guo, Dong Wang

    Abstract: Recent studies have advocated the detection of fake videos as a one-class detection task, predicated on the hypothesis that the consistency between audio and visual modalities of genuine data is more significant than that of fake data. This methodology, which solely relies on genuine audio-visual data while negating the need for forged counterparts, is thus delineated as a 'zero-shot' detection pa…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: to be published in INTERSPEECH 2024

  48. arXiv:2406.07832  [pdf, other]

    cs.SD eess.AS

    SE/BN Adapter: Parametric Efficient Domain Adaptation for Speaker Recognition

    Authors: Tianhao Wang, Lantian Li, Dong Wang

    Abstract: Deploying a well-optimized pre-trained speaker recognition model in a new domain often leads to a significant decline in performance. While fine-tuning is a commonly employed solution, it demands ample adaptation data and suffers from parameter inefficiency, rendering it impractical for real-world applications with limited data available for model adaptation. Drawing inspiration from the success o…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: to be published in INTERSPEECH 2024

  49. arXiv:2406.07529  [pdf, other]

    cs.LG

    MAP: Low-compute Model Merging with Amortized Pareto Fronts via Quadratic Approximation

    Authors: Lu Li, Tianyu Zhang, Zhiqi Bu, Suyuchen Wang, Huan He, Jie Fu, Yonghui Wu, Jiang Bian, Yong Chen, Yoshua Bengio

    Abstract: Model merging has emerged as an effective approach to combine multiple single-task models, fine-tuned from the same pre-trained model, into a multitask model. This process typically involves computing a weighted average of the model parameters without any additional training. Existing model-merging methods focus on enhancing average task accuracy. However, interference and conflicts between the ob…

    Submitted 18 June, 2024; v1 submitted 11 June, 2024; originally announced June 2024.

  50. arXiv:2406.07421  [pdf, other]

    cs.SD eess.AS

    A Comprehensive Investigation on Speaker Augmentation for Speaker Recognition

    Authors: Zhenyu Zhou, Shibiao Xu, Shi Yin, Lantian Li, Dong Wang

    Abstract: Data augmentation (DA) has played a pivotal role in the success of deep speaker recognition. Current DA techniques primarily focus on speaker-preserving augmentation, which does not change the speaker trait of the speech and does not create new speakers. Recent research has shed light on the potential of speaker augmentation, which generates new speakers to enrich the training dataset. In this stu…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: to be published in INTERSPEECH 2024