Showing 1–50 of 1,865 results for author: li, M

Searching in archive cs.

  1. arXiv:2407.09336  [pdf, other]

    cs.LG cs.AI

    Guidelines for Augmentation Selection in Contrastive Learning for Time Series Classification

    Authors: Ziyu Liu, Azadeh Alavi, Minyi Li, Xiang Zhang

    Abstract: Self-supervised contrastive learning has become a key technique in deep learning, particularly in time series analysis, due to its ability to learn meaningful representations without explicit supervision. Augmentation is a critical component in contrastive learning, where different augmentations can dramatically impact performance, sometimes influencing accuracy by over 30%. However, the selection…

    Submitted 12 July, 2024; originally announced July 2024.

    Comments: 20 pages, 11 figures

  2. arXiv:2407.09048  [pdf, other]

    cs.AI

    KUNPENG: An Embodied Large Model for Intelligent Maritime

    Authors: Naiyao Wang, Tongbang Jiang, Ye Wang, Shaoyang Qiu, Bo Zhang, Xinqiang Xie, Munan Li, Chunliu Wang, Yiyang Wang, Hongxiang Ren, Ruili Wang, Hongjun Shan, Hongbo Liu

    Abstract: Intelligent maritime, as an essential component of smart ocean construction, deeply integrates advanced artificial intelligence technology and data analysis methods, which covers multiple aspects such as smart vessels, route optimization, safe navigation, aiming to enhance the efficiency of ocean resource utilization and the intelligence of transportation networks. However, the complex and dynamic…

    Submitted 12 July, 2024; originally announced July 2024.

    Comments: 9 pages, 3 figures

  3. arXiv:2407.09019  [pdf, other]

    cs.SI cs.AI

    Heterogeneous Subgraph Network with Prompt Learning for Interpretable Depression Detection on Social Media

    Authors: Chen Chen, Mingwei Li, Fenghuan Li, Haopeng Chen, Yuankun Lin

    Abstract: Massive social media data can reflect people's authentic thoughts, emotions, communication, etc., and therefore can be analyzed for early detection of mental health problems such as depression. Existing works about early depression detection on social media lacked interpretability and neglected the heterogeneity of social media data. Furthermore, they overlooked the global interaction among users.…

    Submitted 12 July, 2024; originally announced July 2024.

  4. arXiv:2407.08986  [pdf]

    cs.CY

    Exploring Generative AI Policies in Higher Education: A Comparative Perspective from China, Japan, Mongolia, and the USA

    Authors: Qin Xie, Ming Li, Ariunaa Enkhtur

    Abstract: This study conducts a comparative analysis of national policies on Generative AI across four countries: China, Japan, Mongolia, and the USA. Employing the Qualitative Comparative Analysis (QCA) method, it examines the responses of these nations to Generative AI in higher education settings, scrutinizing the diversity in their approaches within this group. While all four countries exhibit a positiv…

    Submitted 12 July, 2024; originally announced July 2024.

    Comments: 14 pages, 1 table

  5. arXiv:2407.08651  [pdf, other]

    cs.CR cs.DC

    SpiralShard: Highly Concurrent and Secure Blockchain Sharding via Linked Cross-shard Endorsement

    Authors: You Lin, Mingzhe Li, ** Zhang

    Abstract: Blockchain sharding improves the scalability of blockchain systems by partitioning the whole blockchain state, nodes, and transaction workloads into different shards. However, existing blockchain sharding systems generally suffer from a small number of shards, resulting in limited concurrency. The main reason is that existing sharding systems require large shard sizes to ensure security. To enhanc…

    Submitted 9 July, 2024; originally announced July 2024.

  6. arXiv:2407.08537  [pdf, other]

    cs.NI cs.CR

    BriDe Arbitrager: Enhancing Arbitrage in Ethereum 2.0 via Bribery-enabled Delayed Block Production

    Authors: Hulin Yang, Mingzhe Li, ** Zhang, Alia Asheralieva, Qingsong Wei, Siow Mong Rick Goh

    Abstract: The advent of Ethereum 2.0 has introduced significant changes, particularly the shift to Proof-of-Stake consensus. This change presents new opportunities and challenges for arbitrage. Amidst these changes, we introduce BriDe Arbitrager, a novel tool designed for Ethereum 2.0 that leverages Bribery-driven attacks to Delay block production and increase arbitrage gains. The main idea is to allow mali…

    Submitted 11 July, 2024; originally announced July 2024.

  7. arXiv:2407.08443  [pdf, other]

    cs.CV

    Infinite Motion: Extended Motion Generation via Long Text Instructions

    Authors: Mengtian Li, Chengshuo Zhai, Shengxiang Yao, Zhifeng Xie, Keyu Chen, Yu-Gang Jiang

    Abstract: In the realm of motion generation, the creation of long-duration, high-quality motion sequences remains a significant challenge. This paper presents our groundbreaking work on "Infinite Motion", a novel approach that leverages long text to extended motion generation, effectively bridging the gap between short and long-duration motion synthesis. Our core insight is the strategic extension and reass…

    Submitted 12 July, 2024; v1 submitted 11 July, 2024; originally announced July 2024.

    Comments: 12 pages, 13 figures

  8. arXiv:2407.08273   

    cs.CL

    RB-SQL: A Retrieval-based LLM Framework for Text-to-SQL

    Authors: Zhenhe Wu, Zhongqiu Li, Jie Zhang, Mengxiang Li, Yu Zhao, Ruiyu Fang, Zhongjiang He, Xuelong Li, Zhoujun Li, Shuangyong Song

    Abstract: Large language models (LLMs) with in-context learning have significantly improved the performance of text-to-SQL task. Previous works generally focus on using exclusive SQL generation prompt to improve the LLMs' reasoning ability. However, they are mostly hard to handle large databases with numerous tables and columns, and usually ignore the significance of pre-processing database and extracting v…

    Submitted 12 July, 2024; v1 submitted 11 July, 2024; originally announced July 2024.

    Comments: Further improvement and modification are needed.

  9. arXiv:2407.08255  [pdf, other]

    cs.CV cs.LG

    GraphMamba: An Efficient Graph Structure Learning Vision Mamba for Hyperspectral Image Classification

    Authors: Aitao Yang, Min Li, Yao Ding, Leyuan Fang, Yaoming Cai, Yujie He

    Abstract: Efficient extraction of spectral sequences and geospatial information has always been a hot topic in hyperspectral image classification. In terms of spectral sequence feature capture, RNN and Transformer have become mainstream classification frameworks due to their long-range feature capture capabilities. In terms of spatial information aggregation, CNN enhances the receptive field to retain integ…

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: 13 pages, 10 figures

  10. arXiv:2407.08125  [pdf, ps, other]

    cs.LG

    Real-Time Summarization of Twitter

    Authors: Yixin **, Meiqi Wang, Meng Li, Wen**g Zhou, Yi Shen, Hao Liu

    Abstract: In this paper, we describe our approaches to TREC Real-Time Summarization of Twitter. We focus on real time push notification scenario, which requires a system monitors the stream of sampled tweets and returns the tweets relevant and novel to given interest profiles. Dirichlet score with and with very little smoothing (baseline) are employed to classify whether a tweet is relevant to a given inter…

    Submitted 10 July, 2024; originally announced July 2024.

    Comments: This paper was accepted to International Conference on Artificial Intelligence and Electromechanical Automation 2024

  11. arXiv:2407.08039  [pdf, other]

    cs.CL

    Knowledge Overshadowing Causes Amalgamated Hallucination in Large Language Models

    Authors: Yuji Zhang, Sha Li, Jiateng Liu, Pengfei Yu, Yi R. Fung, **g Li, Manling Li, Heng Ji

    Abstract: Hallucination is often regarded as a major impediment for using large language models (LLMs), especially for knowledge-intensive tasks. Even when the training corpus consists solely of true statements, language models still generate hallucinations in the form of amalgamations of multiple facts. We coin this phenomenon as "knowledge overshadowing": when we query knowledge from a language model wi…

    Submitted 10 July, 2024; originally announced July 2024.

  12. arXiv:2407.07723  [pdf, other]

    cs.IT cs.AI

    Understanding is Compression

    Authors: Ziguang Li, Chao Huang, Xuliang Wang, Haibo Hu, Cole Wyeth, Dongbo Bu, Quan Yu, Wen Gao, Xingwu Liu, Ming Li

    Abstract: We have previously shown all understanding or learning are compression, under reasonable assumptions. In principle, better understanding of data should improve data compression. Traditional compression methodologies focus on encoding frequencies or some other computable properties of data. Large language models approximate the uncomputable Solomonoff distribution, opening up a whole new avenue to…

    Submitted 23 June, 2024; originally announced July 2024.

  13. arXiv:2407.07059  [pdf, other]

    q-bio.NC cs.LG

    Differentiable Optimization of Similarity Scores Between Models and Brains

    Authors: Nathan Cloos, Moufan Li, Markus Siegel, Scott L. Brincat, Earl K. Miller, Guangyu Robert Yang, Christopher J. Cueva

    Abstract: What metrics should guide the development of more realistic models of the brain? One proposal is to quantify the similarity between models and brains using methods such as linear regression, Centered Kernel Alignment (CKA), and angular Procrustes distance. To better understand the limitations of these similarity measures we analyze neural activity recorded in five experiments on nonhuman primates,…

    Submitted 9 July, 2024; originally announced July 2024.

    Comments: 16 pages, 6 figures

  14. arXiv:2407.06953  [pdf, other]

    cs.DC

    SP-Chain: Boosting Intra-Shard and Cross-Shard Security and Performance in Blockchain Sharding

    Authors: Mingzhe Li, You Lin, Wei Wang, ** Zhang

    Abstract: A promising way to overcome the scalability limitations of the current blockchain is to use sharding, which is to split the transaction processing among multiple, smaller groups of nodes. A well-performed blockchain sharding system requires both high performance and high security in both intra- and cross-shard perspectives. However, existing protocols either have issues on protecting security or t…

    Submitted 9 July, 2024; originally announced July 2024.

  15. arXiv:2407.06882  [pdf, other]

    cs.DC

    DL-Chain: Scalable and Stable Blockchain Sharding with High Concurrency via Dual-Layer Consensus

    Authors: You Lin, Mingzhe Li, Qingsong Wei, Yong Liu, Siow Mong Rick Goh, ** Zhang

    Abstract: Sharding enhances blockchain scalability by partitioning nodes into multiple groups for concurrent transaction processing. Configuring a large number of small shards helps improve the transaction concurrency of a sharding system. However, it increases the fraction of malicious nodes within each shard, easily leading to shard corruption and jeopardizing system security. Some existing works h…

    Submitted 9 July, 2024; originally announced July 2024.

  16. arXiv:2407.05763  [pdf, other]

    math.OC cs.MA eess.SY

    Homogeneous Distributed Observers for Quasilinear Systems

    Authors: Min Li, Andrey Polyakov, Siyuan Wang, Gang Zheng

    Abstract: The problem of finite/fixed-time cooperative state estimation is considered for a class of quasilinear systems with nonlinearities satisfying a Hölder condition. A strongly connected nonlinear distributed observer is designed under the assumption of global observability. By proper parameter tuning with linear matrix inequalities, the observer error equation possesses finite/fixed-time stability in…

    Submitted 8 July, 2024; originally announced July 2024.

    Comments: This manuscript has been submitted for a possible journal publication

  17. arXiv:2407.05591  [pdf, other]

    cs.LG cs.CL cs.NE

    On the Power of Convolution Augmented Transformer

    Authors: Mingchen Li, Xuechen Zhang, Yixiao Huang, Samet Oymak

    Abstract: The transformer architecture has catalyzed revolutionary advances in language modeling. However, recent architectural recipes, such as state-space models, have bridged the performance gap. Motivated by this, we examine the benefits of Convolution-Augmented Transformer (CAT) for recall, copying, and length generalization tasks. CAT incorporates convolutional filters in the K/Q/V embeddings of an at…

    Submitted 8 July, 2024; originally announced July 2024.

  18. arXiv:2407.05082  [pdf, other]

    cs.LG cs.AI cs.CV stat.ML

    DMTG: One-Shot Differentiable Multi-Task Grouping

    Authors: Yuan Gao, Shuguo Jiang, Moran Li, **-Gang Yu, Gui-Song Xia

    Abstract: We aim to address Multi-Task Learning (MTL) with a large number of tasks by Multi-Task Grouping (MTG). Given N tasks, we propose to simultaneously identify the best task groups from 2^N candidates and train the model weights simultaneously in one-shot, with the high-order task-affinity fully exploited. This is distinct from the pioneering methods which sequentially identify the groups and train th…

    Submitted 6 July, 2024; originally announced July 2024.

    Comments: Accepted to ICML 2024

    Journal ref: International Conference on Machine Learning (ICML), 2024

  19. arXiv:2407.04955  [pdf, other]

    cs.CV

    Asynchronous Multimodal Video Sequence Fusion via Learning Modality-Exclusive and -Agnostic Representations

    Authors: Dingkang Yang, Mingcheng Li, Linhao Qu, Kun Yang, Peng Zhai, Song Wang, Lihua Zhang

    Abstract: Understanding human intentions (e.g., emotions) from videos has received considerable attention recently. Video streams generally constitute a blend of temporal data stemming from distinct modalities, including natural language, facial expressions, and auditory clues. Despite the impressive advancements of previous works via attention-based paradigms, the inherent temporal asynchrony and modality…

    Submitted 6 July, 2024; originally announced July 2024.

    Comments: TCSVT 2024

  20. arXiv:2407.04928  [pdf, other]

    cs.CV eess.IV

    CLIPVQA:Video Quality Assessment via CLIP

    Authors: Fengchuang Xing, Mingjie Li, Yuan-Gen Wang, Guopu Zhu, Xiaochun Cao

    Abstract: In learning vision-language representations from web-scale data, the contrastive language-image pre-training (CLIP) mechanism has demonstrated a remarkable performance in many vision tasks. However, its application to the widely studied video quality assessment (VQA) task is still an open issue. In this paper, we propose an efficient and effective CLIP-based Transformer method for the VQA problem…

    Submitted 5 July, 2024; originally announced July 2024.

  21. arXiv:2407.04557  [pdf, other]

    cond-mat.mtrl-sci cs.LG

    Structural Constraint Integration in Generative Model for Discovery of Quantum Material Candidates

    Authors: Ryotaro Okabe, Mouyang Cheng, Abhijatmedhi Chotrattanapituk, Nguyen Tuan Hung, Xiang Fu, Bowen Han, Yao Wang, Weiwei Xie, Robert J. Cava, Tommi S. Jaakkola, Yongqiang Cheng, Mingda Li

    Abstract: Billions of organic molecules are known, but only a tiny fraction of the functional inorganic materials have been discovered, a particularly relevant problem to the community searching for new quantum materials. Recent advancements in machine-learning-based generative models, particularly diffusion models, show great promise for generating new, stable materials. However, integrating geometric patt…

    Submitted 5 July, 2024; originally announced July 2024.

    Comments: 512 pages total, 4 main figures + 218 supplementary figures

  22. arXiv:2407.04211  [pdf, other]

    cs.LG

    TimeLDM: Latent Diffusion Model for Unconditional Time Series Generation

    Authors: Jian Qian, Miao Sun, Sifan Zhou, Biao Wan, Minhao Li, Patrick Chiang

    Abstract: Time series generation is a crucial research topic in the area of deep learning, which can be used for data augmentation, imputing missing values, and forecasting. Currently, latent diffusion models are ascending to the forefront of generative modeling for many important data representations. Being the most pivotal in the computer vision domain, latent diffusion models have also recently attracted…

    Submitted 4 July, 2024; originally announced July 2024.

  23. arXiv:2407.02446  [pdf, other]

    cs.CL cs.AI

    Predicting vs. Acting: A Trade-off Between World Modeling & Agent Modeling

    Authors: Margaret Li, Weijia Shi, Artidoro Pagnoni, Peter West, Ari Holtzman

    Abstract: RLHF-aligned LMs have shown unprecedented ability on both benchmarks and long-form text generation, yet they struggle with one foundational task: next-token prediction. As RLHF models become agent models aimed at interacting with humans, they seem to lose their world modeling -- the ability to predict what comes next in arbitrary documents, which is the foundational training objective of the Base…

    Submitted 2 July, 2024; originally announced July 2024.

  24. arXiv:2407.01891  [pdf, other]

    cs.RO eess.SY

    Refined Motion Compensation with Soft Laser Manipulators using Data-Driven Surrogate Models

    Authors: Yongjun Yan, Qingpeng Ding, Mingwu Li, Junyan Yan, Shing Shin Cheng

    Abstract: Non-contact laser ablation, a precise thermal technique, simultaneously cuts and coagulates tissue without the insertion errors associated with rigid needles. Human organ motions, such as those in the liver, exhibit rhythmic components influenced by respiratory and cardiac cycles, making effective laser energy delivery to target lesions while compensating for tumor motion crucial. This research in…

    Submitted 1 July, 2024; originally announced July 2024.

  25. arXiv:2407.01316  [pdf, other]

    cs.LG cs.CY stat.ML

    Evaluating Model Performance Under Worst-case Subpopulations

    Authors: Mike Li, Hongseok Namkoong, Shangzhou Xia

    Abstract: The performance of ML models degrades when the training population is different from that seen under operation. Towards assessing distributional robustness, we study the worst-case performance of a model over all subpopulations of a given size, defined with respect to core attributes Z. This notion of robustness can consider arbitrary (continuous) attributes Z, and automatically accounts for compl…

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: Earlier version appeared in the proceedings of Advances in Neural Information Processing Systems 34 (NeurIPS 2021): https://proceedings.neurips.cc/paper_files/paper/2021/file/908075ea2c025c335f4865f7db427062-Paper.pdf

  26. arXiv:2407.01281  [pdf, other]

    cs.LG cs.AI math.FA

    Bridging Smoothness and Approximation: Theoretical Insights into Over-Smoothing in Graph Neural Networks

    Authors: Guangrui Yang, Jianfei Li, Ming Li, Han Feng, Ding-Xuan Zhou

    Abstract: In this paper, we explore the approximation theory of functions defined on graphs. Our study builds upon the approximation results derived from the $K$-functional. We establish a theoretical framework to assess the lower bounds of approximation for target functions using Graph Convolutional Networks (GCNs) and examine the over-smoothing phenomenon commonly observed in these networks. Initially, we…

    Submitted 1 July, 2024; originally announced July 2024.

  27. arXiv:2407.00948  [pdf, other]

    cs.CL cs.AI cs.LG

    The House Always Wins: A Framework for Evaluating Strategic Deception in LLMs

    Authors: Tanush Chopra, Michael Li

    Abstract: We propose a framework for evaluating strategic deception in large language models (LLMs). In this framework, an LLM acts as a game master in two scenarios: one with random game mechanics and another where it can choose between random or deliberate actions. As an example, we use blackjack because the action space nor strategies involve deception. We benchmark Llama3-70B, GPT-4-Turbo, and Mixtral i…

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: Research conducted at the Deception Detection Hackathon 2024 hosted by Apart & Apollo Research

  28. arXiv:2407.00046  [pdf, other]

    cs.DC cs.GR

    Barrier-Augmented Lagrangian for GPU-based Elastodynamic Contact

    Authors: Dewen Guo, Minchen Li, Yin Yang, Guo** Wang, Sheng Li

    Abstract: We propose a GPU-based iterative method for accelerated elastodynamic simulation with the log-barrier-based contact model. While Newton's method is a conventional choice for solving the interior-point system, the presence of ill-conditioned log barriers often necessitates a direct solution at each linearized substep and costs substantial storage and computational overhead. Moreover, constraint set…

    Submitted 4 June, 2024; originally announced July 2024.

    Comments: 17 pages, 30 figures

  29. Concept Lens: Visually Analyzing the Consistency of Semantic Manipulation in GANs

    Authors: Sangwon Jeong, Mingwei Li, Matthew Berger, Shusen Liu

    Abstract: As applications of generative AI become mainstream, it is important to understand what generative models are capable of producing, and the extent to which one can predictably control their outputs. In this paper, we propose a visualization design, named Concept Lens, for jointly navigating the data distribution of a generative model, and concept manipulations supported by the model. Our work is fo…

    Submitted 28 June, 2024; originally announced June 2024.

    Journal ref: 2023 IEEE Visualization and Visual Analytics (VIS), Melbourne, Australia, 2023, pp. 221-225

  30. arXiv:2406.19756  [pdf, other]

    cs.CV cs.AI

    Structure-aware World Model for Probe Guidance via Large-scale Self-supervised Pre-train

    Authors: Haojun Jiang, Meng Li, Zhenguo Sun, Ning Jia, Yu Sun, Shaqi Luo, Shiji Song, Gao Huang

    Abstract: The complex structure of the heart leads to significant challenges in echocardiography, especially in acquisition cardiac ultrasound images. Successful echocardiography requires a thorough understanding of the structures on the two-dimensional plane and the spatial relationships between planes in three-dimensional space. In this paper, we innovatively propose a large-scale self-supervised pre-trai…

    Submitted 28 June, 2024; originally announced June 2024.

    Comments: Technical report

  31. arXiv:2406.19236  [pdf, other]

    cs.AI cs.CV cs.RO

    Human-Aware Vision-and-Language Navigation: Bridging Simulation to Reality with Dynamic Human Interactions

    Authors: Minghan Li, Heng Li, Zhi-Qi Cheng, Yifei Dong, Yuxuan Zhou, Jun-Yan He, Qi Dai, Teruko Mitamura, Alexander G. Hauptmann

    Abstract: Vision-and-Language Navigation (VLN) aims to develop embodied agents that navigate based on human instructions. However, current VLN frameworks often rely on static environments and optimal expert supervision, limiting their real-world applicability. To address this, we introduce Human-Aware Vision-and-Language Navigation (HA-VLN), extending traditional VLN by incorporating dynamic human activitie…

    Submitted 4 July, 2024; v1 submitted 27 June, 2024; originally announced June 2024.

    Comments: 30 pages, 18 figures, Project Page: https://lpercc.github.io/HA3D_simulator/

  32. arXiv:2406.18588  [pdf, other]

    cs.CV cs.LG

    Varying Manifolds in Diffusion: From Time-varying Geometries to Visual Saliency

    Authors: Junhao Chen, Manyi Li, Zherong Pan, Xifeng Gao, Changhe Tu

    Abstract: Deep generative models learn the data distribution, which is concentrated on a low-dimensional manifold. The geometric analysis of distribution transformation provides a better understanding of data structure and enables a variety of applications. In this paper, we study the geometric properties of the diffusion model, whose forward diffusion process and reverse generation process construct a seri…

    Submitted 7 June, 2024; originally announced June 2024.

  33. arXiv:2406.18546  [pdf]

    cs.CV cs.AI

    Application of Multimodal Fusion Deep Learning Model in Disease Recognition

    Authors: Xiaoyi Liu, Hongjie Qiu, Muqing Li, Zhou Yu, Yutian Yang, Yafeng Yan

    Abstract: This paper introduces an innovative multi-modal fusion deep learning approach to overcome the drawbacks of traditional single-modal recognition techniques. These drawbacks include incomplete information and limited diagnostic accuracy. During the feature extraction stage, cutting-edge deep learning models including convolutional neural networks (CNN), recurrent neural networks (RNN), and transform…

    Submitted 22 May, 2024; originally announced June 2024.

  34. arXiv:2406.18311  [pdf, other]

    cs.LG

    Online Learning of Multiple Tasks and Their Relationships : Testing on Spam Email Data and EEG Signals Recorded in Construction Fields

    Authors: Yixin **, Wen**g Zhou, Meiqi Wang, Meng Li, Xintao Li, Tianyu Hu

    Abstract: This paper examines an online multi-task learning (OMTL) method, which processes data sequentially to predict labels across related tasks. The framework learns task weights and their relatedness concurrently. Unlike previous models that assumed static task relatedness, our approach treats tasks as initially independent, updating their relatedness iteratively using newly calculated weight vectors.…

    Submitted 29 June, 2024; v1 submitted 26 June, 2024; originally announced June 2024.

  35. arXiv:2406.16982  [pdf]

    cs.LG cs.AI

    Research on Disease Prediction Model Construction Based on Computer AI deep Learning Technology

    Authors: Yang Lin, Muqing Li, Ziyi Zhu, Yinqiu Feng, Lingxi Xiao, Zexi Chen

    Abstract: The prediction of disease risk factors can screen vulnerable groups for effective prevention and treatment, so as to reduce their morbidity and mortality. Machine learning has a great demand for high-quality labeling information, and labeling noise in medical big data poses a great challenge to efficient disease risk warning methods. Therefore, this project intends to study the robust learning alg…

    Submitted 23 June, 2024; originally announced June 2024.

  36. arXiv:2406.16710  [pdf, other]

    cs.CV

    Portrait3D: 3D Head Generation from Single In-the-wild Portrait Image

    Authors: **kun Hao, Junshu Tang, Jiangning Zhang, Ran Yi, Yijia Hong, Moran Li, Weijian Cao, Yating Wang, Lizhuang Ma

    Abstract: While recent works have achieved great success on one-shot 3D common object generation, high quality and fidelity 3D head generation from a single image remains a great challenge. Previous text-based methods for generating 3D heads were limited by text descriptions and image-based methods struggled to produce high-quality head geometry. To handle this challenging problem, we propose a novel framew…

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: https://**kun-hao.github.io/Portrait3D/

  37. Crowd-Sourced NeRF: Collecting Data from Production Vehicles for 3D Street View Reconstruction

    Authors: Tong Qin, Changze Li, Haoyang Ye, Shaowei Wan, Minzhen Li, Hongwei Liu, Ming Yang

    Abstract: Recently, Neural Radiance Fields (NeRF) achieved impressive results in novel view synthesis. Block-NeRF showed the capability of leveraging NeRF to build large city-scale models. For large-scale modeling, a mass of image data is necessary. Collecting images from specially designed data-collection vehicles can not support large-scale applications. How to acquire massive high-quality data remains an…

    Submitted 23 June, 2024; originally announced June 2024.

  38. arXiv:2406.16272  [pdf, other]

    cs.CV cs.AI

    Repairing Catastrophic-Neglect in Text-to-Image Diffusion Models via Attention-Guided Feature Enhancement

    Authors: Zhiyuan Chang, Mingyang Li, Junjie Wang, Yi Liu, Qing Wang, Yang Liu

    Abstract: Text-to-Image Diffusion Models (T2I DMs) have garnered significant attention for their ability to generate high-quality images from textual descriptions. However, these models often produce images that do not fully align with the input prompts, resulting in semantic inconsistencies. The most prominent issue among these semantic inconsistencies is catastrophic-neglect, where the images generated by…

    Submitted 23 June, 2024; originally announced June 2024.

    Comments: 11 pages, 3 figures

  39. arXiv:2406.16271  [pdf, other]

    cs.CV

    Feature-prompting GBMSeg: One-Shot Reference Guided Training-Free Prompt Engineering for Glomerular Basement Membrane Segmentation

    Authors: Xueyu Liu, Guangze Shi, Rui Wang, Yexin Lai, Jianan Zhang, Lele Sun, Quan Yang, Yongfei Wu, MIng Li, Weixia Han, Wen Zheng

    Abstract: Assessment of the glomerular basement membrane (GBM) in transmission electron microscopy (TEM) is crucial for diagnosing chronic kidney disease (CKD). The lack of domain-independent automatic segmentation tools for the GBM necessitates an AI-based solution to automate the process. In this study, we introduce GBMSeg, a training-free framework designed to automatically segment the GBM in TEM images…

    Submitted 23 June, 2024; originally announced June 2024.

    Comments: Accepted for MICCAI2024

  40. arXiv:2406.16116  [pdf, ps, other]

    cs.NE

    A First Running Time Analysis of the Strength Pareto Evolutionary Algorithm 2 (SPEA2)

    Authors: Shengjie Ren, Chao Bian, Miqing Li, Chao Qian

    Abstract: Evolutionary algorithms (EAs) have emerged as a predominant approach for addressing multi-objective optimization problems. However, the theoretical foundation of multi-objective EAs (MOEAs), particularly the fundamental aspects like running time analysis, remains largely underexplored. Existing theoretical studies mainly focus on basic MOEAs, with little attention given to practical MOEAs. In this…

    Submitted 23 June, 2024; originally announced June 2024.

  41. arXiv:2406.15938  [pdf, other]

    cs.CL cs.AI cs.LG

    RuleR: Improving LLM Controllability by Rule-based Data Recycling

    Authors: Ming Li, Han Chen, Chenguang Wang, Dang Nguyen, Dianqi Li, Tianyi Zhou

    Abstract: Large language models (LLMs) still lack delicate controllability over their responses, which is critical to enhancing their performance and the user experience. However, curating supervised fine-tuning (SFT) datasets to improve LLM controllability usually relies on human experts or proprietary LLMs, which requires additional costs. To bridge this gap, we propose Rule-based Data Recycling (RuleR),…

    Submitted 22 June, 2024; originally announced June 2024.

  42. arXiv:2406.15769  [pdf, other]

    cs.DC

    Humas: A Heterogeneity- and Upgrade-aware Microservice Auto-scaling Framework in Large-scale Data Centers

    Authors: Qin Hua, Dingyu Yang, Shiyou Qian, Jian Cao, Guangtao Xue, Minglu Li

    Abstract: An effective auto-scaling framework is essential for microservices to ensure performance stability and resource efficiency under dynamic workloads. As revealed by many prior studies, the key to efficient auto-scaling lies in accurately learning performance patterns, i.e., the relationship between performance metrics and workloads in data-driven schemes. However, we notice that there are two signif…

    Submitted 22 June, 2024; originally announced June 2024.

    Comments: 14 pages; 27 figures

  43. arXiv:2406.15305  [pdf, other]

    cs.CR cs.AI

    PID: Prompt-Independent Data Protection Against Latent Diffusion Models

    Authors: Ang Li, Yichuan Mo, Mingjie Li, Yisen Wang

    Abstract: The few-shot fine-tuning of Latent Diffusion Models (LDMs) has enabled them to grasp new concepts from a limited number of images. However, given the vast amount of personal images accessible online, this capability raises critical concerns about civil privacy. While several previous defense methods have been developed to prevent such misuse of LDMs, they typically assume that the textual prompts…

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: 27 pages, ICML 2024 poster

  44. arXiv:2406.14777  [pdf, other]

    cs.LG math.OC

    Learning to Cover: Online Learning and Optimization with Irreversible Decisions

    Authors: Alexandre Jacquillat, Michael Lingzhi Li

    Abstract: We define an online learning and optimization problem with irreversible decisions contributing toward a coverage target. At each period, a decision-maker selects facilities to open, receives information on the success of each one, and updates a machine learning model to guide future decisions. The goal is to minimize costs across a finite horizon under a chance constraint reflecting the coverage t…

    Submitted 20 June, 2024; originally announced June 2024.

  45. arXiv:2406.14482  [pdf, other]

    cs.CV

    Visible-Thermal Tiny Object Detection: A Benchmark Dataset and Baselines

    Authors: Xinyi Ying, Chao Xiao, Ruojing Li, Xu He, Boyang Li, Zhaoxu Li, Yingqian Wang, Mingyuan Hu, Qingyu Xu, Zaiping Lin, Miao Li, Shilin Zhou, Wei An, Weidong Sheng, Li Liu

    Abstract: Small object detection (SOD) has been a longstanding yet challenging task for decades, with numerous datasets and algorithms being developed. However, they mainly focus on either visible or thermal modality, while visible-thermal (RGBT) bimodality is rarely explored. Although some RGBT datasets have been developed recently, the insufficient quantity, limited category, misaligned images and large t…

    Submitted 20 June, 2024; originally announced June 2024.

  46. arXiv:2406.14422  [pdf, other]

    cs.CV cs.AI

    FutureNet-LOF: Joint Trajectory Prediction and Lane Occupancy Field Prediction with Future Context Encoding

    Authors: Mingkun Wang, Xiaoguang Ren, Ruochun Jin, Minglong Li, Xiaochuan Zhang, Changqian Yu, Mingxu Wang, Wenjing Yang

    Abstract: Most prior motion prediction endeavors in autonomous driving have inadequately encoded future scenarios, leading to predictions that may fail to accurately capture the diverse movements of agents (e.g., vehicles or pedestrians). To address this, we propose FutureNet, which explicitly integrates initially predicted trajectories into the future scenario and further encodes these future contexts to e…

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: 10 pages

  47. arXiv:2406.14180  [pdf, other]

    cs.NE

    RTFormer: Re-parameter TSBN Spiking Transformer

    Authors: Hongzhi Wang, Xiubo Liang, Mengjian Li, Tao Zhang

    Abstract: The Spiking Neural Networks (SNNs), renowned for their bio-inspired operational mechanism and energy efficiency, mirror the human brain's neural activity. Yet, SNNs face challenges in balancing energy efficiency with the computational demands of advanced tasks. Our research introduces the RTFormer, a novel architecture that embeds Re-parameterized Temporal Sliding Batch Normalization (TSBN) within…

    Submitted 20 June, 2024; originally announced June 2024.

  48. arXiv:2406.14171  [pdf, other]

    cs.AI cs.CL

    Ranking LLMs by compression

    Authors: Peijia Guo, Ziguang Li, Haibo Hu, Chao Huang, Ming Li, Rui Zhang

    Abstract: We conceptualize the process of understanding as information compression, and propose a method for ranking large language models (LLMs) based on lossless data compression. We demonstrate the equivalence of compression length under arithmetic coding with cumulative negative log probabilities when using a large language model as a prior, that is, the pre-training phase of the model is essentially th…

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: 7 pages, 4 tables

  49. arXiv:2406.13940  [pdf, other]

    cs.CL

    AutoCAP: Towards Automatic Cross-lingual Alignment Planning for Zero-shot Chain-of-Thought

    Authors: Yongheng Zhang, Qiguang Chen, Min Li, Wanxiang Che, Libo Qin

    Abstract: Cross-lingual chain-of-thought can effectively complete reasoning tasks across languages, which gains increasing attention. Recently, dominant approaches in the literature improve cross-lingual alignment capabilities by integrating reasoning knowledge from different languages. Despite achieving excellent performance, current methods still have two main challenges: (1) Manual language specification…

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: Accepted by ACL2024 Findings

  50. arXiv:2406.13778  [pdf, other]

    cs.CR cs.LG

    Benchmarking Unsupervised Online IDS for Masquerade Attacks in CAN

    Authors: Pablo Moriano, Steven C. Hespeler, Mingyan Li, Robert A. Bridges

    Abstract: Vehicular controller area networks (CANs) are susceptible to masquerade attacks by malicious adversaries. In masquerade attacks, adversaries silence a targeted ID and then send malicious frames with forged content at the expected timing of benign frames. As masquerade attacks could seriously harm vehicle functionality and are the stealthiest attacks to detect in CAN, recent work has devoted attent…

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: 15 pages, 9 figures, 3 tables