
Showing 1–50 of 686 results for author: li, F

Searching in archive cs. Search in all archives.
  1. arXiv:2407.03197  [pdf, other]

    cs.CV

    DyFADet: Dynamic Feature Aggregation for Temporal Action Detection

    Authors: Le Yang, Ziwei Zheng, Yizeng Han, Hao Cheng, Shiji Song, Gao Huang, Fan Li

    Abstract: Recently proposed neural network-based Temporal Action Detection (TAD) models are inherently limited to extracting the discriminative representations and modeling action instances with various lengths from complex scenes by shared-weights detection heads. Inspired by the successes in dynamic neural networks, in this paper, we build a novel dynamic feature aggregation (DFA) module that can simultaneo…

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: ECCV 2024

  2. arXiv:2407.01191  [pdf, other]

    cs.RO cs.AI cs.CV

    MARS: Multimodal Active Robotic Sensing for Articulated Characterization

    Authors: Hongliang Zeng, ** Zhang, Chengjiong Wu, Jiahua Wang, Tingyu Ye, Fang Li

    Abstract: Precise perception of articulated objects is vital for empowering service robots. Recent studies mainly focus on point cloud, a single-modal approach, often neglecting vital texture and lighting details and assuming ideal conditions like optimal viewpoints, unrepresentative of real-world scenarios. To address these limitations, we introduce MARS, a novel framework for articulated object characteri…

    Submitted 1 July, 2024; originally announced July 2024.

  3. arXiv:2407.00917  [pdf, other]

    cs.CV

    From Category to Scenery: An End-to-End Framework for Multi-Person Human-Object Interaction Recognition in Videos

    Authors: Tanqiu Qiao, Ruochen Li, Frederick W. B. Li, Hubert P. H. Shum

    Abstract: Video-based Human-Object Interaction (HOI) recognition explores the intricate dynamics between humans and objects, which are essential for a comprehensive understanding of human behavior and intentions. While previous work has made significant strides, effectively integrating geometric and visual features to model dynamic relationships between humans and objects in a graph framework remains a chal…

    Submitted 30 June, 2024; originally announced July 2024.

    Comments: Accepted by ICPR 2024

  4. arXiv:2407.00814  [pdf, other]

    cs.NI cs.AI

    Privacy-Aware Spectrum Pricing and Power Control Optimization for LEO Satellite Internet-of-Things

    Authors: Bowen Shen, Kwok-Yan Lam, Feng Li

    Abstract: Low earth orbit (LEO) satellite systems play an important role in next generation communication networks due to their ability to provide extensive global coverage with guaranteed communications in remote areas and isolated areas where base stations cannot be cost-efficiently deployed. With the pervasive adoption of LEO satellite systems, especially in the LEO Internet-of-Things (IoT) scenarios, th…

    Submitted 1 April, 2024; originally announced July 2024.

  5. arXiv:2406.17342  [pdf, other]

    cs.CV cs.AI

    Masked Generative Extractor for Synergistic Representation and 3D Generation of Point Clouds

    Authors: Hongliang Zeng, ** Zhang, Fang Li, Jiahua Wang, Tingyu Ye, Pengteng Guo

    Abstract: In the field of 2D image generation modeling and representation learning, Masked Generative Encoder (MAGE) has demonstrated the synergistic potential between generative modeling and representation learning. Inspired by this, we propose Point-MAGE to extend this concept to point cloud data. Specifically, this framework first utilizes a Vector Quantized Variational Autoencoder (VQVAE) to reconstruct…

    Submitted 25 June, 2024; originally announced June 2024.

  6. arXiv:2406.16905  [pdf]

    cs.LG cs.AI

    Optimising Random Forest Machine Learning Algorithms for User VR Experience Prediction Based on Iterative Local Search-Sparrow Search Algorithm

    Authors: Xirui Tang, Feiyang Li, Zinan Cao, Qixuan Yu, Yulu Gong

    Abstract: In this paper, an improved method for VR user experience prediction is investigated by introducing a sparrow search algorithm and a random forest algorithm improved by an iterative local search-optimised sparrow search algorithm. The study first conducted a statistical analysis of the data, and then trained and tested using the traditional random forest model, the random forest model improved by…

    Submitted 3 June, 2024; originally announced June 2024.

  7. arXiv:2406.16021  [pdf, other]

    cs.CL cs.AI

    Harvesting Events from Multiple Sources: Towards a Cross-Document Event Extraction Paradigm

    Authors: Qiang Gao, Zixiang Meng, Bobo Li, Jun Zhou, Fei Li, Chong Teng, Donghong Ji

    Abstract: Document-level event extraction aims to extract structured event information from unstructured text. However, a single document often contains limited event information and the roles of different event arguments may be biased due to the influence of the information source. This paper addresses the limitations of traditional document-level event extraction by proposing the task of cross-document ev…

    Submitted 23 June, 2024; originally announced June 2024.

    Comments: ACL 2024 (Findings)

  8. arXiv:2406.15990  [pdf, other]

    cs.CL cs.AI

    Enhancing Cross-Document Event Coreference Resolution by Discourse Structure and Semantic Information

    Authors: Qiang Gao, Bobo Li, Zixiang Meng, Yunlong Li, Jun Zhou, Fei Li, Chong Teng, Donghong Ji

    Abstract: Existing cross-document event coreference resolution models, which either compute mention similarity directly or enhance mention representation by extracting event arguments (such as location, time, agent, and patient), lack the ability to utilize document-level information. As a result, they struggle to capture long-distance dependencies. This shortcoming leads to their underwhelming performan…

    Submitted 22 June, 2024; originally announced June 2024.

    Report number: https://aclanthology.org/2024.lrec-main.523/

    Journal ref: LREC|COLING, Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation, 2024, 5907-5921

  9. arXiv:2406.15709  [pdf, other]

    cs.CR

    I Experienced More than 10 DeFi Scams: On DeFi Users' Perception of Security Breaches and Countermeasures

    Authors: Mingyi Liu, Jun Ho Huh, HyungSeok Han, Jaehyuk Lee, Jihae Ahn, Frank Li, Hyoungshick Kim, Taesoo Kim

    Abstract: Decentralized Finance (DeFi) offers a whole new investment experience and has quickly emerged as an enticing alternative to Centralized Finance (CeFi). Rapidly growing market size and active users, however, have also made DeFi a lucrative target for scams and hacks, with 1.95 billion USD lost in 2023. Unfortunately, no prior research thoroughly investigates DeFi users' security risk awareness leve…

    Submitted 21 June, 2024; originally announced June 2024.

    Comments: In Proceedings of the 33rd USENIX Security Symposium, Philadelphia, PA, USA, Aug. 2024

  10. arXiv:2406.14795  [pdf, other]

    cs.RO eess.SY

    Design and Control of a Low-cost Non-backdrivable End-effector Upper Limb Rehabilitation Device

    Authors: Fulan Li, Yunfei Guo, Wenda Xu, Weide Zhang, Fangyun Zhao, Baiyu Wang, Huaguang Du, Chengkun Zhang

    Abstract: This paper presents the development of an upper limb end-effector based rehabilitation device for stroke patients, offering assistance or resistance along any 2-dimensional trajectory during physical therapy. It employs a non-backdrivable ball-screw-driven mechanism for enhanced control accuracy. The control system features three novel algorithms: First, the Implicit Euler velocity control algorit…

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: 12 pages, 15 figures

  11. arXiv:2406.13036  [pdf, other]

    stat.ML cs.LG math.PR math.ST stat.CO

    Sharp detection of low-dimensional structure in probability measures via dimensional logarithmic Sobolev inequalities

    Authors: Matthew T. C. Li, Tiangang Cui, Fengyi Li, Youssef Marzouk, Olivier Zahm

    Abstract: Identifying low-dimensional structure in high-dimensional probability measures is an essential pre-processing step for efficient sampling. We introduce a method for identifying and approximating a target measure $π$ as a perturbation of a given reference measure $μ$ along a few significant directions of $\mathbb{R}^{d}$. The reference measure can be a Gaussian or a nonlinear transformation of a Ga…

    Submitted 21 June, 2024; v1 submitted 18 June, 2024; originally announced June 2024.

  12. Contextual Distillation Model for Diversified Recommendation

    Authors: Fan Li, Xu Si, Shisong Tang, Dingmin Wang, Kunyan Han, Bing Han, Guorui Zhou, Yang Song, Hechang Chen

    Abstract: The diversity of recommendation is equally crucial as accuracy in improving user experience. Existing studies, e.g., Determinantal Point Process (DPP) and Maximal Marginal Relevance (MMR), employ a greedy paradigm to iteratively select items that optimize both accuracy and diversity. However, prior methods typically exhibit quadratic complexity, limiting their applications to the re-ranking stage…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: Accepted by KDD 2024

  13. arXiv:2406.08455  [pdf, other]

    cs.RO

    AToM-Bot: Embodied Fulfillment of Unspoken Human Needs with Affective Theory of Mind

    Authors: Wei Ding, Fanhong Li, Ziteng Ji, Zhengrong Xue, Jia Liu

    Abstract: We propose AToM-Bot, a novel task generation and execution framework for proactive robot-human interaction, which leverages the human mental and physical state inference capabilities of the Vision Language Model (VLM) prompted by the Affective Theory of Mind (AToM). Without requiring explicit commands by humans, AToM-Bot proactively generates and follows feasible tasks to improve general human wel…

    Submitted 15 June, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

  14. arXiv:2406.07824  [pdf, other]

    quant-ph cs.CR

    Efficient Arbitrated Quantum Digital Signature with Multi-Receiver Verification

    Authors: Siyu Xiong, Bangying Tang, Hui Han, **quan Huang, Mingqiang Bai, Fangzhao Li, Wanrong Yu, Zhiwen Mo, Bo Liu

    Abstract: Quantum digital signature is used to authenticate the identity of the signer with information theoretical security, while providing non-forgery and non-repudiation services. In traditional multi-receiver quantum digital signature schemes without an arbitrator, the transferability of one-to-one signature is always required to achieve unforgeability, with complicated implementation and heavy key con…

    Submitted 11 June, 2024; originally announced June 2024.

  15. arXiv:2406.07342  [pdf, other]

    cs.NI cs.DC

    EdgeTimer: Adaptive Multi-Timescale Scheduling in Mobile Edge Computing with Deep Reinforcement Learning

    Authors: Yijun Hao, Shusen Yang, Fang Li, Yifan Zhang, Shibo Wang, Xuebin Ren

    Abstract: In mobile edge computing (MEC), resource scheduling is crucial to task requests' performance and service providers' cost, involving multi-layer heterogeneous scheduling decisions. Existing schedulers typically adopt static timescales to regularly update scheduling decisions of each layer, without adaptive adjustment of timescales for different layers, resulting in potentially poor performance in p…

    Submitted 11 June, 2024; originally announced June 2024.

  16. arXiv:2406.06608  [pdf, other]

    cs.CL cs.AI

    The Prompt Report: A Systematic Survey of Prompting Techniques

    Authors: Sander Schulhoff, Michael Ilie, Nishant Balepur, Konstantine Kahadze, Amanda Liu, Chenglei Si, Yinheng Li, Aayush Gupta, HyoJung Han, Sevien Schulhoff, Pranav Sandeep Dulepet, Saurav Vidyadhara, Dayeon Ki, Sweta Agrawal, Chau Pham, Gerson Kroiz, Feileen Li, Hudson Tao, Ashay Srivastava, Hevander Da Costa, Saloni Gupta, Megan L. Rogers, Inna Goncearenco, Giuseppe Sarli, Igor Galynker, et al. (6 additional authors not shown)

    Abstract: Generative Artificial Intelligence (GenAI) systems are being increasingly deployed across all parts of industry and research settings. Developers and end users interact with these systems through the use of prompting or prompt engineering. While prompting is a widespread and highly researched concept, there exists conflicting terminology and a poor ontological understanding of what constitutes a p…

    Submitted 16 June, 2024; v1 submitted 6 June, 2024; originally announced June 2024.

  17. arXiv:2406.05746  [pdf]

    cs.AI cs.HC cs.LG

    Methodology and Real-World Applications of Dynamic Uncertain Causality Graph for Clinical Diagnosis with Explainability and Invariance

    Authors: Zhan Zhang, Qin Zhang, Yang Jiao, Lin Lu, Lin Ma, Aihua Liu, Xiao Liu, Juan Zhao, Yajun Xue, Bing Wei, Mingxia Zhang, Ru Gao, Hong Zhao, Jie Lu, Fan Li, Yang Zhang, Yiming Wang, Lei Zhang, Fengwei Tian, Jie Hu, Xin Gou

    Abstract: AI-aided clinical diagnosis is desired in medical care. Existing deep learning models lack explainability and mainly focus on image analysis. The recently developed Dynamic Uncertain Causality Graph (DUCG) approach is causality-driven, explainable, and invariant across different application scenarios, without problems of data collection, labeling, fitting, privacy, bias, generalization, high cost…

    Submitted 9 June, 2024; originally announced June 2024.

    Journal ref: Artificial Intelligence Review, (2024) 57:151

  18. arXiv:2406.05534  [pdf, other]

    cs.AI cs.CL cs.LG

    Online DPO: Online Direct Preference Optimization with Fast-Slow Chasing

    Authors: Biqing Qi, Pengfei Li, Fangyuan Li, Junqi Gao, Kaiyan Zhang, Bowen Zhou

    Abstract: Direct Preference Optimization (DPO) improves the alignment of large language models (LLMs) with human values by training directly on human preference datasets, eliminating the need for reward models. However, due to the presence of cross-domain human preferences, direct continual training can lead to catastrophic forgetting, limiting DPO's performance and efficiency. Inspired by intraspecific com…

    Submitted 8 June, 2024; originally announced June 2024.

  19. arXiv:2406.03701  [pdf, other]

    cs.MM

    Recognizing Everything from All Modalities at Once: Grounded Multimodal Universal Information Extraction

    Authors: Meishan Zhang, Hao Fei, Bin Wang, Shengqiong Wu, Yixin Cao, Fei Li, Min Zhang

    Abstract: In the field of information extraction (IE), tasks across a wide range of modalities and their combinations have been traditionally studied in isolation, leaving a gap in deeply recognizing and analyzing cross-modal information. To address this, this work for the first time introduces the concept of grounded Multimodal Universal Information Extraction (MUIE), providing a unified task framework to…

    Submitted 11 June, 2024; v1 submitted 5 June, 2024; originally announced June 2024.

  20. arXiv:2406.03459  [pdf, other]

    cs.CV

    LW-DETR: A Transformer Replacement to YOLO for Real-Time Detection

    Authors: Qiang Chen, Xiangbo Su, Xinyu Zhang, Jian Wang, Jiahui Chen, Yunpeng Shen, Chuchu Han, Ziliang Chen, Weixiang Xu, Fanrong Li, Shan Zhang, Kun Yao, Errui Ding, Gang Zhang, Jingdong Wang

    Abstract: In this paper, we present a light-weight detection transformer, LW-DETR, which outperforms YOLOs for real-time object detection. The architecture is a simple stack of a ViT encoder, a projector, and a shallow DETR decoder. Our approach leverages recent advanced techniques, such as training-effective techniques, e.g., improved loss and pretraining, and interleaved window and global attentions for r…

    Submitted 5 June, 2024; originally announced June 2024.

  21. arXiv:2406.03032  [pdf, other]

    cs.CV

    Instructing Prompt-to-Prompt Generation for Zero-Shot Learning

    Authors: Man Liu, Huihui Bai, Feng Li, Chunjie Zhang, Yunchao Wei, Meng Wang, Tat-Seng Chua, Yao Zhao

    Abstract: Zero-shot learning (ZSL) aims to explore the semantic-visual interactions to discover comprehensive knowledge transferred from seen categories to classify unseen categories. Recently, prompt engineering has emerged in ZSL, demonstrating impressive potential as it enables the zero-shot transfer of diverse visual concepts to downstream tasks. However, these methods are still not well generalized to…

    Submitted 5 June, 2024; originally announced June 2024.

  22. arXiv:2406.02470  [pdf, other]

    quant-ph cs.LG

    Meta-Designing Quantum Experiments with Language Models

    Authors: Sören Arlt, Haonan Duan, Felix Li, Sang Michael Xie, Yuhuai Wu, Mario Krenn

    Abstract: Artificial Intelligence (AI) has the potential to significantly advance scientific discovery by finding solutions beyond human capabilities. However, these super-human solutions are often unintuitive and require considerable effort to uncover underlying principles, if possible at all. Here, we show how a code-generating language model trained on synthetic data can not only find solutions to specif…

    Submitted 4 June, 2024; originally announced June 2024.

    Comments: 10+3 pages, 5 figures

  23. arXiv:2406.02430  [pdf, other]

    eess.AS cs.SD

    Seed-TTS: A Family of High-Quality Versatile Speech Generation Models

    Authors: Philip Anastassiou, Jiawei Chen, Jitong Chen, Yuanzhe Chen, Zhuo Chen, Ziyi Chen, Jian Cong, Lelai Deng, Chuang Ding, Lu Gao, Mingqing Gong, Peisong Huang, Qingqing Huang, Zhiying Huang, Yuanyuan Huo, Dongya Jia, Chumin Li, Feiya Li, Hui Li, Jiaxin Li, Xiaoyang Li, Xingxing Li, Lin Liu, Shouda Liu, Sichao Liu, et al. (21 additional authors not shown)

    Abstract: We introduce Seed-TTS, a family of large-scale autoregressive text-to-speech (TTS) models capable of generating speech that is virtually indistinguishable from human speech. Seed-TTS serves as a foundation model for speech generation and excels in speech in-context learning, achieving performance in speaker similarity and naturalness that matches ground truth human speech in both objective and sub…

    Submitted 4 June, 2024; originally announced June 2024.

  24. arXiv:2406.01042  [pdf, other]

    cs.CV

    Self-Calibrating 4D Novel View Synthesis from Monocular Videos Using Gaussian Splatting

    Authors: Fang Li, Hao Zhang, Narendra Ahuja

    Abstract: Gaussian Splatting (GS) has significantly elevated scene reconstruction efficiency and novel view synthesis (NVS) accuracy compared to Neural Radiance Fields (NeRF), particularly for dynamic scenes. However, current 4D NVS methods, whether based on GS or NeRF, primarily rely on camera parameters provided by COLMAP and even utilize sparse point clouds generated by COLMAP for initialization, which l…

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: GitHub Page: https://github.com/fangli333/SC-4DGS

  25. arXiv:2406.00992  [pdf, other]

    cs.SE

    Hybrid Automated Program Repair by Combining Large Language Models and Program Analysis

    Authors: Fengjie Li, Jiajun Jiang, Jiajun Sun, Hongyu Zhang

    Abstract: Automated Program Repair (APR) has garnered significant attention due to its potential to streamline the bug repair process for human developers. Recently, LLM-based APR methods have shown promise in repairing real-world bugs. However, existing APR methods often utilize patches generated by LLMs without further optimization, resulting in reduced effectiveness due to the lack of program-specific kn…

    Submitted 4 June, 2024; v1 submitted 3 June, 2024; originally announced June 2024.

    Comments: 12 pages, 4 figures

  26. arXiv:2406.00758  [pdf, other]

    eess.IV cs.CV cs.MM

    Once-for-All: Controllable Generative Image Compression with Dynamic Granularity Adaption

    Authors: Anqi Li, Yuxi Liu, Huihui Bai, Feng Li, Runmin Cong, Meng Wang, Yao Zhao

    Abstract: Although recent generative image compression methods have demonstrated impressive potential in optimizing the rate-distortion-perception trade-off, they still face the critical challenge of flexible rate adaption to diverse compression necessities and scenarios. To overcome this challenge, this paper proposes a Controllable Generative Image Compression framework, Control-GIC, the first capable of…

    Submitted 5 June, 2024; v1 submitted 2 June, 2024; originally announced June 2024.

  27. arXiv:2405.18023  [pdf, ps, other]

    cs.IT

    Generator polynomials of cyclic expurgated or extended Goppa codes

    Authors: Xue Jia, Fengwei Li, Huan Sun, Qin Yue

    Abstract: Classical Goppa codes are a well-known class of codes with applications in code-based cryptography, which are a special case of alternant codes. Many papers are devoted to the search for Goppa codes with a cyclic extension or with a cyclic parity-check subcode. Let $\Bbb F_q$ be a finite field with $q=2^l$ elements, where $l$ is a positive integer. In this paper, we determine all the generator pol…

    Submitted 28 May, 2024; originally announced May 2024.

  28. arXiv:2405.15064  [pdf, other]

    cs.CL cs.AI cs.DB

    Reframing Spatial Reasoning Evaluation in Language Models: A Real-World Simulation Benchmark for Qualitative Reasoning

    Authors: Fangjun Li, David C. Hogg, Anthony G. Cohn

    Abstract: Spatial reasoning plays a vital role in both human cognition and machine intelligence, prompting new research into language models' (LMs) capabilities in this regard. However, existing benchmarks reveal shortcomings in evaluating qualitative spatial reasoning (QSR). These benchmarks typically present oversimplified scenarios or unclear natural language descriptions, hindering effective evaluation.…

    Submitted 23 May, 2024; originally announced May 2024.

    Comments: Camera-Ready version for IJCAI 2024

  29. arXiv:2405.14017  [pdf, other]

    cs.CV

    MagicPose4D: Crafting Articulated Models with Appearance and Motion Control

    Authors: Hao Zhang, Di Chang, Fang Li, Mohammad Soleymani, Narendra Ahuja

    Abstract: With the success of 2D and 3D visual generative models, there is growing interest in generating 4D content. Existing methods primarily rely on text prompts to produce 4D content, but they often fall short of accurately defining complex or rare motions. To address this limitation, we propose MagicPose4D, a novel framework for refined control over both appearance and motion in 4D generation. Unlike…

    Submitted 22 May, 2024; originally announced May 2024.

    Comments: Project Page: https://boese0601.github.io/magicpose4d

  30. arXiv:2405.12607  [pdf, other]

    cs.CV

    S3O: A Dual-Phase Approach for Reconstructing Dynamic Shape and Skeleton of Articulated Objects from Single Monocular Video

    Authors: Hao Zhang, Fang Li, Samyak Rawlekar, Narendra Ahuja

    Abstract: Reconstructing dynamic articulated objects from a single monocular video is challenging, requiring joint estimation of shape, motion, and camera parameters from limited views. Current methods typically demand extensive computational resources and training time, and require additional human annotations such as predefined parametric models, camera poses, and key points, limiting their generalizabi…

    Submitted 21 May, 2024; originally announced May 2024.

    Comments: Accepted by ICML 2024

  31. arXiv:2405.10300  [pdf, other]

    cs.CV

    Grounding DINO 1.5: Advance the "Edge" of Open-Set Object Detection

    Authors: Tianhe Ren, Qing Jiang, Shilong Liu, Zhaoyang Zeng, Wenlong Liu, Han Gao, Hongjie Huang, Zhengyu Ma, Xiaoke Jiang, Yihao Chen, Yuda Xiong, Hao Zhang, Feng Li, Peijun Tang, Kent Yu, Lei Zhang

    Abstract: This paper introduces Grounding DINO 1.5, a suite of advanced open-set object detection models developed by IDEA Research, which aims to advance the "Edge" of open-set object detection. The suite encompasses two models: Grounding DINO 1.5 Pro, a high-performance model designed for stronger generalization capability across a wide range of scenarios, and Grounding DINO 1.5 Edge, an efficient model o…

    Submitted 31 May, 2024; v1 submitted 16 May, 2024; originally announced May 2024.

    Comments: homepage: https://deepdataspace.com/home

  32. arXiv:2405.09782  [pdf, other]

    cs.CV

    Size-invariance Matters: Rethinking Metrics and Losses for Imbalanced Multi-object Salient Object Detection

    Authors: Feiran Li, Qianqian Xu, Shilong Bao, Zhiyong Yang, Runmin Cong, Xiaochun Cao, Qingming Huang

    Abstract: This paper explores the size-invariance of evaluation metrics in Salient Object Detection (SOD), especially when multiple targets of diverse sizes co-exist in the same image. We observe that current metrics are size-sensitive, where larger objects are focused, and smaller ones tend to be ignored. We argue that the evaluation should be size-invariant because bias based on size is unjustified withou…

    Submitted 27 May, 2024; v1 submitted 15 May, 2024; originally announced May 2024.

    Comments: This paper has been accepted by ICML2024

  33. arXiv:2405.07792  [pdf, other]

    cs.DB cs.DS cs.LG

    Optimal Matrix Sketching over Sliding Windows

    Authors: Hanyan Yin, Dongxie Wen, Jiajun Li, Zhewei Wei, Xiao Zhang, Zengfeng Huang, Feifei Li

    Abstract: Matrix sketching, aimed at approximating a matrix $\boldsymbol{A} \in \mathbb{R}^{N\times d}$ consisting of vector streams of length $N$ with a smaller sketching matrix $\boldsymbol{B} \in \mathbb{R}^{\ell\times d}, \ell \ll N$, has garnered increasing attention in fields such as large-scale data analytics and machine learning. A well-known deterministic matrix sketching method is the Frequent Dir…

    Submitted 13 May, 2024; originally announced May 2024.

  34. arXiv:2405.06705  [pdf, other]

    cs.CL cs.AI

    LLMs can Find Mathematical Reasoning Mistakes by Pedagogical Chain-of-Thought

    Authors: Zhuoxuan Jiang, Haoyuan Peng, Shanshan Feng, Fan Li, Dongsheng Li

    Abstract: Self-correction is emerging as a promising approach to mitigate the issue of hallucination in Large Language Models (LLMs). To facilitate effective self-correction, recent research has proposed mistake detection as its initial step. However, current literature suggests that LLMs often struggle with reliably identifying reasoning mistakes when using simplistic prompting strategies. To address this…

    Submitted 9 May, 2024; originally announced May 2024.

    Comments: To appear at IJCAI 2024

  35. arXiv:2405.05499  [pdf, other]

    cs.LG cs.AI

    Multi-Scale Dilated Convolution Network for Long-Term Time Series Forecasting

    Authors: Feifei Li, Suhan Guo, Feng Han, Jian Zhao, Furao Shen

    Abstract: Accurate forecasting of long-term time series has important applications for decision making and planning. However, it remains challenging to capture the long-term dependencies in time series data. To better extract long-term dependencies, we propose Multi-Scale Dilated Convolution Network (MSDCN), a method that utilizes a shallow dilated convolution architecture to capture the period and trend ch…

    Submitted 14 May, 2024; v1 submitted 8 May, 2024; originally announced May 2024.

  36. arXiv:2405.02754  [pdf, other]

    cs.RO cs.AI cs.LG

    Implicit Safe Set Algorithm for Provably Safe Reinforcement Learning

    Authors: Weiye Zhao, Tairan He, Feihan Li, Changliu Liu

    Abstract: Deep reinforcement learning (DRL) has demonstrated remarkable performance in many continuous control tasks. However, a significant obstacle to the real-world application of DRL is the lack of safety guarantees. Although DRL agents can satisfy system safety in expectation through reward shaping, designing agents to consistently meet hard constraints (e.g., safety specifications) at every time step…

    Submitted 4 May, 2024; originally announced May 2024.

    Comments: Submitted to Journal of Artificial Intelligence Research. arXiv admin note: text overlap with arXiv:2308.13140

  37. arXiv:2405.02628  [pdf, other]

    cs.LG cs.AI

    Contrastive Dual-Interaction Graph Neural Network for Molecular Property Prediction

    Authors: Zexing Zhao, Guangsi Shi, Xiaopeng Wu, Ruohua Ren, Xiaojun Gao, Fuyi Li

    Abstract: Molecular property prediction is a key component of AI-driven drug discovery and molecular characterization learning. Despite recent advances, existing methods still face challenges such as limited ability to generalize, and inadequate representation of learning from unlabeled data, especially for tasks specific to molecular structures. To address these limitations, we introduce DIG-Mol, a novel s…

    Submitted 4 May, 2024; originally announced May 2024.

  38. arXiv:2404.14581  [pdf, other]

    cs.CV cs.AI cs.CR

    The Adversarial AI-Art: Understanding, Generation, Detection, and Benchmarking

    Authors: Yuying Li, Zeyan Liu, Junyi Zhao, Liangqin Ren, Fengjun Li, Jiebo Luo, Bo Luo

    Abstract: Generative AI models can produce high-quality images based on text prompts. The generated images often appear indistinguishable from images generated by conventional optical photography devices or created by human artists (i.e., real images). While the outstanding performance of such generative models is generally well received, security concerns arise. For instance, such image generators could be…

    Submitted 22 April, 2024; originally announced April 2024.

  39. arXiv:2404.13518  [pdf, other]

    cs.CR cs.AI

    Reliable Model Watermarking: Defending Against Theft without Compromising on Evasion

    Authors: Hongyu Zhu, Sichu Liang, Wentao Hu, Fangqi Li, Ju Jia, Shilin Wang

    Abstract: With the rise of Machine Learning as a Service (MLaaS) platforms, safeguarding the intellectual property of deep learning models is becoming paramount. Among various protective measures, trigger set watermarking has emerged as a flexible and effective strategy for preventing unauthorized model distribution. However, this paper identifies an inherent flaw in the current paradigm of trigger set water…

    Submitted 20 April, 2024; originally announced April 2024.

  40. arXiv:2404.11825  [pdf, other]

    cs.LG

    Hypergraph Self-supervised Learning with Sampling-efficient Signals

    Authors: Fan Li, Xiaoyang Wang, Dawei Cheng, Wenjie Zhang, Ying Zhang, Xuemin Lin

    Abstract: Self-supervised learning (SSL) provides a promising alternative for representation learning on hypergraphs without costly labels. However, existing hypergraph SSL models are mostly based on contrastive methods with the instance-level discrimination strategy, suffering from two significant limitations: (1) They select negative samples arbitrarily, which is unreliable in deciding similar and dissimi…

    Submitted 17 April, 2024; originally announced April 2024.

    Comments: 9 pages, 4 figures, 4 tables

  41. arXiv:2404.11155  [pdf, other]

    cs.CV

    HybriMap: Hybrid Clues Utilization for Effective Vectorized HD Map Construction

    Authors: Chi Zhang, Qi Song, Feifei Li, Yongquan Chen, Rui Huang

    Abstract: Constructing vectorized high-definition maps from surround-view cameras has garnered significant attention in recent years. However, the commonly employed multi-stage sequential workflow in prevailing approaches often leads to the loss of early-stage information, particularly in perspective-view features. Usually, such loss is observed as an instance missing or shape mismatching in the final birds…

    Submitted 17 April, 2024; originally announced April 2024.

  42. arXiv:2404.10343  [pdf, other]

    cs.CV eess.IV

    The Ninth NTIRE 2024 Efficient Super-Resolution Challenge Report

    Authors: Bin Ren, Yawei Li, Nancy Mehta, Radu Timofte, Hongyuan Yu, Cheng Wan, Yuxin Hong, Bingnan Han, Zhuoyuan Wu, Yajun Zou, Yuqing Liu, Jizhe Li, Keji He, Chao Fan, Heng Zhang, Xiaolin Zhang, Xuanwu Yin, Kunlong Zuo, Bohao Liao, Peizhe Xia, Long Peng, Zhibo Du, Xin Di, Wangkai Li, Yang Wang, et al. (109 additional authors not shown)

    Abstract: This paper provides a comprehensive review of the NTIRE 2024 challenge, focusing on efficient single-image super-resolution (ESR) solutions and their outcomes. The task of this challenge is to super-resolve an input image with a magnification factor of x4 based on pairs of low and corresponding high-resolution images. The primary objective is to develop networks that optimize various aspects such…

    Submitted 25 June, 2024; v1 submitted 16 April, 2024; originally announced April 2024.

    Comments: The report paper of NTIRE2024 Efficient Super-resolution, accepted by CVPRW2024

  43. arXiv:2404.05050  [pdf, other]

    cs.HC

    Co-design Accessible Public Robots: Insights from People with Mobility Disability, Robotic Practitioners and Their Collaborations

    Authors: Howard Ziyu Han, Franklin Mingzhe Li, Alesandra Baca Vazquez, Daragh Byrne, Nikolas Martelaro, Sarah E Fox

    Abstract: Sidewalk robots are increasingly common across the globe. Yet, their operation on public paths poses challenges for people with mobility disabilities (PwMD) who face barriers to accessibility, such as insufficient curb cuts. We interviewed 15 PwMD to understand how they perceive sidewalk robots. Findings indicated that PwMD feel they have to compete for space on the sidewalk when robots are introd…

    Submitted 7 April, 2024; originally announced April 2024.

  44. arXiv:2404.04953  [pdf, other]

    cs.CV

    High-Discriminative Attribute Feature Learning for Generalized Zero-Shot Learning

    Authors: Yu Lei, Guoshuai Sheng, Fangfang Li, Quanxue Gao, Cheng Deng, Qin Li

    Abstract: Zero-shot learning (ZSL) aims to recognize new classes without prior exposure to their samples, relying on semantic knowledge from observed classes. However, current attention-based models may overlook the transferability of visual features and the distinctiveness of attribute localization when learning regional features in images. Additionally, they often overlook shared attributes among different…

    Submitted 7 April, 2024; originally announced April 2024.

  45. arXiv:2404.04940  [pdf, other]

    cs.LG

    Fuzzy K-Means Clustering without Cluster Centroids

    Authors: Han Lu, Fangfang Li, Quanxue Gao, Cheng Deng, Chris Ding, Qianqian Wang

    Abstract: Fuzzy K-Means clustering is a critical technique in unsupervised data analysis. However, the performance of popular Fuzzy K-Means algorithms is sensitive to the selection of initial cluster centroids and is also affected by noise when updating mean cluster centroids. To address these challenges, this paper proposes a novel Fuzzy K-Means clustering algorithm that entirely eliminates the reliance on…

    Submitted 7 April, 2024; originally announced April 2024.

  46. arXiv:2403.18180  [pdf, other]

    cs.CV

    Multi-Layer Dense Attention Decoder for Polyp Segmentation

    Authors: Krushi Patel, Fengjun Li, Guanghui Wang

    Abstract: Detecting and segmenting polyps is crucial for expediting the diagnosis of colon cancer. This is a challenging task due to the large variations of polyps in color, texture, and lighting conditions, along with subtle differences between the polyp and its surrounding area. Recently, vision Transformers have shown robust abilities in modeling global context for polyp segmentation. However, they face…

    Submitted 26 March, 2024; originally announced March 2024.

  47. arXiv:2403.17610  [pdf, other]

    cs.CV

    MMVP: A Multimodal MoCap Dataset with Vision and Pressure Sensors

    Authors: He Zhang, Shenghao Ren, Haolei Yuan, Jianhui Zhao, Fan Li, Shuangpeng Sun, Zhenghao Liang, Tao Yu, Qiu Shen, Xun Cao

    Abstract: Foot contact is an important cue for human motion capture, understanding, and generation. Existing datasets tend to annotate dense foot contact using visual matching with thresholding or incorporating pressure signals. However, these approaches either suffer from low accuracy or are only designed for small-range and slow motion. There is still a lack of a vision-pressure multimodal dataset with la…

    Submitted 29 March, 2024; v1 submitted 26 March, 2024; originally announced March 2024.

    Comments: CVPR2024

  48. arXiv:2403.17369  [pdf, other]

    cs.CV

    CoDA: Instructive Chain-of-Domain Adaptation with Severity-Aware Visual Prompt Tuning

    Authors: Ziyang Gong, Fuhao Li, Yupeng Deng, Deblina Bhattacharjee, Xiangwei Zhu, Zhenming Ji

    Abstract: Unsupervised Domain Adaptation (UDA) aims to adapt models from labeled source domains to unlabeled target domains. When adapting to adverse scenes, existing UDA methods fail to perform well due to the lack of instructions, leading their models to overlook discrepancies within all adverse scenes. To tackle this, we propose CoDA which instructs models to distinguish, focus, and learn from these disc…

    Submitted 4 April, 2024; v1 submitted 26 March, 2024; originally announced March 2024.

  49. arXiv:2403.16645  [pdf]

    cs.HC

    Virtual Co-Pilot: Multimodal Large Language Model-enabled Quick-access Procedures for Single Pilot Operations

    Authors: Fan Li, Shanshan Feng, Yuqi Yan, Ching-Hung Lee, Yew Soon Ong

    Abstract: Advancements in technology, pilot shortages, and cost pressures are driving a trend towards single-pilot and even remote operations in aviation. Considering the extensive workload and huge risks associated with single-pilot operations, the development of a Virtual Co-Pilot (V-CoP) is expected to be a potential way to ensure aviation safety. This study proposes a V-CoP concept and explores how huma…

    Submitted 25 March, 2024; originally announced March 2024.

    Comments: 10 pages, 7 figures

  50. arXiv:2403.16540  [pdf, other]

    cs.HC

    Enhancing Cross-Dataset EEG Emotion Recognition: A Novel Approach with Emotional EEG Style Transfer Network

    Authors: Yi** Zhou, Fu Li, Yang Li, Youshuo Ji, Lijian Zhang, Yuanfang Chen

    Abstract: Recognizing the pivotal role of EEG emotion recognition in the development of affective Brain-Computer Interfaces (aBCIs), considerable research efforts have been dedicated to this field. While prior methods have demonstrated success in intra-subject EEG emotion recognition, a critical challenge persists in addressing the style mismatch between EEG signals from the source domain (training data) an…

    Submitted 25 March, 2024; originally announced March 2024.

    Comments: 8 pages. arXiv admin note: substantial text overlap with arXiv:2308.05767