Showing 1–50 of 397 results for author: Wu, K

Searching in archive cs.

  1. arXiv:2406.13988  [pdf, other]

    cs.CV

    LGmap: Local-to-Global Mapping Network for Online Long-Range Vectorized HD Map Construction

    Authors: Kuang Wu, Sulei Nian, Can Shen, Chuan Yang, Zhanbin Li

    Abstract: This report introduces the first-place winning solution for the Autonomous Grand Challenge 2024 - Mapless Driving. In this report, we introduce a novel online mapping pipeline LGmap, which is adept at long-range temporal modeling. Firstly, we propose symmetric view transformation (SVT), a hybrid view transformation module. Our approach overcomes the limitations of forward sparse feature representation an…

    Submitted 20 June, 2024; originally announced June 2024.

  2. arXiv:2406.13743  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG cs.MM

    GenAI-Bench: Evaluating and Improving Compositional Text-to-Visual Generation

    Authors: Baiqi Li, Zhiqiu Lin, Deepak Pathak, Jiayao Li, Yixin Fei, Kewen Wu, Tiffany Ling, Xide Xia, Pengchuan Zhang, Graham Neubig, Deva Ramanan

    Abstract: While text-to-visual models now produce photo-realistic images and videos, they struggle with compositional text prompts involving attributes, relationships, and higher-order reasoning such as logic and comparison. In this work, we conduct an extensive human study on GenAI-Bench to evaluate the performance of leading image and video generation models in various aspects of compositional text-to-vis…

    Submitted 21 June, 2024; v1 submitted 19 June, 2024; originally announced June 2024.

    Comments: We open-source our dataset, model, and code at: https://linzhiqiu.github.io/papers/genai_bench ; Project page: https://linzhiqiu.github.io/papers/genai_bench ; GenAI-Bench was first introduced in arxiv:2404.01291. This article extends it with an additional GenAI-Rank benchmark.

  3. arXiv:2406.11941  [pdf, other]

    cs.LG cs.AI cs.RO

    Crossfusor: A Cross-Attention Transformer Enhanced Conditional Diffusion Model for Car-Following Trajectory Prediction

    Authors: Junwei You, Haotian Shi, Keshu Wu, Keke Long, Sicheng Fu, Sikai Chen, Bin Ran

    Abstract: Vehicle trajectory prediction is crucial for advancing autonomous driving and advanced driver assistance systems (ADAS), enhancing road safety and traffic efficiency. While traditional methods have laid foundational work, modern deep learning techniques, particularly transformer-based models and generative approaches, have significantly improved prediction accuracy by capturing complex and non-lin…

    Submitted 17 June, 2024; originally announced June 2024.

  4. arXiv:2406.11643  [pdf, other]

    cs.CV

    AnyMaker: Zero-shot General Object Customization via Decoupled Dual-Level ID Injection

    Authors: Lingjie Kong, Kai Wu, Xiaobin Hu, Wenhui Han, Jinlong Peng, Chengming Xu, Donghao Luo, Jiangning Zhang, Chengjie Wang, Yanwei Fu

    Abstract: Text-to-image based object customization, aiming to generate images with the same identity (ID) as objects of interest in accordance with text prompts and reference images, has made significant progress. However, recent customizing research is dominated by specialized tasks, such as human customization or virtual try-on, leaving a gap in general object customization. To this end, we introduce AnyM…

    Submitted 23 June, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

  5. arXiv:2406.06496  [pdf, other]

    cs.LG cs.CL cs.CV

    Direct Preference Optimization for Suppressing Hallucinated Prior Exams in Radiology Report Generation

    Authors: Oishi Banerjee, Hong-Yu Zhou, Subathra Adithan, Stephen Kwak, Kay Wu, Pranav Rajpurkar

    Abstract: Recent advances in generative vision-language models (VLMs) have exciting potential implications for AI in radiology, yet VLMs are also known to produce hallucinations, nonsensical text, and other unwanted behaviors that can waste clinicians' time and cause patient harm. Drawing on recent work on direct preference optimization (DPO), we propose a simple method for modifying the behavior of pretrai…

    Submitted 14 June, 2024; v1 submitted 10 June, 2024; originally announced June 2024.

    Comments: Added acknowledgments

  6. arXiv:2406.01870  [pdf, other]

    cs.LG stat.ML

    Understanding Stochastic Natural Gradient Variational Inference

    Authors: Kaiwen Wu, Jacob R. Gardner

    Abstract: Stochastic natural gradient variational inference (NGVI) is a popular posterior inference method with applications in various probabilistic models. Despite its wide usage, little is known about the non-asymptotic convergence rate in the stochastic setting. We aim to lessen this gap and provide a better understanding. For conjugate likelihoods, we prove the first $\mathcal{O}(\frac{1}{T})$ n…

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: ICML 2024

  7. arXiv:2406.01316  [pdf, other]

    cs.CV cs.AI

    Enhancing Inertial Hand based HAR through Joint Representation of Language, Pose and Synthetic IMUs

    Authors: Vitor Fortes Rey, Lala Shakti Swarup Ray, Xia Qingxin, Kaishun Wu, Paul Lukowicz

    Abstract: Due to the scarcity of labeled sensor data in HAR, prior research has turned to video data to synthesize Inertial Measurement Units (IMU) data, capitalizing on its rich activity annotations. However, generating IMU data from videos presents challenges for HAR in real-world settings, attributed to the poor quality of synthetic IMU data and its limited efficacy in subtle, fine-grained motions. In th…

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: Review Copy

  8. arXiv:2405.20613  [pdf, other]

    cs.CL

    FineRadScore: A Radiology Report Line-by-Line Evaluation Technique Generating Corrections with Severity Scores

    Authors: Alyssa Huang, Oishi Banerjee, Kay Wu, Eduardo Pontes Reis, Pranav Rajpurkar

    Abstract: The current gold standard for evaluating generated chest x-ray (CXR) reports is through radiologist annotations. However, this process can be extremely time-consuming and costly, especially when evaluating large numbers of reports. In this work, we present FineRadScore, a Large Language Model (LLM)-based automated evaluation metric for generated CXR reports. Given a candidate report and a ground-t…

    Submitted 31 May, 2024; originally announced May 2024.

  9. arXiv:2405.20343  [pdf, other]

    cs.CV cs.GR cs.LG

    Unique3D: High-Quality and Efficient 3D Mesh Generation from a Single Image

    Authors: Kailu Wu, Fangfu Liu, Zhihan Cai, Runjie Yan, Hanyang Wang, Yating Hu, Yueqi Duan, Kaisheng Ma

    Abstract: In this work, we introduce Unique3D, a novel image-to-3D framework for efficiently generating high-quality 3D meshes from single-view images, featuring state-of-the-art generation fidelity and strong generalizability. Previous methods based on Score Distillation Sampling (SDS) can produce diversified 3D results by distilling 3D knowledge from large 2D diffusion models, but they usually suffer from…

    Submitted 13 June, 2024; v1 submitted 30 May, 2024; originally announced May 2024.

    Comments: Project page: https://wukailu.github.io/Unique3D

    ACM Class: I.2.10

  10. arXiv:2405.20281  [pdf, other]

    cs.CR quant-ph

    Tight Characterizations for Preprocessing against Cryptographic Salting

    Authors: Fangqi Dong, Qipeng Liu, Kewen Wu

    Abstract: Cryptography often considers the strongest yet plausible attacks in the real world. Preprocessing (a.k.a. non-uniform attack) plays an important role in both theory and practice: an efficient online attacker can take advantage of advice prepared by a time-consuming preprocessing stage. Salting is a heuristic strategy to counter preprocessing attacks by feeding a small amount of randomness to the…

    Submitted 30 May, 2024; originally announced May 2024.

  11. arXiv:2405.20081  [pdf, other]

    cs.CV cs.AI

    NoiseBoost: Alleviating Hallucination with Noise Perturbation for Multimodal Large Language Models

    Authors: Kai Wu, Boyuan Jiang, Zhengkai Jiang, Qingdong He, Donghao Luo, Shengzhi Wang, Qingwen Liu, Chengjie Wang

    Abstract: Multimodal large language models (MLLMs) contribute a powerful mechanism to understanding visual information building on large language models. However, MLLMs are notorious for suffering from hallucinations, especially when generating lengthy, detailed descriptions for images. Our analysis reveals that hallucinations stem from the inherent summarization mechanism of large language models, leading…

    Submitted 31 May, 2024; v1 submitted 30 May, 2024; originally announced May 2024.

    Comments: 14 pages, 5 figures with supplementary material

  12. arXiv:2405.17372  [pdf, other]

    cs.AI cs.LG cs.RO

    BehaviorGPT: Smart Agent Simulation for Autonomous Driving with Next-Patch Prediction

    Authors: Zikang Zhou, Haibo Hu, Xinhong Chen, Jianping Wang, Nan Guan, Kui Wu, Yung-Hui Li, Yu-Kai Huang, Chun Jason Xue

    Abstract: Simulating realistic interactions among traffic agents is crucial for efficiently validating the safety of autonomous driving systems. Existing leading simulators primarily use an encoder-decoder structure to encode the historical trajectories for future simulation. However, such a paradigm complicates the model architecture, and the manual separation of history and future trajectories leads to lo…

    Submitted 27 May, 2024; originally announced May 2024.

  13. arXiv:2405.17336  [pdf, other]

    cs.CL

    XFormParser: A Simple and Effective Multimodal Multilingual Semi-structured Form Parser

    Authors: Xianfu Cheng, Hang Zhang, Jian Yang, Xiang Li, Weixiao Zhou, Kui Wu, Fei Liu, Wei Zhang, Tao Sun, Tongliang Li, Zhoujun Li

    Abstract: In the domain of document AI, semi-structured form parsing plays a crucial role. This task leverages techniques from key information extraction (KIE), dealing with inputs that range from plain text to intricate modal data comprising images and structural layouts. The advent of pre-trained multimodal models has driven the extraction of key information from form documents in different formats such a…

    Submitted 27 May, 2024; originally announced May 2024.

    Comments: 10 pages, 3 figures, 6 tables

  14. arXiv:2405.14133  [pdf, other]

    cs.LG cs.AI cs.SC

    Automated Loss function Search for Class-imbalanced Node Classification

    Authors: Xinyu Guo, Kai Wu, Xiaoyu Zhang, Jing Liu

    Abstract: Class-imbalanced node classification tasks are prevalent in real-world scenarios. Due to the uneven distribution of nodes across different classes, learning high-quality node representations remains a challenging endeavor. The engineering of loss functions has shown promising potential in addressing this issue. It involves the meticulous design of loss functions, utilizing information about the qu…

    Submitted 22 May, 2024; originally announced May 2024.

    Comments: ICML 2024

  15. arXiv:2405.12484  [pdf, other]

    cs.GR

    Meta-Homogenization for Knitwear Simulation

    Authors: Chun Yuan, Kui Wu, Haoyang Shi, Lei Lan, Yuxing Qiu, Cem Yuksel, Huamin Wang, Chenfanfu Jiang, Yin Yang

    Abstract: This paper presents meta-homogenization, a spatially varying homogenization scheme for knitwear simulation. We are motivated by the observation that macro-scale fabric dynamics is strongly correlated with its underlying knitting patterns. Therefore, homogenization towards a single material is less effective when the knitting is complex and non-repetitive. Our method tackles this challenge by homog…

    Submitted 23 May, 2024; v1 submitted 21 May, 2024; originally announced May 2024.

  16. arXiv:2405.10988  [pdf, other]

    cs.LG cs.AI

    Flow Score Distillation for Diverse Text-to-3D Generation

    Authors: Runjie Yan, Kailu Wu, Kaisheng Ma

    Abstract: Recent advancements in Text-to-3D generation have yielded remarkable progress, particularly through methods that rely on Score Distillation Sampling (SDS). While SDS exhibits the capability to create impressive 3D assets, it is hindered by its inherent maximum-likelihood-seeking essence, resulting in limited diversity in generation outcomes. In this paper, we discover that the Denoise Diffusion Im…

    Submitted 16 May, 2024; originally announced May 2024.

  17. arXiv:2405.10739  [pdf, other]

    cs.CV cs.AI

    Efficient Multimodal Large Language Models: A Survey

    Authors: Yizhang Jin, Jian Li, Yexin Liu, Tianjun Gu, Kai Wu, Zhengkai Jiang, Muyang He, Bo Zhao, Xin Tan, Zhenye Gan, Yabiao Wang, Chengjie Wang, Lizhuang Ma

    Abstract: In the past year, Multimodal Large Language Models (MLLMs) have demonstrated remarkable performance in tasks such as visual question answering, visual understanding and reasoning. However, the extensive model size and high training and inference costs have hindered the widespread application of MLLMs in academia and industry. Thus, studying efficient and lightweight MLLMs has enormous potential, e…

    Submitted 17 May, 2024; originally announced May 2024.

  18. arXiv:2405.10565  [pdf, other]

    cs.GR

    Real-time Level-of-Detail Strand-based Hair Rendering

    Authors: Tao Huang, Yang Zhou, Daqi Lin, Junqiu Zhu, Ling-Qi Yan, Kui Wu

    Abstract: Strand-based hair rendering has become increasingly popular in production for its realistic appearance. However, the prevailing level-of-detail solution employing hair cards for distant hair models introduces a significant discontinuity in dynamics and appearance during the transition from strands to cards. We introduce an innovative real-time framework for strand-based hair rendering that ensures…

    Submitted 17 May, 2024; originally announced May 2024.

    Comments: 12 pages, 10 figures, 1 performance plot

    ACM Class: I.3.5; I.3.3

  19. arXiv:2405.09285  [pdf, other]

    cs.LG math.NA

    Positional Knowledge is All You Need: Position-induced Transformer (PiT) for Operator Learning

    Authors: Junfeng Chen, Kailiang Wu

    Abstract: Operator learning for Partial Differential Equations (PDEs) is rapidly emerging as a promising approach for surrogate modeling of intricate systems. Transformers with the self-attention mechanism–a powerful tool originally designed for natural language processing–have recently been adapted for operator learning. However, they confront challenges, including high comp…

    Submitted 15 May, 2024; originally announced May 2024.

    Journal ref: Poster in International Conference on Machine Learning (ICML) 2024

  20. arXiv:2405.08276  [pdf, other]

    stat.ML cs.LG stat.CO

    Scalable Subsampling Inference for Deep Neural Networks

    Authors: Kejin Wu, Dimitris N. Politis

    Abstract: Deep neural networks (DNNs) have received increasing attention in machine learning applications in the last several years. Recently, a non-asymptotic error bound has been developed to measure the performance of the fully connected DNN estimator with ReLU activation functions for estimating regression models. The paper at hand gives a small improvement on the current error bound based on the latest r…

    Submitted 13 May, 2024; originally announced May 2024.

  21. arXiv:2405.06201  [pdf, other]

    cs.CV

    PhysMLE: Generalizable and Priors-Inclusive Multi-task Remote Physiological Measurement

    Authors: Jiyao Wang, Hao Lu, Ange Wang, Xiao Yang, Yingcong Chen, Dengbo He, Kaishun Wu

    Abstract: Remote photoplethysmography (rPPG) has been widely applied to measure heart rate from face videos. To increase the generalizability of the algorithms, domain generalization (DG) has attracted increasing attention in rPPG. However, when rPPG is extended to simultaneously measure more vital signs (e.g., respiration and blood oxygen saturation), achieving generalizability brings new challenges. Although…

    Submitted 9 May, 2024; originally announced May 2024.

  22. arXiv:2405.05663  [pdf, other]

    cs.CV

    RPBG: Towards Robust Neural Point-based Graphics in the Wild

    Authors: Qingtian Zhu, Zizhuang Wei, Zhongtian Zheng, Yifan Zhan, Zhuyu Yao, Jiawang Zhang, Kejian Wu, Yinqiang Zheng

    Abstract: Point-based representations have recently gained popularity in novel view synthesis, for their unique advantages, e.g., intuitive geometric representation, simple manipulation, and faster convergence. However, based on our observation, these point-based neural re-rendering methods are only expected to perform well under ideal conditions and suffer from noisy, patchy points and unbounded scenes, wh…

    Submitted 9 May, 2024; originally announced May 2024.

  23. arXiv:2405.04324  [pdf, other]

    cs.AI cs.CL cs.SE

    Granite Code Models: A Family of Open Foundation Models for Code Intelligence

    Authors: Mayank Mishra, Matt Stallone, Gaoyuan Zhang, Yikang Shen, Aditya Prasad, Adriana Meza Soria, Michele Merler, Parameswaran Selvam, Saptha Surendran, Shivdeep Singh, Manish Sethi, Xuan-Hong Dang, Pengyuan Li, Kun-Lung Wu, Syed Zawad, Andrew Coleman, Matthew White, Mark Lewis, Raju Pavuluri, Yan Koyfman, Boris Lublinsky, Maximilien de Bayser, Ibrahim Abdelaziz, Kinjal Basu, Mayank Agarwal, et al. (21 additional authors not shown)

    Abstract: Large Language Models (LLMs) trained on code are revolutionizing the software development process. Increasingly, code LLMs are being integrated into software development environments to improve the productivity of human programmers, and LLM-based agents are beginning to show promise for handling complex tasks autonomously. Realizing the full potential of code LLMs requires a wide range of capabili…

    Submitted 7 May, 2024; originally announced May 2024.

    Comments: Corresponding Authors: Rameswar Panda, Ruchir Puri; Equal Contributors: Mayank Mishra, Matt Stallone, Gaoyuan Zhang

  24. arXiv:2405.03728  [pdf, other]

    cs.NE cs.AI

    GLHF: General Learned Evolutionary Algorithm Via Hyper Functions

    Authors: Xiaobin Li, Kai Wu, Yujian Betterest Li, Xiaoyu Zhang, Handing Wang, Jing Liu

    Abstract: Pretrained Optimization Models (POMs) leverage knowledge gained from optimizing various tasks, providing efficient solutions for new optimization challenges through direct usage or fine-tuning. Despite the inefficiencies and limited generalization abilities observed in current POMs, our proposed model, the general pre-trained optimization model (GPOM), addresses these shortcomings. GPOM constructs…

    Submitted 6 May, 2024; originally announced May 2024.

  25. arXiv:2405.00566  [pdf, other]

    cs.CE cs.CL q-fin.GN

    NumLLM: Numeric-Sensitive Large Language Model for Chinese Finance

    Authors: Huan-Yi Su, Ke Wu, Yu-Hao Huang, Wu-Jun Li

    Abstract: Recently, many works have proposed various financial large language models (FinLLMs) by pre-training from scratch or fine-tuning open-sourced LLMs on financial corpora. However, existing FinLLMs exhibit unsatisfactory performance in understanding financial text when numeric variables are involved in questions. In this paper, we propose a novel LLM, called numeric-sensitive large language model (Nu…

    Submitted 1 May, 2024; originally announced May 2024.

  26. arXiv:2404.18891  [pdf, other]

    cs.CV cs.AI cs.LG

    IPixMatch: Boost Semi-supervised Semantic Segmentation with Inter-Pixel Relation

    Authors: Kebin Wu, Wenbin Li, Xiaofei Xiao

    Abstract: The scarcity of labeled data in real-world scenarios is a critical bottleneck of deep learning's effectiveness. Semi-supervised semantic segmentation has been a typical solution to achieve a desirable tradeoff between annotation cost and segmentation performance. However, previous approaches, whether based on consistency regularization or self-training, tend to neglect the contextual knowledge emb…

    Submitted 29 April, 2024; originally announced April 2024.

    Comments: 7 pages, 2 figures

  27. arXiv:2404.18327  [pdf, other]

    cs.CV

    MultiMAE-DER: Multimodal Masked Autoencoder for Dynamic Emotion Recognition

    Authors: Peihao Xiang, Chaohao Lin, Kaida Wu, Ou Bai

    Abstract: This paper presents a novel approach to processing multimodal data for dynamic emotion recognition, named as the Multimodal Masked Autoencoder for Dynamic Emotion Recognition (MultiMAE-DER). The MultiMAE-DER leverages the closely correlated representation information within spatiotemporal sequences across visual and audio modalities. By utilizing a pre-trained masked autoencoder model, the MultiMA…

    Submitted 16 May, 2024; v1 submitted 28 April, 2024; originally announced April 2024.

    Comments: Camera-ready Version, Accepted by ICPRS 2024

  28. arXiv:2404.16484  [pdf, other]

    cs.CV eess.IV

    Real-Time 4K Super-Resolution of Compressed AVIF Images. AIS 2024 Challenge Survey

    Authors: Marcos V. Conde, Zhijun Lei, Wen Li, Cosmin Stejerean, Ioannis Katsavounidis, Radu Timofte, Kihwan Yoon, Ganzorig Gankhuyag, Jiangtao Lv, Long Sun, Jinshan Pan, Jiangxin Dong, Jinhui Tang, Zhiyuan Li, Hao Wei, Chenyang Ge, Dongyang Zhang, Tianle Liu, Huaian Chen, Yi Jin, Menghan Zhou, Yiqiang Yan, Si Gao, Biao Wu, Shaoli Liu, et al. (50 additional authors not shown)

    Abstract: This paper introduces a novel benchmark as part of the AIS 2024 Real-Time Image Super-Resolution (RTSR) Challenge, which aims to upscale compressed images from 540p to 4K resolution (4x factor) in real-time on commercial GPUs. For this, we use a diverse test set containing a variety of 4K images ranging from digital art to gaming and photography. The images are compressed using the modern AVIF cod…

    Submitted 25 April, 2024; originally announced April 2024.

    Comments: CVPR 2024, AI for Streaming (AIS) Workshop

  29. arXiv:2404.13611  [pdf, other]

    cs.CV cs.CL

    Video sentence grounding with temporally global textual knowledge

    Authors: Cai Chen, Runzhong Zhang, Jianjun Gao, Kejun Wu, Kim-Hui Yap, Yi Wang

    Abstract: Temporal sentence grounding involves the retrieval of a video moment with a natural language query. Many existing works directly incorporate the given video and temporally localized query for temporal grounding, overlooking the inherent domain gap between different modalities. In this paper, we utilize pseudo-query features containing extensive temporally global textual knowledge sourced from the…

    Submitted 1 June, 2024; v1 submitted 21 April, 2024; originally announced April 2024.

  30. arXiv:2404.13059  [pdf, other]

    math.OC cs.CE cs.GR

    Regularization in Space-Time Topology Optimization for Multi-Axis Additive Manufacturing

    Authors: Weiming Wang, Kai Wu, Fred van Keulen, Jun Wu

    Abstract: In additive manufacturing, the fabrication sequence has a large influence on the quality of manufactured components. While planning of the fabrication sequence is typically performed after the component has been designed, recent developments have demonstrated the possibility and benefits of simultaneous optimization of both the structural layout and the corresponding fabrication sequence. The simu…

    Submitted 10 April, 2024; originally announced April 2024.

    Comments: 23 pages, 14 figures

  31. arXiv:2404.10198  [pdf, other]

    cs.CL cs.AI

    ClashEval: Quantifying the tug-of-war between an LLM's internal prior and external evidence

    Authors: Kevin Wu, Eric Wu, James Zou

    Abstract: Retrieval augmented generation (RAG) is frequently used to mitigate hallucinations and provide up-to-date knowledge for large language models (LLMs). However, given that document retrieval is an imprecise task and sometimes results in erroneous or even harmful content being presented in context, this raises the question of how LLMs handle retrieved information: If the provided content is incorrect…

    Submitted 10 June, 2024; v1 submitted 15 April, 2024; originally announced April 2024.

    Comments: Revised June 9 2024

  32. arXiv:2404.09857  [pdf, other]

    cs.CV cs.AI cs.RO

    Empowering Embodied Visual Tracking with Visual Foundation Models and Offline RL

    Authors: Fangwei Zhong, Kui Wu, Hai Ci, Churan Wang, Hao Chen

    Abstract: Embodied visual tracking is to follow a target object in dynamic 3D environments using an agent's egocentric vision. This is a vital and challenging skill for embodied agents. However, existing methods suffer from inefficient training and poor generalization. In this paper, we propose a novel framework that combines visual foundation models (VFM) and offline reinforcement learning (offline RL) to…

    Submitted 15 April, 2024; originally announced April 2024.

  33. arXiv:2404.08870  [pdf, other]

    cs.CC

    Almost Optimal Time Lower Bound for Approximating Parameterized Clique, CSP, and More, under ETH

    Authors: Venkatesan Guruswami, Bingkai Lin, Xuandi Ren, Yican Sun, Kewen Wu

    Abstract: The Parameterized Inapproximability Hypothesis (PIH), which is an analog of the PCP theorem in parameterized complexity, asserts that, there is a constant $\varepsilon> 0$ such that for any computable function $f:\mathbb{N}\to\mathbb{N}$, no $f(k)\cdot n^{O(1)}$-time algorithm can, on input a $k$-variable CSP instance with domain size $n$, find an assignment satisfying $1-\varepsilon$ fraction of…

    Submitted 11 June, 2024; v1 submitted 12 April, 2024; originally announced April 2024.

  34. arXiv:2404.07493  [pdf, other]

    cs.LG cs.AI

    Characterizing the Influence of Topology on Graph Learning Tasks

    Authors: Kailong Wu, Yule Xie, Jiaxin Ding, Yuxiang Ren, Luoyi Fu, Xinbing Wang, Chenghu Zhou

    Abstract: Graph neural networks (GNN) have achieved remarkable success in a wide range of tasks by encoding features combined with topology to create effective representations. However, the fundamental problem of understanding and analyzing how graph topology influences the performance of learning models on downstream tasks has not yet been well understood. In this paper, we propose a metric, TopoInf, which…

    Submitted 11 April, 2024; originally announced April 2024.

  35. arXiv:2404.06836  [pdf, other]

    cs.CV

    O2V-Mapping: Online Open-Vocabulary Mapping with Neural Implicit Representation

    Authors: Muer Tie, Julong Wei, Zhengjun Wang, Ke Wu, Shansuai Yuan, Kaizhao Zhang, Jie Jia, Jieru Zhao, Zhongxue Gan, Wenchao Ding

    Abstract: Online construction of open-ended language scenes is crucial for robotic applications, where open-vocabulary interactive scene understanding is required. Recently, neural implicit representation has provided a promising direction for online interactive mapping. However, implementing open-vocabulary scene understanding capability into online neural implicit mapping still faces three challenges: lac…

    Submitted 10 April, 2024; originally announced April 2024.

  36. arXiv:2404.05183  [pdf, other]

    cs.CV cs.LG

    Progressive Alignment with VLM-LLM Feature to Augment Defect Classification for the ASE Dataset

    Authors: Chih-Chung Hsu, Chia-Ming Lee, Chun-Hung Sun, Kuang-Ming Wu

    Abstract: Traditional defect classification approaches face two barriers. (1) Insufficient training data and unstable data quality. Collecting sufficient defective samples is expensive and time-consuming, consequently leading to dataset variance. This makes recognition and learning more difficult. (2) Over-dependence on visual modality. When the image pattern and texture is monotonic for all d…

    Submitted 8 April, 2024; originally announced April 2024.

    Comments: MULA 2024

  37. arXiv:2404.00893  [pdf, other]

    cs.RO

    An Integrating Comprehensive Trajectory Prediction with Risk Potential Field Method for Autonomous Driving

    Authors: Kailu Wu, Xing Liu, Feiyu Bian, Yizhai Zhang, Panfeng Huang

    Abstract: Due to the uncertainty of traffic participants' intentions, generating safe but not overly cautious behavior in interactive driving scenarios remains a formidable challenge for autonomous driving. In this paper, we address this issue by combining a deep learning-based trajectory prediction model with risk potential field-based motion planning. In order to comprehensively predict the possible futur…

    Submitted 31 March, 2024; originally announced April 2024.

  38. arXiv:2403.20159  [pdf, other]

    cs.CV

    HGS-Mapping: Online Dense Mapping Using Hybrid Gaussian Representation in Urban Scenes

    Authors: Ke Wu, Kaizhao Zhang, Zhiwei Zhang, Shanshuai Yuan, Muer Tie, Julong Wei, Zijun Xu, Jieru Zhao, Zhongxue Gan, Wenchao Ding

    Abstract: Online dense mapping of urban scenes forms a fundamental cornerstone for scene understanding and navigation of autonomous vehicles. Recent advancements in mapping methods are mainly based on NeRF, whose rendering speed is too slow to meet online requirements. 3D Gaussian Splatting (3DGS), with its rendering speed hundreds of times faster than NeRF, holds greater potential in online dense mapping.…

    Submitted 29 March, 2024; originally announced March 2024.

  39. arXiv:2403.18356  [pdf, other]

    cs.CV

    MonoHair: High-Fidelity Hair Modeling from a Monocular Video

    Authors: Keyu Wu, Lingchen Yang, Zhiyi Kuang, Yao Feng, Xutao Han, Yuefan Shen, Hongbo Fu, Kun Zhou, Youyi Zheng

    Abstract: Undoubtedly, high-fidelity 3D hair is crucial for achieving realism, artistic expression, and immersion in computer graphics. While existing 3D hair modeling methods have achieved impressive performance, the challenge of achieving high-quality hair reconstruction persists: they either require strict capture conditions, making practical applications difficult, or heavily rely on learned prior data,…

    Submitted 27 March, 2024; originally announced March 2024.

    Comments: Accepted by IEEE CVPR 2024

  40. arXiv:2403.17701   

    eess.IV cs.CV cs.LG

    Rotate to Scan: UNet-like Mamba with Triplet SSM Module for Medical Image Segmentation

    Authors: Hao Tang, Lianglun Cheng, Guoheng Huang, Zhengguang Tan, Junhao Lu, Kaihong Wu

    Abstract: Image segmentation holds a vital position in the realms of diagnosis and treatment within the medical domain. Traditional convolutional neural networks (CNNs) and Transformer models have made significant advancements in this realm, but they still encounter challenges because of limited receptive field or high computing complexity. Recently, State Space Models (SSMs), particularly Mamba and its var…

    Submitted 3 May, 2024; v1 submitted 26 March, 2024; originally announced March 2024.

    Comments: Experimental method encountered errors, undergoing experiment again

  41. arXiv:2403.16450  [pdf, other]

    cs.CV

    Camera-aware Label Refinement for Unsupervised Person Re-identification

    Authors: Pengna Li, Kangyi Wu, Wenli Huang, Sanping Zhou, Jinjun Wang

    Abstract: Unsupervised person re-identification aims to retrieve images of a specified person without identity labels. Many recent unsupervised Re-ID approaches adopt clustering-based methods to measure cross-camera feature similarity to roughly divide images into clusters. They ignore the feature distribution discrepancy induced by camera domain gap, resulting in the unavoidable performance degradation. Ca…

    Submitted 25 March, 2024; originally announced March 2024.

    Comments: submitted to IEEE TMM

  42. arXiv:2403.14922  [pdf, other]

    cs.LG cs.NI

    CODA: A COst-efficient Test-time Domain Adaptation Mechanism for HAR

    Authors: Minghui Qiu, Yandao Huang, Lin Chen, Lu Wang, Kaishun Wu

    Abstract: In recent years, emerging research on mobile sensing has led to novel scenarios that enhance daily life for humans, but dynamic usage conditions often result in performance degradation when systems are deployed in real-world settings. Existing solutions typically employ one-off adaptation schemes based on neural networks, which struggle to ensure robustness against uncertain drifting conditions in…

    Submitted 21 March, 2024; originally announced March 2024.

  43. arXiv:2403.12658  [pdf, other]

    cs.CV

    Tuning-Free Image Customization with Image and Text Guidance

    Authors: Pengzhi Li, Qiang Nie, Ying Chen, Xi Jiang, Kai Wu, Yuhuan Lin, Yong Liu, Jinlong Peng, Chengjie Wang, Feng Zheng

    Abstract: Despite significant advancements in image customization with diffusion models, current methods still have several limitations: 1) unintended changes in non-target areas when regenerating the entire image; 2) guidance solely by a reference image or text descriptions; and 3) time-consuming fine-tuning, which limits their practical application. In response, we introduce a tuning-free framework for si…

    Submitted 19 March, 2024; originally announced March 2024.

    Comments: 17 pages, 8 figures

  44. arXiv:2403.12543  [pdf, other]

    cs.CV

    HCPM: Hierarchical Candidates Pruning for Efficient Detector-Free Matching

    Authors: Ying Chen, Yong Liu, Kai Wu, Qiang Nie, Shang Xu, Huifang Ma, Bing Wang, Chengjie Wang

    Abstract: Deep learning-based image matching methods play a crucial role in computer vision, yet they often suffer from substantial computational demands. To tackle this challenge, we present HCPM, an efficient and detector-free local feature-matching method that employs hierarchical pruning to optimize the matching pipeline. In contrast to recent detector-free methods that depend on an exhaustive set of co…

    Submitted 19 March, 2024; originally announced March 2024.

  45. arXiv:2403.11536  [pdf, other]

    cs.CV cs.AI cs.LG

    OCR is All you need: Importing Multi-Modality into Image-based Defect Detection System

    Authors: Chih-Chung Hsu, Chia-Ming Lee, Chun-Hung Sun, Kuang-Ming Wu

    Abstract: Automatic optical inspection (AOI) plays a pivotal role in the manufacturing process, predominantly leveraging high-resolution imaging instruments for scanning purposes. It detects anomalies by analyzing image textures or patterns, making it an essential tool in industrial manufacturing and quality control. Despite its importance, the deployment of models for AOI often faces challenges. These incl…

    Submitted 18 March, 2024; originally announced March 2024.

  46. arXiv:2403.10033  [pdf, other]

    cs.CG

    Ipelets for the Convex Polygonal Geometry

    Authors: Nithin Parepally, Ainesh Chatterjee, Auguste Gezalyan, Hongyang Du, Sukrit Mangla, Kenny Wu, Sarah Hwang, David Mount

    Abstract: There are many structures, both classical and modern, involving convex polygonal geometries whose deeper understanding would be facilitated through interactive visualizations. The Ipe extensible drawing editor, developed by Otfried Cheong, is a widely used software system for generating geometric figures. One of its features is the capability to extend its functionality through programs called Ipe…

    Submitted 15 March, 2024; originally announced March 2024.

  47. arXiv:2403.06947  [pdf, other]

    cs.CV

    Advancing Generalizable Remote Physiological Measurement through the Integration of Explicit and Implicit Prior Knowledge

    Authors: Yuting Zhang, Hao Lu, Xin Liu, Yingcong Chen, Kaishun Wu

    Abstract: Remote photoplethysmography (rPPG) is a promising technology that captures physiological signals from face videos, with potential applications in medical health, emotional computing, and biosecurity recognition. The demand for rPPG tasks has expanded from demonstrating good performance on intra-dataset testing to cross-dataset testing (i.e., domain generalization). However, most existing methods h…

    Submitted 11 March, 2024; originally announced March 2024.

  48. arXiv:2402.18394  [pdf, other]

    cs.RO

    Dual-IMU State Estimation for Relative Localization of Two Mobile Agents

    Authors: Wenqian Lai, Ruonan Guo, Kejian J. Wu

    Abstract: In this paper, we address the problem of relative localization of two mobile agents. Specifically, we consider the Dual-IMU system, where each agent is equipped with one IMU, and employs relative pose observations between them. Previous works, however, typically assumed known ego motion and ignored biases of the IMUs. Instead, we study the most general case of unknown biases for both IMUs. Besides…

    Submitted 6 March, 2024; v1 submitted 28 February, 2024; originally announced February 2024.

  49. arXiv:2402.15237  [pdf, other]

    cs.CV cs.LG

    Unsupervised Domain Adaptation for Brain Vessel Segmentation through Transwarp Contrastive Learning

    Authors: Fengming Lin, Yan Xia, Michael MacRaild, Yash Deo, Haoran Dou, Qiongyao Liu, Kun Wu, Nishant Ravikumar, Alejandro F. Frangi

    Abstract: Unsupervised domain adaptation (UDA) aims to align the labelled source distribution with the unlabelled target distribution to obtain domain-invariant predictive models. Since cross-modality medical data exhibit significant intra and inter-domain shifts and most are unlabelled, UDA is more important while challenging in medical image analysis. This paper proposes a simple yet potent contrastive le…

    Submitted 23 February, 2024; originally announced February 2024.

    Comments: Accepted by ISBI 2024

  50. arXiv:2402.14278  [pdf, other]

    cs.CC cs.DS quant-ph

    Locality Bounds for Sampling Hamming Slices

    Authors: Daniel M. Kane, Anthony Ostuni, Kewen Wu

    Abstract: Spurred by the influential work of Viola (Journal of Computing 2012), the past decade has witnessed an active line of research into the complexity of (approximately) sampling distributions, in contrast to the traditional focus on the complexity of computing functions. We build upon and make explicit earlier implicit results of Viola to provide superconstant lower bounds on the locality of Boolea…

    Submitted 26 February, 2024; v1 submitted 21 February, 2024; originally announced February 2024.

    Comments: Minor updates to better reflect past literature. No technical material has been changed.