Showing 1–50 of 610 results for author: Han, Y

Searching in archive cs.
  1. arXiv:2407.00906  [pdf, other]

    cs.CV cs.LG

    GSO-YOLO: Global Stability Optimization YOLO for Construction Site Detection

    Authors: Yuming Zhang, Dongzhi Guan, Shouxin Zhang, Junhao Su, Yunzhi Han, Jiabin Liu

    Abstract: Safety issues at construction sites have long plagued the industry, posing risks to worker safety and causing economic damage due to potential hazards. With the advancement of artificial intelligence, particularly in the field of computer vision, the automation of safety monitoring on construction sites has emerged as a solution to this longstanding issue. Despite achieving impressive performance,…

    Submitted 30 June, 2024; originally announced July 2024.

  2. arXiv:2407.00548  [pdf, other]

    cs.RO

    KOROL: Learning Visualizable Object Feature with Koopman Operator Rollout for Manipulation

    Authors: Hongyi Chen, Abulikemu Abuduweili, Aviral Agrawal, Yunhai Han, Harish Ravichandar, Changliu Liu, Jeffrey Ichnowski

    Abstract: Learning dexterous manipulation skills presents significant challenges due to complex nonlinear dynamics that underlie the interactions between objects and multi-fingered hands. Koopman operators have emerged as a robust method for modeling such nonlinear dynamics within a linear framework. However, current methods rely on runtime access to ground-truth (GT) object states, making them unsuitable f…

    Submitted 29 June, 2024; originally announced July 2024.

  3. arXiv:2406.16500  [pdf, other]

    cs.NE

    A Dual-Channel Particle Swarm Optimization Algorithm Based on Adaptive Balance Search

    Authors: Zhenxing Zhang, Tianxian Zhang, Xiangliang Xu, Lingjiang Kong, Yi Han, Zicheng Wang

    Abstract: The balance between exploration (Er) and exploitation (Ei) determines the generalization performance of the particle swarm optimization (PSO) algorithm on different problems. Although the insufficient balance caused by global best being located near a local minimum has been widely researched, few scholars have systematically paid attention to two behaviors about personal best position (P) and glob…

    Submitted 25 June, 2024; v1 submitted 24 June, 2024; originally announced June 2024.

  4. arXiv:2406.11654  [pdf, other]

    cs.CL

    Ruby Teaming: Improving Quality Diversity Search with Memory for Automated Red Teaming

    Authors: Vernon Toh Yan Han, Rishabh Bhardwaj, Soujanya Poria

    Abstract: We propose Ruby Teaming, a method that improves on Rainbow Teaming by including a memory cache as its third dimension. The memory dimension provides cues to the mutator to yield better-quality prompts, both in terms of attack success rate (ASR) and quality diversity. The prompt archive generated by Ruby Teaming has an ASR of 74%, which is 20% higher than the baseline. In terms of quality diversity…

    Submitted 17 June, 2024; originally announced June 2024.

  5. arXiv:2406.10744   

    cs.CV

    Technique Report of CVPR 2024 PBDL Challenges

    Authors: Ying Fu, Yu Li, Shaodi You, Boxin Shi, Jose Alvarez, Coert van Gemeren, Linwei Chen, Yunhao Zou, Zichun Wang, Yichen Li, Yuze Han, Yingkai Zhang, Jianan Wang, Qinglin Liu, Wei Yu, Xiaoqian Lv, Jianing Li, Shengping Zhang, Xiangyang Ji, Yuanpei Chen, Yuhan Zhang, Weihang Peng, Liwen Zhang, Zhe Xu, Dingyong Gou, et al. (77 additional authors not shown)

    Abstract: The intersection of physics-based vision and deep learning presents an exciting frontier for advancing computer vision technologies. By leveraging the principles of physics to inform and enhance deep learning models, we can develop more robust and accurate vision systems. Physics-based vision aims to invert the processes to recover scene properties such as shape, reflectance, light distribution, a…

    Submitted 27 June, 2024; v1 submitted 15 June, 2024; originally announced June 2024.

    Comments: The author list and contents need to be verified by all authors

  6. arXiv:2406.09162  [pdf, other]

    cs.CV

    EMMA: Your Text-to-Image Diffusion Model Can Secretly Accept Multi-Modal Prompts

    Authors: Yucheng Han, Rui Wang, Chi Zhang, Juntao Hu, Pei Cheng, Bin Fu, Hanwang Zhang

    Abstract: Recent advancements in image generation have enabled the creation of high-quality images from text conditions. However, when facing multi-modal conditions, such as text combined with reference appearances, existing methods struggle to balance multiple conditions effectively, typically showing a preference for one modality over others. To address this challenge, we introduce EMMA, a novel image gen…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: https://tencentqqgylab.github.io/EMMA

  7. arXiv:2406.07111  [pdf, other]

    cs.CV

    NeRSP: Neural 3D Reconstruction for Reflective Objects with Sparse Polarized Images

    Authors: Yufei Han, Heng Guo, Koki Fukai, Hiroaki Santo, Boxin Shi, Fumio Okura, Zhanyu Ma, Yunpeng Jia

    Abstract: We present NeRSP, a Neural 3D reconstruction technique for Reflective surfaces with Sparse Polarized images. Reflective surface reconstruction is extremely challenging as specular reflections are view-dependent and thus violate the multiview consistency for multiview stereo. On the other hand, sparse image inputs, as a practical capture setting, commonly cause incomplete or distorted results due t…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: 10 pages

  8. arXiv:2406.06207  [pdf, other]

    cs.LG cs.CR

    Lurking in the shadows: Unveiling Stealthy Backdoor Attacks against Personalized Federated Learning

    Authors: Xiaoting Lyu, Yufei Han, Wei Wang, Jingkai Liu, Yongsheng Zhu, Guangquan Xu, Jiqiang Liu, Xiangliang Zhang

    Abstract: Federated Learning (FL) is a collaborative machine learning technique where multiple clients work together with a central server to train a global model without sharing their private data. However, the distribution shift across non-IID datasets of clients poses a challenge to this one-model-fits-all method hindering the ability of the global model to effectively adapt to each client's unique local…

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: Accepted by Usenix Security 2024

  9. arXiv:2406.05995  [pdf, other]

    cs.CL cs.AI cs.LG

    A Dual-View Approach to Classifying Radiology Reports by Co-Training

    Authors: Yutong Han, Yan Yuan, Lili Mou

    Abstract: Radiology report analysis provides valuable information that can aid with public health initiatives, and has been attracting increasing attention from the research community. In this work, we present a novel insight that the structure of a radiology report (namely, the Findings and Impression sections) offers different views of a radiology scan. Based on this intuition, we further propose a co-tra…

    Submitted 9 June, 2024; originally announced June 2024.

    Comments: Accepted by LREC-COLING 2024

  10. arXiv:2406.03706  [pdf, other]

    cs.SD cs.CL eess.AS

    Improving Audio Codec-based Zero-Shot Text-to-Speech Synthesis with Multi-Modal Context and Large Language Model

    Authors: Jinlong Xue, Yayue Deng, Yicheng Han, Yingming Gao, Ya Li

    Abstract: Recent advances in large language models (LLMs) and development of audio codecs greatly propel the zero-shot TTS. They can synthesize personalized speech with only a 3-second speech of an unseen speaker as acoustic prompt. However, they only support short speech prompts and cannot leverage longer context information, as required in audiobook and conversational TTS scenarios. In this paper, we intr…

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: Accepted by Interspeech 2024

  11. arXiv:2405.20969  [pdf, other]

    cs.RO eess.SY

    Design, Calibration, and Control of Compliant Force-sensing Gripping Pads for Humanoid Robots

    Authors: Yuanfeng Han, Boren Jiang, Gregory S. Chirikjian

    Abstract: This paper introduces a pair of low-cost, light-weight and compliant force-sensing gripping pads used for manipulating box-like objects with smaller-sized humanoid robots. These pads measure normal gripping forces and center of pressure (CoP). A calibration method is developed to improve the CoP measurement accuracy. A hybrid force-alignment-position control framework is proposed to regulate the g…

    Submitted 31 May, 2024; originally announced May 2024.

    Comments: 21 pages, 16 figures, Published in ASME Journal of Mechanisms and Robotics

    Journal ref: Journal of Mechanisms and Robotics, 15, 031010,2023

  12. arXiv:2405.20494  [pdf, other]

    cs.CV cs.AI cs.LG

    Slight Corruption in Pre-training Data Makes Better Diffusion Models

    Authors: Hao Chen, Yujin Han, Diganta Misra, Xiang Li, Kai Hu, Difan Zou, Masashi Sugiyama, Jindong Wang, Bhiksha Raj

    Abstract: Diffusion models (DMs) have shown remarkable capabilities in generating realistic high-quality images, audios, and videos. They benefit significantly from extensive pre-training on large-scale datasets, including web-crawled data with paired data and conditions, such as image-text and image-class pairs. Despite rigorous filtering, these pre-training datasets often inevitably contain corrupted pair…

    Submitted 30 May, 2024; originally announced May 2024.

    Comments: 50 pages, 33 figures, 4 tables

  13. arXiv:2405.19610  [pdf, other]

    stat.ML cs.LG stat.ME

    Factor Augmented Tensor-on-Tensor Neural Networks

    Authors: Guanhao Zhou, Yuefeng Han, Xiufan Yu

    Abstract: This paper studies the prediction task of tensor-on-tensor regression in which both covariates and responses are multi-dimensional arrays (a.k.a., tensors) across time with arbitrary tensor order and data dimension. Existing methods either focused on linear models without accounting for possibly nonlinear relationships between covariates and responses, or directly employed black-box deep learning…

    Submitted 29 May, 2024; originally announced May 2024.

  14. arXiv:2405.19033  [pdf, other]

    cs.LG cs.AI

    CiliaGraph: Enabling Expression-enhanced Hyper-Dimensional Computation in Ultra-Lightweight and One-Shot Graph Classification on Edge

    Authors: Yuxi Han, Jihe Wang, Danghui Wang

    Abstract: Graph Neural Networks (GNNs) are computationally demanding and inefficient when applied to graph classification tasks in resource-constrained edge scenarios due to their inherent process, involving multiple rounds of forward and backward propagation. As a lightweight alternative, Hyper-Dimensional Computing (HDC), which leverages high-dimensional vectors for data encoding and processing, offers a…

    Submitted 29 May, 2024; originally announced May 2024.

  15. arXiv:2405.18959  [pdf, other]

    cs.CV cs.MM

    Transcending Fusion: A Multi-Scale Alignment Method for Remote Sensing Image-Text Retrieval

    Authors: Rui Yang, Shuang Wang, Yingping Han, Yuanheng Li, Dong Zhao, Dou Quan, Yanhe Guo, Licheng Jiao

    Abstract: Remote Sensing Image-Text Retrieval (RSITR) is pivotal for knowledge services and data mining in the remote sensing (RS) domain. Considering the multi-scale representations in image content and text vocabulary can enable the models to learn richer representations and enhance retrieval. Current multi-scale RSITR approaches typically align multi-scale fused image features with text features, but ove…

    Submitted 29 May, 2024; originally announced May 2024.

    Comments: 16 pages, 9 figures

  16. arXiv:2405.18483  [pdf, other]

    cs.CV

    Towards Open Domain Text-Driven Synthesis of Multi-Person Motions

    Authors: Mengyi Shan, Lu Dong, Yutao Han, Yuan Yao, Tao Liu, Ifeoma Nwogu, Guo-Jun Qi, Mitch Hill

    Abstract: This work aims to generate natural and diverse group motions of multiple humans from textual descriptions. While single-person text-to-motion generation is extensively studied, it remains challenging to synthesize motions for more than one or two subjects from in-the-wild prompts, mainly due to the lack of available datasets. In this work, we curate human pose and motion datasets by estimating pos…

    Submitted 28 May, 2024; originally announced May 2024.

    Comments: Project page: https://shanmy.github.io/Multi-Motion/

  17. arXiv:2405.17984  [pdf, other]

    cs.LG

    Cross-Context Backdoor Attacks against Graph Prompt Learning

    Authors: Xiaoting Lyu, Yufei Han, Wei Wang, Hangwei Qian, Ivor Tsang, Xiangliang Zhang

    Abstract: Graph Prompt Learning (GPL) bridges significant disparities between pretraining and downstream applications to alleviate the knowledge transfer bottleneck in real-world graph learning. While GPL offers superior effectiveness in graph knowledge transfer and computational efficiency, the security risks posed by backdoor poisoning effects embedded in pretrained models remain largely unexplored. Our s…

    Submitted 28 May, 2024; originally announced May 2024.

    Comments: Accepted by KDD 2024

  18. arXiv:2405.17789  [pdf, ps, other]

    cs.IT

    On the Downlink Average Energy Efficiency of Non-Stationary XL-MIMO

    Authors: Jun Zhang, Jiacheng Lu, Jingjing Zhang, Yu Han, Jue Wang, Shi Jin

    Abstract: Extra large-scale multiple-input multiple-output (XL-MIMO) is a key technology for future wireless communication systems. This paper considers the effects of visibility region (VR) at the base station (BS) in a non-stationary multi-user XL-MIMO scenario, where only partial antennas can receive users' signal. In time division duplexing (TDD) mode, we first estimate the VR at the BS by detecting the…

    Submitted 29 May, 2024; v1 submitted 27 May, 2024; originally announced May 2024.

    Comments: 13 pages, 11 figures

  19. arXiv:2405.16605  [pdf, other]

    cs.CV

    Demystify Mamba in Vision: A Linear Attention Perspective

    Authors: Dongchen Han, Ziyi Wang, Zhuofan Xia, Yizeng Han, Yifan Pu, Chunjiang Ge, Jun Song, Shiji Song, Bo Zheng, Gao Huang

    Abstract: Mamba is an effective state space model with linear computation complexity. It has recently shown impressive efficiency in dealing with high-resolution inputs across various vision tasks. In this paper, we reveal that the powerful Mamba model shares surprising similarities with linear attention Transformer, which typically underperform conventional Transformer in practice. By exploring the similar…

    Submitted 26 May, 2024; originally announced May 2024.

  20. arXiv:2405.15474  [pdf, other]

    cs.LG cs.DC

    Unlearning during Learning: An Efficient Federated Machine Unlearning Method

    Authors: Hanlin Gu, Gongxi Zhu, Jie Zhang, Xinyuan Zhao, Yuxing Han, Lixin Fan, Qiang Yang

    Abstract: In recent years, Federated Learning (FL) has garnered significant attention as a distributed machine learning paradigm. To facilitate the implementation of the right to be forgotten, the concept of federated machine unlearning (FMU) has also emerged. However, current FMU approaches often involve additional time-consuming steps and may not offer comprehensive unlearning capabilities, which renders…

    Submitted 24 May, 2024; originally announced May 2024.

    Comments: Accepted by IJCAI 2024

  21. arXiv:2405.15343  [pdf, other]

    cs.CV

    Distinguish Any Fake Videos: Unleashing the Power of Large-scale Data and Motion Features

    Authors: Lichuan Ji, Yingqi Lin, Zhenhua Huang, Yan Han, Xiaogang Xu, Jiafei Wu, Chong Wang, Zhe Liu

    Abstract: The development of AI-Generated Content (AIGC) has empowered the creation of remarkably realistic AI-generated videos, such as those involving Sora. However, the widespread adoption of these models raises concerns regarding potential misuse, including face video scams and copyright disputes. Addressing these concerns requires the development of robust tools capable of accurately determining video…

    Submitted 24 May, 2024; originally announced May 2024.

  22. arXiv:2405.14212  [pdf, other]

    cs.CR cs.CL

    Federated Domain-Specific Knowledge Transfer on Large Language Models Using Synthetic Data

    Authors: Haoran Li, Xinyuan Zhao, Dadi Guo, Hanlin Gu, Ziqian Zeng, Yuxing Han, Yangqiu Song, Lixin Fan, Qiang Yang

    Abstract: As large language models (LLMs) demonstrate unparalleled performance and generalization ability, LLMs are widely used and integrated into various applications. When it comes to sensitive domains, as commonly described in federated learning scenarios, directly using external LLMs on private data is strictly prohibited by stringent data security and privacy regulations. For local clients, the utiliz…

    Submitted 23 May, 2024; originally announced May 2024.

  23. arXiv:2405.13694  [pdf, other]

    cs.CV

    Gaussian Time Machine: A Real-Time Rendering Methodology for Time-Variant Appearances

    Authors: Licheng Shen, Ho Ngai Chow, Lingyun Wang, Tong Zhang, Mengqiu Wang, Yuxing Han

    Abstract: Recent advancements in neural rendering techniques have significantly enhanced the fidelity of 3D reconstruction. Notably, the emergence of 3D Gaussian Splatting (3DGS) has marked a significant milestone by adopting a discrete scene representation, facilitating efficient training and real-time rendering. Several studies have successfully extended the real-time rendering capability of 3DGS to dynam…

    Submitted 22 May, 2024; originally announced May 2024.

    Comments: 14 pages, 6 figures

  24. arXiv:2405.13179  [pdf, other]

    cs.CL

    RAG-RLRC-LaySum at BioLaySumm: Integrating Retrieval-Augmented Generation and Readability Control for Layman Summarization of Biomedical Texts

    Authors: Yuelyu Ji, Zhuochun Li, Rui Meng, Sonish Sivarajkumar, Yanshan Wang, Zeshui Yu, Hui Ji, Yushui Han, Hanyu Zeng, Daqing He

    Abstract: This paper introduces the RAG-RLRC-LaySum framework, designed to make complex biomedical research understandable to laymen through advanced Natural Language Processing (NLP) techniques. Our Retrieval Augmented Generation (RAG) solution, enhanced by a reranking method, utilizes multiple knowledge sources to ensure the precision and pertinence of lay summaries. Additionally, our Reinforcement Learni…

    Submitted 24 June, 2024; v1 submitted 21 May, 2024; originally announced May 2024.

  25. arXiv:2405.12970  [pdf, other]

    cs.CV

    Face Adapter for Pre-Trained Diffusion Models with Fine-Grained ID and Attribute Control

    Authors: Yue Han, Junwei Zhu, Keke He, Xu Chen, Yanhao Ge, Wei Li, Xiangtai Li, Jiangning Zhang, Chengjie Wang, Yong Liu

    Abstract: Current face reenactment and swapping methods mainly rely on GAN frameworks, but recent focus has shifted to pre-trained diffusion models for their superior generation capabilities. However, training these models is resource-intensive, and the results have not yet achieved satisfactory performance levels. To address this issue, we introduce Face-Adapter, an efficient and effective adapter designed…

    Submitted 21 May, 2024; originally announced May 2024.

    Comments: Project Page: https://faceadapter.github.io/face-adapter.github.io/

  26. arXiv:2405.08768  [pdf, other]

    cs.CV cs.AI cs.LG

    EfficientTrain++: Generalized Curriculum Learning for Efficient Visual Backbone Training

    Authors: Yulin Wang, Yang Yue, Rui Lu, Yizeng Han, Shiji Song, Gao Huang

    Abstract: The superior performance of modern visual backbones usually comes with a costly training procedure. We contribute to this issue by generalizing the idea of curriculum learning beyond its original formulation, i.e., training models using easier-to-harder data. Specifically, we reformulate the training curriculum as a soft-selection function, which uncovers progressively more difficult patterns with…

    Submitted 14 May, 2024; originally announced May 2024.

    Comments: Accepted by IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI). Journal version of arXiv:2211.09703 (ICCV 2023). Code is available at: https://github.com/LeapLabTHU/EfficientTrain

  27. arXiv:2405.06959  [pdf, other]

    cs.RO

    AHPPEBot: Autonomous Robot for Tomato Harvesting based on Phenotyping and Pose Estimation

    Authors: Xingxu Li, Nan Ma, Yiheng Han, Shun Yang, Siyi Zheng

    Abstract: To address the limitations inherent to conventional automated harvesting robots, specifically their suboptimal success rates and risk of crop damage, we design a novel bot named AHPPEBot which is capable of autonomous harvesting based on crop phenotyping and pose estimation. Specifically, in phenotyping, the detection, association, and maturity estimation of tomato trusses and individual fruits are…

    Submitted 11 May, 2024; originally announced May 2024.

    Comments: Accepted by 2024 IEEE International Conference on Robotics and Automation (ICRA),7 pages, 3 figures

  28. arXiv:2405.04020  [pdf, other]

    cs.GT cs.DS

    Metric Distortion of Line-up Elections: The Right Person for the Right Job

    Authors: Christopher Jerrett, Yue Han, Elliot Anshelevich

    Abstract: We provide mechanisms and new metric distortion bounds for line-up elections. In such elections, a set of $n$ voters, $k$ candidates, and $\ell$ positions are all located in a metric space. The goal is to choose a set of candidates and assign them to different positions, so as to minimize the total cost of the voters. The cost of each voter consists of the distances from itself to the chosen candi…

    Submitted 7 May, 2024; originally announced May 2024.

  29. arXiv:2405.00797  [pdf, other]

    cs.RO cs.CV

    ADM: Accelerated Diffusion Model via Estimated Priors for Robust Motion Prediction under Uncertainties

    Authors: Jiahui Li, Tianle Shen, Zekai Gu, Jiawei Sun, Chengran Yuan, Yuhang Han, Shuo Sun, Marcelo H. Ang Jr

    Abstract: Motion prediction is a challenging problem in autonomous driving as it demands the system to comprehend stochastic dynamics and the multi-modal nature of real-world agent interactions. Diffusion models have recently risen to prominence, and have proven particularly effective in pedestrian motion prediction tasks. However, the significant time consumption and sensitivity to noise have limited the r…

    Submitted 1 May, 2024; originally announced May 2024.

    Comments: 7 pages, 4 figures

  30. arXiv:2405.00728  [pdf]

    cs.CL cs.AI cs.HC

    Evaluating the Application of ChatGPT in Outpatient Triage Guidance: A Comparative Study

    Authors: Dou Liu, Ying Han, Xiandi Wang, Xiaomei Tan, Di Liu, Guangwu Qian, Kang Li, Dan Pu, Rong Yin

    Abstract: The integration of Artificial Intelligence (AI) in healthcare presents a transformative potential for enhancing operational efficiency and health outcomes. Large Language Models (LLMs), such as ChatGPT, have shown their capabilities in supporting medical decision-making. Embedding LLMs in medical systems is becoming a promising trend in healthcare development. The potential of ChatGPT to address t…

    Submitted 27 April, 2024; originally announced May 2024.

    Comments: 8 pages, 1 figure, conference(International Ergonomics Association)

  31. arXiv:2405.00367  [pdf, other]

    cs.IR cs.AI cs.SD eess.AS

    Distance Sampling-based Paraphraser Leveraging ChatGPT for Text Data Manipulation

    Authors: Yoori Oh, Yoseob Han, Kyogu Lee

    Abstract: There has been growing interest in audio-language retrieval research, where the objective is to establish the correlation between audio and text modalities. However, most audio-text paired datasets often lack rich expression of the text data compared to the audio samples. One of the significant challenges facing audio-text datasets is the presence of similar or identical captions despite different…

    Submitted 1 May, 2024; originally announced May 2024.

    Comments: Accepted at SIGIR 2024 short paper track

  32. arXiv:2404.18392  [pdf, other]

    cs.DC

    Dflow, a Python framework for constructing cloud-native AI-for-Science workflows

    Authors: Xinzijian Liu, Yanbo Han, Zhuoyuan Li, Jiahao Fan, Chengqian Zhang, Jinzhe Zeng, Yifan Shan, Yannan Yuan, Wei-Hong Xu, Yun-Pei Liu, Yuzhi Zhang, Tongqi Wen, Darrin M. York, Zhicheng Zhong, Hang Zheng, Jun Cheng, Linfeng Zhang, Han Wang

    Abstract: In the AI-for-science era, scientific computing scenarios such as concurrent learning and high-throughput computing demand a new generation of infrastructure that supports scalable computing resources and automated workflow management on both cloud and high-performance supercomputers. Here we introduce Dflow, an open-source Python toolkit designed for scientists to construct workflows with simple…

    Submitted 28 April, 2024; originally announced April 2024.

  33. arXiv:2404.18166  [pdf, other]

    cs.IR

    Behavior-Contextualized Item Preference Modeling for Multi-Behavior Recommendation

    Authors: Mingshi Yan, Fan Liu, Jing Sun, Fuming Sun, Zhiyong Cheng, Yahong Han

    Abstract: In recommender systems, multi-behavior methods have demonstrated their effectiveness in mitigating issues like data sparsity, a common challenge in traditional single-behavior recommendation approaches. These methods typically infer user preferences from various auxiliary behaviors and apply them to the target behavior for recommendations. However, this direct transfer can introduce noise to the t…

    Submitted 28 April, 2024; originally announced April 2024.

    Comments: This paper has been accepted by SIGIR 2024

  34. arXiv:2404.15454  [pdf, ps, other]

    math.ST cs.IT

    Prediction from compression for models with infinite memory, with applications to hidden Markov and renewal processes

    Authors: Yanjun Han, Tianze Jiang, Yihong Wu

    Abstract: Consider the problem of predicting the next symbol given a sample path of length n, whose joint distribution belongs to a distribution class that may have long-term memory. The goal is to compete with the conditional predictor that knows the true model. For both hidden Markov models (HMMs) and renewal processes, we determine the optimal prediction risk in Kullback-Leibler divergence up to univers…

    Submitted 23 April, 2024; originally announced April 2024.

    Comments: 37 Pages

  35. arXiv:2404.13815  [pdf, other]

    cs.LG

    Improving Group Robustness on Spurious Correlation Requires Preciser Group Inference

    Authors: Yujin Han, Difan Zou

    Abstract: Standard empirical risk minimization (ERM) models may prioritize learning spurious correlations between spurious features and true labels, leading to poor accuracy on groups where these correlations do not hold. Mitigating this issue often requires expensive spurious attribute (group) labels or relies on trained ERM models to infer group labels when group information is unavailable. However, the s…

    Submitted 3 June, 2024; v1 submitted 21 April, 2024; originally announced April 2024.

    Comments: 25 pages, 13 figures, 8 tables

  36. arXiv:2404.10304  [pdf, other]

    cs.SE cs.LG

    LLM-Powered Test Case Generation for Detecting Tricky Bugs

    Authors: Kaibo Liu, Yiyang Liu, Zhenpeng Chen, Jie M. Zhang, Yudong Han, Yun Ma, Ge Li, Gang Huang

    Abstract: Conventional automated test generation tools struggle to generate test oracles and tricky bug-revealing test inputs. Large Language Models (LLMs) can be prompted to produce test inputs and oracles for a program directly, but the precision of the tests can be very low for complex scenarios (only 6.3% based on our experiments). To fill this gap, this paper proposes AID, which combines LLMs with diff…

    Submitted 16 April, 2024; originally announced April 2024.

  37. arXiv:2404.10295  [pdf, other]

    cs.RO

    ControlMTR: Control-Guided Motion Transformer with Scene-Compliant Intention Points for Feasible Motion Prediction

    Authors: Jiawei Sun, Chengran Yuan, Shuo Sun, Shanze Wang, Yuhang Han, Shuailei Ma, Zefan Huang, Anthony Wong, Keng Peng Tee, Marcelo H. Ang Jr

    Abstract: The ability to accurately predict feasible multimodal future trajectories of surrounding traffic participants is crucial for behavior planning in autonomous vehicles. The Motion Transformer (MTR), a state-of-the-art motion prediction method, alleviated mode collapse and instability during training and enhanced overall prediction performance by replacing conventional dense future endpoints with a s…

    Submitted 17 April, 2024; v1 submitted 16 April, 2024; originally announced April 2024.

  38. arXiv:2404.10122  [pdf, other]

    stat.ML cs.LG math.ST

    Online Estimation via Offline Estimation: An Information-Theoretic Framework

    Authors: Dylan J. Foster, Yanjun Han, Jian Qian, Alexander Rakhlin

    Abstract: $…

    Submitted 15 April, 2024; originally announced April 2024.

  39. arXiv:2404.09515  [pdf, other]

    cs.CV

    Revealing the structure-property relationships of copper alloys with FAGC

    Authors: Yuexing Han, Guanxin Wan, Tao Han, Bing Wang, Yi Liu

    Abstract: Understanding how the structure of materials affects their properties is a cornerstone of materials science and engineering. However, traditional methods have struggled to accurately describe the quantitative structure-property relationships for complex structures. In our study, we bridge this gap by leveraging machine learning to analyze images of materials' microstructures, thus offering a novel…

    Submitted 18 April, 2024; v1 submitted 15 April, 2024; originally announced April 2024.

  40. arXiv:2404.05583  [pdf, other]

    cs.CV

    Towards More General Video-based Deepfake Detection through Facial Feature Guided Adaptation for Foundation Model

    Authors: Yue-Hua Han, Tai-Ming Huang, Shu-Tzu Lo, Po-Han Huang, Kai-Lung Hua, Jun-Cheng Chen

    Abstract: With the rise of deep learning, generative models have enabled the creation of highly realistic synthetic images, presenting challenges due to their potential misuse. While research in Deepfake detection has grown rapidly in response, many detection methods struggle with unseen Deepfakes generated by new synthesis techniques. To address this generalisation challenge, we propose a novel Deepfake de…

    Submitted 5 June, 2024; v1 submitted 8 April, 2024; originally announced April 2024.

  41. arXiv:2404.05582  [pdf, other]

    cs.RO

    Learning Prehensile Dexterity by Imitating and Emulating State-only Observations

    Authors: Yunhai Han, Zhenyang Chen, Harish Ravichandar

    Abstract: When humans acquire physical skills (e.g., tennis) from experts, we tend to first learn from merely observing the expert. But this is often insufficient. We then engage in practice, where we try to emulate the expert and ensure that our actions produce similar effects on our environment. Inspired by this observation, we introduce Combining IMitation and Emulation for Motion Refinement (CIMER) -- a…

    Submitted 12 April, 2024; v1 submitted 8 April, 2024; originally announced April 2024.

    Comments: Under review by RA-L

  42. arXiv:2404.00268  [pdf, other]

    cs.IR

    A Unified Framework for Adaptive Representation Enhancement and Inversed Learning in Cross-Domain Recommendation

    Authors: Luankang Zhang, Hao Wang, Suojuan Zhang, Mingjia Yin, Yongqiang Han, Jiaqing Zhang, Defu Lian, Enhong Chen

    Abstract: Cross-domain recommendation (CDR), aiming to extract and transfer knowledge across domains, has attracted wide attention for its efficacy in addressing data sparsity and cold-start problems. Despite significant advances in representation disentanglement to capture diverse user preferences, existing methods usually neglect representation enhancement and lack rigorous decoupling constraints, thereby…

    Submitted 30 March, 2024; originally announced April 2024.

    Comments: Accepted by DASFAA 2024

  43. arXiv:2403.18188  [pdf]

    cs.CY

    Integrating urban digital twins with cloud-based geospatial dashboards for coastal resilience planning: A case study in Florida

    Authors: Changjie Chen, Yu Han, Andrea Galinski, Christian Calle, Jeffery Carney, Xinyue Ye, Cees van Westen

    Abstract: Coastal communities are confronted with a growing incidence of climate-induced flooding, necessitating adaptation measures for resilience. In this paper, we introduce a framework that integrates an urban digital twin with a geospatial dashboard to allow visualization of the vulnerabilities within critical infrastructure across a range of spatial and temporal scales. The synergy between these two t…

    Submitted 26 March, 2024; originally announced March 2024.

  44. arXiv:2403.17664  [pdf, other]

    cs.CV

    DiffFAE: Advancing High-fidelity One-shot Facial Appearance Editing with Space-sensitive Customization and Semantic Preservation

    Authors: Qilin Wang, Jiangning Zhang, Chengming Xu, Weijian Cao, Ying Tai, Yue Han, Yanhao Ge, Hong Gu, Chengjie Wang, Yanwei Fu

    Abstract: Facial Appearance Editing (FAE) aims to modify physical attributes, such as pose, expression and lighting, of human facial images while preserving attributes like identity and background, and is of great importance in photography. Despite great progress in this area, current research generally faces three challenges: low generation fidelity, poor attribute preservation, and inefficient infer…

    Submitted 26 March, 2024; originally announced March 2024.

  45. arXiv:2403.17603  [pdf, other]

    cs.IR

    END4Rec: Efficient Noise-Decoupling for Multi-Behavior Sequential Recommendation

    Authors: Yongqiang Han, Hao Wang, Kefan Wang, Likang Wu, Zhi Li, Wei Guo, Yong Liu, Defu Lian, Enhong Chen

    Abstract: In recommendation systems, users frequently engage in multiple types of behaviors, such as clicking, adding to a cart, and purchasing. However, with diversified behavior data, user behavior sequences become very long within a short time, which challenges the efficiency of sequential recommendation models. Meanwhile, some behavior data also introduces inevitable noise into the modeling…

    Submitted 26 March, 2024; originally announced March 2024.

  46. arXiv:2403.16110  [pdf, other]

    cs.DB

    ByteCard: Enhancing ByteDance's Data Warehouse with Learned Cardinality Estimation

    Authors: Yuxing Han, Haoyu Wang, Lixiang Chen, Yifeng Dong, Xing Chen, Benquan Yu, Chengcheng Yang, Weining Qian

    Abstract: Cardinality estimation is a critical component and a longstanding challenge in modern data warehouses. ByteHouse, ByteDance's cloud-native engine for extensive data analysis in exabyte-scale environments, serves numerous internal decision-making business scenarios. With the increasing demand for ByteHouse, cardinality estimation becomes the bottleneck for efficiently processing queries. Specifical…

    Submitted 11 April, 2024; v1 submitted 24 March, 2024; originally announced March 2024.

  47. arXiv:2403.13315  [pdf, other]

    cs.CV

    PuzzleVQA: Diagnosing Multimodal Reasoning Challenges of Language Models with Abstract Visual Patterns

    Authors: Yew Ken Chia, Vernon Toh Yan Han, Deepanway Ghosal, Lidong Bing, Soujanya Poria

    Abstract: Large multimodal models extend the impressive capabilities of large language models by integrating multimodal understanding abilities. However, it is not clear how they can emulate the general intelligence and reasoning ability of humans. As recognizing patterns and abstracting concepts are key to general intelligence, we introduce PuzzleVQA, a collection of puzzles based on abstract patterns. Wit…

    Submitted 30 April, 2024; v1 submitted 20 March, 2024; originally announced March 2024.

  48. arXiv:2403.12982  [pdf]

    cond-mat.mtrl-sci cs.LG physics.chem-ph

    Knowledge-Reuse Transfer Learning Methods in Molecular and Material Science

    Authors: An Chen, Zhilong Wang, Karl Luigi Loza Vidaurre, Yanqiang Han, Simin Ye, Kehao Tao, Shiwei Wang, **g Gao, **** Li

    Abstract: Molecules and materials are the foundation for the development of modern advanced industries such as energy storage systems and semiconductor devices. However, traditional trial-and-error methods or theoretical calculations are highly resource-intensive, and extremely long R&D (Research and Development) periods cannot meet the urgent need for molecules/materials in industrial development. Machine…

    Submitted 2 March, 2024; originally announced March 2024.

    Comments: 42 pages, 10 figures

  49. arXiv:2403.11808  [pdf, other]

    cs.CV

    Dynamic Tuning Towards Parameter and Inference Efficiency for ViT Adaptation

    Authors: Wangbo Zhao, Jiasheng Tang, Yizeng Han, Yibing Song, Kai Wang, Gao Huang, Fan Wang, Yang You

    Abstract: Existing parameter-efficient fine-tuning (PEFT) methods have achieved significant success in adapting vision transformers (ViTs) by improving parameter efficiency. However, enhancing inference efficiency during adaptation remains underexplored. This limits the broader application of pre-trained ViT models, especially when the model is computationally extensive. In this paper,…

    Submitted 18 March, 2024; originally announced March 2024.

  50. arXiv:2403.11544  [pdf, ps, other]

    cs.LG

    RL in Markov Games with Independent Function Approximation: Improved Sample Complexity Bound under the Local Access Model

    Authors: Junyi Fan, Yuxuan Han, Jialin Zeng, Jian-Feng Cai, Yang Wang, Yang Xiang, Jiheng Zhang

    Abstract: Efficiently learning equilibria with large state and action spaces in general-sum Markov games while overcoming the curse of multi-agency is a challenging problem. Recent works have attempted to solve this problem by employing independent linear function classes to approximate the marginal $Q$-value for each agent. However, existing sample complexity bounds under such a framework have a suboptimal…

    Submitted 19 March, 2024; v1 submitted 18 March, 2024; originally announced March 2024.

    Comments: Accepted at the 27th International Conference on Artificial Intelligence and Statistics (AISTATS 2024)