Showing 1–50 of 550 results for author: Su, H

Searching in archive cs.
  1. arXiv:2407.00979  [pdf, other]

    cs.CV

    Cross-Modal Attention Alignment Network with Auxiliary Text Description for zero-shot sketch-based image retrieval

    Authors: Hanwen Su, Ge Song, Kai Huang, Jiyan Wang, Ming Yang

    Abstract: In this paper, we study the problem of zero-shot sketch-based image retrieval (ZS-SBIR). The prior methods tackle the problem in a two-modality setting with only category labels or even no textual information involved. However, the growing prevalence of Large-scale pre-trained Language Models (LLMs), which have demonstrated great knowledge learned from web-scale data, can provide us with an opport…

    Submitted 1 July, 2024; originally announced July 2024.

  2. arXiv:2407.00908  [pdf, other]

    cs.CL cs.AI

    FineSurE: Fine-grained Summarization Evaluation using LLMs

    Authors: Hwanjun Song, Hang Su, Igor Shalyminov, Jason Cai, Saab Mansour

    Abstract: Automated evaluation is crucial for streamlining text summarization benchmarking and model development, given the costly and time-consuming nature of human evaluation. Traditional methods like ROUGE do not correlate well with human judgment, while recently proposed LLM-based metrics provide only summary-level assessment using Likert-scale scores. This limits deeper model analysis, e.g., we can onl…

    Submitted 30 June, 2024; originally announced July 2024.

    Comments: Accepted at ACL 2024 (main, long)

  3. arXiv:2406.19931  [pdf, other]

    cs.LG cs.AI

    Decoupling General and Personalized Knowledge in Federated Learning via Additive and Low-Rank Decomposition

    Authors: Xinghao Wu, Xuefeng Liu, Jianwei Niu, Haolin Wang, Shaojie Tang, Guogang Zhu, Hao Su

    Abstract: To address data heterogeneity, the key strategy of Personalized Federated Learning (PFL) is to decouple general knowledge (shared among clients) and client-specific knowledge, as the latter can have a negative impact on collaboration if not removed. Existing PFL methods primarily adopt a parameter partitioning approach, where the parameters of a model are designated as one of two types: parameters…

    Submitted 28 June, 2024; originally announced June 2024.

    Comments: 12 pages, 8 figures

  4. arXiv:2406.17741  [pdf, other]

    cs.CV cs.AI

    Point-SAM: Promptable 3D Segmentation Model for Point Clouds

    Authors: Yuchen Zhou, Jiayuan Gu, Tung Yen Chiang, Fanbo Xiang, Hao Su

    Abstract: The development of 2D foundation models for image segmentation has been significantly advanced by the Segment Anything Model (SAM). However, achieving similar success in 3D models remains a challenge due to issues such as non-unified data formats, lightweight models, and the scarcity of labeled data with diverse masks. To this end, we propose a 3D promptable segmentation model (Point-SAM) focusing…

    Submitted 25 June, 2024; originally announced June 2024.

  5. arXiv:2406.16588  [pdf, other]

    eess.SY cs.FL

    Switching Controller Synthesis for Hybrid Systems Against STL Formulas

    Authors: Han Su, Shenghua Feng, Sinong Zhan, Naijun Zhan

    Abstract: Switching controllers play a pivotal role in directing hybrid systems (HSs) towards the desired objective, embodying a "correct-by-construction" approach to HS design. Identifying these objectives is thus crucial for the synthesis of effective switching controllers. While most existing works focus on safety and liveness, few of them consider timing constraints. In this paper, we delve into t…

    Submitted 24 June, 2024; originally announced June 2024.

  6. arXiv:2406.13009  [pdf, other]

    cs.CL cs.AI

    Detecting Errors through Ensembling Prompts (DEEP): An End-to-End LLM Framework for Detecting Factual Errors

    Authors: Alex Chandler, Devesh Surve, Hui Su

    Abstract: Accurate text summarization is one of the most common and important tasks performed by Large Language Models, where the costs of human review for an entire document may be high, but the costs of errors in summarization may be even greater. We propose Detecting Errors through Ensembling Prompts (DEEP) - an end-to-end large language model framework for detecting factual errors in text summarization.…

    Submitted 18 June, 2024; originally announced June 2024.

  7. arXiv:2406.10923  [pdf, other]

    cs.CV cs.CL cs.LG

    Investigating Video Reasoning Capability of Large Language Models with Tropes in Movies

    Authors: Hung-Ting Su, Chun-Tong Chao, Ya-Ching Hsu, Xudong Lin, Yulei Niu, Hung-Yi Lee, Winston H. Hsu

    Abstract: Large Language Models (LLMs) have demonstrated effectiveness not only in language tasks but also in video reasoning. This paper introduces a novel dataset, Tropes in Movies (TiM), designed as a testbed for exploring two critical yet previously overlooked video reasoning skills: (1) Abstract Perception: understanding and tokenizing abstract concepts in videos, and (2) Long-range Compositional Reaso…

    Submitted 16 June, 2024; originally announced June 2024.

    Comments: Project page: https://ander1119.github.io/TiM

  8. arXiv:2406.07057  [pdf, other]

    cs.CL cs.AI cs.CV cs.LG

    Benchmarking Trustworthiness of Multimodal Large Language Models: A Comprehensive Study

    Authors: Yichi Zhang, Yao Huang, Yitong Sun, Chang Liu, Zhe Zhao, Zhengwei Fang, Yifan Wang, Huanran Chen, Xiao Yang, Xingxing Wei, Hang Su, Yinpeng Dong, Jun Zhu

    Abstract: Despite the superior capabilities of Multimodal Large Language Models (MLLMs) across diverse tasks, they still face significant trustworthiness challenges. Yet, current literature on the assessment of trustworthy MLLMs remains limited, lacking a holistic evaluation to offer thorough insights into future improvements. In this work, we establish MultiTrust, the first comprehensive and unified benchm…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: 100 pages, 84 figures, 33 tables

  9. arXiv:2406.06464  [pdf, other]

    cs.AI cs.CL

    Transforming Wearable Data into Health Insights using Large Language Model Agents

    Authors: Mike A. Merrill, Akshay Paruchuri, Naghmeh Rezaei, Geza Kovacs, Javier Perez, Yun Liu, Erik Schenck, Nova Hammerquist, Jake Sunshine, Shyam Tailor, Kumar Ayush, Hao-Wei Su, Qian He, Cory Y. McLean, Mark Malhotra, Shwetak Patel, Jiening Zhan, Tim Althoff, Daniel McDuff, Xin Liu

    Abstract: Despite the proliferation of wearable health trackers and the importance of sleep and exercise to health, deriving actionable personalized insights from wearable data remains a challenge because doing so requires non-trivial open-ended analysis of these data. The recent rise of large language model (LLM) agents, which can use tools to reason about and interact with the world, presents a promising…

    Submitted 11 June, 2024; v1 submitted 10 June, 2024; originally announced June 2024.

    Comments: 38 pages

  10. arXiv:2406.05588  [pdf, other]

    cs.CL cs.AI cs.LG

    CERET: Cost-Effective Extrinsic Refinement for Text Generation

    Authors: Jason Cai, Hang Su, Monica Sunkara, Igor Shalyminov, Saab Mansour

    Abstract: Large Language Models (LLMs) are powerful models for generation tasks, but they may not generate good quality outputs in their first attempt. Apart from model fine-tuning, existing approaches to improve prediction accuracy and quality typically involve LLM self-improvement / self-reflection that incorporate feedback from models themselves. Despite their effectiveness, these methods are hindered by…

    Submitted 8 June, 2024; originally announced June 2024.

    Comments: The source code and data samples are released at https://github.com/amazon-science/CERET-LLM-refine

  11. arXiv:2406.02925  [pdf, other]

    eess.AS cs.AI cs.CL cs.LG cs.SD

    Task Arithmetic can Mitigate Synthetic-to-Real Gap in Automatic Speech Recognition

    Authors: Hsuan Su, Hua Farn, Fan-Yun Sun, Shang-Tse Chen, Hung-yi Lee

    Abstract: Synthetic data is widely used in speech recognition due to the availability of text-to-speech models, which facilitate adapting models to previously unseen text domains. However, existing methods suffer in performance when they fine-tune an automatic speech recognition (ASR) model on synthetic data as they suffer from the distributional shift commonly referred to as the synthetic-to-real gap. In t…

    Submitted 15 June, 2024; v1 submitted 5 June, 2024; originally announced June 2024.

  12. arXiv:2405.19885  [pdf, other]

    cs.LG cs.RO

    Fourier Controller Networks for Real-Time Decision-Making in Embodied Learning

    Authors: Hengkai Tan, Songming Liu, Kai Ma, Chengyang Ying, Xingxing Zhang, Hang Su, Jun Zhu

    Abstract: Transformer has shown promise in reinforcement learning to model time-varying features for obtaining generalized low-level robot policies on diverse robotics datasets in embodied learning. However, it still suffers from the issues of low data efficiency and high inference latency. In this paper, we propose to investigate the task from a new perspective of the frequency domain. We first observe tha…

    Submitted 5 June, 2024; v1 submitted 30 May, 2024; originally announced May 2024.

  13. arXiv:2405.19802  [pdf, other]

    cs.MM

    Exploring the Robustness of Decision-Level Through Adversarial Attacks on LLM-Based Embodied Models

    Authors: Shuyuan Liu, Jiawei Chen, Shouwei Ruan, Hang Su, Zhaoxia Yin

    Abstract: Embodied intelligence empowers agents with a profound sense of perception, enabling them to respond in a manner closely aligned with real-world situations. Large Language Models (LLMs) delve into language instructions with depth, serving a crucial role in generating plans for intricate tasks. Thus, LLM-based embodied models further enhance the agent's capacity to comprehend and process information…

    Submitted 3 June, 2024; v1 submitted 30 May, 2024; originally announced May 2024.

  14. arXiv:2405.19789  [pdf, other]

    cs.LG cs.DC

    Estimating before Debiasing: A Bayesian Approach to Detaching Prior Bias in Federated Semi-Supervised Learning

    Authors: Guogang Zhu, Xuefeng Liu, Xinghao Wu, Shaojie Tang, Chao Tang, Jianwei Niu, Hao Su

    Abstract: Federated Semi-Supervised Learning (FSSL) leverages both labeled and unlabeled data on clients to collaboratively train a model. In FSSL, the heterogeneous data can introduce prediction bias into the model, causing the model's prediction to skew towards certain classes. Existing FSSL methods primarily tackle this issue by enhancing consistency in model parameters or outputs. However, as the mo…

    Submitted 30 May, 2024; originally announced May 2024.

    Comments: Accepted by IJCAI 2024

  15. arXiv:2405.19668  [pdf, other]

    cs.CV

    AutoBreach: Universal and Adaptive Jailbreaking with Efficient Wordplay-Guided Optimization

    Authors: Jiawei Chen, Xiao Yang, Zhengwei Fang, Yu Tian, Yinpeng Dong, Zhaoxia Yin, Hang Su

    Abstract: Despite the widespread application of large language models (LLMs) across various tasks, recent studies indicate that they are susceptible to jailbreak attacks, which can render their defense mechanisms ineffective. However, previous jailbreak research has frequently been constrained by limited universality, suboptimal efficiency, and a reliance on manual crafting. In response, we rethink the appr…

    Submitted 29 May, 2024; originally announced May 2024.

    Comments: Under review

  16. arXiv:2405.18418  [pdf, other]

    cs.LG cs.CV cs.RO

    Hierarchical World Models as Visual Whole-Body Humanoid Controllers

    Authors: Nicklas Hansen, Jyothir S V, Vlad Sobal, Yann LeCun, Xiaolong Wang, Hao Su

    Abstract: Whole-body control for humanoids is challenging due to the high-dimensional nature of the problem, coupled with the inherent instability of a bipedal morphology. Learning from visual observations further exacerbates this difficulty. In this work, we explore highly data-driven approaches to visual whole-body humanoid control based on reinforcement learning, without any simplifying assumptions, rewa…

    Submitted 31 May, 2024; v1 submitted 28 May, 2024; originally announced May 2024.

    Comments: Code and videos at https://nicklashansen.com/rlpuppeteer

  17. arXiv:2405.17509  [pdf, other]

    cs.LG

    Reference Neural Operators: Learning the Smooth Dependence of Solutions of PDEs on Geometric Deformations

    Authors: Ze Cheng, Zhongkai Hao, Xiaoqiang Wang, Jianing Huang, Youjia Wu, Xudan Liu, Yiru Zhao, Songming Liu, Hang Su

    Abstract: For partial differential equations on domains of arbitrary shapes, existing works of neural operators attempt to learn a mapping from geometries to solutions. It often requires a large dataset of geometry-solution pairs in order to obtain a sufficiently accurate neural operator. However, for many industrial applications, e.g., engineering design optimization, it can be prohibitive to satisfy the r…

    Submitted 27 May, 2024; originally announced May 2024.

  18. arXiv:2405.17507  [pdf, other]

    cs.LG cs.AI cs.NI

    Enhancing Sustainable Urban Mobility Prediction with Telecom Data: A Spatio-Temporal Framework Approach

    Authors: ChungYi Lin, Shen-Lung Tung, Hung-Ting Su, Winston H. Hsu

    Abstract: Traditional traffic prediction, limited by the scope of sensor data, falls short in comprehensive traffic management. Mobile networks offer a promising alternative using network activity counts, but these lack crucial directionality. Thus, we present the TeltoMob dataset, featuring undirected telecom counts and corresponding directional flows, to predict directional mobility flows on roadways. To…

    Submitted 26 May, 2024; originally announced May 2024.

    Comments: 8 Figures, 5 Tables. Just accepted by IJCAI (to appear)

  19. arXiv:2405.16262  [pdf, other]

    cs.LG

    Layer-Aware Analysis of Catastrophic Overfitting: Revealing the Pseudo-Robust Shortcut Dependency

    Authors: Runqi Lin, Chaojian Yu, Bo Han, Hang Su, Tongliang Liu

    Abstract: Catastrophic overfitting (CO) presents a significant challenge in single-step adversarial training (AT), manifesting as highly distorted deep neural networks (DNNs) that are vulnerable to multi-step adversarial attacks. However, the underlying factors that lead to the distortion of decision boundaries remain unclear. In this work, we delve into the specific changes within different DNN layers and…

    Submitted 25 May, 2024; originally announced May 2024.

  20. arXiv:2405.15056  [pdf, other]

    cs.LG cs.CV cs.GR

    ElastoGen: 4D Generative Elastodynamics

    Authors: Yutao Feng, Yintong Shang, Xiang Feng, Lei Lan, Shandian Zhe, Tianjia Shao, Hongzhi Wu, Kun Zhou, Hao Su, Chenfanfu Jiang, Yin Yang

    Abstract: We present ElastoGen, a knowledge-driven model that generates physically accurate and coherent 4D elastodynamics. Instead of relying on petabyte-scale data-driven learning, ElastoGen leverages the principles of physics-in-the-loop and learns from established physical knowledge, such as partial differential equations and their numerical solutions. The core idea of ElastoGen is converting the global…

    Submitted 23 May, 2024; originally announced May 2024.

  21. arXiv:2405.14800  [pdf, other]

    cs.CR cs.CV

    Membership Inference on Text-to-Image Diffusion Models via Conditional Likelihood Discrepancy

    Authors: Shengfang Zhai, Huanran Chen, Yinpeng Dong, Jiajun Li, Qingni Shen, Yansong Gao, Hang Su, Yang Liu

    Abstract: Text-to-image diffusion models have achieved tremendous success in the field of controllable image generation, while also coming along with issues of privacy leakage and data copyrights. Membership inference arises in these contexts as a potential auditing method for detecting unauthorized data usage. While some efforts have been made on diffusion models, they are not applicable to text-to-image d…

    Submitted 29 May, 2024; v1 submitted 23 May, 2024; originally announced May 2024.

    Comments: 17 pages, 5 figures; minor typos corrected

  22. arXiv:2405.14073  [pdf, other]

    cs.LG

    PEAC: Unsupervised Pre-training for Cross-Embodiment Reinforcement Learning

    Authors: Chengyang Ying, Zhongkai Hao, Xinning Zhou, Xuezhou Xu, Hang Su, Xingxing Zhang, Jun Zhu

    Abstract: Designing generalizable agents capable of adapting to diverse embodiments has achieved significant attention in Reinforcement Learning (RL), which is critical for deploying RL agents in various real-world applications. Previous Cross-Embodiment RL approaches have focused on transferring knowledge across embodiments within specific tasks. These methods often result in knowledge tightly coupled with…

    Submitted 22 May, 2024; originally announced May 2024.

  23. arXiv:2405.13426  [pdf]

    cs.HC cs.AI

    A New Era in Human Factors Engineering: A Survey of the Applications and Prospects of Large Multimodal Models

    Authors: Li Fan, Lee Ching-Hung, Han Su, Feng Shanshan, Jiang Zhuoxuan, Sun Zhu

    Abstract: In recent years, the potential applications of Large Multimodal Models (LMMs) in fields such as healthcare, social psychology, and industrial design have attracted wide research attention, providing new directions for human factors research. For instance, LMM-based smart systems have become novel research subjects of human factors studies, and LMM introduces new research paradigms and methodologie…

    Submitted 22 May, 2024; originally announced May 2024.

    Comments: 14 pages, journal paper

  24. arXiv:2405.13197  [pdf, other]

    cs.CV

    Global-Local Detail Guided Transformer for Sea Ice Recognition in Optical Remote Sensing Images

    Authors: Zhanchao Huang, Wenjun Hong, Hua Su

    Abstract: The recognition of sea ice is of great significance for reflecting climate change and ensuring the safety of ship navigation. Recently, many deep learning based methods have been proposed and applied to segment and recognize sea ice regions. However, the diverse scales of sea ice areas, the zigzag and fine edge contours, and the difficulty in distinguishing different types of sea ice pose challeng…

    Submitted 21 May, 2024; originally announced May 2024.

    Comments: 5 pages, 5 figures

    Journal ref: IEEE IGARSS 2024

  25. arXiv:2405.09585  [pdf, other]

    cs.LG cs.AI

    An Embarrassingly Simple Approach to Enhance Transformer Performance in Genomic Selection for Crop Breeding

    Authors: Renqi Chen, Wenwei Han, Haohao Zhang, Haoyang Su, Zhefan Wang, Xiaolei Liu, Hao Jiang, Wanli Ouyang, Nanqing Dong

    Abstract: Genomic selection (GS), as a critical crop breeding strategy, plays a key role in enhancing food production and addressing the global hunger crisis. The predominant approaches in GS currently revolve around employing statistical methods for prediction. However, statistical methods often come with two main limitations: strong statistical priors and linear assumptions. A recent trend is to capture t…

    Submitted 24 June, 2024; v1 submitted 15 May, 2024; originally announced May 2024.

    Comments: Accepted by IJCAI2024. Code is available at https://github.com/RenqiChen/Genomic-Selection

  26. arXiv:2405.05941  [pdf, other]

    cs.RO cs.CV cs.LG

    Evaluating Real-World Robot Manipulation Policies in Simulation

    Authors: Xuanlin Li, Kyle Hsu, Jiayuan Gu, Karl Pertsch, Oier Mees, Homer Rich Walke, Chuyuan Fu, Ishikaa Lunawat, Isabel Sieh, Sean Kirmani, Sergey Levine, Jiajun Wu, Chelsea Finn, Hao Su, Quan Vuong, Ted Xiao

    Abstract: The field of robotics has made significant advances towards generalist robot manipulation policies. However, real-world evaluation of such policies is not scalable and faces reproducibility challenges, which are likely to worsen as policies broaden the spectrum of tasks they can perform. We identify control and visual disparities between real and simulated environments as key challenges for reliab…

    Submitted 9 May, 2024; originally announced May 2024.

  27. arXiv:2405.03908  [pdf, other]

    cs.DC cs.DS

    Deterministic Expander Routing: Faster and More Versatile

    Authors: Yi-Jun Chang, Shang-En Huang, Hsin-Hao Su

    Abstract: We consider the expander routing problem formulated by Ghaffari, Kuhn, and Su (PODC 2017), where the goal is to route all the tokens to their destinations given that each vertex is the source and the destination of at most $\deg(v)$ tokens. They developed randomized algorithms that solve this problem in $\text{poly}(\phi^{-1}) \cdot 2^{O(\sqrt{\log n \log \log n})}$ rounds in the…

    Submitted 6 May, 2024; originally announced May 2024.

    Comments: Accepted to PODC 2024

  28. A Construct-Optimize Approach to Sparse View Synthesis without Camera Pose

    Authors: Kaiwen Jiang, Yang Fu, Mukund Varma T, Yash Belhe, Xiaolong Wang, Hao Su, Ravi Ramamoorthi

    Abstract: Novel view synthesis from a sparse set of input images is a challenging problem of great practical interest, especially when camera poses are absent or inaccurate. Direct optimization of camera poses and usage of estimated depths in neural radiance field algorithms usually do not produce good results because of the coupling between poses and depths, and inaccuracies in monocular depth estimation.…

    Submitted 10 June, 2024; v1 submitted 6 May, 2024; originally announced May 2024.

  29. arXiv:2405.03379  [pdf, other]

    cs.LG cs.AI cs.RO

    Reverse Forward Curriculum Learning for Extreme Sample and Demonstration Efficiency in Reinforcement Learning

    Authors: Stone Tao, Arth Shukla, Tse-kai Chan, Hao Su

    Abstract: Reinforcement learning (RL) presents a promising framework to learn policies through environment interaction, but often requires an infeasible amount of interaction data to solve complex tasks from sparse rewards. One direction includes augmenting RL with offline data demonstrating desired tasks, but past work often requires a lot of high-quality demonstration data that is difficult to obtain, espe…

    Submitted 6 May, 2024; originally announced May 2024.

    Comments: Accepted at The Twelfth International Conference on Learning Representations (ICLR 2024). Website: https://reverseforward-cl.github.io/

  30. arXiv:2405.00885  [pdf, other]

    cs.LG cs.NI eess.IV

    WHALE-FL: Wireless and Heterogeneity Aware Latency Efficient Federated Learning over Mobile Devices via Adaptive Subnetwork Scheduling

    Authors: Huai-an Su, Jiaxiang Geng, Liang Li, Xiaoqi Qin, Yanzhao Hou, Xin Fu, Miao Pan

    Abstract: As a popular distributed learning paradigm, federated learning (FL) over mobile devices fosters numerous applications, while their practical deployment is hindered by participating devices' computing and communication heterogeneity. Some pioneering research efforts proposed to extract subnetworks from the global model, and assign as large a subnetwork as possible to the device for local training b…

    Submitted 1 May, 2024; originally announced May 2024.

  31. arXiv:2405.00566  [pdf, other]

    cs.CE cs.CL q-fin.GN

    NumLLM: Numeric-Sensitive Large Language Model for Chinese Finance

    Authors: Huan-Yi Su, Ke Wu, Yu-Hao Huang, Wu-Jun Li

    Abstract: Recently, many works have proposed various financial large language models (FinLLMs) by pre-training from scratch or fine-tuning open-sourced LLMs on financial corpora. However, existing FinLLMs exhibit unsatisfactory performance in understanding financial text when numeric variables are involved in questions. In this paper, we propose a novel LLM, called numeric-sensitive large language model (Nu…

    Submitted 1 May, 2024; originally announced May 2024.

  32. arXiv:2404.19525  [pdf, other]

    cs.CV

    MicroDreamer: Zero-shot 3D Generation in $\sim$20 Seconds by Score-based Iterative Reconstruction

    Authors: Luxi Chen, Zhengyi Wang, Zihan Zhou, Tingting Gao, Hang Su, Jun Zhu, Chongxuan Li

    Abstract: Optimization-based approaches, such as score distillation sampling (SDS), show promise in zero-shot 3D generation but suffer from low efficiency, primarily due to the high number of function evaluations (NFEs) required for each sample. In this paper, we introduce score-based iterative reconstruction (SIR), an efficient and general algorithm mimicking a differentiable 3D reconstruction process to r…

    Submitted 27 May, 2024; v1 submitted 30 April, 2024; originally announced April 2024.

  33. arXiv:2404.17302  [pdf, other]

    cs.RO cs.AI cs.CV

    Part-Guided 3D RL for Sim2Real Articulated Object Manipulation

    Authors: Pengwei Xie, Rui Chen, Siang Chen, Yuzhe Qin, Fanbo Xiang, Tianyu Sun, Jing Xu, Guijin Wang, Hao Su

    Abstract: Manipulating unseen articulated objects through visual feedback is a critical but challenging task for real robots. Existing learning-based solutions mainly focus on visual affordance learning or other pre-trained visual models to guide manipulation policies, which face challenges for novel instances in real-world scenarios. In this paper, we propose a novel part-guided 3D RL framework, which can…

    Submitted 26 April, 2024; originally announced April 2024.

    Comments: 9 pages

  34. arXiv:2404.16779  [pdf, other]

    cs.LG cs.AI cs.RO

    DrS: Learning Reusable Dense Rewards for Multi-Stage Tasks

    Authors: Tongzhou Mu, Minghua Liu, Hao Su

    Abstract: The success of many RL techniques heavily relies on human-engineered dense rewards, which typically demand substantial domain expertise and extensive trial and error. In our work, we propose DrS (Dense reward learning from Stages), a novel approach for learning reusable dense rewards for multi-stage tasks in a data-driven manner. By leveraging the stage structures of the task, DrS learns a high-qu…

    Submitted 25 April, 2024; originally announced April 2024.

    Comments: ICLR 2024. Explore videos, data, code, and more at https://sites.google.com/view/iclr24drs

  35. arXiv:2404.12713  [pdf, other]

    cs.NI

    Energy Conserved Failure Detection for NS-IoT Systems

    Authors: Guojin Liu, Jianhong Zhou, Hang Su, Biaohong Xiong, Xianhua Niu

    Abstract: Nowadays, network slicing (NS) technology has gained widespread adoption within Internet of Things (IoT) systems to meet diverse customized requirements. In the NS based IoT systems, the detection of equipment failures necessitates comprehensive equipment monitoring, which leads to significant resource utilization, particularly within large-scale IoT ecosystems. Thus, the imperative task of reduci…

    Submitted 19 April, 2024; originally announced April 2024.

  36. arXiv:2404.12385  [pdf, other]

    cs.CV cs.GR

    MeshLRM: Large Reconstruction Model for High-Quality Mesh

    Authors: Xinyue Wei, Kai Zhang, Sai Bi, Hao Tan, Fujun Luan, Valentin Deschaintre, Kalyan Sunkavalli, Hao Su, Zexiang Xu

    Abstract: We propose MeshLRM, a novel LRM-based approach that can reconstruct a high-quality mesh from merely four input images in less than one second. Different from previous large reconstruction models (LRMs) that focus on NeRF-based reconstruction, MeshLRM incorporates differentiable mesh extraction and rendering within the LRM framework. This allows for end-to-end mesh reconstruction by fine-tuning a p…

    Submitted 18 April, 2024; originally announced April 2024.

  37. arXiv:2404.12379  [pdf, other]

    cs.CV

    Dynamic Gaussians Mesh: Consistent Mesh Reconstruction from Monocular Videos

    Authors: Isabella Liu, Hao Su, Xiaolong Wang

    Abstract: Modern 3D engines and graphics pipelines require mesh as a memory-efficient representation, which allows efficient rendering, geometry processing, texture editing, and many other downstream operations. However, it is still highly difficult to obtain high-quality mesh in terms of structure and detail from monocular visual observations. The problem becomes even more challenging for dynamic scenes an…

    Submitted 22 April, 2024; v1 submitted 18 April, 2024; originally announced April 2024.

    Comments: Project page: https://www.liuisabella.com/DG-Mesh/

  38. arXiv:2404.12139  [pdf, other]

    cs.CV

    Omniview-Tuning: Boosting Viewpoint Invariance of Vision-Language Pre-training Models

    Authors: Shouwei Ruan, Yinpeng Dong, Hanqing Liu, Yao Huang, Hang Su, Xingxing Wei

    Abstract: Vision-Language Pre-training (VLP) models like CLIP have achieved remarkable success in computer vision and particularly demonstrated superior robustness to distribution shifts of 2D images. However, their robustness under 3D viewpoint variations is still limited, which can hinder the development for real-world applications. This paper successfully addresses this concern while keeping VLPs' origin…

    Submitted 18 April, 2024; originally announced April 2024.

    Comments: 20 pages

  39. arXiv:2404.11207  [pdf, other]

    cs.CV cs.AI cs.LG

    Exploring the Transferability of Visual Prompting for Multimodal Large Language Models

    Authors: Yichi Zhang, Yinpeng Dong, Siyuan Zhang, Tianzan Min, Hang Su, Jun Zhu

    Abstract: Although Multimodal Large Language Models (MLLMs) have demonstrated promising versatile capabilities, their performance is still inferior to specialized models on downstream tasks, which makes adaptation necessary to enhance their utility. However, fine-tuning methods require independent training for every model, leading to huge computation and memory overheads. In this paper, we propose a novel s…

    Submitted 17 April, 2024; originally announced April 2024.

    Comments: Accepted in CVPR 2024 as Poster (Highlight)

  40. arXiv:2404.09524  [pdf, other]

    cs.LG

    Dynamic fault detection and diagnosis of industrial alkaline water electrolyzer process with variational Bayesian dictionary learning

    Authors: Qi Zhang, Lei Xie, Weihua Xu, Hongye Su

    Abstract: Alkaline Water Electrolysis (AWE) is one of the simplest green hydrogen production methods using renewable energy. An AWE system typically yields process variables that are serially correlated and contaminated by measurement uncertainty. A novel robust dynamic variational Bayesian dictionary learning (RDVDL) monitoring approach is proposed to improve the reliability and safety of AWE operation.…

    Submitted 15 April, 2024; originally announced April 2024.

  41. arXiv:2404.09519  [pdf, other]

    cs.LG eess.SY

    Nonlinear sparse variational Bayesian learning based model predictive control with application to PEMFC temperature control

    Authors: Qi Zhang, Lei Wang, Weihua Xu, Hongye Su, Lei Xie

    Abstract: The accuracy of the underlying model predictions is crucial for the success of model predictive control (MPC) applications. If the model is unable to accurately analyze the dynamics of the controlled system, the performance and stability guarantees provided by MPC may not be achieved. Learning-based MPC can learn models from data, improving the applicability and reliability of MPC. This study deve…

    Submitted 15 April, 2024; originally announced April 2024.

  42. arXiv:2404.09193  [pdf, other]

    cs.CV

    FaceCat: Enhancing Face Recognition Security with a Unified Generative Model Framework

    Authors: Jiawei Chen, Xiao Yang, Yinpeng Dong, Hang Su, Jianteng Peng, Zhaoxia Yin

    Abstract: Face anti-spoofing (FAS) and adversarial detection (FAD) have been regarded as critical technologies to ensure the safety of face recognition systems. As a consequence of their limited practicality and generalization, some existing methods aim to devise a framework capable of concurrently detecting both threats to address the challenge. Nevertheless, these methods still encounter challenges of ins…

    Submitted 14 April, 2024; originally announced April 2024.

    Comments: Under review

  43. arXiv:2404.08285  [pdf]

    cs.CV cs.AI eess.SY

    A Survey of Neural Network Robustness Assessment in Image Recognition

    Authors: Jie Wang, Jun Ai, Minyan Lu, Haoran Su, Dan Yu, Yutao Zhang, Junda Zhu, Jingyu Liu

    Abstract: In recent years, there has been significant attention given to the robustness assessment of neural networks. Robustness plays a critical role in ensuring reliable operation of artificial intelligence (AI) systems in complex and uncertain environments. Deep learning's robustness problem is particularly significant, highlighted by the discovery of adversarial attacks on image classification models.…

    Submitted 15 April, 2024; v1 submitted 12 April, 2024; originally announced April 2024.

    Comments: Corrected typos and grammatical errors in Section 5

  44. arXiv:2404.07428  [pdf, other]

    cs.RO cs.LG

    AdaDemo: Data-Efficient Demonstration Expansion for Generalist Robotic Agent

    Authors: Tongzhou Mu, Yijie Guo, Jie Xu, Ankit Goyal, Hao Su, Dieter Fox, Animesh Garg

    Abstract: Encouraged by the remarkable achievements of language and vision foundation models, developing generalist robotic agents through imitation learning, using large demonstration datasets, has become a prominent area of interest in robot learning. The efficacy of imitation learning is heavily reliant on the quantity and quality of the demonstration datasets. In this study, we aim to scale up demonstra…

    Submitted 10 April, 2024; originally announced April 2024.

  45. arXiv:2404.01699  [pdf, other]

    cs.CV

    Task Integration Distillation for Object Detectors

    Authors: Hai Su, ZhenWen Jian, Songsen Yu

    Abstract: Knowledge distillation is a widely adopted technique for model lightening. However, the performance of most knowledge distillation methods in the domain of object detection is not satisfactory. Typically, knowledge distillation approaches consider only the classification task among the two sub-tasks of an object detector, largely overlooking the regression task. This oversight leads to a partial u…

    Submitted 2 April, 2024; originally announced April 2024.

  46. arXiv:2404.00842  [pdf, other]

    cs.CV

    An N-Point Linear Solver for Line and Motion Estimation with Event Cameras

    Authors: Ling Gao, Daniel Gehrig, Hang Su, Davide Scaramuzza, Laurent Kneip

    Abstract: Event cameras respond primarily to edges--formed by strong gradients--and are thus particularly well-suited for line-based motion estimation. Recent work has shown that events generated by a single line each satisfy a polynomial constraint which describes a manifold in the space-time volume. Multiple such constraints can be solved simultaneously to recover the partial linear velocity and line para…

    Submitted 31 March, 2024; originally announced April 2024.

    Journal ref: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024

  47. arXiv:2404.00540  [pdf, other]

    cs.CV cs.AI

    Embodied Active Defense: Leveraging Recurrent Feedback to Counter Adversarial Patches

    Authors: Lingxuan Wu, Xiao Yang, Yinpeng Dong, Liuwei Xie, Hang Su, Jun Zhu

    Abstract: The vulnerability of deep neural networks to adversarial patches has motivated numerous defense strategies for boosting model robustness. However, the prevailing defenses depend on single observation or pre-established adversary information to counter adversarial patches, often failing to be confronted with unseen or adaptive adversarial attacks and easily exhibiting unsatisfying performance in dy…

    Submitted 30 March, 2024; originally announced April 2024.

    Comments: 27 pages

  48. arXiv:2404.00247  [pdf, ps, other]

    eess.SY cs.AI cs.LG

    Facilitating Reinforcement Learning for Process Control Using Transfer Learning: Perspectives

    Authors: Runze Lin, Junghui Chen, Lei Xie, Hongye Su, Biao Huang

    Abstract: This paper provides insights into deep reinforcement learning (DRL) for process control from the perspective of transfer learning. We analyze the challenges of applying DRL in the field of process industries and the necessity of introducing transfer learning. Furthermore, recommendations and prospects are provided for future research directions on how transfer learning can be integrated with DRL t…

    Submitted 1 May, 2024; v1 submitted 30 March, 2024; originally announced April 2024.

    Comments: Final Version of Asian Control Conference (ASCC 2024)

  49. arXiv:2403.18922  [pdf, other]

    cs.CV

    Lift3D: Zero-Shot Lifting of Any 2D Vision Model to 3D

    Authors: Mukund Varma T, Peihao Wang, Zhiwen Fan, Zhangyang Wang, Hao Su, Ravi Ramamoorthi

    Abstract: In recent years, there has been an explosion of 2D vision models for numerous tasks such as semantic segmentation, style transfer or scene editing, enabled by large-scale 2D image datasets. At the same time, there has been renewed interest in 3D scene representations such as neural radiance fields from multi-view images. However, the availability of 3D or multiview data is still substantially limi…

    Submitted 27 March, 2024; originally announced March 2024.

    Comments: Computer Vision and Pattern Recognition Conference (CVPR), 2024

  50. arXiv:2403.18330  [pdf, other]

    cs.CV cs.LG

    Tracking-Assisted Object Detection with Event Cameras

    Authors: Ting-Kang Yen, Igor Morawski, Shusil Dangi, Kai He, Chung-Yi Lin, Jia-Fong Yeh, Hung-Ting Su, Winston Hsu

    Abstract: Event-based object detection has recently garnered attention in the computer vision community due to the exceptional properties of event cameras, such as high dynamic range and no motion blur. However, feature asynchronism and sparsity cause invisible objects due to no relative motion to the camera, posing a significant challenge in the task. Prior works have studied various memory mechanisms to p…

    Submitted 27 March, 2024; originally announced March 2024.