Showing 1–50 of 702 results for author: Zhu, S

Searching in archive cs.
  1. arXiv:2406.16557  [pdf, other]

    cs.LG cs.CY

    Efficient k-means with Individual Fairness via Exponential Tilting

    Authors: Shengkun Zhu, Jinshan Zeng, Yuan Sun, Sheng Wang, Xiaodong Li, Zhiyong Peng

    Abstract: In location-based resource allocation scenarios, the distances between each individual and the facility are desired to be approximately equal, thereby ensuring fairness. Individually fair clustering is often employed to achieve the principle of treating all points equally, which can be applied in these scenarios. This paper proposes a novel algorithm, tilted k-means (TKM), aiming to achieve indivi…

    Submitted 24 June, 2024; originally announced June 2024.

  2. arXiv:2406.16294  [pdf, other]

    cs.CL cs.AI

    LangSuitE: Planning, Controlling and Interacting with Large Language Models in Embodied Text Environments

    Authors: Zixia Jia, Mengmeng Wang, Baichen Tong, Song-Chun Zhu, Zilong Zheng

    Abstract: Recent advances in Large Language Models (LLMs) have shown inspiring achievements in constructing autonomous agents that rely on language descriptions as inputs. However, it remains unclear how well LLMs can function as few-shot or zero-shot embodied agents in dynamic interactive environments. To address this gap, we introduce LangSuitE, a versatile and simulation-free testbed featuring 6 represen…

    Submitted 23 June, 2024; originally announced June 2024.

  3. arXiv:2406.16028  [pdf, other]

    cs.LG cs.AI

    TimeAutoDiff: Combining Autoencoder and Diffusion model for time series tabular data synthesizing

    Authors: Namjoon Suh, Yuning Yang, Din-Yin Hsieh, Qitong Luan, Shirong Xu, Shixiang Zhu, Guang Cheng

    Abstract: In this paper, we leverage the power of latent diffusion models to generate synthetic time series tabular data. Along with the temporal and feature correlations, the heterogeneous nature of the feature in the table has been one of the main obstacles in time series tabular data modeling. We tackle this problem by combining the ideas of the variational auto-encoder (VAE) and the denoising diffusion…

    Submitted 23 June, 2024; originally announced June 2024.

  4. arXiv:2406.10802  [pdf, other]

    cs.CL cs.AI

    KGPA: Robustness Evaluation for Large Language Models via Cross-Domain Knowledge Graphs

    Authors: Aihua Pei, Zehua Yang, Shunan Zhu, Ruoxi Cheng, Ju Jia, Lina Wang

    Abstract: Existing frameworks for assessing robustness of large language models (LLMs) overly depend on specific benchmarks, increasing costs and failing to evaluate performance of LLMs in professional domains due to dataset limitations. This paper proposes a framework that systematically evaluates the robustness of LLMs under adversarial attack scenarios by leveraging knowledge graphs (KGs). Our framework…

    Submitted 16 June, 2024; originally announced June 2024.

  5. arXiv:2406.10484  [pdf, other]

    cs.CV

    Beyond Raw Videos: Understanding Edited Videos with Large Multimodal Model

    Authors: Lu Xu, Sijie Zhu, Chunyuan Li, Chia-Wen Kuo, Fan Chen, Xinyao Wang, Guang Chen, Dawei Du, Ye Yuan, Longyin Wen

    Abstract: The emerging video LMMs (Large Multimodal Models) have achieved significant improvements on generic video understanding in the form of VQA (Visual Question Answering), where the raw videos are captured by cameras. However, a large portion of videos in real-world applications are edited videos, e.g., users usually cut and add effects/modifications to the raw video before publishing it on s…

    Submitted 14 June, 2024; originally announced June 2024.

  6. arXiv:2406.09931  [pdf, other]

    eess.IV cs.CV cs.LG

    SCKansformer: Fine-Grained Classification of Bone Marrow Cells via Kansformer Backbone and Hierarchical Attention Mechanisms

    Authors: Yifei Chen, Zhu Zhu, Shenghao Zhu, Linwei Qiu, Binfeng Zou, Fan Jia, Yunpeng Zhu, Chenyan Zhang, Zhaojie Fang, Feiwei Qin, Jin Fan, Changmiao Wang, Yu Gao, Gang Yu

    Abstract: The incidence and mortality rates of malignant tumors, such as acute leukemia, have risen significantly. Clinically, hospitals rely on cytological examination of peripheral blood and bone marrow smears to diagnose malignant tumors, with accurate blood cell counting being crucial. Existing automated methods face challenges such as low feature expression capability, poor interpretability, and redund…

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: 15 pages, 6 figures

  7. arXiv:2406.09693  [pdf, other]

    cs.CV eess.IV

    Compressed Video Quality Enhancement with Temporal Group Alignment and Fusion

    Authors: Qiang Zhu, Yajun Qiu, Yu Liu, Shuyuan Zhu, Bing Zeng

    Abstract: In this paper, we propose a temporal group alignment and fusion network to enhance the quality of compressed videos by using the long-short term correlations between frames. The proposed model consists of the intra-group feature alignment (IntraGFA) module, the inter-group feature fusion (InterGFF) module, and the feature enhancement (FE) module. We form the group of pictures (GoP) by selecting fr…

    Submitted 13 June, 2024; originally announced June 2024.

  8. arXiv:2406.08801  [pdf, other]

    cs.CV

    Hallo: Hierarchical Audio-Driven Visual Synthesis for Portrait Image Animation

    Authors: Mingwang Xu, Hui Li, Qingkun Su, Hanlin Shang, Liwei Zhang, Ce Liu, Jingdong Wang, Yao Yao, Siyu Zhu

    Abstract: The field of portrait image animation, driven by speech audio input, has experienced significant advancements in the generation of realistic and dynamic portraits. This research delves into the complexities of synchronizing facial movements and creating visually appealing, temporally consistent animations within the framework of diffusion-based methodologies. Moving away from traditional paradigms…

    Submitted 16 June, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

    Comments: 20 pages

  9. arXiv:2406.08002  [pdf, other]

    cs.AI cs.MA

    Efficient Adaptation in Mixed-Motive Environments via Hierarchical Opponent Modeling and Planning

    Authors: Yizhe Huang, Anji Liu, Fanqi Kong, Yaodong Yang, Song-Chun Zhu, Xue Feng

    Abstract: Despite the recent successes of multi-agent reinforcement learning (MARL) algorithms, efficiently adapting to co-players in mixed-motive environments remains a significant challenge. One feasible approach is to hierarchically model co-players' behavior based on inferring their characteristics. However, these methods often encounter difficulties in efficient reasoning and utilization of inferred in…

    Submitted 12 June, 2024; originally announced June 2024.

  10. arXiv:2406.07151  [pdf, other]

    cs.MM cs.AI cs.IR

    EEG-ImageNet: An Electroencephalogram Dataset and Benchmarks with Image Visual Stimuli of Multi-Granularity Labels

    Authors: Shuqi Zhu, Ziyi Ye, Qingyao Ai, Yiqun Liu

    Abstract: Identifying and reconstructing what we see from brain activity gives us a special insight into investigating how the biological visual system represents the world. While recent efforts have achieved high-performance image classification and high-quality image reconstruction from brain signals collected by Functional Magnetic Resonance Imaging (fMRI) or magnetoencephalogram (MEG), the expensiveness…

    Submitted 11 June, 2024; originally announced June 2024.

  11. arXiv:2406.07081  [pdf, other]

    cs.CL

    Efficiently Exploring Large Language Models for Document-Level Machine Translation with In-context Learning

    Authors: Menglong Cui, Jiangcun Du, Shaolin Zhu, Deyi Xiong

    Abstract: Large language models (LLMs) exhibit outstanding performance in machine translation via in-context learning. In contrast to sentence-level translation, document-level translation (DOCMT) by LLMs based on in-context learning faces two major challenges: firstly, document translations generated by LLMs are often incoherent; secondly, the length of demonstration for in-context learning is usually limi…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: Accepted to ACL2024 long paper (Findings)

  12. arXiv:2406.06962  [pdf, other]

    cs.CL cs.AI

    Evolving Subnetwork Training for Large Language Models

    Authors: Hanqi Li, Lu Chen, Da Ma, Zijian Wu, Su Zhu, Kai Yu

    Abstract: Large language models have ushered in a new era of artificial intelligence research. However, their substantial training costs hinder further development and widespread adoption. In this paper, inspired by the redundancy in the parameters of large language models, we propose a novel training paradigm: Evolving Subnetwork Training (EST). EST samples subnetworks from the layers of the large language…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: Accepted to ICML 2024

  13. arXiv:2406.06949  [pdf, other]

    cs.CV cs.AI

    Triple-domain Feature Learning with Frequency-aware Memory Enhancement for Moving Infrared Small Target Detection

    Authors: Weiwei Duan, Luping Ji, Shengjia Chen, Sicheng Zhu, Mao Ye

    Abstract: Moving infrared small target detection presents significant challenges due to tiny target sizes and low contrast against backgrounds. Currently-existing methods primarily focus on extracting target features only from the spatial-temporal domain. For further enhancing feature representation, more information domains such as frequency are believed to be potentially valuable. To extend target feature…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: This paper has been submitted to IEEE TGRS and is under review

  14. arXiv:2406.03693  [pdf, ps, other]

    cs.IT

    New MDS codes of non-GRS type and NMDS codes

    Authors: Yujie Zhi, Shixin Zhu

    Abstract: Maximum distance separable (MDS) and near maximum distance separable (NMDS) codes have been widely used in various fields such as communication systems, data storage, and quantum codes due to their algebraic properties and excellent error-correcting capabilities. This paper focuses on a specific class of linear codes and establishes necessary and sufficient conditions for them to be MDS or NMDS. A…

    Submitted 5 June, 2024; originally announced June 2024.

  15. arXiv:2406.02638  [pdf, ps, other]

    cs.LG cs.AI

    EchoMamba4Rec: Harmonizing Bidirectional State Space Models with Spectral Filtering for Advanced Sequential Recommendation

    Authors: Yuda Wang, Xuxin He, Shengxin Zhu

    Abstract: Predicting user preferences and sequential dependencies based on historical behavior is the core goal of sequential recommendation. Although attention-based models have shown effectiveness in this field, they often struggle with inference inefficiency due to the quadratic computational complexity inherent in attention mechanisms, especially with long-range behavior sequences. Drawing inspiration f…

    Submitted 10 June, 2024; v1 submitted 4 June, 2024; originally announced June 2024.

    Comments: arXiv admin note: text overlap with arXiv:2403.03900 by other authors

  16. arXiv:2406.01392  [pdf, other]

    cs.CL

    Sparsity-Accelerated Training for Large Language Models

    Authors: Da Ma, Lu Chen, Pengyu Wang, Hongshen Xu, Hanqi Li, Liangtai Sun, Su Zhu, Shuai Fan, Kai Yu

    Abstract: Large language models (LLMs) have demonstrated proficiency across various natural language processing (NLP) tasks but often require additional training, such as continual pre-training and supervised fine-tuning. However, the costs associated with this, primarily due to their large parameter count, remain high. This paper proposes leveraging sparsity in pre-trained LLMs to expedite this trai…

    Submitted 6 June, 2024; v1 submitted 3 June, 2024; originally announced June 2024.

    Comments: Accepted to ACL 2024 Findings

  17. arXiv:2405.20234  [pdf, other]

    cs.AI

    Context Injection Attacks on Large Language Models

    Authors: Cheng'an Wei, Kai Chen, Yue Zhao, Yujia Gong, Lu Xiang, Shenchen Zhu

    Abstract: Large Language Models (LLMs) such as ChatGPT and Llama-2 have become prevalent in real-world applications, exhibiting impressive text generation performance. LLMs are fundamentally developed from a scenario where the input data remains static and lacks a clear structure. To behave interactively over time, LLM-based chat systems must integrate additional contextual information (i.e., chat history)…

    Submitted 30 May, 2024; originally announced May 2024.

  18. arXiv:2405.19758  [pdf, other]

    cs.RO

    InterPreT: Interactive Predicate Learning from Language Feedback for Generalizable Task Planning

    Authors: Muzhi Han, Yifeng Zhu, Song-Chun Zhu, Ying Nian Wu, Yuke Zhu

    Abstract: Learning abstract state representations and knowledge is crucial for long-horizon robot planning. We present InterPreT, an LLM-powered framework for robots to learn symbolic predicates from language feedback of human non-experts during embodied interaction. The learned predicates provide relational abstractions of the environment state, facilitating the learning of symbolic operators that capture…

    Submitted 30 May, 2024; originally announced May 2024.

    Comments: RSS 2024; https://interpret-robot.github.io

  19. arXiv:2405.19669  [pdf, other]

    cs.CV

    Texture-guided Coding for Deep Features

    Authors: Lei Xiong, Xin Luo, Zihao Wang, Chaofan He, Shuyuan Zhu, Bing Zeng

    Abstract: With the rapid development of machine vision technology in recent years, many researchers have begun to focus on feature compression that is better suited for machine vision tasks. The target of feature compression is deep features, which arise from convolution in the middle layer of a pre-trained convolutional neural network. However, due to the large volume of data and high level of abstraction…

    Submitted 29 May, 2024; originally announced May 2024.

  20. arXiv:2405.19580  [pdf, other]

    cs.HC

    Facilitating Mixed-Methods Analysis with Computational Notebooks

    Authors: Jiawen Stefanie Zhu, Zibo Zhang, Jian Zhao

    Abstract: Data exploration is an important aspect of the workflow of mixed-methods researchers, who conduct both qualitative and quantitative analysis. However, there currently exist few tools that adequately support both types of analysis simultaneously, forcing researchers to context-switch between different tools and increasing their mental burden when integrating the results. To address this gap, we pr…

    Submitted 29 May, 2024; originally announced May 2024.

    Comments: Appeared at 1st ACM CHI Workshop on Human-Notebook Interactions

    ACM Class: H.5; H.5.2

  21. arXiv:2405.14241  [pdf, other]

    cs.CV

    NeuroGauss4D-PCI: 4D Neural Fields and Gaussian Deformation Fields for Point Cloud Interpolation

    Authors: Chaokang Jiang, Dalong Du, Jiuming Liu, Siting Zhu, Zhenqiang Liu, Zhuang Ma, Zhujin Liang, Jie Zhou

    Abstract: Point Cloud Interpolation confronts challenges from point sparsity, complex spatiotemporal dynamics, and the difficulty of deriving complete 3D point clouds from sparse temporal information. This paper presents NeuroGauss4D-PCI, which excels at modeling complex non-rigid deformations across varied dynamic scenes. The method begins with an iterative Gaussian cloud soft clustering module, offering s…

    Submitted 23 May, 2024; originally announced May 2024.

    Comments: Under review

  22. arXiv:2405.13426  [pdf]

    cs.HC cs.AI

    A New Era in Human Factors Engineering: A Survey of the Applications and Prospects of Large Multimodal Models

    Authors: Li Fan, Lee Ching-Hung, Han Su, Feng Shanshan, Jiang Zhuoxuan, Sun Zhu

    Abstract: In recent years, the potential applications of Large Multimodal Models (LMMs) in fields such as healthcare, social psychology, and industrial design have attracted wide research attention, providing new directions for human factors research. For instance, LMM-based smart systems have become novel research subjects of human factors studies, and LMM introduces new research paradigms and methodologie…

    Submitted 22 May, 2024; originally announced May 2024.

    Comments: 14 pages, journal paper

  23. arXiv:2405.13380  [pdf, other]

    cs.CR

    The Illusion of Anonymity: Uncovering the Impact of User Actions on Privacy in Web3 Social Ecosystems

    Authors: Bin Wang, Tianjian Liu, Wenqi Wang, Yuan Weng, Chao Li, Guangquan Xu, Meng Shen, Sencun Zhu, Wei Wang

    Abstract: The rise of Web3 social ecosystems signifies the dawn of a new chapter in digital interaction, offering significant prospects for user engagement and financial advancement. Nonetheless, this progress is shadowed by potential privacy concessions, especially as these platforms frequently merge with existing Web 2.0 social media accounts, amplifying data privacy risks for users. In this study, we in…

    Submitted 22 May, 2024; originally announced May 2024.

  24. arXiv:2405.11629  [pdf, other]

    cs.CV cs.AI

    Searching Realistic-Looking Adversarial Objects For Autonomous Driving Systems

    Authors: Shengxiang Sun, Shenzhe Zhu

    Abstract: Numerous studies on adversarial attacks targeting self-driving policies fail to incorporate realistic-looking adversarial objects, limiting real-world applicability. Building upon prior research that facilitated the transition of adversarial objects from simulations to practical applications, this paper discusses a modified gradient-based texture optimization method to discover realistic-looking a…

    Submitted 19 May, 2024; originally announced May 2024.

  25. arXiv:2405.08573  [pdf, other]

    cs.HC

    ViSTooth: A Visualization Framework for Tooth Segmentation on Panoramic Radiograph

    Authors: Shenji Zhu, Miaoxin Hu, Tianya Pan, Yue Hong, Bin Li, Zhiguang Zhou, Ting Xu

    Abstract: Tooth segmentation is a key step for computer aided diagnosis of dental diseases. Numerous machine learning models have been employed for tooth segmentation on dental panoramic radiograph. However, it is a difficult task to achieve accurate tooth segmentation due to complex tooth shapes, diverse tooth categories and incomplete sample set for machine learning. In this paper, we propose ViSTooth, a…

    Submitted 14 May, 2024; originally announced May 2024.

  26. arXiv:2405.08036  [pdf, other]

    cs.LG cs.AI

    POWQMIX: Weighted Value Factorization with Potentially Optimal Joint Actions Recognition for Cooperative Multi-Agent Reinforcement Learning

    Authors: Chang Huang, Junqiao Zhao, Shatong Zhu, Hongtu Zhou, Chen Ye, Tiantian Feng, Changjun Jiang

    Abstract: Value function factorization methods are commonly used in cooperative multi-agent reinforcement learning, with QMIX receiving significant attention. Many QMIX-based methods introduce monotonicity constraints between the joint action value and individual action values to achieve decentralized execution. However, such constraints limit the representation capacity of value factorization, restricting…

    Submitted 15 May, 2024; v1 submitted 12 May, 2024; originally announced May 2024.

    Comments: change reference format

  27. arXiv:2405.07784  [pdf, other]

    cs.CV

    Generating Human Motion in 3D Scenes from Text Descriptions

    Authors: Zhi Cen, Huaijin Pi, Sida Peng, Zehong Shen, Minghui Yang, Shuai Zhu, Hujun Bao, Xiaowei Zhou

    Abstract: Generating human motions from textual descriptions has gained growing research interest due to its wide range of applications. However, only a few works consider human-scene interactions together with text conditions, which is crucial for visual and physical realism. This paper focuses on the task of generating human motions in 3D indoor scenes given text descriptions of the human-scene interactio…

    Submitted 13 May, 2024; originally announced May 2024.

    Comments: Project page: https://zju3dv.github.io/text_scene_motion

  28. arXiv:2405.06996  [pdf, other]

    cs.CL

    Quite Good, but Not Enough: Nationality Bias in Large Language Models -- A Case Study of ChatGPT

    Authors: Shucheng Zhu, Weikang Wang, Ying Liu

    Abstract: While nationality is a pivotal demographic element that enhances the performance of language models, it has received far less scrutiny regarding inherent biases. This study investigates nationality bias in ChatGPT (GPT-3.5), a large language model (LLM) designed for text generation. The research covers 195 countries, 4 temperature settings, and 3 distinct prompt types, generating 4,680 discourses…

    Submitted 11 May, 2024; originally announced May 2024.

    Comments: Accepted by LREC-COLING 2024

  29. arXiv:2405.06292  [pdf, ps, other]

    cs.IT

    On $σ$ self-orthogonal matrix-product codes associated with Toeplitz matrices

    Authors: Yang Li, Shixin Zhu, Edgar Martínez-Moro

    Abstract: In this paper, we present four general constructions of $σ$ self-orthogonal matrix-product codes associated with Toeplitz matrices. The first one relies on the $σ'$ dual of a known $σ'$ dual-containing matrix-product code; the second one is founded on quasi-$\widehat{σ}$ matrices, where we provide an efficient algorithm for generating them on the basis of Toeplitz matrices; and the last two ones are…

    Submitted 10 May, 2024; originally announced May 2024.

    MSC Class: 94B05 15B05 12E10

  30. arXiv:2405.05949  [pdf, other]

    cs.CV

    CuMo: Scaling Multimodal LLM with Co-Upcycled Mixture-of-Experts

    Authors: Jiachen Li, Xinyao Wang, Sijie Zhu, Chia-Wen Kuo, Lu Xu, Fan Chen, Jitesh Jain, Humphrey Shi, Longyin Wen

    Abstract: Recent advancements in Multimodal Large Language Models (LLMs) have focused primarily on scaling by increasing text-image pair data and enhancing LLMs to improve performance on multimodal tasks. However, these scaling approaches are computationally expensive and overlook the significance of improving model capabilities from the vision side. Inspired by the successful applications of Mixture-of-Exp…

    Submitted 9 May, 2024; originally announced May 2024.

  31. arXiv:2405.03524  [pdf, other]

    cs.AI

    Exploring knowledge graph-based neural-symbolic system from application perspective

    Authors: Shenzhe Zhu, Shengxiang Sun

    Abstract: Advancements in Artificial Intelligence (AI) and deep neural networks have driven significant progress in vision and text processing. However, achieving human-like reasoning and interpretability in AI systems remains a substantial challenge. The Neural-Symbolic paradigm, which integrates neural networks with symbolic systems, presents a promising pathway toward more interpretable AI. Within this p…

    Submitted 29 May, 2024; v1 submitted 6 May, 2024; originally announced May 2024.

  32. arXiv:2405.01839  [pdf, other]

    cs.AI cs.MA

    SocialGFs: Learning Social Gradient Fields for Multi-Agent Reinforcement Learning

    Authors: Qian Long, Fangwei Zhong, Mingdong Wu, Yizhou Wang, Song-Chun Zhu

    Abstract: Multi-agent systems (MAS) need to adaptively cope with dynamic environments, changing agent populations, and diverse tasks. However, most of the multi-agent systems cannot easily handle them, due to the complexity of the state and task space. The social impact theory regards the complex influencing factors as forces acting on an agent, emanating from the environment, other agents, and the agent's…

    Submitted 3 May, 2024; originally announced May 2024.

    Comments: AAAI 2024 Cooperative Multi-Agent Systems Decision-Making and Learning (CMASDL) Workshop

  33. arXiv:2404.18053  [pdf, ps, other]

    cs.IT

    Binary duadic codes and their related codes with a square-root-like lower bound

    Authors: Tingting Wu, Lanqiang Li, Xiuyu Zhang, Shixin Zhu

    Abstract: Binary cyclic codes have been a hot topic for many years, and significant progress has been made in the study of this type of code. As is well known, it is hard to construct infinite families of binary cyclic codes $[n, \frac{n+1}{2}]$ with good minimum distance. In this paper, by using the BCH bound on cyclic codes, one of the open problems proposed by Liu et al. about binary cyclic codes (Finite Field A…

    Submitted 27 April, 2024; originally announced April 2024.

    Comments: 15 pages

  34. arXiv:2404.17835  [pdf, other]

    cs.CL

    VANER: Leveraging Large Language Model for Versatile and Adaptive Biomedical Named Entity Recognition

    Authors: Junyi Bian, Weiqi Zhai, Xiaodi Huang, Jiaxuan Zheng, Shanfeng Zhu

    Abstract: The prevalent solution for BioNER involves using representation learning techniques coupled with sequence labeling. However, such methods are inherently task-specific, demonstrate poor generalizability, and often require a dedicated model for each dataset. To leverage the versatile capabilities of recently remarkable large language models (LLMs), several endeavors have explored generative approaches to…

    Submitted 27 April, 2024; originally announced April 2024.

  35. arXiv:2404.17521  [pdf, other]

    cs.RO cs.CV

    Ag2Manip: Learning Novel Manipulation Skills with Agent-Agnostic Visual and Action Representations

    Authors: Puhao Li, Tengyu Liu, Yuyang Li, Muzhi Han, Haoran Geng, Shu Wang, Yixin Zhu, Song-Chun Zhu, Siyuan Huang

    Abstract: Autonomous robotic systems capable of learning novel manipulation tasks are poised to transform industries from manufacturing to service automation. However, modern methods (e.g., VIP and R3M) still face significant hurdles, notably the domain gap among robotic embodiments and the sparsity of successful task executions within specific action spaces, resulting in misaligned and ambiguous task repre…

    Submitted 26 April, 2024; originally announced April 2024.

    Comments: Project website and open-source code: https://xiaoyao-li.github.io/research/ag2manip

  36. arXiv:2404.16687  [pdf, other]

    cs.CV

    NTIRE 2024 Quality Assessment of AI-Generated Content Challenge

    Authors: Xiaohong Liu, Xiongkuo Min, Guangtao Zhai, Chunyi Li, Tengchuan Kou, Wei Sun, Haoning Wu, Yixuan Gao, Yuqin Cao, Zicheng Zhang, Xiele Wu, Radu Timofte, Fei Peng, Huiyuan Fu, Anlong Ming, Chuanming Wang, Huadong Ma, Shuai He, Zifei Dou, Shu Chen, Huacong Zhang, Haiyi Xie, Chengwei Wang, Baoying Chen, Jishen Zeng, et al. (89 additional authors not shown)

    Abstract: This paper reports on the NTIRE 2024 Quality Assessment of AI-Generated Content Challenge, which will be held in conjunction with the New Trends in Image Restoration and Enhancement Workshop (NTIRE) at CVPR 2024. This challenge is to address a major challenge in the field of image and video processing, namely, Image Quality Assessment (IQA) and Video Quality Assessment (VQA) for AI-Generated Conte…

    Submitted 7 May, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

  37. arXiv:2404.16666  [pdf, other]

    cs.CV

    PhyRecon: Physically Plausible Neural Scene Reconstruction

    Authors: Junfeng Ni, Yixin Chen, Bohan Jing, Nan Jiang, Bin Wang, Bo Dai, Puhao Li, Yixin Zhu, Song-Chun Zhu, Siyuan Huang

    Abstract: Neural implicit representations have gained popularity in multi-view 3D reconstruction. However, most previous work struggles to yield physically plausible results, limiting their utility in domains requiring rigorous physical accuracy, such as embodied AI and robotics. This lack of plausibility stems from the absence of physics modeling in existing methods and their inability to recover intricate…

    Submitted 2 June, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

    Comments: project page: https://phyrecon.github.io/. arXiv admin note: text overlap with arXiv:2303.08605 by other authors

  38. arXiv:2404.16163  [pdf, other]

    cs.RO cs.FL

    The Trembling-Hand Problem for LTLf Planning

    Authors: Pian Yu, Shufang Zhu, Giuseppe De Giacomo, Marta Kwiatkowska, Moshe Vardi

    Abstract: Consider an agent acting to achieve its temporal goal, but with a "trembling hand". In this case, the agent may mistakenly instruct, with a certain (typically small) probability, actions that are not intended due to faults or imprecision in its action selection mechanism, thereby leading to possible goal failure. We study the trembling-hand problem in the context of reasoning about actions and pla…

    Submitted 24 April, 2024; originally announced April 2024.

    Comments: The paper is accepted by IJCAI 2024

  39. arXiv:2404.15096  [pdf, other]

    cs.RO cs.LG

    Impedance Matching: Enabling an RL-Based Running Jump in a Quadruped Robot

    Authors: Neil Guan, Shangqun Yu, Shifan Zhu, Donghyun Kim

    Abstract: Replicating the remarkable athleticism seen in animals has long been a challenge in robotics control. Although Reinforcement Learning (RL) has demonstrated significant progress in dynamic legged locomotion control, the substantial sim-to-real gap often hinders the real-world demonstration of truly dynamic movements. We propose a new framework to mitigate this gap through frequency-domain analysis-…

    Submitted 29 April, 2024; v1 submitted 23 April, 2024; originally announced April 2024.

    Comments: Accepted by Ubiquitous Robots 2024

  40. arXiv:2404.13654  [pdf, other]

    cs.MA

    Multi-AUV Cooperative Underwater Multi-Target Tracking Based on Dynamic-Switching-enabled Multi-Agent Reinforcement Learning

    Authors: Shengbo Wang, Chuan Lin, Guangjie Han, Shengchao Zhu, Zhixian Li, Zhenyu Wang

    Abstract: With the rapid development of underwater communication, sensing, automation, and robot technologies, autonomous underwater vehicle (AUV) swarms are gradually becoming popular and have been widely promoted in ocean exploration and underwater tracking or surveillance, etc. However, the complex underwater environment poses significant challenges for AUV swarm-based accurate tracking for the underwater mo…

    Submitted 22 April, 2024; v1 submitted 21 April, 2024; originally announced April 2024.

  41. arXiv:2404.12824  [pdf, other]

    cs.RO cs.LG cs.MA

    MAexp: A Generic Platform for RL-based Multi-Agent Exploration

    Authors: Shaohao Zhu, Jiacheng Zhou, Anjun Chen, Mingming Bai, Jiming Chen, Jinming Xu

    Abstract: The sim-to-real gap poses a significant challenge in RL-based multi-agent exploration due to scene quantization and action discretization. Existing platforms suffer from the inefficiency in sampling and the lack of diversity in Multi-Agent Reinforcement Learning (MARL) algorithms across different scenarios, restraining their widespread applications. To fill these gaps, we propose MAexp, a generic…

    Submitted 19 April, 2024; originally announced April 2024.

  42. arXiv:2404.05039  [pdf, other]

    cs.RO

    StaccaToe: A Single-Leg Robot that Mimics the Human Leg and Toe

    Authors: Nisal Perera, Shangqun Yu, Daniel Marew, Mack Tang, Ken Suzuki, Aidan McCormack, Shifan Zhu, Yong-Jae Kim, Donghyun Kim

    Abstract: We introduce StaccaToe, a human-scale, electric motor-powered single-leg robot designed to rival the agility of human locomotion through two distinctive attributes: an actuated toe and a co-actuation configuration inspired by the human leg. Leveraging the foundational design of HyperLeg's lower leg mechanism, we develop a stand-alone robot by incorporating new link designs, custom-designed power e…

    Submitted 7 April, 2024; originally announced April 2024.

    Comments: Submitted to 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2024)

  43. arXiv:2404.04698  [pdf, other]

    cs.RO

    CEAR: Comprehensive Event Camera Dataset for Rapid Perception of Agile Quadruped Robots

    Authors: Shifan Zhu, Zixun Xiong, Donghyun Kim

    Abstract: When legged robots perform agile movements, traditional RGB cameras often produce blurred images, posing a challenge for rapid perception. Event cameras have emerged as a promising solution for capturing rapid perception and coping with challenging lighting conditions thanks to their low latency, high temporal resolution, and high dynamic range. However, integrating event cameras into agile-legged…

    Submitted 18 June, 2024; v1 submitted 6 April, 2024; originally announced April 2024.

    Comments: 8 pages, 7 figures

  44. arXiv:2404.02514  [pdf, other]

    cs.CV

    Freditor: High-Fidelity and Transferable NeRF Editing by Frequency Decomposition

    Authors: Yisheng He, Weihao Yuan, Siyu Zhu, Zilong Dong, Liefeng Bo, Qixing Huang

    Abstract: This paper enables high-fidelity, transferable NeRF editing by frequency decomposition. Recent NeRF editing pipelines lift 2D stylization results to 3D scenes but suffer from blurry results and fail to capture detailed structures because of the inconsistency between 2D edits. Our critical insight is that low-frequency components of images are more multiview-consistent after editing compare…

    Submitted 3 April, 2024; originally announced April 2024.

  45. arXiv:2404.00424  [pdf, other]

    q-fin.MF cs.AI cs.CE

    From attention to profit: quantitative trading strategy based on transformer

    Authors: Zhaofeng Zhang, Banghao Chen, Shengxin Zhu, Nicolas Langrené

    Abstract: In traditional quantitative trading practice, navigating the complicated and dynamic financial market presents a persistent challenge. Earlier machine learning approaches have struggled to fully capture various market variables, often ignoring long-term information and failing to catch essential signals that may lead to profit. This paper introduces an enhanced transformer architecture and desi…

    Submitted 30 March, 2024; originally announced April 2024.

    ACM Class: G.3; J.2

  46. arXiv:2403.18274  [pdf, other]

    cs.CV

    DVLO: Deep Visual-LiDAR Odometry with Local-to-Global Feature Fusion and Bi-Directional Structure Alignment

    Authors: Jiuming Liu, Dong Zhuo, Zhiheng Feng, Siting Zhu, Chensheng Peng, Zhe Liu, Hesheng Wang

    Abstract: Information inside visual and LiDAR data is well complementary derived from the fine-grained texture of images and massive geometric information in point clouds. However, it remains challenging to explore effective visual-LiDAR fusion, mainly due to the intrinsic data structure inconsistency between two modalities: Images are regular and dense, but LiDAR points are unordered and sparse. To address…

    Submitted 27 March, 2024; originally announced March 2024.

  47. arXiv:2403.17852  [pdf, other]

    cs.LG stat.ML

    Counterfactual Fairness through Transforming Data Orthogonal to Bias

    Authors: Shuyi Chen, Shixiang Zhu

    Abstract: Machine learning models have shown exceptional prowess in solving complex issues across various domains. However, these models can sometimes exhibit biased decision-making, resulting in unequal treatment of different groups. Despite substantial research on counterfactual fairness, methods to reduce the impact of multivariate and continuous sensitive variables on decision-making outcomes are still…

    Submitted 29 June, 2024; v1 submitted 26 March, 2024; originally announced March 2024.

  48. arXiv:2403.16048  [pdf, other]

    cs.CV

    Edit3K: Universal Representation Learning for Video Editing Components

    Authors: Xin Gu, Libo Zhang, Fan Chen, Longyin Wen, Yufei Wang, Tiejian Luo, Sijie Zhu

    Abstract: This paper focuses on understanding the predominant video creation pipeline, i.e., compositional video editing with six main types of editing components, including video effects, animation, transition, filter, sticker, and text. In contrast to existing visual representation learning of visual materials (i.e., images/videos), we aim to learn visual representations of editing actions/components that…

    Submitted 24 March, 2024; originally announced March 2024.

  49. arXiv:2403.14939  [pdf, other]

    cs.CV

    STAG4D: Spatial-Temporal Anchored Generative 4D Gaussians

    Authors: Yifei Zeng, Yanqin Jiang, Siyu Zhu, Yuanxun Lu, Youtian Lin, Hao Zhu, Weiming Hu, Xun Cao, Yao Yao

    Abstract: Recent progress in pre-trained diffusion models and 3D generation has spurred interest in 4D content creation. However, achieving high-fidelity 4D generation with spatial-temporal consistency remains a challenge. In this work, we propose STAG4D, a novel framework that combines pre-trained diffusion models with dynamic 3D Gaussian splatting for high-fidelity 4D generation. Drawing inspiration from…

    Submitted 22 March, 2024; originally announced March 2024.

  50. arXiv:2403.14781  [pdf, other]

    cs.CV

    Champ: Controllable and Consistent Human Image Animation with 3D Parametric Guidance

    Authors: Shenhao Zhu, Junming Leo Chen, Zuozhuo Dai, Qingkun Su, Yinghui Xu, Xun Cao, Yao Yao, Hao Zhu, Siyu Zhu

    Abstract: In this study, we introduce a methodology for human image animation by leveraging a 3D human parametric model within a latent diffusion framework to enhance shape alignment and motion guidance in current human generative techniques. The methodology utilizes the SMPL (Skinned Multi-Person Linear) model as the 3D human parametric model to establish a unified representation of body shape and pose. Thi…

    Submitted 1 June, 2024; v1 submitted 21 March, 2024; originally announced March 2024.