Skip to main content

Showing 1–50 of 622 results for author: Mao, H

Searching in archive cs. Search in all archives.
.
  1. arXiv:2406.19113  [pdf, other

    cs.AR cs.DC q-bio.GN

    MegIS: High-Performance, Energy-Efficient, and Low-Cost Metagenomic Analysis with In-Storage Processing

    Authors: Nika Mansouri Ghiasi, Mohammad Sadrosadati, Harun Mustafa, Arvid Gollwitzer, Can Firtina, Julien Eudine, Haiyu Mao, Joël Lindegger, Meryem Banu Cavlak, Mohammed Alser, Jisung Park, Onur Mutlu

    Abstract: Metagenomics has led to significant advances in many fields. Metagenomic analysis commonly involves the key tasks of determining the species present in a sample and their relative abundances. These tasks require searching large metagenomic databases. Metagenomic analysis suffers from significant data movement overhead due to moving large amounts of low-reuse data from the storage system. In-storag… ▽ More

    Submitted 27 June, 2024; originally announced June 2024.

    Comments: To appear in ISCA 2024. arXiv admin note: substantial text overlap with arXiv:2311.12527

  2. arXiv:2406.16896  [pdf, other

    eess.SP cs.LG

    f-GAN: A frequency-domain-constrained generative adversarial network for PPG to ECG synthesis

    Authors: Nathan C. L. Kong, Dae Lee, Huyen Do, Dae Hoon Park, Cong Xu, Hongda Mao, Jonathan Chung

    Abstract: Electrocardiograms (ECGs) and photoplethysmograms (PPGs) are generally used to monitor an individual's cardiovascular health. In clinical settings, ECGs and fingertip PPGs are the main signals used for assessing cardiovascular health, but the equipment necessary for their collection precludes their use in daily monitoring. Although PPGs obtained from wrist-worn devices are susceptible to noise due… ▽ More

    Submitted 15 May, 2024; originally announced June 2024.

  3. arXiv:2406.16005  [pdf, other

    cs.DC

    A Tale of Two Paths: Toward a Hybrid Data Plane for Efficient Far-Memory Applications

    Authors: Lei Chen, Shi Liu, Chenxi Wang, Haoran Ma, Yifan Qiao, Zhe Wang, Chenggang Wu, Youyou Lu, Xiaobing Feng, Huimin Cui, Shan Lu, Harry Xu

    Abstract: With rapid advances in network hardware, far memory has gained a great deal of traction due to its ability to break the memory capacity wall. Existing far memory systems fall into one of two data paths: one that uses the kernel's paging system to transparently access far memory at the page granularity, and a second that bypasses the kernel, fetching data at the object granularity. While it is gene… ▽ More

    Submitted 23 June, 2024; originally announced June 2024.

  4. arXiv:2406.15658  [pdf, other

    cs.CV cs.AI

    TorchSpatial: A Location Encoding Framework and Benchmark for Spatial Representation Learning

    Authors: Nemin Wu, Qian Cao, Zhangyu Wang, Ze** Liu, Yanlin Qi, Jielu Zhang, Joshua Ni, Xiaobai Yao, Hongxu Ma, Lan Mu, Stefano Ermon, Tanuja Ganu, Akshay Nambi, Ni Lao, Gengchen Mai

    Abstract: Spatial representation learning (SRL) aims at learning general-purpose neural network representations from various types of spatial data (e.g., points, polylines, polygons, networks, images, etc.) in their native formats. Learning good spatial representations is a fundamental problem for various downstream applications such as species distribution modeling, weather forecasting, trajectory generati… ▽ More

    Submitted 21 June, 2024; originally announced June 2024.

    Comments: 9 pages, 2 figures. Submitted to NeurIPS 2024 Datasets and Benchmarks Track. Under review

  5. arXiv:2406.14842  [pdf, other

    q-bio.GN cs.HC

    Online t-SNE for single-cell RNA-seq

    Authors: Hui Ma, Kai Chen

    Abstract: Due to the sequential sample arrival, changing experiment conditions, and evolution of knowledge, the demand to continually visualize evolving structures of sequential and diverse single-cell RNA-sequencing (scRNA-seq) data becomes indispensable. However, as one of the state-of-the-art visualization and analysis methods for scRNA-seq, t-distributed stochastic neighbor embedding (t-SNE) merely visu… ▽ More

    Submitted 20 June, 2024; originally announced June 2024.

  6. arXiv:2406.14585  [pdf

    physics.comp-ph cs.LG physics.data-an physics.optics

    Deep-learning-assisted reconfigurable metasurface antenna for real-time holographic beam steering

    Authors: Hyunjun Ma, **-soo Kim, Jong-Ho Choe, Q-Han Park

    Abstract: We propose a metasurface antenna capable of real time holographic beam steering. An array of reconfigurable dipoeles can generate on demand far field patterns of radiation through the specific encoding of meta atomic states. i.e., the configuration of each dipole. Suitable states for the generation of the desired patterns can be identified using iteartion, but this is very slow and needs to be don… ▽ More

    Submitted 19 June, 2024; originally announced June 2024.

    Journal ref: Nanophotonics 12.13 (2023): 2415-2423

  7. arXiv:2406.13873  [pdf, other

    cs.AI

    A Pure Transformer Pretraining Framework on Text-attributed Graphs

    Authors: Yu Song, Haitao Mao, Jiachen Xiao, **gzhe Liu, Zhikai Chen, Wei **, Carl Yang, Jiliang Tang, Hui Liu

    Abstract: Pretraining plays a pivotal role in acquiring generalized knowledge from large-scale data, achieving remarkable successes as evidenced by large models in CV and NLP. However, progress in the graph domain remains limited due to fundamental challenges such as feature heterogeneity and structural heterogeneity. Recently, increasing efforts have been made to enhance node feature quality with Large Lan… ▽ More

    Submitted 19 June, 2024; originally announced June 2024.

  8. arXiv:2406.13281  [pdf, other

    cs.CV

    ECAFormer: Low-light Image Enhancement using Cross Attention

    Authors: Yudi Ruan, Hao Ma, Weikai Li, Xiao Wang

    Abstract: Low-light image enhancement (LLIE) is vital for autonomous driving. Despite the importance, existing LLIE methods often prioritize robustness in overall brightness adjustment, which can come at the expense of detail preservation. To overcome this limitation,we propose the Hierarchical Mutual Enhancement via Cross-Attention transformer (ECAFormer), a novel network that utilizes Dual Multi-head Self… ▽ More

    Submitted 19 June, 2024; originally announced June 2024.

  9. arXiv:2406.12465  [pdf, other

    cs.CY cs.AI cs.IR

    RIGL: A Unified Reciprocal Approach for Tracing the Independent and Group Learning Processes

    Authors: Xiaoshan Yu, Chuan Qin, Dazhong Shen, Shangshang Yang, Hai** Ma, Hengshu Zhu, Xingyi Zhang

    Abstract: In the realm of education, both independent learning and group learning are esteemed as the most classic paradigms. The former allows learners to self-direct their studies, while the latter is typically characterized by teacher-directed scenarios. Recent studies in the field of intelligent education have leveraged deep temporal models to trace the learning process, capturing the dynamics of studen… ▽ More

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: Accepted by KDD 2024. 12 pages

  10. arXiv:2406.10727  [pdf, other

    cs.LG

    Text-space Graph Foundation Models: Comprehensive Benchmarks and New Insights

    Authors: Zhikai Chen, Haitao Mao, **gzhe Liu, Yu Song, Bingheng Li, Wei **, Bahare Fatemi, Anton Tsitsulin, Bryan Perozzi, Hui Liu, Jiliang Tang

    Abstract: Given the ubiquity of graph data and its applications in diverse domains, building a Graph Foundation Model (GFM) that can work well across different graphs and tasks with a unified backbone has recently garnered significant interests. A major obstacle to achieving this goal stems from the fact that graphs from different domains often exhibit diverse node features. Inspired by multi-modal models t… ▽ More

    Submitted 15 June, 2024; originally announced June 2024.

    Comments: Preliminary version: if you find any mistakes regarding the evaluation, feel free to contact the first author

  11. arXiv:2406.08961  [pdf, other

    q-bio.BM cs.LG

    SIU: A Million-Scale Structural Small Molecule-Protein Interaction Dataset for Unbiased Bioactivity Prediction

    Authors: Yanwen Huang, Bowen Gao, Yinjun Jia, Hongbo Ma, Wei-Ying Ma, Ya-Qin Zhang, Yanyan Lan

    Abstract: Small molecules play a pivotal role in modern medicine, and scrutinizing their interactions with protein targets is essential for the discovery and development of novel, life-saving therapeutics. The term "bioactivity" encompasses various biological effects resulting from these interactions, including both binding and functional responses. The magnitude of bioactivity dictates the therapeutic or t… ▽ More

    Submitted 13 June, 2024; originally announced June 2024.

  12. arXiv:2406.05086  [pdf, other

    math.OC cs.AI cs.GT

    Robust Reward Design for Markov Decision Processes

    Authors: Shuo Wu, Haoxiang Ma, Jie Fu, Shuo Han

    Abstract: The problem of reward design examines the interaction between a leader and a follower, where the leader aims to shape the follower's behavior to maximize the leader's payoff by modifying the follower's reward function. Current approaches to reward design rely on an accurate model of how the follower responds to reward modifications, which can be sensitive to modeling inaccuracies. To address this… ▽ More

    Submitted 7 June, 2024; originally announced June 2024.

    Comments: 50 pages, 8 figures

  13. Interpretable Multimodal Out-of-context Detection with Soft Logic Regularization

    Authors: Huanhuan Ma, **ghao Zhang, Qiang Liu, Shu Wu, Liang Wang

    Abstract: The rapid spread of information through mobile devices and media has led to the widespread of false or deceptive news, causing significant concerns in society. Among different types of misinformation, image repurposing, also known as out-of-context misinformation, remains highly prevalent and effective. However, current approaches for detecting out-of-context misinformation often lack interpretabi… ▽ More

    Submitted 7 June, 2024; originally announced June 2024.

    Comments: ICASSP 2024 lecture paper

  14. arXiv:2406.04689  [pdf, other

    cs.CV

    CDeFuse: Continuous Decomposition for Infrared and Visible Image Fusion

    Authors: Haolong Ma, Hui Li, Chunyang Cheng, Xiaoning Song, Zhongwei Shen

    Abstract: As a common image processing technique, image decomposition is often used to extract complementary information between modalities. In current decomposition-based image fusion methods, typically, source images are decomposed into three parts at single scale (i.e., visible-exclusive part, infrared-exclusive part, and common part) and lacking interaction between modalities during the decomposition pr… ▽ More

    Submitted 7 June, 2024; originally announced June 2024.

  15. arXiv:2406.03999  [pdf, other

    cs.LG cs.CV

    Unveiling the Dynamics of Information Interplay in Supervised Learning

    Authors: Kun Song, Zhiquan Tan, Bochao Zou, Huimin Ma, Weiran Huang

    Abstract: In this paper, we use matrix information theory as an analytical tool to analyze the dynamics of the information interplay between data representations and classification head vectors in the supervised learning process. Specifically, inspired by the theory of Neural Collapse, we introduce matrix mutual information ratio (MIR) and matrix entropy difference ratio (HDR) to assess the interactions of… ▽ More

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: Accepted by ICML 2024

  16. arXiv:2406.03917  [pdf, other

    cs.CV

    Frequency-based Matcher for Long-tailed Semantic Segmentation

    Authors: Shan Li, Lu Yang, Pu Cao, Liulei Li, Huadong Ma

    Abstract: The successful application of semantic segmentation technology in the real world has been among the most exciting achievements in the computer vision community over the past decade. Although the long-tailed phenomenon has been investigated in many fields, e.g., classification and object detection, it has not received enough attention in semantic segmentation and has become a non-negligible obstacl… ▽ More

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: Accepted for publication as a Regular paper in the IEEE Transactions on Multimedia

  17. arXiv:2406.02803  [pdf, other

    cs.DC

    DRust: Language-Guided Distributed Shared Memory with Fine Granularity, Full Transparency, and Ultra Efficiency

    Authors: Haoran Ma, Yifan Qiao, Shi Liu, Shan Yu, Yuanjiang Ni, Qingda Lu, Jiesheng Wu, Yiying Zhang, Miryung Kim, Harry Xu

    Abstract: Despite being a powerful concept, distributed shared memory (DSM) has not been made practical due to the extensive synchronization needed between servers to implement memory coherence. This paper shows a practical DSM implementation based on the insight that the ownership model embedded in programming languages such as Rust automatically constrains the order of read and write, providing opportunit… ▽ More

    Submitted 27 June, 2024; v1 submitted 4 June, 2024; originally announced June 2024.

  18. arXiv:2406.02378  [pdf, other

    cs.CL

    On the Intrinsic Self-Correction Capability of LLMs: Uncertainty and Latent Concept

    Authors: Guangliang Liu, Haitao Mao, Bochuan Cao, Zhiyu Xue, Kristen Johnson, Jiliang Tang, Rongrong Wang

    Abstract: Large Language Models (LLMs) can improve their responses when instructed to do so, a capability known as self-correction. When these instructions lack specific details about the issues in the response, this is referred to as leveraging the intrinsic self-correction capability. The empirical success of self-correction can be found in various applications, e.g., text detoxification and social bias m… ▽ More

    Submitted 4 June, 2024; originally announced June 2024.

    Comments: 22 pages, 7 figures

  19. arXiv:2406.01908  [pdf, other

    cs.LG math.OC

    PDHG-Unrolled Learning-to-Optimize Method for Large-Scale Linear Programming

    Authors: Bingheng Li, Linxin Yang, Yupeng Chen, Senmiao Wang, Qian Chen, Haitao Mao, Yao Ma, Akang Wang, Tian Ding, Jiliang Tang, Ruoyu Sun

    Abstract: Solving large-scale linear programming (LP) problems is an important task in various areas such as communication networks, power systems, finance and logistics. Recently, two distinct approaches have emerged to expedite LP solving: (i) First-order methods (FOMs); (ii) Learning to optimize (L2O). In this work, we propose an FOM-unrolled neural network (NN) called PDHG-Net, and propose a two-stage L… ▽ More

    Submitted 6 June, 2024; v1 submitted 3 June, 2024; originally announced June 2024.

    Comments: Accepted by ICML 2024

  20. arXiv:2406.01899  [pdf, other

    cs.LG

    Cross-Domain Graph Data Scaling: A Showcase with Diffusion Models

    Authors: Wenzhuo Tang, Haitao Mao, Danial Dervovic, Ivan Brugere, Saumitra Mishra, Yuying Xie, Jiliang Tang

    Abstract: Models for natural language and images benefit from data scaling behavior: the more data fed into the model, the better they perform. This 'better with more' phenomenon enables the effectiveness of large-scale pre-training on vast amounts of data. However, current graph pre-training methods struggle to scale up data due to heterogeneity across graphs. To achieve effective data scaling, we aim to d… ▽ More

    Submitted 3 June, 2024; originally announced June 2024.

  21. arXiv:2405.20881  [pdf, other

    cs.CV

    S4Fusion: Saliency-aware Selective State Space Model for Infrared Visible Image Fusion

    Authors: Haolong Ma, Hui Li, Chunyang Cheng, Gaoang Wang, Xiaoning Song, Xiaojun Wu

    Abstract: As one of the tasks in Image Fusion, Infrared and Visible Image Fusion aims to integrate complementary information captured by sensors of different modalities into a single image. The Selective State Space Model (SSSM), known for its ability to capture long-range dependencies, has demonstrated its potential in the field of computer vision. However, in image fusion, current methods underestimate th… ▽ More

    Submitted 3 June, 2024; v1 submitted 31 May, 2024; originally announced May 2024.

  22. arXiv:2405.18731  [pdf, other

    eess.SP cs.AI physics.comp-ph

    VBIM-Net: Variational Born Iterative Network for Inverse Scattering Problems

    Authors: Ziqing Xing, Zhaoyang Zhang, Zirui Chen, Yusong Wang, Haoran Ma, Zhun Wei, Gang Bao

    Abstract: Recently, studies have shown the potential of integrating field-type iterative methods with deep learning (DL) techniques in solving inverse scattering problems (ISPs). In this article, we propose a novel Variational Born Iterative Network, namely, VBIM-Net, to solve the full-wave ISPs with significantly improved flexibility and inversion quality. The proposed VBIM-Net emulates the alternating upd… ▽ More

    Submitted 28 May, 2024; originally announced May 2024.

    Comments: 14 pages, 21 figures

  23. arXiv:2405.17152  [pdf, other

    cs.MA cs.AI

    CoSLight: Co-optimizing Collaborator Selection and Decision-making to Enhance Traffic Signal Control

    Authors: **gqing Ruan, Ziyue Li, Hua Wei, Haoyuan Jiang, Jiaming Lu, Xuantang Xiong, Hangyu Mao, Rui Zhao

    Abstract: Effective multi-intersection collaboration is pivotal for reinforcement-learning-based traffic signal control to alleviate congestion. Existing work mainly chooses neighboring intersections as collaborators. However, quite an amount of congestion, even some wide-range congestion, is caused by non-neighbors failing to collaborate. To address these issues, we propose to separate the collaborator sel… ▽ More

    Submitted 19 June, 2024; v1 submitted 27 May, 2024; originally announced May 2024.

    Comments: Accepted by KDD 2024

  24. arXiv:2405.16363  [pdf, other

    cs.IR cs.AI

    LLMs for User Interest Exploration in Large-scale Recommendation Systems

    Authors: Jianling Wang, Haokai Lu, Yifan Liu, He Ma, Yueqi Wang, Yang Gu, Shuzhou Zhang, Ningren Han, Shuchao Bi, Lexi Baugher, Ed Chi, Minmin Chen

    Abstract: Traditional recommendation systems are subject to a strong feedback loop by learning from and reinforcing past user-item interactions, which in turn limits the discovery of novel user interests. To address this, we introduce a hybrid hierarchical framework combining Large Language Models (LLMs) and classic recommendation models for user interest exploration. The framework controls the interfacing… ▽ More

    Submitted 7 June, 2024; v1 submitted 25 May, 2024; originally announced May 2024.

  25. arXiv:2405.15199  [pdf, other

    cs.CV

    ODGEN: Domain-specific Object Detection Data Generation with Diffusion Models

    Authors: **gyuan Zhu, Shiyu Li, Yuxuan Liu, ** Huang, Jiulong Shan, Huimin Ma, Jian Yuan

    Abstract: Modern diffusion-based image generative models have made significant progress and become promising to enrich training data for the object detection task. However, the generation quality and the controllability for complex scenes containing multi-class objects and dense objects with occlusions remain limited. This paper presents ODGEN, a novel method to generate high-quality images conditioned on b… ▽ More

    Submitted 24 May, 2024; originally announced May 2024.

  26. arXiv:2405.15124  [pdf, other

    cs.LG cs.AI

    Scaling Law for Time Series Forecasting

    Authors: **gzhe Shi, Qinwei Ma, Huan Ma, Lei Li

    Abstract: Scaling law that rewards large datasets, complex models and enhanced data granularity has been observed in various fields of deep learning. Yet, studies on time series forecasting have cast doubt on scaling behaviors of deep learning methods for time series forecasting: while more training data improves performance, more capable models do not always outperform less capable models, and longer input… ▽ More

    Submitted 26 May, 2024; v1 submitted 23 May, 2024; originally announced May 2024.

    Comments: 20 pages

  27. arXiv:2405.13049  [pdf, other

    cs.CL cs.AI cs.MM

    SemEval-2024 Task 3: Multimodal Emotion Cause Analysis in Conversations

    Authors: Fanfan Wang, Heqing Ma, Jianfei Yu, Rui Xia, Erik Cambria

    Abstract: The ability to understand emotions is an essential component of human-like artificial intelligence, as emotions greatly influence human cognition, decision making, and social interactions. In addition to emotion recognition in conversations, the task of identifying the potential causes behind an individual's emotional state in conversations, is of great importance in many application scenarios. We… ▽ More

    Submitted 10 June, 2024; v1 submitted 19 May, 2024; originally announced May 2024.

    Comments: 12 pages, 3 figures, 4 Tables

    Journal ref: Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024)

  28. arXiv:2405.12850  [pdf, other

    cs.CV

    Weakly supervised alignment and registration of MR-CT for cervical cancer radiotherapy

    Authors: Jjahao Zhang, Yin Gu, Deyu Sun, Yuhua Gao, Ming Gao, Ming Cui, Teng Zhang, He Ma

    Abstract: Cervical cancer is one of the leading causes of death in women, and brachytherapy is currently the primary treatment method. However, it is important to precisely define the extent of paracervical tissue invasion to improve cancer diagnosis and treatment options. The fusion of the information characteristics of both computed tomography (CT) and magnetic resonance imaging(MRI) modalities may be use… ▽ More

    Submitted 21 May, 2024; originally announced May 2024.

  29. arXiv:2405.11715  [pdf, other

    cs.AI cs.LG

    Semantic Trajectory Data Mining with LLM-Informed POI Classification

    Authors: Yifan Liu, Chenchen Kuai, Haoxuan Ma, Xishun Liao, Brian Yueshuai He, Jiaqi Ma

    Abstract: Human travel trajectory mining is crucial for transportation systems, enhancing route optimization, traffic management, and the study of human travel patterns. Previous rule-based approaches without the integration of semantic information show a limitation in both efficiency and accuracy. Semantic information, such as activity types inferred from Points of Interest (POI) data, can significantly en… ▽ More

    Submitted 19 May, 2024; originally announced May 2024.

  30. arXiv:2405.10121  [pdf, other

    cs.CL cs.MM

    Distilling Implicit Multimodal Knowledge into LLMs for Zero-Resource Dialogue Generation

    Authors: Bo Zhang, Hui Ma, Jian Ding, Jian Wang, Bo Xu, Hongfei Lin

    Abstract: Integrating multimodal knowledge into large language models (LLMs) represents a significant advancement in dialogue generation capabilities. However, the effective incorporation of such knowledge in zero-resource scenarios remains a substantial challenge due to the scarcity of diverse, high-quality dialogue datasets. To address this, we propose the Visual Implicit Knowledge Distillation Framework… ▽ More

    Submitted 16 May, 2024; originally announced May 2024.

    Comments: Under Review

  31. arXiv:2405.09593  [pdf, other

    cs.DB cs.AI

    SQL-to-Schema Enhances Schema Linking in Text-to-SQL

    Authors: Sun Yang, Qiong Su, Zhishuai Li, Ziyue Li, Hangyu Mao, Chenxi Liu, Rui Zhao

    Abstract: In sophisticated existing Text-to-SQL methods exhibit errors in various proportions, including schema-linking errors (incorrect columns, tables, or extra columns), join errors, nested errors, and group-by errors. Consequently, there is a critical need to filter out unnecessary tables and columns, directing the language models attention to relevant tables and columns with schema-linking, to reduce… ▽ More

    Submitted 15 May, 2024; originally announced May 2024.

  32. arXiv:2405.09148  [pdf, ps, other

    cs.CV

    A Hierarchically Feature Reconstructed Autoencoder for Unsupervised Anomaly Detection

    Authors: Honghui Chen, **** Chen, Huan Mao, Mengxi Jiang

    Abstract: Anomaly detection and localization without any manual annotations and prior knowledge is a challenging task under the setting of unsupervised learning. The existing works achieve excellent performance in the anomaly detection, but with complex networks or cumbersome pipelines. To address this issue, this paper explores a simple but effective architecture in the anomaly detection. It consists of a… ▽ More

    Submitted 15 May, 2024; originally announced May 2024.

    Comments: 12 pages, 4 figures

    MSC Class: 68T01 ACM Class: I.2.10

  33. arXiv:2405.08547  [pdf, other

    cs.CV

    Exploring Graph-based Knowledge: Multi-Level Feature Distillation via Channels Relational Graph

    Authors: Zhiwei Wang, Jun Huang, Longhua Ma, Chengyu Wu, Hongyu Ma

    Abstract: In visual tasks, large teacher models capture essential features and deep information, enhancing performance. However, distilling this information into smaller student models often leads to performance loss due to structural differences and capacity limitations. To tackle this, we propose a distillation framework based on graph knowledge, including a multi-level feature alignment strategy and an a… ▽ More

    Submitted 16 May, 2024; v1 submitted 14 May, 2024; originally announced May 2024.

  34. arXiv:2405.07104  [pdf, other

    cs.RO

    Uncertainty-Aware Shape Estimation of a Surgical Continuum Manipulator in Constrained Environments using Fiber Bragg Grating Sensors

    Authors: Alexander Schwarz, Arian Mehrfard, Golchehr Amirkhani, Henry Phalen, Justin H. Ma, Robert B. Grupp, Alejandro Martin-Gomez, Mehran Armand

    Abstract: Continuum Dexterous Manipulators (CDMs) are well-suited tools for minimally invasive surgery due to their inherent dexterity and reachability. Nonetheless, their flexible structure and non-linear curvature pose significant challenges for shape-based feedback control. The use of Fiber Bragg Grating (FBG) sensors for shape sensing has shown great potential in estimating the CDM's tip position and su… ▽ More

    Submitted 11 May, 2024; originally announced May 2024.

  35. arXiv:2405.05787  [pdf, other

    cs.RO cs.CV eess.SY

    Autonomous Robotic Ultrasound System for Liver Follow-up Diagnosis: Pilot Phantom Study

    Authors: Tianpeng Zhang, Sekeun Kim, Jerome Charton, Haitong Ma, Kyungsang Kim, Na Li, Quanzheng Li

    Abstract: The paper introduces a novel autonomous robot ultrasound (US) system targeting liver follow-up scans for outpatients in local communities. Given a computed tomography (CT) image with specific target regions of interest, the proposed system carries out the autonomous follow-up scan in three steps: (i) initial robot contact to surface, (ii) coordinate map** between CT image and robot, and (iii) ta… ▽ More

    Submitted 9 May, 2024; originally announced May 2024.

  36. arXiv:2405.03978  [pdf, other

    cs.CV

    VMambaCC: A Visual State Space Model for Crowd Counting

    Authors: Hao-Yuan Ma, Li Zhang, Shuai Shi

    Abstract: As a deep learning model, Visual Mamba (VMamba) has a low computational complexity and a global receptive field, which has been successful applied to image classification and detection. To extend its applications, we apply VMamba to crowd counting and propose a novel VMambaCC (VMamba Crowd Counting) model. Naturally, VMambaCC inherits the merits of VMamba, or global modeling for images and low com… ▽ More

    Submitted 6 May, 2024; originally announced May 2024.

  37. arXiv:2405.01828  [pdf, other

    cs.CV

    FER-YOLO-Mamba: Facial Expression Detection and Classification Based on Selective State Space

    Authors: Hui Ma, Sen Lei, Turgay Celik, Heng-Chao Li

    Abstract: Facial Expression Recognition (FER) plays a pivotal role in understanding human emotional cues. However, traditional FER methods based on visual information have some limitations, such as preprocessing, feature extraction, and multi-stage classification procedures. These not only increase computational complexity but also require a significant amount of computing resources. Considering Convolution… ▽ More

    Submitted 9 May, 2024; v1 submitted 2 May, 2024; originally announced May 2024.

  38. arXiv:2404.16687  [pdf, other

    cs.CV

    NTIRE 2024 Quality Assessment of AI-Generated Content Challenge

    Authors: Xiaohong Liu, Xiongkuo Min, Guangtao Zhai, Chunyi Li, Tengchuan Kou, Wei Sun, Haoning Wu, Yixuan Gao, Yuqin Cao, Zicheng Zhang, Xiele Wu, Radu Timofte, Fei Peng, Huiyuan Fu, Anlong Ming, Chuanming Wang, Huadong Ma, Shuai He, Zifei Dou, Shu Chen, Huacong Zhang, Haiyi Xie, Chengwei Wang, Baoying Chen, Jishen Zeng , et al. (89 additional authors not shown)

    Abstract: This paper reports on the NTIRE 2024 Quality Assessment of AI-Generated Content Challenge, which will be held in conjunction with the New Trends in Image Restoration and Enhancement Workshop (NTIRE) at CVPR 2024. This challenge is to address a major challenge in the field of image and video processing, namely, Image Quality Assessment (IQA) and Video Quality Assessment (VQA) for AI-Generated Conte… ▽ More

    Submitted 7 May, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

  39. arXiv:2404.14934  [pdf, other

    cs.MM cs.CV cs.HC

    G3R: Generating Rich and Fine-grained mmWave Radar Data from 2D Videos for Generalized Gesture Recognition

    Authors: Kaikai Deng, Dong Zhao, Wenxin Zheng, Yue Ling, Kangwen Yin, Huadong Ma

    Abstract: Millimeter wave radar is gaining traction recently as a promising modality for enabling pervasive and privacy-preserving gesture recognition. However, the lack of rich and fine-grained radar datasets hinders progress in develo** generalized deep learning models for gesture recognition across various user postures (e.g., standing, sitting), positions, and scenes. To remedy this, we resort to desi… ▽ More

    Submitted 23 April, 2024; originally announced April 2024.

    Comments: 18 pages, 29 figures

  40. arXiv:2404.14928  [pdf, other

    cs.LG cs.AI cs.CL cs.SI

    Graph Machine Learning in the Era of Large Language Models (LLMs)

    Authors: Wenqi Fan, Shijie Wang, Jiani Huang, Zhikai Chen, Yu Song, Wenzhuo Tang, Haitao Mao, Hui Liu, Xiaorui Liu, Dawei Yin, Qing Li

    Abstract: Graphs play an important role in representing complex relationships in various domains like social networks, knowledge graphs, and molecular discovery. With the advent of deep learning, Graph Neural Networks (GNNs) have emerged as a cornerstone in Graph Machine Learning (Graph ML), facilitating the representation and processing of graph structures. Recently, LLMs have demonstrated unprecedented ca… ▽ More

    Submitted 3 June, 2024; v1 submitted 23 April, 2024; originally announced April 2024.

  41. arXiv:2404.12090  [pdf, other

    cs.AI

    X-Light: Cross-City Traffic Signal Control Using Transformer on Transformer as Meta Multi-Agent Reinforcement Learner

    Authors: Haoyuan Jiang, Ziyue Li, Hua Wei, Xuantang Xiong, **gqing Ruan, Jiaming Lu, Hangyu Mao, Rui Zhao

    Abstract: The effectiveness of traffic light control has been significantly improved by current reinforcement learning-based approaches via better cooperation among multiple traffic lights. However, a persisting issue remains: how to obtain a multi-agent traffic signal control algorithm with remarkable transferability across diverse cities? In this paper, we propose a Transformer on Transformer (TonT) model… ▽ More

    Submitted 17 June, 2024; v1 submitted 18 April, 2024; originally announced April 2024.

    Comments: Accepted by IJCAI 2024

  42. arXiv:2404.11313  [pdf, other

    eess.IV cs.AI

    NTIRE 2024 Challenge on Short-form UGC Video Quality Assessment: Methods and Results

    Authors: Xin Li, Kun Yuan, Ya**g Pei, Yiting Lu, Ming Sun, Chao Zhou, Zhibo Chen, Radu Timofte, Wei Sun, Haoning Wu, Zicheng Zhang, Jun Jia, Zhichao Zhang, Linhan Cao, Qiubo Chen, Xiongkuo Min, Weisi Lin, Guangtao Zhai, Jianhui Sun, Tianyi Wang, Lei Li, Han Kong, Wenxuan Wang, Bing Li, Cheng Luo , et al. (43 additional authors not shown)

    Abstract: This paper reviews the NTIRE 2024 Challenge on Shortform UGC Video Quality Assessment (S-UGC VQA), where various excellent solutions are submitted and evaluated on the collected dataset KVQ from popular short-form video platform, i.e., Kuaishou/Kwai Platform. The KVQ database is divided into three parts, including 2926 videos for training, 420 videos for validation, and 854 videos for testing. The… ▽ More

    Submitted 17 April, 2024; originally announced April 2024.

    Comments: Accepted by CVPR2024 Workshop. The challenge report for CVPR NTIRE2024 Short-form UGC Video Quality Assessment Challenge

  43. arXiv:2404.10160  [pdf, other

    cs.AI

    Reinforcement Learning from Multi-role Debates as Feedback for Bias Mitigation in LLMs

    Authors: Ruoxi Cheng, Haoxuan Ma, Shuirong Cao, Jiaqi Li, Aihua Pei, Zhiqiang Wang, Pengliang Ji, Haoyu Wang, Jiaqi Huo

    Abstract: Bias in LLMs can harm user experience and societal outcomes. However, current bias mitigation methods often require intensive human feedback, lack transferability to other topics or yield overconfident and random outputs. We find that involving LLMs in role-playing scenario boosts their ability to recognize and mitigate biases. Based on this, we propose Reinforcement Learning from Multi-role Debat… ▽ More

    Submitted 18 June, 2024; v1 submitted 15 April, 2024; originally announced April 2024.

    Comments: The first three authors contributed equally to this work

  44. arXiv:2404.08450  [pdf, other

    cs.CV

    Joint Physical-Digital Facial Attack Detection Via Simulating Spoofing Clues

    Authors: Xianhua He, Dashuang Liang, Song Yang, Zhanlong Hao, Hui Ma, Binjie Mao, Xi Li, Yao Wang, Pengfei Yan, Ajian Liu

    Abstract: Face recognition systems are frequently subjected to a variety of physical and digital attacks of different types. Previous methods have achieved satisfactory performance in scenarios that address physical attacks and digital attacks, respectively. However, few methods are considered to integrate a model that simultaneously addresses both physical and digital attacks, implying the necessity to dev… ▽ More

    Submitted 12 April, 2024; originally announced April 2024.

    Comments: 10 pages with 6 figures, Accepted by CVPRW 2024

  45. arXiv:2404.08089  [pdf, other

    cs.LG math.OC

    Efficient Duple Perturbation Robustness in Low-rank MDPs

    Authors: Yang Hu, Haitong Ma, Bo Dai, Na Li

    Abstract: The pursuit of robustness has recently been a popular topic in reinforcement learning (RL) research, yet the existing methods generally suffer from efficiency issues that obstruct their real-world implementation. In this paper, we introduce duple perturbation robustness, i.e. perturbation on both the feature and factor vectors for low-rank Markov decision processes (MDPs), via a novel characteriza… ▽ More

    Submitted 11 April, 2024; originally announced April 2024.

    Comments: 25 pages, 8 figures, in submission to ICML'24

  46. arXiv:2404.06483  [pdf, other

    cs.CV

    RhythmMamba: Fast Remote Physiological Measurement with Arbitrary Length Videos

    Authors: Bochao Zou, Zizheng Guo, Xiaocheng Hu, Huimin Ma

    Abstract: Remote photoplethysmography (rPPG) is a non-contact method for detecting physiological signals from facial videos, holding great potential in various applications such as healthcare, affective computing, and anti-spoofing. Existing deep learning methods struggle to address two core issues of rPPG simultaneously: extracting weak rPPG signals from video segments with large spatiotemporal redundancy… ▽ More

    Submitted 9 April, 2024; originally announced April 2024.

    Comments: arXiv admin note: text overlap with arXiv:2402.12788

  47. arXiv:2404.06036  [pdf, other

    cs.CV

    Space-Time Video Super-resolution with Neural Operator

    Authors: Yuantong Zhang, Hanyou Zheng, Daiqin Yang, Zhenzhong Chen, Haichuan Ma, Wenpeng Ding

    Abstract: This paper addresses the task of space-time video super-resolution (ST-VSR). Existing methods generally suffer from inaccurate motion estimation and motion compensation (MEMC) problems for large motions. Inspired by recent progress in physics-informed neural networks, we model the challenges of MEMC in ST-VSR as a map** between two continuous function spaces. Specifically, our approach transform… ▽ More

    Submitted 9 April, 2024; originally announced April 2024.

  48. arXiv:2404.05318  [pdf, other

    cs.LG cs.RO

    Stochastic Online Optimization for Cyber-Physical and Robotic Systems

    Authors: Hao Ma, Melanie Zeilinger, Michael Muehlebach

    Abstract: We propose a novel gradient-based online optimization framework for solving stochastic programming problems that frequently arise in the context of cyber-physical and robotic systems. Our problem formulation accommodates constraints that model the evolution of a cyber-physical system, which has, in general, a continuous state and action space, is nonlinear, and where the state is only partially ob… ▽ More

    Submitted 8 April, 2024; originally announced April 2024.

    Comments: 46 pages, 16 figures

  49. arXiv:2404.05221  [pdf, other

    cs.CL cs.AI

    LLM Reasoners: New Evaluation, Library, and Analysis of Step-by-Step Reasoning with Large Language Models

    Authors: Shibo Hao, Yi Gu, Haotian Luo, Tianyang Liu, Xiyan Shao, Xinyuan Wang, Shuhua Xie, Haodi Ma, Adithya Samavedhi, Qiyue Gao, Zhen Wang, Zhiting Hu

    Abstract: Generating accurate step-by-step reasoning is essential for Large Language Models (LLMs) to address complex problems and enhance robustness and interpretability. Despite the flux of research on develo** advanced reasoning approaches, systematically analyzing the diverse LLMs and reasoning strategies in generating reasoning chains remains a significant challenge. The difficulties stem from the la… ▽ More

    Submitted 8 April, 2024; originally announced April 2024.

    Comments: Project website: https://www.llm-reasoners.net/

  50. arXiv:2404.05051  [pdf, other

    cs.LG cs.RO

    Skill Transfer and Discovery for Sim-to-Real Learning: A Representation-Based Viewpoint

    Authors: Haitong Ma, Zhaolin Ren, Bo Dai, Na Li

    Abstract: We study sim-to-real skill transfer and discovery in the context of robotics control using representation learning. We draw inspiration from spectral decomposition of Markov decision processes. The spectral decomposition brings about representation that can linearly represent the state-action value function induced by any policies, thus can be regarded as skills. The skill representations are tran… ▽ More

    Submitted 7 April, 2024; originally announced April 2024.

    Comments: 9 pages, 6 figures. Project page: https://congharvard.github.io/steady-sim-to-real/