Showing 1–50 of 625 results for author: Mao, W

  1. arXiv:2407.01537  [pdf, other]

    cs.RO cs.CV

    WaveShot: A Compact Portable Unmanned Surface Vessel for Dynamic Water Surface Videography and Media Production

    Authors: Shijian Ma, Shicong Ma, Weize Ma

    Abstract: This paper presents WaveShot, an innovative portable unmanned surface vessel that aims to transform water surface videography by offering a highly maneuverable, cost-effective, and safe alternative to traditional filming methods. WaveShot is specially designed for the modern demands of film production, advertising, documentaries, and visual arts, equipped with professional-grade waterproof cameras…

    Submitted 12 March, 2024; originally announced July 2024.

  2. arXiv:2407.01220  [pdf, other]

    cs.CV

    Fast and Efficient: Mask Neural Fields for 3D Scene Segmentation

    Authors: Zihan Gao, Lingling Li, Licheng Jiao, Fang Liu, Xu Liu, Wenping Ma, Yuwei Guo, Shuyuan Yang

    Abstract: Understanding 3D scenes is a crucial challenge in computer vision research with applications spanning multiple domains. Recent advancements in distilling 2D vision-language foundation models into neural fields, like NeRF and 3DGS, enables open-vocabulary segmentation of 3D scenes from 2D multi-view images without the need for precise 3D annotations. While effective, however, the per-pixel distilla…

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: 16 pages, 7 figures

  3. arXiv:2407.00569  [pdf, other]

    cs.CV cs.AI cs.CL

    Investigating and Mitigating the Multimodal Hallucination Snowballing in Large Vision-Language Models

    Authors: Weihong Zhong, Xiaocheng Feng, Liang Zhao, Qiming Li, Lei Huang, Yuxuan Gu, Weitao Ma, Yuan Xu, Bing Qin

    Abstract: Though advanced in understanding visual information with human languages, Large Vision-Language Models (LVLMs) still suffer from multimodal hallucinations. A natural concern is that during multimodal interaction, the generated hallucinations could influence the LVLMs' subsequent generation. Thus, we raise a question: When presented with a query relevant to the previously generated hallucination, w…

    Submitted 29 June, 2024; originally announced July 2024.

    Comments: Accepted to ACL 2024 Main Conference. 21 pages, 20 figures

  4. arXiv:2406.18864  [pdf, other]

    cs.CV

    Learning Modality Knowledge Alignment for Cross-Modality Transfer

    Authors: Wenxuan Ma, Shuang Li, Lincan Cai, Jingxuan Kang

    Abstract: Cross-modality transfer aims to leverage large pretrained models to complete tasks that may not belong to the modality of pretraining data. Existing works achieve certain success in extending classical finetuning to cross-modal scenarios, yet we still lack understanding about the influence of modality gap on the transfer. In this work, a series of experiments focusing on the source representation…

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: ICML 2024

  5. arXiv:2406.18115  [pdf, other]

    cs.RO cs.AI cs.CV

    Open-vocabulary Mobile Manipulation in Unseen Dynamic Environments with 3D Semantic Maps

    Authors: Dicong Qiu, Wenzong Ma, Zhenfu Pan, Hui Xiong, Junwei Liang

    Abstract: Open-Vocabulary Mobile Manipulation (OVMM) is a crucial capability for autonomous robots, especially when faced with the challenges posed by unknown and dynamic environments. This task requires robots to explore and build a semantic understanding of their surroundings, generate feasible plans to achieve manipulation goals, adapt to environmental changes, and comprehend natural language instruction…

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: Open-vocabulary, Mobile Manipulation, Dynamic Environments, 3D Semantic Maps, Zero-shot, LLMs, VLMs, 18 pages, 2 figures

  6. arXiv:2406.17797  [pdf, other]

    physics.chem-ph cs.AI cs.LG

    MoleculeCLA: Rethinking Molecular Benchmark via Computational Ligand-Target Binding Analysis

    Authors: Shikun Feng, Jiaxin Zheng, Yinjun Jia, Yanwen Huang, Fengfeng Zhou, Wei-Ying Ma, Yanyan Lan

    Abstract: Molecular representation learning is pivotal for various molecular property prediction tasks related to drug discovery. Robust and accurate benchmarks are essential for refining and validating current methods. Existing molecular property benchmarks derived from wet experiments, however, face limitations such as data volume constraints, unbalanced label distribution, and noisy labels. To address th…

    Submitted 12 June, 2024; originally announced June 2024.

  7. arXiv:2406.15796  [pdf, other]

    cs.CL

    Rethinking Entity-level Unlearning for Large Language Models

    Authors: Weitao Ma, Xiaocheng Feng, Weihong Zhong, Lei Huang, Yangfan Ye, Bing Qin

    Abstract: Large language model unlearning has gained increasing attention due to its potential to mitigate security and privacy concerns. Current research predominantly focuses on Instance-level unlearning, specifically aiming at forgetting predefined instances of sensitive content. However, a notable gap still exists in exploring the deletion of complete entity-related information, which is crucial in many…

    Submitted 22 June, 2024; originally announced June 2024.

    Comments: Work in progress

  8. arXiv:2406.14884  [pdf, other]

    cs.CL

    FlowBench: Revisiting and Benchmarking Workflow-Guided Planning for LLM-based Agents

    Authors: Ruixuan Xiao, Wentao Ma, Ke Wang, Yuchuan Wu, Junbo Zhao, Haobo Wang, Fei Huang, Yongbin Li

    Abstract: LLM-based agents have emerged as promising tools, which are crafted to fulfill complex tasks by iterative planning and action. However, these agents are susceptible to undesired planning hallucinations when lacking specific knowledge for expertise-intensive tasks. To address this, preliminary attempts are made to enhance planning reliability by incorporating external workflow-related knowledge. De…

    Submitted 21 June, 2024; originally announced June 2024.

  9. arXiv:2406.14506  [pdf, ps, other]

    cs.DS cs.DM math.CO

    Online Matching and Contention Resolution for Edge Arrivals with Vanishing Probabilities

    Authors: Will Ma, Calum MacRury, Pranav Nuti

    Abstract: We study the performance of sequential contention resolution and matching algorithms on random graphs with vanishing edge probabilities. When the edges of the graph are processed in an adversarially-chosen order, we derive a new OCRS that is $0.382$-selectable, attaining the "independence benchmark" from the literature under the vanishing edge probabilities assumption. Complementary to this positi…

    Submitted 20 June, 2024; originally announced June 2024.

    ACM Class: F.2.2; G.2.2

    Journal ref: In EC 2024

  10. Unifying Graph Convolution and Contrastive Learning in Collaborative Filtering

    Authors: Yihong Wu, Le Zhang, Fengran Mo, Tianyu Zhu, Weizhi Ma, Jian-Yun Nie

    Abstract: Graph-based models and contrastive learning have emerged as prominent methods in Collaborative Filtering (CF). While many existing models in CF incorporate these methods in their design, there seems to be a limited depth of analysis regarding the foundational principles behind them. This paper bridges graph convolution, a pivotal element of graph-based models, with contrastive learning through a t…

    Submitted 21 June, 2024; v1 submitted 20 June, 2024; originally announced June 2024.

    Comments: KDD 2024

  11. arXiv:2406.12646  [pdf, other]

    eess.IV cs.AI cs.CV

    An Empirical Study on the Fairness of Foundation Models for Multi-Organ Image Segmentation

    Authors: Qin Li, Yizhe Zhang, Yan Li, Jun Lyu, Meng Liu, Longyu Sun, Mengting Sun, Qirong Li, Wenyue Mao, Xinran Wu, Yajing Zhang, Yinghua Chu, Shuo Wang, Chengyan Wang

    Abstract: The segmentation foundation model, e.g., Segment Anything Model (SAM), has attracted increasing interest in the medical image community. Early pioneering studies primarily concentrated on assessing and improving SAM's performance from the perspectives of overall accuracy and efficiency, yet little attention was given to the fairness considerations. This oversight raises questions about the potenti…

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: Accepted to MICCAI-2024

  12. arXiv:2406.11775  [pdf, other]

    cs.CV cs.AI

    Task Me Anything

    Authors: Jieyu Zhang, Weikai Huang, Zixian Ma, Oscar Michel, Dong He, Tanmay Gupta, Wei-Chiu Ma, Ali Farhadi, Aniruddha Kembhavi, Ranjay Krishna

    Abstract: Benchmarks for large multimodal language models (MLMs) now serve to simultaneously assess the general capabilities of models instead of evaluating for a specific capability. As a result, when a developer wants to identify which models to use for their application, they are overwhelmed by the number of benchmarks and remain uncertain about which benchmark's results are most reflective of their spec…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: website: https://www.task-me-anything.org

  13. arXiv:2406.10744   

    cs.CV

    Technique Report of CVPR 2024 PBDL Challenges

    Authors: Ying Fu, Yu Li, Shaodi You, Boxin Shi, Jose Alvarez, Coert van Gemeren, Linwei Chen, Yunhao Zou, Zichun Wang, Yichen Li, Yuze Han, Yingkai Zhang, Jianan Wang, Qinglin Liu, Wei Yu, Xiaoqian Lv, Jianing Li, Shengping Zhang, Xiangyang Ji, Yuanpei Chen, Yuhan Zhang, Weihang Peng, Liwen Zhang, Zhe Xu, Dingyong Gou, et al. (77 additional authors not shown)

    Abstract: The intersection of physics-based vision and deep learning presents an exciting frontier for advancing computer vision technologies. By leveraging the principles of physics to inform and enhance deep learning models, we can develop more robust and accurate vision systems. Physics-based vision aims to invert the processes to recover scene properties such as shape, reflectance, light distribution, a…

    Submitted 27 June, 2024; v1 submitted 15 June, 2024; originally announced June 2024.

    Comments: The author list and contents need to be verified by all authors

  14. arXiv:2406.10303  [pdf, other]

    cs.CL cs.AI

    A Survey on Large Language Models from General Purpose to Medical Applications: Datasets, Methodologies, and Evaluations

    Authors: Jinqiang Wang, Huansheng Ning, Yi Peng, Qikai Wei, Daniel Tesfai, Wenwei Mao, Tao Zhu, Runhe Huang

    Abstract: Large Language Models (LLMs) have demonstrated surprising performance across various natural language processing tasks. Recently, medical LLMs enhanced with domain-specific knowledge have exhibited excellent capabilities in medical consultation and diagnosis. These models can smoothly simulate doctor-patient dialogues and provide professional medical advice. Most medical LLMs are developed through…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: 20 pages, 3 figures

  15. arXiv:2406.10215  [pdf, other]

    cs.CL cs.LG

    DevBench: A multimodal developmental benchmark for language learning

    Authors: Alvin Wei Ming Tan, Sunny Yu, Bria Long, Wanjing Anya Ma, Tonya Murray, Rebecca D. Silverman, Jason D. Yeatman, Michael C. Frank

    Abstract: How (dis)similar are the learning trajectories of vision-language models and children? Recent modeling work has attempted to understand the gap between models' and humans' data efficiency by constructing models trained on less data, especially multimodal naturalistic data. However, such models are often evaluated on adult-level benchmarks, with limited breadth in language abilities tested, and wit…

    Submitted 14 June, 2024; originally announced June 2024.

  16. arXiv:2406.09613  [pdf, other]

    cs.CV

    ImageNet3D: Towards General-Purpose Object-Level 3D Understanding

    Authors: Wufei Ma, Guanning Zeng, Guofeng Zhang, Qihao Liu, Letian Zhang, Adam Kortylewski, Yaoyao Liu, Alan Yuille

    Abstract: A vision model with general-purpose object-level 3D understanding should be capable of inferring both 2D (e.g., class name and bounding box) and 3D information (e.g., 3D location and 3D viewpoint) for arbitrary rigid objects in natural images. This is a challenging task, as it involves inferring 3D information from 2D signals and most importantly, generalizing to rigid objects from unseen categori…

    Submitted 13 June, 2024; originally announced June 2024.

  17. arXiv:2406.09003  [pdf, other]

    cs.CV cs.LG

    Enhancing Cross-Modal Fine-Tuning with Gradually Intermediate Modality Generation

    Authors: Lincan Cai, Shuang Li, Wenxuan Ma, Jingxuan Kang, Binhui Xie, Zixun Sun, Chengwei Zhu

    Abstract: Large-scale pretrained models have proven immensely valuable in handling data-intensive modalities like text and image. However, fine-tuning these models for certain specialized modalities, such as protein sequence and cosmic ray, poses challenges due to the significant modality discrepancy and scarcity of labeled data. In this paper, we propose an end-to-end method, PaRe, to enhance cross-modal f…

    Submitted 13 June, 2024; originally announced June 2024.

  18. arXiv:2406.08980  [pdf, other]

    q-bio.BM cs.LG

    From Theory to Therapy: Reframing SBDD Model Evaluation via Practical Metrics

    Authors: Bowen Gao, Haichuan Tan, Yanwen Huang, Minsi Ren, Xiao Huang, Wei-Ying Ma, Ya-Qin Zhang, Yanyan Lan

    Abstract: Recent advancements in structure-based drug design (SBDD) have significantly enhanced the efficiency and precision of drug discovery by generating molecules tailored to bind specific protein pockets. Despite these technological strides, their practical application in real-world drug development remains challenging due to the complexities of synthesizing and testing these molecules. The reliability…

    Submitted 13 June, 2024; originally announced June 2024.

  19. arXiv:2406.08961  [pdf, other]

    q-bio.BM cs.LG

    SIU: A Million-Scale Structural Small Molecule-Protein Interaction Dataset for Unbiased Bioactivity Prediction

    Authors: Yanwen Huang, Bowen Gao, Yinjun Jia, Hongbo Ma, Wei-Ying Ma, Ya-Qin Zhang, Yanyan Lan

    Abstract: Small molecules play a pivotal role in modern medicine, and scrutinizing their interactions with protein targets is essential for the discovery and development of novel, life-saving therapeutics. The term "bioactivity" encompasses various biological effects resulting from these interactions, including both binding and functional responses. The magnitude of bioactivity dictates the therapeutic or t…

    Submitted 13 June, 2024; originally announced June 2024.

  20. arXiv:2406.08953  [pdf, other]

    cs.CV cs.LG

    Preserving Identity with Variational Score for General-purpose 3D Editing

    Authors: Duong H. Le, Tuan Pham, Aniruddha Kembhavi, Stephan Mandt, Wei-Chiu Ma, Jiasen Lu

    Abstract: We present Piva (Preserving Identity with Variational Score Distillation), a novel optimization-based method for editing images and 3D models based on diffusion models. Specifically, our approach is inspired by the recently proposed method for 2D image editing - Delta Denoising Score (DDS). We pinpoint the limitations in DDS for 2D and 3D editing, which causes detail loss and over-saturation. To a…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: 22 pages, 14 figures

  21. arXiv:2406.08743  [pdf, other]

    cs.LG

    Generalizable Implicit Neural Representation As a Universal Spatiotemporal Traffic Data Learner

    Authors: Tong Nie, Guoyang Qin, Wei Ma, Jian Sun

    Abstract: This is the conference version of our paper: Spatiotemporal Implicit Neural Representation as a Generalized Traffic Data Learner…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: Accepted by the Conference in Emerging Technologies in Transportation Systems (TRC-30). arXiv admin note: substantial text overlap with arXiv:2405.03185

  22. arXiv:2406.07791  [pdf, other]

    cs.CL cs.AI

    Judging the Judges: A Systematic Investigation of Position Bias in Pairwise Comparative Assessments by LLMs

    Authors: Lin Shi, Weicheng Ma, Soroush Vosoughi

    Abstract: LLM-as-a-Judge offers a promising alternative to human judges across various tasks, yet inherent biases, particularly position bias - a systematic preference for answers based on their position in the prompt - compromise its effectiveness. Our study investigates this issue by developing a framework to systematically study and quantify position bias using metrics such as repetitional consistency, p…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: 70 pages, around 200 figures and subfigures

  23. arXiv:2406.06133  [pdf, other]

    cs.CV

    ExtraNeRF: Visibility-Aware View Extrapolation of Neural Radiance Fields with Diffusion Models

    Authors: Meng-Li Shih, Wei-Chiu Ma, Lorenzo Boyice, Aleksander Holynski, Forrester Cole, Brian L. Curless, Janne Kontkanen

    Abstract: We propose ExtraNeRF, a novel method for extrapolating the range of views handled by a Neural Radiance Field (NeRF). Our main idea is to leverage NeRFs to model scene-specific, fine-grained details, while capitalizing on diffusion models to extrapolate beyond our observed data. A key ingredient is to track visibility to determine what portions of the scene have not been observed, and focus on reco…

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: 8 pages, 8 figures, CVPR2024

  24. arXiv:2406.03248  [pdf, other]

    cs.IR cs.CL

    Large Language Models as Evaluators for Recommendation Explanations

    Authors: Xiaoyu Zhang, Yishan Li, Jiayin Wang, Bowen Sun, Weizhi Ma, Peijie Sun, Min Zhang

    Abstract: The explainability of recommender systems has attracted significant attention in academia and industry. Many efforts have been made for explainable recommendations, yet evaluating the quality of the explanations remains a challenging and unresolved issue. In recent years, leveraging LLMs as evaluators presents a promising avenue in Natural Language Processing tasks (e.g., sentiment classification,…

    Submitted 6 June, 2024; v1 submitted 5 June, 2024; originally announced June 2024.

  25. arXiv:2406.03141  [pdf, other]

    q-bio.BM cs.LG

    Floating Anchor Diffusion Model for Multi-motif Scaffolding

    Authors: Ke Liu, Weian Mao, Shuaike Shen, Xiaoran Jiao, Zheng Sun, Hao Chen, Chunhua Shen

    Abstract: Motif scaffolding seeks to design scaffold structures for constructing proteins with functions derived from the desired motif, which is crucial for the design of vaccines and enzymes. Previous works approach the problem by inpainting or conditional generation. Both of them can only scaffold motifs with fixed positions, and the conditional generation cannot guarantee the presence of motifs. However…

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: ICML 2024

  26. arXiv:2406.02630  [pdf, other]

    cs.CR cs.AI

    AI Agents Under Threat: A Survey of Key Security Challenges and Future Pathways

    Authors: Zehang Deng, Yongjian Guo, Changzhou Han, Wanlun Ma, Junwu Xiong, Sheng Wen, Yang Xiang

    Abstract: An Artificial Intelligence (AI) agent is a software entity that autonomously performs tasks or makes decisions based on pre-defined objectives and data inputs. AI agents, capable of perceiving user inputs, reasoning and planning tasks, and executing actions, have seen remarkable advancements in algorithm development and task performance. However, the security challenges they pose remain under-expl…

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: ACM Computing Survey

  27. arXiv:2406.02435  [pdf, other]

    cs.CV

    Generative Active Learning for Long-tailed Instance Segmentation

    Authors: Muzhi Zhu, Chengxiang Fan, Hao Chen, Yang Liu, Weian Mao, Xiaogang Xu, Chunhua Shen

    Abstract: Recently, large-scale language-image generative models have gained widespread attention and many works have utilized generated data from these models to further enhance the performance of perception tasks. However, not all generated data can positively impact downstream models, and these methods do not thoroughly explore how to better select and utilize generated data. On the other hand, there is…

    Submitted 4 June, 2024; originally announced June 2024.

    Comments: Accepted by ICML 2024

  28. arXiv:2406.02014  [pdf, other]

    q-bio.NC cs.LG cs.SD eess.AS

    Understanding Auditory Evoked Brain Signal via Physics-informed Embedding Network with Multi-Task Transformer

    Authors: Wanli Ma, Xuegang Tang, ** Gu, Ying Wang, Yuling Xia

    Abstract: In the fields of brain-computer interaction and cognitive neuroscience, effective decoding of auditory signals from task-based functional magnetic resonance imaging (fMRI) is key to understanding how the brain processes complex auditory information. Although existing methods have enhanced decoding capabilities, limitations remain in information utilization and model representation. To overcome the…

    Submitted 4 June, 2024; originally announced June 2024.

  29. arXiv:2406.00819  [pdf, ps, other]

    cs.GT cs.DS

    Sample Complexity of Posted Pricing for a Single Item

    Authors: Billy Jin, Thomas Kesselheim, Will Ma, Sahil Singla

    Abstract: Selling a single item to $n$ self-interested buyers is a fundamental problem in economics, where the two objectives typically considered are welfare maximization and revenue maximization. Since the optimal mechanisms are often impractical and do not work for sequential buyers, posted pricing mechanisms, where fixed prices are set for the item for different buyers, have emerged as a practical and e…

    Submitted 2 June, 2024; originally announced June 2024.

  30. arXiv:2406.00622  [pdf, other]

    cs.CV cs.AI

    Compositional 4D Dynamic Scenes Understanding with Physics Priors for Video Question Answering

    Authors: Xingrui Wang, Wufei Ma, Angtian Wang, Shuo Chen, Adam Kortylewski, Alan Yuille

    Abstract: For vision-language models (VLMs), understanding the dynamic properties of objects and their interactions within 3D scenes from video is crucial for effective reasoning. In this work, we introduce a video question answering dataset SuperCLEVR-Physics that focuses on the dynamics properties of objects. We concentrate on physical concepts -- velocity, acceleration, and collisions within 4D scenes, w…

    Submitted 2 June, 2024; originally announced June 2024.

  31. arXiv:2406.00378  [pdf]

    physics.app-ph cs.NE

    Real-Time State Modulation and Acquisition Circuit in Neuromorphic Memristive Systems

    Authors: Shengbo Wang, Cong Li, Tongming Pu, Jian Zhang, Weihao Ma, Luigi Occhipinti, Arokia Nathan, Shuo Gao

    Abstract: Memristive neuromorphic systems are designed to emulate human perception and cognition, where the memristor states represent essential historical information to perform both low-level and high-level tasks. However, current systems face challenges with the separation of state modulation and acquisition, leading to undesired time delays that impact real-time performance. To overcome this issue, we i…

    Submitted 1 June, 2024; originally announced June 2024.

    Comments: 5 pages, 8 figures

  32. arXiv:2406.00017  [pdf, other]

    cs.CL cs.AI cs.MM

    PTA: Enhancing Multimodal Sentiment Analysis through Pipelined Prediction and Translation-based Alignment

    Authors: Shezheng Song, Shasha Li, Shan Zhao, Chengyu Wang, Xiaopeng Li, Jie Yu, Qian Wan, Jun Ma, Tianwei Yan, Wentao Ma, Xiaoguang Mao

    Abstract: Multimodal aspect-based sentiment analysis (MABSA) aims to understand opinions in a granular manner, advancing human-computer interaction and other fields. Traditionally, MABSA methods use a joint prediction approach to identify aspects and sentiments simultaneously. However, we argue that joint models are not always superior. Our analysis shows that joint models struggle to align relevant text to…

    Submitted 13 June, 2024; v1 submitted 22 May, 2024; originally announced June 2024.

    Comments: Code will be released upon publication

  33. arXiv:2405.21013  [pdf, other]

    cs.CV

    StrucTexTv3: An Efficient Vision-Language Model for Text-rich Image Perception, Comprehension, and Beyond

    Authors: Pengyuan Lyu, Yulin Li, Hao Zhou, Weihong Ma, Xingyu Wan, Qunyi Xie, Liang Wu, Chengquan Zhang, Kun Yao, Errui Ding, Jingdong Wang

    Abstract: Text-rich images have significant and extensive value, deeply integrated into various aspects of human life. Notably, both visual cues and linguistic symbols in text-rich images play crucial roles in information transmission but are accompanied by diverse challenges. Therefore, the efficient and effective understanding of text-rich images is a crucial litmus test for the capability of Vision-Langu…

    Submitted 4 June, 2024; v1 submitted 31 May, 2024; originally announced May 2024.

  34. arXiv:2405.20786  [pdf, other]

    cs.CV cs.HC

    Stratified Avatar Generation from Sparse Observations

    Authors: Han Feng, Wenchao Ma, Quankai Gao, Xianwei Zheng, Nan Xue, Huijuan Xu

    Abstract: Estimating 3D full-body avatars from AR/VR devices is essential for creating immersive experiences in AR/VR applications. This task is challenging due to the limited input from Head Mounted Devices, which capture only sparse observations from the head and hands. Predicting the full-body avatars, particularly the lower body, from these sparse observations presents significant difficulties. In this…

    Submitted 3 June, 2024; v1 submitted 30 May, 2024; originally announced May 2024.

    Comments: Accepted by CVPR 2024 (Oral)

  35. arXiv:2405.19915  [pdf, other]

    cs.AI

    P$^2$-ViT: Power-of-Two Post-Training Quantization and Acceleration for Fully Quantized Vision Transformer

    Authors: Huihong Shi, Xin Cheng, Wendong Mao, Zhongfeng Wang

    Abstract: Vision Transformers (ViTs) have excelled in computer vision tasks but are memory-consuming and computation-intensive, challenging their deployment on resource-constrained devices. To tackle this limitation, prior works have explored ViT-tailored quantization algorithms but retained floating-point scaling factors, which yield non-negligible re-quantization overhead, limiting ViTs' hardware efficien…

    Submitted 30 May, 2024; originally announced May 2024.

  36. arXiv:2405.18361  [pdf, other]

    cs.CV

    Is a 3D-Tokenized LLM the Key to Reliable Autonomous Driving?

    Authors: Yifan Bai, Dongming Wu, Yingfei Liu, Fan Jia, Weixin Mao, Ziheng Zhang, Yucheng Zhao, Jianbing Shen, Xing Wei, Tiancai Wang, Xiangyu Zhang

    Abstract: Rapid advancements in Autonomous Driving (AD) tasks turned a significant shift toward end-to-end fashion, particularly in the utilization of vision-language models (VLMs) that integrate robust logical reasoning and cognitive abilities to enable comprehensive end-to-end planning. However, these VLM-based approaches tend to integrate 2D vision tokenizers and a large language model (LLM) for ego-car…

    Submitted 28 May, 2024; originally announced May 2024.

  37. arXiv:2405.18058  [pdf, other]

    cs.IR

    ReChorus2.0: A Modular and Task-Flexible Recommendation Library

    Authors: Jiayu Li, Hanyu Li, Zhiyu He, Weizhi Ma, Peijie Sun, Min Zhang, Shaoping Ma

    Abstract: With the applications of recommendation systems rapidly expanding, an increasing number of studies have focused on every aspect of recommender systems with different data inputs, models, and task settings. Therefore, a flexible library is needed to help researchers implement the experimental strategies they require. Existing open libraries for recommendation scenarios have enabled reproducing vari…

    Submitted 28 May, 2024; originally announced May 2024.

    Comments: 10 pages, 3 figures. Under review

  38. arXiv:2405.16915  [pdf, other]

    cs.CV cs.LG

    Multilingual Diversity Improves Vision-Language Representations

    Authors: Thao Nguyen, Matthew Wallingford, Sebastin Santy, Wei-Chiu Ma, Sewoong Oh, Ludwig Schmidt, Pang Wei Koh, Ranjay Krishna

    Abstract: Massive web-crawled image-text datasets lay the foundation for recent progress in multimodal learning. These datasets are designed with the goal of training a model to do well on standard computer vision benchmarks, many of which, however, have been shown to be English-centric (e.g., ImageNet). Consequently, existing data curation techniques gravitate towards using predominantly English image-text…

    Submitted 27 May, 2024; originally announced May 2024.

  39. arXiv:2405.10343  [pdf, other]

    q-bio.BM cs.AI cs.LG

    UniCorn: A Unified Contrastive Learning Approach for Multi-view Molecular Representation Learning

    Authors: Shikun Feng, Yuyan Ni, Minghao Li, Yanwen Huang, Zhi-Ming Ma, Wei-Ying Ma, Yanyan Lan

    Abstract: Recently, a noticeable trend has emerged in developing pre-trained foundation models in the domains of CV and NLP. However, for molecular pre-training, there lacks a universal model capable of effectively applying to various categories of molecular tasks, since existing prevalent pre-training methods exhibit effectiveness for specific types of downstream tasks. Furthermore, the lack of profound un…

    Submitted 15 May, 2024; originally announced May 2024.

  40. arXiv:2405.09497  [pdf, other]

    cs.IT cs.NI eess.SP

    Towards the limits: Sensing Capability Measurement for ISAC Through Channel Encoder

    Authors: Fei Shang, Haohua Du, Panlong Yang, Xin He, Wen Ma, Xiang-Yang Li

    Abstract: Integrated Sensing and Communication (ISAC) is gradually becoming a reality due to the significant increase in frequency and bandwidth of next-generation wireless communication technologies. Therefore it becomes crucial to evaluate the communication and sensing performance using appropriate channel models to address resource competition from each other. Existing work only models the sensing capabi…

    Submitted 15 May, 2024; originally announced May 2024.

  41. arXiv:2405.04867  [pdf, other]

    eess.IV cs.CV

    MIPI 2024 Challenge on Demosaic for HybridEVS Camera: Methods and Results

    Authors: Yaqi Wu, Zhihao Fan, Xiaofeng Chu, Jimmy S. Ren, Xiaoming Li, Zongsheng Yue, Chongyi Li, Shangcheng Zhou, Ruicheng Feng, Yuekun Dai, Peiqing Yang, Chen Change Loy, Senyan Xu, Zhijing Sun, Jiaying Zhu, Yurui Zhu, Xueyang Fu, Zheng-Jun Zha, Jun Cao, Cheng Li, Shu Chen, Liang Ma, Shiyang Zhou, Haijin Zeng, Kai Feng, et al. (24 additional authors not shown)

    Abstract: The increasing demand for computational photography and imaging on mobile platforms has led to the widespread development and integration of advanced image sensors with novel algorithms in camera systems. However, the scarcity of high-quality data for research and the rare opportunity for in-depth exchange of views from industry and academia constrain the development of mobile intelligent photogra…

    Submitted 8 May, 2024; originally announced May 2024.

    Comments: MIPI@CVPR2024. Website: https://mipi-challenge.org/MIPI2024/

  42. arXiv:2405.04496  [pdf, other]

    cs.CV

    Edit-Your-Motion: Space-Time Diffusion Decoupling Learning for Video Motion Editing

    Authors: Yi Zuo, Lingling Li, Licheng Jiao, Fang Liu, Xu Liu, Wenping Ma, Shuyuan Yang, Yuwei Guo

    Abstract: Existing diffusion-based video editing methods have achieved impressive results in motion editing. Most of the existing methods focus on the motion alignment between the edited video and the reference video. However, these methods do not constrain the background and object content of the video to remain unchanged, which makes it possible for users to generate unexpected videos. In this paper, we p…

    Submitted 7 May, 2024; originally announced May 2024.

  43. arXiv:2405.03882  [pdf, other]

    cs.CV cs.AI

    Trio-ViT: Post-Training Quantization and Acceleration for Softmax-Free Efficient Vision Transformer

    Authors: Huihong Shi, Haikuo Shao, Wendong Mao, Zhongfeng Wang

    Abstract: Motivated by the huge success of Transformers in the field of natural language processing (NLP), Vision Transformers (ViTs) have been rapidly developed and achieved remarkable performance in various computer vision tasks. However, their huge model sizes and intensive computations hinder ViTs' deployment on embedded devices, calling for effective model compression methods, such as quantization. Unf…

    Submitted 6 May, 2024; originally announced May 2024.

  44. arXiv:2405.03185  [pdf, other]

    cs.LG

    Spatiotemporal Implicit Neural Representation as a Generalized Traffic Data Learner

    Authors: Tong Nie, Guoyang Qin, Wei Ma, Jian Sun

    Abstract: Spatiotemporal Traffic Data (STTD) measures the complex dynamical behaviors of the multiscale transportation system. Existing methods aim to reconstruct STTD using low-dimensional models. However, they are limited to data-specific dimensions or source-dependent patterns, restricting them from unifying representations. Here, we present a novel paradigm to address the STTD learning problem by parame…

    Submitted 6 May, 2024; originally announced May 2024.

  45. arXiv:2405.02957  [pdf, other]

    cs.AI

    Agent Hospital: A Simulacrum of Hospital with Evolvable Medical Agents

    Authors: Junkai Li, Siyu Wang, Meng Zhang, Weitao Li, Yunghwei Lai, Xinhui Kang, Weizhi Ma, Yang Liu

    Abstract: In this paper, we introduce a simulacrum of hospital called Agent Hospital that simulates the entire process of treating illness. All patients, nurses, and doctors are autonomous agents powered by large language models (LLMs). Our central goal is to enable a doctor agent to learn how to treat illness within the simulacrum. To do so, we propose a method called MedAgent-Zero. As the simulacrum can s…

    Submitted 5 May, 2024; originally announced May 2024.

  46. arXiv:2405.01215  [pdf, ps, other]

    cs.IT eess.SP

    Movable Antenna Enhanced Wireless Sensing Via Antenna Position Optimization

    Authors: Wenyan Ma, Lipeng Zhu, Rui Zhang

    Abstract: In this paper, we propose a new wireless sensing system equipped with the movable-antenna (MA) array, which can flexibly adjust the positions of antenna elements for improving the sensing performance over conventional antenna arrays with fixed-position antennas (FPAs). First, we show that the angle estimation performance in wireless sensing is fundamentally determined by the array geometry, where…

    Submitted 2 May, 2024; originally announced May 2024.

    Comments: 13 pages, 13 figures. We propose a new wireless sensing system equipped with the movable-antenna (MA) array, which can flexibly adjust the positions of antenna elements for improving the sensing performance over conventional antenna arrays with fixed-position antennas (FPAs)

  47. arXiv:2404.18440  [pdf, other]

    physics.ao-ph astro-ph.EP cs.LG physics.comp-ph

    Potential Paradigm Shift in Hazard Risk Management: AI-Based Weather Forecast for Tropical Cyclone Hazards

    Authors: Kairui Feng, Dazhi Xi, Wei Ma, Cao Wang, Yuanlong Li, Xuanhong Chen

    Abstract: The advent of Artificial Intelligence (AI)-driven models marks a paradigm shift in risk management strategies for meteorological hazards. This study specifically employs tropical cyclones (TCs) as a focal example. We engineer a perturbation-based method to produce ensemble forecasts using the advanced Pangu AI weather model. Unlike traditional approaches that often generate fewer than 20 scenario…

    Submitted 29 April, 2024; originally announced April 2024.

  48. arXiv:2404.16572  [pdf]

    cs.SI

    ReliK: A Reliability Measure for Knowledge Graph Embeddings

    Authors: Maximilian K. Egger, Wenyue Ma, Davide Mottin, Panagiotis Karras, Ilaria Bordino, Francesco Gullo, Aris Anagnostopoulos

    Abstract: Can we assess a priori how well a knowledge graph embedding will perform on a specific downstream task and in a specific part of the knowledge graph? Knowledge graph embeddings (KGEs) represent entities (e.g., "da Vinci," "Mona Lisa") and relationships (e.g., "painted") of a knowledge graph (KG) as vectors. KGEs are generated by optimizing an embedding score, which assesses whether a triple (e.g.,…

    Submitted 25 April, 2024; originally announced April 2024.

  49. arXiv:2404.15753  [pdf, other]

    cs.HC cs.IR

    Introducing EEG Analyses to Help Personal Music Preference Prediction

    Authors: Zhiyu He, Jiayu Li, Weizhi Ma, Min Zhang, Yiqun Liu, Shao** Ma

    Abstract: Nowadays, personalized recommender systems play an increasingly important role in music scenarios in our daily life with the preference prediction ability. However, existing methods mainly rely on users' implicit feedback (e.g., click, dwell time) which ignores the detailed user experience. This paper introduces Electroencephalography (EEG) signals to personal music preferences as a basis for the…

    Submitted 24 April, 2024; originally announced April 2024.

    Comments: Accepted by CHCI 2022

  50. arXiv:2404.15643  [pdf, ps, other]

    cs.IT eess.SP

    Dynamic Beam Coverage for Satellite Communications Aided by Movable-Antenna Array

    Authors: Lipeng Zhu, Xiangyu Pi, Wenyan Ma, Zhenyu Xiao, Rui Zhang

    Abstract: Due to the ultra-dense constellation, efficient beam coverage and interference mitigation are crucial to low-earth orbit (LEO) satellite communication systems, while the conventional directional antennas and fixed-position antenna (FPA) arrays both have limited degrees of freedom (DoFs) in beamforming to adapt to the time-varying coverage requirement of terrestrial users. To address this challenge…

    Submitted 24 April, 2024; originally announced April 2024.