Showing 1–50 of 1,392 results for author: Chen, M

Searching in archive cs.
  1. arXiv:2407.01523  [pdf, other]

    cs.CV cs.CL

    MMLongBench-Doc: Benchmarking Long-context Document Understanding with Visualizations

    Authors: Yubo Ma, Yuhang Zang, Liangyu Chen, Meiqi Chen, Yizhu Jiao, Xinze Li, Xinyuan Lu, Ziyu Liu, Yan Ma, Xiaoyi Dong, Pan Zhang, Liangming Pan, Yu-Gang Jiang, Jiaqi Wang, Yixin Cao, Aixin Sun

    Abstract: Understanding documents with rich layouts and multi-modal components is a long-standing and practical task. Recent Large Vision-Language Models (LVLMs) have made remarkable strides in various tasks, particularly in single-page document understanding (DU). However, their abilities on long-context DU remain an open problem. This work presents MMLongBench-Doc, a long-context, multi-modal benchmark co…

    Submitted 1 July, 2024; originally announced July 2024.

  2. arXiv:2407.00902  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    From Introspection to Best Practices: Principled Analysis of Demonstrations in Multimodal In-Context Learning

    Authors: Nan Xu, Fei Wang, Sheng Zhang, Hoifung Poon, Muhao Chen

    Abstract: Motivated by the in-context learning (ICL) capabilities of large language models (LLMs), multimodal LLMs with an additional visual modality also exhibit similar ICL abilities when multiple image-text pairs are provided as demonstrations. However, relatively less work has been done to investigate the principles behind how and why multimodal ICL works. We conduct a systematic and principled eval…

    Submitted 30 June, 2024; originally announced July 2024.

  3. arXiv:2407.00769  [pdf, other]

    quant-ph cs.DC

    Achieving Energetic Superiority Through System-Level Quantum Circuit Simulation

    Authors: Rong Fu, Zhongling Su, Han-Sen Zhong, Xiti Zhao, Jianyang Zhang, Feng Pan, Pan Zhang, Xianhe Zhao, Ming-Cheng Chen, Chao-Yang Lu, Jian-Wei Pan, Zhiling Pei, Xingcheng Zhang, Wanli Ouyang

    Abstract: Quantum Computational Superiority boasts rapid computation and high energy efficiency. Despite recent advances in classical algorithms aimed at refuting the milestone claim of Google's Sycamore, challenges remain in generating uncorrelated samples of random quantum circuits. In this paper, we present a groundbreaking large-scale system technology that leverages optimization on global, node, and de…

    Submitted 30 June, 2024; originally announced July 2024.

  4. arXiv:2407.00286  [pdf, other]

    cs.NI cs.LG

    Digital Twin-Assisted Data-Driven Optimization for Reliable Edge Caching in Wireless Networks

    Authors: Zifan Zhang, Yuchen Liu, Zhiyuan Peng, Mingzhe Chen, Dongkuan Xu, Shuguang Cui

    Abstract: Optimizing edge caching is crucial for the advancement of next-generation (nextG) wireless networks, ensuring high-speed and low-latency services for mobile users. Existing data-driven optimization approaches often lack awareness of the distribution of random data variables and focus solely on optimizing cache hit rates, neglecting potential reliability concerns, such as base station overload and…

    Submitted 28 June, 2024; originally announced July 2024.

    Comments: Accepted by IEEE Journal on Selected Areas in Communications (JSAC)

  5. arXiv:2406.19131  [pdf, other]

    cs.CV

    CELLO: Causal Evaluation of Large Vision-Language Models

    Authors: Meiqi Chen, Bo Peng, Yan Zhang, Chaochao Lu

    Abstract: Causal reasoning is fundamental to human intelligence and crucial for effective decision-making in real-world environments. Despite recent advancements in large vision-language models (LVLMs), their ability to comprehend causality remains unclear. Previous work typically focuses on commonsense causality between events and/or actions, which is insufficient for applications like embodied agents and…

    Submitted 27 June, 2024; originally announced June 2024.

  6. arXiv:2406.19055  [pdf, other]

    cs.CV

    SimpleFusion: A Simple Fusion Framework for Infrared and Visible Images

    Authors: Ming Chen, Yuxuan Cheng, Xinwei He, Xinyue Wang, Yan Aze, **hai Xiang

    Abstract: Integrating visible and infrared images into one high-quality image, also known as visible and infrared image fusion, is a challenging yet critical task for many downstream vision tasks. Most existing works utilize pretrained deep neural networks or design sophisticated frameworks with strong priors for this task, which may be unsuitable or lack flexibility. This paper presents SimpleFusion, a sim…

    Submitted 27 June, 2024; originally announced June 2024.

    Comments: Code: https://github.com/hxwxss/SimpleFusion-A-Simple-Fusion-Framework-for-Infrared-and-Visible-Images

  7. arXiv:2406.17681  [pdf, other]

    cs.CL

    VarBench: Robust Language Model Benchmarking Through Dynamic Variable Perturbation

    Authors: Kun Qian, Shunji Wan, Claudia Tang, Youzhi Wang, Xuanming Zhang, Maximillian Chen, Zhou Yu

    Abstract: As large language models achieve impressive scores on traditional benchmarks, an increasing number of researchers are becoming concerned about benchmark data leakage during pre-training, commonly known as the data contamination problem. To ensure fair evaluation, recent benchmarks release only the training and validation sets, keeping the test set labels closed-source. They require anyone wishing…

    Submitted 26 June, 2024; v1 submitted 25 June, 2024; originally announced June 2024.

  8. arXiv:2406.16968  [pdf, other]

    cs.LG cs.AI

    Multimodal Physiological Signals Representation Learning via Multiscale Contrasting for Depression Recognition

    Authors: Kai Shao, Rui Wang, Yixue Hao, Long Hu, Min Chen, Hans Arno Jacobsen

    Abstract: Depression recognition based on physiological signals such as functional near-infrared spectroscopy (fNIRS) and electroencephalogram (EEG) has made considerable progress. However, most existing studies ignore the complementarity and semantic consistency of multimodal physiological signals under the same stimulation task in complex spatio-temporal patterns. In this paper, we introduce a multimodal…

    Submitted 25 June, 2024; v1 submitted 22 June, 2024; originally announced June 2024.

  9. arXiv:2406.16370  [pdf, other]

    cs.RO

    An Active Search Strategy with Multiple Unmanned Aerial Systems for Multiple Targets

    Authors: Chuanxiang Gao, Xinyi Wang, Xi Chen, Ben M. Chen

    Abstract: The challenge of efficient target searching in vast natural environments has driven the need for advanced multi-UAV active search strategies. This paper introduces a novel method in which global and local information is adeptly merged to avoid issues such as myopia and redundant back-and-forth movements. In addition, a trajectory generation method is used to ensure the search pattern within contin…

    Submitted 24 June, 2024; originally announced June 2024.

  10. arXiv:2406.16213  [pdf, other]

    cs.LG

    Provable Statistical Rates for Consistency Diffusion Models

    Authors: Zehao Dou, Minshuo Chen, Mengdi Wang, Zhuoran Yang

    Abstract: Diffusion models have revolutionized various application domains, including computer vision and audio generation. Despite the state-of-the-art performance, diffusion models are known for their slow sample generation due to the extensive number of steps involved. In response, consistency models have been developed to merge multiple steps in the sampling process, thereby significantly boosting the s…

    Submitted 23 June, 2024; originally announced June 2024.

    Comments: 28 pages, 2 figures

  11. arXiv:2406.16198  [pdf, other]

    cs.LG cs.AR

    Hardware-Aware Neural Dropout Search for Reliable Uncertainty Prediction on FPGA

    Authors: Zehuan Zhang, Hongxiang Fan, Hao Mark Chen, Lukasz Dudziak, Wayne Luk

    Abstract: The increasing deployment of artificial intelligence (AI) for critical decision-making amplifies the necessity for trustworthy AI, where uncertainty estimation plays a pivotal role in ensuring trustworthiness. Dropout-based Bayesian Neural Networks (BayesNNs) are prominent in this field, offering reliable uncertainty estimates. Despite their effectiveness, existing dropout-based BayesNNs typically…

    Submitted 23 June, 2024; originally announced June 2024.

    Comments: Design Automation Conference (DAC) 2024

  12. arXiv:2406.16173  [pdf, other]

    cs.HC

    Crepe: A Mobile Screen Data Collector Using Graph Query

    Authors: Yuwen Lu, Meng Chen, Qi Zhao, Victor Cox, Yang Yang, Meng Jiang, Jay Brockman, Tamara Kay, Toby Jia-Jun Li

    Abstract: Collecting mobile datasets remains challenging for academic researchers due to limited data access and technical barriers. Commercial organizations often possess exclusive access to mobile data, leading to a "data monopoly" that restricts the independence of academic research. Existing open-source mobile data collection frameworks primarily focus on mobile sensing data rather than screen content,…

    Submitted 23 June, 2024; originally announced June 2024.

  13. EntangleVR++: Evaluating the Potential of using Entanglement in an Interactive VR Scene Creation System

    Authors: Mengyu Chen, Marko Peljhan, Misha Sra

    Abstract: Interactive digital stories provide a sense of flexibility and freedom to players by allowing them to make choices at key junctions. These choices advance the narrative and determine, to some degree, how the story evolves for that player. As shown in prior work, the ability to control or participate in the construction of the narrative can give the player a high level of agency that results in a s…

    Submitted 22 June, 2024; originally announced June 2024.

    Comments: Preprint for Frontiers in Virtual Reality, December 2023

    ACM Class: H.5.1

    Journal ref: Front. Virtual Real. 4:1252551 (2023)

  14. ConnectVR: A Trigger-Action Interface for Creating Agent-based Interactive VR Stories

    Authors: Mengyu Chen, Marko Peljhan, Misha Sra

    Abstract: The demand for interactive narratives is growing with the increasing popularity of VR and video gaming. This presents an opportunity to create interactive storytelling experiences that allow players to engage with a narrative from a first-person perspective, both immersively in VR and in 3D on a computer. However, for artists and storytellers without programming experience, authoring such experiences…

    Submitted 22 June, 2024; originally announced June 2024.

    Comments: Preprint for 2024 IEEE Conference Virtual Reality and 3D User Interfaces (VR)

    ACM Class: H.5.1

    Journal ref: in 2024 IEEE Conference Virtual Reality and 3D User Interfaces (VR), Orlando, FL, USA, 2024 pp. 286-297

  15. arXiv:2406.15811  [pdf, other]

    cs.CV

    PointDreamer: Zero-shot 3D Textured Mesh Reconstruction from Colored Point Cloud by 2D Inpainting

    Authors: Qiao Yu, Xianzhi Li, Yuan Tang, **feng Xu, Long Hu, Yixue Hao, Min Chen

    Abstract: Reconstructing textured meshes from colored point clouds is an important but challenging task in 3D graphics and vision. Most existing methods predict colors as implicit functions in 3D or UV space, suffering from blurry textures or a lack of generalization capability. To address this, we propose PointDreamer, a novel framework for textured mesh reconstruction from colored point clouds. It produc…

    Submitted 22 June, 2024; originally announced June 2024.

  16. arXiv:2406.14593  [pdf, other]

    cs.LG

    Enhancing Dropout-based Bayesian Neural Networks with Multi-Exit on FPGA

    Authors: Hao Mark Chen, Liam Castelli, Martin Ferianc, Hongyu Zhou, Shuanglong Liu, Wayne Luk, Hongxiang Fan

    Abstract: Reliable uncertainty estimation plays a crucial role in various safety-critical applications such as medical diagnosis and autonomous driving. In recent years, Bayesian neural networks (BayesNNs) have gained substantial research and industrial interest due to their capability to make accurate predictions with reliable uncertainty estimation. However, the algorithmic complexity and the resulting h…

    Submitted 24 June, 2024; v1 submitted 20 June, 2024; originally announced June 2024.

    Comments: arXiv admin note: text overlap with arXiv:2308.06849

  17. arXiv:2406.14282  [pdf, other]

    cs.CL cs.AI

    Learning to Plan for Retrieval-Augmented Large Language Models from Knowledge Graphs

    Authors: Junjie Wang, Mingyang Chen, Binbin Hu, Dan Yang, Ziqi Liu, Yue Shen, Peng Wei, Zhiqiang Zhang, **jie Gu, Jun Zhou, Jeff Z. Pan, Wen Zhang, Huajun Chen

    Abstract: Improving the performance of large language models (LLMs) in complex question-answering (QA) scenarios has always been a research focal point. Recent studies have attempted to enhance LLMs' performance by combining step-wise planning with external retrieval. While effective for advanced models like GPT-3.5, smaller LLMs face challenges in decomposing complex questions, necessitating supervised fin…

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: Work in progress

  18. arXiv:2406.12834  [pdf, other]

    cs.CV

    GroPrompt: Efficient Grounded Prompting and Adaptation for Referring Video Object Segmentation

    Authors: Ci-Siang Lin, I-Jieh Liu, Min-Hung Chen, Chien-Yi Wang, Sifei Liu, Yu-Chiang Frank Wang

    Abstract: Referring Video Object Segmentation (RVOS) aims to segment the object referred to by the query sentence throughout the entire video. Most existing methods require end-to-end training with dense mask annotations, which could be computation-consuming and less scalable. In this work, we aim to efficiently adapt foundation segmentation models for addressing RVOS from weak supervision with the proposed…

    Submitted 23 June, 2024; v1 submitted 18 June, 2024; originally announced June 2024.

    Comments: CVPR Workshop (CVinW) 2024. Project page: https://jack24658735.github.io/groprompt/

  19. arXiv:2406.11839  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    mDPO: Conditional Preference Optimization for Multimodal Large Language Models

    Authors: Fei Wang, Wenxuan Zhou, James Y. Huang, Nan Xu, Sheng Zhang, Hoifung Poon, Muhao Chen

    Abstract: Direct preference optimization (DPO) has been shown to be an effective method for large language model (LLM) alignment. Recent works have attempted to apply DPO to multimodal scenarios but have found it challenging to achieve consistent improvement. Through a comparative experiment, we identify the unconditional preference problem in multimodal preference optimization, where the model overlooks the ima…

    Submitted 17 June, 2024; originally announced June 2024.

  20. arXiv:2406.11794  [pdf, other]

    cs.LG cs.CL

    DataComp-LM: In search of the next generation of training sets for language models

    Authors: Jeffrey Li, Alex Fang, Georgios Smyrnis, Maor Ivgi, Matt Jordan, Samir Gadre, Hritik Bansal, Etash Guha, Sedrick Keh, Kushal Arora, Saurabh Garg, Rui Xin, Niklas Muennighoff, Reinhard Heckel, Jean Mercat, Mayee Chen, Suchin Gururangan, Mitchell Wortsman, Alon Albalak, Yonatan Bitton, Marianna Nezhurina, Amro Abbas, Cheng-Yu Hsieh, Dhruba Ghosh, Josh Gardner, et al. (34 additional authors not shown)

    Abstract: We introduce DataComp for Language Models (DCLM), a testbed for controlled dataset experiments with the goal of improving language models. As part of DCLM, we provide a standardized corpus of 240T tokens extracted from Common Crawl, effective pretraining recipes based on the OpenLM framework, and a broad suite of 53 downstream evaluations. Participants in the DCLM benchmark can experiment with dat…

    Submitted 20 June, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

    Comments: Project page: https://www.datacomp.ai/dclm/

  21. arXiv:2406.11243  [pdf, other]

    cs.CL cs.AI

    FamiCom: Further Demystifying Prompts for Language Models with Task-Agnostic Performance Estimation

    Authors: Bangzheng Li, Ben Zhou, Xingyu Fu, Fei Wang, Dan Roth, Muhao Chen

    Abstract: Language models have shown impressive in-context learning capabilities, which allow them to benefit from input prompts and perform better on downstream end tasks. Existing works investigate the mechanisms behind this observation and propose label-agnostic prompt metrics that can better estimate end-task performance. One popular approach is using perplexity as a way to measure models' familiarity…

    Submitted 17 June, 2024; originally announced June 2024.

  22. arXiv:2406.10514  [pdf, other]

    eess.AS cs.AI cs.CL cs.LG cs.SD

    GTR-Voice: Articulatory Phonetics Informed Controllable Expressive Speech Synthesis

    Authors: Zehua Kcriss Li, Meiying Melissa Chen, Yi Zhong, Pinxin Liu, Zhiyao Duan

    Abstract: Expressive speech synthesis aims to generate speech that captures a wide range of para-linguistic features, including emotion and articulation, though current research primarily emphasizes emotional aspects over the nuanced articulatory features mastered by professional voice actors. Inspired by this, we explore expressive speech synthesis through the lens of articulatory phonetics. Specifically,…

    Submitted 15 June, 2024; originally announced June 2024.

  23. arXiv:2406.09843  [pdf, other]

    cs.SE

    An Exploratory Study on Using Large Language Models for Mutation Testing

    Authors: Bo Wang, Mingda Chen, Youfang Lin, Mike Papadakis, Jie M. Zhang

    Abstract: The question of how to generate high-utility mutations, to be used for testing purposes, forms a key challenge in mutation testing literature. Existing approaches rely either on human-specified syntactic rules or learning-based approaches, all of which produce large numbers of redundant mutants. Large Language Models (LLMs) have shown great potential in code-related tasks but their utility in mut…

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: 13 pages, 3 figures

    ACM Class: D.2.5

  24. arXiv:2406.09710  [pdf, other]

    cs.CV cs.AI

    Fine-Grained Urban Flow Inference with Multi-scale Representation Learning

    Authors: Shilu Yuan, Dongfeng Li, Wei Liu, Xinxin Zhang, Meng Chen, Junjie Zhang, Yongshun Gong

    Abstract: Fine-grained urban flow inference (FUFI) is a crucial transportation service aimed at improving traffic efficiency and safety. FUFI can infer fine-grained urban traffic flows based solely on observed coarse-grained data. However, most existing methods focus on the influence of single-scale static geographic information on FUFI, neglecting the interactions and dynamic information between differe…

    Submitted 14 June, 2024; originally announced June 2024.

  25. arXiv:2406.09411  [pdf, other]

    cs.CV cs.AI cs.CL

    MuirBench: A Comprehensive Benchmark for Robust Multi-image Understanding

    Authors: Fei Wang, Xingyu Fu, James Y. Huang, Zekun Li, Qin Liu, Xiaogeng Liu, Mingyu Derek Ma, Nan Xu, Wenxuan Zhou, Kai Zhang, Tianyi Lorena Yan, Wenjie Jacky Mo, Hsiang-Hui Liu, Pan Lu, Chunyuan Li, Chaowei Xiao, Kai-Wei Chang, Dan Roth, Sheng Zhang, Hoifung Poon, Muhao Chen

    Abstract: We introduce MuirBench, a comprehensive benchmark that focuses on robust multi-image understanding capabilities of multimodal LLMs. MuirBench consists of 12 diverse multi-image tasks (e.g., scene understanding, ordering) that involve 10 categories of multi-image relations (e.g., multiview, temporal relations). Comprising 11,264 images and 2,600 multiple-choice questions, MuirBench is created in a…

    Submitted 13 June, 2024; originally announced June 2024.

  26. arXiv:2406.09317  [pdf, other]

    eess.IV cs.CV

    Common and Rare Fundus Diseases Identification Using Vision-Language Foundation Model with Knowledge of Over 400 Diseases

    Authors: Meng Wang, Tian Lin, Aidi Lin, Kai Yu, Yuanyuan Peng, Lianyu Wang, Cheng Chen, Ke Zou, Huiyu Liang, Man Chen, Xue Yao, Meiqin Zhang, Binwei Huang, Chaoxin Zheng, Peixin Zhang, Wei Chen, Yilong Luo, Yifan Chen, Honghe Xia, Tingkun Shi, Qi Zhang, **ming Guo, Xiaolin Chen, Jingcheng Wang, Yih Chung Tham, et al. (24 additional authors not shown)

    Abstract: Previous foundation models for retinal images were pre-trained with limited disease categories and a limited knowledge base. Here we introduce RetiZero, a vision-language foundation model that leverages knowledge from over 400 fundus diseases. For RetiZero's pre-training, we compiled 341,896 fundus images paired with text descriptions, sourced from public datasets, ophthalmic literature, and online resources…

    Submitted 30 June, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

  27. arXiv:2406.08113  [pdf, other]

    cs.CV cs.RO

    Valeo4Cast: A Modular Approach to End-to-End Forecasting

    Authors: Yihong Xu, Éloi Zablocki, Alexandre Boulch, Gilles Puy, Mickael Chen, Florent Bartoccioni, Nermin Samet, Oriane Siméoni, Spyros Gidaris, Tuan-Hung Vu, Andrei Bursuc, Eduardo Valle, Renaud Marlet, Matthieu Cord

    Abstract: Motion forecasting is crucial in autonomous driving systems to anticipate the future trajectories of surrounding agents such as pedestrians, vehicles, and traffic signals. In end-to-end forecasting, the model must jointly detect from sensor data (cameras or LiDARs) the position and past trajectories of the different elements of the scene and predict their future location. We depart from the curren…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: Winning solution of the Argoverse 2 "Unified Detection, Tracking, and Forecasting" challenge, held at CVPR 2024 WAD

  28. arXiv:2406.08079  [pdf, other]

    cs.CV

    A$^{2}$-MAE: A spatial-temporal-spectral unified remote sensing pre-training method based on anchor-aware masked autoencoder

    Authors: Lixian Zhang, Yi Zhao, Runmin Dong, **xiao Zhang, Shuai Yuan, Shilei Cao, Mengxuan Chen, Juepeng Zheng, Weijia Li, Wei Liu, Wayne Zhang, Litong Feng, Haohuan Fu

    Abstract: Vast amounts of remote sensing (RS) data provide Earth observations across multiple dimensions, encompassing critical spatial, temporal, and spectral information which is essential for addressing global-scale challenges such as land use monitoring, disaster prevention, and environmental change mitigation. Despite various pre-training methods tailored to the characteristics of RS data, a key limita…

    Submitted 16 June, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

  29. arXiv:2406.07547  [pdf, other]

    cs.CV

    Zero-shot Image Editing with Reference Imitation

    Authors: Xi Chen, Yutong Feng, Mengting Chen, Yiyang Wang, Shilong Zhang, Yu Liu, Yujun Shen, Hengshuang Zhao

    Abstract: Image editing serves as a practical yet challenging task considering the diverse demands from users, where one of the hardest parts is to precisely describe what the edited image should look like. In this work, we present a new form of editing, termed imitative editing, to help users exercise their creativity more conveniently. Concretely, to edit an image region of interest, users are free to dire…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: https://xavierchen34.github.io/MimicBrush-Page

  30. arXiv:2406.07162  [pdf, other]

    cs.SD cs.AI cs.CL cs.MM eess.AS

    EmoBox: Multilingual Multi-corpus Speech Emotion Recognition Toolkit and Benchmark

    Authors: Ziyang Ma, Mingjie Chen, Hezhao Zhang, Zhisheng Zheng, Wenxi Chen, Xiquan Li, Jiaxin Ye, Xie Chen, Thomas Hain

    Abstract: Speech emotion recognition (SER) is an important part of human-computer interaction, receiving extensive attention from both industry and academia. However, the current research field of SER has long suffered from the following problems: 1) There are few reasonable and universal splits of the datasets, making comparing different models and methods difficult. 2) No commonly used benchmark covers nu…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: Accepted by INTERSPEECH 2024. GitHub Repository: https://github.com/emo-box/EmoBox

  31. arXiv:2406.07032  [pdf, other]

    cs.CV

    RS-DFM: A Remote Sensing Distributed Foundation Model for Diverse Downstream Tasks

    Authors: Zhechao Wang, Peirui Cheng, Pengju Tian, Yuchao Wang, Mingxin Chen, Shujing Duan, Zhirui Wang, Xinming Li, Xian Sun

    Abstract: Remote sensing lightweight foundation models have achieved notable success in online perception within remote sensing. However, their capabilities are restricted to performing online inference solely based on their own observations and models, thus lacking a comprehensive understanding of large-scale remote sensing scenarios. To overcome this limitation, we propose a Remote Sensing Distributed Fou…

    Submitted 11 June, 2024; originally announced June 2024.

  32. arXiv:2406.06600  [pdf, other]

    cs.LG cs.AI cs.CL

    HORAE: A Domain-Agnostic Modeling Language for Automating Multimodal Service Regulation

    Authors: Yutao Sun, Mingshuai Chen, Kangjia Zhao, He Li, **tao Chen, Linyu Yang, Zhongyi Wang, Tiancheng Zhao, Jianwei Yin

    Abstract: Artificial intelligence is rapidly encroaching on the field of service regulation. This work presents the design principles behind HORAE, a unified specification language to model multimodal regulation rules across a diverse set of domains. We show how HORAE facilitates an intelligent service regulation pipeline by further exploiting a fine-tuned large language model named HORAE that automates the…

    Submitted 6 June, 2024; originally announced June 2024.

  33. arXiv:2406.06477  [pdf, other]

    cs.CR cs.LG

    OmniLytics+: A Secure, Efficient, and Affordable Blockchain Data Market for Machine Learning through Off-Chain Processing

    Authors: Songze Li, Mingzhe Liu, Mengqi Chen

    Abstract: The rapid development of large machine learning (ML) models requires a massive amount of training data, resulting in booming demand for data sharing and trading through data markets. Traditional centralized data markets suffer from a low level of security, and emerging decentralized platforms are faced with efficiency and privacy challenges. In this paper, we propose OmniLytics+, the first decentral…

    Submitted 17 April, 2024; originally announced June 2024.

  34. arXiv:2406.05962  [pdf, other]

    cs.DC cs.DB

    Data Caching for Enterprise-Grade Petabyte-Scale OLAP

    Authors: Chunxu Tang, Bin Fan, Jing Zhao, Chen Liang, Yi Wang, Beinan Wang, Ziyue Qiu, Lu Qiu, Bowen Ding, Shouzhuo Sun, Saiguang Che, Jiaming Mai, Shouwei Chen, Yu Zhu, Jianjian Xie, Yutian, Sun, Yao Li, Yangjun Zhang, Ke Wang, Mingmin Chen

    Abstract: With the exponential growth of data and evolving use cases, petabyte-scale OLAP data platforms are increasingly adopting a model that decouples compute from storage. This shift, evident in organizations like Uber and Meta, introduces operational challenges including massive, read-heavy I/O traffic with potential throttling, as well as skewed and fragmented data access patterns. Addressing these ch…

    Submitted 9 June, 2024; originally announced June 2024.

    Comments: Accepted to the USENIX Annual Technical Conference (USENIX ATC) 2024

  35. arXiv:2406.04941  [pdf, ps, other]

    cs.CL

    TCMD: A Traditional Chinese Medicine QA Dataset for Evaluating Large Language Models

    Authors: ** Yu, Kaitao Song, Fengchen He, Ming Chen, Jianfeng Lu

    Abstract: Recent unprecedented advancements in Large Language Models (LLMs) have propelled the medical community by establishing advanced medical-domain models. However, due to the limited collection of medical datasets, there are only a few comprehensive benchmarks available to gauge progress in this area. In this paper, we introduce a new medical question-answering (QA) dataset that contains massive…

    Submitted 7 June, 2024; originally announced June 2024.

  36. arXiv:2406.04520  [pdf, other]

    cs.CL cs.AI

    NATURAL PLAN: Benchmarking LLMs on Natural Language Planning

    Authors: Huaixiu Steven Zheng, Swaroop Mishra, Hugh Zhang, Xinyun Chen, Minmin Chen, Azade Nova, Le Hou, Heng-Tze Cheng, Quoc V. Le, Ed H. Chi, Denny Zhou

    Abstract: We introduce NATURAL PLAN, a realistic planning benchmark in natural language containing 3 key tasks: Trip Planning, Meeting Planning, and Calendar Scheduling. We focus our evaluation on the planning capabilities of LLMs with full information on the task, by providing outputs from tools such as Google Flights, Google Maps, and Google Calendar as contexts to the models. This eliminates the need for…

    Submitted 6 June, 2024; originally announced June 2024.

  37. arXiv:2406.03993  [pdf, other]

    cs.CL

    Assessing LLMs for Zero-shot Abstractive Summarization Through the Lens of Relevance Paraphrasing

    Authors: Hadi Askari, Anshuman Chhabra, Muhao Chen, Prasant Mohapatra

    Abstract: Large Language Models (LLMs) have achieved state-of-the-art performance at zero-shot generation of abstractive summaries for given articles. However, little is known about the robustness of such a process of zero-shot summarization. To bridge this gap, we propose relevance paraphrasing, a simple strategy that can be used to measure the robustness of LLMs as summarizers. The relevance paraphrasing…

    Submitted 6 June, 2024; originally announced June 2024.

  38. arXiv:2406.01636  [pdf]

    q-bio.QM cs.AI

    COVID-19: post infection implications in different age groups, mechanism, diagnosis, effective prevention, treatment, and recommendations

    Authors: Muhammad Akmal Raheem, Muhammad Ajwad Rahim, Ijaz Gul, Md. Reyad-ul-Ferdous, Liyan Le, Junguo Hui, Shuiwei Xia, Minjiang Chen, Dongmei Yu, Vijay Pandey, Peiwu Qin, Jiansong Ji

    Abstract: SARS-CoV-2, the highly contagious pathogen responsible for the COVID-19 pandemic, has persistent effects that begin four weeks after initial infection and last for an undetermined duration. These chronic effects are more harmful than acute ones. This review explores the long-term impact of the virus on various human organs, including the pulmonary, cardiovascular, neurological, reproductive, gastr…

    Submitted 2 June, 2024; originally announced June 2024.

  39. arXiv:2406.01151  [pdf, other]

    cs.AR

    A 0.96pJ/SOP, 30.23K-neuron/mm^2 Heterogeneous Neuromorphic Chip With Fullerene-like Interconnection Topology for Edge-AI Computing

    Authors: P. J. Zhou, Q. Yu, M. Chen, Y. C. Wang, L. W. Meng, Y. Zuo, N. Ning, Y. Liu, S. G. Hu, G. C. Qiao

    Abstract: Edge-AI computing requires high energy efficiency, low power consumption, and relatively high flexibility and compact area, challenging the AI-chip design. This work presents a 0.96 pJ/SOP heterogeneous neuromorphic system-on-chip (SoC) with fullerene-like interconnection topology for edge-AI computing. The neuromorphic core integrates different technologies to augment computing energy efficiency,…

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: 5 pages, 8 figures

  40. arXiv:2406.00403  [pdf, other]

    cs.LG cs.AI

    Dual-perspective Cross Contrastive Learning in Graph Transformers

    Authors: Zelin Yao, Chuang Liu, Xueqi Ma, Mukun Chen, Jia Wu, Xiantao Cai, Bo Du, Wenbin Hu

    Abstract: Graph contrastive learning (GCL) is a popular method for learning graph representations by maximizing the consistency of features across augmented views. Traditional GCL methods utilize single-perspective (i.e., data- or model-perspective) augmentation to generate positive samples, restraining the diversity of positive samples. In addition, these positive samples may be unreliable due to uncontrollabl…

    Submitted 1 June, 2024; originally announced June 2024.

    Comments: 12 pages, 5 figures, submitted to IEEE TKDE

  41. arXiv:2406.00222  [pdf, other]

    cs.CL cs.AI cs.LG

    Learning to Clarify: Multi-turn Conversations with Action-Based Contrastive Self-Training

    Authors: Maximillian Chen, Ruoxi Sun, Sercan Ö. Arık, Tomas Pfister

    Abstract: Large language models (LLMs) aligned through reinforcement learning from human feedback (RLHF) have quickly become one of the dominant paradigms for building intelligent conversational assistant agents. However, despite their strong performance across many benchmarks, LLM-based agents still lack conversational skills such as disambiguation: when generalized assistants are faced with ambiguity, the…

    Submitted 31 May, 2024; originally announced June 2024.

  42. arXiv:2405.21061  [pdf, other]

    cs.LG

    Graph External Attention Enhanced Transformer

    Authors: Jianqing Liang, Min Chen, Jiye Liang

    Abstract: The Transformer architecture has recently gained considerable attention in the field of graph representation learning, as it naturally overcomes several limitations of Graph Neural Networks (GNNs) with customized attention mechanisms or positional and structural encodings. Despite making some progress, existing works tend to overlook external information of graphs, specifically the correlation bet…

    Submitted 3 June, 2024; v1 submitted 31 May, 2024; originally announced May 2024.

    Comments: In Proceedings of ICML 2024

  43. arXiv:2405.20064  [pdf, other]

    eess.AS cs.SD

    1st Place Solution to Odyssey Emotion Recognition Challenge Task1: Tackling Class Imbalance Problem

    Authors: Mingjie Chen, Hezhao Zhang, Yuanchao Li, Jiachen Luo, Wen Wu, Ziyang Ma, Peter Bell, Catherine Lai, Joshua Reiss, Lin Wang, Philip C. Woodland, Xie Chen, Huy Phan, Thomas Hain

    Abstract: Speech emotion recognition is a challenging classification task with natural emotional speech, especially when the distribution of emotion types is imbalanced in the training and test data. In this case, it is more difficult for a model to learn to separate minority classes, resulting in those sometimes being ignored or frequently misclassified. Previous work has utilised class weighted loss for t…

    Submitted 30 May, 2024; originally announced May 2024.

  44. arXiv:2405.19528  [pdf, other]

    cs.RO

    Predicting Long-Term Human Behaviors in Discrete Representations via Physics-Guided Diffusion

    Authors: Zhitian Zhang, Anjian Li, Angelica Lim, Mo Chen

    Abstract: Long-term human trajectory prediction is a challenging yet critical task in robotics and autonomous systems. Prior work that studied how to predict accurate short-term human trajectories with only unimodal features often failed in long-term prediction. Reinforcement learning provides a good solution for learning human long-term behaviors but can suffer from challenges in data efficiency and optimi…

    Submitted 29 May, 2024; originally announced May 2024.

  45. arXiv:2405.18628  [pdf, other]

    cs.LG cs.CL

    Hardware-Aware Parallel Prompt Decoding for Memory-Efficient Acceleration of LLM Inference

    Authors: Hao Mark Chen, Wayne Luk, Ka Fai Cedric Yiu, Rui Li, Konstantin Mishchenko, Stylianos I. Venieris, Hongxiang Fan

    Abstract: The auto-regressive decoding of Large Language Models (LLMs) results in significant overheads in their hardware performance. While recent research has investigated various speculative decoding techniques for multi-token generation, these efforts have primarily focused on improving processing speed such as throughput. Crucially, they often neglect other metrics essential for real-life deployments,…

    Submitted 2 June, 2024; v1 submitted 28 May, 2024; originally announced May 2024.

    Comments: The code for this implementation is available at https://github.com/hmarkc/parallel-prompt-decoding

  46. arXiv:2405.18458  [pdf]

    cs.LG physics.optics

    Asymmetrical estimator for training grey-box deep photonic neural networks

    Authors: Yizhi Wang, Minjia Chen, Chunhui Yao, Jie Ma, Ting Yan, Richard Penty, Qixiang Cheng

    Abstract: Physical neural networks (PNNs) are emerging paradigms for neural network acceleration due to their high-bandwidth, in-propagation analogue processing. Despite the advantages of PNN for inference, training remains a challenge. The imperfect information of the physical transformation means the failure of conventional gradient-based updates from backpropagation (BP). Here, we present the asymmetrica…

    Submitted 28 May, 2024; originally announced May 2024.

    Comments: 17 pages, 5 figures

    MSC Class: 78-05

  47. arXiv:2405.16996  [pdf, other]

    cs.CV

    Mitigating Noisy Correspondence by Geometrical Structure Consistency Learning

    Authors: Zihua Zhao, Mengxi Chen, Tianjie Dai, Jiangchao Yao, Bo Han, Ya Zhang, Yanfeng Wang

    Abstract: Noisy correspondence, which refers to mismatches in cross-modal data pairs, is prevalent in human-annotated or web-crawled datasets. Prior approaches to leveraging such data mainly consider the application of uni-modal noisy label learning without amending the impact on both cross-modal and intra-modal geometrical structures in multimodal learning. Actually, we find that both structures are effective…

    Submitted 27 May, 2024; originally announced May 2024.

    Comments: 10 pages, 5 figures, received by IEEE/CVF Computer Science and Pattern Recognition

  48. arXiv:2405.16791  [pdf, ps, other]

    cs.IT eess.SP

    Joint Node Selection and Resource Allocation Optimization for Cooperative Sensing with a Shared Wireless Backhaul

    Authors: Mingxin Chen, Ming-Min Zhao, An Liu, Min Li, Qingjiang Shi

    Abstract: In this paper, we consider a cooperative sensing framework in the context of future multi-functional network with both communication and sensing ability, where one base station (BS) serves as a sensing transmitter and several nearby BSs serve as sensing receivers. Each receiver receives the sensing signal reflected by the target and communicates with the fusion center (FC) through a wireless multi…

    Submitted 26 May, 2024; originally announced May 2024.

    Comments: 13 pages, 10 figures

  49. arXiv:2405.16363  [pdf, other]

    cs.IR cs.AI

    LLMs for User Interest Exploration in Large-scale Recommendation Systems

    Authors: Jianling Wang, Haokai Lu, Yifan Liu, He Ma, Yueqi Wang, Yang Gu, Shuzhou Zhang, Ningren Han, Shuchao Bi, Lexi Baugher, Ed Chi, Minmin Chen

    Abstract: Traditional recommendation systems are subject to a strong feedback loop by learning from and reinforcing past user-item interactions, which in turn limits the discovery of novel user interests. To address this, we introduce a hybrid hierarchical framework combining Large Language Models (LLMs) and classic recommendation models for user interest exploration. The framework controls the interfacing…

    Submitted 7 June, 2024; v1 submitted 25 May, 2024; originally announced May 2024.

  50. arXiv:2405.16247  [pdf, other]

    cs.AI cs.CL

    AutoManual: Generating Instruction Manuals by LLM Agents via Interactive Environmental Learning

    Authors: Minghao Chen, Yihang Li, Yanting Yang, Shiyu Yu, Binbin Lin, Xiaofei He

    Abstract: Large Language Models (LLM) based agents have shown promise in autonomously completing tasks across various domains, e.g., robotics, games, and web navigation. However, these agents typically require elaborate design and expert prompts to solve tasks in specific domains, which limits their adaptability. We introduce AutoManual, a framework enabling LLM agents to autonomously build their understand…

    Submitted 25 May, 2024; originally announced May 2024.