Skip to main content

Showing 1–50 of 394 results for author: Dong, H

Searching in archive cs. Search in all archives.
.
  1. arXiv:2407.01518  [pdf, other

    cs.CV cs.AI cs.LG

    Towards Multimodal Open-Set Domain Generalization and Adaptation through Self-supervision

    Authors: Hao Dong, Eleni Chatzi, Olga Fink

    Abstract: The task of open-set domain generalization (OSDG) involves recognizing novel classes within unseen domains, which becomes more challenging with multiple modalities as input. Existing works have only addressed unimodal OSDG within the meta-learning framework, without considering multimodal scenarios. In this work, we introduce a novel approach to address Multimodal Open-Set Domain Generalization (M… ▽ More

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV 2024, code: https://github.com/donghao51/MOOSA

  2. arXiv:2406.19711  [pdf, other

    cs.LG

    CHASE: A Causal Heterogeneous Graph based Framework for Root Cause Analysis in Multimodal Microservice Systems

    Authors: Ziming Zhao, Tiehua Zhang, Zhishu Shen, Hai Dong, Xingjun Ma, Xianhui Liu, Yun Yang

    Abstract: In recent years, the widespread adoption of distributed microservice architectures within the industry has significantly increased the demand for enhanced system availability and robustness. Due to the complex service invocation paths and dependencies at enterprise-level microservice systems, it is challenging to locate the anomalies promptly during service invocations, thus causing intractable is… ▽ More

    Submitted 28 June, 2024; originally announced June 2024.

  3. arXiv:2406.17898  [pdf, other

    cs.RO cs.AI

    Human-centered In-building Embodied Delivery Benchmark

    Authors: Zhuoqun Xu, Yang Liu, Xiaoqi Li, Jiyao Zhang, Hao Dong

    Abstract: Recently, the concept of embodied intelligence has been widely accepted and popularized, leading people to naturally consider the potential for commercialization in this field. In this work, we propose a specific commercial scenario simulation, human-centered in-building embodied delivery. Furthermore, for this scenario, we have developed a brand-new virtual environment system from scratch, constr… ▽ More

    Submitted 25 June, 2024; originally announced June 2024.

  4. arXiv:2406.17841  [pdf, other

    quant-ph cs.AI

    Probing many-body Bell correlation depth with superconducting qubits

    Authors: Ke Wang, Weikang Li, Shibo Xu, Mengyao Hu, Jiachen Chen, Yaozu Wu, Chuanyu Zhang, Feitong **, Xuhao Zhu, Yu Gao, Ziqi Tan, Aosai Zhang, Ning Wang, Yiren Zou, Tingting Li, Fanhao Shen, Jiarun Zhong, Zehang Bao, Zitian Zhu, Zixuan Song, **feng Deng, Hang Dong, Xu Zhang, Pengfei Zhang, Wenjie Jiang , et al. (10 additional authors not shown)

    Abstract: Quantum nonlocality describes a stronger form of quantum correlation than that of entanglement. It refutes Einstein's belief of local realism and is among the most distinctive and enigmatic features of quantum mechanics. It is a crucial resource for achieving quantum advantages in a variety of practical applications, ranging from cryptography and certified random number generation via self-testing… ▽ More

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: 11 pages,6 figures + 14 pages, 6 figures

  5. arXiv:2406.16992  [pdf, other

    cs.LG cs.AI

    Make Graph Neural Networks Great Again: A Generic Integration Paradigm of Topology-Free Patterns for Traffic Speed Prediction

    Authors: Yicheng Zhou, Pengfei Wang, Hao Dong, Denghui Zhang, Dingqi Yang, Yanjie Fu, Pengyang Wang

    Abstract: Urban traffic speed prediction aims to estimate the future traffic speed for improving urban transportation services. Enormous efforts have been made to exploit Graph Neural Networks (GNNs) for modeling spatial correlations and temporal dependencies of traffic speed evolving patterns, regularized by graph topology.While achieving promising results, current traffic speed prediction methods still su… ▽ More

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: Accepted to IJCAI 2024

  6. arXiv:2406.13642  [pdf, other

    cs.CV

    SpatialBot: Precise Spatial Understanding with Vision Language Models

    Authors: Wenxiao Cai, Yaroslav Ponomarenko, Jianhao Yuan, Xiaoqi Li, Wankou Yang, Hao Dong, Bo Zhao

    Abstract: Vision Language Models (VLMs) have achieved impressive performance in 2D image understanding, however they are still struggling with spatial understanding which is the foundation of Embodied AI. In this paper, we propose SpatialBot for better spatial understanding by feeding both RGB and depth images. Additionally, we have constructed the SpatialQA dataset, which involves multi-level depth-related… ▽ More

    Submitted 27 June, 2024; v1 submitted 19 June, 2024; originally announced June 2024.

  7. arXiv:2406.13161  [pdf, other

    cs.AI cs.CL cs.LG cs.PL

    APPL: A Prompt Programming Language for Harmonious Integration of Programs and Large Language Model Prompts

    Authors: Honghua Dong, Qidong Su, Yubo Gao, Zhaoyu Li, Yangjun Ruan, Gennady Pekhimenko, Chris J. Maddison, Xujie Si

    Abstract: Large Language Models (LLMs) have become increasingly capable of handling diverse tasks with the aid of well-crafted prompts and integration of external tools, but as task complexity rises, the workflow involving LLMs can be complicated and thus challenging to implement and maintain. To address this challenge, we propose APPL, A Prompt Programming Language that acts as a bridge between computer pr… ▽ More

    Submitted 18 June, 2024; originally announced June 2024.

  8. arXiv:2406.11548  [pdf, other

    cs.RO cs.AI cs.CV

    AIC MLLM: Autonomous Interactive Correction MLLM for Robust Robotic Manipulation

    Authors: Chuyan Xiong, Chengyu Shen, Xiaoqi Li, Kaichen Zhou, Jiaming Liu, Rui** Wang, Hao Dong

    Abstract: The ability to reflect on and correct failures is crucial for robotic systems to interact stably with real-life objects.Observing the generalization and reasoning capabilities of Multimodal Large Language Models (MLLMs), previous approaches have aimed to utilize these models to enhance robotic systems accordingly.However, these methods typically focus on high-level planning corrections using an ad… ▽ More

    Submitted 23 June, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

  9. arXiv:2406.07579  [pdf, other

    cs.AI cs.GR cs.LG

    GFPack++: Improving 2D Irregular Packing by Learning Gradient Field with Attention

    Authors: Tianyang Xue, Lin Lu, Yang Liu, Mingdong Wu, Hao Dong, Yanbin Zhang, Renmin Han, Baoquan Chen

    Abstract: 2D irregular packing is a classic combinatorial optimization problem with various applications, such as material utilization and texture atlas generation. This NP-hard problem requires efficient algorithms to optimize space utilization. Conventional numerical methods suffer from slow convergence and high computational cost. Existing learning-based methods, such as the score-based diffusion model,… ▽ More

    Submitted 9 June, 2024; originally announced June 2024.

  10. arXiv:2406.07549  [pdf, other

    cs.RO

    A3VLM: Actionable Articulation-Aware Vision Language Model

    Authors: Siyuan Huang, Haonan Chang, Yuhan Liu, Yimeng Zhu, Hao Dong, Peng Gao, Abdeslam Boularias, Hongsheng Li

    Abstract: Vision Language Models (VLMs) have received significant attention in recent years in the robotics community. VLMs are shown to be able to perform complex visual reasoning and scene understanding tasks, which makes them regarded as a potential universal solution for general robotics problems such as manipulation and navigation. However, previous VLMs for robotics such as RT-1, RT-2, and ManipLLM ha… ▽ More

    Submitted 13 June, 2024; v1 submitted 11 June, 2024; originally announced June 2024.

  11. arXiv:2406.04882  [pdf, other

    cs.RO cs.AI cs.CL cs.CV

    InstructNav: Zero-shot System for Generic Instruction Navigation in Unexplored Environment

    Authors: Yuxing Long, Wenzhe Cai, Hongcheng Wang, Guanqi Zhan, Hao Dong

    Abstract: Enabling robots to navigate following diverse language instructions in unexplored environments is an attractive goal for human-robot interaction. However, this goal is challenging because different navigation tasks require different strategies. The scarcity of instruction navigation data hinders training an instruction navigation model with varied strategies. Therefore, previous methods are all co… ▽ More

    Submitted 7 June, 2024; originally announced June 2024.

    Comments: Submitted to CoRL 2024

  12. arXiv:2406.04316  [pdf, other

    cs.CV

    Omni6DPose: A Benchmark and Model for Universal 6D Object Pose Estimation and Tracking

    Authors: Jiyao Zhang, Weiyao Huang, Bo Peng, Mingdong Wu, Fei Hu, Zijian Chen, Bo Zhao, Hao Dong

    Abstract: 6D Object Pose Estimation is a crucial yet challenging task in computer vision, suffering from a significant lack of large-scale datasets. This scarcity impedes comprehensive evaluation of model performance, limiting research advancements. Furthermore, the restricted number of available instances or categories curtails its applications. To address these issues, this paper introduces Omni6DPose, a… ▽ More

    Submitted 6 June, 2024; originally announced June 2024.

  13. arXiv:2406.02283  [pdf, other

    cs.RO

    Broadcasting Support Relations Recursively from Local Dynamics for Object Retrieval in Clutters

    Authors: Yitong Li, Ruihai Wu, Haoran Lu, Chuanruo Ning, Yan Shen, Guanqi Zhan, Hao Dong

    Abstract: In our daily life, cluttered objects are everywhere, from scattered stationery and books cluttering the table to bowls and plates filling the kitchen sink. Retrieving a target object from clutters is an essential while challenging skill for robots, for the difficulty of safely manipulating an object without disturbing others, which requires the robot to plan a manipulation sequence and first move… ▽ More

    Submitted 4 June, 2024; originally announced June 2024.

    Comments: RSS 2024

  14. arXiv:2406.01047  [pdf, other

    cs.DC cs.AI cs.LG

    An Advanced Reinforcement Learning Framework for Online Scheduling of Deferrable Workloads in Cloud Computing

    Authors: Hang Dong, Liwen Zhu, Zhao Shan, Bo Qiao, Fangkai Yang, Si Qin, Chuan Luo, Qingwei Lin, Yuwen Yang, Gurpreet Virdi, Saravan Rajmohan, Dongmei Zhang, Thomas Moscibroda

    Abstract: Efficient resource utilization and perfect user experience usually conflict with each other in cloud computing platforms. Great efforts have been invested in increasing resource utilization but trying not to affect users' experience for cloud computing platforms. In order to better utilize the remaining pieces of computing resources spread over the whole platform, deferrable jobs are provided with… ▽ More

    Submitted 3 June, 2024; originally announced June 2024.

  15. arXiv:2406.00439  [pdf, other

    cs.RO cs.CV

    Learning Manipulation by Predicting Interaction

    Authors: Jia Zeng, Qingwen Bu, Bangjun Wang, Wenke Xia, Li Chen, Hao Dong, Haoming Song, Dong Wang, Di Hu, ** Luo, Heming Cui, Bin Zhao, Xuelong Li, Yu Qiao, Hongyang Li

    Abstract: Representation learning approaches for robotic manipulation have boomed in recent years. Due to the scarcity of in-domain robot data, prevailing methodologies tend to leverage large-scale human video datasets to extract generalizable features for visuomotor policy learning. Despite the progress achieved, prior endeavors disregard the interactive dynamics that capture behavior patterns and physical… ▽ More

    Submitted 1 June, 2024; originally announced June 2024.

    Comments: Accepted to RSS 2024. Project page: https://github.com/OpenDriveLab/MPI

  16. arXiv:2405.19092  [pdf, other

    cs.CV

    Benchmarking and Improving Detail Image Caption

    Authors: Hongyuan Dong, Jiawen Li, Bohong Wu, Jiacong Wang, Yuan Zhang, Haoyuan Guo

    Abstract: Image captioning has long been regarded as a fundamental task in visual understanding. Recently, however, few large vision-language model (LVLM) research discusses model's image captioning performance because of the outdated short-caption benchmarks and unreliable evaluation metrics. In this work, we propose to benchmark detail image caption task by curating high-quality evaluation datasets annota… ▽ More

    Submitted 31 May, 2024; v1 submitted 29 May, 2024; originally announced May 2024.

  17. arXiv:2405.17419  [pdf, other

    cs.CV cs.AI cs.LG

    MultiOOD: Scaling Out-of-Distribution Detection for Multiple Modalities

    Authors: Hao Dong, Yue Zhao, Eleni Chatzi, Olga Fink

    Abstract: Detecting out-of-distribution (OOD) samples is important for deploying machine learning models in safety-critical applications such as autonomous driving and robot-assisted surgery. Existing research has mainly focused on unimodal scenarios on image data. However, real-world applications are inherently multimodal, which makes it essential to leverage information from multiple modalities to enhance… ▽ More

    Submitted 27 May, 2024; originally announced May 2024.

    Comments: Code and MultiOOD benchmark: https://github.com/donghao51/MultiOOD

  18. arXiv:2405.17158  [pdf, other

    cs.CV

    PatchScaler: An Efficient Patch-Independent Diffusion Model for Super-Resolution

    Authors: Yong Liu, Hang Dong, **shan Pan, Qingji Dong, Kai Chen, Rongxiang Zhang, Lean Fu, Fei Wang

    Abstract: Diffusion models significantly improve the quality of super-resolved images with their impressive content generation capabilities. However, the huge computational costs limit the applications of these methods.Recent efforts have explored reasonable inference acceleration to reduce the number of sampling steps, but the computational cost remains high as each step is performed on the entire image.Th… ▽ More

    Submitted 11 June, 2024; v1 submitted 27 May, 2024; originally announced May 2024.

  19. arXiv:2405.17147  [pdf, other

    cs.MM

    Large Language Models (LLMs): Deployment, Tokenomics and Sustainability

    Authors: Haiwei Dong, Shuang Xie

    Abstract: The rapid advancement of Large Language Models (LLMs) has significantly impacted human-computer interaction, epitomized by the release of GPT-4o, which introduced comprehensive multi-modality capabilities. In this paper, we first explored the deployment strategies, economic considerations, and sustainability challenges associated with the state-of-the-art LLMs. More specifically, we discussed the… ▽ More

    Submitted 27 May, 2024; originally announced May 2024.

    Comments: Accepted by IEEE CTSoc-NCT

  20. arXiv:2405.16734  [pdf, other

    stat.ML cs.LG

    Faster Sampling via Stochastic Gradient Proximal Sampler

    Authors: Xunpeng Huang, Difan Zou, Yi-An Ma, Hanze Dong, Tong Zhang

    Abstract: Stochastic gradients have been widely integrated into Langevin-based methods to improve their scalability and efficiency in solving large-scale sampling problems. However, the proximal sampler, which exhibits much faster convergence than Langevin-based algorithms in the deterministic setting Lee et al. (2021), has yet to be explored in its stochastic variants. In this paper, we study the Stochasti… ▽ More

    Submitted 26 May, 2024; originally announced May 2024.

    Comments: 48 pages, 2 figures, 5 tables

  21. arXiv:2405.16387  [pdf, other

    stat.ML cs.LG

    Reverse Transition Kernel: A Flexible Framework to Accelerate Diffusion Inference

    Authors: Xunpeng Huang, Difan Zou, Hanze Dong, Yi Zhang, Yi-An Ma, Tong Zhang

    Abstract: To generate data from trained diffusion models, most inference algorithms, such as DDPM, DDIM, and other variants, rely on discretizing the reverse SDEs or their equivalent ODEs. In this paper, we view such approaches as decomposing the entire denoising diffusion process into several segments, each corresponding to a reverse transition kernel (RTK) sampling subproblem. Specifically, DDPM uses a Ga… ▽ More

    Submitted 25 May, 2024; originally announced May 2024.

    Comments: 68 pages, 2 figures

  22. arXiv:2405.16234  [pdf, other

    cs.CV

    Vision Language Models for Spreadsheet Understanding: Challenges and Opportunities

    Authors: Shiyu Xia, Junyu Xiong, Haoyu Dong, Jianbo Zhao, Yuzhang Tian, Mengyu Zhou, Yeye He, Shi Han, Dongmei Zhang

    Abstract: This paper explores capabilities of Vision Language Models on spreadsheet comprehension. We propose three self-supervised challenges with corresponding evaluation metrics to comprehensively evaluate VLMs on Optical Character Recognition (OCR), spatial perception, and visual format recognition. Additionally, we utilize the spreadsheet table detection task to assess the overall performance of VLMs b… ▽ More

    Submitted 25 May, 2024; originally announced May 2024.

  23. arXiv:2405.14156  [pdf, other

    cs.CV

    Unveiling the Tapestry of Consistency in Large Vision-Language Models

    Authors: Yuan Zhang, Fei Xiao, Tao Huang, Chun-Kai Fan, Hongyuan Dong, Jiawen Li, Jiacong Wang, Kuan Cheng, Shanghang Zhang, Haoyuan Guo

    Abstract: Large vision-language models (LVLMs) have recently achieved rapid progress, exhibiting great perception and reasoning abilities concerning visual information. However, when faced with prompts in different sizes of solution spaces, LVLMs fail to always give consistent answers regarding the same knowledge point. This inconsistency of answers between different solution spaces is prevalent in LVLMs an… ▽ More

    Submitted 7 June, 2024; v1 submitted 23 May, 2024; originally announced May 2024.

    Comments: This project is available at https://github.com/foundation-multimodal-models/ConBench

  24. arXiv:2405.13063  [pdf, other

    physics.ao-ph cs.LG

    Aurora: A Foundation Model of the Atmosphere

    Authors: Cristian Bodnar, Wessel P. Bruinsma, Ana Lucic, Megan Stanley, Johannes Brandstetter, Patrick Garvan, Maik Riechert, Jonathan Weyn, Haiyu Dong, Anna Vaughan, Jayesh K. Gupta, Kit Tambiratnam, Alex Archibald, Elizabeth Heider, Max Welling, Richard E. Turner, Paris Perdikaris

    Abstract: Deep learning foundation models are revolutionizing many facets of science by leveraging vast amounts of data to learn general-purpose representations that can be adapted to tackle diverse downstream tasks. Foundation models hold the promise to also transform our ability to model our planet and its subsystems by exploiting the vast expanse of Earth system data. Here we introduce Aurora, a large-sc… ▽ More

    Submitted 28 May, 2024; v1 submitted 20 May, 2024; originally announced May 2024.

  25. arXiv:2405.10440  [pdf, other

    cs.CL

    Retrieving and Refining: A Hybrid Framework with Large Language Models for Rare Disease Identification

    Authors: **ge Wu, Hang Dong, Zexi Li, Arijit Patra, Honghan Wu

    Abstract: The infrequency and heterogeneity of clinical presentations in rare diseases often lead to underdiagnosis and their exclusion from structured datasets. This necessitates the utilization of unstructured text data for comprehensive analysis. However, the manual identification from clinical reports is an arduous and intrinsically subjective task. This study proposes a novel hybrid approach that syner… ▽ More

    Submitted 16 May, 2024; originally announced May 2024.

  26. arXiv:2405.08099  [pdf, other

    cs.CL

    KET-QA: A Dataset for Knowledge Enhanced Table Question Answering

    Authors: Mengkang Hu, Haoyu Dong, ** Luo, Shi Han, Dongmei Zhang

    Abstract: Due to the concise and structured nature of tables, the knowledge contained therein may be incomplete or missing, posing a significant challenge for table question answering (TableQA) and data analysis systems. Most existing datasets either fail to address the issue of external knowledge in TableQA or only utilize unstructured text as supplementary information for tables. In this paper, we propose… ▽ More

    Submitted 13 May, 2024; originally announced May 2024.

    Comments: LREC-Coling 2024

  27. arXiv:2405.07863  [pdf, other

    cs.LG cs.AI cs.CL stat.ML

    RLHF Workflow: From Reward Modeling to Online RLHF

    Authors: Hanze Dong, Wei Xiong, Bo Pang, Haoxiang Wang, Han Zhao, Yingbo Zhou, Nan Jiang, Doyen Sahoo, Caiming Xiong, Tong Zhang

    Abstract: We present the workflow of Online Iterative Reinforcement Learning from Human Feedback (RLHF) in this technical report, which is widely reported to outperform its offline counterpart by a large margin in the recent large language model (LLM) literature. However, existing open-source RLHF projects are still largely confined to the offline learning setting. In this technical report, we aim to fill i… ▽ More

    Submitted 12 June, 2024; v1 submitted 13 May, 2024; originally announced May 2024.

  28. arXiv:2405.07759  [pdf, other

    cs.MM cs.AI cs.NI eess.IV

    MADRL-Based Rate Adaptation for 360° Video Streaming with Multi-Viewpoint Prediction

    Authors: Haopeng Wang, Zijian Long, Haiwei Dong, Abdulmotaleb El Saddik

    Abstract: Over the last few years, 360° video traffic on the network has grown significantly. A key challenge of 360° video playback is ensuring a high quality of experience (QoE) with limited network bandwidth. Currently, most studies focus on tile-based adaptive bitrate (ABR) streaming based on single viewport prediction to reduce bandwidth consumption. However, the performance of models for single-viewpo… ▽ More

    Submitted 17 May, 2024; v1 submitted 13 May, 2024; originally announced May 2024.

    Comments: Accepted by IEEE Internet of Things Journal

  29. arXiv:2405.06903  [pdf, other

    cs.CV

    UniGarmentManip: A Unified Framework for Category-Level Garment Manipulation via Dense Visual Correspondence

    Authors: Ruihai Wu, Haoran Lu, Yiyan Wang, Yubo Wang, Hao Dong

    Abstract: Garment manipulation (e.g., unfolding, folding and hanging clothes) is essential for future robots to accomplish home-assistant tasks, while highly challenging due to the diversity of garment configurations, geometries and deformations. Although able to manipulate similar shaped garments in a certain task, previous works mostly have to design different policies for different tasks, could not gener… ▽ More

    Submitted 11 May, 2024; originally announced May 2024.

    Comments: CVPR 2024

  30. arXiv:2405.05769  [pdf, other

    cs.CV

    Exploring Text-Guided Single Image Editing for Remote Sensing Images

    Authors: Fangzhou Han, Lingyu Si, Hongwei Dong, Lamei Zhang, Hao Chen, Bo Du

    Abstract: Artificial Intelligence Generative Content (AIGC) technologies have significantly influenced the remote sensing domain, particularly in the realm of image generation. However, remote sensing image editing, an equally vital research area, has not garnered sufficient attention. Different from text-guided editing in natural images, which relies on extensive text-image paired data for semantic correla… ▽ More

    Submitted 9 May, 2024; originally announced May 2024.

  31. arXiv:2405.02369  [pdf, other

    cs.NE cs.AI cs.LG

    No One-Size-Fits-All Neurons: Task-based Neurons for Artificial Neural Networks

    Authors: Feng-Lei Fan, Meng Wang, Hang-Cheng Dong, Jianwei Ma, Tieyong Zeng

    Abstract: Biologically, the brain does not rely on a single type of neuron that universally functions in all aspects. Instead, it acts as a sophisticated designer of task-based neurons. In this study, we address the following question: since the human brain is a task-based neuron user, can the artificial network design go from the task-based architecture design to the task-based neuron design? Since methodo… ▽ More

    Submitted 3 May, 2024; originally announced May 2024.

    Comments: 12 pages, 4 figures

  32. arXiv:2404.18356  [pdf, other

    cs.DC

    FEDQ-Trust: Efficient Data-Driven Trust Prediction for Mobile Edge-Based IoT Systems

    Authors: Jiahui Bai, Hai Dong, Athman Bouguettaya

    Abstract: We introduce FEDQ-Trust, an innovative data-driven trust prediction approach designed for mobile edge-based Internet of Things (IoT) environments. The decentralized nature of mobile edge environments introduces challenges due to variations in data distribution, impacting the accuracy and training efficiency of existing distributed data-driven trust prediction models. FEDQ-Trust effectively tackles… ▽ More

    Submitted 28 April, 2024; originally announced April 2024.

    Comments: 14 pages, 6 figures, submitted to IEEE Transactions on Services Computing

  33. arXiv:2404.17780  [pdf, other

    cs.MA cs.AI

    Verco: Learning Coordinated Verbal Communication for Multi-agent Reinforcement Learning

    Authors: Dapeng Li, Hang Dong, Lu Wang, Bo Qiao, Si Qin, Qingwei Lin, Dongmei Zhang, Qi Zhang, Zhiwei Xu, Bin Zhang, Guoliang Fan

    Abstract: In recent years, multi-agent reinforcement learning algorithms have made significant advancements in diverse gaming environments, leading to increased interest in the broader application of such techniques. To address the prevalent challenge of partial observability, communication-based algorithms have improved cooperative performance through the sharing of numerical embedding between agents. Howe… ▽ More

    Submitted 27 April, 2024; originally announced April 2024.

    Comments: 12 pages, 6 figures

  34. Tile-Weighted Rate-Distortion Optimized Packet Scheduling for 360$^\circ$ VR Video Streaming

    Authors: Haopeng Wang, Haiwei Dong, Abdulmotaleb El Saddik

    Abstract: A key challenge of 360$^\circ$ VR video streaming is ensuring high quality with limited network bandwidth. Currently, most studies focus on tile-based adaptive bitrate streaming to reduce bandwidth consumption, where resources in network nodes are not fully utilized. This article proposes a tile-weighted rate-distortion (TWRD) packet scheduling optimization system to reduce data volume and improve… ▽ More

    Submitted 22 April, 2024; originally announced April 2024.

    Comments: Accepted by IEEE Intelligent Systems

  35. arXiv:2404.11336  [pdf, other

    eess.SY cs.CV cs.RO

    Vision-based control for landing an aerial vehicle on a marine vessel

    Authors: Haohua Dong

    Abstract: This work addresses the landing problem of an aerial vehicle, exemplified by a simple quadrotor, on a moving platform using image-based visual servo control. First, the mathematical model of the quadrotor aircraft is introduced, followed by the design of the inner-loop control. At the second stage, the image features on the textured target plane are exploited to derive a vision-based control law.… ▽ More

    Submitted 17 April, 2024; originally announced April 2024.

  36. arXiv:2404.09957  [pdf, other

    cs.CV cs.LG

    How to build the best medical image segmentation algorithm using foundation models: a comprehensive empirical study with Segment Anything Model

    Authors: Hanxue Gu, Haoyu Dong, Jichen Yang, Maciej A. Mazurowski

    Abstract: Automated segmentation is a fundamental medical image analysis task, which enjoys significant advances due to the advent of deep learning. While foundation models have been useful in natural language processing and some vision tasks for some time, the foundation model developed with image segmentation in mind - Segment Anything Model (SAM) - has been developed only recently and has shown similar p… ▽ More

    Submitted 13 May, 2024; v1 submitted 15 April, 2024; originally announced April 2024.

    Comments: Code available at https://github.com/mazurowski-lab/finetune-SAM

  37. arXiv:2404.07318  [pdf, other

    eess.IV cs.CV cs.LG

    Rethinking Perceptual Metrics for Medical Image Translation

    Authors: Nicholas Konz, Yuwen Chen, Hanxue Gu, Haoyu Dong, Maciej A. Mazurowski

    Abstract: Modern medical image translation methods use generative models for tasks such as the conversion of CT images to MRI. Evaluating these methods typically relies on some chosen downstream task in the target domain, such as segmentation. On the other hand, task-agnostic metrics are attractive, such as the network feature-based perceptual metrics (e.g., FID) that are common to image translation in gene… ▽ More

    Submitted 10 April, 2024; originally announced April 2024.

  38. arXiv:2404.04050  [pdf, other

    cs.CV

    No Time to Train: Empowering Non-Parametric Networks for Few-shot 3D Scene Segmentation

    Authors: Xiangyang Zhu, Renrui Zhang, Bowei He, Ziyu Guo, Jiaming Liu, Han Xiao, Chaoyou Fu, Hao Dong, Peng Gao

    Abstract: To reduce the reliance on large-scale datasets, recent works in 3D segmentation resort to few-shot learning. Current 3D few-shot segmentation methods first pre-train models on 'seen' classes, and then evaluate their generalization performance on 'unseen' classes. However, the prior pre-training stage not only introduces excessive time overhead but also incurs a significant domain gap on 'unseen' c… ▽ More

    Submitted 5 April, 2024; originally announced April 2024.

    Comments: CVPR Highlight. Code is available at https://github.com/yangyangyang127/Seg-NN. arXiv admin note: text overlap with arXiv:2308.12961

  39. arXiv:2404.03634  [pdf, other

    cs.RO cs.CV

    PreAfford: Universal Affordance-Based Pre-Gras** for Diverse Objects and Environments

    Authors: Kairui Ding, Boyuan Chen, Ruihai Wu, Yuyang Li, Zongzheng Zhang, Huan-ang Gao, Siqi Li, Yixin Zhu, Guyue Zhou, Hao Dong, Hao Zhao

    Abstract: Robotic manipulation of ungraspable objects with two-finger grippers presents significant challenges due to the paucity of graspable features, while traditional pre-gras** techniques, which rely on repositioning objects and leveraging external aids like table edges, lack the adaptability across object categories and scenes. Addressing this, we introduce PreAfford, a novel pre-gras** planning f… ▽ More

    Submitted 4 April, 2024; originally announced April 2024.

    Comments: Project Page: https://air-discover.github.io/PreAfford/

  40. arXiv:2404.03418  [pdf, ps, other

    cs.LO cs.AI

    Permissible Knowledge Pooling

    Authors: Huimin Dong

    Abstract: Information pooling has been extensively formalised across various logical frameworks in distributed systems, characterized by diverse information-sharing patterns. These approaches generally adopt an intersection perspective, aggregating all possible information, regardless of whether it is known or unknown to the agents. In contrast, this work adopts a unique stance, emphasising that sharing kno… ▽ More

    Submitted 15 May, 2024; v1 submitted 4 April, 2024; originally announced April 2024.

  41. arXiv:2404.01365  [pdf, other

    cs.LG cs.AI cs.CL

    Prompt-prompted Mixture of Experts for Efficient LLM Generation

    Authors: Harry Dong, Beidi Chen, Yuejie Chi

    Abstract: With the development of transformer-based large language models (LLMs), they have been applied to many fields due to their remarkable utility, but this comes at a considerable computational cost at deployment. Fortunately, some methods such as pruning or constructing a mixture of experts (MoE) aim at exploiting sparsity in transformer feedforward (FF) blocks to gain boosts in speed and reduction i… ▽ More

    Submitted 5 April, 2024; v1 submitted 1 April, 2024; originally announced April 2024.

    Comments: Revision 1: Updated abstract with code link; re-ran top-k + sampling rows in Table 4, conclusions unchanged

  42. Exploring Holistic HMI Design for Automated Vehicles: Insights from a Participatory Workshop to Bridge In-Vehicle and External Communication

    Authors: Haoyu Dong, Tram Thi Minh Tran, Rutger Verstegen, Silvia Cazacu, Ruolin Gao, Marius Hoggenmüller, Debargha Dey, Mervyn Franssen, Markus Sasalovici, Pavlo Bazilinskyy, Marieke Martens

    Abstract: Human-Machine Interfaces (HMIs) for automated vehicles (AVs) are typically divided into two categories: internal HMIs for interactions within the vehicle, and external HMIs for communication with other road users. In this work, we examine the prospects of bridging these two seemingly distinct domains. Through a participatory workshop with automotive user interface researchers and practitioners, we… ▽ More

    Submitted 28 March, 2024; originally announced March 2024.

  43. How to Cache Important Contents for Multi-modal Service in Dynamic Networks: A DRL-based Caching Scheme

    Authors: Zhe Zhang, Marc St-Hilaire, Xin Wei, Haiwei Dong, Abdulmotaleb El Saddik

    Abstract: With the continuous evolution of networking technologies, multi-modal services that involve video, audio, and haptic contents are expected to become the dominant multimedia service in the near future. Edge caching is a key technology that can significantly reduce network load and content transmission latency, which is critical for the delivery of multi-modal contents. However, existing caching app… ▽ More

    Submitted 27 March, 2024; originally announced March 2024.

    Journal ref: IEEE Transactions on Multimedia (Early Access), 2024

  44. arXiv:2403.18259  [pdf, other

    cs.RO

    RoboKeyGen: Robot Pose and Joint Angles Estimation via Diffusion-based 3D Keypoint Generation

    Authors: Yang Tian, Jiyao Zhang, Guowei Huang, Bin Wang, ** Wang, Jiangmiao Pang, Hao Dong

    Abstract: Estimating robot pose and joint angles is significant in advanced robotics, enabling applications like robot collaboration and online hand-eye calibration.However, the introduction of unknown joint angles makes prediction more complex than simple robot pose estimation, due to its higher dimensionality.Previous methods either regress 3D keypoints directly or utilise a render&compare strategy. These… ▽ More

    Submitted 27 March, 2024; originally announced March 2024.

    Comments: Accepted by ICRA 2024

  45. arXiv:2403.18195  [pdf, other

    cs.RO cs.AI

    SCANet: Correcting LEGO Assembly Errors with Self-Correct Assembly Network

    Authors: Yuxuan Wan, Kaichen Zhou, **hong Chen, Hao Dong

    Abstract: Autonomous assembly in robotics and 3D vision presents significant challenges, particularly in ensuring assembly correctness. Presently, predominant methods such as MEPNet focus on assembling components based on manually provided images. However, these approaches often fall short in achieving satisfactory results for tasks requiring long-term planning. Concurrently, we observe that integrating a s… ▽ More

    Submitted 26 March, 2024; originally announced March 2024.

  46. arXiv:2403.15672  [pdf, other

    cs.NI

    Deciphering the Digital Veil: Exploring the Ecosystem of DNS HTTPS Resource Records

    Authors: Hongying Dong, Yizhe Zhang, Hyeonmin Lee, Shumon Huque, Yixin Sun

    Abstract: The DNS HTTPS resource record is a new DNS record type designed for the delivery of configuration information and parameters required to initiate connections to HTTPS network services. It provides the ability to perform zone apex redirection to a third-party provider, which the existing CNAME record cannot do. In addition, it is a key enabler for TLS Encrypted ClientHello (ECH) by providing the cr… ▽ More

    Submitted 22 March, 2024; originally announced March 2024.

    Comments: 20 pages

  47. Experimental Studies of Metaverse Streaming

    Authors: Haopeng Wang, Roberto Martinez-Velazquez, Haiwei Dong, Abdulmotaleb El Saddik

    Abstract: Metaverse aims to construct a large, unified, immersive, and shared digital realm by combining various technologies, namely XR (extended reality), blockchain, and digital twin, among others. This article explores the Metaverse from the perspective of multimedia communication by conducting and analyzing real-world experiments on four different Metaverse platforms: VR (virtual reality) Vircadia, VR… ▽ More

    Submitted 22 March, 2024; originally announced March 2024.

    Comments: Accepted by IEEE Consumer Electronics Magazine

  48. Bringing Robots Home: The Rise of AI Robots in Consumer Electronics

    Authors: Haiwei Dong, Yang Liu, Ted Chu, Abdulmotaleb El Saddik

    Abstract: On March 18, 2024, NVIDIA unveiled Project GR00T, a general-purpose multimodal generative AI model designed specifically for training humanoid robots. Preceding this event, Tesla's unveiling of the Optimus Gen 2 humanoid robot on December 12, 2023, underscored the profound impact robotics is poised to have on resha** various facets of our daily lives. While robots have long dominated industrial… ▽ More

    Submitted 21 March, 2024; originally announced March 2024.

    Comments: Accepted by IEEE Consumer Electronics Magazine

  49. arXiv:2403.14204  [pdf, other

    cs.ET

    VL-DNA: Enhance DNA Storage Capacity with Variable Payload (Strand) Lengths

    Authors: Yixun Wei, Wenlong Wang, Huibing Dong, Bingzhe Li, David Du

    Abstract: DNA storage is a promising archival data storage solution to today's big data problem. A DNA storage system encodes and stores digital data with synthetic DNA sequences and decodes DNA sequences back to digital data via sequencing. For efficient target data retrieving, existing Polymerase Chain Reaction PCR based DNA storage systems apply primers as specific identifier to tag different set of DNA… ▽ More

    Submitted 21 March, 2024; originally announced March 2024.

  50. arXiv:2403.13336  [pdf

    cs.RO

    Discretizing SO(2)-Equivariant Features for Robotic Kitting

    Authors: Jiadong Zhou, Yadan Zeng, Huixu Dong, I-Ming Chen

    Abstract: Robotic kitting has attracted considerable attention in logistics and industrial settings. However, existing kitting methods encounter challenges such as low precision and poor efficiency, limiting their widespread applications. To address these issues, we present a novel kitting framework that improves both the precision and computational efficiency of complex kitting tasks. Firstly, our approach… ▽ More

    Submitted 20 March, 2024; originally announced March 2024.

    Comments: 8 pages, 6 figures