Showing 1–50 of 430 results for author: Zhao, M

Searching in archive cs.
  1. arXiv:2406.18817  [pdf, other]

    cs.CV cs.AI

    Correspondence-Free Non-Rigid Point Set Registration Using Unsupervised Clustering Analysis

    Authors: Mingyang Zhao, Jingen Jiang, Lei Ma, Shiqing Xin, Gaofeng Meng, Dong-Ming Yan

    Abstract: This paper presents a novel non-rigid point set registration method that is inspired by unsupervised clustering analysis. Unlike previous approaches that treat the source and target point sets as separate entities, we develop a holistic framework where they are formulated as clustering centroids and clustering members, respectively. We then adopt Tikhonov regularization with an $\ell_1$-induced Lapl…

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: [CVPR 2024 Highlight] Project and code at: https://github.com/zikai1/CVPR24_PointSetReg

  2. arXiv:2406.17672  [pdf, other]

    cs.SD eess.AS

    SpecMaskGIT: Masked Generative Modeling of Audio Spectrograms for Efficient Audio Synthesis and Beyond

    Authors: Marco Comunità, Zhi Zhong, Akira Takahashi, Shiqi Yang, Mengjie Zhao, Koichi Saito, Yukara Ikemiya, Takashi Shibuya, Shusuke Takahashi, Yuki Mitsufuji

    Abstract: Recent advances in generative models that iteratively synthesize audio clips have brought great success to text-to-audio synthesis (TTA), but at the cost of slow synthesis speed and heavy computation. Although there have been attempts to accelerate the iterative procedure, high-quality TTA systems remain inefficient due to the hundreds of iterations required in the inference phase and large amount of mod…

    Submitted 26 June, 2024; v1 submitted 25 June, 2024; originally announced June 2024.

    Comments: 6 pages, 8 figures, 8 tables. Audio samples: https://zzaudio.github.io/SpecMaskGIT/index.html

  3. arXiv:2406.16004  [pdf, other]

    cs.CV

    RepNeXt: A Fast Multi-Scale CNN using Structural Reparameterization

    Authors: Mingshu Zhao, Yi Luo, Yong Ouyang

    Abstract: In the realm of resource-constrained mobile vision tasks, the pursuit of efficiency and performance consistently drives innovation in lightweight Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs). While ViTs excel at capturing global context through self-attention mechanisms, their deployment in resource-limited environments is hindered by computational complexity and latency. Co…

    Submitted 23 June, 2024; originally announced June 2024.

    Comments: Tech report

  4. arXiv:2406.15782  [pdf, other]

    cs.SC cs.LO

    A Local Search Algorithm for MaxSMT(LIA)

    Authors: Xiang He, Bohan Li, Mengyu Zhao, Shaowei Cai

    Abstract: MaxSAT modulo theories (MaxSMT) is an important generalization of Satisfiability modulo theories (SMT) with various applications. In this paper, we focus on MaxSMT with the background theory of Linear Integer Arithmetic, denoted as MaxSMT(LIA). We design the first local search algorithm for MaxSMT(LIA) called PairLS, based on the following novel ideas. A novel operator called pairwise operator is…

    Submitted 22 June, 2024; originally announced June 2024.

  5. arXiv:2406.15735  [pdf, other]

    cs.CV cs.AI

    Identifying and Solving Conditional Image Leakage in Image-to-Video Diffusion Model

    Authors: Min Zhao, Hongzhou Zhu, Chendong Xiang, Kaiwen Zheng, Chongxuan Li, Jun Zhu

    Abstract: Diffusion models have made substantial progress in image-to-video (I2V) generation. However, such models are not fully understood. In this paper, we report a significant but previously overlooked issue in I2V diffusion models (I2V-DMs), namely, conditional image leakage. I2V-DMs tend to over-rely on the conditional image at large time steps, neglecting the crucial task of predicting the clean…

    Submitted 22 June, 2024; originally announced June 2024.

    Comments: Project page: https://cond-image-leak.github.io/

  6. arXiv:2406.11567  [pdf, other]

    cs.CV cs.AI

    Quaternion Generative Adversarial Neural Networks and Applications to Color Image Inpainting

    Authors: Duan Wang, Dandan Zhu, Meixiang Zhao, Zhigang Jia

    Abstract: Color image inpainting is a challenging task in imaging science. Existing methods are based on real-valued operations, processing the red, green and blue channels of the color image separately and ignoring the correlation between the channels. In order to make full use of the correlation between the channels, this paper proposes a Quaternion Generative Adversarial Neural Network (QGAN) model and rel…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: 19 pages, 6 figures

  7. arXiv:2406.11228  [pdf, other]

    cs.CL

    ComperDial: Commonsense Persona-grounded Dialogue Dataset and Benchmark

    Authors: Hiromi Wakaki, Yuki Mitsufuji, Yoshinori Maeda, Yukiko Nishimura, Silin Gao, Mengjie Zhao, Keiichi Yamada, Antoine Bosselut

    Abstract: We propose a new benchmark, ComperDial, which facilitates the training and evaluation of evaluation metrics for open-domain dialogue systems. ComperDial consists of human-scored responses for 10,395 dialogue turns in 1,485 conversations collected from 99 dialogue agents submitted to the Commonsense Persona-grounded Dialogue (CPD) challenge. As a result, for any dialogue, our benchmark includes mul…

    Submitted 17 June, 2024; originally announced June 2024.

  8. arXiv:2406.10957  [pdf, other]

    cs.CL

    Eliminating Biased Length Reliance of Direct Preference Optimization via Down-Sampled KL Divergence

    Authors: Junru Lu, Jiazheng Li, Siyu An, Meng Zhao, Yulan He, Di Yin, Xing Sun

    Abstract: Direct Preference Optimization (DPO) has emerged as a prominent algorithm for the direct and robust alignment of Large Language Models (LLMs) with human preferences, offering a more straightforward alternative to the complex Reinforcement Learning from Human Feedback (RLHF). Despite its promising efficacy, DPO faces a notable drawback: "verbosity", a common over-optimization phenomenon also observ…

    Submitted 16 June, 2024; originally announced June 2024.

  9. arXiv:2406.08305  [pdf, other]

    cs.NI eess.SP

    Large Language Model(LLM) assisted End-to-End Network Health Management based on Multi-Scale Semanticization

    Authors: Fengxiao Tang, Xiaonan Wang, Xun Yuan, Linfeng Luo, Ming Zhao, Nei Kato

    Abstract: Network device and system health management is the foundation of modern network operations and maintenance. Traditional health management methods, relying on expert identification or simple rule-based algorithms, struggle to cope with the environment of dynamic heterogeneous networks (DHNs). Moreover, current state-of-the-art distributed anomaly detection methods, which utilize specific machine learn…

    Submitted 12 June, 2024; originally announced June 2024.

  10. arXiv:2406.08152  [pdf, other]

    cs.CV

    CT3D++: Improving 3D Object Detection with Keypoint-induced Channel-wise Transformer

    Authors: Hualian Sheng, Sijia Cai, Na Zhao, Bing Deng, Qiao Liang, Min-Jian Zhao, Jieping Ye

    Abstract: The field of 3D object detection from point clouds is rapidly advancing in computer vision, aiming to accurately and efficiently detect and localize objects in three-dimensional space. Current 3D detectors commonly fall short in terms of flexibility and scalability, with ample room for advancements in performance. In this paper, our objective is to address these limitations by introducing two fram…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: 19 pages, 8 figures

  11. arXiv:2406.07767  [pdf, other]

    cs.RO cs.LG

    Conformalized Teleoperation: Confidently Mapping Human Inputs to High-Dimensional Robot Actions

    Authors: Michelle Zhao, Reid Simmons, Henny Admoni, Andrea Bajcsy

    Abstract: Assistive robotic arms often have more degrees-of-freedom than a human teleoperator can control with a low-dimensional input, like a joystick. To overcome this challenge, existing approaches use data-driven methods to learn a mapping from low-dimensional human inputs to high-dimensional robot actions. However, determining if such a black-box mapping can confidently infer a user's intended high-dim…

    Submitted 11 June, 2024; originally announced June 2024.

  12. arXiv:2406.03694  [pdf, other]

    cs.CV cs.IT

    Untrained Neural Nets for Snapshot Compressive Imaging: Theory and Algorithms

    Authors: Mengyu Zhao, Xi Chen, Xin Yuan, Shirin Jalali

    Abstract: Snapshot compressive imaging (SCI) recovers high-dimensional (3D) data cubes from a single 2D measurement, enabling diverse applications like video and hyperspectral imaging to go beyond standard techniques in terms of acquisition speed and efficiency. In this paper, we focus on SCI recovery algorithms that employ untrained neural networks (UNNs), such as deep image prior (DIP), to model source st…

    Submitted 5 June, 2024; originally announced June 2024.

  13. arXiv:2406.01026  [pdf, other]

    cs.CL

    Strengthened Symbol Binding Makes Large Language Models Reliable Multiple-Choice Selectors

    Authors: Mengge Xue, Zhenyu Hu, Liqun Liu, Kuo Liao, Shuang Li, Honglin Han, Meng Zhao, Chengguo Yin

    Abstract: Multiple-Choice Questions (MCQs) constitute a critical area of research in the study of Large Language Models (LLMs). Previous works have investigated the selection bias problem in MCQs within few-shot scenarios, in which the LLM's performance may be influenced by the presentation of answer choices, leaving the selection bias during Supervised Fine-Tuning (SFT) unexplored. In this paper, we reveal…

    Submitted 6 June, 2024; v1 submitted 3 June, 2024; originally announced June 2024.

    Comments: Accepted at ACL 2024 (Main Conference)

    Journal ref: ACL 2024

  14. arXiv:2406.00347  [pdf, other]

    cs.CV

    E$^3$-Net: Efficient E(3)-Equivariant Normal Estimation Network

    Authors: Hanxiao Wang, Mingyang Zhao, Weize Quan, Zhen Chen, Dong-ming Yan, Peter Wonka

    Abstract: Point cloud normal estimation is a fundamental task in 3D geometry processing. While recent learning-based methods achieve notable advancements in normal prediction, they often overlook the critical aspect of equivariance. This results in inefficient learning of symmetric patterns. To address this issue, we propose E3-Net to achieve equivariance for normal estimation. We introduce an efficient ran…

    Submitted 1 June, 2024; originally announced June 2024.

  15. arXiv:2405.19763  [pdf, other]

    cs.CL

    Enhancing Reinforcement Learning with Label-Sensitive Reward for Natural Language Understanding

    Authors: Kuo Liao, Shuang Li, Meng Zhao, Liqun Liu, Mengge Xue, Zhenyu Hu, Honglin Han, Chengguo Yin

    Abstract: Recent strides in large language models (LLMs) have yielded remarkable performance, leveraging reinforcement learning from human feedback (RLHF) to significantly enhance generation and alignment capabilities. However, RLHF encounters numerous challenges, including the objective mismatch issue, leading to suboptimal performance in Natural Language Understanding (NLU) tasks. To address this limitati…

    Submitted 30 May, 2024; originally announced May 2024.

    Comments: Accepted at ACL 2024 (Main Conference)

  16. arXiv:2405.19516  [pdf, other]

    eess.SP cs.CV cs.LG cs.RO

    Enabling Visual Recognition at Radio Frequency

    Authors: Haowen Lai, Gaoxiang Luo, Yifei Liu, Mingmin Zhao

    Abstract: This paper introduces PanoRadar, a novel RF imaging system that brings RF resolution close to that of LiDAR, while providing resilience against conditions challenging for optical signals. Our LiDAR-comparable 3D imaging results enable, for the first time, a variety of visual recognition tasks at radio frequency, including surface normal estimation, semantic segmentation, and object detection. Pano…

    Submitted 29 May, 2024; originally announced May 2024.

  17. arXiv:2405.16791  [pdf, ps, other]

    cs.IT eess.SP

    Joint Node Selection and Resource Allocation Optimization for Cooperative Sensing with a Shared Wireless Backhaul

    Authors: Mingxin Chen, Ming-Min Zhao, An Liu, Min Li, Qingjiang Shi

    Abstract: In this paper, we consider a cooperative sensing framework in the context of future multi-functional networks with both communication and sensing capabilities, where one base station (BS) serves as a sensing transmitter and several nearby BSs serve as sensing receivers. Each receiver receives the sensing signal reflected by the target and communicates with the fusion center (FC) through a wireless multi…

    Submitted 26 May, 2024; originally announced May 2024.

    Comments: 13 pages, 10 figures

  18. arXiv:2405.15812  [pdf, other]

    q-bio.NC cs.AI

    Pseudo Channel: Time Embedding for Motor Imagery Decoding

    Authors: Zhengqing Miao, Meirong Zhao

    Abstract: Motor imagery (MI) based EEG represents a frontier in enabling direct neural control of external devices and advancing neural rehabilitation. This study introduces a novel time embedding technique, termed traveling-wave based time embedding, utilized as a pseudo channel to enhance the decoding accuracy of MI-EEG signals across various neural network architectures. Unlike traditional neural network…

    Submitted 21 May, 2024; originally announced May 2024.

    Comments: 13 pages, 5 figures

  19. arXiv:2405.14598  [pdf, other]

    cs.CV cs.LG cs.MM cs.SD eess.AS

    Visual Echoes: A Simple Unified Transformer for Audio-Visual Generation

    Authors: Shiqi Yang, Zhi Zhong, Mengjie Zhao, Shusuke Takahashi, Masato Ishii, Takashi Shibuya, Yuki Mitsufuji

    Abstract: In recent years, with the realistic generation results and a wide range of personalized applications, diffusion-based generative models have gained huge attention in both visual and audio generation areas. Compared to the considerable advancements of text2image or text2audio generation, research in audio2visual or visual2audio generation has been relatively slow. The recent audio-visual generation method…

    Submitted 24 May, 2024; v1 submitted 23 May, 2024; originally announced May 2024.

    Comments: 10 pages

  20. arXiv:2405.14582  [pdf, other]

    cs.CV cs.AI

    PoseCrafter: One-Shot Personalized Video Synthesis Following Flexible Pose Control

    Authors: Yong Zhong, Min Zhao, Zebin You, Xiaofeng Yu, Changwang Zhang, Chongxuan Li

    Abstract: In this paper, we introduce PoseCrafter, a one-shot method for personalized video generation following the control of flexible poses. Built upon Stable Diffusion and ControlNet, we carefully design an inference process to produce high-quality videos without the corresponding ground-truth frames. First, we select an appropriate reference frame from the training video and invert it to initialize all…

    Submitted 24 May, 2024; v1 submitted 23 May, 2024; originally announced May 2024.

  21. arXiv:2405.14009  [pdf, other]

    cs.DC cs.LG

    SlipStream: Adapting Pipelines for Distributed Training of Large DNNs Amid Failures

    Authors: Swapnil Gandhi, Mark Zhao, Athinagoras Skiadopoulos, Christos Kozyrakis

    Abstract: Training large Deep Neural Network (DNN) models requires thousands of GPUs for days or weeks at a time. At these scales, failures are frequent and can have a big impact on training throughput. Restoring performance using spare GPU servers becomes increasingly expensive as models grow. SlipStream is a system for efficient DNN training in the presence of failures, without using spare servers. It exp…

    Submitted 22 May, 2024; originally announced May 2024.

  22. arXiv:2405.13238  [pdf]

    cs.IR cs.LG

    Enhancing User Interest based on Stream Clustering and Memory Networks in Large-Scale Recommender Systems

    Authors: Peng Liu, Nian Wang, Cong Xu, Ming Zhao, Bin Wang, Yi Ren

    Abstract: Recommender Systems (RSs), which are widely used across various platforms, provide personalized recommendation services based on user interest. However, many users have sparse interests due to a lack of consumption behavior, which leads to poor recommendation results for them. This problem is widespread in large-scale RSs and is particularly difficult to address. To solve this problem, we pro…

    Submitted 26 May, 2024; v1 submitted 21 May, 2024; originally announced May 2024.

  23. arXiv:2405.12114  [pdf, other]

    cs.CV math.NA

    A New Cross-Space Total Variation Regularization Model for Color Image Restoration with Quaternion Blur Operator

    Authors: Zhigang Jia, Yuelian Xiang, Meixiang Zhao, Tingting Wu, Michael K. Ng

    Abstract: The cross-channel deblurring problem in color image processing is difficult to solve due to the complex coupling and structural blurring of color pixels. To date, few efficient algorithms can reduce color infection in the deblurring process. To solve this challenging problem, we present a novel cross-space total variation (CSTV) regularization model for color image deblurring by intro…

    Submitted 20 May, 2024; originally announced May 2024.

    Comments: 15 pages, 10 figures

  24. arXiv:2405.11732  [pdf]

    cs.CV physics.med-ph

    Quality assurance of organs-at-risk delineation in radiotherapy

    Authors: Yihao Zhao, Cuiyun Yuan, Ying Liang, Yang Li, Chunxia Li, Man Zhao, Jun Hu, Wei Liu, Chenbin Liu

    Abstract: The delineation of tumor targets and organs-at-risk is critical in radiotherapy treatment planning. Automatic segmentation can be used to reduce physician workload and improve consistency. However, quality assurance of automatic segmentation is still an unmet need in clinical practice. The patient data used in our study was a standardized dataset from AAPM Thoracic Auto-Segmenta…

    Submitted 19 May, 2024; originally announced May 2024.

    Comments: 14 pages, 5 figures, 3 tables

    MSC Class: 68T07

    ACM Class: I.4.9

  25. Defect Category Prediction Based on Multi-Source Domain Adaptation

    Authors: Ying Xing, Mengci Zhao, Bin Yang, Yuwei Zhang, Wen** Li, Jiawei Gu, Jun Yuan

    Abstract: In recent years, defect prediction techniques based on deep learning have become a prominent research topic in the field of software engineering. These techniques can identify potential defects without executing the code. However, existing approaches mostly concentrate on determining the presence of defects at the method-level code, lacking the ability to precisely classify specific defect categor…

    Submitted 16 May, 2024; originally announced May 2024.

    Comments: 17 pages, in Chinese, 8 figures (Due to length constraints of the abstract field, please refer to the original PDF file for the full abstract.)

    Journal ref: Journal of Software [2024]

  26. arXiv:2405.09871  [pdf, other]

    cs.RO

    Servo Integrated Nonlinear Model Predictive Control for Overactuated Tiltable-Quadrotors

    Authors: Jinjie Li, Junichiro Sugihara, Moju Zhao

    Abstract: Quadrotors are widely employed across various domains, yet the conventional type faces limitations due to underactuation, where attitude control is closely tied to positional adjustments. In contrast, quadrotors equipped with tiltable rotors offer overactuation, empowering them to track both position and attitude trajectories. However, the nonlinear dynamics of the drone body and the sluggish resp…

    Submitted 16 May, 2024; originally announced May 2024.

    Comments: This article has been submitted to RA-L

  27. arXiv:2405.04233  [pdf, other]

    cs.CV cs.LG

    Vidu: a Highly Consistent, Dynamic and Skilled Text-to-Video Generator with Diffusion Models

    Authors: Fan Bao, Chendong Xiang, Gang Yue, Guande He, Hongzhou Zhu, Kaiwen Zheng, Min Zhao, Shilong Liu, Yaole Wang, Jun Zhu

    Abstract: We introduce Vidu, a high-performance text-to-video generator that is capable of producing 1080p videos up to 16 seconds in a single generation. Vidu is a diffusion model with U-ViT as its backbone, which unlocks the scalability and the capability for handling long videos. Vidu exhibits strong coherence and dynamism, and is capable of generating both realistic and imaginative videos, as well as un…

    Submitted 7 May, 2024; originally announced May 2024.

    Comments: Project page at https://www.shengshu-ai.com/vidu

  28. arXiv:2405.02699  [pdf, other]

    cs.GT

    Platform Competition in the Autobidding World

    Authors: Gagan Aggarwal, Andres Perlroth, Ariel Schvartzman, Mingfei Zhao

    Abstract: We study the problem of auction design for advertising platforms that face strategic advertisers who are bidding across platforms. Each advertiser's goal is to maximize their total value or conversions while satisfying some constraint(s) across all the platforms they participate in. In this paper, we focus on advertisers with return-over-investment (henceforth, ROI) constraints, i.e. each adverti…

    Submitted 4 May, 2024; originally announced May 2024.

  29. arXiv:2405.01242  [pdf, other]

    cs.SD cs.AI cs.LG eess.AS

    TRAMBA: A Hybrid Transformer and Mamba Architecture for Practical Audio and Bone Conduction Speech Super Resolution and Enhancement on Mobile and Wearable Platforms

    Authors: Yueyuan Sui, Minghui Zhao, Junxi Xia, Xiaofan Jiang, Stephen Xia

    Abstract: We propose TRAMBA, a hybrid transformer and Mamba architecture for acoustic and bone conduction speech enhancement, suitable for mobile and wearable platforms. Bone conduction speech enhancement has been impractical to adopt in mobile and wearable platforms for several reasons: (i) data collection is labor-intensive, resulting in scarcity; (ii) there exists a performance gap between state-of-the-art m…

    Submitted 29 May, 2024; v1 submitted 2 May, 2024; originally announced May 2024.

  30. arXiv:2405.00452  [pdf, other]

    cs.CV

    Predictive Accuracy-Based Active Learning for Medical Image Segmentation

    Authors: Jun Shi, Shulan Ruan, Ziqi Zhu, Minfan Zhao, Hong An, Xudong Xue, Bing Yan

    Abstract: Active learning is considered a viable solution to alleviate the contradiction between the high dependency of deep learning-based segmentation methods on annotated data and the expensive pixel-level annotation cost of medical images. However, most existing methods suffer from unreliable uncertainty assessment and struggle to balance diversity and informativeness, leading to poor performance in…

    Submitted 29 June, 2024; v1 submitted 1 May, 2024; originally announced May 2024.

    Comments: 9 pages, 4 figures

  31. arXiv:2404.18373  [pdf, other]

    cs.NI

    6G comprehensive intelligence: network operations and optimization based on Large Language Models

    Authors: Sifan Long, Fengxiao Tang, Yangfan Li, Tiao Tan, Zhengjie **, Ming Zhao, Nei Kato

    Abstract: The sixth generation mobile communication standard (6G) can promote the development of the Industrial Internet and the Internet of Things (IoT). To achieve comprehensive intelligent development of the network and provide customers with higher-quality personalized services, this paper proposes a network performance optimization and intelligent operation network architecture based on Large Language Model (L…

    Submitted 28 April, 2024; originally announced April 2024.

    Comments: 8 pages, 5 figures, 15 references

  32. arXiv:2404.17589  [pdf]

    cs.IR cs.LG

    An Off-Policy Reinforcement Learning Algorithm Customized for Multi-Task Fusion in Large-Scale Recommender Systems

    Authors: Peng Liu, Cong Xu, Ming Zhao, Jiawei Zhu, Bin Wang, Yi Ren

    Abstract: As the last critical stage of RSs, Multi-Task Fusion (MTF) is responsible for combining multiple scores output by Multi-Task Learning (MTL) into a final score to maximize user satisfaction, which determines the ultimate recommendation results. Recently, to optimize long-term user satisfaction within a recommendation session, Reinforcement Learning (RL) has been used for MTF in industry. However,…

    Submitted 6 May, 2024; v1 submitted 19 April, 2024; originally announced April 2024.

  33. arXiv:2404.16561  [pdf]

    cs.CV

    Research on geometric figure classification algorithm based on Deep Learning

    Authors: Ruiyang Wang, Haonan Wang, Junfeng Sun, Mingjia Zhao, Meng Liu

    Abstract: In recent years, with the rapid development of computer information technology, the development of artificial intelligence has been accelerating. Traditional geometry recognition technology is relatively backward and its recognition rate is low. When faced with massive information databases, traditional algorithm models inevitably suffer from low recognition accuracy and poor performa…

    Submitted 25 April, 2024; originally announced April 2024.

    Comments: 6 pages, 9 figures

    Report number: ISSN: 2664-9640

    Journal ref: Scientific Journal of Intelligent Systems Research, Volume 4, Issue 6, 2022

  34. arXiv:2404.09832  [pdf, other]

    cs.GT cs.LG

    No-Regret Algorithms in non-Truthful Auctions with Budget and ROI Constraints

    Authors: Gagan Aggarwal, Giannis Fikioris, Mingfei Zhao

    Abstract: Advertisers increasingly use automated bidding to optimize their ad campaigns on online advertising platforms. Autobidding optimizes an advertiser's objective subject to various constraints, e.g. average ROI and budget constraints. In this paper, we study the problem of designing online autobidding algorithms to optimize value subject to ROI and budget constraints when the platform is running any…

    Submitted 15 April, 2024; originally announced April 2024.

  35. arXiv:2404.09153  [pdf, other]

    cs.RO

    BEATLE - Self-Reconfigurable Aerial Robot: Design, Control and Experimental Validation

    Authors: Junichiro Sugihara, Moju Zhao, Takuzumi Nishio, Kei Okada, Masayuki Inaba

    Abstract: Modular self-reconfigurable robots (MSRRs) offer enhanced task flexibility by constructing various structures suitable for each task. However, conventional terrestrial MSRRs equipped with wheels face critical challenges, including limitations in the size of constructible structures and system robustness due to elevated wrench loads applied to each module. In this work, we introduce an Aerial MSRR…

    Submitted 15 April, 2024; v1 submitted 14 April, 2024; originally announced April 2024.

  36. arXiv:2404.04399  [pdf, other]

    stat.ML cs.AI cs.LG stat.AP stat.ME

    Longitudinal Targeted Minimum Loss-based Estimation with Temporal-Difference Heterogeneous Transformer

    Authors: Toru Shirakawa, Yi Li, Yulun Wu, Sky Qiu, Yuxuan Li, Mingduo Zhao, Hiroyasu Iso, Mark van der Laan

    Abstract: We propose Deep Longitudinal Targeted Minimum Loss-based Estimation (Deep LTMLE), a novel approach to estimate the counterfactual mean of outcome under dynamic treatment policies in longitudinal problem settings. Our approach utilizes a transformer architecture with heterogeneous type embedding trained using temporal-difference learning. After obtaining an initial estimate using the transformer, f…

    Submitted 5 April, 2024; originally announced April 2024.

  37. arXiv:2404.02304  [pdf, other]

    cs.LG cs.AI cs.ET

    Virtual Sensor for Real-Time Bearing Load Prediction Using Heterogeneous Temporal Graph Neural Networks

    Authors: Mengjie Zhao, Cees Taal, Stephan Baggerohr, Olga Fink

    Abstract: Accurate bearing load monitoring is essential for their Prognostics and Health Management (PHM), enabling damage assessment, wear prediction, and proactive maintenance. While bearing sensors are typically placed on the bearing housing, direct load monitoring requires sensors inside the bearing itself. Recently introduced sensor rollers enable direct bearing load monitoring but are constrained by t…

    Submitted 2 April, 2024; originally announced April 2024.

    Comments: 8 pages, 6 figures

  38. arXiv:2404.01817  [pdf, other]

    cs.NE

    Tensorized NeuroEvolution of Augmenting Topologies for GPU Acceleration

    Authors: Lishuang Wang, Mengfei Zhao, Enyu Liu, Kebin Sun, Ran Cheng

    Abstract: The NeuroEvolution of Augmenting Topologies (NEAT) algorithm has received considerable recognition in the field of neuroevolution. Its effectiveness is derived from initiating with simple networks and incrementally evolving both their topologies and weights. Although its capability across various challenges is evident, the algorithm's computational efficiency remains an impediment, limiting its sc…

    Submitted 11 April, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

    Comments: Genetic and Evolutionary Computation Conference (GECCO '24)

  39. arXiv:2404.00795  [pdf, other]

    cs.SE

    Towards Practical Requirement Analysis and Verification: A Case Study on Software IP Components in Aerospace Embedded Systems

    Authors: Zhi Ma, Cheng Wen, Jie Su, Ming Zhao, Bin Yu, Xu Lu, Cong Tian

    Abstract: IP-based software design is a crucial research field that aims to improve efficiency and reliability by reusing complex software components known as intellectual property (IP) components. To ensure the reusability of these components, particularly in security-sensitive software systems, it is necessary to analyze the requirements and perform formal verification for each IP component. However, conv…

    Submitted 31 March, 2024; originally announced April 2024.

  40. arXiv:2403.19336  [pdf, other]

    cs.CV cs.AI

    IVLMap: Instance-Aware Visual Language Grounding for Consumer Robot Navigation

    Authors: Jiacui Huang, Hongtao Zhang, Mingbo Zhao, Zhou Wu

    Abstract: Vision-and-Language Navigation (VLN) is a challenging task that requires a robot to navigate in photo-realistic environments with human natural language prompts. Recent studies aim to handle this task by constructing the semantic spatial map representation of the environment, and then leveraging the strong ability of reasoning in large language models for generalizing code for guiding the robot…

    Submitted 28 March, 2024; originally announced March 2024.

  41. arXiv:2403.15999  [pdf, ps, other]

    stat.ML cs.CR cs.LG

    Near-Optimal differentially private low-rank trace regression with guaranteed private initialization

    Authors: Mengyue Zha

    Abstract: We study differentially private (DP) estimation of a rank-$r$ matrix $M \in \mathbb{R}^{d_1\times d_2}$ under the trace regression model with Gaussian measurement matrices. Theoretically, the sensitivity of non-private spectral initialization is precisely characterized, and the differential-privacy-constrained minimax lower bound for estimating $M$ under the Schatten-$q$ norm is established. Metho…

    Submitted 23 March, 2024; originally announced March 2024.

  42. arXiv:2403.15737  [pdf, other]

    cs.CL

    Few-shot Dialogue Strategy Learning for Motivational Interviewing via Inductive Reasoning

    Authors: Zhouhang Xie, Bodhisattwa Prasad Majumder, Mengjie Zhao, Yoshinori Maeda, Keiichi Yamada, Hiromi Wakaki, Julian McAuley

    Abstract: We consider the task of building a dialogue system that can motivate users to adopt positive lifestyle changes: Motivational Interviewing. Addressing such a task requires a system that can infer how to motivate a user effectively. We propose DIIT, a framework that is capable of learning and applying conversation strategies in the form of natural language inductive rules from expert demons…

    Submitted 23 March, 2024; originally announced March 2024.

  43. arXiv:2403.12853  [pdf, other]

    cs.RO cs.AI cs.HC

    RASP: A Drone-based Reconfigurable Actuation and Sensing Platform Towards Ambient Intelligent Systems

    Authors: Minghui Zhao, Junxi Xia, Kaiyuan Hou, Yanchen Liu, Stephen Xia, Xiaofan Jiang

    Abstract: Realizing consumer-grade drones that are as useful as robot vacuums throughout our homes or personal smartphones in our daily lives requires drones to sense, actuate, and respond to general scenarios that may arise. Towards this vision, we propose RASP, a modular and reconfigurable sensing and actuation platform that allows drones to autonomously swap onboard sensors and actuators in only 25 secon…

    Submitted 19 March, 2024; originally announced March 2024.

  44. arXiv:2403.11106  [pdf, other]

    cs.LG cs.AI cs.CV

    Self-Supervised Quantization-Aware Knowledge Distillation

    Authors: Kaiqi Zhao, Ming Zhao

    Abstract: Quantization-aware training (QAT) and Knowledge Distillation (KD) are combined to achieve competitive performance in creating low-bit deep learning models. However, existing works applying KD to QAT require tedious hyper-parameter tuning to balance the weights of different loss terms, assume the availability of labeled training data, and require complex, computationally intensive training procedur…

    Submitted 17 March, 2024; originally announced March 2024.

  45. arXiv:2403.10873  [pdf, other]

    cs.IT eess.SP

    CSI Transfer From Sub-6G to mmWave: Reduced-Overhead Multi-User Hybrid Beamforming

    Authors: Weicao Deng, Min Li, Ming-Min Zhao, Min-Jian Zhao, Osvaldo Simeone

    Abstract: Hybrid beamforming is vital in modern wireless systems, especially for massive MIMO and millimeter-wave deployments, offering efficient directional transmission with reduced hardware complexity. However, effective beamforming in multi-user scenarios relies heavily on accurate channel state information, the acquisition of which often incurs excessive pilot overhead, degrading system performance. To…

    Submitted 16 March, 2024; originally announced March 2024.

    Comments: 13 pages, 12 figures, submitted

  46. arXiv:2403.06636  [pdf, other]

    cs.RO

    Design and Control of Delta: Deformable Multilinked Multirotor with Rolling Locomotion Ability in Terrestrial Domain

    Authors: Kazuki Sugihara, Moju Zhao, Takuzumi Nishio, Kei Okada, Masayuki Inaba

    Abstract: In recent years, multiple types of locomotion methods for robots have been developed and enabled to adapt to multiple domains. In particular, aerial robots are useful for exploration in several situations, taking advantage of their three-dimensional mobility. Moreover, some aerial robots have achieved manipulation tasks in the air. However, energy consumption for flight is large and thus locomotion…

    Submitted 11 March, 2024; originally announced March 2024.

    Comments: 8 pages, 15 figures

  47. arXiv:2403.06510  [pdf, other]

    cs.CV

    Skeleton Supervised Airway Segmentation

    Authors: Mingyue Zhao, Han Li, Li Fan, Shiyuan Liu, Xiaolan Qiu, S. Kevin Zhou

    Abstract: Fully-supervised airway segmentation has accomplished significant triumphs over the years in aiding pre-operative diagnosis and intra-operative navigation. However, full voxel-level annotation constitutes a labor-intensive and time-consuming task, often plagued by issues such as missing branches, branch annotation discontinuity, or erroneous edge delineation. Label-efficient solutions for airway e…

    Submitted 11 March, 2024; originally announced March 2024.

  48. arXiv:2402.18211  [pdf, other]

    cs.LG cs.CR

    Catastrophic Overfitting: A Potential Blessing in Disguise

    Authors: Mengnan Zhao, Lihe Zhang, Yuqiu Kong, Baocai Yin

    Abstract: Fast Adversarial Training (FAT) has gained increasing attention within the research community owing to its efficacy in improving adversarial robustness. Particularly noteworthy is the challenge posed by catastrophic overfitting (CO) in this field. Although existing FAT approaches have made strides in mitigating CO, the ascent of adversarial robustness occurs with a non-negligible decline in classi…

    Submitted 28 February, 2024; originally announced February 2024.

  49. arXiv:2402.17011  [pdf, other]

    cs.CL

    DiffuCOMET: Contextual Commonsense Knowledge Diffusion

    Authors: Silin Gao, Mete Ismayilzada, Mengjie Zhao, Hiromi Wakaki, Yuki Mitsufuji, Antoine Bosselut

    Abstract: Inferring contextually-relevant and diverse commonsense to understand narratives remains challenging for knowledge models. In this work, we develop a series of knowledge models, DiffuCOMET, that leverage diffusion to learn to reconstruct the implicit semantic connections between narrative contexts and relevant commonsense knowledge. Across multiple diffusion steps, our method progressively refines…

    Submitted 26 February, 2024; originally announced February 2024.

  50. arXiv:2402.06131  [pdf, other]

    cs.RO cs.CV

    PAS-SLAM: A Visual SLAM System for Planar Ambiguous Scenes

    Authors: Xinggang Hu, Yanmin Wu, Mingyuan Zhao, Linghao Yang, Xiangkui Zhang, Xiangyang Ji

    Abstract: Visual SLAM (Simultaneous Localization and Mapping) based on planar features has found widespread applications in fields such as environmental structure perception and augmented reality. However, current research faces challenges in accurately localizing and mapping in planar ambiguous scenes, primarily due to the poor accuracy of the employed planar features and data association methods. In this…

    Submitted 8 February, 2024; originally announced February 2024.