Showing 1–50 of 20,338 results for author: Wang, Y

  1. arXiv:2407.01531  [pdf, other]

    cs.RO cs.LG

    Sparse Diffusion Policy: A Sparse, Reusable, and Flexible Policy for Robot Learning

    Authors: Yixiao Wang, Yifei Zhang, Mingxiao Huo, Ran Tian, Xiang Zhang, Yichen Xie, Chenfeng Xu, Pengliang Ji, Wei Zhan, Mingyu Ding, Masayoshi Tomizuka

    Abstract: The increasing complexity of tasks in robotics demands efficient strategies for multitask and continual learning. Traditional models typically rely on a universal policy for all tasks, facing challenges such as high computational costs and catastrophic forgetting when learning new tasks. To address these issues, we introduce a sparse, reusable, and flexible policy, Sparse Diffusion Policy (SDP). B…

    Submitted 1 July, 2024; originally announced July 2024.

  2. arXiv:2407.01494  [pdf, other]

    cs.CV cs.SD eess.AS

    FoleyCrafter: Bring Silent Videos to Life with Lifelike and Synchronized Sounds

    Authors: Yiming Zhang, Yicheng Gu, Yanhong Zeng, Zhening Xing, Yuancheng Wang, Zhizheng Wu, Kai Chen

    Abstract: We study Neural Foley, the automatic generation of high-quality sound effects synchronizing with videos, enabling an immersive audio-visual experience. Despite its wide range of applications, existing approaches encounter limitations when it comes to simultaneously synthesizing high-quality and video-aligned (i.e., semantic relevant and temporal synchronized) sounds. To overcome these limitations…

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: Project page: https://foleycrafter.github.io/

  3. arXiv:2407.01418  [pdf, other]

    cs.RO cs.AI cs.LG

    RoboPack: Learning Tactile-Informed Dynamics Models for Dense Packing

    Authors: Bo Ai, Stephen Tian, Haochen Shi, Yixuan Wang, Cheston Tan, Yunzhu Li, Jiajun Wu

    Abstract: Tactile feedback is critical for understanding the dynamics of both rigid and deformable objects in many manipulation tasks, such as non-prehensile manipulation and dense packing. We introduce an approach that combines visual and tactile sensing for robotic manipulation by learning a neural, tactile-informed dynamics model. Our proposed framework, RoboPack, employs a recurrent graph neural network…

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: Robotics: Science and Systems (RSS), 2024. Project page: https://robo-pack.github.io/

    ACM Class: I.2.9; I.2.6; I.2.10

  4. arXiv:2407.01361  [pdf, other]

    cs.RO

    Unfolding the Literature: A Review of Robotic Cloth Manipulation

    Authors: Alberta Longhini, Yufei Wang, Irene Garcia-Camacho, David Blanco-Mulero, Marco Moletta, Michael Welle, Guillem Alenyà, Hang Yin, Zackory Erickson, David Held, Júlia Borràs, Danica Kragic

    Abstract: The realm of textiles spans clothing, households, healthcare, sports, and industrial applications. The deformable nature of these objects poses unique challenges that prior work on rigid objects cannot fully address. The increasing interest within the community in textile perception and manipulation has led to new methods that aim to address challenges in modeling, perception, and control, resulti…

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: 30 pages, 3 figures, 2 tables. Submitted to Annual Review of Control, Robotics, and Autonomous Systems

  5. arXiv:2407.01349  [pdf, other]

    cs.CV cs.RO

    PanopticRecon: Leverage Open-vocabulary Instance Segmentation for Zero-shot Panoptic Reconstruction

    Authors: Xuan Yu, Yili Liu, Chenrui Han, Sitong Mao, Shunbo Zhou, Rong Xiong, Yiyi Liao, Yue Wang

    Abstract: Panoptic reconstruction is a challenging task in 3D scene understanding. However, most existing methods heavily rely on pre-trained semantic segmentation models and known 3D object bounding boxes for 3D panoptic segmentation, which is not available for in-the-wild scenes. In this paper, we propose a novel zero-shot panoptic reconstruction method from RGB-D images of scenes. For zero-shot segmentat…

    Submitted 1 July, 2024; originally announced July 2024.

  6. arXiv:2407.01344  [pdf, other]

    math.OC

    Distributionally Robust Performative Optimization

    Authors: Zhuangzhuang Jia, Yijie Wang, Roy Dong, Grani A. Hanasusanto

    Abstract: In this paper, we propose a general distributionally robust framework for performative optimization, where the selected decision can influence the probabilistic distribution of uncertain parameters. Our framework facilitates safe decision-making in scenarios with incomplete information about the underlying decision-dependent distributions, relying instead on accessible reference distributions. To…

    Submitted 1 July, 2024; originally announced July 2024.

  7. arXiv:2407.01292  [pdf, other]

    cs.RO

    Preserving Relative Localization of FoV-Limited Drone Swarm via Active Mutual Observation

    Authors: Lianjie Guo, Zaitian Gongye, Ziyi Xu, Yingjian Wang, Xin Zhou, Jinni Zhou, Fei Gao

    Abstract: Relative state estimation is crucial for vision-based swarms to estimate and compensate for the unavoidable drift of visual odometry. For autonomous drones equipped with the most compact sensor setting -- a stereo camera that provides a limited field of view (FoV), the demand for mutual observation for relative state estimation conflicts with the demand for environment observation. To balance the…

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: Accepted by IROS 2024, 8 pages, 10 figures

  8. arXiv:2407.01245  [pdf, other]

    cs.AI cs.CY

    SINKT: A Structure-Aware Inductive Knowledge Tracing Model with Large Language Model

    Authors: Lingyue Fu, Hao Guan, Kounianhua Du, Jianghao Lin, Wei Xia, Weinan Zhang, Ruiming Tang, Yasheng Wang, Yong Yu

    Abstract: Knowledge Tracing (KT) aims to determine whether students will respond correctly to the next question, which is a crucial task in intelligent tutoring systems (ITS). In educational KT scenarios, transductive ID-based methods often face severe data sparsity and cold start problems, where interactions between individual students and questions are sparse, and new questions and concepts consistently a…

    Submitted 1 July, 2024; originally announced July 2024.

  9. arXiv:2407.01209  [pdf, other]

    cs.RO

    6-DoF Grasp Detection in Clutter with Enhanced Receptive Field and Graspable Balance Sampling

    Authors: Hanwen Wang, Ying Zhang, Yunlong Wang, Jian Li

    Abstract: 6-DoF grasp detection of small-scale grasps is crucial for robots to perform specific tasks. This paper focuses on enhancing the recognition capability of small-scale grasping, aiming to improve the overall accuracy of grasping prediction results and the generalization ability of the network. We propose an enhanced receptive field method that includes a multi-radii cylinder grouping module and a p…

    Submitted 1 July, 2024; originally announced July 2024.

  10. arXiv:2407.01081  [pdf, other]

    cs.CV cs.CL

    CVLUE: A New Benchmark Dataset for Chinese Vision-Language Understanding Evaluation

    Authors: Yuxuan Wang, Yijun Liu, Fei Yu, Chen Huang, Kexin Li, Zhiguo Wan, Wanxiang Che

    Abstract: Despite the rapid development of Chinese vision-language models (VLMs), most existing Chinese vision-language (VL) datasets are constructed on Western-centric images from existing English VL datasets. The cultural bias in the images makes these datasets unsuitable for evaluating VLMs in Chinese culture. To remedy this issue, we present a new Chinese Vision-Language Understanding Evaluation (CVLUE…

    Submitted 1 July, 2024; originally announced July 2024.

  11. arXiv:2407.01014  [pdf, other]

    cs.CV

    An Expectation-Maximization Algorithm for Training Clean Diffusion Models from Corrupted Observations

    Authors: Weimin Bai, Yifei Wang, Wenzheng Chen, He Sun

    Abstract: Diffusion models excel in solving imaging inverse problems due to their ability to model complex image priors. However, their reliance on large, clean datasets for training limits their practical use where clean data is scarce. In this paper, we propose EMDiffusion, an expectation-maximization (EM) approach to train diffusion models from corrupted observations. Our method alternates between recons…

    Submitted 1 July, 2024; originally announced July 2024.

  12. arXiv:2407.00978  [pdf, other]

    cs.AI cs.LG

    Hybrid RAG-empowered Multi-modal LLM for Secure Healthcare Data Management: A Diffusion-based Contract Theory Approach

    Authors: Cheng Su, Jinbo Wen, Jiawen Kang, Yonghua Wang, Hudan Pan, M. Shamim Hossain

    Abstract: Secure data management and effective data sharing have become paramount in the rapidly evolving healthcare landscape. The advancement of generative artificial intelligence has positioned Multi-modal Large Language Models (MLLMs) as crucial tools for managing healthcare data. MLLMs can support multi-modal inputs and generate diverse types of content by leveraging large-scale training on vast amount…

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: 12 pages, 6 figures

  13. arXiv:2407.00959  [pdf, other]

    cs.AI cs.RO

    Tokenize the World into Object-level Knowledge to Address Long-tail Events in Autonomous Driving

    Authors: Ran Tian, Boyi Li, Xinshuo Weng, Yuxiao Chen, Edward Schmerling, Yue Wang, Boris Ivanovic, Marco Pavone

    Abstract: The autonomous driving industry is increasingly adopting end-to-end learning from sensory inputs to minimize human biases in system design. Traditional end-to-end driving models, however, suffer from long-tail events due to rare or unseen inputs within their training distributions. To address this, we propose TOKEN, a novel Multi-Modal Large Language Model (MM-LLM) that tokenizes the world into ob…

    Submitted 1 July, 2024; originally announced July 2024.

  14. arXiv:2407.00949  [pdf, ps, other]

    cs.CV eess.IV

    SpectralKAN: Kolmogorov-Arnold Network for Hyperspectral Images Change Detection

    Authors: Yanheng Wang, Xiaohan Yu, Yongsheng Gao, Jianjun Sha, Jian Wang, Lianru Gao, Yonggang Zhang, Xianhui Rong

    Abstract: It has been verified that deep learning methods, including convolutional neural networks (CNNs), graph neural networks (GNNs), and transformers, can accurately extract features from hyperspectral images (HSIs). These algorithms perform exceptionally well on HSIs change detection (HSIs-CD). However, the downside of these impressive results is the enormous number of parameters, FLOPs, GPU memory, tr…

    Submitted 1 July, 2024; originally announced July 2024.

  15. arXiv:2407.00945  [pdf, other]

    cs.LG

    Efficient Expert Pruning for Sparse Mixture-of-Experts Language Models: Enhancing Performance and Reducing Inference Costs

    Authors: Enshu Liu, Junyi Zhu, Zinan Lin, Xuefei Ning, Matthew B. Blaschko, Shengen Yan, Guohao Dai, Huazhong Yang, Yu Wang

    Abstract: The rapid advancement of large language models (LLMs) has led to architectures with billions to trillions of parameters, posing significant deployment challenges due to their substantial demands on memory, processing power, and energy consumption. Sparse Mixture-of-Experts (SMoE) architectures have emerged as a solution, activating only a subset of parameters per token, thereby achieving faster in…

    Submitted 30 June, 2024; originally announced July 2024.

  16. arXiv:2407.00935  [pdf, other]

    cs.LG cs.CL

    Look Ahead or Look Around? A Theoretical Comparison Between Autoregressive and Masked Pretraining

    Authors: Qi Zhang, Tianqi Du, Haotian Huang, Yifei Wang, Yisen Wang

    Abstract: In recent years, the rise of generative self-supervised learning (SSL) paradigms has exhibited impressive performance across visual, language, and multi-modal domains. While the varied designs of generative SSL objectives lead to distinct properties in downstream tasks, a theoretical understanding of these differences remains largely unexplored. In this paper, we establish the first theoretical co…

    Submitted 30 June, 2024; originally announced July 2024.

  17. arXiv:2407.00925  [pdf, other]

    cs.MM

    SIDQL: An Efficient Keyframe Extraction and Motion Reconstruction Framework in Motion Capture

    Authors: Xuling Zhang, Ziru Zhang, Yuyang Wang, Lik-hang Lee, Pan Hui

    Abstract: Metaverse, which integrates the virtual and physical worlds, has emerged as an innovative paradigm for changing people's lifestyles. Motion capture has become a reliable approach to achieve seamless synchronization of the movements between avatars and human beings, which plays an important role in diverse Metaverse applications. However, due to the continuous growth of data, current communication…

    Submitted 30 June, 2024; originally announced July 2024.

  18. arXiv:2407.00754  [pdf, ps, other]

    q-bio.MN

    Gene Regulatory Network Inference with Covariance Dynamics

    Authors: Yue Wang, Peng Zheng, Yu-Chen Cheng, Zikun Wang, Aleksandr Aravkin

    Abstract: Determining gene regulatory network (GRN) structure is a central problem in biology, with a variety of inference methods available for different types of data. For a widely prevalent and challenging use case, namely single-cell gene expression data measured after intervention at multiple time points with unknown joint distributions, there is only one known specifically developed method, which does…

    Submitted 17 June, 2024; originally announced July 2024.

  19. arXiv:2407.00753  [pdf, other]

    eess.AS cs.SD

    FLY-TTS: Fast, Lightweight and High-Quality End-to-End Text-to-Speech Synthesis

    Authors: Yinlin Guo, Yening Lv, Jinqiao Dou, Yan Zhang, Yuehai Wang

    Abstract: While recent advances in Text-To-Speech synthesis have yielded remarkable improvements in generating high-quality speech, research on lightweight and fast models is limited. This paper introduces FLY-TTS, a new fast, lightweight and high-quality speech synthesis system based on VITS. Specifically, 1) We replace the decoder with ConvNeXt blocks that generate Fourier spectral coefficients followed b…

    Submitted 30 June, 2024; originally announced July 2024.

    Comments: Accepted to Interspeech 2024. 5 pages, 1 figure

  20. arXiv:2407.00752  [pdf, other]

    cs.CV cs.AI

    Chest-Diffusion: A Light-Weight Text-to-Image Model for Report-to-CXR Generation

    Authors: Peng Huang, Xue Gao, Lihong Huang, Jing Jiao, Xiaokang Li, Yuanyuan Wang, Yi Guo

    Abstract: Text-to-image generation has important implications for generation of diverse and controllable images. Several attempts have been made to adapt Stable Diffusion (SD) to the medical domain. However, the large distribution difference between medical reports and natural texts, as well as high computational complexity in common stable diffusion limit the authenticity and feasibility of the generated m…

    Submitted 30 June, 2024; originally announced July 2024.

  21. arXiv:2407.00729  [pdf]

    cond-mat.mtrl-sci

    Discovering one molecule out of a million: inverse design of molecular hole transporting semiconductors tailored for perovskite solar cells

    Authors: Jianchang Wu, Luca Torresi, ManMan Hu, Patrick Reiser, Jiyun Zhang, Juan S. Rocha-Ortiz, Luyao Wang, Zhiqiang Xie, Kaicheng Zhang, Byung-wook Park, Anastasia Barabash, Yicheng Zhao, Junsheng Luo, Yunuo Wang, Larry Lüer, Lin-Long Deng, Jens A. Hauch, Sang Il Seok, Pascal Friederich, Christoph J. Brabec

    Abstract: The inverse design of tailored organic molecules for specific optoelectronic devices of high complexity holds an enormous potential, but has not yet been realized [1,2]. The complexity and literally infinite diversity of conjugated molecular structures present both, an unprecedented opportunity for technological breakthroughs as well as an unseen optimization challenge. Current models rely on big dat…

    Submitted 30 June, 2024; originally announced July 2024.

    Comments: 21 pages, 5 figures

  22. arXiv:2407.00722  [pdf, ps, other]

    math.AP math.PR

    Global pathwise solutions of an abstract stochastic equation

    Authors: Y. -X. Lin, Y. -G. Wang

    Abstract: We establish the existence and uniqueness of the maximal pathwise solution for an abstract nonlinear stochastic evolutional equation, which takes the two and three dimensional stochastic Navier-Stokes equations as a typical model, forced by a multiplicative white noise, and show that the pathwise solution exists globally in time in a positive probability when the initial data is sufficiently small…

    Submitted 30 June, 2024; originally announced July 2024.

  23. arXiv:2407.00687  [pdf, ps, other]

    cs.LO math.LO

    Field Knowledge as a Dual to Distributed Knowledge: A Characterization by Weighted Modal Logic

    Authors: Xiaolong Liang, Yì N. Wáng

    Abstract: The study of group knowledge concepts such as mutual, common, and distributed knowledge is well established within the discipline of epistemic logic. In this work, we incorporate epistemic abilities of agents to refine the formal definition of distributed knowledge and introduce a formal characterization of field knowledge. We propose that field knowledge serves as a dual to distributed knowledge.…

    Submitted 30 June, 2024; originally announced July 2024.

    Journal ref: Liao et al. (eds.) Fourth International Workshop on Logics for New-Generation Artificial Intelligence (LNGAI 2024), pp. 9--31, College Publications, 24 June 2024

  24. arXiv:2407.00678  [pdf, other]

    eess.IV cs.CV

    A Review of Image Processing Methods in Prostate Ultrasound

    Authors: Haiqiao Wang, Hong Wu, Zhuoyuan Wang, Peiyan Yue, Dong Ni, Pheng-Ann Heng, Yi Wang

    Abstract: Prostate cancer (PCa) poses a significant threat to men's health, with early diagnosis being crucial for improving prognosis and reducing mortality rates. Transrectal ultrasound (TRUS) plays a vital role in the diagnosis and image-guided intervention of PCa. To facilitate physicians with more accurate and efficient computer-assisted diagnosis and interventions, many image processing algorithms in T…

    Submitted 30 June, 2024; originally announced July 2024.

  25. arXiv:2407.00676  [pdf, other]

    cs.CV

    Instruct-IPT: All-in-One Image Processing Transformer via Weight Modulation

    Authors: Yuchuan Tian, Jianhong Han, Hanting Chen, Yuanyuan Xi, Guoyang Zhang, Jie Hu, Chao Xu, Yunhe Wang

    Abstract: Due to the unaffordable size and intensive computation costs of low-level vision models, All-in-One models that are designed to address a handful of low-level vision tasks simultaneously have been popular. However, existing All-in-One models are limited in terms of the range of tasks and performance. To overcome these limitations, we propose Instruct-IPT -- an All-in-One Image Processing Transform…

    Submitted 30 June, 2024; originally announced July 2024.

    Comments: 15 pages, 4 figures

  26. arXiv:2407.00656  [pdf, other]

    math.NA

    Multiple-GPU accelerated high-order gas-kinetic scheme on three-dimensional unstructured meshes

    Authors: Yuhang Wang, Waixiang Cao, Liang Pan

    Abstract: Recently, successes have been achieved for the high-order gas-kinetic schemes (HGKS) on unstructured meshes for compressible flows. In this paper, to accelerate the computation, HGKS is implemented with the graphical processing unit (GPU) using the compute unified device architecture (CUDA). HGKS on unstructured meshes is a fully explicit scheme, and the acceleration framework can be developed bas…

    Submitted 30 June, 2024; originally announced July 2024.

  27. arXiv:2407.00647  [pdf, other]

    cond-mat.mes-hall quant-ph

    Critical fluctuation and noise spectra in two-dimensional Fe$_{3}$GeTe$_{2}$ magnets

    Authors: Yuxin Li, Zhe Ding, Chen Wang, Haoyu Sun, Zhousheng Chen, Pengfei Wang, Ya Wang, Ming Gong, Hualing Zeng, Fazhan Shi, Jiangfeng Du

    Abstract: Critical fluctuations play a fundamental role in determining the spin orders for low-dimensional quantum materials, especially for recently discovered two-dimensional (2D) magnets. Here we employ the quantum decoherence imaging technique utilizing nitrogen-vacancy centers in diamond to explore the critical magnetic fluctuations and the associated temporal spin noise in van der Waals magnet…

    Submitted 30 June, 2024; originally announced July 2024.

  28. arXiv:2407.00631  [pdf, other]

    cs.LG cs.AI

    TrialBench: Multi-Modal Artificial Intelligence-Ready Clinical Trial Datasets

    Authors: Jintai Chen, Yaojun Hu, Yue Wang, Yingzhou Lu, Xu Cao, Miao Lin, Hongxia Xu, Jian Wu, Cao Xiao, Jimeng Sun, Lucas Glass, Kexin Huang, Marinka Zitnik, Tianfan Fu

    Abstract: Clinical trials are pivotal for developing new medical treatments, yet they typically pose some risks such as patient mortality, adverse events, and enrollment failure that waste immense efforts spanning over a decade. Applying artificial intelligence (AI) to forecast or simulate key events in clinical trials holds great potential for providing insights to guide trial designs. However, complex dat…

    Submitted 30 June, 2024; originally announced July 2024.

  29. arXiv:2407.00614  [pdf, other]

    cs.RO cs.CV eess.IV

    Learning Granularity-Aware Affordances from Human-Object Interaction for Tool-Based Functional Grasping in Dexterous Robotics

    Authors: Fan Yang, Wenrui Chen, Kailun Yang, Haoran Lin, DongSheng Luo, Conghui Tang, Zhiyong Li, Yaonan Wang

    Abstract: To enable robots to use tools, the initial step is teaching robots to employ dexterous gestures for touching specific areas precisely where tasks are performed. Affordance features of objects serve as a bridge in the functional interaction between agents and objects. However, leveraging these affordance cues to help robots achieve functional tool grasping remains unresolved. To address this, we pr…

    Submitted 30 June, 2024; originally announced July 2024.

    Comments: The source code and the established dataset will be made publicly available at https://github.com/yangfan293/GAAF-DEX

  30. arXiv:2407.00603  [pdf, other]

    cs.CV

    Hierarchical Memory for Long Video QA

    Authors: Yiqin Wang, Haoji Zhang, Yansong Tang, Yong Liu, Jiashi Feng, Jifeng Dai, Xiaojie Jin

    Abstract: This paper describes our champion solution to the LOVEU Challenge @ CVPR'24, Track 1 (Long Video VQA). Processing long sequences of visual tokens is computationally expensive and memory-intensive, making long video question-answering a challenging task. The key is to compress visual tokens effectively, reducing memory footprint and decoding latency, while preserving the essential information for a…

    Submitted 30 June, 2024; originally announced July 2024.

  31. arXiv:2407.00596  [pdf, other]

    eess.IV cs.CV

    HATs: Hierarchical Adaptive Taxonomy Segmentation for Panoramic Pathology Image Analysis

    Authors: Ruining Deng, Quan Liu, Can Cui, Tianyuan Yao, Juming Xiong, Shunxing Bao, Hao Li, Mengmeng Yin, Yu Wang, Shilin Zhao, Yucheng Tang, Haichun Yang, Yuankai Huo

    Abstract: Panoramic image segmentation in computational pathology presents a remarkable challenge due to the morphologically complex and variably scaled anatomy. For instance, the intricate organization in kidney pathology spans multiple layers, from regions like the cortex and medulla to functional units such as glomeruli, tubules, and vessels, down to various cell types. In this paper, we propose a novel…

    Submitted 30 June, 2024; originally announced July 2024.

  32. arXiv:2407.00478  [pdf, other]

    cs.LG cs.AI

    Knowledge-Aware Parsimony Learning: A Perspective from Relational Graphs

    Authors: Quanming Yao, Yongqi Zhang, Yaqing Wang, Nan Yin, James Kwok, Qiang Yang

    Abstract: The scaling law, a strategy that involves the brute-force scaling of the training dataset and learnable parameters, has become a prevalent approach for developing stronger learning models. In this paper, we examine its rationale in terms of learning from relational graphs. We demonstrate that directly adhering to such a scaling law does not necessarily yield stronger models due to architectural in…

    Submitted 29 June, 2024; originally announced July 2024.

  33. arXiv:2407.00463  [pdf, other]

    cs.LG cs.AI cs.CL cs.HC eess.AS

    Open-Source Conversational AI with SpeechBrain 1.0

    Authors: Mirco Ravanelli, Titouan Parcollet, Adel Moumen, Sylvain de Langen, Cem Subakan, Peter Plantinga, Yingzhi Wang, Pooneh Mousavi, Luca Della Libera, Artem Ploujnikov, Francesco Paissan, Davide Borra, Salah Zaiem, Zeyu Zhao, Shucong Zhang, Georgios Karakasidis, Sung-Lin Yeh, Aku Rouhe, Rudolf Braun, Florian Mai, Juan Zuluaga-Gomez, Seyed Mahed Mousavi, Andreas Nautsch, Xuechen Liu, Sangeet Sagar, et al. (5 additional authors not shown)

    Abstract: SpeechBrain is an open-source Conversational AI toolkit based on PyTorch, focused particularly on speech processing tasks such as speech recognition, speech enhancement, speaker recognition, text-to-speech, and much more. It promotes transparency and replicability by releasing both the pre-trained models and the complete "recipes" of code and algorithms required for training them. This paper presen…

    Submitted 29 June, 2024; originally announced July 2024.

    Comments: Submitted to JMLR (Machine Learning Open Source Software)

  34. arXiv:2407.00458  [pdf, other]

    cond-mat.soft

    Condensation and Synchronization in Aligning Chiral Active Matter

    Authors: Yujia Wang, Bruno Ventéjou, Hugues Chaté, Xia-qing Shi

    Abstract: We show that spontaneous density segregation in dense systems of aligning circle swimmers is a condensation phenomenon at odds with the phase separation scenarios usually observed in two-dimensional active matter. The condensates, which take the form of vortices or rotating polar packets, can absorb a finite fraction of the particles in the system, and keep a finite or slowly growing size as their…

    Submitted 29 June, 2024; originally announced July 2024.

    Comments: 6 pages, 4 figures

  35. arXiv:2407.00456  [pdf, other]

    cs.SE cs.AI

    Beyond Functional Correctness: Investigating Coding Style Inconsistencies in Large Language Models

    Authors: Yanlin Wang, Tianyue Jiang, Mingwei Liu, Jiachi Chen, Zibin Zheng

    Abstract: Large language models (LLMs) have brought a paradigm shift to the field of code generation, offering the potential to enhance the software development process. However, previous research mainly focuses on the accuracy of code generation, while coding style differences between LLMs and human developers remain under-explored. In this paper, we empirically analyze the differences in coding style betw…

    Submitted 29 June, 2024; originally announced July 2024.

    Comments: 13 pages, 14 figures

  36. arXiv:2407.00397  [pdf, other]

    cs.LG stat.ML

    Markovian Gaussian Process: A Universal State-Space Representation for Stationary Temporal Gaussian Process

    Authors: Weihan Li, Yule Wang, Chengrui Li, Anqi Wu

    Abstract: Gaussian Processes (GPs) and Linear Dynamical Systems (LDSs) are essential time series and dynamic system modeling tools. GPs can handle complex, nonlinear dynamics but are computationally demanding, while LDSs offer efficient computation but lack the expressive power of GPs. To combine their benefits, we introduce a universal method that allows an LDS to mirror stationary temporal GPs. This state…

    Submitted 29 June, 2024; originally announced July 2024.

  37. arXiv:2407.00389  [pdf, other]

    cs.CV

    Query-Efficient Hard-Label Black-Box Attack against Vision Transformers

    Authors: Chao Zhou, Xiaowen Shi, Yuan-Gen Wang

    Abstract: Recent studies have revealed that vision transformers (ViTs) face similar security risks from adversarial attacks as deep convolutional neural networks (CNNs). However, directly applying attack methodology on CNNs to ViTs has been demonstrated to be ineffective since the ViTs typically work on patch-wise encoding. This article explores the vulnerability of ViTs against adversarial attacks under a…

    Submitted 29 June, 2024; originally announced July 2024.

  38. arXiv:2407.00347  [pdf, ps, other]

    cs.CR cs.SI

    Resource Allocation and Secure Wireless Communication in the Large Model-based Mobile Edge Computing System

    Authors: Zefan Wang, Yitong Wang, Jun Zhao

    Abstract: With the rapid advancement of large models and mobile edge computing, transfer learning, particularly through fine-tuning, has become crucial for adapting models to downstream tasks. Traditionally, this requires users to share their data with model owners for fine-tuning, which is not only costly but also raises significant privacy concerns. Furthermore, fine-tuning large-scale models is computati…

    Submitted 29 June, 2024; originally announced July 2024.

  39. arXiv:2407.00324  [pdf, other]

    cs.RO cs.LG

    Revisiting Constant Negative Rewards for Goal-Reaching Tasks in Robot Learning

    Authors: Gautham Vasan, Yan Wang, Fahim Shahriar, James Bergstra, Martin Jagersand, A. Rupam Mahmood

    Abstract: Many real-world robot learning problems, such as pick-and-place or arriving at a destination, can be seen as a problem of reaching a goal state as soon as possible. These problems, when formulated as episodic reinforcement learning tasks, can easily be specified to align well with our intended goal: -1 reward every time step with termination upon reaching the goal state, called minimum-time tasks.…

    Submitted 29 June, 2024; originally announced July 2024.

    Comments: In Proceedings of Reinforcement Learning Conference 2024. For video demo, see https://drive.google.com/file/d/1O8D3oCWq5xf2hi1JOlMBbs6W1ClrvUFb/view?usp=sharing

  40. arXiv:2407.00315  [pdf, other]

    cs.CV

    Learning Unsupervised Gaze Representation via Eye Mask Driven Information Bottleneck

    Authors: Yangzhou Jiang, Yinxin Lin, Yaoming Wang, Teng Li, Bilian Ke, Bingbing Ni

    Abstract: Appearance-based supervised methods with full-face image input have made tremendous advances in recent gaze estimation tasks. However, intensive human annotation requirement inhibits current methods from achieving industrial level accuracy and robustness. Although current unsupervised pre-training frameworks have achieved success in many image recognition tasks, due to the deep coupling between fa…

    Submitted 29 June, 2024; originally announced July 2024.

    Comments: 12 pages, 6 figures, 7 tables

  41. arXiv:2407.00285  [pdf, other]

    physics.atom-ph hep-ex nucl-ex

    Imaging of single barium atoms in a second matrix site in solid xenon for barium tagging in a $^{136}$Xe double beta decay experiment

    Authors: M. Yvaine, D. Fairbank, J. Soderstrom, C. Taylor, J. Stanley, T. Walton, C. Chambers, A. Iverson, W. Fairbank, S. Al Kharusi, A. Amy, E. Angelico, A. Anker, I. J. Arnquist, A. Atencio, J. Bane, V. Belov, E. P. Bernard, T. Bhatta, A. Bolotnikov, J. Breslin, P. A. Breur, J. P. Brodsky, E. Brown, T. Brunner, et al. (112 additional authors not shown)

    Abstract: Neutrinoless double beta decay is one of the most sensitive probes for new physics beyond the Standard Model of particle physics. One of the isotopes under investigation is $^{136}$Xe, which would double beta decay into $^{136}$Ba. Detecting the single $^{136}$Ba daughter provides a sort of ultimate tool in the discrimination against backgrounds. Previous work demonstrated the ability to perform s…

    Submitted 28 June, 2024; originally announced July 2024.

    Comments: 9 pages, 8 figures

  42. arXiv:2407.00201  [pdf, other]

    q-bio.NC cs.LG eess.IV

    Deconvolving Complex Neuronal Networks into Interpretable Task-Specific Connectomes

    Authors: Yifan Wang, Vikram Ravindra, Ananth Grama

    Abstract: Task-specific functional MRI (fMRI) images provide excellent modalities for studying the neuronal basis of cognitive processes. We use fMRI data to formulate and solve the problem of deconvolving task-specific aggregate neuronal networks into a set of basic building blocks called canonical networks, to use these networks for functional characterization, and to characterize the physiological basis…

    Submitted 28 June, 2024; originally announced July 2024.

    Comments: 9 pages, 5 figures

  43. arXiv:2407.00173  [pdf, other]

    math.OC

    Approximate Solutions for Multi-Trip Route Planning in Time-Sensitive Situations

    Authors: Bahar Cavdar, Joseph Geunes, Xiaofeng Nie, Yue Wang

    Abstract: We consider emergent situations that require transporting individuals from their locations to a facility using a single capacitated vehicle, where transportation duration has a negative impact on the individuals. A dispatcher determines routes to maximize total satisfaction. We call this problem the Ambulance Bus Routing Problem. We develop efficient approximate policies for the dispatcher to allo…

    Submitted 28 June, 2024; originally announced July 2024.

  44. arXiv:2407.00136  [pdf, other]

    hep-ex

    Observation of the Electromagnetic Dalitz Transition $h_c \rightarrow e^+e^-η_c$

    Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, S. Ahmed, M. Albrecht, R. Aliberti, A. Amoroso, M. R. An, Q. An, X. H. Bai, Y. Bai, O. Bakina, R. Baldini Ferroli, I. Balossino, Y. Ban, K. Begzsuren, N. Berger, M. Bertani, D. Bettoni, F. Bianchi, J. Bloms, A. Bortone, I. Boyko, R. A. Briere, et al. (495 additional authors not shown)

    Abstract: Using $(27.12\pm 0.14)\times10^8$ $ψ(3686)$ decays and data samples of $e^+e^-$ collisions with $\sqrt{s}$ from 4.130 to 4.780~GeV collected with the BESIII detector, we report the first observation of the electromagnetic Dalitz transition $h_c\to e^+e^-η_c$ with a statistical significance of $5.4σ$. We measure the ratio of the branching fractions…

    Submitted 28 June, 2024; originally announced July 2024.

  45. arXiv:2407.00118  [pdf, other]

    cs.LG cs.AI

    From Efficient Multimodal Models to World Models: A Survey

    Authors: Xinji Mai, Zeng Tao, Junxiong Lin, Haoran Wang, Yang Chang, Yanlan Kang, Yan Wang, Wenqiang Zhang

    Abstract: Multimodal Large Models (MLMs) are becoming a significant research focus, combining powerful large language models with multimodal learning to perform complex tasks across different data modalities. This review explores the latest developments and challenges in MLMs, emphasizing their potential in achieving artificial general intelligence and as a pathway to world models. We provide an overview of…

    Submitted 27 June, 2024; originally announced July 2024.

  46. arXiv:2407.00100  [pdf, other]

    cs.LG cs.AI cs.CL

    Enhancing In-Context Learning via Implicit Demonstration Augmentation

    Authors: Xiaoling Zhou, Wei Ye, Yidong Wang, Chaoya Jiang, Zhemg Lee, Rui Xie, Shikun Zhang

    Abstract: The emergence of in-context learning (ICL) enables large pre-trained language models (PLMs) to make predictions for unseen inputs without updating parameters. Despite its potential, ICL's effectiveness heavily relies on the quality, quantity, and permutation of demonstrations, commonly leading to suboptimal and unstable performance. In this paper, we tackle this challenge for the first time from t…

    Submitted 27 June, 2024; originally announced July 2024.

    Comments: Accepted by ACL 2024 Main. 19 pages, 10 figures

    ACM Class: I.2.7

  47. arXiv:2407.00056  [pdf, other]

    cs.IR cs.AI cs.SI

    MMBee: Live Streaming Gift-Sending Recommendations via Multi-Modal Fusion and Behaviour Expansion

    Authors: Jiaxin Deng, Shiyao Wang, Yuchen Wang, Jiansong Qi, Liqin Zhao, Guorui Zhou, Gaofeng Meng

    Abstract: Live streaming services are becoming increasingly popular due to real-time interactions and entertainment. Viewers can chat and send comments or virtual gifts to express their preferences for the streamers. Accurately modeling the gifting interaction not only enhances users' experience but also increases streamers' revenue. Previous studies on live streaming gifting prediction treat this task as a…

    Submitted 15 June, 2024; originally announced July 2024.

    Comments: Accepted at KDD 2024

  48. arXiv:2407.00024  [pdf, other]

    cs.CV cs.AI cs.MM

    LMVD: A Large-Scale Multimodal Vlog Dataset for Depression Detection in the Wild

    Authors: Lang He, Kai Chen, Junnan Zhao, Yimeng Wang, Ercheng Pei, Haifeng Chen, Jiewei Jiang, Shiqing Zhang, Jie Zhang, Zhongmin Wang, Tao He, Prayag Tiwari

    Abstract: Depression can significantly impact many aspects of an individual's life, including their personal and social functioning, academic and work performance, and overall quality of life. Many researchers within the field of affective computing are adopting deep learning technology to explore potential patterns related to the detection of depression. However, because of subjects' privacy protection con…

    Submitted 8 May, 2024; originally announced July 2024.

  49. arXiv:2406.20098  [pdf, other]

    cs.CV cs.AI cs.CL

    Web2Code: A Large-scale Webpage-to-Code Dataset and Evaluation Framework for Multimodal LLMs

    Authors: Sukmin Yun, Haokun Lin, Rusiru Thushara, Mohammad Qazim Bhat, Yongxin Wang, Zutao Jiang, Mingkai Deng, Jinhong Wang, Tianhua Tao, Junbo Li, Haonan Li, Preslav Nakov, Timothy Baldwin, Zhengzhong Liu, Eric P. Xing, Xiaodan Liang, Zhiqiang Shen

    Abstract: Multimodal large language models (MLLMs) have shown impressive success across modalities such as image, video, and audio in a variety of understanding and generation tasks. However, current MLLMs are surprisingly poor at understanding webpage screenshots and generating their corresponding HTML code. To address this problem, we propose Web2Code, a benchmark consisting of a new large-scale webpage-t…

    Submitted 28 June, 2024; originally announced June 2024.

    Comments: Website at https://mbzuai-llm.github.io/webpage2code/

  50. arXiv:2406.20066  [pdf, other]

    cs.CV

    ASSR-NeRF: Arbitrary-Scale Super-Resolution on Voxel Grid for High-Quality Radiance Fields Reconstruction

    Authors: Ding-Jiun Huang, Zi-Ting Chou, Yu-Chiang Frank Wang, Cheng Sun

    Abstract: NeRF-based methods reconstruct 3D scenes by building a radiance field with implicit or explicit representations. While NeRF-based methods can perform novel view synthesis (NVS) at arbitrary scale, the performance in high-resolution novel view synthesis (HRNVS) with low-resolution (LR) optimization often results in oversmoothing. On the other hand, single-image super-resolution (SR) aims to enhance…

    Submitted 28 June, 2024; originally announced June 2024.