Showing 201–250 of 13,608 results for author: Wang, Z

  1. arXiv:2406.08478  [pdf, other]

    cs.CV cs.CL

    What If We Recaption Billions of Web Images with LLaMA-3?

    Authors: Xianhang Li, Haoqin Tu, Mude Hui, Zeyu Wang, Bingchen Zhao, Junfei Xiao, Sucheng Ren, Jieru Mei, Qing Liu, Huangjie Zheng, Yuyin Zhou, Cihang Xie

    Abstract: Web-crawled image-text pairs are inherently noisy. Prior studies demonstrate that semantically aligning and enriching textual descriptions of these pairs can significantly enhance model training across various vision-language tasks, particularly text-to-image generation. However, large-scale investigations in this area remain predominantly closed-source. Our paper aims to bridge this community eff…

    Submitted 18 June, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

    Comments: First five authors contributed equally

  2. arXiv:2406.08343  [pdf, other]

    cs.AR cs.AI cs.ET cs.NE

    Continuous-Time Digital Twin with Analogue Memristive Neural Ordinary Differential Equation Solver

    Authors: Hegan Chen, Jichang Yang, Jia Chen, Songqi Wang, Shaocong Wang, Dingchen Wang, Xinyu Tian, Yifei Yu, Xi Chen, Yinan Lin, Yangu He, Xiaoshan Wu, Yi Li, Xinyuan Zhang, Ning Lin, Meng Xu, Yi Li, Xumeng Zhang, Zhongrui Wang, Han Wang, Dashan Shang, Qi Liu, Kwang-Ting Cheng, Ming Liu

    Abstract: Digital twins, the cornerstone of Industry 4.0, replicate real-world entities through computer models, revolutionising fields such as manufacturing management and industrial automation. Recent advances in machine learning provide data-driven methods for developing digital twins using discrete-time data and finite-depth models on digital computers. However, this approach fails to capture the underl…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: 14 pages, 4 figures

  3. arXiv:2406.08270  [pdf, other]

    cs.IR

    Boosting Multimedia Recommendation via Separate Generic and Unique Awareness

    Authors: Zhuangzhuang He, Zihan Wang, Yonghui Yang, Haoyue Bai, Le Wu

    Abstract: Multimedia recommendation, which incorporates various modalities (e.g., images, texts, etc.) into user or item representation to improve recommendation quality, has received widespread attention. Recent methods mainly focus on cross-modal alignment with self-supervised learning to obtain higher quality representation. Despite remarkable performance, we argue that there is still a limitation: compl…

    Submitted 12 June, 2024; originally announced June 2024.

  4. arXiv:2406.08225  [pdf, ps, other]

    hep-ex

    Observation of $η_{c}$(1S, 2S) and $χ_{cJ}$ decays to 2$(π^{+}π^{-})η$ via $ψ$(3686) radiative transitions

    Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere , et al. (636 additional authors not shown)

    Abstract: Based on $2.7 \times 10^9~ψ(3686)$ decays collected with the BESIII detector, the radiative decay $ψ(3686)\to\gamma2(π^{+}π^{-})η$ is investigated to measure properties of S- and P-wave charmonium states. The branching fraction of the decay $η_{c}(1S) \to 2(π^{+}π^{-})η$, which is found to have a strong dependence on the interference pattern between $η_c(1S)$ and non-$η_c(1S)$ processes, is measur…

    Submitted 12 June, 2024; originally announced June 2024.

  5. arXiv:2406.08214  [pdf, other]

    cs.IR

    Graph Bottlenecked Social Recommendation

    Authors: Yonghui Yang, Le Wu, Zihan Wang, Zhuangzhuang He, Richang Hong, Meng Wang

    Abstract: With the emergence of social networks, social recommendation has become an essential technique for personalized services. Recently, graph-based social recommendations have shown promising results by capturing the high-order social influence. Most empirical studies of graph-based social recommendations directly take the observed social networks into formulation, and produce user preferences based o…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: Accepted by KDD 2024

  6. arXiv:2406.08181  [pdf, ps, other]

    hep-ph

    Systematic analysis of the form factors of $B_c\rightarrowη_c$, $J/ψ$ and corresponding weak decays

    Authors: Guo-Liang Yu, Bin Wu, Jie Lu, Zhi-Gang Wang

    Abstract: The form factors of $B_c\rightarrowη_c$ and $B_c\rightarrow J/ψ$ are analyzed in the framework of three-point QCD sum rules. In these analyses, the contributions of the vacuum condensate terms $\langle g_{s}^{2}GG\rangle$ and $\langle g_{s}^{3}GGGf\rangle$ are considered. In addition, the decay widths and branching ratios of several decay channels are obtained by using the calculated form factors.…

    Submitted 14 June, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

  7. arXiv:2406.08157  [pdf]

    cond-mat.supr-con

    Superconducting diode effect under time reversal symmetry

    Authors: Fengshuo Liu, Yuki M. Itahashi, Shunta Aoki, Yu Dong, Ziqian Wang, Naoki Ogawa, Toshiya Ideue, Yoshihiro Iwasa

    Abstract: In noncentrosymmetric superconductors, superconducting and normal conduction can interchange depending on the direction of current flow. This effect, termed the superconducting diode effect (SDE), is a focal point of recent research. Broken inversion and time reversal symmetries are believed to be requirements for the SDE, but their intrinsic role has remained elusive. Here, we report strain-contro…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: 35 pages, 11 figures

  8. arXiv:2406.08128  [pdf, other]

    cs.LG

    Short-Long Convolutions Help Hardware-Efficient Linear Attention to Focus on Long Sequences

    Authors: Zicheng Liu, Siyuan Li, Li Wang, Zedong Wang, Yunfan Liu, Stan Z. Li

    Abstract: To mitigate the computational complexity in the self-attention mechanism on long sequences, linear attention utilizes computation tricks to achieve linear complexity, while state space models (SSMs) popularize a favorable practice of using a non-data-dependent memory pattern, i.e., emphasizing the near and neglecting the distant, to process sequences. Recent studies have shown the priorities by combin…

    Submitted 13 June, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

    Comments: ICML 2024 camera ready

  9. arXiv:2406.08114  [pdf]

    cond-mat.mes-hall cond-mat.str-el cond-mat.supr-con

    Massive 1D Dirac Line, Solitons and Reversible Manipulation on the Surface of a Prototype Obstructed Atomic Insulator, Silicon

    Authors: Zhongkai Liu, Peng Deng, Yuanfeng Xu, Haifeng Yang, Ding Pei, Cheng Chen, Shanmei He, Defa Liu, Sung-Kwan Mo, Timur Kim, Cephise Cacho, Hong Yao, Zhi-Da Song, Xi Chen, Zhong Wang, Binghai Yan, Lexian Yang, Bogdan A. Bernevig, Yulin Chen

    Abstract: Topologically trivial insulators can be classified into atomic insulators (AIs) and obstructed atomic insulators (OAIs) depending on whether the Wannier charge centers are localized or not at spatial positions occupied by atoms. An OAI can possess unusual properties such as surface states along certain crystalline surfaces, which advantageously appear in materials with much larger bulk energy gap…

    Submitted 12 June, 2024; originally announced June 2024.

  10. arXiv:2406.08112  [pdf, other]

    cs.SD cs.AI eess.AS

    Codecfake: An Initial Dataset for Detecting LLM-based Deepfake Audio

    Authors: Yi Lu, Yuankun Xie, Ruibo Fu, Zhengqi Wen, Jianhua Tao, Zhiyong Wang, Xin Qi, Xuefei Liu, Yongwei Li, Yukun Liu, Xiaopeng Wang, Shuchen Shi

    Abstract: With the proliferation of Large Language Model (LLM) based deepfake audio, there is an urgent need for effective detection methods. Previous deepfake audio generation methods typically involve a multi-step generation process, with the final step using a vocoder to predict the waveform from handcrafted features. However, LLM-based audio is directly generated from discrete neural codecs in an end-to…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: Accepted by INTERSPEECH 2024. arXiv admin note: substantial text overlap with arXiv:2405.04880

  11. arXiv:2406.07918  [pdf, other]

    eess.IV

    Micro-expression recognition based on depth map to point cloud

    Authors: Ren Zhang, Jianqin Yin, Chao Qi, Zehao Wang, Zhicheng Zhang, Yonghao Dang

    Abstract: Micro-expressions are nonverbal facial expressions that reveal the covert emotions of individuals, which has brought the micro-expression recognition task widespread attention. However, the micro-expression recognition task is challenging due to the subtle facial motion and brevity in duration. Many 2D image-based methods have been developed in recent years to recognize MEs effectively, but these ap…

    Submitted 12 June, 2024; originally announced June 2024.

  12. arXiv:2406.07870  [pdf, ps, other]

    math.OC

    Event-Triggered Optimal Tracking Control for Strict-Feedback Nonlinear Systems With Non-Affine Nonlinear Faults

    Authors: Ling Wang, Xin Wang, Ziming Wang

    Abstract: This article studies the control ideas of the optimal backstepping technique, proposing an event-triggered optimal tracking control scheme for a class of strict-feedback nonlinear systems with non-affine and nonlinear faults. A simplified identifier-critic-actor framework is employed in the reinforcement learning algorithm to achieve optimal control. The identifier estimates the unknown dynamic fu…

    Submitted 12 June, 2024; originally announced June 2024.

  13. arXiv:2406.07846  [pdf, other]

    eess.AS

    DualVC 3: Leveraging Language Model Generated Pseudo Context for End-to-end Low Latency Streaming Voice Conversion

    Authors: Ziqian Ning, Shuai Wang, Pengcheng Zhu, Zhichao Wang, Jixun Yao, Lei Xie, Mengxiao Bi

    Abstract: Streaming voice conversion has become increasingly popular for its potential in real-time applications. The recently proposed DualVC 2 has achieved robust and high-quality streaming voice conversion with a latency of about 180ms. Nonetheless, the recognition-synthesis framework hinders end-to-end optimization, and the instability of automatic speech recognition (ASR) model with short chunks makes…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: Accepted by Interspeech 2024

  14. arXiv:2406.07806  [pdf, other]

    astro-ph.HE astro-ph.SR

    Probing the Shock Breakout Signal of SN 2024ggi from the Transformation of Early Flash Spectroscopy

    Authors: Jujia Zhang, Luc Dessart, Xiaofeng Wang, Qian Zhai, Yi Yang, Liping Li, Han Lin, Giorgio Valerin, Yongzhi Cai, Zhen Guo, Lingzhi Wang, Zeyi Zhao, Zhenyu Wang, Shengyu Yan

    Abstract: We present early-time, hour-to-day cadence spectroscopy of the nearby type II supernova (SN II) 2024ggi, which was discovered at a phase when the SN shock just emerged from the red-supergiant (RSG) progenitor star. Over the first few days after the first light, SN 2024ggi exhibited prominent narrow emission lines formed through intense and persistent photoionization of the nearby circumstellar mat…

    Submitted 29 June, 2024; v1 submitted 11 June, 2024; originally announced June 2024.

    Comments: 10 pages and 5 figures in the main text (16 pages and 9 figures in total). Accepted for publication in ApJL

  15. arXiv:2406.07588  [pdf, other]

    cs.MM cs.CL

    AIM: Let Any Multi-modal Large Language Models Embrace Efficient In-Context Learning

    Authors: Jun Gao, Qian Qiao, Ziqiang Cao, Zili Wang, Wenjie Li

    Abstract: In-context learning (ICL) facilitates Large Language Models (LLMs) exhibiting emergent ability on downstream tasks without updating billions of parameters. However, in the area of multi-modal Large Language Models (MLLMs), two problems hinder the application of multi-modal ICL: (1) Most primary MLLMs are only trained on single-image datasets, making them unable to read multi-modal demonstrations.…

    Submitted 30 June, 2024; v1 submitted 11 June, 2024; originally announced June 2024.

  16. arXiv:2406.07436  [pdf, other]

    cs.PL

    McEval: Massively Multilingual Code Evaluation

    Authors: Linzheng Chai, Shukai Liu, Jian Yang, Yuwei Yin, Ke Jin, Jiaheng Liu, Tao Sun, Ge Zhang, Changyu Ren, Hongcheng Guo, Zekun Wang, Boyang Wang, Xianjie Wu, Bing Wang, Tongliang Li, Liqun Yang, Sufeng Duan, Zhoujun Li

    Abstract: Code large language models (LLMs) have shown remarkable advances in code understanding, completion, and generation tasks. Programming benchmarks, comprised of a selection of code challenges and corresponding test cases, serve as a standard to evaluate the capability of different LLMs in such tasks. However, most existing benchmarks primarily focus on Python and are still restricted to a limited nu…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: 22 pages

  17. arXiv:2406.07368  [pdf, other]

    cs.CL cs.AI cs.LG

    When Linear Attention Meets Autoregressive Decoding: Towards More Effective and Efficient Linearized Large Language Models

    Authors: Haoran You, Yichao Fu, Zheng Wang, Amir Yazdanbakhsh, Yingyan Lin

    Abstract: Autoregressive Large Language Models (LLMs) have achieved impressive performance in language tasks but face two significant bottlenecks: (1) quadratic complexity in the attention module as the number of tokens increases, and (2) limited efficiency due to the sequential processing nature of autoregressive LLMs during generation. While linear attention and speculative decoding offer potential soluti…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: Accepted by ICML 2024; 17 pages; 10 figures; 16 tables

  18. arXiv:2406.07357  [pdf, other]

    cs.CC

    PSMC: Provable and Scalable Algorithms for Motif Conductance Based Graph Clustering

    Authors: Longlong Lin, Tao Jia, Zeli Wang, Jin Zhao, Rong-Hua Li

    Abstract: Higher-order graph clustering aims to partition the graph using frequently occurring subgraphs. Motif conductance is one of the most promising higher-order graph clustering models due to its strong interpretability. However, existing motif conductance based graph clustering algorithms are mainly limited by a seminal two-stage reweighting computing framework, needing to enumerate all motif instance…

    Submitted 11 June, 2024; originally announced June 2024.

  19. arXiv:2406.07117  [pdf, other]

    cs.AI cs.LG

    Augmenting Offline RL with Unlabeled Data

    Authors: Zhao Wang, Briti Gangopadhyay, Jia-Fong Yeh, Shingo Takamatsu

    Abstract: Recent advancements in offline Reinforcement Learning (Offline RL) have led to an increased focus on methods based on conservative policy updates to address the Out-of-Distribution (OOD) issue. These methods typically involve adding behavior regularization or modifying the critic learning objective, focusing primarily on states or actions with substantial dataset support. However, we challenge thi…

    Submitted 11 June, 2024; originally announced June 2024.

  20. arXiv:2406.07110  [pdf, other]

    cond-mat.str-el cond-mat.quant-gas quant-ph

    The renormalized classical spin liquid on the ruby lattice

    Authors: Zhenjiu Wang, Lode Pollet

    Abstract: The recent experimental detection of the onset of a dynamically prepared, gapped $Z_2$ quantum spin liquid on the ruby lattice brought the physics of frustrated magnetism and lattice gauge theory to Rydberg tweezer arrays (Semeghini et al, Science 374, 1242 (2021)). The thermodynamic properties of such models remain inadequately addressed, yet knowledge thereof is indispensable if one wants to pre…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: 4 pages, 6 figures + supplemental

  21. arXiv:2406.07048  [pdf, other]

    cs.RO

    GPU-Accelerated Optimization-Based Collision Avoidance

    Authors: Zeming Wu, Zhuping Wang, Hao Zhang

    Abstract: This paper proposes a GPU-accelerated optimization framework for collision avoidance problems where the controlled objects and the obstacles can be modeled as the finite union of convex polyhedra. A novel collision avoidance constraint is proposed based on scale-based collision detection and the strong duality of convex optimization. Under this constraint, the high-dimensional non-convex optimizat…

    Submitted 11 June, 2024; originally announced June 2024.

  22. arXiv:2406.07041  [pdf, other]

    cs.LG cs.AI

    Integrating Domain Knowledge for handling Limited Data in Offline RL

    Authors: Briti Gangopadhyay, Zhao Wang, Jia-Fong Yeh, Shingo Takamatsu

    Abstract: With the ability to learn from static datasets, Offline Reinforcement Learning (RL) emerges as a compelling avenue for real-world applications. However, state-of-the-art offline RL algorithms perform sub-optimally when confronted with limited data confined to specific regions within the state space. The performance degradation is attributed to the inability of offline RL algorithms to learn approp…

    Submitted 11 June, 2024; originally announced June 2024.

  23. arXiv:2406.07032  [pdf, other]

    cs.CV

    RS-DFM: A Remote Sensing Distributed Foundation Model for Diverse Downstream Tasks

    Authors: Zhechao Wang, Peirui Cheng, Pengju Tian, Yuchao Wang, Mingxin Chen, Shujing Duan, Zhirui Wang, Xinming Li, Xian Sun

    Abstract: Remote sensing lightweight foundation models have achieved notable success in online perception within remote sensing. However, their capabilities are restricted to performing online inference solely based on their own observations and models, thus lacking a comprehensive understanding of large-scale remote sensing scenarios. To overcome this limitation, we propose a Remote Sensing Distributed Fou…

    Submitted 11 June, 2024; originally announced June 2024.

  24. arXiv:2406.07017  [pdf, other]

    cs.LG cs.CL

    MoreauPruner: Robust Pruning of Large Language Models against Weight Perturbations

    Authors: Zixiao Wang, Jingwei Zhang, Wenqian Zhao, Farzan Farnia, Bei Yu

    Abstract: Few-shot gradient methods have been extensively utilized in existing model pruning methods, where the model weights are regarded as static values and the effects of potential weight perturbations are not considered. However, the widely used large language models (LLMs) have several billion model parameters, which could increase the fragility of few-shot gradient pruning. In this work, we experimen…

    Submitted 11 June, 2024; originally announced June 2024.

  25. arXiv:2406.07006  [pdf, other]

    cs.CV

    MIPI 2024 Challenge on Few-shot RAW Image Denoising: Methods and Results

    Authors: Xin Jin, Chunle Guo, Xiaoming Li, Zongsheng Yue, Chongyi Li, Shangchen Zhou, Ruicheng Feng, Yuekun Dai, Peiqing Yang, Chen Change Loy, Ruoqi Li, Chang Liu, Ziyi Wang, Yao Du, Jingjing Yang, Long Bao, Heng Sun, Xiangyu Kong, Xiaoxia Xing, Jinlong Wu, Yuanyang Xue, Hyunhee Park, Sejun Song, Changho Kim, Jingfan Tan , et al. (17 additional authors not shown)

    Abstract: The increasing demand for computational photography and imaging on mobile platforms has led to the widespread development and integration of advanced image sensors with novel algorithms in camera systems. However, the scarcity of high-quality data for research and the rare opportunity for in-depth exchange of views from industry and academia constrain the development of mobile intelligent photogra…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: CVPR 2024 Mobile Intelligent Photography and Imaging (MIPI) Workshop--Few-shot RAW Image Denoising Challenge Report. Website: https://mipi-challenge.org/MIPI2024/

  26. arXiv:2406.06956  [pdf, ps, other]

    math.DS math.NT

    Arbitrarily slow decay in the logarithmically averaged Sarnak conjecture

    Authors: Amir Algom, Zhiren Wang

    Abstract: In 2017 Tao proposed a variant of Sarnak's Möbius disjointness conjecture with logarithmic averaging: For any zero entropy dynamical system $(X,T)$, $\frac{1}{\log N} \sum_{n=1} ^N \frac{f(T^n x) μ(n)}{n}= o(1)$ for every $f\in \mathcal{C}(X)$ and every $x\in X$. We construct examples showing that this $o(1)$ can go to zero arbitrarily slowly. Nonetheless, all of our examples satisfy the conjecture.

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: Preprint version, 12 pages. To appear in JMAA

  27. arXiv:2406.06893  [pdf, other]

    stat.ML cs.IT cs.LG

    Transformers Provably Learn Sparse Token Selection While Fully-Connected Nets Cannot

    Authors: Zixuan Wang, Stanley Wei, Daniel Hsu, Jason D. Lee

    Abstract: The transformer architecture has prevailed in various deep learning settings due to its exceptional capabilities to select and compose structural information. Motivated by these capabilities, Sanford et al. proposed the sparse token selection task, in which transformers excel while fully-connected networks (FCNs) fail in the worst case. Building upon that, we strengthen the FCN lower bound to an a…

    Submitted 10 June, 2024; originally announced June 2024.

  28. arXiv:2406.06845  [pdf, other]

    nucl-th

    Time-dependent Relativistic Hartree-Fock model with spherical symmetry

    Authors: Jing Geng, Zhi Heng Wang, Peng Wei Zhao, Yi Fei Niu, Haozhao Liang, Wen Hui Long

    Abstract: This work establishes the time-dependent relativistic Hartree-Fock (TD-RHF) model with spherical symmetry for the first time. The time-dependent integro-differential Dirac equations are solved by expanding Dirac spinors on the spherical Dirac Woods-Saxon (DWS) basis. The numerical verification demonstrates the high conservation qualities for both the total binding energy and the particle number, a…

    Submitted 12 June, 2024; v1 submitted 10 June, 2024; originally announced June 2024.

    Comments: 9 pages, 5 figures

  29. arXiv:2406.06833  [pdf, other]

    eess.SY stat.AP

    Data-driven Power Flow Linearization: Simulation

    Authors: Mengshuo Jia, Gabriela Hug, Ning Zhang, Zhaojian Wang, Yi Wang, Chongqing Kang

    Abstract: Building on the theoretical insights of Part I, this paper, as the second part of the tutorial, dives deeper into data-driven power flow linearization (DPFL), focusing on comprehensive numerical testing. The necessity of these simulations stems from the theoretical analysis's inherent limitations, particularly the challenge of identifying the differences in real-world performance among DPFL method…

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: 26 pages

  30. arXiv:2406.06796  [pdf, other]

    cs.CV cs.AI cs.LG cs.RO eess.SP

    FlexLoc: Conditional Neural Networks for Zero-Shot Sensor Perspective Invariance in Object Localization with Distributed Multimodal Sensors

    Authors: Jason Wu, Ziqi Wang, Xiaomin Ouyang, Ho Lyun Jeong, Colin Samplawski, Lance Kaplan, Benjamin Marlin, Mani Srivastava

    Abstract: Localization is a critical technology for various applications ranging from navigation and surveillance to assisted living. Localization systems typically fuse information from sensors viewing the scene from different perspectives to estimate the target location while also employing multiple modalities for enhanced robustness and accuracy. Recently, such systems have employed end-to-end deep neura…

    Submitted 10 June, 2024; originally announced June 2024.

  31. arXiv:2406.06744  [pdf]

    cs.LG cs.CR eess.SY

    A Multi-module Robust Method for Transient Stability Assessment against False Label Injection Cyberattacks

    Authors: Hanxuan Wang, Na Lu, Yinhong Liu, Zhuqing Wang, Zixuan Wang

    Abstract: The success of deep learning in transient stability assessment (TSA) heavily relies on high-quality training data. However, the label information in TSA datasets is vulnerable to contamination through false label injection (FLI) cyberattacks, resulting in degraded performance of deep TSA models. To address this challenge, a Multi-Module Robust TSA method (MMR) is proposed to rectify the supervised…

    Submitted 10 June, 2024; originally announced June 2024.

  32. arXiv:2406.06600  [pdf, other]

    cs.LG cs.AI cs.CL

    HORAE: A Domain-Agnostic Modeling Language for Automating Multimodal Service Regulation

    Authors: Yutao Sun, Mingshuai Chen, Kangjia Zhao, He Li, Jintao Chen, Linyu Yang, Zhongyi Wang, Tiancheng Zhao, Jianwei Yin

    Abstract: Artificial intelligence is rapidly encroaching on the field of service regulation. This work presents the design principles behind HORAE, a unified specification language to model multimodal regulation rules across a diverse set of domains. We show how HORAE facilitates an intelligent service regulation pipeline by further exploiting a fine-tuned large language model named HORAE that automates the…

    Submitted 6 June, 2024; originally announced June 2024.

  33. arXiv:2406.06582  [pdf, ps, other]

    cs.CL cs.LG eess.AS

    Discrete Multimodal Transformers with a Pretrained Large Language Model for Mixed-Supervision Speech Processing

    Authors: Viet Anh Trinh, Rosy Southwell, Yiwen Guan, Xinlu He, Zhiyong Wang, Jacob Whitehill

    Abstract: Recent work on discrete speech tokenization has paved the way for models that can seamlessly perform multiple tasks across modalities, e.g., speech recognition, text to speech, speech to speech translation. Moreover, large language models (LLMs) pretrained from vast text corpora contain rich linguistic information that can improve accuracy in a variety of tasks. In this paper, we present a decoder…

    Submitted 25 June, 2024; v1 submitted 4 June, 2024; originally announced June 2024.

  34. arXiv:2406.06382  [pdf, other]

    cs.CV cs.CL cs.LG

    Diffusion-RPO: Aligning Diffusion Models through Relative Preference Optimization

    Authors: Yi Gu, Zhendong Wang, Yueqin Yin, Yujia Xie, Mingyuan Zhou

    Abstract: Aligning large language models with human preferences has emerged as a critical focus in language modeling research. Yet, integrating preference learning into Text-to-Image (T2I) generative models is still relatively uncharted territory. The Diffusion-DPO technique made initial strides by employing pairwise preference learning in diffusion models tailored for specific text prompts. We introduce Di…

    Submitted 10 June, 2024; originally announced June 2024.

  35. arXiv:2406.06277  [pdf, other]

    hep-ex

    Measurement of the branching fractions of $\bar{B}\to D^{(*)} K^- K^{(*)0}_{(S)}$ and $\bar{B}\to D^{(*)}D_s^{-}$ decays at Belle II

    Authors: Belle II Collaboration, I. Adachi, L. Aggarwal, H. Aihara, N. Akopov, A. Aloisio, N. Althubiti, N. Anh Ky, D. M. Asner, H. Atmacan, T. Aushev, V. Aushev, M. Aversano, R. Ayad, V. Babu, H. Bae, S. Bahinipati, P. Bambade, Sw. Banerjee, S. Bansal, M. Barrett, J. Baudot, A. Baur, A. Beaubien, F. Becherer , et al. (382 additional authors not shown)

    Abstract: We present measurements of the branching fractions of eight $\overline B{}^0\to D^{(*)+} K^- K^{(*)0}_{(S)}$, $B^{-}\to D^{(*)0} K^- K^{(*)0}_{(S)}$ decay channels. The results are based on data from SuperKEKB electron-positron collisions at the $Υ(4S)$ resonance collected with the Belle II detector, corresponding to an integrated luminosity of $362~\text{fb}^{-1}$. The event yields are extracted…

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: Prepared for submission to JHEP. 34 pages, 14 figures

    Report number: Belle II Preprint: 2024-014, KEK Preprint: 2024-8

  36. arXiv:2406.06258  [pdf, other]

    cs.CV

    Tuning-Free Visual Customization via View Iterative Self-Attention Control

    Authors: Xiaojie Li, Chenghao Gu, Shuzhao Xie, Yunpeng Bai, Weixiang Zhang, Zhi Wang

    Abstract: Fine-Tuning Diffusion Models enable a wide range of personalized generation and editing applications on diverse visual modalities. While Low-Rank Adaptation (LoRA) accelerates the fine-tuning process, it still requires multiple reference images and time-consuming training, which constrains its scalability for large-scale and real-time applications. In this paper, we propose \textit{View Iterative…

    Submitted 10 June, 2024; v1 submitted 10 June, 2024; originally announced June 2024.

    Comments: Under review

  37. arXiv:2406.06118  [pdf, other]

    hep-ex

    Strong and weak $CP$ tests in sequential decays of polarized $Σ^0$ hyperons

    Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere , et al. (644 additional authors not shown)

    Abstract: The $J/ψ, ψ(3686) \to Σ^0 \barΣ^{0}$ processes and subsequent decays are studied using the world's largest $J/ψ$ and $ψ(3686)$ data samples collected with the BESIII detector. The strong-$CP$ symmetry is tested in the decays of the $Σ^0$ hyperons for the first time by measuring the decay parameters, $α_{Σ^0} = -0.0017 \pm 0.0021 \pm 0.0018$ and $\barα_{Σ^0} = 0.0021 \pm 0.0020 \pm 0.0022$. The wea…

    Submitted 10 June, 2024; originally announced June 2024.

  38. arXiv:2406.06022  [pdf, other]

    cs.LG cs.DC

    GraphStorm: all-in-one graph machine learning framework for industry applications

    Authors: Da Zheng, Xiang Song, Qi Zhu, Jian Zhang, Theodore Vasiloudis, Runjie Ma, Houyu Zhang, Zichen Wang, Soji Adeshina, Israt Nisa, Alejandro Mottini, Qingjun Cui, Huzefa Rangwala, Belinda Zeng, Christos Faloutsos, George Karypis

    Abstract: Graph machine learning (GML) is effective in many business applications. However, making GML easy to use and applicable to industry applications with massive datasets remains challenging. We developed GraphStorm, which provides an end-to-end solution for scalable graph construction, graph model training and inference. GraphStorm has the following desirable properties: (a) Easy to use: it can perfor…

    Submitted 10 June, 2024; originally announced June 2024.

    Journal ref: KDD 2024

  39. arXiv:2406.06007  [pdf, other]

    cs.LG cs.CL cs.CV cs.CY

    CARES: A Comprehensive Benchmark of Trustworthiness in Medical Vision Language Models

    Authors: Peng Xia, Ze Chen, Juanxi Tian, Yangrui Gong, Ruibo Hou, Yue Xu, Zhenbang Wu, Zhiyuan Fan, Yiyang Zhou, Kangyu Zhu, Wenhao Zheng, Zhaoyang Wang, Xiao Wang, Xuchao Zhang, Chetan Bansal, Marc Niethammer, Junzhou Huang, Hongtu Zhu, Yun Li, Jimeng Sun, Zongyuan Ge, Gang Li, James Zou, Huaxiu Yao

    Abstract: Artificial intelligence has significantly impacted medical applications, particularly with the advent of Medical Large Vision Language Models (Med-LVLMs), sparking optimism for the future of automated and personalized healthcare. However, the trustworthiness of Med-LVLMs remains unverified, posing significant risks for future model deployment. In this paper, we introduce CARES and aim to comprehen…

    Submitted 10 June, 2024; originally announced June 2024.

  40. arXiv:2406.05940  [pdf, other]

    cs.SE

    M2CVD: Multi-Model Collaboration for Code Vulnerability Detection

    Authors: Ziliang Wang, Ge Li, Jia Li, Yingfei Xiong, Jia Li, Zhi Jin

    Abstract: Large Language Models (LLMs) have strong capabilities in code comprehension, but fine-tuning costs and semantic alignment issues limit their project-specific optimization; conversely, code models such as CodeBERT are easy to fine-tune, but they often struggle to learn vulnerability semantics from complex code languages. To address these challenges, this paper introduces the Multi-Model Collaborativ…

    Submitted 9 June, 2024; originally announced June 2024.

  41. arXiv:2406.05871  [pdf, other]

    cs.CV cs.LG

    OmniControlNet: Dual-stage Integration for Conditional Image Generation

    Authors: Yilin Wang, Haiyang Xu, Xiang Zhang, Zeyuan Chen, Zhizhou Sha, Zirui Wang, Zhuowen Tu

    Abstract: We provide a two-way integration for the widely adopted ControlNet by integrating external condition generation algorithms into a single dense prediction method and incorporating its individually trained image generation processes into a single model. Despite its tremendous success, the ControlNet of a two-stage pipeline bears limitations in being not self-contained (e.g. calls the external condit…

    Submitted 9 June, 2024; originally announced June 2024.

    Comments: Accepted to CVPR 2024 Workshop: Generative Models for Computer Vision

  42. arXiv:2406.05862  [pdf, other]

    cs.CL cs.AI cs.CV

    II-Bench: An Image Implication Understanding Benchmark for Multimodal Large Language Models

    Authors: Ziqiang Liu, Feiteng Fang, Xi Feng, Xinrun Du, Chenhao Zhang, Zekun Wang, Yuelin Bai, Qixuan Zhao, Liyang Fan, Chengguang Gan, Hongquan Lin, Jiaming Li, Yuansheng Ni, Haihong Wu, Yaswanth Narsupalli, Zhigang Zheng, Chengming Li, Xiping Hu, Ruifeng Xu, Xiaojun Chen, Min Yang, Jiaheng Liu, Ruibo Liu, Wenhao Huang, Ge Zhang, et al. (1 additional author not shown)

    Abstract: The rapid advancements in the development of multimodal large language models (MLLMs) have consistently led to new breakthroughs on various benchmarks. In response, numerous challenging and comprehensive benchmarks have been proposed to more accurately assess the capabilities of MLLMs. However, there is a dearth of exploration of the higher-order perceptual capabilities of MLLMs. To fill this gap,…

    Submitted 11 June, 2024; v1 submitted 9 June, 2024; originally announced June 2024.

    Comments: 100 pages, 82 figures, add citations

  43. arXiv:2406.05835  [pdf, other]

    cs.CV

    Mamba YOLO: SSMs-Based YOLO For Object Detection

    Authors: Zeyu Wang, Chen Li, Huiying Xu, Xinzhong Zhu

    Abstract: Propelled by the rapid advancement of deep learning technologies, the YOLO series has set a new benchmark for real-time object detectors. Researchers have continuously explored innovative applications of reparameterization, efficient layer aggregation networks, and anchor-free techniques on the foundation of YOLO. To further enhance detection performance, Transformer-based structures have been int…

    Submitted 9 June, 2024; originally announced June 2024.

  44. arXiv:2406.05827  [pdf, ps, other]

    hep-ex

    Measurement of the integrated luminosity of the data collected at 3.773 GeV by BESIII from 2021 to 2024

    Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, et al. (634 additional authors not shown)

    Abstract: We present a measurement of the integrated luminosity of $e^+e^-$ collision data collected with the BESIII detector at the BEPCII collider at a center-of-mass energy of $E_{\rm cm} = 3.773$~GeV. The integrated luminosities of the data sets taken from December 2021 to June 2022, from November 2022 to June 2023, and from October 2023 to February 2024 are determined to be $4.995 \pm 0.019$~fb$^{-1}$,…

    Submitted 9 June, 2024; originally announced June 2024.

  45. arXiv:2406.05823  [pdf]

    cond-mat.mtrl-sci cond-mat.str-el

    Manipulating magnetism and transport properties of EuCd$_2$P$_2$ with a low carrier concentration

    Authors: Xiyu Chen, Ziwen Wang, Zhiyu Zhou, Wuzhang Yang, Yi Liu, Jia-Yi Lu, Zhi Ren, Guang-Han Cao, Fazel Tafti, Shuai Dong, Zhi-Cheng Wang

    Abstract: Materials that exhibit strongly coupled magnetic order and electronic properties are crucial for both fundamental research and technological applications. However, finding a material that not only shows remarkable magnetoresistive responses but also has an easily tunable ground state remains a challenge. Here, we report successful manipulation of the magnetic and transport properties of EuCd$_2$P…

    Submitted 9 June, 2024; originally announced June 2024.

  46. arXiv:2406.05819  [pdf, other]

    cond-mat.mtrl-sci cond-mat.str-el

    Carrier-induced transition from antiferromagnetic insulator to ferromagnetic metal in the layered phosphide EuZn$_2$P$_2$

    Authors: Xiyu Chen, Wuzhang Yang, Jia-Yi Lu, Zhiyu Zhou, Zhi Ren, Guang-Han Cao, Shuai Dong, Zhi-Cheng Wang

    Abstract: EuZn$_2$P$_2$ was reported to be an insulating antiferromagnet with $T_\mathrm{N}$ of 23.5 K. In this study, single crystals of EuZn$_2$P$_2$ exhibiting metallic behavior and a ferromagnetic order of 72 K ($T_\mathrm{C}$) are successfully synthesized via a salt flux method. The presence of hole carriers induced by the Eu vacancies in the lattice is found to be crucial for the drastic changes in ma…

    Submitted 9 June, 2024; originally announced June 2024.

    Journal ref: Physical Review B 109, L180410 (2024)

  47. arXiv:2406.05792  [pdf]

    cond-mat.mtrl-sci

    Above room-temperature two-dimensional ferromagnetic half-metals in Mn-based Janus magnets

    Authors: Xiang-Fan Huang, Kang-Jie Li, Zequan Wang, Shi-Bo Zhao, Bing Shen, Zu-Xin Chen, Yusheng Hou

    Abstract: Two-dimensional (2D) ferromagnets and their heterostructures offer fertile grounds for designing fascinating functionalities in ultra-thin spintronic devices. Here, by first-principles calculations, we report the discovery of energetically and thermodynamically stable 2D ferromagnets with very strong in-plane magnetic anisotropy in MnXY (X = S and Se; Y = Cl, Br, and I) monolayers. Remarkably, we f…

    Submitted 9 June, 2024; originally announced June 2024.

    Comments: 16 pages, 4 figures, accepted by Applied Physics Letters

    Journal ref: Appl. Phys. Lett. 124, 252402 (2024)

  48. arXiv:2406.05770  [pdf, other]

    hep-ph hep-ex

    LAYCAST: LAYered CAvern Surface Tracker at future electron-positron colliders

    Authors: Ye Lu, Ying-nan Mao, Kechen Wang, Zeren Simon Wang

    Abstract: We propose a detector concept, LAYered CAvern Surface Tracker (LAYCAST), to be installed on the ceiling and the wall of the cavern hosting the main experiment of future electron-positron colliders such as CEPC and FCC-ee. With detailed and realistic considerations of the design of such a new experiment, the proposed detector is dedicated to extending the sensitivity reach of the main detector to v…

    Submitted 9 June, 2024; originally announced June 2024.

    Comments: 23 pages, 19 figures, 1 table

  49. arXiv:2406.05628  [pdf, other]

    cs.LG

    Domain Generalization Guided by Large-Scale Pre-Trained Priors

    Authors: Zongbin Wang, Bin Pan, Shiyu Shen, Tianyang Shi, Zhenwei Shi

    Abstract: Domain generalization (DG) aims to train a model from limited source domains, allowing it to generalize to unknown target domains. Typically, DG models only employ large-scale pre-trained models during the initialization of fine-tuning. However, large-scale pre-trained models already possess the ability to resist domain shift. If we reference pre-trained models continuously during fine-tuning to m…

    Submitted 8 June, 2024; originally announced June 2024.

  50. arXiv:2406.05616  [pdf, other]

    cs.LG

    Domain Agnostic Conditional Invariant Predictions for Domain Generalization

    Authors: Zongbin Wang, Bin Pan, Zhenwei Shi

    Abstract: Domain generalization aims to develop a model that can perform well on unseen target domains by learning from multiple source domains. However, recently proposed domain generalization models usually rely on domain labels, which may not be available in many real-world scenarios. To address this challenge, we propose a Discriminant Risk Minimization (DRM) theory and the corresponding algorithm to capt…

    Submitted 8 June, 2024; originally announced June 2024.