Showing 1–50 of 11,892 results for author: Li, X

  1. arXiv:2407.01523  [pdf, other]

    cs.CV cs.CL

    MMLongBench-Doc: Benchmarking Long-context Document Understanding with Visualizations

    Authors: Yubo Ma, Yuhang Zang, Liangyu Chen, Meiqi Chen, Yizhu Jiao, Xinze Li, Xinyuan Lu, Ziyu Liu, Yan Ma, Xiaoyi Dong, Pan Zhang, Liangming Pan, Yu-Gang Jiang, Jiaqi Wang, Yixin Cao, Aixin Sun

    Abstract: Understanding documents with rich layouts and multi-modal components is a long-standing and practical task. Recent Large Vision-Language Models (LVLMs) have made remarkable strides in various tasks, particularly in single-page document understanding (DU). However, their abilities on long-context DU remain an open problem. This work presents MMLongBench-Doc, a long-context, multi-modal benchmark co…

    Submitted 1 July, 2024; originally announced July 2024.

  2. arXiv:2407.01216  [pdf, other]

    cs.RO cs.AI

    Let Hybrid A* Path Planner Obey Traffic Rules: A Deep Reinforcement Learning-Based Planning Framework

    Authors: Xibo Li, Shruti Patel, Christof Büskens

    Abstract: Deep reinforcement learning (DRL) allows a system to interact with its environment and take actions by training an efficient policy that maximizes self-defined rewards. In autonomous driving, it can be used as a strategy for high-level decision making, whereas low-level algorithms such as the hybrid A* path planning have proven their ability to solve the local trajectory planning problem. In this…

    Submitted 1 July, 2024; originally announced July 2024.

  3. arXiv:2407.01100  [pdf, other]

    cs.CL cs.LG

    Eliminating Position Bias of Language Models: A Mechanistic Approach

    Authors: Ziqi Wang, Hanlin Zhang, Xiner Li, Kuan-Hao Huang, Chi Han, Shuiwang Ji, Sham M. Kakade, Hao Peng, Heng Ji

    Abstract: Position bias has proven to be a prevalent issue of modern language models (LMs), where the models prioritize content based on its position within the given context. This bias often leads to unexpected model failures and hurts performance, robustness, and reliability across various applications. Our mechanistic analysis attributes the position bias to two components employed in nearly all state-of…

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: 18 pages, 5 figures

  4. arXiv:2407.01090  [pdf, other]

    eess.IV cs.CV

    Learning 3D Gaussians for Extremely Sparse-View Cone-Beam CT Reconstruction

    Authors: Yiqun Lin, Hualiang Wang, Jixiang Chen, Xiaomeng Li

    Abstract: Cone-Beam Computed Tomography (CBCT) is an indispensable technique in medical imaging, yet the associated radiation exposure raises concerns in clinical practice. To mitigate these risks, sparse-view reconstruction has emerged as an essential research direction, aiming to reduce the radiation dose by utilizing fewer projections for CT reconstruction. Although implicit neural representations have b…

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: Accepted to MICCAI 2024. Project link: https://github.com/xmed-lab/DIF-Gaussian

  5. arXiv:2407.00983  [pdf, other]

    cs.CV

    FairMedFM: Fairness Benchmarking for Medical Imaging Foundation Models

    Authors: Ruinan Jin, Zikang Xu, Yuan Zhong, Qiongsong Yao, Qi Dou, S. Kevin Zhou, Xiaoxiao Li

    Abstract: The advent of foundation models (FMs) in healthcare offers unprecedented opportunities to enhance medical diagnostics through automated classification and segmentation tasks. However, these models also raise significant concerns about their fairness, especially when applied to diverse and underrepresented populations in healthcare applications. Currently, there is a lack of comprehensive benchmark…

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: 29 pages, 17 figures

  6. arXiv:2407.00965  [pdf, other]

    hep-ex

    Measurement of the integrated luminosity of data samples collected during 2019-2022 by the Belle II experiment

    Authors: The Belle II Collaboration, I. Adachi, L. Aggarwal, H. Ahmed, J. K. Ahn, H. Aihara, N. Akopov, A. Aloisio, N. Althubiti, N. Anh Ky, D. M. Asner, H. Atmacan, T. Aushev, V. Aushev, M. Aversano, R. Ayad, V. Babu, H. Bae, S. Bahinipati, P. Bambade, Sw. Banerjee, M. Barrett, J. Baudot, A. Baur, A. Beaubien, et al. (382 additional authors not shown)

    Abstract: A series of data samples was collected with the Belle II detector at the SuperKEKB collider from March 2019 to June 2022. We determine the integrated luminosities of these data samples using three distinct methodologies involving Bhabha ($e^+e^- \to e^+e^-(nγ)$), digamma ($e^+e^- \to γγ(nγ)$), and dimuon ($e^+e^- \to μ^+ μ^- (nγ)$) events. The total integrated luminosity obtained with Bhabha, diga…

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: 12 pages, 3 figures

    Report number: Belle II Preprint 2024-019; KEK Preprint 2024-16

  7. Unified Dual-Intent Translation for Joint Modeling of Search and Recommendation

    Authors: Yuting Zhang, Yiqing Wu, Ruidong Han, Ying Sun, Yongchun Zhu, Xiang Li, Wei Lin, Fuzhen Zhuang, Zhulin An, Yongjun Xu

    Abstract: Recommendation systems, which assist users in discovering their preferred items among numerous options, have served billions of users across various online platforms. Intuitively, users' interactions with items are highly driven by their unchanging inherent intents (e.g., always preferring high-quality items) and changing demand intents (e.g., wanting a T-shirt in summer but a down jacket in winte…

    Submitted 30 June, 2024; originally announced July 2024.

  8. arXiv:2407.00837  [pdf, other]

    cs.CL cs.AI cs.SD eess.AS

    Towards Robust Speech Representation Learning for Thousands of Languages

    Authors: William Chen, Wangyou Zhang, Yifan Peng, Xinjian Li, Jinchuan Tian, Jiatong Shi, Xuankai Chang, Soumi Maiti, Karen Livescu, Shinji Watanabe

    Abstract: Self-supervised learning (SSL) has helped extend speech technologies to more languages by reducing the need for labeled data. However, models are still far from supporting the world's 7000+ languages. We propose XEUS, a Cross-lingual Encoder for Universal Speech, trained on over 1 million hours of data across 4057 languages, extending the language coverage of SSL models 4-fold. We combine 1 millio…

    Submitted 30 June, 2024; originally announced July 2024.

    Comments: 20 pages

  9. arXiv:2407.00752  [pdf, other]

    cs.CV cs.AI

    Chest-Diffusion: A Light-Weight Text-to-Image Model for Report-to-CXR Generation

    Authors: Peng Huang, Xue Gao, Lihong Huang, Jing Jiao, Xiaokang Li, Yuanyuan Wang, Yi Guo

    Abstract: Text-to-image generation has important implications for generation of diverse and controllable images. Several attempts have been made to adapt Stable Diffusion (SD) to the medical domain. However, the large distribution difference between medical reports and natural texts, as well as high computational complexity in common stable diffusion limit the authenticity and feasibility of the generated m…

    Submitted 30 June, 2024; originally announced July 2024.

  10. arXiv:2407.00707  [pdf, other]

    physics.chem-ph cond-mat.str-el physics.comp-ph

    Deep learning quantum Monte Carlo for solids

    Authors: Yubing Qian, Xiang Li, Zhe Li, Weiluo Ren, Ji Chen

    Abstract: Deep learning has deeply changed the paradigms of many research fields. At the heart of chemical and physical sciences is the accurate ab initio calculation of many-body wavefunction, which has become one of the most notable examples to demonstrate the power of deep learning in science. In particular, the introduction of deep learning into quantum Monte Carlo (QMC) has significantly advanced the f…

    Submitted 30 June, 2024; originally announced July 2024.

  11. arXiv:2407.00700  [pdf, other]

    hep-ph

    Study of $τ^- \to ωπ^- ν_τ$ decay in resonance chiral theory with tensor sources

    Authors: Feng-Zhi Chen, Xin-Qiang Li, Shi-Can Peng, Ya-Dong Yang, Yuan-He Zou

    Abstract: In this work, we make a study of the $τ^- \to ωπ^-ν_τ$ decay in the framework of low-energy effective field theory. The $J^{\mathcal{P}G}$ decompositions of the quark currents and the $ωπ$ final state show that, besides the Standard Model vector interaction, only the non-standard tensor interaction can have a non-zero contribution to the decay. To discuss its effect, a reliable calculation of the…

    Submitted 30 June, 2024; originally announced July 2024.

    Comments: 27 pages, 4 tables, and 2 figures

  12. arXiv:2407.00625  [pdf, other]

    cs.LO

    Nonlinear Craig Interpolant Generation over Unbounded Domains by Separating Semialgebraic Sets

    Authors: Hao Wu, Jie Wang, Bican Xia, Xiakun Li, Naijun Zhan, Ting Gan

    Abstract: Interpolation-based techniques have become popular in recent years, as they can improve the scalability of existing verification techniques due to their inherent modularity and local reasoning capabilities. Synthesizing Craig interpolants is the cornerstone of these techniques. In this paper, we investigate nonlinear Craig interpolant synthesis for two polynomial formulas of the general form, essenti…

    Submitted 30 June, 2024; originally announced July 2024.

    Comments: 21 pages (with appendix); accepted by the 26th International Symposium on Formal Methods (FM2024)

  13. arXiv:2407.00496  [pdf, other]

    cs.LG cs.AI

    A Two-stage Reinforcement Learning-based Approach for Multi-entity Task Allocation

    Authors: Aicheng Gong, Kai Yang, Jiafei Lyu, Xiu Li

    Abstract: Task allocation is a key combinatorial optimization problem, crucial for modern applications such as multi-robot cooperation and resource scheduling. Decision makers must allocate entities to tasks reasonably across different scenarios. However, traditional methods assume static attributes and numbers of tasks and entities, often relying on dynamic programming and heuristic algorithms for solution…

    Submitted 29 June, 2024; originally announced July 2024.

  14. arXiv:2407.00163  [pdf]

    cond-mat.mtrl-sci cond-mat.str-el

    Pressure Tuning the Mixture of Eu$^{2+}$ and Eu$^{3+}$ in Eu$_4$Bi$_6$Se$_{13}$

    Authors: Mingyu Xu, Jose L. Gonzalez Jimenez, Greeshma C. Jose, Artittaya Boonkird, Chengkun Xing, Chelsea Harrod, Xinle Li, Haidong Zhou, Alyssa Gaiser, Xianglin Ke, Wenli Bi, Mingda Li, Weiwei Xie

    Abstract: The investigation of crystallographic, electronic, and magnetic characteristics, especially the mixed valences of Eu$^{2+}$ and Eu$^{3+}$ under pressure, of a novel europium-based bismuth selenide compound, Eu$_4$Bi$_6$Se$_{13}$, is presented. This new compound adopts a monoclinic crystal structure classified under the P$2_1$/m space group (#11). It exhibits distinctive structural features, including…

    Submitted 28 June, 2024; originally announced July 2024.

    Comments: 22 pages, 8 figures

  15. arXiv:2407.00136  [pdf, other]

    hep-ex

    Observation of the Electromagnetic Dalitz Transition $h_c \rightarrow e^+e^-η_c$

    Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, S. Ahmed, M. Albrecht, R. Aliberti, A. Amoroso, M. R. An, Q. An, X. H. Bai, Y. Bai, O. Bakina, R. Baldini Ferroli, I. Balossino, Y. Ban, K. Begzsuren, N. Berger, M. Bertani, D. Bettoni, F. Bianchi, J. Bloms, A. Bortone, I. Boyko, R. A. Briere, et al. (495 additional authors not shown)

    Abstract: Using $(27.12\pm 0.14)\times10^8$ $ψ(3686)$ decays and data samples of $e^+e^-$ collisions with $\sqrt{s}$ from 4.130 to 4.780~GeV collected with the BESIII detector, we report the first observation of the electromagnetic Dalitz transition $h_c\to e^+e^-η_c$ with a statistical significance of $5.4σ$. We measure the ratio of the branching fractions…

    Submitted 28 June, 2024; originally announced July 2024.

  16. arXiv:2407.00128  [pdf, other]

    cs.IR cs.AI cs.LG

    When Search Engine Services meet Large Language Models: Visions and Challenges

    Authors: Haoyi Xiong, Jiang Bian, Yuchen Li, Xuhong Li, Mengnan Du, Shuaiqiang Wang, Dawei Yin, Sumi Helal

    Abstract: Combining Large Language Models (LLMs) with search engine services marks a significant shift in the field of services computing, opening up new possibilities to enhance how we search for and retrieve information, understand content, and interact with internet services. This paper conducts an in-depth examination of how integrating LLMs with search engines can mutually benefit both technologies. We…

    Submitted 27 June, 2024; originally announced July 2024.

    Comments: Under Review

  17. arXiv:2407.00057  [pdf]

    physics.app-ph physics.optics

    Radiative Thermal Transistor

    Authors: Yuxuan Li, Yongdi Dang, Shen Zhang, Xinran Li, Yi Jin, Philippe Ben-Abdallah, Jianbin Xu, Yungui Ma

    Abstract: Developing thermal analogues of the field-effect transistor could open the door to a low-power and even zero-power communication technology working with heat rather than electricity. These solid-state devices could also find many applications in the field of active thermal management in numerous technologies (microelectronics, building science, energy harvesting, conversion, ...). Recent theoretical works…

    Submitted 15 June, 2024; originally announced July 2024.

    Journal ref: Physical Review Applied 20, 024061 (2023)

  18. arXiv:2406.20095  [pdf, other]

    cs.RO cs.AI cs.CL cs.CV cs.LG

    LLaRA: Supercharging Robot Learning Data for Vision-Language Policy

    Authors: Xiang Li, Cristina Mata, Jongwoo Park, Kumara Kahatapitiya, Yoo Sung Jang, Jinghuan Shang, Kanchana Ranasinghe, Ryan Burgert, Mu Cai, Yong Jae Lee, Michael S. Ryoo

    Abstract: Large Language Models (LLMs) equipped with extensive world knowledge and strong reasoning skills can tackle diverse tasks across domains, often by posing them as conversation-style instruction-response pairs. In this paper, we propose LLaRA: Large Language and Robotics Assistant, a framework which formulates robot action policy as conversations, and provides improved responses when trained with au…

    Submitted 28 June, 2024; originally announced June 2024.

  19. arXiv:2406.20087  [pdf, other]

    cs.LG cs.AI cs.CL cs.CY cs.HC

    ProgressGym: Alignment with a Millennium of Moral Progress

    Authors: Tianyi Qiu, Yang Zhang, Xuchuan Huang, Jasmine Xinze Li, Jiaming Ji, Yaodong Yang

    Abstract: Frontier AI systems, including large language models (LLMs), hold increasing influence over the epistemology of human users. Such influence can reinforce prevailing societal values, potentially contributing to the lock-in of misguided moral beliefs and, consequently, the perpetuation of problematic moral practices on a broad scale. We introduce progress alignment as a technical solution to mitigat…

    Submitted 28 June, 2024; originally announced June 2024.

  20. arXiv:2406.20085  [pdf, other]

    cs.CV

    Auto Cherry-Picker: Learning from High-quality Generative Data Driven by Language

    Authors: Yicheng Chen, Xiangtai Li, Yining Li, Yanhong Zeng, Jianzong Wu, Xiangyu Zhao, Kai Chen

    Abstract: Diffusion-based models have shown great potential in generating high-quality images with various layouts, which can benefit downstream perception tasks. However, a fully automatic layout generation driven only by language and a suitable metric for measuring multiple generated instances has not been well explored. In this work, we present Auto Cherry-Picker (ACP), a novel framework that generates h…

    Submitted 28 June, 2024; originally announced June 2024.

    Comments: 19 pages, 7 figures

  21. arXiv:2406.19959  [pdf, other]

    cs.SD eess.AS

    RealMAN: A Real-Recorded and Annotated Microphone Array Dataset for Dynamic Speech Enhancement and Localization

    Authors: Bing Yang, Changsheng Quan, Yabo Wang, Pengyu Wang, Yujie Yang, Ying Fang, Nian Shao, Hui Bu, Xin Xu, Xiaofei Li

    Abstract: The training of deep learning-based multichannel speech enhancement and source localization systems relies heavily on the simulation of room impulse response and multichannel diffuse noise, due to the lack of large-scale real-recorded datasets. However, the acoustic mismatch between simulated and real-world data could degrade the model performance when applied in real-world scenarios. To bridge t…

    Submitted 28 June, 2024; originally announced June 2024.

  22. arXiv:2406.19905  [pdf, other]

    cs.CV

    Solving Token Gradient Conflict in Mixture-of-Experts for Large Vision-Language Model

    Authors: Longrong Yang, Dong Sheng, Chaoxiang Cai, Fan Yang, Size Li, Di Zhang, Xi Li

    Abstract: The Mixture-of-Experts (MoE) has gained increasing attention in the study of Large Vision-Language Models (LVLMs). It uses a sparse model to replace the dense model, achieving comparable performance while activating fewer parameters during inference, thus significantly reducing the inference cost. Existing MoE methods in LVLMs encourage different experts to handle different tokens, and thus they e…

    Submitted 28 June, 2024; originally announced June 2024.

  23. arXiv:2406.19823  [pdf, ps, other]

    math.CO

    Separable integer partition classes and partitions with congruence conditions

    Authors: Thomas Y. He, C. S. Huang, H. X. Li, X. Zhang

    Abstract: In this article, we first investigate the partitions whose parts are congruent to $a$ or $b$ modulo $k$ with the aid of separable integer partition classes with modulus $k$ introduced by Andrews. Then, we introduce the $(k,r)$-overpartitions in which only parts equivalent to $r$ modulo $k$ may be overlined and we will show that the number of $(k,k)$-overpartitions of $n$ equals the number of parti…

    Submitted 28 June, 2024; originally announced June 2024.

  24. arXiv:2406.19389  [pdf, other]

    cs.CV

    OMG-LLaVA: Bridging Image-level, Object-level, Pixel-level Reasoning and Understanding

    Authors: Tao Zhang, Xiangtai Li, Hao Fei, Haobo Yuan, Shengqiong Wu, Shunping Ji, Chen Change Loy, Shuicheng Yan

    Abstract: Current universal segmentation methods demonstrate strong capabilities in pixel-level image and video understanding. However, they lack reasoning abilities and cannot be controlled via text instructions. In contrast, large vision-language multimodal models exhibit powerful vision-based conversation and reasoning capabilities but lack pixel-level understanding and have difficulty accepting visual p…

    Submitted 27 June, 2024; originally announced June 2024.

  25. arXiv:2406.19369  [pdf, other]

    cs.CV

    Mamba or RWKV: Exploring High-Quality and High-Efficiency Segment Anything Model

    Authors: Haobo Yuan, Xiangtai Li, Lu Qi, Tao Zhang, Ming-Hsuan Yang, Shuicheng Yan, Chen Change Loy

    Abstract: Transformer-based segmentation methods face the challenge of efficient inference when dealing with high-resolution images. Recently, several linear attention architectures, such as Mamba and RWKV, have attracted much attention as they can process long sequences efficiently. In this work, we focus on designing an efficient segment-anything model by exploring these different architectures. Specifica…

    Submitted 27 June, 2024; originally announced June 2024.

    Comments: 16 pages; 8 figures

  26. arXiv:2406.19190  [pdf, ps, other]

    hep-ex

    Improved measurement of the semileptonic decay $D^+_{s}\to K^0 e^+ν_e$

    Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H.-R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, et al. (643 additional authors not shown)

    Abstract: Analyzing $e^+e^-$ collision data corresponding to an integrated luminosity of $7.33~\mathrm{fb}^{-1}$ collected at center-of-mass energies between 4.128 and 4.226~GeV with the BESIII detector, we measure the branching fraction of the semileptonic decay $D^+_{s}\to K^0 e^+ν_e$ to be $(2.98\pm0.23\pm0.12)\times10^{-3}$. The $D_s^+\to K^0$ hadronic form factor is determined from the differential dec…

    Submitted 27 June, 2024; originally announced June 2024.

    Comments: 13 pages, 6 figures

  27. arXiv:2406.19070  [pdf, other]

    cs.CV

    FAGhead: Fully Animate Gaussian Head from Monocular Videos

    Authors: Yixin Xuan, Xinyang Li, Gongxin Yao, Shiwei Zhou, Donghui Sun, Xiaoxin Chen, Yu Pan

    Abstract: High-fidelity reconstruction of 3D human avatars has a wide application in visual reality. In this paper, we introduce FAGhead, a method that enables fully controllable human portraits from monocular videos. We explicit the traditional 3D morphable meshes (3DMM) and optimize the neutral 3D Gaussians to reconstruct with complex expressions. Furthermore, we employ a novel Point-based Learnable Repre…

    Submitted 28 June, 2024; v1 submitted 27 June, 2024; originally announced June 2024.

  28. arXiv:2406.18856  [pdf, ps, other]

    cs.CL cs.AI cs.CE

    FFN: a Fine-grained Chinese-English Financial Domain Parallel Corpus

    Authors: Yuxin Fu, Shijing Si, Leyi Mai, Xi-ang Li

    Abstract: Large Language Models (LLMs) have stunningly advanced the field of machine translation, though their effectiveness within the financial domain remains largely underexplored. To probe this issue, we constructed a fine-grained Chinese-English parallel corpus of financial news called FFN. We acquired financial news articles spanning from January 1, 2014, to December 31, 2023, from mainstream med…

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: a simplified version of this paper is accepted by International Conference on Asian Language Processing 2024

  29. arXiv:2406.18816  [pdf, other]

    cond-mat.str-el cond-mat.mtrl-sci cond-mat.stat-mech

    Angle-dependent planar thermal Hall effect by quasi-ballistic phonons in black phosphorus

    Authors: Xiaokang Li, Xiaodong Guo, Zengwei Zhu, Kamran Behnia

    Abstract: The origin of the phonon thermal Hall effect in insulators is a matter of ongoing debate. The large amplitude of the signal in an elemental non-magnetic solid, such as black phosphorus (BP) calls for a minimal mechanism with no role for spin degree of freedom. Here, we show that a longitudinal heat flow generates a transverse temperature gradient in BP even when the magnetic field, the heat curren…

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: 10 pages, 6 figures, Supplemental Materials included

  30. arXiv:2406.18803  [pdf, other]

    physics.flu-dyn

    Non-modal growth analysis of high-speed flows over an inclined cone

    Authors: Xi Chen, Bingbing Wan, Guohua Tu, Maochang Duan, Xiaohu Li, Jianqiang Chen

    Abstract: Spatial optimal responses to both inlet disturbances and harmonic external forcing for hypersonic flows over a blunt cone at nonzero angles of attack are obtained by efficiently solving the direct-adjoint equations with a parabolic approach. In either case, the most amplified disturbances initially take the form of localized streamwise vortices on the windward side and will undergo a two-stage evo…

    Submitted 26 June, 2024; originally announced June 2024.

  31. arXiv:2406.18679  [pdf, other]

    eess.AS cs.AI cs.CL cs.LG

    Speakers Unembedded: Embedding-free Approach to Long-form Neural Diarization

    Authors: Xiang Li, Vivek Govindan, Rohit Paturi, Sundararajan Srinivasan

    Abstract: End-to-end neural diarization (EEND) models offer significant improvements over traditional embedding-based Speaker Diarization (SD) approaches but fall short on generalizing to long-form audio with a large number of speakers. The EEND-vector-clustering method mitigates this by combining local EEND with global clustering of speaker embeddings from local windows, but this requires an additional speaker…

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: Accepted at INTERSPEECH 2024

  32. arXiv:2406.18610  [pdf, other]

    cs.CV

    Vox-UDA: Voxel-wise Unsupervised Domain Adaptation for Cryo-Electron Subtomogram Segmentation with Denoised Pseudo Labeling

    Authors: Haoran Li, Xingjian Li, Jiahua Shi, Huaming Chen, Bo Du, Daisuke Kihara, Johan Barthelemy, Jun Shen, Min Xu

    Abstract: Cryo-Electron Tomography (cryo-ET) is a 3D imaging technology facilitating the study of macromolecular structures at near-atomic resolution. Recent volumetric segmentation approaches on cryo-ET images have drawn widespread interest in biological sector. However, existing methods heavily rely on manually labeled data, which requires highly professional skills, thereby hindering the adoption of full…

    Submitted 30 June, 2024; v1 submitted 24 June, 2024; originally announced June 2024.

    Comments: 11 pages

  33. arXiv:2406.18575  [pdf]

    cs.CV cs.LG

    Research on Driver Facial Fatigue Detection Based on Yolov8 Model

    Authors: Chang Zhou, Yang Zhao, Shaobo Liu, Yi Zhao, Xingchen Li, Chiyu Cheng

    Abstract: In a society where traffic accidents frequently occur, fatigue driving has emerged as a grave issue. Fatigue driving detection technology, especially those based on the YOLOv8 deep learning model, has seen extensive research and application as an effective preventive measure. This paper discusses in depth the methods and technologies utilized in the YOLOv8 model to detect driver fatigue, elaborate…

    Submitted 4 June, 2024; originally announced June 2024.

    Comments: Accepted by the 5th International Conference on Information Science, Parallel and Distributed Systems (ISPDS 2024), 2024 IEEE

  34. arXiv:2406.18351  [pdf, other]

    cs.LG cs.AI

    Reinforcement Learning with Intrinsically Motivated Feedback Graph for Lost-sales Inventory Control

    Authors: Zifan Liu, Xinran Li, Shibo Chen, Gen Li, Jiashuo Jiang, Jun Zhang

    Abstract: Reinforcement learning (RL) has proven to be well-performed and general-purpose in the inventory control (IC). However, further improvement of RL algorithms in the IC domain is impeded due to two limitations of online experience. First, online experience is expensive to acquire in real-world applications. With the low sample efficiency nature of RL algorithms, it would take extensive time to train…

    Submitted 26 June, 2024; originally announced June 2024.

  35. arXiv:2406.18311  [pdf, other]

    cs.LG

    Online Learning of Multiple Tasks and Their Relationships: Testing on Spam Email Data and EEG Signals Recorded in Construction Fields

    Authors: Yixin Jin, Wenjing Zhou, Meiqi Wang, Meng Li, Xintao Li, Tianyu Hu, Xingyuan Bu

    Abstract: This paper examines an online multi-task learning (OMTL) method, which processes data sequentially to predict labels across related tasks. The framework learns task weights and their relatedness concurrently. Unlike previous models that assumed static task relatedness, our approach treats tasks as initially independent, updating their relatedness iteratively using newly calculated weight vectors.…

    Submitted 26 June, 2024; originally announced June 2024.

  36. arXiv:2406.18183  [pdf, other]

    hep-ex

    Measurement of the cross sections of $e^+e^-\to K^{-}\bar{Ξ}^{+}Λ/Σ^{0}$ at center-of-mass energies between 3.510 and 4.914 GeV

    Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H.-R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, et al. (638 additional authors not shown)

    Abstract: Using $e^+e^-$ collision data collected with the BESIII detector at the BEPCII collider at center-of-mass energies between 3.510 and 4.914 GeV, corresponding to an integrated luminosity of 25 fb$^{-1}$, we measure the Born cross sections for the process $e^+e^-\to K^-\bar{Ξ}^+Λ/Σ^{0}$ at thirty-five energy points with a partial-reconstruction strategy. By fitting the dressed cross sections of…

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: 26 pages, 5 tables, 4 figures

  37. arXiv:2406.18137  [pdf, ps, other]

    stat.ML cs.LG

    Sparse deep neural networks for nonparametric estimation in high-dimensional sparse regression

    Authors: Dongya Wu, Xin Li

    Abstract: Generalization theory has been established for sparse deep neural networks under high-dimensional regime. Beyond generalization, parameter estimation is also important since it is crucial for variable selection and interpretability of deep neural networks. Current theoretical studies concerning parameter estimation mainly focus on two-layer neural networks, which is due to the fact that the conver…

    Submitted 26 June, 2024; originally announced June 2024.

  38. arXiv:2406.18130  [pdf, other]

    cs.ET

    Designing Unit Ising Models for Logic Gate Simulation through Integer Linear Programming

    Authors: Shunsuke Tsukiyama, Koji Nakano, Xiaotian Li, Yasuaki Ito, Takumi Kato, Yuya Kawamata

    Abstract: An Ising model is defined by a quadratic objective function known as the Hamiltonian, composed of spin variables that can take values of either $-1$ or $+1$. The goal is to assign spin values to these variables in a way that minimizes the value of the Hamiltonian. Ising models are instrumental in tackling many combinatorial optimization problems, leading to significant research in developing solve…

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: 16 pages, 9 figures, 2 tables

  39. arXiv:2406.18083  [pdf, other]

    hep-ex

    Measurements of $K_S^0$-$K_L^0$ asymmetries in the decays $Λ_c^+ \to pK_{L,S}^0$, $pK_{L,S}^0π^+π^-$ and $pK_{L,S}^0π^0$

    Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H.-R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, et al. (643 additional authors not shown)

    Abstract: Using $e^+e^-$ annihilation data sets corresponding to an integrated luminosity of 4.5 $\text{fb}^{-1}$, collected with the BESIII detector at center-of-mass energies between 4.600 and 4.699 GeV, we report the first measurements of the absolute branching fractions $\mathcal{B}(Λ_c^+\to pK_{L}^{0})=(1.67 \pm 0.06 \pm 0.04)\%$, $\mathcal{B}(Λ_c^+\to pK_{L}^{0}π^+π^-)=(1.69 \pm 0.10 \pm 0.05)\%$, an…

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: 19 pages, 2 figures

  40. arXiv:2406.18048  [pdf, other]

    cs.CV

    ScanFormer: Referring Expression Comprehension by Iteratively Scanning

    Authors: Wei Su, Peihan Miao, Huanzhang Dou, Xi Li

    Abstract: Referring Expression Comprehension (REC) aims to localize the target objects specified by free-form natural language descriptions in images. While state-of-the-art methods achieve impressive performance, they perform a dense perception of images, which incorporates redundant visual regions unrelated to linguistic queries, leading to additional computational overhead. This inspires us to explore a…

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: Accepted by CVPR2024

  41. arXiv:2406.18045  [pdf, other]

    cs.CL cs.AI

    PharmGPT: Domain-Specific Large Language Models for Bio-Pharmaceutical and Chemistry

    Authors: Linqing Chen, Weilei Wang, Zilong Bai, Peng Xu, Yan Fang, Jie Fang, Wentao Wu, Lizhi Zhou, Ruiji Zhang, Yubin Xia, Chaobo Xu, Ran Hu, Licong Xu, Qijun Cai, Haoran Hua, Jing Sun, Jin Liu, Tian Qiu, Haowen Liu, Meng Hu, Xiuwen Li, Fei Gao, Yufu Wang, Lin Tie, Chaochao Wang, et al. (9 additional authors not shown)

    Abstract: Large language models (LLMs) have revolutionized Natural Language Processing (NLP) by minimizing the need for complex feature engineering. However, the application of LLMs in specialized domains like biopharmaceuticals and chemistry remains largely unexplored. These fields are characterized by intricate terminologies, specialized knowledge, and a high demand for precision, areas where general pu…

    Submitted 25 June, 2024; originally announced June 2024.

  42. arXiv:2406.17898  [pdf, other]

    cs.RO cs.AI

    Human-centered In-building Embodied Delivery Benchmark

    Authors: Zhuoqun Xu, Yang Liu, Xiaoqi Li, Jiyao Zhang, Hao Dong

    Abstract: Recently, the concept of embodied intelligence has been widely accepted and popularized, leading people to naturally consider the potential for commercialization in this field. In this work, we propose a specific commercial scenario simulation, human-centered in-building embodied delivery. Furthermore, for this scenario, we have developed a brand-new virtual environment system from scratch, constr…

    Submitted 25 June, 2024; originally announced June 2024.

  43. arXiv:2406.17862  [pdf, other]

    cs.LO

    ESBMC v7.6: Enhanced Model Checking of C++ Programs with Clang AST

    Authors: Xianzhiyu Li, Kunjian Song, Mikhail R. Gadelha, Franz Brauße, Rafael S. Menezes, Konstantin Korovin, Lucas C. Cordeiro

    Abstract: This paper presents Efficient SMT-Based Context-Bounded Model Checker (ESBMC) v7.6, an extended version based on previous work on ESBMC v7.3 by K. Song et al. The v7.3 introduced a new Clang-based C++ front-end to address the challenges posed by modern C++ programs. Although the new front-end has demonstrated significant potential in previous studies, it remains in the developmental stage and lack…

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: 27 pages, 2 figures. arXiv admin note: substantial text overlap with arXiv:2308.05649

  44. arXiv:2406.17806  [pdf, other]

    cs.CL cs.AI cs.CR cs.CV cs.LG

    MOSSBench: Is Your Multimodal Language Model Oversensitive to Safe Queries?

    Authors: Xirui Li, Hengguang Zhou, Ruochen Wang, Tianyi Zhou, Minhao Cheng, Cho-Jui Hsieh

    Abstract: Humans are prone to cognitive distortions -- biased thinking patterns that lead to exaggerated responses to specific stimuli, albeit in very different contexts. This paper demonstrates that advanced Multimodal Large Language Models (MLLMs) exhibit similar tendencies. While these models are designed to respond to queries under safety mechanisms, they sometimes reject harmless queries in the presence of…

    Submitted 22 June, 2024; originally announced June 2024.

  45. arXiv:2406.17770  [pdf, other]

    cs.CV

    MG-LLaVA: Towards Multi-Granularity Visual Instruction Tuning

    Authors: Xiangyu Zhao, Xiangtai Li, Haodong Duan, Haian Huang, Yining Li, Kai Chen, Hua Yang

    Abstract: Multi-modal large language models (MLLMs) have made significant strides in various visual understanding tasks. However, the majority of these models are constrained to process low-resolution images, which limits their effectiveness in perception tasks that necessitate detailed visual information. In our study, we present MG-LLaVA, an innovative MLLM that enhances the model's visual processing capa…

    Submitted 26 June, 2024; v1 submitted 25 June, 2024; originally announced June 2024.

  46. arXiv:2406.17758  [pdf, other]

    cs.CV

    MotionBooth: Motion-Aware Customized Text-to-Video Generation

    Authors: Jianzong Wu, Xiangtai Li, Yanhong Zeng, Jiangning Zhang, Qianyu Zhou, Yining Li, Yunhai Tong, Kai Chen

    Abstract: In this work, we present MotionBooth, an innovative framework designed for animating customized subjects with precise control over both object and camera movements. By leveraging a few images of a specific object, we efficiently fine-tune a text-to-video model to capture the object's shape and attributes accurately. Our approach presents subject region loss and video preservation loss to enhance t…

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: Project page at https://jianzongwu.github.io/projects/motionbooth

  47. arXiv:2406.17601  [pdf, other]

    cs.CV

    Director3D: Real-world Camera Trajectory and 3D Scene Generation from Text

    Authors: Xinyang Li, Zhangyu Lai, Linning Xu, Yansong Qu, Liujuan Cao, Shengchuan Zhang, Bo Dai, Rongrong Ji

    Abstract: Recent advancements in 3D generation have leveraged synthetic datasets with ground truth 3D assets and predefined cameras. However, the potential of adopting real-world datasets, which can produce significantly more realistic 3D scenes, remains largely unexplored. In this work, we delve into the key challenge of the complex and scene-specific camera trajectories found in real-world captures. We in…

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: Code: https://github.com/imlixinyang/director3d

  48. arXiv:2406.17452  [pdf, ps, other]

    hep-ex

    Study of the $f_{0}(980)$ through the decay $D_{s}^{+}\rightarrow π^{+}π^{+}π^{-}π^{0}$

    Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H.-R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, et al. (649 additional authors not shown)

    Abstract: We perform the first amplitude analysis of $D^+_s \to π^+π^+π^-π^0$ decays, based on data samples of electron-positron collisions recorded with the BESIII detector at center-of-mass energies between 4.128 and 4.226 GeV, corresponding to an integrated luminosity of 7.33~fb$^{-1}$. We report the observation of $D_{s}^{+} \to f_0(980)ρ(770)^{+}$ with a statistical significance greater than 10$σ$ and…

    Submitted 25 June, 2024; originally announced June 2024.

  49. arXiv:2406.17297  [pdf, other]

    cs.CV cs.AI

    Towards Open-set Camera 3D Object Detection

    Authors: Zhuolin He, Xinrun Li, Heng Gao, Jiachen Tang, Shoumeng Qiu, Wenfu Wang, Lvjian Lu, Xuchong Qiu, Xiangyang Xue, Jian Pu

    Abstract: Traditional camera 3D object detectors are typically trained to recognize a predefined set of known object classes. In real-world scenarios, these detectors may encounter unknown objects outside the training categories and fail to identify them correctly. To address this gap, we present OS-Det3D (Open-set Camera 3D Object Detection), a two-stage training framework enhancing the ability of camera 3…

    Submitted 26 June, 2024; v1 submitted 25 June, 2024; originally announced June 2024.

  50. arXiv:2406.17289  [pdf, other]

    cs.IR cs.AI

    Hyperbolic Knowledge Transfer in Cross-Domain Recommendation System

    Authors: Xin Yang, Heng Chang, Zhijian La, **ze Yang, Xingrun Li, Yu Lu, Shuaiqiang Wang, Dawei Yin, Erxue Min

    Abstract: Cross-Domain Recommendation (CDR) seeks to utilize knowledge from different domains to alleviate the problem of data sparsity in the target recommendation domain, and it has been gaining more attention in recent years. Although there have been notable advancements in this area, most current methods represent users and items in Euclidean space, which is not ideal for handling long-tail distributed…

    Submitted 25 June, 2024; originally announced June 2024.