Showing 101–150 of 9,784 results for author: Li, H

  1. arXiv:2406.11496  [pdf, other]

    cs.MA

    Decentralized Collaborative Pricing and Shunting for Multiple EV Charging Stations Based on Multi-Agent Reinforcement Learning

    Authors: Tianhao Bu, Hang Li, Guojie Li

    Abstract: The extraordinary electric vehicle (EV) popularization in the recent years has facilitated research studies in alleviating EV energy charging demand. Previous studies primarily focused on the optimizations over charging stations (CS) profit and EV users cost savings through charge/discharge scheduling events. In this work, the random behaviors of EVs are considered, with EV users preferences over…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: Accepted at PSGEC: PSGEC2024

  2. arXiv:2406.11401  [pdf, other]

    eess.AS

    An Exploration of Length Generalization in Transformer-Based Speech Enhancement

    Authors: Qiquan Zhang, Hongxu Zhu, Xinyuan Qian, Eliathamby Ambikairajah, Haizhou Li

    Abstract: The use of Transformer architectures has facilitated remarkable progress in speech enhancement. Training Transformers using substantially long speech utterances is often infeasible as self-attention suffers from quadratic complexity. It is a critical and unexplored challenge for a Transformer-based speech enhancement model to learn from short speech utterances and generalize to longer ones. In thi…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: Accepted by INTERSPEECH 2024

  3. Parameter effects on the total intensity of H I Lyα line for a modelled coronal mass ejection and its driven shock

    Authors: Beili Ying, Guanglu Shi, Li Feng, Lei Lu, Jianchao Xue, Shuting Li, Weiqun Gan, Hui Li

    Abstract: The combination of the H I Lyα (121.6 nm) line formation mechanism with ultraviolet (UV) Lyα and white-light (WL) observations provides an effective method for determining the electron temperature of coronal mass ejections (CMEs). A key to ensuring the accuracy of this diagnostic technique is the precise calculation of theoretical Lyα intensities. This study performs a modelled CME and its driven…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: 22 pages, 9 figures, accepted by Solar Physics

  4. arXiv:2406.11149  [pdf, other]

    cs.CL cs.CR

    GoldCoin: Grounding Large Language Models in Privacy Laws via Contextual Integrity Theory

    Authors: Wei Fan, Haoran Li, Zheye Deng, Weiqi Wang, Yangqiu Song

    Abstract: Privacy issues arise prominently during the inappropriate transmission of information between entities. Existing research primarily studies privacy by exploring various privacy attacks, defenses, and evaluations within narrowly predefined patterns, while neglecting that privacy is not an isolated, context-free concept limited to traditionally sensitive data (e.g., social security numbers), but int…

    Submitted 16 June, 2024; originally announced June 2024.

  5. arXiv:2406.10977  [pdf, other]

    cs.CL cs.AI

    Toward Optimal LLM Alignments Using Two-Player Games

    Authors: Rui Zheng, Hongyi Guo, Zhihan Liu, Xiaoying Zhang, Yuanshun Yao, Xiaojun Xu, Zhaoran Wang, Zhiheng Xi, Tao Gui, Qi Zhang, Xuanjing Huang, Hang Li, Yang Liu

    Abstract: The standard Reinforcement Learning from Human Feedback (RLHF) framework primarily focuses on optimizing the performance of large language models using pre-collected prompts. However, collecting prompts that provide comprehensive coverage is both tedious and challenging, and often fails to include scenarios that LLMs need to improve on the most. In this paper, we investigate alignment through the…

    Submitted 16 June, 2024; originally announced June 2024.

    Comments: Our code is released at https://github.com/ruizheng20/gpo

    MSC Class: 68

  6. arXiv:2406.10885  [pdf, other]

    cs.CL

    On the Role of Entity and Event Level Conceptualization in Generalizable Reasoning: A Survey of Tasks, Methods, Applications, and Future Directions

    Authors: Weiqi Wang, Tianqing Fang, Haochen Shi, Baixuan Xu, Wenxuan Ding, Liyu Zhang, Wei Fan, Jiaxin Bai, Haoran Li, Xin Liu, Yangqiu Song

    Abstract: Entity- and event-level conceptualization, as fundamental elements of human cognition, plays a pivotal role in generalizable reasoning. This process involves abstracting specific instances into higher-level concepts and forming abstract knowledge that can be applied in unfamiliar or novel situations, which can enhance models' inferential capabilities and support the effective transfer of knowledge…

    Submitted 16 June, 2024; originally announced June 2024.

  7. arXiv:2406.10866  [pdf, ps, other]

    math.SG math.DG

    K-contact manifolds with minimal closed Reeb orbits

    Authors: Hui Li

    Abstract: We use the Boothby-Wang fibration to construct certain simply connected K-contact manifolds and we give sufficient and necessary conditions on when such K-contact manifolds are homeomorphic to the odd dimensional spheres. For the symplectic base manifold of the fibration which admits a Hamiltonian torus action, we show that the total space of the fibration admits other than the above K-contact str…

    Submitted 16 June, 2024; originally announced June 2024.

    MSC Class: Primary: 53D10; 53D05; 53D20; Secondary: 55N10; 57R20

  8. arXiv:2406.10844  [pdf, other]

    eess.AS cs.SD

    Multi-Scale Accent Modeling with Disentangling for Multi-Speaker Multi-Accent TTS Synthesis

    Authors: Xuehao Zhou, Mingyang Zhang, Yi Zhou, Zhizheng Wu, Haizhou Li

    Abstract: Synthesizing speech across different accents while preserving the speaker identity is essential for various real-world customer applications. However, the individual and accurate modeling of accents and speakers in a text-to-speech (TTS) system is challenging due to the complexity of accent variations and the intrinsic entanglement between the accent and speaker identity. In this paper, we present…

    Submitted 16 June, 2024; originally announced June 2024.

  9. arXiv:2406.10744   

    cs.CV

    Technique Report of CVPR 2024 PBDL Challenges

    Authors: Ying Fu, Yu Li, Shaodi You, Boxin Shi, Jose Alvarez, Coert van Gemeren, Linwei Chen, Yunhao Zou, Zichun Wang, Yichen Li, Yuze Han, Yingkai Zhang, Jianan Wang, Qinglin Liu, Wei Yu, Xiaoqian Lv, Jianing Li, Shengping Zhang, Xiangyang Ji, Yuanpei Chen, Yuhan Zhang, Weihang Peng, Liwen Zhang, Zhe Xu, Dingyong Gou, et al. (77 additional authors not shown)

    Abstract: The intersection of physics-based vision and deep learning presents an exciting frontier for advancing computer vision technologies. By leveraging the principles of physics to inform and enhance deep learning models, we can develop more robust and accurate vision systems. Physics-based vision aims to invert the processes to recover scene properties such as shape, reflectance, light distribution, a…

    Submitted 27 June, 2024; v1 submitted 15 June, 2024; originally announced June 2024.

    Comments: The author list and contents need to be verified by all authors

  10. arXiv:2406.10667  [pdf, other]

    cs.LG

    UniZero: Generalized and Efficient Planning with Scalable Latent World Models

    Authors: Yuan Pu, Yazhe Niu, Jiyuan Ren, Zhenjie Yang, Hongsheng Li, Yu Liu

    Abstract: Learning predictive world models is essential for enhancing the planning capabilities of reinforcement learning agents. Notably, the MuZero-style algorithms, based on the value equivalence principle and Monte Carlo Tree Search (MCTS), have achieved superhuman performance in various domains. However, in environments that require capturing long-term dependencies, MuZero's performance deteriorates ra…

    Submitted 15 June, 2024; originally announced June 2024.

    Comments: 32 pages, 16 figures

  11. CrossFuse: A Novel Cross Attention Mechanism based Infrared and Visible Image Fusion Approach

    Authors: Hui Li, Xiao-Jun Wu

    Abstract: Multimodal visual information fusion aims to integrate the multi-sensor data into a single image which contains more complementary information and less redundant features. However the complementary information is hard to extract, especially for infrared and visible images which contain big similarity gap between these two modalities. The common cross attention modules only consider the correlation…

    Submitted 15 June, 2024; originally announced June 2024.

    Comments: 11 pages, 16 figures

  12. arXiv:2406.10536  [pdf, other]

    physics.comp-ph cond-mat.mtrl-sci

    Universal materials model of deep-learning density functional theory Hamiltonian

    Authors: Yuxiang Wang, Yang Li, Zechen Tang, He Li, Zilong Yuan, Honggeng Tao, Nianlong Zou, Ting Bao, Xinghao Liang, Zezhou Chen, Shanghua Xu, Ce Bian, Zhiming Xu, Chong Wang, Chen Si, Wenhui Duan, Yong Xu

    Abstract: Realizing large materials models has emerged as a critical endeavor for materials research in the new era of artificial intelligence, but how to achieve this fantastic and challenging objective remains elusive. Here, we propose a feasible pathway to address this paramount pursuit by developing universal materials models of deep-learning density functional theory Hamiltonian (DeepH), enabling compu…

    Submitted 15 June, 2024; originally announced June 2024.

  13. arXiv:2406.10527  [pdf, other]

    cs.CV

    Panoptic-FlashOcc: An Efficient Baseline to Marry Semantic Occupancy with Panoptic via Instance Center

    Authors: Zichen Yu, Changyong Shu, Qianpu Sun, Junjie Linghu, Xiaobao Wei, Jiangyong Yu, Zongdai Liu, Dawei Yang, Hui Li, Yan Chen

    Abstract: Panoptic occupancy poses a novel challenge by aiming to integrate instance occupancy and semantic occupancy within a unified framework. However, there is still a lack of efficient solutions for panoptic occupancy. In this paper, we propose Panoptic-FlashOcc, a straightforward yet robust 2D feature framework that enables realtime panoptic occupancy. Building upon the lightweight design of FlashOcc,…

    Submitted 15 June, 2024; originally announced June 2024.

  14. arXiv:2406.10519  [pdf, other]

    cs.CV cs.AI

    Self Pre-training with Topology- and Spatiality-aware Masked Autoencoders for 3D Medical Image Segmentation

    Authors: Pengfei Gu, Yejia Zhang, Huimin Li, Hongxiao Wang, Yizhe Zhang, Chaoli Wang, Danny Z. Chen

    Abstract: Masked Autoencoders (MAEs) have been shown to be effective in pre-training Vision Transformers (ViTs) for natural and medical image analysis problems. By reconstructing missing pixel/voxel information in visible patches, a ViT encoder can aggregate contextual information for downstream tasks. But, existing MAE pre-training methods, which were specifically developed with the ViT architecture, lack…

    Submitted 15 June, 2024; originally announced June 2024.

  15. arXiv:2406.10504  [pdf, other]

    cs.AI cs.CL cs.LG

    Task Facet Learning: A Structured Approach to Prompt Optimization

    Authors: Gurusha Juneja, Nagarajan Natarajan, Hua Li, Jian Jiao, Amit Sharma

    Abstract: Given a task in the form of a basic description and its training examples, prompt optimization is the problem of synthesizing the given information into a text prompt for a large language model (LLM). Humans solve this problem by also considering the different facets that define a task (e.g., counter-examples, explanations, analogies) and including them in the prompt. However, it is unclear whethe…

    Submitted 15 June, 2024; originally announced June 2024.

  16. arXiv:2406.10501  [pdf, other]

    cs.CV

    Self-Supervised Representation Learning with Spatial-Temporal Consistency for Sign Language Recognition

    Authors: Weichao Zhao, Wengang Zhou, Hezhen Hu, Min Wang, Houqiang Li

    Abstract: Recently, there have been efforts to improve the performance in sign language recognition by designing self-supervised learning methods. However, these methods capture limited information from sign pose data in a frame-wise learning manner, leading to sub-optimal solutions. To this end, we propose a simple yet effective self-supervised contrastive learning framework to excavate rich context via sp…

    Submitted 15 June, 2024; originally announced June 2024.

    Comments: Accepted by TIP2023

  17. arXiv:2406.10338  [pdf, other]

    astro-ph.GA astro-ph.CO astro-ph.HE

    Bursty Star Formation in Dwarfs is Sensitive to Numerical Choices in Supernova Feedback Models

    Authors: Eric Zhang, Laura V Sales, Federico Marinacci, Paul Torrey, Mark Vogelsberger, Volker Springel, Hui Li, Rüdiger Pakmor, Thales A Gutcke

    Abstract: Simulations of galaxy formation are mostly unable to resolve the energy-conserving phase of individual supernova events, having to resort to subgrid models to distribute the energy and momentum resulting from stellar feedback. However, the properties of these simulated galaxies, including the morphology, stellar mass formed and the burstiness of the star formation history, are highly sensitive to…

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: Submitted ApJ; 15 pages, 12 figures; comments welcome

  18. arXiv:2406.10118  [pdf, other]

    cs.CL

    SEACrowd: A Multilingual Multimodal Data Hub and Benchmark Suite for Southeast Asian Languages

    Authors: Holy Lovenia, Rahmad Mahendra, Salsabil Maulana Akbar, Lester James V. Miranda, Jennifer Santoso, Elyanah Aco, Akhdan Fadhilah, Jonibek Mansurov, Joseph Marvin Imperial, Onno P. Kampman, Joel Ruben Antony Moniz, Muhammad Ravi Shulthan Habibi, Frederikus Hudi, Railey Montalan, Ryan Ignatius, Joanito Agili Lopo, William Nixon, Börje F. Karlsson, James Jaya, Ryandito Diandaru, Yuze Gao, Patrick Amadeus, Bin Wang, Jan Christian Blaise Cruz, Chenxi Whitehouse, et al. (36 additional authors not shown)

    Abstract: Southeast Asia (SEA) is a region rich in linguistic diversity and cultural variety, with over 1,300 indigenous languages and a population of 671 million people. However, prevailing AI models suffer from a significant lack of representation of texts, images, and audio datasets from SEA, compromising the quality of AI models for SEA languages. Evaluating models for SEA languages is challenging due t…

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: https://github.com/SEACrowd

  19. arXiv:2406.09902  [pdf]

    physics.bio-ph

    Flight-Scope: microscopy with microfluidics in microgravity

    Authors: Thomas Wareing, Alexander Stokes, Katrina Crompton, Koren Murphy, Jack Dawson, Yusuf Furkan Ugurluoglu, Connor Richardson, Hongquan Li, Manu Prakash, Adam J. M. Wollman

    Abstract: With the European Space Agency (ESA) and NASA working to return humans to the moon and onwards to Mars, it has never been more important to study the impact of altered gravity conditions on biological organisms. These include astronauts but also useful micro-organisms they may bring with them to produce food, medicine, and other useful compounds by synthetic biology. Parabolic flights are one of t…

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: 22 pages, 5 figures

  20. arXiv:2406.09742  [pdf, other]

    cs.IR

    IFA: Interaction Fidelity Attention for Entire Lifelong Behaviour Sequence Modeling

    Authors: Wenhui Yu, Chao Feng, Yanze Zhang, Lantao Hu, Peng Jiang, Han Li

    Abstract: The lifelong user behavior sequence provides abundant information of user preference and gains impressive improvement in the recommendation task, however increases computational consumption significantly. To meet the severe latency requirement in online service, a short sub-sequence is sampled based on similarity to the target item. Unfortunately, items not in the sub-sequence are abandoned, leadi…

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: 7 pages, 2 figures

  21. arXiv:2406.09679  [pdf, other]

    cs.CV

    Exploring Training on Heterogeneous Data with Mixture of Low-rank Adapters

    Authors: Yuhang Zhou, Zihua Zhao, Haolin Li, Siyuan Du, Jiangchao Yao, Ya Zhang, Yanfeng Wang

    Abstract: Training a unified model to take multiple targets into account is a trend towards artificial general intelligence. However, how to efficiently mitigate the training conflicts among heterogeneous data collected from different domains or tasks remains under-explored. In this study, we explore to leverage Mixture of Low-rank Adapters (MoLA) to mitigate conflicts in heterogeneous data training, which…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: ICML2024

  22. arXiv:2406.09475  [pdf, other]

    hep-ex

    Search for $X(1870)$ via the decay $J/ψ\to ωK^+ K^-η$

    Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, et al. (644 additional authors not shown)

    Abstract: Using a sample of $(10087\pm 44)\times10^{6}$ $J/ψ$ events collected by the BESIII detector at the BEPCII collider, we search for the decay $X(1870)\to K^+ K^-η$ via the $J/ψ\to ωK^+ K^- η$ process for the first time. No significant $X(1870)$ signal is observed. The upper limit on the branching fraction of the decay $J/ψ\to ωX(1870) \to ωK^+ K^- η$ is determined to be $9.55\times 10^{-7}$ at the…

    Submitted 13 June, 2024; originally announced June 2024.

  23. arXiv:2406.09412  [pdf, other]

    cs.CV cs.AI cs.LG cs.MM

    Explore the Limits of Omni-modal Pretraining at Scale

    Authors: Yiyuan Zhang, Handong Li, Jing Liu, Xiangyu Yue

    Abstract: We propose to build omni-modal intelligence, which is capable of understanding any modality and learning universal representations. In specific, we propose a scalable pretraining paradigm, named Multimodal Context (MiCo), which can scale up the numbers of modalities and amount of data, together with the model parameters, in the pretraining process. With MiCo, the pretrained models show significant…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: Project Website: https://invictus717.github.io/MiCo/

  24. arXiv:2406.09410  [pdf, other]

    cs.CV cs.AI

    Scene Graph Generation in Large-Size VHR Satellite Imagery: A Large-Scale Dataset and A Context-Aware Approach

    Authors: Yansheng Li, Linlin Wang, Tingzhu Wang, Xue Yang, Junwei Luo, Qi Wang, Youming Deng, Wenbin Wang, Xian Sun, Haifeng Li, Bo Dang, Yongjun Zhang, Yi Yu, Junchi Yan

    Abstract: Scene graph generation (SGG) in satellite imagery (SAI) benefits promoting intelligent understanding of geospatial scenarios from perception to cognition. In SAI, objects exhibit great variations in scales and aspect ratios, and there exist rich relationships between objects (even between spatially disjoint objects), which makes it necessary to holistically conduct SGG in large-size very-high-reso…

    Submitted 30 June, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

    Comments: This paper releases a SAI-oriented SGG toolkit with about 30 OBD methods and 10 SGG methods, and develops a benchmark based on RSG where our HOD-Net and RPCM significantly outperform the state-of-the-art methods in both OBD and SGG tasks. The RSG dataset and SAI-oriented toolkit will be made publicly available at https://linlin-dev.github.io/project/RSG

  25. arXiv:2406.09201  [pdf, other]

    cs.CV

    Enhanced Object Detection: A Study on Vast Vocabulary Object Detection Track for V3Det Challenge 2024

    Authors: Peixi Wu, Bosong Chai, Xuan Nie, Longquan Yan, Zeyu Wang, Qifan Zhou, Boning Wang, Yansong Peng, Hebei Li

    Abstract: In this technical report, we present our findings from the research conducted on the Vast Vocabulary Visual Detection (V3Det) dataset for Supervised Vast Vocabulary Visual Detection task. How to deal with complex categories and detection boxes has become a difficulty in this track. The original supervised detector is not suitable for this task. We have designed a series of improvements, including…

    Submitted 21 June, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

    Journal ref: Second Place in CVPR 2024 Vast Vocabulary Visual Detection Challenge

  26. arXiv:2406.08868  [pdf, other]

    cond-mat.quant-gas

    Dissipative Superfluidity in a Molecular Bose-Einstein Condensate

    Authors: Hongchao Li, Xie-Hang Yu, Masaya Nakagawa, Masahito Ueda

    Abstract: Motivated by recent experimental realization of a Bose-Einstein condensate (BEC) of dipolar molecules, we develop superfluid transport theory for a dissipative BEC to show that a weak uniform two-body loss can induce phase rigidity, leading to superfluid transport of bosons. A generalized f-sum rule is shown to hold for a dissipative superfluid as a consequence of weak U(1) symmetry. It is also de…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: 6 pages, 1 figure + 27 pages, 2 figures

  27. arXiv:2406.08801  [pdf, other]

    cs.CV

    Hallo: Hierarchical Audio-Driven Visual Synthesis for Portrait Image Animation

    Authors: Mingwang Xu, Hui Li, Qingkun Su, Hanlin Shang, Liwei Zhang, Ce Liu, Jingdong Wang, Yao Yao, Siyu Zhu

    Abstract: The field of portrait image animation, driven by speech audio input, has experienced significant advancements in the generation of realistic and dynamic portraits. This research delves into the complexities of synchronizing facial movements and creating visually appealing, temporally consistent animations within the framework of diffusion-based methodologies. Moving away from traditional paradigms…

    Submitted 16 June, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

    Comments: 20 pages

  28. arXiv:2406.08755  [pdf, other]

    quant-ph

    Solving Fractional Differential Equations on a Quantum Computer: A Variational Approach

    Authors: Fong Yew Leong, Dax Enshan Koh, Jian Feng Kong, Siong Thye Goh, Jun Yong Khoo, Wei-Bin Ewe, Hongying Li, Jayne Thompson, Dario Poletti

    Abstract: We introduce an efficient variational hybrid quantum-classical algorithm designed for solving Caputo time-fractional partial differential equations. Our method employs an iterable cost function incorporating a linear combination of overlap history states. The proposed algorithm is not only efficient in time complexity, but has lower memory costs compared to classical methods. Our results indicate…

    Submitted 12 June, 2024; originally announced June 2024.

  29. arXiv:2406.08698  [pdf, other]

    astro-ph.HE hep-ph

    Constraints on Ultra Heavy Dark Matter Properties from Dwarf Spheroidal Galaxies with LHAASO Observations

    Authors: Zhen Cao, F. Aharonian, Q. An, Axikegu, Y. X. Bai, Y. W. Bao, D. Bastieri, X. J. Bi, Y. J. Bi, J. T. Cai, Q. Cao, W. Y. Cao, Zhe Cao, J. Chang, J. F. Chang, A. M. Chen, E. S. Chen, Liang Chen, Lin Chen, Long Chen, M. J. Chen, M. L. Chen, Q. H. Chen, S. H. Chen, S. Z. Chen, et al. (255 additional authors not shown)

    Abstract: In this work we try to search for signals generated by ultra-heavy dark matter at the Large High Altitude Air Shower Observatory (LHAASO) data. We look for possible gamma-ray by dark matter annihilation or decay from 16 dwarf spheroidal galaxies in the field of view of LHAASO. Dwarf spheroidal galaxies are among the most promising targets for indirect detection of dark matter which have low fluxes…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: 17 pages, 12 figures, accepted by PRL

  30. arXiv:2406.08266  [pdf, other]

    eess.AS cs.SD

    Refining Self-Supervised Learnt Speech Representation using Brain Activations

    Authors: Hengyu Li, Kangdi Mei, Zhaoci Liu, Yang Ai, Liping Chen, Jie Zhang, Zhenhua Ling

    Abstract: It was shown in literature that speech representations extracted by self-supervised pre-trained models exhibit similarities with brain activations of human for speech perception and fine-tuning speech representation models on downstream tasks can further improve the similarity. However, it still remains unclear if this similarity can be used to optimize the pre-trained speech models. In this work,…

    Submitted 13 June, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

    Comments: Accepted by Interspeech 2024

  31. arXiv:2406.08255  [pdf, other]

    cs.CL

    M3T: A New Benchmark Dataset for Multi-Modal Document-Level Machine Translation

    Authors: Benjamin Hsu, Xiaoyu Liu, Huayang Li, Yoshinari Fujinuma, Maria Nadejde, Xing Niu, Yair Kittenplon, Ron Litman, Raghavendra Pappagari

    Abstract: Document translation poses a challenge for Neural Machine Translation (NMT) systems. Most document-level NMT systems rely on meticulously curated sentence-level parallel data, assuming flawless extraction of text from documents along with their precise reading order. These systems also tend to disregard additional visual cues such as the document layout, deeming it irrelevant. However, real-world…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: NAACL 2024, dataset at https://github.com/amazon-science/m3t-multi-modal-translation-bench

  32. arXiv:2406.08225  [pdf, ps, other]

    hep-ex

    Observation of $η_{c}$(1S, 2S) and $χ_{cJ}$ decays to 2$(π^{+}π^{-})η$ via $ψ$(3686) radiative transitions

    Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, et al. (636 additional authors not shown)

    Abstract: Based on $2.7 \times 10^9~ψ(3686)$ decays collected with the BESIII detector, the radiative decay $ψ(3686)\to\gamma2(π^{+}π^{-})η$ is investigated to measure properties of S- and P-wave charmonium states. The branching fraction of the decay $η_{c}(1S) \to 2(π^{+}π^{-})η$, which is found to have a strong dependence on the interference pattern between $η_c(1S)$ and non-$η_c(1S)$ processes, is measur…

    Submitted 12 June, 2024; originally announced June 2024.

  33. arXiv:2406.08196  [pdf, other]

    cs.SD eess.AS

    FreeV: Free Lunch For Vocoders Through Pseudo Inversed Mel Filter

    Authors: Yuanjun Lv, Hai Li, Ying Yan, Junhui Liu, Danming Xie, Lei Xie

    Abstract: Vocoders reconstruct speech waveforms from acoustic features and play a pivotal role in modern TTS systems. Frequency-domain GAN vocoders like Vocos and APNet2 have recently seen rapid advancements, outperforming time-domain models in inference speed while achieving comparable audio quality. However, these frequency-domain vocoders suffer from large parameter sizes, thus introducing extra memory bu…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: Accepted by InterSpeech 2024; 5 pages, 5 figures

  34. arXiv:2406.08173  [pdf, other]

    cs.CL

    Semi-Supervised Spoken Language Glossification

    Authors: Huijie Yao, Wengang Zhou, Hao Zhou, Houqiang Li

    Abstract: Spoken language glossification (SLG) aims to translate the spoken language text into the sign language gloss, i.e., a written record of sign language. In this work, we present a framework named $S$emi-$S$upervised $S$poken $L$anguage $G$lossification ($S^3$LG) for SLG. To tackle the bottleneck of limited parallel data in SLG, our $S^3$LG incorporates large-scale monolingual spoken language text in…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: Accepted to ACL2024 main

  35. arXiv:2406.08151  [pdf, other]

    nucl-th

    Unveiling potential neutron halos in intermediate-mass nuclei: an \textit{ab initio} study

    Authors: H. H. Li, J. G. Li, M. R. Xie, W. Zuo

    Abstract: Halos epitomize the fascinating interplay between weak binding, shell evolution, and deformation effects, especially in nuclei near the drip line. In this Letter, we apply the state-of-the-art \textit{ab initio} valence-space in-medium similarity renormalization group approach to predict potential candidates for one- and two-neutron halo in the intermediate-mass region. Notably, we use spectroscop…

    Submitted 12 June, 2024; originally announced June 2024.

  36. arXiv:2406.08027  [pdf, other]

    physics.ins-det

    Real-time, chirped-pulse heterodyne detection at room-temperature with 100GHz 3dB-bandwidth mid-infrared quantum-well photodetectors

    Authors: Quyang Lin, Michael Hakl, Sylvie Lepillet, Hua Li, Jean-Francois Lampin, Emilien Peytavit, Stefano Barbieri

    Abstract: Thanks to intrinsically short electronic relaxation on the ps time scale, III-V semiconductor unipolar devices are ideal candidates for ultrahigh-speed operation at mid-infrared frequencies. In this work, antenna-coupled, GaAs-based multi quantum-well photodetectors operating in the 10-11um range are demonstrated, with a responsivity of 0.3A/W and a 3dB-cutoff bandwidth of 100GHz at room-temperatu…

    Submitted 12 June, 2024; originally announced June 2024.

  37. arXiv:2406.07988  [pdf, other]

    hep-th

    Holographic Superfluid Ring with a Weak Link

    Authors: Zhi-Hong Li, Huai-Fan Li

    Abstract: We explore the generation of topological defects in the course of a dynamical phase transition in a ring with a weak link, i.e., a SSS Josephson junction, from the AdS/CFT correspondence. By setting different parameters of the junction (width, steepness, depth) and the final temperature of the quench, the configurations of the charge density and condensate of the order parameters of the dual field…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: 16 pages, 8 figures

  38. arXiv:2406.07549  [pdf, other]

    cs.RO

    A3VLM: Actionable Articulation-Aware Vision Language Model

    Authors: Siyuan Huang, Haonan Chang, Yuhan Liu, Yimeng Zhu, Hao Dong, Peng Gao, Abdeslam Boularias, Hongsheng Li

    Abstract: Vision Language Models (VLMs) have received significant attention in recent years in the robotics community. VLMs are shown to be able to perform complex visual reasoning and scene understanding tasks, which makes them regarded as a potential universal solution for general robotics problems such as manipulation and navigation. However, previous VLMs for robotics such as RT-1, RT-2, and ManipLLM ha…

    Submitted 13 June, 2024; v1 submitted 11 June, 2024; originally announced June 2024.

  39. arXiv:2406.07499  [pdf, other]

    cs.CV cs.GR

    Trim 3D Gaussian Splatting for Accurate Geometry Representation

    Authors: Lue Fan, Yuxue Yang, Minxing Li, Hongsheng Li, Zhaoxiang Zhang

    Abstract: In this paper, we introduce Trim 3D Gaussian Splatting (TrimGS) to reconstruct accurate 3D geometry from images. Previous arts for geometry reconstruction from 3D Gaussians mainly focus on exploring strong geometry regularization. Instead, from a fresh perspective, we propose to obtain accurate 3D geometry of a scene by Gaussian trimming, which selectively removes the inaccurate geometry while pre…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: Project page: https://trimgs.github.io/

  40. arXiv:2406.07422  [pdf, other]

    eess.AS

    Single-Codec: Single-Codebook Speech Codec towards High-Performance Speech Generation

    Authors: Hanzhao Li, Liumeng Xue, Haohan Guo, Xinfa Zhu, Yuanjun Lv, Lei Xie, Yunlin Chen, Hao Yin, Zhifei Li

    Abstract: The multi-codebook speech codec enables the application of large language models (LLM) in TTS but bottlenecks efficiency and robustness due to multi-sequence prediction. To avoid this obstacle, we propose Single-Codec, a single-codebook single-sequence codec, which employs a disentangled VQ-VAE to decouple speech into a time-invariant embedding and a phonetically-rich discrete sequence. Furthermor…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: Accepted by Interspeech 2024

  41. arXiv:2406.07362  [pdf, other]

    cs.HC

    AI.vs.Clinician: Unveiling Intricate Interactions Between AI and Clinicians through an Open-Access Database

    Authors: Wanling Gao, Yuan Liu, Zhuoming Yu, Dandan Cui, Wen**g Liu, Xiaoshuang Liang, Jiahui Zhao, Jiyue Xie, Hao Li, Li Ma, Ning Ye, Yumiao Kang, Dingfeng Luo, Peng Pan, Wei Huang, Zhongmou Liu, Jizhong Hu, Fan Huang, Gangyuan Zhao, Chongrong Jiang, Tianyi Wei, Zhifei Zhang, Yunyou Huang, Jianfeng Zhan

    Abstract: Artificial Intelligence (AI) plays a crucial role in the medical field and has the potential to revolutionize healthcare practices. However, the success of AI models and their impacts hinge on the synergy between AI and medical specialists, with clinicians assuming a dominant role. Unfortunately, the intricate dynamics and interactions between AI and clinicians remain undiscovered and thus hinder AI f…

    Submitted 15 June, 2024; v1 submitted 11 June, 2024; originally announced June 2024.

    Comments: 12 pages

  42. arXiv:2406.07341  [pdf, other]

    astro-ph.EP astro-ph.IM

    Analytical Delta-V Approximation for Nonlinear Programming of Multi-target Rendezvous and Flyby Trajectories

    Authors: An-Yi Huang, Heng-Nian Li, Ya-Zhong Luo

    Abstract: This study proposes an analytical Delta-V approximation of short-time transfers based on the linear relative motion and a gradient-based nonlinear programming model of multi-target rendezvous and flyby trajectories. In previous studies, Lambert's solution is commonly used to evaluate the Delta-V of short-duration transfers. In this study, to avoid the iteration process for obtaining the Lambert's…

    Submitted 11 June, 2024; originally announced June 2024.

  43. arXiv:2406.07268  [pdf, other]

    cs.MM cs.CL cs.CV

    Advancing Grounded Multimodal Named Entity Recognition via LLM-Based Reformulation and Box-Based Segmentation

    Authors: **yuan Li, Ziyan Li, Han Li, Jianfei Yu, Rui Xia, Di Sun, Gang Pan

    Abstract: The Grounded Multimodal Named Entity Recognition (GMNER) task aims to identify named entities, entity types and their corresponding visual regions. The GMNER task exhibits two challenging attributes: 1) The tenuous correlation between images and text on social media contributes to a notable proportion of named entities being ungroundable. 2) There exists a distinction between coarse-grained noun phrases u…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: Extension of our Findings of EMNLP 2023 & ACL 2024 paper

  44. arXiv:2406.07198  [pdf, other]

    eess.AS cs.MM

    Target Speech Diarization with Multimodal Prompts

    Authors: Yidi Jiang, Ruijie Tao, Zhengyang Chen, Yanmin Qian, Haizhou Li

    Abstract: Traditional speaker diarization seeks to detect "who spoke when" according to speaker characteristics. Extending to target speech diarization, we detect "when target event occurs" according to the semantic characteristics of speech. We propose a novel Multimodal Target Speech Diarization (MM-TSD) framework, which accommodates diverse and multi-modal prompts to specify target events in a flexib…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: 13 pages, 7 figures

  45. arXiv:2406.07119  [pdf, other]

    cs.CV cs.AI

    T2S-GPT: Dynamic Vector Quantization for Autoregressive Sign Language Production from Text

    Authors: Aoxiong Yin, Haoyuan Li, Kai Shen, Siliang Tang, Yueting Zhuang

    Abstract: In this work, we propose a two-stage sign language production (SLP) paradigm that first encodes sign language sequences into discrete codes and then autoregressively generates sign language from text based on the learned codebook. However, existing vector quantization (VQ) methods are fixed-length encodings, overlooking the uneven information density in sign language, which leads to under-encoding…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: Accepted by ACL 2024

  46. arXiv:2406.07076  [pdf, other]

    physics.plasm-ph

    Invariant regimes of Spencer scaling law for magnetic compression of rotating FRC plasma

    Authors: Yiming Ma, ** Zhu, Bo Rao, Haolong Li

    Abstract: The scaling laws for the magnetic compression of a toroidally rotating field reversed configuration (FRC) have been investigated in this work. The magnetohydrodynamics (MHD) simulations of the magnetic compression on rotating FRCs employing the NIMROD code [C. R. Sovinec et al., J. Comput. Phys. 195, 355 (2004)], are compared with the Spencer's one-dimensional (1D) theory [R. L.…

    Submitted 11 June, 2024; originally announced June 2024.

  47. arXiv:2406.06962  [pdf, other]

    cs.CL cs.AI

    Evolving Subnetwork Training for Large Language Models

    Authors: Hanqi Li, Lu Chen, Da Ma, Zijian Wu, Su Zhu, Kai Yu

    Abstract: Large language models have ushered in a new era of artificial intelligence research. However, their substantial training costs hinder further development and widespread adoption. In this paper, inspired by the redundancy in the parameters of large language models, we propose a novel training paradigm: Evolving Subnetwork Training (EST). EST samples subnetworks from the layers of the large language…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: Accepted to ICML 2024

  48. arXiv:2406.06941  [pdf, other]

    stat.ME math.ST

    Efficient combination of observational and experimental datasets under general restrictions on outcome mean functions

    Authors: Harrison H. Li

    Abstract: A researcher collecting data from a randomized controlled trial (RCT) often has access to an auxiliary observational dataset that may be confounded or otherwise biased for estimating causal effects. Common modeling assumptions impose restrictions on the outcome mean function - the conditional expectation of the outcome of interest given observed covariates - in the two datasets. Running examples f…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: 52 pages, 4 figures

  49. arXiv:2406.06621  [pdf, other]

    cs.CL cs.AI cs.LG

    LinkQ: An LLM-Assisted Visual Interface for Knowledge Graph Question-Answering

    Authors: Harry Li, Gabriel Appleby, Ashley Suh

    Abstract: We present LinkQ, a system that leverages a large language model (LLM) to facilitate knowledge graph (KG) query construction through natural language question-answering. Traditional approaches often require detailed knowledge of complex graph querying languages, limiting the ability for users -- even experts -- to acquire valuable insights from KG data. LinkQ simplifies this process by first inter…

    Submitted 7 June, 2024; originally announced June 2024.

  50. arXiv:2406.06600  [pdf, other]

    cs.LG cs.AI cs.CL

    HORAE: A Domain-Agnostic Modeling Language for Automating Multimodal Service Regulation

    Authors: Yutao Sun, Mingshuai Chen, Kangjia Zhao, He Li, **tao Chen, Linyu Yang, Zhongyi Wang, Tiancheng Zhao, Jianwei Yin

    Abstract: Artificial intelligence is rapidly encroaching on the field of service regulation. This work presents the design principles behind HORAE, a unified specification language to model multimodal regulation rules across a diverse set of domains. We show how HORAE facilitates an intelligent service regulation pipeline by further exploiting a fine-tuned large language model named HORAE that automates the…

    Submitted 6 June, 2024; originally announced June 2024.