Search | arXiv e-print repository

TieBot: Learning to Knot a Tie from Visual Demonstration through a Real-to-Sim-to-Real Approach

Authors: Weikun Peng, Jun Lv, Yuwei Zeng, Haonan Chen, Siheng Zhao, Jicheng Sun, Cewu Lu, Lin Shao

Abstract: The tie-knotting task is highly challenging due to the tie's high deformation and long-horizon manipulation actions. This work presents TieBot, a Real-to-Sim-to-Real learning from visual demonstration system for the robots to learn to knot a tie. We introduce the Hierarchical Feature Matching approach to estimate a sequence of tie's meshes from the demonstration video. With these estimated meshes… ▽ More The tie-knotting task is highly challenging due to the tie's high deformation and long-horizon manipulation actions. This work presents TieBot, a Real-to-Sim-to-Real learning from visual demonstration system for the robots to learn to knot a tie. We introduce the Hierarchical Feature Matching approach to estimate a sequence of tie's meshes from the demonstration video. With these estimated meshes used as subgoals, we first learn a teacher policy using privileged information. Then, we learn a student policy with point cloud observation by imitating teacher policy. Lastly, our pipeline learns a residual policy when the learned policy is applied to real-world execution, mitigating the Sim2Real gap. We demonstrate the effectiveness of TieBot in simulation and the real world. In the real-world experiment, a dual-arm robot successfully knots a tie, achieving 50% success rate among 10 trials. Videos can be found on our $\href{https://tiebots.github.io/}{\text{website}}$. △ Less

Submitted 3 July, 2024; originally announced July 2024.

Comments: initial commit

arXiv:2407.03169 [pdf, other]

Investigating Decoder-only Large Language Models for Speech-to-text Translation

Authors: Chao-Wei Huang, Hui Lu, Hongyu Gong, Hirofumi Inaguma, Ilia Kulikov, Ruslan Mavlyutov, Sravya Popuri

Abstract: Large language models (LLMs), known for their exceptional reasoning capabilities, generalizability, and fluency across diverse domains, present a promising avenue for enhancing speech-related tasks. In this paper, we focus on integrating decoder-only LLMs to the task of speech-to-text translation (S2TT). We propose a decoder-only architecture that enables the LLM to directly consume the encoded sp… ▽ More Large language models (LLMs), known for their exceptional reasoning capabilities, generalizability, and fluency across diverse domains, present a promising avenue for enhancing speech-related tasks. In this paper, we focus on integrating decoder-only LLMs to the task of speech-to-text translation (S2TT). We propose a decoder-only architecture that enables the LLM to directly consume the encoded speech representation and generate the text translation. Additionally, we investigate the effects of different parameter-efficient fine-tuning techniques and task formulation. Our model achieves state-of-the-art performance on CoVoST 2 and FLEURS among models trained without proprietary data. We also conduct analyses to validate the design choices of our proposed model and bring insights to the integration of LLMs to S2TT. △ Less

Submitted 3 July, 2024; originally announced July 2024.

Comments: Accepted to Interspeech 2024

arXiv:2407.03165 [pdf, other]

Consistent Point Orientation for Manifold Surfaces via Boundary Integration

Authors: Weizhou Liu, Xingce Wang, Haichuan Zhao, Xingfei Xue, Zhongke Wu, Xuequan Lu, Ying He

Abstract: This paper introduces a new approach for generating globally consistent normals for point clouds sampled from manifold surfaces. Given that the generalized winding number (GWN) field generated by a point cloud with globally consistent normals is a solution to a PDE with jump boundary conditions and possesses harmonic properties, and the Dirichlet energy of the GWN field can be defined as an integr… ▽ More This paper introduces a new approach for generating globally consistent normals for point clouds sampled from manifold surfaces. Given that the generalized winding number (GWN) field generated by a point cloud with globally consistent normals is a solution to a PDE with jump boundary conditions and possesses harmonic properties, and the Dirichlet energy of the GWN field can be defined as an integral over the boundary surface, we formulate a boundary energy derived from the Dirichlet energy of the GWN. Taking as input a point cloud with randomly oriented normals, we optimize this energy to restore the global harmonicity of the GWN field, thereby recovering the globally consistent normals. Experiments show that our method outperforms state-of-the-art approaches, exhibiting enhanced robustness to noise, outliers, complex topologies, and thin structures. Our code can be found at \url{https://github.com/liuweizhou319/BIM}. △ Less

Submitted 3 July, 2024; originally announced July 2024.

Comments: accepted in siggraph2024

arXiv:2407.03153 [pdf, other]

Efficient Shapley Values for Attributing Global Properties of Diffusion Models to Data Group

Authors: Chris Lin, Mingyu Lu, Chanwoo Kim, Su-In Lee

Abstract: As diffusion models are deployed in real-world settings, data attribution is needed to ensure fair acknowledgment for contributors of high-quality training data and to identify sources of harmful content. Previous work focuses on identifying individual training samples important for the generation of a given image. However, instead of focusing on a given generated image, some use cases require und… ▽ More As diffusion models are deployed in real-world settings, data attribution is needed to ensure fair acknowledgment for contributors of high-quality training data and to identify sources of harmful content. Previous work focuses on identifying individual training samples important for the generation of a given image. However, instead of focusing on a given generated image, some use cases require understanding global properties of the distribution learned by a diffusion model (e.g., demographic diversity). Furthermore, training data for diffusion models are often contributed in groups rather than separately (e.g., multiple artworks from the same artist). Hence, here we tackle the problem of attributing global properties of diffusion models to groups of training data. Specifically, we develop a method to efficiently estimate Shapley values by leveraging model pruning and fine-tuning. We empirically demonstrate the utility of our method with three use cases: (i) global image quality for a DDPM trained on a CIFAR dataset, (ii) demographic diversity for an LDM trained on CelebA-HQ, and (iii) overall aesthetic quality for a Stable Diffusion model LoRA-finetuned on Post-Impressionist artworks. △ Less

Submitted 9 June, 2024; originally announced July 2024.

arXiv:2407.02999 [pdf, other]

doi 10.1103/PhysRevLett.133.016401

Fermi Surface Nesting Driving the RKKY Interaction in the Centrosymmetric Skyrmion Magnet Gd2PdSi3

Authors: Yuyang Dong, Yosuke Arai, Kenta Kuroda, Masayuki Ochi, Natsumi Tanaka, Yuxuan Wan, Matthew D. Watson, Timur K. Kim, Cephise Cacho, Makoto Hashimoto, Donghui Lu, Yuji Aoki, Tatsuma D. Matsuda, Takeshi Kondo

Abstract: The magnetic skyrmions generated in a centrosymmetric crystal were recently first discovered in Gd2PdSi3. In light of this, we observe the electronic structure by angle-resolved photoemission spectroscopy (ARPES) and unveil its direct relationship with the magnetism in this compound. The Fermi surface and band dispersions are demonstrated to have a good agreement with the density functional theory… ▽ More The magnetic skyrmions generated in a centrosymmetric crystal were recently first discovered in Gd2PdSi3. In light of this, we observe the electronic structure by angle-resolved photoemission spectroscopy (ARPES) and unveil its direct relationship with the magnetism in this compound. The Fermi surface and band dispersions are demonstrated to have a good agreement with the density functional theory (DFT) calculations carried out with careful consideration of the crystal superstructure. Most importantly, we find that the three-dimensional Fermi surface has extended nesting which matches well the q-vector of the magnetic order detected by recent scattering measurements. The consistency we find among ARPES, DFT, and the scattering measurements suggests the Ruderman-Kittel-Kasuya-Yosida (RKKY) interaction involving itinerant electrons to be the formation mechanism of skyrmions in Gd2PdSi3. △ Less

Submitted 3 July, 2024; originally announced July 2024.

Journal ref: Phys. Rev. Lett. 133, 016401 (2024)

arXiv:2407.02973 [pdf, other]

NOEMA formIng Cluster survEy (NICE): Characterizing eight massive galaxy groups at $1.5 < z < 4$ in the COSMOS field

Authors: Nikolaj B. Sillassen, Shuowen **, Georgios E. Magdis, Emanuele Daddi, Tao Wang, Shiying Lu, Hanwen Sun, Vinod Arumugam, Daizhong Liu, Malte Brinch, Chiara D'Eugenio, Raphael Gobat, Carlos Gómez-Guijarro, Michael Rich, Eva Schinnerer, Veronica Strazzullo, Qinghua Tan, Francesco Valentino, Yijun Wang, Mengyuan Xiao, Luwenjia Zhou, David Blánquez-Sesé, Zheng Cai, Yanmei Chen, Laure Ciesla , et al. (19 additional authors not shown)

Abstract: The NOEMA formIng Cluster survEy (NICE) is a large program targeting 69 massive galaxy group candidates at $z>2$ in six deep fields. We report spectroscopic confirmation of eight groups at $1.65\leq z\leq3.61$ in COSMOS. Homogeneously selected as significant overdensities of red IRAC sources with red Herschel colors, four groups are confirmed by CO and [CI] with NOEMA 3mm observations, three are c… ▽ More The NOEMA formIng Cluster survEy (NICE) is a large program targeting 69 massive galaxy group candidates at $z>2$ in six deep fields. We report spectroscopic confirmation of eight groups at $1.65\leq z\leq3.61$ in COSMOS. Homogeneously selected as significant overdensities of red IRAC sources with red Herschel colors, four groups are confirmed by CO and [CI] with NOEMA 3mm observations, three are confirmed with ALMA, and one is confirmed by H$α$ from Subaru/FMOS. We constructed the integrated FIR SEDs for the eight groups, obtaining total IR SFR $=260-1300~{\rm M_\odot}$~yr$^{-1}$. We adopted six methods to estimate the dark matter masses, including stellar mass to halo mass relations, overdensity with galaxy bias, and NFW profile fitting to radial stellar mass density. We found the radial stellar mass density are consistent with a NFW profile, supporting that they are collapsed structures hosted by a single dark matter halo. The best halo mass estimates are $\log(M_{\rm h}/{\rm M_\odot})=12.8-13.7$ with uncertainty of 0.3 dex. From halo mass estimates, we derive baryonic accretion rate ${\rm BAR}=(1-8)\times10^{3}\,{\rm M_{\odot}/yr}$ for this sample. We find a quasi-linear correlation between the integrated SFR/BAR and the theoretical halo mass limit for cold streams, $M_{\rm stream}/M_{\rm h}$, with ${\rm SFR/BAR}=10^{-0.46\pm0.22}\left({M_{\rm stream}/M_{\rm h}}\right)^{0.71\pm0.16}$ with a scatter of $0.40\,{\rm dex}$. Further, we compare halo masses and stellar masses with simulations, and find all structures are consistent with being progenitors of $M_{\rm h}(z=0)>10^{14}\,{\rm M_{\odot}}$ galaxy clusters, and the most massive central galaxies have stellar masses consistent with brightest cluster galaxies (BCGs) progenitors in the TNG300 simulation. The results strongly suggest these structures are forming massive galaxy clusters via baryonic and dark matter accretion. △ Less

Submitted 3 July, 2024; originally announced July 2024.

Comments: 44 pages (27pp appendix), 32 figures, 18 tables, accepted for publication in A&A

arXiv:2407.02941 [pdf, other]

doi 10.1103/PhysRevB.110.035107

Topological phase in the extended Haldane-Hubbard model with sublattice-dependent repulsion

Authors: Bao-Qing Wang, Can Shao, Takami Tohyama, Hong-Gang Luo, Hantao Lu

Abstract: We study the ground-state phase diagram of the half-filled extended Haldane-Hubbard model on the honeycomb lattice with sublattice-dependent on-site repulsion ($U_{\text{A/B}}$) using the exact diagonalization (ED) and mean-field (MF) methods. The resulting phase diagram shows that there is a topologically nontrivial phase with the Chern number $C=1$, emerging via the development of the imbalance… ▽ More We study the ground-state phase diagram of the half-filled extended Haldane-Hubbard model on the honeycomb lattice with sublattice-dependent on-site repulsion ($U_{\text{A/B}}$) using the exact diagonalization (ED) and mean-field (MF) methods. The resulting phase diagram shows that there is a topologically nontrivial phase with the Chern number $C=1$, emerging via the development of the imbalance between $U_{\text{A}}$ and $U_{\text{B}}$. In this phase, the antiferromagnetic correlations are observed in the ED calculation, in line with the finite antiferromagnetic order obtained by the MF method. The spontaneous symmetry breaking of SU(2) spin rotation in the phase is also identified in the MF level. Distinct from previous studies in which the exotic $C=1$ phase relies on the interplay between sublattice-dependent potentials and electronic interactions, our paper presents an alternative way by solely tuning the on-site interactions. △ Less

Submitted 3 July, 2024; originally announced July 2024.

Comments: 8 pages, 5 figures

Journal ref: Phys. Rev. B 110, 035107 (2024)

arXiv:2407.02911 [pdf, other]

Non-Adversarial Learning: Vector-Quantized Common Latent Space for Multi-Sequence MRI

Authors: Luyi Han, Tao Tan, Tianyu Zhang, Xin Wang, Yuan Gao, Chunyao Lu, Xinglong Liang, Haoran Dou, Yunzhi Huang, Ritse Mann

Abstract: Adversarial learning helps generative models translate MRI from source to target sequence when lacking paired samples. However, implementing MRI synthesis with adversarial learning in clinical settings is challenging due to training instability and mode collapse. To address this issue, we leverage intermediate sequences to estimate the common latent space among multi-sequence MRI, enabling the rec… ▽ More Adversarial learning helps generative models translate MRI from source to target sequence when lacking paired samples. However, implementing MRI synthesis with adversarial learning in clinical settings is challenging due to training instability and mode collapse. To address this issue, we leverage intermediate sequences to estimate the common latent space among multi-sequence MRI, enabling the reconstruction of distinct sequences from the common latent space. We propose a generative model that compresses discrete representations of each sequence to estimate the Gaussian distribution of vector-quantized common (VQC) latent space between multiple sequences. Moreover, we improve the latent space consistency with contrastive learning and increase model stability by domain augmentation. Experiments using BraTS2021 dataset show that our non-adversarial model outperforms other GAN-based methods, and VQC latent space aids our model to achieve (1) anti-interference ability, which can eliminate the effects of noise, bias fields, and artifacts, and (2) solid semantic representation ability, with the potential of one-shot segmentation. Our code is publicly available. △ Less

Submitted 3 July, 2024; originally announced July 2024.

arXiv:2407.02899 [pdf, other]

Measurement of the branching fraction of the decay $J/ψ\to p \bar{p} η$

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere , et al. (639 additional authors not shown)

Abstract: A high precision measurement of the branching fraction of the decay $J/ψ\to p \bar{p} η$ is performed using $(10 087 \pm 44) \times 10^6$ $J/ψ$ events recorded by the {BESIII} detector at the {BEPCII} storage ring. The branching fractions of the two decays $J/ψ\to p \bar{p} η(η\to γγ)$ and $J/ψ\to p \bar{p} η(η\to π^+ π^- π^0)$ are measured individually to be… ▽ More A high precision measurement of the branching fraction of the decay $J/ψ\to p \bar{p} η$ is performed using $(10 087 \pm 44) \times 10^6$ $J/ψ$ events recorded by the {BESIII} detector at the {BEPCII} storage ring. The branching fractions of the two decays $J/ψ\to p \bar{p} η(η\to γγ)$ and $J/ψ\to p \bar{p} η(η\to π^+ π^- π^0)$ are measured individually to be $\mathcal{B}(J/ψ\to p \bar{p} η(η\to γγ)) = (1.480 \pm 0.001 \pm 0.024)\times\,10^{-3}$ and $\mathcal{B}(J/ψ\to p \bar{p} η(η\to π^+ π^- π^0)) = (1.557 \pm 0.003 \pm 0.038)\times\,10^{-3}$, where the first uncertainties are statistical and the second systematic. Both results are compatible within their uncorrelated systematic uncertainties. The combined result is $\mathcal{B}(J/ψ\to p \bar{p} η)=(1.495 \pm 0.001 \pm 0.023)\times\,10^{-3}$ where the first uncertainty is the combined statistical uncertainty and the second one the combined systematic uncertainty of both analyses, incorporating correlations between them. In addition, the $p \bar{p}$ threshold region is investigated for a potential threshold enhancement, and no evidence for one is observed. △ Less

Submitted 3 July, 2024; originally announced July 2024.

arXiv:2407.02817 [pdf]

Operando monitoring of strain field distribution in lithium battery anode via ultra-high spatial resolution optical frequency domain reflectometer

Authors: Kaijun Liu, Zhijuan Zou, Guolu Yin, Yingze Song, Zeheng Zhang, Yuyang Lou, Zixuan Zhong, Huafeng Lu, Duidui Li, Tao Zhu

Abstract: The cycling performance of lithium-ion batteries is closely related to the expansion effect of anode materials during charge and discharge processes. Studying the mechanical field evolution of anode materials is crucial for evaluating battery per-formance. Here, we propose a phase-sensitive ultra-high spatial resolution optical frequency domain reflectometry tech-nique, in which the test fiber is… ▽ More The cycling performance of lithium-ion batteries is closely related to the expansion effect of anode materials during charge and discharge processes. Studying the mechanical field evolution of anode materials is crucial for evaluating battery per-formance. Here, we propose a phase-sensitive ultra-high spatial resolution optical frequency domain reflectometry tech-nique, in which the test fiber is embedded into the anode of a lithium-ion battery to monitor the mechanical evolution of the anode material during cycling. We investigated the strain evolution of the anode material under different loading levels and used this method to infer the morphological changes of the material. Furthermore, combining this with battery capacity in-formation provides a new approach for assessing the performance of lithium-ion batteries. △ Less

Submitted 3 July, 2024; originally announced July 2024.

Comments: 8 pages, 6 figures

arXiv:2407.02788 [pdf, other]

Generalized Gouy Rotation of Electron Vortex beams in uniform magnetic fields

Authors: Qi Meng, Xuan Liu, Wei Ma, Zhen Yang, Liang Lu, Alexander J. Silenko, Pengming Zhang, Li** Zou

Abstract: The rotation of electron vortex beams (EVBs) presents a complex interplay of the Gouy phase characterizing free-space behavior and Landau states or Larmor rotation observed in magnetic fields. Despite being studied separately, these phenomena manifest within a single beam during its propagation in magnetic fields, lacking a comprehensive description. We address this by utilizing exact solutions of… ▽ More The rotation of electron vortex beams (EVBs) presents a complex interplay of the Gouy phase characterizing free-space behavior and Landau states or Larmor rotation observed in magnetic fields. Despite being studied separately, these phenomena manifest within a single beam during its propagation in magnetic fields, lacking a comprehensive description. We address this by utilizing exact solutions of the relativistic paraxial equation in magnetic fields, termed "paraxial Landau modes". The paraxial Landau modes describe the quantum states of EVBs in magnetic fields. Our study of rotation angles demonstrates consistency with experimental data, supporting the practical presence of these modes. We provide a unified description of different regimes under generalized Gouy rotation, linking the Gouy phase to EVB rotation angles. This connection enhances our understanding of the Gouy phase and can be extended to nonuniform magnetic fields. Our theoretical analysis is validated through numerical simulations using the Chebyshev method. This work offers new insights into the dynamics of EVBs in magnetic fields and suggests practical applications in beam manipulation and beam optics of vortex particles. △ Less

Submitted 2 July, 2024; originally announced July 2024.

arXiv:2407.02741 [pdf]

18 GHz Solidly Mounted Resonator in Scandium Aluminum Nitride on SiO2/Ta2O5 Bragg Reflector

Authors: Omar Barrera, Nishanth Ravi, Kapil Saha, Supratik Dasgupta, Joshua Campbell, Jack Kramer, Eugene Kwon, Tzu-Hsuan Hsu, Sinwoo Cho, Ian Anderson, Pietro Simeoni, Jue Hou, Matteo Rinaldi, Mark S. Goorsky, Ruochen Lu

Abstract: This work reports an acoustic solidly mounted resonator (SMR) at 18.64 GHz, among the highest operating frequencies reported. The device is built in scandium aluminum nitride (ScAlN) on top of silicon dioxide (SiO2) and tantalum pentoxide (Ta2O5) Bragg reflectors on silicon (Si) wafer. The stack is analyzed with X-ray reflectivity (XRR) and high-resolution X-ray diffraction (HRXRD). The resonator… ▽ More This work reports an acoustic solidly mounted resonator (SMR) at 18.64 GHz, among the highest operating frequencies reported. The device is built in scandium aluminum nitride (ScAlN) on top of silicon dioxide (SiO2) and tantalum pentoxide (Ta2O5) Bragg reflectors on silicon (Si) wafer. The stack is analyzed with X-ray reflectivity (XRR) and high-resolution X-ray diffraction (HRXRD). The resonator shows a coupling coefficient (k2) of 2.0%, high series quality factor (Qs) of 156, shunt quality factor (Qp) of 142, and maximum Bode quality factor (Qmax) of 210. The third-order harmonics at 59.64 GHz is also observed with k2 around 0.6% and Q around 40. Upon further development, the reported acoustic resonator platform can enable various front-end signal-processing functions, e.g., filters and oscillators, at future frequency range 3 (FR3) bands. △ Less

Submitted 2 July, 2024; originally announced July 2024.

Comments: 5 pages, 9 figures, 5 tables

arXiv:2407.02600 [pdf]

Macroscopic uniform 2D moiré superlattices with controllable angles

Authors: Gregory Zaborski Jr., Paulina E. Majchrzak, Samuel Lai, Amalya C. Johnson, Ashley P. Saunders, Ziyan Zhu, Yujun Deng, Donghui Lu, Makoto Hashimoto, Z-X Shen, Fang Liu

Abstract: Moiré superlattices, engineered through precise stacking of van der Waals (vdW) layers, hold immense promise for exploring strongly correlated and topological phenomena. However, these applications have been held back by the common preparation method: tear-and-stack of Scotch tape exfoliated monolayers. It has low efficiency and reproducibility, along with challenges of twist angle inhomogeneity,… ▽ More Moiré superlattices, engineered through precise stacking of van der Waals (vdW) layers, hold immense promise for exploring strongly correlated and topological phenomena. However, these applications have been held back by the common preparation method: tear-and-stack of Scotch tape exfoliated monolayers. It has low efficiency and reproducibility, along with challenges of twist angle inhomogeneity, interfacial contamination, micrometer sizes, and a tendency to untwist at elevated temperatures. Here we report an effective strategy to construct highly consistent vdW moiré structures with high production throughput, near-unity yield, pristine interfaces, precisely controlled twist angles, and macroscopic scale (up to centimeters) with enhanced thermal stability. We further demonstrate the versatility across various vdW materials including transition metal dichalcogenides, graphene, and hBN. The expansive size and high quality of moiré structures enables high-resolution map** of the reciprocal space back-folded lattices and moiré mini band structures with low energy electron diffraction (LEED) and angle-resolved photoemission spectroscopy (ARPES). This technique will have broad applications in both fundamental studies and mass production of twistronic devices. △ Less

Submitted 2 July, 2024; originally announced July 2024.

Comments: 16 pages, 4 figures

arXiv:2407.02353 [pdf, other]

Roadmap to Neuromorphic Computing with Emerging Technologies

Authors: Adnan Mehonic, Daniele Ielmini, Kaushik Roy, Onur Mutlu, Shahar Kvatinsky, Teresa Serrano-Gotarredona, Bernabe Linares-Barranco, Sabina Spiga, Sergey Savelev, Alexander G Balanov, Nitin Chawla, Giuseppe Desoli, Gerardo Malavena, Christian Monzio Compagnoni, Zhongrui Wang, J Joshua Yang, Ghazi Sarwat Syed, Abu Sebastian, Thomas Mikolajick, Beatriz Noheda, Stefan Slesazeck, Bernard Dieny, Tuo-Hung, Hou, Akhil Varri , et al. (28 additional authors not shown)

Abstract: The roadmap is organized into several thematic sections, outlining current computing challenges, discussing the neuromorphic computing approach, analyzing mature and currently utilized technologies, providing an overview of emerging technologies, addressing material challenges, exploring novel computing concepts, and finally examining the maturity level of emerging technologies while determining t… ▽ More The roadmap is organized into several thematic sections, outlining current computing challenges, discussing the neuromorphic computing approach, analyzing mature and currently utilized technologies, providing an overview of emerging technologies, addressing material challenges, exploring novel computing concepts, and finally examining the maturity level of emerging technologies while determining the next essential steps for their advancement. △ Less

Submitted 2 July, 2024; originally announced July 2024.

Comments: 90 pages, 22 figures, roadmap

arXiv:2407.02318 [pdf, other]

The Solution for Temporal Sound Localisation Task of ICCV 1st Perception Test Challenge 2023

Authors: Yurui Huang, Yang Yang, Shou Chen, Xiangyu Wu, Qingguo Chen, Jianfeng Lu

Abstract: In this paper, we propose a solution for improving the quality of temporal sound localization. We employ a multimodal fusion approach to combine visual and audio features. High-quality visual features are extracted using a state-of-the-art self-supervised pre-training network, resulting in efficient video feature representations. At the same time, audio features serve as complementary information… ▽ More In this paper, we propose a solution for improving the quality of temporal sound localization. We employ a multimodal fusion approach to combine visual and audio features. High-quality visual features are extracted using a state-of-the-art self-supervised pre-training network, resulting in efficient video feature representations. At the same time, audio features serve as complementary information to help the model better localize the start and end of sounds. The fused features are trained in a multi-scale Transformer for training. In the final test dataset, we achieved a mean average precision (mAP) of 0.33, obtaining the second-best performance in this track. △ Less

Submitted 1 July, 2024; originally announced July 2024.

arXiv:2407.02265 [pdf, other]

DrugCLIP: Contrastive Drug-Disease Interaction For Drug Repurposing

Authors: Yingzhou Lu, Yaojun Hu, Chenhao Li

Abstract: Bringing a novel drug from the original idea to market typically requires more than ten years and billions of dollars. To alleviate the heavy burden, a natural idea is to reuse the approved drug to treat new diseases. The process is also known as drug repurposing or drug repositioning. Machine learning methods exhibited huge potential in automating drug repurposing. However, it still encounter som… ▽ More Bringing a novel drug from the original idea to market typically requires more than ten years and billions of dollars. To alleviate the heavy burden, a natural idea is to reuse the approved drug to treat new diseases. The process is also known as drug repurposing or drug repositioning. Machine learning methods exhibited huge potential in automating drug repurposing. However, it still encounter some challenges, such as lack of labels and multimodal feature representation. To address these issues, we design DrugCLIP, a cutting-edge contrastive learning method, to learn drug and disease's interaction without negative labels. Additionally, we have curated a drug repurposing dataset based on real-world clinical trial records. Thorough empirical studies are conducted to validate the effectiveness of the proposed DrugCLIP method. △ Less

Submitted 2 July, 2024; originally announced July 2024.

arXiv:2407.02252 [pdf, other]

GlyphDraw2: Automatic Generation of Complex Glyph Posters with Diffusion Models and Large Language Models

Authors: Jian Ma, Yonglin Deng, Chen Chen, Haonan Lu, Zhenyu Yang

Abstract: Posters play a crucial role in marketing and advertising, contributing significantly to industrial design by enhancing visual communication and brand visibility. With recent advances in controllable text-to-image diffusion models, more concise research is now focusing on rendering text within synthetic images. Despite improvements in text rendering accuracy, the field of end-to-end poster generati… ▽ More Posters play a crucial role in marketing and advertising, contributing significantly to industrial design by enhancing visual communication and brand visibility. With recent advances in controllable text-to-image diffusion models, more concise research is now focusing on rendering text within synthetic images. Despite improvements in text rendering accuracy, the field of end-to-end poster generation remains underexplored. This complex task involves striking a balance between text rendering accuracy and automated layout to produce high-resolution images with variable aspect ratios. To tackle this challenge, we propose an end-to-end text rendering framework employing a triple cross-attention mechanism rooted in align learning, designed to create precise poster text within detailed contextual backgrounds. Additionally, we introduce a high-resolution dataset that exceeds 1024 pixels in image resolution. Our approach leverages the SDXL architecture. Extensive experiments validate the ability of our method to generate poster images featuring intricate and contextually rich backgrounds. Codes will be available at https://github.com/OPPO-Mente-Lab/GlyphDraw2. △ Less

Submitted 2 July, 2024; originally announced July 2024.

arXiv:2407.02081 [pdf, other]

On the Performance and Memory Footprint of Distributed Training: An Empirical Study on Transformers

Authors: Zhengxian Lu, Fangyu Wang, Zhiwei Xu, Fei Yang, Tao Li

Abstract: Transformer models have emerged as potent solutions to a wide array of multidisciplinary challenges. The deployment of Transformer architectures is significantly hindered by their extensive computational and memory requirements, necessitating the reliance on advanced efficient distributed training methodologies. Prior research has delved into the performance bottlenecks associated with distributed… ▽ More Transformer models have emerged as potent solutions to a wide array of multidisciplinary challenges. The deployment of Transformer architectures is significantly hindered by their extensive computational and memory requirements, necessitating the reliance on advanced efficient distributed training methodologies. Prior research has delved into the performance bottlenecks associated with distributed training, aiming to unravel these bottlenecks and suggest optimization directions. However, such analyses often overlook three aspects unique to Transformer models: the specialized architecture, the dependency on various distributed strategies, and the requirement to balance computational and memory overhead. This paper aims to bridge this gap by offering a comprehensive examination of the performance bottlenecks inherent in distributed training of Transformer models, leveraging both theoretical analysis and empirical investigation. We propose an analytical framework tailored to these unique aspects of Transformers, facilitating a holistic evaluation of model architectures, distributed strategies, and resource consumption. Based on this analytical framework, we conduct a comparative analysis of theoretical performances and further systematically explore how various distributed training strategies fare in real-world scenarios. Most of the experimental results can be well explained by the analytical outcomes derived from the analytical framework. Notably, our findings suggest an advantage of pipeline parallelism over data parallelism for Transformer models. Moreover, we shed light on some unexpected outcomes, such as the potential for increased total memory overhead due to suboptimal model partitioning within pipeline parallelism. Additionally, we underscore the significance of communication block size and waiting time to further enhance performance. △ Less

Submitted 2 July, 2024; originally announced July 2024.

arXiv:2407.02031 [pdf, other]

SwiftDiffusion: Efficient Diffusion Model Serving with Add-on Modules

Authors: Suyi Li, Lingyun Yang, Xiaoxiao Jiang, Hanfeng Lu, Zhipeng Di, Weiyi Lu, Jiawei Chen, Kan Liu, Yinghao Yu, Tao Lan, Guodong Yang, Lin Qu, Li** Zhang, Wei Wang

Abstract: This paper documents our characterization study and practices for serving text-to-image requests with stable diffusion models in production. We first comprehensively analyze inference request traces for commercial text-to-image applications. It commences with our observation that add-on modules, i.e., ControlNets and LoRAs, that augment the base stable diffusion models, are ubiquitous in generatin… ▽ More This paper documents our characterization study and practices for serving text-to-image requests with stable diffusion models in production. We first comprehensively analyze inference request traces for commercial text-to-image applications. It commences with our observation that add-on modules, i.e., ControlNets and LoRAs, that augment the base stable diffusion models, are ubiquitous in generating images for commercial applications. Despite their efficacy, these add-on modules incur high loading overhead, prolong the serving latency, and swallow up expensive GPU resources. Driven by our characterization study, we present SwiftDiffusion, a system that efficiently generates high-quality images using stable diffusion models and add-on modules. To achieve this, SwiftDiffusion reconstructs the existing text-to-image serving workflow by identifying the opportunities for parallel computation and distributing ControlNet computations across multiple GPUs. Further, SwiftDiffusion thoroughly analyzes the dynamics of image generation and develops techniques to eliminate the overhead associated with LoRA loading and patching while preserving the image quality. Last, SwiftDiffusion proposes specialized optimizations in the backbone architecture of the stable diffusion models, which are also compatible with the efficient serving of add-on modules. Compared to state-of-the-art text-to-image serving systems, SwiftDiffusion reduces serving latency by up to 5x and improves serving throughput by up to 2x without compromising image quality. △ Less

Submitted 2 July, 2024; originally announced July 2024.

arXiv:2407.02015 [pdf, other]

Robust First and Second-Order Differentiation for Regularized Optimal Transport

Authors: Xingjie Li, Fei Lu, Molei Tao, Felix X. -F. Ye

Abstract: Applications such as unbalanced and fully shuffled regression can be approached by optimizing regularized optimal transport (OT) distances, such as the entropic OT and Sinkhorn distances. A common approach for this optimization is to use a first-order optimizer, which requires the gradient of the OT distance. For faster convergence, one might also resort to a second-order optimizer, which addition… ▽ More Applications such as unbalanced and fully shuffled regression can be approached by optimizing regularized optimal transport (OT) distances, such as the entropic OT and Sinkhorn distances. A common approach for this optimization is to use a first-order optimizer, which requires the gradient of the OT distance. For faster convergence, one might also resort to a second-order optimizer, which additionally requires the Hessian. The computations of these derivatives are crucial for efficient and accurate optimization. However, they present significant challenges in terms of memory consumption and numerical instability, especially for large datasets and small regularization strengths. We circumvent these issues by analytically computing the gradients for OT distances and the Hessian for the entropic OT distance, which was not previously used due to intricate tensor-wise calculations and the complex dependency on parameters within the bi-level loss function. Through analytical derivation and spectral analysis, we identify and resolve the numerical instability caused by the singularity and ill-posedness of a key linear system. Consequently, we achieve scalable and stable computation of the Hessian, enabling the implementation of the stochastic gradient descent (SGD)-Newton methods. Tests on shuffled regression examples demonstrate that the second stage of the SGD-Newton method converges orders of magnitude faster than the gradient descent-only method while achieving significantly more accurate parameter estimations. △ Less

Submitted 2 July, 2024; originally announced July 2024.

MSC Class: 68Q25; 68R10; 68U05

arXiv:2407.01993 [pdf, ps, other]

Analysis of short range interactions between $u/d$ quarks in the $NN$, $D_{03}$, and $D_{30}$ systems

Authors: Qi-Fang Lü, Yu-Bing Dong, Peng-Nian Shen, Zong-Ye Zhang

Abstract: The dynamic mechanism of short range interaction between $u/d$ quarks is still an open and challenging problem. In order to reveal this quark dynamics, we perform a systematic analysis of $NN$, $D_{03}$, and $D_{30}$ systems in the (extended) chiral SU(3) constituent quark models. By comparing results calculated with different models and different parameter sets, the effects of one gluon exchange… ▽ More The dynamic mechanism of short range interaction between $u/d$ quarks is still an open and challenging problem. In order to reveal this quark dynamics, we perform a systematic analysis of $NN$, $D_{03}$, and $D_{30}$ systems in the (extended) chiral SU(3) constituent quark models. By comparing results calculated with different models and different parameter sets, the effects of one gluon exchange and vector meson exchange terms are carefully examined. The results indicate that the vector meson exchange interactions dominate the short range interactions between $u/d$ quarks, while the small residual one gluon exchange coupling strength is also allowed. △ Less

Submitted 2 July, 2024; originally announced July 2024.

Comments: 8 pages, 2 figures, comments and suggestions are welcome

arXiv:2407.01976 [pdf, other]

A Bounding Box is Worth One Token: Interleaving Layout and Text in a Large Language Model for Document Understanding

Authors: **ghui Lu, Haiyang Yu, Yanjie Wang, Yongjie Ye, **gqun Tang, Ziwei Yang, Binghong Wu, Qi Liu, Hao Feng, Han Wang, Hao Liu, Can Huang

Abstract: Recently, many studies have demonstrated that exclusively incorporating OCR-derived text and spatial layouts with large language models (LLMs) can be highly effective for document understanding tasks. However, existing methods that integrate spatial layouts with text have limitations, such as producing overly long text sequences or failing to fully leverage the autoregressive traits of LLMs. In th… ▽ More Recently, many studies have demonstrated that exclusively incorporating OCR-derived text and spatial layouts with large language models (LLMs) can be highly effective for document understanding tasks. However, existing methods that integrate spatial layouts with text have limitations, such as producing overly long text sequences or failing to fully leverage the autoregressive traits of LLMs. In this work, we introduce Interleaving Layout and Text in a Large Language Model (LayTextLLM)} for document understanding. In particular, LayTextLLM projects each bounding box to a single embedding and interleaves it with text, efficiently avoiding long sequence issues while leveraging autoregressive traits of LLMs. LayTextLLM not only streamlines the interaction of layout and textual data but also shows enhanced performance in Key Information Extraction (KIE) and Visual Question Answering (VQA). Comprehensive benchmark evaluations reveal significant improvements, with a 27.0% increase on KIE tasks and 24.1% on VQA tasks compared to previous state-of-the-art document understanding MLLMs, as well as a 15.5% improvement over other SOTA OCR-based LLMs on KIE tasks. △ Less

Submitted 2 July, 2024; originally announced July 2024.

arXiv:2407.01942 [pdf, other]

Certainly Uncertain: A Benchmark and Metric for Multimodal Epistemic and Aleatoric Awareness

Authors: Khyathi Raghavi Chandu, Linjie Li, Anas Awadalla, Ximing Lu, Jae Sung Park, Jack Hessel, Lijuan Wang, Ye** Choi

Abstract: The ability to acknowledge the inevitable uncertainty in their knowledge and reasoning is a prerequisite for AI systems to be truly truthful and reliable. In this paper, we present a taxonomy of uncertainty specific to vision-language AI systems, distinguishing between epistemic uncertainty (arising from a lack of information) and aleatoric uncertainty (due to inherent unpredictability), and furth… ▽ More The ability to acknowledge the inevitable uncertainty in their knowledge and reasoning is a prerequisite for AI systems to be truly truthful and reliable. In this paper, we present a taxonomy of uncertainty specific to vision-language AI systems, distinguishing between epistemic uncertainty (arising from a lack of information) and aleatoric uncertainty (due to inherent unpredictability), and further explore finer categories within. Based on this taxonomy, we synthesize a benchmark dataset, CertainlyUncertain, featuring 178K visual question answering (VQA) samples as contrastive pairs. This is achieved by 1) inpainting images to make previously answerable questions into unanswerable ones; and 2) using image captions to prompt large language models for both answerable and unanswerable questions. Additionally, we introduce a new metric confidence-weighted accuracy, that is well correlated with both accuracy and calibration error, to address the shortcomings of existing metrics. △ Less

Submitted 2 July, 2024; originally announced July 2024.

Comments: 26 pages

arXiv:2407.01902 [pdf, other]

SoP: Unlock the Power of Social Facilitation for Automatic Jailbreak Attack

Authors: Yan Yang, Zeguan Xiao, Xin Lu, Hongru Wang, Hailiang Huang, Guanhua Chen, Yun Chen

Abstract: The widespread applications of large language models (LLMs) have brought about concerns regarding their potential misuse. Although aligned with human preference data before release, LLMs remain vulnerable to various malicious attacks. In this paper, we adopt a red-teaming strategy to enhance LLM safety and introduce SoP, a simple yet effective framework to design jailbreak prompts automatically. I… ▽ More The widespread applications of large language models (LLMs) have brought about concerns regarding their potential misuse. Although aligned with human preference data before release, LLMs remain vulnerable to various malicious attacks. In this paper, we adopt a red-teaming strategy to enhance LLM safety and introduce SoP, a simple yet effective framework to design jailbreak prompts automatically. Inspired by the social facilitation concept, SoP generates and optimizes multiple jailbreak characters to bypass the guardrails of the target LLM. Different from previous work which relies on proprietary LLMs or seed jailbreak templates crafted by human expertise, SoP can generate and optimize the jailbreak prompt in a cold-start scenario using open-sourced LLMs without any seed jailbreak templates. Experimental results show that SoP achieves attack success rates of 88% and 60% in bypassing the safety alignment of GPT-3.5-1106 and GPT-4, respectively. Furthermore, we extensively evaluate the transferability of the generated templates across different LLMs and held-out malicious requests, while also exploring defense strategies against the jailbreak attack designed by SoP. Code is available at https://github.com/Yang-Yan-Yang-Yan/SoP. △ Less

Submitted 1 July, 2024; originally announced July 2024.

arXiv:2407.01885 [pdf, other]

Survey on Knowledge Distillation for Large Language Models: Methods, Evaluation, and Application

Authors: Chuanpeng Yang, Wang Lu, Yao Zhu, Yidong Wang, Qian Chen, Chenlong Gao, Bingjie Yan, Yiqiang Chen

Abstract: Large Language Models (LLMs) have showcased exceptional capabilities in various domains, attracting significant interest from both academia and industry. Despite their impressive performance, the substantial size and computational demands of LLMs pose considerable challenges for practical deployment, particularly in environments with limited resources. The endeavor to compress language models whil… ▽ More Large Language Models (LLMs) have showcased exceptional capabilities in various domains, attracting significant interest from both academia and industry. Despite their impressive performance, the substantial size and computational demands of LLMs pose considerable challenges for practical deployment, particularly in environments with limited resources. The endeavor to compress language models while maintaining their accuracy has become a focal point of research. Among the various methods, knowledge distillation has emerged as an effective technique to enhance inference speed without greatly compromising performance. This paper presents a thorough survey from three aspects: method, evaluation, and application, exploring knowledge distillation techniques tailored specifically for LLMs. Specifically, we divide the methods into white-box KD and black-box KD to better illustrate their differences. Furthermore, we also explored the evaluation tasks and distillation effects between different distillation methods, and proposed directions for future research. Through in-depth understanding of the latest advancements and practical applications, this survey provides valuable resources for researchers, paving the way for sustained progress in this field. △ Less

Submitted 1 July, 2024; originally announced July 2024.

Comments: 28 pages

arXiv:2407.01595 [pdf, other]

Fairpriori: Improving Biased Subgroup Discovery for Deep Neural Network Fairness

Authors: Kacy Zhou, Jiawen Wen, Nan Yang, Dong Yuan, Qinghua Lu, Huaming Chen

Abstract: While deep learning has become a core functional module of most software systems, concerns regarding the fairness of ML predictions have emerged as a significant issue that affects prediction results due to discrimination. Intersectional bias, which disproportionately affects members of subgroups, is a prime example of this. For instance, a machine learning model might exhibit bias against darker-… ▽ More While deep learning has become a core functional module of most software systems, concerns regarding the fairness of ML predictions have emerged as a significant issue that affects prediction results due to discrimination. Intersectional bias, which disproportionately affects members of subgroups, is a prime example of this. For instance, a machine learning model might exhibit bias against darker-skinned women, while not showing bias against individuals with darker skin or women. This problem calls for effective fairness testing before the deployment of such deep learning models in real-world scenarios. However, research into detecting such bias is currently limited compared to research on individual and group fairness. Existing tools to investigate intersectional bias lack important features such as support for multiple fairness metrics, fast and efficient computation, and user-friendly interpretation. This paper introduces Fairpriori, a novel biased subgroup discovery method, which aims to address these limitations. Fairpriori incorporates the frequent itemset generation algorithm to facilitate effective and efficient investigation of intersectional bias by producing fast fairness metric calculations on subgroups of a dataset. Through comparison with the state-of-the-art methods (e.g., Themis, FairFictPlay, and TestSGD) under similar conditions, Fairpriori demonstrates superior effectiveness and efficiency when identifying intersectional bias. Specifically, Fairpriori is easier to use and interpret, supports a wider range of use cases by accommodating multiple fairness metrics, and exhibits higher efficiency in computing fairness metrics. These findings showcase Fairpriori's potential for effectively uncovering subgroups affected by intersectional bias, supported by its open-source tooling at https://anonymous.4open.science/r/Fairpriori-0320. △ Less

Submitted 24 June, 2024; originally announced July 2024.

Comments: 11 pages

arXiv:2407.01523 [pdf, other]

MMLongBench-Doc: Benchmarking Long-context Document Understanding with Visualizations

Authors: Yubo Ma, Yuhang Zang, Liangyu Chen, Meiqi Chen, Yizhu Jiao, Xinze Li, Xinyuan Lu, Ziyu Liu, Yan Ma, Xiaoyi Dong, Pan Zhang, Liangming Pan, Yu-Gang Jiang, Jiaqi Wang, Yixin Cao, Aixin Sun

Abstract: Understanding documents with rich layouts and multi-modal components is a long-standing and practical task. Recent Large Vision-Language Models (LVLMs) have made remarkable strides in various tasks, particularly in single-page document understanding (DU). However, their abilities on long-context DU remain an open problem. This work presents MMLongBench-Doc, a long-context, multi-modal benchmark co… ▽ More Understanding documents with rich layouts and multi-modal components is a long-standing and practical task. Recent Large Vision-Language Models (LVLMs) have made remarkable strides in various tasks, particularly in single-page document understanding (DU). However, their abilities on long-context DU remain an open problem. This work presents MMLongBench-Doc, a long-context, multi-modal benchmark comprising 1,062 expert-annotated questions. Distinct from previous datasets, it is constructed upon 130 lengthy PDF-formatted documents with an average of 49.4 pages and 20,971 textual tokens. Towards comprehensive evaluation, answers to these questions rely on pieces of evidence from (1) different sources (text, image, chart, table, and layout structure) and (2) various locations (i.e. page number). Moreover, 33.2% of the questions are cross-page questions requiring evidence across multiple pages. 22.8% of the questions are designed to be unanswerable for detecting potential hallucinations. Experiments on 14 LVLMs demonstrate that long-context DU greatly challenges current models. Notably, the best-performing model, GPT-4o, achieves an F1 score of only 42.7%, while the second-best, GPT-4V, scores 31.4%. Furthermore, 12 LVLMs (all except GPT-4o and GPT-4V) even present worse performance than their LLM counterparts which are fed with lossy-parsed OCR documents. These results validate the necessity of future research toward more capable long-context LVLMs. Project Page: https://mayubo2333.github.io/MMLongBench-Doc △ Less

Submitted 1 July, 2024; originally announced July 2024.

arXiv:2407.01491 [pdf, other]

Expressive and Generalizable Low-rank Adaptation for Large Models via Slow Cascaded Learning

Authors: Siwei Li, Yifan Yang, Yifei Shen, Fangyun Wei, Zongqing Lu, Lili Qiu, Yuqing Yang

Abstract: Efficient fine-tuning plays a fundamental role in modern large models, with low-rank adaptation emerging as a particularly promising approach. However, the existing variants of LoRA are hampered by limited expressiveness, a tendency to overfit, and sensitivity to hyperparameter settings. This paper presents LoRA Slow Cascade Learning (LoRASC), an innovative technique designed to enhance LoRA's exp… ▽ More Efficient fine-tuning plays a fundamental role in modern large models, with low-rank adaptation emerging as a particularly promising approach. However, the existing variants of LoRA are hampered by limited expressiveness, a tendency to overfit, and sensitivity to hyperparameter settings. This paper presents LoRA Slow Cascade Learning (LoRASC), an innovative technique designed to enhance LoRA's expressiveness and generalization capabilities while preserving its training efficiency. Our approach augments expressiveness through a cascaded learning strategy that enables a mixture-of-low-rank adaptation, thereby increasing the model's ability to capture complex patterns. Additionally, we introduce a slow-fast update mechanism and cascading noisy tuning to bolster generalization. The extensive experiments on various language and vision datasets, as well as robustness benchmarks, demonstrate that the proposed method not only significantly outperforms existing baselines, but also mitigates overfitting, enhances model stability, and improves OOD robustness. Code will be release in https://github.com/microsoft/LoRASC very soon. △ Less

Submitted 1 July, 2024; originally announced July 2024.

arXiv:2407.01455 [pdf, other]

TimeToM: Temporal Space is the Key to Unlocking the Door of Large Language Models' Theory-of-Mind

Authors: Guiyang Hou, Wenqi Zhang, Yongliang Shen, Linjuan Wu, Weiming Lu

Abstract: Theory of Mind (ToM)-the cognitive ability to reason about mental states of ourselves and others, is the foundation of social interaction. Although ToM comes naturally to humans, it poses a significant challenge to even the most advanced Large Language Models (LLMs). Due to the complex logical chains in ToM reasoning, especially in higher-order ToM questions, simply utilizing reasoning methods lik… ▽ More Theory of Mind (ToM)-the cognitive ability to reason about mental states of ourselves and others, is the foundation of social interaction. Although ToM comes naturally to humans, it poses a significant challenge to even the most advanced Large Language Models (LLMs). Due to the complex logical chains in ToM reasoning, especially in higher-order ToM questions, simply utilizing reasoning methods like Chain of Thought (CoT) will not improve the ToM capabilities of LLMs. We present TimeToM, which constructs a temporal space and uses it as the foundation to improve the ToM capabilities of LLMs in multiple scenarios. Specifically, within the temporal space, we construct Temporal Belief State Chain (TBSC) for each character and inspired by the cognition perspective of the social world model, we divide TBSC into self-world beliefs and social world beliefs, aligning with first-order ToM (first-order beliefs) and higher-order ToM (higher-order beliefs) questions, respectively. Moreover, we design a novel tool-belief solver that, by considering belief communication between characters in temporal space, can transform a character's higher-order beliefs into another character's first-order beliefs under belief communication period. Experimental results indicate that TimeToM can dramatically improve the reasoning performance of LLMs on ToM questions while taking a big step towards coherent and robust ToM reasoning. △ Less

Submitted 1 July, 2024; originally announced July 2024.

Comments: 16 pages, 6 figures, ACL 2024(findings)

arXiv:2407.01351 [pdf, other]

Probing the connection between IceCube neutrinos and MOJAVE AGN

Authors: R. Abbasi, M. Ackermann, J. Adams, S. K. Agarwalla, J. A. Aguilar, M. Ahlers, J. M. Alameddine, N. M. Amin, K. Andeen, C. Argüelles, Y. Ashida, S. Athanasiadou, L. Ausborm, S. N. Axani, X. Bai, A. Balagopal V., M. Baricevic, S. W. Barwick, S. Bash, V. Basu, R. Bay, J. J. Beatty, J. Becker Tjus, J. Beise, C. Bellenghi , et al. (399 additional authors not shown)

Abstract: Active Galactic Nuclei (AGN) are prime candidate sources of the high-energy, astrophysical neutrinos detected by IceCube. This is demonstrated by the real-time multi-messenger detection of the blazar TXS 0506+056 and the recent evidence of neutrino emission from NGC 1068 from a separate time-averaged study. However, the production mechanism of the astrophysical neutrinos in AGN is not well establi… ▽ More Active Galactic Nuclei (AGN) are prime candidate sources of the high-energy, astrophysical neutrinos detected by IceCube. This is demonstrated by the real-time multi-messenger detection of the blazar TXS 0506+056 and the recent evidence of neutrino emission from NGC 1068 from a separate time-averaged study. However, the production mechanism of the astrophysical neutrinos in AGN is not well established which can be resolved via correlation studies with photon observations. For neutrinos produced due to photohadronic interactions in AGN, in addition to a correlation of neutrinos with high-energy photons, there would also be a correlation of neutrinos with photons emitted at radio wavelengths. In this work, we perform an in-depth stacking study of the correlation between 15 GHz radio observations of AGN reported in the MOJAVE XV catalog, and ten years of neutrino data from IceCube. We also use a time-dependent approach which improves the statistical power of the stacking analysis. No significant correlation was found for both analyses and upper limits are reported. When compared to the IceCube diffuse flux, at 100 TeV and for a spectral index of 2.5, the upper limits derived are $\sim3\%$ and $\sim9\%$ for the time-averaged and time-dependent case, respectively. △ Less

Submitted 1 July, 2024; originally announced July 2024.

Comments: 14 Pages 7 Figures

arXiv:2407.01314 [pdf, other]

Search for a light sterile neutrino with 7.5 years of IceCube DeepCore data

Authors: R. Abbasi, M. Ackermann, J. Adams, S. K. Agarwalla, J. A. Aguilar, M. Ahlers, J. M. Alameddine, N. M. Amin, K. Andeen, C. Argüelles, Y. Ashida, S. Athanasiadou, L. Ausborm, S. N. Axani, X. Bai, A. Balagopal V., M. Baricevic, S. W. Barwick, S. Bash, V. Basu, R. Bay, J. J. Beatty, J. Becker Tjus, J. Beise, C. Bellenghi , et al. (399 additional authors not shown)

Abstract: We present a search for an eV-scale sterile neutrino using 7.5 years of data from the IceCube DeepCore detector. The analysis uses a sample of 21,914 events with energies between 5 and 150 GeV to search for sterile neutrinos through atmospheric muon neutrino disappearance. Improvements in event selection and treatment of systematic uncertainties provide greater statistical power compared to previo… ▽ More We present a search for an eV-scale sterile neutrino using 7.5 years of data from the IceCube DeepCore detector. The analysis uses a sample of 21,914 events with energies between 5 and 150 GeV to search for sterile neutrinos through atmospheric muon neutrino disappearance. Improvements in event selection and treatment of systematic uncertainties provide greater statistical power compared to previous DeepCore sterile neutrino searches. Our results are compatible with the absence of mixing between active and sterile neutrino states, and we place constraints on the mixing matrix elements $|U_{μ4}|^2 < 0.0534$ and $|U_{τ4}|^2 < 0.0574$ at 90% CL under the assumption that $Δm^2_{41}\geq 1\;\mathrm{eV^2}$. These null results add to the growing tension between anomalous appearance results and constraints from disappearance searches in the 3+1 sterile neutrino landscape. △ Less

Submitted 1 July, 2024; originally announced July 2024.

Comments: 11 pages, 5 figures. To be submitted to Physical Review D

arXiv:2407.01271 [pdf, other]

First Place Solution of 2023 Global Artificial Intelligence Technology Innovation Competition Track 1

Authors: Xiangyu Wu, Hailiang Zhang, Yang Yang, Jianfeng Lu

Abstract: In this paper, we present our champion solution to the Global Artificial Intelligence Technology Innovation Competition Track 1: Medical Imaging Diagnosis Report Generation. We select CPT-BASE as our base model for the text generation task. During the pre-training stage, we delete the mask language modeling task of CPT-BASE and instead reconstruct the vocabulary, adopting a span mask strategy and… ▽ More In this paper, we present our champion solution to the Global Artificial Intelligence Technology Innovation Competition Track 1: Medical Imaging Diagnosis Report Generation. We select CPT-BASE as our base model for the text generation task. During the pre-training stage, we delete the mask language modeling task of CPT-BASE and instead reconstruct the vocabulary, adopting a span mask strategy and gradually increasing the number of masking ratios to perform the denoising auto-encoder pre-training task. In the fine-tuning stage, we design iterative retrieval augmentation and noise-aware similarity bucket prompt strategies. The retrieval augmentation constructs a mini-knowledge base, enriching the input information of the model, while the similarity bucket further perceives the noise information within the mini-knowledge base, guiding the model to generate higher-quality diagnostic reports based on the similarity prompts. Surprisingly, our single model has achieved a score of 2.321 on leaderboard A, and the multiple model fusion scores are 2.362 and 2.320 on the A and B leaderboards respectively, securing first place in the rankings. △ Less

Submitted 1 July, 2024; originally announced July 2024.

Comments: First Place of 2023 Global Artificial Intelligence Technology Innovation Competition

arXiv:2407.01196 [pdf, other]

Implementation of a scalable universal two-qubit quantum processor with electron and nuclear spins in a trapped ion

Authors: Ji Bian, Teng Liu, Qifeng Lao, Min Ding, Huiyi Zhang, Xinxin Rao, Pengfei Lu, Le Luo

Abstract: Increasing the quantum information processing power with limited number of hosts is vital for achieving quantum advantage. Here we propose a novel scheme that achieves a scalable n-ion-2n-qubit quantum processor utilizing four internal levels of each ion, and experimentally implement a 1-ion-2-qubit universal processor using the valence electron spin and nuclear spin of a single 171Yb+ ion. Fideli… ▽ More Increasing the quantum information processing power with limited number of hosts is vital for achieving quantum advantage. Here we propose a novel scheme that achieves a scalable n-ion-2n-qubit quantum processor utilizing four internal levels of each ion, and experimentally implement a 1-ion-2-qubit universal processor using the valence electron spin and nuclear spin of a single 171Yb+ ion. Fidelities of single-qubit and two-qubit gates are around 0.98 obtained by quantum process tomography. Additionally, the Grover's algorithm is implemented with a successful rate exceeding 0.99. We provide explicit scaling-up protocols based on standard laser-less and laser-based frameworks, and further demonstrate that the electron/nuclear-spin scheme allows less demanding two-qubit entangling gates between different ions. The replacement of some inter-atomic gates by intra-atomic gates could increase the fidelity of some quantum circuits. Our work paves the way towards achieving 2n-times increase in the size of quantum computational Hilbert space with n ions. △ Less

Submitted 1 July, 2024; originally announced July 2024.

arXiv:2407.01097 [pdf, other]

HGNET: A Hierarchical Feature Guided Network for Occupancy Flow Field Prediction

Authors: Zhan Chen, Chen Tang, Lu Xiong

Abstract: Predicting the motion of multiple traffic participants has always been one of the most challenging tasks in autonomous driving. The recently proposed occupancy flow field prediction method has shown to be a more effective and scalable representation compared to general trajectory prediction methods. However, in complex multi-agent traffic scenarios, it remains difficult to model the interactions a… ▽ More Predicting the motion of multiple traffic participants has always been one of the most challenging tasks in autonomous driving. The recently proposed occupancy flow field prediction method has shown to be a more effective and scalable representation compared to general trajectory prediction methods. However, in complex multi-agent traffic scenarios, it remains difficult to model the interactions among various factors and the dependencies among prediction outputs at different time steps. In view of this, we propose a transformer-based hierarchical feature guided network (HGNET), which can efficiently extract features of agents and map information from visual and vectorized inputs, modeling multimodal interaction relationships. Second, we design the Feature-Guided Attention (FGAT) module to leverage the potential guiding effects between different prediction targets, thereby improving prediction accuracy. Additionally, to enhance the temporal consistency and causal relationships of the predictions, we propose a Time Series Memory framework to learn the conditional distribution models of the prediction outputs at future time steps from multivariate time series. The results demonstrate that our model exhibits competitive performance, which ranks 3rd in the 2024 Waymo Occupancy and Flow Prediction Challenge. △ Less

Submitted 1 July, 2024; originally announced July 2024.

arXiv:2407.01094 [pdf, other]

Evaluation of Text-to-Video Generation Models: A Dynamics Perspective

Authors: Mingxiang Liao, Hannan Lu, Xinyu Zhang, Fang Wan, Tianyu Wang, Yuzhong Zhao, Wangmeng Zuo, Qixiang Ye, **gdong Wang

Abstract: Comprehensive and constructive evaluation protocols play an important role in the development of sophisticated text-to-video (T2V) generation models. Existing evaluation protocols primarily focus on temporal consistency and content continuity, yet largely ignore the dynamics of video content. Dynamics are an essential dimension for measuring the visual vividness and the honesty of video content to… ▽ More Comprehensive and constructive evaluation protocols play an important role in the development of sophisticated text-to-video (T2V) generation models. Existing evaluation protocols primarily focus on temporal consistency and content continuity, yet largely ignore the dynamics of video content. Dynamics are an essential dimension for measuring the visual vividness and the honesty of video content to text prompts. In this study, we propose an effective evaluation protocol, termed DEVIL, which centers on the dynamics dimension to evaluate T2V models. For this purpose, we establish a new benchmark comprising text prompts that fully reflect multiple dynamics grades, and define a set of dynamics scores corresponding to various temporal granularities to comprehensively evaluate the dynamics of each generated video. Based on the new benchmark and the dynamics scores, we assess T2V models with the design of three metrics: dynamics range, dynamics controllability, and dynamics-based quality. Experiments show that DEVIL achieves a Pearson correlation exceeding 90% with human ratings, demonstrating its potential to advance T2V generation models. Code is available at https://github.com/MingXiangL/DEVIL. △ Less

Submitted 1 July, 2024; originally announced July 2024.

arXiv:2407.01093 [pdf, other]

IBSEN: Director-Actor Agent Collaboration for Controllable and Interactive Drama Script Generation

Authors: Senyu Han, Lu Chen, Li-Min Lin, Zhengshan Xu, Kai Yu

Abstract: Large language models have demonstrated their capabilities in storyline creation and human-like character role-playing. Current language model agents mainly focus on reasonable behaviors from the level of individuals, and their behaviors might be hard to constraint on the level of the whole storyline. In this paper we introduce IBSEN, a director-actor coordinate agent framework that generates dram… ▽ More Large language models have demonstrated their capabilities in storyline creation and human-like character role-playing. Current language model agents mainly focus on reasonable behaviors from the level of individuals, and their behaviors might be hard to constraint on the level of the whole storyline. In this paper we introduce IBSEN, a director-actor coordinate agent framework that generates drama scripts and makes the plot played by agents more controllable. The director agent writes plot outlines that the user desires to see, instructs the actor agents to role-play their characters, and reschedules the plot when human players participate in the scenario to ensure the plot is progressing towards the objective. To evaluate the framework, we create a novel drama plot that involves several actor agents and check the interactions between them under the instruction of the director agent. Evaluation results show that our framework could generate complete, diverse drama scripts from only a rough outline of plot objectives, meanwhile maintaining the characteristics of characters in the drama. Our codes and prompts are available at https://github.com/OpenDFM/ibsen. △ Less

Submitted 1 July, 2024; originally announced July 2024.

Comments: Accepted by ACL 2024 Main

arXiv:2407.00994 [pdf, other]

LLM Uncertainty Quantification through Directional Entailment Graph and Claim Level Response Augmentation

Authors: Longchao Da, Tie** Chen, Lu Cheng, Hua Wei

Abstract: The Large language models (LLMs) have showcased superior capabilities in sophisticated tasks across various domains, stemming from basic question-answer (QA), they are nowadays used as decision assistants or explainers for unfamiliar content. However, they are not always correct due to the data sparsity in specific domain corpus, or the model's hallucination problems. Given this, how much should w… ▽ More The Large language models (LLMs) have showcased superior capabilities in sophisticated tasks across various domains, stemming from basic question-answer (QA), they are nowadays used as decision assistants or explainers for unfamiliar content. However, they are not always correct due to the data sparsity in specific domain corpus, or the model's hallucination problems. Given this, how much should we trust the responses from LLMs? This paper presents a novel way to evaluate the uncertainty that captures the directional instability, by constructing a directional graph from entailment probabilities, and we innovatively conduct Random Walk Laplacian given the asymmetric property of a constructed directed graph, then the uncertainty is aggregated by the derived eigenvalues from the Laplacian process. We also provide a way to incorporate the existing work's semantics uncertainty with our proposed layer. Besides, this paper identifies the vagueness issues in the raw response set and proposes an augmentation approach to mitigate such a problem, we conducted extensive empirical experiments and demonstrated the superiority of our proposed solutions. △ Less

Submitted 1 July, 2024; originally announced July 2024.

Comments: 11 pages main content, 5 pages appendix

ACM Class: I.2.7

arXiv:2407.00903 [pdf, other]

Observation of topological transitions associated with a Weyl exceptional ring

Authors: Hao-Long Zhang, Pei-Rong Han, Xue-Jia Yu, Shou-Bang Yang, Jia-Hao Lü, Wen Ning, Fan Wu, Qi-** Su, Chui-** Yang, Zhen-Biao Yang, Shi-Biao Zheng

Abstract: The environment-induced dissipation of an open system, once thought as a nuisance, can actually lead to emergence of many intriguing phenomena that are absent in an isolated system. Among these, Weyl exceptional rings (WER), extended from point-like singularities, are particularly interesting. Theoretically, a WER was predicted to carry a topological charge with a nonzero Chern number, but it has… ▽ More The environment-induced dissipation of an open system, once thought as a nuisance, can actually lead to emergence of many intriguing phenomena that are absent in an isolated system. Among these, Weyl exceptional rings (WER), extended from point-like singularities, are particularly interesting. Theoretically, a WER was predicted to carry a topological charge with a nonzero Chern number, but it has not been measured so far. We here investigate this topology in a circuit, where the WER is synthesized with a superconducting qubit controllably coupled to a decaying resonator. The high flexibility of the system enables us to characterize its eigenvectors on different manifolds of parameter space. We extract both the quantized Berry phase and Chern number from these eigenvectors. Furthermore, we demonstrate a topological transition triggered by shrinking the size of the manifold$-$a unique feature of the WER. △ Less

Submitted 30 June, 2024; originally announced July 2024.

Comments: 16 pages, 10 figures

arXiv:2407.00897 [pdf, other]

Multi-field quantum conferencing overcomes the network capacity limit

Authors: Yuan-Mei Xie, Yu-Shuo Lu, Yao Fu, Hua-Lei Yin, Zeng-Bing Chen

Abstract: Quantum conferencing enables multiple nodes within a quantum network to share a secure group key for private message broadcasting. The key rate, however, is limited by the repeaterless capacity to distribute multiparticle entangled states across the network. Currently, in the finite-size regime, no feasible schemes utilizing existing experimental techniques can overcome the fundamental rate-distan… ▽ More Quantum conferencing enables multiple nodes within a quantum network to share a secure group key for private message broadcasting. The key rate, however, is limited by the repeaterless capacity to distribute multiparticle entangled states across the network. Currently, in the finite-size regime, no feasible schemes utilizing existing experimental techniques can overcome the fundamental rate-distance limit of quantum conferencing in quantum networks without repeaters. Here, we propose a practical, multi-field scheme that breaks this limit, involving virtually establishing Greenberger-Horne-Zeilinger states through post-measurement coincidence matching. This proposal features a measurement-device-independent characteristic and can directly scale to support any number of users. Simulations show that the fundamental limitation on the group key rate can be overcome in a reasonable running time of sending $10^{14}$ pulses. We predict that it offers an efficient design for long-distance broadcast communication in future quantum networks. △ Less

Submitted 30 June, 2024; originally announced July 2024.

Comments: 20 pages, 6 figures

arXiv:2407.00782 [pdf, other]

Step-Controlled DPO: Leveraging Stepwise Error for Enhanced Mathematical Reasoning

Authors: Zimu Lu, Aojun Zhou, Ke Wang, Houxing Ren, Weikang Shi, Junting Pan, Mingjie Zhan, Hongsheng Li

Abstract: Direct Preference Optimization (DPO) has proven effective at improving the performance of large language models (LLMs) on downstream tasks such as reasoning and alignment. In this work, we propose Step-Controlled DPO (SCDPO), a method for automatically providing stepwise error supervision by creating negative samples of mathematical reasoning rationales that start making errors at a specified step… ▽ More Direct Preference Optimization (DPO) has proven effective at improving the performance of large language models (LLMs) on downstream tasks such as reasoning and alignment. In this work, we propose Step-Controlled DPO (SCDPO), a method for automatically providing stepwise error supervision by creating negative samples of mathematical reasoning rationales that start making errors at a specified step. By applying these samples in DPO training, SCDPO can better align the model to understand reasoning errors and output accurate reasoning steps. We apply SCDPO to both code-integrated and chain-of-thought solutions, empirically showing that it consistently improves the performance compared to naive DPO on three different SFT models, including one existing SFT model and two models we finetuned. Qualitative analysis of the credit assignment of SCDPO and DPO demonstrates the effectiveness of SCDPO at identifying errors in mathematical solutions. We then apply SCDPO to an InternLM2-20B model, resulting in a 20B model that achieves high scores of 88.5% on GSM8K and 58.1% on MATH, rivaling all other open-source LLMs, showing the great potential of our method. △ Less

Submitted 2 July, 2024; v1 submitted 30 June, 2024; originally announced July 2024.

arXiv:2407.00769 [pdf, other]

Achieving Energetic Superiority Through System-Level Quantum Circuit Simulation

Authors: Rong Fu, Zhongling Su, Han-Sen Zhong, Xiti Zhao, Jianyang Zhang, Feng Pan, Pan Zhang, Xianhe Zhao, Ming-Cheng Chen, Chao-Yang Lu, Jian-Wei Pan, Zhiling Pei, Xingcheng Zhang, Wanli Ouyang

Abstract: Quantum Computational Superiority boasts rapid computation and high energy efficiency. Despite recent advances in classical algorithms aimed at refuting the milestone claim of Google's sycamore, challenges remain in generating uncorrelated samples of random quantum circuits. In this paper, we present a groundbreaking large-scale system technology that leverages optimization on global, node, and de… ▽ More Quantum Computational Superiority boasts rapid computation and high energy efficiency. Despite recent advances in classical algorithms aimed at refuting the milestone claim of Google's sycamore, challenges remain in generating uncorrelated samples of random quantum circuits. In this paper, we present a groundbreaking large-scale system technology that leverages optimization on global, node, and device levels to achieve unprecedented scalability for tensor networks. This enables the handling of large-scale tensor networks with memory capacities reaching tens of terabytes, surpassing memory space constraints on a single node. Our techniques enable accommodating large-scale tensor networks with up to tens of terabytes of memory, reaching up to 2304 GPUs with a peak computing power of 561 PFLOPS half-precision. Notably, we have achieved a time-to-solution of 14.22 seconds with energy consumption of 2.39 kWh which achieved fidelity of 0.002 and our most remarkable result is a time-to-solution of 17.18 seconds, with energy consumption of only 0.29 kWh which achieved a XEB of 0.002 after post-processing, outperforming Google's quantum processor Sycamore in both speed and energy efficiency, which recorded 600 seconds and 4.3 kWh, respectively. △ Less

Submitted 30 June, 2024; originally announced July 2024.

arXiv:2407.00736 [pdf, other]

Quantum Circuit Synthesis and Compilation Optimization: Overview and Prospects

Authors: Yan Ge, Wu Wenjie, Chen Yuheng, Pan Kaisen, Lu Xudong, Zhou Zixiang, Wang Yuhan, Wang Ruocheng, Yan Junchi

Abstract: Quantum computing is regarded as a promising paradigm that may overcome the current computational power bottlenecks in the post-Moore era. The increasing maturity of quantum processors, especially superconducting ones, provides more possibilities for the development and implementation of quantum algorithms. As the crucial stages for quantum algorithm implementation, the logic circuit design and qu… ▽ More Quantum computing is regarded as a promising paradigm that may overcome the current computational power bottlenecks in the post-Moore era. The increasing maturity of quantum processors, especially superconducting ones, provides more possibilities for the development and implementation of quantum algorithms. As the crucial stages for quantum algorithm implementation, the logic circuit design and quantum compiling have also received significant attention, which covers key technologies such as quantum logic circuit synthesis (also widely known as quantum architecture search) and optimization, as well as qubit map** and routing. Recent studies suggest that the scale and precision of related algorithms are steadily increasing, especially with the integration of artificial intelligence methods. In this survey, we systematically review and summarize a vast body of literature, exploring the feasibility of an integrated design and optimization scheme that spans from the algorithmic level to quantum hardware, combining the steps of logic circuit design and compilation optimization. Leveraging the exceptional cognitive and learning capabilities of AI algorithms, one can reduce manual design costs, enhance the precision and efficiency of execution, and facilitate the implementation and validation of the superiority of quantum algorithms on hardware. △ Less

Submitted 30 June, 2024; originally announced July 2024.

Comments: 32 page, 3 figures, 3 tables

arXiv:2407.00735 [pdf, other]

Generative prediction of flow field based on the diffusion model

Authors: Jiajun Hu, Zhen Lu, Yue Yang

Abstract: We propose a geometry-to-flow diffusion model that utilizes the input of obstacle shape to predict a flow field past the obstacle. The model is based on a learnable Markov transition kernel to recover the data distribution from the Gaussian distribution. The Markov process is conditioned on the obstacle geometry, estimating the noise to be removed at each step, implemented via a U-Net. A cross-att… ▽ More We propose a geometry-to-flow diffusion model that utilizes the input of obstacle shape to predict a flow field past the obstacle. The model is based on a learnable Markov transition kernel to recover the data distribution from the Gaussian distribution. The Markov process is conditioned on the obstacle geometry, estimating the noise to be removed at each step, implemented via a U-Net. A cross-attention mechanism incorporates the geometry as a prompt. We train the geometry-to-flow diffusion model using a dataset of flows past simple obstacles, including the circle, ellipse, rectangle, and triangle. For comparison, the CNN model is trained using the same dataset. Tests are carried out on flows past obstacles with simple and complex geometries, representing interpolation and extrapolation on the geometry condition, respectively. In the test set, challenging scenarios include a cross and characters `PKU'. Generated flow fields show that the geometry-to-flow diffusion model is superior to the CNN model in predicting instantaneous flow fields and handling complex geometries. Quantitative analysis of the model accuracy and divergence in the fields demonstrate the high robustness of the diffusion model, indicating that the diffusion model learns physical laws implicitly. △ Less

Submitted 30 June, 2024; originally announced July 2024.

arXiv:2407.00731 [pdf, other]

Large Language Models Struggle in Token-Level Clinical Named Entity Recognition

Authors: Qiuhao Lu, Rui Li, Andrew Wen, **lian Wang, Liwei Wang, Hongfang Liu

Abstract: Large Language Models (LLMs) have revolutionized various sectors, including healthcare where they are employed in diverse applications. Their utility is particularly significant in the context of rare diseases, where data scarcity, complexity, and specificity pose considerable challenges. In the clinical domain, Named Entity Recognition (NER) stands out as an essential task and it plays a crucial… ▽ More Large Language Models (LLMs) have revolutionized various sectors, including healthcare where they are employed in diverse applications. Their utility is particularly significant in the context of rare diseases, where data scarcity, complexity, and specificity pose considerable challenges. In the clinical domain, Named Entity Recognition (NER) stands out as an essential task and it plays a crucial role in extracting relevant information from clinical texts. Despite the promise of LLMs, current research mostly concentrates on document-level NER, identifying entities in a more general context across entire documents, without extracting their precise location. Additionally, efforts have been directed towards adapting ChatGPT for token-level NER. However, there is a significant research gap when it comes to employing token-level NER for clinical texts, especially with the use of local open-source LLMs. This study aims to bridge this gap by investigating the effectiveness of both proprietary and local LLMs in token-level clinical NER. Essentially, we delve into the capabilities of these models through a series of experiments involving zero-shot prompting, few-shot prompting, retrieval-augmented generation (RAG), and instruction-fine-tuning. Our exploration reveals the inherent challenges LLMs face in token-level NER, particularly in the context of rare diseases, and suggests possible improvements for their application in healthcare. This research contributes to narrowing a significant gap in healthcare informatics and offers insights that could lead to a more refined application of LLMs in the healthcare sector. △ Less

Submitted 30 June, 2024; originally announced July 2024.

Comments: AMIA 2024 Annual Symposium Proceedings

arXiv:2407.00639 [pdf, other]

GRB 221009A/SN 2022xiw: A Supernova Obscured by a Gamma-Ray Burst Afterglow?

Authors: De-Feng Kong, Xiang-Gao Wang, WeiKang Zheng, Hou-Jun Lü, L. P. Xin, Da-Bin Lin, Jia-Xin Cao, Ming-Xuan Lu, B. Ren, Edgar P. Vidal, J. Y. Wei, En-Wei Liang, Alexei V. Filippenko

Abstract: We present optical photometry for the afterglow of GRB 221009A, in some respects the most extraordinary gamma-ray burst (GRB) ever observed. Good quality in the R-band light curve is obtained, covering 0.32-19.57 days since the Fermi-GBM trigger. We find that a weak bump emerges fromthe declining afterglow at $t \approx 11$ days; a supernova (SN) may be responsible. We use a smooth broken power-la… ▽ More We present optical photometry for the afterglow of GRB 221009A, in some respects the most extraordinary gamma-ray burst (GRB) ever observed. Good quality in the R-band light curve is obtained, covering 0.32-19.57 days since the Fermi-GBM trigger. We find that a weak bump emerges fromthe declining afterglow at $t \approx 11$ days; a supernova (SN) may be responsible. We use a smooth broken power-law and $^{56}\mathrm{Ni}$ model to fit the light curve. The best-fitting results reveal that the SN ejected a total mass of $M_\mathrm{ej} = 3.70 M_\odot$, a $^{56}\mathrm{Ni}$ mass of $M_\mathrm{Ni} = 0.23 M_\odot$, and a kinetic energy of $E_\mathrm{SN,K} = 2.35 \times 10^{52} \mathrm{erg}$. We also compare GRB 221009A with other GRB-SN events based on a GRB-associated SN sample, and find that only SN 2003lw and SN 2011kl can be obviously revealed in the afterglow of GRB 221009A by setting these objects at its distance. This suggests that a supernova (SN 2022xiw) is possibly obscured by the brighter afterglow emission from GRB 221009A. △ Less

Submitted 30 June, 2024; originally announced July 2024.

arXiv:2407.00636 [pdf, ps, other]

Nash equilibria of games with generalized complementarities

Authors: Lu Yu

Abstract: To generalize complementarities for games, we introduce some conditions weaker than quasisupermodularity and the single crossing property. We prove that the Nash equilibria of a game satisfying these conditions form a nonempty complete lattice. This is a purely order-theoretic generalization of Zhou's theorem. To generalize complementarities for games, we introduce some conditions weaker than quasisupermodularity and the single crossing property. We prove that the Nash equilibria of a game satisfying these conditions form a nonempty complete lattice. This is a purely order-theoretic generalization of Zhou's theorem. △ Less

Submitted 30 June, 2024; originally announced July 2024.

arXiv:2407.00631 [pdf, other]

TrialBench: Multi-Modal Artificial Intelligence-Ready Clinical Trial Datasets

Authors: **tai Chen, Yaojun Hu, Yue Wang, Yingzhou Lu, Xu Cao, Miao Lin, Hongxia Xu, Jian Wu, Cao Xiao, Jimeng Sun, Lucas Glass, Kexin Huang, Marinka Zitnik, Tianfan Fu

Abstract: Clinical trials are pivotal for develo** new medical treatments, yet they typically pose some risks such as patient mortality, adverse events, and enrollment failure that waste immense efforts spanning over a decade. Applying artificial intelligence (AI) to forecast or simulate key events in clinical trials holds great potential for providing insights to guide trial designs. However, complex dat… ▽ More Clinical trials are pivotal for develo** new medical treatments, yet they typically pose some risks such as patient mortality, adverse events, and enrollment failure that waste immense efforts spanning over a decade. Applying artificial intelligence (AI) to forecast or simulate key events in clinical trials holds great potential for providing insights to guide trial designs. However, complex data collection and question definition requiring medical expertise and a deep understanding of trial designs have hindered the involvement of AI thus far. This paper tackles these challenges by presenting a comprehensive suite of meticulously curated AIready datasets covering multi-modal data (e.g., drug molecule, disease code, text, categorical/numerical features) and 8 crucial prediction challenges in clinical trial design, encompassing prediction of trial duration, patient dropout rate, serious adverse event, mortality rate, trial approval outcome, trial failure reason, drug dose finding, design of eligibility criteria. Furthermore, we provide basic validation methods for each task to ensure the datasets' usability and reliability. We anticipate that the availability of such open-access datasets will catalyze the development of advanced AI approaches for clinical trial design, ultimately advancing clinical trial research and accelerating medical solution development. The curated dataset, metrics, and basic models are publicly available at https://github.com/ML2Health/ML2ClinicalTrials/tree/main/AI4Trial. △ Less

Submitted 30 June, 2024; originally announced July 2024.

arXiv:2407.00588 [pdf, other]

Forward and backward problems for coupled subdiffusion systems

Authors: Dian Feng, Yikan Liu, Shuai Lu

Abstract: In this article, we investigate both forward and backward problems for coupled systems of time-fractional diffusion equations, encompassing scenarios of strong coupling. For the forward problem, we establish the well-posedness of the system, leveraging the eigensystem of the corresponding elliptic system as the foundation. When considering the backward problem, specifically the determination of in… ▽ More In this article, we investigate both forward and backward problems for coupled systems of time-fractional diffusion equations, encompassing scenarios of strong coupling. For the forward problem, we establish the well-posedness of the system, leveraging the eigensystem of the corresponding elliptic system as the foundation. When considering the backward problem, specifically the determination of initial values through final time observations, we demonstrate a Lipschitz stability estimate, which is consist with the stability observed in the case of a single equation. To numerically address this backward problem, we refer to the explicit formulation of Tikhonov regularization to devise a multi-channel neural network architecture. This innovative architecture offers a versatile approach, exhibiting its efficacy in multidimensional settings through numerical examples and its robustness in handling initial values that have not been trained. △ Less

Submitted 30 June, 2024; originally announced July 2024.

Comments: 26 pages, 7 figures

MSC Class: 35R11; 35K58; 35B44

arXiv:2407.00499 [pdf, other]

ConU: Conformal Uncertainty in Large Language Models with Correctness Coverage Guarantees

Authors: Zhiyuan Wang, **hao Duan, Lu Cheng, Yue Zhang, Qingni Wang, Hengtao Shen, Xiaofeng Zhu, Xiaoshuang Shi, Kaidi Xu

Abstract: Uncertainty quantification (UQ) in natural language generation (NLG) tasks remains an open challenge, exacerbated by the intricate nature of the recent large language models (LLMs). This study investigates adapting conformal prediction (CP), which can convert any heuristic measure of uncertainty into rigorous theoretical guarantees by constructing prediction sets, for black-box LLMs in open-ended… ▽ More Uncertainty quantification (UQ) in natural language generation (NLG) tasks remains an open challenge, exacerbated by the intricate nature of the recent large language models (LLMs). This study investigates adapting conformal prediction (CP), which can convert any heuristic measure of uncertainty into rigorous theoretical guarantees by constructing prediction sets, for black-box LLMs in open-ended NLG tasks. We propose a sampling-based uncertainty measure leveraging self-consistency and develop a conformal uncertainty criterion by integrating the uncertainty condition aligned with correctness into the design of the CP algorithm. Experimental results indicate that our uncertainty measure generally surpasses prior state-of-the-art methods. Furthermore, we calibrate the prediction sets within the model's unfixed answer distribution and achieve strict control over the correctness coverage rate across 6 LLMs on 4 free-form NLG datasets, spanning general-purpose and medical domains, while the small average set size further highlights the efficiency of our method in providing trustworthy guarantees for practical open-ended NLG applications. △ Less

Submitted 29 June, 2024; originally announced July 2024.

Comments: 13 pages, 9 figures, 6 tables

arXiv:2407.00495 [pdf, other]

A Bayesian Solution To The Imitation Gap

Authors: Risto Vuorio, Mattie Fellows, Cong Lu, Clémence Grislain, Shimon Whiteson

Abstract: In many real-world settings, an agent must learn to act in environments where no reward signal can be specified, but a set of expert demonstrations is available. Imitation learning (IL) is a popular framework for learning policies from such demonstrations. However, in some cases, differences in observability between the expert and the agent can give rise to an imitation gap such that the expert's… ▽ More In many real-world settings, an agent must learn to act in environments where no reward signal can be specified, but a set of expert demonstrations is available. Imitation learning (IL) is a popular framework for learning policies from such demonstrations. However, in some cases, differences in observability between the expert and the agent can give rise to an imitation gap such that the expert's policy is not optimal for the agent and a naive application of IL can fail catastrophically. In particular, if the expert observes the Markov state and the agent does not, then the expert will not demonstrate the information-gathering behavior needed by the agent but not the expert. In this paper, we propose a Bayesian solution to the Imitation Gap (BIG), first using the expert demonstrations, together with a prior specifying the cost of exploratory behavior that is not demonstrated, to infer a posterior over rewards with Bayesian inverse reinforcement learning (IRL). BIG then uses the reward posterior to learn a Bayes-optimal policy. Our experiments show that BIG, unlike IL, allows the agent to explore at test time when presented with an imitation gap, whilst still learning to behave optimally using expert demonstrations when no such gap exists. △ Less

Submitted 29 June, 2024; originally announced July 2024.

Showing 1–50 of 24,743 results for author: Lu