-
An Interactive Multi-modal Query Answering System with Retrieval-Augmented Large Language Models
Authors:
Mengzhao Wang,
Haotian Wu,
Xiangyu Ke,
Yunjun Gao,
Xiaoliang Xu,
Lu Chen
Abstract:
Retrieval-augmented Large Language Models (LLMs) have reshaped traditional query-answering systems, offering unparalleled user experiences. However, existing retrieval techniques often struggle to handle multi-modal query contexts. In this paper, we present an interactive Multi-modal Query Answering (MQA) system, empowered by our newly developed multi-modal retrieval framework and navigation graph index, integrated with cutting-edge LLMs. It comprises five core components: Data Preprocessing, Vector Representation, Index Construction, Query Execution, and Answer Generation, all orchestrated by a dedicated coordinator to ensure smooth data flow from input to answer generation. One notable aspect of MQA is its utilization of contrastive learning to assess the significance of different modalities, facilitating precise measurement of multi-modal information similarity. Furthermore, the system achieves efficient retrieval through our advanced navigation graph index, refined using computational pruning techniques. Another highlight of our system is its pluggable processing framework, allowing seamless integration of embedding models, graph indexes, and LLMs. This flexibility provides users diverse options for gaining insights from their multi-modal knowledge base. A preliminary video introduction of MQA is available at https://youtu.be/xvUuo2ZIqWk.
Submitted 4 July, 2024;
originally announced July 2024.
-
Edge AI: A Taxonomy, Systematic Review and Future Directions
Authors:
Sukhpal Singh Gill,
Muhammed Golec,
Jianmin Hu,
Minxian Xu,
Junhui Du,
Huaming Wu,
Guneet Kaur Walia,
Subramaniam Subramanian Murugesan,
Babar Ali,
Mohit Kumar,
Kejiang Ye,
Prabal Verma,
Surendra Kumar,
Felix Cuadrado,
Steve Uhlig
Abstract:
Edge Artificial Intelligence (AI) incorporates a network of interconnected systems and devices that receive, cache, process, and analyse data in close communication with the location where the data is captured with AI technology. Recent advancements in AI efficiency, the widespread use of Internet of Things (IoT) devices, and the emergence of edge computing have unlocked the enormous scope of Edge AI. The goal of Edge AI is to optimize data processing efficiency and velocity while ensuring data confidentiality and integrity. Despite being a relatively new field of research, spanning from 2014 to the present, it has shown significant and rapid development over the last five years. In this article, we present a systematic literature review for Edge AI to discuss the existing research, recent advancements, and future research directions. We created a collaborative edge AI learning system for cloud and edge computing analysis, including an in-depth study of the architectures that facilitate this mechanism. The taxonomy for Edge AI facilitates the classification and configuration of Edge AI systems while also examining their potential influence across many fields, encompassing infrastructure, cloud computing, fog computing, services, use cases, ML and deep learning, and resource management. This study highlights the significance of Edge AI in processing real-time data at the edge of the network. Additionally, it emphasizes the research challenges encountered by Edge AI systems, including constraints on resources, vulnerabilities to security threats, and problems with scalability. Finally, this study highlights the potential future research directions that aim to address the current limitations of Edge AI by providing innovative solutions.
Submitted 4 July, 2024;
originally announced July 2024.
-
Observation of the Galactic Center PeVatron Beyond 100 TeV with HAWC
Authors:
A. Albert,
R. Alfaro,
C. Alvarez,
A. Andrés,
J. C. Arteaga-Velázquez,
D. Avila Rojas,
H. A. Ayala Solares,
R. Babu,
E. Belmont-Moreno,
A. Bernal,
K. S. Caballero-Mora,
T. Capistrán,
A. Carramiñana,
S. Casanova,
U. Cotti,
J. Cotzomi,
S. Coutiño de León,
E. De la Fuente,
C. de León,
D. Depaoli,
N. Di Lalla,
R. Diaz Hernandez,
B. L. Dingus,
M. A. DuVernois
, et al. (78 additional authors not shown)
Abstract:
We report an observation of ultra-high energy (UHE) gamma rays from the Galactic Center region, using seven years of data collected by the High-Altitude Water Cherenkov (HAWC) Observatory. The HAWC data are best described as a point-like source (HAWC J1746-2856) with a power-law spectrum ($\mathrm{d}N/\mathrm{d}E=φ(E/26 \,\text{TeV})^γ$), where $γ=-2.88 \pm 0.15_{\text{stat}} - 0.1_{\text{sys}} $ and $φ=1.5 \times 10^{-15}$ (TeV cm$^{2}$s)$^{-1}$ $\pm\, 0.3_{\text{stat}}\,^{+0.08_{\text{sys}}}_{-0.13_{\text{sys}}}$ extending from 6 to 114 TeV. We find no evidence of a spectral cutoff up to $100$ TeV using HAWC data. Two known point-like gamma-ray sources are spatially coincident with the HAWC gamma-ray excess: Sgr A$^{*}$ (HESS J1745-290) and the Arc (HESS J1746-285). We subtract the known flux contribution of these point sources from the measured flux of HAWC J1746-2856 to exclude their contamination and show that the excess observed by HAWC remains significant ($>$5$σ$) with the spectrum extending to $>$100 TeV. Our result supports that these detected UHE gamma rays can originate via hadronic interaction of PeV cosmic-ray protons with the dense ambient gas and confirms the presence of a proton PeVatron at the Galactic Center.
Submitted 4 July, 2024;
originally announced July 2024.
-
Movable Antenna-enabled RIS-aided Integrated Sensing and Communication
Authors:
Haisu Wu,
Hong Ren,
Cunhua Pan
Abstract:
In this paper, we investigate a movable antenna (MA)-aided integrated sensing and communication (ISAC) system, where a reconfigurable intelligent surface (RIS) is employed to enhance wireless communication and sensing performance in dead zones. Specifically, this paper aims to maximize the minimum beampattern gain at the RIS by jointly optimizing beamforming matrix at the base station (BS), the reflecting coefficients at the RIS and the positions of the MAs, subject to signal-to-interference-plus-noise ratio (SINR) constraint for the users and maximum transmit power at the BS. To tackle this non-convex optimization problem, we propose an alternating optimization (AO) algorithm and employ semidefinite relaxation (SDR), sequential rank-one constraint relaxation (SRCR) and successive convex approximation (SCA) techniques. Numerical results indicate that the MA and RIS-aided ISAC system outperforms conventional fixed position antenna (FPA) and RIS-aided systems. In addition, the application of MAs can reduce the similarity of user channels and enhance channel gain in the ISAC system.
Submitted 3 July, 2024;
originally announced July 2024.
-
Measurement of the branching fraction of the decay $J/ψ\to p \bar{p} η$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (639 additional authors not shown)
Abstract:
A high precision measurement of the branching fraction of the decay $J/ψ\to p \bar{p} η$ is performed using $(10 087 \pm 44) \times 10^6$ $J/ψ$ events recorded by the {BESIII} detector at the {BEPCII} storage ring. The branching fractions of the two decays $J/ψ\to p \bar{p} η(η\to γγ)$ and $J/ψ\to p \bar{p} η(η\to π^+ π^- π^0)$ are measured individually to be $\mathcal{B}(J/ψ\to p \bar{p} η(η\to γγ)) = (1.480 \pm 0.001 \pm 0.024)\times\,10^{-3}$ and $\mathcal{B}(J/ψ\to p \bar{p} η(η\to π^+ π^- π^0)) = (1.557 \pm 0.003 \pm 0.038)\times\,10^{-3}$, where the first uncertainties are statistical and the second systematic. Both results are compatible within their uncorrelated systematic uncertainties. The combined result is $\mathcal{B}(J/ψ\to p \bar{p} η)=(1.495 \pm 0.001 \pm 0.023)\times\,10^{-3}$ where the first uncertainty is the combined statistical uncertainty and the second one the combined systematic uncertainty of both analyses, incorporating correlations between them. In addition, the $p \bar{p}$ threshold region is investigated for a potential threshold enhancement, and no evidence for one is observed.
Submitted 3 July, 2024;
originally announced July 2024.
-
Towards a Holistic Framework for Multimodal Large Language Models in Three-dimensional Brain CT Report Generation
Authors:
Cheng-Yi Li,
Kao-Jung Chang,
Cheng-Fu Yang,
Hsin-Yu Wu,
Wenting Chen,
Hritik Bansal,
Ling Chen,
Yi-Ping Yang,
Yu-Chun Chen,
Shih-Pin Chen,
Jiing-Feng Lirng,
Kai-Wei Chang,
Shih-Hwa Chiou
Abstract:
Multi-modal large language models (MLLMs) have been given free rein to explore exciting medical applications with a primary focus on radiology report generation. Nevertheless, the preliminary success in 2D radiology captioning is insufficient to reflect the real-world diagnostic challenge in volumetric 3D anatomy. To mitigate three crucial limitations in the existing literature, namely (1) data complexity, (2) model capacity, and (3) evaluation metric fidelity, we collected a 3D-BrainCT dataset of 18,885 text-scan pairs and applied clinical visual instruction tuning (CVIT) to train BrainGPT models to generate radiology-adherent 3D brain CT reports. Statistically, our BrainGPT scored BLEU-1 = 44.35, BLEU-4 = 20.38, METEOR = 30.13, ROUGE-L = 47.6, and CIDEr-R = 211.77 during internal testing and demonstrated an accuracy of 0.91 in captioning midline shifts on the external validation CQ500 dataset. By further inspecting the captioned reports, we found that the traditional metrics appeared to measure only surface text similarity and failed to gauge the information density required for diagnostic purposes. To close this gap, we proposed a novel Feature-Oriented Radiology Task Evaluation (FORTE) to estimate the report's clinical relevance (lesion feature and landmarks). Notably, the BrainGPT model scored an average FORTE F1-score of 0.71 (degree=0.661; landmark=0.706; feature=0.693; impression=0.779). To demonstrate that BrainGPT models possess objective readiness to generate human-like radiology reports, we conducted a Turing test that enrolled 11 physician evaluators, and around 74% of the BrainGPT-generated captions were indistinguishable from those written by humans. Our work embodies a holistic framework that showcases the first-hand experience of curating a 3D brain CT dataset, fine-tuning anatomy-sensible language models, and proposing robust radiology evaluation metrics.
Submitted 2 July, 2024;
originally announced July 2024.
-
Separation property and asymptotic behavior for a transmission problem of the bulk-surface coupled Cahn-Hilliard system with singular potentials and its Robin approximation
Authors:
Maoyin Lv,
Hao Wu
Abstract:
We consider a class of bulk-surface coupled Cahn-Hilliard systems in a smooth, bounded domain $Ω\subset\mathbb{R}^{d}$ $(d\in\{2,3\})$, where the trace value of the bulk phase variable is connected to the surface phase variable via a Dirichlet boundary condition or its Robin approximation. For a general class of singular potentials (including the physically relevant logarithmic potential), we establish the regularity propagation of global weak solutions to the initial boundary value problem. In particular, when the spatial dimension is two, we prove the instantaneous strict separation property, which ensures that every global weak solution remains uniformly away from the pure states $\pm 1$ after any given positive time. In the three-dimensional case, we obtain the eventual strict separation property that holds for sufficiently large time. This strict separation property allows us to prove that every global weak solution converges to a single equilibrium as time goes to infinity using the Łojasiewicz-Simon approach. Finally, we study the double obstacle limit for the problem with logarithmic potentials in the bulk and on the boundary, showing that as the absolute temperature $Θ$ tends to zero, the corresponding weak solutions converge (for a suitable subsequence) to a weak solution of the problem with a double obstacle potential.
Submitted 2 July, 2024;
originally announced July 2024.
-
A Fingerprint for Large Language Models
Authors:
Zhiguang Yang,
Hanzhou Wu
Abstract:
Recent advances show that scaling a pre-trained language model could achieve state-of-the-art performance on many downstream tasks, prompting large language models (LLMs) to become a hot research topic in the field of artificial intelligence. However, due to the resource-intensive nature of training LLMs from scratch, it is urgent and crucial to protect the intellectual property of LLMs against infringement. This has motivated the authors in this paper to propose a novel black-box fingerprinting technique for LLMs, which requires neither model training nor model fine-tuning. We first demonstrate that the outputs of LLMs span a unique vector space associated with each model. We model the problem of ownership authentication as the task of evaluating the similarity between the victim model's space and the output's space of the suspect model. To deal with this problem, we propose two solutions, where the first solution involves verifying whether the outputs of the suspected large model are in the same space as those of the victim model, enabling rapid identification of model infringement, and the second one reconstructs the union of the vector spaces for LLM outputs and the victim model to address situations where the victim model has undergone the Parameter-Efficient Fine-Tuning (PEFT) attacks. Experimental results indicate that the proposed technique achieves superior performance in ownership verification and robustness against PEFT attacks. This work reveals inherent characteristics of LLMs and provides a promising solution for ownership verification of LLMs in black-box scenarios, ensuring efficiency, generality and practicality.
Submitted 1 July, 2024;
originally announced July 2024.
-
$\text{Memory}^3$: Language Modeling with Explicit Memory
Authors:
Hongkang Yang,
Zehao Lin,
Wenjin Wang,
Hao Wu,
Zhiyu Li,
Bo Tang,
Wenqiang Wei,
Jinbo Wang,
Zeyun Tang,
Shichao Song,
Chenyang Xi,
Yu Yu,
Kai Chen,
Feiyu Xiong,
Linpeng Tang,
Weinan E
Abstract:
The training and inference of large language models (LLMs) are together a costly process that transports knowledge from raw data to meaningful computation. Inspired by the memory hierarchy of the human brain, we reduce this cost by equipping LLMs with explicit memory, a memory format cheaper than model parameters and text retrieval-augmented generation (RAG). Conceptually, with most of its knowledge externalized to explicit memories, the LLM can enjoy a smaller parameter size, training cost, and inference cost, all proportional to the amount of remaining "abstract knowledge". As a preliminary proof of concept, we train from scratch a 2.4B LLM, which achieves better performance than much larger LLMs as well as RAG models, and maintains higher decoding speed than RAG. The model is named $\text{Memory}^3$, since explicit memory is the third form of memory in LLMs after implicit memory (model parameters) and working memory (context key-values). We introduce a memory circuitry theory to support the externalization of knowledge, and present novel techniques including a memory sparsification mechanism that makes storage tractable and a two-stage pretraining scheme that facilitates memory formation.
Submitted 1 July, 2024;
originally announced July 2024.
-
A Review of Image Processing Methods in Prostate Ultrasound
Authors:
Haiqiao Wang,
Hong Wu,
Zhuoyuan Wang,
Peiyan Yue,
Dong Ni,
Pheng-Ann Heng,
Yi Wang
Abstract:
Prostate cancer (PCa) poses a significant threat to men's health, with early diagnosis being crucial for improving prognosis and reducing mortality rates. Transrectal ultrasound (TRUS) plays a vital role in the diagnosis and image-guided intervention of PCa. To support physicians with more accurate and efficient computer-assisted diagnosis and interventions, many image processing algorithms in TRUS have been proposed and have achieved state-of-the-art performance in several tasks, including prostate gland segmentation, prostate image registration, PCa classification and detection, and interventional needle detection. The rapid development of these algorithms over the past two decades necessitates a comprehensive summary. Consequently, this survey provides a systematic analysis of this field, outlining the evolution of image processing methods in the context of TRUS image analysis while highlighting their relevant contributions. Furthermore, this survey discusses current challenges and suggests future research directions to advance this field further.
Submitted 30 June, 2024;
originally announced July 2024.
-
Nonlinear Craig Interpolant Generation over Unbounded Domains by Separating Semialgebraic Sets
Authors:
Hao Wu,
Jie Wang,
Bican Xia,
Xiakun Li,
Naijun Zhan,
Ting Gan
Abstract:
Interpolation-based techniques have become popular in recent years, as they can improve the scalability of existing verification techniques due to their inherent modularity and local reasoning capabilities. Synthesizing Craig interpolants is the cornerstone of these techniques.
In this paper, we investigate nonlinear Craig interpolant synthesis for two polynomial formulas of the general form, essentially corresponding to the underlying mathematical problem to separate two disjoint semialgebraic sets. By combining the homogenization approach with existing techniques, we prove the existence of a novel class of non-polynomial interpolants called semialgebraic interpolants. These semialgebraic interpolants subsume polynomial interpolants as a special case. To the best of our knowledge, this is the first existence result of this kind. Furthermore, we provide complete sum-of-squares characterizations for both polynomial and semialgebraic interpolants, which can be efficiently solved as semidefinite programs. Examples are provided to demonstrate the effectiveness and efficiency of our approach.
Submitted 30 June, 2024;
originally announced July 2024.
-
Parametric Primitive Analysis of CAD Sketches with Vision Transformer
Authors:
Xiaogang Wang,
Liang Wang,
Hongyu Wu,
Guoqiang Xiao,
Kai Xu
Abstract:
The design and analysis of Computer-Aided Design (CAD) sketches play a crucial role in industrial product design, primarily involving CAD primitives and their inter-primitive constraints. To address challenges related to error accumulation in autoregressive models and the complexities associated with self-supervised model design for this task, we propose a two-stage network framework. This framework consists of a primitive network and a constraint network, transforming the sketch analysis task into a set prediction problem to enhance the effective handling of primitives and constraints. By decoupling target types from parameters, the model gains increased flexibility and optimization while reducing complexity. Additionally, the constraint network incorporates a pointer module to explicitly indicate the relationship between constraint parameters and primitive indices, enhancing interpretability and performance. Qualitative and quantitative analyses on two publicly available datasets demonstrate the superiority of this method.
Submitted 29 June, 2024;
originally announced July 2024.
-
A compositional ordering-driven morphotropic phase boundary in ferroelectric solid solutions
Authors:
Yubai Shi,
Yifan Shan,
Hongyu Wu,
Zhicheng Zhong,
Ri He,
Run-Wei Li
Abstract:
Ferroelectric solid solutions usually exhibit giant dielectric response and high piezoelectricity in the vicinity of the morphotropic phase boundary (MPB), where the structure transitions between the rhombohedral and tetragonal phases as a result of composition or strain variation. Here, we propose a compositional ordering-driven MPB in the specified compositional solid solutions. By performing machine-learning-potential-based molecular dynamics simulations on lead zirconate titanate, we find a phase transition from the rhombohedral to the tetragonal phase as compositional ordering decreases, leading to an MPB on the temperature-ordering phase diagram. The compositional ordering-driven MPB can enhance the piezoelectricity with a magnitude comparable to that at the composition-driven MPB. Finally, we demonstrate that the mechanism of high piezoelectricity is polarization rotation driven by an external field. This work provides an additional degree of freedom, compositional ordering, for designing high-performance piezoelectric materials.
Submitted 29 June, 2024;
originally announced July 2024.
-
Observation of the Electromagnetic Dalitz Transition $h_c \rightarrow e^+e^-η_c$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
S. Ahmed,
M. Albrecht,
R. Aliberti,
A. Amoroso,
M. R. An,
Q. An,
X. H. Bai,
Y. Bai,
O. Bakina,
R. Baldini Ferroli,
I. Balossino,
Y. Ban,
K. Begzsuren,
N. Berger,
M. Bertani,
D. Bettoni,
F. Bianchi,
J. Bloms,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (495 additional authors not shown)
Abstract:
Using $(27.12\pm 0.14)\times10^8$ $ψ(3686)$ decays and data samples of $e^+e^-$ collisions with $\sqrt{s}$ from 4.130 to 4.780~GeV collected with the BESIII detector, we report the first observation of the electromagnetic Dalitz transition $h_c\to e^+e^-η_c$ with a statistical significance of $5.4σ$. We measure the ratio of the branching fractions $\frac{\mathcal{B}(h_c\rightarrow e^+e^-η_c)}{\mathcal{B}(h_c\rightarrow γη_c)}$ separately for the $h_c$ samples produced via $ψ(3686)\toπ^0h_c$ and $e^+e^-\toπ^+π^-h_c$. The average ratio is determined to be $(0.59\pm0.10(\text{stat.})\pm0.04(\text{syst.}))\%$, where the uncertainty includes both statistical and systematic components.
Submitted 2 July, 2024; v1 submitted 28 June, 2024;
originally announced July 2024.
-
Uncovering cognitive taskonomy through transfer learning in masked autoencoder-based fMRI reconstruction
Authors:
Youzhi Qu,
Junfeng Xia,
Xinyao Jian,
Wendu Li,
Kaining Peng,
Zhichao Liang,
Haiyan Wu,
Quanying Liu
Abstract:
Data reconstruction is a widely used pre-training task to learn the generalized features for many downstream tasks. Although reconstruction tasks have been applied to neural signal completion and denoising, neural signal reconstruction is less studied. Here, we employ the masked autoencoder (MAE) model to reconstruct functional magnetic resonance imaging (fMRI) data, and utilize a transfer learning framework to obtain the cognitive taskonomy, a matrix to quantify the similarity between cognitive tasks. Our experimental results demonstrate that the MAE model effectively captures the temporal dynamics patterns and interactions within the brain regions, enabling robust cross-subject fMRI signal reconstruction. The cognitive taskonomy derived from the transfer learning framework reveals the relationships among cognitive tasks, highlighting subtask correlations within motor tasks and similarities between emotion, social, and gambling tasks. Our study suggests that the fMRI reconstruction with MAE model can uncover the latent representation and the obtained taskonomy offers guidance for selecting source tasks in neural decoding tasks for improving the decoding performance on target tasks.
Submitted 24 May, 2024;
originally announced July 2024.
-
xTower: A Multilingual LLM for Explaining and Correcting Translation Errors
Authors:
Marcos Treviso,
Nuno M. Guerreiro,
Sweta Agrawal,
Ricardo Rei,
José Pombal,
Tania Vaz,
Helena Wu,
Beatriz Silva,
Daan van Stigt,
André F. T. Martins
Abstract:
While machine translation (MT) systems are achieving increasingly strong performance on benchmarks, they often produce translations with errors and anomalies. Understanding these errors can potentially help improve the translation quality and user experience. This paper introduces xTower, an open large language model (LLM) built on top of TowerBase designed to provide free-text explanations for translation errors in order to guide the generation of a corrected translation. The quality of the explanations generated by xTower is assessed via both intrinsic and extrinsic evaluation. Intrinsically, we ask expert translators to evaluate the quality of the explanations across two dimensions: relatedness towards the error span being explained and helpfulness in error understanding and improving translation quality. Extrinsically, we test xTower across various experimental setups in generating translation corrections, demonstrating significant improvements in translation quality. Our findings highlight xTower's potential towards not only producing plausible and helpful explanations of automatic translations, but also leveraging them to suggest corrected translations.
Submitted 27 June, 2024;
originally announced June 2024.
-
Improved measurement of the semileptonic decay $D^+_{s}\to K^0 e^+ν_e$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (643 additional authors not shown)
Abstract:
Analyzing $e^+e^-$ collision data corresponding to an integrated luminosity of $7.33~\mathrm{fb}^{-1}$ collected at center-of-mass energies between 4.128 and 4.226~GeV with the BESIII detector, we measure the branching fraction of the semileptonic decay $D^+_{s}\to K^0 e^+ν_e$ to be $(2.98\pm0.23\pm0.12)\times10^{-3}$. The $D_s^+\to K^0$ hadronic form factor is determined from the differential decay rate of $D^+_s\to K^0 e^+ν_e$ to be $f^{K^0}_+(0)=0.636\pm0.049\pm0.013$. For both measurements, the first uncertainty is statistical and the second systematic. The branching fraction and form factor measurements are factors of 1.6 and 1.7 more precise than the previous world averages, respectively.
Submitted 27 June, 2024;
originally announced June 2024.
-
Hole probabilities of random zeros on compact Riemann surfaces
Authors:
Hao Wu,
Song-Yan Xie
Abstract:
We establish a convergence speed estimate for hole probabilities of zeros of random holomorphic sections on compact Riemann surfaces.
Submitted 1 July, 2024; v1 submitted 27 June, 2024;
originally announced June 2024.
-
Fudan Multi-purpose Active TArget Time Projection Chamber (fMeta-TPC) for Photonnuclear Reaction Experiments
Authors:
Huang-Kai Wu,
Xi-Yang Wang,
Yu-Miao Wang,
You-**g Wang,
De-Qing Fang,
Wan-Bing He,
Wei-Hu Ma,
Xi-Guang Cao,
Chang-Bo Fu,
Xian-Gai Deng,
Yu-Gang Ma
Abstract:
Active Target Time Projection Chambers (AT-TPCs) are state-of-the-art tools in the field of low-energy nuclear physics, particularly suitable for experiments using low-intensity radioactive ion beams or gamma rays. The Fudan Multi-purpose Active Target Time Projection Chamber (fMeta-TPC) with 2048 channels has been developed to study $α$-clustering nuclei. In this work, the focus is on the study of the photonuclear reaction with the Laser Compton Scattering (LCS) gamma source, especially for the decay of the highly excited $α$-cluster state. The design of fMeta-TPC is described and a comprehensive evaluation of its offline performance is performed using an ultraviolet (UV) laser and a $^{241}$Am $α$ source. The results show that the detector has an intrinsic angular resolution within 0.30$^{\circ}$ and an energy resolution of 6.85\% for 3.0 MeV $α$ particles. The gain uniformity of the detector is about 10\% (RMS/Mean), tested with a $^{55}$Fe X-ray source.
Submitted 14 June, 2024;
originally announced June 2024.
-
MatchTime: Towards Automatic Soccer Game Commentary Generation
Authors:
Jiayuan Rao,
Haoning Wu,
Chang Liu,
Yanfeng Wang,
Weidi Xie
Abstract:
Soccer is a globally popular sport with a vast audience. In this paper, we consider constructing an automatic soccer game commentary model to improve the audience's viewing experience. In general, we make the following contributions: First, observing the prevalent video-text misalignment in existing datasets, we manually annotate timestamps for 49 matches, establishing a more robust benchmark for soccer game commentary generation, termed SN-Caption-test-align; Second, we propose a multi-modal temporal alignment pipeline to automatically correct and filter the existing dataset at scale, creating a higher-quality soccer game commentary dataset for training, denoted as MatchTime; Third, based on our curated dataset, we train an automatic commentary generation model, named MatchVoice. Extensive experiments and ablation studies demonstrate the effectiveness of our alignment pipeline, and training the model on the curated dataset achieves state-of-the-art performance for commentary generation, showing that better alignment can lead to significant performance improvements in downstream tasks.
Submitted 26 June, 2024;
originally announced June 2024.
-
Measurement of the cross sections of $e^+e^-\to K^{-}\barΞ^{+}Λ/Σ^{0}$ at center-of-mass energies between 3.510 and 4.914 GeV
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (638 additional authors not shown)
Abstract:
Using $e^+e^-$ collision data collected with the BESIII detector at the BEPCII collider at center-of-mass energies between 3.510 and 4.914 GeV, corresponding to an integrated luminosity of 25 fb$^{-1}$, we measure the Born cross sections for the process $e^+e^-\to K^-\barΞ^+Λ/Σ^{0}$ at thirty-five energy points with a partial-reconstruction strategy. By fitting the dressed cross sections of $e^+e^-\to K^-\barΞ^+Λ/Σ^{0}$, evidence for $ψ(4160) \to K^{-}\barΞ^{+}Λ$ is found for the first time with a significance of 4.4$σ$, including systematic uncertainties. No evidence for other possible resonances is found. In addition, the products of electronic partial width and branching fraction for all assumed resonances decaying into $K^{-}\barΞ^{+}Λ/Σ^{0}$ are determined.
Submitted 26 June, 2024;
originally announced June 2024.
-
Measurements of $K_S^0$-$K_L^0$ asymmetries in the decays $Λ_c^+ \to pK_{L,S}^0$, $pK_{L,S}^0π^+π^-$ and $pK_{L,S}^0π^0$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (643 additional authors not shown)
Abstract:
Using $e^+e^-$ annihilation data sets corresponding to an integrated luminosity of 4.5 $\text{fb}^{-1}$, collected with the BESIII detector at center-of-mass energies between 4.600 and 4.699 GeV, we report the first measurements of the absolute branching fractions $\mathcal{B}(Λ_c^+\to pK_{L}^{0})=(1.67 \pm 0.06 \pm 0.04)\%$, $\mathcal{B}(Λ_c^+\to pK_{L}^{0}π^+π^-)=(1.69 \pm 0.10 \pm 0.05)\%$, and $\mathcal{B}(Λ_c^+\to pK_{L}^{0}π^0)=(2.02 \pm 0.13 \pm 0.05)\%$, where the first uncertainties are statistical and the second systematic. Combining with the known branching fractions of $Λ_c^+ \to pK_{S}^{0}$, $Λ_c^+ \to pK_{S}^{0}π^+π^-$, and $Λ_c^+ \to pK_{S}^{0}π^0$, we present the first measurements of the $K_{S}^{0}$-$K_{L}^{0}$ asymmetries $R(Λ_c^+, K_{S,L}^0X) = \frac{\mathcal{B}(Λ_c^+ \to K_{S}^{0} X) - \mathcal{B}(Λ_c^+ \to K_{L}^{0} X)}{\mathcal{B}(Λ_c^+ \to K_{S}^{0} X) + \mathcal{B}(Λ_c^+ \to K_{L}^{0} X)}$ in charmed baryon decays: $R(Λ_c^+, pK_{S,L}^0) = -0.025 \pm 0.031$, $R(Λ_c^+, pK_{S,L}^0π^+π^-) = -0.027 \pm 0.048$, and $R(Λ_c^+, pK_{S,L}^0π^0) =-0.015 \pm 0.046$. No significant asymmetries within the uncertainties are observed.
Submitted 26 June, 2024;
originally announced June 2024.
-
Evaluating Fairness in Large Vision-Language Models Across Diverse Demographic Attributes and Prompts
Authors:
Xuyang Wu,
Yuan Wang,
Hsin-Tai Wu,
Zhiqiang Tao,
Yi Fang
Abstract:
Large vision-language models (LVLMs) have recently achieved significant progress, demonstrating strong capabilities in open-world visual understanding. However, it is not yet clear how LVLMs address demographic biases in real life, especially the disparities across attributes such as gender, skin tone, and age. In this paper, we empirically investigate \emph{visual fairness} in several mainstream LVLMs and audit their performance disparities across sensitive demographic attributes, based on public fairness benchmark datasets (e.g., FACET). To disclose the visual bias in LVLMs, we design a fairness evaluation framework with direct questions and single-choice question-instructed prompts on visual question-answering/classification tasks. The zero-shot prompting results indicate that, despite enhancements in visual understanding, both open-source and closed-source LVLMs exhibit prevalent fairness issues across different instruct prompts and demographic attributes.
Submitted 25 June, 2024;
originally announced June 2024.
-
Automatic Parameter Tuning of Self-Driving Vehicles
Authors:
Hung-Ju Wu,
Vladislav Nenchev,
Christian Rathgeber
Abstract:
Modern automated driving solutions utilize trajectory planning and control components with numerous parameters that need to be tuned for different driving situations and vehicle types to achieve optimal performance. This paper proposes a method to automatically tune such parameters to resemble expert demonstrations. We utilize a cost function which captures deviations of the closed-loop operation of the controller from the recorded desired driving behavior. Parameter tuning is then accomplished by using local optimization techniques. Three optimization alternatives are compared in a case study, where a trajectory planner is tuned for lane following in a real-world driving scenario. The results suggest that the proposed approach improves manually tuned initial parameters significantly even with respect to noisy demonstration data.
Submitted 25 June, 2024;
originally announced June 2024.
-
Indications of superconductivities in blend of variant apatite and covellite
Authors:
Hongyang Wang,
Yi**g Zhao,
Hao Wu,
Ling Wang,
Zhixing Wu,
Zhihui Geng,
Jiewen Xiao,
Weiwei Xue,
Shufeng Ye,
Ning Chen,
Xianfeng Qiao,
Yao Yao
Abstract:
Through heavily doping sulfur into an apatite framework, we synthesize a new blend mainly comprising variant apatite and covellite (copper sulfide). Magnetic measurement exhibits that significant diamagnetism appears at around 260 K and drops dramatically below 30 K implying coexistence of two superconducting phases. The upper critical magnetic field is larger than 1000 Oe at 250 K. Electric measurement manifests that the current-voltage curves deviate from the normal linear lineshape suggesting the presence of zero-resistance effect, and the critical current is around 50 $μ$A at 140 K. These exotic magnetic and electric features strongly indicate these two components, variant apatite and covellite, individually trigger two superconducting phases at near-room and low temperatures.
Submitted 25 June, 2024;
originally announced June 2024.
-
Study of the $f_{0}(980)$ through the decay $D_{s}^{+}\rightarrow π^{+}π^{+}π^{-}π^{0}$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (649 additional authors not shown)
Abstract:
We perform the first amplitude analysis of $D^+_s \to π^+π^+π^-π^0$ decays, based on data samples of electron-positron collisions recorded with the BESIII detector at center-of-mass energies between 4.128 and 4.226 GeV, corresponding to an integrated luminosity of 7.33~fb$^{-1}$. We report the observation of $D_{s}^{+} \to f_0(980)ρ(770)^{+}$ with a statistical significance greater than 10$σ$ and determine the branching fractions $\mathcal{B}(D_s^+\toπ^+π^+π^-π^0|_{{\rm non}-η})=(2.04\pm0.08_{\rm stat.}\pm0.05_{\rm syst.})\%$ and $\mathcal{B}(D_s^+\toηπ^+)=(1.56\pm0.09_{\rm stat.}\pm0.04_{\rm syst.})\%$. Moreover, we measure the relative branching fraction between $φ\toπ^+π^-π^0$ and $φ\to K^+K^-$ to be $\frac{\mathcal{B}(φ(1020) \to π^+π^-π^0)}{\mathcal{B}(φ(1020) \to K^+K^-)}=0.230 \pm 0.014_{\rm stat.} \pm 0.010_{\rm syst.}$, which deviates from the world average value by more than $4σ$.
Submitted 25 June, 2024;
originally announced June 2024.
-
Probing the nature of the $χ_{c1}(3872)$ state using radiative decays
Authors:
LHCb collaboration,
R. Aaij,
A. S. W. Abdelmotteleb,
C. Abellan Beteta,
F. Abudinén,
T. Ackernley,
A. A. Adefisoye,
B. Adeva,
M. Adinolfi,
P. Adlarson,
C. Agapopoulou,
C. A. Aidala,
Z. Ajaltouni,
S. Akar,
K. Akiba,
P. Albicocco,
J. Albrecht,
F. Alessio,
M. Alexander,
Z. Aliouche,
P. Alvarez Cartelle,
R. Amalric,
S. Amato,
J. L. Amey,
Y. Amhis
, et al. (1094 additional authors not shown)
Abstract:
The radiative decays $χ_{c1}(3872)\rightarrowψ(2S)γ$ and $χ_{c1}(3872)\rightarrow J/ψγ$ are used to probe the nature of the $χ_{c1}(3872)$ state using proton-proton collision data collected with the LHCb detector, corresponding to an integrated luminosity of 9 fb$^{-1}$. Using the $B^+\rightarrow χ_{c1}(3872)K^+$ decay, the $χ_{c1}(3872)\rightarrow ψ(2S)γ$ process is observed for the first time and the ratio of its partial width to that of the $χ_{c1}(3872)\rightarrow J/ψγ$ decay is measured to be $$ \frac{Γ_{χ_{c1}(3872)\rightarrow ψ(2S)γ}}{Γ_{χ_{c1}(3872)\rightarrow J/ψγ}} = 1.67 \pm 0.21 \pm 0.12 \pm 0.04 , $$ where the first uncertainty is statistical, the second systematic and the third is due to the uncertainties on the branching fractions of the $ψ(2S)$ and $J/ψ$ mesons. The measured ratio makes the interpretation of the $χ_{c1}(3872)$ state as a pure $D^0\bar{D}^{*0}+\bar{D}^0D^{*0}$ molecule questionable and strongly indicates a sizeable compact charmonium or tetraquark component within the $χ_{c1}(3872)$ state.
Submitted 24 June, 2024;
originally announced June 2024.
-
MLPHand: Real Time Multi-View 3D Hand Mesh Reconstruction via MLP Modeling
Authors:
Jian Yang,
Jiakun Li,
Guoming Li,
Zhen Shen,
Huai-Yu Wu,
Zhaoxin Fan,
Heng Huang
Abstract:
Multi-view hand mesh reconstruction is a critical task for applications in virtual reality and human-computer interaction, but it remains a formidable challenge. Although existing multi-view hand reconstruction methods achieve remarkable accuracy, they typically come with an intensive computational burden that hinders real-time inference. To this end, we propose MLPHand, a novel method designed for real-time multi-view single hand reconstruction. MLPHand consists of two primary modules: (1) a lightweight MLP-based Skeleton2Mesh model that efficiently recovers hand meshes from hand skeletons, and (2) a multi-view geometry feature fusion prediction module that enhances the Skeleton2Mesh model with detailed geometric information from multiple views. Experiments on three widely used datasets demonstrate that MLPHand can reduce computational complexity by 90% while achieving reconstruction accuracy comparable to existing state-of-the-art baselines.
Submitted 23 June, 2024;
originally announced June 2024.
-
Harnessing Knowledge Retrieval with Large Language Models for Clinical Report Error Correction
Authors:
**ge Wu,
Zhaolong Wu,
Abul Hasan,
Yunsoo Kim,
Jason P. Y. Cheung,
Teng Zhang,
Honghan Wu
Abstract:
This study proposes an approach for error correction in clinical radiology reports, leveraging large language models (LLMs) and retrieval-augmented generation (RAG) techniques. The proposed framework employs internal and external retrieval mechanisms to extract relevant medical entities and relations from the report and external knowledge sources. A three-stage inference process is introduced, decomposing the task into error detection, localization, and correction subtasks, which enhances the explainability and performance of the system. The effectiveness of the approach is evaluated using a benchmark dataset created by corrupting real-world radiology reports with realistic errors, guided by domain experts. Experimental results demonstrate the benefits of the proposed methods, with the combination of internal and external retrieval significantly improving the accuracy of error detection, localization, and correction across various state-of-the-art LLMs. The findings contribute to the development of more robust and reliable error correction systems for clinical documentation.
Submitted 21 June, 2024;
originally announced June 2024.
-
Search for the $e^+e^- \to φχ_{c1}(3872)$ process at BESIII
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (639 additional authors not shown)
Abstract:
Based on 368.5 pb$^{-1}$ of $e^+e^-$ collision data collected at center-of-mass energies of 4.914 and 4.946 GeV by the BESIII detector, a search for the $e^+e^- \to φχ_{c1}(3872)$ process is performed for the first time. No significant signal is observed, and the upper limits at the 90\% confidence level on the product of the Born cross section $σ(e^+e^- \to φχ_{c1}(3872))$ and the branching fraction $\mathcal{B}[χ_{c1}(3872)\toπ^+π^- J/ψ]$ at 4.914 and 4.946 GeV are set to be 0.85 and 0.96 pb, respectively. These measurements provide useful information on the production of the $χ_{c1}(3872)$ at $e^+e^-$ colliders and deepen our understanding of the nature of this particle.
Submitted 21 June, 2024;
originally announced June 2024.
-
IRASim: Learning Interactive Real-Robot Action Simulators
Authors:
Fangqi Zhu,
Hongtao Wu,
Song Guo,
Yuxiao Liu,
Chilam Cheang,
Tao Kong
Abstract:
Scalable robot learning in the real world is limited by the cost and safety issues of real robots. In addition, rolling out robot trajectories in the real world can be time-consuming and labor-intensive. In this paper, we propose to learn an interactive real-robot action simulator as an alternative. We introduce a novel method, IRASim, which leverages the power of generative models to generate extremely realistic videos of a robot arm that executes a given action trajectory, starting from an initial given frame. To validate the effectiveness of our method, we create a new benchmark, IRASim Benchmark, based on three real-robot datasets and perform extensive experiments on the benchmark. Results show that IRASim outperforms all the baseline methods and is preferred in human evaluations. We hope that IRASim can serve as an effective and scalable approach to enhance robot learning in the real world. To promote research for generative real-robot action simulators, we open-source code, benchmark, and checkpoints at https://gen-irasim.github.io.
Submitted 20 June, 2024;
originally announced June 2024.
-
Infusing clinical knowledge into tokenisers for language models
Authors:
Abul Hasan,
**ge Wu,
Quang Ngoc Nguyen,
Salomé Andres,
Imane Guellil,
Huayu Zhang,
Arlene Casey,
Beatrice Alex,
Bruce Guthrie,
Honghan Wu
Abstract:
This study introduces a novel knowledge-enhanced tokenisation mechanism, K-Tokeniser, for clinical text processing. Technically, at the initialisation stage, K-Tokeniser populates global representations of tokens based on semantic types of domain concepts (such as drugs or diseases) from either a domain ontology like the Unified Medical Language System or the training data of the task-related corpus. At the training or inference stage, sentence-level localised context is utilised for choosing the optimal global token representation to realise the semantic-based tokenisation. To avoid pretraining using the new tokeniser, an embedding initialisation approach is proposed to generate representations for new tokens. Using three transformer-based language models, a comprehensive set of experiments is conducted on four real-world datasets for evaluating K-Tokeniser in a wide range of clinical text analytics tasks including clinical concept and relation extraction, automated clinical coding, clinical phenotype identification, and clinical research article classification. Overall, our models demonstrate consistent improvements over their counterparts in all tasks. In particular, substantial improvements are observed in the automated clinical coding task, with a 13\% increase in Micro $F_1$ score. Furthermore, K-Tokeniser also shows significant capacity for facilitating quicker convergence of language models. Specifically, using K-Tokeniser, the language models would only require 50\% of the training data to achieve the best performance of the baseline tokeniser using all training data in the concept extraction task and less than 20\% of the data for the automated coding task. It is worth mentioning that all these improvements require no pre-training process, making the approach generalisable.
Submitted 20 June, 2024;
originally announced June 2024.
-
Towards Multi-modality Fusion and Prototype-based Feature Refinement for Clinically Significant Prostate Cancer Classification in Transrectal Ultrasound
Authors:
Hong Wu,
Juan Fu,
Hongsheng Ye,
Yuming Zhong,
Xuebin Zou,
Jianhua Zhou,
Yi Wang
Abstract:
Prostate cancer is a highly prevalent cancer and ranks as the second leading cause of cancer-related deaths in men globally. Recently, the utilization of multi-modality transrectal ultrasound (TRUS) has gained significant traction as a valuable technique for guiding prostate biopsies. In this study, we propose a novel learning framework for clinically significant prostate cancer (csPCa) classification using multi-modality TRUS. The proposed framework employs two separate 3D ResNet-50 networks to extract distinctive features from B-mode and shear wave elastography (SWE). Additionally, an attention module is incorporated to effectively refine B-mode features and aggregate the extracted features from both modalities. Furthermore, we utilize a few-shot segmentation task to enhance the capacity of the classification encoder. Due to the limited availability of csPCa masks, a prototype correction module is employed to extract representative prototypes of csPCa. The performance of the framework is assessed on a large-scale dataset consisting of 512 TRUS videos with biopsy-proven prostate cancer. The results demonstrate a strong capability for accurately identifying csPCa, achieving an area under the curve (AUC) of 0.86. Moreover, the framework generates visual class activation mapping (CAM), which can serve as valuable assistance for localizing csPCa. These CAM images may offer valuable guidance during TRUS-guided targeted biopsies, enhancing the efficacy of the biopsy procedure. The code is available at https://github.com/2313595986/SmileCode.
Submitted 20 June, 2024;
originally announced June 2024.
-
MMTE: Corpus and Metrics for Evaluating Machine Translation Quality of Metaphorical Language
Authors:
Shun Wang,
Ge Zhang,
Han Wu,
Tyler Loakman,
Wenhao Huang,
Chenghua Lin
Abstract:
Machine Translation (MT) has developed rapidly since the release of Large Language Models and current MT evaluation is performed through comparison with reference human translations or by predicting quality scores from human-labeled data. However, these mainstream evaluation methods mainly focus on fluency and factual reliability, whilst paying little attention to figurative quality. In this paper, we investigate the figurative quality of MT and propose a set of human evaluation metrics focused on the translation of figurative language. We additionally present a multilingual parallel metaphor corpus generated by post-editing. Our evaluation protocol is designed to estimate four aspects of MT: Metaphorical Equivalence, Emotion, Authenticity, and Quality. In doing so, we observe that translations of figurative expressions display different traits from literal ones.
Submitted 19 June, 2024;
originally announced June 2024.
-
XENONnT WIMP Search: Signal & Background Modeling and Statistical Inference
Authors:
XENON Collaboration,
E. Aprile,
J. Aalbers,
K. Abe,
S. Ahmed Maouloud,
L. Althueser,
B. Andrieu,
E. Angelino,
D. Antón Martin,
F. Arneodo,
L. Baudis,
M. Bazyk,
L. Bellagamba,
R. Biondi,
A. Bismark,
K. Boese,
A. Brown,
G. Bruno,
R. Budnik,
J. M. R. Cardoso,
A. P. Cimental Chávez,
A. P. Colijn,
J. Conrad,
J. J. Cuenca-García,
V. D'Andrea
, et al. (139 additional authors not shown)
Abstract:
The XENONnT experiment searches for weakly-interacting massive particle (WIMP) dark matter scattering off a xenon nucleus. In particular, XENONnT uses a dual-phase time projection chamber with a 5.9-tonne liquid xenon target, detecting both scintillation and ionization signals to reconstruct the energy, position, and type of recoil. A blind search for nuclear recoil WIMPs with an exposure of 1.1 tonne-years yielded no signal excess over background expectations, from which competitive exclusion limits were derived on WIMP-nucleon elastic scatter cross sections, for WIMP masses ranging from 6 GeV/$c^2$ up to the TeV/$c^2$ scale. This work details the modeling and statistical methods employed in this search. By means of calibration data, we model the detector response, which is then used to derive background and signal models. The construction and validation of these models is discussed, alongside additional purely data-driven backgrounds. We also describe the statistical inference framework, including the definition of the likelihood function and the construction of confidence intervals.
Submitted 19 June, 2024;
originally announced June 2024.
-
Transferable Watermarking to Self-supervised Pre-trained Graph Encoders by Trigger Embeddings
Authors:
Xiangyu Zhao,
Hanzhou Wu,
Xinpeng Zhang
Abstract:
Recent years have witnessed the rapid development of Graph Self-supervised Learning (GSSL), which enables the pre-training of transferable foundation graph encoders. However, the easy-to-plug-in nature of such encoders makes them vulnerable to copyright infringement. To address this issue, we develop a novel watermarking framework to protect graph encoders in GSSL settings. The key idea is to force the encoder to map a set of specially crafted trigger instances into a unique compact cluster in the output embedding space during model pre-training. Consequently, when the encoder is stolen and concatenated with any downstream classifier, the resulting model inherits the backdoor of the encoder and predicts the trigger instances to be in a single category with high probability regardless of the ground truth. Experimental results show that the embedded watermark can be transferred to various downstream tasks in black-box settings, including node classification, link prediction and community detection, which forms a reliable watermark verification system for GSSL in practice. This approach also shows satisfactory performance in terms of model fidelity, reliability and robustness.
Submitted 18 June, 2024;
originally announced June 2024.
-
Adversarial Attacks on Multimodal Agents
Authors:
Chen Henry Wu,
Jing Yu Koh,
Ruslan Salakhutdinov,
Daniel Fried,
Aditi Raghunathan
Abstract:
Vision-enabled language models (VLMs) are now used to build autonomous multimodal agents capable of taking actions in real environments. In this paper, we show that multimodal agents raise new safety risks, even though attacking agents is more challenging than prior attacks due to limited access to and knowledge about the environment. Our attacks use adversarial text strings to guide gradient-based perturbation over one trigger image in the environment: (1) our captioner attack attacks white-box captioners if they are used to process images into captions as additional inputs to the VLM; (2) our CLIP attack attacks a set of CLIP models jointly, which can transfer to proprietary VLMs. To evaluate the attacks, we curated VisualWebArena-Adv, a set of adversarial tasks based on VisualWebArena, an environment for web-based multimodal agent tasks. Within an L-infinity norm of $16/256$ on a single image, the captioner attack can make a captioner-augmented GPT-4V agent execute the adversarial goals with a 75% success rate. When we remove the captioner or use GPT-4V to generate its own captions, the CLIP attack can achieve success rates of 21% and 43%, respectively. Experiments on agents based on other VLMs, such as Gemini-1.5, Claude-3, and GPT-4o, show interesting differences in their robustness. Further analysis reveals several key factors contributing to the attack's success, and we also discuss the implications for defenses. Project page: https://chenwu.io/attack-agent Code and data: https://github.com/ChenWu98/agent-attack
Submitted 18 June, 2024;
originally announced June 2024.
-
Sample-Based Matroid Prophet Inequalities
Authors:
Hu Fu,
Pinyan Lu,
Zhihao Gavin Tang,
Hongxun Wu,
Jinzhao Wu,
Qianfan Zhang
Abstract:
We study matroid prophet inequalities when distributions are unknown and accessible only through samples. While single-sample prophet inequalities for special matroids are known, no constant-factor competitive algorithm with even a sublinear number of samples was known for general matroids. Raising the stakes further, the single-sample version of the question for general matroids has close (two-way) connections with the long-standing matroid secretary conjecture.
In this work, we give a $(\frac14 - \varepsilon)$-competitive matroid prophet inequality with only $O_\varepsilon(\mathrm{poly} \log n)$ samples. Our algorithm consists of two parts: (i) a novel quantile-based reduction from matroid prophet inequalities to online contention resolution schemes (OCRSs) with $O_\varepsilon(\log n)$ samples, and (ii) a $(\frac14 - \varepsilon)$-selectable matroid OCRS with $O_\varepsilon(\mathrm{poly} \log n)$ samples which carefully addresses an adaptivity challenge.
Submitted 18 June, 2024;
originally announced June 2024.
-
Cephalometric Landmark Detection across Ages with Prototypical Network
Authors:
Han Wu,
Chong Wang,
Lanzhuju Mei,
Tong Yang,
Min Zhu,
Dinggang Shen,
Zhiming Cui
Abstract:
Automated cephalometric landmark detection is crucial in real-world orthodontic diagnosis. Current studies mainly focus only on adult subjects, neglecting the clinically crucial scenario presented by adolescents whose landmarks often exhibit significantly different appearances compared to adults. Hence, an open question arises about how to develop a unified and effective detection algorithm across various age groups, including adolescents and adults. In this paper, we propose CeLDA, the first work for Cephalometric Landmark Detection across Ages. Our method leverages a prototypical network for landmark detection by comparing image features with landmark prototypes. To tackle the appearance discrepancy of landmarks between age groups, we design new strategies for CeLDA to improve prototype alignment and obtain a holistic estimation of landmark prototypes from a large set of training images. Moreover, a novel prototype relation mining paradigm is introduced to exploit the anatomical relations between the landmark prototypes. Extensive experiments validate the superiority of CeLDA in detecting cephalometric landmarks on both adult and adolescent subjects. To our knowledge, this is the first effort toward developing a unified solution and dataset for cephalometric landmark detection across age groups. Our code and dataset will be made public at https://github.com/ShanghaiTech-IMPACT/Cephalometric-Landmark-Detection-across-Ages-with-Prototypical-Network
Submitted 18 June, 2024;
originally announced June 2024.
-
GW-MoE: Resolving Uncertainty in MoE Router with Global Workspace Theory
Authors:
Haoze Wu,
Zihan Qiu,
Zili Wang,
Hang Zhao,
Jie Fu
Abstract:
Mixture-of-Experts (MoE) has been demonstrated as an efficient method to scale up models. By dynamically and sparsely selecting activated experts, MoE can effectively reduce computational costs. Despite this success, we observe that many tokens in MoE models have uncertain routing results. These tokens have nearly equal scores for choosing each expert, and we demonstrate that this uncertainty can lead to incorrect selections. Inspired by the Global Workspace Theory (GWT), we propose a new fine-tuning method, GW-MoE, to address this issue. The core idea is to broadcast the uncertain tokens across experts during fine-tuning. As a result, these tokens can acquire the necessary knowledge from any expert during inference and become less sensitive to the choice. GW-MoE does not introduce additional inference overhead. We validate that GW-MoE can mitigate the uncertainty problem and consistently improves performance across different tasks (text classification, question answering, summarization, code generation, and mathematical problem solving) and model sizes (650M and 8B parameters).
Submitted 18 June, 2024;
originally announced June 2024.
-
Precision measurement of the $Ξ^-_b$ baryon lifetime
Authors:
LHCb collaboration,
R. Aaij,
A. S. W. Abdelmotteleb,
C. Abellan Beteta,
F. Abudinén,
T. Ackernley,
A. A. Adefisoye,
B. Adeva,
M. Adinolfi,
P. Adlarson,
C. Agapopoulou,
C. A. Aidala,
Z. Ajaltouni,
S. Akar,
K. Akiba,
P. Albicocco,
J. Albrecht,
F. Alessio,
M. Alexander,
Z. Aliouche,
P. Alvarez Cartelle,
R. Amalric,
S. Amato,
J. L. Amey,
Y. Amhis
, et al. (1064 additional authors not shown)
Abstract:
A sample of $pp$ collision data, corresponding to an integrated luminosity of 5.5 fb$^{-1}$ and collected by the LHCb experiment during Run 2, is used to measure the ratio of the lifetime of the $Ξ^-_b$ baryon to that of the $Λ^0_b$ baryon, $r_τ\equivτ_{Ξ^-_b}/τ_{Λ^0_b}$. The value ${r_τ^{\rm Run\,2}=1.076\pm0.013\pm0.006}$ is obtained, where the first uncertainty is statistical and the second systematic. This value is averaged with the corresponding value from Run 1 to obtain ${r_τ^{\rm Run\,1,2} = 1.078\pm0.012\pm0.007}$. Multiplying by the world-average value of the $Λ^0_b$ lifetime yields $τ_{Ξ^-_b}^{\rm Run~1,2} = 1.578\pm0.018\pm0.010\pm0.011$ ps, where the uncertainties are statistical, systematic, and due to the limited knowledge of the $Λ^0_b$ lifetime. This measurement improves the precision of the current world average of the $Ξ^-_b$ lifetime by about a factor of two, and is in good agreement with the most recent theoretical predictions.
Submitted 17 June, 2024;
originally announced June 2024.
-
Using graph neural networks to reconstruct charged pion showers in the CMS High Granularity Calorimeter
Authors:
M. Aamir,
B. Acar,
G. Adamov,
T. Adams,
C. Adloff,
S. Afanasiev,
C. Agrawal,
C. Agrawal,
A. Ahmad,
H. A. Ahmed,
S. Akbar,
N. Akchurin,
B. Akgul,
B. Akgun,
R. O. Akpinar,
E. Aktas,
A. AlKadhim,
V. Alexakhin,
J. Alimena,
J. Alison,
A. Alpana,
W. Alshehri,
P. Alvarez Dominguez,
M. Alyari,
C. Amendola
, et al. (550 additional authors not shown)
Abstract:
A novel method to reconstruct the energy of hadronic showers in the CMS High Granularity Calorimeter (HGCAL) is presented. The HGCAL is a sampling calorimeter with very fine transverse and longitudinal granularity. The active media are silicon sensors and scintillator tiles read out by SiPMs, and the absorbers are a combination of lead and Cu/CuW in the electromagnetic section, and steel in the hadronic section. The shower reconstruction method is based on graph neural networks and makes use of a dynamic reduction network architecture. It is shown that the algorithm is able to capture and mitigate the main effects that normally hinder the reconstruction of hadronic showers using classical reconstruction methods, by compensating for fluctuations in the multiplicity, energy, and spatial distributions of the shower's constituents. The performance of the algorithm is evaluated using test beam data collected in 2018 with a prototype of the CMS HGCAL accompanied by a section of the CALICE AHCAL prototype. The capability of the method to mitigate the impact of energy leakage from the calorimeter is also demonstrated.
Submitted 30 June, 2024; v1 submitted 17 June, 2024;
originally announced June 2024.
-
Learning Relighting and Intrinsic Decomposition in Neural Radiance Fields
Authors:
Yixiong Yang,
Shilin Hu,
Haoyu Wu,
Ramon Baldrich,
Dimitris Samaras,
Maria Vanrell
Abstract:
The task of extracting intrinsic components, such as reflectance and shading, from neural radiance fields is of growing interest. However, current methods largely focus on synthetic scenes and isolated objects, overlooking the complexities of real scenes with backgrounds. To address this gap, our research introduces a method that combines relighting with intrinsic decomposition. By leveraging light variations in scenes to generate pseudo labels, our method provides guidance for intrinsic decomposition without requiring ground truth data. Our method, grounded in physical constraints, ensures robustness across diverse scene types and reduces the reliance on pre-trained models or hand-crafted priors. We validate our method on both synthetic and real-world datasets, achieving convincing results. Furthermore, the applicability of our method to image editing tasks demonstrates promising outcomes.
Submitted 16 June, 2024;
originally announced June 2024.
-
A Unified Graph Selective Prompt Learning for Graph Neural Networks
Authors:
Bo Jiang,
Hao Wu,
Ziyan Zhang,
Beibei Wang,
Jin Tang
Abstract:
In recent years, graph prompt learning/tuning has garnered increasing attention in adapting pre-trained models for graph representation learning. As a kind of universal graph prompt learning method, Graph Prompt Feature (GPF) has achieved remarkable success in adapting pre-trained models for Graph Neural Networks (GNNs). By fixing the parameters of a pre-trained GNN model, the aim of GPF is to modify the input graph data by adding some (learnable) prompt vectors into graph node features to better align with the downstream tasks on the smaller dataset. However, existing GPFs generally suffer from two main limitations. First, GPFs generally focus on node prompt learning, which ignores prompting for graph edges. Second, existing GPFs generally conduct prompt learning on all nodes equally, which fails to capture the importance of different nodes and may be sensitive to noisy nodes when aligning with the downstream tasks. To address these issues, in this paper, we propose a new unified Graph Selective Prompt Feature learning (GSPF) method for GNN fine-tuning. The proposed GSPF integrates prompt learning on both graph nodes and edges, which thus provides a unified prompt model for the graph data. Moreover, it conducts prompt learning selectively on nodes and edges by concentrating on the important nodes and edges for prompting, which makes our model more reliable and compact. Experimental results on many benchmark datasets demonstrate the effectiveness and advantages of the proposed GSPF method.
Submitted 15 June, 2024;
originally announced June 2024.
-
Search for $X(1870)$ via the decay $J/ψ\to ωK^+ K^-η$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (644 additional authors not shown)
Abstract:
Using a sample of $(10087\pm 44)\times10^{6}$ $J/ψ$ events collected by the BESIII detector at the BEPCII collider, we search for the decay $X(1870)\to K^+ K^-η$ via the $J/ψ\to ωK^+ K^- η$ process for the first time. No significant $X(1870)$ signal is observed. The upper limit on the branching fraction of the decay $ J/ψ\to ωX(1870) \toωK^+ K^- η$ is determined to be $9.55\times 10^{-7}$ at the $90\%$ confidence level. In addition, the branching fraction $B(J/ψ\toωK^+ K^- η)$ is measured to be $(3.33\pm0.02(\rm{stat.})\pm 0.12(\rm{syst.}))\times 10^{-4}$.
Submitted 13 June, 2024;
originally announced June 2024.
-
CMC-Bench: Towards a New Paradigm of Visual Signal Compression
Authors:
Chunyi Li,
Xiele Wu,
Haoning Wu,
Donghui Feng,
Zicheng Zhang,
Guo Lu,
Xiongkuo Min,
Xiaohong Liu,
Guangtao Zhai,
Weisi Lin
Abstract:
Ultra-low bitrate image compression is a challenging and demanding topic. With the development of Large Multimodal Models (LMMs), a Cross Modality Compression (CMC) paradigm of Image-Text-Image has emerged. Compared with traditional codecs, this semantic-level compression can reduce image data size to 0.1% or even lower, which has strong potential applications. However, CMC has certain defects in consistency with the original image and perceptual quality. To address this problem, we introduce CMC-Bench, a benchmark of the cooperative performance of Image-to-Text (I2T) and Text-to-Image (T2I) models for image compression. This benchmark covers 18,000 and 40,000 images respectively to verify 6 mainstream I2T and 12 T2I models, including 160,000 subjective preference scores annotated by human experts. At ultra-low bitrates, this paper proves that the combination of some I2T and T2I models has surpassed the most advanced visual signal codecs; meanwhile, it highlights where LMMs can be further optimized toward the compression task. We encourage LMM developers to participate in this test to promote the evolution of visual signal codec protocols.
Submitted 13 June, 2024;
originally announced June 2024.
-
Chain-of-Thought (CoT) prompting strategies for medical error detection and correction
Authors:
Zhaolong Wu,
Abul Hasan,
Jinge Wu,
Yunsoo Kim,
Jason P. Y. Cheung,
Teng Zhang,
Honghan Wu
Abstract:
This paper describes our submission to the MEDIQA-CORR 2024 shared task for automatically detecting and correcting medical errors in clinical notes. We report results for three methods of few-shot In-Context Learning (ICL) augmented with Chain-of-Thought (CoT) and reason prompts using a large language model (LLM). In the first method, we manually analyse a subset of the training and validation datasets to infer three CoT prompts by examining error types in the clinical notes. In the second method, we utilise the training dataset to prompt the LLM to deduce reasons about their correctness or incorrectness. The constructed CoTs and reasons are then augmented with ICL examples to solve the tasks of error detection, span identification, and error correction. Finally, we combine the two methods using a rule-based ensemble method. Across the three sub-tasks, our ensemble method ranks 3rd in both sub-tasks 1 and 2, and 7th in sub-task 3 among all submissions.
Submitted 13 June, 2024;
originally announced June 2024.
-
An alternative to purification in CFT
Authors:
Xin Jiang,
Peng Wang,
Houwen Wu,
Haitang Yang
Abstract:
In conformal field theories, in contrast to \emph{adding} some auxiliary states into the bipartite mixed state $ρ_{AB}$ as the usual purifications do, we show a pure entangled state $ψ_{AB}$ can be constructed by \emph{subtracting} the undetectable regions. In this pure state $ψ_{AB}$, the von Neumann entropy $S_{\text{vN}}(A)$ naturally captures quantum entanglement between $A$ and $B$. We verify that $S_{\text{vN}}(A)$ is equal to the entanglement wedge cross-section $E_{W}$ in AdS spacetime, which is conjectured to be the holographic dual of the entanglement of purification. We show such constructed entanglement entropy has a phase transition. The ordinary entanglement entropies of critical and non-critical QFTs are simply limits of the two phases.
Submitted 13 June, 2024;
originally announced June 2024.
-
Efficient Multi-View Fusion and Flexible Adaptation to View Missing in Cardiovascular System Signals
Authors:
Qihan Hu,
Daomiao Wang,
Hong Wu,
Jian Liu,
Cuiwei Yang
Abstract:
The progression of deep learning and the widespread adoption of sensors have facilitated automatic multi-view fusion (MVF) of cardiovascular system (CVS) signals. However, prevalent MVF model architectures often amalgamate CVS signals from the same temporal step but different views into a unified representation, disregarding the asynchronous nature of cardiovascular events and the inherent heterogeneity across views, leading to catastrophic view confusion. Efficient training strategies specifically tailored for MVF models to attain comprehensive representations also need to be considered. Crucially, real-world data frequently arrives with incomplete views, an aspect rarely noticed by researchers. Thus, the View-Centric Transformer (VCT) and Multitask Masked Autoencoder (M2AE) are specifically designed to emphasize the centrality of each view and harness unlabeled data to achieve superior fused representations. Additionally, we systematically define the missing-view problem for the first time and introduce prompt techniques to aid pretrained MVF models in flexibly adapting to various missing-view scenarios. Rigorous experiments involving atrial fibrillation detection, blood pressure estimation, and sleep staging (typical health monitoring tasks) demonstrate the remarkable advantage of our method in MVF compared to prevailing methodologies. Notably, the prompt technique requires finetuning less than 3% of the entire model's data, substantially fortifying the model's resilience to view missing while circumventing the need for complete retraining. The results demonstrate the effectiveness of our approaches, highlighting their potential for practical applications in cardiovascular health monitoring. Codes and models are released at URL.
Submitted 13 June, 2024;
originally announced June 2024.
-
Numerical Insights into noise amplification of high-energy mid-infrared supercontinuum generation in normal dispersion multimode fibers
Authors:
Chaofan Yang,
Dian Duan,
Fan Zou,
Kuo Liu,
Ruibo **,
Zechuan Liu,
Haoyu Wu
Abstract:
We report on the noise properties of high-energy mid-infrared supercontinuum (MIR-SC) generation in normal dispersion multimode fibers from a numerical perspective. Noise amplification across multiple modes is primarily due to the stimulated Raman scattering (SRS) effect. This leads to the emergence of "incoherent cloud formation" and "incoherent optical wave breaking", similar to those observed in single-mode fibers. Increasing the pump technical noise from 0.1 % to 1 % significantly shortens the lumped coherence length L_C and exacerbates the influence of incoherent broadening dynamics competing with coherent dynamics, so that the amplitude noise and phase coherence of the MIR-SC show strong consistency in their collapse evolution. To minimize this noise amplification and achieve high-energy low-noise MIR-SC in practical applications, it is essential to use short-pulse pumping with low amplitude noise, ensuring that L_C >> L_OWB (where L_OWB denotes the optical wave breaking length).
Submitted 12 June, 2024;
originally announced June 2024.