-
Search for $X(1870)$ via the decay $J/ψ\to ωK^+ K^-η$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (644 additional authors not shown)
Abstract:
Using a sample of $(10087\pm 44)\times10^{6}$ $J/ψ$ events collected by the BESIII detector at the BEPCII collider, we search for the decay $X(1870)\to K^+ K^-η$ via the $J/ψ\to ωK^+ K^- η$ process for the first time. No significant $X(1870)$ signal is observed. The upper limit on the branching fraction of the decay $ J/ψ\to ωX(1870) \toωK^+ K^- η$ is determined to be $9.55\times 10^{-7}$ at the $90\%$ confidence level. In addition, the branching fraction $B(J/ψ\toωK^+ K^- η)$ is measured to be $(3.33\pm0.02(\rm{stat.})\pm 0.12(\rm{syst.}))\times 10^{-4}$.
Submitted 13 June, 2024;
originally announced June 2024.
-
Conformance Testing of Relational DBMS Against SQL Specifications
Authors:
Shuang Liu,
Chenglin Tian,
Jun Sun,
Ruifeng Wang,
Wei Lu,
Yongxin Zhao,
Yinxing Xue,
Junjie Wang,
Xiaoyong Du
Abstract:
A Relational Database Management System (RDBMS) is a fundamental piece of software that supports a wide range of applications, making it critical to identify bugs within these systems. There has been active research on testing RDBMSs, most of which uses crashes or metamorphic relations as the oracle. Although existing approaches can detect bugs in RDBMSs, they are far from comprehensively evaluating an RDBMS's correctness (i.e., with respect to the semantics of SQL). In this work, we propose a method to test the semantic conformance of an RDBMS, i.e., whether its behavior respects the intended semantics of SQL. Specifically, we have formally defined the semantics of SQL and implemented them in Prolog. The Prolog implementation then serves as the reference RDBMS, enabling differential testing of existing RDBMSs. We applied our approach to four widely used and thoroughly tested RDBMSs, i.e., MySQL, TiDB, SQLite, and DuckDB. In total, our approach uncovered 19 bugs and 11 inconsistencies, all of which are related to violations of the SQL specification or to missing or unclear specification, thereby demonstrating the effectiveness and applicability of our approach.
Submitted 13 June, 2024;
originally announced June 2024.
-
Common and Rare Fundus Diseases Identification Using Vision-Language Foundation Model with Knowledge of Over 400 Diseases
Authors:
Meng Wang,
Tian Lin,
Aidi Lin,
Kai Yu,
Yuanyuan Peng,
Lianyu Wang,
Cheng Chen,
Ke Zou,
Huiyu Liang,
Man Chen,
Xue Yao,
Meiqin Zhang,
Binwei Huang,
Chaoxin Zheng,
Peixin Zhang,
Wei Chen,
Yilong Luo,
Yifan Chen,
Honghe Xia,
Tingkun Shi,
Qi Zhang,
**ming Guo,
Xiaolin Chen,
**gcheng Wang,
Yih Chung Tham
, et al. (24 additional authors not shown)
Abstract:
Previous foundation models for retinal images were pre-trained with limited disease categories and a limited knowledge base. Here we introduce RetiZero, a vision-language foundation model that leverages knowledge from over 400 fundus diseases. For RetiZero's pre-training, we compiled 341,896 fundus images paired with text descriptions, sourced from public datasets, ophthalmic literature, and online resources, encompassing a diverse range of diseases across multiple ethnicities and countries. RetiZero exhibits superior performance in several downstream tasks, including zero-shot disease recognition, image-to-image retrieval, and internal- and cross-domain disease identification. In zero-shot scenarios, RetiZero achieves Top5 accuracy scores of 0.8430 for 15 fundus diseases and 0.7561 for 52 fundus diseases. For image retrieval, it achieves Top5 scores of 0.9500 and 0.8860 for the same disease sets, respectively. Clinical evaluations show that RetiZero's Top3 zero-shot performance surpasses the average of 19 ophthalmologists from Singapore, China and the United States. Furthermore, RetiZero significantly enhances clinicians' accuracy in diagnosing fundus disease. These findings underscore the value of integrating the RetiZero foundation model into clinical settings, where a variety of fundus diseases are encountered.
Submitted 30 June, 2024; v1 submitted 13 June, 2024;
originally announced June 2024.
-
Multigrid preconditioning for discontinuous Galerkin discretizations of an elliptic optimal control problem with a convection-dominated state equation
Authors:
Si**g Liu,
Valeria Simoncini
Abstract:
We consider discontinuous Galerkin methods for an elliptic distributed optimal control problem constrained by a convection-dominated problem. We prove global optimal convergence rates using an inf-sup condition, with the diffusion parameter $\varepsilon$ and regularization parameter $β$ explicitly tracked. We then propose a multilevel preconditioner based on downwind ordering to solve the discretized system. The preconditioner only requires two approximate solves of single convection-dominated equations using multigrid methods. Moreover, for the strongly convection-dominated case, only two sweeps of block Gauss-Seidel iterations are needed. We also derive a simple bound indicating the role played by the multigrid preconditioner. Numerical results are shown to support our findings.
Submitted 13 June, 2024;
originally announced June 2024.
-
Dense Outflowing Molecular Gas in Massive Star-forming Regions
Authors:
Yani Xu,
Junzhi Wang,
Shu Liu,
Juan Li,
Yuqiang LI,
Rui Luo,
Chao Ou,
Siqi Zheng,
Yijia Liu
Abstract:
Dense outflowing gas, traced by transitions of molecules with large dipole moments, is important for understanding mass loss and feedback of massive star formation. HCN 3-2 and HCO$^+$ 3-2 are good tracers of dense outflowing molecular gas, which are closely related to active star formation. In this study, we present on-the-fly (OTF) mapping observations of HCN 3-2 and HCO$^+$ 3-2 toward a sample of 33 massive star-forming regions using the 10-m Submillimeter Telescope (SMT). With the spatial distribution of line wings of HCO$^+$ 3-2 and HCN 3-2, outflows are detected in 25 sources, resulting in a detection rate of 76$\%$. The optically thin H$^{13}$CN and H$^{13}$CO$^+$ 3-2 lines are used to identify line wings as outflows and estimate core mass. The mass $M_{out}$, momentum $P_{out}$, kinetic energy $E_{K}$, force $F_{out}$ and mass loss rate $\dot M_{out}$ of the outflow, as well as the core mass, are obtained for each source. A tight sublinear correlation is found between the mass of dense molecular outflow and core mass, with an index of $\sim$ 0.8 and a correlation coefficient of 0.88.
Submitted 13 June, 2024;
originally announced June 2024.
-
Improving Adversarial Robustness via Feature Pattern Consistency Constraint
Authors:
Jiacong Hu,
**gwen Ye,
Zunlei Feng,
Jiazhen Yang,
Shunyu Liu,
Xiaotian Yu,
Lingxiang Jia,
Mingli Song
Abstract:
Convolutional Neural Networks (CNNs) are well-known for their vulnerability to adversarial attacks, posing significant security concerns. In response to these threats, various defense methods have emerged to bolster the model's robustness. However, most existing methods either focus on learning from adversarial perturbations, leading to overfitting to the adversarial examples, or aim to eliminate such perturbations during inference, inevitably increasing computational burdens. Conversely, clean training, which strengthens the model's robustness by relying solely on clean examples, can address the aforementioned issues. In this paper, we align with this methodological stream and enhance its generalizability to unknown adversarial examples. This enhancement is achieved by scrutinizing the behavior of latent features within the network. Recognizing that a correct prediction relies on the correctness of the latent feature's pattern, we introduce a novel and effective Feature Pattern Consistency Constraint (FPCC) method to reinforce the latent feature's capacity to maintain the correct feature pattern. Specifically, we propose Spatial-wise Feature Modification and Channel-wise Feature Selection to enhance latent features. Subsequently, we employ the Pattern Consistency Loss to constrain the similarity between the feature pattern of the latent features and the correct feature pattern. Our experiments demonstrate that the FPCC method empowers latent features to uphold correct feature patterns even in the face of adversarial examples, resulting in inherent adversarial robustness surpassing state-of-the-art models.
Submitted 13 June, 2024;
originally announced June 2024.
-
Constraints on Ultra Heavy Dark Matter Properties from Dwarf Spheroidal Galaxies with LHAASO Observations
Authors:
Zhen Cao,
F. Aharonian,
Q. An,
Axikegu,
Y. X. Bai,
Y. W. Bao,
D. Bastieri,
X. J. Bi,
Y. J. Bi,
J. T. Cai,
Q. Cao,
W. Y. Cao,
Zhe Cao,
J. Chang,
J. F. Chang,
A. M. Chen,
E. S. Chen,
Liang Chen,
Lin Chen,
Long Chen,
M. J. Chen,
M. L. Chen,
Q. H. Chen,
S. H. Chen,
S. Z. Chen
, et al. (255 additional authors not shown)
Abstract:
In this work we search for signals generated by ultra-heavy dark matter in data from the Large High Altitude Air Shower Observatory (LHAASO). We look for possible gamma rays from dark matter annihilation or decay in 16 dwarf spheroidal galaxies in the field of view of LHAASO. Dwarf spheroidal galaxies are among the most promising targets for indirect detection of dark matter, since they have low fluxes of astrophysical $γ$-ray background and large amounts of dark matter. By analyzing more than 700 days of observational data at LHAASO, no significant dark matter signal from 1 TeV to 1 EeV is detected. Accordingly, we derive the most stringent constraints on the ultra-heavy dark matter annihilation cross-section up to EeV. The constraints on the lifetime of dark matter in decay mode are also derived.
Submitted 12 June, 2024;
originally announced June 2024.
-
Reversing the Forget-Retain Objectives: An Efficient LLM Unlearning Framework from Logit Difference
Authors:
Jiabao Ji,
Yujian Liu,
Yang Zhang,
Gaowen Liu,
Ramana Rao Kompella,
Sijia Liu,
Shiyu Chang
Abstract:
As Large Language Models (LLMs) demonstrate extensive capability in learning from documents, LLM unlearning becomes an increasingly important research area to address concerns of LLMs in terms of privacy, copyright, etc. A conventional LLM unlearning task typically involves two goals: (1) The target LLM should forget the knowledge in the specified forget documents, and (2) it should retain the other knowledge that the LLM possesses, for which we assume access to a small number of retain documents. To achieve both goals, a mainstream class of LLM unlearning methods introduces an optimization framework with a combination of two objectives - maximizing the prediction loss on the forget documents while minimizing that on the retain documents, which suffers from two challenges, degenerated output and catastrophic forgetting. In this paper, we propose a novel unlearning framework called Unlearning from Logit Difference (ULD), which introduces an assistant LLM that aims to achieve the opposite of the unlearning goals: remembering the forget documents and forgetting the retain knowledge. ULD then derives the unlearned LLM by computing the logit difference between the target and the assistant LLMs. We show that such reversed objectives would naturally resolve both aforementioned challenges while significantly improving the training efficiency. Extensive experiments demonstrate that our method efficiently achieves the intended forgetting while preserving the LLM's overall capabilities, reducing training time by more than threefold. Notably, our method loses 0% of model utility on the ToFU benchmark, whereas baseline methods may sacrifice 17% of utility on average to achieve comparable forget quality. Our code will be publicly available at https://github.com/UCSB-NLP-Chang/ULD.
Submitted 12 June, 2024;
originally announced June 2024.
-
Observation of $η_{c}$(1S, 2S) and $χ_{cJ}$ decays to 2$(π^{+}π^{-})η$ via $ψ$(3686) radiative transitions
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (636 additional authors not shown)
Abstract:
Based on $2.7 \times 10^9~ψ(3686)$ decays collected with the BESIII detector, the radiative decay $ψ(3686)\to\gamma2(π^{+}π^{-})η$ is investigated to measure properties of S- and P-wave charmonium states. The branching fraction of the decay $η_{c}(1S) \to 2(π^{+}π^{-})η$, which is found to have a strong dependence on the interference pattern between $η_c(1S)$ and non-$η_c(1S)$ processes, is measured in both destructive and constructive interference scenarios for the first time. The mass and width of the $η_{c}(1S)$ are measured to be $M=(2984.14 \pm 0.13 \pm 0.38)$ MeV/$c^{2}$ and $Γ=(28.82 \pm 0.11 \pm 0.82)$ MeV, respectively. Clear signals for the decays of the $χ_{cJ}(J=0,1,2)$ and the $η_{c}(2S)$ to $2(π^{+}π^{-})η$ are also observed for the first time, and the corresponding branching fractions are measured. The ratio of the branching fractions between the $η_{c}(2S)$ and $η_{c}(1S)$ decays is significantly lower than the theoretical prediction, which might suggest different dynamics in their decays.
Submitted 12 June, 2024;
originally announced June 2024.
-
VALL-E R: Robust and Efficient Zero-Shot Text-to-Speech Synthesis via Monotonic Alignment
Authors:
Bing Han,
Long Zhou,
Shujie Liu,
Sanyuan Chen,
Lingwei Meng,
Yanming Qian,
Yanqing Liu,
Sheng Zhao,
**yu Li,
Furu Wei
Abstract:
With the help of discrete neural audio codecs, large language models (LLMs) have increasingly been recognized as a promising methodology for zero-shot Text-to-Speech (TTS) synthesis. However, sampling-based decoding strategies bring astonishing diversity to generation, but also pose robustness issues such as typos, omissions and repetition. In addition, the high sampling rate of audio also brings huge computational overhead to the inference process of autoregression. To address these issues, we propose VALL-E R, a robust and efficient zero-shot TTS system, building upon the foundation of VALL-E. Specifically, we introduce a phoneme monotonic alignment strategy to strengthen the connection between phonemes and the acoustic sequence, ensuring a more precise alignment by constraining the acoustic tokens to match their associated phonemes. Furthermore, we employ a codec-merging approach to downsample the discrete codes in the shallow quantization layer, thereby accelerating the decoding speed while preserving the high quality of speech output. Benefiting from these strategies, VALL-E R obtains controllability over phonemes and demonstrates its strong robustness by approaching the WER of ground truth. In addition, it requires fewer autoregressive steps, with over 60% time reduction during inference. This research has the potential to be applied to meaningful projects, including the creation of speech for those affected by aphasia. Audio samples will be available at: https://aka.ms/valler.
Submitted 12 June, 2024;
originally announced June 2024.
-
Label Smoothing Improves Machine Unlearning
Authors:
Zonglin Di,
Zhaowei Zhu,
**ghan Jia,
Jiancheng Liu,
Zafar Takhirov,
Bo Jiang,
Yuanshun Yao,
Sijia Liu,
Yang Liu
Abstract:
The objective of machine unlearning (MU) is to eliminate previously learned data from a model. However, it is challenging to strike a balance between computation cost and performance when using existing MU techniques. Taking inspiration from the influence of label smoothing on model confidence and differential privacy, we propose a simple gradient-based MU approach that uses an inverse process of label smoothing. This work introduces UGradSL, a simple, plug-and-play MU approach that uses smoothed labels. We provide theoretical analyses demonstrating why properly introducing label smoothing improves MU performance. We conducted extensive experiments on six datasets of various sizes and different modalities, demonstrating the effectiveness and robustness of our proposed method. The consistent improvement in MU performance is only at a marginal cost of additional computations. For instance, UGradSL improves over the gradient ascent MU baseline by 66% unlearning accuracy without sacrificing unlearning efficiency.
Submitted 11 June, 2024;
originally announced June 2024.
-
TextGrad: Automatic "Differentiation" via Text
Authors:
Mert Yuksekgonul,
Federico Bianchi,
Joseph Boen,
Sheng Liu,
Zhi Huang,
Carlos Guestrin,
James Zou
Abstract:
AI is undergoing a paradigm shift, with breakthroughs achieved by systems orchestrating multiple large language models (LLMs) and other complex components. As a result, developing principled and automated optimization methods for compound AI systems is one of the most important new challenges. Neural networks faced a similar challenge in their early days until backpropagation and automatic differentiation transformed the field by making optimization turn-key. Inspired by this, we introduce TextGrad, a powerful framework performing automatic "differentiation" via text. TextGrad backpropagates textual feedback provided by LLMs to improve individual components of a compound AI system. In our framework, LLMs provide rich, general, natural language suggestions to optimize variables in computation graphs, ranging from code snippets to molecular structures. TextGrad follows PyTorch's syntax and abstraction and is flexible and easy to use. It works out of the box for a variety of tasks, where the users only provide the objective function without tuning components or prompts of the framework. We showcase TextGrad's effectiveness and generality across a diverse range of applications, from question answering and molecule optimization to radiotherapy treatment planning. Without modifying the framework, TextGrad improves the zero-shot accuracy of GPT-4o in Google-Proof Question Answering from $51\%$ to $55\%$, yields $20\%$ relative performance gain in optimizing LeetCode-Hard coding problem solutions, improves prompts for reasoning, designs new druglike small molecules with desirable in silico binding, and designs radiation oncology treatment plans with high specificity. TextGrad lays a foundation to accelerate the development of the next generation of AI systems.
Submitted 11 June, 2024;
originally announced June 2024.
-
McEval: Massively Multilingual Code Evaluation
Authors:
Linzheng Chai,
Shukai Liu,
Jian Yang,
Yuwei Yin,
Ke **,
Jiaheng Liu,
Tao Sun,
Ge Zhang,
Changyu Ren,
Hongcheng Guo,
Zekun Wang,
Boyang Wang,
Xianjie Wu,
Bing Wang,
Tongliang Li,
Liqun Yang,
Sufeng Duan,
Zhoujun Li
Abstract:
Code large language models (LLMs) have shown remarkable advances in code understanding, completion, and generation tasks. Programming benchmarks, comprised of a selection of code challenges and corresponding test cases, serve as a standard to evaluate the capability of different LLMs in such tasks. However, most existing benchmarks primarily focus on Python and are still restricted to a limited number of languages, where other languages are translated from the Python samples (e.g. MultiPL-E), degrading the data diversity. To further facilitate the research of code LLMs, we propose a massively multilingual code benchmark covering 40 programming languages (McEval) with 16K test samples, which substantially pushes the limits of code LLMs in multilingual scenarios. The benchmark contains challenging code completion, understanding, and generation evaluation tasks with finely curated massively multilingual instruction corpora McEval-Instruct. In addition, we introduce an effective multilingual coder mCoder trained on McEval-Instruct to support multilingual programming language generation. Extensive experimental results on McEval show that there is still a difficult journey between open-source models and closed-source LLMs (e.g. GPT-series models) in numerous languages. The instruction corpora, evaluation benchmark, and leaderboard are available at https://mceval.github.io/.
Submitted 11 June, 2024;
originally announced June 2024.
-
Needle In A Multimodal Haystack
Authors:
Weiyun Wang,
Shuibo Zhang,
Yiming Ren,
Yuchen Duan,
Tiantong Li,
Shuo Liu,
Mengkang Hu,
Zhe Chen,
Kaipeng Zhang,
Lewei Lu,
Xizhou Zhu,
** Luo,
Yu Qiao,
Jifeng Dai,
Wenqi Shao,
Wenhai Wang
Abstract:
With the rapid advancement of multimodal large language models (MLLMs), their evaluation has become increasingly comprehensive. However, understanding long multimodal content, as a foundational ability for real-world applications, remains underexplored. In this work, we present Needle In A Multimodal Haystack (MM-NIAH), the first benchmark specifically designed to systematically evaluate the capability of existing MLLMs to comprehend long multimodal documents. Our benchmark includes three types of evaluation tasks: multimodal retrieval, counting, and reasoning. In each task, the model is required to answer the questions according to different key information scattered throughout the given multimodal document. Evaluating the leading MLLMs on MM-NIAH, we observe that existing models still have significant room for improvement on these tasks, especially on vision-centric evaluation. We hope this work can provide a platform for further research on long multimodal document comprehension and contribute to the advancement of MLLMs. Code and benchmark are released at https://github.com/OpenGVLab/MM-NIAH.
Submitted 11 June, 2024;
originally announced June 2024.
-
VoxNeuS: Enhancing Voxel-Based Neural Surface Reconstruction via Gradient Interpolation
Authors:
Sidun Liu,
Peng Qiao,
Zongxin Ye,
Wenyu Li,
Yong Dou
Abstract:
Neural Surface Reconstruction learns a Signed Distance Field (SDF) to reconstruct the 3D model from multi-view images. Previous works adopt voxel-based explicit representation to improve efficiency. However, they ignored the gradient instability of interpolation in the voxel grid, leading to degradation on convergence and smoothness. Besides, previous works entangled the optimization of geometry and radiance, which leads to the deformation of geometry to explain radiance, causing artifacts when reconstructing textured planes.
In this work, we reveal that the instability of gradient comes from its discontinuity during trilinear interpolation, and propose to use the interpolated gradient instead of the original analytical gradient to eliminate the discontinuity. Based on gradient interpolation, we propose VoxNeuS, a lightweight surface reconstruction method for computational and memory efficient neural surface reconstruction. Thanks to the explicit representation, the gradient of regularization terms, i.e. Eikonal and curvature loss, are directly solved, avoiding computation and memory-access overhead.
Further, VoxNeuS adopts a geometry-radiance disentangled architecture to handle the geometry deformation from radiance optimization.
The experimental results show that VoxNeuS achieves better reconstruction quality than previous works. The entire training process takes 15 minutes and less than 3 GB of memory on a single 2080ti GPU.
Submitted 11 June, 2024;
originally announced June 2024.
-
A Multi-Scale Boltzmann Equation for Complex Systems of Neutral Gases across All Flow Regimes
Authors:
Sha Liu,
Junzhe Cao,
Sirui Yang,
Chengwen Zhong
Abstract:
A Multi-scale Boltzmann Equation (MBE) is found from the gas-kinetic theory and the direct modeling philosophy as a master equation for complex physical systems of neutral gases across all flow regimes, which lie between the continuum limit and the free-molecular limit, covering a vast range of applications such as hypersonic flows over aerospace craft and delicate flows around MEMS. The most explicit characteristic of MBE is evolving the variable observation time in the expression, which distinguishes the MBE from the single-scale master or governing equation where a fixed scale is implied in the assumptions. The fundamental properties of MBE, such as the asymptotic property, are proved theoretically, while a concise numerical scheme is developed for MBE to demonstrate its validity on benchmark multi-scale problems.
Submitted 11 June, 2024;
originally announced June 2024.
-
Unambiguous detection of high energy vortex states via the superkick effect
Authors:
Zhengjiang Li,
Shiyu Liu,
Bei Liu,
Liangliang Ji,
Igor P. Ivanov
Abstract:
Vortex states of photons, electrons, and other particles are freely propagating wave packets with helicoidal wave fronts winding around the axis of a phase vortex. A particle prepared in a vortex state possesses a non-zero orbital angular momentum projection on the propagation direction, a quantum number that has never been exploited in experimental particle and nuclear physics. Low-energy vortex photons, electrons, neutrons, and helium atoms have been demonstrated in experiment and have found numerous applications, and there exist proposals for boosting them to higher energies. However, the verification that a high-energy particle is indeed in a vortex state will be a major challenge, since the low-energy techniques become impractical for highly energetic particles. Here, we propose a new diagnostic method based on the so-called superkick effect, which can unambiguously detect the presence of a phase vortex. A proof-of-principle experiment with vortex electrons can be done with the existing technology and will, at the same time, constitute the first observation of the superkick effect.
Submitted 10 June, 2024;
originally announced June 2024.
-
Bi-Chainer: Automated Large Language Models Reasoning with Bidirectional Chaining
Authors:
Shuqi Liu,
Bowei He,
Linqi Song
Abstract:
Large Language Models (LLMs) have shown human-like reasoning abilities but still face challenges in solving complex logical problems. Existing unidirectional chaining methods, such as forward chaining and backward chaining, suffer from issues like low prediction accuracy and efficiency. To address these, we propose a bidirectional chaining method, Bi-Chainer, which dynamically switches to depth-first reasoning in the opposite reasoning direction when it encounters multiple branching options within the current direction. Thus, the intermediate reasoning results can be utilized as guidance to facilitate the reasoning process. We show that Bi-Chainer achieves sizable accuracy boosts over unidirectional chaining frameworks on four challenging logical reasoning datasets. Moreover, Bi-Chainer enhances the accuracy of intermediate proof steps and reduces the average number of inference calls, resulting in more efficient and accurate reasoning.
Submitted 5 June, 2024;
originally announced June 2024.
-
Learning effective Hamiltonians for adaptive time-evolution quantum algorithms
Authors:
Hongzheng Zhao,
Ao Chen,
Shu-Wei Liu,
Marin Bukov,
Markus Heyl,
Roderich Moessner
Abstract:
Digital quantum simulation of many-body dynamics relies on Trotterization to decompose the target time evolution into elementary quantum gates operating at a fixed equidistant time discretization. Recent advances have outlined more efficient adaptive Trotter protocols, which have been shown to exhibit a controlled error in the dynamics of local observables and correlation functions. However, it has remained open to what extent the errors on the actual generator of the dynamics, i.e., the target many-body Hamiltonian, remain controlled. Here, we propose to use quantum Hamiltonian learning to numerically obtain the effective Hamiltonian and apply it to the recently introduced ADA-Trotter algorithm as a concrete demonstration. Our key observation is that deviations from the target generator remain bounded at all simulation times. This result suggests that ADA-Trotter not only generates reliable digital quantum simulation of local dynamics, but also controllably approximates the global quantum state of the target system. Our proposal is sufficiently general and readily applicable to other adaptive time-evolution algorithms.
Submitted 10 June, 2024;
originally announced June 2024.
-
Strong and weak $CP$ tests in sequential decays of polarized $Σ^0$ hyperons
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (644 additional authors not shown)
Abstract:
The $J/ψ, ψ(3686) \to Σ^0 \barΣ^{0}$ processes and subsequent decays are studied using the world's largest $J/ψ$ and $ψ(3686)$ data samples collected with the BESIII detector. The strong-$CP$ symmetry is tested in the decays of the $Σ^0$ hyperons for the first time by measuring the decay parameters, $α_{Σ^0} = -0.0017 \pm 0.0021 \pm 0.0018$ and $\barα_{Σ^0} = 0.0021 \pm 0.0020 \pm 0.0022$. The weak-$CP$ test is performed in the subsequent decays of their daughter particles $Λ$ and $\barΛ$. Also for the first time, the transverse polarizations of the $Σ^0$ hyperons in $J/ψ$ and $ψ(3686)$ decays are observed with opposite directions, and the ratios between the S-wave and D-wave contributions of the $J/ψ, ψ(3686) \to Σ^0 \barΣ^{0}$ decays are obtained. These results are crucial to understand the decay dynamics of the charmonium states and the production mechanism of the $Σ^0-\barΣ^0$ pairs.
Submitted 10 June, 2024;
originally announced June 2024.
-
Efficient k-Nearest-Neighbor Machine Translation with Dynamic Retrieval
Authors:
Yan Gao,
Zhiwei Cao,
Zhongjian Miao,
Baosong Yang,
Shiyu Liu,
Min Zhang,
Jinsong Su
Abstract:
To achieve non-parametric NMT domain adaptation, $k$-Nearest-Neighbor Machine Translation ($k$NN-MT) constructs an external datastore to store domain-specific translation knowledge, which derives a $k$NN distribution to interpolate the prediction distribution of the NMT model via a linear interpolation coefficient $λ$. Despite its success, $k$NN retrieval at each timestep leads to substantial time overhead. To address this issue, dominant studies resort to $k$NN-MT with adaptive retrieval ($k$NN-MT-AR), which dynamically estimates $λ$ and skips $k$NN retrieval if $λ$ is less than a fixed threshold. Unfortunately, $k$NN-MT-AR does not yield satisfactory results. In this paper, we first conduct a preliminary study to reveal two key limitations of $k$NN-MT-AR: 1) the optimization gap leads to inaccurate estimation of $λ$ for determining $k$NN retrieval skipping, and 2) using a fixed threshold fails to accommodate the dynamic demands for $k$NN retrieval at different timesteps. To mitigate these limitations, we then propose $k$NN-MT with dynamic retrieval ($k$NN-MT-DR) that significantly extends vanilla $k$NN-MT in two aspects. Firstly, we equip $k$NN-MT with an MLP-based classifier for determining whether to skip $k$NN retrieval at each timestep. Particularly, we explore several carefully-designed scalar features to fully exert the potential of the classifier. Secondly, we propose a timestep-aware threshold adjustment method to dynamically generate the threshold, which further improves the efficiency of our model. Experimental results on the widely-used datasets demonstrate the effectiveness and generality of our model.\footnote{Our code is available at \url{https://github.com/DeepLearnXMU/knn-mt-dr}.}
Submitted 10 June, 2024;
originally announced June 2024.
-
Modeling User Retention through Generative Flow Networks
Authors:
Ziru Liu,
Shuchang Liu,
Bin Yang,
Zhenghai Xue,
Qingpeng Cai,
Xiangyu Zhao,
Zijian Zhang,
Lantao Hu,
Han Li,
Peng Jiang
Abstract:
Recommender systems aim to fulfill the user's daily demands. While most existing research focuses on maximizing the user's engagement with the system, it has recently been pointed out that how frequently the users come back for the service also reflects the quality and stability of recommendations. However, optimizing this user retention behavior is non-trivial and poses several challenges including the intractable leave-and-return user activities, the sparse and delayed signal, and the uncertain relations between users' retention and their immediate feedback towards each item in the recommendation list. In this work, we regard the retention signal as an overall estimation of the user's end-of-session satisfaction and propose to estimate this signal through a probabilistic flow. This flow-based modeling technique can back-propagate the retention reward towards each recommended item in the user session, and we show that the flow combined with traditional learning-to-rank objectives eventually optimizes a non-discounted cumulative reward for both immediate user feedback and user retention. We verify the effectiveness of our method through both offline empirical studies on two public datasets and online A/B tests in an industrial platform.
Submitted 10 June, 2024;
originally announced June 2024.
-
Measurement of the integrated luminosity of the data collected at 3.773 GeV by BESIII from 2021 to 2024
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (634 additional authors not shown)
Abstract:
We present a measurement of the integrated luminosity of $e^+e^-$ collision data collected with the BESIII detector at the BEPCII collider at a center-of-mass energy of $E_{\rm cm} = 3.773$~GeV. The integrated luminosities of the data sets taken from December 2021 to June 2022, from November 2022 to June 2023, and from October 2023 to February 2024 are determined to be $4.995 \pm 0.019$~fb$^{-1}$, $8.157 \pm 0.031$~fb$^{-1}$, and $4.191 \pm 0.016$~fb$^{-1}$, respectively, by analyzing large angle Bhabha scattering events. The uncertainties are dominated by systematic effects and the statistical uncertainties are negligible. Our results provide essential input for future analyses and precision measurements.
Submitted 9 June, 2024;
originally announced June 2024.
-
PSBD: Prediction Shift Uncertainty Unlocks Backdoor Detection
Authors:
Wei Li,
Pin-Yu Chen,
Sijia Liu,
Ren Wang
Abstract:
Deep neural networks are susceptible to backdoor attacks, where adversaries manipulate model predictions by inserting malicious samples into the training data. Currently, there is still a lack of direct filtering methods for identifying suspicious training data to unveil potential backdoor samples. In this paper, we propose a novel method, Prediction Shift Backdoor Detection (PSBD), leveraging an uncertainty-based approach requiring minimal unlabeled clean validation data. PSBD is motivated by an intriguing Prediction Shift (PS) phenomenon, where poisoned models' predictions on clean data often shift away from true labels towards certain other labels with dropout applied during inference, while backdoor samples exhibit less PS. We hypothesize PS results from neuron bias effect, making neurons favor features of certain classes. PSBD identifies backdoor training samples by computing the Prediction Shift Uncertainty (PSU), the variance in probability values when dropout layers are toggled on and off during model inference. Extensive experiments have been conducted to verify the effectiveness and efficiency of PSBD, which achieves state-of-the-art results among mainstream detection methods.
Submitted 9 June, 2024;
originally announced June 2024.
-
Chebyshev Moment Method for Regular Graphs I: Kesten-McKay and Semicircle distributions
Authors:
Yulin Gong,
Wenbo Li,
Shiping Liu
Abstract:
We develop the Chebyshev moment method to study the spectrum of regular graphs, motivated by the work of Serré. By this method, we give an elementary proof of the weak convergence to the Kesten-McKay distribution for the normalized spectral measures of random $N$-lifts in probability as $N$ tends to infinity. For a sequence of random $(q_n+1)$-regular graphs $G_n$ with $n$ vertices, we show that if $q_n=n^{o(1)}$ and $q_n$ tends to infinity, then the normalized spectral measure converges in Wasserstein $p$-distance $W_{p}$ to the semicircle distribution for any $p \in [1,\infty)$ almost surely. This strengthens the result of Dumitriu and Pal.
Submitted 9 June, 2024;
originally announced June 2024.
-
The PLATO Mission
Authors:
Heike Rauer,
Conny Aerts,
Juan Cabrera,
Magali Deleuil,
Anders Erikson,
Laurent Gizon,
Mariejo Goupil,
Ana Heras,
Jose Lorenzo-Alvarez,
Filippo Marliani,
Cesar Martin-Garcia,
J. Miguel Mas-Hesse,
Laurence O'Rourke,
Hugh Osborn,
Isabella Pagano,
Giampaolo Piotto,
Don Pollacco,
Roberto Ragazzoni,
Gavin Ramsay,
Stéphane Udry,
Thierry Appourchaux,
Willy Benz,
Alexis Brandeker,
Manuel Güdel,
Eduardo Janot-Pacheco
, et al. (801 additional authors not shown)
Abstract:
PLATO (PLAnetary Transits and Oscillations of stars) is ESA's M3 mission designed to detect and characterise extrasolar planets and perform asteroseismic monitoring of a large number of stars. PLATO will detect small planets (down to <2 R_(Earth)) around bright stars (<11 mag), including terrestrial planets in the habitable zone of solar-like stars. With the complement of radial velocity observations from the ground, planets will be characterised for their radius, mass, and age with high accuracy (5 %, 10 %, 10 % for an Earth-Sun combination respectively). PLATO will provide us with a large-scale catalogue of well-characterised small planets up to intermediate orbital periods, relevant for a meaningful comparison to planet formation theories and to better understand planet evolution. It will make possible comparative exoplanetology to place our Solar System planets in a broader context. In parallel, PLATO will study (host) stars using asteroseismology, allowing us to determine the stellar properties with high accuracy, substantially enhancing our knowledge of stellar structure and evolution.
The payload instrument consists of 26 cameras, each with a 12 cm aperture. For at least four years, the mission will perform high-precision photometric measurements. Here we review the science objectives, present PLATO's target samples and fields, provide an overview of expected core science performance as well as a description of the instrument and the mission profile at the beginning of the serial production of the flight cameras. PLATO is scheduled for launch at the end of 2026. This overview therefore provides a summary of the mission to the community in preparation for the upcoming operational phases.
Submitted 8 June, 2024;
originally announced June 2024.
-
VALL-E 2: Neural Codec Language Models are Human Parity Zero-Shot Text to Speech Synthesizers
Authors:
Sanyuan Chen,
Shujie Liu,
Long Zhou,
Yanqing Liu,
Xu Tan,
Jinyu Li,
Sheng Zhao,
Yao Qian,
Furu Wei
Abstract:
This paper introduces VALL-E 2, the latest advancement in neural codec language models that marks a milestone in zero-shot text-to-speech synthesis (TTS), achieving human parity for the first time. Based on its predecessor, VALL-E, the new iteration introduces two significant enhancements: Repetition Aware Sampling refines the original nucleus sampling process by accounting for token repetition in the decoding history. It not only stabilizes the decoding but also circumvents the infinite loop issue. Grouped Code Modeling organizes codec codes into groups to effectively shorten the sequence length, which not only boosts inference speed but also addresses the challenges of long sequence modeling. Our experiments on the LibriSpeech and VCTK datasets show that VALL-E 2 surpasses previous systems in speech robustness, naturalness, and speaker similarity. It is the first of its kind to reach human parity on these benchmarks. Moreover, VALL-E 2 consistently synthesizes high-quality speech, even for sentences that are traditionally challenging due to their complexity or repetitive phrases. The advantages of this work could contribute to valuable endeavors, such as generating speech for individuals with aphasia or people with amyotrophic lateral sclerosis. See https://aka.ms/valle2 for demos of VALL-E 2.
Submitted 17 June, 2024; v1 submitted 8 June, 2024;
originally announced June 2024.
-
Hierarchical Bayesian approach for adaptive integration of Bragg peaks in time-of-flight neutron scattering data
Authors:
Viktor Reshniak,
Xiaoping Wang,
Guannan Zhang,
Siyan Liu,
Junqi Yin
Abstract:
The Spallation Neutron Source (SNS) at Oak Ridge National Laboratory (ORNL) operates in the event mode. Time-of-flight (TOF) information about each detected neutron is collected separately and saved as a descriptive entry in a database enabling unprecedented accuracy of the collected experimental data. Nevertheless, the common data processing pipeline still involves the binning of data to perform analysis and feature extraction. For weak reflections, improper binning leads to sparse histograms with low signal-to-noise ratios, rendering them uninformative. In this study, we propose the Bayesian approach for the identification of Bragg peaks in TOF diffraction data. The method is capable of adaptively handling the varying sampling rates found in different regions of the reciprocal space. Unlike histogram fitting methods, our approach focuses on estimating the true neutron flux function. We accomplish this by employing a profile fitting algorithm based on the event-level likelihood, along with a multiresolution histogram-level prior. By using this approach, we ensure that there is no loss of information due to data reduction in strong reflections and that the search space is appropriately restricted for weak reflections. To demonstrate the effectiveness of our proposed model, we apply it to real experimental data collected at the TOPAZ single crystal diffractometer at SNS.
Submitted 1 April, 2024;
originally announced June 2024.
-
Sub-nanometer depth resolution and single dopant visualization achieved by tilt-coupled multislice electron ptychography
Authors:
Zehao Dong,
Yang Zhang,
Chun-Chien Chiu,
Sicheng Lu,
Jianbing Zhang,
Yu-Chen Liu,
Suya Liu,
Jan-Chi Yang,
Pu Yu,
Yayu Wang,
Zhen Chen
Abstract:
Real-space imaging of three-dimensional atomic structures is a critical yet challenging task in materials science. Although scanning transmission electron microscopy has achieved sub-angstrom lateral resolution through techniques like electron ptychography1,2, depth resolution remains limited to only 2 to 3 nanometers with a single projection setup3,4. Attaining better depth resolution typically necessitates large sample tilt angles and many projections, as seen in atomic electron tomography5,6. Here, we develop a new algorithm based on multislice electron ptychography which couples only a few projections at small tilt angles, but is sufficient to improve the depth resolution by more than threefold to the sub-nanometer scale, and potentially to the atomic level. This technique maintains high resolving power for both light and heavy atoms, and significantly improves the visibility of single dopants. We are thus able to experimentally detect dilute substitutional praseodymium dopants in a brownmillerite oxide, Ca2Co2O5, in three dimensions and observe the accompanying lattice distortion. This technique requires only a moderate level of data acquisition or processing, and can be seamlessly integrated into electron microscopes equipped with conventional components.
Submitted 6 June, 2024;
originally announced June 2024.
-
Characterizing segregation in blast rock piles: a deep-learning approach leveraging aerial image analysis
Authors:
Chengeng Liu,
Sihong Liu,
Chaomin Shen,
Yupeng Gao,
Yuxuan Liu
Abstract:
Blasted rock material serves a critical role in various engineering applications, yet the phenomenon of segregation, where particle sizes vary significantly along the gradient of a quarry pile, presents challenges for optimizing quarry material storage and handling. This study introduces an advanced image analysis methodology to characterize such segregation of rock fragments. The accurate delineation of detailed rock fragment size distributions was achieved through the analysis of drone-captured imagery, coupled with the application of an enhanced Unet semantic segmentation model integrated with an expansion-based post-processing technique. The quarry slope was stratified into four vertical sections, with the size distribution of each section quantified via ellipsoid shape approximations. Our results disclose pronounced vertical segregation patterns, with finer particles concentrated in the upper slope regions and coarser particles in the lower. Utilizing relative characteristic diameters, we offer insight into the degree of segregation, thereby illustrating the spatial heterogeneity in fragment size more clearly. The techniques outlined in this study deliver a scalable and accurate method for assessing fragment size distribution, with the potential to better inform resource management and operational decisions in quarry management.
Submitted 6 June, 2024;
originally announced June 2024.
-
POEM: Interactive Prompt Optimization for Enhancing Multimodal Reasoning of Large Language Models
Authors:
Jianben He,
Xingbo Wang,
Shiyi Liu,
Guande Wu,
Claudio Silva,
Huamin Qu
Abstract:
Large language models (LLMs) have exhibited impressive abilities for multimodal content comprehension and reasoning with proper prompting in zero- or few-shot settings. Despite the proliferation of interactive systems developed to support prompt engineering for LLMs across various tasks, most have primarily focused on textual or visual inputs, thus neglecting the complex interplay between modalities within multimodal inputs. This oversight hinders the development of effective prompts that guide model multimodal reasoning processes by fully exploiting the rich context provided by multiple modalities. In this paper, we present POEM, a visual analytics system to facilitate efficient prompt engineering for enhancing the multimodal reasoning performance of LLMs. The system enables users to explore the interaction patterns across modalities at varying levels of detail for a comprehensive understanding of the multimodal knowledge elicited by various prompts. Through diverse recommendations of demonstration examples and instructional principles, POEM supports users in iteratively crafting and refining prompts to better align and enhance model knowledge with human insights. The effectiveness and efficiency of our system are validated through two case studies and interviews with experts.
Submitted 14 June, 2024; v1 submitted 6 June, 2024;
originally announced June 2024.
-
A Multivariate Equivalence Test Based on Mahalanobis Distance with a Data-Driven Margin
Authors:
Chao Wang,
Yu-Ting Weng,
Shaobo Liu,
Tengfei Li,
Meiyu Shen,
Yi Tsong
Abstract:
Multivariate equivalence testing is needed in a variety of scenarios for drug development. For example, drug products obtained from natural sources may contain many components for which the individual effects and/or their interactions on clinical efficacy and safety cannot be completely characterized. Such lack of sufficient characterization poses a challenge both for generic drug developers to demonstrate and for regulatory authorities to determine the sameness of a proposed generic product to its reference product. Another case is to ensure batch-to-batch consistency of naturally derived products containing a vast number of components, such as botanical products. The equivalence or sameness between products containing many components that cannot be individually evaluated needs to be studied in a holistic manner. A multivariate equivalence test based on Mahalanobis distance may be suitable to evaluate many variables holistically. Existing studies based on such a method assumed either a predetermined constant margin, for which a consensus is difficult to achieve, or a margin derived from the data, where, however, the randomness is ignored during the testing. In this study, we propose a multivariate equivalence test based on Mahalanobis distance with a data-driven margin that accounts for the randomness in the margin. Several possible implementations are compared with existing approaches via extensive simulation studies.
Submitted 5 June, 2024;
originally announced June 2024.
-
Text-to-Events: Synthetic Event Camera Streams from Conditional Text Input
Authors:
Joachim Ott,
Zuowen Wang,
Shih-Chii Liu
Abstract:
Event cameras are advantageous for tasks that require vision sensors with low-latency and sparse output responses. However, the development of deep network algorithms using event cameras has been slow because of the lack of large labelled event camera datasets for network training. This paper reports a method for creating new labelled event datasets by using a text-to-X model, where X is one or multiple output modalities, in the case of this work, events. Our proposed text-to-events model produces synthetic event frames directly from text prompts. It uses an autoencoder which is trained to produce sparse event frames representing event camera outputs. By combining the pretrained autoencoder with a diffusion model architecture, the new text-to-events model is able to generate smooth synthetic event streams of moving objects. The autoencoder was first trained on an event camera dataset of diverse scenes. In the combined training with the diffusion model, the DVS gesture dataset was used. We demonstrate that the model can generate realistic event sequences of human gestures prompted by different text statements. The classification accuracy of the generated sequences, using a classifier trained on the real dataset, ranges from 42% to 92%, depending on the gesture group. The results demonstrate the capability of this method in synthesizing event datasets.
Submitted 5 June, 2024;
originally announced June 2024.
-
Measurement of the branching fraction ratios $R(D^{+})$ and $R(D^{*+})$ using muonic $τ$ decays
Authors:
LHCb collaboration,
R. Aaij,
A. S. W. Abdelmotteleb,
C. Abellan Beteta,
F. Abudinén,
T. Ackernley,
A. A. Adefisoye,
B. Adeva,
M. Adinolfi,
P. Adlarson,
C. Agapopoulou,
C. A. Aidala,
Z. Ajaltouni,
S. Akar,
K. Akiba,
P. Albicocco,
J. Albrecht,
F. Alessio,
M. Alexander,
Z. Aliouche,
P. Alvarez Cartelle,
R. Amalric,
S. Amato,
J. L. Amey,
Y. Amhis
, et al. (1063 additional authors not shown)
Abstract:
The branching fraction ratios of $\overline{B}^0\to D^+τ^-\overlineν_τ$ and $\overline{B}^0\to D^{*+}τ^-\overlineν_τ$ decays are measured with respect to their muonic counterparts, using a data sample corresponding to an integrated luminosity of 2.0 fb$^{-1}$ collected by the LHCb experiment in proton-proton collisions at $\sqrt{s} = 13$ TeV. The reconstructed final states are formed by combining $D^+$ mesons with $τ^-\toμ^-\overlineν_μν_τ$ candidates, where the $D^+$ is reconstructed via the $D^+\to K^-π^+π^+$ decay. The results are
\begin{align*}
R(D^{+}) &= 0.249 \pm 0.043 \pm 0.047, \\
R(D^{*+}) &= 0.402 \pm 0.081\pm 0.085,
\end{align*}
where the first uncertainties are statistical and the second systematic. The two measurements have a correlation coefficient of $-0.39$ and are compatible with the Standard Model.
Submitted 5 June, 2024;
originally announced June 2024.
-
Observation of new charmonium(-like) states in $B^+ \to D^{*\pm} D^{\mp} K^+$ decays
Authors:
LHCb collaboration,
R. Aaij,
A. S. W. Abdelmotteleb,
C. Abellan Beteta,
F. Abudinén,
T. Ackernley,
A. A. Adefisoye,
B. Adeva,
M. Adinolfi,
P. Adlarson,
C. Agapopoulou,
C. A. Aidala,
Z. Ajaltouni,
S. Akar,
K. Akiba,
P. Albicocco,
J. Albrecht,
F. Alessio,
M. Alexander,
Z. Aliouche,
P. Alvarez Cartelle,
R. Amalric,
S. Amato,
J. L. Amey,
Y. Amhis
, et al. (1062 additional authors not shown)
Abstract:
A study of resonant structures in $B^{+}\rightarrow{D^{\ast+}D^{-}K^{+}}$ and $B^{+}\rightarrow{D^{\ast-}D^{+}K^{+}}$ decays is performed, using proton-proton collision data at centre-of-mass energies of $\sqrt{s}=7, 8$, and $13$ TeV recorded by the LHCb experiment, corresponding to an integrated luminosity of 9 fb$^{-1}$. A simultaneous amplitude fit is performed to the two channels with contributions from resonances decaying to $D^{\ast-}D^{+}$ and $D^{\ast+}D^{-}$ states linked by $C$ parity. This procedure allows the $C$-parities of resonances in the $D^{\ast\pm}D^{\mp}$ mass spectra to be determined. Four charmonium(-like) states are observed decaying into $D^{\ast\pm}D^{\mp}$: $η_c(3945)$, $h_c(4000)$, $χ_{c1}(4010)$ and $h_c(4300)$, with quantum numbers $J^{PC}$ equal to $0^{-+}$, $1^{+-}$, $1^{++}$ and $1^{+-}$, respectively. At least three of these states have not been observed previously. In addition, the existence of the $T_{\bar{c}\bar{s}0}^{*}(2870)^{0}$ and $T_{\bar{c}\bar{s}1}^{*}(2900)^{0}$ resonances in the $D^-K^+$ mass spectrum, already observed in the $B^+ \to D^+ D^- K^+$ decay, is confirmed in a different production channel.
Submitted 5 June, 2024;
originally announced June 2024.
-
Enhancing 3D Lane Detection and Topology Reasoning with 2D Lane Priors
Authors:
Han Li,
Zehao Huang,
Zitian Wang,
Wenge Rong,
Naiyan Wang,
Si Liu
Abstract:
3D lane detection and topology reasoning are essential tasks in autonomous driving scenarios, requiring not only detecting the accurate 3D coordinates on lane lines, but also reasoning about the relationship between lanes and traffic elements. Current vision-based methods, whether explicitly constructing BEV features or not, all establish the lane anchors/queries in 3D space while ignoring the 2D lane priors. In this study, we propose Topo2D, a novel framework based on Transformer, leveraging 2D lane instances to initialize 3D queries and 3D positional embeddings. Furthermore, we explicitly incorporate 2D lane features into the recognition of topology relationships among lane centerlines and between lane centerlines and traffic elements. Topo2D achieves 44.5% OLS on the multi-view topology reasoning benchmark OpenLane-V2 and 62.6% F-Score on the single-view 3D lane detection benchmark OpenLane, exceeding the performance of existing state-of-the-art methods.
Submitted 5 June, 2024;
originally announced June 2024.
-
Measurements of the branching fractions of the $P$-wave charmonium spin-singlet state $h_c(^1P_1) \to h^+ h^-π^0/η$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (643 additional authors not shown)
Abstract:
Based on $(2712.4\pm 14.3)\times10^{6}$ $ψ(3686)$ events, we investigate four hadronic decay modes of the $P$-wave charmonium spin-singlet state $h_c(^1P_1) \to h^+ h^- π^0/η$ ($h=π$ or $K$) via the process $ψ(3686) \to π^{0}h_c$ at BESIII. The $h_c \to π^+ π^- π^0$ decay is observed with a significance of 9.6$σ$ after taking into account systematic uncertainties. Evidence for $h_c \to K^+ K^- π^0$ and $h_c \to K^+ K^- η$ is found with significances of $3.5σ$ and $3.3σ$, respectively, after considering the systematic uncertainties. The branching fractions of these decays are measured to be $\mathcal{B}(h_c \to π^+ π^- π^0)=(1.36\pm0.16\pm0.14)\times10^{-3}$, $\mathcal{B}(h_c \to K^+ K^- π^0)=(3.26\pm0.84\pm0.36)\times10^{-4}$, and $\mathcal{B}(h_c \to K^+ K^- η)=(3.13\pm1.08\pm0.38)\times10^{-4}$, where the first uncertainties are statistical and the second are systematic. No significant signal of $h_c\toπ^+π^-η$ is found, and the upper limit of its decay branching fraction is determined to be $\mathcal{B}(h_c\toπ^+π^-η) < 4.0 \times 10^{-4}$ at the 90% confidence level.
Submitted 5 June, 2024;
originally announced June 2024.
-
DRust: Language-Guided Distributed Shared Memory with Fine Granularity, Full Transparency, and Ultra Efficiency
Authors:
Haoran Ma,
Yifan Qiao,
Shi Liu,
Shan Yu,
Yuanjiang Ni,
Qingda Lu,
Jiesheng Wu,
Yiying Zhang,
Miryung Kim,
Harry Xu
Abstract:
Despite being a powerful concept, distributed shared memory (DSM) has not been made practical due to the extensive synchronization needed between servers to implement memory coherence. This paper shows a practical DSM implementation based on the insight that the ownership model embedded in programming languages such as Rust automatically constrains the order of read and write, providing opportunities for significantly simplifying the coherence implementation if the ownership semantics can be exposed to and leveraged by the runtime. This paper discusses the design and implementation of DistR, a Rust-based DSM system that outperforms the two state-of-the-art DSM systems GAM and Grappa by up to 2.64x and 29.16x in throughput, and scales much better with the number of servers.
Submitted 27 June, 2024; v1 submitted 4 June, 2024;
originally announced June 2024.
-
Feasibility of State Space Models for Network Traffic Generation
Authors:
Andrew Chu,
Xi Jiang,
Shinan Liu,
Arjun Bhagoji,
Francesco Bronzino,
Paul Schmitt,
Nick Feamster
Abstract:
Many problems in computer networking rely on parsing collections of network traces (e.g., traffic prioritization, intrusion detection). Unfortunately, the availability and utility of these collections are limited due to privacy concerns, data staleness, and low representativeness. While methods for generating data to augment collections exist, they often fall short in replicating the quality of real-world traffic. In this paper, we i) survey the evolution of traffic simulators/generators and ii) propose the use of state-space models, specifically Mamba, for packet-level, synthetic network trace generation by modeling it as an unsupervised sequence generation problem. Early evaluation shows that state-space models can generate synthetic network traffic with higher statistical similarity to real traffic than the state-of-the-art. Our approach thus has the potential to reliably generate realistic, informative synthetic network traces for downstream tasks.
Submitted 4 June, 2024;
originally announced June 2024.
-
Replicability in High Dimensional Statistics
Authors:
Max Hopkins,
Russell Impagliazzo,
Daniel Kane,
Sihan Liu,
Christopher Ye
Abstract:
The replicability crisis is a major issue across nearly all areas of empirical science, calling for the formal study of replicability in statistics. Motivated in this context, [Impagliazzo, Lei, Pitassi, and Sorrell STOC 2022] introduced the notion of replicable learning algorithms, and gave basic procedures for $1$-dimensional tasks including statistical queries. In this work, we study the computational and statistical cost of replicability for several fundamental high dimensional statistical tasks, including multi-hypothesis testing and mean estimation.
Our main contribution establishes a computational and statistical equivalence between optimal replicable algorithms and high dimensional isoperimetric tilings. As a consequence, we obtain matching sample complexity upper and lower bounds for replicable mean estimation of distributions with bounded covariance, resolving an open problem of [Bun, Gaboardi, Hopkins, Impagliazzo, Lei, Pitassi, Sivakumar, and Sorrell, STOC2023] and for the $N$-Coin Problem, resolving a problem of [Karbasi, Velegkas, Yang, and Zhou, NeurIPS2023] up to log factors.
While our equivalence is computational, allowing us to shave log factors in sample complexity from the best known efficient algorithms, efficient isoperimetric tilings are not known. To circumvent this, we introduce several relaxed paradigms that do allow for sample and computationally efficient algorithms, including allowing pre-processing, adaptivity, and approximate replicability. In these cases we give efficient algorithms matching or beating the best known sample complexity for mean estimation and the coin problem, including a generic procedure that reduces the standard quadratic overhead of replicability to linear in expectation.
Submitted 3 June, 2024;
originally announced June 2024.
-
Exploiting Chaotic Dynamics as Deep Neural Networks
Authors:
Shuhong Liu,
Nozomi Akashi,
Qingyao Huang,
Yasuo Kuniyoshi,
Kohei Nakajima
Abstract:
Chaos presents complex dynamics arising from nonlinearity and a sensitivity to initial states. These characteristics suggest a depth of expressivity that underscores their potential for advanced computational applications. However, strategies to effectively exploit chaotic dynamics for information processing have largely remained elusive. In this study, we reveal that the essence of chaos can be found in various state-of-the-art deep neural networks. Drawing inspiration from this revelation, we propose a novel method that directly leverages chaotic dynamics for deep learning architectures. Our approach is systematically evaluated across distinct chaotic systems. In all instances, our framework presents superior results to conventional deep neural networks in terms of accuracy, convergence speed, and efficiency. Furthermore, we found an active role of transient chaos formation in our scheme. Collectively, this study offers a new path for the integration of chaos, which has long been overlooked in information processing, and provides insights into the prospective fusion of chaotic dynamics within the domains of machine learning and neuromorphic computation.
Submitted 29 May, 2024;
originally announced June 2024.
-
CamCo: Camera-Controllable 3D-Consistent Image-to-Video Generation
Authors:
Dejia Xu,
Weili Nie,
Chao Liu,
Sifei Liu,
Jan Kautz,
Zhangyang Wang,
Arash Vahdat
Abstract:
Recently video diffusion models have emerged as expressive generative tools for high-quality video content creation readily available to general users. However, these models often do not offer precise control over camera poses for video generation, limiting the expression of cinematic language and user control. To address this issue, we introduce CamCo, which allows fine-grained Camera pose Control for image-to-video generation. We equip a pre-trained image-to-video generator with accurately parameterized camera pose input using Plücker coordinates. To enhance 3D consistency in the videos produced, we integrate an epipolar attention module in each attention block that enforces epipolar constraints to the feature maps. Additionally, we fine-tune CamCo on real-world videos with camera poses estimated through structure-from-motion algorithms to better synthesize object motion. Our experiments show that CamCo significantly improves 3D consistency and camera control capabilities compared to previous models while effectively generating plausible object motion. Project page: https://ir1d.github.io/CamCo/
Submitted 4 June, 2024;
originally announced June 2024.
-
Seed-TTS: A Family of High-Quality Versatile Speech Generation Models
Authors:
Philip Anastassiou,
Jiawei Chen,
Jitong Chen,
Yuanzhe Chen,
Zhuo Chen,
Ziyi Chen,
Jian Cong,
Lelai Deng,
Chuang Ding,
Lu Gao,
Mingqing Gong,
Peisong Huang,
Qingqing Huang,
Zhiying Huang,
Yuanyuan Huo,
Dongya Jia,
Chumin Li,
Feiya Li,
Hui Li,
Jiaxin Li,
Xiaoyang Li,
Xingxing Li,
Lin Liu,
Shouda Liu,
Sichao Liu
, et al. (21 additional authors not shown)
Abstract:
We introduce Seed-TTS, a family of large-scale autoregressive text-to-speech (TTS) models capable of generating speech that is virtually indistinguishable from human speech. Seed-TTS serves as a foundation model for speech generation and excels in speech in-context learning, achieving performance in speaker similarity and naturalness that matches ground truth human speech in both objective and subjective evaluations. With fine-tuning, we achieve even higher subjective scores across these metrics. Seed-TTS offers superior controllability over various speech attributes such as emotion and is capable of generating highly expressive and diverse speech for speakers in the wild. Furthermore, we propose a self-distillation method for speech factorization, as well as a reinforcement learning approach to enhance model robustness, speaker similarity, and controllability. We additionally present a non-autoregressive (NAR) variant of the Seed-TTS model, named $\text{Seed-TTS}_\text{DiT}$, which utilizes a fully diffusion-based architecture. Unlike previous NAR-based TTS systems, $\text{Seed-TTS}_\text{DiT}$ does not depend on pre-estimated phoneme durations and performs speech generation through end-to-end processing. We demonstrate that this variant achieves comparable performance to the language model-based variant and showcase its effectiveness in speech editing. We encourage readers to listen to demos at \url{https://bytedancespeech.github.io/seedtts_tech_report}.
Submitted 4 June, 2024;
originally announced June 2024.
-
Towards an Extensible Model-Based Digital Twin Framework for Space Launch Vehicles
Authors:
Ran Wei,
Ruizhe Yang,
Shijun Liu,
Chongsheng Fan,
Rong Zhou,
Zekun Wu,
Haochi Wang,
Yifan Cai,
Zhe Jiang
Abstract:
The concept of Digital Twin (DT) is increasingly applied to systems on different levels of abstraction across domains, to support monitoring, analysis, diagnosis, decision making and automated control. Whilst the interest in applying DT is growing, the definition of DT remains unclear, nor is there a clear pathway to develop DT to fully realise its capacities. In this paper, we revise the concept of DT and its categorisation. We propose a DT maturity matrix, based on which we propose a model-based DT development methodology. We also discuss how model-based tools can be used to support the methodology and present our own supporting tool. We report our preliminary findings with a discussion on a case study, in which we use our proposed methodology and our supporting tool to develop an extensible DT platform for the assurance of Electrical and Electronics systems of space launch vehicles.
Submitted 4 June, 2024;
originally announced June 2024.
-
Altermagnetism: Exploring New Frontiers in Magnetism and Spintronics
Authors:
Ling Bai,
Wanxiang Feng,
Siyuan Liu,
Libor Šmejkal,
Yuriy Mokrousov,
Yugui Yao
Abstract:
Recent developments have introduced a groundbreaking form of collinear magnetism known as "altermagnetism". This emerging magnetic phase is characterized by robust time-reversal symmetry breaking, antiparallel magnetic order, and alternating spin-splitting band structures, yet it exhibits vanishing net magnetization constrained by symmetry. Altermagnetism uniquely integrates traits previously considered mutually exclusive to conventional collinear ferromagnetism and antiferromagnetism, thereby facilitating phenomena and functionalities previously not achievable within these traditional categories of magnetism. Initially proposed theoretically, the existence of the altermagnetic phase has since been corroborated by a range of experimental studies, which have confirmed its unique properties and potential for applications. This review explores the rapidly expanding research on altermagnets, emphasizing the novel physical phenomena they manifest, methodologies for inducing altermagnetism, and promising altermagnetic materials. The goal of this review is to furnish readers with a comprehensive overview of altermagnetism and to inspire further innovative studies on altermagnetic materials which could potentially revolutionize applications in technology and materials science.
Submitted 4 June, 2024;
originally announced June 2024.
-
Preference Optimization for Molecule Synthesis with Conditional Residual Energy-based Models
Authors:
Songtao Liu,
Hanjun Dai,
Yue Zhao,
Peng Liu
Abstract:
Molecule synthesis through machine learning is one of the fundamental problems in drug discovery. Current data-driven strategies employ one-step retrosynthesis models and search algorithms to predict synthetic routes in a top-bottom manner. Despite their effective performance, these strategies face limitations in the molecule synthetic route generation due to a greedy selection of the next molecule set without any lookahead. Furthermore, existing strategies cannot control the generation of synthetic routes based on possible criteria such as material costs, yields, and step count. In this work, we propose a general and principled framework via conditional residual energy-based models (EBMs), that focus on the quality of the entire synthetic route based on the specific criteria. By incorporating an additional energy-based function into our probabilistic model, our proposed algorithm can enhance the quality of the most probable synthetic routes (with higher probabilities) generated by various strategies in a plug-and-play fashion. Extensive experiments demonstrate that our framework can consistently boost performance across various strategies and outperforms previous state-of-the-art top-1 accuracy by a margin of 2.5%. Code is available at https://github.com/SongtaoLiu0823/CREBM.
Submitted 4 June, 2024;
originally announced June 2024.
-
Graph Adversarial Diffusion Convolution
Authors:
Songtao Liu,
Jinghui Chen,
Tianfan Fu,
Lu Lin,
Marinka Zitnik,
Dinghao Wu
Abstract:
This paper introduces a min-max optimization formulation for the Graph Signal Denoising (GSD) problem. In this formulation, we first maximize the second term of GSD by introducing perturbations to the graph structure based on Laplacian distance and then minimize the overall loss of the GSD. By solving the min-max optimization problem, we derive a new variant of the Graph Diffusion Convolution (GDC) architecture, called Graph Adversarial Diffusion Convolution (GADC). GADC differs from GDC by incorporating an additional term that enhances robustness against adversarial attacks on the graph structure and noise in node features. Moreover, GADC improves the performance of GDC on heterophilic graphs. Extensive experiments demonstrate the effectiveness of GADC across various datasets. Code is available at https://github.com/SongtaoLiu0823/GADC.
Submitted 4 June, 2024;
originally announced June 2024.
-
What Improves the Generalization of Graph Transformers? A Theoretical Dive into the Self-attention and Positional Encoding
Authors:
Hongkang Li,
Meng Wang,
Tengfei Ma,
Sijia Liu,
Zaixi Zhang,
Pin-Yu Chen
Abstract:
Graph Transformers, which incorporate self-attention and positional encoding, have recently emerged as a powerful architecture for various graph learning tasks. Despite their impressive performance, the complex non-convex interactions across layers and the recursive graph structure have made it challenging to establish a theoretical foundation for learning and generalization. This study introduces the first theoretical investigation of a shallow Graph Transformer for semi-supervised node classification, comprising a self-attention layer with relative positional encoding and a two-layer perceptron. Focusing on a graph data model with discriminative nodes that determine node labels and non-discriminative nodes that are class-irrelevant, we characterize the sample complexity required to achieve a desirable generalization error by training with stochastic gradient descent (SGD). This paper provides the quantitative characterization of the sample complexity and number of iterations for convergence dependent on the fraction of discriminative nodes, the dominant patterns, and the initial model errors. Furthermore, we demonstrate that self-attention and positional encoding enhance generalization by making the attention map sparse and promoting the core neighborhood during training, which explains the superior feature representation of Graph Transformers. Our theoretical results are supported by empirical experiments on synthetic and real-world benchmarks.
Submitted 4 June, 2024;
originally announced June 2024.
-
Pionic transitions of the spin-2 partner of $X(3872)$ to $χ_{cJ}$
Authors:
Shi-Dong Liu,
Fan Wang,
Zhao-Sai Jia,
Gang Li,
Xiao-Hai Liu,
Ju-Jun Xie
Abstract:
We investigated the pionic transitions between the $X_2$ [spin-2 partner of the $X(3872)$] and $χ_{c1,2}$ using a nonrelativistic effective field theory. The $X_2$ is assumed to be a bound state of the $D^{*}$ and $\bar{D}^*$ mesons and to decay through several kinds of loops, including the bubble, triangle and box loops. Within the present model, the widths for the single-pion decays $X_2\toπ^0χ_{cJ}$ are predicted to be about $3$--$30$ keV. For the dipion decays, the widths are a few keVs. These widths yield a branching fraction of $10^{-3}$--$10^{-2}$. The ratio $R_{\mathrm{c}0}=Γ(X_2\toπ^+π^-χ_{cJ})/Γ(X_2\toπ^0π^0χ_{cJ}) \simeq 1.6$, which is a bit smaller than the expected value of $2$, and $R_{21}=Γ(X_2\toππχ_{c2})/Γ(X_2\toππχ_{c1}) \simeq 0.85$. These ratios are nearly independent of the $X_2$ mass and the coupling constants, which might be a good quantity for the experiments. Moreover, the invariant mass spectra of the $π^0χ_{cJ}$ final state for the dipion processes are presented, showing a cusp structure at the $D {\bar D}^*$ threshold enhanced and narrowed by the nearby triangle singularity.
Submitted 3 June, 2024;
originally announced June 2024.
-
Video Coding with Cross-Component Sample Offset
Authors:
Han Gao,
Xin Zhao,
Tianqi Liu,
Shan Liu
Abstract:
Beyond the exploration of traditional spatial, temporal and subjective visual signal redundancy in image and video compression, recent research has focused on leveraging cross-color component redundancy to enhance coding efficiency. Cross-component coding approaches are motivated by the statistical correlations among different color components, such as those in the Y'CbCr color space, where the luma (Y) color component typically exhibits finer details than the chroma (Cb/Cr) color components. Inspired by previous cross-component coding algorithms, this paper introduces a novel in-loop filtering approach named Cross-Component Sample Offset (CCSO). CCSO utilizes co-located and neighboring luma samples to generate correction signals for both luma and chroma reconstructed samples. It is a multiplication-free, non-linear mapping process implemented using a look-up-table. The input to the mapping is a group of reconstructed luma samples, and the output is an offset value applied on the center luma or co-located chroma sample. Experimental results demonstrate that the proposed CCSO can be applied to both image and video coding, resulting in improved coding efficiency and visual quality. The method has been adopted into an experimental next-generation video codec beyond AV1 developed by the Alliance for Open Media (AOMedia), achieving significant objective coding gains up to 3.5\,\% and 1.8\,\% for PSNR and VMAF quality metrics, respectively, under random access configuration. Additionally, CCSO notably improves the subjective visual quality.
Submitted 3 June, 2024;
originally announced June 2024.