-
CEIPA: Counterfactual Explainable Incremental Prompt Attack Analysis on Large Language Models
Authors:
Dong Shu,
Mingyu **,
Tianle Chen,
Chong Zhang,
Yongfeng Zhang
Abstract:
This study sheds light on the imperative need to bolster safety and privacy measures in large language models (LLMs), such as GPT-4 and LLaMA-2, by identifying and mitigating their vulnerabilities through explainable analysis of prompt attacks. We propose Counterfactual Explainable Incremental Prompt Attack (CEIPA), a novel technique where we guide prompts in a specific manner to quantitatively me…
▽ More
This study sheds light on the imperative need to bolster safety and privacy measures in large language models (LLMs), such as GPT-4 and LLaMA-2, by identifying and mitigating their vulnerabilities through explainable analysis of prompt attacks. We propose Counterfactual Explainable Incremental Prompt Attack (CEIPA), a novel technique where we guide prompts in a specific manner to quantitatively measure attack effectiveness and explore the embedded defense mechanisms in these models. Our approach is distinctive for its capacity to elucidate the reasons behind the generation of harmful responses by LLMs through an incremental counterfactual methodology. By organizing the prompt modification process into four incremental levels: (word, sentence, character, and a combination of character and word) we facilitate a thorough examination of the susceptibilities inherent to LLMs. The findings from our study not only provide counterfactual explanation insight but also demonstrate that our framework significantly enhances the effectiveness of attack prompts.
△ Less
Submitted 12 July, 2024;
originally announced July 2024.
-
Topological Fermi-arc surface state covered by floating electrons on a two-dimensional electride
Authors:
Chan-young Lim,
Min-Seok Kim,
Dong Cheol Lim,
Sunghun Kim,
Yeonghoon Lee,
Jaehoon Cha,
Gyubin Lee,
Sang Yong Song,
Dinesh Thapa,
Jonathan D. Denlinger,
Seong-Gon Kim,
Sung Wng Kim,
Jungpil Seo,
Yeongkwan Kim
Abstract:
Two-dimensional electrides can acquire topologically non-trivial phases due to intriguing interplay between the cationic atomic layers and anionic electron layers. However, experimental evidence of topological surface states has yet to be verified. Here, via angle-resolved photoemission spectroscopy (ARPES) and scanning tunnelling microscopy (STM), we probe the magnetic Weyl states of the ferromag…
▽ More
Two-dimensional electrides can acquire topologically non-trivial phases due to intriguing interplay between the cationic atomic layers and anionic electron layers. However, experimental evidence of topological surface states has yet to be verified. Here, via angle-resolved photoemission spectroscopy (ARPES) and scanning tunnelling microscopy (STM), we probe the magnetic Weyl states of the ferromagnetic electride $[Gd_{2}$C]^{2+}\cdot2e^{-}$. In particular, the presence of Weyl cones and Fermi-arc states is demonstrated through photon energy-dependent ARPES measurements, agreeing with theoretical band structure calculations. Notably, the STM measurements reveal that the Fermi-arc states exist underneath a floating quantum electron liquid on the top Gd layer, forming double-stacked surface states in a heterostructure. Our work thus not only unveils the non-trivial topology of the $[Gd_{2}$C]^{2+}\cdot2e^{-}$ electride but also realizes a surface heterostructure that can host phenomena distinct from the bulk.
△ Less
Submitted 12 July, 2024;
originally announced July 2024.
-
Measurement of $CP$ asymmetries in $B^0 \to K^0_S π^0 γ$ decays at Belle II
Authors:
Belle II Collaboration,
I. Adachi,
L. Aggarwal,
H. Ahmed,
H. Aihara,
N. Akopov,
A. Aloisio,
N. Anh Ky,
D. M. Asner,
H. Atmacan,
T. Aushev,
V. Aushev,
M. Aversano,
R. Ayad,
V. Babu,
H. Bae,
S. Bahinipati,
P. Bambade,
Sw. Banerjee,
S. Bansal,
M. Barrett,
J. Baudot,
A. Baur,
A. Beaubien,
F. Becherer
, et al. (414 additional authors not shown)
Abstract:
We report measurements of time-dependent $CP$ asymmetries in $B^0 \to K^0_S π^0 γ$ decays based on a data sample of $(388\pm6)\times10^6$ $B\bar{B}$ events collected at the $Υ(4S)$ resonance with the Belle II detector. The Belle II experiment operates at the SuperKEKB asymmetric-energy $e^+e^-$ collider. We measure decay-time distributions to determine $CP$-violating parameters $S$ and $C$. We det…
▽ More
We report measurements of time-dependent $CP$ asymmetries in $B^0 \to K^0_S π^0 γ$ decays based on a data sample of $(388\pm6)\times10^6$ $B\bar{B}$ events collected at the $Υ(4S)$ resonance with the Belle II detector. The Belle II experiment operates at the SuperKEKB asymmetric-energy $e^+e^-$ collider. We measure decay-time distributions to determine $CP$-violating parameters $S$ and $C$. We determine these parameters for two ranges of $K^0_S π^0$ invariant mass: $m(K^0_S π^0)\in (0.8, 1.0)$ $GeV/c^2$, which is dominated by $B^0 \to K^{*0} (\to K^0_S π^0) γ$ decays, and a complementary region $m(K^0_S π^0)\in (0.6, 0.8)\cup(1.0, 1.8)$ $GeV/c^2$. Our results have improved precision as compared to previous measurements and are consistent with theory predictions.
△ Less
Submitted 12 July, 2024;
originally announced July 2024.
-
A Novel Quantum Realization of Jet Clustering in High-Energy Physics Experiments
Authors:
Yongfeng Zhu,
Weifeng Zhuang,
Chen Qian,
Yunheng Ma,
Dong E. Liu,
Manqi Ruan,
Chen Zhou
Abstract:
Exploring the application of quantum technologies to fundamental sciences holds the key to fostering innovation for both sides. In high-energy particle collisions, quarks and gluons are produced and immediately form collimated particle sprays known as jets. Accurate jet clustering is crucial as it retains the information of the originating quark or gluon and forms the basis for studying properties…
▽ More
Exploring the application of quantum technologies to fundamental sciences holds the key to fostering innovation for both sides. In high-energy particle collisions, quarks and gluons are produced and immediately form collimated particle sprays known as jets. Accurate jet clustering is crucial as it retains the information of the originating quark or gluon and forms the basis for studying properties of the Higgs boson, which underlies teh mechanism of mass generation for subatomic particles. For the first time, by map** collision events into graphs--with particles as nodes and their angular separations as edges--we realize jet clustering using the Quantum Approximate Optimization Algorithm (QAOA), a hybrid quantum-classical algorithm for addressing classical combinatorial optimization problems with available quantum resources. Our results, derived from 30 qubits on quantum computer simulator and 6 qubits on quantum computer hardware, demonstrate that jet clustering performance with QAOA is comparable with or even better than classical algorithms for a small-sized problem. This study highlights the feasibility of quantum computing to revolutionize jet clustering, bringing the practical application of quantum computing in high-energy physics experiments one step closer.
△ Less
Submitted 12 July, 2024;
originally announced July 2024.
-
SpreadsheetLLM: Encoding Spreadsheets for Large Language Models
Authors:
Yuzhang Tian,
Jianbo Zhao,
Haoyu Dong,
Junyu Xiong,
Shiyu Xia,
Mengyu Zhou,
Yun Lin,
José Cambronero,
Yeye He,
Shi Han,
Dongmei Zhang
Abstract:
Spreadsheets, with their extensive two-dimensional grids, various layouts, and diverse formatting options, present notable challenges for large language models (LLMs). In response, we introduce SpreadsheetLLM, pioneering an efficient encoding method designed to unleash and optimize LLMs' powerful understanding and reasoning capability on spreadsheets. Initially, we propose a vanilla serialization…
▽ More
Spreadsheets, with their extensive two-dimensional grids, various layouts, and diverse formatting options, present notable challenges for large language models (LLMs). In response, we introduce SpreadsheetLLM, pioneering an efficient encoding method designed to unleash and optimize LLMs' powerful understanding and reasoning capability on spreadsheets. Initially, we propose a vanilla serialization approach that incorporates cell addresses, values, and formats. However, this approach was limited by LLMs' token constraints, making it impractical for most applications. To tackle this challenge, we develop SheetCompressor, an innovative encoding framework that compresses spreadsheets effectively for LLMs. It comprises three modules: structural-anchor-based compression, inverse index translation, and data-format-aware aggregation. It significantly improves performance in spreadsheet table detection task, outperforming the vanilla approach by 25.6% in GPT4's in-context learning setting. Moreover, fine-tuned LLM with SheetCompressor has an average compression ratio of 25 times, but achieves a state-of-the-art 78.9% F1 score, surpassing the best existing models by 12.3%. Finally, we propose Chain of Spreadsheet for downstream tasks of spreadsheet understanding and validate in a new and demanding spreadsheet QA task. We methodically leverage the inherent layout and structure of spreadsheets, demonstrating that SpreadsheetLLM is highly effective across a variety of spreadsheet tasks.
△ Less
Submitted 12 July, 2024;
originally announced July 2024.
-
Measurement of branching fractions, CP asymmetry, and isospin asymmetry for $\boldsymbol{B\rightarrowργ}$ decays using Belle and Belle II data
Authors:
Belle II Collaboration,
I. Adachi,
K. Adamczyk,
L. Aggarwal,
H. Aihara,
N. Akopov,
A. Aloisio,
N. Anh Ky,
D. M. Asner,
H. Atmacan,
T. Aushev,
V. Aushev,
M. Aversano,
R. Ayad,
V. Babu,
H. Bae,
S. Bahinipati,
P. Bambade,
Sw. Banerjee,
S. Bansal,
M. Barrett,
J. Baudot,
A. Baur,
A. Beaubien,
F. Becherer
, et al. (385 additional authors not shown)
Abstract:
We present measurements of $B^{+}\rightarrowρ^{+}γ$ and $B^{0}\rightarrowρ^{0}γ$ decays using a combined data sample of $772 \times 10^6$ $B\overline{B}$ pairs collected by the Belle experiment and $387\times 10^6$ $B\overline{B}$ pairs collected by the Belle II experiment in $e^{+}e^{-}$ collisions at the $Υ(4S)$ resonance. After an optimized selection, a simultaneous fit to the Belle and Belle I…
▽ More
We present measurements of $B^{+}\rightarrowρ^{+}γ$ and $B^{0}\rightarrowρ^{0}γ$ decays using a combined data sample of $772 \times 10^6$ $B\overline{B}$ pairs collected by the Belle experiment and $387\times 10^6$ $B\overline{B}$ pairs collected by the Belle II experiment in $e^{+}e^{-}$ collisions at the $Υ(4S)$ resonance. After an optimized selection, a simultaneous fit to the Belle and Belle II data sets yields $114\pm 12$ $B^{+}\rightarrowρ^{+}γ$ and $99\pm 12$ $B^{0}\rightarrowρ^{0}γ$ decays. The measured branching fractions are $(13.1^{+2.0 +1.3}_{-1.9 -1.2})\times 10^{-7}$ and $(7.5\pm 1.3^{+1.0}_{-0.8})\times 10^{-7}$ for $B^{+}\rightarrowρ^{+}γ$ and $B^{0}\rightarrowρ^{0}γ$ decays, respectively, where the first uncertainty is statistical and the second is systematic. We also measure the isospin asymmetry $A_{\rm I}(B\rightarrowργ)=(10.9^{+11.2 +7.8}_{-11.7 -7.3})\%$ and the direct CP asymmetry $A_{CP}(B^{+}\rightarrowρ^{+}γ)=(-8.2\pm 15.2^{+1.6}_{-1.2})\%$.
△ Less
Submitted 12 July, 2024;
originally announced July 2024.
-
Accelerating Eigenvalue Computation for Nuclear Structure Calculations via Perturbative Corrections
Authors:
Dong Min Roh,
Esmond Ng,
Chao Yang,
Dean Lee,
Pieter Maris,
James P. Vary
Abstract:
We present a new method for computing the lowest few eigenvalues and the corresponding eigenvectors of a nuclear many-body Hamiltonian represented in a truncated configuration interaction subspace, i.e., the no-core shell model (NCSM). The method uses the hierarchical structure of the NCSM Hamiltonian to partition the Hamiltonian as the sum of two matrices. The first matrix corresponds to the Hami…
▽ More
We present a new method for computing the lowest few eigenvalues and the corresponding eigenvectors of a nuclear many-body Hamiltonian represented in a truncated configuration interaction subspace, i.e., the no-core shell model (NCSM). The method uses the hierarchical structure of the NCSM Hamiltonian to partition the Hamiltonian as the sum of two matrices. The first matrix corresponds to the Hamiltonian represented in a small configuration space, whereas the second is viewed as the perturbation to the first matrix. Eigenvalues and eigenvectors of the first matrix can be computed efficiently. Perturbative corrections to the eigenvectors of the first matrix can be obtained from the solutions of a sequence of linear systems of equations defined in the small configuration space. These correction vectors can be combined with the approximate eigenvectors of the first matrix to construct a subspace from which more accurate approximations of the desired eigenpairs can be obtained. We call this method a Subspace Projection with Perturbative Corrections (SPPC) method. We show by numerical examples that the SPPC method can be more efficient than conventional iterative methods for solving large-scale eigenvalue problems such as the Lanczos, block Lanczos and the locally optimal block preconditioned conjugate gradient (LOBPCG) method. The method can also be combined with other methods to avoid convergence stagnation.
△ Less
Submitted 11 July, 2024;
originally announced July 2024.
-
Convergences of Combinatorial Ricci Flows to Degenerated Circle Packings in Hyperbolic Background Geometry
Authors:
Guangming Hu,
Sicheng Lu,
Dong Tan,
Youliang Zhong,
Puchun Zhou
Abstract:
This paper investigates a kind of degenerated circle packings in hyperbolic background geometry. A main problem is whether a prescribed total geodesic curvature data can be realized by a degenerated circle packing or not. We fully characterize the sufficient and necessary conditions and show the uniqueness. Furthermore, we introduce the combinatoral Ricci flow to find the desired degenerated circl…
▽ More
This paper investigates a kind of degenerated circle packings in hyperbolic background geometry. A main problem is whether a prescribed total geodesic curvature data can be realized by a degenerated circle packing or not. We fully characterize the sufficient and necessary conditions and show the uniqueness. Furthermore, we introduce the combinatoral Ricci flow to find the desired degenerated circle packed surface, analougus to the methods of Chow-Luo and Takatsu.
△ Less
Submitted 11 July, 2024;
originally announced July 2024.
-
Optimal Carbon Emission Control With Allowances Purchasing
Authors:
Xinfu Chen,
Yuchao Dong,
Wenlin Huang,
** Liang
Abstract:
In this paper, we consider a company can simultaneously reduce its emissions and buy carbon allowances at any time. We establish an optimal control model involving two stochastic processes with two control variables, which is a singular control problem. This model can then be converted into a Hamilton-Jacobi-Bellman (HJB) equation, which is a two-dimensional variational equality with gradient barr…
▽ More
In this paper, we consider a company can simultaneously reduce its emissions and buy carbon allowances at any time. We establish an optimal control model involving two stochastic processes with two control variables, which is a singular control problem. This model can then be converted into a Hamilton-Jacobi-Bellman (HJB) equation, which is a two-dimensional variational equality with gradient barrier, so that the free boundary is a surface. We prove the existence and uniqueness of the solution. Finally, some numerical results are shown.
△ Less
Submitted 11 July, 2024;
originally announced July 2024.
-
OV-DINO: Unified Open-Vocabulary Detection with Language-Aware Selective Fusion
Authors:
Hao Wang,
Pengzhen Ren,
Zequn Jie,
Xiao Dong,
Chengjian Feng,
Yinlong Qian,
Lin Ma,
Dongmei Jiang,
Yaowei Wang,
Xiangyuan Lan,
Xiaodan Liang
Abstract:
Open-vocabulary detection is a challenging task due to the requirement of detecting objects based on class names, including those not encountered during training. Existing methods have shown strong zero-shot detection capabilities through pre-training on diverse large-scale datasets. However, these approaches still face two primary challenges: (i) how to universally integrate diverse data sources…
▽ More
Open-vocabulary detection is a challenging task due to the requirement of detecting objects based on class names, including those not encountered during training. Existing methods have shown strong zero-shot detection capabilities through pre-training on diverse large-scale datasets. However, these approaches still face two primary challenges: (i) how to universally integrate diverse data sources for end-to-end training, and (ii) how to effectively leverage the language-aware capability for region-level cross-modality understanding. To address these challenges, we propose a novel unified open-vocabulary detection method called OV-DINO, which pre-trains on diverse large-scale datasets with language-aware selective fusion in a unified framework. Specifically, we introduce a Unified Data Integration (UniDI) pipeline to enable end-to-end training and eliminate noise from pseudo-label generation by unifying different data sources into detection-centric data. In addition, we propose a Language-Aware Selective Fusion (LASF) module to enable the language-aware ability of the model through a language-aware query selection and fusion process. We evaluate the performance of the proposed OV-DINO on popular open-vocabulary detection benchmark datasets, achieving state-of-the-art results with an AP of 50.6\% on the COCO dataset and 40.0\% on the LVIS dataset in a zero-shot manner, demonstrating its strong generalization ability. Furthermore, the fine-tuned OV-DINO on COCO achieves 58.4\% AP, outperforming many existing methods with the same backbone. The code for OV-DINO will be available at \href{https://github.com/wanghao9610/OV-DINO}{https://github.com/wanghao9610/OV-DINO}.
△ Less
Submitted 10 July, 2024;
originally announced July 2024.
-
Study of the decay and production properties of $D_{s1}(2536)$ and $D_{s2}^*(2573)$
Authors:
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere,
A. Brueggemann
, et al. (645 additional authors not shown)
Abstract:
The $e^+e^-\rightarrow D_s^+D_{s1}(2536)^-$ and $e^+e^-\rightarrow D_s^+D^*_{s2}(2573)^-$ processes are studied using data samples collected with the BESIII detector at center-of-mass energies from 4.530 to 4.946~GeV. The absolute branching fractions of $D_{s1}(2536)^- \rightarrow \bar{D}^{*0}K^-$ and $D_{s2}^*(2573)^- \rightarrow \bar{D}^0K^-$ are measured for the first time to be…
▽ More
The $e^+e^-\rightarrow D_s^+D_{s1}(2536)^-$ and $e^+e^-\rightarrow D_s^+D^*_{s2}(2573)^-$ processes are studied using data samples collected with the BESIII detector at center-of-mass energies from 4.530 to 4.946~GeV. The absolute branching fractions of $D_{s1}(2536)^- \rightarrow \bar{D}^{*0}K^-$ and $D_{s2}^*(2573)^- \rightarrow \bar{D}^0K^-$ are measured for the first time to be $(35.9\pm 4.8\pm 3.5)\%$ and $(37.4\pm 3.1\pm 4.6)\%$, respectively. The measurements are in tension with predictions based on the assumption that the $D_{s1}(2536)$ and $D_{s2}^*(2573)$ are dominated by a bare $c\bar{s}$ component. The $e^+e^-\rightarrow D_s^+D_{s1}(2536)^-$ and $e^+e^-\rightarrow D_s^+D^*_{s2}(2573)^-$ cross sections are measured, and a resonant structure at around 4.6~GeV with a width of 50~MeV is observed for the first time with a statistical significance of $15σ$ in the $e^+e^-\rightarrow D_s^+D^*_{s2}(2573)^-$ process. It could be the $Y(4626)$ found by the Belle collaboration in the $D_s^+D_{s1}(2536)^{-}$ final state, since they have similar masses and widths. There is also evidence for a structure at around 4.75~GeV in both processes.
△ Less
Submitted 10 July, 2024;
originally announced July 2024.
-
The discovery of a nearby 421~s transient with CHIME/FRB/Pulsar
Authors:
Fengqiu Adam Dong,
Tracy Clarke,
Alice P. Curtin,
Ajay Kumar,
Ingrid Stairs,
Shami Chatterjee,
Amanda M. Cook,
Emmanuel Fonseca,
B. M. Gaensler,
Jason W. T. Hessels,
Victoria M. Kaspi,
Mattias Lazda,
Kiyoshi W. Masui,
James W. McKee,
Bradley W. Meyers,
Aaron B. Pearlman,
Scott M. Ransom,
Paul Scholz,
Kaitlyn Shin,
Kendrick M. Smith,
Chia Min Tan
Abstract:
Neutron stars and white dwarfs are both dense remnants of post-main-sequence stars. Pulsars, magnetars and strongly magnetised white dwarfs have all been seen to been observed to exhibit coherent, pulsed radio emission in relation to their rotational period. Recently, a new type of radio long period transient (LPT) has been discovered. The bright radio emission of LPTs resembles that of radio puls…
▽ More
Neutron stars and white dwarfs are both dense remnants of post-main-sequence stars. Pulsars, magnetars and strongly magnetised white dwarfs have all been seen to been observed to exhibit coherent, pulsed radio emission in relation to their rotational period. Recently, a new type of radio long period transient (LPT) has been discovered. The bright radio emission of LPTs resembles that of radio pulsars and magnetars. However, they pulse on timescales (minutes) much longer than previously seen. While minute timescales are common rotation periods for white dwarfs, LPTs are much brighter than the known pulsating white dwarfs, and dipolar radiation from isolated (as opposed to binary) magnetic white dwarfs has yet to be observed. Here, we report the discovery of a new $\sim$421~s LPT, CHIME J0630+25, using the CHIME/FRB and CHIME/Pulsar instruments. We used standard pulsar timing techniques and obtained a phase-coherent timing solution which yielded limits on the inferred magnetic field and characteristic age. CHIME J0630+25 is remarkably nearby ($170 \pm 80$~pc), making it the closest LPT discovered to date.
△ Less
Submitted 10 July, 2024;
originally announced July 2024.
-
A Transverse-Read-assisted Valid-Bit Collection to Accelerate Stochastic Conmputing MAC for Energy-Efficient in-RTM DNNs
Authors:
Jihe Wang,
Zhiying Zhang,
Xingwu Dong,
Danghui Wang
Abstract:
It looks very attractive to coordinate racetrack-memory(RM) and stochastic-computing (SC) jointly to build an ultra-low power neuron-architecture.However,the above combination has always been questioned in a fatal weakness that the narrow bit-view of the RM-MTJ structure,a.k.a.shift-and-access pattern,cannot physically match the great throughput of direct-stored stochastic sequences.Fortunately,a…
▽ More
It looks very attractive to coordinate racetrack-memory(RM) and stochastic-computing (SC) jointly to build an ultra-low power neuron-architecture.However,the above combination has always been questioned in a fatal weakness that the narrow bit-view of the RM-MTJ structure,a.k.a.shift-and-access pattern,cannot physically match the great throughput of direct-stored stochastic sequences.Fortunately,a recently developed Transverse-Read(TR) provides a wider segment-view to RM via detecting the resistance of domain-walls between a couple of MTJs on single nanowire,therefore RM can be enhanced with a faster access to the sequences without any substantial domain-shift.To utilize TR for a power-efficient SC-DNNs, in this work, we propose a segment-based compression to leverage one-cycle TR to only read those kernel segments of stochastic sequences,meanwhile,remove a large number of redundant segments for ultra-high storage density.In decompression stage,the low-discrepancy stochastic sequences can be quickly reassembled by a select-and-output loop using kernel segments rather than slowly regenerated by costly SNGs.Since TR can provide an ideal in-memory acceleration in one-counting, counter-free SC-MACs are designed and deployed near RMs to form a power-efficient neuron-architecture,in which,the binary results of TR are activated straightforward without sluggish APCs.The results show that under the TR aided RM model,the power efficiency,speed,and stochastic accuracy of Seed-based Fast Stochastic Computing significantly enhance the performance of DNNs.The speed of computation is 2.88x faster in Lenet-5 and 4.40x faster in VGG-19 compared to the CORUSCANT model.The integration of TR with RTM is deployed near the memory to create a power-efficient neuron architecture, eliminating the need for slow Accumulative Parallel Counters (APCs) and improving access speed to stochastic sequences.
△ Less
Submitted 10 July, 2024;
originally announced July 2024.
-
Video-to-Audio Generation with Hidden Alignment
Authors:
Manjie Xu,
Chenxing Li,
Yong Ren,
Rilin Chen,
Yu Gu,
Wei Liang,
Dong Yu
Abstract:
Generating semantically and temporally aligned audio content in accordance with video input has become a focal point for researchers, particularly following the remarkable breakthrough in text-to-video generation. In this work, we aim to offer insights into the video-to-audio generation paradigm, focusing on three crucial aspects: vision encoders, auxiliary embeddings, and data augmentation techni…
▽ More
Generating semantically and temporally aligned audio content in accordance with video input has become a focal point for researchers, particularly following the remarkable breakthrough in text-to-video generation. In this work, we aim to offer insights into the video-to-audio generation paradigm, focusing on three crucial aspects: vision encoders, auxiliary embeddings, and data augmentation techniques. Beginning with a foundational model VTA-LDM built on a simple yet surprisingly effective intuition, we explore various vision encoders and auxiliary embeddings through ablation studies. Employing a comprehensive evaluation pipeline that emphasizes generation quality and video-audio synchronization alignment, we demonstrate that our model exhibits state-of-the-art video-to-audio generation capabilities. Furthermore, we provide critical insights into the impact of different data augmentation methods on enhancing the generation framework's overall capacity. We showcase possibilities to advance the challenge of generating synchronized audio from semantic and temporal perspectives. We hope these insights will serve as a step** stone toward develo** more realistic and accurate audio-visual generation models.
△ Less
Submitted 10 July, 2024;
originally announced July 2024.
-
Drantal-NeRF: Diffusion-Based Restoration for Anti-aliasing Neural Radiance Field
Authors:
Ganlin Yang,
Kaidong Zhang,
**g**g Fu,
Dong Liu
Abstract:
Aliasing artifacts in renderings produced by Neural Radiance Field (NeRF) is a long-standing but complex issue in the field of 3D implicit representation, which arises from a multitude of intricate causes and was mitigated by designing more advanced but complex scene parameterization methods before. In this paper, we present a Diffusion-based restoration method for anti-aliasing Neural Radiance Fi…
▽ More
Aliasing artifacts in renderings produced by Neural Radiance Field (NeRF) is a long-standing but complex issue in the field of 3D implicit representation, which arises from a multitude of intricate causes and was mitigated by designing more advanced but complex scene parameterization methods before. In this paper, we present a Diffusion-based restoration method for anti-aliasing Neural Radiance Field (Drantal-NeRF). We consider the anti-aliasing issue from a low-level restoration perspective by viewing aliasing artifacts as a kind of degradation model added to clean ground truths. By leveraging the powerful prior knowledge encapsulated in diffusion model, we could restore the high-realism anti-aliasing renderings conditioned on aliased low-quality counterparts. We further employ a feature-wrap** operation to ensure multi-view restoration consistency and finetune the VAE decoder to better adapt to the scene-specific data distribution. Our proposed method is easy to implement and agnostic to various NeRF backbones. We conduct extensive experiments on challenging large-scale urban scenes as well as unbounded 360-degree scenes and achieve substantial qualitative and quantitative improvements.
△ Less
Submitted 10 July, 2024;
originally announced July 2024.
-
Analytical Insights on Hadronic Top Quark Polarimetry
Authors:
Zhongtian Dong,
Dorival Gonçalves,
Kyoungchul Kong,
Andrew J. Larkoski,
Alberto Navarro
Abstract:
Top quark polarization provides an important tool for studying its production mechanisms, spin correlations, top quark properties, and new physics searches. Unlike lighter quarks, the top quark's polarization remains intact until its decay, enabling precise spin measurements. While the down-type fermions from $W$ boson decay are known to be effective spin analyzers, charged leptons have typically…
▽ More
Top quark polarization provides an important tool for studying its production mechanisms, spin correlations, top quark properties, and new physics searches. Unlike lighter quarks, the top quark's polarization remains intact until its decay, enabling precise spin measurements. While the down-type fermions from $W$ boson decay are known to be effective spin analyzers, charged leptons have typically been the main target for most analyses. In this paper, we investigate the relevance of global jet dynamics -- considering kinematics, jet charges, and particle multiplicity -- for hadronic top quark polarimetry. The formalism used allows for analytical derivations obtained throughout the manuscript, offering deeper insights into the corresponding phenomenology.
△ Less
Submitted 9 July, 2024;
originally announced July 2024.
-
RodinHD: High-Fidelity 3D Avatar Generation with Diffusion Models
Authors:
Bowen Zhang,
Yiji Cheng,
Chunyu Wang,
Ting Zhang,
Jiaolong Yang,
Yansong Tang,
Feng Zhao,
Dong Chen,
Baining Guo
Abstract:
We present RodinHD, which can generate high-fidelity 3D avatars from a portrait image. Existing methods fail to capture intricate details such as hairstyles which we tackle in this paper. We first identify an overlooked problem of catastrophic forgetting that arises when fitting triplanes sequentially on many avatars, caused by the MLP decoder sharing scheme. To overcome this issue, we raise a nov…
▽ More
We present RodinHD, which can generate high-fidelity 3D avatars from a portrait image. Existing methods fail to capture intricate details such as hairstyles which we tackle in this paper. We first identify an overlooked problem of catastrophic forgetting that arises when fitting triplanes sequentially on many avatars, caused by the MLP decoder sharing scheme. To overcome this issue, we raise a novel data scheduling strategy and a weight consolidation regularization term, which improves the decoder's capability of rendering sharper details. Additionally, we optimize the guiding effect of the portrait image by computing a finer-grained hierarchical representation that captures rich 2D texture cues, and injecting them to the 3D diffusion model at multiple layers via cross-attention. When trained on 46K avatars with a noise schedule optimized for triplanes, the resulting model can generate 3D avatars with notably better details than previous methods and can generalize to in-the-wild portrait input.
△ Less
Submitted 10 July, 2024; v1 submitted 9 July, 2024;
originally announced July 2024.
-
RespEar: Earable-Based Robust Respiratory Rate Monitoring
Authors:
Yang Liu,
Kayla-Jade Butkow,
Jake Stuchbury-Wass,
Adam Pullin,
Dong Ma,
Cecilia Mascolo
Abstract:
Respiratory rate (RR) monitoring is integral to understanding physical and mental health and tracking fitness. Existing studies have demonstrated the feasibility of RR monitoring under specific user conditions (e.g., while remaining still, or while breathing heavily). Yet, performing accurate, continuous and non-obtrusive RR monitoring across diverse daily routines and activities remains challengi…
▽ More
Respiratory rate (RR) monitoring is integral to understanding physical and mental health and tracking fitness. Existing studies have demonstrated the feasibility of RR monitoring under specific user conditions (e.g., while remaining still, or while breathing heavily). Yet, performing accurate, continuous and non-obtrusive RR monitoring across diverse daily routines and activities remains challenging. In this work, we present RespEar, an earable-based system for robust RR monitoring. By leveraging the unique properties of in-ear microphones in earbuds, RespEar enables the use of Respiratory Sinus Arrhythmia (RSA) and Locomotor Respiratory Coupling (LRC), physiological couplings between cardiovascular activity, gait and respiration, to indirectly determine RR. This effectively addresses the challenges posed by the almost imperceptible breathing signals under daily activities. We further propose a suite of meticulously crafted signal processing schemes to improve RR estimation accuracy and robustness. With data collected from 18 subjects over 8 activities, RespEar measures RR with a mean absolute error (MAE) of 1.48 breaths per minutes (BPM) and a mean absolute percent error (MAPE) of 9.12% in sedentary conditions, and a MAE of 2.28 BPM and a MAPE of 11.04% in active conditions, respectively, which is unprecedented for a method capable of generalizing across conditions with a single modality.
△ Less
Submitted 9 July, 2024;
originally announced July 2024.
-
Threats and Defenses in Federated Learning Life Cycle: A Comprehensive Survey and Challenges
Authors:
Yanli Li,
Zhongliang Guo,
Nan Yang,
Huaming Chen,
Dong Yuan,
Wei** Ding
Abstract:
Federated Learning (FL) offers innovative solutions for privacy-preserving collaborative machine learning (ML). Despite its promising potential, FL is vulnerable to various attacks due to its distributed nature, affecting the entire life cycle of FL services. These threats can harm the model's utility or compromise participants' privacy, either directly or indirectly. In response, numerous defense…
▽ More
Federated Learning (FL) offers innovative solutions for privacy-preserving collaborative machine learning (ML). Despite its promising potential, FL is vulnerable to various attacks due to its distributed nature, affecting the entire life cycle of FL services. These threats can harm the model's utility or compromise participants' privacy, either directly or indirectly. In response, numerous defense frameworks have been proposed, demonstrating effectiveness in specific settings and scenarios. To provide a clear understanding of the current research landscape, this paper reviews the most representative and state-of-the-art threats and defense frameworks throughout the FL service life cycle. We start by identifying FL threats that harm utility and privacy, including those with potential or direct impacts. Then, we dive into the defense frameworks, analyze the relationship between threats and defenses, and compare the trade-offs among different defense strategies. Finally, we summarize current research bottlenecks and offer insights into future research directions to conclude this survey. We hope this survey sheds light on trustworthy FL research and contributes to the FL community.
△ Less
Submitted 11 July, 2024; v1 submitted 9 July, 2024;
originally announced July 2024.
-
PDEformer-1: A Foundation Model for One-Dimensional Partial Differential Equations
Authors:
Zhanhong Ye,
Xiang Huang,
Leheng Chen,
Zining Liu,
Bingyang Wu,
Hongsheng Liu,
Zidong Wang,
Bin Dong
Abstract:
This paper introduces PDEformer-1, a versatile neural solver capable of simultaneously addressing various partial differential equations (PDEs). With the PDE represented as a computational graph, we facilitate the seamless integration of symbolic and numeric information inherent in a PDE. A graph Transformer and an implicit neural representation (INR) are employed subsequently to generate mesh-fre…
▽ More
This paper introduces PDEformer-1, a versatile neural solver capable of simultaneously addressing various partial differential equations (PDEs). With the PDE represented as a computational graph, we facilitate the seamless integration of symbolic and numeric information inherent in a PDE. A graph Transformer and an implicit neural representation (INR) are employed subsequently to generate mesh-free predicted solutions. We generated a dataset with up to three million samples involving diverse one-dimensional PDEs to pretrain our model. Compared with baseline models trained specifically on benchmark datasets, our pretrained model achieves comparable accuracy via zero-shot inference, and the advantage expands after finetuning. For PDEs new or unseen in the pretraining stage, our model can adapt quickly by finetuning on a relatively small set of examples from the target equation. Additionally, PDEformer-1 demonstrates promising results in the inverse problem of PDE scalar coefficient recovery and coefficient field recovery.
△ Less
Submitted 9 July, 2024;
originally announced July 2024.
-
On the equivalence problem of Smith forms for multivariate polynomial matrices
Authors:
Dong Lu,
Dingkang Wang,
Fanghui Xiao,
Xiaopeng Zheng
Abstract:
This paper delves into the equivalence problem of Smith forms for multivariate polynomial matrices. Generally speaking, multivariate ($n \geq 2$) polynomial matrices and their Smith forms may not be equivalent. However, under certain specific condition, we derive the necessary and sufficient condition for their equivalence. Let $F\in K[x_1,\ldots,x_n]^{l\times m}$ be of rank $r$,…
▽ More
This paper delves into the equivalence problem of Smith forms for multivariate polynomial matrices. Generally speaking, multivariate ($n \geq 2$) polynomial matrices and their Smith forms may not be equivalent. However, under certain specific condition, we derive the necessary and sufficient condition for their equivalence. Let $F\in K[x_1,\ldots,x_n]^{l\times m}$ be of rank $r$, $d_r(F)\in K[x_1]$ be the greatest common divisor of all the $r\times r$ minors of $F$, where $K$ is a field, $x_1,\ldots,x_n$ are variables and $1 \leq r \leq \min\{l,m\}$. Our key findings reveal the result: $F$ is equivalent to its Smith form if and only if all the $i\times i$ reduced minors of $F$ generate $K[x_1,\ldots,x_n]$ for $i=1,\ldots,r$.
△ Less
Submitted 9 July, 2024;
originally announced July 2024.
-
Parallel Segment Entanglement Swap**
Authors:
Binjie He,
Seng W. Loke,
Dong Zhang
Abstract:
In the noisy intermediate-scale quantum era, scientists are trying to improve the entanglement swap** success rate by researching anti-noise technology on the physical level, thereby obtaining a higher generation rate of long-distance entanglement. However, we may improve the generation rate from another perspective, which is studying an efficient entanglement swap** strategy. This paper analy…
▽ More
In the noisy intermediate-scale quantum era, scientists are trying to improve the entanglement swap** success rate by researching anti-noise technology on the physical level, thereby obtaining a higher generation rate of long-distance entanglement. However, we may improve the generation rate from another perspective, which is studying an efficient entanglement swap** strategy. This paper analyzes the challenges faced by existing entanglement swap** strategies, including the node allocation principle, time synchronization, and processing of entanglement swap** failure. We present Parallel Segment Entanglement Swap** (PSES) to solve these problems. The core idea of PSES is to segment the path and perform parallel entanglement swap** between segments to improve the generation rate of long-distance entanglement. We construct a tree-like model as the carrier of PSES and propose heuristic algorithms called Layer Greedy and Segment Greedy to transform the path into a tree-like model. Moreover, we realize the time synchronization and design the on-demand retransmission mechanism to process entanglement swap** failure. The experiments show that PSES performs superiorly to other entanglement swap** strategies, and the on-demand retransmission mechanism can reduce the average entanglement swap** time by 80% and the average entanglement consumption by 80%.
△ Less
Submitted 9 July, 2024;
originally announced July 2024.
-
FinCon: A Synthesized LLM Multi-Agent System with Conceptual Verbal Reinforcement for Enhanced Financial Decision Making
Authors:
Yangyang Yu,
Zhiyuan Yao,
Haohang Li,
Zhiyang Deng,
Yupeng Cao,
Zhi Chen,
Jordan W. Suchow,
Rong Liu,
Zhenyu Cui,
Denghui Zhang,
Koduvayur Subbalakshmi,
Guojun Xiong,
Yueru He,
Jimin Huang,
Dong Li,
Qianqian Xie
Abstract:
Large language models (LLMs) have demonstrated notable potential in conducting complex tasks and are increasingly utilized in various financial applications. However, high-quality sequential financial investment decision-making remains challenging. These tasks require multiple interactions with a volatile environment for every decision, demanding sufficient intelligence to maximize returns and man…
▽ More
Large language models (LLMs) have demonstrated notable potential in conducting complex tasks and are increasingly utilized in various financial applications. However, high-quality sequential financial investment decision-making remains challenging. These tasks require multiple interactions with a volatile environment for every decision, demanding sufficient intelligence to maximize returns and manage risks. Although LLMs have been used to develop agent systems that surpass human teams and yield impressive investment returns, opportunities to enhance multi-sourced information synthesis and optimize decision-making outcomes through timely experience refinement remain unexplored. Here, we introduce the FinCon, an LLM-based multi-agent framework with CONceptual verbal reinforcement tailored for diverse FINancial tasks. Inspired by effective real-world investment firm organizational structures, FinCon utilizes a manager-analyst communication hierarchy. This structure allows for synchronized cross-functional agent collaboration towards unified goals through natural language interactions and equips each agent with greater memory capacity than humans. Additionally, a risk-control component in FinCon enhances decision quality by episodically initiating a self-critiquing mechanism to update systematic investment beliefs. The conceptualized beliefs serve as verbal reinforcement for the future agent's behavior and can be selectively propagated to the appropriate node that requires knowledge updates. This feature significantly improves performance while reducing unnecessary peer-to-peer communication costs. Moreover, FinCon demonstrates strong generalization capabilities in various financial tasks, including single stock trading and portfolio management.
△ Less
Submitted 10 July, 2024; v1 submitted 9 July, 2024;
originally announced July 2024.
-
TCKIN: A Novel Integrated Network Model for Predicting Mortality Risk in Sepsis Patients
Authors:
Fanglin Dong
Abstract:
Sepsis poses a major global health threat, accounting for millions of deaths annually and significant economic costs. Accurate predictions of mortality risk in sepsis patients facilitate the efficient allocation of medical resources, thereby enhancing patient survival and quality of life. Through precise risk assessments, healthcare facilities can effectively distribute intensive care beds, medica…
▽ More
Sepsis poses a major global health threat, accounting for millions of deaths annually and significant economic costs. Accurate predictions of mortality risk in sepsis patients facilitate the efficient allocation of medical resources, thereby enhancing patient survival and quality of life. Through precise risk assessments, healthcare facilities can effectively distribute intensive care beds, medical equipment, and staff, ensuring high-risk patients receive timely and appropriate care. Early identification and intervention significantly decrease mortality rates and improve patient outcomes. Current methods typically utilize only one type of data--either constant, temporal, or ICD codes. This study introduces the Time-Constant KAN Integrated Network(TCKIN), an innovative model that enhances the accuracy of sepsis mortality risk predictions by integrating both temporal and constant data from electronic health records and ICD codes. Validated against the MIMIC-III and MIMIC-IV datasets, TCKIN surpasses existing machine learning and deep learning methods in accuracy, sensitivity, and specificity. Notably, TCKIN achieved AUCs of 87.76% and 88.07%, demonstrating superior capability in identifying high-risk patients. Additionally, TCKIN effectively combats the prevalent issue of data imbalance in clinical settings, improving the detection of patients at elevated risk of mortality and facilitating timely interventions. These results confirm the model's effectiveness and its potential to transform patient management and treatment optimization in clinical practice. With this advanced risk assessment tool, healthcare providers can devise more tailored treatment plans, optimize resource utilization, and ultimately enhance survival rates and quality of life for sepsis patients.
△ Less
Submitted 9 July, 2024;
originally announced July 2024.
-
Decomposition Betters Tracking Everything Everywhere
Authors:
Rui Li,
Dong Liu
Abstract:
Recent studies on motion estimation have advocated an optimized motion representation that is globally consistent across the entire video, preferably for every pixel. This is challenging as a uniform representation may not account for the complex and diverse motion and appearance of natural videos. We address this problem and propose a new test-time optimization method, named DecoMotion, for estim…
▽ More
Recent studies on motion estimation have advocated an optimized motion representation that is globally consistent across the entire video, preferably for every pixel. This is challenging as a uniform representation may not account for the complex and diverse motion and appearance of natural videos. We address this problem and propose a new test-time optimization method, named DecoMotion, for estimating per-pixel and long-range motion. DecoMotion explicitly decomposes video content into static scenes and dynamic objects, either of which uses a quasi-3D canonical volume to represent. DecoMotion separately coordinates the transformations between local and canonical spaces, facilitating an affine transformation for the static scene that corresponds to camera motion. For the dynamic volume, DecoMotion leverages discriminative and temporally consistent features to rectify the non-rigid transformation. The two volumes are finally fused to fully represent motion and appearance. This divide-and-conquer strategy leads to more robust tracking through occlusions and deformations and meanwhile obtains decomposed appearances. We conduct evaluations on the TAP-Vid benchmark. The results demonstrate our method boosts the point-tracking accuracy by a large margin and performs on par with some state-of-the-art dedicated point-tracking solutions.
△ Less
Submitted 9 July, 2024;
originally announced July 2024.
-
TARGO: Benchmarking Target-driven Object Gras** under Occlusions
Authors:
Yan Xia,
Ran Ding,
Ziyuan Qin,
Guanqi Zhan,
Kaichen Zhou,
Long Yang,
Hao Dong,
Daniel Cremers
Abstract:
Recent advances in predicting 6D grasp poses from a single depth image have led to promising performance in robotic gras**. However, previous gras** models face challenges in cluttered environments where nearby objects impact the target object's grasp. In this paper, we first establish a new benchmark dataset for TARget-driven Gras** under Occlusions, named TARGO. We make the following contr…
▽ More
Recent advances in predicting 6D grasp poses from a single depth image have led to promising performance in robotic gras**. However, previous gras** models face challenges in cluttered environments where nearby objects impact the target object's grasp. In this paper, we first establish a new benchmark dataset for TARget-driven Gras** under Occlusions, named TARGO. We make the following contributions: 1) We are the first to study the occlusion level of gras**. 2) We set up an evaluation benchmark consisting of large-scale synthetic data and part of real-world data, and we evaluated five grasp models and found that even the current SOTA model suffers when the occlusion level increases, leaving gras** under occlusion still a challenge. 3) We also generate a large-scale training dataset via a scalable pipeline, which can be used to boost the performance of gras** under occlusion and generalized to the real world. 4) We further propose a transformer-based gras** model involving a shape completion module, termed TARGO-Net, which performs most robustly as occlusion increases. Our benchmark dataset can be found at https://TARGO-benchmark.github.io/.
△ Less
Submitted 8 July, 2024;
originally announced July 2024.
-
Sketchy Moment Matching: Toward Fast and Provable Data Selection for Finetuning
Authors:
Yijun Dong,
Hoang Phan,
Xiang Pan,
Qi Lei
Abstract:
We revisit data selection in a modern context of finetuning from a fundamental perspective. Extending the classical wisdom of variance minimization in low dimensions to high-dimensional finetuning, our generalization analysis unveils the importance of additionally reducing bias induced by low-rank approximation. Inspired by the variance-bias tradeoff in high dimensions from the theory, we introduc…
▽ More
We revisit data selection in a modern context of finetuning from a fundamental perspective. Extending the classical wisdom of variance minimization in low dimensions to high-dimensional finetuning, our generalization analysis unveils the importance of additionally reducing bias induced by low-rank approximation. Inspired by the variance-bias tradeoff in high dimensions from the theory, we introduce Sketchy Moment Matching (SkMM), a scalable data selection scheme with two stages. (i) First, the bias is controlled using gradient sketching that explores the finetuning parameter space for an informative low-dimensional subspace $\mathcal{S}$; (ii) then the variance is reduced over $\mathcal{S}$ via moment matching between the original and selected datasets. Theoretically, we show that gradient sketching is fast and provably accurate: selecting $n$ samples by reducing variance over $\mathcal{S}$ preserves the fast-rate generalization $O(\dim(\mathcal{S})/n)$, independent of the parameter dimension. Empirically, we concretize the variance-bias balance via synthetic experiments and demonstrate the effectiveness of SkMM for finetuning in real vision tasks.
△ Less
Submitted 8 July, 2024;
originally announced July 2024.
-
Few-Shot Keyword Spotting from Mixed Speech
Authors:
Junming Yuan,
Ying Shi,
LanTian Li,
Dong Wang,
Askar Hamdulla
Abstract:
Few-shot keyword spotting (KWS) aims to detect unknown keywords with limited training samples. A commonly used approach is the pre-training and fine-tuning framework. While effective in clean conditions, this approach struggles with mixed keyword spotting -- simultaneously detecting multiple keywords blended in an utterance, which is crucial in real-world applications. Previous research has propos…
▽ More
Few-shot keyword spotting (KWS) aims to detect unknown keywords with limited training samples. A commonly used approach is the pre-training and fine-tuning framework. While effective in clean conditions, this approach struggles with mixed keyword spotting -- simultaneously detecting multiple keywords blended in an utterance, which is crucial in real-world applications. Previous research has proposed a Mix-Training (MT) approach to solve the problem, however, it has never been tested in the few-shot scenario. In this paper, we investigate the possibility of using MT and other relevant methods to solve the two practical challenges together: few-shot and mixed speech. Experiments conducted on the LibriSpeech and Google Speech Command corpora demonstrate that MT is highly effective on this task when employed in either the pre-training phase or the fine-tuning phase. Moreover, combining SSL-based large-scale pre-training (HuBert) and MT fine-tuning yields very strong results in all the test conditions.
△ Less
Submitted 4 July, 2024;
originally announced July 2024.
-
PAS: Data-Efficient Plug-and-Play Prompt Augmentation System
Authors:
Miao Zheng,
Hao Liang,
Fan Yang,
Haoze Sun,
Tianpeng Li,
Lingchu Xiong,
Yan Zhang,
Youzhen Wu,
Kun Li,
Yanjun Shen,
Mingan Lin,
Tao Zhang,
Guosheng Dong,
Yu**g Qiao,
Kun Fang,
Weipeng Chen,
Bin Cui,
Wentao Zhang,
Zenan Zhou
Abstract:
In recent years, the rise of Large Language Models (LLMs) has spurred a growing demand for plug-and-play AI systems. Among the various AI techniques, prompt engineering stands out as particularly significant. However, users often face challenges in writing prompts due to the steep learning curve and significant time investment, and existing automatic prompt engineering (APE) models can be difficul…
▽ More
In recent years, the rise of Large Language Models (LLMs) has spurred a growing demand for plug-and-play AI systems. Among the various AI techniques, prompt engineering stands out as particularly significant. However, users often face challenges in writing prompts due to the steep learning curve and significant time investment, and existing automatic prompt engineering (APE) models can be difficult to use. To address this issue, we propose PAS, an LLM-based plug-and-play APE system. PAS utilizes LLMs trained on high-quality, automatically generated prompt complementary datasets, resulting in exceptional performance. In comprehensive benchmarks, PAS achieves state-of-the-art (SoTA) results compared to previous APE models, with an average improvement of 6.09 points. Moreover, PAS is highly efficient, achieving SoTA performance with only 9000 data points. Additionally, PAS can autonomously generate prompt augmentation data without requiring additional human labor. Its flexibility also allows it to be compatible with all existing LLMs and applicable to a wide range of tasks. PAS excels in human evaluations, underscoring its suitability as a plug-in for users. This combination of high performance, efficiency, and flexibility makes PAS a valuable system for enhancing the usability and effectiveness of LLMs through improved prompt engineering.
△ Less
Submitted 12 July, 2024; v1 submitted 8 July, 2024;
originally announced July 2024.
-
T2VSafetyBench: Evaluating the Safety of Text-to-Video Generative Models
Authors:
Yibo Miao,
Yifan Zhu,
Yinpeng Dong,
Lijia Yu,
Jun Zhu,
Xiao-Shan Gao
Abstract:
The recent development of Sora leads to a new era in text-to-video (T2V) generation. Along with this comes the rising concern about its security risks. The generated videos may contain illegal or unethical content, and there is a lack of comprehensive quantitative understanding of their safety, posing a challenge to their reliability and practical deployment. Previous evaluations primarily focus o…
▽ More
The recent development of Sora leads to a new era in text-to-video (T2V) generation. Along with this comes the rising concern about its security risks. The generated videos may contain illegal or unethical content, and there is a lack of comprehensive quantitative understanding of their safety, posing a challenge to their reliability and practical deployment. Previous evaluations primarily focus on the quality of video generation. While some evaluations of text-to-image models have considered safety, they cover fewer aspects and do not address the unique temporal risk inherent in video generation. To bridge this research gap, we introduce T2VSafetyBench, a new benchmark designed for conducting safety-critical assessments of text-to-video models. We define 12 critical aspects of video generation safety and construct a malicious prompt dataset using LLMs and jailbreaking prompt attacks. Based on our evaluation results, we draw several important findings, including: 1) no single model excels in all aspects, with different models showing various strengths; 2) the correlation between GPT-4 assessments and manual reviews is generally high; 3) there is a trade-off between the usability and safety of text-to-video generative models. This indicates that as the field of video generation rapidly advances, safety risks are set to surge, highlighting the urgency of prioritizing video safety. We hope that T2VSafetyBench can provide insights for better understanding the safety of video generation in the era of generative AI.
△ Less
Submitted 8 July, 2024;
originally announced July 2024.
-
Graph Anomaly Detection with Noisy Labels by Reinforcement Learning
Authors:
Zhu Wang,
Shuang Zhou,
Junnan Dong,
Chang Yang,
Xiao Huang,
Shengjie Zhao
Abstract:
Graph anomaly detection (GAD) has been widely applied in many areas, e.g., fraud detection in finance and robot accounts in social networks. Existing methods are dedicated to identifying the outlier nodes that deviate from normal ones. While they heavily rely on high-quality annotation, which is hard to obtain in real-world scenarios, this could lead to severely degraded performance based on noisy…
▽ More
Graph anomaly detection (GAD) has been widely applied in many areas, e.g., fraud detection in finance and robot accounts in social networks. Existing methods are dedicated to identifying the outlier nodes that deviate from normal ones. While they heavily rely on high-quality annotation, which is hard to obtain in real-world scenarios, this could lead to severely degraded performance based on noisy labels. Thus, we are motivated to cut the edges of suspicious nodes to alleviate the impact of noise. However, it remains difficult to precisely identify the nodes with noisy labels. Moreover, it is hard to quantitatively evaluate the regret of cutting the edges, which may have either positive or negative influences. To this end, we propose a novel framework REGAD, i.e., REinforced Graph Anomaly Detector. Specifically, we aim to maximize the performance improvement (AUC) of a base detector by cutting noisy edges approximated through the nodes with high-confidence labels. (i) We design a tailored action and search space to train a policy network to carefully prune edges step by step, where only a few suspicious edges are prioritized in each step. (ii) We design a policy-in-the-loop mechanism to iteratively optimize the policy based on the feedback from base detector. The overall performance is evaluated by the cumulative rewards. Extensive experiments are conducted on three datasets under different anomaly ratios. The results indicate the superior performance of our proposed REGAD.
△ Less
Submitted 8 July, 2024;
originally announced July 2024.
-
When is the consistent prediction likely to be a correct prediction?
Authors:
Alex Nguyen,
Dheeraj Mekala,
Chengyu Dong,
**gbo Shang
Abstract:
Self-consistency (Wang et al., 2023) suggests that the most consistent answer obtained through large language models (LLMs) is more likely to be correct. In this paper, we challenge this argument and propose a nuanced correction. Our observations indicate that consistent answers derived through more computation i.e. longer reasoning texts, rather than simply the most consistent answer across all o…
▽ More
Self-consistency (Wang et al., 2023) suggests that the most consistent answer obtained through large language models (LLMs) is more likely to be correct. In this paper, we challenge this argument and propose a nuanced correction. Our observations indicate that consistent answers derived through more computation i.e. longer reasoning texts, rather than simply the most consistent answer across all outputs, are more likely to be correct. This is predominantly because we demonstrate that LLMs can autonomously produce chain-of-thought (CoT) style reasoning with no custom prompts merely while generating longer responses, which lead to consistent predictions that are more accurate. In the zero-shot setting, by sampling Mixtral-8x7B model multiple times and considering longer responses, we achieve 86% of its self-consistency performance obtained through zero-shot CoT prompting on the GSM8K and MultiArith datasets. Finally, we demonstrate that the probability of LLMs generating a longer response is quite low, highlighting the need for decoding strategies conditioned on output length.
△ Less
Submitted 8 July, 2024;
originally announced July 2024.
-
Enlarging Feature Support Overlap for Domain Generalization
Authors:
Yaoyao Zhu,
Xiuding Cai,
Dong Miao,
Yu Yao,
Zhongliang Fu
Abstract:
Deep models often struggle with out-of-distribution (OOD) generalization, limiting their real-world applicability beyond controlled laboratory settings. Invariant risk minimization (IRM) addresses this issue by learning invariant features and minimizing the risk across different domains. Thus, it avoids the pitfalls of pseudo-invariant features and spurious causality associated with empirical risk…
▽ More
Deep models often struggle with out-of-distribution (OOD) generalization, limiting their real-world applicability beyond controlled laboratory settings. Invariant risk minimization (IRM) addresses this issue by learning invariant features and minimizing the risk across different domains. Thus, it avoids the pitfalls of pseudo-invariant features and spurious causality associated with empirical risk minimization (ERM). However, according to the support overlap theorem, ERM and IRM may fail to address the OOD problem when pseudo-invariant features have insufficient support overlap. To this end, we propose a novel method to enlarge feature support overlap for domain generalization. Specifically, we introduce Bayesian random semantic data augmentation to increase sample diversity and overcome the deficiency of IRM. Experiments on several challenging OOD generalization benchmarks demonstrate that our approach surpasses existing models, delivering superior performance and robustness. The code is available at \url{https://github.com/YaoyaoZhu19/BSDG}.
△ Less
Submitted 8 July, 2024;
originally announced July 2024.
-
Potential of Multimodal Large Language Models for Data Mining of Medical Images and Free-text Reports
Authors:
Yutong Zhang,
Yi Pan,
Tianyang Zhong,
Peixin Dong,
Kangni Xie,
Yuxiao Liu,
Hanqi Jiang,
Zhengliang Liu,
Shijie Zhao,
Tuo Zhang,
Xi Jiang,
Dinggang Shen,
Tianming Liu,
Xin Zhang
Abstract:
Medical images and radiology reports are crucial for diagnosing medical conditions, highlighting the importance of quantitative analysis for clinical decision-making. However, the diversity and cross-source heterogeneity of these data challenge the generalizability of current data-mining methods. Multimodal large language models (MLLMs) have recently transformed many domains, significantly affecti…
▽ More
Medical images and radiology reports are crucial for diagnosing medical conditions, highlighting the importance of quantitative analysis for clinical decision-making. However, the diversity and cross-source heterogeneity of these data challenge the generalizability of current data-mining methods. Multimodal large language models (MLLMs) have recently transformed many domains, significantly affecting the medical field. Notably, Gemini-Vision-series (Gemini) and GPT-4-series (GPT-4) models have epitomized a paradigm shift in Artificial General Intelligence (AGI) for computer vision, showcasing their potential in the biomedical domain. In this study, we evaluated the performance of the Gemini, GPT-4, and 4 popular large models for an exhaustive evaluation across 14 medical imaging datasets, including 5 medical imaging categories (dermatology, radiology, dentistry, ophthalmology, and endoscopy), and 3 radiology report datasets. The investigated tasks encompass disease classification, lesion segmentation, anatomical localization, disease diagnosis, report generation, and lesion detection. Our experimental results demonstrated that Gemini-series models excelled in report generation and lesion detection but faces challenges in disease classification and anatomical localization. Conversely, GPT-series models exhibited proficiency in lesion segmentation and anatomical localization but encountered difficulties in disease diagnosis and lesion detection. Additionally, both the Gemini series and GPT series contain models that have demonstrated commendable generation efficiency. While both models hold promise in reducing physician workload, alleviating pressure on limited healthcare resources, and fostering collaboration between clinical practitioners and artificial intelligence technologies, substantial enhancements and comprehensive validations remain imperative before clinical deployment.
△ Less
Submitted 8 July, 2024;
originally announced July 2024.
-
Large Language Models Understand Layouts
Authors:
Weiming Li,
Manni Duan,
Dong An,
Yan Shao
Abstract:
Large language models (LLMs) demonstrate extraordinary abilities in a wide range of natural language processing (NLP) tasks. In this paper, we show that, beyond text understanding capability, LLMs are capable of processing text layouts that are denoted by spatial markers. They are able to answer questions that require explicit spatial perceiving and reasoning, while a drastic performance drop is o…
▽ More
Large language models (LLMs) demonstrate extraordinary abilities in a wide range of natural language processing (NLP) tasks. In this paper, we show that, beyond text understanding capability, LLMs are capable of processing text layouts that are denoted by spatial markers. They are able to answer questions that require explicit spatial perceiving and reasoning, while a drastic performance drop is observed when the spatial markers from the original data are excluded. We perform a series of experiments with the GPT-3.5, Baichuan2, Llama2 and ChatGLM3 models on various types of layout-sensitive datasets for further analysis. The experimental results reveal that the layout understanding ability of LLMs is mainly introduced by the coding data for pretraining, which is further enhanced at the instruction-tuning stage. In addition, layout understanding can be enhanced by integrating low-cost, auto-generated data approached by a novel text game. Finally, we show that layout understanding ability is beneficial for building efficient visual question-answering (VQA) systems.
△ Less
Submitted 8 July, 2024;
originally announced July 2024.
-
Transition magnetic moment about neutrinos
Authors:
Long Ruan,
Shu-Min Zhao,
Ming-Yue Liu,
Xing-Yu Han,
Xi Wang,
Xing-Xing Dong
Abstract:
This paper investigates the neutrino transition magnetic moment in the $U(1)_X$SSM. $U(1)_X$SSM is the $U(1)$ extension of Minimal Supersymmetric Standard Model (MSSM) and its local gauge group is extended to $SU(3)_C\times SU(2)_L \times U(1)_Y\times U(1)_X$. To obtain this model, three singlet new Higgs superfields and right-handed neutrinos are added to the MSSM, which can explain the results o…
▽ More
This paper investigates the neutrino transition magnetic moment in the $U(1)_X$SSM. $U(1)_X$SSM is the $U(1)$ extension of Minimal Supersymmetric Standard Model (MSSM) and its local gauge group is extended to $SU(3)_C\times SU(2)_L \times U(1)_Y\times U(1)_X$. To obtain this model, three singlet new Higgs superfields and right-handed neutrinos are added to the MSSM, which can explain the results of neutrino oscillation experiments. The neutrino transition magnetic moment is induced by electroweak radiative corrections. By applying effective Lagrangian method and on-shell scheme, we study the associated Feynman diagrams and the transition magnetic moment of neutrinos in the model. We fit experimental data for neutrino mass variances and mixing angle. Based on the range of data selection, the influences of different sensitive parameters on the results are analysed. The numerical analysis shows that many parameters have an effect on the neutrino transition moment, such as $g_X$, $M_2$, $λ_H$ and $g_{YX}$. For our numerical results, the order of magnitude of $μ_{ij}$ is between $10^{-20}$ and $10^{-19}$.
△ Less
Submitted 8 July, 2024;
originally announced July 2024.
-
PsycoLLM: Enhancing LLM for Psychological Understanding and Evaluation
Authors:
**peng Hu,
Tengteng Dong,
Hui Ma,
Peng Zou,
Xiao Sun,
Meng Wang
Abstract:
Mental health has attracted substantial attention in recent years and LLM can be an effective technology for alleviating this problem owing to its capability in text understanding and dialogue. However, existing research in this domain often suffers from limitations, such as training on datasets lacking crucial prior knowledge and evidence, and the absence of comprehensive evaluation methods. In t…
▽ More
Mental health has attracted substantial attention in recent years and LLM can be an effective technology for alleviating this problem owing to its capability in text understanding and dialogue. However, existing research in this domain often suffers from limitations, such as training on datasets lacking crucial prior knowledge and evidence, and the absence of comprehensive evaluation methods. In this paper, we propose a specialized psychological large language model (LLM), named PsycoLLM, trained on a proposed high-quality psychological dataset, including single-turn QA, multi-turn dialogues enriched with prior knowledge and knowledge-based QA. Additionally, to compare the performance of PsycoLLM with other LLMs, we develop a comprehensive psychological benchmark based on authoritative psychological counseling examinations in China, which includes assessments of professional ethics, theoretical proficiency, and case analysis. The experimental results on the benchmark illustrates the effectiveness of PsycoLLM, which demonstrates superior performance compared to other LLMs.
△ Less
Submitted 8 July, 2024;
originally announced July 2024.
-
Short-term Object Interaction Anticipation with Disentangled Object Detection @ Ego4D Short Term Object Interaction Anticipation Challenge
Authors:
Hyun** Cho,
Dong Un Kang,
Se Young Chun
Abstract:
Short-term object interaction anticipation is an important task in egocentric video analysis, including precise predictions of future interactions and their timings as well as the categories and positions of the involved active objects. To alleviate the complexity of this task, our proposed method, SOIA-DOD, effectively decompose it into 1) detecting active object and 2) classifying interaction an…
▽ More
Short-term object interaction anticipation is an important task in egocentric video analysis, including precise predictions of future interactions and their timings as well as the categories and positions of the involved active objects. To alleviate the complexity of this task, our proposed method, SOIA-DOD, effectively decompose it into 1) detecting active object and 2) classifying interaction and predicting their timing. Our method first detects all potential active objects in the last frame of egocentric video by fine-tuning a pre-trained YOLOv9. Then, we combine these potential active objects as query with transformer encoder, thereby identifying the most promising next active object and predicting its future interaction and time-to-contact. Experimental results demonstrate that our method outperforms state-of-the-art models on the challenge test set, achieving the best performance in predicting next active objects and their interactions. Finally, our proposed ranked the third overall top-5 mAP when including time-to-contact predictions. The source code is available at https://github.com/Keeny**/SOIA-DOD.
△ Less
Submitted 8 July, 2024;
originally announced July 2024.
-
Bulk high-temperature superconductivity in the high-pressure tetragonal phase of bilayer La2PrNi2O7
Authors:
Ningning Wang,
Gang Wang,
Xiaoling Shen,
Jun Hou,
Jun Luo,
** Ma,
Huaixin Yang,
Lifen Shi,
Jie Dou,
Jie Feng,
Jie Yang,
Yunqing Shi,
Zhian Ren,
Hanming Ma,
Pengtao Yang,
Ziyi Liu,
Yue Liu,
Hua Zhang,
Xiaoli Dong,
Yuxin Wang,
Kun Jiang,
Jiang** Hu,
Stuart Calder,
Jiaqiang Yan,
Jian** Sun
, et al. (4 additional authors not shown)
Abstract:
The Ruddlesden-Popper (R-P) bilayer nickelate, La3Ni2O7, was recently found to show signatures of high-temperature superconductivity (HTSC) at pressures above 14 GPa. Subsequent investigations achieved zero resistance in single- and poly-crystalline samples under hydrostatic pressure conditions. Yet, obvious diamagnetic signals, the other hallmark of superconductors, are still lacking owing to the…
▽ More
The Ruddlesden-Popper (R-P) bilayer nickelate, La3Ni2O7, was recently found to show signatures of high-temperature superconductivity (HTSC) at pressures above 14 GPa. Subsequent investigations achieved zero resistance in single- and poly-crystalline samples under hydrostatic pressure conditions. Yet, obvious diamagnetic signals, the other hallmark of superconductors, are still lacking owing to the filamentary nature with low superconducting volume fraction. The presence of a novel "1313" polymorph and competing R-P phases obscured proper identification of the phase for HTSC. Thus, achieving bulk HTSC and identifying the phase at play are the most prominent tasks at present. Here, we address these issues in the praseodymium (Pr)-doped La2PrNi2O7 polycrystalline samples. We find that the substitutions of Pr for La effectively inhibits the intergrowth of different R-P phases, resulting in nearly pure bilayer structure. For La2PrNi2O7, pressure-induced orthorhombic-to-tetragonal structural transition takes place at Pc ~ 11 GPa, above which HTSC emerges gradually upon further compression. The superconducting transition temperatures at 18-20 GPa reach Tconset = 82.5 K and Tczero = 60 K, which are the highest values among known nickelate superconductors. More importantly, bulk HTSC was testified by detecting clear diamagnetic signals below ~75 K corresponding to an estimated superconducting volume fraction ~ 57(5)% at 20 GPa. Our results not only resolve the existing controversies but also illuminate directions for exploring bulk HTSC in the bilayer nickelates.
△ Less
Submitted 8 July, 2024;
originally announced July 2024.
-
New User Event Prediction Through the Lens of Causal Inference
Authors:
Henry Shaowu Yuchi,
Shixiang Zhu,
Li Dong,
Yigit M. Arisoy,
Matthew C. Spencer
Abstract:
Modeling and analysis for event series generated by heterogeneous users of various behavioral patterns are closely involved in our daily lives, including credit card fraud detection, online platform user recommendation, and social network analysis. The most commonly adopted approach to this task is to classify users into behavior-based categories and analyze each of them separately. However, this…
▽ More
Modeling and analysis for event series generated by heterogeneous users of various behavioral patterns are closely involved in our daily lives, including credit card fraud detection, online platform user recommendation, and social network analysis. The most commonly adopted approach to this task is to classify users into behavior-based categories and analyze each of them separately. However, this approach requires extensive data to fully understand user behavior, presenting challenges in modeling newcomers without historical knowledge. In this paper, we propose a novel discrete event prediction framework for new users through the lens of causal inference. Our method offers an unbiased prediction for new users without needing to know their categories. We treat the user event history as the ''treatment'' for future events and the user category as the key confounder. Thus, the prediction problem can be framed as counterfactual outcome estimation, with the new user model trained on an adjusted dataset where each event is re-weighted by its inverse propensity score. We demonstrate the superior performance of the proposed framework with a numerical simulation study and two real-world applications, including Netflix rating prediction and seller contact prediction for customer support at Amazon.
△ Less
Submitted 10 July, 2024; v1 submitted 8 July, 2024;
originally announced July 2024.
-
LINEAR: Learning Implicit Neural Representation With Explicit Physical Priors for Accelerated Quantitative T1rho Map**
Authors:
Yuanyuan Liu,
**wen Xie,
Zhuo-Xu Cui,
Qingyong Zhu,
**g Cheng,
Dong Liang,
Yanjie Zhu
Abstract:
Quantitative T1rho parameter map** has shown promise in clinical and research studies. However, it suffers from long scan times. Deep learning-based techniques have been successfully applied in accelerated quantitative MR parameter map**. However, most methods require fully-sampled training dataset, which is impractical in the clinic. In this study, a novel subject-specific unsupervised method…
▽ More
Quantitative T1rho parameter map** has shown promise in clinical and research studies. However, it suffers from long scan times. Deep learning-based techniques have been successfully applied in accelerated quantitative MR parameter map**. However, most methods require fully-sampled training dataset, which is impractical in the clinic. In this study, a novel subject-specific unsupervised method based on the implicit neural representation is proposed to reconstruct images from highly undersampled k-space data and estimate parameter maps from reconstructions, which only takes spatiotemporal coordinates as the input. Specifically, the proposed method learned a implicit neural representation of the MR images driven by two explicit priors of images (or k-space data), including the low-rankness of Hankel matrix, and the self-consistency of k-space data. The ablation experiments show that the proposed method can characterize the physical priors of MR images well. Moreover,experimental results of retrospective and prospective data show that the proposed method outperforms the state-of-the-art methods in terms of supressing artifacts and achieving the lowest error.
△ Less
Submitted 8 July, 2024;
originally announced July 2024.
-
Symmetric Biderivations on Complex Semisimple Lie Algebras
Authors:
Liu Shiyuan,
Liu Dong,
Zhao Yueqiang
Abstract:
This paper studies biderivations on finite-dimensional complex semisimple Lie algebras to their finite-dimensional modules. More precisely, we prove that all such symmetric biderivations are trivial. As applications, we determine all biderivations on Takiff algebras and symplectic oscillator algebras, as well as all supersymmetric biderivations over some Lie superalgebras including all finite-dime…
▽ More
This paper studies biderivations on finite-dimensional complex semisimple Lie algebras to their finite-dimensional modules. More precisely, we prove that all such symmetric biderivations are trivial. As applications, we determine all biderivations on Takiff algebras and symplectic oscillator algebras, as well as all supersymmetric biderivations over some Lie superalgebras including all finite-dimensional classical simple Lie superalgebras.
△ Less
Submitted 7 July, 2024;
originally announced July 2024.
-
LLMBox: A Comprehensive Library for Large Language Models
Authors:
Tianyi Tang,
Yiwen Hu,
Bingqian Li,
Wenyang Luo,
Zi**g Qin,
Haoxiang Sun,
Jiapeng Wang,
Shiyi Xu,
Xiaoxue Cheng,
Geyang Guo,
Han Peng,
Bowen Zheng,
Yiru Tang,
Yingqian Min,
Yushuo Chen,
Jie Chen,
Yuanqian Zhao,
Luran Ding,
Yuhao Wang,
Zican Dong,
Chunxuan Xia,
Junyi Li,
Kun Zhou,
Wayne Xin Zhao,
Ji-Rong Wen
Abstract:
To facilitate the research on large language models (LLMs), this paper presents a comprehensive and unified library, LLMBox, to ease the development, use, and evaluation of LLMs. This library is featured with three main merits: (1) a unified data interface that supports the flexible implementation of various training strategies, (2) a comprehensive evaluation that covers extensive tasks, datasets,…
▽ More
To facilitate the research on large language models (LLMs), this paper presents a comprehensive and unified library, LLMBox, to ease the development, use, and evaluation of LLMs. This library is featured with three main merits: (1) a unified data interface that supports the flexible implementation of various training strategies, (2) a comprehensive evaluation that covers extensive tasks, datasets, and models, and (3) more practical consideration, especially on user-friendliness and efficiency. With our library, users can easily reproduce existing methods, train new models, and conduct comprehensive performance comparisons. To rigorously test LLMBox, we conduct extensive experiments in a diverse coverage of evaluation settings, and experimental results demonstrate the effectiveness and efficiency of our library in supporting various implementations related to LLMs. The detailed introduction and usage guidance can be found at https://github.com/RUCAIBox/LLMBox.
△ Less
Submitted 7 July, 2024;
originally announced July 2024.
-
Assessing Code Generation with Intermediate Languages
Authors:
Xun Deng,
Sicheng Zhong,
Honghua Dong,
**gyu Hu,
Sidi Mohamed Beillahi,
Xujie Si,
Fan Long
Abstract:
Intermediate step methodologies like chain of thoughts (COT) have demonstrated effectiveness in enhancing the performance of Large Language Models (LLMs) on code generation. This study explores the utilization of intermediate languages, including various programming languages, natural language solutions, and pseudo-code, and systematically evaluates their impact on the performance of LLMs in code…
▽ More
Intermediate step methodologies like chain of thoughts (COT) have demonstrated effectiveness in enhancing the performance of Large Language Models (LLMs) on code generation. This study explores the utilization of intermediate languages, including various programming languages, natural language solutions, and pseudo-code, and systematically evaluates their impact on the performance of LLMs in code generation tasks. Our experiments encompass eleven models across the CodeLlama, GPT, and Mistral families, as well as newly released smaller models. Our findings reveal that intermediate languages generally exhibit greater efficacy in larger models that have not yet achieved state-of-the-art performance. Natural language consistently emerges as the most effective intermediate representation across all target languages. However, we observe no universally effective intermediate formal language across different models and target languages. Furthermore, we uncover a weak correlation between the correctness of intermediate solutions and final generation, suggesting that improvements may stem from the chain-of-thought effect rather than language-specific transfer. Interestingly, we discover that for GPT family models, prompting multiple times without explicit self-correction instructions yields performance gains across the studied languages.
△ Less
Submitted 7 July, 2024;
originally announced July 2024.
-
Exploring Phrase-Level Grounding with Text-to-Image Diffusion Model
Authors:
Danni Yang,
Ruohan Dong,
Jiayi Ji,
Yiwei Ma,
Haowei Wang,
Xiaoshuai Sun,
Rongrong Ji
Abstract:
Recently, diffusion models have increasingly demonstrated their capabilities in vision understanding. By leveraging prompt-based learning to construct sentences, these models have shown proficiency in classification and visual grounding tasks. However, existing approaches primarily showcase their ability to perform sentence-level localization, leaving the potential for leveraging contextual inform…
▽ More
Recently, diffusion models have increasingly demonstrated their capabilities in vision understanding. By leveraging prompt-based learning to construct sentences, these models have shown proficiency in classification and visual grounding tasks. However, existing approaches primarily showcase their ability to perform sentence-level localization, leaving the potential for leveraging contextual information for phrase-level understanding largely unexplored. In this paper, we utilize Panoptic Narrative Grounding (PNG) as a proxy task to investigate this capability further. PNG aims to segment object instances mentioned by multiple noun phrases within a given narrative text. Specifically, we introduce the DiffPNG framework, a straightforward yet effective approach that fully capitalizes on the diffusion's architecture for segmentation by decomposing the process into a sequence of localization, segmentation, and refinement steps. The framework initially identifies anchor points using cross-attention mechanisms and subsequently performs segmentation with self-attention to achieve zero-shot PNG. Moreover, we introduce a refinement module based on SAM to enhance the quality of the segmentation masks. Our extensive experiments on the PNG dataset demonstrate that DiffPNG achieves strong performance in the zero-shot PNG task setting, conclusively proving the diffusion model's capability for context-aware, phrase-level understanding. Source code is available at \url{https://github.com/nini0919/DiffPNG}.
△ Less
Submitted 7 July, 2024;
originally announced July 2024.
-
PAPM: A Physics-aware Proxy Model for Process Systems
Authors:
Pengwei Liu,
Zhongkai Hao,
Xingyu Ren,
Hangjie Yuan,
Jiayang Ren,
Dong Ni
Abstract:
In the context of proxy modeling for process systems, traditional data-driven deep learning approaches frequently encounter significant challenges, such as substantial training costs induced by large amounts of data, and limited generalization capabilities. As a promising alternative, physics-aware models incorporate partial physics knowledge to ameliorate these challenges. Although demonstrating…
▽ More
In the context of proxy modeling for process systems, traditional data-driven deep learning approaches frequently encounter significant challenges, such as substantial training costs induced by large amounts of data, and limited generalization capabilities. As a promising alternative, physics-aware models incorporate partial physics knowledge to ameliorate these challenges. Although demonstrating efficacy, they fall short in terms of exploration depth and universality. To address these shortcomings, we introduce a physics-aware proxy model (PAPM) that fully incorporates partial prior physics of process systems, which includes multiple input conditions and the general form of conservation relations, resulting in better out-of-sample generalization. Additionally, PAPM contains a holistic temporal-spatial step** module for flexible adaptation across various process systems. Through systematic comparisons with state-of-the-art pure data-driven and physics-aware models across five two-dimensional benchmarks in nine generalization tasks, PAPM notably achieves an average performance improvement of 6.7%, while requiring fewer FLOPs, and just 1% of the parameters compared to the prior leading method. The code is available at https://github.com/pengwei07/PAPM.
△ Less
Submitted 6 July, 2024;
originally announced July 2024.
-
DehazeDCT: Towards Effective Non-Homogeneous Dehazing via Deformable Convolutional Transformer
Authors:
Wei Dong,
Han Zhou,
Ruiyi Wang,
Xiaohong Liu,
Guangtao Zhai,
Jun Chen
Abstract:
Image dehazing, a pivotal task in low-level vision, aims to restore the visibility and detail from hazy images. Many deep learning methods with powerful representation learning capability demonstrate advanced performance on non-homogeneous dehazing, however, these methods usually struggle with processing high-resolution images (e.g., $4000 \times 6000$) due to their heavy computational demands. To…
▽ More
Image dehazing, a pivotal task in low-level vision, aims to restore the visibility and detail from hazy images. Many deep learning methods with powerful representation learning capability demonstrate advanced performance on non-homogeneous dehazing, however, these methods usually struggle with processing high-resolution images (e.g., $4000 \times 6000$) due to their heavy computational demands. To address these challenges, we introduce an innovative non-homogeneous Dehazing method via Deformable Convolutional Transformer-like architecture (DehazeDCT). Specifically, we first design a transformer-like network based on deformable convolution v4, which offers long-range dependency and adaptive spatial aggregation capabilities and demonstrates faster convergence and forward speed. Furthermore, we leverage a lightweight Retinex-inspired transformer to achieve color correction and structure refinement. Extensive experiment results and highly competitive performance of our method in NTIRE 2024 Dense and Non-Homogeneous Dehazing Challenge, ranking second among all 16 submissions, demonstrate the superior capability of our proposed method. The code is available: https://github.com/movingforward100/Dehazing_R.
△ Less
Submitted 24 May, 2024;
originally announced July 2024.
-
SCSA: Exploring the Synergistic Effects Between Spatial and Channel Attention
Authors:
Yunzhong Si,
Huiying Xu,
Xinzhong Zhu,
Wenhao Zhang,
Yao Dong,
Yuxing Chen,
Hongbo Li
Abstract:
Channel and spatial attentions have respectively brought significant improvements in extracting feature dependencies and spatial structure relations for various downstream vision tasks. While their combination is more beneficial for leveraging their individual strengths, the synergy between channel and spatial attentions has not been fully explored, lacking in fully harness the synergistic potenti…
▽ More
Channel and spatial attentions have respectively brought significant improvements in extracting feature dependencies and spatial structure relations for various downstream vision tasks. While their combination is more beneficial for leveraging their individual strengths, the synergy between channel and spatial attentions has not been fully explored, lacking in fully harness the synergistic potential of multi-semantic information for feature guidance and mitigation of semantic disparities. Our study attempts to reveal the synergistic relationship between spatial and channel attention at multiple semantic levels, proposing a novel Spatial and Channel Synergistic Attention module (SCSA). Our SCSA consists of two parts: the Shareable Multi-Semantic Spatial Attention (SMSA) and the Progressive Channel-wise Self-Attention (PCSA). SMSA integrates multi-semantic information and utilizes a progressive compression strategy to inject discriminative spatial priors into PCSA's channel self-attention, effectively guiding channel recalibration. Additionally, the robust feature interactions based on the self-attention mechanism in PCSA further mitigate the disparities in multi-semantic information among different sub-features within SMSA. We conduct extensive experiments on seven benchmark datasets, including classification on ImageNet-1K, object detection on MSCOCO 2017, segmentation on ADE20K, and four other complex scene detection datasets. Our results demonstrate that our proposed SCSA not only surpasses the current state-of-the-art attention but also exhibits enhanced generalization capabilities across various task scenarios. The code and models are available at: https://github.com/HZAI-ZJNU/SCSA.
△ Less
Submitted 6 July, 2024;
originally announced July 2024.
-
Search for the baryon number and lepton number violating decays $τ^-\to Λπ^-$ and $τ^-\to \barΛπ^-$ at Belle II
Authors:
Belle II Collaboration,
I. Adachi,
L. Aggarwal,
H. Ahmed,
H. Aihara,
N. Akopov,
A. Aloisio,
N. Althubiti,
N. Anh Ky,
D. M. Asner,
H. Atmacan,
T. Aushev,
V. Aushev,
M. Aversano,
R. Ayad,
V. Babu,
H. Bae,
S. Bahinipati,
P. Bambade,
Sw. Banerjee,
S. Bansal,
M. Barrett,
J. Baudot,
A. Baur,
A. Beaubien
, et al. (349 additional authors not shown)
Abstract:
We present a search for the baryon number $B$ and lepton number $L$ violating decays $τ^- \rightarrow Λπ^-$ and $τ^- \rightarrow \barΛ π^-$ produced from the $e^+e^-\to τ^+τ^-$ process, using a 364 fb$^{-1}$ data sample collected by the Belle~II experiment at the SuperKEKB collider. No evidence of signal is found in either decay mode, which have $|Δ(B-L)|$ equal to $2$ and $0$, respectively. Upper…
▽ More
We present a search for the baryon number $B$ and lepton number $L$ violating decays $τ^- \rightarrow Λπ^-$ and $τ^- \rightarrow \barΛ π^-$ produced from the $e^+e^-\to τ^+τ^-$ process, using a 364 fb$^{-1}$ data sample collected by the Belle~II experiment at the SuperKEKB collider. No evidence of signal is found in either decay mode, which have $|Δ(B-L)|$ equal to $2$ and $0$, respectively. Upper limits at 90\% credibility level on the branching fractions of $τ^- \rightarrow Λπ^-$ and $τ^- \rightarrow \barΛπ^-$ are determined to be $4.7 \times 10^{-8}$ and $4.3 \times 10^{-8}$, respectively.
△ Less
Submitted 6 July, 2024;
originally announced July 2024.
-
Robust Skin Color Driven Privacy Preserving Face Recognition via Function Secret Sharing
Authors:
Dong Han,
Yufan Jiang,
Yong Li,
Ricardo Mendes,
Joachim Denzler
Abstract:
In this work, we leverage the pure skin color patch from the face image as the additional information to train an auxiliary skin color feature extractor and face recognition model in parallel to improve performance of state-of-the-art (SOTA) privacy-preserving face recognition (PPFR) systems. Our solution is robust against black-box attacking and well-established generative adversarial network (GA…
▽ More
In this work, we leverage the pure skin color patch from the face image as the additional information to train an auxiliary skin color feature extractor and face recognition model in parallel to improve performance of state-of-the-art (SOTA) privacy-preserving face recognition (PPFR) systems. Our solution is robust against black-box attacking and well-established generative adversarial network (GAN) based image restoration. We analyze the potential risk in previous work, where the proposed cosine similarity computation might directly leak the protected precomputed embedding stored on the server side. We propose a Function Secret Sharing (FSS) based face embedding comparison protocol without any intermediate result leakage. In addition, we show in experiments that the proposed protocol is more efficient compared to the Secret Sharing (SS) based protocol.
△ Less
Submitted 6 July, 2024;
originally announced July 2024.