Few-shot Image Generation via Information Transfer from the Built Geodesic Surface

Authors: Yuexing Han, Liheng Ruan, Bing Wang

Abstract: Images generated by most generative models trained with limited data often exhibit deficiencies in fidelity, diversity, or both. One effective solution to this limitation is few-shot generative model adaption. However, this type of approach typically relies on a large-scale pre-trained model, serving as a source domain, to facilitate information transfer to the target domain. In this paper, we propose a method called Information Transfer from the Built Geodesic Surface (ITBGS), which contains two modules: Feature Augmentation on Geodesic Surface (FAGS) and Interpolation and Regularization (I&R). With the FAGS module, a pseudo-source domain is created by projecting image features from the training dataset into the Pre-Shape Space, subsequently generating new features on the Geodesic surface. Thus, no pre-trained model is needed for the adaption process during the training of generative models with FAGS. The I&R module is introduced to supervise the interpolated images and regularize their relative distances, respectively, to further enhance the quality of generated images. Through qualitative and quantitative experiments, we demonstrate that the proposed method consistently achieves optimal or comparable results across a diverse range of semantically distinct datasets, even in extremely few-shot scenarios.

Submitted 2 March, 2024; v1 submitted 3 January, 2024; originally announced January 2024.
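
The projection and geodesic steps mentioned in the abstract above can be illustrated with a minimal sketch. It is not the authors' FAGS implementation: it assumes a flat feature vector is treated as the configuration, uses the standard Kendall pre-shape construction (remove the mean, scale to unit norm), and interpolates along the great-circle geodesic between only two features. The function names `to_preshape` and `geodesic_point` are hypothetical.

```python
import math


def to_preshape(features):
    """Map a feature vector to the pre-shape sphere: center it, then scale to unit norm."""
    mean = sum(features) / len(features)
    centered = [v - mean for v in features]
    norm = math.sqrt(sum(v * v for v in centered))
    if norm == 0.0:
        raise ValueError("constant vector has no pre-shape")
    return [v / norm for v in centered]


def geodesic_point(u, v, t):
    """Point at parameter t in [0, 1] on the great-circle geodesic from u to v.

    Both inputs are assumed to be unit vectors (for example, outputs of to_preshape).
    Antipodal inputs are not handled, because their geodesic is not unique.
    """
    dot = max(-1.0, min(1.0, sum(a * b for a, b in zip(u, v))))
    theta = math.acos(dot)  # geodesic distance between u and v
    if theta < 1e-12:
        return list(u)  # the two points coincide
    s = math.sin(theta)
    w0 = math.sin((1.0 - t) * theta) / s
    w1 = math.sin(t * theta) / s
    return [w0 * a + w1 * b for a, b in zip(u, v)]


if __name__ == "__main__":
    u = to_preshape([1.0, 2.0, 4.0])
    v = to_preshape([3.0, 1.0, 0.0])
    print(geodesic_point(u, v, 0.5))
```

Every interpolated point stays centered and has unit norm, so it remains in the pre-shape space. Extending the curve to a surface spanned by more than two features is left out of this sketch.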

arXiv:2401.00918 [pdf, ps, other]

Partial Wave Analysis of $J/ψ\rightarrow γγφ$

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, M. R. An, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere , et al. (603 additional authors not shown)

Abstract: Using a sample of $(10087\pm44)\times10^{6}$ $J/ψ$ events collected with the BESIII detector at the BEPCII collider, a partial wave analysis on the decay $γγφ$ is performed to investigate the intermediate resonances in $J/ψ\rightarrowγX, X\rightarrowγφ$. The resonances $f_{1}(1285)$, $η(1405)$, $f_{1}(1420)$, $f_{1}(1510)$, $f_{2}(1525)$, $X(1835)$, $f_{2}(1950)$, $f_{2}(2010)$, $f_{0}(2200)$ and $η_{c}$ are observed with statistical significance greater than 5$σ$. The product branching fractions $\mathcal{B}(J/ψ\rightarrowγX, X\rightarrow γφ)$ are reported. The resonance parameters of $η(1405)$ and $X(1835)$ are also measured.

Submitted 1 January, 2024; originally announced January 2024.

arXiv:2401.00878 [pdf, ps, other]

Observation of $\mathcal R(3810)$ in $e^+e^-\rightarrow {\rm hadrons}$ and Improved Measurements of the Resonance Parameters of $\mathcal R(3760)$ and $\mathcal R(3780)$

Authors: M. Ablikim, M. N. Achasov, P. Adlarson, X. C. Ai, R. Aliberti, A. Amoroso, M. R. An, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, J. Bloms, A. Bortone, I. Boyko, R. A. Briere, A. Brueggemann , et al. (596 additional authors not shown)

Abstract: We report the measurement of the cross sections for $e^+e^-\rightarrow {\rm hadrons}$ at center-of-mass (c.m.) energies from 3.645 to 3.871 GeV. We observe a new resonance $\mathcal R(3810)$ in the cross sections for the first time, and observe the $\mathcal R(3760)$ resonance with high significance in the cross sections. The $\mathcal R(3810)$ has a mass of $(3804.5 \pm 0.9 \pm 0.9)$ MeV/$c^2$, a total width of $(5.4 \pm 3.5 \pm 3.2)$ MeV, and an electronic partial width of $(19.4 \pm 7.4 \pm 12.1)$ eV. Its significance is $7.7σ$. The $\mathcal R(3810)$ could be interpreted as a hadro-charmonium resonance predicted by Quantum Chromodynamics (QCD). In addition, we measure the mass $(3751.9\pm 3.8\pm 2.8)$ MeV/$c^2$, the total width $(32.8 \pm 5.8 \pm 8.7)$ MeV, and the electronic partial width $(184\pm 75\pm 86)$ eV with improved precision for the $\mathcal R(3760)$. Furthermore, for the $\mathcal R(3780)$ we measure the mass $(3778.7\pm 0.5\pm 0.3)$ MeV/$c^2$ and total width $(20.3 \pm 0.8 \pm 1.7)$ MeV with improved precision, and the electronic partial width $(265\pm 69\pm 83)$ eV. The $\mathcal R(3780)$ can be interpreted as the $1^3D_1$ state of charmonium. Its mass and total width differ significantly from the corresponding fitted values given by the Particle Data Group in 2022 by 7.1 and 3.2 times the uncertainties for $ψ(3770)$, respectively. $ψ(3770)$ has been interpreted as the $1^3D_1$ state for 45 years.

Submitted 30 December, 2023; originally announced January 2024.

arXiv:2401.00646 [pdf, ps, other]

doi 10.1103/PhysRevB.108.214108

High magnetic field phase diagram and weak FM breaking in (Ni0.93Co0.07)3V2O8

Authors: Jiating Wu, Minjie Zhang, Ke Shi, Huxin Yin, Yuyan Han, Lansheng Ling, Wei Tong, Chuanying Xi, Li Pi, Zhaosheng Wang

Abstract: We present magnetostriction and thermal expansion measurements on multiferroic (Ni0.93Co0.07)3V2O8. The high field phase diagrams up to 33 T along the a, b and c directions are built. For H//a, as the magnetic field increases, two intermediate phases appear between the incommensurate phase and the paramagnetic phase at about 7 K, and then a magnetically induced phase appears above the paramagnetic phase. For H//b, thermal expansion measurement indicates a mutation in the spin lattice coupling of the high field phases. The interlaced phase boundary suggests a mixed state in the optical high field phase. For H//c, an intermediate phase between the commensurate phase and the incommensurate phase is detected. A nonlinear boundary between the intermediate phase and the low temperature incommensurate phase, and a clear boundary between the commensurate phase and the paramagnetic phase are found. These results indicate that doping Co2+ breaks the weak ferromagnetic moment of the commensurate phase, which exists in the parent compound Ni3V2O8 and (Ni0.9Co0.1)3V2O8. This nonlinear influence reflects complicated spin modulation in Ni3V2O8 by doping Co2+.

Submitted 31 December, 2023; originally announced January 2024.

Comments: 7 pages, 4 figures

Journal ref: Phys. Rev. B 108, 214108 (2023)

arXiv:2401.00637 [pdf]

Nonlinear vibration of a dipteran flight robot system with rotational geometric nonlinearity

Authors: Yanwei Han, Zijian Zhang

Abstract: The dipteran flight mechanism of insects is commonly used to design nonlinear flight robot systems. However, the dynamic response of the click mechanism of the nonlinear robot system with multiple stability is still unclear. In this paper, a novel dipteran robot model with a click mechanism is proposed based on the multiple stability of snap-through buckling. The equation of motion of the nonlinear flight robot system is obtained by using the Euler-Lagrange equation. The nonlinear potential energy, the elastic force, equilibrium bifurcation, as well as equilibrium stability are investigated to show the multiple stability characteristics. The transient sets of bifurcation and persistent set of regions in the system parameter plane and the corresponding phase portraits are obtained with multiple stability of single and double well behaviors. Then, the periodic free vibration responses are defined by the analytical solution of three kinds of elliptical functions, and the amplitude frequency responses are investigated by numerical integration. Based on the topological equivalent method, the chaotic thresholds of the homoclinic orbits for the chaotic vibration of the harmonically forced robot system are derived to show the chaotic parametric condition. Finally, a prototype of the nonlinear flapping robot is manufactured and the experimental system is set up. The nonlinear static moment of force curves, periodic response and dynamic flight vibration of the dipteran robot system are measured. It is shown that the test results agree well with the theoretical analysis and numerical simulation. These results have potential applications in the structural design of efficient flight robots.

Submitted 31 December, 2023; originally announced January 2024.

Comments: 30 pages, 24 figures

MSC Class: 70K05; ACM Class: J.2.7

arXiv:2401.00551 [pdf, other]

A Generalist FaceX via Learning Unified Facial Representation

Authors: Yue Han, Jiangning Zhang, Junwei Zhu, Xiangtai Li, Yanhao Ge, Wei Li, Chengjie Wang, Yong Liu, Xiaoming Liu, Ying Tai

Abstract: This work presents the FaceX framework, a novel facial generalist model capable of handling diverse facial tasks simultaneously. To achieve this goal, we initially formulate a unified facial representation for a broad spectrum of facial editing tasks, which macroscopically decomposes a face into fundamental identity, intra-personal variation, and environmental factors. Based on this, we introduce Facial Omni-Representation Decomposing (FORD) for seamless manipulation of various facial components, microscopically decomposing the core aspects of most facial editing tasks. Furthermore, by leveraging the prior of a pretrained StableDiffusion (SD) to enhance generation quality and accelerate training, we design Facial Omni-Representation Steering (FORS) to first assemble unified facial representations and then effectively steer the SD-aware generation process by the efficient Facial Representation Controller (FRC). Our versatile FaceX achieves competitive performance compared to elaborate task-specific models on popular facial editing tasks. Full codes and models will be available at https://github.com/diffusion-facex/FaceX.

Submitted 31 December, 2023; originally announced January 2024.

Comments: Project page: https://diffusion-facex.github.io/

arXiv:2312.17483 [pdf, ps, other]

Maximizing the Yield of Bucket Brigade Quantum Random Access Memory using Redundancy Repair

Authors: Dongmin Kim, Sovanmonynuth Heng, Sengthai Heng, Youngsun Han

Abstract: Quantum Random Access Memory (qRAM) is an essential computing element for running oracle-based quantum algorithms. qRAM exploits the principle of quantum superposition to access all data stored in the memory cell simultaneously and guarantees the superior performance of quantum algorithms. A qRAM memory cell comprises logical qubits encoded through quantum error correction (QEC) technology for the successful operation of qRAM against various quantum noises. In addition to quantum noise, the low-technology nodes based on silicon technology can increase the qubit density and may introduce defective qubits. As qRAM comprises many qubits, its yield will be reduced by defective qubits; these qubits must be handled using a QEC scheme. However, the QEC scheme requires numerous physical qubits, which increases the resource overhead. To resolve this overhead problem, we propose a quantum memory architecture that compensates for defective qubits by introducing redundant qubits. We also analyze the yield improvement offered by our proposed architecture by varying the ideal fabrication error rate from 0.5% to 1% for different numbers of logical qubits in the qRAM. In the qRAM comprising 1,024 logical qubits, eight redundant logical qubits improved the yield by 95.92% from that of qRAM not employing the redundant repair scheme.

Submitted 22 May, 2024; v1 submitted 29 December, 2023; originally announced December 2023.

Comments: 12 pages, 7 figures, 1 table
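
The idea of raising yield with spare qubits, as described in the abstract above, can be illustrated with a textbook redundancy-repair yield model. This is a sketch under stated assumptions and not the paper's analysis: each logical qubit is assumed to be defective independently with probability `p_defect`, and any spare is assumed able to replace any defective qubit. The function names are hypothetical, and the sketch is not expected to reproduce the 95.92% figure quoted in the abstract.

```python
from math import comb


def yield_without_repair(n_logical: int, p_defect: float) -> float:
    """Probability that all n_logical qubits are defect-free (no spares available)."""
    return (1.0 - p_defect) ** n_logical


def yield_with_repair(n_logical: int, n_spare: int, p_defect: float) -> float:
    """Probability that at most n_spare of the n_logical + n_spare qubits are defective.

    Assumes independent defects and fully flexible spare assignment, so the
    memory works whenever the number of defects does not exceed the spares.
    """
    total = n_logical + n_spare
    return sum(
        comb(total, k) * p_defect**k * (1.0 - p_defect) ** (total - k)
        for k in range(n_spare + 1)
    )


if __name__ == "__main__":
    # Sweep the defect rates named in the abstract for 1,024 logical qubits and 8 spares.
    for p in (0.005, 0.01):
        base = yield_without_repair(1024, p)
        repaired = yield_with_repair(1024, 8, p)
        print(f"p={p}: no repair {base:.4f}, with 8 spares {repaired:.4f}")
```

With zero spares the binomial sum reduces to the no-repair expression, which is a quick sanity check on the model.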

arXiv:2312.17282 [pdf]

Nonlinear energy harvesting system with multiple stability

Authors: Yanwei Han, Zijian Zhang

Abstract: The nonlinear energy harvesting systems of the forced vibration with an electron-mechanical coupling are widely used to capture ambient vibration energy and convert mechanical energy into electrical energy. However, the nonlinear response mechanism of the friction induced vibration (FIV) energy harvesting system with multiple stability and stick-slip motion is still unclear. In the current paper, a novel nonlinear energy harvesting model with multiple stability of single-, double- and triple-well potential is proposed based on a V-shaped structure spring and the belt conveying system. The dynamic equations for the energy harvesting system with multiple stability and self-excited friction are established by using Euler-Lagrangian equations. Secondly, the nonlinear restoring force, friction force, and potential energy surfaces for static characteristics of the energy harvesting system are obtained to show the nonlinear varying stiffness, multiple equilibrium points, discontinuous behaviors and multiple well response. Then, the equilibrium surface of bifurcation sets of the autonomous system is given to show the third-order quasi zero stiffness (QZS3), fifth-order quasi zero stiffness (QZS5), double well (DW) and triple well (TW). Furthermore, the response amplitudes of charge, current, voltage and power of the forced electron-mechanical coupled vibration system for QZS3, QZS5, DW and TW are analyzed by numerical solution. Finally, a prototype of the FIV energy harvesting system is manufactured and the experimental system is set up. The experimental results for static restoring force, damping force and electrical output agree well with the numerical results, which validates the proposed FIV energy harvesting model.

Submitted 27 December, 2023; originally announced December 2023.

Comments: 29 pages, 29 figures

MSC Class: 34-xx; ACM Class: J.2

arXiv:2312.16954 [pdf, other]

Blockchain-based Privacy-Preserving Public Key Searchable Encryption with Strong Traceability

Authors: Yue Han, Jinguang Han, Weizhi Meng, Jianchang Lai, Ge Wu

Abstract: A public key searchable encryption (PKSE) scheme allows data users to search over encrypted data. To identify illegal users, many traceable PKSE schemes have been proposed. However, existing schemes cannot trace the keywords which illegal users searched and protect users' privacy simultaneously. In some practical applications, tracing both illegal users' identities and the keywords which they searched is quite important to prevent the abuse of data. It is a challenge to bind users' identities and keywords while protecting their privacy. Moreover, existing traceable PKSE schemes do not consider the unforgeability and immutability of trapdoor query records, which can lead to frame-up and denial. In this paper, to solve these problems, we propose a blockchain-based privacy-preserving PKSE with strong traceability (BP3KSEST) scheme. Our scheme provides the following features: (1) authorized users can authenticate to the trapdoor generation center and obtain trapdoors without releasing their identities and keywords; (2) when data users misbehave in the system, the trusted third party (TTP) can trace both their identities and the keywords which they searched; (3) trapdoor query records are unforgeable; (4) trapdoor query records are immutable because records are stored in the blockchain. Notably, this scheme is suitable for scenarios where privacy must be considered, e.g., electronic health records (EHR). We formalize both the definition and security model of our BP3KSEST scheme, and present a concrete construction. Furthermore, the security of the proposed scheme is formally proven. Finally, the implementation and evaluation are conducted to analyze its efficiency.

Submitted 28 December, 2023; originally announced December 2023.

arXiv:2312.16554 [pdf, other]

A Theoretical Analysis of Efficiency Constrained Utility-Privacy Bi-Objective Optimization in Federated Learning

Authors: Hanlin Gu, Xinyuan Zhao, Gongxi Zhu, Yuxing Han, Yan Kang, Lixin Fan, Qiang Yang

Abstract: Federated learning (FL) enables multiple clients to collaboratively learn a shared model without sharing their individual data. Concerns about utility, privacy, and training efficiency in FL have garnered significant research attention. Differential privacy has emerged as a prevalent technique in FL, safeguarding the privacy of individual user data while impacting utility and training efficiency. Within Differential Privacy Federated Learning (DPFL), previous studies have primarily focused on the utility-privacy trade-off, neglecting training efficiency, which is crucial for timely completion. Moreover, differential privacy achieves privacy by introducing controlled randomness (noise) on selected clients in each communication round. Previous work has mainly examined the impact of noise level ($σ$) and communication rounds ($T$) on the privacy-utility dynamic, overlooking other influential factors like the sample ratio ($q$, the proportion of selected clients). This paper systematically formulates an efficiency-constrained utility-privacy bi-objective optimization problem in DPFL, focusing on $σ$, $T$, and $q$. We provide a comprehensive theoretical analysis, yielding analytical solutions for the Pareto front. Extensive empirical experiments verify the validity and efficacy of our analysis, offering valuable guidance for low-cost parameter design in DPFL.

Submitted 29 January, 2024; v1 submitted 27 December, 2023; originally announced December 2023.

arXiv:2312.16383 [pdf, ps, other]

Frame-level emotional state alignment method for speech emotion recognition

Authors: Qifei Li, Yingming Gao, Cong Wang, Yayue Deng, Jinlong Xue, Yichen Han, Ya Li

Abstract: Speech emotion recognition (SER) systems aim to recognize human emotional state during human-computer interaction. Most existing SER systems are trained based on utterance-level labels. However, not all frames in an audio have affective states consistent with the utterance-level label, which makes it difficult for the model to distinguish the true emotion of the audio and causes it to perform poorly. To address this problem, we propose a frame-level emotional state alignment method for SER. First, we fine-tune the HuBERT model to obtain a SER system with the task-adaptive pretraining (TAPT) method, and extract embeddings from its transformer layers to form frame-level pseudo-emotion labels with clustering. Then, the pseudo labels are used to pretrain HuBERT. Hence, each frame output of HuBERT has corresponding emotional information. Finally, we fine-tune the above pretrained HuBERT for SER by adding an attention layer on top of it, which can focus only on those frames that are emotionally more consistent with the utterance-level label. The experimental results performed on IEMOCAP indicate that our proposed method performs better than state-of-the-art (SOTA) methods.

Submitted 26 December, 2023; originally announced December 2023.

Comments: Accepted by ICASSP 2024

arXiv:2312.15868 [pdf, other]

Video Frame Interpolation with Region-Distinguishable Priors from SAM

Authors: Yan Han, Xiaogang Xu, Yingqi Lin, Jiafei Wu, Zhe Liu

Abstract: In existing Video Frame Interpolation (VFI) approaches, the motion estimation between neighboring frames plays a crucial role. However, the estimation accuracy in existing methods remains a challenge, primarily due to the inherent ambiguity in identifying corresponding areas in adjacent frames for interpolation. Therefore, enhancing accuracy by distinguishing different regions before motion estimation is of utmost importance. In this paper, we introduce a novel solution involving the utilization of open-world segmentation models, e.g., SAM (Segment Anything Model), to derive Region-Distinguishable Priors (RDPs) in different frames. These RDPs are represented as spatial-varying Gaussian mixtures, distinguishing an arbitrary number of areas with a unified modality. RDPs can be integrated into existing motion-based VFI methods to enhance features for motion estimation, facilitated by our designed play-and-plug Hierarchical Region-aware Feature Fusion Module (HRFFM). HRFFM incorporates RDP into various hierarchical stages of VFI's encoder, using RDP-guided Feature Normalization (RDPFN) in a residual learning manner. With HRFFM and RDP, the features within VFI's encoder exhibit similar representations for matched regions in neighboring frames, thus improving the synthesis of intermediate frames. Extensive experiments demonstrate that HRFFM consistently enhances VFI performance across various scenes.

Submitted 25 December, 2023; originally announced December 2023.

Comments: Code will be released

arXiv:2312.15855 [pdf, other]

Geometric-Aware Low-Light Image and Video Enhancement via Depth Guidance

Authors: Yingqi Lin, Xiaogang Xu, Yan Han, Jiafei Wu, Zhe Liu

Abstract: Low-Light Enhancement (LLE) is aimed at improving the quality of photos/videos captured under low-light conditions. It is worth noting that most existing LLE methods do not take advantage of geometric modeling. We believe that incorporating geometric information can enhance LLE performance, as it provides insights into the physical structure of the scene that influences illumination conditions. To address this, we propose a Geometry-Guided Low-Light Enhancement Refine Framework (GG-LLERF) designed to assist low-light enhancement models in learning improved features for LLE by integrating geometric priors into the feature representation space. In this paper, we employ depth priors as the geometric representation. Our approach focuses on the integration of depth priors into various LLE frameworks using a unified methodology. This methodology comprises two key novel modules. First, a depth-aware feature extraction module is designed to inject depth priors into the image representation. Then, Hierarchical Depth-Guided Feature Fusion Module (HDGFFM) is formulated with a cross-domain attention mechanism, which combines depth-aware features with the original image features within the LLE model. We conducted extensive experiments on public low-light image and video enhancement benchmarks. The results illustrate that our designed framework significantly enhances existing LLE methods.

Submitted 25 December, 2023; originally announced December 2023.

Comments: code will be released

arXiv:2312.13771 [pdf, other]

AppAgent: Multimodal Agents as Smartphone Users

Authors: Chi Zhang, Zhao Yang, Jiaxuan Liu, Yucheng Han, Xin Chen, Zebiao Huang, Bin Fu, Gang Yu

Abstract: Recent advancements in large language models (LLMs) have led to the creation of intelligent agents capable of performing complex tasks. This paper introduces a novel LLM-based multimodal agent framework designed to operate smartphone applications. Our framework enables the agent to operate smartphone applications through a simplified action space, mimicking human-like interactions such as tapping and swiping. This novel approach bypasses the need for system back-end access, thereby broadening its applicability across diverse apps. Central to our agent's functionality is its innovative learning method. The agent learns to navigate and use new apps either through autonomous exploration or by observing human demonstrations. This process generates a knowledge base that the agent refers to for executing complex tasks across different applications. To demonstrate the practicality of our agent, we conducted extensive testing over 50 tasks in 10 different applications, including social media, email, maps, shopping, and sophisticated image editing tools. The results affirm our agent's proficiency in handling a diverse array of high-level tasks.

Submitted 21 December, 2023; v1 submitted 21 December, 2023; originally announced December 2023.

Comments: Project Page is https://appagent-official.github.io/

arXiv:2312.13593 [pdf, ps, other]

Search for the decay $χ_{c1}(3872)\toπ^{+}π^{-}χ_{c1}$

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, X. C. Ai, R. Aliberti, A. Amoroso, M. R. An, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere , et al. (608 additional authors not shown)

Abstract: Using a data sample corresponding to an integrated luminosity of 10.9 fb$^{-1}$ collected at center-of-mass energies from 4.16 to 4.34 GeV with the BESIII detector, we search for the decay $χ_{c1}(3872) \to π^{+}π^{-}χ_{c1}$ in the radiative production $e^{+}e^{-} \to γχ_{c1}(3872)$. No significant signal is observed, and the ratio for the branching fraction of $χ_{c1}(3872) \to π^{+}π^{-}χ_{c1}$ to $χ_{c1}(3872) \to π^{+}π^{-}J/ψ$ is measured as $\mathcal{R}\equiv\frac{\mathcal{B}[χ_{c1}(3872) \to π^{+}π^{-}χ_{c1}]}{\mathcal{B}[χ_{c1}(3872) \to π^{+}π^{-} J/ψ]}<0.18$ at 90$\%$ confidence level. The upper limit on the product of the cross section $σ[e^{+}e^{-}\toγχ_{c1}(3872)]$ and the branching fraction $\mathcal{B}[χ_{c1}(3872)\toπ^{+}π^{-}χ_{c1}]$ at each center-of-mass energy is also given. These measurements favor the non-conventional charmonium nature of the $χ_{c1}(3872)$ state.

Submitted 21 December, 2023; originally announced December 2023.

Comments: 8 pages, 1 figure

arXiv:2312.13516 [pdf, ps, other]

Stochastic Maximum Principle for a generalized Volterra Control System

Authors: Yuhang Li, Yuecai Han

Abstract: In this paper, we consider the stochastic optimal control problem for a generalized Volterra control system. The corresponding state process is a kind of generalized stochastic Volterra integral differential equation. We prove the existence and uniqueness of the solution of this type of equation. We obtain the stochastic maximum principle of the optimal control system by introducing a kind of generalized anticipated backward stochastic differential equation. We prove the existence and uniqueness of the solution of this adjoint equation, which may be singular at some points. As an application, the linear quadratic control problem is investigated to illustrate the main results.

Submitted 20 December, 2023; originally announced December 2023.

arXiv:2312.13043 [pdf, other]

Search for the $e^+e^-\toη_{b}(1S)ω$ and $e^+e^-\toχ_{b0}(1P)ω$ processes at $\sqrt{s}=10.745\,\mathrm{GeV}$

Authors: Belle II Collaboration, I. Adachi, L. Aggarwal, H. Ahmed, H. Aihara, N. Akopov, A. Aloisio, N. Anh Ky, D. M. Asner, H. Atmacan, T. Aushev, V. Aushev, M. Aversano, V. Babu, H. Bae, S. Bahinipati, P. Bambade, Sw. Banerjee, M. Barrett, J. Baudot, M. Bauer, A. Baur, A. Beaubien, F. Becherer, J. Becker, et al. (397 additional authors not shown)

Abstract: We search for the $e^+e^-\toη_b(1S)ω$ and $e^+e^-\toχ_{b0}(1P)ω$ processes at a center-of-mass energy of 10.745 GeV, which is close to the peak of the $Υ(10753)$ state. We use data collected by the Belle II experiment during a special run, corresponding to an integrated luminosity of $9.8\,\mathrm{fb}^{-1}$. We reconstruct $ω\toπ^+π^-π^0$ decays and use the $ω$ meson's recoil mass to search for the signals. We do not find evidence for either process, and set upper limits on the corresponding Born-level cross sections of 2.5 pb and 7.8 pb, respectively, at the 90% confidence level. The $χ_{b0}(1P)ω$ limit is the result of a combination of this analysis and a previous search using full reconstruction.

Submitted 20 December, 2023; originally announced December 2023.

Comments: 8 pages, 3 figures, submitted to PRD

Report number: Belle II Preprint 2023-018, KEK Preprint 2023-36

arXiv:2312.12991 [pdf, other]

Multi-Calorimetry in Light-based Neutrino Detectors

Authors: Anatael Cabrera, Yang Han, Steven Calvez, Emmanuel Chauveau, Hanyi Chen, Hervé de Kerret, Stefano Dusini, Marco Grassi, Leonard Imbert, Jiajun Li, Roberto Carlos Mandujano, Diana Navas-Nicolás, Hiroshi Nunokawa, Michel Obolensky, Juan Pedro Ochoa-Ricoux, Guillaume Pronost, Benoit Viaud, Frederic Yermia

Abstract: Neutrino detectors are among the largest photonics instruments built for fundamental research. Since its inception, neutrino detection has been inexorably linked to the challenging detection of scarce photons in huge instrumented volumes. Many discoveries in neutrino physics, including the neutrino itself, are inseparable from the evolution of the detector photonics interfaces, i.e. photo-sensors and readout electronics, to yield ever higher precision and richer detection information. The measurement of the energy of neutrinos, referred to as calorimetry, is pursued today to reach permille level systematics control precision, thus leading to further innovation in specialised photonics. This publication describes a novel articulation that detectors may be endowed with multiple photonics interfaces for simultaneous light detection to yield unprecedented high-precision calorimetry. This multi-calorimetry approach opens the novel notion of dual-calorimetry detectors as an evolution from the single-calorimetry setups used over several decades for most experiments so far. The dual-calorimetry design exploits unique response synergies between photon counting and photon-integration detection systems, including correlations and cancellations between calorimetric responses, to yield the unprecedented mitigation of the dominant response systematic effects today for the possible improved design of a new generation of neutrino experiments.

Submitted 20 December, 2023; originally announced December 2023.

arXiv:2312.12847 [pdf, ps, other]

Moments of Mandelbrot cascades at critical exponents

Authors: Yong Han, Yanqi Qiu, Zipeng Wang

Abstract: We obtain the asymptotic growth rate of the moments of the Mandelbrot random cascades at critical exponents. The key ingredient is a $q$ to $q/2$ reduction method for the moment-estimation, which is obtained by combining the martingale inequalities due to Burkholder and Burkholder-Rosenthal.

Submitted 20 December, 2023; originally announced December 2023.

Comments: 15 pages

arXiv:2312.12798 [pdf]

Effect of Resonant Acoustic Powder Mixing on Delay Time of W-KClO4-BaCrO4 Mixtures

Authors: Kyungmin Kwon, Seunghwan Ryu, Soyun Joo, Youngjoon Han, Donghyeon Baek, Moonsoo Park, Dongwon Kim, Seungbum Hong

Abstract: This study investigates the impact of resonant acoustic powder mixing on the delay time of the W-KClO4-BaCrO4 (WKB) mixture and its potential implications for powder and material synthesis. Through thermal analysis, an inverse linear relationship was found between thermal conductivity and delay time, allowing us to use thermal conductivity as a reliable proxy for the delay time. By comparing the thermal conductivity of WKB mixtures mixed manually and using an acoustic powder mixer, we found that acoustic powder mixing resulted in minimal deviations in thermal conductivity, proving more uniform mixing. Furthermore, DSC analysis and Sestak-Berggren modeling demonstrated consistent reaction dynamics with a constant activation energy as the reaction progressed in samples mixed using acoustic waves. These findings underscore the critical role of uniform powder mixing in enhancing the thermodynamic quality of the WKB mixture and emphasize the importance of developing novel methods for powder and material synthesis.

Submitted 20 December, 2023; originally announced December 2023.

Comments: 29 pages, 8 figures

arXiv:2312.12585 [pdf, other]

BadRL: Sparse Targeted Backdoor Attack Against Reinforcement Learning

Authors: **g Cui, Yufei Han, Yuzhe Ma, Jianbin Jiao, Junge Zhang

Abstract: Backdoor attacks in reinforcement learning (RL) have previously employed intense attack strategies to ensure attack success. However, these methods suffer from high attack costs and increased detectability. In this work, we propose a novel approach, BadRL, which focuses on conducting highly sparse backdoor poisoning efforts during training and testing while maintaining successful attacks. Our algorithm, BadRL, strategically chooses state observations with high attack values to inject triggers during training and testing, thereby reducing the chances of detection. In contrast to the previous methods that utilize sample-agnostic trigger patterns, BadRL dynamically generates distinct trigger patterns based on targeted state observations, thereby enhancing its effectiveness. Theoretical analysis shows that the targeted backdoor attack is always viable and remains stealthy under specific assumptions. Empirical results on various classic RL tasks illustrate that BadRL can substantially degrade the performance of a victim agent with minimal poisoning efforts (0.003% of total training steps) during training and infrequent attacks during testing.

Submitted 19 December, 2023; originally announced December 2023.

Comments: Extended version of the submission accepted by AAAI 2024. It is revised by integrating review comments

arXiv:2312.12198 [pdf, other]

Mask Grounding for Referring Image Segmentation

Authors: Yong Xien Chng, Henry Zheng, Yizeng Han, Xuchong Qiu, Gao Huang

Abstract: Referring Image Segmentation (RIS) is a challenging task that requires an algorithm to segment objects referred by free-form language expressions. Despite significant progress in recent years, most state-of-the-art (SOTA) methods still suffer from a considerable language-image modality gap at the pixel and word level. These methods generally 1) rely on sentence-level language features for language-image alignment and 2) lack explicit training supervision for fine-grained visual grounding. Consequently, they exhibit weak object-level correspondence between visual and language features. Without well-grounded features, prior methods struggle to understand complex expressions that require strong reasoning over relationships among multiple objects, especially when dealing with rarely used or ambiguous clauses. To tackle this challenge, we introduce a novel Mask Grounding auxiliary task that significantly improves visual grounding within language features, by explicitly teaching the model to learn fine-grained correspondence between masked textual tokens and their matching visual objects. Mask Grounding can be directly used on prior RIS methods and consistently brings improvements. Furthermore, to holistically address the modality gap, we also design a cross-modal alignment loss and an accompanying alignment module. These additions work synergistically with Mask Grounding. With all these techniques, our comprehensive approach culminates in MagNet (Mask-grounded Network), an architecture that significantly outperforms prior art on three key benchmarks (RefCOCO, RefCOCO+ and G-Ref), demonstrating our method's effectiveness in addressing current limitations of RIS algorithms. Our code and pre-trained weights will be released.

Submitted 25 March, 2024; v1 submitted 19 December, 2023; originally announced December 2023.

Comments: Accepted by CVPR2024; Project page: https://yxchng.github.io/projects/mask-grounding

arXiv:2312.12146 [pdf, ps, other]

doi 10.1007/s00023-024-01467-6

Deviation of top eigenvalue for some tridiagonal matrices under various moment assumptions

Authors: Yi Han

Abstract: Symmetric tridiagonal matrices appear ubiquitously in mathematical physics, serving as the matrix representation of discrete random Schrödinger operators. In this work we investigate the top eigenvalue of these matrices in the large deviation regime, assuming the random potentials are on the diagonal with a certain decaying factor $N^{-α}$, and the probability law $μ$ of the potentials satisfies specific decay assumptions. We investigate two different models, one of which has random matrix behavior at the spectral edge but the other does not. Both the light-tailed regime, i.e. when $μ$ has all moments, and the heavy-tailed regime are covered. Precise right tail estimates and a crude left tail estimate are derived. In particular we show that when the tail $μ$ has a certain decay rate, then the top eigenvalue is distributed as the Fréchet law composed with some deterministic functions. The proof relies on computing one point perturbations of fixed tridiagonal matrices.

Submitted 6 July, 2024; v1 submitted 19 December, 2023; originally announced December 2023.

Comments: 17 pages. Accepted in Annales Henri Poincare

arXiv:2312.11893 [pdf, ps, other]

Maximum Principle for Control System driven by Mixed Fractional Brownian Motion

Authors: Yuhang Li, Yuecai Han

Abstract: In this paper, we study the optimal control problem for a system driven by mixed fractional Brownian motion (including a fractional Brownian motion with Hurst parameter $H>1/2$ and the underlying standard Brownian motion). By using Malliavin calculus, we obtain the necessary condition that the optimal control should satisfy. Through the martingale representation theorem and the properties of the transforms operator, we derive the adjoint backward stochastic differential equation in a natural way. As a straightforward consequence, the maximum principle for a control system driven by fractional Brownian motion and an independent Brownian motion is also deduced, which is different from the underlying case. As an application, the linear quadratic case is investigated to illustrate the main results.

Submitted 19 December, 2023; originally announced December 2023.

arXiv:2312.10962 [pdf, other]

Observation of significant flavor-SU(3) breaking in the kaon wave function at $12~{\rm GeV}^2<Q^2<25~{\rm GeV}^2$ and discovery of the charmless decay $ψ(3770)\to K_S^0K_L^0$

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, X. C. Ai, R. Aliberti, A. Amoroso, M. R. An, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H.-R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, et al. (607 additional authors not shown)

Abstract: We present cross sections for the reaction $e^+e^-\to K_S^0K_L^0$ at center-of-mass energies ranging from 3.51 GeV to 4.95 GeV using data samples collected in the BESIII experiment, corresponding to a total integrated luminosity of 26.5 fb$^{-1}$. The ratio of neutral-to-charged kaon form factors at large momentum transfers ($12~{\rm GeV}^2<Q^2<25~{\rm GeV}^2$) is determined to be $0.21\pm 0.01$, which indicates a small but significant effect of flavor-SU(3) breaking in the kaon wave function, and consequently excludes the possibility that flavor-SU(3) breaking is the primary reason for the strong experimental violation of the pQCD prediction $|F(π^{\pm})|/|F(K^{\pm})|=f^2_π/f^2_{K}$, where $F(π^{\pm})$ and $F(K^{\pm})$ are the form factors, and $f_π$ and $f_{K}$ are the decay constants of charged pions and kaons, respectively. We also observe a significant signal for the charmless decay $ψ(3770)\to K_S^0K_L^0$ for the first time. Within a $1σ$ contour of the likelihood value, the branching fraction for $ψ(3770)\to K_S^0K_L^0$ is determined to be ${\cal B}=(2.63_{-1.59}^{+1.40})\times 10^{-5}$, and the relative phase between the continuum and $ψ(3770)$ amplitudes is $φ=(-0.39_{-0.10}^{+0.05})π$. The branching fraction is in good agreement with the $\mathcal{S}$- and $\mathcal{D}$-wave charmonia mixing scheme proposed in the interpretation of the "$ρπ$ puzzle" between $J/ψ$ and $ψ(3686)$ decays.

Submitted 18 December, 2023; originally announced December 2023.

Comments: 18 pages, 56 figures

arXiv:2312.10358 [pdf, other]

CONCSS: Contrastive-based Context Comprehension for Dialogue-appropriate Prosody in Conversational Speech Synthesis

Authors: Yayue Deng, **long Xue, Yukang Jia, Qifei Li, Yichen Han, Feng** Wang, Yingming Gao, Dengfeng Ke, Ya Li

Abstract: Conversational speech synthesis (CSS) incorporates historical dialogue as supplementary information with the aim of generating speech that has dialogue-appropriate prosody. While previous methods have already delved into enhancing context comprehension, context representation still lacks effective representation capabilities and context-sensitive discriminability. In this paper, we introduce a contrastive learning-based CSS framework, CONCSS. Within this framework, we define an innovative pretext task specific to CSS that enables the model to perform self-supervised learning on unlabeled conversational datasets to boost the model's context understanding. Additionally, we introduce a sampling strategy for negative sample augmentation to enhance context vectors' discriminability. This is the first attempt to integrate contrastive learning into CSS. We conduct ablation studies on different contrastive learning strategies and comprehensive experiments in comparison with prior CSS systems. Results demonstrate that the synthesized speech from our proposed method exhibits more contextually appropriate and sensitive prosody.

Submitted 16 December, 2023; originally announced December 2023.

Comments: 5 pages, 2 figures, 3 tables, Accepted by ICASSP 2024

arXiv:2312.10112 [pdf, other]

NM-FlowGAN: Modeling sRGB Noise with a Hybrid Approach based on Normalizing Flows and Generative Adversarial Networks

Authors: Young Joo Han, Ha-** Yu

Abstract: Modeling and synthesizing real sRGB noise is crucial for various low-level vision tasks, such as building datasets for training image denoising systems. The distribution of real sRGB noise is highly complex and affected by a multitude of factors, making its accurate modeling extremely challenging. Therefore, recent studies have proposed methods that employ data-driven generative models, such as generative adversarial networks (GAN) and Normalizing Flows. These studies achieve more accurate modeling of sRGB noise compared to traditional noise modeling methods. However, there are performance limitations due to the inherent characteristics of each generative model. To address this issue, we propose NM-FlowGAN, a hybrid approach that exploits the strengths of both GAN and Normalizing Flows. We simultaneously employ a pixel-wise noise modeling network based on Normalizing Flows, and spatial correlation modeling networks based on GAN. In our experiments, our NM-FlowGAN outperforms other baselines on the sRGB noise synthesis task. Moreover, the denoising neural network, trained with synthesized image pairs from our model, also shows superior performance compared to other baselines. Our code is available at: https://github.com/YoungJooHan/NM-FlowGAN.

Submitted 14 March, 2024; v1 submitted 15 December, 2023; originally announced December 2023.

Comments: 25 pages, 11 figures, 7 tables

MSC Class: 68T45; ACM Class: I.4.4

arXiv:2312.10104 [pdf, other]

Lever LM: Configuring In-Context Sequence to Lever Large Vision Language Models

Authors: Xu Yang, Yingzhe Peng, Haoxuan Ma, Shuo Xu, Chi Zhang, Yucheng Han, Hanwang Zhang

Abstract: As Archimedes famously said, "Give me a lever long enough and a fulcrum on which to place it, and I shall move the world." In this study, we propose to use a tiny Language Model (LM), e.g., a Transformer with 67M parameters, to lever much larger Vision-Language Models (LVLMs) with 9B parameters. Specifically, we use this tiny Lever-LM to configure effective in-context demonstration (ICD) sequences to improve the In-Context Learning (ICL) performance of LVLMs. Previous studies show that diverse ICD configurations like the selection and ordering of the demonstrations heavily affect the ICL performance, highlighting the significance of configuring effective ICD sequences. Motivated by this and by re-considering the process of configuring ICD sequence, we find this is a mirror process of human sentence composition and further assume that effective ICD configurations may contain internal statistical patterns that can be captured by Lever-LM. Then a dataset with effective ICD sequences is constructed to train Lever-LM. After training, given novel queries, new ICD sequences are configured by the trained Lever-LM to solve vision-language tasks through ICL. Experiments show that these ICD sequences can improve the ICL performance of two LVLMs compared with some strong baselines in Visual Question Answering and Image Captioning, validating that Lever-LM can really capture the statistical patterns for levering LVLMs.

Submitted 6 June, 2024; v1 submitted 14 December, 2023; originally announced December 2023.

Comments: 17 pages, 6 figures

arXiv:2312.10103 [pdf, other]

GSVA: Generalized Segmentation via Multimodal Large Language Models

Authors: Zhuofan Xia, Dongchen Han, Yizeng Han, Xuran Pan, Shiji Song, Gao Huang

Abstract: Generalized Referring Expression Segmentation (GRES) extends the scope of classic RES to refer to multiple objects in one expression or identify the empty targets absent in the image. GRES poses challenges in modeling the complex spatial relationships of the instances in the image and identifying non-existing referents. Multimodal Large Language Models (MLLMs) have recently shown tremendous progress in these complicated vision-language tasks. Connecting Large Language Models (LLMs) and vision models, MLLMs are proficient in understanding contexts with visual inputs. Among them, LISA, as a representative, adopts a special [SEG] token to prompt a segmentation mask decoder, e.g., SAM, to enable MLLMs in the RES task. However, existing solutions to GRES remain unsatisfactory since current segmentation MLLMs cannot correctly handle the cases where users might reference multiple subjects in a singular prompt or provide descriptions incongruent with any image target. In this paper, we propose Generalized Segmentation Vision Assistant (GSVA) to address this gap. Specifically, GSVA reuses the [SEG] token to prompt the segmentation model towards supporting multiple mask references simultaneously and innovatively learns to generate a [REJ] token to reject the null targets explicitly. Experiments validate GSVA's efficacy in resolving the GRES issue, marking a notable enhancement and setting a new record on the GRES benchmark gRefCOCO dataset. GSVA also proves effective across various classic referring segmentation and comprehension tasks.

Submitted 21 March, 2024; v1 submitted 14 December, 2023; originally announced December 2023.

Comments: Accepted by CVPR2024 (19 pages, 9 figures, 11 tables)

arXiv:2312.09827 [pdf, other]

doi 10.1103/PhysRevC.109.054910

Identified charged-hadron production in $p$$+$Al, $^3$He$+$Au, and Cu$+$Au collisions at $\sqrt{s_{_{NN}}}=200$ GeV and in U$+$U collisions at $\sqrt{s_{_{NN}}}=193$ GeV

Authors: PHENIX Collaboration, N. J. Abdulameer, U. Acharya, A. Adare, C. Aidala, N. N. Ajitanand, Y. Akiba, R. Akimoto, J. Alexander, M. Alfred, V. Andrieux, K. Aoki, N. Apadula, H. Asano, E. T. Atomssa, T. C. Awes, B. Azmoun, V. Babintsev, M. Bai, X. Bai, N. S. Bandara, B. Bannier, K. N. Barish, S. Bathe, V. Baublis, et al. (456 additional authors not shown)

Abstract: The PHENIX experiment has performed a systematic study of identified charged-hadron ($π^\pm$, $K^\pm$, $p$, $\bar{p}$) production at midrapidity in $p$$+$Al, $^3$He$+$Au, Cu$+$Au collisions at $\sqrt{s_{_{NN}}}=200$ GeV and U$+$U collisions at $\sqrt{s_{_{NN}}}=193$ GeV. Identified charged-hadron invariant transverse-momentum ($p_T$) and transverse-mass ($m_T$) spectra are presented and interpreted in terms of radially expanding thermalized systems. The particle ratios of $K/π$ and $p/π$ have been measured in different centrality ranges of large (Cu$+$Au, U$+$U) and small ($p$$+$Al, $^3$He$+$Au) collision systems. The values of $K/π$ ratios measured in all considered collision systems were found to be consistent with those measured in $p$$+$$p$ collisions. However, the values of $p/π$ ratios measured in large collision systems reach the values of $\approx0.6$, which is $\approx2$ times larger than in $p$$+$$p$ collisions. These results can be qualitatively understood in terms of the baryon enhancement expected from hadronization by recombination. Identified charged-hadron nuclear-modification factors ($R_{AB}$) are also presented. Enhancement of proton $R_{AB}$ values over meson $R_{AB}$ values was observed in central $^3$He$+$Au, Cu$+$Au, and U$+$U collisions. The proton $R_{AB}$ values measured in the $p$$+$Al collision system were found to be consistent with $R_{AB}$ values of $φ$, $π^\pm$, $K^\pm$, and $π^0$ mesons, which may indicate that the size of the system produced in $p$$+$Al collisions is too small for recombination to cause a noticeable increase in proton production.

Submitted 22 May, 2024; v1 submitted 14 December, 2023; originally announced December 2023.

Comments: 480 authors from 78 institutions, 18 pages, 6 tables, 16 figures. v2 is version accepted for publication in Physical Review C. HEPdata tables for the points plotted in figures for this and previous PHENIX publications are (or will be) publicly available at http://www.phenix.bnl.gov/papers.html

Journal ref: Phys. Rev. C 109, 054910 (2024)

arXiv:2312.08874 [pdf, other]

Agent Attention: On the Integration of Softmax and Linear Attention

Authors: Dongchen Han, Tianzhu Ye, Yizeng Han, Zhuofan Xia, Shiji Song, Gao Huang

Abstract: The attention module is the key component in Transformers. While the global attention mechanism offers high expressiveness, its excessive computational cost restricts its applicability in various scenarios. In this paper, we propose a novel attention paradigm, Agent Attention, to strike a favorable balance between computational efficiency and representation power. Specifically, the Agent Attention, denoted as a quadruple $(Q, A, K, V)$, introduces an additional set of agent tokens $A$ into the conventional attention module. The agent tokens first act as the agent for the query tokens $Q$ to aggregate information from $K$ and $V$, and then broadcast the information back to $Q$. Given the number of agent tokens can be designed to be much smaller than the number of query tokens, the agent attention is significantly more efficient than the widely adopted Softmax attention, while preserving global context modelling capability. Interestingly, we show that the proposed agent attention is equivalent to a generalized form of linear attention. Therefore, agent attention seamlessly integrates the powerful Softmax attention and the highly efficient linear attention. Extensive experiments demonstrate the effectiveness of agent attention with various vision Transformers and across diverse vision tasks, including image classification, object detection, semantic segmentation and image generation. Notably, agent attention has shown remarkable performance in high-resolution scenarios, owing to its linear attention nature. For instance, when applied to Stable Diffusion, our agent attention accelerates generation and substantially enhances image generation quality without any additional training. Code is available at https://github.com/LeapLabTHU/Agent-Attention.
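The two-step mechanism this abstract describes (agent tokens aggregate from $K$ and $V$, then broadcast to $Q$) can be sketched in a few lines of NumPy. This is an illustrative sketch only, not the authors' implementation: the function names, the single-head layout, and the $1/\sqrt{d}$ scaling are assumptions, and any further components of the released code are omitted.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def agent_attention(Q, A, K, V):
    """Single-head sketch of the (Q, A, K, V) scheme from the abstract.

    Q: (n_q, d) query tokens
    A: (n_a, d) agent tokens, with n_a much smaller than n_q
    K: (n_k, d) key tokens
    V: (n_k, d_v) value tokens
    The 1/sqrt(d) scaling is an assumption borrowed from standard attention.
    """
    scale = 1.0 / np.sqrt(Q.shape[-1])
    # Step 1: agents act as queries and aggregate information from K and V.
    agent_values = softmax((A @ K.T) * scale) @ V      # (n_a, d_v)
    # Step 2: agents act as keys and broadcast that information to Q.
    return softmax((Q @ A.T) * scale) @ agent_values   # (n_q, d_v)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    Q = rng.normal(size=(16, 8))
    A = rng.normal(size=(4, 8))   # hypothetical choice: 4 agent tokens
    K = rng.normal(size=(16, 8))
    V = rng.normal(size=(16, 8))
    print(agent_attention(Q, A, K, V).shape)
```

Neither step forms an n_q × n_k score matrix; the two softmax maps have shapes (n_a, n_k) and (n_q, n_a), so the cost grows linearly in the number of query and key tokens when n_a is fixed.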

Submitted 22 December, 2023; v1 submitted 14 December, 2023; originally announced December 2023.

arXiv:2312.07572 [pdf, ps, other]

Search for $D^{0}\to K_{S}^{0} K^{-} e^{+}ν_{e}$, $D^{+}\to K_{S}^{0} K_{S}^{0} e^{+}ν_{e}$, and $D^{+}\to K^{+}K^{-} e^{+}ν_{e}$

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, M. R. An, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H.-R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, et al. (604 additional authors not shown)

Abstract: A search has been performed for the semileptonic decays $D^{0}\to K_{S}^{0} K^{-} e^{+}ν_{e}$, $D^{+}\to K_{S}^{0} K_{S}^{0} e^{+}ν_{e}$ and $D^{+}\to K^{+}K^{-} e^{+}ν_{e}$, using $7.9~\mathrm{fb}^{-1}$ of $e^+e^-$ annihilation data collected at the center-of-mass energy $\sqrt{s}=3.773$ GeV by the BESIII detector operating at the BEPCII collider. No significant signals are observed, and upper limits are set at the 90\% confidence level of $2.13\times10^{-5}$, $1.54\times10^{-5}$ and $2.10\times10^{-5}$ for the branching fractions of $D^{0}\to K_{S}^{0} K^{-} e^{+}ν_{e}$, $D^{+}\to K_{S}^{0} K_{S}^{0} e^{+}ν_{e}$ and $D^{+}\to K^{+}K^{-} e^{+}ν_{e}$, respectively.

Submitted 10 December, 2023; originally announced December 2023.

Comments: 10 pages, 3 figures

arXiv:2312.07158 [pdf, other]

Cost Aware Untargeted Poisoning Attack against Graph Neural Networks

Authors: Yuwei Han, Yuni Lai, Yulin Zhu, Kai Zhou

Abstract: Graph Neural Networks (GNNs) have become widely used in the field of graph mining. However, these networks are vulnerable to structural perturbations. While many research efforts have focused on analyzing vulnerability through poisoning attacks, we have identified an inefficiency in current attack losses. These losses steer the attack strategy towards modifying edges targeting misclassified nodes or resilient nodes, resulting in a waste of structural adversarial perturbation. To address this issue, we propose a novel attack loss framework called the Cost Aware Poisoning Attack (CA-attack) to improve the allocation of the attack budget by dynamically considering the classification margins of nodes. Specifically, it prioritizes nodes with smaller positive margins while postponing nodes with negative margins. Our experiments demonstrate that the proposed CA-attack significantly enhances existing attack strategies.
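The prioritization rule stated in this abstract (small positive margins first, negative margins postponed) can be illustrated with a short sketch. It is not the CA-attack loss itself, whose form is not given here. The margin definition (true-class probability minus the largest other-class probability) is a common convention assumed for illustration, and the function names are hypothetical.

```python
import numpy as np


def classification_margins(probs, labels):
    """Margin per node: true-class probability minus the best other-class probability.

    A positive margin means the node is currently classified correctly.
    """
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels)
    idx = np.arange(len(labels))
    true_p = probs[idx, labels]
    others = probs.copy()
    others[idx, labels] = -np.inf  # exclude the true class from the max
    return true_p - others.max(axis=1)


def attack_priority(margins):
    """Hypothetical ordering: positive margins first (smallest first), the rest last."""
    margins = np.asarray(margins, dtype=float)
    positive = margins > 0
    # np.lexsort treats the LAST key as the primary sort key.
    return np.lexsort((margins, ~positive))


probs = [[0.90, 0.10], [0.55, 0.45], [0.20, 0.80]]
labels = [0, 0, 0]
margins = classification_margins(probs, labels)
order = attack_priority(margins)
```

Here node 1 is correctly classified by a narrow margin and is ranked first, while node 2 is already misclassified and is ranked last, so no budget is spent on it early.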

Submitted 12 December, 2023; originally announced December 2023.

arXiv:2312.06964 [pdf, other]

Ground Calibration Result of the Lobster Eye Imager for Astronomy

Authors: Huaqing Cheng, Zhixing Ling, Chen Zhang, Xiao** Sun, Shengli Sun, Yuan Liu, Yanfeng Dai, Zhenqing Jia, Haiwu Pan, Wenxin Wang, Donghua Zhao, Yifan Chen, Zhiwei Cheng, Wei Fu, Yixiao Han, Junfei Li, Zhengda Li, Xiaohao Ma, Yulong Xue, Ailiang Yan, Qiang Zhang, Yusa Wang, Xiongtao Yang, Zijian Zhao, Weimin Yuan

Abstract: We report on results of the on-ground X-ray calibration of the Lobster Eye Imager for Astronomy (LEIA), an experimental space wide-field (18.6 × 18.6 square degrees) X-ray telescope built from novel lobster eye micro-pore optics. LEIA was successfully launched on July 27, 2022 onboard the SATech-01 satellite. To achieve full characterisation of its performance before launch, a series of tests and calibrations have been carried out at different levels of devices, assemblies and the complete module. In this paper, we present the results of the end-to-end calibration campaign of the complete module carried out at the 100-m X-ray Test Facility at IHEP. The PSF, effective area and energy response of the detectors were measured in a wide range of incident directions at several X-ray line energies. The distributions of the PSF and effective areas are roughly uniform across the FoV, largely in agreement with the prediction of lobster-eye optics. The mild variations and deviations from the prediction of idealized lobster-eye optics can be understood to be caused by the imperfect shapes and alignment of the micro-pores as well as the obscuration by the supporting frames, which can be well reproduced by MC simulations. The spatial resolution of LEIA, defined by the FWHM of the focal spot, ranges from 4 to 8 arcmin with a median of 5.7 arcmin. The measured effective areas are in the range of 2-3 $\mathrm{cm}^2$ at ~1.25 keV across the entire FoV, and their dependence on photon energy is largely in agreement with simulations. The gains of the CMOS sensors are in the range of 6.5-6.9 eV/DN, and the energy resolutions are in the range of ~120-140 eV at 1.25 keV and ~170-190 eV at 4.5 keV. These results have been ingested into the calibration database and applied to the analysis of the scientific data acquired by LEIA. This work paves the way for the calibration of the Wide-field X-Ray Telescope modules of the Einstein Probe mission.

Submitted 11 December, 2023; originally announced December 2023.

Comments: 24 pages, 13 figures. Submitted to Experimental Astronomy

arXiv:2312.05324 [pdf, other]

doi 10.1103/PhysRevLett.132.181901

Determination of spin-parity quantum numbers of X(2370) as $0^{-+}$ from $J/ψ\rightarrowγK^{0}_{S}K^{0}_{S}η^{\prime}$

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, X. C. Ai, R. Aliberti, A. Amoroso, M. R. An, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, et al. (605 additional authors not shown)

Abstract: Based on $(10087\pm44)\times10^{6}$ $J/ψ$ events collected with the BESIII detector, a partial wave analysis of the decay $J/ψ\rightarrowγK^{0}_{S}K^{0}_{S}η^{\prime}$ is performed. The mass and width of the $X(2370)$ are measured to be $2395 \pm 11 ({\rm stat})^{+26}_{-94}({\rm syst})\ \mathrm{MeV}/c^{2}$ and $188^{+18}_{-17}({\rm stat})^{+124}_{-33}({\rm syst})~\mathrm{MeV}$, respectively. The corresponding product branching fraction is $\mathcal{B}[J/ψ\rightarrowγX(2370)] \times \mathcal{B}[X(2370) \rightarrow f_{0}(980)η^{\prime}] \times \mathcal{B}[f_{0}(980) \rightarrow K^{0}_{S}K^{0}_{S}] = \left( 1.31 \pm 0.22 ({\rm stat})^{+2.85}_{-0.84}({\rm syst}) \right) \times 10^{-5}$. The statistical significance of the $X(2370)$ is greater than $11.7σ$ and the spin-parity is determined to be $0^{-+}$ for the first time. The measured mass and spin-parity of the $X(2370)$ are consistent with the predictions of the lightest pseudoscalar glueball.

Submitted 6 May, 2024; v1 submitted 8 December, 2023; originally announced December 2023.

Comments: 8 pages, 2 figures

Journal ref: Phys. Rev. Lett. 132 (2024) 181901

arXiv:2312.03442 [pdf, other]

High-Quality Facial Geometry and Appearance Capture at Home

Authors: Yuxuan Han, Junfeng Lyu, Feng Xu

Abstract: Facial geometry and appearance capture have demonstrated tremendous success in 3D scanning real humans in studios. Recent works propose to democratize this technique while keeping the results high quality. However, they are still inconvenient for daily usage. In addition, they focus on an easier problem of only capturing facial skin. This paper proposes a novel method for high-quality face capture, featuring an easy-to-use system and the capability to model the complete face with skin, mouth interior, hair, and eyes. We reconstruct facial geometry and appearance from a single co-located smartphone flashlight sequence captured in a dim room where the flashlight is the dominant light source (e.g. rooms with curtains or at night). To model the complete face, we propose a novel hybrid representation to effectively model both eyes and other facial regions, along with novel techniques to learn it from images. We apply a combined lighting model to compactly represent real illuminations and exploit a morphable face albedo model as a reflectance prior to disentangle diffuse and specular. Experiments show that our method can capture high-quality 3D relightable scans.

Submitted 6 December, 2023; originally announced December 2023.

Comments: Project page: https://yxuhan.github.io/CoRA/index.html ; Github repo: https://github.com/yxuhan/CoRA

arXiv:2312.03325 [pdf, other]

FAGC: Feature Augmentation on Geodesic Curve in the Pre-Shape Space

Authors: Yuexing Han, Guanxin Wan, Bing Wang

Abstract: Deep learning has yielded remarkable outcomes in various domains. However, the challenge of requiring large-scale labeled samples still persists in deep learning. Thus, data augmentation has been introduced as a critical strategy to train deep learning models. However, data augmentation suffers from information loss and poor performance in small sample environments. To overcome these drawbacks, we propose a feature augmentation method based on shape space theory, i.e., feature augmentation on Geodesic curve, called FAGC for brevity. First, we extract features from the image with the neural network model. Then, the multiple image features are projected into a pre-shape space as features. In the pre-shape space, a Geodesic curve is built to fit the features. Finally, the many generated features on the Geodesic curve are used to train the various machine learning models. The FAGC module can be seamlessly integrated with most machine learning methods. The proposed method is simple, effective and insensitive to small sample datasets. Several examples demonstrate that the FAGC method can greatly improve the performance of the data preprocessing model in a small sample environment.
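As a rough illustration of the pipeline in this abstract, the sketch below projects feature vectors into a pre-shape space and samples new features along a geodesic. It rests on several assumptions not stated in the abstract: each feature is treated as a flat vector, the pre-shape projection is taken to be centering followed by normalization to unit norm (the usual construction in shape theory), and the geodesic is drawn between just two features as a great-circle arc rather than fitted to many features as the abstract describes. All function names are hypothetical.

```python
import numpy as np


def to_preshape(x):
    """Project a feature vector to the pre-shape space: remove mean, scale to unit norm."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean()
    norm = np.linalg.norm(centered)
    if norm == 0.0:
        raise ValueError("constant feature vector has no pre-shape")
    return centered / norm


def geodesic_point(p, q, t):
    """Point at parameter t in [0, 1] on the great-circle arc from p to q."""
    theta = np.arccos(np.clip(np.dot(p, q), -1.0, 1.0))
    if theta < 1e-12:
        return p.copy()  # p and q coincide
    if np.pi - theta < 1e-12:
        raise ValueError("antipodal points: the geodesic is not unique")
    return (np.sin((1.0 - t) * theta) * p + np.sin(t * theta) * q) / np.sin(theta)


rng = np.random.default_rng(0)
p = to_preshape(rng.standard_normal(32))
q = to_preshape(rng.standard_normal(32))
# Generate augmented features along the geodesic between the two pre-shapes.
augmented = np.stack([geodesic_point(p, q, t) for t in np.linspace(0.0, 1.0, 5)])
```

Each generated row stays zero-mean and unit-norm, so the augmented features remain in the same pre-shape space as the two inputs.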

Submitted 25 December, 2023; v1 submitted 6 December, 2023; originally announced December 2023.

arXiv:2312.01410 [pdf, other]

doi 10.1103/PhysRevD.109.103531

Coupled Dark Sector Models and Cosmological Tensions

Authors: Gang Liu, Jiaze Gao, Yufen Han, Yuhao Mu, Lixin Xu

Abstract: In this paper, we introduce two coupling models of early dark energy (EDE) and cold dark matter aimed at alleviating cosmological tensions. We utilize the EDE component in the coupling models to relieve the Hubble tension, while leveraging the interaction between dark matter and dark energy to alleviate the large-scale structure tension. The interaction is implemented in the form of pure momentum coupling and Yukawa coupling. We employed various cosmological datasets, including cosmic microwave background radiation, baryon acoustic oscillations, Type Ia supernovae, the local distance-ladder data (SH0ES), and the Dark Energy Survey Year-3 data, to analyze our models. We first exclude SH0ES data from the entire dataset to constrain the parameters of novel models. We observe that the constraints on $H_0$ from the two coupling models are slightly higher than that from the $Λ$CDM model, but they exhibit a significant inconsistency with the SH0ES data, consistent with prior research findings in the EDE model. Subsequently, we incorporate SH0ES data to re-constrain the parameters of various models; our findings reveal that both coupling models yield best-fit values for $H_0$ around $72.23$ km/s/Mpc, effectively mitigating the Hubble tension. Similar to the EDE model, the coupling models yield $S_8$ values that still surpass the result of the $Λ$CDM model. Nevertheless, the best-fit values for $S_8$ obtained with the two new models are 0.8192 and 0.8177, respectively, which are lower than the 0.8316 achieved by the EDE model. Consequently, although our coupling models fail to fully resolve the large-scale structure tension, they partially mitigate the adverse effect of the original EDE model.

Submitted 23 April, 2024; v1 submitted 3 December, 2023; originally announced December 2023.

Comments: 14 pages, 9 figures. In this replacement, we have amalgamated the original content of this manuscript with that of a previous paper [arXiv:2310.09798]. arXiv admin note: substantial text overlap with arXiv:2310.09798

Journal ref: Phys. Rev. D 109, 103531 (2024)

arXiv:2312.00849 [pdf, other]

RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback

Authors: Tianyu Yu, Yuan Yao, Haoye Zhang, Taiwen He, Yifeng Han, Ganqu Cui, **yi Hu, Zhiyuan Liu, Hai-Tao Zheng, Maosong Sun, Tat-Seng Chua

Abstract: Multimodal Large Language Models (MLLMs) have recently demonstrated impressive capabilities in multimodal understanding, reasoning, and interaction. However, existing MLLMs prevalently suffer from serious hallucination problems, generating text that is not factually grounded in associated images. The problem makes existing MLLMs untrustworthy and thus impractical in real-world (especially high-stakes) applications. To address the challenge, we present RLHF-V, which enhances MLLM trustworthiness via behavior alignment from fine-grained correctional human feedback. Specifically, RLHF-V collects human preference in the form of segment-level corrections on hallucinations, and performs dense direct preference optimization over the human feedback. Comprehensive experiments on five benchmarks in both automatic and human evaluation show that RLHF-V can enable substantially more trustworthy MLLM behaviors with promising data and computation efficiency. Remarkably, using 1.4k annotated data samples, RLHF-V significantly reduces the hallucination rate of the base MLLM by 34.8%, outperforming the concurrent LLaVA-RLHF trained on 10k annotated data. The final model achieves state-of-the-art performance in trustworthiness among open-source MLLMs, and shows better robustness than GPT-4V in preventing hallucinations arising from over-generalization. We open-source our code, model, and data at https://github.com/RLHF-V/RLHF-V.

Submitted 8 March, 2024; v1 submitted 1 December, 2023; originally announced December 2023.

Comments: Accepted by CVPR 2024

arXiv:2312.00485 [pdf, other]

Backbone-based Dynamic Graph Spatio-Temporal Network for Epidemic Forecasting

Authors: Junkai Mao, Yuexing Han, Gouhei Tanaka, Bing Wang

Abstract: Accurate epidemic forecasting is a critical task in controlling disease transmission. Many deep learning-based models focus only on static or dynamic graphs when constructing spatial information, ignoring their relationship. Additionally, these models often rely on recurrent structures, which can lead to error accumulation and computational time consumption. To address the aforementioned problems, we propose a novel model called Backbone-based Dynamic Graph Spatio-Temporal Network (BDGSTN). Intuitively, the continuous and smooth changes in graph structure make adjacent graph structures share a basic pattern. To capture this property, we use adaptive methods to generate static backbone graphs containing the primary information and temporal models to generate dynamic temporal graphs of epidemic data, fusing them to generate a backbone-based dynamic graph. To overcome potential limitations associated with recurrent structures, we introduce a linear model DLinear to handle temporal dependencies and combine it with dynamic graph convolution for epidemic forecasting. Extensive experiments on two datasets demonstrate that BDGSTN outperforms baseline models and ablation comparison further verifies the effectiveness of model components. Furthermore, we analyze and measure the significance of backbone and temporal graphs by using information metrics from different aspects. Finally, we compare model parameter volume and training time to confirm the superior complexity and efficiency of BDGSTN.

Submitted 1 December, 2023; originally announced December 2023.

arXiv:2311.18315 [pdf, ps, other]

Cylindrical Symplectic Representation and Global Regular Solution of Incompressible Navier-Stokes Equations in $\mathbb{R}^3$

Authors: Yongqian Han

Abstract: The existence and uniqueness of global regular solution of incompressible Navier-Stokes equations in $\mathbb{R}^3$ are derived provided the initial velocity vector field holds a special structure.

Submitted 30 November, 2023; originally announced November 2023.

Comments: 28 pages. arXiv admin note: text overlap with arXiv:2305.13737

MSC Class: 35Q30; 76D05; 76F02; 37L20

arXiv:2311.18308 [pdf, ps, other]

Symplectic Representation and Turbulent Global Solutions of Incompressible Navier-Stokes Equations in $\mathbb{R}^3$

Authors: Yongqian Han

Abstract: The incompressible Navier-Stokes equations are considered. We find that there exist infinitely many non-trivial solutions of static Euler equations. Moreover, there exist random solutions of static Euler equations. Provided the Reynolds number is large enough and the time variable $t$ goes to infinity, these random solutions of static Euler equations are the path limits of corresponding Navier-Stokes flows. But the double limit of these Navier-Stokes flows does not exist. Therefore these solutions are called turbulent solutions.

Submitted 30 November, 2023; originally announced November 2023.

Comments: 8 pages

MSC Class: 35Q30; 76D05; 76F02; 37L20

arXiv:2311.18272 [pdf, ps, other]

Topological Directional Coupler

Authors: Yandong Li, Minwoo Jung, Yang Yu, Yuchen Han, Baile Zhang, Gennady Shvets

Abstract: Interferometers and beam splitters are fundamental building blocks for photonic neuromorphic and quantum computing machinery. In waveguide-based photonic integrated circuits, beam-splitting is achieved with directional couplers that rely on transition regions where the waveguides are adiabatically bent to suppress back-reflection. We present a novel, compact approach to introducing guided mode coupling. By leveraging multimodal domain walls between microwave topological photonic crystals, we use the photonic-spin-conservation to suppress back-reflection while relaxing the topological protection of the valley degree of freedom to implement tunable beam splitting. Rapid advancements in chip-scale topological photonics suggest that the proposed simultaneous utilization of multiple topological degrees of freedom could benefit the development of novel photonic computing platforms.

Submitted 6 June, 2024; v1 submitted 30 November, 2023; originally announced November 2023.

Comments: This manuscript has been accepted by Laser & Photonics Reviews for publication

arXiv:2311.17131 [pdf, other]

doi 10.1103/PhysRevD.109.072010

Measurement of Branching Fractions for $Λ_{c}^{+} \rightarrow n K_{S}^{0} π^{+}$ and $Λ_{c}^{+} \rightarrow n K_{S}^{0} K^{+}$

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, M. R. An, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, et al. (603 additional authors not shown)

Abstract: Based on 4.5 fb$^{-1}$ of $e^{+}e^{-}$ collision data accumulated at center-of-mass energies between $4.600\,\mathrm{GeV}$ and $4.699\,\mathrm{GeV}$ with the BESIII detector, we measure the absolute branching fraction of the Cabibbo-favored decay $Λ_{c}^{+} \rightarrow n K_{S}^{0} π^{+}$ with the precision improved by a factor of 2.8 and report the first evidence for the singly-Cabibbo-suppressed decay $Λ_{c}^{+} \rightarrow n K_{S}^{0} K^{+}$. The branching fractions for $Λ_{c}^{+} \rightarrow n K_{S}^{0} π^{+}$ and $Λ_{c}^{+} \rightarrow n K_{S}^{0} K^{+}$ are determined to be $(1.86\pm0.08\pm0.04)\times10^{-2}$ and $\left(4.3^{+1.9}_{-1.5}\pm0.3\right)\times10^{-4}$, respectively, where the first uncertainties are statistical and the second ones are systematic.

Submitted 28 November, 2023; originally announced November 2023.

Comments: 12 pages, 2 figures

Journal ref: Phys. Rev. D 109, 072010 (2024)

arXiv:2311.16763 [pdf]

doi 10.1007/s11433-023-2329-x

Structural transition, electric transport, and electronic structures in the compressed trilayer nickelate La4Ni3O10

Authors: **gyuan Li, Cui-Qun Chen, Chaoxin Huang, Yifeng Han, Mengwu Huo, Xing Huang, Peiyue Ma, Zhengyang Qiu, Junfeng Chen, Xunwu Hu, Lan Chen, Tao Xie, Bing Shen, Hualei Sun, Dao-Xin Yao, Meng Wang

Abstract: Atomic structure and electronic band structure are fundamental properties for understanding the mechanism of superconductivity. Motivated by the discovery of pressure-induced high-temperature superconductivity at 80 K in the bilayer Ruddlesden-Popper nickelate La3Ni2O7, the atomic structure and electronic band structure of the trilayer nickelate La4Ni3O10 under pressure up to 44.3 GPa are investigated. A structural transition from the monoclinic P21/a space group to the tetragonal I4/mmm around 12.6-13.4 GPa is identified, accompanied by a drop of resistance below 7 K. Density functional theory calculations suggest that the bonding state of Ni 3dz2 orbital rises and crosses the Fermi level at high pressures, which may give rise to possible superconductivity observed in resistance under pressure in La4Ni3O10. The trilayer nickelate La4Ni3O10 shows some similarities with the bilayer La3Ni2O7 and has unique properties, providing a new platform to investigate the underlying mechanism of superconductivity in nickelates.

Submitted 30 March, 2024; v1 submitted 28 November, 2023; originally announced November 2023.

Comments: 19 pages, 4 figures

Journal ref: SCIENCE CHINA Physics, Mechanics & Astronomy 67.11(2024):117403

arXiv:2311.16483 [pdf, other]

ChartLlama: A Multimodal LLM for Chart Understanding and Generation

Authors: Yucheng Han, Chi Zhang, Xin Chen, Xu Yang, Zhibin Wang, Gang Yu, Bin Fu, Hanwang Zhang

Abstract: Multi-modal large language models have demonstrated impressive performances on most vision-language tasks. However, the model generally lacks the understanding capabilities for specific domain data, particularly when it comes to interpreting chart figures. This is mainly due to the lack of relevant multi-modal instruction tuning datasets. In this article, we create a high-quality instruction-tuning dataset leveraging GPT-4. We develop a multi-step data generation process in which different steps are responsible for generating tabular data, creating chart figures, and designing instruction tuning data separately. Our method's flexibility enables us to generate diverse, high-quality instruction-tuning data consistently and efficiently while maintaining a low resource expenditure. Additionally, it allows us to incorporate a wider variety of chart and task types not yet featured in existing datasets. Next, we introduce ChartLlama, a multi-modal large language model that we've trained using our created dataset. ChartLlama outperforms all prior methods in ChartQA, Chart-to-text, and Chart-extraction evaluation benchmarks. Additionally, ChartLlama significantly improves upon the baseline in our specially compiled chart dataset, which includes new chart and task types. The results of ChartLlama confirm the value and huge potential of our proposed data generation method in enhancing chart comprehension.

Submitted 27 November, 2023; originally announced November 2023.

Comments: Code and model on https://tingxueronghua.github.io/ChartLlama/

arXiv:2311.15948 [pdf, other]

A First Look with JWST Aperture Masking Interferometry (AMI): Resolving Circumstellar Dust around the Wolf-Rayet Binary WR 137 beyond the Rayleigh Limit

Authors: Ryan M. Lau, Matthew J. Hankins, Joel Sanchez-Bermudez, Deepashri Thatte, Anthony Soulain, Rachel A. Cooper, Anand Sivaramakrishnan, Michael F. Corcoran, Alexandra Z. Greenbaum, Theodore R. Gull, Yinuo Han, Olivia C. Jones, Thomas Madura, Anthony F. J. Moffat, Mark R. Morris, Takashi Onaka, Christopher M. P. Russell, Noel D. Richardson, Nathan Smith, Peter Tuthill, Kevin Volk, Gerd Weigelt, Peredur M. Williams

Abstract: We present infrared aperture masking interferometry (AMI) observations of newly formed dust from the colliding winds of the massive binary system Wolf-Rayet (WR) 137 with JWST using the Near Infrared Imager and Slitless Spectrograph (NIRISS). NIRISS AMI observations of WR 137 and a point-spread-function calibrator star, HD 228337, were taken using the F380M and F480M filters in 2022 July and August as part of the Director's Discretionary Early Release Science (DD-ERS) program 1349. Interferometric observables (squared visibilities and closure phases) from the WR 137 "interferogram" were extracted and calibrated using three independent software tools: ImPlaneIA, AMICAL, and SAMpip. The analysis of the calibrated observables yielded consistent values except for slightly discrepant closure phases measured by ImPlaneIA. Based on all three sets of calibrated observables, images were reconstructed using three independent software tools: BSMEM, IRBis, and SQUEEZE. All reconstructed image combinations generated consistent images in both F380M and F480M filters. The reconstructed images of WR 137 reveal a bright central core with a $\sim300$ mas linear filament extending to the northwest. A geometric colliding-wind model with dust production constrained to the orbital plane of the binary system and enhanced as the system approaches periapsis provided a general agreement with the interferometric observables and reconstructed images. Based on a colliding-wind dust condensation analysis, we suggest that dust formation within the orbital plane of WR 137 is induced by enhanced equatorial mass-loss from the rapidly rotating O9 companion star, whose axis of rotation is aligned with that of the orbit.

Submitted 22 December, 2023; v1 submitted 27 November, 2023; originally announced November 2023.

Comments: 18 pages, 8 figures, Accepted for publication in ApJ. Updated plotting error in Fig. 2

arXiv:2311.15835 [pdf, other]

Surface skyrmions and dual topological Hall effect in antiferromagnetic topological insulator EuCd$_2$As$_2$

Authors: Min Wu, R. Yang, Xiangde Zhu, Yixiong Ren, Ang Qian, Yongjie Xie, Changming Yue, Yong Nie, Xiang Yuan, Ning Wang, Daifeng Tu, Ding Li, Yuyan Han, Zhaosheng Wang, Yaomin Dai, Guolin Zheng, Jianhui Zhou, Wei Ning, Xianggang Qiu, Mingliang Tian

Abstract: In this work, we synthesized single crystals of EuCd$_2$As$_2$, which exhibits A-type antiferromagnetic (AFM) order with in-plane spin orientation below $T_N$ = 9.5 K. Optical spectroscopy and transport measurements suggest its topological insulator (TI) nature with an insulating gap around 0.1 eV. Remarkably, a dual topological Hall resistivity that exhibits the same magnitude but opposite signs in the positive to negative and negative to positive magnetic field hysteresis branches emerges below 20 K. With magnetic force microscopy (MFM) images and numerical simulations, we attribute the dual topological Hall effect to the Néel-type skyrmions stabilized by the interactions between topological surface states and magnetism, and the sign reversal in different hysteresis branches indicates potential coexistence of skyrmions and antiskyrmions. Our work uncovers a unique two-dimensional (2D) magnetism on the surface of intrinsic AFM TI, providing a promising platform for novel topological quantum states and AFM spintronic applications.

Submitted 27 November, 2023; originally announced November 2023.

Comments: 7 pages, 3 figures

arXiv:2311.15278 [pdf, ps, other]

Ancient mean curvature flows from minimal hypersurfaces

Authors: Yongheng Han

Abstract: For $n\geq 2$, we construct an $I$-dimensional family of embedded ancient solutions to mean curvature flow arising from an unstable minimal hypersurface $Σ$ with finite total curvature in $\mathbb{R}^{n+1}$, where $I$ is the Morse index of the Jacobi operator on $Σ$.

Submitted 27 May, 2024; v1 submitted 26 November, 2023; originally announced November 2023.

Comments: 15 pages

MSC Class: 53E10

arXiv:2311.15185 [pdf, ps, other]

Contractibility of the solution sets for set optimization problems

Authors: Bin Chen, Yu Han

Abstract: This paper investigates the contractibility of the solution sets for set optimization problems by utilizing strictly quasi cone-convexlikeness, which is an assumption weaker than strictly cone-convexity, strictly cone-quasiconvexity and strictly naturally quasi cone-convexity. We establish the contractibility of l-minimal, l-weak minimal, u-minimal and u-weak minimal solution sets for set optimization problems by using star-shaped sets and nonlinear scalarizing functions for sets. Moreover, we also discuss the arcwise connectedness and the contractibility of p-minimal and p-weak minimal solution sets for set optimization problems by using the scalarization technique. Finally, our main results are applied to the contractibility of the solution sets for vector optimization problems.

Submitted 25 November, 2023; originally announced November 2023.

Showing 201–250 of 2,005 results for author: han, y