Search | arXiv e-print repository

doi 10.1016/j.ijpt.2024.100020

The status and challenges for prostate SBRT treatments in United States proton therapy centers: An NRG Oncology practice survey

Authors: Jiajian Shen, Paige A. Taylor, Carlos E. Vargas, Minglei Kang, Jatinder Saini, Jun Zhou, Peilong Wang, Wei Liu, Charles B. Simone II, Ying Xiao, Liyong Lin

Abstract: A survey was designed to inquire about the practice of proton SBRT treatment for prostate cancer. The survey was distributed to all 30 proton therapy centers in the United States that participate in the National Clinical Trial Network in Feb. 2023. The survey focused on usage, patient selection criteria, prescriptions, target contours, dose constraints, treatment plan optimization and evaluation methods, patient-specific QA, and IGRT methods. Results: We received responses from 25 centers (83% participation). Only 8 respondent proton centers (32%) reported performing SBRT of the prostate. The remaining 17 centers cited three primary reasons for not offering this treatment: no clinical need, lack of volumetric imaging, and/or lack of clinical evidence. Only 1 center cited the reduction in overall reimbursement as a concern for not offering prostate SBRT. Several common practices among the 8 centers offering SBRT for the prostate were noted, such as using Hydrogel spacers, fiducial markers, and MRI for target delineation. Most proton centers (87.5%) utilized pencil beam scanning (PBS) delivery and completed Imaging and Radiation Oncology Core (IROC) phantom credentialing. Treatment planning typically used parallel opposed lateral beams, and consistent parameters for setup and range uncertainties were used for plan optimization and robustness evaluation. Measurements-based patient-specific QA, beam delivery every other day, fiducial contours for IGRT, and total doses of 35-40 GyRBE were consistent across all centers. However, there was no consensus on the risk levels for patient selection. Conclusion: Prostate SBRT is used in about 1/3 of proton centers in the US. There was a significant consistency in practices among proton centers treating with proton SBRT. It is possible that the adoption of proton SBRT may become more common if proton SBRT is more commonly offered in clinical trials.

Submitted 27 February, 2024; originally announced February 2024.

arXiv:2402.17229 [pdf, other]

Preserving Fairness Generalization in Deepfake Detection

Authors: Li Lin, Xinan He, Yan Ju, Xin Wang, Feng Ding, Shu Hu

Abstract: Although effective deepfake detection models have been developed in recent years, recent studies have revealed that these models can result in unfair performance disparities among demographic groups, such as race and gender. This can lead to particular groups facing unfair targeting or exclusion from detection, potentially allowing misclassified deepfakes to manipulate public opinion and undermine trust in the model. The existing method for addressing this problem is providing a fair loss function. It shows good fairness performance for intra-domain evaluation but does not maintain fairness for cross-domain testing. This highlights the significance of fairness generalization in the fight against deepfakes. In this work, we propose the first method to address the fairness generalization problem in deepfake detection by simultaneously considering features, loss, and optimization aspects. Our method employs disentanglement learning to extract demographic and domain-agnostic forgery features, fusing them to encourage fair learning across a flattened loss landscape. Extensive experiments on prominent deepfake datasets demonstrate our method's effectiveness, surpassing state-of-the-art approaches in preserving fairness during cross-domain deepfake detection. The code is available at https://github.com/Purdue-M2/Fairness-Generalization

Submitted 27 February, 2024; originally announced February 2024.

Comments: Accepted by The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2024)

arXiv:2402.16718 [pdf]

An Overview of the Development of Stereotactic Body Radiation Therapy

Authors: Yanqi Zong, Zhengrong Cui, Luqi Lin, Sihao Wang, Yizhi Chen

Abstract: Stereotactic body radiation therapy (SBRT) refers to focusing high-energy rays in three-dimensional space on the tumor lesion area, reducing the dose received by surrounding normal tissues, which can effectively improve the local control rate of the tumor and reduce the probability of complications. With the comprehensive development of medical imaging, radiation biology and other disciplines, this less-fractional, high-dose radiotherapy method has been increasingly developed and applied in clinical practice. The background, radio-biological basis, key technologies and main equipment of SBRT are discussed, and its future development direction is prospected.

Submitted 26 February, 2024; originally announced February 2024.

arXiv:2402.13538 [pdf, other]

The Staggered Mesh Method: Accurate Exact Exchange towards the Thermodynamic Limit for Solids

Authors: Stephen Jon Quiton, Hamlin Wu, Xin Xing, Lin Lin, Martin Head-Gordon

Abstract: In periodic systems, the Hartree-Fock (HF) exchange energy exhibits the slowest convergence of all HF energy components as the system size approaches the thermodynamic limit. We demonstrate that the recently proposed staggered mesh method for Fock exchange energy [Xing, Li, and Lin, Math. Comp., 2024], which is specifically designed to sidestep certain singularities in exchange energy evaluation, can expedite the finite-size convergence rate for the exact exchange energy across a range of insulators and semiconductors when compared to the regular and truncated Coulomb methods. This remains true even for two computationally cheaper versions of this new method, which we call Non-SCF and Split-SCF staggered mesh. Additionally, a sequence of numerical tests on simple solids showcases the staggered mesh method's ability to improve convergence towards the thermodynamic limit for band gaps, bulk moduli, equilibrium lattice dimensions, energies, and phonon force constants.

Submitted 21 February, 2024; originally announced February 2024.

Comments: 31 pages, 12 figures, submitted to the Journal of Chemical Theory and Computation

arXiv:2402.12193 [pdf, other]

A Chinese Dataset for Evaluating the Safeguards in Large Language Models

Authors: Yuxia Wang, Zenan Zhai, Haonan Li, Xudong Han, Lizhi Lin, Zhenxuan Zhang, Jingru Zhao, Preslav Nakov, Timothy Baldwin

Abstract: Many studies have demonstrated that large language models (LLMs) can produce harmful responses, exposing users to unexpected risks when LLMs are deployed. Previous studies have proposed comprehensive taxonomies of the risks posed by LLMs, as well as corresponding prompts that can be used to examine the safety mechanisms of LLMs. However, the focus has been almost exclusively on English, and little has been explored for other languages. Here we aim to bridge this gap. We first introduce a dataset for the safety evaluation of Chinese LLMs, and then extend it to two other scenarios that can be used to better identify false negative and false positive examples in terms of risky prompt rejections. We further present a set of fine-grained safety assessment criteria for each risk type, facilitating both manual annotation and automatic evaluation in terms of LLM response harmfulness. Our experiments on five LLMs show that region-specific risks are the prevalent type of risk, presenting the major issue with all Chinese LLMs we experimented with. Our data is available at https://github.com/Libr-AI/do-not-answer. Warning: this paper contains example data that may be offensive, harmful, or biased.

Submitted 26 May, 2024; v1 submitted 19 February, 2024; originally announced February 2024.

Comments: 14 pages

arXiv:2402.11262 [pdf, other]

Mirror Gradient: Towards Robust Multimodal Recommender Systems via Exploring Flat Local Minima

Authors: Shanshan Zhong, Zhongzhan Huang, Daifeng Li, Wushao Wen, Jinghui Qin, Liang Lin

Abstract: Multimodal recommender systems utilize various types of information to model user preferences and item features, helping users discover items aligned with their interests. The integration of multimodal information mitigates the inherent challenges in recommender systems, e.g., the data sparsity problem and cold-start issues. However, it simultaneously magnifies certain risks from multimodal information inputs, such as information adjustment risk and inherent noise risk. These risks pose crucial challenges to the robustness of recommendation models. In this paper, we analyze multimodal recommender systems from the novel perspective of flat local minima and propose a concise yet effective gradient strategy called Mirror Gradient (MG). This strategy can implicitly enhance the model's robustness during the optimization process, mitigating instability risks arising from multimodal information inputs. We also provide strong theoretical evidence and conduct extensive empirical experiments to show the superiority of MG across various multimodal recommendation models and benchmarks. Furthermore, we find that the proposed MG can complement existing robust training methods and be easily extended to diverse advanced recommendation models, making it a promising new and fundamental paradigm for training multimodal recommender systems. The code is released at https://github.com/Qrange-group/Mirror-Gradient.

Submitted 17 February, 2024; originally announced February 2024.

Comments: Accepted by WWW'24

arXiv:2402.11205 [pdf, other]

An Efficient Quantum Circuit for Block Encoding a Pairing Hamiltonian

Authors: Diyi Liu, Weijie Du, Lin Lin, James P. Vary, Chao Yang

Abstract: We present an efficient quantum circuit for block encoding pairing Hamiltonian often studied in nuclear physics. Our block encoding scheme does not require mapping the creation and annihilation operators to the Pauli operators and representing the Hamiltonian as a linear combination of unitaries. Instead, we show how to encode the Hamiltonian directly using controlled swap operations. We analyze the gate complexity of the block encoding circuit and show that it scales polynomially with respect to the number of qubits required to represent a quantum state associated with the pairing Hamiltonian. We also show how the block encoding circuit can be combined with the quantum singular value transformation to construct an efficient quantum circuit for approximating the density of states of a pairing Hamiltonian. The techniques presented can be extended to encode more general second-quantized Hamiltonians.

Submitted 21 February, 2024; v1 submitted 17 February, 2024; originally announced February 2024.

Comments: 27 pages, 18 figures

MSC Class: 68Q12; 81P68

arXiv:2402.09508 [pdf, other]

Arrange, Inpaint, and Refine: Steerable Long-term Music Audio Generation and Editing via Content-based Controls

Authors: Liwei Lin, Gus Xia, Yixiao Zhang, Junyan Jiang

Abstract: Controllable music generation plays a vital role in human-AI music co-creation. While Large Language Models (LLMs) have shown promise in generating high-quality music, their focus on autoregressive generation limits their utility in music editing tasks. To address this gap, we propose a novel approach leveraging a parameter-efficient heterogeneous adapter combined with a masking training scheme. This approach enables autoregressive language models to seamlessly address music inpainting tasks. Additionally, our method integrates frame-level content-based controls, facilitating track-conditioned music refinement and score-conditioned music arrangement. We apply this method to fine-tune MusicGen, a leading autoregressive music generation model. Our experiments demonstrate promising results across multiple music editing tasks, offering more flexible controls for future AI-driven music editing tools. The source codes and a demo page showcasing our work are available at https://kikyo-16.github.io/AIR.

Submitted 10 June, 2024; v1 submitted 14 February, 2024; originally announced February 2024.

arXiv:2402.07400 [pdf, other]

The ALMaQUEST Survey XIII: Understanding radial trends in star formation quenching via the relative roles of gas availability and star formation efficiency

Authors: Hsi-An Pan, Lihwai Lin, Sara L. Ellison, Mallory D. Thorp, Sebastian F. Sanchez, Asa F. L. Bluck, Francesco Belfiore, Joanna M. Piotrowska, Jillian M. Scudder, William M. Baker

Abstract: Star formation quenching is one of the key processes that shape the evolution of galaxies. In this study, we investigate the changes in molecular gas and star formation properties as galaxies transit from the star-forming main sequence to the passive regime. Our analysis reveals that as galaxies move away from the main sequence towards the green valley the radial profile of specific star formation rate surface density ($Σ_\mathrm{sSFR}$) is suppressed compared with main sequence galaxies out to a galactocentric radius of 1.5 $R_{e}$ ($\sim$ 7 kpc for our sample). By combining radial profiles of gas fraction ($f_\mathrm{gas}$) and star formation efficiency (SFE), we can discern the underlying mechanism that determines $Σ_\mathrm{sSFR}$ at different galactocentric radii. Analysis of relative contributions of $f_\mathrm{gas}$ and SFE to $Σ_\mathrm{sSFR}$ uncovers a diverse range of quenching modes. Star formation in approximately half of our quenching galaxies is primarily driven by a single mode (i.e. either $f_\mathrm{gas}$ or SFE), or a combination of both. A collective analysis of all galaxies reveals that the reduction in star formation within the central regions ($R$ $<$ 0.5 $R_{e}$) is primarily attributable to a decrease in SFE. Conversely, in the disk regions ($R$ $>$ 0.5 $R_{e}$), both $f_\mathrm{gas}$ and SFE contribute to the suppression of star formation. Our findings suggest that multiple quenching mechanisms may be at play in our sample galaxies, and even within a single galaxy. We also compare our observational outcomes with those from galaxy simulations and discuss the implications of our data.

Submitted 11 February, 2024; originally announced February 2024.

Comments: 27 pages, 8 figures, 1 table. Accepted for publication in The Astrophysical Journal

arXiv:2402.05389 [pdf, other]

doi 10.1103/PhysRevB.109.155163

Block Mott insulating state induced by next-nearest neighbor hopping in the S = 3/2 zigzag chain BaCoTe2O7

Authors: Ling-Fang Lin, Yang Zhang, Gonzalo Alvarez, Adriana Moreo, Elbio Dagotto

Abstract: Quasi-one-dimensional correlated electronic multi-orbital systems with either ladder or chain geometries continue attracting considerable interest due to their complex electronic phases arising from the interplay of the hopping matrix, the crystal-fields splitting, the electronic correlations, and strong quantum fluctuations. Recently, the intriguing cobalt zigzag chain system BaCoTe$_2$O$_7$, with electronic density $n = 7$, was prepared experimentally. Here, we systematically study the electronic and magnetic properties of this quasi-one-dimensional compound from the theory perspective. Based on first-principles density functional theory calculations, strongly anisotropic one-dimensional electronic Co $3d$ bands were found near the Fermi level. By evaluating the relevant hopping amplitudes, we provide the magnitude and origin of the nearest-neighbor (NN) and next nearest-neighbor (NNN) hopping matrices in BaCoTe$_2$O$_7$. With this information, we constructed a three-orbital electronic Hubbard model for this zigzag chain system, and studied two cases: with only a NN hopping matrix, and with NN plus NNN hopping matrices. Introducing the Hubbard and Hund couplings and studying the model via the density matrix renormalization group method, we constructed the ground-state phase diagram. A robust staggered antiferromagnetic (AFM) region was found when only the NN hopping matrix in the chain direction was employed. However, for the realistic case where the NNN hopping matrix is also included, the dominant state becomes instead a block AFM order, in agreement with experiments. The system displays Mott insulator characteristics with three half-filled orbitals, when the block AFM order is stable. Our results for BaCoTe$_2$O$_7$ provide guidance to experimentalists and theorists working on this zigzag one-dimensional chain and related materials.

Submitted 6 May, 2024; v1 submitted 7 February, 2024; originally announced February 2024.

Comments: 9 pages, 10 figures. arXiv admin note: text overlap with arXiv:2106.02753

arXiv:2402.05383 [pdf, other]

First measurement of the yield of $^8$He isotopes produced in liquid scintillator by cosmic-ray muons at Daya Bay

Authors: Daya Bay Collaboration, F. P. An, W. D. Bai, A. B. Balantekin, M. Bishai, S. Blyth, G. F. Cao, J. Cao, J. F. Chang, Y. Chang, H. S. Chen, H. Y. Chen, S. M. Chen, Y. Chen, Y. X. Chen, Z. Y. Chen, J. Cheng, Y. C. Cheng, Z. K. Cheng, J. J. Cherwinka, M. C. Chu, J. P. Cummings, O. Dalager, F. S. Deng, X. Y. Ding, et al. (177 additional authors not shown)

Abstract: Daya Bay presents the first measurement of cosmogenic $^8$He isotope production in liquid scintillator, using an innovative method for identifying cascade decays of $^8$He and its child isotope, $^8$Li. We also measure the production yield of $^9$Li isotopes using well-established methodology. The results, in units of 10$^{-8}μ^{-1}$g$^{-1}$cm$^{2}$, are 0.307$\pm$0.042, 0.341$\pm$0.040, and 0.546$\pm$0.076 for $^8$He, and 6.73$\pm$0.73, 6.75$\pm$0.70, and 13.74$\pm$0.82 for $^9$Li at average muon energies of 63.9~GeV, 64.7~GeV, and 143.0~GeV, respectively. The measured production rate of $^8$He isotopes is more than an order of magnitude lower than any other measurement of cosmogenic isotope production. It replaces the results of previous attempts to determine the ratio of $^8$He to $^9$Li production that yielded a wide range of limits from 0 to 30\%. The results provide future liquid-scintillator-based experiments with improved ability to predict cosmogenic backgrounds.

Submitted 7 February, 2024; originally announced February 2024.

arXiv:2402.05285 [pdf, other]

Prediction of $s^\pm$-wave superconductivity enhanced by electronic doping in trilayer nickelates La$_4$Ni$_3$O$_{10}$ under pressure

Authors: Yang Zhang, Ling-Fang Lin, Adriana Moreo, Thomas A. Maier, Elbio Dagotto

Abstract: Motivated by the recently reported signatures of superconductivity in trilayer La$_4$Ni$_3$O$_{10}$ under pressure, we comprehensively study this system using {\it ab initio} and random-phase approximation techniques. Without electronic interactions, the Ni $d_{3z^2-r^2}$ orbitals show a bonding-antibonding and nonbonding splitting behavior via the O $p_z$ orbitals inducing a ``trimer'' lattice in La$_4$Ni$_3$O$_{10}$, analogous to the dimers of La$_3$Ni$_2$O$_{7}$. The Fermi surface consists of three electron sheets with mixed $e_g$ orbitals, and a hole and an electron pocket made up of the $d_{3z^2-r^2}$ orbital, suggesting a Ni two-orbital minimum model. In addition, we find that superconducting pairing is induced in the $s^{\pm}$-wave channel due to partial nesting between the {\bf M}=$(π, π)$ centered pockets and portions of the Fermi surface centered at the {\bf $Γ$}=$(0, 0)$ point. With changing electronic density $n$, the $s^\pm$ instability remains leading and its pairing strength shows a dome-like behavior with a maximum around $n = 4.2$ ($\sim 6.7\%$ electron doping). The superconducting instability disappears at the same electronic density as that in the new 1313 stacking La$_3$Ni$_2$O$_7$, correlated with the vanishing of the hole pocket that arises from the trilayer sublattice, suggesting that the high-$T_c$ superconductivity of La$_3$Ni$_2$O$_7$ may $not$ originate from a trilayer- and single-layer structure. Furthermore, we predict an interesting spin-density-wave state in La$_4$Ni$_3$O$_{10}$ with an in-plane ($π$, $π$) order and antiferromagnetic coupling between the top and bottom Ni layers, while the middle layer has spin zero.

Submitted 25 March, 2024; v1 submitted 7 February, 2024; originally announced February 2024.

Comments: Main text with 6 pages and 4 figures plus SM with 4 pages and 5 figures

arXiv:2402.05035

A Survey on Domain Generalization for Medical Image Analysis

Authors: Ziwei Niu, Shuyi Ouyang, Shiao Xie, Yen-wei Chen, Lanfen Lin

Abstract: Medical Image Analysis (MedIA) has emerged as a crucial tool in computer-aided diagnosis systems, particularly with the advancement of deep learning (DL) in recent years. However, well-trained deep models often experience significant performance degradation when deployed in different medical sites, modalities, and sequences, known as a domain shift issue. In light of this, Domain Generalization (DG) for MedIA aims to address the domain shift challenge by generalizing effectively and performing robustly across unknown data distributions. This paper presents a comprehensive review of substantial developments in this area. First, we provide a formal definition of domain shift and domain generalization in the medical field, and discuss several related settings. Subsequently, we summarize the recent methods from three viewpoints: data manipulation level, feature representation level, and model training level, and present some algorithms in detail for each viewpoint. Furthermore, we introduce the commonly used datasets. Finally, we summarize existing literature and present some potential research topics for the future. For this survey, we also created a GitHub project by collecting the supporting resources, at the link: https://github.com/Ziwei-Niu/DG_for_MedIA

Submitted 13 February, 2024; v1 submitted 7 February, 2024; originally announced February 2024.

Comments: This is a withdrawn submission and will be considered invalid. Due to some errors and overlap with published papers, we have chosen to withdraw it

arXiv:2402.01013 [pdf, other]

Quantum Multiple Eigenvalue Gaussian filtered Search: an efficient and versatile quantum phase estimation method

Authors: Zhiyan Ding, Haoya Li, Lin Lin, HongKang Ni, Lexing Ying, Ruizhe Zhang

Abstract: Quantum phase estimation is one of the most powerful quantum primitives. This work proposes a new approach for the problem of multiple eigenvalue estimation: Quantum Multiple Eigenvalue Gaussian filtered Search (QMEGS). QMEGS leverages the Hadamard test circuit structure and only requires simple classical postprocessing. QMEGS is the first algorithm to simultaneously satisfy the following two properties: (1) It can achieve the Heisenberg-limited scaling without relying on any spectral gap assumption. (2) With a positive energy gap and additional assumptions on the initial state, QMEGS can estimate all dominant eigenvalues to $ε$ accuracy utilizing a significantly reduced circuit depth compared to the standard quantum phase estimation algorithm. In the most favorable scenario, the maximal runtime can be reduced to as low as $\log(1/ε)$. This implies that QMEGS serves as an efficient and versatile approach, achieving the best-known results for both gapped and gapless systems. Numerical results validate the efficiency of our proposed algorithm in various regimes.

Submitted 1 February, 2024; originally announced February 2024.

arXiv:2402.00489 [pdf]

Proton Pencil-Beam Scanning Stereotactic Body Radiation Therapy and Hypofractionated Radiation Therapy for Thoracic Malignancies: Patterns of Practice Survey and Recommendations for Future Development from NRG Oncology and PTCOG

Authors: Wei Liu, Hongying Feng, Paige A. Taylor, Minglei Kang, Jiajian Shen, Jatinder Saini, Jun Zhou, Huan B. Giap, Nathan Y. Yu, Terence S. Sio, Pranshu Mohindra, Joe Y. Chang, Jeffrey D. Bradley, Ying Xiao, Charles B. Simone II, Liyong Lin

Abstract: Stereotactic body radiation therapy (SBRT) and hypofractionation using pencil-beam scanning (PBS) proton therapy (PBSPT) is an attractive option for thoracic malignancies. Combining the advantages of target coverage conformity and critical organ sparing from both PBSPT and SBRT, this new delivery technique has great potential to improve the therapeutic ratio, particularly for tumors near critical organs. Safe and effective implementation of PBSPT SBRT/hypofractionation to treat thoracic malignancies is more challenging than the conventionally-fractionated PBSPT due to concerns of amplified uncertainties at the larger dose per fraction. NRG Oncology and Particle Therapy Cooperative Group (PTCOG) Thoracic Subcommittee surveyed US proton centers to identify practice patterns of thoracic PBSPT SBRT/hypofractionation. From these patterns, we present recommendations for future technical development of proton SBRT/hypofractionation for thoracic treatment. Amongst other points, the recommendations highlight the need for volumetric image guidance and multiple CT-based robust optimization and robustness tools to minimize further the impact of uncertainties associated with respiratory motion. Advances in direct motion analysis techniques are urgently needed to supplement current motion management techniques.

Submitted 1 February, 2024; originally announced February 2024.

Comments: 36 pages, 4 figures, 4 tables

arXiv:2402.00345 [pdf, other]

IndiVec: An Exploration of Leveraging Large Language Models for Media Bias Detection with Fine-Grained Bias Indicators

Authors: Luyang Lin, Lingzhi Wang, Xiaoyan Zhao, Jing Li, Kam-Fai Wong

Abstract: This study focuses on media bias detection, crucial in today's era of influential social media platforms shaping individual attitudes and opinions. In contrast to prior work that primarily relies on training specific models tailored to particular datasets, resulting in limited adaptability and subpar performance on out-of-domain data, we introduce a general bias detection framework, IndiVec, built upon large language models. IndiVec begins by constructing a fine-grained media bias database, leveraging the robust instruction-following capabilities of large language models and vector database techniques. When confronted with new input for bias detection, our framework automatically selects the most relevant indicator from the vector database and employs majority voting to determine the input's bias label. IndiVec excels compared to previous methods due to its adaptability (demonstrating consistent performance across diverse datasets from various sources) and explainability (providing explicit top-k indicators to interpret bias predictions). Experimental results on four political bias datasets highlight IndiVec's significant superiority over baselines. Furthermore, additional experiments and analysis provide profound insights into the framework's effectiveness.

Submitted 1 February, 2024; originally announced February 2024.

Report number: Accepted to EACL 2024
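
The retrieve-then-vote step described in the IndiVec abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function names `cosine` and `predict_label`, the toy two-dimensional vectors, the labels, and the choice of cosine similarity are hypothetical and are not taken from the paper or its code.

```python
from collections import Counter
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def predict_label(query_vec, indicators, k=3):
    """Return the majority label among the k indicators most similar to the query.

    `indicators` is a list of (vector, label) pairs standing in for a
    vector database of bias indicators (hypothetical toy structure).
    """
    ranked = sorted(indicators, key=lambda item: cosine(query_vec, item[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy indicator database: (embedding, bias label). Values are made up.
toy_db = [
    ([1.0, 0.0], "left"),
    ([0.9, 0.2], "left"),
    ([0.8, 0.3], "right"),
    ([0.0, 1.0], "right"),
    ([0.1, 0.9], "right"),
]

label = predict_label([1.0, 0.1], toy_db, k=3)
```

With `k=3`, the three nearest toy indicators carry two "left" labels and one "right" label, so the vote returns "left". The top-k indicators themselves are what such a framework would surface to explain the prediction.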

arXiv:2402.00290 [pdf, other]

MEIA: Towards Realistic Multimodal Interaction and Manipulation for Embodied Robots

Authors: Yang Liu, Xinshuai Song, Kaixuan Jiang, Weixing Chen, Jingzhou Luo, Guanbin Li, Liang Lin

Abstract: With the surge in the development of large language models, embodied intelligence has attracted increasing attention. Nevertheless, prior works on embodied intelligence typically encode scene or historical memory in an unimodal manner, either visual or linguistic, which complicates the alignment of the model's action planning with embodied control. To overcome this limitation, we introduce the Multimodal Embodied Interactive Agent (MEIA), capable of translating high-level tasks expressed in natural language into a sequence of executable actions. Specifically, we propose a novel Multimodal Environment Memory (MEM) module, facilitating the integration of embodied control with large models through the visual-language memory of scenes. This capability enables MEIA to generate executable action plans based on diverse requirements and the robot's capabilities. Furthermore, we construct an embodied question answering dataset based on a dynamic virtual cafe environment with the help of the large language model. In this virtual environment, we conduct several experiments, utilizing multiple large models through zero-shot learning, and carefully design scenarios for various situations. The experimental results showcase the promising performance of our MEIA in various embodied interactive tasks.

Submitted 26 April, 2024; v1 submitted 31 January, 2024; originally announced February 2024.

Comments: Codes will be available at https://github.com/HCPLab-SYSU/CausalVLR

arXiv:2402.00045 [pdf, other]

Detecting Multimedia Generated by Large AI Models: A Survey

Authors: Li Lin, Neeraj Gupta, Yue Zhang, Hainan Ren, Chun-Hao Liu, Feng Ding, Xin Wang, Xin Li, Luisa Verdoliva, Shu Hu

Abstract: The rapid advancement of Large AI Models (LAIMs), particularly diffusion models and large language models, has marked a new era where AI-generated multimedia is increasingly integrated into various aspects of daily life. Although beneficial in numerous fields, this content presents significant risks, including potential misuse, societal disruptions, and ethical concerns. Consequently, detecting multimedia generated by LAIMs has become crucial, with a marked rise in related research. Despite this, there remains a notable gap in systematic surveys that focus specifically on detecting LAIM-generated multimedia. Addressing this, we provide the first survey to comprehensively cover existing research on detecting multimedia (such as text, images, videos, audio, and multimodal content) created by LAIMs. Specifically, we introduce a novel taxonomy for detection methods, categorized by media modality, and aligned with two perspectives: pure detection (aiming to enhance detection performance) and beyond detection (adding attributes like generalizability, robustness, and interpretability to detectors). Additionally, we have presented a brief overview of generation mechanisms, public datasets, and online detection tools to provide a valuable resource for researchers and practitioners in this field. Furthermore, we identify current challenges in detection and propose directions for future research that address unexplored, ongoing, and emerging issues in detecting multimedia generated by LAIMs.
Our aim for this survey is to fill an academic gap and contribute to global AI security efforts, helping to ensure the integrity of information in the digital realm. The project link is https://github.com/Purdue-M2/Detect-LAIM-generated-Multimedia-Survey.

Submitted 7 February, 2024; v1 submitted 22 January, 2024; originally announced February 2024.

arXiv:2401.17049 [pdf, ps, other]

Movable Antenna-Enabled Co-Frequency Co-Time Full-Duplex Wireless Communication

Authors: Jingze Ding, Zijian Zhou, Wenyao Li, Chenbo Wang, Lifeng Lin, Bingli Jiao

Abstract: Movable antenna (MA) provides an innovative way to arrange antennas that can contribute to improved signal quality and more effective interference management. This method is especially beneficial for co-frequency co-time full-duplex (CCFD) wireless communication, which struggles with self-interference (SI) that usually overpowers the desired incoming signals. By dynamically repositioning transmit/receive antennas, we can mitigate the SI and enhance the reception of incoming signals. Thus, this paper proposes a novel MA-enabled point-to-point CCFD system and formulates the minimum achievable rate of two CCFD terminals. To maximize the minimum achievable rate and determine the near-optimal positions of the MAs, we introduce a solution based on projected particle swarm optimization (PPSO), which can circumvent common suboptimal positioning issues. Moreover, numerical results reveal that the PPSO method leads to a better performance compared to the conventional alternating position optimization (APO). The results also demonstrate that an MA-enabled CCFD system outperforms the one using fixed-position antennas (FPAs).

Submitted 7 February, 2024; v1 submitted 30 January, 2024; originally announced January 2024.

Comments: This paper has been submitted to IEEE Wireless Communications Letters

arXiv:2401.14856 [pdf, other]

Memory-Inspired Temporal Prompt Interaction for Text-Image Classification

Authors: Xinyao Yu, Hao Sun, Ziwei Niu, Rui Qin, Zhenjia Bai, Yen-Wei Chen, Lanfen Lin

Abstract: In recent years, large-scale pre-trained multimodal models (LMM) generally emerge to integrate the vision and language modalities, achieving considerable success in various natural language processing and computer vision tasks. The growing size of LMMs, however, results in a significant computational cost for fine-tuning these models for downstream tasks. Hence, prompt-based interaction strategy is studied to align modalities more efficiently. In this context, we propose a novel prompt-based multimodal interaction strategy inspired by human memory strategy, namely Memory-Inspired Temporal Prompt Interaction (MITP). Our proposed method involves two stages as in human memory strategy: the acquiring stage, and the consolidation and activation stage. We utilize temporal prompts on intermediate layers to imitate the acquiring stage, leverage similarity-based prompt interaction to imitate memory consolidation, and employ prompt generation strategy to imitate memory activation. The main strength of our paper is that we interact the prompt vectors on intermediate layers to leverage sufficient information exchange between modalities, with compressed trainable parameters and memory usage. We achieve competitive results on several datasets with relatively small memory usage and 2.0M of trainable parameters (about 1% of the pre-trained foundation model).

Submitted 26 January, 2024; originally announced January 2024.

arXiv:2401.14828 [pdf, other]

TIP-Editor: An Accurate 3D Editor Following Both Text-Prompts And Image-Prompts

Authors: Jingyu Zhuang, Di Kang, Yan-Pei Cao, Guanbin Li, Liang Lin, Ying Shan

Abstract: Text-driven 3D scene editing has gained significant attention owing to its convenience and user-friendliness. However, existing methods still lack accurate control of the specified appearance and location of the editing result due to the inherent limitations of the text description. To this end, we propose a 3D scene editing framework, TIPEditor, that accepts both text and image prompts and a 3D bounding box to specify the editing region. With the image prompt, users can conveniently specify the detailed appearance/style of the target content in complement to the text description, enabling accurate control of the appearance. Specifically, TIP-Editor employs a stepwise 2D personalization strategy to better learn the representation of the existing scene and the reference image, in which a localization loss is proposed to encourage correct object placement as specified by the bounding box. Additionally, TIPEditor utilizes explicit and flexible 3D Gaussian splatting as the 3D representation to facilitate local editing while keeping the background unchanged. Extensive experiments have demonstrated that TIP-Editor conducts accurate editing following the text and image prompts in the specified bounding box region, consistently outperforming the baselines in editing quality, and the alignment to the prompts, qualitatively and quantitatively.

Submitted 25 April, 2024; v1 submitted 26 January, 2024; originally announced January 2024.

Comments: Accepted by SIGGRAPH 2024 & ACM Transactions on Graphics

arXiv:2401.14580 [pdf, other]

Design Your Own Universe: A Physics-Informed Agnostic Method for Enhancing Graph Neural Networks

Authors: Dai Shi, Andi Han, Lequan Lin, Yi Guo, Zhiyong Wang, Junbin Gao

Abstract: Physics-informed Graph Neural Networks have achieved remarkable performance in learning through graph-structured data by mitigating common GNN challenges such as over-smoothing, over-squashing, and heterophily adaption. Despite these advancements, the development of a simple yet effective paradigm that appropriately integrates previous methods for handling all these challenges is still underway. In this paper, we draw an analogy between the propagation of GNNs and particle systems in physics, proposing a model-agnostic enhancement framework. This framework enriches the graph structure by introducing additional nodes and rewiring connections with both positive and negative weights, guided by node labeling information. We theoretically verify that GNNs enhanced through our approach can effectively circumvent the over-smoothing issue and exhibit robustness against over-squashing. Moreover, we conduct a spectral analysis on the rewired graph to demonstrate that the corresponding GNNs can fit both homophilic and heterophilic graphs. Empirical validations on benchmarks for homophilic, heterophilic graphs, and long-term graph datasets show that GNNs enhanced by our method significantly outperform their original counterparts.

Submitted 12 June, 2024; v1 submitted 25 January, 2024; originally announced January 2024.

arXiv:2401.14074 [pdf, other]

ProCNS: Progressive Prototype Calibration and Noise Suppression for Weakly-Supervised Medical Image Segmentation

Authors: Y. Liu, L. Lin, K. K. Y. Wong, X. Tang

Abstract: Weakly-supervised segmentation (WSS) has emerged as a solution to mitigate the conflict between annotation cost and model performance by adopting sparse annotation formats (e.g., point, scribble, block, etc.). Typical approaches attempt to exploit anatomy and topology priors to directly expand sparse annotations into pseudo-labels. However, due to a lack of attention to the ambiguous edges in medical images and insufficient exploration of sparse supervision, existing approaches tend to generate erroneous and overconfident pseudo proposals in noisy regions, leading to cumulative model error and performance degradation. In this work, we propose a novel WSS approach, named ProCNS, encompassing two synergistic modules devised with the principles of progressive prototype calibration and noise suppression. Specifically, we design a Prototype-based Regional Spatial Affinity (PRSA) loss to maximize the pair-wise affinities between spatial and semantic elements, providing our model of interest with more reliable guidance. The affinities are derived from the input images and the prototype-refined predictions. Meanwhile, we propose an Adaptive Noise Perception and Masking (ANPM) module to obtain more enriched and representative prototype representations, which adaptively identifies and masks noisy regions within the pseudo proposals, reducing potential erroneous interference during prototype computation. Furthermore, we generate specialized soft pseudo-labels for the noisy regions identified by ANPM, providing supplementary supervision.
Extensive experiments on three medical image segmentation tasks involving different modalities demonstrate that the proposed framework significantly outperforms representative state-of-the-art methods.

Submitted 5 March, 2024; v1 submitted 25 January, 2024; originally announced January 2024.

arXiv:2401.13225 [pdf, ps, other]

A New Look at the Scalar Meson $f_0(500)$ via $D^+\to π^+π^-\ell^+ν_\ell$ Decays

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, X. C. Ai, R. Aliberti, A. Amoroso, Y. Bai, O. Bakina, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, A. Brueggemann, H. Cai, X. Cai , et al. (615 additional authors not shown)

Abstract: Using $2.93~\mathrm{fb}^{-1}$ of $e^+e^-$ collision data collected with the BESIII detector at the center-of-mass energy of 3.773 GeV, we investigate the semileptonic decays $D^+\to π^+π^- \ell^+ν_\ell$ ($\ell=e$ and $μ$). The $D^+\to f_0(500)μ^+ν_μ$ decay is observed for the first time. By analyzing simultaneously the differential decay rates of $D^+\to f_0(500) μ^+ν_μ$ and $D^+\to f_0(500) e^+ν_e$ in different $\ell^+ν_\ell$ four-momentum transfer intervals, the product of the relevant hadronic form factor $f^{f_0}_{+}(0)$ and the magnitude of the $c\to d$ Cabibbo-Kobayashi-Maskawa matrix element $|V_{cd}|$ is determined to be $f_{+}^{f_0} (0)|V_{cd}|=0.0787\pm0.0060_{\rm stat}\pm0.0033_{\rm syst}$ for the first time. With the input of $|V_{cd}|$ from the global fit in the standard model, we determine $f_{+}^{f_0} (0)=0.350\pm0.027_{\rm stat}\pm0.015_{\rm syst}$. The absolute branching fractions of $D^+\to f_0(500)_{(π^+π^-)}μ^+ν_μ$ and $D^+\to ρ^0_{(π^+π^-)} μ^+ν_μ$ are determined as $(0.72\pm0.13_{\rm stat}\pm0.10_{\rm syst})\times10^{-3}$ and $(1.64\pm0.13_{\rm stat}\pm0.11_{\rm syst})\times 10^{-3}$. Combining these results with those of previous BESIII measurements on their semielectronic counterparts from the same data sample, we test lepton flavor universality by measuring the branching fraction ratios ${\mathcal B}_{D^+\to ρ^0 μ^+ν_μ}/{\mathcal B}_{D^+\to ρ^0 e^+ν_e}=0.88\pm0.10$ and ${\mathcal B}_{D^+\to f_0(500) μ^+ν_μ}/{\mathcal B}_{D^+\to f_0(500) e^+ν_e}=1.14\pm0.28$, which are compatible with the standard model expectation.

Submitted 4 February, 2024; v1 submitted 23 January, 2024; originally announced January 2024.

Comments: Supplemental Materials added in this version

Report number: BAM-00660

arXiv:2401.12498 [pdf, other]

Understanding Cellular Noise with Optical Perturbation and Deep Learning

Authors: Chuanbo Liu, Yu Fu, Lu Lin, Elliot L. Elson, Jin Wang

Abstract: Noise plays a crucial role in the regulation of cellular and organismal function and behavior. Exploring noise's impact is key to understanding fundamental biological processes, such as gene expression, signal transduction, and the mechanisms of development and evolution. Currently, a comprehensive method to quantify dynamical behavior of cellular noise within these biochemical systems is lacking. In this study, we introduce an optically-controlled perturbation system utilizing the light-sensitive Phytochrome B (PhyB) from \textit{Arabidopsis thaliana}, which enables precise noise modulation with high spatial-temporal resolution. Our system exhibits exceptional sensitivity to light, reacting consistently to pulsed light signals, distinguishing it from other photoreceptor-based promoter systems that respond to a single light wavelength. To characterize our system, we developed a stochastic model for phytochromes that accounts for photoactivation/deactivation, thermal reversion, and the dynamics of the light-activated gene promoter system. To precisely control our system, we determined the rate constants for this model using an omniscient deep neural network that can directly map rate constant combinations to time-dependent state joint distributions. By adjusting the activation rates through light intensity and degradation rates via N-terminal mutagenesis, we illustrate that our optical-controlled perturbation can effectively modulate molecular expression level as well as noise.
Our results highlight the potential of employing an optically-controlled gene perturbation system as a noise-controlled stimulus source. This approach, when combined with the analytical capabilities of a sophisticated deep neural network, enables the accurate estimation of rate constants from observational data in a broad range of biochemical reaction networks.

Submitted 23 January, 2024; originally announced January 2024.

Comments: 33 pages, 4 figures

arXiv:2401.11891 [pdf, other]

Validation of Classical Transport Cross Section for Ion-Ion Interactions Under Repulsive Yukawa Potential

Authors: Tian-Xing Hu, Dong Wu, C. L. Lin, Z. M. Sheng, B. He, J. Zhang

Abstract: Value of cross section is a fundamental parameter to depict the transport of charged particles in matters. Due to masses of orders of magnitude higher than electrons and convenience of realistic calculation, the cross section of elastic nuclei-nuclei collision is usually treated via classical mechanics. The famous Bohr criterion was firstly proposed to judge whether the treatment via classical mechanics is reliable or not. Later, Lindhard generalized the results of Coulomb to screening potentials. Considering the increasing importance of detailed ion-ion interactions under modern simulation codes in inertial confinement fusion (ICF) researches, the validation of classical transport cross section for ion-ion interactions in a big range of parameter space is certainly required. In this work, the transport cross sections via classical mechanics under repulsive Yukawa potential are compared with those via quantum mechanics. Differences of differential cross sections are found with respect to scattering angles and velocities. Our results generally indicate that the classical picture fails at the cases of both low and high velocities, which represent a significant extension of the famous Bohr criterion and its generalized variations. Furthermore, the precise validation zones of classical picture is also analysed in this work.
This work is of significant importance for benchmarking the modern ion-kinetic simulation codes in ICF researches, concerning the stopping power of $α$ particles in DT fuels, ion-ion friction and viscous effects in the formation of kinetic shocks.

Submitted 22 January, 2024; originally announced January 2024.

arXiv:2401.11085 [pdf, other]

Adaptive Global-Local Representation Learning and Selection for Cross-Domain Facial Expression Recognition

Authors: Yuefang Gao, Yuhao Xie, Zeke Zexi Hu, Tianshui Chen, Liang Lin

Abstract: Domain shift poses a significant challenge in Cross-Domain Facial Expression Recognition (CD-FER) due to the distribution variation across different domains. Current works mainly focus on learning domain-invariant features through global feature adaptation, while neglecting the transferability of local features. Additionally, these methods lack discriminative supervision during training on target datasets, resulting in deteriorated feature representation in target domain. To address these limitations, we propose an Adaptive Global-Local Representation Learning and Selection (AGLRLS) framework. The framework incorporates global-local adversarial adaptation and semantic-aware pseudo label generation to enhance the learning of domain-invariant and discriminative feature during training. Meanwhile, a global-local prediction consistency learning is introduced to improve classification results during inference. Specifically, the framework consists of separate global-local adversarial learning modules that learn domain-invariant global and local features independently. We also design a semantic-aware pseudo label generation module, which computes semantic labels based on global and local features. Moreover, a novel dynamic threshold strategy is employed to learn the optimal thresholds by leveraging independent prediction of global and local features, ensuring filtering out the unreliable pseudo labels while retaining reliable ones. These labels are utilized for model optimization through the adversarial learning process in an end-to-end manner.
During inference, a global-local prediction consistency module is developed to automatically learn an optimal result from multiple predictions. We conduct comprehensive experiments and analysis based on a fair evaluation benchmark. The results demonstrate that the proposed framework outperforms the current competing methods by a substantial margin.

Submitted 19 January, 2024; originally announced January 2024.

arXiv:2401.10608 [pdf, other]

M2ORT: Many-To-One Regression Transformer for Spatial Transcriptomics Prediction from Histopathology Images

Authors: Hongyi Wang, Xiuju Du, Jing Liu, Shuyi Ouyang, Yen-Wei Chen, Lanfen Lin

Abstract: The advancement of Spatial Transcriptomics (ST) has facilitated the spatially-aware profiling of gene expressions based on histopathology images. Although ST data offers valuable insights into the micro-environment of tumors, its acquisition cost remains expensive. Therefore, directly predicting the ST expressions from digital pathology images is desired. Current methods usually adopt existing regression backbones for this task, which ignore the inherent multi-scale hierarchical data structure of digital pathology images. To address this limit, we propose M2ORT, a many-to-one regression Transformer that can accommodate the hierarchical structure of the pathology images through a decoupled multi-scale feature extractor. Different from traditional models that are trained with one-to-one image-label pairs, M2ORT accepts multiple pathology images of different magnifications at a time to jointly predict the gene expressions at their corresponding common ST spot, aiming at learning a many-to-one relationship through training. We have tested M2ORT on three public ST datasets and the experimental results show that M2ORT can achieve state-of-the-art performance with fewer parameters and floating-point operations (FLOPs). The code is available at: https://github.com/Dootmaan/M2ORT/.

Submitted 24 January, 2024; v1 submitted 19 January, 2024; originally announced January 2024.

arXiv:2401.10215 [pdf, other]

GPAvatar: Generalizable and Precise Head Avatar from Image(s)

Authors: Xuangeng Chu, Yu Li, Ailing Zeng, Tianyu Yang, Lijian Lin, Yunfei Liu, Tatsuya Harada

Abstract: Head avatar reconstruction, crucial for applications in virtual reality, online meetings, gaming, and film industries, has garnered substantial attention within the computer vision community. The fundamental objective of this field is to faithfully recreate the head avatar and precisely control expressions and postures. Existing methods, categorized into 2D-based warping, mesh-based, and neural rendering approaches, present challenges in maintaining multi-view consistency, incorporating non-facial information, and generalizing to new identities. In this paper, we propose a framework named GPAvatar that reconstructs 3D head avatars from one or several images in a single forward pass. The key idea of this work is to introduce a dynamic point-based expression field driven by a point cloud to precisely and effectively capture expressions. Furthermore, we use a Multi Tri-planes Attention (MTA) fusion module in the tri-planes canonical field to leverage information from multiple input images. The proposed method achieves faithful identity reconstruction, precise expression control, and multi-view consistency, demonstrating promising results for free-viewpoint rendering and novel view synthesis.

Submitted 18 January, 2024; originally announced January 2024.

Comments: ICLR 2024, code is available at https://github.com/xg-chu/GPAvatar

arXiv:2401.10190 [pdf, other]

A Kaczmarz-inspired approach to accelerate the optimization of neural network wavefunctions

Authors: Gil Goldshlager, Nilin Abrahamsen, Lin Lin

Abstract: Neural network wavefunctions optimized using the variational Monte Carlo method have been shown to produce highly accurate results for the electronic structure of atoms and small molecules, but the high cost of optimizing such wavefunctions prevents their application to larger systems. We propose the Subsampled Projected-Increment Natural Gradient Descent (SPRING) optimizer to reduce this bottleneck. SPRING combines ideas from the recently introduced minimum-step stochastic reconfiguration optimizer (MinSR) and the classical randomized Kaczmarz method for solving linear least-squares problems. We demonstrate that SPRING outperforms both MinSR and the popular Kronecker-Factored Approximate Curvature method (KFAC) across a number of small atoms and molecules, given that the learning rates of all methods are optimally tuned. For example, on the oxygen atom, SPRING attains chemical accuracy after forty thousand training iterations, whereas both MinSR and KFAC fail to do so even after one hundred thousand iterations.

Submitted 18 January, 2024; originally announced January 2024.
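
For readers unfamiliar with the classical randomized Kaczmarz method mentioned in the abstract above, a minimal sketch follows. It covers only the textbook method on a small consistent linear system. It is not SPRING or MinSR, and the function name `randomized_kaczmarz`, the toy matrix, and the iteration count are assumptions chosen for illustration.

```python
import random


def randomized_kaczmarz(A, b, iters=2000, seed=0):
    """Approximately solve a consistent linear system A x = b.

    Each step samples one row with probability proportional to its squared
    norm and projects the current iterate onto that row's hyperplane.
    """
    rng = random.Random(seed)
    x = [0.0] * len(A[0])
    row_norms_sq = [sum(a * a for a in row) for row in A]
    for _ in range(iters):
        i = rng.choices(range(len(A)), weights=row_norms_sq, k=1)[0]
        row = A[i]
        residual = b[i] - sum(a * xj for a, xj in zip(row, x))
        step = residual / row_norms_sq[i]
        # Orthogonal projection onto {z : <row, z> = b[i]}.
        x = [xj + step * a for xj, a in zip(x, row)]
    return x


# Toy overdetermined but consistent system built from a known solution.
A = [[2.0, 1.0], [1.0, 3.0], [1.0, -1.0]]
x_true = [1.0, -2.0]
b = [sum(a * xj for a, xj in zip(row, x_true)) for row in A]

x_hat = randomized_kaczmarz(A, b)
```

Because only one row is touched per step, each iteration is cheap, which is the property that makes Kaczmarz-style updates attractive when the full system is expensive to form or solve.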

arXiv:2401.08119 [pdf, other]

SpecSTG: A Fast Spectral Diffusion Framework for Probabilistic Spatio-Temporal Traffic Forecasting

Authors: Lequan Lin, Dai Shi, Andi Han, Junbin Gao

Abstract: Traffic forecasting, a crucial application of spatio-temporal graph (STG) learning, has traditionally relied on deterministic models for accurate point estimations. Yet, these models fall short of identifying latent risks of unexpected volatility in future observations. To address this gap, probabilistic methods, especially variants of diffusion models, have emerged as uncertainty-aware solutions. However, existing diffusion methods typically focus on generating separate future time series for individual sensors in the traffic network, resulting in insufficient involvement of spatial network characteristics in the probabilistic learning process. To better leverage spatial dependencies and systematic patterns inherent in traffic data, we propose SpecSTG, a novel spectral diffusion framework. Our method generates the Fourier representation of future time series, transforming the learning process into the spectral domain enriched with spatial information. Additionally, our approach incorporates a fast spectral graph convolution designed for Fourier input, alleviating the computational burden associated with existing models. Numerical experiments show that SpecSTG achieves outstanding performance with traffic flow and traffic speed datasets compared to state-of-the-art baselines. The source code for SpecSTG is available at https://anonymous.4open.science/r/SpecSTG.

Submitted 23 January, 2024; v1 submitted 16 January, 2024; originally announced January 2024.

arXiv:2401.07061 [pdf, other]

Dual-View Data Hallucination with Semantic Relation Guidance for Few-Shot Image Recognition

Authors: Hefeng Wu, Guangzhi Ye, Ziyang Zhou, Ling Tian, Qing Wang, Liang Lin

Abstract: Learning to recognize novel concepts from just a few image samples is very challenging, as the learned model easily overfits the few available samples and generalizes poorly. One promising but underexplored solution is to compensate the novel classes by generating plausible samples. However, most existing works along this line exploit visual information only, leaving the generated data easily distracted by challenging factors contained in the few available samples. Being aware of the semantic information in the textual modality that reflects human concepts, this work proposes a novel framework that exploits semantic relations to guide dual-view data hallucination for few-shot image recognition. The proposed framework enables generating more diverse and reasonable data samples for novel classes through effective information transfer from base classes. Specifically, an instance-view data hallucination module hallucinates each sample of a novel class to generate new data by employing local semantic correlated attention and global semantic feature fusion derived from base classes. Meanwhile, a prototype-view data hallucination module exploits a semantic-aware measure to estimate the prototype of a novel class and the associated distribution from the few samples, which thereby harvests the prototype as a more stable sample and enables resampling a large number of samples. We conduct extensive experiments and comparisons with state-of-the-art methods on several popular few-shot benchmarks to verify the effectiveness of the proposed framework.

Submitted 13 January, 2024; originally announced January 2024.

Comments: 13 pages

arXiv:2401.05976 [pdf, other]

The ALMaQUEST Survey XII: Dense Molecular Gas as traced by HCN and HCO$^{+}$ in Green Valley Galaxies

Authors: Lihwai Lin, Hsi-An Pan, Sara L. Ellison, Nanase Harada, Maria J. Jimenez-Donaire, K. Decker French, William M. Baker, Bau-Ching Hsieh, Yusei Koyama, Carlos Lopez-Coba, Tomonari Michiyama, Kate Rowlands, Sebastian F. Sanchez, Mallory Thorp

Abstract: We present ALMA observations of two dense gas tracers, HCN(1-0) and HCO$^{+}$(1-0), for three galaxies in the green valley and two galaxies on the star-forming main sequence with comparable molecular gas fractions as traced by the CO(1-0) emissions, selected from the ALMaQUEST survey. We investigate whether the deficit of molecular gas star formation efficiency (SFE$_{\rm mol}$) that leads to the low specific star formation rate in these green valley galaxies is due to a lack of dense gas (characterized by the dense gas fraction $f_{\rm dense}$) or the low star formation efficiency of dense gas (SFE$_{\rm dense}$). We find that SFE$_{\rm mol}$ as traced by the CO emissions, when considering both star-forming and retired spaxels together, is tightly correlated with SFE$_{\rm dense}$ and depends only weakly on $f_{\rm dense}$. The specific star formation rate (sSFR) on kpc scales is primarily driven by SFE$_{\rm mol}$ and SFE$_{\rm dense}$, followed by the dependence on $f_{\rm mol}$, and is least correlated with $f_{\rm dense}$ or the dense-to-stellar mass ratio ($R_{\rm dense}$). When compared with other works in the literature, we find that our green valley sample shows lower global SFE$_{\rm mol}$ as well as lower SFE$_{\rm dense}$ while exhibiting similar dense gas fractions when compared to star-forming and starburst galaxies. We conclude that the star formation of the 3 green valley galaxies with a normal abundance of molecular gas is suppressed mainly due to the reduced SFE$_{\rm dense}$ rather than the lack of dense gas.

Submitted 11 January, 2024; originally announced January 2024.

Comments: 20 pages, 13 figures, ApJ accepted

arXiv:2401.05823 [pdf, other]

Quantum Probability Theoretic Asset Return Modeling: A Novel Schrödinger-Like Trading Equation and Multimodal Distribution

Authors: Li Lin

Abstract: Quantum theory provides a comprehensive framework for quantifying uncertainty, often applied in quantum finance to explore the stochastic nature of asset returns. This perspective likens returns to microscopic particle motion, governed by quantum probabilities akin to physical laws. However, such approaches presuppose specific microscopic quantum effects in return changes, a premise criticized for lack of guarantee. This paper diverges by asserting that quantum probability is a mathematical extension of classical probability to complex numbers. It isn't exclusively tied to microscopic quantum phenomena, bypassing the need for quantum effects in returns. By directly linking quantum probability's mathematical structure to traders' decisions and market behaviors, it avoids assuming quantum effects for returns and invoking the wave function. The complex phase of quantum probability, capturing transitions between long and short decisions while considering information interaction among traders, offers an inherent advantage over classical probability in characterizing the multimodal distribution of asset returns. Utilizing Fourier decomposition, we derive a Schrödinger-like trading equation, where each term explicitly corresponds to implications of market trading. The equation indicates discrete energy levels in financial trading, with returns following a normal distribution at the lowest level. As the market transitions to higher trading levels, a phase shift occurs in the return distribution, leading to multimodality and fat tails. Empirical research on the Chinese stock market supports the existence of energy levels and multimodal distributions derived from this quantum probability asset returns model.

Submitted 11 January, 2024; originally announced January 2024.

arXiv:2401.05780 [pdf]

Tunable terahertz photodetector using ferroelectric-integrated graphene plasmonics for portable spectrometer

Authors: Lin Lin, Junxiong Guo, Shangdong Li, Tianxun Gong, Juan Xia, Zenghui Wang, Jun Tang, Yang Zhang, Jinxing Zhang, Yuan Lin, Wen Huang, Xiaosheng Zhang

Abstract: Terahertz (THz) detectors have great potential for use in imaging, spectroscopy, and communications due to the fascinating interactions between THz radiation and matter. However, current THz detection devices are limited in sensitivity and operating frequency range and have a bulky footprint. While recent ferroelectric-integrated graphene plasmonic devices show promise in overcoming these limitations, they have not yet been extended to the THz range. Here, we propose a wavelength-sensitive terahertz detector that uses single-layer graphene integrated onto a ferroelectric thin film with patterned polarization domains. This device works at room temperature, with high responsivity and detectivity, by coupling graphene plasmons with THz frequencies through spatial modulation of carrier behaviors using ferroelectric polarization, without requiring additional local electrodes. By reconfiguring an interweaving squared ferroelectric domain array with alternating upward and downward polarizations to highly confine graphene surface plasmon polaritons, our device achieves an ultrahigh responsivity of 1717 A W$^{-1}$ and a normalized detectivity of $1.07\times10^{13}$ Jones at a resonance frequency of 6.30 THz and a 0.3 V bias voltage. We also show that, combined with mathematical algorithms, the device enables spectrum reconstruction for portable spectrometers.

Submitted 11 January, 2024; originally announced January 2024.

Comments: 17 pages, 5 figures

arXiv:2401.03798 [pdf, other]

On the Nucleosynthesis in Accretion-Induced Collapse of White Dwarfs

Authors: Chun-Ming Yip, Ming-Chung Chu, Shing-Chi Leung, Lap-Ming Lin

Abstract: It has long been hypothesized that accretion-induced collapse (AIC) of white dwarfs contributes to the production of heavy chemical elements in the universe. We present one-dimensional neutrino-radiative hydrodynamic simulations of AIC followed by post-processing nucleosynthesis calculations of the ejecta. A proto-neutron star is formed after the AIC, and a neutrino burst with peak luminosity $\sim10^{53}$ erg s$^{-1}$, comparable to that of a core-collapse supernova (CCSN), is emitted. The ejecta mass of AIC could be up to $\sim10^{-2}$ M$_\odot$, and the first neutron-capture peak elements (Sr, Y, and Zr) could be abundantly synthesized, with an overproduction of $\sim10^{6}$ relative to the solar abundances. The yield of $^{56}\text{Ni}$ could be at most $\sim10^{-3}$ M$_\odot$, suggesting that the electromagnetic light curve associated with AIC is at least $2$ orders of magnitude dimmer than those associated with Type Ia supernovae (Type Ia SNe). The inferred upper bound of the AIC event rate, from nucleosynthesis calculations, is at most $\sim10\,\%$ relative to those of CCSNe and Type Ia SNe.

Submitted 8 January, 2024; originally announced January 2024.

Comments: 13 pages, 9 figures

arXiv:2401.03699 [pdf, other]

Probing Chiral-Symmetric Higher-Order Topological Insulators with Multipole Winding Number

Authors: Ling Lin, Chaohong Lee

Abstract: The interplay between crystalline symmetry and band topology gives rise to unprecedented lower-dimensional boundary states in higher-order topological insulators (HOTIs). However, the measurement of the topological invariants of HOTIs remains a significant challenge. Here, we define a multipole winding number (MWN) for chiral-symmetric HOTIs by applying a corner twisted boundary condition. The MWN, arising from both bulk and boundary states, accurately captures the bulk-corner correspondence including boundary-obstructed topological phases. To address the measurement challenge, we leverage the perturbative nature of the corner twisted boundary condition and develop a real-space approach for determining the MWN in both two-dimensional and three-dimensional systems. The real-space formula provides an experimentally viable strategy for directly probing the topology of chiral-symmetric HOTIs through dynamical evolution. Our findings not only highlight the twisted boundary condition as a powerful tool for investigating HOTIs, but also establish a paradigm for exploring real-space formulas for the topological invariants of HOTIs.

Submitted 21 March, 2024; v1 submitted 8 January, 2024; originally announced January 2024.

arXiv:2401.03328 [pdf, ps, other]

Negatively dependent optimal risk sharing

Authors: Jean-Gabriel Lauzier, Liyuan Lin, Ruodu Wang

Abstract: We analyze the problem of optimally sharing risk using allocations that exhibit counter-monotonicity, the most extreme form of negative dependence. Counter-monotonic allocations take the form of either "winner-takes-all" lotteries or "loser-loses-all" lotteries, and we respectively refer to these (normalized) cases as jackpot or scapegoat allocations. Our main theorem, the counter-monotonic improvement theorem, states that for a given set of random variables that are either all bounded from below or all bounded from above, one can always find a set of counter-monotonic random variables such that each component is greater than or equal to its counterpart in the convex order. We show that Pareto optimal allocations, if they exist, must be jackpot allocations when all agents are risk seeking. We essentially obtain the opposite when all agents have discontinuous Bernoulli utility functions, as scapegoat allocations maximize the probability of being above the discontinuity threshold. We also consider the case of rank-dependent expected utility (RDU) agents and find conditions which guarantee that RDU agents prefer jackpot allocations. We provide an application for the mining of cryptocurrencies and show that in contrast to risk-averse miners, RDU miners with small computing power never join a mining pool. Finally, we characterize the competitive equilibria with risk-seeking agents, providing a first and second fundamental theorem of welfare economics where all equilibrium allocations are jackpot allocations.

Submitted 6 January, 2024; originally announced January 2024.

Comments: 35 pages, 1 figure, Keywords: Pareto optimality, Risk sharing, Counter-monotonicity, Risk seeking, Rank-dependent expected utility, Cryptocurrency mining pools

arXiv:2401.03221 [pdf, other]

MirrorDiffusion: Stabilizing Diffusion Process in Zero-shot Image Translation by Prompts Redescription and Beyond

Authors: Yupei Lin, Xiaoyu Xian, Yukai Shi, Liang Lin

Abstract: Recently, text-to-image diffusion models have become a new paradigm in image processing fields, including content generation, image restoration and image-to-image translation. Given a target prompt, Denoising Diffusion Probabilistic Models (DDPM) are able to generate realistic yet eligible images. With this appealing property, the image translation task has the potential to be free from target image samples for supervision. By using a target text prompt for domain adaption, the diffusion model is able to implement zero-shot image-to-image translation advantageously. However, the sampling and inversion processes of DDPM are stochastic, and thus the inversion process often fails to reconstruct the input content. Specifically, the displacement effect gradually accumulates during the diffusion and inversion processes, which leads to the reconstructed results deviating from the source domain. To make reconstruction explicit, we propose a prompt redescription strategy to realize a mirror effect between the source and reconstructed image in the diffusion model (MirrorDiffusion). More specifically, a prompt redescription mechanism is investigated to align the text prompts with latent code at each time step of the Denoising Diffusion Implicit Models (DDIM) inversion to pursue a structure-preserving reconstruction. With the revised DDIM inversion, MirrorDiffusion is able to realize accurate zero-shot image translation by editing optimized text prompts and latent code. Extensive experiments demonstrate that MirrorDiffusion achieves superior performance over the state-of-the-art methods on zero-shot image translation benchmarks by clear margins and practical model stability.

Submitted 6 January, 2024; originally announced January 2024.

Comments: A prompt re-description strategy is proposed for stabilizing the diffusion model in image-to-image translation. Code and dataset page: https://mirrordiffusion.github.io/

arXiv:2401.02913 [pdf, other]

Plug-in Diffusion Model for Sequential Recommendation

Authors: Haokai Ma, Ruobing Xie, Lei Meng, Xin Chen, Xu Zhang, Leyu Lin, Zhanhui Kang

Abstract: Pioneering efforts have verified the effectiveness of diffusion models in exploring the informative uncertainty for recommendation. Considering the difference between recommendation and image synthesis tasks, existing methods have undertaken tailored refinements to the diffusion and reverse process. However, these approaches typically use the highest-score item in the corpus for user interest prediction, ignoring the user's generalized preference contained within other items and thereby remaining constrained by the data sparsity issue. To address this issue, this paper presents a novel Plug-in Diffusion Model for Recommendation (PDRec) framework, which employs the diffusion model as a flexible plugin to jointly take full advantage of the diffusion-generated user preferences on all items. Specifically, PDRec first infers the users' dynamic preferences on all items via a time-interval diffusion model and proposes a Historical Behavior Reweighting (HBR) mechanism to identify the high-quality behaviors and suppress noisy behaviors. In addition to the observed items, PDRec proposes a Diffusion-based Positive Augmentation (DPA) strategy to leverage the top-ranked unobserved items as the potential positive samples, bringing in informative and diverse soft signals to alleviate data sparsity. To alleviate the false negative sampling issue, PDRec employs Noise-free Negative Sampling (NNS) to select stable negative samples for ensuring effective model optimization. Extensive experiments and analyses on four datasets have verified the superiority of the proposed PDRec over the state-of-the-art baselines and showcased the universality of PDRec as a flexible plugin for commonly-used sequential encoders in different recommendation scenarios. The code is available at https://github.com/hulkima/PDRec.

Submitted 5 January, 2024; originally announced January 2024.

Comments: Accepted by AAAI 2024

arXiv:2401.02901 [pdf, other]

Charged-current non-standard neutrino interactions at Daya Bay

Authors: Daya Bay collaboration, F. P. An, W. D. Bai, A. B. Balantekin, M. Bishai, S. Blyth, G. F. Cao, J. Cao, J. F. Chang, Y. Chang, H. S. Chen, H. Y. Chen, S. M. Chen, Y. Chen, Y. X. Chen, Z. Y. Chen, J. Cheng, Y. C. Cheng, Z. K. Cheng, J. J. Cherwinka, M. C. Chu, J. P. Cummings, O. Dalager, F. S. Deng, X. Y. Ding, et al. (177 additional authors not shown)

Abstract: The full data set of the Daya Bay reactor neutrino experiment is used to probe the effect of the charged current non-standard interactions (CC-NSI) on neutrino oscillation experiments. Two different approaches are applied and constraints on the corresponding CC-NSI parameters are obtained with the neutrino flux taken from the Huber-Mueller model with a $5\%$ uncertainty. For the quantum mechanics-based approach (QM-NSI), the constraints on the CC-NSI parameters $ε_{eα}$ and $ε_{eα}^{s}$ are extracted with and without the assumption that the effects of the new physics are the same in the production and detection processes, respectively. The approach based on the weak effective field theory (WEFT-NSI) deals with four types of CC-NSI represented by the parameters $[\varepsilon_{X}]_{eα}$. For both approaches, the results for the CC-NSI parameters are shown for cases with various fixed values of the CC-NSI and the Dirac CP-violating phases, and when they are allowed to vary freely. We find that constraints on the QM-NSI parameters $ε_{eα}$ and $ε_{eα}^{s}$ from the Daya Bay experiment alone can reach the order $\mathcal{O}(0.01)$ for the former and $\mathcal{O}(0.1)$ for the latter, while for WEFT-NSI parameters $[\varepsilon_{X}]_{eα}$, we obtain $\mathcal{O}(0.1)$ for both cases.

Submitted 19 March, 2024; v1 submitted 5 January, 2024; originally announced January 2024.

Comments: 25 pages, 16 figures, 6 tables; 36 pages, format changed, references added

arXiv:2401.01369 [pdf, other]

doi 10.1145/3543507.3583313

RL-MPCA: A Reinforcement Learning Based Multi-Phase Computation Allocation Approach for Recommender Systems

Authors: Jiahong Zhou, Shunhui Mao, Guoliang Yang, Bo Tang, Qianlong Xie, Lebin Lin, Xingxing Wang, Dong Wang

Abstract: Recommender systems aim to recommend the most suitable items to users from a large number of candidates. Their computation cost grows as the number of user requests and the complexity of services (or models) increase. Under the limitation of computation resources (CRs), how to make a trade-off between computation cost and business revenue becomes an essential question. The existing studies focus on dynamically allocating CRs in queue truncation scenarios (i.e., allocating the size of candidates), and formulate the CR allocation problem as an optimization problem with constraints. Some of them focus on single-phase CR allocation, and others focus on multi-phase CR allocation but introduce some assumptions about queue truncation scenarios. However, these assumptions do not hold in other scenarios, such as retrieval channel selection and prediction model selection. Moreover, existing studies ignore the state transition process of requests between different phases, limiting the effectiveness of their approaches. This paper proposes a Reinforcement Learning (RL) based Multi-Phase Computation Allocation approach (RL-MPCA), which aims to maximize the total business revenue under the limitation of CRs. RL-MPCA formulates the CR allocation problem as a Weakly Coupled MDP problem and solves it with an RL-based approach. Specifically, RL-MPCA designs a novel deep Q-network to adapt to various CR allocation scenarios, and calibrates the Q-value by introducing multiple adaptive Lagrange multipliers (adaptive-$λ$) to avoid violating the global CR constraints. Finally, experiments on the offline simulation environment and online real-world recommender system validate the effectiveness of our approach.

Submitted 27 December, 2023; originally announced January 2024.

Comments: 11 pages, 7 figures, published to Proceedings of the ACM Web Conference 2023

arXiv:2401.00797 [pdf, other]

Distillation is All You Need for Practically Using Different Pre-trained Recommendation Models

Authors: Wenqi Sun, Ruobing Xie, Junjie Zhang, Wayne Xin Zhao, Leyu Lin, Ji-Rong Wen

Abstract: Pre-trained recommendation models (PRMs) have attracted widespread attention recently. However, their totally different model structures, huge model sizes and computation costs hinder their application in practical recommender systems. Hence, it is essential to explore how to practically utilize PRMs in real-world recommendations. In this paper, we propose a novel joint knowledge distillation from different pre-trained recommendation models named PRM-KD for recommendation, which takes full advantage of diverse PRMs as teacher models for enhancing student models efficiently. Specifically, PRM-KD jointly distills diverse informative knowledge from multiple representative PRMs such as UniSRec, Recformer, and UniM^2Rec. The knowledge from the above PRMs is then integrated into the student recommendation model considering their confidence and consistency. We further verify the universality of PRM-KD with various types of student models, including sequential recommendation, feature interaction, and graph-based models. Extensive experiments on five real-world datasets demonstrate the effectiveness and efficacy of PRM-KD, which could be viewed as an economical shortcut in practically and conveniently making full use of different PRMs in online systems.

Submitted 1 January, 2024; originally announced January 2024.

arXiv:2401.00695 [pdf, other]

Credible Teacher for Semi-Supervised Object Detection in Open Scene

Authors: Jingyu Zhuang, Kuo Wang, Liang Lin, Guanbin Li

Abstract: Semi-Supervised Object Detection (SSOD) has achieved resounding success by leveraging unlabeled data to improve detection performance. However, in Open Scene Semi-Supervised Object Detection (O-SSOD), unlabeled data may contain unknown objects not observed in the labeled data, which will increase uncertainty in the model's predictions for known objects. This is detrimental to current methods that mainly rely on self-training, as more uncertainty leads to lower localization and classification precision of pseudo labels. To this end, we propose Credible Teacher, an end-to-end framework. Credible Teacher adopts an interactive teaching mechanism using flexible labels to prevent uncertain pseudo labels from misleading the model and gradually reduces its uncertainty through the guidance of other credible pseudo labels. Empirical results have demonstrated that our method effectively restrains the adverse effect caused by O-SSOD and significantly outperforms existing counterparts.

Submitted 2 January, 2024; v1 submitted 1 January, 2024; originally announced January 2024.

Comments: Accepted by ICASSP 2024

arXiv:2312.16794 [pdf, other]

ZONE: Zero-Shot Instruction-Guided Local Editing

Authors: Shanglin Li, Bohan Zeng, Yutang Feng, Sicheng Gao, Xuhui Liu, Jiaming Liu, Li Lin, Xu Tang, Yao Hu, Jianzhuang Liu, Baochang Zhang

Abstract: Recent advances in vision-language models like Stable Diffusion have shown remarkable power in creative image synthesis and editing. However, most existing text-to-image editing methods encounter two obstacles: First, the text prompt needs to be carefully crafted to achieve good results, which is not intuitive or user-friendly. Second, they are insensitive to local edits and can irreversibly affect non-edited regions, leaving obvious editing traces. To tackle these problems, we propose a Zero-shot instructiON-guided local image Editing approach, termed ZONE. We first convert the editing intent from the user-provided instruction (e.g., "make his tie blue") into specific image editing regions through InstructPix2Pix. We then propose a Region-IoU scheme for precise image layer extraction from an off-the-shelf segment model. We further develop an edge smoother based on FFT for seamless blending between the layer and the image. Our method allows for arbitrary manipulation of a specific region with a single instruction while preserving the rest. Extensive experiments demonstrate that our ZONE achieves remarkable local editing results and user-friendliness, outperforming state-of-the-art methods. Code is available at https://github.com/lsl001006/ZONE.

Submitted 12 April, 2024; v1 submitted 27 December, 2023; originally announced December 2023.

Comments: Accepted at CVPR 2024

arXiv:2312.16607 [pdf, other]

A Polarization and Radiomics Feature Fusion Network for the Classification of Hepatocellular Carcinoma and Intrahepatic Cholangiocarcinoma

Authors: Jia Dong, Yao Yao, Liyan Lin, Yang Dong, Jiachen Wan, Ran Peng, Chao Li, Hui Ma

Abstract: Classifying hepatocellular carcinoma (HCC) and intrahepatic cholangiocarcinoma (ICC) is a critical step in treatment selection and prognosis evaluation for patients with liver diseases. Traditional histopathological diagnosis poses challenges in this context. In this study, we introduce a novel polarization and radiomics feature fusion network, which combines polarization features obtained from Mueller matrix images of liver pathological samples with radiomics features derived from corresponding pathological images to classify HCC and ICC. Our fusion network integrates a two-tier fusion approach, comprising early feature-level fusion and late classification-level fusion. By harnessing the strengths of polarization imaging techniques and image feature-based machine learning, our proposed fusion network significantly enhances classification accuracy. Notably, even at reduced imaging resolutions, the fusion network maintains robust performance due to the additional information provided by polarization features, which may not align with human visual perception. Our experimental results underscore the potential of this fusion network as a powerful tool for computer-aided diagnosis of HCC and ICC, showcasing the benefits and prospects of integrating polarization imaging techniques into the current image-intensive digital pathological diagnosis. We aim to contribute this innovative approach to top-tier journals, offering fresh insights and valuable tools in the fields of medical imaging and cancer diagnosis. By introducing polarization imaging into liver cancer classification, we demonstrate its interdisciplinary potential in addressing challenges in medical image analysis, promising advancements in medical imaging and cancer diagnosis.

Submitted 27 December, 2023; originally announced December 2023.

arXiv:2312.15314 [pdf, other]

Exact ground state of interacting electrons in magic angle graphene

Authors: Simon Becker, Lin Lin, Kevin D. Stubbs

Abstract: One of the most remarkable theoretical findings in magic angle twisted bilayer graphene (TBG) is the emergence of ferromagnetic Slater determinants as exact ground states for the interacting Hamiltonian at the chiral limit. This discovery provides an explanation for the correlated insulating phase which has been experimentally observed at half filling. This work is the first mathematical study of interacting models in magic angle graphene systems. These include not only TBG but also TBG-like systems featuring four flat bands per valley, and twisted trilayer graphene (TTG) systems with equal twist angles. We identify symmetries of the Bistritzer-MacDonald Hamiltonian that are responsible for characterizing the Hartree-Fock ground states as zero energy many-body ground states. Furthermore, for a general class of Hamiltonian, we establish criteria that the ferromagnetic Slater determinants are the unique ground states within the class of uniformly half-filled, translation invariant Slater determinants. We then demonstrate that these criteria can be explicitly verified for TBG and TBG-like systems at the chiral limit, using properties of Jacobi-$θ$ and Weierstrass-$\wp$ functions.

Submitted 23 December, 2023; originally announced December 2023.

Comments: 50 pages, 5 figures

arXiv:2312.14702 [pdf, other]

doi 10.1093/mnras/stae377

The ALMaQUEST Survey XIV: do radial molecular gas flows affect the star-forming ability of barred galaxies?

Authors: Lucy M. Hogarth, Amélie Saintonge, Tim A. Davis, Sara L. Ellison, Lihwai Lin, Carlos López-Cobá, Hsi-An Pan, Mallory D. Thorp

Abstract: We investigate whether barred galaxies are statistically more likely to harbour radial molecular gas flows and what effect those flows have on their global properties. Using 46 galaxies from the ALMA-MaNGA QUEnching and STar formation (ALMaQUEST) survey, we identify galaxies hosting optical bars using a combination of the morphological classifications in Galaxy Zoo 2 and HyperLEDA. In order to detect radial molecular gas flows, we employ full 3D kinematic modelling of the ALMaQUEST CO(1-0) datacubes. By combining our bar classifications with our radial bar-driven flow detections, we find that galaxies classed as barred are statistically more likely to host large-scale radial gas motions compared to their un-barred and edge-on galaxy counterparts. Moreover, the majority of barred galaxies require multi-component surface brightness profiles in their best-fit models, indicative of the presence of resonance systems. We find that galaxies classed as barred with radial bar-driven flows ("barred + radial flow" subset) have significantly suppressed global star-formation efficiencies compared to barred galaxies without radial bar-driven flows and galaxies in the other morphological sub-samples. Our "barred + radial flow" subset galaxies also possess consistently centrally concentrated molecular gas distributions, with no indication of depleted gas mass fractions, suggesting that gas exhaustion is not the cause of their suppressed star formation.
Furthermore, these objects have higher median gas mass surface densities in their central 1 kpc, implying that central gas enhancements do not fuel central starbursts in these objects. We propose that dynamical effects, such as shear caused by large-scale inflows of gas, act to gravitationally stabilise the inner gas reservoirs.

Submitted 4 February, 2024; v1 submitted 22 December, 2023; originally announced December 2023.

Comments: 20 pages, 20 figures

arXiv:2312.09758 [pdf, other]

Diagnosing and Rectifying Fake OOD Invariance: A Restructured Causal Approach

Authors: Ziliang Chen, Yongsen Zheng, Zhao-Rong Lai, Quanlong Guan, Liang Lin

Abstract: Invariant representation learning (IRL) encourages the prediction from invariant causal features to labels de-confounded from the environments, advancing the technical roadmap of out-of-distribution (OOD) generalization. Despite spotlights around, recent theoretical results verified that some causal features recovered by IRLs merely pretend domain-invariantly in the training environments but fail in unseen domains. The \emph{fake invariance} severely endangers OOD generalization since the trustful objective cannot be diagnosed and existing causal surgeries are invalid to rectify. In this paper, we review an IRL family (InvRat) under the Partially and Fully Informative Invariant Feature Structural Causal Models (PIIF SCM / FIIF SCM) respectively, to certify their weaknesses in representing fake invariant features, then unify their causal diagrams to propose ReStructured SCM (RS-SCM). RS-SCM can ideally rebuild the spurious and the fake invariant features simultaneously. Given this, we further develop an approach based on conditional mutual information with respect to RS-SCM, then rigorously rectify the spurious and fake invariant effects. It can be easily implemented by a small feature selection subnet introduced in the IRL family, which is alternatively optimized to achieve our goal. Experiments verified the superiority of our approach to fight against the fake invariant issue across a variety of OOD generalization benchmarks.

Submitted 15 December, 2023; originally announced December 2023.

Comments: AAAI-2024

arXiv:2312.09501 [pdf, other]

EDA: Evolving and Distinct Anchors for Multimodal Motion Prediction

Authors: Longzhong Lin, Xuewu Lin, Tianwei Lin, Lichao Huang, Rong Xiong, Yue Wang

Abstract: Motion prediction is a crucial task in autonomous driving, and one of its major challenges lands in the multimodality of future behaviors. Many successful works have utilized mixture models which require identification of positive mixture components, and correspondingly fall into two main lines: prediction-based and anchor-based matching. The prediction clustering phenomenon in prediction-based matching makes it difficult to pick representative trajectories for downstream tasks, while the anchor-based matching suffers from a limited regression capability. In this paper, we introduce a novel paradigm, named Evolving and Distinct Anchors (EDA), to define the positive and negative components for multimodal motion prediction based on mixture models. We enable anchors to evolve and redistribute themselves under specific scenes for an enlarged regression capacity. Furthermore, we select distinct anchors before matching them with the ground truth, which results in impressive scoring performance. Our approach enhances all metrics compared to the baseline MTR, particularly with a notable relative reduction of 13.5% in Miss Rate, resulting in state-of-the-art performance on the Waymo Open Motion Dataset. Code is available at https://github.com/Longzhong-Lin/EDA.

Submitted 14 December, 2023; originally announced December 2023.

Comments: Thirty-Eighth AAAI Conference on Artificial Intelligence (AAAI2024)

Showing 101–150 of 1,708 results for author: Lin, L