Search | arXiv e-print repository

Oscillations between Grid-Forming Converters in Weakly Connected Offshore WPPs

Authors: Sulav Ghimire, Kanakesh V. Kkuni, Gabriel M. G. Guerreiro, Emerson D. Guest, Kim H. Jensen, Guangya Yang

Abstract: This paper studies control interactions between grid-forming (GFM) converters exhibited by power and frequency oscillations in a weakly connected offshore wind power plant (WPP). Two GFM controls are considered, namely virtual synchronous machine (VSM) and virtual admittance (VAdm) based GFM. The GFM control methods are implemented in wind turbine generators (WTGs) of a verified aggregated model of a WPP and the control interaction between these GFM WTGs is studied for several cases: cases with the same GFM control methods, and cases with different GFM control methods. A sensitivity analysis is performed for the observed oscillations to understand which system parameter affects the oscillations the most. Several solution methods are proposed and the inapplicability of some of the conventional solution methods is elaborated in this paper.

Submitted 22 February, 2024; originally announced February 2024.

arXiv:2402.13614 [pdf, other]

Developing a $μ$Bq/m$^{3}$ level $^{226}$Ra concentration in water measurement system for the Jiangmen Underground Neutrino Observatory

Authors: C. Li, B. Wang, Y. Liu, C. Guo, Y. P. Zhang, J. C. Liu, Q. Tang, T. Y. Guan, C. G. Yang

Abstract: The Jiangmen Underground Neutrino Observatory (JUNO), a 20~kton multi-purpose low background Liquid Scintillator (LS) detector, was proposed primarily to determine the neutrino mass ordering. To suppress the radioactivity from the surrounding rocks and tag cosmic muons, the JUNO central detector is submerged in a Water Cherenkov Detector (WCD). In addition to being used in the WCD, ultrapure water is used in LS filling, for which the $^{226}$Ra concentration in water needs to be less than 50~$μ$Bq/m$^3$. To precisely measure the $^{226}$Ra concentration in water, a 6.0~$μ$Bq/m$^3$ $^{226}$Ra concentration in water measurement system has been developed. In this paper, the detail of the measurement system as well as the $^{226}$Ra concentration measurement result in regular EWII ultrapure water will be presented.

Submitted 21 February, 2024; originally announced February 2024.

Comments: 16 pages, 7 figures

arXiv:2402.12100 [pdf, other]

Groot: Adversarial Testing for Generative Text-to-Image Models with Tree-based Semantic Transformation

Authors: Yi Liu, Guowei Yang, Gelei Deng, Feiyue Chen, Yuqi Chen, Ling Shi, Tianwei Zhang, Yang Liu

Abstract: With the prevalence of text-to-image generative models, their safety becomes a critical concern. Adversarial testing techniques have been developed to probe whether such models can be prompted to produce Not-Safe-For-Work (NSFW) content. However, existing solutions face several challenges, including low success rate and inefficiency. We introduce Groot, the first automated framework leveraging tree-based semantic transformation for adversarial testing of text-to-image models. Groot employs semantic decomposition and sensitive element drowning strategies in conjunction with LLMs to systematically refine adversarial prompts. Our comprehensive evaluation confirms the efficacy of Groot, which not only exceeds the performance of current state-of-the-art approaches but also achieves a remarkable success rate (93.66%) on leading text-to-image models such as DALL-E 3 and Midjourney.

Submitted 19 February, 2024; originally announced February 2024.

arXiv:2402.12076 [pdf, other]

Periodic Implicit Representation, Design and Optimization of Porous Structures Using Periodic B-splines

Authors: Gao Depeng, Gao Yang, Lin Hongwei

Abstract: Porous structures are intricate solid materials with numerous small pores, extensively used in fields like medicine, chemical engineering, and aerospace. However, the design of such structures using computer-aided tools is a time-consuming and tedious process. In this study, we propose a novel representation method and design approach for porous units that can be infinitely spliced to form a porous structure. We use periodic B-spline functions to represent periodic or symmetric porous units. Starting from a voxel representation of a porous sample, the discrete distance field is computed. To fit the discrete distance field with a periodic B-spline, we introduce the constrained least squares progressive-iterative approximation algorithm, which results in an implicit porous unit. This unit can be subject to optimization to enhance connectivity and utilized for topology optimization, thereby improving the model's stiffness while maintaining periodicity or symmetry. The experimental results demonstrate the potential of the designed complex porous units in enhancing the mechanical performance of the model. Consequently, this study has the potential to incorporate remarkable structures derived from artificial design or nature into the design of high-performing models, showing promise for biomimetic applications.

Submitted 19 February, 2024; originally announced February 2024.

Comments: 38 pages, 12 figures

arXiv:2402.10776 [pdf]

doi 10.1109/ACCESS.2019.2904788

In-Vivo Hyperspectral Human Brain Image Database for Brain Cancer Detection

Authors: H. Fabelo, S. Ortega, A. Szolna, D. Bulters, J. F. Pineiro, S. Kabwama, A. Shanahan, H. Bulstrode, S. Bisshopp, B. R. Kiran, D. Ravi, R. Lazcano, D. Madronal, C. Sosa, C. Espino, M. Marquez, M. De la Luz Plaza, R. Camacho, D. Carrera, M. Hernandez, G. M. Callico, J. Morera, B. Stanciulescu, G. Z. Yang, R. Salvador, et al. (3 additional authors not shown)

Abstract: The use of hyperspectral imaging for medical applications has become more common in recent years. One of the main obstacles that researchers find when developing hyperspectral algorithms for medical applications is the lack of specific, publicly available hyperspectral medical data. The work described in this paper was developed within the framework of the European project HELICoiD (HypErspectraL Imaging Cancer Detection), which had as a main goal the application of hyperspectral imaging to the delineation of brain tumors in real-time during neurosurgical operations. In this paper, the methodology followed to generate the first hyperspectral database of in-vivo human brain tissues is presented. Data was acquired employing a customized hyperspectral acquisition system capable of capturing information in the Visual and Near InfraRed (VNIR) range from 400 to 1000 nm. Repeatability was assessed for the cases where two images of the same scene were captured consecutively. The analysis reveals that the system works more efficiently in the spectral range between 450 and 900 nm. A total of 36 hyperspectral images from 22 different patients were obtained. From these data, more than 300 000 spectral signatures were labeled employing a semi-automatic methodology based on the spectral angle mapper algorithm. Four different classes were defined: normal tissue, tumor tissue, blood vessel, and background elements. All the hyperspectral data has been made available in a public repository.

Submitted 16 February, 2024; originally announced February 2024.

Comments: 19 pages, 12 figures

Journal ref: IEEE Access, 2019, 7, pp. 39098-39116

arXiv:2402.08846 [pdf, other]

An Embarrassingly Simple Approach for LLM with Strong ASR Capacity

Authors: Ziyang Ma, Guanrou Yang, Yifan Yang, Zhifu Gao, Jiaming Wang, Zhihao Du, Fan Yu, Qian Chen, Siqi Zheng, Shiliang Zhang, Xie Chen

Abstract: In this paper, we focus on solving one of the most important tasks in the field of speech processing, i.e., automatic speech recognition (ASR), with speech foundation encoders and large language models (LLM). Recent works have complex designs such as compressing the output temporally for the speech encoder, tackling modal alignment for the projector, and utilizing parameter-efficient fine-tuning for the LLM. We found that delicate designs are not necessary, while an embarrassingly simple composition of off-the-shelf speech encoder, LLM, and the only trainable linear projector is competent for the ASR task. To be more specific, we benchmark and explore various combinations of LLMs and speech encoders, leading to the optimal LLM-based ASR system, which we call SLAM-ASR. The proposed SLAM-ASR provides a clean setup and little task-specific design, where only the linear projector is trained. To the best of our knowledge, SLAM-ASR achieves the best performance on the Librispeech benchmark among LLM-based ASR models and even outperforms the latest LLM-based audio-universal model trained on massive pair data. Finally, we explore the capability emergence of LLM-based ASR in the process of modal alignment. We hope that our study can facilitate the research on extending LLM with cross-modality capacity and shed light on the LLM-based ASR community.

Submitted 13 February, 2024; originally announced February 2024.

Comments: Work in progress; will be open-sourced soon

arXiv:2402.07403 [pdf]

Make it more specific: A novel uncertainty based airway segmentation application on 3D U-Net and its variants

Authors: Shiyi Wang, Yang Nan, Felder Federico N, Sheng Zhang, Walsh Simon L F, Guang Yang

Abstract: Each medical segmentation task should be considered with a specific AI algorithm based on its scenario so that the most accurate prediction model can be obtained. The most popular algorithms in medical segmentation, 3D U-Net and its variants, can directly implement the task of lung trachea segmentation, but their failure to consider the special tree-like structure of the trachea suggests that there is much room for improvement in segmentation accuracy. Therefore, a research gap exists because a great number of state-of-the-art DL algorithms are vanilla 3D U-Net structures, which do not introduce the various performance-enhancing modules that come with special natural image modality in lung airway segmentation. In this paper, we propose two different network structures, Branch-Level U-Net (B-UNet) and Branch-Level CE-UNet (B-CE-UNet), which are based on the U-Net structure, and compare their prediction results on the same dataset. Specifically, both networks add branch loss and central line loss to learn the features of fine branch endings of the airways. Uncertainty estimation algorithms are also included to attain confident predictions and thereby increase the overall trustworthiness of our whole model. In addition, predictions of the lung trachea based on the maximum connectivity rate were calculated and extracted during post-processing for segmentation refinement and pruning.

Submitted 11 February, 2024; originally announced February 2024.

arXiv:2402.07192 [pdf]

doi 10.1371/journal.pone.0193721

Spatio-spectral classification of hyperspectral images for brain cancer detection during surgical operations

Authors: H. Fabelo, S. Ortega, D. Ravi, B. R. Kiran, C. Sosa, D. Bulters, G. M. Callico, H. Bulstrode, A. Szolna, J. F. Pineiro, S. Kabwama, D. Madronal, R. Lazcano, A. J. OShanahan, S. Bisshopp, M. Hernandez, A. Baez-Quevedo, G. Z. Yang, B. Stanciulescu, R. Salvador, E. Juarez, R. Sarmiento

Abstract: Surgery for brain cancer is a major problem in neurosurgery. The diffuse infiltration into the surrounding normal brain by these tumors makes their accurate identification by the naked eye difficult. Since surgery is the common treatment for brain cancer, an accurate radical resection of the tumor leads to improved survival rates for patients. However, the identification of the tumor boundaries during surgery is challenging. Hyperspectral imaging is a noncontact, non-ionizing and non-invasive technique suitable for medical diagnosis. This study presents the development of a novel classification method taking into account the spatial and spectral characteristics of the hyperspectral images to help neurosurgeons accurately determine the tumor boundaries in surgical-time during the resection, avoiding excessive excision of normal tissue or unintentionally leaving residual tumor. The algorithm proposed in this study to approach an efficient solution consists of a hybrid framework that combines both supervised and unsupervised machine learning methods. To evaluate the proposed approach, five hyperspectral images of the surface of the brain affected by glioblastoma tumor in vivo from five different patients have been used. The final classification maps obtained have been analyzed and validated by specialists. These preliminary results are promising, obtaining an accurate delineation of the tumor area.

Submitted 11 February, 2024; originally announced February 2024.

arXiv:2402.05383 [pdf, other]

First measurement of the yield of $^8$He isotopes produced in liquid scintillator by cosmic-ray muons at Daya Bay

Authors: Daya Bay Collaboration, F. P. An, W. D. Bai, A. B. Balantekin, M. Bishai, S. Blyth, G. F. Cao, J. Cao, J. F. Chang, Y. Chang, H. S. Chen, H. Y. Chen, S. M. Chen, Y. Chen, Y. X. Chen, Z. Y. Chen, J. Cheng, Y. C. Cheng, Z. K. Cheng, J. J. Cherwinka, M. C. Chu, J. P. Cummings, O. Dalager, F. S. Deng, X. Y. Ding, et al. (177 additional authors not shown)

Abstract: Daya Bay presents the first measurement of cosmogenic $^8$He isotope production in liquid scintillator, using an innovative method for identifying cascade decays of $^8$He and its child isotope, $^8$Li. We also measure the production yield of $^9$Li isotopes using well-established methodology. The results, in units of 10$^{-8}μ^{-1}$g$^{-1}$cm$^{2}$, are 0.307$\pm$0.042, 0.341$\pm$0.040, and 0.546$\pm$0.076 for $^8$He, and 6.73$\pm$0.73, 6.75$\pm$0.70, and 13.74$\pm$0.82 for $^9$Li at average muon energies of 63.9~GeV, 64.7~GeV, and 143.0~GeV, respectively. The measured production rate of $^8$He isotopes is more than an order of magnitude lower than any other measurement of cosmogenic isotope production. It replaces the results of previous attempts to determine the ratio of $^8$He to $^9$Li production that yielded a wide range of limits from 0 to 30\%. The results provide future liquid-scintillator-based experiments with improved ability to predict cosmogenic backgrounds.

Submitted 7 February, 2024; originally announced February 2024.

arXiv:2402.03541 [pdf, other]

HAMLET: Graph Transformer Neural Operator for Partial Differential Equations

Authors: Andrey Bryutkin, Jiahao Huang, Zhongying Deng, Guang Yang, Carola-Bibiane Schönlieb, Angelica Aviles-Rivero

Abstract: We present a novel graph transformer framework, HAMLET, designed to address the challenges in solving partial differential equations (PDEs) using neural networks. The framework uses graph transformers with modular input encoders to directly incorporate differential equation information into the solution process. This modularity enhances parameter correspondence control, making HAMLET adaptable to PDEs of arbitrary geometries and varied input formats. Notably, HAMLET scales effectively with increasing data complexity and noise, showcasing its robustness. HAMLET is not just tailored to a single type of physical simulation, but can be applied across various domains. Moreover, it boosts model resilience and performance, especially in scenarios with limited data. We demonstrate, through extensive experiments, that our framework is capable of outperforming current techniques for PDEs.

Submitted 5 February, 2024; originally announced February 2024.

Comments: 17 pages, 7 figures, 6 tables

arXiv:2402.03473 [pdf, other]

Assessing the Efficacy of Invisible Watermarks in AI-Generated Medical Images

Authors: Xiaodan Xing, Huiyu Zhou, Yingying Fang, Guang Yang

Abstract: AI-generated medical images are gaining growing popularity due to their potential to address the data scarcity challenge in the real world. However, the issue of accurate identification of these synthetic images, particularly when they exhibit remarkable realism with their real copies, remains a concern. To mitigate this challenge, image generators such as DALLE and Imagen have integrated digital watermarks aimed at facilitating the discernment of synthetic images' authenticity. These watermarks are embedded within the image pixels and are invisible to the human eye while remaining detectable. Nevertheless, a comprehensive investigation into the potential impact of these invisible watermarks on the utility of synthetic medical images has been lacking. In this study, we propose the incorporation of invisible watermarks into synthetic medical images and seek to evaluate their efficacy in the context of downstream classification tasks. Our goal is to pave the way for discussions on the viability of such watermarks in boosting the detectability of synthetic medical images, fortifying ethical standards, and safeguarding against data pollution and potential scams.

Submitted 21 May, 2024; v1 submitted 5 February, 2024; originally announced February 2024.

Comments: 5 pages

Journal ref: ISBI 2024

arXiv:2402.01434 [pdf, other]

Conditioning non-linear and infinite-dimensional diffusion processes

Authors: Elizabeth Louise Baker, Gefan Yang, Michael L. Severinsen, Christy Anna Hipsley, Stefan Sommer

Abstract: Generative diffusion models and many stochastic models in science and engineering naturally live in infinite dimensions before discretisation. To incorporate observed data for statistical and learning tasks, one needs to condition on observations. While recent work has treated conditioning linear processes in infinite dimensions, conditioning non-linear processes in infinite dimensions has not been explored. This paper conditions function-valued stochastic processes without prior discretisation. To do so, we use an infinite-dimensional version of Girsanov's theorem to condition a function-valued stochastic process, leading to a stochastic differential equation (SDE) for the conditioned process involving the score. We apply this technique to do time series analysis for shapes of organisms in evolutionary biology, where we discretise via the Fourier basis and then learn the coefficients of the score function with score matching methods.

Submitted 2 February, 2024; originally announced February 2024.

arXiv:2401.17001 [pdf, other]

A Framework of Data Assimilation for Wind Flow Fields by Physics-informed Neural Networks

Authors: Chang Yan, Shengfeng Xu, Zhenxu Sun, Thorsten Lutz, Dilong Guo, Guowei Yang

Abstract: Various types of measurement techniques, such as Light Detection and Ranging (LiDAR) devices, anemometers, and wind vanes, are extensively utilized in wind energy to characterize the inflow. However, these methods typically gather data at limited points within local wind fields, capturing only a fraction of the wind field's characteristics at wind turbine sites, thus hindering detailed wind field analysis. This study introduces a framework using Physics-informed Neural Networks to assimilate diverse sensor data types. This includes line-of-sight wind speed, velocity magnitude and direction, velocity components, and pressure. Moreover, the parameterized Navier-Stokes equations are integrated as physical constraints, ensuring that the neural networks accurately represent atmospheric flow dynamics. The framework accounts for the turbulent nature of atmospheric boundary layer flow by including artificial eddy viscosity in the network outputs, enhancing the model's ability to learn and accurately depict large-scale flow structures. The reconstructed flow field and the effective wind speed are in good agreement with the actual data. Furthermore, a transfer learning strategy is employed for the online deployment of pre-trained PINN, which requires less time than that of the actual physical flow. This capability allows the framework to reconstruct wind flow fields in real time based on live data. In the demo cases, the maximum error between the effective wind speed reconstructed online and the actual value at the wind turbine site is only 3.7%.
The proposed data assimilation framework provides a universal tool for reconstructing spatiotemporal wind flow fields using various measurement data. Additionally, it presents a viable approach for the online assimilation of real-time measurements. To facilitate the utilization of wind energy, our framework's source code is openly accessible.

Submitted 11 May, 2024; v1 submitted 30 January, 2024; originally announced January 2024.

arXiv:2401.16564 [pdf]

Data and Physics driven Deep Learning Models for Fast MRI Reconstruction: Fundamentals and Methodologies

Authors: Jiahao Huang, Yinzhe Wu, Fanwen Wang, Yingying Fang, Yang Nan, Cagan Alkan, Lei Xu, Zhifan Gao, Weiwen Wu, Lei Zhu, Zhaolin Chen, Peter Lally, Neal Bangerter, Kawin Setsompop, Yike Guo, Daniel Rueckert, Ge Wang, Guang Yang

Abstract: Magnetic Resonance Imaging (MRI) is a pivotal clinical diagnostic tool, yet its extended scanning times often compromise patient comfort and image quality, especially in volumetric, temporal and quantitative scans. This review elucidates recent advances in MRI acceleration via data and physics-driven models, leveraging techniques from algorithm unrolling models, enhancement-based models, and plug-and-play models to the emerging full spectrum of generative models. We also explore the synergistic integration of data models with physics-based insights, encompassing the advancements in multi-coil hardware accelerations like parallel imaging and simultaneous multi-slice imaging, and the optimization of sampling patterns. We then focus on domain-specific challenges and opportunities, including image redundancy exploitation, image integrity, evaluation metrics, data heterogeneity, and model generalization. This work also discusses potential solutions and future research directions, emphasizing the role of data harmonization and federated learning for further improving the general applicability and performance of these methods in MRI reconstruction.

Submitted 29 January, 2024; originally announced January 2024.

arXiv:2401.14917 [pdf, other]

Direct WIMP detection rates for transitions in isomeric nuclei

Authors: M. V Smirnov, G. Yang, Yu. N. Novikov, J. D. Vergados, D. Bonatsos

Abstract: The direct detection of dark matter constituents, in particular the weakly interacting massive particles (WIMPs), is central to particle physics and cosmology. In this paper we study WIMP-induced transitions from isomeric nuclear states for two possible isomeric candidates: $\rm^{180}Ta$ and $\rm^{166}Ho$. An experimental setup that can measure the possible decay of $\rm^{180}Ta$ induced by WIMPs is proposed. The corresponding estimates of the half-life of $\rm^{180}Ta$ are given in the sense that the WIMP-nucleon interaction can be interpreted as ordinary radioactive decay.

Submitted 20 June, 2024; v1 submitted 26 January, 2024; originally announced January 2024.

arXiv:2401.13564 [pdf, ps, other]

RIS Empowered Near-Field Covert Communications

Authors: Jun Liu, Gang Yang, Yuanwei Liu, Xiangyun Zhou

Abstract: This paper studies an extremely large-scale reconfigurable intelligent surface (XL-RIS) empowered covert communication system in the near-field region. Alice covertly transmits messages to Bob with the assistance of the XL-RIS, while evading detection by Willie. To enhance the covert communication performance, we maximize the achievable covert rate by jointly optimizing the hybrid analog and digital beamformers at Alice, as well as the reflection coefficient matrix at the XL-RIS. An alternating optimization algorithm is proposed to solve the joint beamforming design problem. For the hybrid beamformer design, a semi-closed-form solution for fully digital beamformer is first obtained by a weighted minimum mean-square error based algorithm, then the baseband digital and analog beamformers at Alice are designed by approximating the fully digital beamformer via manifold optimization. For the XL-RIS's reflection coefficient matrix design, a low-complexity alternating direction method of multipliers based algorithm is proposed to address the challenge of large-scale variables and unit-modulus constraints.
Numerical results unveil that i) the near-field communications can achieve a higher covert rate than the far-field covert communications in general, and still realize covert transmission even if Willie is located at the same direction as Bob and closer to the XL-RIS; ii) the proposed algorithm can enhance the covert rate significantly compared to the benchmark schemes; iii) the proposed algorithm leads to a beam diffraction pattern that can bypass Willie and achieve high-rate covert transmission to Bob.

Submitted 24 January, 2024; originally announced January 2024.

Comments: 15 pages, 8 figures, submitted to IEEE journal

arXiv:2401.13055 [pdf, other]

Investigating the Star Formation Rates of AGN Hosts Relative to the Star-Forming Main Sequence

Authors: Nathan Cristello, Fan Zou, W. N. Brandt, Chien-Ting J. Chen, Joel Leja, Qingling Ni, Guang Yang

Abstract: A fundamental question in galaxy and black-hole evolution remains how galaxies and their supermassive black holes have evolved together over cosmic time. Specifically, it is still unclear how the position of X-ray active galactic nucleus (AGN) host galaxies with respect to the star-forming main sequence (MS) may change with the X-ray luminosity ($L_\mathrm{X}$) of the AGN or the stellar mass ($M_\star$) of the host galaxy. We use data from XMM-SERVS to probe this issue. XMM-SERVS is covered by the largest medium-depth X-ray survey (with superb supporting multiwavelength data) and thus contains the largest sample to date for study. To ensure consistency, we locally derive the MS from a large reference galaxy sample. In our analysis, we demonstrate that the turnover of the galaxy MS does not allow reliable conclusions to be drawn for high-mass AGNs, and we establish a robust safe regime where the results do not depend upon the choice of MS definition. Under this framework, our results indicate that less-massive AGN host-galaxies ($\log M_\star\sim9.5-10.5$ $M_\odot$) generally possess enhanced SFRs compared to their normal-galaxy counterparts while the more-massive AGN host galaxies ($\log M_\star\sim10.5-11.5$ $M_\odot$) lie on or below the star-forming MS. Further, we propose an empirical model for how the placement of an AGN with respect to the MS (SFR$_{norm}$) evolves as a function of both $M_\star$ and $L_\mathrm{X}$.

Submitted 23 January, 2024; originally announced January 2024.

Comments: 15 pages, 10 figures, 1 table, accepted for publication in ApJ

arXiv:2401.09065 [pdf, other]

A unified scheme for calculating the exclusive semi-leptonic decays of hadrons

Authors: Guo-He Yang

Abstract: Exclusive semi-leptonic decay stands as a pivotal channel in the exploration of heavy flavor physics, primarily due to its straightforward experimental measurability. In this work, we delve into the hadron matrix element and hadron tensor within the context of exclusive semi-leptonic decays. We challenge the conventional exclusive decay theory by introducing a fresh perspective, revealing that while the baryon sector is consistent, the meson sector warrants revision. Employing a novel form factor derived from the Taylor series expansion in our differential width calculations, we demonstrate that fitting experimental data necessitates a more streamlined and minimal parameter set compared to the standard Light Cone Sum Rule (LCSR) form factors. Furthermore, we propose that our hypothesis can be empirically validated or refuted through the precise measurement of the double differential decay width, offering a tangible path forward for experimental validation.

Submitted 23 June, 2024; v1 submitted 17 January, 2024; originally announced January 2024.

arXiv:2401.08079 [pdf, other]

Adversarial Masking Contrastive Learning for vein recognition

Authors: Huafeng Qin, Yiquan Wu, Mounim A. El-Yacoubi, Jun Wang, Guangxiang Yang

Abstract: Vein recognition has received increasing attention due to its high security and privacy. Recently, deep neural networks such as convolutional neural networks (CNNs) and Transformers have been introduced for vein recognition and achieved state-of-the-art performance. Despite the recent advances, however, existing solutions for finger-vein feature extraction are still not optimal due to scarce training image samples. To overcome this problem, in this paper, we propose an adversarial masking contrastive learning (AMCL) approach that generates challenging samples to train a more robust contrastive learning model for the downstream palm-vein recognition task, by alternately optimizing the encoder in the contrastive learning model and a set of latent variables. First, a huge number of masks are generated to train a robust generative adversarial network (GAN). The trained generator transforms a latent variable from the latent variable space into a mask space. Then, we combine the trained generator with a contrastive learning model to obtain our AMCL, where the generator produces challenging masking images to increase the contrastive loss and the contrastive learning model is trained based on the harder images to learn a more robust feature representation. After training, the trained encoder in the contrastive learning model is combined with a classification layer to build a classifier, which is further fine-tuned on labeled training data for vein recognition. The experimental results on three databases demonstrate that our approach outperforms existing contrastive learning approaches in terms of improving identification accuracy of vein classifiers and achieves state-of-the-art recognition results.

Submitted 15 January, 2024; originally announced January 2024.

arXiv:2401.04092 [pdf, other]

GPT-4V(ision) is a Human-Aligned Evaluator for Text-to-3D Generation

Authors: Tong Wu, Guandao Yang, Zhibing Li, Kai Zhang, Ziwei Liu, Leonidas Guibas, Dahua Lin, Gordon Wetzstein

Abstract: Despite recent advances in text-to-3D generative methods, there is a notable absence of reliable evaluation metrics. Existing metrics usually focus on a single criterion each, such as how well the asset is aligned with the input text. These metrics lack the flexibility to generalize to different evaluation criteria and might not align well with human preferences. Conducting user preference studies is an alternative that offers both adaptability and human-aligned results. User studies, however, can be very expensive to scale. This paper presents an automatic, versatile, and human-aligned evaluation metric for text-to-3D generative models. To this end, we first develop a prompt generator using GPT-4V to generate evaluating prompts, which serve as input to compare text-to-3D models. We further design a method instructing GPT-4V to compare two 3D assets according to user-defined criteria. Finally, we use these pairwise comparison results to assign these models Elo ratings. Experimental results suggest our metric strongly aligns with human preference across different evaluation criteria.

Submitted 9 January, 2024; v1 submitted 8 January, 2024; originally announced January 2024.

Comments: Project page: https://gpteval3d.github.io/ ; Code: https://github.com/3DTopia/GPTEval3D
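
Illustrative note on the Elo step mentioned in the abstract above: the following is a minimal sketch of the classic sequential Elo update applied to pairwise comparison outcomes. It is not taken from the paper, which may fit ratings differently; the function names, the K-factor of 32, the initial rating of 1000, and the model labels are all assumptions chosen for illustration.

```python
def expected_score(rating_a, rating_b):
    """Probability that A beats B under the standard logistic Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(ratings, comparisons, k=32.0, initial=1000.0):
    """Apply sequential Elo updates and return a new ratings dict.

    comparisons: iterable of (model_a, model_b, score_a), where score_a is
    1.0 if A was preferred, 0.0 if B was preferred, and 0.5 for a tie.
    k and initial are hypothetical defaults, not values from the paper.
    """
    ratings = dict(ratings)  # copy so the caller's dict is not mutated
    for model_a, model_b, score_a in comparisons:
        r_a = ratings.setdefault(model_a, initial)
        r_b = ratings.setdefault(model_b, initial)
        e_a = expected_score(r_a, r_b)
        # The two updates are equal and opposite, so total rating is conserved.
        ratings[model_a] = r_a + k * (score_a - e_a)
        ratings[model_b] = r_b - k * (score_a - e_a)
    return ratings


if __name__ == "__main__":
    # Hypothetical pairwise outcomes between three placeholder models.
    outcomes = [("model_x", "model_y", 1.0),
                ("model_y", "model_z", 0.5),
                ("model_x", "model_z", 1.0)]
    for name, rating in sorted(update_elo({}, outcomes).items()):
        print(name, round(rating, 1))
```

Sequential updates depend on the order of comparisons; a batch fit over all outcomes would avoid that, at the cost of a slightly longer implementation.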

arXiv:2401.03179 [pdf, other]

Multimodal Informative ViT: Information Aggregation and Distribution for Hyperspectral and LiDAR Classification

Authors: Jiaqing Zhang, Jie Lei, Weiying Xie, Geng Yang, Daixun Li, Yunsong Li

Abstract: In multimodal land cover classification (MLCC), a common challenge is the redundancy in data distribution, where irrelevant information from multiple modalities can hinder the effective integration of their unique features. To tackle this, we introduce the Multimodal Informative ViT (MIVit), a system with an innovative information aggregate-distributing mechanism. This approach redefines redundancy levels and integrates performance-aware elements into the fused representation, facilitating the learning of semantics in both forward and backward directions. MIVit stands out by significantly reducing redundancy in the empirical distribution of each modality's separate and fused features. It employs oriented attention fusion (OAF) for extracting shallow local features across modalities in horizontal and vertical dimensions, and a Transformer feature extractor for extracting deep global features through long-range attention. We also propose an information aggregation constraint (IAC) based on mutual information, designed to remove redundant information and preserve complementary information within embedded features. Additionally, the information distribution flow (IDF) in MIVit enhances performance-awareness by distributing global classification information across different modalities' feature maps. This architecture also addresses missing modality challenges with lightweight independent modality classifiers, reducing the computational load typically associated with Transformers. Our results show that MIVit's bidirectional aggregate-distributing mechanism between modalities is highly effective, achieving an average overall accuracy of 95.56% across three multimodal datasets. This performance surpasses current state-of-the-art methods in MLCC. The code for MIVit is accessible at https://github.com/icey-zhang/MIViT.

Submitted 23 January, 2024; v1 submitted 6 January, 2024; originally announced January 2024.

arXiv:2401.02901 [pdf, other]

Charged-current non-standard neutrino interactions at Daya Bay

Authors: Daya Bay collaboration, F. P. An, W. D. Bai, A. B. Balantekin, M. Bishai, S. Blyth, G. F. Cao, J. Cao, J. F. Chang, Y. Chang, H. S. Chen, H. Y. Chen, S. M. Chen, Y. Chen, Y. X. Chen, Z. Y. Chen, J. Cheng, Y. C. Cheng, Z. K. Cheng, J. J. Cherwinka, M. C. Chu, J. P. Cummings, O. Dalager, F. S. Deng, X. Y. Ding , et al. (177 additional authors not shown)

Abstract: The full data set of the Daya Bay reactor neutrino experiment is used to probe the effect of the charged current non-standard interactions (CC-NSI) on neutrino oscillation experiments. Two different approaches are applied and constraints on the corresponding CC-NSI parameters are obtained with the neutrino flux taken from the Huber-Mueller model with a $5\%$ uncertainty. For the quantum mechanics-based approach (QM-NSI), the constraints on the CC-NSI parameters $ε_{eα}$ and $ε_{eα}^{s}$ are extracted with and without the assumption that the effects of the new physics are the same in the production and detection processes, respectively. The approach based on the weak effective field theory (WEFT-NSI) deals with four types of CC-NSI represented by the parameters $[\varepsilon_{X}]_{eα}$. For both approaches, the results for the CC-NSI parameters are shown for cases with various fixed values of the CC-NSI and the Dirac CP-violating phases, and when they are allowed to vary freely. We find that constraints on the QM-NSI parameters $ε_{eα}$ and $ε_{eα}^{s}$ from the Daya Bay experiment alone can reach the order $\mathcal{O}(0.01)$ for the former and $\mathcal{O}(0.1)$ for the latter, while for WEFT-NSI parameters $[\varepsilon_{X}]_{eα}$, we obtain $\mathcal{O}(0.1)$ for both cases.

Submitted 19 March, 2024; v1 submitted 5 January, 2024; originally announced January 2024.

Comments: 25 pages, 16 figures, 6 tables; 36 pages, format changed, references added

arXiv:2401.01415 [pdf, other]

Probing a Magnetar Origin for the population of Extragalactic Fast X-ray Transients detected by Chandra

Authors: J. Quirola-Vásquez, F. E. Bauer, P. G. Jonker, W. N. Brandt, D. Eappachen, A. J. Levan, E. Lopez, B. Luo, M. E. Ravasio, H. Sun, Y. Q. Xue, G. Yang, X. C. Zheng

Abstract: Twenty-two extragalactic fast X-ray transients (FXTs) have now been discovered from two decades of Chandra data (analyzing ~259 Ms of data), with 17 associated with distant galaxies (>100 Mpc). Different mechanisms and progenitors have been proposed to explain their properties; nevertheless, after analyzing their timing, spectral parameters, host-galaxy properties, luminosity function, and volumetric rates, their nature remains uncertain. We interpret a sub-sample of nine FXTs that show a plateau or a fast-rise light curve within the framework of a binary neutron star (BNS) merger magnetar model. We fit their light curves and derive magnetar (magnetic field and initial rotational period) and ejecta (ejecta mass and opacity) parameters. This model predicts two zones: an orientation-dependent free zone (where the magnetar spin-down X-ray photons escape freely to the observer) and a trapped zone (where the X-ray photons are initially obscured and only escape freely once the ejecta material becomes optically thin). We argue that six FXTs show properties consistent with the free zone and three FXTs with the trapped zone. This sub-sample of FXTs has a similar distribution of magnetic fields and initial rotation periods to those inferred for short gamma-ray bursts (SGRBs), suggesting a possible association. We compare the predicted ejecta emission fed by the magnetar emission (called merger-nova) to the optical and near-infrared upper limits of two FXTs, XRT 141001 and XRT 210423 where contemporaneous optical observations are available. The non-detections place lower limits on the redshifts of XRT 141001 and XRT 210423 of z>1.5 and >0.1, respectively. If the magnetar remnants lose energy via gravitational waves, it should be possible to detect similar objects with the current advanced LIGO detectors out to a redshift z<0.03, while future GW detectors will be able to detect them out to z=0.5.

Submitted 2 January, 2024; originally announced January 2024.

Comments: The paper was accepted for publication in Astronomy & Astrophysics

arXiv:2401.01369 [pdf, other]

doi 10.1145/3543507.3583313

RL-MPCA: A Reinforcement Learning Based Multi-Phase Computation Allocation Approach for Recommender Systems

Authors: Jiahong Zhou, Shunhui Mao, Guoliang Yang, Bo Tang, Qianlong Xie, Lebin Lin, Xingxing Wang, Dong Wang

Abstract: Recommender systems aim to recommend the most suitable items to users from a large number of candidates. Their computation cost grows as the number of user requests and the complexity of services (or models) increases. Under the limitation of computation resources (CRs), how to make a trade-off between computation cost and business revenue becomes an essential question. The existing studies focus on dynamically allocating CRs in queue truncation scenarios (i.e., allocating the size of candidates), and formulate the CR allocation problem as an optimization problem with constraints. Some of them focus on single-phase CR allocation, and others focus on multi-phase CR allocation but introduce some assumptions about queue truncation scenarios. However, these assumptions do not hold in other scenarios, such as retrieval channel selection and prediction model selection. Moreover, existing studies ignore the state transition process of requests between different phases, limiting the effectiveness of their approaches. This paper proposes a Reinforcement Learning (RL) based Multi-Phase Computation Allocation approach (RL-MPCA), which aims to maximize the total business revenue under the limitation of CRs. RL-MPCA formulates the CR allocation problem as a Weakly Coupled MDP problem and solves it with an RL-based approach. Specifically, RL-MPCA designs a novel deep Q-network to adapt to various CR allocation scenarios, and calibrates the Q-value by introducing multiple adaptive Lagrange multipliers (adaptive-$λ$) to avoid violating the global CR constraints. Finally, experiments on the offline simulation environment and online real-world recommender system validate the effectiveness of our approach.

Submitted 27 December, 2023; originally announced January 2024.

Comments: 11 pages, 7 figures, published to Proceedings of the ACM Web Conference 2023

arXiv:2401.01223 [pdf]

Twinning induced by elastic anisotropy in FCC crystals

Authors: Jie Huang, Mingyu Lei, Guangpeng Sun, Guochun Yang, Bin Wen

Abstract: Dislocation slip and deformation twinning are widely regarded as two important, actively competing mechanisms in the process of plastic deformation. Calculating and comparing the critical resolved shear stress (CRSS) of the two deformation modes is the key to discussing the mechanical properties reflected by different mechanisms in crystals. Here, the paper proposes a model to predict the CRSS of discrete twins, resembling thin layers, using the elastic anisotropy theory and a macroscopic energy perspective. In addition, the directionality of deformation twinning is also verified. We investigated twinning in FCC crystals to illustrate the methodology, and predicted the CRSS of twinning under different variables such as temperature and strain rate, both of which were in excellent agreement with experimental and other theoretical results. It draws the conclusion that we can promote twinning nucleation by applying shear stress along the <112> direction to reduce the interface energy as a resistance term and increase the difference in strain energy for twinning nucleation. This conclusion provides a guiding direction for exploring and accurately predicting the conditions of twinning in FCC crystals in the future.

Submitted 2 January, 2024; originally announced January 2024.

Comments: 20 pages, 4 figures

arXiv:2312.17051 [pdf, other]

FILP-3D: Enhancing 3D Few-shot Class-incremental Learning with Pre-trained Vision-Language Models

Authors: Wan Xu, Tianyu Huang, Tianyu Qu, Guanglei Yang, Yiwen Guo, Wangmeng Zuo

Abstract: Few-shot class-incremental learning (FSCIL) aims to mitigate the catastrophic forgetting issue when a model is incrementally trained on limited data. While the Contrastive Vision-Language Pre-Training (CLIP) model has been effective in addressing 2D few/zero-shot learning tasks, its direct application to 3D FSCIL faces limitations. These limitations arise from feature space misalignment and significant noise in real-world scanned 3D data. To address these challenges, we introduce two novel components: the Redundant Feature Eliminator (RFE) and the Spatial Noise Compensator (SNC). RFE aligns the feature spaces of input point clouds and their embeddings by performing a unique dimensionality reduction on the feature space of pre-trained models (PTMs), effectively eliminating redundant information without compromising semantic integrity. On the other hand, SNC is a graph-based 3D model designed to capture robust geometric information within point clouds, thereby augmenting the knowledge lost due to projection, particularly when processing real-world scanned data. Considering the imbalance in existing 3D datasets, we also propose new evaluation metrics that offer a more nuanced assessment of a 3D FSCIL model. Traditional accuracy metrics are proved to be biased; thus, our metrics focus on the model's proficiency in learning new classes while maintaining the balance between old and new classes. Experimental results on both established 3D FSCIL benchmarks and our dataset demonstrate that our approach significantly outperforms existing state-of-the-art methods.

Submitted 28 December, 2023; originally announced December 2023.

arXiv:2312.16122 [pdf, other]

Magnetic vortex control with current-induced axial magnetization in centrosymmetric Weyl materials

Authors: J. G. Yang, Yaroslav Tserkovnyak, D. A. Pesin

Abstract: We consider magnetic Weyl metals as a platform to achieve current control of magnetization textures with transport currents, utilizing their underlying band geometry. We show that the transport current in a Weyl semimetal produces an axial magnetization due to orbital magnetic moments of the Weyl electrons. The associated axial magnetization can generate a torque acting on the localized magnetic moments. For the case of a magnetic vortex in a nanodisk of Weyl materials, this current-induced torque can be used to reverse its circulation and polarity. We discuss the axial magnetization torques in Weyl metals on general symmetry grounds, and compare their strength to current-induced torques in more conventional materials.

Submitted 26 December, 2023; originally announced December 2023.

arXiv:2312.13799 [pdf]

Expanding the Pressure Frontier in Grüneisen Parameter Measurement: Study of Sodium Chloride

Authors: Jun Kong, Kaiyuan Shi, Xingbang Dong, Xiao Dong, Xin Zhang, Jiaqing Zhang, Lei Su, Guoqiang Yang

Abstract: The Grüneisen parameter (γ) is crucial for determining many thermal properties, including the anharmonic effect, thermostatistics, and equation of state (EOS) of materials. However, the isentropic adiabatic compression conditions required to measure the Grüneisen parameter under high pressure are difficult to achieve. Thus, direct experimental Grüneisen parameter data over a wide range of pressures are sparse. In this work, we developed a new device that can apply pressure (up to tens of GPa) within an extremely short time of about 0.5 ms, confidently achieving isentropic adiabatic compression. Then, we applied our new technique to sodium chloride and measured its Grüneisen parameter, which conforms to previous theoretical predictions. According to our obtained sodium chloride Grüneisen parameters, the calculated Hugoniot curve of the NaCl B1 phase appears up to 20 GPa and 960 K, which compares very well with the shock compression experiment data by Fritz et al. and other calculation works. Our results suggest that this new method can reliably measure the Grüneisen parameter of even more materials, which is significant for researching the equation of state in substances.

Submitted 21 December, 2023; originally announced December 2023.

arXiv:2312.13752 [pdf]

Hunting imaging biomarkers in pulmonary fibrosis: Benchmarks of the AIIB23 challenge

Authors: Yang Nan, Xiaodan Xing, Shiyi Wang, Zeyu Tang, Federico N Felder, Sheng Zhang, Roberta Eufrasia Ledda, Xiaoliu Ding, Ruiqi Yu, Weiping Liu, Feng Shi, Tianyang Sun, Zehong Cao, Minghui Zhang, Yun Gu, Hanxiao Zhang, Jian Gao, Pingyu Wang, Wen Tang, Pengxin Yu, Han Kang, Junqiang Chen, Xing Lu, Boyu Zhang, Michail Mamalakis, et al. (16 additional authors not shown)

Abstract: Airway-related quantitative imaging biomarkers are crucial for examination, diagnosis, and prognosis in pulmonary diseases. However, the manual delineation of airway trees remains prohibitively time-consuming. While significant efforts have been made towards enhancing airway modelling, current publicly available datasets concentrate on lung diseases with moderate morphological variations. The intricate honeycombing patterns present in the lung tissues of fibrotic lung disease patients exacerbate the challenges, often leading to various prediction errors. To address this issue, the 'Airway-Informed Quantitative CT Imaging Biomarker for Fibrotic Lung Disease 2023' (AIIB23) competition was organized in conjunction with the official 2023 International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI). The airway structures were meticulously annotated by three experienced radiologists. Competitors were encouraged to develop automatic airway segmentation models with high robustness and generalization abilities, followed by exploring the most correlated QIB of mortality prediction. A training set of 120 high-resolution computerised tomography (HRCT) scans was publicly released with expert annotations and mortality status. The online validation set incorporated 52 HRCT scans from patients with fibrotic lung disease and the offline test set included 140 cases from fibrosis and COVID-19 patients. The results have shown that the capacity of extracting airway trees from patients with fibrotic lung disease could be enhanced by introducing voxel-wise weighted general union loss and continuity loss. In addition to the competitive image biomarkers for prognosis, a strong airway-derived biomarker (Hazard ratio>1.5, p<0.0001) was revealed for survival prognostication compared with existing clinical measurements, clinician assessment and AI-based biomarkers.

Submitted 16 April, 2024; v1 submitted 21 December, 2023; originally announced December 2023.

Comments: 19 pages

arXiv:2312.13154 [pdf, other]

Joint Range-Velocity-Azimuth Estimation for OFDM-Based Integrated Sensing and Communication

Authors: Zelin Hu, Qibin Ye, Yixuan Huang, Su Hu, Gang Yang

Abstract: Orthogonal frequency division multiplexing (OFDM)-based integrated sensing and communication (ISAC) is promising for future sixth-generation mobile communication systems. Existing works focus on the joint estimation of the targets' range and velocity for OFDM-based ISAC systems. In contrast, this paper studies the three-dimensional joint estimation (3DJE) of range, velocity, and azimuth for OFDM-based ISAC systems with multiple receive antennas. First, we establish the signal model and derive the Cramer-Rao bounds (CRBs) on the 3DJE. Furthermore, an auto-paired super-resolution 3DJE algorithm is proposed by exploiting the reconstructed observation sub-signal's translational invariance property in the time, frequency, and space domains. Finally, with the 5G New Radio parameter setup, simulation results show that the proposed algorithm achieves better estimation performance and its root mean square error is closer to the root of CRBs than existing methods.

Submitted 20 December, 2023; originally announced December 2023.

Comments: This manuscript has been submitted to the IEEE journal in 09-Aug-2023

arXiv:2312.12824 [pdf, other]

FedSODA: Federated Cross-assessment and Dynamic Aggregation for Histopathology Segmentation

Authors: Yuan Zhang, Yaolei Qi, Xiaoming Qi, Lotfi Senhadji, Yongyue Wei, Feng Chen, Guanyu Yang

Abstract: Federated learning (FL) for histopathology image segmentation involving multiple medical sites plays a crucial role in advancing the field of accurate disease diagnosis and treatment. However, it remains a highly challenging task due to the sample imbalance across clients and large data heterogeneity from disparate organs, variable segmentation tasks, and diverse distribution. Thus, we propose a novel FL approach for histopathology nuclei and tissue segmentation, FedSODA, via synthetic-driven cross-assessment operation (SO) and dynamic stratified-layer aggregation (DA). Our SO constructs a cross-assessment strategy to connect clients and mitigate the representation bias under sample imbalance. Our DA utilizes layer-wise interaction and dynamic aggregation to diminish heterogeneity and enhance generalization. The effectiveness of our FedSODA has been evaluated on the most extensive histopathology image segmentation dataset from 7 independent datasets. The code is available at https://github.com/yuanzhang7/FedSODA.

Submitted 20 December, 2023; originally announced December 2023.

Comments: Accepted by ICASSP2024

arXiv:2312.10604 [pdf, other]

A Dual Domain Multi-exposure Image Fusion Network based on the Spatial-Frequency Integration

Authors: Guang Yang, Jie Li, Xinbo Gao

Abstract: Multi-exposure image fusion aims to generate a single high-dynamic image by integrating images with different exposures. Existing deep learning-based multi-exposure image fusion methods primarily focus on spatial domain fusion, neglecting the global modeling ability of the frequency domain. To effectively leverage the global illumination modeling ability of the frequency domain, we propose a novel perspective on multi-exposure image fusion via the Spatial-Frequency Integration Framework, named MEF-SFI. Initially, we revisit the properties of the Fourier transform on the 2D image, and verify the feasibility of multi-exposure image fusion in the frequency domain, where the amplitude and phase components are able to guide the integration of the illumination information. Subsequently, we present the deep Fourier-based multi-exposure image fusion framework, which consists of a spatial path and a frequency path for local and global modeling, respectively. Specifically, we introduce a Spatial-Frequency Fusion Block to facilitate efficient interaction between dual domains and capture complementary information from input images with different exposures. Finally, we combine a dual domain loss function to ensure the retention of complementary information in both the spatial and frequency domains. Extensive experiments on the PQA-MEF dataset demonstrate that our method achieves visually appealing fusion results against state-of-the-art multi-exposure image fusion approaches. Our code is available at https://github.com/SSyangguang/MEF-freq.

Submitted 16 December, 2023; originally announced December 2023.

arXiv:2312.09609 [pdf, other]

Semantic-Aware Transformation-Invariant RoI Align

Authors: Guo-Ye Yang, George Kiyohiro Nakayama, Zi-Kai Xiao, Tai-Jiang Mu, Xiaolei Huang, Shi-Min Hu

Abstract: Great progress has been made in learning-based object detection methods in the last decade. Two-stage detectors often have higher detection accuracy than one-stage detectors, due to the use of region of interest (RoI) feature extractors which extract transformation-invariant RoI features for different RoI proposals, making refinement of bounding boxes and prediction of object categories more robust and accurate. However, previous RoI feature extractors can only extract invariant features under limited transformations. In this paper, we propose a novel RoI feature extractor, termed Semantic RoI Align (SRA), which is capable of extracting invariant RoI features under a variety of transformations for two-stage detectors. Specifically, we propose a semantic attention module to adaptively determine different sampling areas by leveraging the global and local semantic relationship within the RoI. We also propose a Dynamic Feature Sampler which dynamically samples features based on the RoI aspect ratio to enhance the efficiency of SRA, and a new position embedding, i.e., Area Embedding, to provide more accurate position information for SRA through an improved sampling area representation. Experiments show that our model significantly outperforms baseline models with slight computational overhead. In addition, it shows excellent generalization ability and can be used to improve performance with various state-of-the-art backbones and detection methods.

Submitted 15 December, 2023; originally announced December 2023.

arXiv:2312.08753 [pdf, other]

RDARS Empowered Massive MIMO System: Two-Timescale Transceiver Design with Imperfect CSI

Authors: Chengzhi Ma, Jintao Wang, Xi Yang, Guanghua Yang, Wei Zhang, Shaodan Ma

Abstract: In this paper, we investigate a novel reconfigurable distributed antennas and reflecting surface (RDARS) aided multi-user massive MIMO system with imperfect CSI and propose a practical two-timescale (TTS) transceiver design to reduce the communication overhead and computational complexity of the system. In the RDARS-aided system, not only distribution gain but also reflection gain can be obtained by a flexible combination of the distributed antennas and reflecting surface, which differentiates the system from the others and also makes the TTS design challenging. To enable the optimal TTS transceiver design, the achievable rate of the system is first derived in closed-form. Then the TTS design aiming at the weighted sum rate maximization is considered. To solve the challenging non-convex optimization problem with high-order design variables, i.e., the transmit powers and the phase shifts at the RDARS, a block coordinate descent based method is proposed to find the optimal solutions in semi-closed forms iteratively. Specifically, two efficient algorithms are proposed with provable convergence for the optimal phase shift design, i.e., Riemannian Gradient Ascent based algorithm by exploiting the unit-modulus constraints, and Two-Tier Majorization-Minimization based algorithm with closed-form optimal solutions in each iteration. Simulation results validate the effectiveness of the proposed algorithm and demonstrate the superiority of deploying RDARS in massive MIMO systems to provide substantial rate improvement with a significantly reduced total number of active antennas/RF chains and lower transmit power when compared to the DAS and RIS-aided systems.

Submitted 14 December, 2023; originally announced December 2023.

Comments: 13 pages, 6 figures

arXiv:2312.08445 [pdf, ps, other]

Gluonic evanescent operators: negative-norm states and complex anomalous dimensions

Authors: Qingjun Jin, Ke Ren, Gang Yang, Rui Yu

Abstract: In this paper, we build on our previous work to further investigate the role of evanescent operators in gauge theories, with a particular focus on their contribution to violations of unitarity. We develop an efficient method for calculating the norms of gauge-invariant operators in Yang-Mills (YM) theory by employing on-shell form factors. Our analysis, applicable to general spacetime dimensions, reveals the existence of negative norm states among evanescent operators. We also explore the one-loop anomalous dimensions of these operators and find complex anomalous dimensions. We broaden our analysis by considering YM theory coupled with scalar fields and we observe similar patterns of non-unitarity. The presence of negative norm states and complex anomalous dimensions across these analyses provides compelling evidence that general gauge theories are non-unitary in non-integer spacetime dimensions.

Submitted 13 December, 2023; originally announced December 2023.

Comments: 41 pages, 2 figures

arXiv:2312.07039 [pdf, other]

Open-Pose 3D Zero-Shot Learning: Benchmark and Challenges

Authors: Weiguang Zhao, Guanyu Yang, Rui Zhang, Chenru Jiang, Chaolong Yang, Yuyao Yan, Amir Hussain, Kaizhu Huang

Abstract: With the explosive 3D data growth, the urgency of utilizing zero-shot learning to facilitate data labeling becomes evident. Recently, methods transferring language or language-image pre-training models like Contrastive Language-Image Pre-training (CLIP) to 3D vision have made significant progress in the 3D zero-shot classification task. These methods primarily focus on 3D object classification with an aligned pose; such a setting is, however, rather restrictive, which overlooks the recognition of 3D objects with open poses typically encountered in real-world scenarios, such as an overturned chair or a lying teddy bear. To this end, we propose a more realistic and challenging scenario named open-pose 3D zero-shot classification, focusing on the recognition of 3D objects regardless of their orientation. First, we revisit the current research on 3D zero-shot classification, and propose two benchmark datasets specifically designed for the open-pose setting. We empirically validate many of the most popular methods in the proposed open-pose benchmark. Our investigations reveal that most current 3D zero-shot classification models suffer from poor performance, indicating a substantial exploration room towards the new direction. Furthermore, we study a concise pipeline with an iterative angle refinement mechanism that automatically optimizes one ideal angle to classify these open-pose 3D objects. In particular, to make validation more compelling and not just limited to existing CLIP-based methods, we also pioneer the exploration of knowledge transfer based on Diffusion models. While the proposed solutions can serve as a new benchmark for open-pose 3D zero-shot classification, we discuss the complexities and challenges of this scenario that remain for further research development. The code is available publicly at https://github.com/weiguangzhao/Diff-OP3D.

Submitted 16 April, 2024; v1 submitted 12 December, 2023; originally announced December 2023.

arXiv:2312.06164 [pdf, other]

Implicit Shape Modeling for Anatomical Structure Refinement of Volumetric Medical Images

Authors: Minghui Zhang, Hanxiao Zhang, Xin You, Guang-Zhong Yang, Yun Gu

Abstract: Shape modeling of volumetric data is essential for medical image analysis and computer-aided intervention. In practice, automated shape reconstruction cannot always achieve satisfactory results due to limited image resolution and a lack of sufficiently detailed shape priors used as constraints. In this paper, a unified framework is proposed for 3D shape modelling and segmentation refinement based on implicit neural networks. To learn a sharable shape prior from different instances within the same category during training, physical details of volumetric data are firstly used to construct Physical-Informed Continuous Coordinate Transform (PICCT) for implicit shape modeling. For improved shape representation, implicit shape constraints based on Signed Distance Function (SDF) are used for both instances and latent templates. For inference, a Template Interaction Module (TIM) is proposed to refine 3D shapes produced by Convolutional Neural Networks (CNNs) via deforming deep implicit templates with latent codes. Experimental results on validation datasets involving liver, pancreas and lung segmentation demonstrate the superiority of our approach in shape refinement and reconstruction. The Chamfer Distance/Earth Mover's Distance achieved by the proposed method are 0.232/0.087 for the Liver dataset, 0.128/0.069 for the Pancreas dataset, and 0.417/0.100 for the Lung Lobe dataset, respectively.

Submitted 6 January, 2024; v1 submitted 11 December, 2023; originally announced December 2023.

arXiv:2312.06155 [pdf]

Illustrating the structures of bias from immortal time using directed acyclic graphs

Authors: Guoyi Yang, Stephen Burgess, C Mary Schooling

Abstract: Background: Immortal time is a period of follow-up during which death or the study outcome cannot occur by design. Bias from immortal time has been increasingly recognized in epidemiologic studies. However, the fundamental causes and structures of bias from immortal time have not been explained systematically using a structural approach. Methods: We use an example "Do Nobel Prize winners live longer than less recognized scientists?" for illustration. We illustrate how immortal time arises and present the structures of bias from immortal time using time-varying directed acyclic graphs (DAGs). We further explore the structures of bias with the exclusion of immortal time and with the presence of competing risks. We discuss how these structures are shared by different study designs in pharmacoepidemiology and provide solutions, where possible, to address the bias. Results: We illustrate that immortal time arises from using postbaseline information to define exposure or eligibility. We use time-varying DAGs to explain that the structures of bias from immortal time are confounding by survival until exposure allocation or selection bias from selecting on survival until eligibility. We explain that excluding immortal time from the follow-up does not fully address this confounding or selection bias, and that the presence of competing risks can worsen the bias. Bias from immortal time may be avoided by aligning time zero, exposure allocation and eligibility, and by excluding individuals with prior exposure. Conclusions: Understanding bias from immortal time in terms of confounding or selection bias helps researchers identify and thereby avoid or ameliorate this bias.

Submitted 9 January, 2024; v1 submitted 11 December, 2023; originally announced December 2023.

Comments: 25 pages, 3 figures, 1 table and 3 supplemental figures

arXiv:2312.05984 [pdf, other]

Accurate Differential Operators for Hybrid Neural Fields

Authors: Aditya Chetan, Guandao Yang, Zichen Wang, Steve Marschner, Bharath Hariharan

Abstract: Neural fields have become widely used in various fields, from shape representation to neural rendering, and for solving partial differential equations (PDEs). With the advent of hybrid neural field representations like Instant NGP that leverage small MLPs and explicit representations, these models train quickly and can fit large scenes. Yet in many applications like rendering and simulation, hybrid neural fields can cause noticeable and unreasonable artifacts. This is because they do not yield accurate spatial derivatives needed for these downstream applications. In this work, we propose two ways to circumvent these challenges. Our first approach is a post hoc operator that uses local polynomial-fitting to obtain more accurate derivatives from pre-trained hybrid neural fields. Additionally, we also propose a self-supervised fine-tuning approach that refines the neural field to yield accurate derivatives directly while preserving the initial signal. We show the application of our method on rendering, collision simulation, and solving PDEs. We observe that using our approach yields more accurate derivatives, reducing artifacts and leading to more accurate simulations in downstream applications.

Submitted 10 December, 2023; originally announced December 2023.

arXiv:2312.05562 [pdf, other]

Chain-of-Thought in Neural Code Generation: From and For Lightweight Language Models

Authors: Guang Yang, Yu Zhou, Xiang Chen, Xiangyu Zhang, Terry Yue Zhuo, Taolue Chen

Abstract: Large Language Models (LLMs) have demonstrated remarkable potential in code generation. The integration of Chain of Thought (CoT) reasoning can further boost their performance. However, current CoT methods often require manual writing or LLMs with over 100 billion parameters to generate, impeding their applicability in resource-constrained scenarios. In this study, we investigate lightweight Language Models (lLMs), which are defined to have fewer than 10 billion parameters. Empirically, we find that most lLMs cannot generate high-quality CoTs when prompted by the few-shot method, but can take advantage of high-quality CoTs generated elsewhere to improve their performance in code generation. Based on these findings, we design a novel approach COTTON which can leverage lLMs to automatically generate CoTs for code generation. We synthesize new datasets and conduct extensive experiments on various benchmarks. The results show that the CoTs generated by COTTON outperform the baselines in terms of automated and human evaluation metrics. In particular, the CoTs generated by COTTON boost various lLMs to achieve higher performance gains than those generated by LLMs such as ChatGLM (130B), and are competitive with those generated by gpt-3.5-turbo (175B). Our study also showcases the potential of lLMs in software engineering applications.

Submitted 9 December, 2023; originally announced December 2023.

Comments: UNDER REVIEW

arXiv:2312.04868 [pdf]

Manipulator control of the Robotized TMS System with Incurved TMS Coil Case

Authors: Jaewoo Kim, Gi-hun Yang

Abstract: Objective: This study shows the force/torque control strategy for the robotized TMS system whose TMS coil's floor is incurved. The strategy considered the adhesion and friction between the coil and the subject's head. Methods: Hybrid position/force control and proportional torque were used for the strategy. The force magnitude applied for the force control was scheduled by the error between the coil's current position and the target point. Results: A larger desired force for the force controller reduces the error quickly. By scheduling the force magnitude applied for the force control, the low error between the coil's current and target positions is maintained with a relatively small force after the larger force is applied for around 10 seconds. The proportional torque made the adhesion better by locating the contact area between the coil and the head close to the coil. This was shown by checking the $τ_c/F_c$ value from the experimental results. While the head slowly moved away from the coil during the TMS treatment, the coil still interacted with the head. Using that characteristic, the coil could locate the new target point using the force/torque strategy without any trajectory planning. Conclusion: The proposed force/torque controller enhanced the adhesion between the incurved TMS coil and the subject's head. It also reduced the error quickly by scheduling the magnitude of the force applied. Significance: This study proposes the robotized TMS system's force/torque control strategy considering the physical characteristics from the contact between the incurved TMS coil case and the subject's head.

Submitted 8 December, 2023; originally announced December 2023.

arXiv:2312.04377 [pdf, other]

HARQ-IR Aided Short Packet Communications: BLER Analysis and Throughput Maximization

Authors: Fuchao He, Zheng Shi, Guanghua Yang, Xiaofan Li, Xinrong Ye, Shaodan Ma

Abstract: This paper introduces hybrid automatic repeat request with incremental redundancy (HARQ-IR) to boost the reliability of short packet communications. The finite blocklength information theory and correlated decoding events tremendously preclude the analysis of average block error rate (BLER). Fortunately, the recursive form of average BLER motivates us to calculate its value through the trapezoidal approximation and Gauss-Laguerre quadrature. Moreover, the asymptotic analysis is performed to derive a simple expression for the average BLER at high signal-to-noise ratio (SNR). Then, we study the maximization of long term average throughput (LTAT) via power allocation while ensuring the power and the BLER constraints. For tractability, the asymptotic BLER is employed to solve the problem through geometric programming (GP). However, the GP-based solution underestimates the LTAT at low SNR due to a large approximation error in this case. Alternatively, we also develop a deep reinforcement learning (DRL)-based framework to learn power allocation policy. In particular, the optimization problem is transformed into a constrained Markov decision process, which is solved by integrating deep deterministic policy gradient (DDPG) with subgradient method. The numerical results finally demonstrate that the DRL-based method outperforms the GP-based one at low SNR, albeit at the cost of increasing computational burden.

Submitted 9 January, 2024; v1 submitted 7 December, 2023; originally announced December 2023.

Comments: 13 pages, 10 figures

arXiv:2312.04328 [pdf, other]

A Multi-scale Information Integration Framework for Infrared and Visible Image Fusion

Authors: Guang Yang, Jie Li, Hanxiao Lei, Xinbo Gao

Abstract: Infrared and visible image fusion aims at generating a fused image containing the intensity and detail information of source images, and the key issue is effectively measuring and integrating the complementary information of multi-modality images from the same scene. Existing methods mostly adopt a simple weight in the loss function to decide the information retention of each modality rather than adaptively measuring complementary information for different image pairs. In this study, we propose a multi-scale dual attention (MDA) framework for infrared and visible image fusion, which is designed to measure and integrate complementary information in both structure and loss function at the image and patch level. In our method, the residual downsample block decomposes source images into three scales first. Then, the dual attention fusion block integrates complementary information and generates a spatial and channel attention map at each scale for feature fusion. Finally, the output image is reconstructed by the residual reconstruction block. The loss function consists of three parts (image-level, feature-level and patch-level), of which the image-level and patch-level parts are calculated based on the weights generated by the complementary information measurement. Additionally, to constrain the pixel intensity distribution between the output and infrared image, a style loss is added. Our fusion results are robust and informative across different scenarios. Qualitative and quantitative results on two datasets illustrate that our method is able to preserve both thermal radiation and detailed information from two modalities and achieve comparable results compared with the other state-of-the-art methods. Ablation experiments show the effectiveness of our information integration architecture and of adaptively measuring complementary information retention in the loss function.

Submitted 7 December, 2023; originally announced December 2023.

arXiv:2312.04319 [pdf, ps, other]

Color-Kinematics Duality with Minimal Deformation: Two-Loop Four-Gluon Amplitudes in Pure Yang-Mills Revisited

Authors: Zeyu Li, Gang Yang

Abstract: The conjectured duality between color and kinematics has significantly advanced our understanding of both gauge and gravitational theories. However, constructing numerators that manifest the color-kinematics (CK) duality, even for the two-loop four-gluon amplitude in pure Yang-Mills, has been challenging. In this paper, we revisit this amplitude and show that the difficulty of applying CK duality can be overcome by introducing a simple deformation. Our approach distinguishes itself from previous studies by maximizing the use of off-shell CK duality while maintaining a compact ansatz. In particular, the deformation we introduce satisfies a subset of off-shell dual Jacobi relations. The resulting numerators are presented in $d$-dimensionally Lorentz invariant local form and are applicable to all helicities of external gluons. The solution we provide can be directly employed to construct the corresponding gravitational amplitude through double copy. Our findings suggest a novel and efficient strategy for constructing high-loop gauge and gravitational amplitudes using CK duality.

Submitted 27 February, 2024; v1 submitted 7 December, 2023; originally announced December 2023.

Comments: 24 pages, 11 figures

arXiv:2312.04119 [pdf, other]

A brief introduction to a framework named Multilevel Guidance-Exploration Network

Authors: Guoqing Yang, Zhiming Luo, Jianzhe Gao, Yingxin Lai, Kun Yang, Yifan He, Shaozi Li

Abstract: Human behavior anomaly detection aims to identify unusual human actions, playing a crucial role in intelligent surveillance and other areas. The current mainstream methods still adopt reconstruction or future frame prediction techniques. However, reconstructing or predicting low-level pixel features easily enables the network to achieve overly strong generalization ability, allowing anomalies to be reconstructed or predicted as effectively as normal data. Unlike these methods, and inspired by the Student-Teacher Network, we propose a novel framework called the Multilevel Guidance-Exploration Network (MGENet), which detects anomalies through the difference in high-level representation between the Guidance and Exploration network. Specifically, we first utilize the pre-trained Normalizing Flow that takes skeletal keypoints as input to guide an RGB encoder, which takes unmasked RGB frames as input, to explore motion latent features. Then, the RGB encoder guides the mask encoder, which takes masked RGB frames as input, to explore the latent appearance feature. Additionally, we design a Behavior-Scene Matching Module (BSMM) to detect scene-related behavioral anomalies. Extensive experiments demonstrate that our proposed method achieves state-of-the-art performance on ShanghaiTech and UBnormal datasets.

Submitted 9 June, 2024; v1 submitted 7 December, 2023; originally announced December 2023.

Comments: More reasonable

arXiv:2312.04062 [pdf, other]

A Low-Overhead Incorporation-Extrapolation based Few-Shot CSI Feedback Framework for Massive MIMO Systems

Authors: Binggui Zhou, Xi Yang, Jintao Wang, Shaodan Ma, Feifei Gao, Guanghua Yang

Abstract: Accurate channel state information (CSI) is essential for downlink precoding in frequency division duplexing (FDD) massive multiple-input multiple-output (MIMO) systems with orthogonal frequency-division multiplexing (OFDM). However, obtaining CSI through feedback from the user equipment (UE) becomes challenging with the increasing scale of antennas and subcarriers and leads to extremely high CSI feedback overhead. Deep learning-based methods have emerged for compressing CSI but these methods generally require substantial collected samples and thus pose practical challenges. Moreover, existing deep learning methods also suffer from dramatically growing feedback overhead owing to their focus on full-dimensional CSI feedback. To address these issues, we propose a low-overhead Incorporation-Extrapolation based Few-Shot CSI feedback Framework (IEFSF) for massive MIMO systems. An incorporation-extrapolation scheme for eigenvector-based CSI feedback is proposed to reduce the feedback overhead. Then, to alleviate the necessity of extensive collected samples and enable few-shot CSI feedback, we further propose a knowledge-driven data augmentation (KDDA) method and an artificial intelligence-generated content (AIGC)-based data augmentation method by exploiting the domain knowledge of wireless channels and by exploiting a novel generative model, respectively. Experimental results based on the DeepMIMO dataset demonstrate that the proposed IEFSF significantly reduces CSI feedback overhead by 64 times compared with existing methods while maintaining higher feedback accuracy using only several hundred collected samples.

Submitted 21 June, 2024; v1 submitted 7 December, 2023; originally announced December 2023.

Comments: 16 pages, 12 figures, 5 tables. Accepted by IEEE Transactions on Wireless Communications

arXiv:2312.03582 [pdf]

The Extended Resonant Modal Theory and Its Applications

Authors: Ruqi Xiao, Wen Geyi, Guo Yang, Wen Wu

Abstract: In this paper, we extend the resonant modal theory (RMT) developed previously for a metal object to an arbitrary source region consisting of metals, dielectrics, or the combination of both. The influences of dielectrics on the fields are replaced by equivalent volume sources through the use of the compensation theorem in electromagnetic theory. The resonant frequencies can be determined by finding the roots of the determinant of the matrix resulting from the discretization of the real homogeneous volume-surface integral equation derived from the requirement that the difference of stored field energies in the source region vanishes. As applications of the extended RMT, three examples have been investigated. The first example is a dielectric resonator antenna, and is designed by exciting the first resonant mode of the composite structure in which the dielectric cylinder is combined with a conformal metallic strip. The second example is a dual-band dielectric-coated metallic wire antenna. The third example studies the resonant modes of a rectangular patch antenna.

Submitted 6 December, 2023; originally announced December 2023.

arXiv:2312.03297 [pdf, other]

SoftMAC: Differentiable Soft Body Simulation with Forecast-based Contact Model and Two-way Coupling with Articulated Rigid Bodies and Clothes

Authors: Min Liu, Gang Yang, Siyuan Luo, Lin Shao

Abstract: Differentiable physics simulation provides an avenue to tackle previously intractable challenges through gradient-based optimization, thereby greatly improving the efficiency of solving robotics-related problems. To apply differentiable simulation in diverse robotic manipulation scenarios, a key challenge is to integrate various materials in a unified framework. We present SoftMAC, a differentiable simulation framework that couples soft bodies with articulated rigid bodies and clothes. SoftMAC simulates soft bodies with the continuum-mechanics-based Material Point Method (MPM). We provide a novel forecast-based contact model for MPM, which effectively reduces penetration without introducing other artifacts like unnatural rebound. To couple MPM particles with deformable and non-volumetric clothes meshes, we also propose a penetration tracing algorithm that reconstructs the signed distance field in a local area. Diverging from previous works, SoftMAC simulates the complete dynamics of each modality and incorporates them into a cohesive system with an explicit and differentiable coupling mechanism. This feature empowers SoftMAC to handle a broader spectrum of interactions, such as soft bodies serving as manipulators and engaging with underactuated systems. We conducted comprehensive experiments to validate the effectiveness and accuracy of the proposed differentiable pipeline in downstream robotic manipulation applications. Supplementary materials and videos are available on our project website at https://sites.google.com/view/softmac.

Submitted 16 March, 2024; v1 submitted 6 December, 2023; originally announced December 2023.

arXiv:2312.02432 [pdf, other]

Orthogonal Adaptation for Modular Customization of Diffusion Models

Authors: Ryan Po, Guandao Yang, Kfir Aberman, Gordon Wetzstein

Abstract: Customization techniques for text-to-image models have paved the way for a wide range of previously unattainable applications, enabling the generation of specific concepts across diverse contexts and styles. While existing methods facilitate high-fidelity customization for individual concepts or a limited, pre-defined set of them, they fall short of achieving scalability, where a single model can seamlessly render countless concepts. In this paper, we address a new problem called Modular Customization, with the goal of efficiently merging customized models that were fine-tuned independently for individual concepts. This allows the merged model to jointly synthesize concepts in one image without compromising fidelity or incurring any additional computational costs. To address this problem, we introduce Orthogonal Adaptation, a method designed to encourage the customized models, which do not have access to each other during fine-tuning, to have orthogonal residual weights. This ensures that during inference time, the customized models can be summed with minimal interference. Our proposed method is both simple and versatile, applicable to nearly all optimizable weights in the model architecture. Through an extensive set of quantitative and qualitative evaluations, our method consistently outperforms relevant baselines in terms of efficiency and identity preservation, demonstrating a significant leap toward scalable customization of diffusion models.

Submitted 4 December, 2023; originally announced December 2023.

Comments: Project page: https://ryanpo.com/ortha/

arXiv:2312.02126 [pdf, other]

SplaTAM: Splat, Track & Map 3D Gaussians for Dense RGB-D SLAM

Authors: Nikhil Keetha, Jay Karhade, Krishna Murthy Jatavallabhula, Gengshan Yang, Sebastian Scherer, Deva Ramanan, Jonathon Luiten

Abstract: Dense simultaneous localization and mapping (SLAM) is crucial for robotics and augmented reality applications. However, current methods are often hampered by the non-volumetric or implicit way they represent a scene. This work introduces SplaTAM, an approach that, for the first time, leverages explicit volumetric representations, i.e., 3D Gaussians, to enable high-fidelity reconstruction from a single unposed RGB-D camera, surpassing the capabilities of existing methods. SplaTAM employs a simple online tracking and mapping system tailored to the underlying Gaussian representation. It utilizes a silhouette mask to elegantly capture the presence of scene density. This combination enables several benefits over prior representations, including fast rendering and dense optimization, quickly determining if areas have been previously mapped, and structured map expansion by adding more Gaussians. Extensive experiments show that SplaTAM achieves up to 2x superior performance in camera pose estimation, map construction, and novel-view synthesis over existing methods, paving the way for more immersive high-fidelity SLAM applications.

Submitted 16 April, 2024; v1 submitted 4 December, 2023; originally announced December 2023.

Comments: CVPR 2024. Website: https://spla-tam.github.io/

Showing 101–150 of 1,451 results for author: Yang, G