-
Flat bands and distinct density wave orders in correlated Kagome superconductor CsCr$_3$Sb$_5$
Authors:
Shuting Peng,
Yulei Han,
Yongkai Li,
Jianchang Shen,
Yu Miao,
Yang Luo,
Linwei Huai,
Zhipeng Ou,
Hongyu Li,
Ziji Xiang,
Zhengtai Liu,
Dawei Shen,
Makoto Hashimoto,
Donghui Lu,
Yugui Yao,
Zhenhua Qiao,
Zhiwei Wang,
Junfeng He
Abstract:
Kagome metal CsV$_3$Sb$_5$ has attracted much recent attention due to the coexistence of multiple exotic orders and the associated proposals to mimic unconventional high-temperature superconductors. Nevertheless, magnetism and strong electronic correlations, two essential ingredients for unconventional superconductivity, are absent in this V-based Kagome metal. CsCr$_3$Sb$_5$ is a newly discovered Cr-based counterpart of CsV$_3$Sb$_5$, in which magnetism appears alongside charge density wave and superconductivity in different temperature and pressure regions. Enhanced electronic correlations are also suggested by theoretical proposals due to the calculated flat bands. Here, we report angle-resolved photoemission measurements and first-principles calculations on this new material system. Electron energy bands and the associated orbitals are resolved. Flat bands are observed near the Fermi level. Doping-dependent measurements on Cs(Cr$_x$V$_{1-x}$)$_3$Sb$_5$ reveal a gradually enhanced band renormalization from CsV$_3$Sb$_5$ to CsCr$_3$Sb$_5$, accompanied by distinct spatial symmetry breaking states in the phase diagram.
Submitted 26 June, 2024; v1 submitted 25 June, 2024;
originally announced June 2024.
-
Cephalometric Landmark Detection across Ages with Prototypical Network
Authors:
Han Wu,
Chong Wang,
Lanzhuju Mei,
Tong Yang,
Min Zhu,
Dinggang Shen,
Zhiming Cui
Abstract:
Automated cephalometric landmark detection is crucial in real-world orthodontic diagnosis. Current studies focus mainly on adult subjects, neglecting the clinically crucial scenario presented by adolescents, whose landmarks often exhibit significantly different appearances compared to adults. Hence, an open question arises about how to develop a unified and effective detection algorithm across various age groups, including adolescents and adults. In this paper, we propose CeLDA, the first work for Cephalometric Landmark Detection across Ages. Our method leverages a prototypical network for landmark detection by comparing image features with landmark prototypes. To tackle the appearance discrepancy of landmarks between age groups, we design new strategies for CeLDA to improve prototype alignment and obtain a holistic estimation of landmark prototypes from a large set of training images. Moreover, a novel prototype relation mining paradigm is introduced to exploit the anatomical relations between the landmark prototypes. Extensive experiments validate the superiority of CeLDA in detecting cephalometric landmarks on both adult and adolescent subjects. To our knowledge, this is the first effort toward developing a unified solution and dataset for cephalometric landmark detection across age groups. Our code and dataset will be made public on https://github.com/ShanghaiTech-IMPACT/Cephalometric-Landmark-Detection-across-Ages-with-Prototypical-Network
Submitted 18 June, 2024;
originally announced June 2024.
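The abstract above describes detecting landmarks by comparing image features with landmark prototypes. A minimal sketch of that comparison step follows, assuming cosine similarity and a dense feature map; the function name `detect_landmarks`, the array shapes, and the choice of similarity are illustrative assumptions and are not taken from the CeLDA paper or its code.

```python
import numpy as np

def detect_landmarks(features, prototypes):
    """Locate each landmark as the pixel whose feature best matches its prototype.

    features:   (H, W, C) feature map from some image encoder (assumed given).
    prototypes: (K, C) array, one prototype vector per landmark (assumed given).
    Returns a (K, 2) integer array of (row, col) positions.
    """
    h, w, c = features.shape
    flat = features.reshape(-1, c)
    # L2-normalize both sides so the dot product is a cosine similarity.
    flat = flat / (np.linalg.norm(flat, axis=1, keepdims=True) + 1e-8)
    protos = prototypes / (np.linalg.norm(prototypes, axis=1, keepdims=True) + 1e-8)
    similarity = protos @ flat.T              # shape (K, H*W)
    best = similarity.argmax(axis=1)          # best-matching pixel per landmark
    rows, cols = np.unravel_index(best, (h, w))
    return np.stack([rows, cols], axis=1)
```

The prototype alignment and relation mining strategies mentioned in the abstract are not modeled here; the sketch only covers the feature-to-prototype matching.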
-
RIGL: A Unified Reciprocal Approach for Tracing the Independent and Group Learning Processes
Authors:
Xiaoshan Yu,
Chuan Qin,
Dazhong Shen,
Shangshang Yang,
Haiping Ma,
Hengshu Zhu,
Xingyi Zhang
Abstract:
In the realm of education, both independent learning and group learning are esteemed as the most classic paradigms. The former allows learners to self-direct their studies, while the latter is typically characterized by teacher-directed scenarios. Recent studies in the field of intelligent education have leveraged deep temporal models to trace the learning process, capturing the dynamics of students' knowledge states, and have achieved remarkable performance. However, existing approaches have primarily focused on modeling the independent learning process, with the group learning paradigm receiving less attention. Moreover, the reciprocal effect between the two learning processes, especially their combined potential to foster holistic student development, remains inadequately explored. To this end, in this paper, we propose RIGL, a unified Reciprocal model to trace knowledge states at both the individual and group levels, drawing from the Independent and Group Learning processes. Specifically, we first introduce a time frame-aware reciprocal embedding module to concurrently model both student and group response interactions across various time frames. Subsequently, we employ reciprocal enhanced learning modeling to fully exploit the comprehensive and complementary information between the two behaviors. Furthermore, we design a relation-guided temporal attentive network, comprised of dynamic graph modeling coupled with a temporal self-attention mechanism. It is used to delve into the dynamic influence of individual and group interactions throughout the learning processes. Conclusively, we introduce a bias-aware contrastive learning module to bolster the stability of the model's training. Extensive experiments on four real-world educational datasets clearly demonstrate the effectiveness of the proposed RIGL model.
Submitted 18 June, 2024;
originally announced June 2024.
-
Inter-slice Super-resolution of Magnetic Resonance Images by Pre-training and Self-supervised Fine-tuning
Authors:
Xin Wang,
Zhiyun Song,
Yitao Zhu,
Sheng Wang,
Lichi Zhang,
Dinggang Shen,
Qian Wang
Abstract:
In clinical practice, 2D magnetic resonance (MR) sequences are widely adopted. While individual 2D slices can be stacked to form a 3D volume, the relatively large slice spacing can pose challenges for both image visualization and subsequent analysis tasks, which often require isotropic voxel spacing. To reduce slice spacing, deep-learning-based super-resolution techniques are widely investigated. However, most current solutions require a substantial number of paired high-resolution and low-resolution images for supervised training, which are typically unavailable in real-world scenarios. In this work, we propose a self-supervised framework for inter-slice super-resolution of MR images. Our framework first pre-trains on a video dataset, as the temporal correlation of videos is found to be beneficial for modeling the spatial relation among MR slices. Then, we use a public high-quality MR dataset to fine-tune our pre-trained model, making it more aware of medical data. Finally, given a target dataset at hand, we utilize self-supervised fine-tuning to further ensure our model works well with user-specific super-resolution tasks. The proposed method demonstrates superior performance compared to other self-supervised methods and also holds the potential to benefit various downstream applications.
Submitted 9 June, 2024;
originally announced June 2024.
-
Observation of floating surface state in obstructed atomic insulator candidate NiP$_2$
Authors:
Xiang-Rui Liu,
Ming-Yuan Zhu,
Yuanwen Feng,
Meng Zeng,
Xiao-Ming Ma,
Yu-Jie Hao,
Yue Dai,
Rong-Hao Luo,
Kohei Yamagami,
Yi Liu,
Shengtao Cui,
Zhe Sun,
Jia-Yu Liu,
Zhengtai Liu,
Mao Ye,
Dawei Shen,
Bing Li,
Chang Liu
Abstract:
The obstructed atomic insulator was recently proposed as an unconventional material in which electric charge centers are localized at sites away from the atoms. A half-filled surface state would emerge at specific interfaces that cut through these charge centers while avoiding any atoms. In this article, we use angle-resolved photoemission spectroscopy and density functional theory calculations to study one of the obstructed atomic insulator candidates, NiP$_2$. A floating surface state with large effective mass that is isolated from all bulk states is resolved on the (100) cleavage plane, distinct from previously reported surface states in obstructed atomic insulators that merge into bulk bands. Density functional theory calculations show that this floating surface state originates from the obstructed Wannier charge centers, albeit after a surface reconstruction that splits the half-filled obstructed surface state. Our findings not only shed light on the spectroscopic study of obstructed atomic insulators and obstructed surface states, but also provide a possible route for the development of new catalysts.
Submitted 16 June, 2024; v1 submitted 8 June, 2024;
originally announced June 2024.
-
Personalized Predictions from Population Level Experiments: A Study on Alzheimer's Disease
Authors:
Dennis Shen,
Anish Agarwal,
Vishal Misra,
Bjoern Schelter,
Devavrat Shah,
Helen Shiells,
Claude Wischik
Abstract:
The purpose of this article is to infer patient level outcomes from population level randomized control trials (RCTs). In this pursuit, we utilize the recently proposed synthetic nearest neighbors (SNN) estimator. At its core, SNN leverages information across patients to impute missing data associated with each patient of interest. We focus on two types of missing data: (i) unrecorded outcomes from discontinuing the assigned treatments and (ii) unobserved outcomes associated with unassigned treatments. Data imputation in the former powers and de-biases RCTs, while data imputation in the latter simulates "synthetic RCTs" to predict the outcomes for each patient under every treatment. The SNN estimator is interpretable, transparent, and causally justified under a broad class of missing data scenarios. Relative to several standard methods, we empirically find that SNN performs well for the above two applications using Phase 3 clinical trial data on patients with Alzheimer's Disease. Our findings directly suggest that SNN can tackle a current pain point within the clinical trial workflow on patient dropouts and serve as a new tool towards the development of precision medicine. Building on our insights, we discuss how SNN can further generalize to real-world applications.
Submitted 30 May, 2024;
originally announced May 2024.
-
Gate-Tunable Multi-Band van der Waals Photodetector and Polarization Sensor
Authors:
Daozhi Shen,
HeeBong Yang,
Tarun Patel,
Daniel A. Rhodes,
Thomas Timusk,
Y. Norman Zhou,
Na Young Kim,
Adam W. Tsen
Abstract:
A single photodetector with tunable detection wavelengths and polarization sensitivity can potentially be harnessed for diverse optical applications ranging from imaging and sensing to telecommunications. Such a device will require the combination of multiple material systems with different structures, bandgaps, and photoelectrical responses, which is extremely difficult to engineer using traditional epitaxial films. Here, we develop a multi-functional and high-performance photosensor using all van der Waals materials. The device features a gate-tunable spectral response that is switchable between near-infrared/visible and short-/mid-wave infrared, as well as broadband operation, at room temperature. The linear polarization sensitivity in the telecommunications O-band can also be directly modulated between horizontal, vertical, and nonpolarizing modes. These effects originate from the balance of photocurrent generation in two of the active layers that can be manipulated by an electric field. The photodetector features high detectivity ($>10^9$ cm Hz$^{1/2}$ W$^{-1}$) together with fast operation speed (~1 MHz) and can be further exploited for dual visible and infrared imaging.
Submitted 28 May, 2024;
originally announced May 2024.
-
Phased Consistency Model
Authors:
Fu-Yun Wang,
Zhaoyang Huang,
Alexander William Bergman,
Dazhong Shen,
Peng Gao,
Michael Lingelbach,
Keqiang Sun,
Weikang Bian,
Guanglu Song,
Yu Liu,
Hongsheng Li,
Xiaogang Wang
Abstract:
The consistency model (CM) has recently made significant progress in accelerating the generation of diffusion models. However, its application to high-resolution, text-conditioned image generation in the latent space (a.k.a., LCM) remains unsatisfactory. In this paper, we identify three key flaws in the current design of LCM. We investigate the reasons behind these limitations and propose the Phased Consistency Model (PCM), which generalizes the design space and addresses all identified limitations. Our evaluations demonstrate that PCM significantly outperforms LCM across 1--16 step generation settings. While PCM is specifically designed for multi-step refinement, it achieves even superior or comparable 1-step generation results to previously state-of-the-art specifically designed 1-step methods. Furthermore, we show that PCM's methodology is versatile and applicable to video generation, enabling us to train the state-of-the-art few-step text-to-video generator. More details are available at https://g-u-n.github.io/projects/pcm/.
Submitted 28 May, 2024;
originally announced May 2024.
-
Deform3DGS: Flexible Deformation for Fast Surgical Scene Reconstruction with Gaussian Splatting
Authors:
Shuojue Yang,
Qian Li,
Daiyun Shen,
Bingchen Gong,
Qi Dou,
Yueming Jin
Abstract:
Tissue deformation poses a key challenge for accurate surgical scene reconstruction. Despite yielding high reconstruction quality, existing methods suffer from slow rendering speeds and long training times, limiting their intraoperative applicability. Motivated by recent progress in 3D Gaussian Splatting, an emerging technology in real-time 3D rendering, this work presents a novel fast reconstruction framework, termed Deform3DGS, for deformable tissues during endoscopic surgery. Specifically, we introduce 3D GS into surgical scenes by integrating a point cloud initialization to improve reconstruction. Furthermore, we propose a novel flexible deformation modeling scheme (FDM) to learn tissue deformation dynamics at the level of individual Gaussians. Our FDM can model the surface deformation with efficient representations, allowing for real-time rendering performance. More importantly, FDM significantly accelerates surgical scene reconstruction, demonstrating considerable clinical value, particularly in intraoperative settings where time efficiency is crucial. Experiments on DaVinci robotic surgery videos indicate the efficacy of our approach, showcasing superior reconstruction fidelity (PSNR: 37.90) and rendering speed (338.8 FPS) while substantially reducing training time to only 1 minute/scene. Our code is available at https://github.com/**lab-imvr/Deform3DGS.
Submitted 30 May, 2024; v1 submitted 28 May, 2024;
originally announced May 2024.
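The abstract above reports reconstruction fidelity as PSNR. For reference, a minimal sketch of the standard peak signal-to-noise ratio computation follows; the function name `psnr` and the assumption that images are scaled to `[0, max_value]` are illustrative, and the paper's exact evaluation protocol is not stated in the abstract.

```python
import numpy as np

def psnr(reference, rendered, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two images of equal shape.

    max_value is the largest possible pixel value (1.0 for normalized
    images, 255 for 8-bit images); this is an assumption about the data.
    """
    reference = np.asarray(reference, dtype=np.float64)
    rendered = np.asarray(rendered, dtype=np.float64)
    mse = np.mean((reference - rendered) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(max_value ** 2 / mse))
```

Higher values mean the rendered frame is closer to the reference frame.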
-
Large band-splitting in $g$-wave type altermagnet CrSb
Authors:
Jianyang Ding,
Zhicheng Jiang,
Xiuhua Chen,
Zicheng Tao,
Zhengtai Liu,
Jishan Liu,
Tongrui Li,
Jiayu Liu,
Yichen Yang,
Runfeng Zhang,
Liwei Deng,
Wenchuan Jing,
Yu Huang,
Yuming Shi,
Shan Qiao,
Yilin Wang,
Yanfeng Guo,
Donglai Feng,
Dawei Shen
Abstract:
Altermagnetism (AM), a newly discovered magnetic state, ingeniously integrates the properties of ferromagnetism and antiferromagnetism, representing a significant breakthrough in the field of magnetic materials. Despite experimental verification of some typical AM materials, such as MnTe and MnTe$_2$, the pursuit of AM materials that feature larger spin splitting and higher transition temperature is still essential. Here, our research focuses on CrSb, which possesses a Néel temperature of up to 700 K and giant spin splitting near the Fermi level ($E_F$). Utilizing high-resolution angle-resolved photoemission spectroscopy and density functional theory calculations, we meticulously map the three-dimensional electronic structure of CrSb. Our photoemission spectroscopic results on both (0001) and (10$\overline{1}$0) cleavages of CrSb collaboratively reveal unprecedented details on AM-induced band splitting, and subsequently pin down its unique bulk $g$-wave symmetry through quantitative analysis of the angular and photon-energy dependence of spin splitting. Moreover, the observed spin splitting reaches the magnitude of 0.93 eV near $E_F$, the most substantial among all confirmed AM materials. This study not only validates the nature of CrSb as a prototype $g$-wave-like AM material but also underscores its pivotal role in pioneering applications in spintronics.
Submitted 21 May, 2024;
originally announced May 2024.
-
3D Vessel Reconstruction from Sparse-View Dynamic DSA Images via Vessel Probability Guided Attenuation Learning
Authors:
Zhentao Liu,
Huangxuan Zhao,
Wenhui Qin,
Zhenghong Zhou,
Xinggang Wang,
Wenping Wang,
Xiaochun Lai,
Chuansheng Zheng,
Dinggang Shen,
Zhiming Cui
Abstract:
Digital Subtraction Angiography (DSA) is one of the gold standards in vascular disease diagnosis. With the help of a contrast agent, time-resolved 2D DSA images deliver comprehensive insights into blood flow information and can be utilized to reconstruct 3D vessel structures. Current commercial DSA systems typically demand hundreds of scanning views to perform reconstruction, resulting in substantial radiation exposure. However, sparse-view DSA reconstruction, aimed at reducing radiation dosage, is still underexplored in the research community. The dynamic blood flow and insufficient input of sparse-view DSA images present significant challenges to the 3D vessel reconstruction task. In this study, we propose to use a time-agnostic vessel probability field to solve this problem effectively. Our approach, termed vessel probability guided attenuation learning, represents the DSA imaging as a complementary weighted combination of static and dynamic attenuation fields, with the weights derived from the vessel probability field. Functioning as a dynamic mask, vessel probability provides proper gradients for both static and dynamic fields adaptive to different scene types. This mechanism facilitates a self-supervised decomposition between static backgrounds and dynamic contrast agent flow, and significantly improves the reconstruction quality. Our model is trained by minimizing the disparity between synthesized projections and real captured DSA images. We further employ two training strategies to improve our reconstruction quality: (1) coarse-to-fine progressive training to achieve better geometry and (2) temporal perturbed rendering loss to enforce temporal consistency. Experimental results have demonstrated superior quality on both 3D vessel reconstruction and 2D view synthesis.
Submitted 17 May, 2024;
originally announced May 2024.
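The "complementary weighted combination of static and dynamic attenuation fields" in the abstract above can be written as a single mixing equation. The form below is a sketch under assumptions: the symbols $\sigma$, $\sigma_{\mathrm{s}}$, $\sigma_{\mathrm{d}}$, and $p$ are introduced here for illustration, and assigning the weight $p$ to the dynamic field (rather than the static one) is a guess at the intended convention, not something the abstract states.

```latex
% Assumed notation: x is a 3D position, t is time,
% p(x) in [0, 1] is the time-agnostic vessel probability,
% sigma_s is the static attenuation field, sigma_d the dynamic one.
\sigma(\mathbf{x}, t)
  = p(\mathbf{x})\,\sigma_{\mathrm{d}}(\mathbf{x}, t)
  + \bigl(1 - p(\mathbf{x})\bigr)\,\sigma_{\mathrm{s}}(\mathbf{x})
```

Because the two weights sum to one, $p$ acts as a soft mask: where $p \to 1$ only the dynamic field receives gradients, and where $p \to 0$ only the static field does.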
-
LoCI-DiffCom: Longitudinal Consistency-Informed Diffusion Model for 3D Infant Brain Image Completion
Authors:
Zihao Zhu,
Tianli Tao,
Yitian Tao,
Haowen Deng,
Xinyi Cai,
Gaofeng Wu,
Kaidong Wang,
Haifeng Tang,
Lixuan Zhu,
Zhuoyang Gu,
Jiawei Huang,
Dinggang Shen,
Han Zhang
Abstract:
The infant brain undergoes rapid development in the first few years after birth. Compared to cross-sectional studies, longitudinal studies can depict the trajectories of infant brain development with higher accuracy, statistical power, and flexibility. However, the collection of infant longitudinal magnetic resonance (MR) data suffers from a notorious dropout problem, resulting in incomplete datasets with missing time points. This limitation significantly impedes subsequent neuroscience and clinical modeling. Yet existing deep generative models face difficulties in completing missing brain images, due to sparse data and the nonlinear, dramatic contrast/geometric variations in the developing brain. We propose LoCI-DiffCom, a novel Longitudinal Consistency-Informed Diffusion model for infant brain image Completion, which integrates the images from preceding and subsequent time points to guide a diffusion model for generating high-fidelity missing data. Our designed LoCI module can work on highly sparse sequences, relying solely on data from two temporal points. Despite wide separation and diversity between age time points, our approach can extract individualized developmental features while ensuring context-aware consistency. Our experiments on a large infant brain MR dataset demonstrate its effectiveness, with consistent performance on missing infant brain MR completion even in big-gap scenarios, aiding in better delineation of early developmental trajectories.
Submitted 17 May, 2024;
originally announced May 2024.
-
Exterior stability of Minkowski spacetime with borderline decay
Authors:
Dawei Shen
Abstract:
In 1993, the global stability of Minkowski spacetime was proven in the celebrated work of Christodoulou and Klainerman \cite{Ch-Kl}. In 2003, Klainerman and Nicolò \cite{Kl-Ni} revisited Minkowski stability in the exterior of an outgoing null cone. In \cite{Shen23}, the author extended the results of \cite{Ch-Kl} to minimal decay assumptions. In this paper, we prove that the exterior stability of Minkowski spacetime holds with decay that is borderline compared to the minimal decay considered in \cite{Shen23}.
Submitted 29 April, 2024;
originally announced May 2024.
-
Parts-per-billion Trace Element Detection in Anhydrous Minerals by Micro-scale Quantitative NMR
Authors:
Yunhua Fu,
Renbiao Tao,
Lifei Zhang,
Shijie Li,
Ya-Nan Yang,
Dehan Shen,
Zilong Wang,
Thomas Meier
Abstract:
Nominally anhydrous minerals (NAMs) composing Earth's and planetary rocks incorporate microscopic amounts of volatiles. However, volatile distribution in NAMs and their effect on physical properties of rocks remain controversial. Thus, constraining trace volatile concentrations in NAMs is essential to our understanding of the evolution of rocky planets and planetesimals. Here, we present a novel approach of trace-element quantification using micro-scale Nuclear Magnetic Resonance (NMR) spectroscopy. This approach employs the principle of enhanced mass-sensitivity in NMR microcoils formerly used in in situ high-pressure experiments. We demonstrate that this method is in excellent agreement with standard methods across their respective detection capabilities. We show that by simultaneous detection of internal reference nuclei, the quantification sensitivity can be substantially increased, leading to quantifiable trace volatile element amounts of about $50$ wt-ppb measured in a micrometer-sized single anorthitic mineral grain, greatly enhancing detection capabilities of volatiles in geologically important systems.
Submitted 24 April, 2024;
originally announced April 2024.
-
Towards Efficient Resume Understanding: A Multi-Granularity Multi-Modal Pre-Training Approach
Authors:
Feihu Jiang,
Chuan Qin,
Jingshuai Zhang,
Kaichun Yao,
Xi Chen,
Dazhong Shen,
Chen Zhu,
Hengshu Zhu,
Hui Xiong
Abstract:
In the contemporary era of widespread online recruitment, resume understanding has been widely acknowledged as a fundamental and crucial task, which aims to extract structured information from resume documents automatically. Compared to the traditional rule-based approaches, the utilization of recently proposed pre-trained document understanding models can greatly enhance the effectiveness of resume understanding. The present approaches have, however, disregarded the hierarchical relations within the structured information presented in resumes, and have difficulty parsing resumes in an efficient manner. To this end, in this paper, we propose a novel model, namely ERU, to achieve efficient resume understanding. Specifically, we first introduce a layout-aware multi-modal fusion transformer for encoding the segments in the resume with integrated textual, visual, and layout information. Then, we design three self-supervised tasks to pre-train this module via a large number of unlabeled resumes. Next, we fine-tune the model with a multi-granularity sequence labeling task to extract structured information from resumes. Finally, extensive experiments on a real-world dataset clearly demonstrate the effectiveness of ERU.
Submitted 13 April, 2024;
originally announced April 2024.
-
MoVA: Adapting Mixture of Vision Experts to Multimodal Context
Authors:
Zhuofan Zong,
Bingqi Ma,
Dazhong Shen,
Guanglu Song,
Hao Shao,
Dongzhi Jiang,
Hongsheng Li,
Yu Liu
Abstract:
As the key component in multimodal large language models (MLLMs), the visual encoder greatly affects an MLLM's understanding of diverse image content. Although some large-scale pretrained vision encoders, such as those in CLIP and DINOv2, have brought promising performance, we found that no single vision encoder dominates across all types of image content understanding; e.g., the CLIP vision encoder leads to outstanding results on general image understanding but poor performance on document or chart content. To alleviate the bias of the CLIP vision encoder, we first delve into the inherent behavior of different pre-trained vision encoders and then propose MoVA, a powerful and novel MLLM that adaptively routes and fuses task-specific vision experts with a coarse-to-fine mechanism. In the coarse-grained stage, we design a context-aware expert routing strategy to dynamically select the most suitable vision experts according to the user instruction, input image, and expertise of vision experts. This benefits from the powerful model function understanding ability of the large language model (LLM) equipped with expert-routing low-rank adaptation (LoRA). In the fine-grained stage, we design the mixture-of-vision-expert adapter (MoV-Adapter) to extract and fuse task-specific knowledge from various experts. This coarse-to-fine paradigm effectively leverages representations from experts based on multimodal context and model expertise, further enhancing the generalization ability. We conduct extensive experiments to evaluate the effectiveness of the proposed approach. Without any bells and whistles, MoVA can achieve significant performance gains over current state-of-the-art methods in a wide range of challenging multimodal benchmarks. Code and models will be available at https://github.com/TempleX98/MoVA.
Submitted 19 April, 2024;
originally announced April 2024.
-
Rethinking the Spatial Inconsistency in Classifier-Free Diffusion Guidance
Authors:
Dazhong Shen,
Guanglu Song,
Zeyue Xue,
Fu-Yun Wang,
Yu Liu
Abstract:
Classifier-Free Guidance (CFG) has been widely used in text-to-image diffusion models, where the CFG scale is introduced to control the strength of text guidance on the whole image space. However, we argue that a global CFG scale results in spatial inconsistency on varying semantic strengths and suboptimal image quality. To address this problem, we present a novel approach, Semantic-aware Classifier-Free Guidance (S-CFG), to customize the guidance degrees for different semantic units in text-to-image diffusion models. Specifically, we first design a training-free semantic segmentation method to partition the latent image into relatively independent semantic regions at each denoising step. In particular, the cross-attention map in the denoising U-net backbone is renormalized for assigning each patch to the corresponding token, while the self-attention map is used to complete the semantic regions. Then, to balance the amplification of diverse semantic units, we adaptively adjust the CFG scales across different semantic regions to rescale the text guidance degrees into a uniform level. Finally, extensive experiments demonstrate the superiority of S-CFG over the original CFG strategy on various text-to-image diffusion models, without requiring any extra training cost. Our code is available at https://github.com/SmilesDZgk/S-CFG.
Submitted 8 April, 2024;
originally announced April 2024.
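As a rough illustration of the idea in the S-CFG entry above, the sketch below contrasts standard classifier-free guidance, which applies one global scale, with a per-region variant in which every patch uses the scale of the semantic region it belongs to. The function names, the flat-list representation of the noise predictions, and the `region_scales` mapping are assumptions made for this example only; the abstract does not state how S-CFG derives its regions or rescales the guidance, so this is not the authors' method.

```python
def cfg_combine(eps_uncond, eps_cond, scale):
    """Standard CFG: one global scale applied to every patch.

    eps_uncond / eps_cond are flat lists of noise predictions
    (unconditional and text-conditional), one value per patch.
    """
    return [u + scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]


def regional_cfg_combine(eps_uncond, eps_cond, region_ids, region_scales):
    """Hypothetical per-region variant: patch i uses the scale of its region.

    region_ids[i] is the (assumed) semantic region of patch i, and
    region_scales maps a region id to its guidance scale.
    """
    return [
        u + region_scales[r] * (c - u)
        for u, c, r in zip(eps_uncond, eps_cond, region_ids)
    ]


if __name__ == "__main__":
    uncond = [0.0, 1.0]
    cond = [1.0, 3.0]
    print(cfg_combine(uncond, cond, 2.0))
    print(regional_cfg_combine(uncond, cond, [0, 1], {0: 1.0, 1: 3.0}))
```

With a single region, the per-region variant reduces to standard CFG, which makes the global scale a special case of the regional one.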
-
CoMat: Aligning Text-to-Image Diffusion Model with Image-to-Text Concept Matching
Authors:
Dongzhi Jiang,
Guanglu Song,
Xiaoshi Wu,
Renrui Zhang,
Dazhong Shen,
Zhuofan Zong,
Yu Liu,
Hongsheng Li
Abstract:
Diffusion models have demonstrated great success in the field of text-to-image generation. However, alleviating the misalignment between the text prompts and images is still challenging. The root reason behind the misalignment has not been extensively investigated. We observe that the misalignment is caused by inadequate token attention activation. We further attribute this phenomenon to the diffusion model's insufficient condition utilization, which is caused by its training paradigm. To address the issue, we propose CoMat, an end-to-end diffusion model fine-tuning strategy with an image-to-text concept matching mechanism. We leverage an image captioning model to measure image-to-text alignment and guide the diffusion model to revisit ignored tokens. A novel attribute concentration module is also proposed to address the attribute binding problem. Without any image or human preference data, we use only 20K text prompts to fine-tune SDXL to obtain CoMat-SDXL. Extensive experiments show that CoMat-SDXL significantly outperforms the baseline model SDXL in two text-to-image alignment benchmarks and achieves state-of-the-art performance.
Submitted 3 June, 2024; v1 submitted 4 April, 2024;
originally announced April 2024.
-
Two-Phase Multi-Dose-Level PET Image Reconstruction with Dose Level Awareness
Authors:
Yuchen Fei,
Yanmei Luo,
Yan Wang,
Jiaqi Cui,
Yuanyuan Xu,
Jiliu Zhou,
Dinggang Shen
Abstract:
To obtain high-quality positron emission tomography (PET) while minimizing radiation exposure, a range of methods have been designed to reconstruct standard-dose PET (SPET) from corresponding low-dose PET (LPET) images. However, most current methods merely learn the mapping between single-dose-level LPET and SPET images, but omit the dose disparity of LPET images in clinical scenarios. In this paper, to reconstruct high-quality SPET images from multi-dose-level LPET images, we design a novel two-phase multi-dose-level PET reconstruction algorithm with dose level awareness, containing a pre-training phase and a SPET prediction phase. Specifically, the pre-training phase is devised to explore both fine-grained discriminative features and effective semantic representation. The SPET prediction phase adopts a coarse prediction network that uses the pre-learned dose-level prior to generate a preliminary result, and a refinement network to precisely preserve the details. Experiments on the MICCAI 2022 Ultra-low Dose PET Imaging Challenge Dataset demonstrate the superiority of our method.
Submitted 10 April, 2024; v1 submitted 1 April, 2024;
originally announced April 2024.
-
Revolutionizing Disease Diagnosis with simultaneous functional PET/MR and Deeply Integrated Brain Metabolic, Hemodynamic, and Perfusion Networks
Authors:
Luoyu Wang,
Yitian Tao,
Qing Yang,
Yan Liang,
Siwei Liu,
Hongcheng Shi,
Dinggang Shen,
Han Zhang
Abstract:
Simultaneous functional PET/MR (sf-PET/MR) is a cutting-edge multimodal neuroimaging technique. It provides an unprecedented opportunity for concurrently monitoring and integrating multifaceted brain networks built by spatiotemporally covaried metabolic activity, neural activity, and cerebral blood flow (perfusion). Despite its high scientific and clinical value, the limited hardware accessibility of PET/MR hinders its applications, let alone modern AI-based PET/MR fusion models. Our objective is to develop a clinically feasible AI-based disease diagnosis model trained on comprehensive sf-PET/MR data that, during inference, accepts single-modality input (e.g., PET only) while retaining multimodal-level accuracy. To this end, we propose MX-ARM, a multimodal MiXture-of-experts Alignment and Reconstruction Model. It is modality detachable and exchangeable, allocating different multi-layer perceptrons dynamically ("mixture of experts") through learnable weights to learn respective representations from different modalities. Such a design does not sacrifice model performance in the uni-modal situation. To fully exploit the inherent complex and nonlinear relation among modalities while producing fine-grained representations for uni-modal inference, we subsequently add a modal alignment module to line up a dominant modality (e.g., PET) with representations of auxiliary modalities (MR). We further adopt multimodal reconstruction to promote the quality of learned features. Experiments on precious multimodal sf-PET/MR data for Mild Cognitive Impairment diagnosis showcase the efficacy of our model toward clinically feasible precision medicine.
Submitted 29 March, 2024;
originally announced March 2024.
-
AFDGCF: Adaptive Feature De-correlation Graph Collaborative Filtering for Recommendations
Authors:
Wei Wu,
Chao Wang,
Dazhong Shen,
Chuan Qin,
Liyi Chen,
Hui Xiong
Abstract:
Collaborative filtering methods based on graph neural networks (GNNs) have witnessed significant success in recommender systems (RS), capitalizing on their ability to capture collaborative signals within intricate user-item relationships via message-passing mechanisms. However, these GNN-based RS inadvertently introduce excess linear correlation between user and item embeddings, contradicting the goal of providing personalized recommendations. While existing research predominantly ascribes this flaw to the over-smoothing problem, this paper underscores the critical, often overlooked role of the over-correlation issue in diminishing the effectiveness of GNN representations and subsequent recommendation performance. Up to now, the over-correlation issue remains unexplored in RS. Meanwhile, how to mitigate the impact of over-correlation while preserving collaborative filtering signals is a significant challenge. To this end, this paper aims to address the aforementioned gap by undertaking a comprehensive study of the over-correlation issue in graph collaborative filtering models. Firstly, we present empirical evidence to demonstrate the widespread prevalence of over-correlation in these models. Subsequently, we dive into a theoretical analysis which establishes a pivotal connection between the over-correlation and over-smoothing issues. Leveraging these insights, we introduce the Adaptive Feature De-correlation Graph Collaborative Filtering (AFDGCF) framework, which dynamically applies correlation penalties to the feature dimensions of the representation matrix, effectively alleviating both over-correlation and over-smoothing issues. The efficacy of the proposed framework is corroborated through extensive experiments conducted with four representative graph collaborative filtering models across four publicly available datasets.
Submitted 15 April, 2024; v1 submitted 26 March, 2024;
originally announced March 2024.
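To make the notion of a "correlation penalty on the feature dimensions of the representation matrix" in the AFDGCF entry above more concrete, here is a minimal sketch of one possible penalty: the mean squared Pearson correlation between distinct feature columns of an embedding matrix. The function name and this particular formula are assumptions chosen for illustration; the abstract does not give the exact loss, and the adaptive weighting it mentions is not modelled here.

```python
import math


def column_correlation_penalty(matrix):
    """Mean squared Pearson correlation over all pairs of distinct columns.

    matrix is a list of rows (one embedding per row). Returns 0.0 when the
    feature columns are uncorrelated and 1.0 when they are perfectly
    (anti-)correlated. Illustrative only; not the loss used in the paper.
    """
    n_rows, n_cols = len(matrix), len(matrix[0])
    # Center each feature column around its mean.
    centered = []
    for j in range(n_cols):
        col = [row[j] for row in matrix]
        mean = sum(col) / n_rows
        centered.append([v - mean for v in col])
    norms = [math.sqrt(sum(v * v for v in col)) for col in centered]

    total, pairs = 0.0, 0
    for i in range(n_cols):
        for j in range(i + 1, n_cols):
            if norms[i] == 0.0 or norms[j] == 0.0:
                corr = 0.0  # a constant column carries no correlation signal
            else:
                dot = sum(a * b for a, b in zip(centered[i], centered[j]))
                corr = dot / (norms[i] * norms[j])
            total += corr * corr
            pairs += 1
    return total / pairs if pairs else 0.0


if __name__ == "__main__":
    print(column_correlation_penalty([[1, 2], [2, 4], [3, 6]]))
    print(column_correlation_penalty([[1, 1], [-1, 1], [1, -1], [-1, -1]]))
```

In a training loop, a term of this kind would typically be added to the recommendation loss with a weighting coefficient, so that embeddings are pushed toward less redundant feature dimensions.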
-
Exploring Fermi Surface Nesting and the Nature of Heavy Quasiparticles in the Spin-Triplet Superconductor Candidate CeRh$_2$As$_2$
Authors:
Bo Chen,
Hao Liu,
Qi-Yi Wu,
Chen Zhang,
Xue-Qing Ye,
Yin-Zou Zhao,
Jiao-Jiao Song,
Xin-Yi Tian,
Ba-Lei Tan,
Zheng-Tai Liu,
Mao Ye,
Zhen-Hua Chen,
Yao-Bo Huang,
Da-Wei Shen,
Ya-Hua Yuan,
Jun He,
Yu-Xia Duan,
Jian-Qiao Meng
Abstract:
In this study, we investigate the electronic structure of a spin-triplet superconductor candidate CeRh$_2$As$_2$ using high-resolution angle-resolved photoemission spectroscopy and density functional theory calculations. Notably, Fermi surface nesting hints at connections to magnetic excitation or quadrupole density wave phenomena, elucidating the superconducting mechanisms. Measured band structures reveal primarily localized 4f electrons, with minor itinerant contributions. Additionally, a transition from localized to itinerant behavior and significant c-f hybridization anisotropy underscore the role of f-electrons in shaping electronic properties. These findings deepen our understanding of CeRh$_2$As$_2$'s unconventional superconductivity and magnetism. Further exploration promises advances in superconductivity research.
Submitted 20 March, 2024;
originally announced March 2024.
-
Be-Your-Outpainter: Mastering Video Outpainting through Input-Specific Adaptation
Authors:
Fu-Yun Wang,
Xiaoshi Wu,
Zhaoyang Huang,
Xiaoyu Shi,
Dazhong Shen,
Guanglu Song,
Yu Liu,
Hongsheng Li
Abstract:
Video outpainting is a challenging task, aiming at generating video content outside the viewport of the input video while maintaining inter-frame and intra-frame consistency. Existing methods fall short in either generation quality or flexibility. We introduce MOTIA (Mastering Video Outpainting Through Input-Specific Adaptation), a diffusion-based pipeline that leverages both the intrinsic data-specific patterns of the source video and the image/video generative prior for effective outpainting. MOTIA comprises two main phases: input-specific adaptation and pattern-aware outpainting. The input-specific adaptation phase involves conducting efficient and effective pseudo outpainting learning on the single-shot source video. This process encourages the model to identify and learn patterns within the source video, as well as bridging the gap between standard generative processes and outpainting. The subsequent phase, pattern-aware outpainting, is dedicated to the generalization of these learned patterns to generate outpainting outcomes. Additional strategies including spatial-aware insertion and noise travel are proposed to better leverage the diffusion model's generative prior and the acquired video patterns from source videos. Extensive evaluations underscore MOTIA's superiority, outperforming existing state-of-the-art methods in widely recognized benchmarks. Notably, these advancements are achieved without necessitating extensive, task-specific tuning.
Submitted 20 March, 2024;
originally announced March 2024.
-
Self-learning Canonical Space for Multi-view 3D Human Pose Estimation
Authors:
Xiaoben Li,
Mancheng Meng,
Ziyan Wu,
Terrence Chen,
Fan Yang,
Dinggang Shen
Abstract:
Multi-view 3D human pose estimation is naturally superior to its single-view counterpart, benefiting from the more comprehensive information provided by images from multiple views. This information includes camera poses, 2D/3D human poses, and 3D geometry. However, accurate annotations of this information are hard to obtain, making it challenging to predict accurate 3D human pose from multi-view images. To deal with this issue, we propose a fully self-supervised framework, named cascaded multi-view aggregating network (CMANet), to construct a canonical parameter space to holistically integrate and exploit multi-view information. In our framework, the multi-view information is grouped into two categories: 1) intra-view information and 2) inter-view information. Accordingly, CMANet consists of two components: intra-view module (IRV) and inter-view module (IEV). IRV is used for extracting the initial camera pose and 3D human pose of each view; IEV fuses complementary pose information and cross-view 3D geometry for a final 3D human pose. To facilitate the aggregation of the intra- and inter-view information, we define a canonical parameter space, described by the per-view camera pose and the human pose and shape parameters ($θ$ and $β$) of the SMPL model, and propose a two-stage learning procedure. In the first stage, IRV learns to estimate camera pose and view-dependent 3D human pose supervised by the confident output of an off-the-shelf 2D keypoint detector. In the second stage, IRV is frozen and IEV further refines the camera pose and optimizes the 3D human pose by implicitly encoding the cross-view complement and 3D geometry constraint, achieved by jointly fitting predicted multi-view 2D keypoints. The proposed framework, modules, and learning strategy are demonstrated to be effective by comprehensive experiments, and CMANet is superior to state-of-the-art methods in extensive quantitative and qualitative analysis.
Submitted 29 March, 2024; v1 submitted 19 March, 2024;
originally announced March 2024.
-
Human Mesh Recovery from Arbitrary Multi-view Images
Authors:
Xiaoben Li,
Mancheng Meng,
Ziyan Wu,
Terrence Chen,
Fan Yang,
Dinggang Shen
Abstract:
Human mesh recovery from arbitrary multi-view images involves two characteristics: the arbitrary camera poses and arbitrary number of camera views. Because of the variability, designing a unified framework to tackle this task is challenging. The challenges can be summarized as the dilemma of being able to simultaneously estimate arbitrary camera poses and recover human mesh from arbitrary multi-view images while maintaining flexibility. To solve this dilemma, we propose a divide and conquer framework for Unified Human Mesh Recovery (U-HMR) from arbitrary multi-view images. In particular, U-HMR consists of a decoupled structure, camera and body decoupling (CBD), and two main components: camera pose estimation (CPE) and arbitrary view fusion (AVF). As camera poses and human body mesh are independent of each other, CBD splits their estimation into two sub-tasks handled by two individual sub-networks (i.e., CPE and AVF), so that the two sub-tasks are disentangled. In CPE, since each camera pose is unrelated to the others, we adopt a shared MLP to process all views in parallel. In AVF, in order to fuse multi-view information and make the fusion operation independent of the number of views, we introduce a transformer decoder with a SMPL parameters query token to extract cross-view features for mesh recovery. To demonstrate the efficacy and flexibility of the proposed framework and the effect of each component, we conduct extensive experiments on three public datasets: Human3.6M, MPI-INF-3DHP, and TotalCapture.
Submitted 17 June, 2024; v1 submitted 19 March, 2024;
originally announced March 2024.
-
Eye-gaze Guided Multi-modal Alignment for Medical Representation Learning
Authors:
Chong Ma,
Hanqi Jiang,
Wenting Chen,
Yiwei Li,
Zihao Wu,
Xiaowei Yu,
Zhengliang Liu,
Lei Guo,
Dajiang Zhu,
Tuo Zhang,
Dinggang Shen,
Tianming Liu,
Xiang Li
Abstract:
In the medical multi-modal frameworks, the alignment of cross-modality features presents a significant challenge. However, existing works have learned features that are implicitly aligned from the data, without considering the explicit relationships in the medical context. This data-reliance may lead to low generalization of the learned alignment relationships. In this work, we propose the Eye-gaze Guided Multi-modal Alignment (EGMA) framework to harness eye-gaze data for better alignment of medical visual and textual features. We explore the natural auxiliary role of radiologists' eye-gaze data in aligning medical images and text, and introduce a novel approach by using eye-gaze data, collected synchronously by radiologists during diagnostic evaluations. We conduct downstream tasks of image classification and image-text retrieval on four medical datasets, where EGMA achieved state-of-the-art performance and stronger generalization across different datasets. Additionally, we explore the impact of varying amounts of eye-gaze data on model performance, highlighting the feasibility and utility of integrating this auxiliary data into multi-modal alignment framework.
Submitted 13 June, 2024; v1 submitted 18 March, 2024;
originally announced March 2024.
-
Angular Momentum Memory Effect
Authors:
Xinliang An,
Taoran He,
Dawei Shen
Abstract:
Utilizing recent mathematical advances in proving stability of Minkowski spacetime with minimal decay rates and nonlinear stability of Kerr black holes with small angular momentum, we investigate the detailed asymptotic behaviors of gravitational waves generated in these spacetimes. Here we report and propose a new angular momentum memory effect along future null infinity. This accompanies Christodoulou's nonlinear displacement memory effect and the spin memory effect. The connections and differences to these effects are also addressed.
Submitted 10 April, 2024; v1 submitted 17 March, 2024;
originally announced March 2024.
-
Electronic Structure of Superconducting Infinite-Layer Lanthanum Nickelates
Authors:
Wenjie Sun,
Zhicheng Jiang,
Chengliang Xia,
Bo Hao,
Yueying Li,
Shengjun Yan,
Maosen Wang,
Hongquan Liu,
Jianyang Ding,
Jiayu Liu,
Zhengtai Liu,
Jishan Liu,
Hanghui Chen,
Dawei Shen,
Yuefeng Nie
Abstract:
Revealing the momentum-resolved electronic structure of infinite-layer nickelates is essential for understanding this new class of unconventional superconductors, but has been hindered by the formidable challenges in improving the sample quality. In this work, we report for the first time the angle-resolved photoemission spectroscopy of superconducting La$_{0.8}$Sr$_{0.2}$NiO$_{2}$ films prepared by molecular beam epitaxy and ${\mathrm{\textit{in situ}}}$ atomic-hydrogen reduction. The measured Fermi topology closely matches theoretical calculations, showing a large Ni-$d_{x^2-y^2}$ derived Fermi sheet that evolves from hole-like to electron-like along $k_{z}$, and a three-dimensional (3D) electron pocket centered at the Brillouin zone corner. The Ni-$d_{x^2-y^2}$ derived bands show a mass enhancement ($m^*/m_{\rm{DFT}}$) of 2-3, while the 3D electron band shows negligible band renormalization. Moreover, the Ni-$d_{x^2-y^2}$ derived states also display a band dispersion anomaly at higher binding energy, reminiscent of the waterfall feature and kinks observed in cuprates.
Submitted 12 March, 2024;
originally announced March 2024.
-
DGR: A General Graph Desmoothing Framework for Recommendation via Global and Local Perspectives
Authors:
Leilei Ding,
Dazhong Shen,
Chao Wang,
Tianfu Wang,
Le Zhang,
Yanyong Zhang
Abstract:
Graph Convolutional Networks (GCNs) have become pivotal in recommendation systems for learning user and item embeddings by leveraging the user-item interaction graph's node information and topology. However, these models often face the well-known over-smoothing issue, leading to indistinct user and item embeddings and reduced personalization. Traditional desmoothing methods in GCN-based systems are model-specific, lacking a universal solution. This paper introduces a novel, model-agnostic approach named Desmoothing Framework for GCN-based Recommendation Systems (DGR). It effectively addresses over-smoothing on general GCN-based recommendation models by considering both global and local perspectives. Specifically, we first introduce vector perturbations in each message-passing layer, guided by the global topological structure, to penalize the tendency of node embeddings to become overly similar. Meanwhile, we further develop a tailored loss term for the readout embeddings to preserve the local collaborative relations between users and their neighboring items. In particular, items that exhibit a high correlation with neighboring items are also incorporated to enhance the local topological information. To validate our approach, we conduct extensive experiments on 5 benchmark datasets based on 5 well-known GCN-based recommendation models, demonstrating the effectiveness and generalization of our proposed framework.
Submitted 22 April, 2024; v1 submitted 7 March, 2024;
originally announced March 2024.
-
Cas-DiffCom: Cascaded diffusion model for infant longitudinal super-resolution 3D medical image completion
Authors:
Lianghu Guo,
Tianli Tao,
Xinyi Cai,
Zihao Zhu,
Jiawei Huang,
Lixuan Zhu,
Zhuoyang Gu,
Haifeng Tang,
Rui Zhou,
Siyan Han,
Yan Liang,
Qing Yang,
Dinggang Shen,
Han Zhang
Abstract:
Early infancy is a rapid and dynamic neurodevelopmental period for behavior and neurocognition. Longitudinal magnetic resonance imaging (MRI) is an effective tool to investigate such a crucial stage by capturing the developmental trajectories of the brain structures. However, longitudinal MRI acquisition always meets a serious data-missing problem due to participant dropout and failed scans, making longitudinal infant brain atlas construction and developmental trajectory delineation quite challenging. Thanks to the development of an AI-based generative model, neuroimage completion has become a powerful technique to retain as much available data as possible. However, current image completion methods usually suffer from inconsistency within each individual subject in the time dimension, compromising the overall quality. To solve this problem, our paper proposed a two-stage cascaded diffusion model, Cas-DiffCom, for dense and longitudinal 3D infant brain MRI completion and super-resolution. We applied our proposed method to the Baby Connectome Project (BCP) dataset. The experiment results validate that Cas-DiffCom achieves both individual consistency and high fidelity in longitudinal infant brain image completion. We further applied the generated infant brain images to two downstream tasks, brain tissue segmentation and developmental trajectory delineation, to declare its task-oriented potential in the neuroscience field.
Submitted 21 February, 2024;
originally announced February 2024.
-
Positive temperature-dependent thermal conductivity induced by wavelike phonons in complex Ag-based argyrodites
Authors:
Niuchang Ouyang,
Dongyi Shen,
Chen Wang,
Ruihuan Cheng,
Qi Wang,
Yue Chen
Abstract:
The phonon transport mechanisms and the anomalous temperature-dependent lattice thermal conductivities (kL) in Ag-based argyrodites have not been fully understood. Herein, we systematically study the phonon thermal transport of five Ag-based crystalline argyrodites Ag7PS6, Ag7AsS6, Ag8SnS6, Ag8GeS6 and Ag9GaS6 utilizing perturbation theory and the unified theory thermal transport model. Our results show that, as the complexity of the unit cell increases, the proportion of the population terms falls while the coherence contributions become more significant, leading to the relatively weak temperature-dependent kL of Ag7PS6 and Ag7AsS6, while the more complex crystalline argyrodites, Ag8SnS6, Ag8GeS6 and Ag9GaS6, exhibit a glass-like behavior in their temperature dependence of kL. We attribute the positive temperature-dependent and ultralow kL of Ag8SnS6, Ag8GeS6 and Ag9GaS6 to the dominance of wavelike phonons and the strong phonon broadening. Furthermore, using laser flash measurements and homogeneous non-equilibrium molecular dynamics simulations based on accurate machine learning neuroevolution potentials, we provide further evidence for the glass-like temperature-dependent kL of Ag8SnS6 and Ag8GeS6.
Submitted 19 February, 2024;
originally announced February 2024.
-
Transferring Ultrahigh-Field Representations for Intensity-Guided Brain Segmentation of Low-Field Magnetic Resonance Imaging
Authors:
Kwanseok Oh,
Jieun Lee,
Da-Woon Heo,
Dinggang Shen,
Heung-Il Suk
Abstract:
Ultrahigh-field (UHF) magnetic resonance imaging (MRI), i.e., 7T MRI, provides superior anatomical details of internal brain structures owing to its enhanced signal-to-noise ratio and susceptibility-induced contrast. However, the widespread use of 7T MRI is limited by its high cost and lower accessibility compared to low-field (LF) MRI. This study proposes a deep-learning framework that systematically fuses the input LF magnetic resonance feature representations with the inferred 7T-like feature representations for brain image segmentation tasks in a 7T-absent environment. Specifically, our adaptive fusion module aggregates 7T-like features derived from the LF image by a pre-trained network and then refines them to be effectively assimilable UHF guidance into LF image features. Using intensity-guided features obtained from such aggregation and assimilation, segmentation models can recognize subtle structural representations that are usually difficult to recognize when relying only on LF features. Beyond such advantages, this strategy can seamlessly be utilized by modulating the contrast of LF features in alignment with UHF guidance, even when employing arbitrary segmentation models. Exhaustive experiments demonstrated that the proposed method significantly outperformed all baseline models on both brain tissue and whole-brain segmentation tasks; further, it exhibited remarkable adaptability and scalability by successfully integrating diverse segmentation models and tasks. These improvements were not only quantifiable but also visible in the superlative visual quality of segmentation masks.
Submitted 13 February, 2024;
originally announced February 2024.
-
Auto from cross: CMB lensing power spectrum without noise bias
Authors:
Delon Shen,
Emmanuel Schaan,
Simone Ferraro
Abstract:
Upcoming surveys will measure the cosmic microwave background (CMB) weak lensing power spectrum in exquisite detail, allowing for strong constraints on the sum of neutrino masses among other cosmological parameters. Standard CMB lensing power spectrum estimators aim to extract the connected non-Gaussian trispectrum of CMB temperature maps. However, they are generically dominated by a large Gaussian noise bias which thus needs to be subtracted at high accuracy. This is currently done with realistic map simulations of the CMB and noise, whose finite accuracy currently limits our ability to recover the CMB lensing on small scales. In this paper, we propose a novel estimator which instead avoids this large Gaussian bias. This estimator relies only on the data and avoids the need for bias subtraction with simulations. Thus our bias avoidance method (1) is insensitive to misestimates in simulated CMB and noise models and (2) avoids the large computational cost of standard simulation-based methods like "realization-dependent $N^{(0)}$" (${\rm RDN}^{(0)}$). We show that our estimator is as robust as standard methods in the presence of realistic inhomogeneous noise (e.g. from scan strategy) and masking. Moreover, our method can be combined with split-based methods, making it completely insensitive to mode coupling from inhomogeneous atmospheric and detector noise. We derive the corresponding expressions for our estimator when estimating lensing from CMB temperature and polarization. Although in this paper we specifically consider CMB weak lensing power spectrum estimation, we illuminate the relation between our new estimator, ${\rm RDN}^{(0)}$ subtraction, and general optimal trispectrum estimation. Through this discussion we conclude that our estimator is applicable to analogous problems in other fields which rely on estimating connected trispectra/four-point functions like large-scale structure.
Submitted 6 February, 2024;
originally announced February 2024.
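The abstract above notes that the estimator can be combined with split-based methods. As a rough illustration of why data splits help, the toy sketch below compares the auto power of a coadded signal with the cross power between two splits that share the signal but carry independent noise. This is a real-space analogue under assumed Gaussian statistics, not the estimator proposed in the paper; the function name, sample count, and noise levels are all hypothetical.

```python
import random


def split_power_estimates(n_samples=20000, signal_std=1.0, noise_std=2.0, seed=0):
    """Return (auto power of the coadd, cross power between two splits).

    Toy model: each sample is a common signal plus independent noise per split.
    """
    rng = random.Random(seed)
    auto_sum = 0.0
    cross_sum = 0.0
    for _ in range(n_samples):
        signal = rng.gauss(0.0, signal_std)
        split_a = signal + rng.gauss(0.0, noise_std)  # first data split
        split_b = signal + rng.gauss(0.0, noise_std)  # second data split
        coadd = 0.5 * (split_a + split_b)
        auto_sum += coadd * coadd      # contains signal power plus noise power / 2
        cross_sum += split_a * split_b  # independent noise averages out
    return auto_sum / n_samples, cross_sum / n_samples


auto_power, cross_power = split_power_estimates()
print(f"auto (noise-biased): {auto_power:.3f}  cross (no noise bias): {cross_power:.3f}")
```

With these assumed settings the auto estimate should sit near signal variance plus half the per-split noise variance, while the cross estimate should scatter around the signal variance alone.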
-
Unveiling the charge density wave mechanism in vanadium-based Bi-layered kagome metals
Authors:
Yi-Chen Yang,
Soohyun Cho,
Tong-Rui Li,
Xiang-Qi Liu,
Zheng-Tai Liu,
Zhi-Cheng Jiang,
Jian-Yang Ding,
Wei Xia,
Zi-Cheng Tao,
Jia-Yu Liu,
Wen-Chuan Jing,
Yu Huang,
Yu-Ming Shi,
Soonsang Huh,
Takeshi Kondo,
Zhe Sun,
Ji-Shan Liu,
Mao Ye,
Yi-Lin Wang,
Yan-Feng Guo,
Da-Wei Shen
Abstract:
The charge density wave (CDW), as a hallmark of vanadium-based kagome superconductor AV3Sb5 (A = K, Rb, Cs), has attracted intensive attention. However, the fundamental controversy regarding the underlying mechanism of CDW therein persists. Recently, the vanadium-based bi-layered kagome metal ScV6Sn6, reported to exhibit a long-range charge order below 94 K, has emerged as a promising candidate to further clarify this core issue. Here, employing micro-focusing angle-resolved photoemission spectroscopy (μ-ARPES) and first-principles calculations, we systematically studied the unique CDW order in vanadium-based bi-layered kagome metals by comparing ScV6Sn6 with its isostructural counterpart YV6Sn6, which lacks a CDW ground state. Combining ARPES data and the corresponding joint density of states (DOS), we suggest that the VHS nesting mechanism might be invalid in these materials. Besides, in ScV6Sn6, we identified multiple hybridization energy gaps resulting from CDW-induced band folding, along with an anomalous band dispersion, implying a potential electron-phonon coupling driven mechanism underlying the formation of the CDW order. Our findings not only comprehensively map the electronic structure of V-based bi-layer kagome metals but also provide constructive experimental evidence for the unique origin of CDW in this system.
Submitted 6 February, 2024;
originally announced February 2024.
-
ScribFormer: Transformer Makes CNN Work Better for Scribble-based Medical Image Segmentation
Authors:
Zihan Li,
Yuan Zheng,
Dandan Shan,
Shuzhou Yang,
Qingde Li,
Beizhan Wang,
Yuanting Zhang,
Qingqi Hong,
Dinggang Shen
Abstract:
Most recent scribble-supervised segmentation methods commonly adopt a CNN framework with an encoder-decoder architecture. Despite its multiple benefits, this framework generally can only capture small-range feature dependency for the convolutional layer with the local receptive field, which makes it difficult to learn global shape information from the limited information provided by scribble annotations. To address this issue, this paper proposes a new CNN-Transformer hybrid solution for scribble-supervised medical image segmentation called ScribFormer. The proposed ScribFormer model has a triple-branch structure, i.e., the hybrid of a CNN branch, a Transformer branch, and an attention-guided class activation map (ACAM) branch. Specifically, the CNN branch collaborates with the Transformer branch to fuse the local features learned from CNN with the global representations obtained from Transformer, which can effectively overcome limitations of existing scribble-supervised segmentation methods. Furthermore, the ACAM branch assists in unifying the shallow convolution features and the deep convolution features to further improve the model's performance. Extensive experiments on two public datasets and one private dataset show that our ScribFormer has superior performance over the state-of-the-art scribble-supervised segmentation methods, and achieves even better results than the fully-supervised segmentation methods. The code is released at https://github.com/HUANGLIZI/ScribFormer.
Submitted 2 February, 2024;
originally announced February 2024.
-
Image2Points: A 3D Point-based Context Clusters GAN for High-Quality PET Image Reconstruction
Authors:
Jiaqi Cui,
Yan Wang,
Lu Wen,
Pinxian Zeng,
Xi Wu,
Jiliu Zhou,
Dinggang Shen
Abstract:
To obtain high-quality Positron emission tomography (PET) images while minimizing radiation exposure, numerous methods have been proposed to reconstruct standard-dose PET (SPET) images from the corresponding low-dose PET (LPET) images. However, these methods heavily rely on voxel-based representations, which fall short of adequately accounting for the precise structure and fine-grained context, leading to compromised reconstruction. In this paper, we propose a 3D point-based context clusters GAN, namely PCC-GAN, to reconstruct high-quality SPET images from LPET. Specifically, inspired by the geometric representation power of points, we resort to a point-based representation to enhance the explicit expression of the image structure, thus facilitating the reconstruction with finer details. Moreover, a context clustering strategy is applied to explore the contextual relationships among points, which mitigates the ambiguities of small structures in the reconstructed images. Experiments on both clinical and phantom datasets demonstrate that our PCC-GAN outperforms the state-of-the-art reconstruction methods qualitatively and quantitatively. Code is available at https://github.com/gluucose/PCCGAN.
Submitted 1 February, 2024;
originally announced February 2024.
-
Observation of possible excitonic charge density waves and metal-insulator transitions in atomically thin semimetals
Authors:
Qiang Gao,
Yang-hao Chan,
Pengfei Jiao,
Haiyang Chen,
Shuaishuai Yin,
Kanjanaporn Tangprapha,
Yichen Yang,
Xiaolong Li,
Zhengtai Liu,
Dawei Shen,
Shengwei Jiang,
Peng Chen
Abstract:
Charge density wave (CDW) is a collective quantum phenomenon with a charge modulation in solids [1-2]. Condensation of electron and hole pairs with finite momentum will lead to such an ordered state [3-7]. However, lattice symmetry breaking manifested as the softening of phonon modes can occur simultaneously, which makes it difficult to disentangle the origin of the transition [8-14]. Here, we report a condensed phase in low dimensional HfTe2, whereas angle-resolved photoemission spectroscopy (ARPES) measurements show a metal-insulator transition by lowering the temperature in single triatomic layer (TL) HfTe2. A full gap opening, renormalization of the bands, and emergence of replica bands at the M point are observed at low temperatures, indicating formation of a CDW in the ground state. Raman spectroscopy shows no sign of lattice distortion within the detection limit. The results are corroborated by first-principles calculations, demonstrating the electronic origin of the CDW. By adding more layers, the phase transition is suppressed and completely destroyed at 3 TL because of the increased screening around the Fermi surface. Interestingly, a small amount of electron doping in 1 TL film during the growth significantly raises the transition temperature (TC), which is attributed to a reduced screening effect and a more balanced electron and hole carrier density. Our results indicate a CDW formation mechanism consistent with the excitonic insulator phase in low dimensional HfTe2 and open up opportunity for realization of novel quantum states based on exciton condensation.
Submitted 24 January, 2024;
originally announced January 2024.
-
Electronic and magnetic excitations in La$_3$Ni$_2$O$_7$
Authors:
Xiaoyang Chen,
Jaewon Choi,
Zhicheng Jiang,
Jiong Mei,
Kun Jiang,
Jie Li,
Stefano Agrestini,
Mirian Garcia-Fernandez,
Xing Huang,
Hualei Sun,
Dawei Shen,
Meng Wang,
Jiangping Hu,
Yi Lu,
Ke-Jin Zhou,
Donglai Feng
Abstract:
The striking discovery of high-temperature superconductivity (HTSC) of 80 K in a bilayer nickelate La$_3$Ni$_2$O$_7$ under a moderately high pressure of about 14 GPa ignited a new wave of studying HTSC in nickelates. The properties of the parental phase at ambient pressure may contain key information on basic interactions therein and bosons that may mediate pairing giving birth to superconductivity. Moreover, the bilayer structure of La$_3$Ni$_2$O$_7$ may suggest a distinct minimal model in comparison to cuprate superconductors. Here using X-ray absorption spectroscopy and resonant inelastic X-ray scattering, we studied La$_3$Ni$_2$O$_7$ at ambient pressure, and found that Ni 3$d_{x^2-y^2}$, Ni 3$d_{z^2}$, and ligand oxygen 2$p$ orbitals dominate the low-energy physics with a small charge-transfer energy. Remarkably, well-defined optical-like magnetic excitations were found to soften into a quasi-static spin-density-wave ordering, evidencing the strong electronic correlations and rich magnetic properties. Based on a Heisenberg spin model, we found that the inter-layer effective magnetic superexchange interaction is much larger than the intra-layer ones, and proposed two viable magnetic structures. Our results set the foundation for further exploration of La$_3$Ni$_2$O$_7$ superconductor.
Submitted 23 January, 2024;
originally announced January 2024.
-
ReliCD: A Reliable Cognitive Diagnosis Framework with Confidence Awareness
Authors:
Yunfei Zhang,
Chuan Qin,
Dazhong Shen,
Haiping Ma,
Le Zhang,
Xingyi Zhang,
Hengshu Zhu
Abstract:
During the past few decades, cognitive diagnostics modeling, which is capable of quantifying the learning status and knowledge mastery levels of students, has attracted increasing attention in computational education communities. Indeed, the recent advances in neural networks have greatly enhanced the performance of traditional cognitive diagnosis models through learning the deep representations of students and exercises. Nevertheless, existing approaches often suffer from the issue of overconfidence in predicting students' mastery levels, which is primarily caused by the unavoidable noise and sparsity in realistic student-exercise interaction data, severely hindering the educational application of diagnostic feedback. To address this, in this paper, we propose a novel Reliable Cognitive Diagnosis (ReliCD) framework, which can quantify the confidence of the diagnosis feedback and is flexible for different cognitive diagnostic functions. Specifically, we first propose a Bayesian method to explicitly estimate the state uncertainty of different knowledge concepts for students, which enables the confidence quantification of diagnostic feedback. In particular, to account for potential differences, we suggest modeling individual prior distributions for the latent variables of different ability concepts using a pre-trained model. Additionally, we introduce a logical hypothesis for ranking confidence levels. Along this line, we design a novel calibration loss to optimize the confidence parameters by modeling the process of student performance prediction. Finally, extensive experiments on four real-world datasets clearly demonstrate the effectiveness of our ReliCD framework.
Submitted 29 December, 2023;
originally announced January 2024.
-
Predicting Infant Brain Connectivity with Federated Multi-Trajectory GNNs using Scarce Data
Authors:
Michalis Pistos,
Gang Li,
Weili Lin,
Dinggang Shen,
Islem Rekik
Abstract:
The understanding of the convoluted evolution of infant brain networks during the first postnatal year is pivotal for identifying the dynamics of early brain connectivity development. Existing deep learning solutions suffer from three major limitations. First, they cannot generalize to multi-trajectory prediction tasks, where each graph trajectory corresponds to a particular imaging modality or connectivity type (e.g., T1-w MRI). Second, existing models require extensive training datasets, which are often challenging to obtain, to achieve satisfactory performance. Third, they do not efficiently utilize incomplete time series data. To address these limitations, we introduce FedGmTE-Net++, a federated graph-based multi-trajectory evolution network. Using the power of federation, we aggregate local learnings among diverse hospitals with limited datasets. As a result, we enhance the performance of each hospital's local generative model, while preserving data privacy. The three key innovations of FedGmTE-Net++ are: (i) presenting the first federated learning framework specifically designed for brain multi-trajectory evolution prediction in a data-scarce environment, (ii) incorporating an auxiliary regularizer in the local objective function to exploit all the longitudinal brain connectivity within the evolution trajectory and maximize data utilization, (iii) introducing a two-step imputation process, comprising a preliminary KNN-based precompletion followed by an imputation refinement step that employs regressors to improve similarity scores and refine imputations. Our comprehensive experimental results showed that FedGmTE-Net++ outperforms benchmark methods in brain multi-trajectory prediction from a single baseline graph.
Submitted 8 January, 2024; v1 submitted 1 January, 2024;
originally announced January 2024.
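The abstract above describes aggregating local learnings across hospitals without sharing data. A minimal sketch of one common aggregation rule, sample-size-weighted parameter averaging (FedAvg), is given below. The abstract does not state which aggregation rule FedGmTE-Net++ uses, so this choice is an assumption, and the function and argument names are hypothetical.

```python
def federated_average(client_params, client_sizes):
    """Average flat parameter vectors, weighting each client by its dataset size.

    client_params: list of equal-length lists of floats (one per hospital).
    client_sizes:  list of local training-set sizes, same order as client_params.
    Only parameters and sizes are exchanged, never the raw local data.
    """
    if len(client_params) != len(client_sizes) or not client_params:
        raise ValueError("need one size per client and at least one client")
    total = sum(client_sizes)
    n_params = len(client_params[0])
    return [
        sum(params[k] * size for params, size in zip(client_params, client_sizes)) / total
        for k in range(n_params)
    ]


# Two hypothetical hospitals: the second has three times as many samples.
global_params = federated_average([[1.0, 2.0], [3.0, 4.0]], [1, 3])
print(global_params)
```

Weighting by dataset size keeps a small hospital from pulling the global model as strongly as a large one; an unweighted mean would be the alternative when sizes are kept private.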
-
CLIP in Medical Imaging: A Comprehensive Survey
Authors:
Zihao Zhao,
Yuxiao Liu,
Han Wu,
Yonghao Li,
Sheng Wang,
Lin Teng,
Disheng Liu,
Zhiming Cui,
Qian Wang,
Dinggang Shen
Abstract:
Contrastive Language-Image Pre-training (CLIP), a simple yet effective pre-training paradigm, successfully introduces text supervision to vision models. It has shown promising results across various tasks, attributable to its generalizability and interpretability. The use of CLIP has recently gained increasing interest in the medical imaging domain, serving both as a pre-training paradigm for aligning medical vision and language, and as a critical component in diverse clinical tasks. With the aim of facilitating a deeper understanding of this promising direction, this survey offers an in-depth exploration of the CLIP paradigm within the domain of medical imaging, regarding both refined CLIP pre-training and CLIP-driven applications. In this study, we (1) start with a brief introduction to the fundamentals of CLIP methodology. (2) Then, we investigate the adaptation of CLIP pre-training in the medical domain, focusing on how to optimize CLIP given characteristics of medical images and reports. (3) Furthermore, we explore the practical utilization of CLIP pre-trained models in various tasks, including classification, dense prediction, and cross-modal tasks. (4) Finally, we discuss existing limitations of CLIP in the context of medical imaging and propose forward-looking directions to address the demands of the medical imaging domain. We expect that this comprehensive survey will provide researchers in the field of medical image analysis with a holistic understanding of the CLIP paradigm and its potential implications. The project page can be found at https://github.com/zhaozh10/Awesome-CLIP-in-Medical-Imaging.
Submitted 21 May, 2024; v1 submitted 12 December, 2023;
originally announced December 2023.
-
Giant domain wall anomalous Hall effect in an antiferromagnet
Authors:
Wei Xia,
Bo Bai,
Xuejiao Chen,
Yichen Yang,
Yang Zhang,
Jian Yuan,
Qiang Li,
Kunya Yang,
Xiangqi Liu,
Yang Shi,
Haiyang Ma,
Huali Yang,
Mingquan He,
Lei Li,
Chuanying Xi,
Li Pi,
Xiaodong Lv,
Xia Wang,
Xuerong Liu,
Shiyan Li,
Xiaodong Zhou,
Jianpeng Liu,
Yulin Chen,
Jian Shen,
Dawei Shen
, et al. (3 additional authors not shown)
Abstract:
The Hall effect plays a crucial role in the establishment of the band theory of solids and the discovery of emergent new phases of interacting electrons such as the topological phases of matter. Generally, the dissipationless Hall effect requires time-reversal symmetry breaking (TRSB), where TRSB induced by external magnetic field results in ordinary Hall effect, while TRSB caused by spontaneous magnetization gives rise to anomalous Hall effect (AHE) which scales with the net magnetization. The AHE is therefore not expected in antiferromagnets with vanishingly small magnetization. However, large AHE was recently observed in certain antiferromagnets with noncollinear spin structure and nonvanishing Berry curvature, thus opening a new area for exploration of large AHE in antiferromagnets. Here, we report another origin of AHE in a layered antiferromagnet, namely the domain wall (DW) skew scattering with Weyl points near the Fermi level, in experiments for the first time. Interestingly, the DWs form a unique periodic stripe structure with controllable periodicity by external magnetic field, which decreases nearly monotonically from 975 nm at 0 T to 232 nm at 4 T. Electrons incident on DW with topological bound states experience strong asymmetric scattering, leading to giant extrinsic AHE, with the DW Hall conductivity (DWHC) at 2 K and 1.2 T even reaching a record value of about $1.51\times10^{4}$ S cm$^{-1}$ among bulk systems, which is two orders of magnitude larger than the intrinsic anomalous Hall conductivity. The observation of giant DWHC and controllable stripe DW structure in an antiferromagnet not only sets a new paradigm for exploration of large extrinsic anomalous Hall effect, but also provides potential applications in spintronic devices.
Submitted 12 December, 2023;
originally announced December 2023.
-
Mining Gaze for Contrastive Learning toward Computer-Assisted Diagnosis
Authors:
Zihao Zhao,
Sheng Wang,
Qian Wang,
Dinggang Shen
Abstract:
Obtaining large-scale radiology reports can be difficult for medical images due to various reasons, limiting the effectiveness of contrastive pre-training in the medical image domain and underscoring the need for alternative methods. In this paper, we propose eye-tracking as an alternative to text reports, as it allows for the passive collection of gaze signals without disturbing the radiologist's routine diagnosis process. By tracking the gaze of radiologists as they read and diagnose medical images, we can understand their visual attention and clinical reasoning. When a radiologist has similar gazes for two medical images, it may indicate semantic similarity for diagnosis, and these images should be treated as positive pairs when pre-training a computer-assisted diagnosis (CAD) network through contrastive learning. Accordingly, we introduce the Medical contrastive Gaze Image Pre-training (McGIP) as a plug-and-play module for contrastive learning frameworks. McGIP uses the radiologist's gaze to guide contrastive pre-training. We evaluate our method using two representative types of medical images and two common types of gaze data. The experimental results demonstrate the practicality of McGIP, indicating its high potential for various clinical scenarios and applications.
Submitted 12 December, 2023; v1 submitted 10 December, 2023;
originally announced December 2023.
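The abstract above treats two images as a positive pair when the radiologist's gaze on them is similar. A minimal sketch of that selection step follows, assuming each gaze record has already been reduced to a flattened heatmap of equal length. The cosine-similarity measure, the 0.9 threshold, and the function names are illustrative assumptions, not the similarity measures used in McGIP.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def gaze_positive_pairs(heatmaps, threshold=0.9):
    """Return index pairs (i, j), i < j, whose gaze heatmaps are similar enough."""
    pairs = []
    for i in range(len(heatmaps)):
        for j in range(i + 1, len(heatmaps)):
            if cosine_similarity(heatmaps[i], heatmaps[j]) >= threshold:
                pairs.append((i, j))
    return pairs


# Three hypothetical flattened 2x2 gaze heatmaps: the first two overlap strongly.
heatmaps = [[1.0, 0.0, 0.0, 0.0], [0.9, 0.1, 0.0, 0.0], [0.0, 0.0, 0.0, 1.0]]
print(gaze_positive_pairs(heatmaps))
```

The resulting index pairs could then be passed to whatever contrastive loss the host framework uses in place of augmentation-only positives.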
-
Holistic Evaluation of GPT-4V for Biomedical Imaging
Authors:
Zhengliang Liu,
Hanqi Jiang,
Tianyang Zhong,
Zihao Wu,
Chong Ma,
Yiwei Li,
Xiaowei Yu,
Yutong Zhang,
Yi Pan,
Peng Shu,
Yanjun Lyu,
Lu Zhang,
Junjie Yao,
Peixin Dong,
Chao Cao,
Zhenxiang Xiao,
Jiaqi Wang,
Huan Zhao,
Shaochen Xu,
Yaonai Wei,
Jingyuan Chen,
Haixing Dai,
Peilong Wang,
Hao He,
Zewei Wang
, et al. (25 additional authors not shown)
Abstract:
In this paper, we present a large-scale evaluation probing GPT-4V's capabilities and limitations for biomedical image analysis. GPT-4V represents a breakthrough in artificial general intelligence (AGI) for computer vision, with applications in the biomedical domain. We assess GPT-4V's performance across 16 medical imaging categories, including radiology, oncology, ophthalmology, pathology, and more. Tasks include modality recognition, anatomy localization, disease diagnosis, report generation, and lesion detection. The extensive experiments provide insights into GPT-4V's strengths and weaknesses. Results show GPT-4V's proficiency in modality and anatomy recognition but difficulty with disease diagnosis and localization. GPT-4V excels at diagnostic report generation, indicating strong image captioning skills. While promising for biomedical imaging AI, GPT-4V requires further enhancement and validation before clinical deployment. We emphasize responsible development and testing for trustworthy integration of biomedical AGI. This rigorous evaluation of GPT-4V on diverse medical images advances understanding of multimodal large language models (LLMs) and guides future work toward impactful healthcare applications.
Submitted 10 November, 2023;
originally announced December 2023.
-
A Mixed Integer Quadratic Program for Valuing the Impact of Price and Forecast Uncertainty for Wind Generators
Authors:
Daniel Shen,
Marija Ilic
Abstract:
Owners of wind power plants are exposed to financial risk in wholesale electricity markets due to the uncertain nature of wind forecasts and price volatility. In the event of a wind shortfall, the plant may have to repurchase power at a higher price in the real-time market. However, reducing the power offered in the day-ahead market may also be interpreted by regulators as physical withholding. We formulate and solve a mixed-integer quadratic program (MIQP) that prices the uncertain portion of a wind generator's forecast to hedge against uncertainties and which addresses concerns around withholding. We exploit the structure of the MIQP inputs to introduce additional constraints to improve computation time. Additionally, we provide a qualitative approach for generators and regulators to interpret the results of the MIQP. Finally, we simulate a real-world application for a wind farm in New York using past wind forecasts and NYISO prices.
Submitted 6 December, 2023;
originally announced December 2023.
-
Laser frequency stabilization and photoacoustic detection based on the tapered fiber coupled crystalline resonator
Authors:
Yaohui Xu,
Xiaolan Liu,
Wujun Li,
Haotian Wang,
Jun Guo,
Jie Ma,
Jianing Zhang,
Deyuan Shen
Abstract:
We demonstrate laser frequency stabilization using a high-Q MgF$_2$ crystalline whispering gallery mode resonator coupled with a tapered fiber. We discovered that the tapered fiber, acting as a microcantilever, exhibits mechanical resonance characteristics that are capable of transmitting acoustic perturbations to the frequency locking loop. Both experimental and theoretical investigations into the influence of external acoustic waves on the coupling system were conducted. After acoustic isolation, the locked laser exhibits a minimum frequency noise of 0.4 Hz$^2$/Hz at 7 kHz and an integral linewidth of 68 Hz (0.1 s integration time). Benefiting from the ultralow frequency noise of the stabilized laser, it achieves a minimum noise equivalent acoustic signal level of $4.76\times10^{-4}$ Pa/Hz$^{1/2}$. Our results not only facilitate the realization of ultralow noise lasers but also serve as a novel and sensitive photoacoustic detector.
Submitted 5 December, 2023;
originally announced December 2023.
-
MARformer: An Efficient Metal Artifact Reduction Transformer for Dental CBCT Images
Authors:
Yuxuan Shi,
Jun Xu,
Dinggang Shen
Abstract:
Cone Beam Computed Tomography (CBCT) plays a key role in dental diagnosis and surgery. However, metal teeth implants can introduce troublesome metal artifacts during the CBCT imaging process, interfering with diagnosis and downstream processing such as tooth segmentation. In this paper, we develop an efficient Transformer to perform metal artifacts reduction (MAR) from dental CBCT images. The proposed MAR Transformer (MARformer) reduces computation complexity in the multihead self-attention by a new Dimension-Reduced Self-Attention (DRSA) module, based on the observation that CBCT images have globally similar structure. A Patch-wise Perceptive Feed Forward Network (P2FFN) is also proposed to perceive local image information for fine-grained restoration. Experimental results on CBCT images with synthetic and real-world metal artifacts show that our MARformer is efficient and outperforms previous MAR methods and two restoration Transformers.
Submitted 18 April, 2024; v1 submitted 16 November, 2023;
originally announced November 2023.
-
MeLo: Low-rank Adaptation is Better than Fine-tuning for Medical Image Diagnosis
Authors:
Yitao Zhu,
Zhenrong Shen,
Zihao Zhao,
Sheng Wang,
Xin Wang,
Xiangyu Zhao,
Dinggang Shen,
Qian Wang
Abstract:
The common practice in developing computer-aided diagnosis (CAD) models based on transformer architectures usually involves fine-tuning from ImageNet pre-trained weights. However, with recent advances in large-scale pre-training and the practice of scaling laws, Vision Transformers (ViT) have become much larger and less accessible to medical imaging communities. Additionally, in real-world scenarios, deploying multiple CAD models can be troublesome due to problems such as limited storage space and time-consuming model switching. To address these challenges, we propose a new method, MeLo (Medical image Low-rank adaptation), which enables the development of a single CAD model for multiple clinical tasks in a lightweight manner. It adopts low-rank adaptation instead of resource-demanding fine-tuning. By freezing the weights of ViT models and adding only small low-rank plug-ins, we achieve competitive results on various diagnosis tasks across different imaging modalities using only a few trainable parameters. Specifically, our proposed method achieves performance comparable to fully fine-tuned ViT models on four distinct medical imaging datasets using about 0.17% trainable parameters. Moreover, MeLo adds only about 0.5MB of storage space and allows for extremely fast model switching in deployment and inference. Our source code and pre-trained weights are available on our website (https://absterzhu.github.io/melo.github.io/).
Submitted 14 November, 2023;
originally announced November 2023.
-
An Efficient Probabilistic Solution to Mapping Errors in LiDAR-Camera Fusion for Autonomous Vehicles
Authors:
Dan Shen,
Zhengming Zhang,
Renran Tian,
Yaobin Chen,
Rini Sherony
Abstract:
LiDAR-camera fusion is one of the core processes for the perception system of current automated driving systems. The typical sensor fusion process includes a series of coordinate transformation operations following system calibration. Although a significant amount of research has been done to improve the fusion accuracy, there are still inherent data mapping errors in practice related to system synchronization offsets, vehicle vibrations, the small size of the target, and fast relative moving speeds. Moreover, increasingly complicated algorithms to improve fusion accuracy can overwhelm the onboard computational resources, limiting the actual implementation. This study proposes a novel and low-cost probabilistic LiDAR-camera fusion method to alleviate these inherent mapping errors in scene reconstruction. By calculating shape similarity using KL-divergence and applying a RANSAC-regression-based trajectory smoother, the effects of LiDAR-camera mapping errors are minimized in object localization and distance estimation. Designed experiments are conducted to demonstrate the robustness and effectiveness of the proposed strategy.
Submitted 7 November, 2023;
originally announced November 2023.
-
Active Collision Avoidance System for E-Scooters in Pedestrian Environment
Authors:
Xuke Yan,
Dan Shen
Abstract:
In the dense fabric of urban areas, electric scooters have rapidly become a preferred mode of transportation. As they cater to modern mobility demands, they present significant safety challenges, especially when interacting with pedestrians. In general, e-scooters are expected to be ridden in bike lanes or on sidewalks, or to share the road with cars, at a maximum speed of about 15-20 mph, making them more flexible and much faster than pedestrians and bicyclists. Accurate prediction of pedestrian movement, coupled with assistive motion control of scooters, is essential in minimizing collision risks and seamlessly integrating scooters in areas dense with pedestrians. Addressing these safety concerns, our research introduces a novel e-Scooter collision avoidance system (eCAS) with a method for predicting pedestrian trajectories, employing an advanced LSTM network integrated with a state refinement module. This proactive model is designed to ensure unobstructed, collision-free movement in areas with substantial pedestrian traffic. Results are validated on two public datasets, ETH and UCY, with encouraging outcomes. Our model demonstrated proficiency in anticipating pedestrian paths and augmented scooter path planning, allowing for heightened adaptability in densely populated locales. This study shows the potential of combining pedestrian trajectory prediction with scooter motion planning. With the ubiquity of electric scooters in urban environments, such advancements have become crucial to safeguard all participants in urban transit.
Submitted 7 November, 2023;
originally announced November 2023.