
arXiv:2311.12028 [pdf, other]

Hourglass Tokenizer for Efficient Transformer-Based 3D Human Pose Estimation

Authors: Wenhao Li, Mengyuan Liu, Hong Liu, Pichao Wang, Jialun Cai, Nicu Sebe

Abstract: Transformers have been successfully applied in the field of video-based 3D human pose estimation. However, the high computational costs of these video pose transformers (VPTs) make them impractical on resource-constrained devices. In this paper, we present a plug-and-play pruning-and-recovering framework, called Hourglass Tokenizer (HoT), for efficient transformer-based 3D human pose estimation from videos. Our HoT begins with pruning pose tokens of redundant frames and ends with recovering full-length tokens, resulting in a few pose tokens in the intermediate transformer blocks and thus improving the model efficiency. To effectively achieve this, we propose a token pruning cluster (TPC) that dynamically selects a few representative tokens with high semantic diversity while eliminating the redundancy of video frames. In addition, we develop a token recovering attention (TRA) to restore the detailed spatio-temporal information based on the selected tokens, thereby expanding the network output to the original full-length temporal resolution for fast inference. Extensive experiments on two benchmark datasets (i.e., Human3.6M and MPI-INF-3DHP) demonstrate that our method can achieve both high efficiency and estimation accuracy compared to the original VPT models. For instance, applying to MotionBERT and MixSTE on Human3.6M, our HoT can save nearly 50% FLOPs without sacrificing accuracy and nearly 40% FLOPs with only 0.2% accuracy drop, respectively. Code and models are available at https://github.com/NationalGAILab/HoT.

Submitted 27 March, 2024; v1 submitted 20 November, 2023; originally announced November 2023.

Comments: Accepted by CVPR 2024, Open Sourced

arXiv:2311.10928 [pdf, other]

CAMRA: Copilot for AMR Annotation

Authors: Jon Z. Cai, Shafiuddin Rehan Ahmed, Julia Bonn, Kristin Wright-Bettner, Martha Palmer, James H. Martin

Abstract: In this paper, we introduce CAMRA (Copilot for AMR Annotation), a cutting-edge web-based tool designed for constructing Abstract Meaning Representation (AMR) from natural language text. CAMRA offers a novel approach to deep lexical semantics annotation such as AMR, treating AMR annotation akin to coding in programming languages. Leveraging the familiarity of programming paradigms, CAMRA encompasses all essential features of existing AMR editors, including example lookup, while going a step further by integrating Propbank roleset lookup as an autocomplete feature within the tool. Notably, CAMRA incorporates AMR parser models as coding co-pilots, greatly enhancing the efficiency and accuracy of AMR annotators. To demonstrate the tool's capabilities, we provide a live demo accessible at: https://camra.colorado.edu

Submitted 20 February, 2024; v1 submitted 17 November, 2023; originally announced November 2023.

Comments: EMNLP 2023 System Demonstration

arXiv:2311.00412 [pdf, other]

Feature-oriented Deep Learning Framework for Pulmonary Cone-beam CT (CBCT) Enhancement with Multi-task Customized Perceptual Loss

Authors: Jiarui Zhu, Werxing Chen, Hongfei Sun, Shaohua Zhi, Jing Qin, Jing Cai, Ge Ren

Abstract: Cone-beam computed tomography (CBCT) is routinely collected during image-guided radiation therapy (IGRT) to provide updated patient anatomy information for cancer treatments. However, CBCT images often suffer from streaking artifacts and noise caused by under-rate sampling projections and low-dose exposure, resulting in low clarity and information loss. While recent deep learning-based CBCT enhancement methods have shown promising results in suppressing artifacts, they have limited performance on preserving anatomical details since conventional pixel-to-pixel loss functions are incapable of describing detailed anatomy. To address this issue, we propose a novel feature-oriented deep learning framework that translates low-quality CBCT images into high-quality CT-like imaging via a multi-task customized feature-to-feature perceptual loss function. The framework comprises two main components: a multi-task learning feature-selection network (MTFS-Net) for customizing the perceptual loss function; and a CBCT-to-CT translation network guided by feature-to-feature perceptual loss, which uses advanced generative models such as U-Net, GAN and CycleGAN. Our experiments showed that the proposed framework can generate synthesized CT (sCT) images for the lung that achieved a high similarity to CT images, with an average SSIM index of 0.9869 and an average PSNR index of 39.9621. The sCT images also achieved visually pleasing performance with effective artifact suppression, noise reduction, and distinctive preservation of anatomical details. Our experimental results indicate that the proposed framework outperforms the state-of-the-art models for pulmonary CBCT enhancement. This framework holds great promise for generating high-quality anatomical imaging from CBCT that is suitable for various clinical applications.

Submitted 1 November, 2023; originally announced November 2023.

Comments: 32 pages, 7 figures, journal

arXiv:2311.00266 [pdf, other]

Constructing the Fulde-Ferrell-Larkin-Ovchinnikov state in antiferromagnetic insulator CrOCl

Authors: Yifan Ding, Jiadian He, Shihao Zhang, Huakun Zuo, Jinfan Gu, Jiliang Cai, Xiaohui Zeng, Pu Yan, Kecheng Cao, Kenji Watanabe, Takashi Taniguchi, Peng Dong, Yiwen Zhang, Yueshen Wu, Xiang Zhou, Jinghui Wang, Yulin Chen, Yu Ye, Jianpeng Liu, Jun Li

Abstract: Time reversal symmetry breaking in superconductors, resulting from external magnetic fields or spontaneous magnetization, often leads to unconventional superconducting properties. In this way, a conventional Fulde-Ferrell-Larkin-Ovchinnikov (FFLO) state, characterized by Cooper pairs with nonzero total momentum, may be realized by the Zeeman effect caused by external magnetic fields. Here, we report the observation of superconductivity in a few-layer antiferromagnetic insulator CrOCl by utilizing the superconducting proximity effect with NbSe2 flakes. The superconductivity exhibits a considerably weak gap of about 0.12 meV, and the in-plane upper critical field shows behavior characteristic of the FFLO state at low temperature. Our first-principles calculations indicate that the proximitized superconductivity may exist in the CrOCl layer with Cr vacancies or line-defects. Moreover, the FFLO state could be induced by the inherent larger spin splitting in the CrOCl layer. Our findings not only demonstrate the fascinating interaction between superconductivity and magnetism, but also provide a possible path to construct the FFLO state by intrinsic time reversal symmetry breaking and the superconducting proximity effect.

Submitted 31 October, 2023; originally announced November 2023.

arXiv:2310.17082 [pdf, ps, other]

Does or did the supernova remnant Cassiopeia A operate as a PeVatron?

Authors: Zhen Cao, F. Aharonian, Q. An, Axikegu, Y. X. Bai, Y. W. Bao, D. Bastieri, X. J. Bi, Y. J. Bi, J. T. Cai, Q. Cao, W. Y. Cao, Zhe Cao, J. Chang, J. F. Chang, A. M. Chen, E. S. Chen, Liang Chen, Lin Chen, Long Chen, M. J. Chen, M. L. Chen, Q. H. Chen, S. H. Chen, S. Z. Chen , et al. (255 additional authors not shown)

Abstract: For decades, supernova remnants (SNRs) have been considered the prime sources of Galactic cosmic rays (CRs). But whether SNRs can accelerate CR protons to PeV energies and thus dominate CR flux up to the knee is currently under intensive theoretical and phenomenological debate. The direct test of the ability of SNRs to operate as CR PeVatrons can be provided by ultrahigh-energy (UHE; $E_\gamma \geq 100$ TeV) $\gamma$-rays. In this context, the historical SNR Cassiopeia A (Cas A) is considered one of the most promising targets for UHE observations. This paper presents the observation of Cas A and its vicinity by the LHAASO KM2A detector. The exceptional sensitivity of LHAASO KM2A in the UHE band, combined with the young age of Cas A, enabled us to derive stringent model-independent limits on the energy budget of UHE protons and nuclei accelerated by Cas A at any epoch after the explosion. The results challenge the prevailing paradigm that Cas A-type SNRs are major suppliers of PeV CRs in the Milky Way.

Submitted 25 October, 2023; originally announced October 2023.

Comments: 11 pages, 3 figures, accepted by ApJL

arXiv:2310.16315 [pdf]

Probing Silicon Carbide with Phase-Modulated Femtosecond Laser Pulses: Insights into Multiphoton Photocurrent

Authors: Ahsan Ali, Chuanliang Wang, Jinyang Cai, Khadga Jung Karki

Abstract: Wide bandgap semiconductors are widely used in photonic technologies due to their advantageous features, such as large optical bandgap, low losses, and fast operational speeds. Silicon carbide is a prototypical wide bandgap semiconductor with high optical nonlinearities, large electron transport, and a high breakdown threshold. Integration of silicon carbide in nonlinear photonics requires a systematic analysis of the multiphoton contribution to the device functionality. Here, multiphoton photocurrent in a silicon carbide photodetector is investigated using phase-modulated femtosecond pulses. Multiphoton absorption is quantified using a 1030 nm phase-modulated pulsed laser. Our measurements show that although the bandgap is less than the energy of three photons, only four-photon absorption has a significant contribution to the photocurrent. We interpret the four-photon absorption as a direct transition from the valence to the conduction band at the Γ point. More importantly, silicon carbide withstands higher excitation intensities compared to other wide bandgap semiconductors, making it an ideal system for high-power nonlinear applications.

Submitted 24 October, 2023; originally announced October 2023.

Comments: 14 pages, 4 figures

arXiv:2310.15435 [pdf, other]

PromptInfuser: How Tightly Coupling AI and UI Design Impacts Designers' Workflows

Authors: Savvas Petridis, Michael Terry, Carrie J. Cai

Abstract: Prototyping AI applications is notoriously difficult. While large language model (LLM) prompting has dramatically lowered the barriers to AI prototyping, designers are still prototyping AI functionality and UI separately. We investigate how coupling prompt and UI design affects designers' workflows. Grounding this research, we developed PromptInfuser, a Figma plugin that enables users to create semi-functional mockups, by connecting UI elements to the inputs and outputs of prompts. In a study with 14 designers, we compare PromptInfuser to designers' current AI-prototyping workflow. PromptInfuser was perceived to be significantly more useful for communicating product ideas, more capable of producing prototypes that realistically represent the envisioned artifact, more efficient for prototyping, and more helpful for anticipating UI issues and technical constraints. PromptInfuser encouraged iteration over prompt and UI together, which helped designers identify UI and prompt incompatibilities and reflect upon their total solution. Together, these findings inform future systems for prototyping AI applications.

Submitted 23 October, 2023; originally announced October 2023.

arXiv:2310.15428 [pdf, other]

ConstitutionMaker: Interactively Critiquing Large Language Models by Converting Feedback into Principles

Authors: Savvas Petridis, Ben Wedin, James Wexler, Aaron Donsbach, Mahima Pushkarna, Nitesh Goyal, Carrie J. Cai, Michael Terry

Abstract: Large language model (LLM) prompting is a promising new approach for users to create and customize their own chatbots. However, current methods for steering a chatbot's outputs, such as prompt engineering and fine-tuning, do not support users in converting their natural feedback on the model's outputs to changes in the prompt or model. In this work, we explore how to enable users to interactively refine model outputs through their feedback, by helping them convert their feedback into a set of principles (i.e. a constitution) that dictate the model's behavior. From a formative study, we (1) found that users needed support converting their feedback into principles for the chatbot and (2) classified the different principle types desired by users. Inspired by these findings, we developed ConstitutionMaker, an interactive tool for converting user feedback into principles, to steer LLM-based chatbots. With ConstitutionMaker, users can provide either positive or negative feedback in natural language, select auto-generated feedback, or rewrite the chatbot's response; each mode of feedback automatically generates a principle that is inserted into the chatbot's prompt. In a user study with 14 participants, we compare ConstitutionMaker to an ablated version, where users write their own principles. With ConstitutionMaker, participants felt that their principles could better guide the chatbot, that they could more easily convert their feedback into principles, and that they could write principles more efficiently, with less mental demand. ConstitutionMaker helped users identify ways to improve the chatbot, formulate their intuitive responses to the model into feedback, and convert this feedback into specific and clear principles. Together, these findings inform future tools that support the interactive critiquing of LLM outputs.

Submitted 23 October, 2023; originally announced October 2023.

arXiv:2310.11641 [pdf]

Cloud-Magnetic Resonance Imaging System: In the Era of 6G and Artificial Intelligence

Authors: Yirong Zhou, Yanhuang Wu, Yuhan Su, Jing Li, Jianyun Cai, Yongfu You, Di Guo, Xiaobo Qu

Abstract: Magnetic Resonance Imaging (MRI) plays an important role in medical diagnosis, generating petabytes of image data annually in large hospitals. This voluminous data stream requires a significant amount of network bandwidth and extensive storage infrastructure. Additionally, local data processing demands substantial manpower and hardware investments. Data isolation across different healthcare institutions hinders cross-institutional collaboration in clinics and research. In this work, we anticipate an innovative MRI system and its four generations that integrate emerging distributed cloud computing, 6G bandwidth, edge computing, federated learning, and blockchain technology. This system is called Cloud-MRI, aiming at solving the problems of MRI data storage security, transmission speed, AI algorithm maintenance, hardware upgrading, and collaborative work. The workflow commences with the transformation of k-space raw data into the standardized Imaging Society for Magnetic Resonance in Medicine Raw Data (ISMRMRD) format. Then, the data are uploaded to the cloud or edge nodes for fast image reconstruction, neural network training, and automatic analysis. Then, the outcomes are seamlessly transmitted to clinics or research institutes for diagnosis and other services. The Cloud-MRI system will save the raw imaging data, reduce the risk of data loss, facilitate inter-institutional medical collaboration, and finally improve diagnostic accuracy and work efficiency.

Submitted 17 October, 2023; originally announced October 2023.

Comments: 4 pages, 5 figures, letters

arXiv:2310.08845 [pdf, other]

doi 10.1126/sciadv.adj2778

Very high energy gamma-ray emission beyond 10 TeV from GRB 221009A

Authors: Zhen Cao, F. Aharonian, Q. An, A. Axikegu, Y. X. Bai, Y. W. Bao, D. Bastieri, X. J. Bi, Y. J. Bi, J. T. Cai, Q. Cao, W. Y. Cao, Zhe Cao, J. Chang, J. F. Chang, A. M. Chen, E. S. Chen, Liang Chen, Lin Chen, Long Chen, M. J. Chen, M. L. Chen, Q. H. Chen, S. H. Chen, S. Z. Chen , et al. (255 additional authors not shown)

Abstract: The highest energy gamma-rays from gamma-ray bursts (GRBs) have important implications for their radiation mechanism. Here we report for the first time the detection of gamma-rays up to 13 TeV from the brightest GRB 221009A by the Large High Altitude Air-shower Observatory (LHAASO). The LHAASO-KM2A detector registered more than 140 gamma-rays with energies above 3 TeV during 230-900 s after the trigger. The intrinsic energy spectrum of gamma-rays can be described by a power-law after correcting for extragalactic background light (EBL) absorption. Such a hard spectrum challenges the synchrotron self-Compton (SSC) scenario of relativistic electrons for the afterglow emission above several TeV. Observations of gamma-rays up to 13 TeV from a source with a measured redshift of z=0.151 hint at more transparency in intergalactic space than previously expected. Alternatively, one may invoke new physics such as Lorentz Invariance Violation (LIV) or an axion origin of very high energy (VHE) signals.

Submitted 22 November, 2023; v1 submitted 13 October, 2023; originally announced October 2023.

Comments: 49 pages, 11 figures

Journal ref: Science Advances, 9, eadj2778 (2023) 15 November 2023

arXiv:2310.08105 [pdf]

doi 10.1038/s41467-023-42617-4

Gate-controlled suppression of light-driven proton transport through graphene electrodes

Authors: S. Huang, E. Griffin, J. Cai, B. Xin, J. Tong, Y. Fu, V. Kravets, F. M. Peeters, M. Lozada-Hidalgo

Abstract: Recent experiments demonstrated that proton transport through graphene electrodes can be accelerated by over an order of magnitude with low intensity illumination. Here we show that this photo-effect can be suppressed for a tuneable fraction of the infrared spectrum by applying a voltage bias. Using photocurrent measurements and Raman spectroscopy, we show that such fraction can be selected by tuning the Fermi energy of electrons in graphene with a bias, a phenomenon controlled by Pauli blocking of photo-excited electrons. These findings demonstrate a dependence between graphene's electronic and proton transport properties and provide fundamental insights into molecularly thin electrode-electrolyte interfaces and their interaction with light.

Submitted 12 October, 2023; originally announced October 2023.

Report number: 6932

Journal ref: Nature Communications (2023)

arXiv:2310.08041 [pdf, other]

QLLM: Accurate and Efficient Low-Bitwidth Quantization for Large Language Models

Authors: Jing Liu, Ruihao Gong, Xiuying Wei, Zhiwei Dong, Jianfei Cai, Bohan Zhuang

Abstract: Large Language Models (LLMs) excel in NLP, but their demands hinder their widespread deployment. While Quantization-Aware Training (QAT) offers a solution, its extensive training costs make Post-Training Quantization (PTQ) a more practical approach for LLMs. In existing studies, activation outliers in particular channels are identified as the bottleneck to PTQ accuracy. They propose to transform the magnitudes from activations to weights, which however offers limited alleviation or suffers from unstable gradients, resulting in a severe performance drop at low-bitwidth. In this paper, we propose QLLM, an accurate and efficient low-bitwidth PTQ method designed for LLMs. QLLM introduces an adaptive channel reassembly technique that reallocates the magnitude of outliers to other channels, thereby mitigating their impact on the quantization range. This is achieved by channel disassembly and channel assembly, which first breaks down the outlier channels into several sub-channels to ensure a more balanced distribution of activation magnitudes. Then similar channels are merged to maintain the original channel number for efficiency. Additionally, an adaptive strategy is designed to autonomously determine the optimal number of sub-channels for channel disassembly. To further compensate for the performance loss caused by quantization, we propose an efficient tuning method that only learns a small number of low-rank weights while freezing the pre-trained quantized model. After training, these low-rank parameters can be fused into the frozen weights without affecting inference. Extensive experiments on LLaMA-1 and LLaMA-2 show that QLLM can obtain accurate quantized models efficiently. For example, QLLM quantizes the 4-bit LLaMA-2-70B within 10 hours on a single A100-80G GPU, outperforming the previous state-of-the-art method by 7.89% on the average accuracy across five zero-shot tasks.
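
As a rough illustration of the channel disassembly step described in the abstract above, the sketch below splits one outlier input channel of a linear layer into `t` equal sub-channels and repeats the matching weight row, so the layer output is unchanged while the largest activation magnitude shrinks by a factor of `t`. This is a minimal pure-Python sketch under stated assumptions, not the authors' implementation: the function names, the fixed `t`, and the toy values are hypothetical, and both the adaptive choice of the number of sub-channels and the channel assembly (merging) step are omitted.

```python
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]  # rows = input channels, columns = output channels


def linear(x: Vector, w: Matrix) -> Vector:
    """Compute y = x @ w for a single activation vector."""
    n_out = len(w[0])
    return [sum(x[i] * w[i][j] for i in range(len(x))) for j in range(n_out)]


def disassemble_channel(x: Vector, w: Matrix, k: int, t: int) -> Tuple[Vector, Matrix]:
    """Split input channel k into t sub-channels (hypothetical helper).

    Each sub-channel carries x[k] / t, and the weight row w[k] is repeated
    t times, so the sum over sub-channels reproduces x[k] * w[k].
    """
    if t < 1:
        raise ValueError("t must be at least 1")
    sub_activations = [x[k] / t] * t
    sub_weights = [list(w[k]) for _ in range(t)]
    new_x = x[:k] + sub_activations + x[k + 1:]
    new_w = w[:k] + sub_weights + w[k + 1:]
    return new_x, new_w


# Toy example: channel 1 is an outlier (64.0) next to small activations.
x = [0.5, 64.0, -1.0]
w = [[1.0, 2.0], [0.5, -1.0], [3.0, 0.25]]
new_x, new_w = disassemble_channel(x, w, k=1, t=4)
y_before = linear(x, w)
y_after = linear(new_x, new_w)
```

In this toy case the peak activation magnitude drops from 64.0 to 16.0 while `y_after` matches `y_before`; the price is a wider layer (six input channels instead of three), which is why the abstract pairs disassembly with a later merging step.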

Submitted 6 April, 2024; v1 submitted 12 October, 2023; originally announced October 2023.

Comments: ICLR 2024 camera ready; Code is available at https://github.com/ziplab/QLLM and https://github.com/ModelTC/QLLM

arXiv:2310.06261 [pdf, other]

Self-Discriminative Modeling for Anomalous Graph Detection

Authors: Jinyu Cai, Yunhe Zhang, Jicong Fan

Abstract: This paper studies the problem of detecting anomalous graphs using a machine learning model trained on only normal graphs, which has many applications in molecule, biology, and social network data analysis. We present a self-discriminative modeling framework for anomalous graph detection. The key idea, mathematically and numerically illustrated, is to learn a discriminator (classifier) from the given normal graphs together with pseudo-anomalous graphs generated by a model jointly trained, where we never use any true anomalous graphs and we hope that the generated pseudo-anomalous graphs interpolate between normal ones and (real) anomalous ones. Under the framework, we provide three algorithms with different computational efficiencies and stabilities for anomalous graph detection. The three algorithms are compared with several state-of-the-art graph-level anomaly detection baselines on nine popular graph datasets (four with small size and five with moderate size) and show significant improvement in terms of AUC. The success of our algorithms stems from the integration of the discriminative classifier and the well-posed pseudo-anomalous graphs, which provide new insights for anomaly detection. Moreover, we investigate our algorithms for large-scale imbalanced graph datasets. Surprisingly, our algorithms, though fully unsupervised, are able to significantly outperform supervised learning algorithms of anomalous graph detection. The corresponding reason is also analyzed.

Submitted 9 October, 2023; originally announced October 2023.

Comments: This work was submitted to NeurIPS 2023 but was unfortunately rejected

arXiv:2310.04721 [pdf, other]

Memory-Constrained Semantic Segmentation for Ultra-High Resolution UAV Imagery

Authors: Qi Li, Jiaxin Cai, Yuanlong Yu, Jason Gu, Jia Pan, Wenxi Liu

Abstract: Amidst the swift advancements in photography and sensor technologies, high-definition cameras have become commonplace in the deployment of Unmanned Aerial Vehicles (UAVs) for diverse operational purposes. Within the domain of UAV imagery analysis, the segmentation of ultra-high resolution images emerges as a substantial and intricate challenge, especially when grappling with the constraints imposed by GPU memory-restricted computational devices. This paper delves into the intricate problem of achieving efficient and effective segmentation of ultra-high resolution UAV imagery, while operating under stringent GPU memory limitation. The strategy of existing approaches is to downscale the images to achieve computationally efficient segmentation. However, this strategy tends to overlook smaller, thinner, and curvilinear regions. To address this problem, we propose a GPU memory-efficient and effective framework for local inference without accessing the context beyond local patches. In particular, we introduce a novel spatial-guided high-resolution query module, which predicts pixel-wise segmentation results with high quality only by querying nearest latent embeddings with the guidance of high-resolution information. Additionally, we present an efficient memory-based interaction scheme to correct potential semantic bias of the underlying high-resolution information by associating cross-image contextual semantics. For evaluation of our approach, we perform comprehensive experiments over public benchmarks and achieve superior performance under both conditions of small and large GPU memory usage limitations. We will release the model and codes in the future.

Submitted 7 October, 2023; originally announced October 2023.

arXiv:2310.01176 [pdf, other]

Cross-adversarial local distribution regularization for semi-supervised medical image segmentation

Authors: Thanh Nguyen-Duc, Trung Le, Roland Bammer, He Zhao, Jianfei Cai, Dinh Phung

Abstract: Medical semi-supervised segmentation is a technique where a model is trained to segment objects of interest in medical images with limited annotated data. Existing semi-supervised segmentation methods are usually based on the smoothness assumption. This assumption implies that the model output distributions of two similar data samples are encouraged to be invariant. In other words, the smoothness assumption states that similar samples (e.g., adding small perturbations to an image) should have similar outputs. In this paper, we introduce a novel cross-adversarial local distribution (Cross-ALD) regularization to further enhance the smoothness assumption for the semi-supervised medical image segmentation task. Our comprehensive experiments show that Cross-ALD achieves state-of-the-art performance against many recent methods on the public LA and ACDC datasets.

Submitted 2 October, 2023; originally announced October 2023.

Comments: MICCAI 2023

arXiv:2310.00762 [pdf, ps, other]

A note on the stabilizer formalism via noncommutative graphs

Authors: Roy Araiza, Jihong Cai, Yushan Chen, Abraham Holtermann, Chieh Hsu, Tushar Mohan, Peixue Wu, Zeyuan Yu

Abstract: In this short note we formulate a stabilizer formalism in the language of noncommutative graphs. The classes of noncommutative graphs we consider are obtained via unitary representations of compact groups, and suitably chosen operators on finite-dimensional Hilbert spaces. Furthermore, in this framework, we generalize previous results in this area for determining when such noncommutative graphs have anticliques.

Submitted 28 February, 2024; v1 submitted 1 October, 2023; originally announced October 2023.

Comments: Final version. To appear in "Quantum Information Processing"

arXiv:2309.10771 [pdf, other]

Redefining Qualitative Analysis in the AI Era: Utilizing ChatGPT for Efficient Thematic Analysis

Authors: He Zhang, Chuhao Wu, Jingyi Xie, Yao Lyu, Jie Cai, John M. Carroll

Abstract: AI tools, particularly large-scale language model (LLM) based applications such as ChatGPT, have the potential to simplify qualitative research. Through semi-structured interviews with seventeen participants, we identified challenges and concerns in integrating ChatGPT into the qualitative analysis process. Collaborating with thirteen qualitative researchers, we developed a framework for designing prompts to enhance the effectiveness of ChatGPT in thematic analysis. Our findings indicate that improving transparency, providing guidance on prompts, and strengthening users' understanding of LLMs' capabilities significantly enhance the users' ability to interact with ChatGPT. We also discovered and revealed the reasons behind researchers' shift in attitude towards ChatGPT from negative to positive. This research not only highlights the importance of well-designed prompts in LLM applications but also offers reflections for qualitative researchers on the perception of AI's role. Finally, we emphasize the potential ethical risks and the impact of constructing AI ethical expectations by researchers, particularly those who are novices, on future research and AI development.

Submitted 27 May, 2024; v1 submitted 19 September, 2023; originally announced September 2023.

arXiv:2309.09466 [pdf, other]

Progressive Text-to-Image Diffusion with Soft Latent Direction

Authors: YuTeng Ye, Jiale Cai, Hang Zhou, Guanwen Li, Youjia Zhang, Zikai Song, Chenxing Gao, Junqing Yu, Wei Yang

Abstract: In spite of the rapidly evolving landscape of text-to-image generation, the synthesis and manipulation of multiple entities while adhering to specific relational constraints pose enduring challenges. This paper introduces an innovative progressive synthesis and editing operation that systematically incorporates entities into the target image, ensuring their adherence to spatial and relational constraints at each sequential step. Our key insight stems from the observation that while a pre-trained text-to-image diffusion model adeptly handles one or two entities, it often falters when dealing with a greater number. To address this limitation, we propose harnessing the capabilities of a Large Language Model (LLM) to decompose intricate and protracted text descriptions into coherent directives adhering to stringent formats. To facilitate the execution of directives involving distinct semantic operations (namely insertion, editing, and erasing), we formulate the Stimulus, Response, and Fusion (SRF) framework. Within this framework, latent regions are gently stimulated in alignment with each operation, followed by the fusion of the responsive latent components to achieve cohesive entity manipulation. Our proposed framework yields notable advancements in object synthesis, particularly when confronted with intricate and lengthy textual inputs. Consequently, it establishes a new benchmark for text-to-image generation tasks, further elevating the field's performance standards.

Submitted 18 January, 2024; v1 submitted 18 September, 2023; originally announced September 2023.

Comments: 14 pages, 15 figures

arXiv:2309.08634 [pdf, other]

Doubly High-Dimensional Contextual Bandits: An Interpretable Model for Joint Assortment-Pricing

Authors: Junhui Cai, Ran Chen, Martin J. Wainwright, Linda Zhao

Abstract: Key challenges in running a retail business include how to select products to present to consumers (the assortment problem), and how to price products (the pricing problem) to maximize revenue or profit. Instead of considering these problems in isolation, we propose a joint approach to assortment-pricing based on contextual bandits. Our model is doubly high-dimensional, in that both context vectors and actions are allowed to take values in high-dimensional spaces. In order to circumvent the curse of dimensionality, we propose a simple yet flexible model that captures the interactions between covariates and actions via a (near) low-rank representation matrix. The resulting class of models is reasonably expressive while remaining interpretable through latent factors, and includes various structured linear bandit and pricing models as particular cases. We propose a computationally tractable procedure that combines an exploration/exploitation protocol with an efficient low-rank matrix estimator, and we prove bounds on its regret. Simulation results show that this method has lower regret than state-of-the-art methods applied to various standard bandit and pricing models. Real-world case studies on the assortment-pricing problem, from an industry-leading instant noodles company to an emerging beauty start-up, underscore the gains achievable using our method. In each case, we show at least three-fold gains in revenue or profit by our bandit method, as well as the interpretability of the latent factor models that are learned.

Submitted 13 September, 2023; originally announced September 2023.

arXiv:2309.06425 [pdf, other]

Random matrix statistics and safety rest areas on interstates in the United States

Authors: Jia Cai, John Peca-Medlin, Yunke Wan

Abstract: We analyze physical spacings between locations of safety rest areas on interstates in the United States. We show normalized safety rest area spacings on major interstates exhibit Wigner surmise statistics, which align with the eigenvalue spacings for the Gaussian Unitary Ensemble from random matrix theory as well as the one-dimensional gas interactions via the Coulomb potential. We identify economic and geographic regional traits at the state level that exhibit Poissonian statistics, which become more pronounced with increased geographical obstacles in interstate travel. Other regional filters (e.g., historical or political) produced results that did not diverge substantially from the overall Wigner surmise model.

Submitted 8 March, 2024; v1 submitted 12 September, 2023; originally announced September 2023.

MSC Class: 60B20; 62P35

arXiv:2309.02670 [pdf, other]

Progressive Attention Guidance for Whole Slide Vulvovaginal Candidiasis Screening

Authors: Jiangdong Cai, Honglin Xiong, Maosong Cao, Luyan Liu, Lichi Zhang, Qian Wang

Abstract: Vulvovaginal candidiasis (VVC) is the most prevalent human candidal infection, estimated to afflict approximately 75% of all women at least once in their lifetime. It leads to symptoms including pruritus and vaginal soreness. Automatic whole slide image (WSI) classification is in high demand, given the heavy burden of disease control and prevention. However, a WSI-based computer-aided VVC screening method is still lacking due to the scarce labeled data and unique properties of candida. Candida in WSIs is difficult for conventional classification models to capture due to its distinctive elongated shape, the small proportion of its spatial distribution, and the style gap between WSIs. To help the model focus on candida more easily, we propose an attention-guided method, which can obtain a robust diagnosis classification model. Specifically, we first use a pre-trained detection model as prior instruction to initialize the classification model. Then we design a Skip Self-Attention module to refine the attention onto the fine-grained features of candida. Finally, we use a contrastive learning method to alleviate the overfitting caused by the style gap of WSIs and suppress the attention to false positive regions. Our experimental results demonstrate that our framework achieves state-of-the-art performance. Code and example data are available at https://github.com/cjdbehumble/MICCAI2023-VVC-Screening.

Submitted 5 September, 2023; originally announced September 2023.

Comments: Accepted in the main conference MICCAI 2023

Journal ref: 26th International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI 2023)

arXiv:2309.02046 [pdf, other]

A Fast and Provable Algorithm for Sparse Phase Retrieval

Authors: Jian-Feng Cai, Yu Long, Ruixue Wen, Jiaxi Ying

Abstract: We study the sparse phase retrieval problem, which seeks to recover a sparse signal from a limited set of magnitude-only measurements. In contrast to prevalent sparse phase retrieval algorithms that primarily use first-order methods, we propose an innovative second-order algorithm that employs a Newton-type method with hard thresholding. This algorithm overcomes the linear convergence limitations of first-order methods while preserving their hallmark per-iteration computational efficiency. We provide theoretical guarantees that our algorithm converges to the $s$-sparse ground truth signal $\mathbf{x}^{\natural} \in \mathbb{R}^n$ (up to a global sign) at a quadratic convergence rate after at most $O(\log (\Vert\mathbf{x}^{\natural} \Vert /x_{\min}^{\natural}))$ iterations, using $\Omega(s^2\log n)$ Gaussian random samples. Numerical experiments show that our algorithm achieves a significantly faster convergence rate than state-of-the-art methods.

Submitted 19 March, 2024; v1 submitted 5 September, 2023; originally announced September 2023.

arXiv:2309.01660 [pdf]

Unveiling Theory of Mind in Large Language Models: A Parallel to Single Neurons in the Human Brain

Authors: Mohsen Jamali, Ziv M. Williams, Jing Cai

Abstract: With their recent development, large language models (LLMs) have been found to exhibit a certain level of Theory of Mind (ToM), a complex cognitive capacity that is related to our conscious mind and that allows us to infer another's beliefs and perspective. While human ToM capabilities are believed to derive from the neural activity of a broadly interconnected brain network, including that of dorsal medial prefrontal cortex (dmPFC) neurons, the precise processes underlying LLMs' capacity for ToM and their similarities with those of humans remain largely unknown. In this study, we drew inspiration from the dmPFC neurons subserving human ToM and employed a similar methodology to examine whether LLMs exhibit comparable characteristics. Surprisingly, our analysis revealed a striking resemblance between the two, as hidden embeddings (artificial neurons) within LLMs started to exhibit significant responsiveness to either true- or false-belief trials, suggesting their ability to represent another's perspective. These artificial embedding responses were closely correlated with the LLMs' performance during the ToM tasks, a property that was dependent on the size of the models. Further, the other's beliefs could be accurately decoded using the entire embeddings, indicating the presence of the embeddings' ToM capability at the population level.
Together, our findings revealed an emergent property of LLMs' embeddings that modified their activities in response to ToM features, offering initial evidence of a parallel between the artificial model and neurons in the human brain.

Submitted 4 September, 2023; originally announced September 2023.

arXiv:2309.01328 [pdf, ps, other]

Restoration Guarantee of Image Inpainting via Low Rank Patch Matrix Completion

Authors: Jian-Feng Cai, Jae Kyu Choi, Jingyang Li, Guojian Yin

Abstract: In recent years, patch-based image restoration approaches have demonstrated superior performance compared to conventional variational methods. This paper delves into the mathematical foundations underlying patch-based image restoration methods, with a specific focus on establishing restoration guarantees for patch-based image inpainting, leveraging the assumption of self-similarity among patches. To accomplish this, we present a reformulation of the image inpainting problem as structured low-rank matrix completion, accomplished by grouping image patches with potential overlaps. By making certain incoherence assumptions, we establish a restoration guarantee, given that the number of samples exceeds the order of $r\log^2(N)$, where $N\times N$ denotes the size of the image and $r > 0$ represents the sum of ranks for each group of image patches. Through our rigorous mathematical analysis, we provide valuable insights into the theoretical foundations of patch-based image restoration methods, shedding light on their efficacy and offering guidelines for practical implementation.

Submitted 19 November, 2023; v1 submitted 3 September, 2023; originally announced September 2023.

arXiv:2308.16387 [pdf, ps, other]

Stability and instability for compressible Navier-Stokes equations with Yukawa potential

Authors: Juanzi Cai, Zhiang Wu, Guochun Wu

Abstract: In this paper, we first consider global well-posedness and long time behavior of compressible Navier-Stokes equations with Yukawa-type potential in the $L^p$-framework under the stability condition $P'(\bar{\rho})+\gamma\bar{\rho}>0$. Here $\bar{\rho}>0$ is the background density, $P$ is the pressure and $\gamma\in\mathbb{R}$ is the Yukawa coefficient. This work continues that of Chikami on local existence and a blow-up criterion. On the other hand, we study the instability of the linear and nonlinear problem of the system when $P'(\bar{\rho})+\gamma\bar{\rho}<0$ in the Hadamard sense.

Submitted 30 August, 2023; originally announced August 2023.

MSC Class: 35Q30; 74H40; 76N10

arXiv:2308.13313 [pdf, other]

Extremely powerful and frequency-tunable terahertz pulses from a table-top laser-plasma wiggler

Authors: Jie Cai, Yinren Shou, Yixing Geng, Liqi Han, Xinlu Xu, Shuangchung Wen, Baifei Shen, Jinqing Yu, Xueqing Yan

Abstract: The production of broadband, terawatt terahertz (THz) pulses has been demonstrated by irradiating relativistic lasers on solid targets. However, the generation of extremely powerful, narrow-band, and frequency-tunable THz pulses remains a challenge. Here, we present a novel approach for such THz pulses, in which a plasma wiggler is elaborated by a table-top laser and a near-critical density plasma. In such a wiggler, the laser-accelerated electrons emit THz radiation with a period closely related to the plasma thickness. A theoretical model and numerical simulations predict that a THz pulse with a laser-THz energy conversion over 2.0$\%$, an ultra-strong field exceeding 80 GV/m, a divergence angle of approximately 20$^\circ$, and a center frequency tunable from 4.4 to 1.5 THz can be generated from a laser of 430 mJ. Furthermore, we demonstrate that this method can work across a wide range of laser and plasma parameters, offering potential for future applications with extremely powerful THz pulses.

Submitted 25 August, 2023; originally announced August 2023.

arXiv:2308.13144 [pdf]

Giant orbit-to-charge conversion induced via the inverse orbital Hall effect

Authors: Renyou Xu, Hui Zhang, Yuhao Jiang, Houyi Cheng, Yunfei Xie, Yuxuan Yao, Danrong Xiong, Zhaozhao Zhu, Xiaobai Ning, Runze Chen, Yan Huang, Shijie Xu, Jianwang Cai, Yong Xu, Tao Liu, Weisheng Zhao

Abstract: We investigate the orbit-to-charge conversion in YIG/Pt/nonmagnetic material (NM) trilayer heterostructures. With the additional Ru layer on the top of YIG/Pt stacks, the charge current signal increases nearly an order of magnitude in both longitudinal spin Seebeck effect (SSE) and spin pumping (SP) measurements. Through thickness dependence studies of the Ru metal layer and a theoretical model, we quantitatively clarify different contributions of the increased SSE signal that mainly comes from the inverse orbital Hall effect (IOHE) of Ru, and partially comes from the orbital sink effect in the Ru layer. A similar enhancement of SSE (SP) signals is also observed when Ru is replaced by other materials (Ta, W, and Cu), implying the universality of the IOHE in transition metals. Our findings not only suggest a more efficient generation of the charge current via the orbital angular momentum channel but also provide crucial insights into the interplay among charge, spin, and orbit.

Submitted 24 August, 2023; originally announced August 2023.

arXiv:2308.11144 [pdf, other]

Exploring Unsupervised Cell Recognition with Prior Self-activation Maps

Authors: Pingyi Chen, Chenglu Zhu, Zhongyi Shui, Jiatong Cai, Sunyi Zheng, Shichuan Zhang, Lin Yang

Abstract: The success of supervised deep learning models on cell recognition tasks relies on detailed annotations. Many previous works have managed to reduce the dependency on labels. However, considering the large number of cells contained in a patch, costly and inefficient labeling is still inevitable. To this end, we explored label-free methods for cell recognition. Prior self-activation maps (PSMs) are proposed to generate pseudo masks as training targets. To be specific, an activation network is trained with self-supervised learning. The gradient information in the shallow layers of the network is aggregated to generate prior self-activation maps. Afterward, a semantic clustering module is introduced as a pipeline to transform PSMs to pixel-level semantic pseudo masks for downstream tasks. We evaluated our method on two histological datasets: MoNuSeg (cell segmentation) and BCData (multi-class cell detection). Compared with other fully-supervised and weakly-supervised methods, our method can achieve competitive performance without any manual annotations. Our simple but effective framework can also achieve multi-class cell detection, which cannot be done by existing unsupervised methods. The results show the potential of PSMs that might inspire other research to deal with the hunger for labels in the medical area.

Submitted 21 August, 2023; originally announced August 2023.

Comments: MICCAI 2023. arXiv admin note: substantial text overlap with arXiv:2210.07862

arXiv:2308.10726 [pdf, other]

The molecular clouds in a section of the third Galactic quadrant: observational properties and chemical abundance ratio between CO and its isotopologues

Authors: Chen Wang, Haoran Feng, Ji Yang, Xuepeng Chen, Yang Su, Qing-Zeng Yan, Fujun Du, Yuehui Ma, Jiajun Cai

Abstract: We compare the observational properties between $^{12}$CO, $^{13}$CO, and C$^{18}$O and summarize the observational parameters based on a sample of 7069 clouds from the Milky Way Imaging Scroll Painting (MWISP) CO survey in a section of the third Galactic quadrant. We find that the $^{13}$CO angular area ($A_{\rm ^{13}CO}$) generally increases with that of $^{12}$CO ($A_{\rm ^{12}CO}$), and the ratio of $A_{\rm ^{13}CO}$ to $A_{\rm ^{12}CO}$ is 0.38 by linear fitting. We find that the $^{12}$CO and $^{13}$CO fluxes are tightly correlated as $F_{\rm ^{13}CO}~=~0.17~ F_{\rm ^{12}CO}$ with both fluxes calculated within the $^{13}$CO-bright region. This indicates that the abundance $X_{\rm ^{13}CO}$ is constant at 6.5$^{+0.1}_{-0.5}$ $\times 10^{-7}$ for all samples under the assumption of local thermodynamic equilibrium (LTE). Additionally, we observed that the X-factor is approximately constant in large sample molecular clouds. Similarly, we find $F_{\rm C^{18}O}~=~0.11~F_{\rm ^{13}CO}$ with both fluxes calculated within the C$^{18}$O-bright region, which indicates that the abundance ratio ${X_{\rm ^{13}CO}/X_{\rm C^{18}O}}$ stays at the same value of 9.7$^{+0.6}_{-0.8}$ across the molecular clouds under the LTE assumption. The linear relationships of $F_{\rm ^{12}CO}$ vs. $F_{\rm ^{13}CO}$ and $F_{\rm ^{13}CO}$ vs. $F_{\rm C^{18}O}$ hold not only for the $^{13}$CO-bright region or C$^{18}$O-bright region, but also for the entire molecular cloud scale with a lower flux ratio.
The abundance ratio ${X_{\rm ^{13}CO}/X_{\rm C^{18}O}}$ inside clouds shows a strong correlation with column density and temperature. This indicates that ${X_{\rm ^{13}CO}/X_{\rm C^{18}O}}$ is dominated by a combination of chemical fractionation, selective dissociation, and self-shielding effects inside clouds.

Submitted 21 August, 2023; originally announced August 2023.

Comments: 11 pages, 16 figures, 1 table, accepted by AJ

arXiv:2308.10529 [pdf, other]

SeqGPT: An Out-of-the-box Large Language Model for Open Domain Sequence Understanding

Authors: Tianyu Yu, Chengyue Jiang, Chao Lou, Shen Huang, Xiaobin Wang, Wei Liu, Jiong Cai, Yangning Li, Yinghui Li, Kewei Tu, Hai-Tao Zheng, Ningyu Zhang, Pengjun Xie, Fei Huang, Yong Jiang

Abstract: Large language models (LLMs) have shown impressive ability for open-domain NLP tasks. However, LLMs are sometimes too footloose for natural language understanding (NLU) tasks, which always have restricted output and input formats. Their performance on NLU tasks depends heavily on prompts or demonstrations, and they have been shown to perform poorly on several representative NLU tasks, such as event extraction and entity typing. To this end, we present SeqGPT, a bilingual (i.e., English and Chinese) open-source autoregressive model specially enhanced for open-domain natural language understanding. We express all NLU tasks with two atomic tasks, which define fixed instructions to restrict the input and output format but remain "open" for arbitrarily varied label sets. The model is first instruction-tuned with extremely fine-grained labeled data synthesized by ChatGPT and then further fine-tuned by 233 different atomic tasks from 152 datasets across various domains. The experimental results show that SeqGPT has decent classification and extraction ability, and is capable of performing language understanding tasks on unseen domains. We also conduct empirical studies on the scaling of data and model size as well as on the transfer across tasks. Our model is accessible at https://github.com/Alibaba-NLP/SeqGPT.

Submitted 21 August, 2023; originally announced August 2023.

Comments: Initial version of SeqGPT

arXiv:2308.07868 [pdf, other]

ObjectSDF++: Improved Object-Compositional Neural Implicit Surfaces

Authors: Qianyi Wu, Kaisiyuan Wang, Kejie Li, Jianmin Zheng, Jianfei Cai

Abstract: In recent years, neural implicit surface reconstruction has emerged as a popular paradigm for multi-view 3D reconstruction. Unlike traditional multi-view stereo approaches, the neural implicit surface-based methods leverage neural networks to represent 3D scenes as signed distance functions (SDFs). However, they tend to disregard the reconstruction of individual objects within the scene, which limits their performance and practical applications. To address this issue, previous work ObjectSDF introduced a nice framework of object-composition neural implicit surfaces, which utilizes 2D instance masks to supervise individual object SDFs. In this paper, we propose a new framework called ObjectSDF++ to overcome the limitations of ObjectSDF. First, in contrast to ObjectSDF whose performance is primarily restricted by its converted semantic field, the core component of our model is an occlusion-aware object opacity rendering formulation that directly volume-renders object opacity to be supervised with instance masks. Second, we design a novel regularization term for object distinction, which can effectively mitigate the issue that ObjectSDF may result in unexpected reconstruction in invisible regions due to the lack of constraint to prevent collisions. Our extensive experiments demonstrate that our novel framework not only produces superior object reconstruction results but also significantly improves the quality of scene reconstruction. Code and more resources can be found at https://qianyiwu.github.io/objectsdf++

Submitted 17 August, 2023; v1 submitted 15 August, 2023; originally announced August 2023.

Comments: ICCV 2023. Project Page: https://qianyiwu.github.io/objectsdf++ Code: https://github.com/QianyiWu/objectsdf_plus

arXiv:2308.07852 [pdf, ps, other]

Towards the demonstration of photon-photon collision with compact lasers

Authors: L. Q. Han, J. Cai, Y. R. Shou, X. D. Liu, J. Q. Yu, X. Q. Yan

Abstract: We report a proposal to observe the two-photon Breit-Wheeler process in plasma driven by compact lasers. A high-charge electron bunch can be generated from laser plasma wakefield acceleration when a tightly focused laser pulse propagates in a sub-critical density plasma. The electron bunch scatters with the laser pulse coming from the opposite direction, resulting in the emission of high-brilliance X-ray pulses. In a three-dimensional particle-in-cell simulation with a laser pulse of $\sim$10 J, one could produce an X-ray pulse with photon number higher than $3\times10^{11}$ and brilliance above $1.6\times 10^{23}$ photons/s/mm$^2$/mrad$^2$/0.1$\%$BW at 1 MeV. The X-ray pulses collide in the plasma and create more than $1.1\times 10^5$ electron-positron pairs per shot. It is also found that the positrons can be accelerated transversely by a transverse electric field generated in the plasma, which enables safe detection in the direction away from the laser pulses. This proposal, which has solved key challenges in laser-driven photon-photon collision, could demonstrate the two-photon Breit-Wheeler process on a much more compact device in a single shot.

Submitted 15 August, 2023; originally announced August 2023.

arXiv:2308.02657 [pdf]

doi 10.1038/s41586-023-06536-0

Observation of Fractionally Quantized Anomalous Hall Effect

Authors: Heonjoon Park, Jiaqi Cai, Eric Anderson, Yinong Zhang, Jiayi Zhu, Xiaoyu Liu, Chong Wang, William Holtzmann, Chaowei Hu, Zhaoyu Liu, Takashi Taniguchi, Kenji Watanabe, Jiun-haw Chu, Ting Cao, Liang Fu, Wang Yao, Cui-Zu Chang, David Cobden, Di Xiao, Xiaodong Xu

Abstract: The integer quantum anomalous Hall (QAH) effect is a lattice analog of the quantum Hall effect at zero magnetic field. This striking transport phenomenon occurs in electronic systems with topologically nontrivial bands and spontaneous time-reversal symmetry breaking. Discovery of its putative fractional counterpart in the presence of strong electron correlations, i.e., the fractional quantum anomalous Hall (FQAH) effect, would open a new chapter in condensed matter physics. Here, we report the direct observation of both integer and fractional QAH effects in electrical measurements on twisted bilayer MoTe$_2$. At zero magnetic field, near filling factor $ν= -1$ (one hole per moiré unit cell) we see an extended integer QAH plateau in the Hall resistance $R_\text{xy}$ that is quantized to $h/e^2 \pm 0.1 \%$ while the longitudinal resistance $R_\text{xx}$ vanishes. Remarkably, at $ν=-2/3$ and $-3/5$ we see plateau features in $R_\text{xy}$ at $3h/2e^2 \pm 1\%$ and $5h/3e^2 \pm 3\%$, respectively, while $R_\text{xx}$ remains small. All these features shift linearly in an applied magnetic field with slopes matching the corresponding Chern numbers $-1$, $-2/3$, and $-3/5$, precisely as expected for integer and fractional QAH states. In addition, at zero magnetic field, $R_\text{xy}$ is approximately $2h/e^2$ near half filling ($ν= -1/2$) and varies linearly as $ν$ is tuned. This behavior resembles that of the composite Fermi liquid in the half-filled lowest Landau level of a two-dimensional electron gas at high magnetic field.
Direct observation of the FQAH and associated effects paves the way for researching charge fractionalization and anyonic statistics at zero magnetic field.

Submitted 4 August, 2023; originally announced August 2023.

Comments: 15 pages, 4 figures for main text. 8 extended data figures

Journal ref: Nature (2023)
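
The plateau values quoted in the abstract above are consistent with the simple relation $|R_\text{xy}| = h/(|ν| e^2)$. As a rough arithmetic check (not taken from the paper), the sketch below evaluates that relation for the quoted filling factors. The function name `hall_plateau` is hypothetical, and the constants are the exact SI values of the Planck constant and the elementary charge.

```python
from fractions import Fraction

# Exact SI values (2019 redefinition) of the Planck constant and elementary charge.
H_PLANCK = 6.62607015e-34   # J s
E_CHARGE = 1.602176634e-19  # C


def hall_plateau(nu: Fraction) -> tuple[Fraction, float]:
    """Return (coefficient of h/e^2, resistance in ohms) for |R_xy| = h / (|nu| e^2).

    Illustrative helper only; the name and interface are assumptions.
    """
    coeff = 1 / abs(nu)                            # e.g. nu = -2/3 gives 3/2
    ohms = float(coeff) * H_PLANCK / E_CHARGE**2   # h/e^2 is about 25.8 kOhm
    return coeff, ohms


# Filling factors mentioned in the abstract.
for nu in (Fraction(-1), Fraction(-2, 3), Fraction(-3, 5), Fraction(-1, 2)):
    coeff, ohms = hall_plateau(nu)
    print(f"nu = {nu}: |R_xy| = {coeff} h/e^2 = {ohms:.0f} ohm")
```

The coefficients come out as 1, 3/2, 5/3, and 2, matching the $h/e^2$, $3h/2e^2$, $5h/3e^2$, and $2h/e^2$ values stated in the abstract.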

arXiv:2307.16078 [pdf, other]

Restricted Holant Dichotomy on Domains 3 and 4

Authors: Yin Liu, Austen Z. Fan, Jin-Yi Cai

Abstract: $\operatorname{Holant}^*(f)$ denotes a class of counting problems specified by a constraint function $f$. We prove complexity dichotomy theorems for $\operatorname{Holant}^*(f)$ in two settings: (1) $f$ is any arity-3 real-valued function on input of domain size 3. (2) $f$ is any arity-3 $\{0,1\}$-valued function on input of domain size 4.

Submitted 29 July, 2023; originally announced July 2023.

arXiv:2307.12115 [pdf, other]

A Revolution of Personalized Healthcare: Enabling Human Digital Twin with Mobile AIGC

Authors: Jiayuan Chen, Changyan Yi, Hongyang Du, Dusit Niyato, Jiawen Kang, Jun Cai, Xuemin Shen

Abstract: Mobile Artificial Intelligence-Generated Content (AIGC) technology refers to the adoption of AI algorithms deployed at mobile edge networks to automate the information creation process while fulfilling the requirements of end users. Mobile AIGC has recently attracted phenomenal attention and can be a key enabling technology for an emerging application, called human digital twin (HDT). HDT empowered by mobile AIGC is expected to revolutionize personalized healthcare by generating rare disease data, modeling high-fidelity digital twins, building versatile testbeds, and providing 24/7 customized medical services. To promote the development of this new paradigm, in this article, we propose a system architecture of mobile AIGC-driven HDT and highlight the corresponding design requirements and challenges. Moreover, we illustrate two use cases, i.e., mobile AIGC-driven HDT in customized surgery planning and personalized medication. In addition, we conduct an experimental study to demonstrate the effectiveness of the proposed mobile AIGC-driven HDT solution, which shows a particular application in a virtual physical therapy teaching platform. Finally, we conclude this article by briefly discussing several open issues and future directions.

Submitted 22 July, 2023; originally announced July 2023.

arXiv:2307.10036 [pdf, other]

Class Attention to Regions of Lesion for Imbalanced Medical Image Recognition

Authors: Jia-Xin Zhuang, Jiabin Cai, Jianguo Zhang, Wei-shi Zheng, Ruixuan Wang

Abstract: Automated medical image classification is the key component in intelligent diagnosis systems. However, most medical image datasets contain plenty of samples of common diseases and just a handful of rare ones, leading to major class imbalances. Currently, it is an open problem in intelligent diagnosis to effectively learn from imbalanced training data. In this paper, we propose a simple yet effective framework, named \textbf{C}lass \textbf{A}ttention to \textbf{RE}gions of the lesion (CARE), to handle data imbalance issues by embedding attention into the training process of \textbf{C}onvolutional \textbf{N}eural \textbf{N}etworks (CNNs). The proposed attention module helps CNNs attend to lesion regions of rare diseases, therefore helping CNNs to learn their characteristics more effectively. In addition, this attention module works only during the training phase and does not change the architecture of the original network, so it can be directly combined with any existing CNN architecture. The CARE framework needs bounding boxes to represent the lesion regions of rare diseases. To alleviate the need for manual annotation, we further developed variants of CARE by leveraging the traditional saliency methods or a pretrained segmentation model for bounding box generation. Results show that the CARE variants with automated bounding box generation are comparable to the original CARE framework with \textit{manual} bounding box annotations.
A series of experiments on an imbalanced skin image dataset and a pneumonia dataset indicates that our method can effectively help the network focus on the lesion regions of rare diseases and remarkably improves the classification performance of rare diseases.

Submitted 20 July, 2023; v1 submitted 19 July, 2023; originally announced July 2023.

Comments: Accepted by Neurocomputing in July 2023. 37 pages

arXiv:2307.08238 [pdf, other]

Unified Open-Vocabulary Dense Visual Prediction

Authors: Hengcan Shi, Munawar Hayat, Jianfei Cai

Abstract: In recent years, open-vocabulary (OV) dense visual prediction (such as OV object detection, semantic, instance and panoptic segmentations) has attracted increasing research attention. However, most existing approaches are task-specific and tackle each task individually. In this paper, we propose a Unified Open-Vocabulary Network (UOVN) to jointly address four common dense prediction tasks. Compared with separate models, a unified network is more desirable for diverse industrial applications. Moreover, OV dense prediction training data is relatively scarce. Separate networks can only leverage task-relevant training data, while a unified approach can integrate diverse training data to boost individual tasks. We address two major challenges in unified OV prediction. Firstly, unlike unified methods for fixed-set predictions, OV networks are usually trained with multi-modal data. Therefore, we propose a multi-modal, multi-scale and multi-task (MMM) decoding mechanism to better leverage multi-modal data. Secondly, because UOVN uses data from different tasks for training, there are significant domain and task gaps. We present a UOVN training mechanism to reduce such gaps. Experiments on four datasets demonstrate the effectiveness of our UOVN.

Submitted 18 August, 2023; v1 submitted 17 July, 2023; originally announced July 2023.

arXiv:2307.04513 [pdf, other]

CoactSeg: Learning from Heterogeneous Data for New Multiple Sclerosis Lesion Segmentation

Authors: Yicheng Wu, Zhonghua Wu, Hengcan Shi, Bjoern Picker, Winston Chong, Jianfei Cai

Abstract: New lesion segmentation is essential to estimate the disease progression and therapeutic effects during multiple sclerosis (MS) clinical treatments. However, the expensive data acquisition and expert annotation restrict the feasibility of applying large-scale deep learning models. Since single-time-point samples with all-lesion labels are relatively easy to collect, exploiting them to train deep models is highly desirable to improve new lesion segmentation. Therefore, we proposed a coaction segmentation (CoactSeg) framework to exploit the heterogeneous data (i.e., new-lesion annotated two-time-point data and all-lesion annotated single-time-point data) for new MS lesion segmentation. The CoactSeg model is designed as a unified model, with the same three inputs (the baseline, follow-up, and their longitudinal brain differences) and the same three outputs (the corresponding all-lesion and new-lesion predictions), no matter which type of heterogeneous data is being used. Moreover, a simple and effective relation regularization is proposed to ensure the longitudinal relations among the three outputs to improve the model learning. Extensive experiments demonstrate that utilizing the heterogeneous data and the proposed longitudinal relation constraint can significantly improve the performance for both new-lesion and all-lesion segmentation tasks. Meanwhile, we also introduce an in-house MS-23v1 dataset, including 38 Oceania single-time-point samples with all-lesion labels. Codes and the dataset are released at https://github.com/ycwu1997/CoactSeg.

Submitted 14 September, 2023; v1 submitted 10 July, 2023; originally announced July 2023.

Comments: Accepted by MICCAI 2023 (Early Acceptance)

arXiv:2307.03339 [pdf, other]

Open-Vocabulary Object Detection via Scene Graph Discovery

Authors: Hengcan Shi, Munawar Hayat, Jianfei Cai

Abstract: In recent years, open-vocabulary (OV) object detection has attracted increasing research attention. Unlike traditional detection, which only recognizes fixed-category objects, OV detection aims to detect objects in an open category set. Previous works often leverage vision-language (VL) training data (e.g., referring grounding data) to recognize OV objects. However, they only use pairs of nouns and individual objects in VL data, while these data usually contain much more information, such as scene graphs, which are also crucial for OV detection. In this paper, we propose a novel Scene-Graph-Based Discovery Network (SGDN) that exploits scene graph cues for OV detection. Firstly, a scene-graph-based decoder (SGDecoder) including sparse scene-graph-guided attention (SSGA) is presented. It captures scene graphs and leverages them to discover OV objects. Secondly, we propose scene-graph-based prediction (SGPred), where we build a scene-graph-based offset regression (SGOR) mechanism to enable mutual enhancement between scene graph extraction and object localization. Thirdly, we design a cross-modal learning mechanism in SGPred. It takes scene graphs as bridges to improve the consistency between cross-modal embeddings for OV object classification. Experiments on COCO and LVIS demonstrate the effectiveness of our approach. Moreover, we show the ability of our model for OV scene graph detection, while previous OV scene graph generation methods cannot tackle this task.

Submitted 6 July, 2023; originally announced July 2023.

arXiv:2307.02781 [pdf, other]

Dynamic Factor Analysis with Dependent Gaussian Processes for High-Dimensional Gene Expression Trajectories

Authors: Jiachen Cai, Robert J. B. Goudie, Colin Starr, Brian D. M. Tom

Abstract: The increasing availability of high-dimensional, longitudinal measures of genetic expression can facilitate analysis of the biological mechanisms of disease and prediction of future trajectories, as required for precision medicine. Biological knowledge suggests that it may be best to describe complex diseases at the level of underlying pathways, which may interact with one another. We propose a Bayesian approach that allows for characterising such correlation among different pathways through Dependent Gaussian Processes (DGP) and mapping the observed high-dimensional gene expression trajectories into unobserved low-dimensional pathway expression trajectories via Bayesian Sparse Factor Analysis. Compared to previous approaches that model each pathway expression trajectory independently, our model demonstrates better performance in recovering the shape of pathway expression trajectories, revealing the relationships between genes and pathways, and predicting gene expressions (closer point estimates and narrower predictive intervals), as demonstrated in the simulation study and real data analysis. To fit the model, we propose a Monte Carlo Expectation Maximization (MCEM) scheme that can be implemented conveniently by combining a standard Markov Chain Monte Carlo sampler and an R package GPFDA (Konzen and others, 2021), which returns the maximum likelihood estimates of DGP parameters. The modular structure of MCEM makes it generalizable to other complex models involving the DGP model component.
An R package has been developed that implements the proposed approach.

Submitted 6 July, 2023; originally announced July 2023.

arXiv:2307.01675 [pdf, other]

doi 10.1103/PhysRevA.109.032626

Two-photon-transition superadiabatic passage in an nitrogen-vacancy center in diamond

Authors: Musang Gong, Min Yu, Yaoming Chu, Wei Chen, Qingyun Cao, Ning Wang, Jianming Cai, Ralf Betzholz, Luigi Giannelli

Abstract: Reaching a given target quantum state with high fidelity and fast operation speed close to the quantum limit represents an important goal in quantum information science. Here, we experimentally demonstrate superadiabatic quantum driving to achieve population transfer in a three-level solid-state spin system. Starting from traditional stimulated Raman adiabatic passage (STIRAP), our approach implements superadiabatic corrections to the STIRAP Hamiltonians with several paradigmatic pulse shapes. It requires neither intense microwave pulses nor long transfer times and shows enhanced robustness against pulse imperfections. These results might provide a useful tool for quantum information processing and coherent manipulations of quantum systems.

Submitted 4 July, 2023; originally announced July 2023.

Comments: 8 pages, 7 figures

Journal ref: Physical Review A 109, 032626 (2024)

arXiv:2307.00370 [pdf, other]

Improving Text Matching in E-Commerce Search with A Rationalizable, Intervenable and Fast Entity-Based Relevance Model

Authors: Jiong Cai, Yong Jiang, Yue Zhang, Chengyue Jiang, Ke Yu, Jianhui Ji, Rong Xiao, Haihong Tang, Tao Wang, Zhongqiang Huang, Pengjun Xie, Fei Huang, Kewei Tu

Abstract: Discovering the intended items of user queries from a massive repository of items is one of the main goals of an e-commerce search system. Relevance prediction is essential to the search system since it helps improve performance. When a relevance model is served online, it is required to perform fast and accurate inference. Currently, widely used models such as the Bi-encoder and the Cross-encoder are limited in accuracy or inference speed, respectively. In this work, we propose a novel model called the Entity-Based Relevance Model (EBRM). We identify the entities contained in an item and decompose the QI (query-item) relevance problem into multiple QE (query-entity) relevance problems; we then aggregate their results to form the QI prediction using a soft logic formulation. The decomposition allows us to use a Cross-encoder QE relevance module for high accuracy as well as cache QE predictions for fast online inference. Utilizing soft logic makes the prediction procedure interpretable and intervenable. We also show that pretraining the QE module with auto-generated QE data from user logs can further improve the overall performance. The proposed method is evaluated on labeled data from e-commerce websites. Empirical results show that it achieves promising improvements with computation efficiency.

Submitted 19 July, 2023; v1 submitted 1 July, 2023; originally announced July 2023.
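
The decomposition described in the abstract above (score each query-entity pair, cache the scores, then combine them with soft logic) can be sketched as follows. This is a minimal illustration under loud assumptions, not the authors' model. The token-overlap scorer stands in for the Cross-encoder QE module, the soft OR (noisy-or) aggregation is only one possible soft logic formulation, and the names `qe_relevance` and `qi_relevance` are hypothetical.

```python
from functools import lru_cache
from typing import Iterable


@lru_cache(maxsize=None)
def qe_relevance(query: str, entity: str) -> float:
    """Toy query-entity (QE) score in [0, 1]: fraction of entity tokens found in the query.

    Assumption: a real system would call a Cross-encoder here. The cache mimics
    the idea of reusing QE predictions at serving time.
    """
    q_tokens = set(query.lower().split())
    e_tokens = set(entity.lower().split())
    return len(q_tokens & e_tokens) / len(e_tokens) if e_tokens else 0.0


def qi_relevance(query: str, entities: Iterable[str]) -> float:
    """Aggregate QE scores into a query-item (QI) score with a soft OR.

    Assumption: soft OR, 1 - prod(1 - s_i), chosen purely for illustration.
    """
    miss = 1.0
    for entity in entities:
        miss *= 1.0 - qe_relevance(query, entity)
    return 1.0 - miss


item_entities = ("blue jeans", "hat box")   # hypothetical entities extracted from one item
print(qi_relevance("blue hat", item_entities))
```

Because each QE score is an explicit term in the aggregate, a single entity's contribution can be inspected or overridden, which is one way to read the abstract's claim that the procedure is interpretable and intervenable.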

arXiv:2307.00275 [pdf]

Terahertz spin currents in nanoscale spatial resolution

Authors: Jiahua Cai, Mingcong Dai, Sai Chen, Peng Chen, Jiaqi Wang, Hongting Xiong, Zejun Ren, Shaojie Liu, Zhongkai Liu, Caihua Wan, Xiaojun Wu

Abstract: The ability to generate, detect, and control coherent terahertz (THz) spin currents with femtosecond temporal and nanoscale spatial resolution has significant ramifications. The diffraction limit of concentrated THz radiation, which has a wavelength range of 5 μm-1.5 mm, has impeded the accumulation of nanodomain data of magnetic structures and spintronic dynamics despite its potential benefits. Contemporary spintronic optoelectronic apparatuses with dimensions 100 nm presented a challenge for researchers due to this restriction. In this study, we demonstrate the use of spintronic THz emission nanoscopy (STEN), which allows for the efficient injection and precise coherent detection of ultrafast THz spin currents at the nanoscale. Furthermore, STEN is an effective method that does not require invasion for characterising and etching nanoscale spintronic heterostructures. The cohesive integration of nanophotonics, nanospintronics, and THz-nano technology into a single platform is poised to accelerate the development of high-frequency spintronic optoelectronic nanodevices and their revolutionary technical applications.

Submitted 1 July, 2023; originally announced July 2023.

arXiv:2307.00154 [pdf, other]

Stitched ViTs are Flexible Vision Backbones

Authors: Zizheng Pan, Jing Liu, Haoyu He, Jianfei Cai, Bohan Zhuang

Abstract: Large pretrained plain vision Transformers (ViTs) have been the workhorse for many downstream tasks. However, existing works utilizing off-the-shelf ViTs are inefficient in terms of training and deployment, because adopting ViTs with individual sizes requires separate trainings and is restricted by fixed performance-efficiency trade-offs. In this paper, we are inspired by stitchable neural networks (SN-Net), which is a new framework that cheaply produces a single model that covers rich subnetworks by stitching pretrained model families, supporting diverse performance-efficiency trade-offs at runtime. Building upon this foundation, we introduce SN-Netv2, a systematically improved model stitching framework to facilitate downstream task adaptation. Specifically, we first propose a two-way stitching scheme to enlarge the stitching space. We then design a resource-constrained sampling strategy that takes into account the underlying FLOPs distributions in the space for better sampling. Finally, we observe that learning stitching layers as a low-rank update plays an essential role on downstream tasks to stabilize training and ensure a good Pareto frontier. With extensive experiments on ImageNet-1K, ADE20K, COCO-Stuff-10K and NYUv2, SN-Netv2 demonstrates superior performance over SN-Netv1 on downstream dense predictions and shows strong ability as a flexible vision backbone, achieving great advantages in both training efficiency and deployment flexibility. Code is available at https://github.com/ziplab/SN-Netv2.

Submitted 27 November, 2023; v1 submitted 30 June, 2023; originally announced July 2023.

Comments: Tech report

arXiv:2306.16834 [pdf, ps, other]

Intelligence of Astronomical Optical Telescope: Present Status and Future Perspectives

Authors: Kang Huang, Tianzhu Hu, Jingyi Cai, Xiushan Pang, Yonghui Hou, Yong Zhang, Huaiqing Wang, Xiangqun Cui

Abstract: Artificial intelligence technology has been widely used in astronomy, and new artificial intelligence technologies and application scenarios are constantly emerging. There have been a large number of papers reviewing the application of artificial intelligence technology in astronomy. However, relevant articles seldom mention telescope intelligence separately, and it is difficult to understand the current development status and research hotspots of telescope intelligence from these papers. This paper combines the development history of artificial intelligence technology and the difficulties of critical technologies of telescopes, comprehensively introduces the development and research hotspots of telescope intelligence, then conducts statistical analysis on various research directions of telescope intelligence and defines the research directions' merits. All kinds of research directions are evaluated, and the research trend of each telescope's intelligence is pointed out. Finally, according to the advantages of artificial intelligence technology and the development trend of telescopes, future research hotspots of telescope intelligence are given.

Submitted 16 January, 2024; v1 submitted 29 June, 2023; originally announced June 2023.

Comments: 41 pages, 10 figures, for questions or comments, please email [email protected]

ACM Class: J.7

arXiv:2306.13866 [pdf, other]

MIRACLE: Multi-task Learning based Interpretable Regulation of Autoimmune Diseases through Common Latent Epigenetics

Authors: Pengcheng Xu, Jinpu Cai, Yulin Gao, Ziqi Rong

Abstract: DNA methylation is a crucial regulator of gene transcription and has been linked to various diseases, including autoimmune diseases and cancers. However, diagnostics based on DNA methylation face challenges due to large feature sets and small sample sizes, resulting in overfitting and suboptimal performance. To address these issues, we propose MIRACLE, a novel interpretable neural network that leverages autoencoder-based multi-task learning to integrate multiple datasets and jointly identify common patterns in DNA methylation. MIRACLE's architecture reflects the relationships between methylation sites, genes, and pathways, ensuring biological interpretability and meaningfulness. The network comprises an encoder and a decoder, with a bottleneck layer representing pathway information as the basic unit of heredity. A custom MaskedLinear layer is constrained by site-gene-pathway graph adjacency matrix information, which provides explainability and expresses the site-gene-pathway hierarchical structure explicitly. From the embedding, separate multi-task classifiers predict diseases. Tested on six datasets, including rheumatoid arthritis, systemic lupus erythematosus, multiple sclerosis, inflammatory bowel disease, psoriasis, and type 1 diabetes, MIRACLE demonstrates robust performance in identifying common functions of DNA methylation across different phenotypes, with higher accuracy in predicting diseases than baseline methods.
By incorporating biological prior knowledge, MIRACLE offers a meaningful and interpretable framework for DNA methylation data analysis in the context of autoimmune diseases.

Submitted 3 August, 2023; v1 submitted 24 June, 2023; originally announced June 2023.

arXiv:2306.11396 [pdf, other]

An adaptive multiresolution flux reconstruction method with local time stepping and artificial viscosity for compressible flows simulations

Authors: Yixuan Lian, Jinsheng Cai, Shucheng Pan

Abstract: In this paper, we introduce a novel approach that combines multiresolution (MR) techniques with the flux reconstruction (FR) method to accurately and efficiently simulate compressible flows. We achieve further enhancements in efficiency through the incorporation of local time stepping, and we add artificial viscosity to capture shocks. With the developed MR-FR algorithm, the layer difference of two adjacent elements can exceed 1, and simulation errors can be adjusted by manipulating a single scalar. To ensure conservation, information communication between nodes at different layers is accomplished using L2 projection. Additionally, we propose an innovative indicator based on MR analysis to detect discontinuities, enabling us to take full advantage of the details generated by MR. By indicating smoothness and adding artificial viscosity only to the finest meshes, computational costs can be reduced and errors resulting from artificial diffusion can be locally limited. Numerical tests demonstrate that the adoption of MR preserves the convergence order of the FR method, and the newly proposed indicator performs well in detecting discontinuities. Overall, the MR-FR algorithm can accurately simulate compressible flows with strong shocks and physical dissipation using significantly fewer grids, making it a promising approach for further applications.

Submitted 20 June, 2023; originally announced June 2023.

arXiv:2306.11286 [pdf, other]

Globally Optimal Solutions to a Class of Fractional Optimization Problems Based on Proximal Gradient Algorithm

Authors: Yizun Lin, Jian-Feng Cai, Zhao-Rong Lai, Cheng Li

Abstract: In this paper, we investigate a category of constrained fractional optimization problems that emerge in various practical applications. The objective function for this category is characterized by the ratio of a numerator and denominator, both being convex, semi-algebraic, Lipschitz continuous, and differentiable with Lipschitz continuous gradients over the constraint sets. The constrained sets associated with these problems are closed, convex, and semi-algebraic. We propose an efficient algorithm that is inspired by the proximal gradient method, and we provide a thorough convergence analysis. Our algorithm offers several benefits compared to existing methods. It requires only a single proximal gradient operation per iteration, thus avoiding the complicated inner-loop concave maximization usually required. Additionally, our method converges to a critical point without the typical need for a nonnegative numerator, and this critical point becomes a globally optimal solution with an appropriate condition. Our approach is adaptable to unbounded constraint sets as well. Therefore, our approach is viable for many more practical models. Numerical experiments show that our method not only reliably reaches ground-truth solutions in some model problems but also outperforms several existing methods in maximizing the Sharpe ratio with real-world financial data.

Submitted 15 May, 2024; v1 submitted 20 June, 2023; originally announced June 2023.

Comments: 29 pages, 2 figures

MSC Class: 90C26; 90C32; 65K05

arXiv:2306.10072 [pdf, ps, other]

doi 10.1007/s11432-023-3961-3

Shor's Algorithm Does Not Factor Large Integers in the Presence of Noise

Authors: Jin-Yi Cai

Abstract: We consider Shor's quantum factoring algorithm in the setting of noisy quantum gates. Under a generic model of random noise for (controlled) rotation gates, we prove that the algorithm does not factor integers of the form $pq$ when the noise exceeds a vanishingly small level in terms of $n$ -- the number of bits of the integer to be factored, where $p$ and $q$ are from a well-defined set of primes of positive density. We further prove that with probability $1 - o(1)$ over random prime pairs $(p,q)$, Shor's factoring algorithm does not factor numbers of the form $pq$, with the same level of random noise present.

Submitted 15 June, 2023; originally announced June 2023.

ACM Class: F.2.0; E.3

Journal ref: SCIENCE CHINA Information Sciences 2024

arXiv:2306.09760 [pdf]

A filtered embedded weighted compact nonlinear scheme for hyperbolic conservation law

Authors: Xuan Liu, Yaobing Min, Jinsheng Cai, Yankai Ma, Zhenguo Yan

Abstract: In situations where a wide range of flow scales is involved, the nonlinear scheme used should be capable of both shock capturing and low dissipation. Most of the existing WCNS schemes are too dissipative because the weights deviate from the ideal weights in smooth regions as a result of small-scale fluctuations. Moreover, due to a defect of the weighting strategy, the two smooth stencils located on the same side of a discontinuity cannot achieve fourth-order accuracy when the discontinuity only crosses S0 or S2. In this paper, we propose a filtered embedded WCNS scheme that is applicable to complex flow simulations involving both shocks and small-scale features. To overcome the above deficiencies of existing WCNS schemes, a new mapping function is proposed that filters out the weight deviations and maps the weights to the ideal weights in smooth regions. Meanwhile, the embedding process is also implemented by this function, which improves the resolution of shock capturing for certain discontinuity distributions. The approximate-dispersion-relation analysis indicates that the scheme with the proposed mapping function has lower dispersion error and numerical dissipation than the WCNS-JS and WCNS-Z schemes. The improved performance is demonstrated by simulations of the linear advection problem and nonlinear hyperbolic conservation laws.

Submitted 16 June, 2023; originally announced June 2023.

Showing 101–150 of 797 results for author: Cai, J