Search | arXiv e-print repository

LeftRefill: Filling Right Canvas based on Left Reference through Generalized Text-to-Image Diffusion Model

Authors: Chenjie Cao, Yunuo Cai, Qiaole Dong, Yikai Wang, Yanwei Fu

Abstract: This paper introduces LeftRefill, an innovative approach to efficiently harness large Text-to-Image (T2I) diffusion models for reference-guided image synthesis. As the name implies, LeftRefill horizontally stitches reference and target views together as a whole input. The reference image occupies the left side, while the target canvas is positioned on the right. Then, LeftRefill paints the right-side target canvas based on the left-side reference and specific task instructions. Such a task formulation shares some similarities with contextual inpainting, akin to the actions of a human painter. This novel formulation efficiently learns both structural and textured correspondence between reference and target without other image encoders or adapters. We inject task and view information through cross-attention modules in T2I models, and further exhibit multi-view reference ability via the re-arranged self-attention modules. These enable LeftRefill to perform consistent generation as a generalized model without requiring test-time fine-tuning or model modifications. Thus, LeftRefill can be seen as a simple yet unified framework to address reference-guided synthesis. As an exemplar, we leverage LeftRefill to address two different challenges: reference-guided inpainting and novel view synthesis, based on the pre-trained StableDiffusion. Codes and models are released at https://github.com/ewrfcas/LeftRefill.

Submitted 2 March, 2024; v1 submitted 19 May, 2023; originally announced May 2023.

Comments: Accepted by CVPR2024. Codes and models are released at https://github.com/ewrfcas/LeftRefill, Project page: https://ewrfcas.github.io/LeftRefill

arXiv:2305.11467 [pdf, other]

doi 10.1109/LRA.2024.3354627

Learning Sequence Descriptor based on Spatio-Temporal Attention for Visual Place Recognition

Authors: Junqiao Zhao, Fenglin Zhang, Yingfeng Cai, Gengxuan Tian, Wenjie Mu, Chen Ye, Tiantian Feng

Abstract: Visual Place Recognition (VPR) aims to retrieve frames from a geotagged database that are located at the same place as the query frame. To improve the robustness of VPR in perceptually aliasing scenarios, sequence-based VPR methods are proposed. These methods are either based on matching between frame sequences or extracting sequence descriptors for direct retrieval. However, the former is usually based on the assumption of constant velocity, which is difficult to hold in practice, and is computationally expensive and subject to sequence length. Although the latter overcomes these problems, existing sequence descriptors are constructed by aggregating features of multiple frames only, without interaction on temporal information, and thus cannot obtain descriptors with spatio-temporal discrimination. In this paper, we propose a sequence descriptor that effectively incorporates spatio-temporal information. Specifically, spatial attention within the same frame is utilized to learn spatial feature patterns, while attention in corresponding local regions of different frames is utilized to learn the persistence or change of features over time. We use a sliding window to control the temporal range of attention and use relative positional encoding to construct sequential relationships between different features. This allows our descriptors to capture the intrinsic dynamics in a sequence of frames. Comprehensive experiments on challenging benchmark datasets show that the proposed approach outperforms recent state-of-the-art methods. The code is available at https://github.com/tiev-tongji/Spatio-Temporal-SeqVPR.

Submitted 27 January, 2024; v1 submitted 19 May, 2023; originally announced May 2023.

Comments: 8 pages, 6 figures, published to RA-L

arXiv:2305.10933 [pdf, other]

doi 10.1103/PhysRevLett.133.021001

Primordial black holes from null energy condition violation during inflation

Authors: Yong Cai, Mian Zhu, Yun-Song Piao

Abstract: Primordial black holes (PBHs) and the violation of the null energy condition (NEC) have significant implications for our understanding of the very early universe. We present a novel approach to generate PBHs via the NEC violation in a single-field inflationary scenario. In our scenario, the universe transitions from a first slow-roll inflation stage with a Hubble parameter $H = H_{\text{inf}1}$ to a second slow-roll inflation stage with $H = H_{\text{inf}2}\gg H_{\text{inf}1}$, passing through an intermediate stage of NEC violation. The NEC violation naturally enhances the primordial scalar power spectrum at a certain wavelength, leading to the production of PBHs with masses and abundances of observational interest. We also investigate the phenomenological signatures of scalar-induced gravitational waves (SIGWs) resulting from the enhanced density perturbations. Our work highlights the potential of utilizing a combination of PBHs, SIGWs, and primordial gravitational waves as a valuable probe for studying NEC violation during inflation, opening up new avenues for exploring the early universe.

Submitted 9 July, 2024; v1 submitted 18 May, 2023; originally announced May 2023.

Comments: 9 pages, 5 figures + supplemental material, published in PRL

Journal ref: Physical Review Letters 133, 021001 (2024)

arXiv:2305.10773 [pdf, ps, other]

Rate-Adaptive Coding Mechanism for Semantic Communications With Multi-Modal Data

Authors: Yangshuo He, Guanding Yu, Yunlong Cai

Abstract: Recently, the ever-increasing demand for bandwidth in multi-modal communication systems requires a paradigm shift. Powered by deep learning, semantic communications are applied to multi-modal scenarios to boost communication efficiency and save communication resources. However, the existing end-to-end neural network (NN) based framework without the channel encoder/decoder is incompatible with modern digital communication systems. Moreover, most end-to-end designs are task-specific and require re-design and re-training for new tasks, which limits their applications. In this paper, we propose a distributed multi-modal semantic communication framework incorporating the conventional channel encoder/decoder. We adopt NN-based semantic encoder and decoder to extract correlated semantic information contained in different modalities, including speech, text, and image. Based on the proposed framework, we further establish a general rate-adaptive coding mechanism for various types of multi-modal semantic tasks. In particular, we utilize unequal error protection based on semantic importance, which is derived by evaluating the distortion bound of each modality. We further formulate and solve an optimization problem that aims at minimizing inference delay while maintaining inference accuracy for semantic tasks. Numerical results show that the proposed mechanism fares better than both conventional communication and existing semantic communication systems in terms of task performance, inference delay, and deployment complexity.

Submitted 18 May, 2023; originally announced May 2023.

arXiv:2305.10299 [pdf, other]

Binarized Spectral Compressive Imaging

Authors: Yuanhao Cai, Yuxin Zheng, Jing Lin, Xin Yuan, Yulun Zhang, Haoqian Wang

Abstract: Existing deep learning models for hyperspectral image (HSI) reconstruction achieve good performance but require powerful hardware with enormous memory and computational resources. Consequently, these methods can hardly be deployed on resource-limited mobile devices. In this paper, we propose a novel method, Binarized Spectral-Redistribution Network (BiSRNet), for efficient and practical HSI restoration from compressed measurement in snapshot compressive imaging (SCI) systems. Firstly, we redesign a compact and easy-to-deploy base model to be binarized. Then we present the basic unit, Binarized Spectral-Redistribution Convolution (BiSR-Conv). BiSR-Conv can adaptively redistribute the HSI representations before binarizing activation and uses a scalable hyperbolic tangent function to more closely approximate the Sign function in backpropagation. Based on our BiSR-Conv, we customize four binarized convolutional modules to address the dimension mismatch and propagate full-precision information throughout the whole network. Finally, our BiSRNet is derived by using the proposed techniques to binarize the base model. Comprehensive quantitative and qualitative experiments show that our proposed BiSRNet outperforms state-of-the-art binarization methods and achieves comparable performance with full-precision algorithms. Code and models are publicly available at https://github.com/caiyuanhao1998/BiSCI and https://github.com/caiyuanhao1998/MST

Submitted 18 October, 2023; v1 submitted 17 May, 2023; originally announced May 2023.

Comments: NeurIPS 2023; The first work to study binarized spectral compressive imaging reconstruction problem

arXiv:2305.08303 [pdf, other]

Deep-Unfolding for Next-Generation Transceivers

Authors: Qiyu Hu, Yunlong Cai, Guangyi Zhang, Guanding Yu, Geoffrey Ye Li

Abstract: The stringent performance requirements of future wireless networks, such as ultra-high data rates, extremely high reliability and low latency, are spurring worldwide studies on defining the next-generation multiple-input multiple-output (MIMO) transceivers. For the design of advanced transceivers in wireless communications, optimization approaches often leading to iterative algorithms have achieved great success for MIMO transceivers. However, these algorithms generally require a large number of iterations to converge, which entails considerable computational complexity and often requires fine-tuning of various parameters. With the development of deep learning, approximating the iterative algorithms with deep neural networks (DNNs) can significantly reduce the computational time. However, DNNs typically lead to black-box solvers, which require large amounts of data and extensive training time. To further overcome these challenges, deep-unfolding has emerged, which incorporates the benefits of both deep learning and iterative algorithms by unfolding the iterative algorithm into a layer-wise structure analogous to DNNs. In this article, we first go through the framework of deep-unfolding for transceiver design with matrix parameters and its recent advancements. Then, some endeavors in applying deep-unfolding approaches in next-generation advanced transceiver design are presented. Moreover, some open issues for future research are highlighted.

Submitted 14 May, 2023; originally announced May 2023.

Comments: 16 pages, 6 figures

arXiv:2305.08293 [pdf, other]

Identity-Preserving Talking Face Generation with Landmark and Appearance Priors

Authors: Weizhi Zhong, Chaowei Fang, Yinqi Cai, Pengxu Wei, Gangming Zhao, Liang Lin, Guanbin Li

Abstract: Generating talking face videos from audio attracts lots of research interest. A few person-specific methods can generate vivid videos but require the target speaker's videos for training or fine-tuning. Existing person-generic methods have difficulty in generating realistic and lip-synced videos while preserving identity information. To tackle this problem, we propose a two-stage framework consisting of audio-to-landmark generation and landmark-to-video rendering procedures. First, we devise a novel Transformer-based landmark generator to infer lip and jaw landmarks from the audio. Prior landmark characteristics of the speaker's face are employed to make the generated landmarks coincide with the facial outline of the speaker. Then, a video rendering model is built to translate the generated landmarks into face images. During this stage, prior appearance information is extracted from the lower-half occluded target face and static reference images, which helps generate realistic and identity-preserving visual content. For effectively exploring the prior information of static reference images, we align static reference images with the target face's pose and expression based on motion fields. Moreover, auditory features are reused to guarantee that the generated face images are well synchronized with the audio. Extensive experiments demonstrate that our method can produce more realistic, lip-synced, and identity-preserving videos than existing person-generic talking face generation methods.

Submitted 14 May, 2023; originally announced May 2023.

Comments: CVPR2023, Code: https://github.com/Weizhi-Zhong/IP_LAP

arXiv:2305.05687 [pdf, other]

doi 10.3847/1538-4357/accc89

Coronal Heating as Determined by the Solar Flare Frequency Distribution Obtained by Aggregating Case Studies

Authors: James Paul Mason, Alexandra Werth, Colin G. West, Allison A. Youngblood, Donald L. Woodraska, Courtney Peck, Kevin Lacjak, Florian G. Frick, Moutamen Gabir, Reema A. Alsinan, Thomas Jacobsen, Mohammad Alrubaie, Kayla M. Chizmar, Benjamin P. Lau, Lizbeth Montoya Dominguez, David Price, Dylan R. Butler, Connor J. Biron, Nikita Feoktistov, Kai Dewey, N. E. Loomis, Michal Bodzianowski, Connor Kuybus, Henry Dietrick, Aubrey M. Wolfe, et al. (977 additional authors not shown)

Abstract: Flare frequency distributions represent a key approach to addressing one of the largest problems in solar and stellar physics: determining the mechanism that counter-intuitively heats coronae to temperatures that are orders of magnitude hotter than the corresponding photospheres. It is widely accepted that the magnetic field is responsible for the heating, but there are two competing mechanisms that could explain it: nanoflares or Alfvén waves. To date, neither can be directly observed. Nanoflares are, by definition, extremely small, but their aggregate energy release could represent a substantial heating mechanism, presuming they are sufficiently abundant. One way to test this presumption is via the flare frequency distribution, which describes how often flares of various energies occur. If the slope of the power law fitting the flare frequency distribution is above a critical threshold, $\alpha=2$ as established in prior literature, then there should be a sufficient abundance of nanoflares to explain coronal heating. We performed $>$600 case studies of solar flares, made possible by an unprecedented number of data analysts via three semesters of an undergraduate physics laboratory course. This allowed us to include two crucial, but nontrivial, analysis methods: pre-flare baseline subtraction and computation of the flare energy, which requires determining flare start and stop times. We aggregated the results of these analyses into a statistical study to determine that $\alpha = 1.63 \pm 0.03$. This is below the critical threshold, suggesting that Alfvén waves are an important driver of coronal heating.
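The role of the $\alpha=2$ threshold can be sketched with a short worked integral. This is the standard textbook argument rather than anything taken from the paper itself; the normalization $A$ and the energy limits $E_{\min}$, $E_{\max}$ are notation assumed here for illustration.

```latex
% Assumed form of the flare frequency distribution (power law in flare energy E):
\frac{dN}{dE} = A\,E^{-\alpha}

% Total energy released by flares with energies between E_min and E_max:
E_{\mathrm{tot}}
  = \int_{E_{\min}}^{E_{\max}} E\,\frac{dN}{dE}\,dE
  = \frac{A}{2-\alpha}\left(E_{\max}^{\,2-\alpha} - E_{\min}^{\,2-\alpha}\right),
  \qquad \alpha \neq 2

% alpha > 2: the exponent 2 - alpha is negative, so the E_min term dominates
%            and the smallest events (nanoflares) carry most of the energy.
% alpha < 2: the exponent 2 - alpha is positive, so the E_max term dominates
%            and the largest flares carry most of the energy.
% For the reported alpha = 1.63, 2 - alpha = 0.37 > 0, the second case.
```

Under this reading, the reported slope places the energy budget in the large-flare regime, which is why the abstract treats it as evidence against nanoflares being abundant enough on their own.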

Submitted 9 May, 2023; originally announced May 2023.

Comments: 1,002 authors, 14 pages, 4 figures, 3 tables, published by The Astrophysical Journal on 2023-05-09, volume 948, page 71

arXiv:2305.04766 [pdf]

OSTA: One-shot Task-adaptive Channel Selection for Semantic Segmentation of Multichannel Images

Authors: Yuanzhi Cai, Jagannath Aryal, Yuan Fang, Hong Huang, Lei Fan

Abstract: Semantic segmentation of multichannel images is a fundamental task for many applications. Selecting an appropriate channel combination from the original multichannel image can improve the accuracy of semantic segmentation and reduce the cost of data storage, processing and future acquisition. Existing channel selection methods typically use a reasonable selection procedure to determine a desirable channel combination, and then train a semantic segmentation network using that combination. In this study, the concept of pruning from a supernet is used for the first time to integrate the selection of channel combination and the training of a semantic segmentation network. Based on this concept, a One-Shot Task-Adaptive (OSTA) channel selection method is proposed for the semantic segmentation of multichannel images. OSTA has three stages, namely the supernet training stage, the pruning stage and the fine-tuning stage. The outcomes of six groups of experiments (L7Irish3C, L7Irish2C, L8Biome3C, L8Biome2C, RIT-18 and Semantic3D) demonstrated the effectiveness and efficiency of OSTA. OSTA achieved the highest segmentation accuracies in all tests (62.49% (mIoU), 75.40% (mIoU), 68.38% (mIoU), 87.63% (mIoU), 66.53% (mA) and 70.86% (mIoU), respectively). It even exceeded the highest accuracies of exhaustive tests (61.54% (mIoU), 74.91% (mIoU), 67.94% (mIoU), 87.32% (mIoU), 65.32% (mA) and 70.27% (mIoU), respectively), where all possible channel combinations were tested. All of this can be accomplished within a predictable and relatively efficient timeframe, ranging from 101.71% to 298.1% of the time required to train the segmentation network alone. In addition, there were interesting findings that were deemed valuable for several fields.

Submitted 8 May, 2023; originally announced May 2023.

arXiv:2305.04088 [pdf, other]

Atomically-precise Vacancy-assembled Quantum Antidots

Authors: Hanyan Fang, Harshitra Mahalingam, Xinzhe Li, Xu Han, Zhizhan Qiu, Yixuan Han, Keian Noori, Dikshant Dulal, Hongfei Chen, Pin Lyu, Tianhao Yang, Jing Li, Chenliang Su, Wei Chen, Yongqing Cai, Antonio Castro H. Neto, Kostya S. Novoselov, Aleksandr Rodin, Jiong Lu

Abstract: Patterning antidots ("voids") into well-defined antidot lattices creates an intriguing class of artificial structures for the periodic modulation of 2D electron systems, leading to anomalous transport properties and exotic quantum phenomena as well as enabling the precise bandgap engineering of 2D materials to address technological bottleneck issues. However, realizing such atomic-scale quantum antidots (QADs) is infeasible by current nanolithographic techniques. Here, we report an atomically-precise bottom-up fabrication of a series of atomic-scale QADs with elegantly engineered quantum states through a controllable assembly of a chalcogenide single vacancy (SV) in 2D PtTe2, a type-II Dirac semimetal. Te SVs as atomic-scale "antidots" undergo thermal migration and assembly into highly-ordered SV lattices spaced by a single Te atom, reaching the ultimate downscaling limit of antidot lattices. Increasing the number of SVs in QADs strengthens the cumulative repulsive potential and consequently enhances collective interference of multiple-pocket scattered quasiparticles inside QADs, creating multi-level quantum hole states with tunable gap from telecom to far-infrared regime. Moreover, precisely engineered quantum hole states of QADs are symmetry-protected and thus survive upon atom-by-atom oxygen substitutional doping. Therefore, SV-assembled QADs exhibit unprecedented robustness and property tunability, which not only holds the key to their future applications but also embodies a wide variety of material technologies.

Submitted 6 May, 2023; originally announced May 2023.

arXiv:2305.03977 [pdf, other]

An Adversarial Non-Autoregressive Model for Text Generation with Incomplete Information

Authors: Da Ren, Yi Cai, Qing Li

Abstract: Non-autoregressive models have been widely studied in the Complete Information Scenario (CIS), in which the input has complete information of corresponding output. However, their explorations in the Incomplete Information Scenario (IIS) are extremely limited. Our analyses reveal that the IIS's incomplete input information will augment the inherent limitations of existing non-autoregressive models trained under Maximum Likelihood Estimation. In this paper, we propose for the IIS an Adversarial Non-autoregressive Transformer (ANT) which has two features: 1) Position-Aware Self-Modulation to provide more reasonable hidden representations, and 2) Dependency Feed Forward Network to strengthen its capacity in dependency modeling. We compare ANT with other mainstream models in the IIS and demonstrate that ANT can achieve comparable performance with much fewer decoding iterations. Furthermore, we show its great potential in various applications like latent interpolation and semi-supervised learning.

Submitted 1 December, 2023; v1 submitted 6 May, 2023; originally announced May 2023.

arXiv:2305.03921 [pdf]

doi 10.1007/s11467-023-1296-0

Efficient high harmonic generation in nonlinear photonic moiré superlattice

Authors: Tingyin Ning, Yingying Ren, Yanyan Huo, Yangjian Cai

Abstract: Photonic moiré superlattice as an emerging platform of flatbands can tightly confine the light inside the cavity and has important applications not only in linear optics but also in nonlinear optics. In this paper, we numerically investigate the third- and fifth-order harmonic generation (THG and FHG) in photonic moiré superlattices fabricated by the nonlinear material silicon. The high conversion efficiency of THG and FHG is obtained at a relatively low intensity of fundamental light, e.g., the maximum conversion efficiency of THG and FHG reaches as high as $10^{-2}$ and $10^{-9}$, respectively, at the fundamental intensity of 30 kW/m$^2$ in the moiré superlattice of near flat band formed by the twist angle 6.01°. The results indicate the photonic moiré superlattice of a high-quality factor and flatbands is a promising platform for efficient nonlinear processes and advanced photonic devices.

Submitted 5 May, 2023; originally announced May 2023.

Comments: 20 pages, 6 figures

arXiv:2305.03274 [pdf, other]

FAST: Feature Arrangement for Semantic Transmission

Authors: Kequan Zhou, Guangyi Zhang, Yunlong Cai, Qiyu Hu, Guanding Yu

Abstract: Although existing semantic communication systems have achieved great success, they have not considered that the channel is time-varying wherein deep fading occurs occasionally. Moreover, the importance of each semantic feature differs from each other. Consequently, the important features may be affected by channel fading and corrupted, resulting in performance degradation. Therefore, higher performance can be achieved by avoiding the transmission of important features when the channel state is poor. In this paper, we propose a scheme of Feature Arrangement for Semantic Transmission (FAST). In particular, we aim to schedule the transmission order of features and transmit important features when the channel state is good. To this end, we first propose a novel metric termed feature priority, which takes into consideration both feature importance and feature robustness. Then, we perform channel prediction at the transmitter side to obtain the future channel state information (CSI). Furthermore, the feature arrangement module is developed based on the proposed feature priority and the predicted CSI by transmitting the prior features under better CSI. Simulation results show that the proposed scheme significantly improves the performance of image transmission compared to existing semantic communication systems without feature arrangement.
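The scheduling idea in this abstract (send higher-priority features in the time slots with better predicted channel state) can be illustrated with a minimal Python sketch. This is not the authors' FAST module: the function name `arrange_features`, the scalar per-feature priority, and the scalar per-slot CSI quality are all assumptions made for illustration, and the pairing rule is a plain sort-and-match.

```python
from typing import List


def arrange_features(priorities: List[float], predicted_csi: List[float]) -> List[int]:
    """Return a schedule where schedule[slot] is the index of the feature sent in that slot.

    Hypothetical helper: pairs the highest-priority feature with the slot whose
    predicted channel quality is best, the second highest with the second best,
    and so on. One feature is sent per slot.
    """
    if len(priorities) != len(predicted_csi):
        raise ValueError("need exactly one slot per feature")

    # Feature indices, most important first.
    feature_order = sorted(range(len(priorities)), key=lambda i: priorities[i], reverse=True)
    # Slot indices, best predicted channel first.
    slot_order = sorted(range(len(predicted_csi)), key=lambda j: predicted_csi[j], reverse=True)

    schedule = [0] * len(priorities)
    for feature, slot in zip(feature_order, slot_order):
        schedule[slot] = feature
    return schedule


if __name__ == "__main__":
    # Three features and three slots; slot 1 has the best predicted channel.
    print(arrange_features([0.9, 0.2, 0.5], [0.1, 0.7, 0.4]))
```

In the example call, feature 0 (priority 0.9) is assigned to slot 1 (predicted quality 0.7), so the most important feature avoids the slot with the deepest predicted fade.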

Submitted 5 May, 2023; originally announced May 2023.

arXiv:2305.00602 [pdf, other]

Feedback-driven anisotropy in the circumgalactic medium for quenching galaxies in the SIMBA simulations

Authors: Tianyi Yang, Romeel Davé, Weiguang Cui, Yan-Chuan Cai, John A. Peacock, Daniele Sorini

Abstract: We use the SIMBA galaxy formation simulation suite to explore anisotropies in the properties of circumgalactic gas that result from accretion and feedback processes. We particularly focus on the impact of bipolar active galactic nuclei (AGN) jet feedback as implemented in SIMBA, which quenches galaxies and has a dramatic effect on large-scale gas properties. We show that jet feedback at low redshifts is most common in the stellar mass range $(1-5)\times 10^{10}M_\odot$, so we focus on galaxies with active jets in this mass range. In comparison to runs without jet feedback, jets cause lower densities and higher temperatures along the galaxy minor axis (SIMBA jet direction) at radii $\geq 0.5r_{200c}-4r_{200c}$ and beyond. This effect is less apparent at higher or lower stellar masses, and is strongest within green valley galaxies. The metallicity also shows strong anisotropy out to large scales, driven by star formation feedback. We find substantially stronger anisotropy at $\leq 0.5r_{200c}$, but this also exists in runs with no explicit feedback, suggesting that it is due to anisotropic accretion. Finally, we explore anisotropy in the bulk radial motion of the gas, finding that both star formation and AGN wind feedback contribute to pushing the gas outwards along the minor axis at $\leq 1$ Mpc, but AGN jet feedback further causes bulk outflow along the minor axis out to several Mpc, which drives quenching via gas starvation. These results provide observational signatures for the operation of AGN feedback in galaxy quenching.

Submitted 18 October, 2023; v1 submitted 30 April, 2023; originally announced May 2023.

Comments: 21 pages, 14 figures, 2 tables, accepted by MNRAS. Comments are welcomed

arXiv:2304.12656 [pdf, ps, other]

doi 10.1145/3588915

Towards Generating Hop-constrained s-t Simple Path Graphs

Authors: Yuzheng Cai, Siyuan Liu, Weiguo Zheng, Xuemin Lin

Abstract: Graphs have been widely used in real-world applications, in which investigating relations between vertices is an important task. In this paper, we study the problem of generating the k-hop-constrained s-t simple path graph, i.e., the subgraph consisting of all simple paths from vertex s to vertex t of length no larger than k. To the best of our knowledge, we are the first to formalize this problem and prove its NP-hardness on directed graphs. To tackle this challenging problem, we propose an efficient algorithm named EVE, which exploits the paradigm of edge-wise examination rather than exhaustively enumerating all paths. Powered by essential vertices appearing in all simple paths between vertex pairs, EVE distinguishes the edges that are definitely (or not) contained in the desired simple path graph, producing a tight upper-bound graph in the time cost $\mathcal{O}(k^2|E|)$. Each remaining undetermined edge is further verified to deliver the exact answer. Extensive experiments are conducted on 15 real networks. The results show that EVE significantly outperforms all baselines by several orders of magnitude. Moreover, by taking EVE as a built-in block, state-of-the-art for hop-constrained simple path enumeration can be accelerated by up to an order of magnitude.
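To make the problem definition concrete, the following Python sketch builds the k-hop-constrained s-t simple path graph by brute-force path enumeration. This is the exhaustive baseline that the abstract says EVE avoids, not the EVE algorithm itself; the adjacency-dictionary representation and the name `simple_path_graph` are assumptions chosen for illustration.

```python
from typing import Dict, List, Set, Tuple

Graph = Dict[str, List[str]]  # directed graph as adjacency lists (assumed representation)


def simple_path_graph(graph: Graph, s: str, t: str, k: int) -> Set[Tuple[str, str]]:
    """Return the edge set of the subgraph formed by all simple s-t paths with at most k edges.

    Naive depth-first enumeration; running time can grow exponentially with k.
    """
    edges: Set[Tuple[str, str]] = set()
    path = [s]
    on_path = {s}

    def dfs(u: str) -> None:
        if u == t:
            # Record every edge on the current simple path from s to t.
            edges.update(zip(path, path[1:]))
            return
        if len(path) - 1 == k:
            return  # hop budget used up without reaching t
        for v in graph.get(u, []):
            if v not in on_path:  # keep the path simple (no repeated vertices)
                on_path.add(v)
                path.append(v)
                dfs(v)
                path.pop()
                on_path.remove(v)

    dfs(s)
    return edges


if __name__ == "__main__":
    g = {"s": ["a", "b"], "a": ["t", "b"], "b": ["c"], "c": ["t"]}
    for hops in (2, 3, 4):
        print(hops, sorted(simple_path_graph(g, "s", "t", hops)))
```

In the example graph, the edge (a, b) only appears once k reaches 4, because the only simple path through it (s, a, b, c, t) has four hops.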

Submitted 25 April, 2023; originally announced April 2023.

Comments: Accepted by SIGMOD 2023

Journal ref: Proc. ACM Manag. Data 1, 1, Article 61 (May 2023), 26 pages

arXiv:2304.12149 [pdf, other]

Exploring shared memory architectures for end-to-end gigapixel deep learning

Authors: Lucas W. Remedios, Leon Y. Cai, Samuel W. Remedios, Karthik Ramadass, Aravind Krishnan, Ruining Deng, Can Cui, Shunxing Bao, Lori A. Coburn, Yuankai Huo, Bennett A. Landman

Abstract: Deep learning has made great strides in medical imaging, enabled by hardware advances in GPUs. One major constraint for the development of new models has been the saturation of GPU memory resources during training. This is especially true in computational pathology, where images regularly contain more than 1 billion pixels. These pathological images are traditionally divided into small patches to enable deep learning due to hardware limitations. In this work, we explore whether the shared GPU/CPU memory architecture on the M1 Ultra systems-on-a-chip (SoCs) recently released by Apple, Inc. may provide a solution. These affordable systems (less than \$5000) provide access to 128 GB of unified memory (Mac Studio with M1 Ultra SoC). As a proof of concept for gigapixel deep learning, we identified tissue from background on gigapixel areas from whole slide images (WSIs). The model was a modified U-Net (4492 parameters) leveraging large kernels and high stride. The M1 Ultra SoC was able to train the model directly on gigapixel images (16000$\times$64000 pixels, 1.024 billion pixels) with a batch size of 1 using over 100 GB of unified memory for the process at an average speed of 1 minute and 21 seconds per batch with Tensorflow 2/Keras. As expected, the model converged with a high Dice score of 0.989 $\pm$ 0.005. Training up until this point took 111 hours and 24 minutes over 4940 steps.
Other high RAM GPUs like the NVIDIA A100 (largest commercially accessible at 80 GB, $\sim$\$15000) are not yet widely available (in preview for select regions on Amazon Web Services at \$40.96/hour as a group of 8). This study is a promising step towards WSI-wise end-to-end deep learning with prevalent network architectures.

Submitted 24 April, 2023; originally announced April 2023.

arXiv:2304.10864 [pdf, other]

FreMIM: Fourier Transform Meets Masked Image Modeling for Medical Image Segmentation

Authors: Wenxuan Wang, **g Wang, Chen Chen, Jianbo Jiao, Yuanxiu Cai, Shanshan Song, Jiangyun Li

Abstract: The research community has witnessed the powerful potential of self-supervised Masked Image Modeling (MIM), which enables models to learn visual representations from unlabeled data. In this paper, to incorporate both the crucial global structural information and local details for dense prediction tasks, we alter the perspective to the frequency domain and present a new MIM-based framework named FreMIM for self-supervised pre-training to better accomplish medical image segmentation tasks. Based on the observations that the detailed structural information mainly lies in the high-frequency components and the high-level semantics are abundant in the low-frequency counterparts, we further incorporate multi-stage supervision to guide the representation learning during the pre-training phase. Extensive experiments on three benchmark datasets show the superior advantage of our FreMIM over previous state-of-the-art MIM methods. Compared with various baselines trained from scratch, our FreMIM could consistently bring considerable improvements to model performance. The code will be publicly available at https://github.com/Rubics-Xuan/FreMIM.

Submitted 30 November, 2023; v1 submitted 21 April, 2023; originally announced April 2023.

Comments: Accepted by WACV 2024

arXiv:2304.10713 [pdf, other]

doi 10.1103/PhysRevApplied.20.L021004

Axial correlation revivals and number factorization with structured random waves

Authors: Xin Liu, Chunhao Liang, Yangjian Cai, Sergey A. Ponomarenko

Abstract: We advance a general theory of field correlation revivals of structured random wave packets, composed of superpositions of propagation-invariant modes, at pairs of planes transverse to the packet propagation direction. We derive an elegant analytical relation between the normalized intensity autocorrelation function of thus structured paraxial light fields at a pair of points on an optical axis of the system and a Gauss sum, thereby establishing a fundamental link between statistical optics and number theory. We propose and experimentally implement a simple, robust analog random wave computer that can efficiently decompose numbers into prime factors.

Submitted 20 April, 2023; originally announced April 2023.

arXiv:2304.09087 [pdf, other]

MDDL: A Framework for Reinforcement Learning-based Position Allocation in Multi-Channel Feed

Authors: Xiaowen Shi, Ze Wang, Yuanying Cai, Xiaoxu Wu, Fan Yang, Guogang Liao, Yongkang Wang, Xingxing Wang, Dong Wang

Abstract: Nowadays, the mainstream approach in position allocation systems is to utilize a reinforcement learning model to allocate appropriate locations for items in various channels and then mix them into the feed. There are two types of data employed to train a reinforcement learning (RL) model for position allocation, named strategy data and random data. Strategy data is collected from the current online model and suffers from an imbalanced distribution of state-action pairs, resulting in severe overestimation problems during training. On the other hand, random data offers a more uniform distribution of state-action pairs, but is challenging to obtain in industrial scenarios as it could negatively impact platform revenue and user experience due to random exploration. As the two types of data have different distributions, designing an effective strategy to leverage both types of data to enhance the efficacy of the RL model training has become a highly challenging problem. In this study, we propose a framework named Multi-Distribution Data Learning (MDDL) to address the challenge of effectively utilizing both strategy and random data for training RL models on mixed multi-distribution data. Specifically, MDDL incorporates a novel imitation learning signal to mitigate overestimation problems in strategy data and maximizes the RL signal for random data to facilitate effective learning. In our experiments, we evaluated the proposed MDDL framework in a real-world position allocation system and demonstrated its superior performance compared to the previous baseline.
MDDL has been fully deployed on the Meituan food delivery platform and currently serves over 300 million users.

Submitted 17 April, 2023; originally announced April 2023.

Comments: 4 pages, 2 figures, accepted by SIGIR 2023

arXiv:2304.08103 [pdf, other]

Low-code LLM: Graphical User Interface over Large Language Models

Authors: Yuzhe Cai, Shaoguang Mao, Wenshan Wu, Zehua Wang, Yaobo Liang, Tao Ge, Chenfei Wu, Wang You, Ting Song, Yan Xia, Jonathan Tien, Nan Duan, Furu Wei

Abstract: Utilizing Large Language Models (LLMs) for complex tasks is challenging, often involving a time-consuming and uncontrollable prompt engineering process. This paper introduces a novel human-LLM interaction framework, Low-code LLM. It incorporates six types of simple low-code visual programming interactions to achieve more controllable and stable responses. Through visual interaction with a graphical user interface, users can incorporate their ideas into the process without writing trivial prompts. The proposed Low-code LLM framework consists of a Planning LLM that designs a structured planning workflow for complex tasks, which can be correspondingly edited and confirmed by users through low-code visual programming operations, and an Executing LLM that generates responses following the user-confirmed workflow. We highlight three advantages of the low-code LLM: user-friendly interaction, controllable generation, and wide applicability. We demonstrate its benefits using four typical applications. By introducing this framework, we aim to bridge the gap between humans and LLMs, enabling more effective and efficient utilization of LLMs for complex tasks. The code, prompts, and experimental details are available at https://github.com/moymix/TaskMatrix/tree/main/LowCodeLLM. A system demonstration video can be found at https://www.youtube.com/watch?v=jb2C1vaeO3E.

Submitted 1 April, 2024; v1 submitted 17 April, 2023; originally announced April 2023.

Comments: Accepted as a Demo Track paper at NAACL 2024

arXiv:2304.07120 [pdf, ps, other]

Resource Allocation and Passive Beamforming for IRS-assisted URLLC Systems

Authors: Yangyi Zhang, Xinrong Guan, Qingqing Wu, Zhi Ji, Yueming Cai

Abstract: In this correspondence, we investigate an intelligent reflective surface (IRS) assisted downlink ultra-reliable and low-latency communication (URLLC) system, where an access point (AP) sends short packets to multiple devices with the help of an IRS. Specifically, a performance comparison between the frequency division multiple access (FDMA) and time division multiple access (TDMA) is conducted for the considered system, from the perspective of average age of information (AoI). We aim to minimize the maximum average AoI among all devices by jointly optimizing the resource allocation and passive beamforming. However, the formulated problem is difficult to solve due to the non-convex objective function and coupled variables. Thus, we propose an alternating optimization based algorithm by dividing the original problem into two sub-problems which can be efficiently solved. Simulation results show that TDMA can achieve lower AoI by exploiting the time-selective passive beamforming of IRS for maximizing the signal to noise ratio (SNR) of each device consecutively. Moreover, it also shows that as the length of information bits becomes sufficiently large as compared to the available bandwidth, the proposed FDMA transmission scheme becomes more favorable instead, due to the more effective utilization of bandwidth.

Submitted 16 April, 2023; v1 submitted 14 April, 2023; originally announced April 2023.

Comments: Comparison between IRS-assisted FDMA versus IRS-assisted TDMA for URLLC

arXiv:2304.06455 [pdf, other]

Incoherent mode division multiplexing for high-security information encryption

Authors: Xin Liu, Sergey A. Ponomarenko, Fei Wang, Yangjian Cai, Chunhao Liang

Abstract: In the age of information explosion, the conventional optical communication protocols are rapidly reaching the limits of their capacity, as almost all available degrees of freedom (e.g., wavelength, polarization) for division multiplexing have been explored to date. Recent advances in coherent mode division multiplexing have greatly facilitated high-speed optical communications and secure, high-capacity information storage and transfer. However, coherent mode division multiplexing is quite vulnerable to even minute environmental disturbances which can cause significant information loss. Here, we propose and experimentally demonstrate a paradigm shift to incoherent mode division multiplexing for high-security optical information encryption by harnessing the degree of coherence of structured random light beams. In contrast to the conventional techniques, our approach does not require mode orthogonality to circumnavigate unwanted mode crosstalk. In addition, our protocol has, in principle, no upper bound on its capacity. Thanks to the extreme robustness of structured random light to external perturbations, we are able to achieve highly accurate information encryption and decryption in the adverse environment. The proposed protocol opens new horizons in an array of fields, such as optical communications and cryptography, and it can be relevant for information processing with acoustical, matter as well as other types of waves.

Submitted 13 April, 2023; originally announced April 2023.

Comments: 23 pages, 6 figures

arXiv:2304.06436 [pdf, other]

Hidden magnetism uncovered in charge ordered bilayer kagome material ScV_6Sn_6

Authors: Z. Guguchia, D. J. Gawryluk, Soohyeon Shin, Z. Hao, C. Mielke III, D. Das, I. Plokhikh, L. Liborio, K. Shenton, Y. Hu, V. Sazgari, M. Medarde, H. Deng, Y. Cai, C. Chen, Y. Jiang, A. Amato, M. Shi, M. Z. Hasan, J. -X. Yin, R. Khasanov, E. Pomjakushina, H. Luetkens

Abstract: Charge ordered kagome lattices have been demonstrated to be intriguing platforms for studying the intertwining of topology, correlation, and magnetism. The recently discovered charge ordered kagome material ScV_6Sn_6 does not feature a magnetic groundstate or excitations, thus it is often regarded as a conventional paramagnet. Here, using advanced muon-spin rotation spectroscopy, we uncover an unexpected hidden magnetism of the charge order. We observe a striking enhancement of the internal field width sensed by the muon ensemble, which takes place within the charge ordered state. More remarkably, the muon spin relaxation rate below the charge ordering temperature is substantially enhanced by applying an external magnetic field. Taken together with the hidden magnetism found in AV_3Sb_5 (A = K, Rb, Cs) and FeGe kagome systems, our results suggest ubiquitous time-reversal symmetry-breaking in charge ordered kagome lattices.

Submitted 13 April, 2023; originally announced April 2023.

Comments: 9 pages, 4 figures

arXiv:2304.05472 [pdf, other]

Light Sampling Field and BRDF Representation for Physically-based Neural Rendering

Authors: **g Yang, Hanyuan Xiao, Wenbin Teng, Yunxuan Cai, Yajie Zhao

Abstract: Physically-based rendering (PBR) is key for immersive rendering effects used widely in the industry to showcase detailed realistic scenes from computer graphics assets. A well-known caveat is that producing the same is computationally heavy and relies on complex capture devices. Inspired by the success in quality and efficiency of recent volumetric neural rendering, we want to develop a physically-based neural shader to eliminate device dependency and significantly boost performance. However, no existing lighting and material models in the current neural rendering approaches can accurately represent the comprehensive lighting models and BRDFs properties required by the PBR process. Thus, this paper proposes a novel lighting representation that models direct and indirect light locally through a light sampling strategy in a learned light sampling field. We also propose BRDF models to separately represent surface/subsurface scattering details to enable complex objects such as translucent material (i.e., skin, jade). We then implement our proposed representations with an end-to-end physically-based neural face skin shader, which takes a standard face asset (i.e., geometry, albedo map, and normal map) and an HDRI for illumination as inputs and generates a photo-realistic rendering as output. Extensive experiments showcase the quality and efficiency of our PBR face skin shader, indicating the effectiveness of our proposed lighting and material representations.

Submitted 11 April, 2023; originally announced April 2023.

Comments: ICLR 2023 Poster

arXiv:2304.04364 [pdf, other]

ITportrait: Image-Text Coupled 3D Portrait Domain Adaptation

Authors: Xiangwen Deng, Yufeng Wang, Yuanhao Cai, **gxiang Sun, Yebin Liu, Haoqian Wang

Abstract: Domain adaptation of 3D portraits has gained more and more attention. However, the transfer mechanism of existing methods is mainly based on vision or language, which ignores the potential of vision-language combined guidance. In this paper, we propose an Image-Text multi-modal framework, namely Image and Text portrait (ITportrait), for 3D portrait domain adaptation. ITportrait relies on a two-stage alternating training strategy. In the first stage, we employ a 3D Artistic Paired Transfer (APT) method for image-guided style transfer. APT constructs paired photo-realistic portraits to obtain accurate artistic poses, which helps ITportrait to achieve high-quality 3D style transfer. In the second stage, we propose a 3D Image-Text Embedding (ITE) approach in the CLIP space. ITE uses a threshold function to self-adaptively control the optimization direction of images or texts in the CLIP space. Comprehensive experiments prove that our ITportrait achieves state-of-the-art (SOTA) results and benefits downstream tasks. All source codes and pre-trained models will be released to the public.

Submitted 10 December, 2023; v1 submitted 9 April, 2023; originally announced April 2023.

arXiv:2304.03409 [pdf, other]

doi 10.1073/pnas.2208276120

Two superconducting states with broken time-reversal symmetry in FeSe1-xSx

Authors: K. Matsuura, M. Roppongi, M. Qiu, Q. Sheng, Y. Cai, K. Yamakawa, Z. Guguchia, R. P. Day, K. M. Kojima, A. Damascelli, Y. Sugimura, M. Saito, T. Takenaka, K. Ishihara, Y. Mizukami, K. Hashimoto, Y. Gu, S. Guo, L. Fu, Z. Zhang, F. Ning, G. Zhao, G. Dai, C. **, J. W. Beare , et al. (3 additional authors not shown)

Abstract: Iron-chalcogenide superconductors FeSe$_{1-x}$S$_x$ possess unique electronic properties such as non-magnetic nematic order and its quantum critical point. The nature of superconductivity with such nematicity is important for understanding the mechanism of unconventional superconductivity. A recent theory suggested the possible emergence of a fundamentally new class of superconductivity with the so-called Bogoliubov Fermi surfaces (BFSs) in this system. However, such an {\em ultranodal} pair state requires broken time-reversal symmetry (TRS) in the superconducting state, which has not been observed experimentally. Here we report muon spin relaxation ($μ$SR) measurements in FeSe$_{1-x}$S$_x$ superconductors for $0\le x \le 0.22$ covering both orthorhombic (nematic) and tetragonal phases. We find that the zero-field muon relaxation rate is enhanced below the superconducting transition temperature $T_{\rm c}$ for all compositions, indicating that the superconducting state breaks TRS both in the nematic and tetragonal phases. Moreover, the transverse-field $μ$SR measurements reveal that the superfluid density shows an unexpected and substantial reduction in the tetragonal phase ($x>0.17$). This implies that a significant fraction of electrons remain unpaired in the zero-temperature limit, which cannot be explained by the known unconventional superconducting states with point or line nodes.
The time-reversal symmetry breaking and the suppressed superfluid density in the tetragonal phase, together with the reported enhanced zero-energy excitations, are consistent with the ultranodal pair state with BFSs. The present results reveal two different superconducting states with broken TRS separated by the nematic critical point in FeSe$_{1-x}$S$_x$, which calls for the theory of microscopic origins that account for the relation between the nematicity and superconductivity.

Submitted 12 April, 2023; v1 submitted 6 April, 2023; originally announced April 2023.

Comments: 8 pages, 4 figures, typos corrected. Accepted for publication in PNAS

Journal ref: Proc. Natl. Acad. Sci. USA 120, e2208276120 (2023)

arXiv:2304.02836 [pdf, other]

Longitudinal Multimodal Transformer Integrating Imaging and Latent Clinical Signatures From Routine EHRs for Pulmonary Nodule Classification

Authors: Thomas Z. Li, John M. Still, Kaiwen Xu, Ho Hin Lee, Leon Y. Cai, Aravind R. Krishnan, Riqiang Gao, Mirza S. Khan, Sanja Antic, Michael Kammer, Kim L. Sandler, Fabien Maldonado, Bennett A. Landman, Thomas A. Lasko

Abstract: The accuracy of predictive models for solitary pulmonary nodule (SPN) diagnosis can be greatly increased by incorporating repeat imaging and medical context, such as electronic health records (EHRs). However, clinically routine modalities such as imaging and diagnostic codes can be asynchronous and irregularly sampled over different time scales which are obstacles to longitudinal multimodal learning. In this work, we propose a transformer-based multimodal strategy to integrate repeat imaging with longitudinal clinical signatures from routinely collected EHRs for SPN classification. We perform unsupervised disentanglement of latent clinical signatures and leverage time-distance scaled self-attention to jointly learn from clinical signatures expressions and chest computed tomography (CT) scans. Our classifier is pretrained on 2,668 scans from a public dataset and 1,149 subjects with longitudinal chest CTs, billing codes, medications, and laboratory tests from EHRs of our home institution. Evaluation on 227 subjects with challenging SPNs revealed a significant AUC improvement over a longitudinal multimodal baseline (0.824 vs 0.752 AUC), as well as improvements over a single cross-section multimodal scenario (0.809 AUC) and a longitudinal imaging-only scenario (0.741 AUC). This work demonstrates significant advantages with a novel approach for co-learning longitudinal imaging and non-imaging phenotypes with transformers. Code available at https://github.com/MASILab/lmsignatures.

Submitted 29 June, 2023; v1 submitted 5 April, 2023; originally announced April 2023.

Comments: Accepted to MICCAI 2023

arXiv:2304.02389 [pdf, other]

DRAC: Diabetic Retinopathy Analysis Challenge with Ultra-Wide Optical Coherence Tomography Angiography Images

Authors: Bo Qian, Hao Chen, Xiangning Wang, Haoxuan Che, Gitaek Kwon, Jaeyoung Kim, Sung** Choi, Seoyoung Shin, Felix Krause, Markus Unterdechler, Junlin Hou, Rui Feng, Yihao Li, Mostafa El Habib Daho, Qiang Wu, ** Zhang, Xiaokang Yang, Yiyu Cai, Wei** Jia, Huating Li, Bin Sheng

Abstract: Computer-assisted automatic analysis of diabetic retinopathy (DR) is of great importance in reducing the risks of vision loss and even blindness. Ultra-wide optical coherence tomography angiography (UW-OCTA) is a non-invasive and safe imaging modality in DR diagnosis system, but there is a lack of publicly available benchmarks for model development and evaluation. To promote further research and scientific benchmarking for diabetic retinopathy analysis using UW-OCTA images, we organized a challenge named "DRAC - Diabetic Retinopathy Analysis Challenge" in conjunction with the 25th International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI 2022). The challenge consists of three tasks: segmentation of DR lesions, image quality assessment and DR grading. The scientific community responded positively to the challenge, with 11, 12, and 13 teams from geographically diverse institutes submitting different solutions in these three tasks, respectively. This paper presents a summary and analysis of the top-performing solutions and results for each task of the challenge. The obtained results from top algorithms indicate the importance of data augmentation, model architecture and ensemble of networks in improving the performance of deep learning models. These findings have the potential to enable new developments in diabetic retinopathy analysis. The challenge remains open for post-challenge registrations and submissions for benchmarking future methodology developments.

Submitted 5 April, 2023; originally announced April 2023.

arXiv:2303.18050 [pdf, other]

doi 10.1093/mnras/stad994

Minute-Cadence Observations of the LAMOST Fields with the TMTS: II. Catalogues of Short-Period Variable Stars from the First Two-Year Surveys

Authors: Jie Lin, Xiaofeng Wang, Jun Mo, Gaobo Xi, Alexei V. Filippenko, Shengyu Yan, Thomas G. Brink, Yi Yang, Chengyuan Wu, Péter Németh, Gaici Li, Fangzhou Guo, **cheng Guo, Yongzhi Cai, Heran Xiong, WeiKang Zheng, Qichun Liu, Jicheng Zhang, Xiaojun Jiang, Liyang Chen, Qiqi Xia, Haowei Peng, Zhihao Chen, Wenxiong Li, Weili Lin , et al. (3 additional authors not shown)

Abstract: Over the past few years, wide-field time-domain surveys like ZTF and OGLE have led to discoveries of various types of interesting short-period stellar variables, such as ultracompact eclipsing binary white dwarfs, rapidly rotating magnetised white dwarfs (WDs), transitional cataclysmic variables between hydrogen-rich and helium accretion, and blue large-amplitude pulsators (BLAPs), which greatly enrich our understandings of stellar physics under some extreme conditions. In this paper, we report the first-two-year discoveries of short-period variables (i.e., P<2 hr) by the Tsinghua University-Ma Huateng Telescopes for Survey (TMTS). TMTS is a multi-tube telescope system with a field of view up to 18 deg^2, which started to monitor the LAMOST sky areas since 2020 and generated uninterrupted minute-cadence light curves for about ten million sources within 2 years. Adopting the Lomb-Scargle periodogram with period-dependent thresholds for the maximum powers, we identify over 1 100 sources that exhibit a variation period shorter than 2 hr. Compiling the light curves with the Gaia magnitudes and colours, LAMOST spectral parameters, VSX classifications, and archived observations from other prevailing time-domain survey missions, we identified 1 076 as delta Scuti stars, which allows us to study their populations and physical properties in the short-period regime.
The other 31 sources include BLAPs, subdwarf B variables (sdBVs), pulsating WDs, ultracompact/short-period eclipsing/ellipsoidal binaries, cataclysmic variables below the period gap, etc., which are highly interesting and worthy of follow-up investigations.

Submitted 3 April, 2023; v1 submitted 31 March, 2023; originally announced March 2023.

Comments: 22 pages, 15 figures, 5 tables, accepted by MNRAS

arXiv:2303.16925 [pdf, other]

doi 10.1051/0004-6361/202346526

The broad-lined Type-Ic supernova SN 2022xxf with extraordinary two-humped light curves

Authors: H. Kuncarayakti, J. Sollerman, L. Izzo, K. Maeda, S. Yang, S. Schulze, C. R. Angus, M. Aubert, K. Auchettl, M. Della Valle, L. Dessart, K. Hinds, E. Kankare, M. Kawabata, P. Lundqvist, T. Nakaoka, D. Perley, S. I. Raimundo, N. L. Strotjohann, K. Taguchi, Y. -Z. Cai, P. Charalampopoulos, Q. Fang, M. Fraser, C. P. Gutierrez , et al. (38 additional authors not shown)

Abstract: We report on our study of supernova (SN) 2022xxf based on observations obtained during the first four months of its evolution. The light curves (LCs) display two humps of similar maximum brightness separated by 75 days, unprecedented for a broad-lined (BL) Type Ic supernova (SN IcBL). SN 2022xxf is the most nearby SN IcBL to date (in NGC 3705, $z = 0.0037$, at a distance of about 20 Mpc). Optical and near-infrared photometry and spectroscopy are used to identify the energy source powering the LC. Nearly 50 epochs of high signal-to-noise-ratio spectroscopy were obtained within 130 days, comprising an unparalleled dataset for a SN IcBL, and one of the best-sampled SN datasets to date. The global spectral appearance and evolution of SN 2022xxf points to typical SN Ic/IcBL, with broad features (up to $\sim14000$ km s$^{-1}$) and a gradual transition from the photospheric to the nebular phase. However, narrow emission lines (corresponding to $\sim1000-2500$ km s$^{-1}$) are present in the spectra from the time of the second rise, suggesting slower-moving circumstellar material (CSM). These lines are subtle, in comparison to the typical strong narrow lines of CSM-interacting SNe, for example, Type IIn, Ibn, and Icn, but some are readily noticeable at late times such as in Mg I $λ$5170 and [O I] $λ$5577. Unusually, the near-infrared spectra show narrow line peaks in a number of features formed by ions of O and Mg. We infer the presence of CSM that is free of H and He.
We propose that the radiative energy from the ejecta-CSM interaction is a plausible explanation for the second LC hump. This interaction scenario is supported by the color evolution, which progresses to the blue as the light curve evolves along the second hump, and the slow second rise and subsequent rapid LC drop. (Abstract abridged)

Submitted 14 August, 2023; v1 submitted 29 March, 2023; originally announced March 2023.

Comments: Accepted version

Journal ref: A&A 678, A209 (2023)

arXiv:2303.16709 [pdf]

Genuine full characterization of partially coherence beam

Authors: Xingyuan Lu, Zhuoyi Wang, Qiwen Zhan, Yangjian Cai, Chengliang Zhao

Abstract: For partially coherent light fields with random fluctuations, the intensity distributions and statistics have been proven to be more propagation robust compared with coherent light. However, their full potential in practical applications has not been realized due to the lack of four-dimensional optical field measurement. Here, a general modal decomposition method for partially coherent light fields is proposed and demonstrated. The decomposed random modes can be used, among other purposes, to reconstruct the average intensity, cross spectral density and orthogonal decomposition properties of the partially coherent light fields. Due to its versatility and flexibility, this method provides a powerful tool to further reveal light field invariants or retrieve embedded information after propagation through complex media. The Gaussian-shell-model beam and partially coherent Gaussian array are used as examples to demonstrate the reconstruction and even prediction of second-order statistical characteristics. This method is expected to pave the way for applications of partially coherent light in optical imaging, optical encryption and anti-turbulence optical communication.

Submitted 29 March, 2023; originally announced March 2023.

arXiv:2303.16376 [pdf, other]

A Unified Learning Model for Estimating Fiber Orientation Distribution Functions on Heterogeneous Multi-shell Diffusion-weighted MRI

Authors: Tianyuan Yao, Nancy Newlin, Praitayini Kanakaraj, Vishwesh Nath, Leon Y Cai, Karthik Ramadass, Kurt Schilling, Bennett A. Landman, Yuankai Huo

Abstract: Diffusion-weighted (DW) MRI measures the direction and scale of the local diffusion process in every voxel through its spectrum in q-space, typically acquired in one or more shells. Recent developments in micro-structure imaging and multi-tissue decomposition have sparked renewed attention to the radial b-value dependence of the signal. Applications in tissue classification and micro-architecture estimation, therefore, require a signal representation that extends over the radial as well as angular domain. Multiple approaches have been proposed that can model the non-linear relationship between the DW-MRI signal and biological microstructure. In the past few years, many deep learning-based methods have been developed towards faster inference speed and higher inter-scan consistency compared with traditional model-based methods (e.g., multi-shell multi-tissue constrained spherical deconvolution). However, a multi-stage learning strategy is typically required since the learning process relies on various middle representations, such as simple harmonic oscillator reconstruction (SHORE) representation. In this work, we present a unified dynamic network with a single-stage spherical convolutional neural network, which allows efficient fiber orientation distribution function (fODF) estimation through heterogeneous multi-shell diffusion MRI sequences. We study the Human Connectome Project (HCP) young adults with test-retest scans.
From the experimental results, the proposed single-stage method outperforms prior multi-stage approaches in repeated fODF estimation with shell dropoff and single-shell DW-MRI sequences.

Submitted 29 January, 2024; v1 submitted 28 March, 2023; originally announced March 2023.

arXiv:2303.14423 [pdf, other]

Task-Attentive Transformer Architecture for Continual Learning of Vision-and-Language Tasks Using Knowledge Distillation

Authors: Yuliang Cai, Jesse Thomason, Mohammad Rostami

Abstract: The size and the computational load of fine-tuning large-scale pre-trained neural networks are becoming two major obstacles in adopting machine learning in many applications. Continual learning (CL) can serve as a remedy by enabling knowledge transfer across sequentially arriving tasks, which relaxes the need to fine-tune all network weights from scratch. However, existing CL algorithms primarily consider learning unimodal vision-only or language-only tasks. We develop a transformer-based CL architecture for learning bimodal vision-and-language tasks based on dynamically increasing the number of learnable parameters and using knowledge distillation. The new additional parameters are used to specialize the network for each task. Our approach enables sharing information between the tasks while addressing the challenge of catastrophic forgetting. Our approach scales to a large number of tasks because it requires little memory and time overhead. Our model reaches state-of-the-art performance on challenging vision-and-language tasks.

Submitted 25 March, 2023; originally announced March 2023.

arXiv:2303.09391 [pdf, other]

Rapidly growing primordial black holes as seeds of the massive high-redshift JWST Galaxies

Authors: Guan-Wen Yuan, Lei Lei, Yuan-Zhu Wang, Bo Wang, Yi-Ying Wang, Chao Chen, Zhao-Qiang Shen, Yi-Fu Cai, Yi-Zhong Fan

Abstract: A group of massive galaxies at redshifts of $z\gtrsim 7$ have been recently detected by the James Webb Space Telescope (JWST), which were unexpected to form so early within the framework of standard Big Bang cosmology. In this work, we propose that this puzzle can be explained by the presence of some primordial black holes (PBHs) with a mass of $\sim 1000 M_\odot$. These PBHs, clothed in dark matter halo and undergoing super-Eddington accretion, serve as seeds for the early galaxy formation with masses of $\sim 10^{8}-10^{10}~M_\odot$ at high redshift, thus accounting for the JWST observations. Using a hierarchical Bayesian inference framework to constrain the PBH mass distribution models, we find that the Lognormal model with $M_{\rm c}\sim 750M_\odot$ is preferred over other hypotheses. These rapidly growing BHs are expected to emit strong radiation and may appear as high-redshift compact objects, similar to those recently discovered by JWST. Although we focus on PBHs in this work, the bound on the initial mass of the seed black holes remains robust even if they were formed through astrophysical channels.

Submitted 18 June, 2024; v1 submitted 16 March, 2023; originally announced March 2023.

Comments: Accepted by Science China Physics, Mechanics & Astronomy

arXiv:2303.07977 [pdf]

Direct generation of time-energy-entangled W triphotons in atomic vapor

Authors: Kangkang Li, Jianming Wen, Yin Cai, Saeid Vashahri Ghamsari, Changbiao Li, Feng Li, Zhaoyang Zhang, Yanpeng Zhang, Min Xiao

Abstract: Sources of entangled multiphotons are not only essential for fundamental tests of quantum foundations, but are also the cornerstone of a variety of optical quantum technologies today. Over the past three decades, tremendous efforts have been devoted to creating multiphoton entanglement by multiplexing existing biphoton sources with linear optics and postselections. Different from all previous protocols, here we report, for the first time, the observation of continuous-mode time-energy-entangled W-class triphotons with an unprecedented generation rate directly through the process of spontaneous six-wave mixing (SSWM) in a four-level triple-Lambda atomic vapor cell. Facilitated by electromagnetically induced transparency and coherence control, our SSWM scheme enables versatile narrowband triphoton generation with many intriguing properties including long temporal coherence and controllable waveforms, ideal for implementing long-distance quantum communications, networking, and information processing by interfacing photons and atoms. Most importantly, our work paves a way for the development of a reliable and efficient genuine triphoton source, thus making the research on multiphoton entanglement within easy reach.

Submitted 30 April, 2024; v1 submitted 14 March, 2023; originally announced March 2023.

Comments: updated version

arXiv:2303.07709 [pdf, other]

3D Face Arbitrary Style Transfer

Authors: Xiangwen Deng, Yingshuang Zou, Yuanhao Cai, Chendong Zhao, Yang Liu, Zhifang Liu, Yuxiao Liu, Jiawei Zhou, Haoqian Wang

Abstract: Style transfer of 3D faces has gained more and more attention. However, previous methods mainly use images of artistic faces for style transfer while ignoring arbitrary style images such as abstract paintings. To solve this problem, we propose a novel method, namely Face-guided Dual Style Transfer (FDST). To begin with, FDST employs a 3D decoupling module to separate facial geometry and texture. Then we propose a style fusion strategy for facial geometry. Subsequently, we design an optimization-based DDSG mechanism for textures that can guide the style transfer by two style images. Besides the normal style image input, DDSG can utilize the original face input as another style input as the face prior. By this means, high-quality face arbitrary style transfer results can be obtained. Furthermore, FDST can be applied in many downstream tasks, including region-controllable style transfer, high-fidelity face texture reconstruction, large-pose face reconstruction, and artistic face reconstruction. Comprehensive quantitative and qualitative results show that our method can achieve comparable performance. All source codes and pre-trained weights will be released to the public.

Submitted 14 March, 2023; originally announced March 2023.

arXiv:2303.07276 [pdf, other]

Optimization of Cryptocurrency Miners' Participation in Ancillary Service Markets

Authors: Ali Menati, Yuting Cai, Rayan El Helou, Chao Tian, Le Xie

Abstract: Proof-of-work computation used in cryptocurrencies has witnessed significant growth in the U.S. and many other regions around the world. One of the most significant bottlenecks for the scalable deployment of such computation is its energy demand. On the other hand, the electric energy system is increasing the need for flexibility for energy balancing and ancillary services due to the intermittent nature of many new energy resources such as wind and solar. In this work, we model the operation of a cryptomining facility with heterogeneous mining devices participating in ancillary services. We propose a general formulation for the cryptominers to maximize their profit by strategically participating in ancillary services and controlling the loss of mining revenue, which requires taking into account the disparity in the efficiency of the mining machines. The optimization formulation is considered for both offline and online scenarios, and optimal algorithms are proposed to solve these problems. As a special case of our problem, we investigate cryptominers' participation in frequency regulation, where the miners benefit from their fast-responding devices and contribute to grid stability. In the second special setting, a risk-aware algorithm is proposed to jointly minimize the cost and the risk of participating in ancillary services with homogeneous mining devices. Simulation results based on real-world Electric Reliability Council of Texas (ERCOT) traces show more than 20% gain in profit, highlighting the advantage of our proposed algorithms.

Submitted 13 March, 2023; originally announced March 2023.

arXiv:2303.06705 [pdf, other]

Retinexformer: One-stage Retinex-based Transformer for Low-light Image Enhancement

Authors: Yuanhao Cai, Hao Bian, Jing Lin, Haoqian Wang, Radu Timofte, Yulun Zhang

Abstract: When enhancing low-light images, many deep learning algorithms are based on the Retinex theory. However, the Retinex model does not consider the corruptions hidden in the dark or introduced by the light-up process. Besides, these methods usually require a tedious multi-stage training pipeline and rely on convolutional neural networks, showing limitations in capturing long-range dependencies. In this paper, we formulate a simple yet principled One-stage Retinex-based Framework (ORF). ORF first estimates the illumination information to light up the low-light image and then restores the corruption to produce the enhanced image. We design an Illumination-Guided Transformer (IGT) that utilizes illumination representations to direct the modeling of non-local interactions of regions with different lighting conditions. By plugging IGT into ORF, we obtain our algorithm, Retinexformer. Comprehensive quantitative and qualitative experiments demonstrate that our Retinexformer significantly outperforms state-of-the-art methods on thirteen benchmarks. The user study and application on low-light object detection also reveal the latent practical values of our method. Code, models, and results are available at https://github.com/caiyuanhao1998/Retinexformer

Submitted 26 October, 2023; v1 submitted 12 March, 2023; originally announced March 2023.

Comments: ICCV 2023; The first Transformer-based method for low-light image enhancement

arXiv:2303.05785 [pdf, other]

Scaling Up 3D Kernels with Bayesian Frequency Re-parameterization for Medical Image Segmentation

Authors: Ho Hin Lee, Quan Liu, Shunxing Bao, Qi Yang, Xin Yu, Leon Y. Cai, Thomas Li, Yuankai Huo, Xenofon Koutsoukos, Bennett A. Landman

Abstract: Inspired by vision transformers, depth-wise convolution has been revisited to provide a large Effective Receptive Field (ERF) using Large Kernel (LK) sizes for medical image segmentation. However, the segmentation performance might be saturated and even degraded as the kernel sizes are scaled up (e.g., $21\times 21\times 21$) in a Convolutional Neural Network (CNN). We hypothesize that convolution with LK sizes is limited to maintain an optimal convergence for locality learning. While Structural Re-parameterization (SR) enhances the local convergence with small kernels in parallel, optimal small kernel branches may hinder the computational efficiency for training. In this work, we propose RepUX-Net, a pure CNN architecture with a simple large kernel block design, which competes favorably with current network state-of-the-art (SOTA) (e.g., 3D UX-Net, SwinUNETR) using 6 challenging public datasets. We derive an equivalency between kernel re-parameterization and the branch-wise variation in kernel convergence. Inspired by the spatial frequency in the human visual system, we extend to vary the kernel convergence into element-wise setting and model the spatial frequency as a Bayesian prior to re-parameterize convolutional weights during training. Specifically, a reciprocal function is leveraged to estimate a frequency-weighted value, which rescales the corresponding kernel element for stochastic gradient descent.
From the experimental results, RepUX-Net consistently outperforms 3D SOTA benchmarks with internal validation (FLARE: 0.929 to 0.944), external validation (MSD: 0.901 to 0.932, KiTS: 0.815 to 0.847, LiTS: 0.933 to 0.949, TCIA: 0.736 to 0.779) and transfer learning (AMOS: 0.880 to 0.911) scenarios in Dice Score.

Submitted 5 June, 2023; v1 submitted 10 March, 2023; originally announced March 2023.

Comments: Accepted to MICCAI 2023 (top 13.6%), both codes and pretrained models are available at: https://github.com/MASILab/RepUX-Net

arXiv:2303.05172 [pdf, other]

doi 10.1016/j.nima.2023.168680

The JUNO experiment Top Tracker

Authors: JUNO Collaboration, Angel Abusleme, Thomas Adam, Shakeel Ahmad, Rizwan Ahmed, Sebastiano Aiello, Muhammad Akram, Abid Aleem, Tsagkarakis Alexandros, Fengpeng An, Qi An, Giuseppe Andronico, Nikolay Anfimov, Vito Antonelli, Tatiana Antoshkina, Burin Asavapibhop, João Pedro Athayde Marcondes de André, Didier Auguste, Weidong Bai, Nikita Balashov, Wander Baldini, Andrea Barresi, Davide Basilico, Eric Baussan, Marco Bellato, et al. (592 additional authors not shown)

Abstract: The main task of the Top Tracker detector of the neutrino reactor experiment Jiangmen Underground Neutrino Observatory (JUNO) is to reconstruct and extrapolate atmospheric muon tracks down to the central detector. This muon tracker will help to evaluate the contribution of the cosmogenic background to the signal. The Top Tracker is located above JUNO's water Cherenkov Detector and Central Detector, covering about 60% of the surface above them. The JUNO Top Tracker is constituted by the decommissioned OPERA experiment Target Tracker modules. The technology used consists in walls of two planes of plastic scintillator strips, one per transverse direction. Wavelength shifting fibres collect the light signal emitted by the scintillator strips and guide it to both ends where it is read by multianode photomultiplier tubes. Compared to the OPERA Target Tracker, the JUNO Top Tracker uses new electronics able to cope with the high rate produced by the high rock radioactivity compared to the one in Gran Sasso underground laboratory. This paper will present the new electronics and mechanical structure developed for the Top Tracker of JUNO along with its expected performance based on the current detector simulation.

Submitted 9 March, 2023; originally announced March 2023.

Comments: 20 pages

Journal ref: Nucl.Instrum.Meth.A 1057 (2023) 168680

arXiv:2303.04978 [pdf, ps, other]

The push-forwards and pull-backs of $δ$-forms and applications to non-archimedean Arakelov geometry

Authors: Yulin Cai

Abstract: We study two kinds of push-forwards of $δ$-forms and define the pull-backs of $δ$-forms. As a generalization of Gubler-Künnemann, we prove the projection formula and the tropical Poincaré-Lelong formula. As an application, we follow the idea of Gubler-Künnemann and generalize the notion of $δ$-forms on algebraic varieties; this allows us to define the first Chern forms for any piecewise smooth metrics.

Submitted 8 March, 2023; originally announced March 2023.

arXiv:2303.03910 [pdf, other]

JUNO sensitivity to $^7$Be, $pep$, and CNO solar neutrinos

Authors: Angel Abusleme, Thomas Adam, Shakeel Ahmad, Rizwan Ahmed, Sebastiano Aiello, Muhammad Akram, Abid Aleem, Tsagkarakis Alexandros, Fengpeng An, Qi An, Giuseppe Andronico, Nikolay Anfimov, Vito Antonelli, Tatiana Antoshkina, Burin Asavapibhop, João Pedro Athayde Marcondes de André, Didier Auguste, Weidong Bai, Nikita Balashov, Wander Baldini, Andrea Barresi, Davide Basilico, Eric Baussan, Marco Bellato, Marco Beretta, et al. (592 additional authors not shown)

Abstract: The Jiangmen Underground Neutrino Observatory (JUNO), the first multi-kton liquid scintillator detector, which is under construction in China, will have a unique potential to perform a real-time measurement of solar neutrinos well below the few MeV threshold typical for Water Cherenkov detectors. JUNO's large target mass and excellent energy resolution are prerequisites for reaching unprecedented levels of precision. In this paper, we provide estimation of the JUNO sensitivity to 7Be, pep, and CNO solar neutrinos that can be obtained via a spectral analysis above the 0.45 MeV threshold. This study is performed assuming different scenarios of the liquid scintillator radiopurity, ranging from the most optimistic one corresponding to the radiopurity levels obtained by the Borexino experiment, up to the minimum requirements needed to perform the neutrino mass ordering determination with reactor antineutrinos - the main goal of JUNO. Our study shows that in most scenarios, JUNO will be able to improve the current best measurements on 7Be, pep, and CNO solar neutrino fluxes. We also perform a study on the JUNO capability to detect periodical time variations in the solar neutrino flux, such as the day-night modulation induced by neutrino flavor regeneration in Earth, and the modulations induced by temperature changes driven by helioseismic waves.

Submitted 7 March, 2023; originally announced March 2023.

arXiv:2303.02738 [pdf, other]

Uncoupled and Convergent Learning in Two-Player Zero-Sum Markov Games with Bandit Feedback

Authors: Yang Cai, Haipeng Luo, Chen-Yu Wei, Weiqiang Zheng

Abstract: We revisit the problem of learning in two-player zero-sum Markov games, focusing on developing an algorithm that is uncoupled, convergent, and rational, with non-asymptotic convergence rates. We start from the case of stateless matrix game with bandit feedback as a warm-up, showing an $O(t^{-\frac{1}{8}})$ last-iterate convergence rate. To the best of our knowledge, this is the first result that obtains finite last-iterate convergence rate given access to only bandit feedback. We extend our result to the case of irreducible Markov games, providing a last-iterate convergence rate of $O(t^{-\frac{1}{9+\varepsilon}})$ for any $\varepsilon>0$. Finally, we study Markov games without any assumptions on the dynamics, and show a path convergence rate, which is a new notion of convergence we defined, of $O(t^{-\frac{1}{10}})$. Our algorithm removes the coordination and prior knowledge requirement of [Wei et al., 2021], which pursued the same goals as us for irreducible Markov games. Our algorithm is related to [Chen et al., 2021, Cen et al., 2021] and also builds on the entropy regularization technique. However, we remove their requirement of communications on the entropy values, making our algorithm entirely uncoupled.

Submitted 8 November, 2023; v1 submitted 5 March, 2023; originally announced March 2023.

Comments: To appear at NeurIPS 2023

arXiv:2303.01668 [pdf, other]

RePreM: Representation Pre-training with Masked Model for Reinforcement Learning

Authors: Yuanying Cai, Chuheng Zhang, Wei Shen, Xuyun Zhang, Wenjie Ruan, Longbo Huang

Abstract: Inspired by the recent success of sequence modeling in RL and the use of masked language model for pre-training, we propose a masked model for pre-training in RL, RePreM (Representation Pre-training with Masked Model), which trains the encoder combined with transformer blocks to predict the masked states or actions in a trajectory. RePreM is simple but effective compared to existing representation pre-training methods in RL. It avoids algorithmic sophistication (such as data augmentation or estimating multiple models) with sequence modeling and generates a representation that captures long-term dynamics well. Empirically, we demonstrate the effectiveness of RePreM in various tasks, including dynamic prediction, transfer learning, and sample-efficient RL with both value-based and actor-critic methods. Moreover, we show that RePreM scales well with dataset size, dataset quality, and the scale of the encoder, which indicates its potential towards big RL models.

Submitted 2 March, 2023; originally announced March 2023.

Comments: Accepted by AAAI-23

arXiv:2303.01503 [pdf, other]

FeatAug-DETR: Enriching One-to-Many Matching for DETRs with Feature Augmentation

Authors: Rongyao Fang, Peng Gao, Aojun Zhou, Yingjie Cai, Si Liu, Jifeng Dai, Hongsheng Li

Abstract: One-to-one matching is a crucial design in DETR-like object detection frameworks. It enables the DETR to perform end-to-end detection. However, it also faces challenges of lacking positive sample supervision and slow convergence speed. Several recent works proposed the one-to-many matching mechanism to accelerate training and boost detection performance. We revisit these methods and model them in a unified format of augmenting the object queries. In this paper, we propose two methods that realize one-to-many matching from a different perspective of augmenting images or image features. The first method is One-to-many Matching via Data Augmentation (denoted as DataAug-DETR). It spatially transforms the images and includes multiple augmented versions of each image in the same training batch. Such a simple augmentation strategy already achieves one-to-many matching and surprisingly improves DETR's performance. The second method is One-to-many matching via Feature Augmentation (denoted as FeatAug-DETR). Unlike DataAug-DETR, it augments the image features instead of the original images and includes multiple augmented features in the same batch to realize one-to-many matching. FeatAug-DETR significantly accelerates DETR training and boosts detection performance while keeping the inference speed unchanged. We conduct extensive experiments to evaluate the effectiveness of the proposed approach on DETR variants, including DAB-DETR, Deformable-DETR, and H-Deformable-DETR.
Without extra training data, FeatAug-DETR shortens the training convergence periods of Deformable-DETR to 24 epochs and achieves 58.3 AP on COCO val2017 set with Swin-L as the backbone.

Submitted 2 March, 2023; originally announced March 2023.

Comments: 12 pages, 6 figures

arXiv:2303.00288

The Race of mRNA therapy: Evidence from Patent Landscape

Authors: Jianxiong Ren, Xiaoming Zhang, Xingyong Si, Xiangjun Kong, Jinyu Cong, Jinping Wang, Xiang Li, Qianru Zhang, Peifen Yao, Mengyao Li, Yuanqi Cai, Zhaocai Sun, Kunmeng Liu, Benzheng Wei

Abstract: mRNA therapy is gaining worldwide attention as an emerging therapeutic approach. The widespread use of mRNA vaccines during the COVID-19 outbreak has demonstrated the potential of mRNA therapy. As mRNA-based drugs have expanded and their indications have broadened, more patents for mRNA innovations have emerged. The global patent landscape for mRNA therapy has not yet been analyzed, indicating a research gap in need of filling, from new technology to productization. This study uses social network analysis with the patent quality assessment to investigate the temporal trends, citation relationship, and significant litigation for 16,101 mRNA therapy patents and summarizes the hot topics and potential future directions for this industry. The information obtained in this study not only may be utilized as a tool of knowledge for researchers in a comprehensive and integrated way but can also provide inspiration for efficient production methods for mRNA drugs. This study shows that infectious diseases and cancer are currently the primary applications for mRNA drugs. Emerging patent activity and lawsuits in this field are demonstrating that delivery technology remains one of the key challenges in the field and that drug-targeting research in combination with vector technology will be one of the major directions for the industry going forward. With significant funding, new organizations have developed novel delivery technologies in an attempt to break into the patent thicket established by companies such as Arbutus.
The global mRNA therapeutic landscape is undergoing a multifaceted development pattern, and the monopoly of giant companies is being challenged.

Submitted 15 March, 2023; v1 submitted 1 March, 2023; originally announced March 2023.

Comments: I have received requests from co-authors and funding agencies to withdraw the manuscript

arXiv:2302.14819 [pdf, other]

ROG-Map: An Efficient Robocentric Occupancy Grid Map for Large-scene and High-resolution LiDAR-based Motion Planning

Authors: Yunfan Ren, Yixi Cai, Fangcheng Zhu, Siqi Liang, Fu Zhang

Abstract: Recent advances in LiDAR technology have opened up new possibilities for robotic navigation. Given the widespread use of occupancy grid maps (OGMs) in robotic motion planning, this paper aims to address the challenges of integrating LiDAR with OGMs. To this end, we propose ROG-Map, a uniform grid-based OGM that maintains a local map moving along with the robot to enable efficient map operation and reduce memory costs for large-scene autonomous flight. Moreover, we present a novel incremental obstacle inflation method that significantly reduces the computational cost of inflation. The proposed method outperforms state-of-the-art (SOTA) methods on various public datasets. To demonstrate the effectiveness and efficiency of ROG-Map, we integrate it into a complete quadrotor system and perform autonomous flights against both small obstacles and large-scale scenes. During real-world flight tests with a 0.05 m resolution local map and 30 m × 30 m × 12 m local map size, ROG-Map takes only 29.8% of frame time on average to update the map at a frame rate of 50 Hz (i.e., 5.96 ms in 20 ms), including 0.33% (i.e., 0.66 ms) to perform obstacle inflation, demonstrating outstanding real-world performance. We release ROG-Map as an open-source ROS package to promote the development of LiDAR-based motion planning.

Submitted 28 February, 2023; originally announced February 2023.

arXiv:2302.13477 [pdf, other]

Adaptive CSI Feedback for Deep Learning-Enabled Image Transmission

Authors: Guangyi Zhang, Qiyu Hu, Yunlong Cai, Guanding Yu

Abstract: Recently, deep learning-enabled joint-source channel coding (JSCC) has received increasing attention due to its great success in image transmission. However, most existing JSCC studies only focus on single-input single-output (SISO) channels. In this paper, we first propose a JSCC system for wireless image transmission over multiple-input multiple-output (MIMO) channels. As the complexity of an image determines its reconstruction difficulty, the JSCC achieves quite different reconstruction performances on different images. Moreover, we observe that the images with higher reconstruction qualities are generally more robust to the noise, and can be allocated with less communication resources than the images with lower reconstruction qualities. Based on this observation, we propose an adaptive channel state information (CSI) feedback scheme for precoding, which improves the effectiveness by adjusting the feedback overhead. In particular, we develop a performance evaluator to predict the reconstruction quality of each image, so that the proposed scheme can adaptively decrease the CSI feedback overhead for the transmitted images with high predicted reconstruction qualities in the JSCC system. We perform experiments to demonstrate that the proposed scheme can significantly improve the image transmission performance with much-reduced feedback overhead.

Submitted 26 February, 2023; originally announced February 2023.

arXiv:2302.12046 [pdf, ps, other]

Observation of Q-switched and continuous wave regimes with mode-hopping in Er-doped fiber lasers incorporating a dynamic population grating

Authors: Zengrun Wen, Xiulin Fan, Kaile Wang, Weiming Wang, Song Gao, Wenjing Hao, Yuanmei Gao, Yangjian Cai, Liren Zheng

Abstract: Dynamic population gratings (DPGs) in rare-earth doped fibers are prevalent devices in fiber lasers for the production of single-longitudinal-mode emission, Q-switched pulses, and wavelength self-sweeping regimes. This study presents a transition from Q-switched state to continuous wave (CW) state, accompanying irregular mode-hopping, in an erbium-doped fiber laser with a heavily-doped DPG centered at 1549.95 nm. Our results demonstrate that the transition between these two states can be achieved by adjusting the pump power. The repetition frequency of the Q-switched pulse increases monotonically with the increasing pump power, while the pulse duration initially narrows and then expands because the reduced peak intensity weakens the nonlinear effect. Additionally, modulation peaks are evident on both the Q-switched pulse train and the CW background, which are induced by the irregular mode-hopping caused by the DPG. Furthermore, we observe that the central wavelength fluctuates within a range of 0.05 nm. These results provide valuable insight into the DPG effect in heavily-doped fibers.

Submitted 23 February, 2023; originally announced February 2023.

arXiv:2302.09221 [pdf, other]

Moby: Empowering 2D Models for Efficient Point Cloud Analytics on the Edge

Authors: Jingzong Li, Yik Hong Cai, Libin Liu, Yu Mao, Chun Jason Xue, Hong Xu

Abstract: 3D object detection plays a pivotal role in many applications, most notably autonomous driving and robotics. These applications are commonly deployed on edge devices to promptly interact with the environment, and often require near real-time response. With limited computation power, it is challenging to execute 3D detection on the edge using highly complex neural networks. Common approaches such as offloading to the cloud induce significant latency overheads due to the large amount of point cloud data during transmission. To resolve the tension between wimpy edge devices and compute-intensive inference workloads, we explore the possibility of empowering fast 2D detection to extrapolate 3D bounding boxes. To this end, we present Moby, a novel system that demonstrates the feasibility and potential of our approach. We design a transformation pipeline for Moby that generates 3D bounding boxes efficiently and accurately based on 2D detection results without running 3D detectors. Further, we devise a frame offloading scheduler that decides when to launch the 3D detector judiciously in the cloud to avoid the errors from accumulating. Extensive evaluations on NVIDIA Jetson TX2 with real-world autonomous driving datasets demonstrate that Moby offers up to 91.9% latency improvement with modest accuracy loss over state of the art.

Submitted 4 September, 2023; v1 submitted 17 February, 2023; originally announced February 2023.

Comments: Accepted to ACM International Conference on Multimedia (MM) 2023

Showing 301–350 of 1,452 results for author: Cai, Y