
Feature Map for Quantum Data in Classification

Authors: Hyeokjea Kwon, Hojun Lee, Joonwoo Bae

Abstract: The kernel trick in supervised learning signifies transformations of an inner product by a feature map, which then restructures training data in a larger Hilbert space according to an endowed inner product. A quantum feature map corresponds to an instance with a Hilbert space of quantum states by fueling quantum resources to machine learning algorithms. In this work, we point out that the quantum state space is specific such that a measurement postulate characterizes an inner product and that manipulation of quantum states prepared from classical data cannot enhance the distinguishability of data points. We present a feature map for quantum data as a probabilistic manipulation of quantum states to improve supervised learning algorithms.

Submitted 3 June, 2024; v1 submitted 27 March, 2023; originally announced March 2023.

Comments: QCNC 2024 accepted paper (Main conference)

arXiv:2303.12793 [pdf, other]

CiCo: Domain-Aware Sign Language Retrieval via Cross-Lingual Contrastive Learning

Authors: Yiting Cheng, Fangyun Wei, Jianmin Bao, Dong Chen, Wenqiang Zhang

Abstract: This work focuses on sign language retrieval, a recently proposed task for sign language understanding. Sign language retrieval consists of two sub-tasks: text-to-sign-video (T2V) retrieval and sign-video-to-text (V2T) retrieval. Different from traditional video-text retrieval, sign language videos not only contain visual signals but also carry abundant semantic meanings by themselves due to the fact that sign languages are also natural languages. Considering this character, we formulate sign language retrieval as a cross-lingual retrieval problem as well as a video-text retrieval task. Concretely, we take into account the linguistic properties of both sign languages and natural languages, and simultaneously identify the fine-grained cross-lingual (i.e., sign-to-word) mappings while contrasting the texts and the sign videos in a joint embedding space. This process is termed as cross-lingual contrastive learning. Another challenge is raised by the data scarcity issue: sign language datasets are orders of magnitude smaller in scale than that of speech recognition. We alleviate this issue by adopting a domain-agnostic sign encoder pre-trained on large-scale sign videos into the target domain via pseudo-labeling. Our framework, termed as domain-aware sign language retrieval via Cross-lingual Contrastive learning or CiCo for short, outperforms the pioneering method by large margins on various datasets, e.g., +22.4 T2V and +28.0 V2T R@1 improvements on How2Sign dataset, and +13.7 T2V and +17.1 V2T R@1 improvements on PHOENIX-2014T dataset.
Code and models are available at: https://github.com/FangyunWei/SLRT.

Submitted 22 March, 2023; originally announced March 2023.

Comments: Accepted by CVPR 2023. Code and models are available at: https://github.com/FangyunWei/SLRT

arXiv:2303.12304 [pdf, other]

SiamTHN: Siamese Target Highlight Network for Visual Tracking

Authors: Jiahao Bao, Kaiqiang Chen, Xian Sun, Liang** Zhao, Wenhui Diao, Menglong Yan

Abstract: Siamese network based trackers develop rapidly in the field of visual object tracking in recent years. The majority of siamese network based trackers now in use treat each channel in the feature maps generated by the backbone network equally, making the similarity response map sensitive to background influence and hence challenging to focus on the target region. Additionally, there are no structural links between the classification and regression branches in these trackers, and the two branches are optimized separately during training. Therefore, there is a misalignment between the classification and regression branches, which results in less accurate tracking results. In this paper, a Target Highlight Module is proposed to help the generated similarity response maps to be more focused on the target region. To reduce the misalignment and produce more precise tracking results, we propose a corrective loss to train the model. The two branches of the model are jointly tuned with the use of corrective loss to produce more reliable prediction results. Experiments on 5 challenging benchmark datasets reveal that the method outperforms current models in terms of performance, and runs at 38 fps, proving its effectiveness and efficiency.

Submitted 22 March, 2023; originally announced March 2023.

arXiv:2303.09556 [pdf, other]

Efficient Diffusion Training via Min-SNR Weighting Strategy

Authors: Tiankai Hang, Shuyang Gu, Chen Li, Jianmin Bao, Dong Chen, Han Hu, Xin Geng, Baining Guo

Abstract: Denoising diffusion models have been a mainstream approach for image generation, however, training these models often suffers from slow convergence. In this paper, we discovered that the slow convergence is partly due to conflicting optimization directions between timesteps. To address this issue, we treat the diffusion training as a multi-task learning problem, and introduce a simple yet effective approach referred to as Min-SNR-$γ$. This method adapts loss weights of timesteps based on clamped signal-to-noise ratios, which effectively balances the conflicts among timesteps. Our results demonstrate a significant improvement in converging speed, 3.4$\times$ faster than previous weighting strategies. It is also more effective, achieving a new record FID score of 2.06 on the ImageNet $256\times256$ benchmark using smaller architectures than that employed in previous state-of-the-art. The code is available at https://github.com/TiankaiHang/Min-SNR-Diffusion-Training.

Submitted 11 March, 2024; v1 submitted 16 March, 2023; originally announced March 2023.
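
Illustrative sketch (not part of the arXiv listing): the abstract above says only that loss weights are based on clamped signal-to-noise ratios. The Python below shows one plausible reading of that sentence. The weight form min(SNR, gamma), the SNR definition alpha_bar / (1 - alpha_bar), the default gamma of 5.0, and both function names are assumptions made for illustration; the authors' repository linked in the abstract is the authoritative implementation.

```python
def snr_from_alpha_bar(alpha_bar: float) -> float:
    """Signal-to-noise ratio of a noised sample.

    Assumes the common DDPM-style convention SNR = alpha_bar / (1 - alpha_bar),
    where alpha_bar in (0, 1) is the cumulative signal fraction at a timestep.
    """
    if not 0.0 < alpha_bar < 1.0:
        raise ValueError("alpha_bar must lie strictly between 0 and 1")
    return alpha_bar / (1.0 - alpha_bar)


def clamped_snr_weight(snr: float, gamma: float = 5.0) -> float:
    """Hypothetical per-timestep loss weight: the SNR clamped from above at gamma.

    Low-noise timesteps (large SNR) are capped at gamma so they cannot
    dominate the total loss; high-noise timesteps keep their own SNR.
    """
    if snr < 0.0:
        raise ValueError("snr must be non-negative")
    return min(snr, gamma)


# Weights for a few illustrative timesteps, from noisy to nearly clean.
weights = [clamped_snr_weight(snr_from_alpha_bar(a)) for a in (0.1, 0.5, 0.99)]
```

Under these assumptions, every timestep whose SNR exceeds gamma receives the same weight, which is one way a clamp could balance the conflicts among timesteps that the abstract describes.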

arXiv:2303.09295 [pdf, other]

DIRE for Diffusion-Generated Image Detection

Authors: Zhendong Wang, Jianmin Bao, Wengang Zhou, Weilun Wang, Hezhen Hu, Hong Chen, Houqiang Li

Abstract: Diffusion models have shown remarkable success in visual synthesis, but have also raised concerns about potential abuse for malicious purposes. In this paper, we seek to build a detector for telling apart real images from diffusion-generated images. We find that existing detectors struggle to detect images generated by diffusion models, even if we include generated images from a specific diffusion model in their training data. To address this issue, we propose a novel image representation called DIffusion Reconstruction Error (DIRE), which measures the error between an input image and its reconstruction counterpart by a pre-trained diffusion model. We observe that diffusion-generated images can be approximately reconstructed by a diffusion model while real images cannot. It provides a hint that DIRE can serve as a bridge to distinguish generated and real images. DIRE provides an effective way to detect images generated by most diffusion models, and it is general for detecting generated images from unseen diffusion models and robust to various perturbations. Furthermore, we establish a comprehensive diffusion-generated benchmark including images generated by eight diffusion models to evaluate the performance of diffusion-generated image detectors. Extensive experiments on our collected benchmark demonstrate that DIRE exhibits superiority over previous generated-image detectors. The code and dataset are available at https://github.com/ZhendongWang6/DIRE.

Submitted 16 March, 2023; originally announced March 2023.

Comments: A general diffusion-generated image detector

arXiv:2303.06522 [pdf, other]

Token Sparsification for Faster Medical Image Segmentation

Authors: Lei Zhou, Huidong Liu, Joseph Bae, Junjun He, Dimitris Samaras, Prateek Prasanna

Abstract: Can we use sparse tokens for dense prediction, e.g., segmentation? Although token sparsification has been applied to Vision Transformers (ViT) to accelerate classification, it is still unknown how to perform segmentation from sparse tokens. To this end, we reformulate segmentation as a sparse encoding -> token completion -> dense decoding (SCD) pipeline. We first empirically show that naively applying existing approaches from classification token pruning and masked image modeling (MIM) leads to failure and inefficient training caused by inappropriate sampling algorithms and the low quality of the restored dense features. In this paper, we propose Soft-topK Token Pruning (STP) and Multi-layer Token Assembly (MTA) to address these problems. In sparse encoding, STP predicts token importance scores with a lightweight sub-network and samples the topK tokens. The intractable topK gradients are approximated through a continuous perturbed score distribution. In token completion, MTA restores a full token sequence by assembling both sparse output tokens and pruned multi-layer intermediate ones. The last dense decoding stage is compatible with existing segmentation decoders, e.g., UNETR. Experiments show SCD pipelines equipped with STP and MTA are much faster than baselines without token pruning in both training (up to 120% higher throughput and inference up to 60.6% higher throughput) while maintaining segmentation quality.

Submitted 11 March, 2023; originally announced March 2023.

Comments: Accepted to IPMI'23. Code will be available here: https://github.com/cvlab-stonybrook/TokenSparse-for-MedSeg

arXiv:2303.02328 [pdf, ps, other]

Decompose, Adjust, Compose: Effective Normalization by Playing with Frequency for Domain Generalization

Authors: Sangrok Lee, Jongseong Bae, Ha Young Kim

Abstract: Domain generalization (DG) is a principal task to evaluate the robustness of computer vision models. Many previous studies have used normalization for DG. In normalization, statistics and normalized features are regarded as style and content, respectively. However, it has a content variation problem when removing style because the boundary between content and style is unclear. This study addresses this problem from the frequency domain perspective, where amplitude and phase are considered as style and content, respectively. First, we verify the quantitative phase variation of normalization through the mathematical derivation of the Fourier transform formula. Then, based on this, we propose a novel normalization method, PCNorm, which eliminates style only as the preserving content through spectral decomposition. Furthermore, we propose advanced PCNorm variants, CCNorm and SCNorm, which adjust the degrees of variations in content and style, respectively. Thus, they can learn domain-agnostic representations for DG. With the normalization methods, we propose ResNet-variant models, DAC-P and DAC-SC, which are robust to the domain gap. The proposed models outperform other recent DG methods. The DAC-SC achieves an average state-of-the-art performance of 65.6% on five datasets: PACS, VLCS, Office-Home, DomainNet, and TerraIncognita.

Submitted 15 March, 2023; v1 submitted 4 March, 2023; originally announced March 2023.

Comments: 10 pages, 6 figures, Conference on Computer Vision and Pattern Recognition 2023

arXiv:2303.00251 [pdf, other]

doi 10.1109/TAC.2023.3298281

Distributed Data-driven Predictive Control via Dissipative Behavior Synthesis

Authors: Yitao Yan, Jie Bao, Biao Huang

Abstract: This paper presents a distributed data-driven predictive control (DDPC) approach using the behavioral framework. It aims to design a network of controllers for an interconnected system with linear time-invariant (LTI) subsystems such that a given global (network-wide) cost function is minimized while desired control performance (e.g., network stability and disturbance rejection) is achieved using dissipativity in the quadratic difference form (QdF). By viewing dissipativity as a behavior and integrating it into the control design as a virtual dynamical system, the proposed approach carries out the entire design process in a unified framework with a set-theoretic viewpoint. This leads to an effective data-driven distributed control design, where the global design goal can be achieved by distributed optimization based on the local QdF conditions. The approach is illustrated by an example throughout the paper.

Submitted 1 March, 2023; originally announced March 2023.

Journal ref: IEEE Transactions on Automatic Control, 2023

arXiv:2302.14391 [pdf, other]

doi 10.1063/5.0147570

Log-law recovery through reinforcement-learning wall model for large-eddy simulation

Authors: Aurélien Vadrot, Xiang I. A. Yang, H. Jane Bae, Mahdi Abkar

Abstract: This paper focuses on the use of reinforcement learning (RL) as a machine-learning (ML) modeling tool for near-wall turbulence. RL has demonstrated its effectiveness in solving high-dimensional problems, especially in domains such as games. Despite its potential, RL is still not widely used for turbulence modeling and is primarily used for flow control and optimization purposes. A new RL wall model (WM) called VYBA23 is developed in this work, which uses agents dispersed in the flow near the wall. The model is trained on a single Reynolds number ($Re_τ= 10^4$) and does not rely on high-fidelity data, as the back-propagation process is based on a reward rather than output error. The states of the RLWM, which are the representation of the environment by the agents, are normalized to remove dependence on the Reynolds number. The model is tested and compared to another RLWM (BK22) and to an equilibrium wall model, in a half-channel flow at eleven different Reynolds numbers ($Re_τ\in [180;10^{10}]$). The effects of varying agents' parameters such as actions range, time-step, and spacing are also studied. The results are promising, showing little effect on the average flow field but some effect on wall-shear stress fluctuations and velocity fluctuations. This work offers positive prospects for developing RLWMs that can recover physical laws, and for extending this type of ML models to more complex flows in the future.

Submitted 2 May, 2023; v1 submitted 28 February, 2023; originally announced February 2023.

Comments: arXiv admin note: text overlap with arXiv:2211.03614

arXiv:2302.14331 [pdf]

Lifetime-configurable soft robots via photodegradable silicone elastomer composites

Authors: Min-Ha Oh, Young-Hwan Kim, Seung-Min Lee, Gyeong-Seok Hwang, Kyung-Sub Kim, Jae-Young Bae, Ju-Young Kim, Ju-Yong Lee, Yu-Chan Kim, Sang Yup Kim, Seung-Kyun Kang

Abstract: Developing soft robots that can control their own life-cycle and degrade on-demand while maintaining hyper-elasticity is a significant research challenge. On-demand degradable soft robots, which conserve their original functionality during operation and rapidly degrade under specific external stimulation, present the opportunity to self-direct the disappearance of temporary robots. This study proposes soft robots and materials that exhibit excellent mechanical stretchability and can degrade under ultraviolet (UV) light by mixing a fluoride-generating diphenyliodonium hexafluorophosphate (DPI-HFP) with a silicone resin. Spectroscopic analysis revealed the mechanism of Si-O-Si backbone cleavage using fluoride ion (F-), which was generated from UV exposed DPI-HFP. Furthermore, photo-differential scanning calorimetry (DSC) based thermal analysis indicated increased decomposition kinetics at increased temperatures. Additionally, we demonstrated a robotics application of this composite by fabricating a gaiting robot. The integration of soft electronics, including strain sensors, temperature sensors, and photodetectors, expanded the robotic functionalities. This study provides a simple yet novel strategy for designing lifecycle mimicking soft robotics that can be applied to reduce soft robotics waste, explore hazardous areas where retrieval of robots is impossible, and ensure hardware security with on-demand destructive material platforms.

Submitted 28 February, 2023; originally announced February 2023.

Comments: 58 pages, 6 figures, 2 Supplementary Text, 15 Supplementary figures, 1 movie

arXiv:2302.12824 [pdf, other]

doi 10.1051/0004-6361/202245577

Disk Evolution Study Through Imaging of Nearby Young Stars (DESTINYS): Diverse outcomes of binary-disk interactions

Authors: Yapeng Zhang, Christian Ginski, Jane Huang, Alice Zurlo, Hervé Beust, Jaehan Bae, Myriam Benisty, Antonio Garufi, Michiel R. Hogerheijde, Rob G. van Holstein, Matthew Kenworthy, Maud Langlois, Carlo F. Manara, Paola Pinilla, Christian Rab, Álvaro Ribas, Giovanni P. Rosotti, Jonathan Williams

Abstract: Circumstellar disks do not evolve in isolation, as about half of solar-type stars were born in binary or multiple systems. Resolving disks in binary systems provides the opportunity to examine the influence of stellar companions on the outcomes of planet formation. We aim to investigate and compare disks in stellar multiple systems with near-infrared scattered-light imaging as part of the Disk Evolution Study Through Imaging of Nearby Young Stars (DESTINYS) program. We used polarimetric differential imaging with SPHERE/IRDIS at the VLT to search for scattered light from the circumstellar disks in three multiple systems, CHX 22, S CrA, and HP Cha. We performed astrometric and orbit analyses for the stellar companions using archival HST, VLT/NACO, and SPHERE data. Combined with the age and orbital constraints, the observed disk structures provide insights into the evolutionary history and the impact of the stellar companions. The small grains in CHX 22 form a tail-like structure surrounding the close binary, which likely results from a close encounter and capture of a cloudlet. S CrA shows intricate structures (tentative ringed and spiral features) in the circumprimary disk as a possible consequence of perturbations by companions. The circumsecondary disk is truncated and connected to the primary disk via a streamer, suggesting tidal interactions. In HP Cha, the primary disk is less disturbed and features a tenuous streamer, through which the material flows towards the companions.
The comparison of the three systems spans a wide range of binary separation (50 - 500 au) and illustrates the decreasing influence on disk structures with the distance of companions. This agrees with the statistical analysis of exoplanet population in binaries, that planet formation is likely obstructed around close binary systems, while it is not suppressed in wide binaries.

Submitted 24 February, 2023; originally announced February 2023.

Comments: 19 pages, 6 figures, accepted for publication in A&A

Journal ref: A&A 672, A145 (2023)

arXiv:2302.09192 [pdf, other]

doi 10.1038/s41567-024-02400-8

Observation of Josephson Harmonics in Tunnel Junctions

Authors: Dennis Willsch, Dennis Rieger, Patrick Winkel, Madita Willsch, Christian Dickel, Jonas Krause, Yoichi Ando, Raphaël Lescanne, Zaki Leghtas, Nicholas T. Bronn, Pratiti Deb, Olivia Lanes, Zlatko K. Minev, Benedikt Dennig, Simon Geisert, Simon Günzler, Sören Ihssen, Patrick Paluch, Thomas Reisinger, Roudy Hanna, ** Hee Bae, Peter Schüffelgen, Detlev Grützmacher, Luiza Buimaga-Iarinca, Cristian Morari, et al. (5 additional authors not shown)

Abstract: Superconducting quantum processors have a long road ahead to reach fault-tolerant quantum computing. One of the most daunting challenges is taming the numerous microscopic degrees of freedom ubiquitous in solid-state devices. State-of-the-art technologies, including the world's largest quantum processors, employ aluminum oxide (AlO$_x$) tunnel Josephson junctions (JJs) as sources of nonlinearity, assuming an idealized pure $\sin\varphi$ current-phase relation (C$\varphi$R). However, this celebrated $\sin\varphi$ C$\varphi$R is only expected to occur in the limit of vanishingly low-transparency channels in the AlO$_x$ barrier. Here we show that the standard C$\varphi$R fails to accurately describe the energy spectra of transmon artificial atoms across various samples and laboratories. Instead, a mesoscopic model of tunneling through an inhomogeneous AlO$_x$ barrier predicts %-level contributions from higher Josephson harmonics. By including these in the transmon Hamiltonian, we obtain orders of magnitude better agreement between the computed and measured energy spectra. The reality of Josephson harmonics transforms qubit design and prompts a reevaluation of models for quantum gates and readout, parametric amplification and mixing, Floquet qubits, protected Josephson qubits, etc. As an example, we show that engineered Josephson harmonics can reduce the charge dispersion and the associated errors in transmon qubits by an order of magnitude, while preserving anharmonicity.

Submitted 22 August, 2023; v1 submitted 17 February, 2023; originally announced February 2023.

Journal ref: Nat. Phys. 20, 815 (2024)

arXiv:2302.06987 [pdf, other]

Existence of entire solutions to the Lagrangian mean curvature equations in supercritical phase

Authors: Zixiao Liu, Cong Wang, Jiguang Bao

Abstract: In this paper, we establish the existence and uniqueness theorem of entire solutions to the Lagrangian mean curvature equations with prescribed asymptotic behavior at infinity. The phase functions are assumed to be supercritical and converge to a constant in a certain rate at infinity. The basic idea is to establish uniform estimates for the approximating problems defined on bounded domains and the main ingredient is to construct appropriate subsolutions and supersolutions as barrier functions. We also prove a nonexistence result to show the convergence rate of the phase functions is optimal.

Submitted 14 February, 2023; originally announced February 2023.

MSC Class: 35A01; 35J60; 35B40

arXiv:2302.04407 [pdf, other]

Bayesian Non-parametric Hidden Markov Model for Agile Radar Pulse Sequences Streaming Analysis

Authors: Jiadi Bao, Yunjie Li, Mengtao Zhu, Shafei Wang

Abstract: Multi-function radars (MFRs) are sophisticated types of sensors with the capabilities of complex agile inter-pulse modulation implementation and dynamic work mode scheduling. The developments in MFRs pose great challenges to modern electronic reconnaissance systems or radar warning receivers for recognition and inference of MFR work modes. To address this issue, this paper proposes an online processing framework for parameter estimation and change point detection of MFR work modes. At first, this paper designed a fully-conjugate Bayesian non-parametric hidden Markov model with a designed prior distribution (agile BNP-HMM) to represent the MFR pulse agility characteristics. The proposed model allows fully-variational Bayesian inference. Then, the proposed framework is constructed by two main parts. The first part is the agile BNP-HMM model for automatically inferring the number of HMM hidden states and emission distribution of the corresponding hidden states. An estimation error lower bound on performance is derived and the proposed algorithm is shown to be close to the bound. The second part utilizes the streaming Bayesian updating to facilitate computation, and designed an online work mode change detection framework based upon a weighted sequential probability ratio test. We demonstrate that the proposed framework is consistently highly effective and robust to baseline methods on diverse simulated data-sets.

Submitted 22 August, 2023; v1 submitted 8 February, 2023; originally announced February 2023.

Comments: 15 pages, 10 figures, submitted to IEEE transactions on signal processing

arXiv:2302.04308 [pdf, other]

Enhancing Modality-Agnostic Representations via Meta-Learning for Brain Tumor Segmentation

Authors: Aishik Konwer, Xiaoling Hu, Joseph Bae, Xuan Xu, Chao Chen, Prateek Prasanna

Abstract: In medical vision, different imaging modalities provide complementary information. However, in practice, not all modalities may be available during inference or even training. Previous approaches, e.g., knowledge distillation or image synthesis, often assume the availability of full modalities for all patients during training; this is unrealistic and impractical due to the variability in data collection across sites. We propose a novel approach to learn enhanced modality-agnostic representations by employing a meta-learning strategy in training, even when only limited full modality samples are available. Meta-learning enhances partial modality representations to full modality representations by meta-training on partial modality data and meta-testing on limited full modality samples. Additionally, we co-supervise this feature enrichment by introducing an auxiliary adversarial learning branch. More specifically, a missing modality detector is used as a discriminator to mimic the full modality setting. Our segmentation framework significantly outperforms state-of-the-art brain tumor segmentation techniques in missing modality scenarios.

Submitted 22 August, 2023; v1 submitted 8 February, 2023; originally announced February 2023.

Comments: Accepted in ICCV 2023

arXiv:2302.03519 [pdf, other]

Efficient Parametric Approximations of Neural Network Function Space Distance

Authors: Nikita Dhawan, Sicong Huang, Juhan Bae, Roger Grosse

Abstract: It is often useful to compactly summarize important properties of model parameters and training data so that they can be used later without storing and/or iterating over the entire dataset. As a specific case, we consider estimating the Function Space Distance (FSD) over a training set, i.e. the average discrepancy between the outputs of two neural networks. We propose a Linearized Activation Function TRick (LAFTR) and derive an efficient approximation to FSD for ReLU neural networks. The key idea is to approximate the architecture as a linear network with stochastic gating. Despite requiring only one parameter per unit of the network, our approach outcompetes other parametric approximations with larger memory requirements. Applied to continual learning, our parametric approximation is competitive with state-of-the-art nonparametric approximations, which require storing many training examples. Furthermore, we show its efficacy in estimating influence functions accurately and detecting mislabeled examples without expensive iterations over the entire dataset.

Submitted 28 May, 2023; v1 submitted 7 February, 2023; originally announced February 2023.

Comments: 18 pages, 5 figures, ICML 2023

arXiv:2302.00319 [pdf, other]

Development of deep biological ages aware of morbidity and mortality based on unsupervised and semi-supervised deep learning approaches

Authors: Seong-Eun Moon, Ji Won Yoon, Shinyoung Joo, Yoohyung Kim, Jae Hyun Bae, Seokho Yoon, Haanju Yoo, Young Min Cho

Abstract: Background: While deep learning technology, which has the capability of obtaining latent representations based on large-scale data, can be a potential solution for the discovery of a novel aging biomarker, existing deep learning methods for biological age estimation usually depend on chronological ages and lack consideration of mortality and morbidity, which are the most significant outcomes of aging. Methods: This paper proposes a novel deep learning model to learn latent representations of biological aging in regard to subjects' morbidity and mortality. The model utilizes health check-up data in addition to morbidity and mortality information to learn the complex relationships between aging and measured clinical attributes. Findings: The proposed model is evaluated on a large dataset of general populations compared with KDM and other learning-based models. Results demonstrate that biological ages obtained by the proposed model have superior discriminability of subjects' morbidity and mortality.

Submitted 1 February, 2023; originally announced February 2023.

arXiv:2301.11014 [pdf, other]

Privacy-Preserving Joint Edge Association and Power Optimization for the Internet of Vehicles via Federated Multi-Agent Reinforcement Learning

Authors: Yan Lin, **ming Bao, Yi** Zhang, Jun Li, Feng Shu, Lajos Hanzo

Abstract: Proactive edge association is capable of improving wireless connectivity at the cost of increased handover (HO) frequency and energy consumption, while relying on a large amount of private information sharing required for decision making. In order to improve the connectivity-cost trade-off without privacy leakage, we investigate the privacy-preserving joint edge association and power allocation (JEAPA) problem in the face of the environmental uncertainty and the infeasibility of individual learning. Upon modelling the problem by a decentralized partially observable Markov Decision Process (Dec-POMDP), it is solved by federated multi-agent reinforcement learning (FMARL) through only sharing encrypted training data for federatively learning the policy sought. Our simulation results show that the proposed solution strikes a compelling trade-off, while preserving a higher privacy level than the state-of-the-art solutions.

Submitted 26 January, 2023; originally announced January 2023.

Comments: 6 pages, 4 figures, IEEE Trans. on Veh. Technol

arXiv:2301.10796 [pdf, other]

doi 10.1016/j.nima.2023.168024

Reconstruction of Fast Neutron Direction in Segmented Organic Detectors using Deep Learning

Authors: Jun Woo Bae, Tingshiuan C. Wu, Igor Jovanovic

Abstract: A method for reconstructing the direction of a fast neutron source using a segmented organic scintillator-based detector and a deep learning model is proposed and analyzed. The model is based on a recurrent neural network, which can be trained by a sequence of data obtained from an event recorded in the detector and suitably pre-processed. The performance of the deep learning-based model is compared with the conventional double-scatter detection algorithm in reconstructing the direction of a fast neutron source. With the deep learning model, the uncertainty in source direction of 0.301 rad is achieved with 100 neutron detection events in a segmented cubic organic scintillator detector with a side length of 46 mm. To reconstruct the source direction with the same angular resolution as the double-scatter algorithm, the deep learning method requires 75% fewer events. Application of this method could augment the operation of segmented detectors operated in the neutron scatter camera configuration for applications such as special nuclear material detection.

Submitted 25 January, 2023; originally announced January 2023.

Comments: 15 pages. 9 figures. Preprint submitted to Elsevier August 2022

Journal ref: Nuclear Instruments and Methods in Physics Research Section A: Accelerators, Spectrometers, Detectors and Associated Equipment (2023) 1049 168024

arXiv:2301.10772 [pdf]

Gene-SGAN: a method for discovering disease subtypes with imaging and genetic signatures via multi-view weakly-supervised deep clustering

Authors: Zhijian Yang, Junhao Wen, Ahmed Abdulkadir, Yuhan Cui, Guray Erus, Elizabeth Mamourian, Randa Melhem, Dhivya Srinivasan, Sindhuja T. Govindarajan, Jiong Chen, Mohamad Habes, Colin L. Masters, Paul Maruff, Jurgen Fripp, Luigi Ferrucci, Marilyn S. Albert, Sterling C. Johnson, John C. Morris, Pamela LaMontagne, Daniel S. Marcus, Tammie L. S. Benzinger, David A. Wolk, Li Shen, **gxuan Bao, Susan M. Resnick , et al. (3 additional authors not shown)

Abstract: Disease heterogeneity has been a critical challenge for precision diagnosis and treatment, especially in neurologic and neuropsychiatric diseases. Many diseases can display multiple distinct brain phenotypes across individuals, potentially reflecting disease subtypes that can be captured using MRI and machine learning methods. However, biological interpretability and treatment relevance are limited if the derived subtypes are not associated with genetic drivers or susceptibility factors. Herein, we describe Gene-SGAN - a multi-view, weakly-supervised deep clustering method - which dissects disease heterogeneity by jointly considering phenotypic and genetic data, thereby conferring genetic correlations to the disease subtypes and associated endophenotypic signatures. We first validate the generalizability, interpretability, and robustness of Gene-SGAN in semi-synthetic experiments. We then demonstrate its application to real multi-site datasets from 28,858 individuals, deriving subtypes of Alzheimer's disease and brain endophenotypes associated with hypertension, from MRI and SNP data. Derived brain phenotypes displayed significant differences in neuroanatomical patterns, genetic determinants, biological and clinical biomarkers, indicating potentially distinct underlying neuropathologic processes, genetic drivers, and susceptibility factors. Overall, Gene-SGAN is broadly applicable to disease subtyping and endophenotype discovery, and is herein tested on disease-related, genetically-driven neuroimaging phenotypes.

Submitted 25 January, 2023; originally announced January 2023.

arXiv:2301.09091 [pdf, other]

BallGAN: 3D-aware Image Synthesis with a Spherical Background

Authors: Minjung Shin, Yunji Seo, Jeongmin Bae, Young Sun Choi, Hyunsu Kim, Hyeran Byun, Youngjung Uh

Abstract: 3D-aware GANs aim to synthesize realistic 3D scenes such that they can be rendered in arbitrary perspectives to produce images. Although previous methods produce realistic images, they suffer from unstable training or degenerate solutions where the 3D geometry is unnatural. We hypothesize that the 3D geometry is underdetermined due to the insufficient constraint, i.e., being classified as a real image by the discriminator is not enough. To solve this problem, we propose to approximate the background as a spherical surface and represent a scene as a union of the foreground placed in the sphere and the thin spherical background. This reduces the degrees of freedom in the background field. Accordingly, we modify the volume rendering equation and incorporate dedicated constraints to design a novel 3D-aware GAN framework named BallGAN. BallGAN has multiple advantages as follows. 1) It produces more reasonable 3D geometry; the images of a scene across different viewpoints have better photometric consistency and fidelity than the state-of-the-art methods. 2) The training becomes much more stable. 3) The foreground can be separately rendered on top of different arbitrary backgrounds.

Submitted 24 August, 2023; v1 submitted 22 January, 2023; originally announced January 2023.

Comments: ICCV 2023, Project Page: https://minjung-s.github.io/ballgan

arXiv:2301.06734 [pdf, other]

doi 10.1103/PhysRevFluids.8.064612

Towards real-time reconstruction of velocity fluctuations in turbulent channel flow

Authors: Rahul Arun, H. Jane Bae, Beverley J. McKeon

Abstract: We develop a framework for efficient streaming reconstructions of turbulent velocity fluctuations from limited sensor measurements with the goal of enabling real-time applications. The reconstruction process is simplified by computing linear estimators using flow statistics from an initial training period and evaluating their performance during a subsequent testing period with data obtained from direct numerical simulation. We address cases where (i) no, (ii) limited, and (iii) full-field training data are available using estimators based on (i) resolvent modes, (ii) resolvent-based estimation, and (iii) spectral proper orthogonal decomposition modes. During training, we introduce blockwise inversion to accurately and efficiently compute the resolvent operator in an interpretable manner. During testing, we enable efficient streaming reconstructions by using a temporal sliding discrete Fourier transform to recursively update Fourier coefficients using incoming measurements. We use this framework to reconstruct with minimal time delay the turbulent velocity fluctuations in a minimal channel at ${\rm Re}_\tau \approx 186$ from sparse planar measurements. We evaluate reconstruction accuracy in the context of the extent of data required and thereby identify potential use cases for each estimator. The reconstructions capture large portions of the dynamics from relatively few measurement planes when the linear estimators are computed with sufficient fidelity. We also evaluate the efficiency of our reconstructions and show that the present framework has the potential to help enable real-time reconstructions of turbulent velocity fluctuations in an analogous experimental setting.

Submitted 27 June, 2023; v1 submitted 17 January, 2023; originally announced January 2023.

Comments: 36 pages, 22 figures, accepted by Physical Review Fluids

Journal ref: Physical Review Fluids 8, 064612 (2023)

arXiv:2301.04104 [pdf, other]

Mastering Diverse Domains through World Models

Authors: Danijar Hafner, Jurgis Pasukonis, Jimmy Ba, Timothy Lillicrap

Abstract: Developing a general algorithm that learns to solve tasks across a wide range of applications has been a fundamental challenge in artificial intelligence. Although current reinforcement learning algorithms can be readily applied to tasks similar to what they have been developed for, configuring them for new application domains requires significant human expertise and experimentation. We present DreamerV3, a general algorithm that outperforms specialized methods across over 150 diverse tasks, with a single configuration. Dreamer learns a model of the environment and improves its behavior by imagining future scenarios. Robustness techniques based on normalization, balancing, and transformations enable stable learning across domains. Applied out of the box, Dreamer is the first algorithm to collect diamonds in Minecraft from scratch without human data or curricula. This achievement has been posed as a significant challenge in artificial intelligence that requires exploring farsighted strategies from pixels and sparse rewards in an open world. Our work allows solving challenging control problems without extensive experimentation, making reinforcement learning broadly applicable.

Submitted 17 April, 2024; v1 submitted 10 January, 2023; originally announced January 2023.

Comments: Website: https://danijar.com/dreamerv3

arXiv:2301.03629 [pdf, other]

doi 10.1051/0004-6361/202345868

On the correlation between dark matter, intracluster light and globular cluster distribution in SMACS0723

Authors: J. M. Diego, M. Pascale, B. Frye, A. Zitrin, T. Broadhurst, G. Mahler, G. B. Caminha, M. Jauzac, Myung Gyoon Lee, Jang Ho Bae, In Sung Jang, Mireia Montes

Abstract: We present a free-form model of SMACS0723, the first cluster observed with JWST. This model makes no strong assumptions about the distribution of mass (mostly dark matter) in the cluster and we use it to study the possible correlation of dark matter with the intracluster light and the distribution of globular clusters. To explore the uncertainty in mass modelling, we derive three lens models based on spectroscopically confirmed systems and new candidate systems with redshifts predicted by the lens model derived from the spectroscopic systems. We find that beyond the radius of influence of the BCG, the total mass does not trace the ICL, implying the need for a dark component (dark matter). Two loop-like structures observed in the intracluster light do not have an obvious correspondence with the total mass (mostly dark matter) distribution. The radial profiles of the ICL and the distribution of globular clusters are similar to each other, but steeper than the profile of the lens model. More specifically, we find that the total mass is shallower by 1 dex in log scale than both ICL and globular cluster profiles. This is in excellent agreement with N-body simulations of cold dark matter.

Submitted 9 January, 2023; originally announced January 2023.

Comments: 12 pages, 8 figures

Journal ref: A&A 679, A159 (2023)

arXiv:2301.03169 [pdf, other]

A Study on the Generality of Neural Network Structures for Monocular Depth Estimation

Authors: **woo Bae, Kyumin Hwang, Sunghoon Im

Abstract: Monocular depth estimation has been widely studied, and significant improvements in performance have been recently reported. However, most previous works are evaluated on a few benchmark datasets, such as KITTI datasets, and none of the works provide an in-depth analysis of the generalization performance of monocular depth estimation. In this paper, we deeply investigate the various backbone networks (e.g., CNN and Transformer models) toward the generalization of monocular depth estimation. First, we evaluate state-of-the-art models on both in-distribution and out-of-distribution datasets, which have never been seen during network training. Then, we investigate the internal properties of the representations from the intermediate layers of CNN-/Transformer-based models using synthetic texture-shifted datasets. Through extensive experiments, we observe that the Transformers exhibit a strong shape-bias, whereas CNNs have a strong texture-bias. We also discover that texture-biased models exhibit worse generalization performance for monocular depth estimation than shape-biased models. We demonstrate that similar aspects are observed in real-world driving datasets captured under diverse environments. Lastly, we conduct a dense ablation study with various backbone networks which are utilized in modern strategies. The experiments demonstrate that the intrinsic locality of the CNNs and the self-attention of the Transformers induce texture-bias and shape-bias, respectively.

Submitted 10 December, 2023; v1 submitted 8 January, 2023; originally announced January 2023.

Comments: Accepted in TPAMI

arXiv:2301.02674 [pdf, other]

doi 10.3847/1538-4357/aca89c

Molecular Mapping of DR Tau's Protoplanetary Disk, Envelope, Outflow, and Large-Scale Spiral Arm

Authors: Jane Huang, Edwin A. Bergin, Jaehan Bae, Myriam Benisty, Sean M. Andrews

Abstract: DR Tau has been noted for its unusually high variability in comparison with other T Tauri stars. Although it is one of the most extensively studied pre-main sequence stars, observations with millimeter interferometry have so far been relatively limited. We present NOEMA images of $^{12}$CO, $^{13}$CO, C$^{18}$O, SO, DCO$^+$, and H$_2$CO toward DR Tau at a resolution of $\sim0.5''$ ($\sim100$ au). In addition to the protoplanetary disk, CO emission reveals an envelope, a faint asymmetric outflow, and a spiral arm with a clump. The $\sim1200$ au extent of the CO arm far exceeds that of the spiral arms previously detected in scattered light, which underlines the necessity of sensitive molecular imaging for contextualizing the disk environment. The kinematics and compact emission distribution of C$^{18}$O, SO, DCO$^+$, and H$_2$CO indicate that they originate primarily from within the Keplerian circumstellar disk. The SO emission, though, also exhibits an asymmetry that may be due to interaction with infalling material or unresolved substructure. The complex environment of DR Tau is reminiscent of those of outbursting FUor sources and some EXor sources, suggesting that DR Tau's extreme stellar activity could likewise be linked to disk instabilities promoted by large-scale infall.

Submitted 6 January, 2023; originally announced January 2023.

Comments: 29 pages, accepted by ApJ

arXiv:2301.01792 [pdf, other]

Saturation of fishbone instability by self-generated zonal flows in tokamak plasmas

Authors: G. Brochard, C. Liu, X. Wei, W. Heidbrink, Z. Lin, N. Gorelenkov, C. Chrystal, X. Du, J. Bao, A. R. Polevoi, M. Schneider, S. H. Kim, S. D. Pinches, P. Liu, J. H. Nicolau, H. Lütjens

Abstract: Gyrokinetic simulations of the fishbone instability in DIII-D tokamak plasmas find that self-generated zonal flows can dominate the nonlinear saturation by preventing coherent structures from persisting or drifting in the energetic particle phase space when the mode frequency down-chirps. Results from the simulation with zonal flows agree quantitatively, for the first time, with experimental measurements of the fishbone saturation amplitude and energetic particle transport. Moreover, the fishbone-induced zonal flows are likely responsible for the formation of an internal transport barrier that was observed after fishbone bursts in this DIII-D experiment. Finally, gyrokinetic simulations of a related ITER baseline scenario show that the fishbone induces insignificant energetic particle redistribution and may enable high performance scenarios in ITER burning plasma experiments.

Submitted 22 January, 2024; v1 submitted 4 January, 2023; originally announced January 2023.

Comments: Accepted in Physical Review Letters : https://journals.aps.org/prl/accepted/b0073Yf2Q4c1a78cb8611348c9fa5932b12922776

Journal ref: Physical Review Letters 2024

arXiv:2301.01684 [pdf, other]

doi 10.1051/0004-6361/202245381

A kinematically detected planet candidate in a transition disk

Authors: Jochen Stadler, Myriam Benisty, Andrés F. Izquierdo, Stefano Facchini, Richard Teague, Nicolas Kurtovic, Paola Pinilla, Jaehan Bae, Megan Ansdell, Ryan Loomis, Satoshi Mayama, Laura M. Pérez, Leonardo Testi

Abstract: Transition disks are protoplanetary disks with inner cavities possibly cleared by massive companions, which makes them prime targets to observe at high resolution to map their velocity structure. We present ALMA Band 6 dust and gas observations of the transition disk around RXJ1604.3-2130 A, known to feature nearly symmetric shadows in scattered light. We studied the $^{12}$CO line channel maps and moment maps of the line of sight velocity and peak intensity. We fitted a Keplerian model of the channel-by-channel emission to study line profile differences and produced deprojected radial profiles for all velocity components. The $^{12}$CO emission shows a cavity inwards of $\sim$56 au and within the dust continuum ring at 81 au. The azimuthal brightness variations in the $^{12}$CO line and dust continuum are broadly aligned with the shadows detected in scattered-light observations. We find a strong localized non-Keplerian feature toward the west within the continuum ring (at $R=41\pm 10$ au and $PA=280\pm 2 ^\circ$). A tightly wound spiral is also detected which extends over 300$^\circ$ in azimuth, possibly connected to the localized non-Keplerian feature. Finally, a bending of the iso-velocity contours within the gas cavity indicates a highly perturbed inner region, possibly related to the presence of a misaligned inner disk. While broadly aligned with the scattered-light shadows, the localized non-Keplerian feature cannot be solely due to changes in temperature. Instead, we interpret the kinematical feature as tracing a massive companion located at the edge of the dust continuum ring. We speculate that the spiral is caused by buoyancy resonances driven by planet-disk interactions. However, this potential planet at $\sim$41 au cannot explain the gas-depleted cavity, the low accretion rate, and the misaligned inner disk, which suggests the presence of another companion closer in.

Submitted 11 January, 2023; v1 submitted 4 January, 2023; originally announced January 2023.

Comments: 13 pages, 14 figures. To be published in Astronomy & Astrophysics Letters

Journal ref: A&A 670, L1 (2023)

arXiv:2301.01486 [pdf, other]

doi 10.1051/0004-6361/202245192

Disk Evolution Study Through Imaging of Nearby Young Stars (DESTINYS): Characterization of the young star T CrA and its circumstellar environment

Authors: E. Rigliaco, R. Gratton, S. Ceppi, C. Ginski, M. Hogerheijde, M. Benisty, T. Birnstiel, M. Dima, S. Facchini, A. Garufi, J. Bae, M. Langlois, G. Lodato, E. Mamajek, C. F. Manara, F. Ménard, Á. Ribas, A. Zurlo

Abstract: Birth environments of young stars leave strong imprints on the stars themselves and their surroundings. We present a detailed analysis of the rich circumstellar environment around the young Herbig Ae/Be star T CrA. Our aim is to understand the nature of the stellar system and the extended circumstellar structures as seen in scattered light images. We conduct our analysis combining archival data and new adaptive optics high-contrast and high-resolution images. The scattered light images reveal the presence of a complex environment composed of a bright forward scattering rim of the disk's surface that is seen at very high inclination, a dark lane of the disk midplane, bipolar outflows, and streamer features likely tracing infalling material from the surrounding birth cloud onto the disk. The analysis of the light curve suggests the star is a binary with a period of 29.6 yr. The comparison of the scattered light images with ALMA continuum and $^{12}$CO line emission shows the disk is in Keplerian rotation, with the northern side of the outflowing material receding, while the southern side is approaching the observer. The disk is itself seen edge-on. The direction of the outflows seen in scattered light is in agreement with the direction of the more distant molecular hydrogen emission-line objects (MHOs) associated with the star. Modeling of the SED using a radiative transfer scheme agrees well with the proposed configuration, as does the hydrodynamical simulation performed using a Smoothed Particle Hydrodynamics code. We find evidence of streamers of accreting material around T CrA. These streamers connect the filament along which T CrA is forming with the outer parts of the disk, suggesting that the strong misalignment between the inner and outer disk is due to a change in the direction of the angular momentum of the material accreting on the disk during the late phase of star formation.

Submitted 4 January, 2023; originally announced January 2023.

Comments: 17 pages, 14 figures

Journal ref: A&A 671, A82 (2023)

arXiv:2301.00663 [pdf, other]

A Survey of Toric Quivers and BPS Algebras

Authors: Jiakang Bao

Abstract: In this note, we discuss some properties of the quiver BPS algebras. We consider how they would transform under different operations on the toric quivers, such as dualities and higgsing. We also give free field realizations of the algebras, in particular for the chiral quivers.

Submitted 10 April, 2023; v1 submitted 2 January, 2023; originally announced January 2023.

Comments: 40 pages; v4: minor corrections

arXiv:2212.14820 [pdf, other]

doi 10.1038/s41598-023-37771-0

On the structure of mirrored operators obtained from optimal entanglement witnesses

Authors: Anindita Bera, Joonwoo Bae, Beatrix C. Hiesmayr, Dariusz Chruściński

Abstract: Entanglement witnesses (EWs) are a versatile tool in the verification of entangled states. The framework of mirrored EW doubles the power of a given EW by introducing its twin -- a mirrored EW -- whereby two EWs related by mirroring can bound the set of separable states more efficiently. In this work, we investigate the relation between EWs and their mirrored ones, and present a conjecture which claims that the mirrored operator obtained from an optimal EW is either a positive operator or a decomposable EW, which implies that positive-partial-transpose entangled states, also known as the bound entangled states, cannot be detected. This conjecture is reached by studying numerous known examples of optimal EWs. However, the mirrored EWs obtained from the non-optimal ones can be non-decomposable as well. We also show that mirrored operators obtained from the extremal decomposable witnesses are positive semi-definite. Interestingly, the witnesses that violate the well-known conjecture of Structural Physical Approximation do satisfy our conjecture. The intricate relation between these two conjectures is discussed and it reveals a novel structure of the separability problem.

Submitted 30 December, 2022; originally announced December 2022.

Comments: 13 pages, 3 figures, 1 table, comments are welcome

Report number: 10733

Journal ref: Scientific Reports, vol. 13, page no. 10733, 2023

arXiv:2212.13815 [pdf, other]

Fundamental theorem for quantum asset pricing

Authors: **ge Bao, Patrick Rebentrost

Abstract: Quantum computers have the potential to provide an advantage for financial pricing problems by the use of quantum estimation. In a broader context, it is reasonable to ask about situations where the market and the assets traded on the market themselves have quantum properties. In this work, we consider a financial setting where the market is described not by classical probabilities but by a pure quantum state or, more generally, a quantum density operator. This setting naturally leads to a new asset class, which we call quantum assets. Under the assumption that such assets have a price and can be traded, we develop an extended definition of arbitrage to quantify gains without the corresponding risk. Our main result is a quantum version of the first fundamental theorem of asset pricing. If and only if there is no arbitrage, there exists a risk-free density operator under which all assets are martingales. This density operator is used for the pricing of quantum derivatives. To prove the theorem, we study the density operator version of the Radon-Nikodym measure change. We provide examples to illustrate the theory.

Submitted 5 April, 2023; v1 submitted 28 December, 2022; originally announced December 2022.

Comments: 19 pages, 2 figures, presentation modified, typos corrected

arXiv:2212.08667 [pdf, other]

doi 10.3847/1538-4357/acb3c4

Mapping Protoplanetary Disk Vertical Structure with CO Isotopologue Line Emission

Authors: Charles J. Law, Richard Teague, Karin I. Öberg, Evan A. Rich, Sean M. Andrews, Jaehan Bae, Myriam Benisty, Stefano Facchini, Kevin Flaherty, Andrea Isella, Sheng Jin, Jun Hashimoto, Jane Huang, Ryan A. Loomis, Feng Long, Carlos E. Muñoz-Romero, Teresa Paneque-Carreño, Laura M. Pérez, Chunhua Qi, Kamber R. Schwarz, Jochen Stadler, Takashi Tsukagoshi, David J. Wilner, Gerrit van der Plas

Abstract: High spatial resolution observations of CO isotopologue line emission in protoplanetary disks at mid-inclinations (${\approx}$30-75°) allow us to characterize the gas structure in detail, including radial and vertical substructures, emission surface heights and their dependencies on source characteristics, and disk temperature profiles. By combining observations of a suite of CO isotopologues, we can map the 2D (r, z) disk structure from the disk upper atmosphere, as traced by CO, to near the midplane, as probed by less abundant isotopologues. Here, we present high angular resolution (${\lesssim}$0."1 to ${\approx}$0."2; ${\approx}$15-30 au) observations of CO, $^{13}$CO, and C$^{18}$O in either or both J=2-1 and J=3-2 lines in the transition disks around DM Tau, Sz 91, LkCa 15, and HD 34282. We derived line emission surfaces in CO for all disks and in $^{13}$CO for the DM Tau and LkCa 15 disks. With these observations, we do not resolve the vertical structure of C$^{18}$O in any disk, which is instead consistent with C$^{18}$O emission originating from the midplane. Both the J=2-1 and J=3-2 lines show similar heights. Using the derived emission surfaces, we computed radial and vertical gas temperature distributions for each disk, including empirical temperature models for the DM Tau and LkCa 15 disks. After combining our sample with literature sources, we find that $^{13}$CO line emitting heights are also tentatively linked with source characteristics, e.g., stellar host mass, gas temperature, disk size, and show steeper trends than seen in CO emission surfaces.

Submitted 16 December, 2022; originally announced December 2022.

Comments: 31 pages, 15 figures, accepted for publication in ApJ. Image cubes available at https://zenodo.org/record/7430257

arXiv:2212.06138 [pdf, other]

CLIP Itself is a Strong Fine-tuner: Achieving 85.7% and 88.0% Top-1 Accuracy with ViT-B and ViT-L on ImageNet

Authors: Xiaoyi Dong, Jianmin Bao, Ting Zhang, Dongdong Chen, Shuyang Gu, Weiming Zhang, Lu Yuan, Dong Chen, Fang Wen, Nenghai Yu

Abstract: Recent studies have shown that CLIP has achieved remarkable success in performing zero-shot inference while its fine-tuning performance is not satisfactory. In this paper, we identify that fine-tuning performance is significantly impacted by hyper-parameter choices. We examine various key hyper-parameters and empirically evaluate their impact in fine-tuning CLIP for classification tasks through a comprehensive study. We find that the fine-tuning performance of CLIP is substantially underestimated. Equipped with hyper-parameter refinement, we demonstrate CLIP itself is better or at least competitive in fine-tuning compared with large-scale supervised pre-training approaches or latest works that use CLIP as prediction targets in Masked Image Modeling. Specifically, CLIP ViT-Base/16 and CLIP ViT-Large/14 can achieve 85.7% and 88.0% fine-tuning Top-1 accuracy, respectively, on the ImageNet-1K dataset. These observations challenge the conventional conclusion that CLIP is not suitable for fine-tuning, and motivate us to rethink recently proposed improvements based on CLIP. We will release our code publicly at https://github.com/LightDXY/FT-CLIP.

Submitted 12 December, 2022; originally announced December 2022.

Comments: Technical Report, code will be available at https://github.com/LightDXY/FT-CLIP

arXiv:2212.06135 [pdf, other]

Rodin: A Generative Model for Sculpting 3D Digital Avatars Using Diffusion

Authors: Tengfei Wang, Bo Zhang, Ting Zhang, Shuyang Gu, Jianmin Bao, Tadas Baltrusaitis, Jingjing Shen, Dong Chen, Fang Wen, Qifeng Chen, Baining Guo

Abstract: This paper presents a 3D generative model that uses diffusion models to automatically generate 3D digital avatars represented as neural radiance fields. A significant challenge in generating such avatars is that the memory and processing costs in 3D are prohibitive for producing the rich details required for high-quality avatars. To tackle this problem we propose the roll-out diffusion network (Rodin), which represents a neural radiance field as multiple 2D feature maps and rolls out these maps into a single 2D feature plane within which we perform 3D-aware diffusion. The Rodin model brings the much-needed computational efficiency while preserving the integrity of diffusion in 3D by using 3D-aware convolution that attends to projected features in the 2D feature plane according to their original relationship in 3D. We also use latent conditioning to orchestrate the feature generation for global coherence, leading to high-fidelity avatars and enabling their semantic editing based on text prompts. Finally, we use hierarchical synthesis to further enhance details. The 3D avatars generated by our model compare favorably with those produced by existing generative techniques. We can generate highly detailed avatars with realistic hairstyles and facial hair like beards. We also demonstrate 3D avatar generation from image or text as well as text-guided editability.

Submitted 12 December, 2022; originally announced December 2022.

Comments: Project Webpage: https://3d-avatar-diffusion.microsoft.com/

arXiv:2212.04947 [pdf]

Monitoring Spent Nuclear Fuel in a Dry Cask Using Momentum Integrated Muon Scattering Tomography

Authors: Junghyun Bae, Stylianos Chatzidakis

Abstract: Nuclear materials accountability and nonproliferation are among the critical tasks to be addressed for the advancement of nuclear energy in the United States. Monitoring spent nuclear fuel (SNF) is important to continue reliable stewardship of SNF storage. Cosmic ray muons have been acknowledged as a promising radiographic tool for monitoring SNF due to their highly penetrative nature and high energy. Cosmic ray muons are more suitable and have been used for imaging large and dense objects. Despite their potential in various applications, the wide application of cosmic ray muons is limited by the naturally low intensity at sea level. To efficiently utilize cosmic ray muons in engineering applications, trajectory and momentum must be measured. Although various studies demonstrate that there is significant potential for measuring momentum in muon applications, it is still difficult to measure both muon scattering angle and momentum in the field. To fill this critical gap, a muon spectrometer using multilayer pressurized gas Cherenkov radiators was proposed. However, existing muon tomographic algorithms were developed assuming monoenergetic muon scattering and are not optimized for a measured polyenergetic momentum spectrum. In this work, we develop and evaluate a momentum integrated muon scattering tomography (MMST) algorithm. We evaluate the algorithm on its capability to identify a missing fuel assembly (FA) from an SNF dry cask. Our results demonstrate that image resolution using MMST is significantly improved when measuring muon momentum and it can reduce monitoring time by a factor of 10 when compared to that of a conventional muon imaging technique in terms of systematically finding a missing FA.

Submitted 6 December, 2022; originally announced December 2022.

Comments: Transaction of American Nuclear Society

arXiv:2212.03905 [pdf, other]

Multi-Rate VAE: Train Once, Get the Full Rate-Distortion Curve

Authors: Juhan Bae, Michael R. Zhang, Michael Ruan, Eric Wang, So Hasegawa, Jimmy Ba, Roger Grosse

Abstract: Variational autoencoders (VAEs) are powerful tools for learning latent representations of data used in a wide range of applications. In practice, VAEs usually require multiple training rounds to choose the amount of information the latent variable should retain. This trade-off between the reconstruction error (distortion) and the KL divergence (rate) is typically parameterized by a hyperparameter $β$. In this paper, we introduce Multi-Rate VAE (MR-VAE), a computationally efficient framework for learning optimal parameters corresponding to various $β$ in a single training run. The key idea is to explicitly formulate a response function that maps $β$ to the optimal parameters using hypernetworks. MR-VAEs construct a compact response hypernetwork where the pre-activations are conditionally gated based on $β$. We justify the proposed architecture by analyzing linear VAEs and showing that it can represent response functions exactly for linear VAEs. With the learned hypernetwork, MR-VAEs can construct the rate-distortion curve without additional training and can be deployed with significantly less hyperparameter tuning. Empirically, our approach is competitive with, and often exceeds, the performance of training multiple $β$-VAEs, with minimal computation and memory overheads.

Submitted 16 August, 2023; v1 submitted 7 December, 2022; originally announced December 2022.

Comments: 22 pages, 9 figures

arXiv:2212.03863 [pdf, other]

X-Paste: Revisiting Scalable Copy-Paste for Instance Segmentation using CLIP and StableDiffusion

Authors: Hanqing Zhao, Dianmo Sheng, Jianmin Bao, Dongdong Chen, Dong Chen, Fang Wen, Lu Yuan, Ce Liu, Wenbo Zhou, Qi Chu, Weiming Zhang, Nenghai Yu

Abstract: Copy-Paste is a simple and effective data augmentation strategy for instance segmentation. By randomly pasting object instances onto new background images, it creates new training data for free and significantly boosts the segmentation performance, especially for rare object categories. Although diverse, high-quality object instances used in Copy-Paste result in more performance gain, previous works utilize object instances either from human-annotated instance segmentation datasets or rendered from 3D object models, and both approaches are too expensive to scale up to obtain good diversity. In this paper, we revisit Copy-Paste at scale with the power of newly emerged zero-shot recognition models (e.g., CLIP) and text2image models (e.g., StableDiffusion). We demonstrate for the first time that using a text2image model to generate images or zero-shot recognition model to filter noisily crawled images for different object categories is a feasible way to make Copy-Paste truly scalable. To make such success happen, we design a data acquisition and processing framework, dubbed "X-Paste", upon which a systematic study is conducted. On the LVIS dataset, X-Paste provides impressive improvements over the strong baseline CenterNet2 with Swin-L as the backbone. Specifically, it achieves +2.6 box AP and +2.1 mask AP gains on all classes and even more significant gains with +6.8 box AP, +6.5 mask AP on long-tail classes. Our code and models are available at https://github.com/yoctta/XPaste.

Submitted 31 May, 2023; v1 submitted 7 December, 2022; originally announced December 2022.

Comments: ICML 2023, code is available at https://github.com/yoctta/XPaste

arXiv:2212.03210 [pdf, other]

doi 10.21468/SciPostPhys.15.4.139

Isolated flat bands in 2D lattices based on a novel path-exchange symmetry

Authors: Jun-Hyung Bae, Tigran Sedrakyan, Saurabh Maiti

Abstract: The increased ability to engineer two-dimensional (2D) systems, either using materials, photonic lattices, or cold atoms, has led to the search for 2D structures with interesting properties. One such property is the presence of flat bands. Typically, the presence of these requires long-ranged hoppings, fine-tuning of nearest neighbor hoppings, or breaking time-reversal symmetry by using a staggered flux distribution in the unit cell. We provide a prescription based on carrying out projections from a parent system to generate different flat band systems. We identify the conditions for maintaining the flatness and identify a path-exchange symmetry in such systems that causes the flat band to be degenerate with the other dispersive ones. Breaking this symmetry leads to lifting the degeneracy while still preserving the flatness of the band. This technique does not require changing the topology nor breaking time-reversal symmetry as was suggested earlier in the literature. The prescription also eliminates the need for any fine-tuning. Moreover, it is shown that the subsequent projected systems inherit the precise fine-tuning conditions that were discussed in the literature for similar systems, in order to have and isolate a flat band. As examples, we demonstrate the use of our prescription to arrive at the flat band conditions for popular systems like the Kagome, the Lieb, and the Dice lattices. Finally, we are also able to show that a flat band exists in a recently proposed chiral spin-liquid state of the Kagome lattice only if it is associated with a gauge field that produces a flux modulation of the Chern-Simons type.

Submitted 27 January, 2023; v1 submitted 6 December, 2022; originally announced December 2022.

Comments: 37 pages, 20 figures, References are updated and an additional section on flux-attachment to lattice is also included

Journal ref: SciPost Phys. 15, 139 (2023)

arXiv:2212.02741 [pdf, ps, other]

A sparsity-promoting resolvent analysis for the identification of spatiotemporally-localized amplification mechanisms

Authors: Barbara Lopez-Doriga, Eric Ballouz, H. Jane Bae, Scott T. M. Dawson

Abstract: This work introduces a variant of resolvent analysis that identifies forcing and response modes that are sparse in both space and time. This is achieved through the use of a sparse principal component analysis (PCA) algorithm, which formulates the associated optimization problem as a nonlinear eigenproblem that can be solved with an inverse power method. We apply this method to parallel shear flows, both in the case where we assume Fourier modes in time (as in standard resolvent analysis) and obtain spatial localization, and where we allow for temporally-sparse modes through the use of a linearized Navier-Stokes operator discretized in both space and time. Appropriate choice of desired mode sparsity allows for the identification of structures corresponding to high amplification that are localized in both space and time. We report on the similarities and differences between these structures and those from standard methods of analysis. After validating this space-time resolvent analysis on statistically-stationary channel flow, we next implement the methodology on a time-periodic Stokes boundary layer, demonstrating the applicability of the approach to non-statistically-stationary systems.

Submitted 5 December, 2022; originally announced December 2022.

arXiv:2212.02660 [pdf, ps, other]

Wavelet-based resolvent analysis for statistically-stationary and temporally-evolving flows

Authors: Eric Ballouz, Barbara Lopez-Doriga, Scott T. M. Dawson, H. Jane Bae

Abstract: This work introduces a formulation of resolvent analysis that uses wavelet transforms rather than Fourier transforms in time. This allows resolvent analysis to be extended to turbulent flows with non-stationary means in addition to statistically-stationary flows. The optimal resolvent modes for this formulation correspond to the potentially time-transient structures that are most amplified by the linearized Navier-Stokes operator. We validate this methodology for turbulent channel flow and show that the wavelet-based and Fourier-based resolvent analyses are equivalent for statistically-stationary flows. We then apply the wavelet-based resolvent analysis to study the transient growth mechanism in the buffer layer of a turbulent channel flow by windowing the resolvent operator in time and frequency. The method is also applied to temporally-evolving parallel shear flows such as an oscillating boundary layer and three-dimensional channel flow, in which a lateral pressure gradient perturbs a fully-developed turbulent flow in a channel.

Submitted 5 December, 2022; originally announced December 2022.

arXiv:2211.16427 [pdf, ps, other]

Multi-agent reinforcement learning for wall modeling in LES of flow over periodic hills

Authors: Di Zhou, Michael P. Whitmore, Kevin P. Griffin, H. Jane Bae

Abstract: We develop a wall model for large-eddy simulation (LES) that takes into account various pressure-gradient effects using multi-agent reinforcement learning (MARL). The model is trained using low-Reynolds-number flow over periodic hills with agents distributed on the wall along the computational grid points. The model utilizes a wall eddy-viscosity formulation as the boundary condition, which is shown to provide better predictions of the mean velocity field than the typical wall-shear stress formulation. Each agent receives states based on local instantaneous flow quantities at an off-wall location, computes a reward based on the estimated wall-shear stress, and provides an action to update the wall eddy viscosity at each time step. The trained wall model is validated in wall-modeled LES (WMLES) of flow over periodic hills at higher Reynolds numbers, and the results show the effectiveness of the model on flow with pressure gradients. The analysis of the trained model indicates that the model is capable of distinguishing between the various pressure gradient regimes present in the flow.

Submitted 29 November, 2022; originally announced November 2022.

arXiv:2211.15045 [pdf, other]

CLIP2GAN: Towards Bridging Text with the Latent Space of GANs

Authors: Yixuan Wang, Wengang Zhou, Jianmin Bao, Weilun Wang, Li Li, Houqiang Li

Abstract: In this work, we are dedicated to text-guided image generation and propose a novel framework, i.e., CLIP2GAN, by leveraging CLIP model and StyleGAN. The key idea of our CLIP2GAN is to bridge the output feature embedding space of CLIP and the input latent space of StyleGAN, which is realized by introducing a mapping network. In the training stage, we encode an image with CLIP and map the output feature to a latent code, which is further used to reconstruct the image. In this way, the mapping network is optimized in a self-supervised learning way. In the inference stage, since CLIP can embed both image and text into a shared feature embedding space, we replace CLIP image encoder in the training architecture with CLIP text encoder, while keeping the following mapping network as well as StyleGAN model. As a result, we can flexibly input a text description to generate an image. Moreover, by simply adding mapped text features of an attribute to a mapped CLIP image feature, we can effectively edit the attribute to the image. Extensive experiments demonstrate the superior performance of our proposed CLIP2GAN compared to previous methods.

Submitted 27 November, 2022; originally announced November 2022.

arXiv:2211.14813 [pdf, other]

SegCLIP: Patch Aggregation with Learnable Centers for Open-Vocabulary Semantic Segmentation

Authors: Huaishao Luo, Junwei Bao, Youzheng Wu, Xiaodong He, Tianrui Li

Abstract: Recently, the contrastive language-image pre-training, e.g., CLIP, has demonstrated promising results on various downstream tasks. The pre-trained model can capture enriched visual concepts for images by learning from a large scale of text-image data. However, transferring the learned visual knowledge to open-vocabulary semantic segmentation is still under-explored. In this paper, we propose a CLIP-based model named SegCLIP for the topic of open-vocabulary segmentation in an annotation-free manner. The SegCLIP achieves segmentation based on ViT and the main idea is to gather patches with learnable centers to semantic regions through training on text-image pairs. The gathering operation can dynamically capture the semantic groups, which can be used to generate the final segmentation results. We further propose a reconstruction loss on masked patches and a superpixel-based KL loss with pseudo-labels to enhance the visual representation. Experimental results show that our model achieves comparable or superior segmentation accuracy on the PASCAL VOC 2012 (+0.3% mIoU), PASCAL Context (+2.3% mIoU), and COCO (+2.2% mIoU) compared with baselines. We release the code at https://github.com/ArrowLuo/SegCLIP.

Submitted 20 June, 2023; v1 submitted 27 November, 2022; originally announced November 2022.

arXiv:2211.14511 [pdf, ps, other]

Investigating nonlinearity in wall turbulence: regenerative versus parametric mechanisms

Authors: B. F. Farrell, E. Kim, H. J. Bae, M. -A. Nikolaidis, P. J. Ioannou

Abstract: Both linear growth processes associated with non-normality of the mean flow and nonlinear interaction transferring energy among fluctuations contribute to maintaining turbulence. However, a detailed understanding of the mechanism by which they cooperate in sustaining the turbulent state is lacking. In this report, we examine the role of fluctuation-fluctuation nonlinearity by varying the magnitude of the associated term in the dynamics of Couette flow turbulence to determine how this nonlinear component helps maintain and determine the structure of the turbulent state, and particularly whether this mechanism is parametric or regenerative. Having determined that the mechanism supporting the fluctuation field in Navier-Stokes turbulence is parametric, we then study the mechanism by which the fluctuation component of turbulence is maintained by parametric growth in a time-dependent mean flow by examining the parametric growth mechanism in the frequency domain using analysis of the time-dependent resolvent.

Submitted 26 November, 2022; originally announced November 2022.

Comments: 10 pages, 4 figures

Report number: CTR 2022 Report

arXiv:2211.12634 [pdf, other]

PNI : Industrial Anomaly Detection using Position and Neighborhood Information

Authors: Jaehyeok Bae, Jae-Han Lee, Seyun Kim

Abstract: Because anomalous samples cannot be used for training, many anomaly detection and localization methods use pre-trained networks and non-parametric modeling to estimate encoded feature distribution. However, these methods neglect the impact of position and neighborhood information on the distribution of normal features. To overcome this, we propose a new algorithm, PNI, which estimates the normal distribution using conditional probability given neighborhood features, modeled with a multi-layer perceptron network. Moreover, position information is utilized by creating a histogram of representative features at each position. Instead of simply resizing the anomaly map, the proposed method employs an additional refine network trained on synthetic anomaly images to better interpolate and account for the shape and edge of the input image. We conducted experiments on the MVTec AD benchmark dataset and achieved state-of-the-art performance, with 99.56% and 98.98% AUROC scores in anomaly detection and localization, respectively.

Submitted 30 March, 2023; v1 submitted 22 November, 2022; originally announced November 2022.

arXiv:2211.12445 [pdf, other]

SinDiffusion: Learning a Diffusion Model from a Single Natural Image

Authors: Weilun Wang, Jianmin Bao, Wengang Zhou, Dongdong Chen, Dong Chen, Lu Yuan, Houqiang Li

Abstract: We present SinDiffusion, leveraging denoising diffusion models to capture internal distribution of patches from a single natural image. SinDiffusion significantly improves the quality and diversity of generated samples compared with existing GAN-based approaches. It is based on two core designs. First, SinDiffusion is trained with a single model at a single scale instead of multiple models with progressive growing of scales which serves as the default setting in prior work. This avoids the accumulation of errors, which cause characteristic artifacts in generated results. Second, we identify that a patch-level receptive field of the diffusion network is crucial and effective for capturing the image's patch statistics; therefore, we redesign the network structure of the diffusion model. Coupling these two designs enables us to generate photorealistic and diverse images from a single image. Furthermore, SinDiffusion can be applied to various applications, i.e., text-guided image generation, and image outpainting, due to the inherent capability of diffusion models. Extensive experiments on a wide range of images demonstrate the superiority of our proposed method for modeling the patch distribution.

Submitted 22 November, 2022; originally announced November 2022.

arXiv:2211.11201 [pdf, other]

Self-Supervised 3D Traversability Estimation with Proxy Bank Guidance

Authors: Jihwan Bae, Junwon Seo, Taekyung Kim, Hae-gon Jeon, Kiho Kwak, Inwook Shim

Abstract: Traversability estimation for mobile robots in off-road environments requires more than conventional semantic segmentation used in constrained environments like on-road conditions. Recently, approaches to learning a traversability estimation from past driving experiences in a self-supervised manner are arising as they can significantly reduce human labeling costs and labeling errors. However, the self-supervised data only provide supervision for the actually traversed regions, inducing epistemic uncertainty according to the scarcity of negative information. Negative data are rarely harvested as the system can be severely damaged while logging the data. To mitigate the uncertainty, we introduce a deep metric learning-based method to incorporate unlabeled data with a few positive and negative prototypes in order to leverage the uncertainty, which jointly learns using semantic segmentation and traversability regression. To firmly evaluate the proposed framework, we introduce a new evaluation metric that comprehensively evaluates the segmentation and regression. Additionally, we construct a driving dataset 'Dtrail' in off-road environments with a mobile robot platform, which is composed of a wide variety of negative data. We examine our method on Dtrail as well as the publicly available SemanticKITTI dataset.

Submitted 20 December, 2022; v1 submitted 21 November, 2022; originally announced November 2022.

arXiv:2211.10003 [pdf, other]

3D human motion generation from the text via gesture action classification and the autoregressive model

Authors: Gwantae Kim, Youngsuk Ryu, Junyeop Lee, David K. Han, Jeongmin Bae, Hanseok Ko

Abstract: In this paper, a deep learning-based model for 3D human motion generation from text is proposed via gesture action classification and an autoregressive model. The model focuses on generating special gestures that express human thinking, such as waving and nodding. To achieve this goal, the proposed method predicts the expression from the sentences using a text classification model based on a pretrained language model and generates gestures using a gated recurrent unit-based autoregressive model. In particular, we propose a loss for the embedding space for restoring raw motions and generating intermediate motions well. Moreover, a novel data augmentation method and a stop token are proposed to generate variable-length motions. To evaluate the text classification model and the 3D human motion generation model, a gesture action classification dataset and an action-based gesture dataset are collected. In several experiments, the proposed method successfully generates perceptually natural and realistic 3D human motion from text. Moreover, we verify the effectiveness of the proposed method using a publicly available action recognition dataset to evaluate cross-dataset generalization performance.

Submitted 17 November, 2022; originally announced November 2022.

Comments: 5 pages, 3 figures, ICIP 2022

arXiv:2211.07879 [pdf, ps, other]

doi 10.1017/jfm.2023.331

Machine learning building-block-flow wall model for large-eddy simulation

Authors: Adrián Lozano-Durán, H. Jane Bae

Abstract: A wall model for large-eddy simulation (LES) is proposed by devising the flow as a combination of building blocks. The core assumption of the model is that a finite set of simple canonical flows contains the essential physics to predict the wall-shear stress in more complex scenarios. The model is constructed to predict zero/favourable/adverse mean pressure gradient wall turbulence, separation, statistically unsteady turbulence with mean flow three-dimensionality, and laminar flow. The approach is implemented using two types of artificial neural networks: a classifier, which identifies the contribution of each building block in the flow, and a predictor, which estimates the wall-shear stress via combination of the building-block flows. The training data are directly obtained from wall-modelled LES (WMLES) optimised to reproduce the correct mean quantities. This approach guarantees the consistency of the training data with the numerical discretisation and the gridding strategy of the flow solver. The output of the model is accompanied by a confidence score in the prediction that aids the detection of regions where the model underperforms. The model is validated in canonical flows (e.g. laminar/turbulent boundary layers, turbulent channels, turbulent Poiseuille-Couette flow, turbulent pipe) and two realistic aircraft configurations: the NASA Common Research Model High-lift and NASA Juncture Flow experiment. It is shown that the building-block-flow wall model outperforms (or matches) the predictions by an equilibrium wall model. It is also concluded that further improvements in WMLES should incorporate advances in subgrid-scale modelling to minimise error propagation to the wall model.

Submitted 30 April, 2023; v1 submitted 14 November, 2022; originally announced November 2022.
