-
MagAO-X and HST high-contrast imaging of the AS209 disk at H$α$
Authors:
Gabriele Cugno,
Yifan Zhou,
Thanawuth Thanathibodee,
Per Calissendorff,
Michael R. Meyer,
Suzan Edwards,
Jaehan Bae,
Myriam Benisty,
Edwin Bergin,
Matthew De Furio,
Stefano Facchini,
Jared R. Males,
Laird M. Close,
Richard D. Teague,
Olivier Guyon,
Sebastiaan Y. Haffert,
Alexander D. Hedglen,
Maggie Kautz,
Andrés Izquierdo,
Joseph D. Long,
Jennifer Lumbres,
Avalon L. McLeod,
Logan A. Pearce,
Lauren Schatz,
Kyle Van Gorkom
Abstract:
The detection of emission lines associated with accretion processes is a direct method for studying how and where gas giant planets form, how young planets interact with their natal protoplanetary disk and how volatile delivery to their atmosphere takes place. H$α$ ($λ=0.656\,μ$m) is expected to be the strongest accretion line observable from the ground with adaptive optics systems, and is therefore the target of specific high-contrast imaging campaigns. We present MagAO-X and HST data obtained to search for H$α$ emission from the previously detected protoplanet candidate orbiting AS209, identified through ALMA observations. No signal was detected at the location of the candidate, and we provide limits on its accretion. Our data would have detected an H$α$ emission with $F_\mathrm{Hα}>2.5\pm0.3 \times10^{-16}$ erg s$^{-1}$ cm$^{-2}$, a factor 6.5 lower than the HST flux measured for PDS70b (Zhou et al., 2021). The flux limit indicates that if the protoplanet is currently accreting it is likely that local extinction from circumstellar and circumplanetary material strongly attenuates its emission at optical wavelengths. In addition, the data reveal the first image of the jet north of the star as expected from previous detections of forbidden lines. Finally, this work demonstrates that current ground-based observations with extreme adaptive optics systems can be more sensitive than space-based observations, paving the way to the hunt for small planets in reflected light with extremely large telescopes.
Submitted 22 August, 2023;
originally announced August 2023.
-
MAPS: Constraining Serendipitous Time Variability in Protoplanetary Disk Molecular Ion Emission
Authors:
Abygail R. Waggoner,
L. Ilsedore Cleeves,
Ryan A. Loomis,
Yuri Aikawa,
Jaehan Bae,
Jennifer B. Bergner,
Alice S. Booth,
Jenny K. Calahan,
Gianni Cataldi,
Charles J. Law,
Romane Le Gal,
Feng Long,
Karin I. Öberg,
Richard Teague,
David J. Wilner
Abstract:
Theoretical models and observations suggest that the abundances of molecular ions in protoplanetary disks should be highly sensitive to the variable ionization conditions set by the young central star. We present a search for temporal flux variability of HCO+ J=1-0, which was observed as a part of the Molecules with ALMA at Planet-forming Scales (MAPS) ALMA Large Program. We split out and imaged the line and continuum data for each individual day the five sources were observed (HD 163296, AS 209, GM Aur, MWC 480, and IM Lup, with between 3 and 6 unique visits per source). Significant enhancement (>3σ) was not observed, but we find variations in the spectral profiles in all five disks. Variations in AS 209, GM Aur, and HD 163296 are tentatively attributed to variations in HCO+ flux, while variations in IM Lup and MWC 480 are most likely introduced by differences in the \textit{uv} coverage, which affect the amount of flux recovered during imaging. The tentative detections and low degree of variability are consistent with expectations of X-ray flare driven HCO+ variability, which requires relatively large flares to enhance the HCO+ rotational emission at significant (>20%) levels. These findings also demonstrate the need for dedicated monitoring campaigns with high signal-to-noise ratios to fully characterize X-ray flare driven chemistry.
Submitted 22 August, 2023;
originally announced August 2023.
-
HopPG: Self-Iterative Program Generation for Multi-Hop Question Answering over Heterogeneous Knowledge
Authors:
Yingyao Wang,
Yongwei Zhou,
Chaoqun Duan,
Junwei Bao,
Tiejun Zhao
Abstract:
The semantic parsing-based method is an important research branch for knowledge-based question answering. It usually generates executable programs based on the question and then executes them to reason out answers over a knowledge base. Benefiting from this inherent mechanism, it has advantages in both performance and interpretability. However, traditional semantic parsing methods usually generate a complete program before executing it, and therefore struggle with multi-hop question answering over heterogeneous knowledge. On one hand, generating a complete multi-hop program relies on multiple heterogeneous supporting facts, and it is difficult for generators to understand these facts simultaneously. On the other hand, this approach ignores the semantic information of the intermediate answers at each hop, which is beneficial for subsequent generation. To alleviate these challenges, we propose a self-iterative framework for multi-hop program generation (HopPG) over heterogeneous knowledge, which leverages the previous execution results to retrieve supporting facts and generate subsequent programs hop by hop. We evaluate our model on MMQA-T^2, and the experimental results show that HopPG outperforms existing semantic-parsing-based baselines, especially on the multi-hop questions.
Submitted 10 September, 2023; v1 submitted 22 August, 2023;
originally announced August 2023.
-
A magnetically driven disc wind in the inner disc of PDS 70
Authors:
Justyn Campbell-White,
Carlo F. Manara,
Myriam Benisty,
Antonella Natta,
Rik A. B. Claes,
Antonio Frasca,
Jaehan Bae,
Stefano Facchini,
Andrea Isella,
Laura Pérez,
Paola Pinilla,
Aurora Sicilia-Aguilar,
Richard Teague
Abstract:
PDS 70 is so far the only young disc where multiple planets have been detected by direct imaging. The disc has a large cavity when seen at sub-mm and NIR wavelengths, which hosts two massive planets. This makes PDS 70 the ideal target to study the physical conditions in a strongly depleted inner disc shaped by two giant planets, and in particular to test whether disc winds can play a significant role in its evolution. Using X-Shooter and HARPS spectra, we detected for the first time the wind-tracing [O I] 6300 Å line, and confirm the low-moderate value of mass-accretion rate in the literature. The [O I] line luminosity is high with respect to the accretion luminosity when compared to a large sample of discs with cavities in nearby star-forming regions. The FWHM and blue-shifted peak of the [O I] line suggest an emission in a region very close to the star, favouring a magnetically driven wind as the origin. We also detect wind emission and high variability in the He I 10830 Å line, which is unusual for low accretors. We discuss that, although the cavity of PDS 70 was clearly carved out by the giant planets, the substantial inner disc wind could also have had a significant contribution to clearing the inner disc.
Submitted 18 August, 2023;
originally announced August 2023.
-
Disordered structure for long-range charge density wave order in annealed crystals of magnetic kagome FeGe
Authors:
Chenfei Shi,
Yi Liu,
Bishal Baran Maity,
Qi Wang,
Surya Rohith Kotla,
Sitaram Ramakrishnan,
Claudio Eisele,
Harshit Agarwal,
Leila Noohinejad,
Qian Tao,
Baojuan Kang,
Zhefeng Lou,
Xiaohui Yang,
Yanpeng Qi,
Xiao Lin,
Zhu-An Xu,
A. Thamizhavel,
Guang-Han Cao,
Sander van Smaalen,
Shixun Cao,
Jin-Ke Bao
Abstract:
Recently, charge density wave (CDW) has been observed well below the order of antiferromagnetism (AFM) in kagome FeGe in which magnetism and CDW are intertwined to form an emergent quantum ground state. The mechanism of CDW precipitating from an A-type AFM of Fe kagome sublattice is intensively debated. The structural distortion originating from the CDW has yet to be accurately determined in FeGe. Here we resolved the structure model of the CDW in annealed FeGe crystals through single crystal x-ray diffraction via a synchrotron radiation source. The annealed crystals exhibit strong CDW transition signals exemplified by sharp magnetic susceptibility drop and specific heat jump, as well as intense superlattice reflections from 2 $\times$ 2 $\times$ 2 CDW order. Occupational disorder of Ge atoms resulting from short-range CDW correlations above $T_\mathrm{CDW}$ has also been identified from the structure refinements. The dimerization of Ge atoms along c axis has been demonstrated to be the dominant distortion for CDW. The Fe kagome and Ge honeycomb sublattices only undergo subtle distortions. Occupational disorder of Ge atoms is also proved to exist in the CDW phase due to the random selection of partial Ge sites to be dimerized to realize the structural distortion. Our work paves the way to understanding the unconventional nature of CDW in FeGe not only by solving the structural distortion below $T_\mathrm{CDW}$ and identifying fluctuations above it but also by rationalizing the synthesis of high-quality crystals for in-depth investigations in the future.
Submitted 17 November, 2023; v1 submitted 17 August, 2023;
originally announced August 2023.
-
Ensemble Kalman Filters with Resampling
Authors:
Omar Al Ghattas,
Jiajun Bao,
Daniel Sanz-Alonso
Abstract:
Filtering is concerned with online estimation of the state of a dynamical system from partial and noisy observations. In applications where the state of the system is high dimensional, ensemble Kalman filters are often the method of choice. These algorithms rely on an ensemble of interacting particles to sequentially estimate the state as new observations become available. Despite the practical success of ensemble Kalman filters, theoretical understanding is hindered by the intricate dependence structure of the interacting particles. This paper investigates ensemble Kalman filters that incorporate an additional resampling step to break the dependency between particles. The new algorithm is amenable to a theoretical analysis that extends and improves upon those available for filters without resampling, while also performing well in numerical examples.
Submitted 16 August, 2023;
originally announced August 2023.
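The resampling idea described in the abstract above can be illustrated with a minimal scalar sketch. It is not the authors' algorithm: the perturbed-observation update, the choice to resample from a Gaussian fitted to the updated ensemble, and the name `enkf_step_with_resampling` are all assumptions made for illustration.

```python
import random
import statistics


def enkf_step_with_resampling(ensemble, y, h, obs_var, rng):
    """One scalar EnKF analysis step followed by a resampling step.

    ensemble: list of prior particles (floats)
    y:        scalar observation, modeled as y = h * x + noise
    h:        scalar observation operator
    obs_var:  observation noise variance
    rng:      random.Random instance
    """
    n = len(ensemble)
    prior_var = statistics.variance(ensemble)  # sample variance of the prior
    # Kalman gain built from the ensemble variance.
    gain = prior_var * h / (h * h * prior_var + obs_var)
    # Perturbed-observation update: each particle sees its own noisy copy of y.
    updated = [
        x + gain * (y + rng.gauss(0.0, obs_var ** 0.5) - h * x)
        for x in ensemble
    ]
    # Resampling (assumed form): draw fresh independent particles from a
    # Gaussian with the updated ensemble's mean and standard deviation.
    mean = statistics.fmean(updated)
    sd = statistics.stdev(updated)
    return [rng.gauss(mean, sd) for _ in range(n)]


if __name__ == "__main__":
    rng = random.Random(0)
    prior = [rng.gauss(0.0, 1.0) for _ in range(200)]
    posterior = enkf_step_with_resampling(prior, y=2.0, h=1.0, obs_var=0.1, rng=rng)
    print(statistics.fmean(prior), statistics.fmean(posterior))
```

Given their shared mean and standard deviation, the returned particles are independent draws, which is the property the abstract says makes the analysis tractable.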
-
Constant-depth circuits for Uniformly Controlled Gates and Boolean functions with application to quantum memory circuits
Authors:
Jonathan Allcock,
Jinge Bao,
João F. Doriguello,
Alessandro Luongo,
Miklos Santha
Abstract:
We explore the power of the unbounded Fan-Out gate and the Global Tunable gates generated by Ising-type Hamiltonians in constructing constant-depth quantum circuits, with particular attention to quantum memory devices. We propose two types of constant-depth constructions for implementing Uniformly Controlled Gates. These gates include the Fan-In gates defined by $|x\rangle|b\rangle\mapsto |x\rangle|b\oplus f(x)\rangle$ for $x\in\{0,1\}^n$ and $b\in\{0,1\}$, where $f$ is a Boolean function. The first of our constructions is based on computing the one-hot encoding of the control register $|x\rangle$, while the second is based on Boolean analysis and exploits different representations of $f$ such as its Fourier expansion. Via these constructions, we obtain constant-depth circuits for the quantum counterparts of read-only and read-write memory devices -- Quantum Random Access Memory (QRAM) and Quantum Random Access Gate (QRAG) -- of memory size $n$. The implementation based on one-hot encoding requires either $O(n\log{n}\log\log{n})$ ancillae and $O(n\log{n})$ Fan-Out gates or $O(n\log{n})$ ancillae and $6$ Global Tunable gates. On the other hand, the implementation based on Boolean analysis requires only $2$ Global Tunable gates at the expense of $O(n^2)$ ancillae.
Submitted 14 December, 2023; v1 submitted 16 August, 2023;
originally announced August 2023.
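The Fan-In map $|x\rangle|b\rangle\mapsto|x\rangle|b\oplus f(x)\rangle$ and the one-hot idea in the abstract above can be checked on computational basis states with a purely classical sketch. This is only an illustration of the bookkeeping, not a quantum circuit and not the paper's construction; the function names are hypothetical.

```python
def one_hot(x_bits):
    """One-hot encoding of the integer whose binary digits are x_bits (MSB first)."""
    index = int("".join(str(bit) for bit in x_bits), 2)
    return [1 if j == index else 0 for j in range(2 ** len(x_bits))]


def uniformly_controlled_xor(x_bits, b, truth_table):
    """Return (x, b XOR f(x)), where f is given by its truth table.

    With e the one-hot encoding of x, f(x) equals the XOR over j of
    e[j] AND truth_table[j], because exactly one e[j] is 1.
    """
    e = one_hot(x_bits)
    fx = 0
    for e_j, f_j in zip(e, truth_table):
        fx ^= e_j & f_j
    return x_bits, b ^ fx


if __name__ == "__main__":
    # Example: f is the parity of a 3-bit input.
    parity_table = [bin(j).count("1") % 2 for j in range(8)]
    print(uniformly_controlled_xor((1, 0, 1), 0, parity_table))
```

Reading a memory cell fits the same pattern: if the truth table holds the memory contents, the function returns the addressed bit XORed into `b`.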
-
Nonexistence of multi-dimensional solitary waves for the Euler-Poisson system
Authors:
Junsik Bae,
Daisuke Kawagoe
Abstract:
We study the nonexistence of multi-dimensional solitary waves for the Euler-Poisson system governing ion dynamics. It is well-known that the one-dimensional Euler-Poisson system has solitary waves that travel faster than the ion-sound speed. In contrast, we show that the two-dimensional and three-dimensional models do not admit nontrivial irrotational spatially localized traveling waves for any traveling velocity and for general pressure laws. We derive some Pohozaev type identities associated with the energy and density integrals. This approach is extended to prove the nonexistence of irrotational multi-dimensional solitary waves for the two-species Euler-Poisson system for ions and electrons.
Submitted 7 August, 2023;
originally announced August 2023.
-
Studying Large Language Model Generalization with Influence Functions
Authors:
Roger Grosse,
Juhan Bae,
Cem Anil,
Nelson Elhage,
Alex Tamkin,
Amirhossein Tajdini,
Benoit Steiner,
Dustin Li,
Esin Durmus,
Ethan Perez,
Evan Hubinger,
Kamilė Lukošiūtė,
Karina Nguyen,
Nicholas Joseph,
Sam McCandlish,
Jared Kaplan,
Samuel R. Bowman
Abstract:
When trying to gain better visibility into a machine learning model in order to understand and mitigate the associated risks, a potentially valuable source of evidence is: which training examples most contribute to a given behavior? Influence functions aim to answer a counterfactual: how would the model's parameters (and hence its outputs) change if a given sequence were added to the training set? While influence functions have produced insights for small models, they are difficult to scale to large language models (LLMs) due to the difficulty of computing an inverse-Hessian-vector product (IHVP). We use the Eigenvalue-corrected Kronecker-Factored Approximate Curvature (EK-FAC) approximation to scale influence functions up to LLMs with up to 52 billion parameters. In our experiments, EK-FAC achieves similar accuracy to traditional influence function estimators despite the IHVP computation being orders of magnitude faster. We investigate two algorithmic techniques to reduce the cost of computing gradients of candidate training sequences: TF-IDF filtering and query batching. We use influence functions to investigate the generalization patterns of LLMs, including the sparsity of the influence patterns, increasing abstraction with scale, math and programming abilities, cross-lingual generalization, and role-playing behavior. Despite many apparently sophisticated forms of generalization, we identify a surprising limitation: influences decay to near-zero when the order of key phrases is flipped. Overall, influence functions give us a powerful new tool for studying the generalization properties of LLMs.
Submitted 7 August, 2023;
originally announced August 2023.
-
FlexDTI: Flexible diffusion gradient encoding scheme-based highly efficient diffusion tensor imaging using deep learning
Authors:
Zejun Wu,
Jiechao Wang,
Zunquan Chen,
Qinqin Yang,
Zhen Xing,
Dairong Cao,
Jianfeng Bao,
Taishan Kang,
Jianzhong Lin,
Shuhui Cai,
Zhong Chen,
Congbo Cai
Abstract:
Objective: Most deep neural network-based diffusion tensor imaging methods require the diffusion gradients' number and directions in the data to be reconstructed to match those in the training data. This work aims to develop and evaluate a novel dynamic-convolution-based method called FlexDTI for highly efficient diffusion tensor reconstruction with flexible diffusion encoding gradient scheme. Approach: FlexDTI was developed to achieve high-quality DTI parametric mapping with flexible number and directions of diffusion encoding gradients. The method used dynamic convolution kernels to embed diffusion gradient direction information into feature maps of the corresponding diffusion signal. Furthermore, it realized the generalization of a flexible number of diffusion gradient directions by setting the maximum number of input channels of the network. The network was trained and tested using datasets from the Human Connectome Project and local hospitals. Results from FlexDTI and other advanced tensor parameter estimation methods were compared. Main results: Compared to other methods, FlexDTI successfully achieves high-quality diffusion tensor-derived parameters even if the number and directions of diffusion encoding gradients change. It reduces normalized root mean squared error (NRMSE) by about 50% on fractional anisotropy (FA) and 15% on mean diffusivity (MD), compared with the state-of-the-art deep learning method with flexible diffusion encoding gradient scheme. Significance: FlexDTI can effectively learn diffusion gradient direction information to achieve generalized DTI reconstruction with flexible diffusion gradient scheme. Both flexibility and reconstruction quality can be taken into account in this network.
Submitted 21 December, 2023; v1 submitted 2 August, 2023;
originally announced August 2023.
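The two tensor-derived parameters named in the abstract above, fractional anisotropy (FA) and mean diffusivity (MD), have standard closed forms in terms of the diffusion tensor eigenvalues. The sketch below computes them from three eigenvalues; it shows the textbook definitions only and does not reproduce any part of the FlexDTI network.

```python
import math


def mean_diffusivity(eigenvalues):
    """MD: the average of the three diffusion tensor eigenvalues."""
    return sum(eigenvalues) / 3.0


def fractional_anisotropy(eigenvalues):
    """FA = sqrt(3/2) * ||lambda - MD|| / ||lambda||, ranging from 0 to 1."""
    md = mean_diffusivity(eigenvalues)
    numerator = sum((lam - md) ** 2 for lam in eigenvalues)
    denominator = sum(lam * lam for lam in eigenvalues)
    if denominator == 0.0:
        return 0.0  # a zero tensor has no preferred direction
    return math.sqrt(1.5 * numerator / denominator)


if __name__ == "__main__":
    # Hypothetical eigenvalues in units of 1e-3 mm^2/s.
    eigs = (1.7, 0.3, 0.2)
    print(mean_diffusivity(eigs), fractional_anisotropy(eigs))
```

An isotropic tensor gives FA = 0, and a tensor with a single non-zero eigenvalue gives FA = 1.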
-
Rapid and Scalable Bayesian AB Testing
Authors:
Srivas Chennu,
Andrew Maher,
Christian Pangerl,
Subash Prabanantham,
Jae Hyeon Bae,
Jamie Martin,
Bud Goswami
Abstract:
AB testing aids business operators with their decision making, and is considered the gold standard method for learning from data to improve digital user experiences. However, there is usually a gap between the requirements of practitioners, and the constraints imposed by the statistical hypothesis testing methodologies commonly used for analysis of AB tests. These include the lack of statistical power in multivariate designs with many factors, correlations between these factors, the need for sequential testing for early stopping, and the inability to pool knowledge from past tests. Here, we propose a solution that applies hierarchical Bayesian estimation to address the above limitations. In comparison to current sequential AB testing methodology, we increase statistical power by exploiting correlations between factors, enabling sequential testing and progressive early stopping, without incurring excessive false positive risk. We also demonstrate how this methodology can be extended to enable the extraction of composite global learnings from past AB tests, to accelerate future tests. We underpin our work with a solid theoretical framework that articulates the value of hierarchical estimation. We demonstrate its utility using both numerical simulations and a large set of real-world AB tests. Together, these results highlight the practical value of our approach for statistical inference in the technology industry.
Submitted 27 July, 2023;
originally announced July 2023.
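For readers unfamiliar with Bayesian AB testing in general, the following sketch shows the simplest non-hierarchical version: a Beta-Binomial model with uniform priors and a Monte Carlo estimate of the probability that variant B converts better than variant A. It is deliberately much simpler than the hierarchical estimator described in the abstract above, and the function name and example counts are hypothetical.

```python
import random


def prob_b_beats_a(conv_a, n_a, conv_b, n_b, draws=20000, seed=0):
    """Estimate P(rate_B > rate_A) under independent Beta(1, 1) priors.

    conv_*: number of conversions observed in each arm
    n_*:    number of users exposed in each arm
    """
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        # Posterior of each conversion rate is Beta(1 + successes, 1 + failures).
        rate_a = rng.betavariate(1 + conv_a, 1 + n_a - conv_a)
        rate_b = rng.betavariate(1 + conv_b, 1 + n_b - conv_b)
        if rate_b > rate_a:
            wins += 1
    return wins / draws


if __name__ == "__main__":
    # Hypothetical counts: 100/1000 conversions for A, 150/1000 for B.
    print(prob_b_beats_a(100, 1000, 150, 1000))
```

This probability can be recomputed as data arrive, but the plain model offers no control over the false positive risk from repeated looks, which is one of the limitations the abstract sets out to address.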
-
Tentative co-orbital submillimeter emission within the Lagrangian region L5 of the protoplanet PDS 70 b
Authors:
Olga Balsalobre-Ruza,
Itziar de Gregorio-Monsalvo,
Jorge Lillo-Box,
Nuria Huélamo,
Álvaro Ribas,
Myriam Benisty,
Jaehan Bae,
Stefano Facchini,
Richard Teague
Abstract:
Context: High-spatial resolution Atacama Large Millimeter/submillimeter Array (ALMA) data have revealed a plethora of substructures in protoplanetary disks. Some of those features are thought to trace the formation of embedded planets. One example is the gas and dust that accumulated in the co-orbital Lagrangian regions $L_4$/$L_5$, which were tentatively detected in recent years and might be the pristine material for the formation of Trojan bodies. Aims: This work is part of the TROY project, whose ultimate goal is to find robust evidence of exotrojan bodies and study their implications in the exoplanet field. Here, we focus on the early stages of the formation of these bodies by inspecting the iconic system PDS 70, the only confirmed planetary system in formation. Methods: We reanalyzed archival high-angular resolution Band 7 ALMA observations from PDS 70 by doing an independent imaging process to look for emission in the Lagrangian regions of the two detected gas giant protoplanets, PDS 70 b and c. We then projected the orbital paths and visually inspected emission features at the regions around the $L_4$/$L_5$ locations as defined by $\pm$ 60$^{\circ}$ in azimuth from the planet position. Results: We found emission at a $\sim$4-$σ$ level ($\sim$6-$σ$ when correcting from a cleaning effect) at the position of the $L_{5}$ region of PDS 70 b. This emission corresponds to a dust mass in a range of 0.03- 2 M$_{Moon}$, which potentially accumulated in this gravitational well. Conclusions: The tentative detection of the co-orbital dust trap that we report requires additional observations to be confirmed. We predict that we could detect the co-orbital motion of PDS 70 b and the dust presumably associated with $L_5$ by observing again with the same sensitivity and angular resolution as early as February 2026.
Submitted 24 July, 2023;
originally announced July 2023.
-
AltFreezing for More General Video Face Forgery Detection
Authors:
Zhendong Wang,
Jianmin Bao,
Wengang Zhou,
Weilun Wang,
Houqiang Li
Abstract:
Existing face forgery detection models try to discriminate fake images by detecting only spatial artifacts (e.g., generative artifacts, blending) or mainly temporal artifacts (e.g., flickering, discontinuity). They may experience significant performance degradation when facing out-domain artifacts. In this paper, we propose to capture both spatial and temporal artifacts in one model for face forgery detection. A simple idea is to leverage a spatiotemporal model (3D ConvNet). However, we find that it may easily rely on one type of artifact and ignore the other. To address this issue, we present a novel training strategy called AltFreezing for more general face forgery detection. The AltFreezing aims to encourage the model to detect both spatial and temporal artifacts. It divides the weights of a spatiotemporal network into two groups: spatial-related and temporal-related. Then the two groups of weights are alternately frozen during the training process so that the model can learn spatial and temporal features to distinguish real or fake videos. Furthermore, we introduce various video-level data augmentation methods to improve the generalization capability of the forgery detection model. Extensive experiments show that our framework outperforms existing methods in terms of generalization to unseen manipulations and datasets. Code is available at https://github.com/ZhendongWang6/AltFreezing.
Submitted 17 July, 2023;
originally announced July 2023.
-
Idealizing Tauc Plot for Accurate Bandgap Determination of Semiconductor with UV-Vis: A Case Study for Cubic Boron Arsenide
Authors:
Hong Zhong,
Fengjiao Pan,
Shuai Yue,
Chengzhen Qin,
Viktor Hadjiev,
Fei Tian,
Xinfeng Liu,
Feng Lin,
Zhiming Wang,
Zhifeng Ren,
Jiming Bao
Abstract:
The Tauc plot method is widely used to determine the bandgap of semiconductors via UV-visible optical spectroscopy due to its simplicity and perceived accuracy. However, the actual Tauc plot often exhibits significant baseline absorption below the expected bandgap, leading to discrepancies in the calculated bandgap depending on whether the linear fit is extrapolated to zero or non-zero baseline. In this study, we show that both extrapolation methods can produce significant errors by simulating Tauc plots with varying levels of baseline absorption. To address this issue, we propose a new method that involves idealizing the absorption spectrum by removing its baseline before constructing the Tauc plot. Experimental verification of this method using a gallium phosphide (GaP) wafer with intentionally introduced baseline absorptions shows promising results. Furthermore, we apply this new method to cubic boron arsenide (c-BAs) and resolve discrepancies in c-BAs bandgap values reported by different groups, obtaining a converging bandgap of 1.835 eV based on both previous and new transmission spectra. The method is applicable to both indirect and direct bandgap semiconductors, regardless of whether the absorption spectrum is measured via transmission or diffuse reflectance, and will become essential for obtaining accurate values of their bandgaps.
Submitted 12 June, 2023;
originally announced July 2023.
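The baseline-removal idea in the abstract above can be tried on synthetic data. The sketch below assumes a constant baseline estimated from a sub-gap window, an indirect-gap Tauc exponent of 1/2 (use 2 for a direct gap), and an ordinary least-squares line extrapolated to zero. These choices and all function names are illustrative assumptions, not the authors' exact procedure.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def tauc_bandgap(energies, alphas, fit_window, baseline=0.0, exponent=0.5):
    """Extrapolate the linear part of (alpha * E)**exponent to zero.

    baseline is subtracted from alpha before the Tauc variable is built.
    """
    low, high = fit_window
    xs, ys = [], []
    for energy, alpha in zip(energies, alphas):
        if low <= energy <= high:
            xs.append(energy)
            ys.append((max(alpha - baseline, 0.0) * energy) ** exponent)
    slope, intercept = fit_line(xs, ys)
    return -intercept / slope  # energy at which the fitted line crosses zero


def synthetic_indirect_spectrum(eg, baseline, n=151, start=1.5, step=0.01):
    """Hypothetical test spectrum: ideal indirect-gap absorption plus an offset."""
    energies = [start + step * i for i in range(n)]
    alphas = [
        baseline + ((e - eg) ** 2 / e if e > eg else 0.0) for e in energies
    ]
    return energies, alphas


if __name__ == "__main__":
    energies, alphas = synthetic_indirect_spectrum(eg=2.0, baseline=0.5)
    sub_gap = [a for e, a in zip(energies, alphas) if e < 1.8]
    baseline = sum(sub_gap) / len(sub_gap)
    print(tauc_bandgap(energies, alphas, (2.2, 2.6), baseline=baseline))
    print(tauc_bandgap(energies, alphas, (2.2, 2.6)))  # no baseline removal
```

On this synthetic spectrum the baseline-corrected fit recovers the input gap, while the uncorrected fit extrapolated to zero lands far from it, matching the discrepancy the abstract describes.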
-
Blockwise Key Distillation in Satellite-based Quantum Key Distribution
Authors:
Minu J. Bae,
Nitish K. Panigrahy,
Prajit Dhara,
Walter O. Krawec,
Alexander Russell,
Don Towsley,
Bing Wang
Abstract:
Free-space satellite communication has significantly lower photon loss than terrestrial communication via optical fibers. Satellite-based quantum key distribution (QKD) leverages this advantage and provides a promising direction in achieving long-distance inter-continental QKD. Satellite channels, however, can be highly dynamic due to various environmental factors and time-of-the-day effects, leading to heterogeneous noises over time. In this paper, we compare two key distillation techniques for satellite-based QKD. One is the traditional {\em non-blockwise} strategy that treats all the signals as a whole; the other is a {\em blockwise} strategy that divides the signals into individual blocks that have similar noise characteristics and processes them independently. Through extensive simulation in a wide range of settings, we show trends in optimal parameter choices and when one strategy provides better key generation rates than the other. Our results show that the blockwise strategy can lead to up to $5\%$ key rate improvement (leading to on average $1.9\times10^{7}$ more key bits per day) when considering two types of blocks, i.e., for nighttime and daytime, respectively. The blockwise strategy only requires changes in the classical post-processing stage of QKD and can be easily deployed in existing satellite systems.
Submitted 9 July, 2023;
originally announced July 2023.
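Why processing blocks separately can help, as the abstract above reports, can be seen in a toy asymptotic calculation. The sketch below uses the standard asymptotic BB84 key rate $1 - 2h(Q)$, where $h$ is the binary entropy and $Q$ the error rate, and compares pooling all signals against treating each block on its own. The paper's actual finite-key analysis, protocol details, and noise values are not reproduced here; the block sizes and error rates are hypothetical.

```python
import math


def binary_entropy(q):
    """Binary Shannon entropy h(q) in bits."""
    if q <= 0.0 or q >= 1.0:
        return 0.0
    return -q * math.log2(q) - (1.0 - q) * math.log2(1.0 - q)


def bb84_rate(qber):
    """Asymptotic BB84 secret-key fraction, clamped at zero."""
    return max(0.0, 1.0 - 2.0 * binary_entropy(qber))


def key_bits(blocks, blockwise):
    """Total key bits for blocks given as (number_of_signals, qber) pairs."""
    if blockwise:
        # Each block is distilled on its own with its own error rate.
        return sum(n * bb84_rate(q) for n, q in blocks)
    # Non-blockwise: all signals are pooled and share the average error rate.
    total = sum(n for n, _ in blocks)
    average_qber = sum(n * q for n, q in blocks) / total
    return total * bb84_rate(average_qber)


if __name__ == "__main__":
    # Hypothetical nighttime (low-noise) and daytime (high-noise) blocks.
    blocks = [(1_000_000, 0.02), (1_000_000, 0.06)]
    print(key_bits(blocks, blockwise=True), key_bits(blocks, blockwise=False))
```

Because the binary entropy is concave, the pooled rate can never exceed the blockwise total in this asymptotic picture; finite-size effects, which the paper studies, can change when the blockwise strategy pays off.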
-
Constraining the gas distribution in the PDS 70 disk as a method to assess the effect of planet-disk interactions
Authors:
B. Portilla-Revelo,
I. Kamp,
S. Facchini,
E. F. van Dishoeck,
C. Law,
Ch. Rab,
J. Bae,
M. Benisty,
K. Öberg,
R. Teague
Abstract:
Embedded planets are potentially the cause of substructures like gaps and cavities observed in several protoplanetary disks. Thus, the substructures observed in the continuum and in line emission encode information about the presence of planets in the system and how they interact with the natal disk. The pre-transitional disk around the star PDS 70 is the first case of two young planets imaged within a dust-depleted gap that was likely carved by the planets themselves. We aim to determine the spatial distribution of the gas and dust components in the PDS 70 disk. The axisymmetric substructures observed in the resulting profiles are interpreted in the context of planet-disk interactions. We develop a thermo-chemical forward model for an axisymmetric disk to explain a subset of the Atacama Large Millimeter/Submillimeter Array (ALMA) band 6 observations of three CO isotopologues plus the continuum towards PDS 70. Combining the inferred gas and dust distributions, the model results in a variable gas-to-dust ratio profile throughout the disk that spans two orders of magnitude within the first $130$ au and shows a step gradient towards the outer disk, which is consistent with the presence of a pressure maximum driven by planet-disk interactions. We find a gas density drop factor of ${\sim} 19$ at the location of the planet PDS 70 c with respect to the peak gas density at $75$ au. Combining this value with literature results on the hydrodynamics of planet-disk interactions, we find this gas gap depth to be consistent with independent planet mass estimates from infrared observations. Our findings point towards gas stirring processes taking place in the common gap due to the gravitational perturbation of both planets.
Submitted 29 June, 2023;
originally announced June 2023.
-
BMAD: Benchmarks for Medical Anomaly Detection
Authors:
Jinan Bao,
Hanshi Sun,
Hanqiu Deng,
Yinsheng He,
Zhaoxiang Zhang,
Xingyu Li
Abstract:
Anomaly detection (AD) is a fundamental research problem in machine learning and computer vision, with practical applications in industrial inspection, video surveillance, and medical diagnosis. In medical imaging, AD is especially vital for detecting and diagnosing anomalies that may indicate rare diseases or conditions. However, there is a lack of a universal and fair benchmark for evaluating AD methods on medical images, which hinders the development of more generalized and robust AD methods in this specific domain. To bridge this gap, we introduce a comprehensive evaluation benchmark for assessing anomaly detection methods on medical images. This benchmark encompasses six reorganized datasets from five medical domains (i.e. brain MRI, liver CT, retinal OCT, chest X-ray, and digital histopathology) and three key evaluation metrics, and includes a total of fourteen state-of-the-art AD algorithms. This standardized and well-curated medical benchmark with the well-structured codebase enables comprehensive comparisons among recently proposed anomaly detection methods. It will facilitate the community to conduct a fair comparison and advance the field of AD on medical imaging. More information on BMAD is available in our GitHub repository: https://github.com/DorisBao/BMAD
Submitted 27 April, 2024; v1 submitted 20 June, 2023;
originally announced June 2023.
-
Magnetically-activated accretion outbursts of pre-main sequence discs
Authors:
Jacob Cleaver,
Lee Hartmann,
Jaehan Bae
Abstract:
We investigate whether triggering of the magnetorotational instability (MRI) in protoplanetary discs can account for the wide diversity of observed accretion outbursts. We show that short-lived, relatively low accretion rate events probably result from triggering in the inner disc and can occur at low surface densities, comparable to or smaller than the minimum mass solar nebula, and thus are very unlikely to result from MRI triggering by gravitational instability. We develop time-dependent accretion disc models using an $α$-viscosity approach and calculate light curves to compare with observations. Our modeling indicates that the lag time between infrared and optical bursts seen in Gaia 17bpi can be explained with an outside-in propagation with an $α\sim 0.1$ in the MRI-active region, consistent with other estimates. While outbursts in inner discs can show time delays of a few years between infrared and optical light curves, our models indicate that large, FU Ori-like bursts can exhibit infrared precursors decades before optical bursts. Detecting such precursors could enable analysis of the central star before it is overwhelmed by the rapidly accreting material, as well as constraining outburst physics. Our results emphasize the importance of near-infrared monitoring of young stellar objects in addition to optical surveys. In addition, our findings emphasize the need for more sophisticated, three-dimensional, non-ideal magnetohydrodynamic simulations to fully exploit observational results.
Submitted 19 June, 2023;
originally announced June 2023.
-
AUGUST: an Automatic Generation Understudy for Synthesizing Conversational Recommendation Datasets
Authors:
Yu Lu,
Junwei Bao,
Zichen Ma,
Xiaoguang Han,
Youzheng Wu,
Shuguang Cui,
Xiaodong He
Abstract:
High-quality data is essential for conversational recommendation systems and serves as the cornerstone of network architecture development and training strategy design. Existing works invest heavy human effort in manually labeling data or in designing and extending recommender dialogue templates. However, they suffer from two limitations: (i) the limited number of human annotators means that datasets can hardly capture rich and large-scale cases in the real world, and (ii) the limited experience and knowledge of annotators lead to an uninformative corpus and inappropriate recommendations. In this paper, we propose a novel automatic dataset synthesis approach that can generate both large-scale and high-quality recommendation dialogues through a data2text generation process, where unstructured recommendation conversations are generated from structured graphs based on user-item information from the real world. In doing so, we comprehensively exploit: (i) rich personalized user profiles from traditional recommendation datasets, (ii) rich external knowledge from knowledge graphs, and (iii) the conversation ability contained in human-to-human conversational recommendation datasets. Extensive experiments validate the benefit brought by the automatically synthesized data under low-resource scenarios and demonstrate the promising potential to facilitate the development of a more effective conversational recommendation system.
Submitted 16 June, 2023;
originally announced June 2023.
-
Benchmarking Neural Network Training Algorithms
Authors:
George E. Dahl,
Frank Schneider,
Zachary Nado,
Naman Agarwal,
Chandramouli Shama Sastry,
Philipp Hennig,
Sourabh Medapati,
Runa Eschenhagen,
Priya Kasimbeg,
Daniel Suo,
Juhan Bae,
Justin Gilmer,
Abel L. Peirson,
Bilal Khan,
Rohan Anil,
Mike Rabbat,
Shankar Krishnan,
Daniel Snider,
Ehsan Amid,
Kongtao Chen,
Chris J. Maddison,
Rakshith Vasudev,
Michal Badura,
Ankush Garg,
Peter Mattson
Abstract:
Training algorithms, broadly construed, are an essential part of every deep learning pipeline. Training algorithm improvements that speed up training across a wide variety of workloads (e.g., better update rules, tuning protocols, learning rate schedules, or data selection schemes) could save time, save computational resources, and lead to better, more accurate, models. Unfortunately, as a community, we are currently unable to reliably identify training algorithm improvements, or even determine the state-of-the-art training algorithm. In this work, using concrete experiments, we argue that real progress in speeding up training requires new benchmarks that resolve three basic challenges faced by empirical comparisons of training algorithms: (1) how to decide when training is complete and precisely measure training time, (2) how to handle the sensitivity of measurements to exact workload details, and (3) how to fairly compare algorithms that require hyperparameter tuning. In order to address these challenges, we introduce a new, competitive, time-to-result benchmark using multiple workloads running on fixed hardware, the AlgoPerf: Training Algorithms benchmark. Our benchmark includes a set of workload variants that make it possible to detect benchmark submissions that are more robust to workload changes than current widely-used methods. Finally, we evaluate baseline submissions constructed using various optimizers that represent current practice, as well as other optimizers that have recently received attention in the literature. These baseline results collectively demonstrate the feasibility of our benchmark, show that non-trivial gaps between methods exist, and set a provisional state-of-the-art for future benchmark submissions to try and surpass.
Submitted 12 June, 2023;
originally announced June 2023.
-
Triple spiral arms of a triple protostar system imaged in molecular lines
Authors:
Jeong-Eun Lee,
Tomoaki Matsumoto,
Hyun-Jeong Kim,
Seokho Lee,
Daniel Harsono,
Jaehan Bae,
Neal J. Evans II,
Shu-ichiro Inutsuka,
Minho Choi,
Ken'ichi Tatematsu,
Jae-Joon Lee,
Dan Jaffe
Abstract:
Most stars form in multiple star systems. For a better understanding of their formation processes, it is important to resolve the individual protostellar components and the surrounding envelope and disk material at the earliest possible formation epoch because the formation history can be lost in a few orbital timescales. Here we present the ALMA observational results of a young multiple protostellar system, IRAS 04239+2436, where three well-developed large spiral arms were detected in the shocked SO emission. Along the most conspicuous arm, the accretion streamer was also detected in the SO$_2$ emission. The observational results are complemented by numerical magneto-hydrodynamic simulations, where those large arms only appear in magnetically weakened clouds. The numerical simulations also suggest that the large triple spiral arms are the result of gravitational interactions between compact triple protostars and the turbulent infalling envelope.
Submitted 10 June, 2023;
originally announced June 2023.
-
HQ-50K: A Large-scale, High-quality Dataset for Image Restoration
Authors:
Qinhong Yang,
Dongdong Chen,
Zhentao Tan,
Qiankun Liu,
Qi Chu,
Jianmin Bao,
Lu Yuan,
Gang Hua,
Nenghai Yu
Abstract:
This paper introduces a new large-scale image restoration dataset, called HQ-50K, which contains 50,000 high-quality images with rich texture details and semantic diversity. We analyze existing image restoration datasets from five different perspectives, including data scale, resolution, compression rates, texture details, and semantic coverage. However, we find that all of these datasets are deficient in some aspects. In contrast, HQ-50K considers all of these five aspects during the data curation process and meets all requirements. We also present a new Degradation-Aware Mixture of Expert (DAMoE) model, which enables a single model to handle multiple corruption types and unknown levels. Our extensive experiments demonstrate that HQ-50K consistently improves the performance on various image restoration tasks, such as super-resolution, denoising, dejpeg, and deraining. Furthermore, our proposed DAMoE, trained on our HQ-50K, outperforms existing state-of-the-art unified models designed for multiple restoration tasks and levels. The dataset and code are available at https://github.com/littleYaang/HQ-50K.
Submitted 8 June, 2023;
originally announced June 2023.
-
Designing a Better Asymmetric VQGAN for StableDiffusion
Authors:
Zixin Zhu,
Xuelu Feng,
Dongdong Chen,
Jianmin Bao,
Le Wang,
Yinpeng Chen,
Lu Yuan,
Gang Hua
Abstract:
StableDiffusion is a revolutionary text-to-image generator that is causing a stir in the world of image generation and editing. Unlike traditional methods that learn a diffusion model in pixel space, StableDiffusion learns a diffusion model in the latent space via a VQGAN, ensuring both efficiency and quality. It not only supports image generation tasks, but also enables image editing for real images, such as image inpainting and local editing. However, we have observed that the vanilla VQGAN used in StableDiffusion leads to significant information loss, causing distortion artifacts even in non-edited image regions. To this end, we propose a new asymmetric VQGAN with two simple designs. Firstly, in addition to the input from the encoder, the decoder contains a conditional branch that incorporates information from task-specific priors, such as the unmasked image region in inpainting. Secondly, the decoder is much heavier than the encoder, allowing for more detailed recovery while only slightly increasing the total inference cost. The training cost of our asymmetric VQGAN is cheap, and we only need to retrain a new asymmetric decoder while keeping the vanilla VQGAN encoder and StableDiffusion unchanged. Our asymmetric VQGAN can be widely used in StableDiffusion-based inpainting and local editing methods. Extensive experiments demonstrate that it can significantly improve the inpainting and editing performance, while maintaining the original text-to-image capability. The code is available at https://github.com/buxiangzhiren/Asymmetric_VQGAN.
Submitted 7 June, 2023;
originally announced June 2023.
-
Using Cosmic Ray Muons to Assess Geological Characteristics in the Subsurface
Authors:
Harish R Gadey,
Robert Howard,
Stefano C Tognini,
Jennifer L Meszaros,
Rose A Montgomery,
Stylianos Chatzidakis,
JungHyun Bae,
Robert Clark
Abstract:
Cosmic rays are energetic nuclei and elementary particles that originate from stars and intergalactic events. The interaction of these particles with the upper atmosphere produces a range of secondary particles that reach the surface of the earth, of which muons are the most prominent. With enough energy, muons can travel up to a few kilometers beneath the surface of the earth before being stopped completely. The terrestrial muon flux profile and associated zenith angle can be utilized to determine geological characteristics of a location without having to use conventional methods. This work intends to use a low-power plastic scintillator-based muon detection system for this non-destructive geological assay methodology. Four custom-designed plastic scintillation panels are used to realize two orthogonal detection planes. Simultaneous triggers between detectors from the two planes indicate a coincidence event, which is recorded using a data acquisition system from FNAL.
In order to quantify the systematic uncertainties associated with the detector, such as energy depositions and angular resolution of the detector design, a Monte Carlo simulation using Geant4 is being developed. Simulated and experimental data will drive the development and validation of a reconstruction algorithm that, upon completion, is expected to predict average overburden and rock density. Extended detector exposure to muons can be used as a means to understand changes in the surrounding environment like rock porosity. On the experimental front, the measured flux data will be used to benchmark independent and established models. Successful proof-of-concept demonstration of this technology can open doors for long term non-invasive geological monitoring. The detector design, and experimental methodology are detailed in this work.
Submitted 4 June, 2023;
originally announced June 2023.
-
STEVE-1: A Generative Model for Text-to-Behavior in Minecraft
Authors:
Shalev Lifshitz,
Keiran Paster,
Harris Chan,
Jimmy Ba,
Sheila McIlraith
Abstract:
Constructing AI models that respond to text instructions is challenging, especially for sequential decision-making tasks. This work introduces a methodology, inspired by unCLIP, for instruction-tuning generative models of behavior without relying on a large dataset of instruction-labeled trajectories. Using this methodology, we create an instruction-tuned Video Pretraining (VPT) model called STEVE-1, which can follow short-horizon open-ended text and visual instructions in Minecraft. STEVE-1 is trained in two steps: adapting the pretrained VPT model to follow commands in MineCLIP's latent space, then training a prior to predict latent codes from text. This allows us to finetune VPT through self-supervised behavioral cloning and hindsight relabeling, reducing the need for costly human text annotations, and all for only $60 of compute. By leveraging pretrained models like VPT and MineCLIP and employing best practices from text-conditioned image generation, STEVE-1 sets a new bar for open-ended instruction-following in Minecraft with low-level controls (mouse and keyboard) and raw pixel inputs, far outperforming previous baselines and robustly completing 12 of 13 tasks in our early-game evaluation suite. We provide experimental evidence highlighting key factors for downstream performance, including pretraining, classifier-free guidance, and data scaling. All resources, including our model weights, training scripts, and evaluation tools are made available for further research.
Submitted 3 February, 2024; v1 submitted 1 June, 2023;
originally announced June 2023.
-
CERT: Finding Performance Issues in Database Systems Through the Lens of Cardinality Estimation
Authors:
Jinsheng Ba,
Manuel Rigger
Abstract:
Database Management Systems (DBMSs) process a given query by creating a query plan, which is subsequently executed, to compute the query's result. Deriving an efficient query plan is challenging, and both academia and industry have invested decades into researching query optimization. Despite this, DBMSs are prone to performance issues, where a DBMS produces an unexpectedly inefficient query plan that might lead to the slow execution of a query. Finding such issues is a longstanding problem and inherently difficult, because no ground truth information on an expected execution time exists. In this work, we propose Cardinality Estimation Restriction Testing (CERT), a novel technique that finds performance issues through the lens of cardinality estimation. Given a query on a database, CERT derives a more restrictive query (e.g., by replacing a LEFT JOIN with an INNER JOIN), whose estimated number of rows should not exceed the estimated number of rows for the original query. CERT tests cardinality estimation specifically, because it was shown to be the most important part of query optimization; thus, we expect that finding and fixing such issues might result in the highest performance gains. In addition, we found that other kinds of query optimization issues can be exposed by unexpected estimated cardinalities, which can also be found by CERT. CERT is a black-box technique that does not require access to the source code; DBMSs expose query plans via the EXPLAIN statement. CERT eschews executing queries, which is costly and prone to performance fluctuations. We evaluated CERT on three widely used and mature DBMSs, MySQL, TiDB, and CockroachDB. CERT found 13 unique issues, of which 2 issues were fixed and 9 confirmed by the developers. We expect that this new angle on finding performance bugs will help DBMS developers in improving DBMSs' performance.
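The restriction property described above can be sketched as a simple check. This is a minimal illustration, not the authors' tool: `estimate_rows` is a hypothetical callable that would, in practice, run EXPLAIN on the target DBMS and extract the estimated row count, and the queries and numbers below are made up.

```python
from typing import Callable

def cert_holds(estimate_rows: Callable[[str], float],
               original: str, restricted: str) -> bool:
    """Return True when the more restrictive query is not estimated
    to produce more rows than the original query."""
    return estimate_rows(restricted) <= estimate_rows(original)

# Hypothetical stand-in for parsing a DBMS's EXPLAIN output.
fake_estimates = {
    "SELECT * FROM t0 LEFT JOIN t1 ON t0.c0 = t1.c0": 100.0,
    "SELECT * FROM t0 INNER JOIN t1 ON t0.c0 = t1.c0": 250.0,
}

original = "SELECT * FROM t0 LEFT JOIN t1 ON t0.c0 = t1.c0"
restricted = "SELECT * FROM t0 INNER JOIN t1 ON t0.c0 = t1.c0"

# The INNER JOIN is estimated to return more rows than the LEFT JOIN,
# so this pair would be flagged as a potential issue.
suspicious = not cert_holds(fake_estimates.__getitem__, original, restricted)
print(suspicious)
```

Because only estimates are compared, no query is executed, which matches the black-box setting the abstract describes.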
Submitted 9 January, 2024; v1 submitted 1 June, 2023;
originally announced June 2023.
-
Interior derivative estimates and Bernstein theorem for Hessian quotient equations
Authors:
Limei Dai,
Jiguang Bao,
Bo Wang
Abstract:
In this paper, we obtain the interior derivative estimates of solutions for elliptic and parabolic Hessian quotient equations. Then we establish the Bernstein theorem for parabolic Hessian quotient equations, that is, any parabolically convex solution $u=u(x,t)\in C^{4,2}(\mathbb{R}^n\times (-\infty,0])$ of $-u_t\frac{S_n(D^2u)}{S_l(D^2u)}=1$ in $\mathbb{R}^n\times (-\infty,0]$ must be of the form $u=-mt+P(x)$, with $m>0$ being a constant and $P$ being a convex quadratic polynomial.
Submitted 28 May, 2023;
originally announced May 2023.
-
Uni-ControlNet: All-in-One Control to Text-to-Image Diffusion Models
Authors:
Shihao Zhao,
Dongdong Chen,
Yen-Chun Chen,
Jianmin Bao,
Shaozhe Hao,
Lu Yuan,
Kwan-Yee K. Wong
Abstract:
Text-to-Image diffusion models have made tremendous progress over the past two years, enabling the generation of highly realistic images based on open-domain text descriptions. However, despite their success, text descriptions often struggle to adequately convey detailed controls, even when composed of long and complex texts. Moreover, recent studies have also shown that these models face challenges in understanding such complex texts and generating the corresponding images. Therefore, there is a growing need to enable more control modes beyond text description. In this paper, we introduce Uni-ControlNet, a unified framework that allows for the simultaneous utilization of different local controls (e.g., edge maps, depth maps, segmentation masks) and global controls (e.g., CLIP image embeddings) in a flexible and composable manner within one single model. Unlike existing methods, Uni-ControlNet only requires the fine-tuning of two additional adapters upon frozen pre-trained text-to-image diffusion models, eliminating the huge cost of training from scratch. Moreover, thanks to some dedicated adapter designs, Uni-ControlNet only necessitates a constant number (i.e., 2) of adapters, regardless of the number of local or global controls used. This not only reduces the fine-tuning costs and model size, making it more suitable for real-world deployment, but also facilitates the composability of different conditions. Through both quantitative and qualitative comparisons, Uni-ControlNet demonstrates its superiority over existing methods in terms of controllability, generation quality and composability. Code is available at https://github.com/ShihaoZhaoZSH/Uni-ControlNet.
Submitted 29 October, 2023; v1 submitted 25 May, 2023;
originally announced May 2023.
-
Training on Thin Air: Improve Image Classification with Generated Data
Authors:
Yongchao Zhou,
Hshmat Sahak,
Jimmy Ba
Abstract:
Acquiring high-quality data for training discriminative models is a crucial yet challenging aspect of building effective predictive systems. In this paper, we present Diffusion Inversion, a simple yet effective method that leverages the pre-trained generative model, Stable Diffusion, to generate diverse, high-quality training data for image classification. Our approach captures the original data distribution and ensures data coverage by inverting images to the latent space of Stable Diffusion, and generates diverse novel training images by conditioning the generative model on noisy versions of these vectors. We identify three key components that allow our generated images to successfully supplant the original dataset, leading to a 2-3x enhancement in sample complexity and a 6.5x decrease in sampling time. Moreover, our approach consistently outperforms generic prompt-based steering methods and KNN retrieval baseline across a wide range of datasets. Additionally, we demonstrate the compatibility of our approach with widely-used data augmentation techniques, as well as the reliability of the generated data in supporting various neural architectures and enhancing few-shot learning.
Submitted 24 May, 2023;
originally announced May 2023.
-
AlpacaFarm: A Simulation Framework for Methods that Learn from Human Feedback
Authors:
Yann Dubois,
Xuechen Li,
Rohan Taori,
Tianyi Zhang,
Ishaan Gulrajani,
Jimmy Ba,
Carlos Guestrin,
Percy Liang,
Tatsunori B. Hashimoto
Abstract:
Large language models (LLMs) such as ChatGPT have seen widespread adoption due to their strong instruction-following abilities. Developing these LLMs involves a complex yet poorly understood workflow requiring training with human feedback. Replicating and understanding this instruction-following requires tackling three major challenges: the high cost of data collection, the lack of trustworthy evaluation, and the absence of reference method implementations. We address these challenges with AlpacaFarm, a simulator that enables research and development for learning from feedback at a low cost. First, we design LLM prompts to simulate human feedback that are 50x cheaper than crowdworkers and display high agreement with humans. Second, we propose an automatic evaluation and validate it against human instructions obtained on real-world interactions. Third, we contribute reference implementations for several methods (PPO, DPO, best-of-n, expert iteration, and more) that learn from pairwise feedback. Finally, as an end-to-end validation of AlpacaFarm, we train and evaluate eleven models on 10k pairs of real human feedback and show that rankings of models trained in AlpacaFarm match rankings of models trained on human data. As a demonstration of the research possible in AlpacaFarm, we find that methods that use a reward model can substantially improve over supervised fine-tuning and that our reference PPO implementation leads to a +10% improvement in win-rate against Davinci003. We release all components of AlpacaFarm at https://github.com/tatsu-lab/alpaca_farm.
Submitted 7 January, 2024; v1 submitted 22 May, 2023;
originally announced May 2023.
-
Clinical Camel: An Open Expert-Level Medical Language Model with Dialogue-Based Knowledge Encoding
Authors:
Augustin Toma,
Patrick R. Lawler,
Jimmy Ba,
Rahul G. Krishnan,
Barry B. Rubin,
Bo Wang
Abstract:
We present Clinical Camel, an open large language model (LLM) explicitly tailored for clinical research. Fine-tuned from LLaMA-2 using QLoRA, Clinical Camel achieves state-of-the-art performance across medical benchmarks among openly available medical LLMs. Leveraging efficient single-GPU training, Clinical Camel surpasses GPT-3.5 in five-shot evaluations on all assessed benchmarks, including 64.3% on the USMLE Sample Exam (compared to 58.5% for GPT-3.5), 77.9% on PubMedQA (compared to 60.2%), 60.7% on MedQA (compared to 53.6%), and 54.2% on MedMCQA (compared to 51.0%). In addition to these benchmarks, Clinical Camel demonstrates its broader capabilities, such as synthesizing plausible clinical notes. This work introduces dialogue-based knowledge encoding, a novel method to synthesize conversational data from dense medical texts. While benchmark results are encouraging, extensive and rigorous human evaluation across diverse clinical scenarios is imperative to ascertain safety before implementation. By openly sharing Clinical Camel, we hope to foster transparent and collaborative research, working towards the safe integration of LLMs within the healthcare domain. Significant challenges concerning reliability, bias, and the potential for outdated knowledge persist. Nonetheless, the transparency provided by an open approach reinforces the scientific rigor essential for future clinical applications.
Submitted 17 August, 2023; v1 submitted 19 May, 2023;
originally announced May 2023.
-
Entire solutions to the parabolic Monge--Ampère equation with unbounded nonlinear growth in time
Authors:
Ning An,
Jiguang Bao,
Zixiao Liu
Abstract:
The Liouville type theorem on the parabolic Monge--Ampère equation $-u_t\det D^2u=1$ states that any entire parabolically convex classical solution must be of the form $-t+|x|^2/2$ up to a re-scaling and transformation, under the additional assumption that the partial derivative with respect to the time variable, $u_t$, is strictly negative and bounded. In this paper, we study the case when $u_t$ is unbounded, prove the existence of an entire parabolically convex smooth solution, and investigate its asymptotic behavior near infinity.
Submitted 14 May, 2023;
originally announced May 2023.
-
Global simulations of kinetic-magnetohydrodynamic processes with energetic electrons in tokamak plasmas
Authors:
Jian Bao,
Wenlu Zhang,
Ding Li,
Zhihong Lin,
Zhiyong Qiu,
Wei Chen,
Xiang Zhu,
Junyi Cheng,
Chao Dong,
Jintao Cao
Abstract:
The energetic electrons (EEs) generated through auxiliary heating have been found to destabilize various Alfven eigenmodes (AEs) in recent experiments, which in turn lead to EE transport and degrade the plasma energy confinement. In this work, we propose a global fluid-kinetic hybrid model for studying the corresponding kinetic-magnetohydrodynamic (MHD) processes by coupling the drift-kinetic EEs to the Landau-fluid model of bulk plasmas in a non-perturbative manner. The numerical capability for Landau-fluid bulk plasmas is based on the well-benchmarked eigenvalue code MAS [Multiscale Analysis of plasma Stabilities, J. Bao et al. Nucl. Fusion accepted 2023], and the EE responses to the electromagnetic fluctuations are analytically derived; these not only contribute to the MHD interchange drive and parallel current but also lead to a new kinetic particle compression with the precessional drift resonance at leading order. The hybrid model is cast into a nonlinear eigenvalue matrix equation and solved iteratively using Newton's method. By calibrating the EE precession frequency against the particle equation of motion in general geometry and applying a more realistic trapped particle distribution in the poloidal plane, MAS simulations of EE-driven beta-induced Alfven eigenmodes (e-BAE) show excellent agreement with gyrokinetic particle-in-cell simulations, and the non-perturbative effects of EEs on the e-BAE mode structure, growth rate and damping rate are demonstrated. With these efforts, the upgraded MAS greatly improves the computational efficiency for plasma problems related to deeply-trapped EEs, and for the practical application of fast linear analysis it is superior to initial-value simulations, which are restricted by the stringent electron Courant condition.
Submitted 9 May, 2023;
originally announced May 2023.
-
Residual Prompt Tuning: Improving Prompt Tuning with Residual Reparameterization
Authors:
Anastasia Razdaibiedina,
Yuning Mao,
Rui Hou,
Madian Khabsa,
Mike Lewis,
Jimmy Ba,
Amjad Almahairi
Abstract:
Prompt tuning is one of the successful approaches for parameter-efficient tuning of pre-trained language models. Despite being arguably the most parameter-efficient (tuned soft prompts constitute <0.1% of total parameters), it typically performs worse than other efficient tuning methods and is quite sensitive to hyper-parameters. In this work, we introduce Residual Prompt Tuning - a simple and efficient method that significantly improves the performance and stability of prompt tuning. We propose to reparameterize soft prompt embeddings using a shallow network with a residual connection. Our experiments show that Residual Prompt Tuning significantly outperforms prompt tuning on the SuperGLUE benchmark. Notably, our method reaches a +7 point improvement over prompt tuning with T5-Base and allows the prompt length to be reduced by 10x without hurting performance. In addition, we show that our approach is robust to the choice of learning rate and prompt initialization, and is effective in few-shot settings.
Submitted 6 May, 2023;
originally announced May 2023.
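The reparameterization described in the Residual Prompt Tuning abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the bottleneck width, the ReLU activation, the zero initialization of the up-projection, and all names (`residual_reparam`, `w_down`, `w_up`) are assumptions made for the example, and the prompts are plain Python lists rather than trainable tensors.

```python
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def residual_reparam(prompts, w_down, w_up):
    """Pass soft prompts through a shallow bottleneck network and add a skip connection.

    Hypothetical layout: down-projection, ReLU, up-projection, then the residual sum.
    """
    hidden = [[max(v, 0.0) for v in row] for row in matmul(prompts, w_down)]
    delta = matmul(hidden, w_up)
    return [[p + d for p, d in zip(p_row, d_row)]
            for p_row, d_row in zip(prompts, delta)]

random.seed(0)
prompt_len, dim, bottleneck = 4, 8, 2  # illustrative sizes only
prompts = [[random.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(prompt_len)]
w_down = [[random.gauss(0.0, 0.1) for _ in range(bottleneck)] for _ in range(dim)]
# Assumed zero init: the network contributes nothing at first,
# so the reparameterized prompts start out equal to the raw prompts.
w_up = [[0.0] * dim for _ in range(bottleneck)]

reparam = residual_reparam(prompts, w_down, w_up)
```

With the skip connection, the shallow network only has to learn a correction to the prompt embeddings, which is one plausible reading of why the abstract reports improved stability.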
-
PMP: Learning to Physically Interact with Environments using Part-wise Motion Priors
Authors:
Jinseok Bae,
Jungdam Won,
Donggeun Lim,
Cheol-Hui Min,
Young Min Kim
Abstract:
We present a method to animate a character incorporating multiple part-wise motion priors (PMP). While previous works allow creating realistic articulated motions from reference data, the range of motion is largely limited by the available samples. Especially for interaction-rich scenarios, it is impractical to attempt acquiring every possible interacting motion, as the combination of physical parameters increases exponentially. The proposed PMP allows us to assemble multiple part skills to animate a character, creating a diverse set of motions with different combinations of existing data. In our pipeline, we can train an agent with a wide range of part-wise priors. Therefore, each body part can obtain a kinematic insight of the style from the motion captures, or at the same time extract dynamics-related information from the additional part-specific simulation. For example, we can first train a general interaction skill, e.g. grasping, only for the dexterous part, and then combine the expert trajectories from the pre-trained agent with the kinematic priors of other limbs. Eventually, our whole-body agent learns a novel physical interaction skill even in the absence of the object trajectories in the reference motion sequence.
Submitted 4 May, 2023;
originally announced May 2023.
-
Wall Modeling of Turbulent Flows with Various Pressure Gradients Using Multi-Agent Reinforcement Learning
Authors:
Di Zhou,
H. Jane Bae
Abstract:
We propose a framework for developing wall models for large-eddy simulation that is able to capture pressure-gradient effects using multi-agent reinforcement learning. Within this framework, the distributed reinforcement learning agents receive off-wall environmental states including pressure gradient and turbulence strain rate, ensuring adaptability to a wide range of flows characterized by pressure-gradient effects and separations. Based on these states, the agents determine an action to adjust the wall eddy viscosity, and consequently the wall-shear stress. The model training is in-situ with wall-modeled large-eddy simulation grid resolutions and does not rely on the instantaneous velocity fields from high-fidelity simulations. Throughout the training, the agents compute rewards from the relative error in the estimated wall-shear stress, which allows the agents to refine an optimal control policy that minimizes prediction errors. Employing this framework, wall models are trained for two distinct subgrid-scale models using low-Reynolds-number flow over periodic hills. These models are validated through simulations of flows over periodic hills at higher Reynolds numbers and flow over the Boeing Gaussian bump. The developed wall models successfully capture the acceleration and deceleration of wall-bounded turbulent flows under pressure gradients and outperform the equilibrium wall model in predicting skin friction.
Submitted 1 November, 2023; v1 submitted 4 May, 2023;
originally announced May 2023.
-
IoTFlowGenerator: Crafting Synthetic IoT Device Traffic Flows for Cyber Deception
Authors:
Joseph Bao,
Murat Kantarcioglu,
Yevgeniy Vorobeychik,
Charles Kamhoua
Abstract:
Over the years, honeypots emerged as an important security tool to understand attacker intent and deceive attackers to spend time and resources. Recently, honeypots are being deployed for Internet of things (IoT) devices to lure attackers, and learn their behavior. However, most of the existing IoT honeypots, even the high interaction ones, are easily detected by an attacker who can observe honeypot traffic due to lack of real network traffic originating from the honeypot. This implies that, to build better honeypots and enhance cyber deception capabilities, IoT honeypots need to generate realistic network traffic flows. To achieve this goal, we propose a novel deep learning based approach for generating traffic flows that mimic real network traffic due to user and IoT device interactions. A key technical challenge that our approach overcomes is scarcity of device-specific IoT traffic data to effectively train a generator. We address this challenge by leveraging a core generative adversarial learning algorithm for sequences along with domain specific knowledge common to IoT devices. Through an extensive experimental evaluation with 18 IoT devices, we demonstrate that the proposed synthetic IoT traffic generation tool significantly outperforms state of the art sequence and packet generators in remaining indistinguishable from real traffic even to an adaptive attacker.
Submitted 1 May, 2023;
originally announced May 2023.
-
A New Momentum-Integrated Muon Tomography Imaging Algorithm
Authors:
JungHyun Bae,
Rose Montgomery,
Stylianos Chatzidakis
Abstract:
For decades, the application of muon tomography to spent nuclear fuel (SNF) cask imaging has been theoretically evaluated and experimentally verified by many research groups around the world, including Los Alamos National Laboratory in the United States, Canadian Nuclear Laboratory in Canada, the National Institute for Nuclear Physics in Italy, and Toshiba in Japan. Although monitoring of SNF using cosmic ray muons has attracted significant attention as a promising nontraditional nondestructive radiographic technique, the wide application of muon tomography is often limited because of the natural low cosmic ray muon flux at sea level: 100 m$^{-2}$ min$^{-1}$ sr$^{-1}$. Recent studies suggest measuring muon momentum in muon scattering tomography (MST) applications to address this challenge. Some techniques have been discussed; however, an imaging algorithm for momentum-coupled MST had not been developed. This paper presents a new imaging algorithm for MST which integrates muon scattering angle and momentum in a single M-value. To develop a relationship between muon momentum and scattering angle distribution, various material samples (Al, Fe, Pb, and U) were thoroughly investigated using a Monte Carlo particle transport code GEANT4 simulation. Reconstructed images of an SNF cask using the new algorithm are presented herein to demonstrate the benefit of measuring muon momentum in MST. In this analysis a missing fuel assembly (FA) was located in the dry storage cask.
Submitted 27 April, 2023;
originally announced April 2023.
-
TR0N: Translator Networks for 0-Shot Plug-and-Play Conditional Generation
Authors:
Zhaoyan Liu,
Noel Vouitsis,
Satya Krishna Gorti,
Jimmy Ba,
Gabriel Loaiza-Ganem
Abstract:
We propose TR0N, a highly general framework to turn pre-trained unconditional generative models, such as GANs and VAEs, into conditional models. The conditioning can be highly arbitrary, and requires only a pre-trained auxiliary model. For example, we show how to turn unconditional models into class-conditional ones with the help of a classifier, and also into text-to-image models by leveraging CLIP. TR0N learns a lightweight stochastic mapping which "translates" between the space of conditions and the latent space of the generative model, in such a way that the generated latent corresponds to a data sample satisfying the desired condition. The translated latent samples are then further improved upon through Langevin dynamics, enabling us to obtain higher-quality data samples. TR0N requires no training data nor fine-tuning, yet can achieve a zero-shot FID of 10.9 on MS-COCO, outperforming competing alternatives not only on this metric, but also in sampling speed -- all while retaining a much higher level of generality. Our code is available at https://github.com/layer6ai-labs/tr0n.
Submitted 26 April, 2023;
originally announced April 2023.
-
Non-Local and Quantum Advantages in Network Coding for Multiple Access Channels
Authors:
Jiyoung Yun,
Ashutosh Rai,
Joonwoo Bae
Abstract:
Devising efficient communication in a network consisting of multiple transmitters and receivers is a problem of immense importance in communication theory. Interestingly, resources in the quantum world have been shown to be very effective in enhancing the performance of communication networks. In this work, we study entanglement-assisted communication over classical network channels. When there is asymmetry such that noise introduced by the channel depends on the input alphabets, non-communicating senders may exploit shared entangled states to overcome the noise. We consider multiple access channels, an essential building block for many complex networks, and develop an extensive framework for n-sender, 1-receiver multiple access channels based on nonlocal games. We obtain generic results for computing correlation-assisted sum-capacities of these channels. The considered channels introduce less noise on winning and more noise on losing the game, and the correlation assistance is classified as local (L), quantum (Q), or no-signaling (NS). Furthermore, we consider a broad class of multiple access channels such as depolarizing ones that admix a uniform noise with some probability and prove general results on their sum-capacities. Finally, we apply our analysis to three specific depolarizing multiple access channels based on Clauser-Horne-Shimony-Holt, magic square, and Mermin-GHZ nonlocal games. In all three cases we find significant enhancements in sum-capacities on using nonlocal correlations. We obtain either exact expressions for sum-capacities or suitable upper and lower bounds on them. The general framework developed in this work has much wider applicability, and the specific cases studied in detail are illustrative examples for comparison with recent studies in this direction.
Submitted 21 April, 2023;
originally announced April 2023.
-
MAS: A versatile Landau-fluid eigenvalue code for plasma stability analysis in general geometry
Authors:
Jian Bao,
Wenlu Zhang,
Ding Li,
Zhihong Lin,
Ge Dong,
Chang Liu,
Huasheng Xie,
Guo Meng,
Junyi Cheng,
Chao Dong,
Jintao Cao
Abstract:
We have developed a new global eigenvalue code, Multiscale Analysis for plasma Stabilities (MAS), for studying plasma problems with wave toroidal mode number n and frequency omega in a broad range of interest in general tokamak geometry, based on a five-field Landau-fluid description of thermal plasmas. Beyond keeping the necessary plasma fluid response, we further retain the important kinetic effects including diamagnetic drift, ion finite Larmor radius, finite parallel electric field, ion and electron Landau resonances in a self-consistent and non-perturbative manner without sacrificing the attractive efficiency in computation. The physical capabilities of the code are evaluated and examined in the aspects of both theory and simulation. In theory, the comprehensive Landau-fluid model implemented in MAS can be reduced to the well-known ideal MHD model, electrostatic ion-fluid model, and drift-kinetic model in various limits, which clearly delineates the physics validity regime. In simulation, MAS has been well benchmarked with theory and other gyrokinetic and kinetic-MHD hybrid codes in a manner of adopting the unified physical and numerical framework, which covers the kinetic Alfven wave, ion sound wave, low-n kink, high-n ion temperature gradient mode and kinetic ballooning mode. Moreover, MAS is successfully applied to model the Alfven eigenmode (AE) activities in DIII-D discharge #159243, which faithfully captures the frequency sweeping of RSAE, the tunneling damping of TAE, as well as the polarization characteristics of KBAE and BAAE being consistent with former gyrokinetic theory and simulation. With respect to the key progress contributed to the community, MAS has the advantage of combining rich physics ingredients, realistic global geometry and high computation efficiency together for plasma stability analysis in linear regime.
Submitted 19 April, 2023;
originally announced April 2023.
-
Gap Opening in Protoplanetary Disks: Gas Dynamics from Global Non-ideal MHD Simulations with Consistent Thermochemistry
Authors:
Xiao Hu,
Zhi-Yun Li,
Lile Wang,
Zhaohuan Zhu,
Jaehan Bae
Abstract:
Recent high angular resolution ALMA observations have revealed numerous gaps in protoplanetary disks. A popular interpretation has been that planets open them. Most previous investigations of planet gap-opening have concentrated on viscous disks. Here, we carry out 2D (axisymmetric) global simulations of gap opening by a planet in a wind-launching non-ideal MHD disk with consistent thermochemistry. We find a strong concentration of poloidal magnetic flux in the planet-opened gap, where the gas dynamics are magnetically dominated. The magnetic field also drives a fast (nearly sonic) meridional gas circulation in the denser disk regions near the inner and outer edges of the gap, which may be observable through high-resolution molecular line observations. The gap is more ionized than its denser surrounding regions, with a better magnetic field-matter coupling. In particular, it has a much higher abundance of molecular ion HCO$^+$, consistent with ALMA observations of the well-studied AS 209 protoplanetary disk that has prominent gaps and fast meridional motions reaching the local sound speed. Finally, we provide fitting formulae for the ambipolar and Ohmic diffusivities as a function of the disk local density, which can be used for future 3D simulations of planet gap-opening in non-ideal MHD disks where thermochemistry is too computationally expensive to evolve self-consistently with the magneto-hydrodynamics.
Submitted 31 May, 2023; v1 submitted 12 April, 2023;
originally announced April 2023.
-
Boosted Prompt Ensembles for Large Language Models
Authors:
Silviu Pitis,
Michael R. Zhang,
Andrew Wang,
Jimmy Ba
Abstract:
Methods such as chain-of-thought prompting and self-consistency have pushed the frontier of language model reasoning performance with no additional training. To further improve performance, we propose a prompt ensembling method for large language models, which uses a small dataset to construct a set of few shot prompts that together comprise a ``boosted prompt ensemble''. The few shot examples for each prompt are chosen in a stepwise fashion to be ``hard'' examples on which the previous step's ensemble is uncertain. We show that this outperforms single-prompt output-space ensembles and bagged prompt-space ensembles on the GSM8k and AQuA datasets, among others. We propose both train-time and test-time versions of boosted prompting that use different levels of available annotation and conduct a detailed empirical study of our algorithm.
Submitted 12 April, 2023;
originally announced April 2023.
-
Rethinking Dense Retrieval's Few-Shot Ability
Authors:
Si Sun,
Yida Lu,
Shi Yu,
Xiangyang Li,
Zhonghua Li,
Zhao Cao,
Zhiyuan Liu,
Deiming Ye,
Jie Bao
Abstract:
Few-shot dense retrieval (DR) aims to effectively generalize to novel search scenarios by learning from a few samples. Despite its importance, there is little study on specialized datasets and standardized evaluation protocols. As a result, current methods often resort to random sampling from supervised datasets to create "few-data" setups and employ inconsistent training strategies during evaluations, which makes it difficult to compare recent progress accurately. In this paper, we propose a customized FewDR dataset and a unified evaluation benchmark. Specifically, FewDR employs class-wise sampling to establish a standardized "few-shot" setting with finely-defined classes, reducing variability in multiple sampling rounds. Moreover, the dataset is split into disjoint base and novel classes, allowing DR models to be continuously trained on ample data from base classes and a few samples in novel classes. This benchmark eliminates the risk of novel class leakage, providing a reliable estimation of the DR model's few-shot ability. Our extensive empirical results reveal that current state-of-the-art DR models still face challenges in the standard few-shot scene. Our code and data will be open-sourced at https://github.com/OpenMatch/ANCE-Tele.
Submitted 12 April, 2023;
originally announced April 2023.
-
Molecules with ALMA at Planet-forming Scales (MAPS). Complex Kinematics in the AS 209 Disk Induced by a Forming Planet and Disk Winds
Authors:
Maria Galloway-Sprietsma,
Jaehan Bae,
Richard Teague,
Myriam Benisty,
Stefano Facchini,
Yuri Aikawa,
Felipe Alarcón,
Sean M. Andrews,
Edwin Bergin,
Gianni Cataldi,
L. Ilsedore Cleeves,
Ian Czekala,
Viviana V. Guzmán,
Jane Huang,
Charles J. Law,
Romane Le Gal,
Yao Liu,
Feng Long,
François Ménard,
Karin I. Öberg,
Catherine Walsh,
David J. Wilner
Abstract:
We study the kinematics of the AS 209 disk using the J=2-1 transitions of $^{12}$CO, $^{13}$CO, and C$^{18}$O. We derive the radial, azimuthal, and vertical velocity of the gas, taking into account the lowered emission surface near the annular gap at ~1.7$''$ (200 au) within which a candidate circumplanetary disk-hosting planet has been reported previously. In $^{12}$CO and $^{13}$CO, we find a coherent upward flow arising from the gap. The upward gas flow is as fast as $150~{\rm m~s}^{-1}$ in the regions traced by $^{12}$CO emission, which corresponds to about $50\%$ of the local sound speed or $6\%$ of the local Keplerian speed. Such an upward gas flow is difficult to reconcile with an embedded planet alone. Instead, we propose that magnetically driven winds via ambipolar diffusion are triggered by the low gas density within the planet-carved gap, dominating the kinematics of the gap region. We estimate the ambipolar Elsasser number, Am, using the HCO$^+$ column density as a proxy for ion density and find that Am is ~0.1 at the radial location of the upward flow. This value is broadly consistent with the value at which numerical simulations find ambipolar diffusion drives strong winds. We hypothesize that the activation of magnetically driven winds in a planet-carved gap can control the growth of the embedded planet. We provide a scaling relationship which describes the wind-regulated terminal mass: adopting parameters relevant to 100 au from a solar-mass star, we find the wind-regulated terminal mass is about one Jupiter mass, which may help explain the dearth of directly imaged super-Jovian-mass planets.
Submitted 12 May, 2023; v1 submitted 7 April, 2023;
originally announced April 2023.
-
More on Affine Dynkin Quiver Yangians
Authors:
Jiakang Bao
Abstract:
We consider the quiver Yangians associated to general affine Dynkin diagrams. Although the quivers are generically not toric, the algebras have some similar structures. The odd reflections of the affine Dynkin diagrams should correspond to Seiberg duality of the quivers, and we investigate the relations of the dual quiver Yangians. We also mention the construction of the twisted quiver Yangians. It is conjectured that the truncations of the (twisted) quiver Yangians can give rise to certain $\mathcal{W}$-algebras. Incidentally, we give the screening currents of the $\mathcal{W}$-algebras in terms of the free field realization in the case of generalized conifolds. Moreover, we discuss the toroidal and elliptic algebras for any general quivers.
Submitted 18 April, 2024; v1 submitted 3 April, 2023;
originally announced April 2023.
-
Scientific Computing Algorithms to Learn Enhanced Scalable Surrogates for Mesh Physics
Authors:
Brian R. Bartoldson,
Yeping Hu,
Amar Saini,
Jose Cadena,
Yucheng Fu,
Jie Bao,
Zhijie Xu,
Brenda Ng,
Phan Nguyen
Abstract:
Data-driven modeling approaches can produce fast surrogates to study large-scale physics problems. Among them, graph neural networks (GNNs) that operate on mesh-based data are desirable because they possess inductive biases that promote physical faithfulness, but hardware limitations have precluded their application to large computational domains. We show that it is \textit{possible} to train a class of GNN surrogates on 3D meshes. We scale MeshGraphNets (MGN), a subclass of GNNs for mesh-based physics modeling, via our domain decomposition approach to facilitate training that is mathematically equivalent to training on the whole domain under certain conditions. With this, we were able to train MGN on meshes with \textit{millions} of nodes to generate computational fluid dynamics (CFD) simulations. Furthermore, we show how to enhance MGN via higher-order numerical integration, which can reduce MGN's error and training time. We validated our methods on an accompanying dataset of 3D $\text{CO}_2$-capture CFD simulations on a 3.1M-node mesh. This work presents a practical path to scaling MGN for real-world applications.
Submitted 1 April, 2023;
originally announced April 2023.
-
Policy lessons from the Italian pandemic of Covid-19
Authors:
José M. Carcione,
Jing Ba
Abstract:
We analyze the management of the Italian pandemic during the five identified waves. We consider the following problems: (i) the composition of the CTS ("Scientific Technical Committee"), which consisted entirely of doctors, mainly virologists, with no mathematical epidemiologists, statisticians, physicists, etc., even though a pandemic's behavior is described by mathematical, stochastic and probabilistic criteria; (ii) political interference in security measures and media propaganda; (iii) the initial stages of the vaccination campaign, which ignored the age factor; and (iv) the persistence of the pandemic due to the unvaccinated population (anti-vax or "no-vax"), which amounts to about six to seven million people, including 10% of anti-vax doctors.
Submitted 14 May, 2023; v1 submitted 11 March, 2023;
originally announced March 2023.
-
ProductAE: Toward Deep Learning Driven Error-Correction Codes of Large Dimensions
Authors:
Mohammad Vahid Jamali,
Hamid Saber,
Homayoon Hatami,
Jung Hyun Bae
Abstract:
While decades of theoretical research have led to the invention of several classes of error-correction codes, the design of such codes is an extremely challenging task, mostly driven by human ingenuity. Recent studies demonstrate that such designs can be effectively automated and accelerated via tools from machine learning (ML), thus enabling ML-driven classes of error-correction codes with promising performance gains compared to classical designs. A fundamental challenge, however, is that it is prohibitively complex, if not impossible, to design and train fully ML-driven encoder and decoder pairs for large code dimensions. In this paper, we propose Product Autoencoder (ProductAE) -- a computationally-efficient family of deep learning driven (encoder, decoder) pairs -- aimed at enabling the training of relatively large codes (both encoder and decoder) with a manageable training complexity. We build upon ideas from classical product codes and propose constructing large neural codes using smaller code components. ProductAE boils down the complex problem of training the encoder and decoder for a large code dimension $k$ and blocklength $n$ to less-complex sub-problems of training encoders and decoders for smaller dimensions and blocklengths. Our training results show successful training of ProductAEs of dimensions as large as $k = 300$ bits with meaningful performance gains compared to state-of-the-art classical and neural designs. Moreover, we demonstrate excellent robustness and adaptivity of ProductAEs to channel models different than the ones used for training.
Submitted 28 March, 2023;
originally announced March 2023.
-
Detecting Entanglement by State Preparation and a Fixed Measurement
Authors:
Jaemin Kim,
Anindita Bera,
Joonwoo Bae,
Dariusz Chruscinski
Abstract:
It is shown that a fixed measurement setting, e.g., a measurement in the computational basis, can detect all entangled states by preparing multipartite quantum states, called network states. We present network states for both cases to construct decomposable entanglement witnesses (EWs) equivalent to the partial transpose criteria and also non-decomposable EWs that detect undistillable entangled states beyond the partial transpose criteria. Entanglement detection by state preparation can be extended to multipartite states such as graph states, a resource for measurement-based quantum computing. Our results readily apply to a realistic scenario, for instance, an array of superconducting qubits, neutral atoms, or photons, in which the preparation of a multipartite state and a fixed measurement are experimentally feasible.
Submitted 28 March, 2023;
originally announced March 2023.