-
Nature of excitons and their ligand-mediated delocalization in nickel dihalide charge-transfer insulators
Authors:
Connor A. Occhialini,
Yi Tseng,
Hebatalla Elnaggar,
Qian Song,
Mark Blei,
Seth Ariel Tongay,
Valentina Bisogni,
Frank M. F. de Groot,
Jonathan Pelliciari,
Riccardo Comin
Abstract:
The fundamental optical excitations of correlated transition-metal compounds are typically identified with multielectronic transitions localized at the transition-metal site, such as $dd$ transitions. In this vein, intense interest has surrounded the appearance of sharp, below band-gap optical transitions, i.e. excitons, within the magnetic phase of correlated Ni$^{2+}$ van der Waals magnets. The interplay of magnetic and charge-transfer insulating ground states in Ni$^{2+}$ systems raises intriguing questions on the roles of long-range magnetic order and of metal-ligand charge transfer in the exciton nature, which inspired microscopic descriptions beyond typical $dd$ excitations. Here we study the impact of charge-transfer and magnetic order on the excitation spectrum of the nickel dihalides (NiX$_2$, X $=$ Cl, Br, and I) using Ni-$L_3$ resonant inelastic x-ray scattering (RIXS). In all compounds, we detect sharp excitations, analogous to the recently reported excitons, and assign them to spin-singlet multiplets of octahedrally-coordinated Ni$^{2+}$ stabilized by intra-atomic Hund's exchange. Additionally, we demonstrate that these excitons are dispersive using momentum resolved RIXS. Our data evidence a ligand-mediated multiplet dispersion, which is tuned by the charge-transfer gap and independent of the presence of long-range magnetic order. This reveals the mechanisms governing non-local interactions of on-site $dd$ excitations with the surrounding crystal/magnetic structure, in analogy to ground state superexchange. These measurements thus establish the roles of magnetic order, self-doped ligand holes, and intersite coupling mechanisms for the properties of $dd$ excitations in charge-transfer insulators.
Submitted 16 April, 2024;
originally announced April 2024.
-
Batch and match: black-box variational inference with a score-based divergence
Authors:
Diana Cai,
Chirag Modi,
Loucas Pillaud-Vivien,
Charles C. Margossian,
Robert M. Gower,
David M. Blei,
Lawrence K. Saul
Abstract:
Most leading implementations of black-box variational inference (BBVI) are based on optimizing a stochastic evidence lower bound (ELBO). But such approaches to BBVI often converge slowly due to the high variance of their gradient estimates and their sensitivity to hyperparameters. In this work, we propose batch and match (BaM), an alternative approach to BBVI based on a score-based divergence. Notably, this score-based divergence can be optimized by a closed-form proximal update for Gaussian variational families with full covariance matrices. We analyze the convergence of BaM when the target distribution is Gaussian, and we prove that in the limit of infinite batch size the variational parameter updates converge exponentially quickly to the target mean and covariance. We also evaluate the performance of BaM on Gaussian and non-Gaussian target distributions that arise from posterior inference in hierarchical and deep generative models. In these experiments, we find that BaM typically converges in fewer (and sometimes significantly fewer) gradient evaluations than leading implementations of BBVI based on ELBO maximization.
Submitted 12 June, 2024; v1 submitted 22 February, 2024;
originally announced February 2024.
-
Hierarchical Causal Models
Authors:
Eli N. Weinstein,
David M. Blei
Abstract:
Scientists often want to learn about cause and effect from hierarchical data, collected from subunits nested inside units. Consider students in schools, cells in patients, or cities in states. In such settings, unit-level variables (e.g. each school's budget) may affect subunit-level variables (e.g. the test scores of each student in each school) and vice versa. To address causal questions with hierarchical data, we propose hierarchical causal models, which extend structural causal models and causal graphical models by adding inner plates. We develop a general graphical identification technique for hierarchical causal models that extends do-calculus. We find many situations in which hierarchical data can enable causal identification even when it would be impossible with non-hierarchical data, that is, if we had only unit-level summaries of subunit-level variables (e.g. the school's average test score, rather than each student's score). We develop estimation techniques for hierarchical causal models, using methods including hierarchical Bayesian models. We illustrate our results in simulation and via a reanalysis of the classic "eight schools" study.
Submitted 26 June, 2024; v1 submitted 10 January, 2024;
originally announced January 2024.
-
Revisiting Topic-Guided Language Models
Authors:
Carolina Zheng,
Keyon Vafa,
David M. Blei
Abstract:
A recent line of work in natural language processing has aimed to combine language models and topic models. These topic-guided language models augment neural language models with topic models, unsupervised learning methods that can discover document-level patterns of word use. This paper compares the effectiveness of these methods in a standardized setting. We study four topic-guided language models and two baselines, evaluating the held-out predictive performance of each model on four corpora. Surprisingly, we find that none of these methods outperform a standard LSTM language model baseline, and most fail to learn good topics. Further, we train a probe of the neural language model that shows that the baseline's hidden states already encode topic information. We make public all code used for this study.
Submitted 4 December, 2023;
originally announced December 2023.
-
A Computational Approach to Style in American Poetry
Authors:
David M. Kaplan,
David M. Blei
Abstract:
We develop a quantitative method to assess the style of American poems and to visualize a collection of poems in relation to one another. Qualitative poetry criticism helped guide our development of metrics that analyze various orthographic, syntactic, and phonemic features. These features are used to discover comprehensive stylistic information from a poem's multi-layered latent structure, and to compute distances between poems in this space. Visualizations provide ready access to the analytical components. We demonstrate our method on several collections of poetry, showing that it better delineates poetry style than the traditional word-occurrence features that are used in typical text analysis algorithms. Our method has potential applications to academic research of texts, to research of the intuitive personal response to poetry, and to making recommendations to readers based on their favorite poems.
Submitted 13 October, 2023;
originally announced October 2023.
-
Valley-polarized Excitonic Mott Insulator in WS2/WSe2 Moiré Superlattice
Authors:
Zhen Lian,
Yuze Meng,
Lei Ma,
Indrajit Maity,
Li Yan,
Qiran Wu,
Xiong Huang,
Dongxue Chen,
Xiaotong Chen,
Xinyue Chen,
Mark Blei,
Takashi Taniguchi,
Kenji Watanabe,
Sefaattin Tongay,
Johannes Lischner,
Yong-Tao Cui,
Su-Fei Shi
Abstract:
Strongly enhanced electron-electron interaction in semiconducting moiré superlattices formed by transition metal dichalcogenide (TMDC) heterobilayers has led to a plethora of intriguing fermionic correlated states. Meanwhile, interlayer excitons in a type-II aligned TMDC heterobilayer moiré superlattice, with electrons and holes separated in different layers, inherit this enhanced interaction and strongly interact with each other, making them promising for realizing tunable correlated bosonic quasiparticles with a valley degree of freedom. We employ photoluminescence spectroscopy to investigate the strong repulsion between interlayer excitons and correlated electrons in a WS2/WSe2 moiré superlattice and combine it with theoretical calculations to reveal the spatial extent of interlayer excitons and the band hierarchy of correlated states. We further find that an excitonic Mott insulator state emerges when one interlayer exciton occupies one moiré cell, evidenced by emerging photoluminescence peaks under increased optical excitation power. Double occupancy of excitons in one unit cell requires overcoming the energy cost of exciton-exciton repulsion of about 30-40 meV, depending on the stacking configuration of the WS2/WSe2 heterobilayer. Further, the valley polarization of the excitonic Mott insulator state is enhanced by nearly one order of magnitude. Our study demonstrates the WS2/WSe2 moiré superlattice as a promising platform for engineering and exploring new correlated states of fermions, bosons, and a mixture of both.
Submitted 24 August, 2023; v1 submitted 21 August, 2023;
originally announced August 2023.
-
Measurements of Correlated Insulator Gaps in a Transition Metal Dichalcogenide Moiré Superlattice
Authors:
Xiong Huang,
Dongxue Chen,
Zhen Lian,
Qiran Wu,
Mina Rashetnia,
Mark Blei,
Takashi Taniguchi,
Kenji Watanabe,
Sefaattin Tongay,
Su-Fei Shi,
Yong-Tao Cui
Abstract:
Moiré superlattices of transition metal dichalcogenides exhibit strong electron-electron interaction that has led to experimental observations of Mott insulators and generalized Wigner crystals. In this letter, we report direct measurements of the thermodynamic gaps of these correlated insulating states in a dual-gate WS2/WSe2 moiré bilayer. We employ microwave impedance microscopy to probe the electronic features in both the graphene top gate and the moiré bilayer, from which we extract the doping dependence of the chemical potential of the moiré bilayer and the energy gaps for various correlated insulating states utilizing the Landau quantization of graphene. These gaps are relatively insensitive to the application of an external electric field to the WS2/WSe2 moiré bilayer.
Submitted 15 August, 2023;
originally announced August 2023.
-
Quadrupolar Excitons and Hybridized Interlayer Mott Insulator in a Trilayer Moiré Superlattice
Authors:
Zhen Lian,
Dongxue Chen,
Lei Ma,
Yuze Meng,
Ying Su,
Li Yan,
Xiong Huang,
Qiran Wu,
Xinyue Chen,
Mark Blei,
Takashi Taniguchi,
Kenji Watanabe,
Sefaattin Tongay,
Chuanwei Zhang,
Yong-Tao Cui,
Su-Fei Shi
Abstract:
Transition metal dichalcogenide (TMDC) moiré superlattices, owing to the moiré flatbands and strong correlation, can host periodic electron crystals and fascinating correlated physics. The TMDC heterojunctions in the type-II alignment also enable long-lived interlayer excitons that are promising for correlated bosonic states, while the interaction is dictated by the asymmetry of the heterojunction. Here we demonstrate a new excitonic state, quadrupolar exciton, in a symmetric WSe2-WS2-WSe2 trilayer moiré superlattice. The quadrupolar excitons exhibit a quadratic dependence on the electric field, distinctively different from the linear Stark shift of the dipolar excitons in heterobilayers. This quadrupolar exciton stems from the hybridization of WSe2 valence moiré flatbands. The same mechanism also gives rise to an interlayer Mott insulator state, in which the two WSe2 layers share one hole laterally confined in one moiré unit cell. In contrast, the hole occupation probability in each layer can be continuously tuned via an out-of-plane electric field, reaching 100% in the top or bottom WSe2 under a large electric field, accompanying the transition from quadrupolar excitons to dipolar excitons. Our work demonstrates a trilayer moiré system as a new exciting playground for realizing novel correlated states and engineering quantum phase transitions.
Submitted 6 August, 2023;
originally announced August 2023.
-
Evaluating the Moral Beliefs Encoded in LLMs
Authors:
Nino Scherrer,
Claudia Shi,
Amir Feder,
David M. Blei
Abstract:
This paper presents a case study on the design, administration, post-processing, and evaluation of surveys on large language models (LLMs). It comprises two components: (1) A statistical method for eliciting beliefs encoded in LLMs. We introduce statistical measures and evaluation metrics that quantify the probability of an LLM "making a choice", the associated uncertainty, and the consistency of that choice. (2) We apply this method to study what moral beliefs are encoded in different LLMs, especially in ambiguous cases where the right choice is not obvious. We design a large-scale survey comprising 680 high-ambiguity moral scenarios (e.g., "Should I tell a white lie?") and 687 low-ambiguity moral scenarios (e.g., "Should I stop for a pedestrian on the road?"). Each scenario includes a description, two possible actions, and auxiliary labels indicating violated rules (e.g., "do not kill"). We administer the survey to 28 open- and closed-source LLMs. We find that (a) in unambiguous scenarios, most models "choose" actions that align with commonsense. In ambiguous cases, most models express uncertainty. (b) Some models are uncertain about choosing the commonsense action because their responses are sensitive to the question-wording. (c) Some models reflect clear preferences in ambiguous scenarios. Specifically, closed-source models tend to agree with each other.
Submitted 26 July, 2023;
originally announced July 2023.
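As a rough illustration of the kind of quantity the abstract above describes (the probability that a model "makes a choice" and the associated uncertainty), the following sketch estimates action frequencies from repeated samples and summarizes them with Shannon entropy. The function names and the use of entropy are assumptions made for illustration; they are not the paper's definitions.

```python
import math
from collections import Counter


def action_probabilities(responses):
    """Empirical frequency of each action label among sampled responses.

    `responses` is a hypothetical list of labels (e.g. "A" or "B") obtained by
    mapping repeated model outputs for one scenario onto its two actions.
    """
    counts = Counter(responses)
    total = len(responses)
    return {action: count / total for action, count in counts.items()}


def entropy_bits(probs):
    """Shannon entropy (in bits) of an action distribution: 0 means fully decided."""
    return -sum(p * math.log2(p) for p in probs.values() if p > 0)


if __name__ == "__main__":
    sampled = ["A", "A", "A", "B"]  # made-up labels, not data from the paper
    probs = action_probabilities(sampled)
    print(probs, entropy_bits(probs))
```

With two actions the entropy ranges from 0 bits (the same action every time) to 1 bit (an even split), which gives one simple scale for "uncertain" versus "decided" responses.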
-
Amortized Variational Inference: When and Why?
Authors:
Charles C. Margossian,
David M. Blei
Abstract:
In a probabilistic latent variable model, factorized (or mean-field) variational inference (F-VI) fits a separate parametric distribution for each latent variable. Amortized variational inference (A-VI) instead learns a common inference function, which maps each observation to its corresponding latent variable's approximate posterior. Typically, A-VI is used as a step in the training of variational autoencoders; however, it stands to reason that A-VI could also be used as a general alternative to F-VI. In this paper we study when and why A-VI can be used for approximate Bayesian inference. We derive conditions on a latent variable model which are necessary, sufficient, and verifiable under which A-VI can attain F-VI's optimal solution, thereby closing the amortization gap. We prove these conditions are uniquely verified by simple hierarchical models, a broad class that encompasses many models in machine learning. We then show, on a broader class of models, how to expand the domain of A-VI's inference function to improve its solution, and we provide examples, e.g. hidden Markov models, where the amortization gap cannot be closed.
Submitted 23 May, 2024; v1 submitted 20 July, 2023;
originally announced July 2023.
-
Practical and Asymptotically Exact Conditional Sampling in Diffusion Models
Authors:
Luhuan Wu,
Brian L. Trippe,
Christian A. Naesseth,
David M. Blei,
John P. Cunningham
Abstract:
Diffusion models have been successful on a range of conditional generation tasks including molecular design and text-to-image generation. However, these achievements have primarily depended on task-specific conditional training or error-prone heuristic approximations. Ideally, a conditional generation method should provide exact samples for a broad range of conditional distributions without requiring task-specific training. To this end, we introduce the Twisted Diffusion Sampler, or TDS. TDS is a sequential Monte Carlo (SMC) algorithm that targets the conditional distributions of diffusion models. The main idea is to use twisting, an SMC technique that enjoys good computational efficiency, to incorporate heuristic approximations without compromising asymptotic exactness. We first find in simulation and on MNIST image inpainting and class-conditional generation tasks that TDS provides a computational statistical trade-off, yielding more accurate approximations with many particles but with empirical improvements over heuristics with as few as two particles. We then turn to motif-scaffolding, a core task in protein design, using a TDS extension to Riemannian diffusion models. On benchmark test cases, TDS allows flexible conditioning criteria and often outperforms the state of the art.
Submitted 30 June, 2023;
originally announced June 2023.
-
Density Uncertainty Layers for Reliable Uncertainty Estimation
Authors:
Yookoon Park,
David M. Blei
Abstract:
Assessing the predictive uncertainty of deep neural networks is crucial for safety-related applications of deep learning. Although Bayesian deep learning offers a principled framework for estimating model uncertainty, the common approaches that approximate the parameter posterior often fail to deliver reliable estimates of predictive uncertainty. In this paper, we propose a novel criterion for reliable predictive uncertainty: a model's predictive variance should be grounded in the empirical density of the input. That is, the model should produce higher uncertainty for inputs that are improbable in the training data and lower uncertainty for inputs that are more probable. To operationalize this criterion, we develop the density uncertainty layer, a stochastic neural network architecture that satisfies the density uncertainty criterion by design. We study density uncertainty layers on the UCI and CIFAR-10/100 uncertainty benchmarks. Compared to existing approaches, density uncertainty layers provide more reliable uncertainty estimates and robust out-of-distribution detection performance.
Submitted 4 March, 2024; v1 submitted 21 June, 2023;
originally announced June 2023.
-
Nonparametric Identifiability of Causal Representations from Unknown Interventions
Authors:
Julius von Kügelgen,
Michel Besserve,
Liang Wendong,
Luigi Gresele,
Armin Kekić,
Elias Bareinboim,
David M. Blei,
Bernhard Schölkopf
Abstract:
We study causal representation learning, the task of inferring latent causal variables and their causal relations from high-dimensional mixtures of the variables. Prior work relies on weak supervision, in the form of counterfactual pre- and post-intervention views or temporal structure; places restrictive assumptions, such as linearity, on the mixing function or latent causal model; or requires partial knowledge of the generative process, such as the causal graph or intervention targets. We instead consider the general setting in which both the causal model and the mixing function are nonparametric. The learning signal takes the form of multiple datasets, or environments, arising from unknown interventions in the underlying causal model. Our goal is to identify both the ground truth latents and their causal graph up to a set of ambiguities which we show to be irresolvable from interventional data. We study the fundamental setting of two causal variables and prove that the observational distribution and one perfect intervention per node suffice for identifiability, subject to a genericity condition. This condition rules out spurious solutions that involve fine-tuning of the intervened and observational distributions, mirroring similar conditions for nonlinear cause-effect inference. For an arbitrary number of variables, we show that at least one pair of distinct perfect interventional domains per node guarantees identifiability. Further, we demonstrate that the strengths of causal influences among the latent variables are preserved by all equivalent solutions, rendering the inferred representation appropriate for drawing causal conclusions from new data. Our study provides the first identifiability results for the general nonparametric setting with unknown interventions, and elucidates what is possible and impossible for causal representation learning without more direct supervision.
Submitted 28 October, 2023; v1 submitted 1 June, 2023;
originally announced June 2023.
-
An Invariant Learning Characterization of Controlled Text Generation
Authors:
Carolina Zheng,
Claudia Shi,
Keyon Vafa,
Amir Feder,
David M. Blei
Abstract:
Controlled generation refers to the problem of creating text that contains stylistic or semantic attributes of interest. Many approaches reduce this problem to training a predictor of the desired attribute. For example, researchers hoping to deploy a large language model to produce non-toxic content may use a toxicity classifier to filter generated text. In practice, the generated text to classify, which is determined by user prompts, may come from a wide range of distributions. In this paper, we show that the performance of controlled generation may be poor if the distributions of text in response to user prompts differ from the distribution the predictor was trained on. To address this problem, we cast controlled generation under distribution shift as an invariant learning problem: the most effective predictor should be invariant across multiple text environments. We then discuss a natural solution that arises from this characterization and propose heuristics for selecting natural environments. We study this characterization and the proposed method empirically using both synthetic and real data. Experiments demonstrate both the challenge of distribution shift in controlled generation and the potential of invariance methods in this setting.
Submitted 31 May, 2023;
originally announced June 2023.
-
On the Misspecification of Linear Assumptions in Synthetic Control
Authors:
Achille Nazaret,
Claudia Shi,
David M. Blei
Abstract:
The synthetic control (SC) method is a popular approach for estimating treatment effects from observational panel data. It rests on a crucial assumption that we can write the treated unit as a linear combination of the untreated units. This linearity assumption, however, can be unlikely to hold in practice and, when violated, the resulting SC estimates are incorrect. In this paper we examine two questions: (1) How large can the misspecification error be? (2) How can we limit it? First, we provide theoretical bounds to quantify the misspecification error. The bounds are comforting: small misspecifications induce small errors. With these bounds in hand, we then develop new SC estimators that are specially designed to minimize misspecification error. The estimators are based on additional data about each unit, which is used to produce the SC weights. (For example, if the units are countries then the additional data might be demographic information about each.) We study our estimators on synthetic data; we find they produce more accurate causal estimates than standard synthetic controls. We then re-analyze the California tobacco-program data of the original SC paper, now including additional data from the US census about per-state demographics. Our estimators show that the observations in the pre-treatment period lie within the bounds of misspecification error, and that the observations post-treatment lie outside of those bounds. This is evidence that our SC methods have uncovered a true effect.
Submitted 24 February, 2023;
originally announced February 2023.
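The linearity assumption named in the abstract above (the treated unit written as a linear combination of untreated units) can be made concrete with a toy case. The sketch below covers only the standard synthetic control idea, restricted to two donor units with weights `w` and `1 - w` in [0, 1]; it does not implement the paper's new estimators or bounds, and all function names and numbers are hypothetical.

```python
def fit_two_donor_weight(treated_pre, donor_a_pre, donor_b_pre):
    """Least-squares weight w in [0, 1] so that w*a + (1-w)*b tracks the
    treated unit over the pre-treatment period.

    With two donors the simplex constraint is one-dimensional, so the
    unconstrained minimizer can simply be clipped to [0, 1].
    """
    num = sum((y - b) * (a - b)
              for y, a, b in zip(treated_pre, donor_a_pre, donor_b_pre))
    den = sum((a - b) ** 2 for a, b in zip(donor_a_pre, donor_b_pre))
    if den == 0:
        return 0.5  # donors are identical, so any weight fits equally well
    return min(1.0, max(0.0, num / den))


def synthetic_control(w, donor_a, donor_b):
    """Weighted combination of the two donors (the 'synthetic' treated unit)."""
    return [w * a + (1 - w) * b for a, b in zip(donor_a, donor_b)]


def estimated_effect(treated_post, donor_a_post, donor_b_post, w):
    """Post-treatment gap between the treated unit and its synthetic control."""
    synth = synthetic_control(w, donor_a_post, donor_b_post)
    return [y - s for y, s in zip(treated_post, synth)]


if __name__ == "__main__":
    a_pre, b_pre = [1.0, 2.0, 3.0, 4.0], [2.0, 2.0, 4.0, 6.0]
    y_pre = [1.75, 2.0, 3.75, 5.5]  # made-up series equal to 0.25*a + 0.75*b
    w = fit_two_donor_weight(y_pre, a_pre, b_pre)
    print(w, estimated_effect([8.5], [5.0], [7.0], w))
```

When the treated series is not close to any such combination of the donors, the pre-treatment fit degrades, which is the misspecification setting the abstract is concerned with.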
-
Posterior Collapse and Latent Variable Non-identifiability
Authors:
Yixin Wang,
David M. Blei,
John P. Cunningham
Abstract:
Variational autoencoders model high-dimensional data by positing low-dimensional latent variables that are mapped through a flexible distribution parametrized by a neural network. Unfortunately, variational autoencoders often suffer from posterior collapse: the posterior of the latent variables is equal to its prior, rendering the variational autoencoder useless as a means to produce meaningful representations. Existing approaches to posterior collapse often attribute it to the use of neural networks or optimization issues due to variational approximation. In this paper, we consider posterior collapse as a problem of latent variable non-identifiability. We prove that the posterior collapses if and only if the latent variables are non-identifiable in the generative model. This fact implies that posterior collapse is not a phenomenon specific to the use of flexible distributions or approximate inference. Rather, it can occur in classical probabilistic models even with exact inference, which we also demonstrate. Based on these results, we propose a class of latent-identifiable variational autoencoders, deep generative models which enforce identifiability without sacrificing flexibility. This model class resolves the problem of latent variable non-identifiability by leveraging bijective Brenier maps and parameterizing them with input convex neural networks, without special variational inference objectives or optimization tricks. Across synthetic and real datasets, latent-identifiable variational autoencoders outperform existing methods in mitigating posterior collapse and providing meaningful representations of the data.
Submitted 2 January, 2023;
originally announced January 2023.
-
Causal Fairness Assessment of Treatment Allocation with Electronic Health Records
Authors:
Linying Zhang,
Lauren R. Richter,
Yixin Wang,
Anna Ostropolets,
Noemie Elhadad,
David M. Blei,
George Hripcsak
Abstract:
Healthcare continues to grapple with the persistent issue of treatment disparities, sparking concerns regarding the equitable allocation of treatments in clinical practice. While various fairness metrics have emerged to assess fairness in decision-making processes, a growing focus has been on causality-based fairness concepts due to their capacity to mitigate confounding effects and reason about bias. However, the application of causal fairness notions in evaluating the fairness of clinical decision-making with electronic health record (EHR) data remains an understudied domain. This study aims to address the methodological gap in assessing causal fairness of treatment allocation with electronic health records data. We propose a causal fairness algorithm to assess fairness in clinical decision-making. Our algorithm accounts for the heterogeneity of patient populations and identifies potential unfairness in treatment allocation by conditioning on patients who have the same likelihood to benefit from the treatment. We apply this framework to a patient cohort with coronary artery disease derived from an EHR database to evaluate the fairness of treatment decisions. In addition, we investigate the impact of social determinants of health on the assessment of causal fairness of treatment allocation.
Submitted 7 January, 2024; v1 submitted 21 November, 2022;
originally announced November 2022.
-
Probabilistic Conformal Prediction Using Conditional Random Samples
Authors:
Zhendong Wang,
Ruijiang Gao,
Mingzhang Yin,
Mingyuan Zhou,
David M. Blei
Abstract:
This paper proposes probabilistic conformal prediction (PCP), a predictive inference algorithm that estimates a target variable by a discontinuous predictive set. Given inputs, PCP constructs the predictive set based on random samples from an estimated generative model. It is efficient and compatible with either explicit or implicit conditional generative models. Theoretically, we show that PCP guarantees correct marginal coverage with finite samples. Empirically, we study PCP on a variety of simulated and real datasets. Compared to existing methods for conformal inference, PCP provides sharper predictive sets.
Submitted 20 June, 2022; v1 submitted 13 June, 2022;
originally announced June 2022.
-
Strong Effects of Interlayer Interaction on Valence-Band Splitting in Transition Metal Dichalcogenides
Authors:
Garrett Benson,
Viviane Zurdo Costa,
Neal Border,
Kentaro Yumigeta,
Mark Blei,
Sefaattin Tongay,
K. Watanabe,
T. Taniguchi,
Andrew Ichimura,
Santosh KC,
Taha Salavati-fard,
Bin Wang,
Akm Newaz
Abstract:
Understanding the origin of valence band maxima (VBM) splitting in transition metal dichalcogenides (TMDs) is important because it governs the unique spin and valley physics in monolayer and multilayer TMDs. In this work, we present our systematic study of VBM splitting ($Δ$) in atomically thin MoS$_2$ and WS$_2$ by employing photocurrent spectroscopy as we vary the temperature and the layer number. We found that VBM splitting in monolayer MoS$_2$ and WS$_2$ depends strongly on temperature, which contradicts the theory that spin-orbit coupling solely determines the VBM splitting in monolayer TMDs. We also found that the rate of change of VBM splitting with respect to temperature ($m=\frac{\partial Δ}{\partial T}$) is the highest for the monolayer (-0.14 meV/K for MoS$_2$) and that the rate decreases as the layer number increases ($m \sim 0$ meV/K for 5-layer MoS$_2$). We performed density functional theory (DFT) and GW with Bethe-Salpeter Equation (GW-BSE) calculations to determine the electronic band structure and optical absorption for a bilayer MoS$_2$ with different interlayer separations. Our simulations agree with the experimental observations and demonstrate that the temperature dependence of VBM splitting in atomically thin monolayer and multilayer TMDs originates from the changes in the interlayer coupling strength between the neighboring layers. By studying two different types of TMDs and many different layer thicknesses, we also demonstrate that VBM splitting depends on the layer number and the type of transition metal. Our study will help understand the role spin-orbit coupling and interlayer interaction play in determining the VBM splitting in quantum materials and develop next-generation devices based on spin-orbit interactions.
Submitted 9 May, 2022; v1 submitted 6 May, 2022;
originally announced May 2022.
-
CAREER: A Foundation Model for Labor Sequence Data
Authors:
Keyon Vafa,
Emil Palikot,
Tianyu Du,
Ayush Kanodia,
Susan Athey,
David M. Blei
Abstract:
Labor economists regularly analyze employment data by fitting predictive models to small, carefully constructed longitudinal survey datasets. Although machine learning methods offer promise for such problems, these survey datasets are too small to take advantage of them. In recent years large datasets of online resumes have also become available, providing data about the career trajectories of millions of individuals. However, standard econometric models cannot take advantage of their scale or incorporate them into the analysis of survey data. To this end we develop CAREER, a foundation model for job sequences. CAREER is first fit to large, passively-collected resume data and then fine-tuned to smaller, better-curated datasets for economic inferences. We fit CAREER to a dataset of 24 million job sequences from resumes, and adjust it on small longitudinal survey datasets. We find that CAREER forms accurate predictions of job sequences, outperforming econometric baselines on three widely-used economics datasets. We further find that CAREER can be used to form good predictions of other downstream variables. For example, incorporating CAREER into a wage model provides better predictions than the econometric models currently in use.
Submitted 29 February, 2024; v1 submitted 16 February, 2022;
originally announced February 2022.
-
Mapping Interstellar Dust with Gaussian Processes
Authors:
Andrew C. Miller,
Lauren Anderson,
Boris Leistedt,
John P. Cunningham,
David W. Hogg,
David M. Blei
Abstract:
Interstellar dust corrupts nearly every stellar observation, and accounting for it is crucial to measuring physical properties of stars. We model the dust distribution as a spatially varying latent field with a Gaussian process (GP) and develop a likelihood model and inference method that scales to millions of astronomical observations. Modeling interstellar dust is complicated by two factors. The first is integrated observations. The data come from a vantage point on Earth and each observation is an integral of the unobserved function along our line of sight, resulting in a complex likelihood and a more difficult inference problem than in classical GP inference. The second complication is scale; stellar catalogs have millions of observations. To address these challenges we develop ziggy, a scalable approach to GP inference with integrated observations based on stochastic variational inference. We study ziggy on synthetic data and the Ananke dataset, a high-fidelity mechanistic model of the Milky Way with millions of stars. ziggy reliably infers the spatial dust map with well-calibrated posterior uncertainties.
Submitted 14 February, 2022;
originally announced February 2022.
-
Transport Score Climbing: Variational Inference Using Forward KL and Adaptive Neural Transport
Authors:
Liyi Zhang,
David M. Blei,
Christian A. Naesseth
Abstract:
Variational inference often minimizes the "reverse" Kullback-Leibler (KL) divergence KL(q||p) from the approximate distribution q to the posterior p. Recent work studies the "forward" KL divergence KL(p||q), which unlike reverse KL does not lead to variational approximations that underestimate uncertainty. This paper introduces Transport Score Climbing (TSC), a method that optimizes KL(p||q) by using Hamiltonian Monte Carlo (HMC) and a novel adaptive transport map. The transport map improves the trajectory of HMC by acting as a change of variable between the latent variable space and a warped space. TSC uses HMC samples to dynamically train the transport map while optimizing KL(p||q). TSC leverages synergies, where better transport maps lead to better HMC sampling, which then leads to better transport maps. We demonstrate TSC on synthetic and real data. We find that TSC achieves competitive performance when training variational autoencoders on large-scale data.
Submitted 2 September, 2022; v1 submitted 3 February, 2022;
originally announced February 2022.
-
On the Assumptions of Synthetic Control Methods
Authors:
Claudia Shi,
Dhanya Sridhar,
Vishal Misra,
David M. Blei
Abstract:
Synthetic control (SC) methods have been widely applied to estimate the causal effect of large-scale interventions, e.g., the state-wide effect of a change in policy. The idea of synthetic controls is to approximate one unit's counterfactual outcomes using a weighted combination of some other units' observed outcomes. The motivating question of this paper is: how does the SC strategy lead to valid causal inferences? We address this question by re-formulating the causal inference problem targeted by SC with a more fine-grained model, where we change the unit of the analysis from "large units" (e.g., states) to "small units" (e.g., individuals in states). Under this re-formulation, we derive sufficient conditions for the non-parametric causal identification of the causal effect. We highlight two implications of the reformulation: (1) it clarifies where "linearity" comes from, and how it falls naturally out of the more fine-grained and flexible model, and (2) it suggests new ways of using available data with SC methods for valid causal inference, in particular, new ways of selecting observations from which to estimate the counterfactual.
Submitted 14 December, 2021; v1 submitted 10 December, 2021;
originally announced December 2021.
-
Unsupervised Representation Learning via Neural Activation Coding
Authors:
Yookoon Park,
Sangho Lee,
Gunhee Kim,
David M. Blei
Abstract:
We present neural activation coding (NAC) as a novel approach for learning deep representations from unlabeled data for downstream applications. We argue that the deep encoder should maximize its nonlinear expressivity on the data for downstream predictors to take full advantage of its representation power. To this end, NAC maximizes the mutual information between activation patterns of the encoder and the data over a noisy communication channel. We show that learning for a noise-robust activation code increases the number of distinct linear regions of ReLU encoders, hence the maximum nonlinear expressivity. More interestingly, NAC learns both continuous and discrete representations of data, which we respectively evaluate on two downstream tasks: (i) linear classification on CIFAR-10 and ImageNet-1K and (ii) nearest neighbor retrieval on CIFAR-10 and FLICKR-25K. Empirical results show that NAC attains better or comparable performance on both tasks over recent baselines including SimCLR and DistillHash. In addition, NAC pretraining provides significant benefits to the training of deep generative models. Our code is available at https://github.com/yookoon/nac.
Submitted 7 December, 2021;
originally announced December 2021.
-
Conformal Sensitivity Analysis for Individual Treatment Effects
Authors:
Mingzhang Yin,
Claudia Shi,
Yixin Wang,
David M. Blei
Abstract:
Estimating an individual treatment effect (ITE) is essential to personalized decision making. However, existing methods for estimating the ITE often rely on unconfoundedness, an assumption that is fundamentally untestable with observed data. To assess the robustness of individual-level causal conclusion with unconfoundedness, this paper proposes a method for sensitivity analysis of the ITE, a way to estimate a range of the ITE under unobserved confounding. The method we develop quantifies unmeasured confounding through a marginal sensitivity model [Ros2002, Tan2006], and adapts the framework of conformal inference to estimate an ITE interval at a given confounding strength. In particular, we formulate this sensitivity analysis problem as a conformal inference problem under distribution shift, and we extend existing methods of covariate-shifted conformal inference to this more general setting. The result is a predictive interval that has guaranteed nominal coverage of the ITE, a method that provides coverage with distribution-free and nonasymptotic guarantees. We evaluate the method on synthetic data and illustrate its application in an observational study.
Submitted 12 July, 2022; v1 submitted 6 December, 2021;
originally announced December 2021.
-
The Posterior Predictive Null
Authors:
Gemma E. Moran,
John P. Cunningham,
David M. Blei
Abstract:
Bayesian model criticism is an important part of the practice of Bayesian statistics. Traditionally, model criticism methods have been based on the predictive check, an adaptation of goodness-of-fit testing to Bayesian modeling and an effective method to understand how well a model captures the distribution of the data. In modern practice, however, researchers iteratively build and develop many models, exploring a space of models to help solve the problem at hand. While classical predictive checks can help assess each one, they cannot help the researcher understand how the models relate to each other. This paper introduces the posterior predictive null check (PPN), a method for Bayesian model criticism that helps characterize the relationships between models. The idea behind the PPN is to check whether data from one model's predictive distribution can pass a predictive check designed for another model. This form of criticism complements the classical predictive check by providing a comparative tool. A collection of PPNs, which we call a PPN study, can help us understand which models are equivalent and which models provide different perspectives on the data. With mixture models, we demonstrate how a PPN study, along with traditional predictive checks, can help select the number of components by the principle of parsimony. With probabilistic factor models, we demonstrate how a PPN study can help understand relationships between different classes of models, such as linear models and models based on neural networks. Finally, we analyze data from the literature on predictive checks to show how a PPN study can improve the practice of Bayesian model criticism. Code to replicate the results in this paper is available at https://github.com/gemoran/ppn-code.
Submitted 6 July, 2022; v1 submitted 6 December, 2021;
originally announced December 2021.
-
Identifiable Deep Generative Models via Sparse Decoding
Authors:
Gemma E. Moran,
Dhanya Sridhar,
Yixin Wang,
David M. Blei
Abstract:
We develop the sparse VAE for unsupervised representation learning on high-dimensional data. The sparse VAE learns a set of latent factors (representations) which summarize the associations in the observed data features. The underlying model is sparse in that each observed feature (i.e. each dimension of the data) depends on a small subset of the latent factors. As examples, in ratings data each movie is only described by a few genres; in text data each word is only applicable to a few topics; in genomics, each gene is active in only a few biological processes. We prove such sparse deep generative models are identifiable: with infinite data, the true model parameters can be learned. (In contrast, most deep generative models are not identifiable.) We empirically study the sparse VAE with both simulated and real data. We find that it recovers meaningful latent factors and has smaller heldout reconstruction error than related methods.
Submitted 17 February, 2022; v1 submitted 20 October, 2021;
originally announced October 2021.
-
Optimization-based Causal Estimation from Heterogenous Environments
Authors:
Mingzhang Yin,
Yixin Wang,
David M. Blei
Abstract:
This paper presents a new optimization approach to causal estimation. Given data that contains covariates and an outcome, which covariates are causes of the outcome, and what is the strength of the causality? In classical machine learning (ML), the goal of optimization is to maximize predictive accuracy. However, some covariates might exhibit a non-causal association with the outcome. Such spurious associations provide predictive power for classical ML, but they prevent us from causally interpreting the result. This paper proposes CoCo, an optimization algorithm that bridges the gap between pure prediction and causal inference. CoCo leverages the recently-proposed idea of environments, datasets of covariates/response where the causal relationships remain invariant but where the distribution of the covariates changes from environment to environment. Given datasets from multiple environments (and ones that exhibit sufficient heterogeneity), CoCo maximizes an objective for which the only solution is the causal solution. We describe the theoretical foundations of this approach and demonstrate its effectiveness on simulated and real datasets. Compared to classical ML and existing methods, CoCo provides more accurate estimates of the causal model and more accurate predictions under interventions.
Submitted 10 June, 2024; v1 submitted 24 September, 2021;
originally announced September 2021.
-
Rationales for Sequential Predictions
Authors:
Keyon Vafa,
Yuntian Deng,
David M. Blei,
Alexander M. Rush
Abstract:
Sequence models are a critical component of modern NLP systems, but their predictions are difficult to explain. We consider model explanations through rationales, subsets of context that can explain individual model predictions. We find sequential rationales by solving a combinatorial optimization: the best rationale is the smallest subset of input tokens that would predict the same output as the full sequence. Enumerating all subsets is intractable, so we propose an efficient greedy algorithm to approximate this objective. The algorithm, which is called greedy rationalization, applies to any model. For this approach to be effective, the model should form compatible conditional distributions when making predictions on incomplete subsets of the context. This condition can be enforced with a short fine-tuning step. We study greedy rationalization on language modeling and machine translation. Compared to existing baselines, greedy rationalization is best at optimizing the combinatorial objective and provides the most faithful rationales. On a new dataset of annotated sequential rationales, greedy rationales are most similar to human rationales.
Submitted 17 November, 2021; v1 submitted 13 September, 2021;
originally announced September 2021.
-
Imaging Generalized Wigner Crystal States in a WSe2/WS2 Moiré Superlattice
Authors:
Hongyuan Li,
Shaowei Li,
Emma C. Regan,
Danqing Wang,
Wenyu Zhao,
Salman Kahn,
Kentaro Yumigeta,
Mark Blei,
Takashi Taniguchi,
Kenji Watanabe,
Sefaattin Tongay,
Alex Zettl,
Michael F. Crommie,
Feng Wang
Abstract:
The Wigner crystal state, first predicted by Eugene Wigner in 1934, has fascinated condensed matter physicists for nearly 90 years2-14. Studies of two-dimensional (2D) electron gases first revealed signatures of the Wigner crystal in electrical transport measurements at high magnetic fields2-4. More recently optical spectroscopy has provided evidence of generalized Wigner crystal states in transition metal dichalcogenide (TMDC) moiré superlattices. Direct observation of the 2D Wigner crystal lattice in real space, however, has remained an outstanding challenge. Scanning tunneling microscopy (STM) in principle has sufficient spatial resolution to image a Wigner crystal, but conventional STM measurements can potentially alter fragile Wigner crystal states in the process of measurement. Here we demonstrate real-space imaging of 2D Wigner crystals in WSe2/WS2 moiré heterostructures using a novel non-invasive STM spectroscopy technique. We employ a graphene sensing layer in close proximity to the WSe2/WS2 moiré superlattice for Wigner crystal imaging, where local STM tunneling current into the graphene sensing layer is modulated by the underlying electron lattice of the Wigner crystal in the WSe2/WS2 heterostructure. Our measurement directly visualizes different lattice configurations associated with Wigner crystal states at fractional electron fillings of n = 1/3, 1/2, and 2/3, where n is the electron number per site. The n=1/3 and n=2/3 Wigner crystals are observed to exhibit a triangle and a honeycomb lattice, respectively, in order to minimize nearest-neighbor occupations. The n = 1/2 state, on the other hand, spontaneously breaks the original C3 symmetry and forms a stripe structure in real space. Our study lays a solid foundation toward the fundamental understanding of rich Wigner crystal states in WSe2/WS2 moiré heterostructures.
Submitted 19 June, 2021;
originally announced June 2021.
-
Imaging local discharge cascades for correlated electrons in WS2/WSe2 moiré superlattices
Authors:
Hongyuan Li,
Shaowei Li,
Mit H. Naik,
Jingxu Xie,
Xinyu Li,
Emma Regan,
Danqing Wang,
Wenyu Zhao,
Kentaro Yumigeta,
Mark Blei,
Takashi Taniguchi,
Kenji Watanabe,
Sefaattin Tongay,
Alex Zettl,
Steven G. Louie,
Michael F. Crommie,
Feng Wang
Abstract:
Transition metal dichalcogenide (TMD) moiré heterostructures provide an ideal platform to explore the extended Hubbard model1 where long-range Coulomb interactions play a critical role in determining strongly correlated electron states. This has led to experimental observations of Mott insulator states at half filling2-4 as well as a variety of extended Wigner crystal states at different fractional fillings5-9. Microscopic understanding of these emerging quantum phases, however, is still lacking. Here we describe a novel scanning tunneling microscopy (STM) technique for local sensing and manipulation of correlated electrons in a gated WS2/WSe2 moiré superlattice that enables experimental extraction of fundamental extended Hubbard model parameters. We demonstrate that the charge state of local moiré sites can be imaged by their influence on STM tunneling current, analogous to the charge-sensing mechanism in a single-electron transistor. In addition to imaging, we are also able to manipulate the charge state of correlated electrons. Discharge cascades of correlated electrons in the moiré superlattice are locally induced by ramping the STM bias, thus enabling the nearest-neighbor Coulomb interaction (UNN) to be estimated. 2D mapping of the moiré electron charge states also enables us to determine onsite energy fluctuations at different moiré sites. Our technique should be broadly applicable to many semiconductor moiré systems, offering a powerful new tool for microscopic characterization and control of strongly correlated states in moiré superlattices.
Submitted 16 February, 2021;
originally announced February 2021.
-
Synthesis, Engineering, and Theory of 2D van der Waals Magnets
Authors:
M. Blei,
J. L. Lado,
Q. Song,
D. Dey,
O. Erten,
V. Pardo,
R. Comin,
S. Tongay,
A. S. Botana
Abstract:
Spontaneous magnetic order is a routine instance in three-dimensional (3D) materials but for a long time, it remained elusive in the 2D world. Recently, the first examples of (stand-alone) 2D van der Waals (vdW) crystals with magnetic order, either antiferromagnetic or ferromagnetic, have been reported. In this review, we describe the state of the art of the nascent field of magnetic 2D materials focusing on synthesis, engineering, and theory aspects. We also discuss challenges and some of the many different promising directions for future work.
Submitted 17 August, 2020;
originally announced August 2020.
-
Manipulation of room-temperature Valley-Coherent Exciton-Polaritons in atomically thin crystals by real and artificial magnetic fields
Authors:
Christoph Rupprecht,
Evgeny Sedov,
Martin Klaas,
Heiko Knopf,
Mark Blei,
Nils Lundt,
Sefaattin Tongay,
Takashi Taniguchi,
Kenji Watanabe,
Ulrike Schulz,
Alexey Kavokin,
Falk Eilenberger,
Sven Höfling,
Christian Schneider
Abstract:
Strong spin-orbit coupling and inversion symmetry breaking in transition metal dichalcogenide monolayers yield the intriguing effects of valley-dependent optical selection rules. As such, it is possible to substantially polarize valley excitons with chiral light and furthermore create coherent superpositions of K- and K'-polarized states. Yet, at ambient conditions dephasing usually becomes too dominant, and valley coherence typically is not observable. Here, we demonstrate that valley coherence is, however, clearly observable for a single monolayer of WSe2, if it is strongly coupled to the optical mode of a high quality factor microcavity. The azimuthal vector, representing the phase of the valley coherent superposition, can be directly manipulated by applying magnetic fields, and furthermore, it sensibly reacts to the polarization anisotropy of the cavity which represents an artificial magnetic field. Our results are in qualitative and quantitative agreement with our model based on pseudospin rate equations, accounting for both effects of real and pseudo-magnetic fields.
Submitted 23 July, 2020;
originally announced July 2020.
-
Nanoscale Conductivity Imaging of Correlated Electronic States in WSe2/WS2 Moiré Superlattices
Authors:
Zhaodong Chu,
Emma C Regan,
Xuejian Ma,
Danqing Wang,
Zifan Xu,
M. Iqbal Bakti Utama,
Kentaro Yumigeta,
Mark Blei,
Kenji Watanabe,
Takashi Taniguchi,
Sefaattin Tongay,
Feng Wang,
Keji Lai
Abstract:
We report the nanoscale conductivity imaging of correlated electronic states in angle-aligned WSe2/WS2 heterostructures using microwave impedance microscopy. The noncontact microwave probe allows us to observe the Mott insulating state with one hole per moiré unit cell that persists for temperatures up to 150 K, consistent with other characterization techniques. In addition, we identify for the first time a Mott insulating state at one electron per moiré unit cell. Appreciable inhomogeneity of the correlated states is directly visualized in the hetero-bilayer region, indicative of local disorders in the moiré superlattice potential or electrostatic doping. Our work provides important insights on 2D moiré systems down to the microscopic level.
Submitted 13 July, 2020;
originally announced July 2020.
-
Imaging moiré flat bands in 3D reconstructed WSe2/WS2 superlattices
Authors:
Hongyuan Li,
Shaowei Li,
Mit H. Naik,
Jingxu Xie,
Xinyu Li,
Jiayin Wang,
Emma Regan,
Danqing Wang,
Wenyu Zhao,
Sihan Zhao,
Salman Kahn,
Kentaro Yumigeta,
Mark Blei,
Takashi Taniguchi,
Kenji Watanabe,
Sefaattin Tongay,
Alex Zettl,
Steven G. Louie,
Feng Wang,
Michael F. Crommie
Abstract:
Moiré superlattices in transition metal dichalcogenide (TMD) heterostructures can host novel correlated quantum phenomena due to the interplay of narrow moiré flat bands and strong, long-range Coulomb interactions. However, microscopic knowledge of the atomically-reconstructed moiré superlattice and resulting flat bands is still lacking, which is critical for fundamental understanding and control of the correlated moiré phenomena. Here we quantitatively study the moiré flat bands in three-dimensional (3D) reconstructed WSe2/WS2 moiré superlattices by comparing scanning tunneling spectroscopy (STS) of high quality exfoliated TMD heterostructure devices with ab initio simulations of TMD moiré superlattices. A strong 3D buckling reconstruction accompanied by large in-plane strain redistribution is identified in our WSe2/WS2 moiré heterostructures. STS imaging demonstrates that this results in a remarkably narrow and highly localized K-point moiré flat band at the valence band edge of the heterostructure. A series of moiré flat bands are observed at different energies that exhibit varying degrees of localization. Our observations contradict previous simplified theoretical models but agree quantitatively with ab initio simulations that fully capture the 3D structural reconstruction. Here the strain redistribution and 3D buckling dominate the effective moiré potential and result in moiré flat bands at the Brillouin zone K points.
Submitted 12 July, 2020;
originally announced July 2020.
-
Exciton-exciton interaction beyond the hydrogenic picture in a MoSe$_2$ monolayer in the strong light-matter coupling regime
Authors:
Petr Stepanov,
Amit Vashisht,
Martin Klaas,
Nils Lundt,
Sefaattin Tongay,
Mark Blei,
Sven Höfling,
Thomas Volz,
Anna Minguzzi,
Julien Renard,
Christian Schneider,
Maxime Richard
Abstract:
In transition metal dichalcogenides layers of atomic scale thickness, the electron-hole Coulomb interaction potential is strongly influenced by the sharp discontinuity of the dielectric function across the layer plane. This feature results in peculiar non-hydrogenic excitonic states, in which exciton-mediated optical nonlinearities are predicted to be enhanced as compared to their hydrogenic counterpart. To demonstrate this enhancement, we performed optical transmission spectroscopy of a MoSe$_2$ monolayer placed in the strong coupling regime with the mode of an optical microcavity, and analyzed the results quantitatively with a nonlinear input-output theory. We find an enhancement of both the exciton-exciton interaction and of the excitonic fermionic saturation with respect to realistic values expected in the hydrogenic picture. Such results demonstrate that unconventional excitons in MoSe$_2$ are highly favourable for the implementation of large exciton-mediated optical nonlinearities, potentially working up to room temperature.
Submitted 1 July, 2020;
originally announced July 2020.
-
Text-Based Ideal Points
Authors:
Keyon Vafa,
Suresh Naidu,
David M. Blei
Abstract:
Ideal point models analyze lawmakers' votes to quantify their political positions, or ideal points. But votes are not the only way to express a political position. Lawmakers also give speeches, release press statements, and post tweets. In this paper, we introduce the text-based ideal point model (TBIP), an unsupervised probabilistic topic model that analyzes texts to quantify the political positions of its authors. We demonstrate the TBIP with two types of politicized text data: U.S. Senate speeches and senator tweets. Though the model does not analyze their votes or political affiliations, the TBIP separates lawmakers by party, learns interpretable politicized topics, and infers ideal points close to the classical vote-based ideal points. One benefit of analyzing texts, as opposed to votes, is that the TBIP can estimate ideal points of anyone who authors political texts, including non-voting actors. To this end, we use it to study tweets from the 2020 Democratic presidential candidates. Using only the texts of their tweets, it identifies them along an interpretable progressive-to-moderate spectrum.
Submitted 21 July, 2020; v1 submitted 8 May, 2020;
originally announced May 2020.
-
Confinement of long-lived interlayer excitons in WS$_2$/WSe$_2$ heterostructures
Authors:
Alejandro R. -P. Montblanch,
Dhiren M. Kara,
Ioannis Paradisanos,
Carola M. Purser,
Matthew S. G. Feuer,
Evgeny M. Alexeev,
Lucio Stefan,
Ying Qin,
Mark Blei,
Gang Wang,
Alisson R. Cadore,
Pawel Latawiec,
Marko Lončar,
Sefaattin Tongay,
Andrea C. Ferrari,
Mete Atatüre
Abstract:
Interlayer excitons in layered materials constitute a novel platform to study many-body phenomena arising from long-range interactions between quantum particles. The ability to localise individual interlayer excitons in potential energy traps is a key step towards simulating Hubbard physics in artificial lattices. Here, we demonstrate spatial localisation of long-lived interlayer excitons in a strongly confining trap array using a WS$_{2}$/WSe$_{2}$ heterostructure on a nanopatterned substrate. We detect long-lived interlayer excitons with lifetime approaching 0.2 ms and show that their confinement results in a reduced lifetime in the microsecond range and stronger emission rate with sustained optical selection rules. The combination of a permanent dipole moment, spatial confinement and long lifetime places interlayer excitons in a regime that satisfies one of the requirements for observing long-range dynamics in an optically resolvable trap lattice.
Submitted 5 May, 2020;
originally announced May 2020.
-
Towards Clarifying the Theory of the Deconfounder
Authors:
Yixin Wang,
David M. Blei
Abstract:
Wang and Blei (2019) studies multiple causal inference and proposes the deconfounder algorithm. The paper discusses theoretical requirements and presents empirical studies. Several refinements have been suggested around the theory of the deconfounder. Among these, Imai and Jiang clarified the assumption of "no unobserved single-cause confounders." Using their assumption, this paper clarifies the theory. Furthermore, Ogburn et al. (2020) proposes counterexamples to the theory. But the proposed counterexamples do not satisfy the required assumptions.
Submitted 10 March, 2020;
originally announced March 2020.
-
Giant Valley-Zeeman Splitting from Spin-Singlet and Spin-Triplet Interlayer Excitons in WSe2/MoSe2 Heterostructure
Authors:
Tianmeng Wang,
Shengnan Miao,
Zhipeng Li,
Yuze Meng,
Zhengguang Lu,
Zhen Lian,
Mark Blei,
Takashi Taniguchi,
Kenji Watanabe,
Sefaattin Tongay,
Dmitry Smirnov,
Su-Fei Shi
Abstract:
Transition metal dichalcogenides (TMDCs) heterostructure with a type II alignment hosts unique interlayer excitons with the possibility of spin-triplet and spin-singlet states. However, the associated spectroscopy signatures remain elusive, strongly hindering the understanding of the moiré potential modulation of the interlayer exciton. In this work, we unambiguously identify the spin-singlet and spin-triplet interlayer excitons in the WSe2/MoSe2 hetero-bilayer with a 60-degree twist angle through the gate- and magnetic field-dependent photoluminescence spectroscopy. Both the singlet and triplet interlayer excitons show giant valley-Zeeman splitting between the K and K' valleys, a result of the large Landé g-factor of the singlet interlayer exciton and triplet interlayer exciton, which are experimentally determined to be ~ 10.7 and ~ 15.2, respectively, in good agreement with theoretical expectation. The PL from the singlet and triplet interlayer excitons show opposite helicities, determined by the atomic registry. Helicity-resolved photoluminescence excitation (PLE) spectroscopy study shows that both singlet and triplet interlayer excitons are highly valley-polarized at the resonant excitation, with the valley polarization of the singlet interlayer exciton approaching unity at ~ 20 K. The highly valley-polarized singlet and triplet interlayer excitons with giant valley-Zeeman splitting inspire future applications in spintronics and valleytronics.
Submitted 26 December, 2019;
originally announced December 2019.
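As a rough sense of scale for the g-factors quoted in the abstract above, the following worked arithmetic assumes the common convention $\Delta E = g\,\mu_B B$ for the splitting between the K and K' valleys. Both the convention and the illustrative field of 1 T are assumptions made here, not values taken from the abstract.

```latex
\begin{align}
\Delta E &= g\,\mu_B B, \qquad \mu_B \approx 57.9\ \mu\mathrm{eV/T} \\
\Delta E_{\mathrm{singlet}}(B = 1\ \mathrm{T}) &\approx 10.7 \times 57.9\ \mu\mathrm{eV} \approx 0.62\ \mathrm{meV} \\
\Delta E_{\mathrm{triplet}}(B = 1\ \mathrm{T}) &\approx 15.2 \times 57.9\ \mu\mathrm{eV} \approx 0.88\ \mathrm{meV}
\end{align}
```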
-
Momentum-Dark Intervalley Exciton in Monolayer Tungsten Diselenide Brightened via Chiral Phonon
Authors:
Zhipeng Li,
Tianmeng Wang,
Chenhao Jin,
Zhengguang Lu,
Zhen Lian,
Yuze Meng,
Mark Blei,
Mengnan Gao,
Takashi Taniguchi,
Kenji Watanabe,
Tianhui Ren,
Ting Cao,
Sefaattin Tongay,
Dmitry Smirnov,
Lifa Zhang,
Su-Fei Shi
Abstract:
Inversion symmetry breaking and three-fold rotation symmetry grant the valley degree of freedom to the robust exciton in monolayer transition metal dichalcogenides (TMDCs), which can be exploited for valleytronics applications. However, the short lifetime of the exciton significantly constrains the possible applications. In contrast, dark exciton could be long-lived but does not necessarily possess the valley degree of freedom. In this work, we report the identification of the momentum-dark, intervalley exciton in monolayer WSe2 through low-temperature magneto-photoluminescence (PL) spectra. Interestingly, the intervalley exciton is brightened through the emission of a chiral phonon at the corners of the Brillouin zone (K point), and the pseudoangular momentum (PAM) of the phonon is transferred to the emitted photon to preserve the valley information. The chiral phonon energy is determined to be ~ 23 meV, based on the experimentally extracted exchange interaction (~ 7 meV), in excellent agreement with the theoretical expectation of 24.6 meV. The long-lived intervalley exciton with valley degree of freedom adds an exciting quasiparticle for valleytronics, and the coupling between the chiral phonon and intervalley exciton furnishes a venue for valley spin manipulation.
Submitted 27 November, 2019;
originally announced November 2019.
-
Poisson-Randomized Gamma Dynamical Systems
Authors:
Aaron Schein,
Scott W. Linderman,
Mingyuan Zhou,
David M. Blei,
Hanna Wallach
Abstract:
This paper presents the Poisson-randomized gamma dynamical system (PRGDS), a model for sequentially observed count tensors that encodes a strong inductive bias toward sparsity and burstiness. The PRGDS is based on a new motif in Bayesian latent variable modeling, an alternating chain of discrete Poisson and continuous gamma latent states that is analytically convenient and computationally tractable. This motif yields closed-form complete conditionals for all variables by way of the Bessel distribution and a novel discrete distribution that we call the shifted confluent hypergeometric distribution. We draw connections to closely related models and compare the PRGDS to these models in studies of real-world count data sets of text, international events, and neural spike trains. We find that a sparse variant of the PRGDS, which allows the continuous gamma latent states to take values of exactly zero, often obtains better predictive performance than other models and is uniquely capable of inferring latent structures that are highly localized in time.
Submitted 28 October, 2019;
originally announced October 2019.
-
Optical detection of Mott and generalized Wigner crystal states in WSe2/WS2 moiré superlattices
Authors:
Emma C. Regan,
Danqing Wang,
Chenhao Jin,
M. Iqbal Bakti Utama,
Beini Gao,
Xin Wei,
Sihan Zhao,
Wenyu Zhao,
Kentaro Yumigeta,
Mark Blei,
Johan Carlstroem,
Kenji Watanabe,
Takashi Taniguchi,
Sefaattin Tongay,
Michael Crommie,
Alex Zettl,
Feng Wang
Abstract:
Moiré superlattices are emerging as a new route for engineering strongly correlated electronic states in two-dimensional van der Waals heterostructures, as recently demonstrated in the correlated insulating and superconducting states in magic-angle twisted bilayer graphene and ABC trilayer graphene/boron nitride moiré superlattices. Transition metal dichalcogenide (TMDC) moiré heterostructures provide another exciting model system to explore correlated quantum phenomena, with the addition of strong light-matter interactions and large spin-orbital coupling. Here we report the optical detection of strongly correlated phases in semiconducting WSe2/WS2 moiré superlattices. Our sensitive optical detection technique reveals a Mott insulator state at one hole per superlattice site (ν = 1), and surprising insulating phases at fractional filling factors ν = 1/3 and 2/3, which we assign to generalized Wigner crystallization on an underlying lattice. Furthermore, the unique spin-valley optical selection rules of TMDC heterostructures allow us to optically create and investigate low-energy spin excited states in the Mott insulator. We reveal an especially slow spin relaxation lifetime of many microseconds in the Mott insulating state, orders-of-magnitude longer than that of charge excitations. Our studies highlight novel correlated physics that can emerge in moiré superlattices beyond graphene.
Submitted 20 October, 2019;
originally announced October 2019.
-
The Blessings of Multiple Causes: A Reply to Ogburn et al. (2019)
Authors:
Yixin Wang,
David M. Blei
Abstract:
Ogburn et al. (2019, arXiv:1910.05438) discuss "The Blessings of Multiple Causes" (Wang and Blei, 2018, arXiv:1805.06826). Many of their remarks are interesting. But they also claim that the paper has "foundational errors" and that its "premise is...incorrect." These claims are not substantiated. There are no foundational errors; the premise is correct.
Submitted 20 December, 2019; v1 submitted 15 October, 2019;
originally announced October 2019.
-
Prescribed Generative Adversarial Networks
Authors:
Adji B. Dieng,
Francisco J. R. Ruiz,
David M. Blei,
Michalis K. Titsias
Abstract:
Generative adversarial networks (GANs) are a powerful approach to unsupervised learning. They have achieved state-of-the-art performance in the image domain. However, GANs are limited in two ways. They often learn distributions with low support---a phenomenon known as mode collapse---and they do not guarantee the existence of a probability density, which makes evaluating generalization using predictive log-likelihood impossible. In this paper, we develop the prescribed GAN (PresGAN) to address these shortcomings. PresGANs add noise to the output of a density network and optimize an entropy-regularized adversarial loss. The added noise renders tractable approximations of the predictive log-likelihood and stabilizes the training procedure. The entropy regularizer encourages PresGANs to capture all the modes of the data distribution. Fitting PresGANs involves computing the intractable gradients of the entropy regularization term; PresGANs sidestep this intractability using unbiased stochastic estimates. We evaluate PresGANs on several datasets and found they mitigate mode collapse and generate samples with high perceptual quality. We further found that PresGANs reduce the gap in performance in terms of predictive log-likelihood between traditional GANs and variational autoencoders (VAEs).
Submitted 9 October, 2019;
originally announced October 2019.
-
Direct Observation of Gate Tunable Dark Trions in Monolayer WSe2
Authors:
Zhipeng Li,
Tianmeng Wang,
Zhengguang Lu,
Mandeep Khatoniar,
Zhen Lian,
Yuze Meng,
Mark Blei,
Takashi Taniguchi,
Kenji Watanabe,
Stephen A. McGill,
Sefaattin Tongay,
Vinod M. Menon,
Dmitry Smirnov,
Su-Fei Shi
Abstract:
Spin-forbidden intravalley dark exciton in tungsten-based transition metal dichalcogenides (TMDCs), owing to its unique spin texture and long lifetime, has attracted intense research interest. Here, we show that we can control the dark exciton electrostatically by dressing it with one free electron or free hole, forming the dark trions. The existence of the dark trions is suggested by the unique magneto-photoluminescence spectroscopy pattern of the boron nitride (BN) encapsulated monolayer WSe2 device at low temperature. The unambiguous evidence of the dark trions is further obtained by directly resolving the radiation pattern of the dark trions through back focal plane imaging. The dark trions possess binding energy of ~ 15 meV, and they inherit the long lifetime and large g-factor from the dark exciton. Interestingly, under the out-of-plane magnetic field, dressing the dark exciton with one free electron or hole results in distinctively different valley polarization of the emitted photon, a result of the different intervalley scattering mechanism for the electron and hole. Finally, the lifetime of the positive dark trion can be further tuned from ~ 50 to ~ 215 ps by controlling the gate voltage. The gate tunable dark trions usher in new opportunities for excitonic optoelectronics and valleytronics.
Submitted 9 September, 2019;
originally announced September 2019.
-
Population Predictive Checks
Authors:
Gemma E. Moran,
David M. Blei,
Rajesh Ranganath
Abstract:
Bayesian modeling helps applied researchers articulate assumptions about their data and develop models tailored for specific applications. Thanks to good methods for approximate posterior inference, researchers can now easily build, use, and revise complicated Bayesian models for large and rich data. These capabilities, however, bring into focus the problem of model criticism. Researchers need tools to diagnose the fitness of their models, to understand where they fall short, and to guide their revision. In this paper we develop a new method for Bayesian model criticism, the population predictive check (Pop-PC). Pop-PCs are built on posterior predictive checks (PPCs), a seminal method that checks a model by assessing the posterior predictive distribution on the observed data. However, PPCs use the data twice -- both to calculate the posterior predictive and to evaluate it -- which can lead to overconfident assessments of the quality of a model. Pop-PCs, in contrast, compare the posterior predictive distribution to a draw from the population distribution, a heldout dataset. This method blends Bayesian modeling with frequentist assessment. Unlike the PPC, we prove that the Pop-PC is properly calibrated. Empirically, we study Pop-PC on classical regression and a hierarchical model of text data.
Submitted 15 July, 2022; v1 submitted 2 August, 2019;
originally announced August 2019.
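The held-out comparison described in the abstract above can be sketched on a toy model. This is a simplified illustration under stated assumptions, not the paper's procedure: the model (a normal likelihood with known standard deviation and a conjugate normal prior on the mean), the function name `heldout_predictive_pvalue`, and all parameter values are hypothetical choices made here. The sketch fits the posterior on training data only, then asks how often replicated datasets produce a discrepancy at least as large as that of the held-out data.

```python
import numpy as np

def heldout_predictive_pvalue(y_train, y_heldout, discrepancy,
                              n_rep=2000, sigma=1.0, prior_var=10.0, seed=0):
    """Toy held-out predictive check (hypothetical helper, not from the paper).

    Assumed model: y ~ Normal(mu, sigma^2), mu ~ Normal(0, prior_var).
    """
    rng = np.random.default_rng(seed)
    n = len(y_train)
    # Conjugate posterior for mu, computed from the training data only.
    post_var = 1.0 / (1.0 / prior_var + n / sigma**2)
    post_mean = post_var * y_train.sum() / sigma**2
    # The discrepancy is evaluated on data the posterior never saw.
    d_heldout = discrepancy(y_heldout)
    # Replicated datasets drawn from the posterior predictive distribution.
    mus = rng.normal(post_mean, np.sqrt(post_var), size=n_rep)
    reps = rng.normal(mus[:, None], sigma, size=(n_rep, len(y_heldout)))
    d_rep = np.array([discrepancy(r) for r in reps])
    return float(np.mean(d_rep >= d_heldout))

rng = np.random.default_rng(1)
y_train = rng.normal(0.0, 1.0, size=200)
y_good = rng.normal(0.0, 1.0, size=200)   # matches the assumed model
y_bad = rng.normal(0.0, 5.0, size=200)    # far more spread than the model allows
p_good = heldout_predictive_pvalue(y_train, y_good, np.var)
p_bad = heldout_predictive_pvalue(y_train, y_bad, np.var)
```

A value of `p_bad` near zero flags that the assumed model understates the spread of the held-out data, while `p_good` should show no such extreme.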
-
The Dynamic Embedded Topic Model
Authors:
Adji B. Dieng,
Francisco J. R. Ruiz,
David M. Blei
Abstract:
Topic modeling analyzes documents to learn meaningful patterns of words. For documents collected in sequence, dynamic topic models capture how these patterns vary over time. We develop the dynamic embedded topic model (D-ETM), a generative model of documents that combines dynamic latent Dirichlet allocation (D-LDA) and word embeddings. The D-ETM models each word with a categorical distribution parameterized by the inner product between the word embedding and a per-time-step embedding representation of its assigned topic. The D-ETM learns smooth topic trajectories by defining a random walk prior over the embedding representations of the topics. We fit the D-ETM using structured amortized variational inference with a recurrent neural network. On three different corpora---a collection of United Nations debates, a set of ACL abstracts, and a dataset of Science Magazine articles---we found that the D-ETM outperforms D-LDA on a document completion task. We further found that the D-ETM learns more diverse and coherent topics than D-LDA while requiring significantly less time to fit.
Submitted 10 October, 2019; v1 submitted 11 July, 2019;
originally announced July 2019.
-
Topic Modeling in Embedding Spaces
Authors:
Adji B. Dieng,
Francisco J. R. Ruiz,
David M. Blei
Abstract:
Topic modeling analyzes documents to learn meaningful patterns of words. However, existing topic models fail to learn interpretable topics when working with large and heavy-tailed vocabularies. To this end, we develop the Embedded Topic Model (ETM), a generative model of documents that marries traditional topic models with word embeddings. In particular, it models each word with a categorical distribution whose natural parameter is the inner product between a word embedding and an embedding of its assigned topic. To fit the ETM, we develop an efficient amortized variational inference algorithm. The ETM discovers interpretable topics even with large vocabularies that include rare words and stop words. It outperforms existing document models, such as latent Dirichlet allocation (LDA), in terms of both topic quality and predictive performance.
Submitted 7 July, 2019;
originally announced July 2019.
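The per-topic word distribution described in the abstract above (a categorical distribution whose natural parameter is the inner product of a word embedding and a topic embedding) can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function name `etm_word_distribution`, the array names `rho` and `alpha`, and the toy values are assumptions introduced here.

```python
import numpy as np

def etm_word_distribution(word_embeddings, topic_embedding):
    """Categorical distribution over the vocabulary for a single topic.

    word_embeddings: array of shape (vocab_size, dim), one row per word.
    topic_embedding: array of shape (dim,), the embedding of the topic.
    """
    # Natural parameters: inner product of each word embedding with the topic.
    logits = word_embeddings @ topic_embedding
    # Softmax, shifted by the maximum for numerical stability.
    logits = logits - logits.max()
    probs = np.exp(logits)
    return probs / probs.sum()

# Toy example: a 3-word vocabulary embedded in 2 dimensions (made-up values).
rho = np.array([[1.0, 0.0],
                [0.0, 1.0],
                [1.0, 1.0]])
alpha = np.array([2.0, 1.0])
p = etm_word_distribution(rho, alpha)
```

Words whose embeddings align most closely with the topic embedding receive the highest probability; here that is the third word, whose inner product with `alpha` is largest.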
-
A Bayesian Model of Dose-Response for Cancer Drug Studies
Authors:
Wesley Tansey,
Christopher Tosh,
David M. Blei
Abstract:
Exploratory cancer drug studies test multiple tumor cell lines against multiple candidate drugs. The goal in each paired (cell line, drug) experiment is to map out the dose-response curve of the cell line as the dose level of the drug increases. We propose Bayesian Tensor Filtering (BTF), a hierarchical Bayesian model for dose-response modeling in multi-sample, multi-treatment cancer drug studies. BTF uses low-dimensional embeddings to share statistical strength between similar drugs and similar cell lines. Structured shrinkage priors in BTF encourage smoothness in the dose-response curves while remaining adaptive to sharp jumps when the data call for it. We focus on a pair of cancer drug studies exhibiting a particular pathology in their experimental design, leading us to a non-conjugate monotone mixture-of-Gammas likelihood. To perform posterior inference, we develop a variant of the elliptical slice sampling algorithm for sampling from linearly-constrained multivariate normal priors with non-conjugate likelihoods. In benchmarks, BTF outperforms state-of-the-art methods for covariance regression and dynamic Poisson matrix factorization. On the two cancer drug studies, BTF outperforms the current standard approach in biology and reveals potential new biomarkers of drug sensitivity in cancer. Code is available at https://github.com/tansey/functionalmf.
Submitted 22 March, 2021; v1 submitted 10 June, 2019;
originally announced June 2019.