Showing 1–17 of 17 results for author: Hartford, J

  1. arXiv:2405.20482  [pdf, other]

    cs.LG stat.ML

    Sparsity regularization via tree-structured environments for disentangled representations

    Authors: Elliot Layne, Jason Hartford, Sébastien Lachapelle, Mathieu Blanchette, Dhanya Sridhar

    Abstract: Many causal systems such as biological processes in cells can only be observed indirectly via measurements, such as gene expression. Causal representation learning -- the task of correctly mapping low-level observations to latent causal variables -- could advance scientific understanding by enabling inference of latent variables such as pathway activation. In this paper, we develop methods for inf…

    Submitted 10 June, 2024; v1 submitted 30 May, 2024; originally announced May 2024.

  2. arXiv:2405.19985  [pdf, other]

    stat.ME cs.LG

    Targeted Sequential Indirect Experiment Design

    Authors: Elisabeth Ailer, Niclas Dern, Jason Hartford, Niki Kilbertus

    Abstract: Scientific hypotheses typically concern specific aspects of complex, imperfectly understood or entirely unknown mechanisms, such as the effect of gene expression levels on phenotypes or how microbial communities influence environmental health. Such queries are inherently causal (rather than purely associational), but in many settings, experiments cannot be conducted directly on the target variabl…

    Submitted 30 May, 2024; originally announced May 2024.

  3. arXiv:2404.01595  [pdf, other]

    cs.LG stat.ME stat.ML

    Propensity Score Alignment of Unpaired Multimodal Data

    Authors: Johnny Xi, Jason Hartford

    Abstract: Multimodal representation learning techniques typically rely on paired samples to learn common representations, but paired samples are challenging to collect in fields such as biology where measurement devices often destroy the samples. This paper presents an approach to address the challenge of aligning unpaired samples across disparate modalities in multimodal representation learning. We draw an…

    Submitted 1 April, 2024; originally announced April 2024.

  4. arXiv:2402.18545  [pdf, other]

    cs.CY

    Crowdsourcing Dermatology Images with Google Search Ads: Creating a Real-World Skin Condition Dataset

    Authors: Abbi Ward, Jimmy Li, Julie Wang, Sriram Lakshminarasimhan, Ashley Carrick, Bilson Campana, Jay Hartford, Pradeep Kumar S, Tiya Tiyasirichokchai, Sunny Virmani, Renee Wong, Yossi Matias, Greg S. Corrado, Dale R. Webster, Dawn Siegel, Steven Lin, Justin Ko, Alan Karthikesalingam, Christopher Semturs, Pooja Rao

    Abstract: Background: Health datasets from clinical sources do not reflect the breadth and diversity of disease in the real world, impacting research, medical education, and artificial intelligence (AI) tool development. Dermatology is a suitable area to develop and test a new and scalable method to create representative health datasets. Methods: We used Google Search advertisements to invite contribution…

    Submitted 28 February, 2024; originally announced February 2024.

  5. arXiv:2310.19054  [pdf, other]

    cs.LG

    Object-centric architectures enable efficient causal representation learning

    Authors: Amin Mansouri, Jason Hartford, Yan Zhang, Yoshua Bengio

    Abstract: Causal representation learning has shown a variety of settings in which we can disentangle latent variables with identifiability guarantees (up to some reasonable equivalence class). Common to all of these approaches is the assumption that (1) the latent variables are represented as $d$-dimensional vectors, and (2) that the observations are the output of some injective generative function of thes…

    Submitted 29 October, 2023; originally announced October 2023.

  6. arXiv:2309.05981  [pdf, other]

    cs.LG

    Learning Unbiased News Article Representations: A Knowledge-Infused Approach

    Authors: Sadia Kamal, Jimmy Hartford, Jeremy Willis, Arunkumar Bagavathi

    Abstract: Quantification of the political leaning of online news articles can aid in understanding the dynamics of political ideology in social groups and in devising measures to mitigate them. However, accurately predicting the political leaning of a news article with machine learning models is a challenging task. This is because (i) the political ideology of a news article is defined by several factors, and (ii) the i…

    Submitted 12 September, 2023; originally announced September 2023.

  7. arXiv:2302.05684  [pdf, other]

    stat.ME cs.LG stat.ML

    Sequential Underspecified Instrument Selection for Cause-Effect Estimation

    Authors: Elisabeth Ailer, Jason Hartford, Niki Kilbertus

    Abstract: Instrumental variable (IV) methods are used to estimate causal effects in settings with unobserved confounding, where we cannot directly experiment on the treatment variable. Instruments are variables which only affect the outcome indirectly via the treatment variable(s). Most IV applications focus on low-dimensional treatments and crucially require at least as many instruments as treatments. This…

    Submitted 25 May, 2023; v1 submitted 11 February, 2023; originally announced February 2023.

    Comments: Code for this paper is available at https://github.com/EAiler/underspecified-iv

  8. arXiv:2302.04178  [pdf, other]

    cs.LG cs.AI

    DynGFN: Towards Bayesian Inference of Gene Regulatory Networks with GFlowNets

    Authors: Lazar Atanackovic, Alexander Tong, Bo Wang, Leo J. Lee, Yoshua Bengio, Jason Hartford

    Abstract: One of the grand challenges of cell biology is inferring the gene regulatory network (GRN) which describes interactions between genes and their products that control gene expression and cellular function. We can treat this as a causal discovery problem but with two non-standard challenges: (1) regulatory networks are inherently cyclic so we should not model a GRN as a directed acyclic graph (DAG),…

    Submitted 22 December, 2023; v1 submitted 8 February, 2023; originally announced February 2023.

  9. GFlowNets for AI-Driven Scientific Discovery

    Authors: Moksh Jain, Tristan Deleu, Jason Hartford, Cheng-Hao Liu, Alex Hernandez-Garcia, Yoshua Bengio

    Abstract: Tackling the most pressing problems for humanity, such as the climate crisis and the threat of global pandemics, requires accelerating the pace of scientific discovery. While science has traditionally relied on trial and error and even serendipity to a large extent, the last few decades have seen a surge of data-driven scientific discoveries. However, in order to truly leverage large-scale data se…

    Submitted 27 June, 2023; v1 submitted 1 February, 2023; originally announced February 2023.

    Comments: 31 pages, 5 figures. Updated with camera-ready changes

  10. arXiv:2211.12581  [pdf, other]

    cs.AI cs.LG

    UNSAT Solver Synthesis via Monte Carlo Forest Search

    Authors: Chris Cameron, Jason Hartford, Taylor Lundy, Tuan Truong, Alan Milligan, Rex Chen, Kevin Leyton-Brown

    Abstract: We introduce Monte Carlo Forest Search (MCFS), a class of reinforcement learning (RL) algorithms for learning policies in tree MDPs, for which policy execution involves traversing an exponential-sized tree. Examples of such problems include proving unsatisfiability of a SAT formula; counting the number of solutions of a satisfiable SAT formula; and finding the optimal solution to a mixed-integer…

    Submitted 25 May, 2023; v1 submitted 22 November, 2022; originally announced November 2022.

  11. arXiv:2206.01101  [pdf, other]

    cs.LG stat.ML

    Weakly Supervised Representation Learning with Sparse Perturbations

    Authors: Kartik Ahuja, Jason Hartford, Yoshua Bengio

    Abstract: The theory of representation learning aims to build methods that provably invert the data generating process with minimal domain knowledge or any source of supervision. Most prior approaches require strong distributional assumptions on the latent variables and weak supervision (auxiliary information such as timestamps) to provide provable identification guarantees. In this work, we show that if on…

    Submitted 2 June, 2022; originally announced June 2022.

  12. arXiv:2110.15796  [pdf, other]

    cs.LG cs.AI stat.ML

    Properties from Mechanisms: An Equivariance Perspective on Identifiable Representation Learning

    Authors: Kartik Ahuja, Jason Hartford, Yoshua Bengio

    Abstract: A key goal of unsupervised representation learning is "inverting" a data generating process to recover its latent properties. Existing work that provably achieves this goal relies on strong assumptions on relationships between the latent variables (e.g., independence conditional on auxiliary information). In this paper, we take a very different perspective on the problem and ask, "Can we instead i…

    Submitted 29 October, 2021; originally announced October 2021.

  13. arXiv:2106.10349  [pdf, other]

    cs.LG cs.AI math.OC

    The Perils of Learning Before Optimizing

    Authors: Chris Cameron, Jason Hartford, Taylor Lundy, Kevin Leyton-Brown

    Abstract: Formulating real-world optimization problems often begins with making predictions from historical data (e.g., an optimizer that aims to recommend fast routes relies upon travel-time predictions). Typically, learning the prediction model used to generate the optimization problem and solving that problem are performed in two separate stages. Recent work has shown how such prediction models can be l…

    Submitted 16 December, 2021; v1 submitted 18 June, 2021; originally announced June 2021.

  14. arXiv:2011.01285  [pdf, other]

    cs.LG cs.CL

    Exemplar Guided Active Learning

    Authors: Jason Hartford, Kevin Leyton-Brown, Hadas Raviv, Dan Padnos, Shahar Lev, Barak Lenz

    Abstract: We consider the problem of wisely using a limited budget to label a small subset of a large unlabeled dataset. We are motivated by the NLP problem of word sense disambiguation. For any word, we have a set of candidate labels from a knowledge base, but the label set is not necessarily representative of what occurs in the data: there may exist labels in the knowledge base that very rarely occur in t…

    Submitted 2 November, 2020; originally announced November 2020.

    Comments: Published at NeurIPS 2020

  15. arXiv:2006.11386  [pdf, other]

    stat.ME cs.LG econ.EM stat.ML

    Valid Causal Inference with (Some) Invalid Instruments

    Authors: Jason Hartford, Victor Veitch, Dhanya Sridhar, Kevin Leyton-Brown

    Abstract: Instrumental variable methods provide a powerful approach to estimating causal effects in the presence of unobserved confounding. But a key challenge when applying them is the reliance on untestable "exclusion" assumptions that rule out any relationship between the instrument variable and the response that is not mediated by the treatment. In this paper, we show how to perform consistent IV estima…

    Submitted 19 June, 2020; originally announced June 2020.

  16. arXiv:1803.02879  [pdf, other]

    stat.ML cs.LG

    Deep Models of Interactions Across Sets

    Authors: Jason Hartford, Devon R Graham, Kevin Leyton-Brown, Siamak Ravanbakhsh

    Abstract: We use deep learning to model interactions across two or more sets of objects, such as user-movie ratings, protein-drug bindings, or ternary user-item-tag interactions. The canonical representation of such interactions is a matrix (or a higher-dimensional tensor) with an exchangeability property: the encoding's meaning is not changed by permuting rows or columns. We argue that models should hence…

    Submitted 8 June, 2018; v1 submitted 7 March, 2018; originally announced March 2018.

  17. arXiv:1612.09596  [pdf, other]

    stat.AP cs.LG stat.ML

    Counterfactual Prediction with Deep Instrumental Variables Networks

    Authors: Jason Hartford, Greg Lewis, Kevin Leyton-Brown, Matt Taddy

    Abstract: We are in the middle of a remarkable rise in the use and capability of artificial intelligence. Much of this growth has been fueled by the success of deep learning architectures: models that map from observables to outputs via multiple layers of latent representations. These deep learning algorithms are effective tools for unstructured prediction, and they can be combined in AI systems to solve co…

    Submitted 30 December, 2016; originally announced December 2016.