-
From Frege to chatGPT: Compositionality in language, cognition, and deep neural networks
Authors:
Jacob Russin,
Sam Whitman McGrath,
Danielle J. Williams,
Lotem Elber-Dorozko
Abstract:
Compositionality has long been considered a key explanatory property underlying human intelligence: arbitrary concepts can be composed into novel complex combinations, permitting the acquisition of an open ended, potentially infinite expressive capacity from finite learning experiences. Influential arguments have held that neural networks fail to explain this aspect of behavior, leading many to dismiss them as viable models of human cognition. Over the last decade, however, modern deep neural networks (DNNs), which share the same fundamental design principles as their predecessors, have come to dominate artificial intelligence, exhibiting the most advanced cognitive behaviors ever demonstrated in machines. In particular, large language models (LLMs), DNNs trained to predict the next word on a large corpus of text, have proven capable of sophisticated behaviors such as writing syntactically complex sentences without grammatical errors, producing cogent chains of reasoning, and even writing original computer programs -- all behaviors thought to require compositional processing. In this chapter, we survey recent empirical work from machine learning for a broad audience in philosophy, cognitive science, and neuroscience, situating recent breakthroughs within the broader context of philosophical arguments about compositionality. In particular, our review emphasizes two approaches to endowing neural networks with compositional generalization capabilities: (1) architectural inductive biases, and (2) metalearning, or learning to learn. We also present findings suggesting that LLM pretraining can be understood as a kind of metalearning, and can thereby equip DNNs with compositional generalization abilities in a similar way. We conclude by discussing the implications that these findings may have for the study of compositionality in human cognition and by suggesting avenues for future research.
Submitted 23 May, 2024;
originally announced May 2024.
-
Multiple Realizability and the Rise of Deep Learning
Authors:
Sam Whitman McGrath,
Jacob Russin
Abstract:
The multiple realizability thesis holds that psychological states may be implemented in a diversity of physical systems. The deep learning revolution seems to be bringing this possibility to life, offering the most plausible examples of man-made realizations of sophisticated cognitive functions to date. This paper explores the implications of deep learning models for the multiple realizability thesis. Among other things, it challenges the widely held view that multiple realizability entails that the study of the mind can and must be pursued independently of the study of its implementation in the brain or in artificial analogues. Although its central contribution is philosophical, the paper has substantial methodological upshots for contemporary cognitive science, suggesting that deep neural networks may play a crucial role in formulating and evaluating hypotheses about cognition, even if they are interpreted as implementation-level models. In the age of deep learning, multiple realizability possesses a renewed significance.
Submitted 21 May, 2024;
originally announced May 2024.
-
CausalMetaR: An R package for performing causally interpretable meta-analyses
Authors:
Guanbo Wang,
Sean McGrath,
Yi Lian
Abstract:
Researchers would often like to leverage data from a collection of sources (e.g., primary studies in a meta-analysis) to estimate causal effects in a target population of interest. However, traditional meta-analytic methods do not produce causally interpretable estimates for a well-defined target population. In this paper, we present the CausalMetaR R package, which implements efficient and robust methods to estimate causal effects in a given internal or external target population using multi-source data. The package includes estimators of average and subgroup treatment effects for the entire target population. To produce efficient and robust estimates of causal effects, the package implements doubly robust and non-parametric efficient estimators and supports using flexible data-adaptive (e.g., machine learning techniques) methods and cross-fitting techniques to estimate the nuisance models (e.g., the treatment model, the outcome model). We describe the key features of the package and demonstrate how to use the package through an example.
Submitted 2 July, 2024; v1 submitted 6 February, 2024;
originally announced February 2024.
-
LEI: Livestock Event Information Schema for Enabling Data Sharing
Authors:
Mahir Habib,
Muhammad Ashad Kabir,
Lihong Zheng,
Shawn McGrath
Abstract:
Data-driven advances have resulted in significant improvements in dairy production. However, the meat industry has lagged behind in adopting data-driven approaches, underscoring the crucial need for data standardisation to facilitate seamless data transmission to maximise productivity, save costs, and increase market access. To address this gap, we propose a novel data schema, Livestock Event Information (LEI) schema, designed to accurately and uniformly record livestock events. LEI complies with the International Committee for Animal Recording (ICAR) and Integrity System Company (ISC) schemas to deliver this data standardisation and enable data sharing between producers and consumers. To validate the superiority of LEI, we conducted a structural metrics analysis and a comprehensive case study. The analysis demonstrated that LEI outperforms the ICAR and ISC schemas in terms of design, while the case study confirmed its superior ability to capture livestock event information. Our findings lay the foundation for the implementation of the LEI schema, unlocking the potential for data-driven advances in livestock management. Moreover, LEI's versatility opens avenues for future expansion into other agricultural domains, encompassing poultry, fisheries, and crops. The adoption of LEI promises substantial benefits, including improved data accuracy, reduced costs, and increased productivity, heralding a new era of sustainability in the meat industry.
Submitted 26 October, 2023;
originally announced October 2023.
-
metamedian: An R package for meta-analyzing studies reporting medians
Authors:
Sean McGrath,
XiaoFei Zhao,
Omer Ozturk,
Stephan Katzenschlager,
Russell Steele,
Andrea Benedetti
Abstract:
When performing an aggregate data meta-analysis of a continuous outcome, researchers often come across primary studies that report the sample median of the outcome. However, standard meta-analytic methods typically cannot be directly applied in this setting. In recent years, there has been substantial development in statistical methods to incorporate primary studies reporting sample medians in meta-analysis, yet there are currently no comprehensive software tools implementing these methods. In this paper, we present the metamedian R package, a freely available and open-source software tool for meta-analyzing primary studies that report sample medians. We summarize the main features of the software and illustrate its application through real data examples involving risk factors for a severe course of COVID-19.
Submitted 27 February, 2023;
originally announced February 2023.
-
Nuisance Function Tuning for Optimal Doubly Robust Estimation
Authors:
Sean McGrath,
Rajarshi Mukherjee
Abstract:
Estimators of doubly robust functionals typically rely on estimating two complex nuisance functions, such as the propensity score and conditional outcome mean for the average treatment effect functional. We consider the problem of how to estimate nuisance functions to obtain optimal rates of convergence for a doubly robust nonparametric functional that has witnessed applications across the causal inference and conditional independence testing literature. For several plug-in type estimators and a one-step type estimator, we illustrate the interplay between different tuning parameter choices for the nuisance function estimators and sample splitting strategies on the optimal rate of estimating the functional of interest. For each of these estimators and each sample splitting strategy, we show the necessity to undersmooth the nuisance function estimators under low regularity conditions to obtain optimal rates of convergence for the functional of interest. By performing suitable nuisance function tuning and sample splitting strategies, we show that some of these estimators can achieve minimax rates of convergence in all Hölder smoothness classes of the nuisance functions.
Submitted 29 May, 2024; v1 submitted 30 December, 2022;
originally announced December 2022.
-
Automatic Cattle Identification using YOLOv5 and Mosaic Augmentation: A Comparative Analysis
Authors:
Rabin Dulal,
Lihong Zheng,
Muhammad Ashad Kabir,
Shawn McGrath,
Jonathan Medway,
Dave Swain,
Will Swain
Abstract:
You Only Look Once (YOLO) is a single-stage object detection model popular for its real-time performance, accuracy, and speed. This paper investigates the YOLOv5 model for identifying cattle in yards. Cattle are currently identified with radio-frequency identification (RFID) tags, which fail when a tag is lost or damaged. A biometric solution can identify the animal so that a lost or damaged tag can be reassigned, or it can replace the RFID-based system altogether. Muzzle patterns in cattle are a unique biometric, like fingerprints in humans. We present our recent research, which compares five popular object detection models, examines the architecture of YOLOv5, evaluates the performance of eight backbones with the YOLOv5 model, and measures the influence of mosaic augmentation in YOLOv5 through experiments on the available cattle muzzle images. Our experiments show that YOLOv5 with a transformer backbone performed best, with a mean Average Precision (mAP) 0.5 (the average of AP when the IoU is greater than 50%) of 0.995 and an mAP 0.5:0.95 (the average of AP from 50% to 95% IoU with an interval of 5%) of 0.9366. Mosaic augmentation also increased the accuracy of the model for every backbone used in our experiments. Moreover, cattle can be detected from partial muzzle images. We conclude that YOLOv5 has excellent potential for automatic cattle identification.
Submitted 21 October, 2022;
originally announced October 2022.
-
A Systematic Review of Machine Learning Techniques for Cattle Identification: Datasets, Methods and Future Directions
Authors:
Md Ekramul Hossain,
Muhammad Ashad Kabir,
Lihong Zheng,
Dave L. Swain,
Shawn McGrath,
Jonathan Medway
Abstract:
Increased biosecurity and food safety requirements may increase demand for efficient traceability and identification systems for livestock in the supply chain. Machine learning and computer vision have been applied in precision livestock management, including critical disease detection, vaccination, production management, tracking, and health monitoring. This paper offers a systematic literature review (SLR) of vision-based cattle identification. More specifically, this SLR identifies and analyses research on cattle identification using Machine Learning (ML) and Deep Learning (DL). Of the two main applications, cattle detection and cattle identification, the ML-based papers address only identification, whereas the DL-based papers study both detection and identification. In the surveyed papers, the most used ML models for cattle identification were support vector machine (SVM), k-nearest neighbour (KNN), and artificial neural network (ANN). Convolutional neural network (CNN), residual network (ResNet), Inception, You Only Look Once (YOLO), and Faster R-CNN were popular DL models in the selected papers. Among these papers, the most distinguishing features were the muzzle prints and coat patterns of cattle. Local binary pattern (LBP), speeded up robust features (SURF), scale-invariant feature transform (SIFT), and Inception or CNN were identified as the most used feature extraction methods.
Submitted 13 October, 2022;
originally announced October 2022.
-
Standard error estimation in meta-analysis of studies reporting medians
Authors:
Sean McGrath,
Stephan Katzenschlager,
Alexandra J. Zimmer,
Alexander Seitel,
Russell Steele,
Andrea Benedetti
Abstract:
We consider the setting of an aggregate data meta-analysis of a continuous outcome of interest. When the distribution of the outcome is skewed, it is often the case that some primary studies report the sample mean and standard deviation of the outcome and other studies report the sample median along with the first and third quartiles and/or minimum and maximum values. To perform meta-analysis in this context, a number of approaches have recently been developed to impute the sample mean and standard deviation from studies reporting medians. Then, standard meta-analytic approaches with inverse-variance weighting are applied based on the (imputed) study-specific sample means and standard deviations. In this paper, we illustrate how this common practice can severely underestimate the within-study standard errors, which results in overestimation of between-study heterogeneity in random effects meta-analyses. We propose a straightforward bootstrap approach to estimate the standard errors of the imputed sample means. Our simulation study illustrates how the proposed approach can improve estimation of the within-study standard errors and between-study heterogeneity. Moreover, we apply the proposed approach in a meta-analysis to identify risk factors of a severe course of COVID-19.
Submitted 28 June, 2022;
originally announced June 2022.
-
Revisiting the g-null paradox
Authors:
Sean McGrath,
Jessica G. Young,
Miguel A. Hernán
Abstract:
The parametric g-formula is an approach to estimating causal effects of sustained treatment strategies from observational data. An often cited limitation of the parametric g-formula is the g-null paradox: a phenomenon in which model misspecification in the parametric g-formula is guaranteed under the conditions that motivate its use (i.e., when identifiability conditions hold and measured time-varying confounders are affected by past treatment). Many users of the parametric g-formula know they must acknowledge the g-null paradox as a limitation when reporting results but still require clarity on its meaning and implications. Here we revisit the g-null paradox to clarify its role in causal inference studies. In doing so, we present analytic examples and a simulation-based illustration of the bias of parametric g-formula estimates under the conditions associated with this paradox. Our results highlight the importance of avoiding overly parsimonious models for the components of the g-formula when using this method.
Submitted 5 March, 2021;
originally announced March 2021.
-
When Does Uncertainty Matter?: Understanding the Impact of Predictive Uncertainty in ML Assisted Decision Making
Authors:
Sean McGrath,
Parth Mehta,
Alexandra Zytek,
Isaac Lage,
Himabindu Lakkaraju
Abstract:
As machine learning (ML) models are increasingly being employed to assist human decision makers, it becomes critical to provide these decision makers with relevant inputs which can help them decide if and how to incorporate model predictions into their decision making. For instance, communicating the uncertainty associated with model predictions could potentially be helpful in this regard. In this work, we carry out user studies (1,330 responses from 190 participants) to systematically assess how people with differing levels of expertise respond to different types of predictive uncertainty (i.e., posterior predictive distributions with different shapes and variances) in the context of ML assisted decision making for predicting apartment rental prices. We found that showing posterior predictive distributions led to smaller disagreements with the ML model's predictions, regardless of the shapes and variances of the posterior predictive distributions we considered, and that these effects may be sensitive to expertise in both ML and the domain. This suggests that posterior predictive distributions can potentially serve as useful decision aids which should be used with caution and take into account the type of distribution and the expertise of the human.
Submitted 12 June, 2023; v1 submitted 11 November, 2020;
originally announced November 2020.
-
gfoRmula: An R package for estimating effects of general time-varying treatment interventions via the parametric g-formula
Authors:
Victoria Lin,
Sean McGrath,
Zilu Zhang,
Lucia C. Petito,
Roger W. Logan,
Miguel A. Hernán,
Jessica G. Young
Abstract:
Researchers are often interested in using longitudinal data to estimate the causal effects of hypothetical time-varying treatment interventions on the mean or risk of a future outcome. Standard regression/conditioning methods for confounding control generally fail to recover causal effects when time-varying confounders are themselves affected by past treatment. In such settings, estimators derived from Robins's g-formula may recover time-varying treatment effects provided sufficient covariates are measured to control confounding by unmeasured risk factors. The package gfoRmula implements in R one such estimator: the parametric g-formula. This estimator easily adapts to binary or continuous time-varying treatments as well as contrasts defined by static or dynamic, deterministic or random treatment interventions, as well as interventions that depend on the natural value of treatment. The package accommodates survival outcomes as well as binary or continuous end of follow-up outcomes. For survival outcomes, the package has different options for handling competing events. This paper describes the gfoRmula package, along with motivating background, features, and examples.
Submitted 29 October, 2019; v1 submitted 19 August, 2019;
originally announced August 2019.
-
Estimating the sample mean and standard deviation from commonly reported quantiles in meta-analysis
Authors:
Sean McGrath,
XiaoFei Zhao,
Russell Steele,
Brett D. Thombs,
Andrea Benedetti,
the DEPRESsion Screening Data Collaboration
Abstract:
Researchers increasingly use meta-analysis to synthesize the results of several studies in order to estimate a common effect. When the outcome variable is continuous, standard meta-analytic approaches assume that the primary studies report the sample mean and standard deviation of the outcome. However, when the outcome is skewed, authors sometimes summarize the data by reporting the sample median and one or both of (i) the minimum and maximum values and (ii) the first and third quartiles, but do not report the mean or standard deviation. To include these studies in meta-analysis, several methods have been developed to estimate the sample mean and standard deviation from the reported summary data. A major limitation of these widely used methods is that they assume that the outcome distribution is normal, which is unlikely to be tenable for studies reporting medians. We propose two novel approaches to estimate the sample mean and standard deviation when data are suspected to be non-normal. Our simulation results and empirical assessments show that the proposed methods often perform better than the existing methods when applied to non-normal data.
Submitted 25 March, 2019;
originally announced March 2019.
-
Two-sample aggregate data meta-analysis of medians
Authors:
Sean McGrath,
Hojoon Sohn,
Russell Steele,
Andrea Benedetti
Abstract:
We consider the problem of meta-analyzing two-group studies that report the median of the outcome. Often, these studies are excluded from meta-analysis because there are no well-established statistical methods to pool the difference of medians. To include these studies in meta-analysis, several authors have recently proposed methods to estimate the sample mean and standard deviation from the median, sample size, and several commonly reported measures of spread. Researchers frequently apply these methods to estimate the difference of means and its variance for each primary study and pool the difference of means using inverse variance weighting. In this work, we develop several methods to directly meta-analyze the difference of medians. We conduct a simulation study evaluating the performance of the proposed median-based methods and the competing transformation-based methods. The simulation results show that the median-based methods outperform the transformation-based methods when meta-analyzing studies that report the median of the outcome, especially when the outcome is skewed. Moreover, we illustrate the various methods on a real-life data set.
Submitted 4 September, 2018;
originally announced September 2018.
-
One-sample aggregate data meta-analysis of medians
Authors:
Sean McGrath,
XiaoFei Zhao,
Zhi Zhen Qin,
Russell Steele,
Andrea Benedetti
Abstract:
An aggregate data meta-analysis is a statistical method that pools the summary statistics of several selected studies to estimate the outcome of interest. When considering a continuous outcome, typically each study must report the same measure of the outcome variable and its spread (e.g., the sample mean and its standard error). However, some studies may instead report the median along with various measures of spread. Recently, the task of incorporating medians in meta-analysis has been achieved by estimating the sample mean and its standard error from each study that reports a median in order to meta-analyze the means. In this paper, we propose two alternative approaches to meta-analyze data that instead rely on medians. We systematically compare these approaches via simulation study to each other and to methods that transform the study-specific medians and spread into sample means and their standard errors. We demonstrate that the proposed median-based approaches perform better than the transformation-based approaches, especially when applied to skewed data and data with high inter-study variance. In addition, when meta-analyzing data that consists of medians, we show that the median-based approaches perform considerably better than or comparably to the best-case scenario for a transformation approach: conducting a meta-analysis using the actual sample mean and standard error of the mean of each study. Finally, we illustrate these approaches in a meta-analysis of patient delay in tuberculosis diagnosis.
Submitted 15 December, 2017; v1 submitted 9 September, 2017;
originally announced September 2017.
-
Study of ATLAS sensitivity to FCNC top decays
Authors:
J. Carvalho,
N. Castro,
L. Chikovani,
T. Djobava,
J. Dodd,
S. McGrath,
A. Onofre,
J. Parsons,
F. Veloso
Abstract:
The sensitivity of the ATLAS experiment to top quark Flavour Changing Neutral Current (FCNC) decays was studied at the LHC using ttbar events. While one of the top quarks is expected to follow the dominant Standard Model decay t -> bW, the other decays through an FCNC channel, i.e. t -> Z u(c), t -> gamma u(c) or t -> g u(c). Different types of analyses, applied to each FCNC decay mode, were compared. The FCNC branching ratio sensitivity (assuming a 5 sigma signal significance) and 95% confidence level limits on the branching ratios (in the hypothesis of signal absence) were obtained.
Submitted 7 December, 2007;
originally announced December 2007.