Omitted Labels in Causality: A Study of Paradoxes

Authors: Bijan Mazaheri, Siddharth Jain, Matthew Cook, Jehoshua Bruck

Abstract: We explore what we call ``omitted label contexts,'' in which training data is limited to a subset of the possible labels. This setting is common among specialized human experts or specific focused studies. We lean on well-studied paradoxes (Simpson's and Condorcet) to illustrate the more general difficulties of causal inference in omitted label contexts. Contrary to the fundamental principles on which much of causal inference is built, we show that ``correct'' adjustments sometimes require non-exchangeable treatment and control groups. These pitfalls lead us to the study of networks of conclusions drawn from different contexts and the structures they form, proving an interesting connection between these networks and social choice theory.

Submitted 23 May, 2024; v1 submitted 12 November, 2023; originally announced November 2023.

arXiv:2308.09312 [pdf, other]

Path Signatures for Seizure Forecasting

Authors: Jonas F. Haderlein, Andre D. H. Peterson, Parvin Zarei Eskikand, Mark J. Cook, Anthony N. Burkitt, Iven M. Y. Mareels, David B. Grayden

Abstract: Predicting future system behaviour from past observed behaviour (time series) is fundamental to science and engineering. In computational neuroscience, the prediction of future epileptic seizures from brain activity measurements, using EEG data, remains largely unresolved despite much dedicated research effort. Based on a longitudinal and state-of-the-art data set using intracranial EEG measurements from people with epilepsy, we consider the automated discovery of predictive features (or biomarkers) to forecast seizures in a patient-specific way. To this end, we use the path signature, a recent development in the analysis of data streams, to map from measured time series to seizure prediction. The predictor is based on linear classification, here augmented with sparsity constraints, to discern time series with and without an impending seizure. This approach may be seen as a step towards a generic pattern recognition pipeline where the main advantages are simplicity and ease of customisation, while maintaining forecasting performance on par with modern machine learning. Nevertheless, it turns out that although the path signature method has some powerful theoretical guarantees, appropriate time series statistics can achieve essentially the same results in our context of seizure prediction. This suggests that, due to their inherent complexity and non-stationarity, the brain's dynamics are not identifiable from the available EEG measurement data, and, more concretely, epileptic episode prediction is not reliably achieved using EEG measurement data alone.

Submitted 23 October, 2023; v1 submitted 18 August, 2023; originally announced August 2023.

arXiv:2302.13231 [pdf]

A Synthetic Texas Backbone Power System with Climate-Dependent Spatio-Temporal Correlated Profiles

Authors: ** Lu, Xingpeng Li, Hongyi Li, Taher Chegini, Carlos Gamarra, Y. C. Ethan Yang, Margaret Cook, Gavin Dillingham

Abstract: Most power system test cases only have electrical parameters and can be used only for studies based on a snapshot of system profiles. To facilitate more comprehensive and practical studies, a synthetic power system including spatio-temporal correlated profiles for the entire year of 2019 at one-hour resolution has been created in this work. This system, referred to as the synthetic Texas 123-bus backbone transmission (TX-123BT) system, has temporal and spatial characteristics very similar to those of the actual Electric Reliability Council of Texas (ERCOT) system. It has a backbone network consisting of only high-voltage transmission lines in Texas, which is obtained by the K-medoids clustering method. The climate data extracted from the North American Land Data Assimilation System (NLDAS) are used to create the climate-dependent profiles of renewable generation and transmission thermal limits. Two climate-dependent models are implemented to determine wind and solar power production profiles respectively. In addition, two sets of climate-dependent dynamic line rating (DLR) profiles are created with the actual climate information: (i) daily DLR and (ii) hourly DLR. Simulation results of security-constrained unit commitment (SCUC) conducted on each of the daily system profiles have validated the developed one-year hourly time series dataset.

Submitted 25 February, 2023; originally announced February 2023.

Comments: 10 pages, 14 figures, 12 tables

arXiv:2007.01263 [pdf, other]

Outlier Detection through Null Space Analysis of Neural Networks

Authors: Matthew Cook, Alina Zare, Paul Gader

Abstract: Many machine learning classification systems lack competency awareness. Specifically, many systems lack the ability to identify when outliers (e.g., samples that are distinct from and not represented in the training data distribution) are being presented to the system. The ability to detect outliers is of practical significance since it can help the system behave in a reasonable way when encountering unexpected data. In prior work, outlier detection is commonly carried out in a processing pipeline that is distinct from the classification model. Thus, for a complete system that incorporates outlier detection and classification, two models must be trained, increasing the overall complexity of the approach. In this paper we use the concept of the null space to integrate an outlier detection method directly into a neural network used for classification. Our method, called Null Space Analysis (NuSA) of neural networks, works by computing and controlling the magnitude of the null space projection as data is passed through a network. Using these projections, we can then calculate a score that can differentiate between normal and abnormal data. Results are shown that indicate networks trained with NuSA retain their classification performance while also being able to detect outliers at rates similar to commonly used outlier detection algorithms.

Submitted 2 July, 2020; originally announced July 2020.

Comments: 6 pages, 4 figures, Presented at the ICML 2020 Workshop on Uncertainty and Robustness in Deep Learning

arXiv:2002.00228 [pdf, other]

Estimation of Z-Thickness and XY-Anisotropy of Electron Microscopy Images using Gaussian Processes

Authors: Thanuja D. Ambegoda, Julien N. P. Martel, Jozef Adamcik, Matthew Cook, Richard H. R. Hahnloser

Abstract: Serial section electron microscopy (ssEM) is a widely used technique for obtaining volumetric information of biological tissues at nanometer scale. However, accurate 3D reconstructions of identified cellular structures and volumetric quantifications require precise estimates of section thickness and anisotropy (or stretching) along the XY imaging plane. In fact, many image processing algorithms simply assume isotropy within the imaging plane. To ameliorate this problem, we present a method for estimating thickness and stretching of electron microscopy sections using non-parametric Bayesian regression of image statistics. We verify our thickness and stretching estimates using direct measurements obtained by atomic force microscopy (AFM) and show that our method has a lower estimation error compared to a recent indirect thickness estimation method as well as a relative Z coordinate estimation method. Furthermore, we have made the first dataset of ssSEM images with directly measured section thickness values publicly available for the evaluation of indirect thickness estimation methods.

Submitted 4 February, 2020; v1 submitted 1 February, 2020; originally announced February 2020.

Journal ref: Journal of Neuroinformatics and Neuroimaging. 2018;2(2):15-22

arXiv:1904.01014 [pdf, other]

doi 10.1117/12.2519484

Comparison of Possibilistic Fuzzy Local Information C-Means and Possibilistic K-Nearest Neighbors for Synthetic Aperture Sonar Image Segmentation

Authors: Joshua Peeples, Matthew Cook, Daniel Suen, Alina Zare, James Keller

Abstract: Synthetic aperture sonar (SAS) imagery can generate high-resolution images of the seafloor. Thus, segmentation algorithms can be used to partition the images into different seafloor environments. In this paper, we compare two possibilistic segmentation approaches. Possibilistic approaches make it possible to detect novel or outlier environments as well as well-known classes. The Possibilistic Fuzzy Local Information C-Means (PFLICM) algorithm has been previously applied to segment SAS imagery. Additionally, the Possibilistic K-Nearest Neighbors (PKNN) algorithm has been used in other domains such as landmine detection and hyperspectral imagery. In this paper, we compare the segmentation performance of a semi-supervised approach using PFLICM and a supervised method using Possibilistic K-NN. We include final segmentation results on multiple SAS images and a quantitative assessment of each algorithm.

Submitted 1 April, 2019; originally announced April 2019.

Journal ref: Proc. SPIE 110120, Detection and Sensing of Mines, Explosive Objects, and Obscured Targets XXIV (10 May 2019)

Showing 1–6 of 6 results for author: Cook, M