-
HyperMagNet: A Magnetic Laplacian based Hypergraph Neural Network
Authors:
Tatyana Benko,
Martin Buck,
Ilya Amburg,
Stephen J. Young,
Sinan G. Aksoy
Abstract:
In data science, hypergraphs are natural models for data exhibiting multi-way relations, whereas graphs capture only pairwise relations. Nonetheless, many proposed hypergraph neural networks effectively reduce hypergraphs to undirected graphs via symmetrized matrix representations, potentially losing important information. We propose an alternative approach to hypergraph neural networks in which the hypergraph is represented as a non-reversible Markov chain. We use this Markov chain to construct a complex Hermitian Laplacian matrix (the magnetic Laplacian), which serves as the input to our proposed hypergraph neural network. We study HyperMagNet for the task of node classification and demonstrate its effectiveness over graph-reduction-based hypergraph neural networks.
Submitted 14 February, 2024;
originally announced February 2024.
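The construction named in the HyperMagNet abstract above (a complex Hermitian "magnetic" Laplacian built from a non-reversible Markov chain) can be illustrated with a minimal sketch. The abstract does not give the formula, so the code below assumes the common convention used for directed graphs: symmetrize the matrix for magnitudes, and encode the antisymmetric part as a complex phase scaled by a charge parameter `q`. The function name `magnetic_laplacian`, the value `q=0.25`, and the toy transition matrix are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def magnetic_laplacian(P, q=0.25):
    """Unnormalized magnetic Laplacian of a (possibly asymmetric) matrix P.

    Assumed convention (not taken from the abstract):
      sym   = (P + P^T) / 2          magnitudes
      phase = 2*pi*q * (P - P^T)     direction encoded as a complex phase
      L     = D - sym * exp(i*phase) with D the diagonal of row sums of sym
    """
    P = np.asarray(P, dtype=float)
    sym = 0.5 * (P + P.T)
    phase = 2.0 * np.pi * q * (P - P.T)
    H = sym * np.exp(1j * phase)      # elementwise product; H is Hermitian
    D = np.diag(sym.sum(axis=1))
    return D - H

# Hypothetical 3-state chain that mostly cycles 0 -> 1 -> 2 -> 0.
# Its stationary distribution is uniform but P != P^T, so it is non-reversible.
P = np.array([[0.0, 0.9, 0.1],
              [0.1, 0.0, 0.9],
              [0.9, 0.1, 0.0]])

L = magnetic_laplacian(P)
eigenvalues = np.linalg.eigvalsh(L)   # real, because L is Hermitian
```

Because the phase matrix is antisymmetric, the result is Hermitian and its eigenvalues are real, which is what lets spectral methods designed for undirected graphs be reused without discarding direction.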
-
Data Models for Dataset Drift Controls in Machine Learning With Optical Images
Authors:
Luis Oala,
Marco Aversa,
Gabriel Nobis,
Kurt Willis,
Yoan Neuenschwander,
Michèle Buck,
Christian Matek,
Jerome Extermann,
Enrico Pomarico,
Wojciech Samek,
Roderick Murray-Smith,
Christoph Clausen,
Bruno Sanguinetti
Abstract:
Camera images are ubiquitous in machine learning research. They also play a central role in the delivery of important services spanning medicine and environmental surveying. However, the application of machine learning models in these domains has been limited because of robustness concerns. A primary failure mode is a drop in performance due to differences between the training and deployment data. While there are methods to prospectively validate the robustness of machine learning models to such dataset drifts, existing approaches do not account for explicit models of the primary object of interest: the data. This limits our ability to study and understand the relationship between data generation and downstream machine learning model performance in a physically accurate manner. In this study, we demonstrate how to overcome this limitation by pairing traditional machine learning with physical optics to obtain explicit and differentiable data models. We demonstrate how such data models can be constructed for image data and used to control downstream machine learning model performance related to dataset drift. The findings are distilled into three applications. First, drift synthesis enables the controlled generation of physically faithful drift test cases to power model selection and targeted generalization. Second, the gradient connection between the machine learning task model and the data model allows advanced, precise tolerancing of task model sensitivity to changes in the data generation. These drift forensics can be used to precisely specify the acceptable data environments in which a task model may be run. Third, drift optimization opens up the possibility to create drifts that can help the task model learn better and faster, effectively optimizing the data generating process itself. A guide to access the open code and datasets is available at https://github.com/aiaudit-org/raw2logit.
Submitted 7 May, 2023; v1 submitted 4 November, 2022;
originally announced November 2022.
-
A Simple Search Problem
Authors:
Marshall Buck,
Doug Wiedemann
Abstract:
A simple problem is studied in which there are $N$ boxes and a prize known to be in one of the boxes. The probability that the prize is in each box is given. It is desired to find the prize with minimal expected work, where it takes one unit of work to open a box and look inside. The paper establishes bounds on the minimal work in terms of the $p=1/2$ Hölder norm of the probability density and in terms of the entropy of the probability density. We also introduce the notion of "Cartesian product" of problems, and determine the asymptotic behavior of the minimal work for the $n$th power of a problem.
(This article is a newly typeset version of an internal publication written in 1984. The second author passed away on November 12, 2020, and his estate has approved the submission of this paper.)
Submitted 17 May, 2021;
originally announced May 2021.
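The quantities named in the search-problem abstract above can be computed directly for a small example. The sketch below assumes that the box holding the prize must itself be opened (so a search that succeeds on the k-th box costs k units), in which case opening boxes in decreasing order of probability minimizes the expected work by a standard exchange argument. The function names and the example distribution are illustrative; the code only evaluates the expected work, the $p=1/2$ norm, and the entropy, and does not reproduce the paper's bounds.

```python
import math

def expected_work(probs):
    """Expected number of boxes opened when searching in decreasing-probability order."""
    ordered = sorted(probs, reverse=True)
    return sum(k * p for k, p in enumerate(ordered, start=1))

def half_norm(probs):
    """The p = 1/2 'norm' of the distribution: (sum_i sqrt(p_i))^2."""
    return sum(math.sqrt(p) for p in probs) ** 2

def entropy_bits(probs):
    """Shannon entropy in bits, skipping zero-probability boxes."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Hypothetical distribution over four boxes.
probs = [0.5, 0.25, 0.125, 0.125]

work = expected_work(probs)      # 0.5*1 + 0.25*2 + 0.125*3 + 0.125*4
norm = half_norm(probs)
entropy = entropy_bits(probs)
```

For the uniform distribution over $N$ boxes, the same functions give an expected work of $(N+1)/2$ and a $p=1/2$ norm of exactly $N$, which makes a convenient sanity check.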
-
Similar Image Search for Histopathology: SMILY
Authors:
Narayan Hegde,
Jason D. Hipp,
Yun Liu,
Michael E. Buck,
Emily Reif,
Daniel Smilkov,
Michael Terry,
Carrie J. Cai,
Mahul B. Amin,
Craig H. Mermel,
Phil Q. Nelson,
Lily H. Peng,
Greg S. Corrado,
Martin C. Stumpe
Abstract:
The increasing availability of large institutional and public histopathology image datasets is enabling the searching of these datasets for diagnosis, research, and education. Though these datasets typically have associated metadata such as diagnosis or clinical notes, even carefully curated datasets rarely contain annotations of the location of regions of interest on each image. Because pathology images are extremely large (up to 100,000 pixels in each dimension), further laborious visual search of each image may be needed to find the feature of interest. In this paper, we introduce a deep learning based reverse image search tool for histopathology images: Similar Medical Images Like Yours (SMILY). We assessed SMILY's ability to retrieve search results in two ways: using pathologist-provided annotations, and via prospective studies where pathologists evaluated the quality of SMILY search results. As a negative control in the second evaluation, pathologists were blinded to whether search results were retrieved by SMILY or randomly. In both types of assessments, SMILY was able to retrieve search results with similar histologic features, organ site, and prostate cancer Gleason grade compared with the original query. SMILY may be a useful general-purpose tool in the pathologist's arsenal, to improve the efficiency of searching large archives of histopathology images, without the need to develop and implement specific tools for each application.
Submitted 5 February, 2019; v1 submitted 30 January, 2019;
originally announced January 2019.