-
Large language models surpass human experts in predicting neuroscience results
Authors:
Xiaoliang Luo,
Akilles Rechardt,
Guangzhi Sun,
Kevin K. Nejad,
Felipe Yáñez,
Bati Yilmaz,
Kangjoo Lee,
Alexandra O. Cohen,
Valentina Borghesani,
Anton Pashkov,
Daniele Marinazzo,
Jonathan Nicholas,
Alessandro Salatiello,
Ilia Sucholutsky,
Pasquale Minervini,
Sepehr Razavi,
Roberta Rocca,
Elkhan Yusifov,
Tereza Okalova,
Nianlong Gu,
Martin Ferianc,
Mikail Khona,
Kaustubh R. Patil,
Pui-Shee Lee,
Rui Mata,
et al. (14 additional authors not shown)
Abstract:
Scientific discoveries often hinge on synthesizing decades of research, a task that potentially outstrips human information processing capacities. Large language models (LLMs) offer a solution. LLMs trained on the vast scientific literature could potentially integrate noisy yet interrelated findings to forecast novel results better than human experts. To evaluate this possibility, we created BrainBench, a forward-looking benchmark for predicting neuroscience results. We find that LLMs surpass experts in predicting experimental outcomes. BrainGPT, an LLM we tuned on the neuroscience literature, performed better yet. Like human experts, when LLMs were confident in their predictions, they were more likely to be correct, which presages a future where humans and LLMs team together to make discoveries. Our approach is not neuroscience-specific and is transferable to other knowledge-intensive endeavors.
Submitted 21 June, 2024; v1 submitted 4 March, 2024;
originally announced March 2024.
-
Estimation of Optical Aberrations in 3D Microscopic Bioimages
Authors:
Kira Vinogradova,
Eugene W. Myers
Abstract:
The quality of microscopy images often suffers from optical aberrations. These aberrations and their associated point spread functions have to be quantitatively estimated to restore aberrated images. The recent state-of-the-art method PhaseNet, based on a convolutional neural network, can quantify aberrations accurately but is limited to images of point light sources, e.g. fluorescent beads. In this research, we describe an extension of PhaseNet enabling its use on 3D images of biological samples. To this end, our method incorporates object-specific information into the simulated images used for training the network. Further, we add a Python-based restoration of images via Richardson-Lucy deconvolution. We demonstrate that the deconvolution with the predicted PSF can not only remove the simulated aberrations but also improve the quality of the real raw microscopic images with unknown residual PSF. We provide code for fast and convenient prediction and correction of aberrations.
Submitted 16 September, 2022;
originally announced September 2022.
-
Blindspots in Python and Java APIs Result in Vulnerable Code
Authors:
Yuriy Brun,
Tian Lin,
Jessie Elise Somerville,
Elisha Myers,
Natalie C. Ebner
Abstract:
Blindspots in APIs can cause software engineers to introduce vulnerabilities, but such blindspots are, unfortunately, common. We study the effect APIs with blindspots have on developers in two languages by replicating a 109-developer, 24-Java-API controlled experiment. Our replication applies to Python and involves 129 new developers and 22 new APIs. We find that using APIs with blindspots statistically significantly reduces the developers' ability to correctly reason about the APIs in both languages, but that the effect is more pronounced for Python. Interestingly, for Java, the effect increased with complexity of the code relying on the API, whereas for Python, the opposite was true. Whether the developers considered API uses to be more difficult, less clear, and less familiar did not have an effect on their ability to correctly reason about them. Developers with better long-term memory recall were more likely to correctly reason about APIs with blindspots, but short-term memory, processing speed, episodic memory, and memory span had no effect. Surprisingly, professional experience and expertise did not improve the developers' ability to reason about APIs with blindspots across both languages, with long-term professionals with many years of experience making mistakes as often as relative novices. Finally, personality traits did not significantly affect the Python developers' ability to reason about APIs with blindspots, but less extraverted and more open developers were better at reasoning about Java APIs with blindspots. Overall, our findings suggest that blindspots in APIs are a serious problem across languages, and that experience and education alone do not overcome that problem, suggesting that tools are needed to help developers recognize blindspots in APIs as they write code that uses those APIs.
Submitted 10 March, 2021;
originally announced March 2021.
-
Practical sensorless aberration estimation for 3D microscopy with deep learning
Authors:
Debayan Saha,
Uwe Schmidt,
Qinrong Zhang,
Aurelien Barbotin,
Qi Hu,
Na Ji,
Martin J. Booth,
Martin Weigert,
Eugene W. Myers
Abstract:
Estimation of optical aberrations from volumetric intensity images is a key step in sensorless adaptive optics for 3D microscopy. Recent approaches based on deep learning promise accurate results at fast processing speeds. However, collecting ground truth microscopy data for training the network is typically very difficult or even impossible, thereby limiting this approach in practice. Here, we demonstrate that neural networks trained only on simulated data yield accurate predictions for real experimental images. We validate our approach on simulated and experimental datasets acquired with two different microscopy modalities, and also compare the results to non-learned methods. Additionally, we study the predictability of individual aberrations with respect to their data requirements and find that the symmetry of the wavefront plays a crucial role. Finally, we make our implementation freely available as open source software in Python.
Submitted 5 July, 2020; v1 submitted 2 June, 2020;
originally announced June 2020.
-
A Simple Methodology for Computing Families of Algorithms
Authors:
Devangi N. Parikh,
Margaret E. Myers,
Richard Vuduc,
Robert A. van de Geijn
Abstract:
Discovering "good" algorithms for an operation is often considered an art best left to experts. What if there is a simple methodology, an algorithm, for systematically deriving a family of algorithms as well as their cost analyses, so that the best algorithm can be chosen? We discuss such an approach for deriving loop-based algorithms. The example used to illustrate this methodology, evaluation of a polynomial, is itself simple yet the best algorithm that results is surprising to a non-expert: Horner's rule. We finish by discussing recent advances that make this approach highly practical for the domain of high-performance linear algebra software libraries.
Submitted 20 August, 2018;
originally announced August 2018.
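For readers unfamiliar with Horner's rule, the algorithm named in the abstract above, the following is a minimal sketch and not the paper's derivation. The function names `horner` and `naive` and the coefficient ordering (highest degree first) are assumptions made for illustration.

```python
def horner(coeffs, x):
    """Evaluate a polynomial at x using Horner's rule.

    coeffs is ordered from the highest-degree coefficient down to the
    constant term (an assumed convention for this sketch).
    """
    result = 0
    for c in coeffs:
        # One multiplication and one addition per coefficient.
        result = result * x + c
    return result


def naive(coeffs, x):
    """Evaluate the same polynomial term by term, for comparison."""
    n = len(coeffs) - 1
    return sum(c * x ** (n - i) for i, c in enumerate(coeffs))


# 2x^3 - 6x^2 + 2x - 1 evaluated at x = 3
print(horner([2, -6, 2, -1], 3))
print(naive([2, -6, 2, -1], 3))
```

Both functions compute the same value; the Horner form avoids recomputing powers of `x`, which is why it uses fewer multiplications than the term-by-term version.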
-
Deriving Correct High-Performance Algorithms
Authors:
Devangi N. Parikh,
Maggie E. Myers,
Robert A. van de Geijn
Abstract:
Dijkstra observed that verifying correctness of a program is difficult and conjectured that derivation of a program hand-in-hand with its proof of correctness was the answer. We illustrate this goal-oriented approach by applying it to the domain of dense linear algebra libraries for distributed memory parallel computers. We show that algorithms that underlie the implementation of most functionality for this domain can be systematically derived to be correct. The benefit is that an entire family of algorithms for an operation is discovered so that the best algorithm for a given architecture can be chosen. This approach is very practical: Ideas inspired by it have been used to rewrite the dense linear algebra software stack starting below the Basic Linear Algebra Subprograms (BLAS) and reaching up through the Elemental distributed memory library, and every level in between. The paper demonstrates how formal methods and rigorous mathematical techniques for correctness impact HPC.
Submitted 11 October, 2017;
originally announced October 2017.
-
Efficient Algorithms for Moral Lineage Tracing
Authors:
Markus Rempfler,
Jan-Hendrik Lange,
Florian Jug,
Corinna Blasse,
Eugene W. Myers,
Bjoern H. Menze,
Bjoern Andres
Abstract:
Lineage tracing, the joint segmentation and tracking of living cells as they move and divide in a sequence of light microscopy images, is a challenging task. Jug et al. have proposed a mathematical abstraction of this task, the moral lineage tracing problem (MLTP), whose feasible solutions define both a segmentation of every image and a lineage forest of cells. Their branch-and-cut algorithm, however, is prone to many cuts and slow convergence for large instances. To address this problem, we make three contributions: (i) we devise the first efficient primal feasible local search algorithms for the MLTP, (ii) we improve the branch-and-cut algorithm by separating tighter cutting planes and by incorporating our primal algorithms, (iii) we show in experiments that our algorithms find accurate solutions on the problem instances of Jug et al. and scale to larger instances, leveraging moral lineage tracing to practical significance.
Submitted 25 August, 2017; v1 submitted 14 February, 2017;
originally announced February 2017.
-
Moral Lineage Tracing
Authors:
Florian Jug,
Evgeny Levinkov,
Corinna Blasse,
Eugene W. Myers,
Bjoern Andres
Abstract:
Lineage tracing, the tracking of living cells as they move and divide, is a central problem in biological image analysis. Solutions, called lineage forests, are key to understanding how the structure of multicellular organisms emerges. We propose an integer linear program (ILP) whose feasible solutions define a decomposition of each image in a sequence into cells (segmentation), and a lineage forest of cells across images (tracing). Unlike previous formulations, we do not constrain the set of decompositions, except by contracting pixels to superpixels. The main challenge, as we show, is to enforce the morality of lineages, i.e., the constraint that cells do not merge. To enforce morality, we introduce path-cut inequalities. To find feasible solutions of the NP-hard ILP, with certified bounds to the global optimum, we define efficient separation procedures and apply these as part of a branch-and-cut algorithm. We show the effectiveness of this approach by analyzing feasible solutions for real microscopy data in terms of bounds and run-time, and by their weighted edit distance to ground truth lineage forests traced by humans.
Submitted 8 November, 2016; v1 submitted 17 November, 2015;
originally announced November 2015.
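The morality constraint described in the abstract above (cells do not merge) can be illustrated with a small check. This is a simplified sketch that assumes a lineage is given as an explicit list of hypothetical `(parent, child)` cell pairs; it is not the paper's ILP or its path-cut inequalities, and the function name `merge_violations` is invented for illustration.

```python
from collections import defaultdict


def merge_violations(edges):
    """Return the cells that have more than one distinct parent.

    edges is an iterable of (parent, child) pairs. An empty result means
    no cell is the product of a merge, so the edges can form a lineage
    forest in the sense used above.
    """
    parents = defaultdict(set)
    for parent, child in edges:
        parents[child].add(parent)
    return {child: sorted(ps) for child, ps in parents.items() if len(ps) > 1}


# Cell "a" divides into "b" and "c": allowed.
print(merge_violations([("a", "b"), ("a", "c")]))

# Cells "b" and "c" both point to "d": a merge, which morality forbids.
print(merge_violations([("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")]))
```

Divisions (one parent, several children) pass the check, while any child with two or more parents is reported.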
-
Mapping Auto-context Decision Forests to Deep ConvNets for Semantic Segmentation
Authors:
David L. Richmond,
Dagmar Kainmueller,
Michael Y. Yang,
Eugene W. Myers,
Carsten Rother
Abstract:
We consider the task of pixel-wise semantic segmentation given a small set of labeled training images. Two of the most popular techniques to address this task are Decision Forests (DF) and Neural Networks (NN). In this work, we explore the relationship between two special forms of these techniques: stacked DFs (namely Auto-context) and deep Convolutional Neural Networks (ConvNet). Our main contribution is to show that Auto-context can be mapped to a deep ConvNet with novel architecture, and thereby trained end-to-end. This mapping can be used as an initialization of a deep ConvNet, enabling training even in the face of very limited amounts of training data. We also demonstrate an approximate mapping back from the refined ConvNet to a second stacked DF, with improved performance over the original. We experimentally verify that these mappings outperform stacked DFs for two different applications in computer vision and biology: Kinect-based body part labeling from depth images, and somite segmentation in microscopy images of developing zebrafish. Finally, we revisit the core mapping from a Decision Tree (DT) to a NN, and show that it is also possible to map a fuzzy DT, with sigmoidal split decisions, to a NN. This addresses multiple limitations of the previous mapping, and yields new insights into the popular Rectified Linear Unit (ReLU), and more recently proposed concatenated ReLU (CReLU), activation functions.
Submitted 13 August, 2018; v1 submitted 27 July, 2015;
originally announced July 2015.