
Scaling Laws for Galaxy Images

Authors: Mike Walmsley, Micah Bowles, Anna M. M. Scaife, Jason Shingirai Makechemu, Alexander J. Gordon, Annette M. N. Ferguson, Robert G. Mann, James Pearson, Jürgen J. Popp, Jo Bovy, Josh Speagle, Hugh Dickinson, Lucy Fortson, Tobias Géron, Sandor Kruk, Chris J. Lintott, Kameswara Mantha, Devina Mohan, David O'Ryan, Inigo V. Slijepcevic

Abstract: We present the first systematic investigation of supervised scaling laws outside of an ImageNet-like context - on images of galaxies. We use 840k galaxy images and over 100M annotations by Galaxy Zoo volunteers, comparable in scale to ImageNet-1K. We find that adding annotated galaxy images provides a power law improvement in performance across all architectures and all tasks, while adding trainable parameters is effective only for some (typically more subjectively challenging) tasks. We then compare the downstream performance of finetuned models pretrained on ImageNet-12k alone vs. additionally pretrained on our galaxy images. We achieve an average relative error rate reduction of 31% across 5 downstream tasks of scientific interest. Our finetuned models are more label-efficient and, unlike their ImageNet-12k-pretrained equivalents, often achieve linear transfer performance equal to that of end-to-end finetuning. We find relatively modest additional downstream benefits from scaling model size, implying that scaling alone is not sufficient to address our domain gap, and suggest that practitioners with qualitatively different images might benefit more from in-domain adaptation followed by targeted downstream labelling.
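A power law improvement of the kind this abstract describes is commonly written as error ≈ a · N^(-alpha), where N is the number of labelled images. The sketch below is illustrative only and is not taken from the paper: it fits that form by ordinary least squares in log-log space, and the function name `fit_power_law` and the data points are hypothetical, synthetic values chosen to show the mechanics.

```python
import math


def fit_power_law(sizes, errors):
    """Fit errors ~ a * sizes**(-alpha) by least squares on log-log values.

    Returns (a, alpha). Hypothetical helper, not the paper's fitting code.
    """
    xs = [math.log(n) for n in sizes]
    ys = [math.log(e) for e in errors]
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(ys) / count
    # Slope and intercept of the straight line log(e) = intercept + slope * log(n).
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope


# Synthetic data generated from a known power law (NOT results from the paper).
sizes = [1_000, 10_000, 100_000, 840_000]
errors = [2.0 * n ** -0.25 for n in sizes]

a, alpha = fit_power_law(sizes, errors)
print(f"a = {a:.3f}, alpha = {alpha:.3f}")
```

On real measurements the points would scatter around the line, and the fitted exponent is only meaningful over the range of dataset sizes actually sampled.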

Submitted 3 April, 2024; originally announced April 2024.

Comments: 10+6 pages, 12 figures. Appendix C2 based on arXiv:2206.11927. Code, demos, documentation at https://github.com/mwalmsley/zoobot

arXiv:2303.18017

Rapid prediction of lab-grown tissue properties using deep learning

Authors: Allison E. Andrews, Hugh Dickinson, James P. Hague

Abstract: The interactions between cells and the extracellular matrix are vital for the self-organisation of tissues. In this paper we present a proof of concept for using machine learning tools to predict the role of this mechanobiology in the self-organisation of cell-laden hydrogels grown in tethered moulds. We develop a process for the automated generation of mould designs with and without key symmetries. We create a large training set with $N=6500$ cases by running detailed biophysical simulations of cell-matrix interactions using the contractile network dipole orientation (CONDOR) model for the self-organisation of cellular hydrogels within these moulds. These are used to train an implementation of the \texttt{pix2pix} deep learning model, reserving $740$ cases that were unseen in the training of the neural network for testing and validation. Comparison between the predictions of the machine learning technique and the reserved predictions from the biophysical algorithm shows that the machine learning algorithm makes excellent predictions. The machine learning algorithm is significantly faster than the biophysical method, opening the possibility of very high throughput rational design of moulds for pharmaceutical testing, regenerative medicine and fundamental studies of biology. Future extensions for scaffolds and 3D bioprinting will open additional applications.

Submitted 31 March, 2023; originally announced March 2023.

Comments: 26 Pages, 11 Figures

arXiv:2110.12735

doi 10.1093/mnras/stac525

Practical Galaxy Morphology Tools from Deep Supervised Representation Learning

Authors: Mike Walmsley, Anna M. M. Scaife, Chris Lintott, Michelle Lochner, Verlon Etsebeth, Tobias Géron, Hugh Dickinson, Lucy Fortson, Sandor Kruk, Karen L. Masters, Kameswara Bharadwaj Mantha, Brooke D. Simmons

Abstract: Astronomers have typically set out to solve supervised machine learning problems by creating their own representations from scratch. We show that deep learning models trained to answer every Galaxy Zoo DECaLS question learn meaningful semantic representations of galaxies that are useful for new tasks on which the models were never trained. We exploit these representations to outperform several recent approaches at practical tasks crucial for investigating large galaxy samples. The first task is identifying galaxies of similar morphology to a query galaxy. Given a single galaxy assigned a free text tag by humans (e.g. "#diffuse"), we can find galaxies matching that tag for most tags. The second task is identifying the most interesting anomalies to a particular researcher. Our approach is 100% accurate at identifying the most interesting 100 anomalies (as judged by Galaxy Zoo 2 volunteers). The third task is adapting a model to solve a new task using only a small number of newly-labelled galaxies. Models fine-tuned from our representation are better able to identify ring galaxies than models fine-tuned from terrestrial images (ImageNet) or trained from scratch. We solve each task with very few new labels; either one (for the similarity search) or several hundred (for anomaly detection or fine-tuning). This challenges the longstanding view that deep supervised methods require new large labelled datasets for practical use in astronomy. To help the community benefit from our pretrained models, we release our fine-tuning code Zoobot. Zoobot is accessible to researchers with no prior experience in deep learning.

Submitted 8 June, 2022; v1 submitted 25 October, 2021; originally announced October 2021.

Comments: 20 pages plus appendix. Accepted to MNRAS (open-access DOI below). Code, documentation, pretrained models: https://github.com/mwalmsley/zoobot (PyTorch and TensorFlow)

Journal ref: MNRAS Volume 513, Issue 2, June 2022, Pages 1581-1599

arXiv:1905.07424

doi 10.1093/mnras/stz2816

Galaxy Zoo: Probabilistic Morphology through Bayesian CNNs and Active Learning

Authors: Mike Walmsley, Lewis Smith, Chris Lintott, Yarin Gal, Steven Bamford, Hugh Dickinson, Lucy Fortson, Sandor Kruk, Karen Masters, Claudia Scarlata, Brooke Simmons, Rebecca Smethurst, Darryl Wright

Abstract: We use Bayesian convolutional neural networks and a novel generative model of Galaxy Zoo volunteer responses to infer posteriors for the visual morphology of galaxies. Bayesian CNNs can learn from galaxy images with uncertain labels and then, for previously unlabelled galaxies, predict the probability of each possible label. Our posteriors are well-calibrated (e.g. for predicting bars, we achieve coverage errors of 11.8% within a vote fraction deviation of 0.2) and hence are reliable for practical use. Further, using our posteriors, we apply the active learning strategy BALD to request volunteer responses for the subset of galaxies which, if labelled, would be most informative for training our network. We show that training our Bayesian CNNs using active learning requires up to 35-60% fewer labelled galaxies, depending on the morphological feature being classified. By combining human and machine intelligence, Galaxy Zoo will be able to classify surveys of any conceivable scale on a timescale of weeks, providing massive and detailed morphology catalogues to support research into galaxy evolution.
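BALD scores a candidate by the mutual information between its label and the model parameters: the entropy of the mean prediction minus the mean entropy of the individual sampled predictions. The sketch below is a simplified illustration and not the paper's implementation: it assumes a single binary label per galaxy (the paper models volunteer vote counts), and the function names and probability values are hypothetical.

```python
import math


def binary_entropy(p):
    """Entropy (in nats) of a Bernoulli variable with success probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))


def bald_score(sampled_probs):
    """Mutual information estimate from several stochastic forward passes.

    sampled_probs: one predicted probability per sampled set of model weights
    (for example, one per Monte Carlo dropout pass).
    """
    mean_p = sum(sampled_probs) / len(sampled_probs)
    mean_entropy = sum(binary_entropy(p) for p in sampled_probs) / len(sampled_probs)
    return binary_entropy(mean_p) - mean_entropy


# Hypothetical candidates: the sampled models agree on one and disagree on the other.
candidates = {
    "galaxy_a": [0.5, 0.5, 0.5, 0.5],      # uncertain, but every sample agrees
    "galaxy_b": [0.05, 0.95, 0.05, 0.95],  # individually confident, mutually inconsistent
}
scores = {name: bald_score(probs) for name, probs in candidates.items()}
most_informative = max(scores, key=scores.get)
print(most_informative, scores)
```

The candidate on which the sampled models disagree scores highest, which is why BALD prefers it over an example that is merely ambiguous to every sample.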

Submitted 4 October, 2019; v1 submitted 17 May, 2019; originally announced May 2019.

Comments: Accepted by MNRAS. 21 pages, including appendices

arXiv:1001.4297

doi 10.1098/rsif.2010.0230

Multi-camera Realtime 3D Tracking of Multiple Flying Animals

Authors: Andrew D. Straw, Kristin Branson, Titus R. Neumann, Michael H. Dickinson

Abstract: Automated tracking of animal movement allows analyses that would not otherwise be possible by providing great quantities of data. The additional capability of tracking in realtime - with minimal latency - opens up the experimental possibility of manipulating sensory feedback, thus allowing detailed explorations of the neural basis for control of behavior. Here we describe a new system capable of tracking the position and body orientation of animals such as flies and birds. The system operates with less than 40 msec latency and can track multiple animals simultaneously. To achieve these results, a multi target tracking algorithm was developed based on the Extended Kalman Filter and the Nearest Neighbor Standard Filter data association algorithm. In one implementation, an eleven camera system is capable of tracking three flies simultaneously at 60 frames per second using a gigabit network of nine standard Intel Pentium 4 and Core 2 Duo computers. This manuscript presents the rationale and details of the algorithms employed and shows three implementations of the system. An experiment was performed using the tracking system to measure the effect of visual contrast on the flight speed of Drosophila melanogaster. At low contrasts, speed is more variable and faster on average than at high contrasts. Thus, the system is already a useful tool to study the neurobiology and behavior of freely flying animals. If combined with other techniques, such as 'virtual reality'-type computer graphics or genetic manipulation, the tracking system would offer a powerful new way to investigate the biology of flying animals.
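The pairing of a Kalman filter with nearest-neighbor data association can be sketched in one dimension. This is a deliberately reduced illustration, not the system described in the abstract: it uses a linear constant-velocity filter rather than an Extended Kalman Filter, tracks a single target, and every name and noise value below is a hypothetical choice.

```python
import math


class Track1D:
    """Constant-velocity Kalman filter over (position, velocity), position measured."""

    def __init__(self, pos, vel, var=10.0, q=0.01, r=1.0):
        self.pos, self.vel = pos, vel
        self.p00, self.p01, self.p10, self.p11 = var, 0.0, 0.0, var  # covariance P
        self.q, self.r = q, r  # process and measurement noise variances (assumed)

    def predict(self, dt=1.0):
        # State transition F = [[1, dt], [0, 1]]; P <- F P F^T + q * I.
        self.pos += self.vel * dt
        p00 = self.p00 + dt * (self.p01 + self.p10) + dt * dt * self.p11 + self.q
        p01 = self.p01 + dt * self.p11
        p10 = self.p10 + dt * self.p11
        self.p00, self.p01, self.p10, self.p11 = p00, p01, p10, self.p11 + self.q

    def nearest_neighbor(self, measurements, gate_sigmas=3.0):
        """Return the measurement closest to the prediction, or None if none is gated in."""
        gate = gate_sigmas * math.sqrt(self.p00 + self.r)
        inside = [z for z in measurements if abs(z - self.pos) <= gate]
        return min(inside, key=lambda z: abs(z - self.pos)) if inside else None

    def update(self, z):
        s = self.p00 + self.r                 # innovation variance
        k0, k1 = self.p00 / s, self.p10 / s   # Kalman gain
        y = z - self.pos                      # innovation
        self.pos += k0 * y
        self.vel += k1 * y
        p00, p01 = (1 - k0) * self.p00, (1 - k0) * self.p01
        p10, p11 = self.p10 - k1 * self.p00, self.p11 - k1 * self.p01
        self.p00, self.p01, self.p10, self.p11 = p00, p01, p10, p11


# Synthetic scenario: a target moving one unit per frame plus a distant false detection.
track = Track1D(pos=0.0, vel=0.0)
for frame in range(1, 11):
    detections = [float(frame), float(frame) + 50.0]
    track.predict()
    chosen = track.nearest_neighbor(detections)
    if chosen is not None:
        track.update(chosen)
print(round(track.pos, 2), round(track.vel, 2))
```

The gate keeps the false detection from ever being associated with the track; a multi-target tracker would additionally need to stop two tracks from claiming the same detection.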

Submitted 24 January, 2010; originally announced January 2010.

Comments: pdfTeX using libpoppler 3.141592-1.40.3-2.2 (Web2C 7.5.6), 18 pages with 9 figures

Showing 1–5 of 5 results for author: Dickinson, H