-
Scaling Laws for Galaxy Images
Authors:
Mike Walmsley,
Micah Bowles,
Anna M. M. Scaife,
Jason Shingirai Makechemu,
Alexander J. Gordon,
Annette M. N. Ferguson,
Robert G. Mann,
James Pearson,
Jürgen J. Popp,
Jo Bovy,
Josh Speagle,
Hugh Dickinson,
Lucy Fortson,
Tobias Géron,
Sandor Kruk,
Chris J. Lintott,
Kameswara Mantha,
Devina Mohan,
David O'Ryan,
Inigo V. Slijepcevic
Abstract:
We present the first systematic investigation of supervised scaling laws outside of an ImageNet-like context - on images of galaxies. We use 840k galaxy images and over 100M annotations by Galaxy Zoo volunteers, comparable in scale to ImageNet-1K. We find that adding annotated galaxy images provides a power law improvement in performance across all architectures and all tasks, while adding trainable parameters is effective only for some (typically more subjectively challenging) tasks. We then compare the downstream performance of finetuned models pretrained on ImageNet-12k alone vs. additionally pretrained on our galaxy images. We achieve an average relative error rate reduction of 31% across 5 downstream tasks of scientific interest. Our finetuned models are more label-efficient and, unlike their ImageNet-12k-pretrained equivalents, often achieve linear transfer performance equal to that of end-to-end finetuning. We find relatively modest additional downstream benefits from scaling model size, implying that scaling alone is not sufficient to address our domain gap, and suggest that practitioners with qualitatively different images might benefit more from in-domain adaptation followed by targeted downstream labelling.
Submitted 3 April, 2024;
originally announced April 2024.
-
Use of Docker for deployment and testing of astronomy software
Authors:
D. Morris,
S. Voutsinas,
N. C. Hambly,
R. G. Mann
Abstract:
We describe preliminary investigations of using Docker for the deployment and testing of astronomy software. Docker is a relatively new containerisation technology that is developing rapidly and being adopted across a range of domains. It is based upon virtualization at operating system level, which presents many advantages in comparison to the more traditional hardware virtualization that underpins most cloud computing infrastructure today. A particular strength of Docker is its simple format for describing and managing software containers, which has benefits for software developers, system administrators and end users.
We report on our experiences from two projects -- a simple activity to demonstrate how Docker works, and a more elaborate set of services that demonstrates more of its capabilities and what they can achieve within an astronomical context -- and include an account of how we solved problems through interaction with Docker's very active open source development community, which is currently the key to the most effective use of this rapidly-changing technology.
Submitted 11 July, 2017;
originally announced July 2017.
-
Renewal Strings for Cleaning Astronomical Databases
Authors:
Amos J. Storkey,
Nigel C. Hambly,
Christopher K. I. Williams,
Robert G. Mann
Abstract:
Large astronomical databases obtained from sky surveys such as the SuperCOSMOS Sky Surveys (SSS) invariably suffer from a small number of spurious records coming from artefactual effects of the telescope, satellites and junk objects in orbit around Earth, and physical defects on the photographic plate or CCD. Though relatively small in number, these spurious records present a significant problem in many situations where they can become a large proportion of the records potentially of interest to a given astronomer. In this paper we focus on the four most common causes of unwanted records in the SSS: satellite or aeroplane tracks; scratches, fibres and other linear phenomena introduced to the plate; circular halos around bright stars due to internal reflections within the telescope; and diffraction spikes near to bright stars. Accurate and robust techniques are needed for locating and flagging such spurious objects. We have developed renewal strings, a probabilistic technique combining the Hough transform, renewal processes and hidden Markov models, which has proven highly effective in this context. The methods are applied to the SSS data to develop a dataset of spurious object detections, along with confidence measures, which can allow this unwanted data to be removed from consideration. These methods are general and can be adapted to any future astronomical survey data.
Submitted 7 August, 2014;
originally announced August 2014.
-
Astronomy and Computing: a New Journal for the Astronomical Computing Community
Authors:
Alberto Accomazzi,
Tamás Budavári,
Christopher Fluke,
Norman Gray,
Robert G Mann,
William O'Mullane,
Andreas Wicenec,
Michael Wise
Abstract:
We introduce Astronomy and Computing, a new journal for the growing population of people working in the domain where astronomy overlaps with computer science and information technology. The journal aims to provide a new communication channel within that community, which is not well served by current journals, and to help secure recognition of its true importance within modern astronomy. In this inaugural editorial, we describe the rationale for creating the journal, outline its scope and ambitions, and seek input from the community in defining in detail how the journal should work towards its high-level goals.
Submitted 30 October, 2012;
originally announced October 2012.
-
AstroDAbis: Annotations and Cross-Matches for Remote Catalogues
Authors:
Norman Gray,
Robert G Mann,
Dave Morris,
Mark Holliman,
Keith Noddle
Abstract:
Astronomers are good at sharing data, but poorer at sharing knowledge.
Almost all astronomical data ends up in open archives, and access to these is being simplified by the development of the global Virtual Observatory (VO). This is a great advance, but the fundamental problem remains that these archives contain only basic observational data, whereas all the astrophysical interpretation of that data -- which source is a quasar, which a low-mass star, and which an image artefact -- is contained in journal papers, with very little linkage back from the literature to the original data archives. It is therefore currently impossible for an astronomer to pose a query like "give me all sources in this data archive that have been identified as quasars" and this limits the effective exploitation of these archives, as the user of an archive has no direct means of taking advantage of the knowledge derived by its previous users.
The AstroDAbis service aims to address this, in a prototype service enabling astronomers to record annotations and cross-identifications in the AstroDAbis service, annotating objects in other catalogues. We have deployed two interfaces to the annotations, namely one astronomy-specific one using the TAP protocol, and a second exploiting generic Linked Open Data (LOD) and RDF techniques.
Submitted 25 November, 2011;
originally announced November 2011.
-
Collaborative Astronomical Image Mosaics
Authors:
Daniel S. Katz,
G. Bruce Berriman,
Robert G. Mann
Abstract:
This chapter describes how astronomical imaging survey data have become a vital part of modern astronomy, and how these data are archived and then served to the astronomical community through on-line data access portals. The Virtual Observatory, now under development, aims to make all these data accessible through a uniform set of interfaces. This chapter also describes the scientific need for one common image processing task, that of composing individual images into large-scale mosaics, and introduces Montage as a tool for this task. Montage, as distributed, can be used in four ways: as a single thread/process on a single CPU, in parallel using MPI to distribute similar tasks across a parallel computer, in parallel using grid tools (Pegasus/DAGMan) to distribute tasks across a grid, or in parallel using a script-driven approach (Swift). An on-request web-based Montage service is available for users who do not need to build a local version. We also introduce some work on a new scripted version of Montage, which offers ease of customization for users. Then, we discuss various ideas where Web 2.0 technologies can help the Montage community.
Submitted 23 November, 2010;
originally announced November 2010.