Vibe-Eval: A hard evaluation suite for measuring progress of multimodal language models

Authors: Piotr Padlewski, Max Bain, Matthew Henderson, Zhongkai Zhu, Nishant Relan, Hai Pham, Donovan Ong, Kaloyan Aleksiev, Aitor Ormazabal, Samuel Phua, Ethan Yeo, Eugenie Lamprecht, Qi Liu, Yuqi Wang, Eric Chen, Deyu Fu, Lei Li, Che Zheng, Cyprien de Masson d'Autume, Dani Yogatama, Mikel Artetxe, Yi Tay

Abstract: We introduce Vibe-Eval: a new open benchmark and framework for evaluating multimodal chat models. Vibe-Eval consists of 269 visual understanding prompts, including 100 of hard difficulty, complete with gold-standard responses authored by experts. Vibe-Eval is open-ended and challenging with dual objectives: (i) vibe checking multimodal chat models for day-to-day tasks and (ii) rigorously testing and probing the capabilities of present frontier models. Notably, more than 50% of the questions in our hard set are answered incorrectly by all frontier models. We explore the nuances of designing, evaluating, and ranking models on ultra-challenging prompts. We also discuss trade-offs between human and automatic evaluation, and show that automatic model evaluation using Reka Core roughly correlates with human judgment. We offer free API access for the purpose of lightweight evaluation and plan to conduct formal human evaluations for public models that perform well on Vibe-Eval's automatic scores. We release the evaluation code and data; see https://github.com/reka-ai/reka-vibe-eval

Submitted 3 May, 2024; originally announced May 2024.

arXiv:2404.14412 [pdf, other]

AutoAD III: The Prequel -- Back to the Pixels

Authors: Tengda Han, Max Bain, Arsha Nagrani, Gül Varol, Weidi Xie, Andrew Zisserman

Abstract: Generating Audio Description (AD) for movies is a challenging task that requires fine-grained visual understanding and an awareness of the characters and their names. Currently, visual language models for AD generation are limited by a lack of suitable training data, and also their evaluation is hampered by using performance measures not specialized to the AD domain. In this paper, we make three contributions: (i) We propose two approaches for constructing AD datasets with aligned video data, and build training and evaluation datasets using these. These datasets will be publicly released; (ii) We develop a Q-former-based architecture which ingests raw video and generates AD, using frozen pre-trained visual encoders and large language models; and (iii) We provide new evaluation metrics to benchmark AD quality that are well-matched to human performance. Taken together, we improve the state of the art on AD generation.

Submitted 22 April, 2024; originally announced April 2024.

Comments: CVPR2024. Project page: https://www.robots.ox.ac.uk/~vgg/research/autoad/

arXiv:2404.12387 [pdf, other]

Reka Core, Flash, and Edge: A Series of Powerful Multimodal Language Models

Authors: Reka Team, Aitor Ormazabal, Che Zheng, Cyprien de Masson d'Autume, Dani Yogatama, Deyu Fu, Donovan Ong, Eric Chen, Eugenie Lamprecht, Hai Pham, Isaac Ong, Kaloyan Aleksiev, Lei Li, Matthew Henderson, Max Bain, Mikel Artetxe, Nishant Relan, Piotr Padlewski, Qi Liu, Ren Chen, Samuel Phua, Yazheng Yang, Yi Tay, Yuqi Wang, Zhongkai Zhu, et al. (1 additional author not shown)

Abstract: We introduce Reka Core, Flash, and Edge, a series of powerful multimodal language models trained from scratch by Reka. Reka models are able to process and reason with text, images, video, and audio inputs. This technical report discusses details of training some of these models and provides comprehensive evaluation results. We show that Reka Edge and Reka Flash are not only state-of-the-art but also outperform many much larger models, delivering outsized value for their respective compute class. Meanwhile, our most capable and largest model, Reka Core, approaches the best frontier models on both automatic evaluations and blind human evaluations. On image question answering benchmarks (e.g. MMMU, VQAv2), Core performs competitively with GPT4-V. Meanwhile, on multimodal chat, Core ranks as the second most preferred model under a blind third-party human evaluation setup, outperforming other models such as Claude 3 Opus. On text benchmarks, Core not only performs competitively with other frontier models on a set of well-established benchmarks (e.g. MMLU, GSM8K) but also outperforms GPT4-0613 on human evaluation. On video question answering (Perception-Test), Core outperforms Gemini Ultra. Models are shipped in production at http://chat.reka.ai . A showcase of non-cherry-picked qualitative examples can also be found at http://showcase.reka.ai .

Submitted 18 April, 2024; originally announced April 2024.

arXiv:2403.13677 [pdf, other]

Retina Vision Transformer (RetinaViT): Introducing Scaled Patches into Vision Transformers

Authors: Yuyang Shu, Michael E. Bain

Abstract: Humans see low and high spatial frequency components at the same time, and combine the information from both to form a visual scene. Drawing on this neuroscientific inspiration, we propose an altered Vision Transformer architecture where patches from scaled-down versions of the input image are added to the input of the first Transformer Encoder layer. We name this model Retina Vision Transformer (RetinaViT) due to its inspiration from the human visual system. Our experiments show that when trained on the ImageNet-1K dataset with a moderate configuration, RetinaViT achieves a 3.3% performance improvement over the original ViT. We hypothesize that this improvement can be attributed to the inclusion of low spatial frequency components in the input, which improves the ability to capture structural features, and to select and forward important features to deeper layers. RetinaViT thereby opens doors to further investigations into vertical pathways and attention patterns.

Submitted 20 March, 2024; originally announced March 2024.
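The RetinaViT abstract says patches from scaled-down copies of the input image are added to the input of the first Transformer Encoder layer. A minimal NumPy sketch of that token-building step follows. It assumes the scaled-down copies come from repeated 2x2 average pooling and that their patches are appended as extra tokens after the usual full-resolution patches; the function names, the patch size of 16, and the pooling choice are illustrative assumptions, not the authors' code. The linear patch projection and positional embeddings are omitted.

```python
import numpy as np


def downscale2x(img: np.ndarray) -> np.ndarray:
    """Halve height and width with 2x2 average pooling (an odd last row/column is cropped)."""
    h, w, c = img.shape
    img = img[: h - h % 2, : w - w % 2]
    return img.reshape(h // 2, 2, w // 2, 2, c).mean(axis=(1, 3))


def patchify(img: np.ndarray, p: int) -> np.ndarray:
    """Split an (H, W, C) image into non-overlapping p x p patches, each flattened."""
    h, w, c = img.shape
    gh, gw = h // p, w // p
    img = img[: gh * p, : gw * p]  # drop any remainder that does not fill a whole patch
    x = img.reshape(gh, p, gw, p, c).transpose(0, 2, 1, 3, 4)
    return x.reshape(gh * gw, p * p * c)


def multiscale_patch_tokens(img: np.ndarray, p: int = 16, num_scales: int = 3) -> np.ndarray:
    """Assumed RetinaViT-style input: patches of the image and of its scaled-down copies."""
    tokens = []
    current = img
    for _ in range(num_scales):
        if current.shape[0] < p or current.shape[1] < p:
            break  # the image is now smaller than a single patch
        tokens.append(patchify(current, p))
        current = downscale2x(current)
    return np.concatenate(tokens, axis=0)


image = np.random.default_rng(0).random((64, 64, 3))
tokens = multiscale_patch_tokens(image, p=16, num_scales=3)
print(tokens.shape)  # (21, 768): 16 + 4 + 1 patches of 16*16*3 values
```

Under these assumptions a 224x224 input with p=16 yields 196 + 49 + 9 tokens (the 56x56 copy is cropped to 48x48), so the encoder sequence grows by roughly 30%.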

arXiv:2311.17965 [pdf]

doi 10.1371/journal.pone.0019517

Defining Reference Sequences for Nocardia Species by Similarity and Clustering Analyses of 16S rRNA Gene Sequence Data

Authors: Manal Helal, Fanrong Kong, Sharon C. A. Chen, Michael Bain, Richard Christen, Vitali Sintchenko

Abstract: The intra- and inter-species genetic diversity of bacteria and the absence of 'reference', or the most representative, sequences of individual species present a significant challenge for sequence-based identification. The aims of this study were to determine the utility, and compare the performance, of several clustering and classification algorithms to identify the species of 364 sequences of 16S rRNA gene with a defined species in GenBank, and 110 sequences of 16S rRNA gene with no defined species, all within the genus Nocardia. A total of 364 16S rRNA gene sequences of Nocardia species were studied. In addition, 110 16S rRNA gene sequences assigned only to the Nocardia genus level at the time of submission to GenBank were used for machine learning classification experiments. Different clustering algorithms were compared with a novel algorithm or the linear mapping (LM) of the distance matrix. Principal Components Analysis was used for the dimensionality reduction and visualization. Results: The LM algorithm achieved the highest performance and classified the set of 364 16S rRNA sequences into 80 clusters, the majority of which (83.52%) corresponded with the original species. The most representative 16S rRNA sequences for individual Nocardia species have been identified as 'centroids' in respective clusters from which the distances to all other sequences were minimized; 110 16S rRNA gene sequences with identifications recorded only at the genus level were classified using machine learning methods.
Simple kNN machine learning demonstrated the highest performance and classified Nocardia species sequences with an accuracy of 92.7% and a mean frequency of 0.578.

Submitted 29 November, 2023; originally announced November 2023.

ACM Class: I.2.6

Journal ref: PLoS ONE June 2011 | Volume 6 | Issue 6 | e19517
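The 'centroid' in the Nocardia abstract, the sequence from which the distances to all other members of its cluster are minimized, is usually called a medoid, and the final classification step is a k-nearest-neighbour vote. The sketch below assumes pre-aligned, equal-length sequences compared by p-distance (the fraction of mismatched positions). The abstract does not state which distance or which k was used, so `p_distance`, `medoid`, `knn_predict`, `k=3`, and the toy sequences and labels are all hypothetical.

```python
from collections import Counter


def p_distance(a: str, b: str) -> float:
    """Fraction of mismatched positions between two aligned, equal-length sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and aligned to equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def medoid(members: list[str]) -> str:
    """Cluster member whose summed distance to all other members is smallest."""
    return min(members, key=lambda s: sum(p_distance(s, t) for t in members))


def knn_predict(query: str, labelled: list[tuple[str, str]], k: int = 3) -> str:
    """Majority label among the k labelled sequences closest to `query`."""
    nearest = sorted(labelled, key=lambda item: p_distance(query, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy data only; these are not real 16S rRNA sequences.
cluster = ["ACGTACGT", "ACGTACGA", "ACGTACCA", "TCGTACGA"]
print(medoid(cluster))  # ACGTACGA

labelled = [("ACGTACGT", "species_A"), ("ACGTACGA", "species_A"),
            ("TTGTACGA", "species_B"), ("TTGAACGA", "species_B")]
print(knn_predict("ACGTACGG", labelled))  # species_A
```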

arXiv:2310.15390 [pdf]

MEMPSEP III. A machine learning-oriented multivariate data set for forecasting the Occurrence and Properties of Solar Energetic Particle Events using a Multivariate Ensemble Approach

Authors: Kimberly Moreland, Maher Dayeh, Hazel M. Bain, Subhamoy Chatterjee, Andres Munoz-Jaramillo, Samuel Hart

Abstract: We introduce a new multivariate data set that utilizes multiple spacecraft collecting in-situ and remote sensing heliospheric measurements shown to be linked to physical processes responsible for generating solar energetic particles (SEPs). Using the Geostationary Operational Environmental Satellites (GOES) flare event list from Solar Cycle (SC) 23 and part of SC 24 (1998-2013), we identify 252 solar events (flares) that produce SEPs and 17,542 events that do not. For each identified event, we acquire the local plasma properties at 1 au, such as energetic proton and electron data, upstream solar wind conditions, and the interplanetary magnetic field vector quantities using various instruments onboard GOES and the Advanced Composition Explorer (ACE) spacecraft. We also collect remote sensing data from instruments onboard the Solar Dynamics Observatory (SDO), Solar and Heliospheric Observatory (SoHO), and the Wind solar radio instrument WAVES. The data set is designed to allow for variations of the inputs and feature sets for machine learning (ML) in heliophysics and has a specific purpose for forecasting the occurrence of SEP events and their subsequent properties. This paper describes a dataset created from multiple publicly available observation sources that is validated, cleaned, and carefully curated for our machine-learning pipeline.
The dataset has been used to drive the newly developed Multivariate Ensemble of Models for Probabilistic Forecast of Solar Energetic Particles (MEMPSEP; see MEMPSEP I (Chatterjee et al., 2023) and MEMPSEP II (Dayeh et al., 2023) for associated papers).

Submitted 26 October, 2023; v1 submitted 23 October, 2023; originally announced October 2023.

arXiv:2310.06838 [pdf, other]

AutoAD II: The Sequel -- Who, When, and What in Movie Audio Description

Authors: Tengda Han, Max Bain, Arsha Nagrani, Gül Varol, Weidi Xie, Andrew Zisserman

Abstract: Audio Description (AD) is the task of generating descriptions of visual content, at suitable time intervals, for the benefit of visually impaired audiences. For movies, this presents notable challenges -- AD must occur only during existing pauses in dialogue, should refer to characters by name, and ought to aid understanding of the storyline as a whole. To this end, we develop a new model for automatically generating movie AD, given CLIP visual features of the frames, the cast list, and the temporal locations of the speech; addressing all three of the 'who', 'when', and 'what' questions: (i) who -- we introduce a character bank consisting of the character's name, the actor that played the part, and a CLIP feature of their face, for the principal cast of each movie, and demonstrate how this can be used to improve naming in the generated AD; (ii) when -- we investigate several models for determining whether an AD should be generated for a time interval or not, based on the visual content of the interval and its neighbours; and (iii) what -- we implement a new vision-language model for this task, that can ingest the proposals from the character bank, whilst conditioning on the visual features using cross-attention, and demonstrate how this improves over previous architectures for AD text generation in an apples-to-apples comparison.

Submitted 10 October, 2023; originally announced October 2023.

Comments: ICCV2023. Project page: https://www.robots.ox.ac.uk/~vgg/research/autoad/

arXiv:2309.14570 [pdf, other]

MEMPSEP I: Forecasting the Probability of Solar Energetic Particle Event Occurrence using a Multivariate Ensemble of Convolutional Neural Networks

Authors: Subhamoy Chatterjee, Maher Dayeh, Andrés Muñoz-Jaramillo, Hazel M. Bain, Kimberly Moreland, Samuel Hart

Abstract: The Sun continuously affects the interplanetary environment through a host of interconnected and dynamic physical processes. Solar flares, Coronal Mass Ejections (CMEs), and Solar Energetic Particles (SEPs) are among the key drivers of space weather in the near-Earth environment and beyond. While some CMEs and flares are associated with intense SEPs, some show little to no SEP association. To date, robust long-term (hours-days) forecasting of SEP occurrence and associated properties (e.g., onset, peak intensities) does not effectively exist and the search for such development continues. Through Operations-2-Research support, we developed a self-contained model that utilizes a comprehensive dataset and provides a probabilistic forecast for SEP event occurrence and its properties. The model is named Multivariate Ensemble of Models for Probabilistic Forecast of Solar Energetic Particles (MEMPSEP). The MEMPSEP workhorse is an ensemble of Convolutional Neural Networks that ingests a comprehensive dataset (MEMPSEP III - (Moreland et al., 2023)) of full-disc magnetogram-sequences and in-situ data from different sources to forecast the occurrence (MEMPSEP I - this work) and properties (MEMPSEP II - Dayeh et al. (2023)) of a SEP event. This work focuses on estimating true SEP occurrence probabilities, achieving a 2.5% improvement in reliability and a Brier score of 0.14.
The outcome provides flexibility for the end-users to determine their own acceptable level of risk, rather than imposing a detection threshold that optimizes an arbitrary binary classification metric. Furthermore, the model-ensemble, trained to utilize the large class-imbalance between events and non-events, provides a clear measure of uncertainty in our forecast.

Submitted 25 September, 2023; originally announced September 2023.

Comments: 17 pages, 8 figures, 1 table, accepted for publication in Space Weather journal
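For reference, the Brier score quoted in the MEMPSEP I abstract is the mean squared difference between forecast probabilities and binary outcomes, BS = (1/N) Σ (p_i − o_i)², where 0 is a perfect forecast. A small Python sketch follows. The skill-score helper, which compares against a reference forecast that always predicts the observed event rate, is a common convention added here for illustration and is not taken from the paper; the forecasts are toy values, not MEMPSEP output.

```python
def brier_score(probs: list[float], outcomes: list[int]) -> float:
    """Mean squared difference between forecast probabilities and 0/1 outcomes."""
    if not probs or len(probs) != len(outcomes):
        raise ValueError("need equally long, non-empty lists")
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)


def brier_skill_score(probs: list[float], outcomes: list[int]) -> float:
    """1 - BS / BS_ref, where the reference always forecasts the observed event rate."""
    score = brier_score(probs, outcomes)  # also validates the inputs
    base_rate = sum(outcomes) / len(outcomes)
    reference = brier_score([base_rate] * len(outcomes), outcomes)
    if reference == 0:
        raise ValueError("skill is undefined when every outcome is identical")
    return 1 - score / reference


probs = [0.9, 0.2, 0.1, 0.7]  # toy forecast probabilities
outcomes = [1, 0, 0, 1]       # 1 = event occurred
print(brier_score(probs, outcomes))        # about 0.0375
print(brier_skill_score(probs, outcomes))  # about 0.85
```

Because the forecast stays a probability, each user can apply their own warning threshold (for example, warn when p >= 0.3) instead of the model fixing one.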

arXiv:2309.14503 [pdf]

MEMPSEP II. -- Forecasting the Properties of Solar Energetic Particle Events using a Multivariate Ensemble Approach

Authors: Maher A. Dayeh, Subhamoy Chatterjee, Andres Munoz-Jaramillo, Kimberly Moreland, Hazel M. Bain, Samuel Hart

Abstract: Solar Energetic Particles (SEPs) form a critical component of Space Weather. The complex, intertwined dynamics of SEP sources, acceleration, and transport make their forecasting very challenging. Yet, information about SEP arrival and their properties (e.g., peak flux) is crucial for space exploration on many fronts. We have recently introduced a novel probabilistic ensemble model called the Multivariate Ensemble of Models for Probabilistic Forecast of Solar Energetic Particles (MEMPSEP). Its primary aim is to forecast the occurrence and physical properties of SEPs. The occurrence forecasting, thoroughly discussed in a preceding paper (Chatterjee et al., 2023), is complemented by the work presented here, which focuses on forecasting the physical properties of SEPs. The MEMPSEP model relies on an ensemble of Convolutional Neural Networks, which leverage a multi-variate dataset comprising full-disc magnetogram sequences and numerous derived and in-situ data from various sources. Skill scores demonstrate that MEMPSEP exhibits improved predictions on SEP properties for the test set data with SEP occurrence probability above 50%, compared to those with a probability below 50%. Results present a promising approach to address the challenging task of forecasting SEP physical properties, thus improving our forecasting capabilities and advancing our understanding of the dominant parameters and processes that govern SEP production.

Submitted 25 September, 2023; originally announced September 2023.

arXiv:2308.11926 [pdf, other]

Particle Radiation Environment in the Heliosphere: Status, limitations and recommendations

Authors: Jingnan Guo, Bingbing Wang, Kathryn Whitman, Christina Plainaki, Lingling Zhao, Hazel M. Bain, Christina Cohen, Silvia Dalla, Mateja Dumbovic, Miho Janvier, Insoo Jun, Janet Luhmann, Olga E. Malandraki, M. Leila Mays, Jamie S. Rankin, Linghua Wang, Yihua Zheng

Abstract: Space weather is a multidisciplinary research area connecting scientists from across heliophysics domains seeking a coherent understanding of our space environment that can also serve modern life and society's needs. COSPAR's ISWAT (International Space Weather Action Teams) 'clusters' focus attention on different areas of space weather study while ensuring the coupled system is broadly addressed via regular communications and interactions. The ISWAT cluster "H3: Radiation Environment in the Heliosphere" (https://www.iswat-cospar.org/h3) has been working to provide a scientific platform to understand, characterize and predict the energetic particle radiation in the heliosphere with the practical goal of mitigating radiation risks associated with aerospace activities, the satellite industry and human space exploration. In particular, present approaches help us understand the physical phenomena at large, optimizing the output of multi-viewpoint observations and pushing current models to their limits. In this paper, we review the scientific aspects of the radiation environment in the heliosphere covering four different radiation types: Solar Energetic Particles (SEPs), Ground Level Enhancement (GLE, a type of SEP event with energies high enough to trigger the enhancement of ground-level detectors), Galactic Cosmic Rays (GCRs) and Anomalous Cosmic Rays (ACRs). We focus on related advances in the research community in the past 10-20 years and what we still lack in terms of understanding and predictive capabilities.
Finally, we also consider some recommendations related to the improvement of both observational and modeling capabilities in the field of space radiation environment.

Submitted 23 August, 2023; originally announced August 2023.

arXiv:2308.10322 [pdf, other]

Homogenising SoHO/EIT and SDO/AIA 171 Å Images: A Deep Learning Approach

Authors: Subhamoy Chatterjee, Andrés Muñoz-Jaramillo, Maher Dayeh, Hazel M. Bain, Kimberly Moreland

Abstract: Extreme Ultraviolet images of the Sun are becoming an integral part of space weather prediction tasks. However, having different surveys requires the development of instrument-specific prediction algorithms. As an alternative, it is possible to combine multiple surveys to create a homogeneous dataset. In this study, we utilize the temporal overlap of SoHO/EIT and SDO/AIA 171 Å surveys to train an ensemble of deep learning models for creating a single homogeneous survey of EUV images for 2 solar cycles. Prior applications of deep learning have focused on validating the homogeneity of the output while overlooking the systematic estimation of uncertainty. We use an approach called 'Approximate Bayesian Ensembling' to generate an ensemble of models whose uncertainty mimics that of a fully Bayesian neural network at a fraction of the cost. We find that ensemble uncertainty goes down as the training set size increases. Additionally, we show that the model ensemble adds immense value to the prediction by showing higher uncertainty in test data that are not well represented in the training data.

Submitted 20 August, 2023; originally announced August 2023.

Comments: 20 pages, 8 figures, accepted for publication in ApJS
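One common reading of the 'Approximate Bayesian Ensembling' named in the abstract above is anchored ensembling: each ensemble member is regularized towards its own random anchor drawn from the prior, so the spread of member predictions stands in for posterior uncertainty. The sketch below assumes that reading and applies it to a toy linear model with closed-form fits, not to the deep image-to-image networks of the paper; the prior and noise variances, the ensemble size, and all function names are illustrative assumptions. In this toy setting it shows, qualitatively, the two behaviours the abstract reports: the spread shrinks as the training set grows and widens for inputs far from the training data.

```python
import numpy as np


def features(x: np.ndarray) -> np.ndarray:
    """Toy design matrix with an intercept and a linear term: rows are [1, x]."""
    return np.stack([np.ones_like(x), x], axis=1)


def fit_anchored_ensemble(x, y, n_members=20, prior_var=1.0, noise_var=0.1, seed=0):
    """Fit one regularised linear model per member, each pulled towards its own anchor.

    Member j minimises ||y - X w||^2 / noise_var + ||w - anchor_j||^2 / prior_var,
    with anchor_j drawn from the prior N(0, prior_var * I). The minimiser is
    w_j = A^-1 (X^T y / noise_var + anchor_j / prior_var), A = X^T X / noise_var + I / prior_var.
    """
    rng = np.random.default_rng(seed)
    X = features(x)
    d = X.shape[1]
    A = X.T @ X / noise_var + np.eye(d) / prior_var
    weights = []
    for _ in range(n_members):
        anchor = rng.normal(0.0, np.sqrt(prior_var), size=d)
        b = X.T @ y / noise_var + anchor / prior_var
        weights.append(np.linalg.solve(A, b))
    return np.stack(weights)  # shape (n_members, d)


def predict(weights, x):
    """Ensemble mean and spread (standard deviation across members) at inputs x."""
    preds = features(x) @ weights.T  # shape (len(x), n_members)
    return preds.mean(axis=1), preds.std(axis=1)


rng = np.random.default_rng(1)
x_small = np.linspace(-1, 1, 10)
x_large = np.linspace(-1, 1, 200)
w_small = fit_anchored_ensemble(x_small, 2 * x_small + rng.normal(0, 0.3, 10))
w_large = fit_anchored_ensemble(x_large, 2 * x_large + rng.normal(0, 0.3, 200))

probe = np.array([0.0, 10.0])  # inside vs. far outside the training range
print(predict(w_small, probe)[1])  # spread grows from x=0 to x=10
print(predict(w_large, probe)[1])  # and is smaller overall with more data
```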

arXiv:2307.09006 [pdf, other]

OxfordVGG Submission to the EGO4D AV Transcription Challenge

Authors: Jaesung Huh, Max Bain, Andrew Zisserman

Abstract: This report presents the technical details of our submission to the EGO4D Audio-Visual (AV) Automatic Speech Recognition Challenge 2023 from the OxfordVGG team. We present WhisperX, a system for efficient speech transcription of long-form audio with word-level time alignment, along with two text normalisers which are publicly available. Our final submission obtained a Word Error Rate (WER) of 56.0% on the challenge test set, ranking 1st on the leaderboard. All baseline code and models are available at https://github.com/m-bain/whisperX.

Submitted 18 July, 2023; originally announced July 2023.

Comments: Technical Report
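The 56.0% figure in the EGO4D report is a word error rate: the minimum number of word substitutions, deletions, and insertions needed to turn the hypothesis into the reference, divided by the number of reference words. A minimal Python sketch using the standard Levenshtein dynamic program follows. The `normalise` helper (ASCII lowercasing and punctuation stripping) is a hypothetical stand-in, not one of the two published text normalisers.

```python
import re


def normalise(text: str) -> str:
    """Hypothetical normaliser: lowercase, replace anything but a-z, 0-9, apostrophes with spaces."""
    return re.sub(r"[^a-z0-9' ]+", " ", text.lower())


def word_error_rate(reference: str, hypothesis: str) -> float:
    """(substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the first i-1 reference words and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # delete ref[i-1]
                          curr[j - 1] + 1,     # insert hyp[j-1]
                          prev[j - 1] + cost)  # substitute or match
        prev = curr
    return prev[-1] / len(ref)


ref = normalise("The cat sat on the mat.")
hyp = normalise("the cat sit on mat")
print(word_error_rate(ref, hyp))  # 2 errors / 6 reference words, about 0.333
```

Because insertions are counted, WER can exceed 100% when the hypothesis is much longer than the reference.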

arXiv:2305.15407 [pdf, other]

Balancing the Picture: Debiasing Vision-Language Datasets with Synthetic Contrast Sets

Authors: Brandon Smith, Miguel Farinha, Siobhan Mackenzie Hall, Hannah Rose Kirk, Aleksandar Shtedritski, Max Bain

Abstract: Vision-language models are growing in popularity and public visibility to generate, edit, and caption images at scale; but their outputs can perpetuate and amplify societal biases learned during pre-training on uncurated image-text pairs from the internet. Although debiasing methods have been proposed, we argue that these measurements of model bias lack validity due to dataset bias. We demonstrate there are spurious correlations in COCO Captions, the most commonly used dataset for evaluating bias, between background context and the gender of people in-situ. This is problematic because commonly-used bias metrics (such as Bias@K) rely on per-gender base rates. To address this issue, we propose a novel dataset debiasing pipeline to augment the COCO dataset with synthetic, gender-balanced contrast sets, where only the gender of the subject is edited and the background is fixed. However, existing image editing methods have limitations and sometimes produce low-quality images; so, we introduce a method to automatically filter the generated images based on their similarity to real images. Using our balanced synthetic contrast sets, we benchmark bias in multiple CLIP-based models, demonstrating how metrics are skewed by imbalance in the original COCO images. Our results indicate that the proposed approach improves the validity of the evaluation, ultimately contributing to a more realistic understanding of bias in vision-language models.

Submitted 24 May, 2023; originally announced May 2023.

Comments: Github: https://github.com/oxai/debias-gensynth

arXiv:2303.16899 [pdf, other]

AutoAD: Movie Description in Context

Authors: Tengda Han, Max Bain, Arsha Nagrani, Gül Varol, Weidi Xie, Andrew Zisserman

Abstract: The objective of this paper is an automatic Audio Description (AD) model that ingests movies and outputs AD in text form. Generating high-quality movie AD is challenging due to the dependency of the descriptions on context, and the limited amount of training data available. In this work, we leverage the power of pretrained foundation models, such as GPT and CLIP, and only train a mapping network that bridges the two models for visually-conditioned text generation. In order to obtain high-quality AD, we make the following four contributions: (i) we incorporate context from the movie clip, AD from previous clips, as well as the subtitles; (ii) we address the lack of training data by pretraining on large-scale datasets, where visual or contextual information is unavailable, e.g. text-only AD without movies or visual captioning datasets without context; (iii) we improve on the currently available AD datasets, by removing label noise in the MAD dataset, and adding character naming information; and (iv) we obtain strong results on the movie AD task compared with previous methods.

Submitted 29 March, 2023; originally announced March 2023.

Comments: CVPR2023 Highlight. Project page: https://www.robots.ox.ac.uk/~vgg/research/autoad/

arXiv:2303.00747 [pdf, other]

WhisperX: Time-Accurate Speech Transcription of Long-Form Audio

Authors: Max Bain, Jaesung Huh, Tengda Han, Andrew Zisserman

Abstract: Large-scale, weakly-supervised speech recognition models, such as Whisper, have demonstrated impressive results on speech recognition across domains and languages. However, their application to long audio transcription via buffered or sliding window approaches is prone to drifting, hallucination and repetition; and prohibits batched transcription due to their sequential nature. Further, timestamps corresponding to each utterance are prone to inaccuracies and word-level timestamps are not available out-of-the-box. To overcome these challenges, we present WhisperX, a time-accurate speech recognition system with word-level timestamps utilising voice activity detection and forced phoneme alignment. In doing so, we demonstrate state-of-the-art performance on long-form transcription and word segmentation benchmarks. Additionally, we show that pre-segmenting audio with our proposed VAD Cut & Merge strategy improves transcription quality and enables a twelve-fold transcription speedup via batched inference.

Submitted 11 July, 2023; v1 submitted 1 March, 2023; originally announced March 2023.

Comments: Accepted to INTERSPEECH 2023
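The VAD Cut & Merge step named in the WhisperX abstract can be sketched on a list of detected speech segments: cut any segment too long for the model's input window, then merge neighbouring segments into chunks no longer than that window so the chunks can be transcribed as a batch. In this Python sketch, the 30-second window (Whisper's input length) and the even splitting of long segments are simplifying assumptions; a fuller implementation might place cuts where voice activity is lowest. `cut_and_merge` is a hypothetical name, not the WhisperX API.

```python
import math


def cut_and_merge(segments, max_len=30.0):
    """Turn sorted (start, end) speech segments in seconds into chunks of at most max_len.

    Cut: a segment longer than max_len is split into equal parts (a simplification).
    Merge: consecutive parts are joined greedily while the merged chunk, measured
    from its first start to its last end, stays within max_len.
    """
    parts = []
    for start, end in segments:
        n = max(1, math.ceil((end - start) / max_len))
        bounds = [start + (end - start) * k / n for k in range(n)] + [end]
        parts.extend(zip(bounds[:-1], bounds[1:]))

    chunks = []
    for start, end in parts:
        if chunks and end - chunks[-1][0] <= max_len:
            chunks[-1] = (chunks[-1][0], end)  # extend the current chunk
        else:
            chunks.append((start, end))        # start a new chunk
    return chunks


speech = [(0.0, 5.0), (6.0, 12.0), (13.0, 50.0)]
print(cut_and_merge(speech))  # [(0.0, 12.0), (13.0, 31.5), (31.5, 50.0)]
```

Each chunk can then be passed to the recognizer independently, which is what makes batched inference possible.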

arXiv:2301.01819 [pdf, other]

A Protocol for Intelligible Interaction Between Agents That Learn and Explain

Authors: Ashwin Srinivasan, Michael Bain, A. Baskar, Enrico Coiera

Abstract: Recent engineering developments have seen the emergence of Machine Learning (ML) as a powerful form of data analysis with widespread applicability beyond its historical roots in the design of autonomous agents. However, relatively little attention has been paid to the interaction between people and ML systems. Recent developments in Explainable ML address this by providing visual and textual information on how the ML system arrived at a conclusion. In this paper we view the interaction between humans and ML systems within the broader context of interaction between agents capable of learning and explanation. Within this setting, we argue that it is more helpful to view the interaction as characterised by two-way intelligibility of information rather than once-off explanation of a prediction. We formulate two-way intelligibility as a property of a communication protocol. Development of the protocol is motivated by a set of 'Intelligibility Axioms' for decision-support systems that use ML with a human-in-the-loop. The axioms are intended as sufficient criteria to claim that: (a) information provided by a human is intelligible to an ML system; and (b) information provided by an ML system is intelligible to a human. The axioms inform the design of a general synchronous interaction model between agents capable of learning and explanation. We identify conditions of compatibility between agents that result in bounded communication, and define Weak and Strong Two-Way Intelligibility between agents as properties of the communication protocol.

Submitted 4 January, 2023; originally announced January 2023.

Comments: arXiv admin note: text overlap with arXiv:2205.08954

arXiv:2212.13325 [pdf]

Heliophysics Discovery Tools for the 21st Century: Data Science and Machine Learning Structures and Recommendations for 2020-2050

Authors: R. M. McGranaghan, B. Thompson, E. Camporeale, J. Bortnik, M. Bobra, G. Lapenta, S. Wing, B. Poduval, S. Lotz, S. Murray, M. Kirk, T. Y. Chen, H. M. Bain, P. Riley, B. Tremblay, M. Cheung, V. Delouille

Abstract: Three main points: 1. Data Science (DS) will be increasingly important to heliophysics; 2. Methods of heliophysics science discovery will continually evolve, requiring the use of learning technologies [e.g., machine learning (ML)] that are applied rigorously and that are capable of supporting discovery; and 3. To grow with the pace of data, technology, and workforce changes, heliophysics requires a new approach to the representation of knowledge.

Submitted 26 December, 2022; originally announced December 2022.

Comments: 4 pages; Heliophysics 2050 White Paper

arXiv:2212.03247 [pdf]

Ground-based Synoptic Studies of the Sun

Authors: Sanjay Gosain, V. Martinez Pillet, A. Pevtsov, H. Gilbert, S. Gibson, A. G. de Wijn, J. Burkepile, A. Asai, H. M. Bain, C. J. Henney, Y. Katsukawa, H. Lin, W. Manchester, J. McAteer, K. Muglach, M. Rast, M. Roth, J. Zhang

Abstract: Ground-based synoptic solar observations provide critical contextual data used to model the large-scale state of the heliosphere. The next decade will see a combination of ground-based telescopes and space missions that will study the microscopic processes of our Sun's atmosphere in unprecedented detail. This white paper describes contextual observations from a ground-based network needed to fully exploit this new knowledge of the underlying physics that leads to the magnetic linkages between the heliosphere and the Sun. This combination of a better understanding of small-scale processes and the appropriate global context will enable a physics-based approach to Space Weather comparable to Terrestrial Weather forecasting.

Submitted 18 February, 2023; v1 submitted 6 December, 2022; originally announced December 2022.

Comments: 10 pages, 5 figures, White paper submitted to Heliodecadal 2024, Category: Basic Research, Solar Physics. arXiv admin note: text overlap with arXiv:1903.06944 [updated version as submitted to Heliodecadal 2024]

arXiv:2205.08954 [pdf, other]

One-way Explainability Isn't The Message

Authors: Ashwin Srinivasan, Michael Bain, Enrico Coiera

Abstract: Recent engineering developments in specialised computational hardware, data-acquisition and storage technology have seen the emergence of Machine Learning (ML) as a powerful form of data analysis with widespread applicability beyond its historical roots in the design of autonomous agents. However, possibly because of its origins in the development of agents capable of self-discovery, relatively little attention has been paid to the interaction between people and ML. In this paper we are concerned with the use of ML in automated or semi-automated tools that assist one or more human decision makers. We argue that requirements on both human and machine in this context are significantly different from the use of ML either as part of autonomous agents for self-discovery or as part of statistical data analysis. Our principal position is that the design of such human-machine systems should be driven by repeated, two-way intelligibility of information rather than one-way explainability of the ML system's recommendations. Iterated rounds of intelligible information exchange, we think, will characterise the kinds of collaboration that will be needed to understand complex phenomena for which neither human nor machine has complete answers. We propose operational principles, which we call Intelligibility Axioms, to guide the design of a collaborative decision-support system.
The principles are concerned with: (a) what it means for information provided by the human to be intelligible to the ML system; and (b) what it means for an explanation provided by an ML system to be intelligible to a human. Using examples from the literature on the use of ML for drug-design and in medicine, we demonstrate cases where the conditions of the axioms are met. We describe some additional requirements needed for the design of a truly collaborative decision-support system.

Submitted 5 May, 2022; originally announced May 2022.

Comments: (22 pages. Submitted for review as a Perspectives paper to Nature Machine Intelligence)

arXiv:2205.08508 [pdf, other]

A CLIP-Hitchhiker's Guide to Long Video Retrieval

Authors: Max Bain, Arsha Nagrani, Gül Varol, Andrew Zisserman

Abstract: Our goal in this paper is the adaptation of image-text models for long video retrieval. Recent works have demonstrated state-of-the-art performance in video retrieval by adopting CLIP, effectively hitchhiking on the image-text representation for video tasks. However, there has been limited success in learning temporal aggregation that outperforms mean-pooling the image-level representations extracted per frame by CLIP. We find that the simple yet effective baseline of weighted-mean of frame embeddings via query-scoring is a significant improvement over all prior temporal modelling attempts and mean-pooling. In doing so, we provide an improved baseline for others to compare to and demonstrate state-of-the-art performance of this simple baseline on a suite of long video retrieval benchmarks.

Submitted 17 May, 2022; originally announced May 2022.
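The query-scoring idea can be sketched as follows. The sketch assumes that each frame's weight is a temperature-scaled softmax of its cosine similarity to the text query. The function names, the `temperature` default and the toy vectors are hypothetical and are not taken from the paper. With a very large temperature the weights become uniform, which recovers plain mean-pooling.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def query_scored_video_embedding(text_emb, frame_embs, temperature=0.1):
    """Hypothetical sketch: weighted mean of frame embeddings, where each
    frame's weight is a softmax over its similarity to the text query."""
    q = l2_normalize(text_emb)
    frames = [l2_normalize(f) for f in frame_embs]
    # Cosine similarity of each frame to the query, scaled by the temperature.
    logits = [dot(q, f) / temperature for f in frames]
    # Numerically stable softmax over frames.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Weighted mean of the frame embeddings, renormalised for cosine scoring.
    dim = len(frames[0])
    pooled = [sum(w * f[d] for w, f in zip(weights, frames)) for d in range(dim)]
    return l2_normalize(pooled), weights


def retrieval_score(text_emb, frame_embs, temperature=0.1):
    """Text-to-video similarity using the query-scored video embedding."""
    video_emb, _ = query_scored_video_embedding(text_emb, frame_embs, temperature)
    return dot(l2_normalize(text_emb), video_emb)
```

In this sketch the video embedding depends on the query, so it has to be recomputed for each text query. That is the cost of replacing a fixed mean-pool.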

arXiv:2203.11933 [pdf, other]

A Prompt Array Keeps the Bias Away: Debiasing Vision-Language Models with Adversarial Learning

Authors: Hugo Berg, Siobhan Mackenzie Hall, Yash Bhalgat, Wonsuk Yang, Hannah Rose Kirk, Aleksandar Shtedritski, Max Bain

Abstract: Vision-language models can encode societal biases and stereotypes, but measuring and mitigating these multimodal harms is challenging due to a lack of measurement robustness and to feature degradation. To address these challenges, we investigate bias measures and apply ranking metrics for image-text representations. We then investigate debiasing methods and show that prepending learned embeddings to text queries that are jointly trained with adversarial debiasing and a contrastive loss reduces various bias measures with minimal degradation to the image-text representation.

Submitted 25 October, 2022; v1 submitted 22 March, 2022; originally announced March 2022.

Comments: 17 pages, 4 figures, 7 tables. For code and trained token embeddings, see https://github.com/oxai/debias-vision-lang; Changed to use ACL layout, added joint training with comparison figure, corrected spelling and formatting errors; This paper is accepted for publication at AACL 2022, the official version of record is in the ACL Anthology

arXiv:2104.00650 [pdf, other]

Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval

Authors: Max Bain, Arsha Nagrani, Gül Varol, Andrew Zisserman

Abstract: Our objective in this work is video-text retrieval, in particular a joint embedding that enables efficient text-to-video retrieval. The challenges in this area include the design of the visual architecture and the nature of the training data, in that the available large-scale video-text training datasets, such as HowTo100M, are noisy and hence competitive performance is achieved only at scale through large amounts of compute. We address both these challenges in this paper. We propose an end-to-end trainable model that is designed to take advantage of both large-scale image and video captioning datasets. Our model is an adaptation and extension of the recent ViT and Timesformer architectures, and consists of attention in both space and time. The model is flexible and can be trained on both image and video text datasets, either independently or in conjunction. It is trained with a curriculum learning schedule that begins by treating images as 'frozen' snapshots of video, and then gradually learns to attend to increasing temporal context when trained on video datasets. We also provide a new video-text pretraining dataset WebVid-2M, comprising over two million videos with weak captions scraped from the internet. Despite training on datasets that are an order of magnitude smaller, we show that this approach yields state-of-the-art results on standard downstream video-retrieval benchmarks including MSR-VTT, MSVD, DiDeMo and LSMDC.

Submitted 13 May, 2022; v1 submitted 1 April, 2021; originally announced April 2021.

Comments: ICCV 2021. Update: Scaling up extension, WebVid10M release

arXiv:2005.04208 [pdf, other]

Condensed Movies: Story Based Retrieval with Contextual Embeddings

Authors: Max Bain, Arsha Nagrani, Andrew Brown, Andrew Zisserman

Abstract: Our objective in this work is long-range understanding of the narrative structure of movies. Instead of considering the entire movie, we propose to learn from the 'key scenes' of the movie, providing a condensed look at the full storyline. To this end, we make the following three contributions: (i) We create the Condensed Movies Dataset (CMD) consisting of the key scenes from over 3K movies: each key scene is accompanied by a high level semantic description of the scene, character face-tracks, and metadata about the movie. The dataset is scalable, obtained automatically from YouTube, and is freely available for anybody to download and use. It is also an order of magnitude larger than existing movie datasets in the number of movies; (ii) We provide a deep network baseline for text-to-video retrieval on our dataset, combining character, speech and visual cues into a single video embedding; and finally (iii) We demonstrate how the addition of context from other video clips improves retrieval performance.

Submitted 22 October, 2020; v1 submitted 8 May, 2020; originally announced May 2020.

Comments: Appears in: Asian Conference on Computer Vision 2020 (ACCV 2020) - Oral presentation

arXiv:1910.00144 [pdf, other]

doi 10.1051/swsc/2019036

Real-time solar image classification: assessing spectral, pixel-based approaches

Authors: J. Marcus Hughes, Vicki W. Hsu, Daniel B. Seaton, Hazel M. Bain, Jonathan M. Darnel, Larisza Krista

Abstract: In order to utilize solar imagery for real-time feature identification and large-scale data science investigations of solar structures, we need maps of the Sun where phenomena, or themes, are labeled. Since solar imagers produce observations every few minutes, it is not feasible to label all images by hand. Here, we compare three machine learning algorithms performing solar image classification using extreme ultraviolet and Hydrogen-alpha images: a maximum likelihood model assuming a single normal probability distribution for each theme from Rigler et al. (2012), a maximum-likelihood model with an underlying Gaussian mixtures distribution, and a random forest model. We create a small database of expert-labeled maps to train and test these algorithms. Due to the ambiguity between the labels created by different experts, a collaborative labeling is used to include all inputs. We find the random forest algorithm performs the best amongst the three algorithms. The advantages of this algorithm are best highlighted in: comparison of outputs to hand-drawn maps; response to short-term variability; and tracking long-term changes on the Sun. Our work indicates that the next generation of solar image classification algorithms would benefit significantly from using spatial structure recognition, compared to only using spectral, pixel-by-pixel brightness distributions.

Submitted 30 September, 2019; originally announced October 2019.
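The first baseline above classifies each pixel independently by maximum likelihood under one normal distribution per theme. Below is a minimal sketch of that kind of spectral, pixel-by-pixel classifier. It makes a simplifying assumption not stated in the abstract: the channels are treated as independent, so the covariance is diagonal. The theme names and pixel values are hypothetical toy data. Each pixel's vector holds its brightness in each imaging channel.

```python
import math


def fit_theme_gaussians(pixels_by_theme):
    """Fit one normal distribution per theme (diagonal covariance assumed).
    pixels_by_theme maps a theme name to a list of per-pixel channel vectors."""
    params = {}
    for theme, pixels in pixels_by_theme.items():
        n = len(pixels)
        dim = len(pixels[0])
        means = [sum(p[d] for p in pixels) / n for d in range(dim)]
        # Maximum-likelihood variances, floored so a constant channel stays valid.
        variances = [
            max(sum((p[d] - means[d]) ** 2 for p in pixels) / n, 1e-6)
            for d in range(dim)
        ]
        params[theme] = (means, variances)
    return params


def log_likelihood(x, means, variances):
    """Log density of a pixel vector under an independent-channel normal model."""
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, means, variances)
    )


def classify_pixel(x, params):
    """Assign the theme whose fitted distribution gives the pixel the highest likelihood."""
    return max(params, key=lambda theme: log_likelihood(x, *params[theme]))
```

Because every pixel is scored on its own channel values alone, this kind of model ignores spatial structure. That is the limitation the abstract's conclusion points to.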

arXiv:1909.08950 [pdf, other]

Count, Crop and Recognise: Fine-Grained Recognition in the Wild

Authors: Max Bain, Arsha Nagrani, Daniel Schofield, Andrew Zisserman

Abstract: The goal of this paper is to label all the animal individuals present in every frame of a video. Unlike previous methods that have principally concentrated on labelling face tracks, we aim to label individuals even when their faces are not visible. We make the following contributions: (i) we introduce a 'Count, Crop and Recognise' (CCR) multistage recognition process for frame level labelling. The Count and Recognise stages involve specialised CNNs for the task, and we show that this simple staging gives a substantial boost in performance; (ii) we compare the recall using frame based labelling to both face and body track based labelling, and demonstrate the advantage of frame based with CCR for the specified goal; (iii) we introduce a new dataset for chimpanzee recognition in the wild; and (iv) we apply a high-granularity visualisation technique to further understand the learned CNN features for the recognition of chimpanzee individuals.

Submitted 9 October, 2019; v1 submitted 19 September, 2019; originally announced September 2019.

arXiv:1903.06944 [pdf]

Astro2020 Science White Paper: Synoptic Studies of the Sun as a Key to Understanding Stellar Astrospheres

Authors: Valentin Martinez Pillet, Frank Hill, Heidi Hammel, Dr. Alfred G. de Wijn, Sanjay Gosain, Joan Burkepile, Carl J. Henney, James R. T. McAteer, Hazel M. Bain, Ward B. Manchester IV, Haosheng Lin, Markus Roth, Kiyoshi Ichimoto, Yoshinori Suematsu

Abstract: Ground-based solar observations provide key contextual data (i.e., the 'big picture') to produce a complete description of the only astrosphere we can study in situ: our Sun's heliosphere. The next decade will see the beginning of operations of the Daniel K. Inouye Solar Telescope (DKIST). DKIST will join NASA's Parker Solar Probe and the NASA/ESA Solar Orbiter mission, which together will study our Sun's atmosphere in unprecedented detail. This white paper outlines the current paradigm for ground-based solar synoptic observations, and indicates those areas that will benefit from focused attention.

Submitted 16 March, 2019; originally announced March 2019.

arXiv:1807.00595 [pdf, other]

Logical Explanations for Deep Relational Machines Using Relevance Information

Authors: Ashwin Srinivasan, Lovekesh Vig, Michael Bain

Abstract: Our interest in this paper is in the construction of symbolic explanations for predictions made by a deep neural network. We will focus attention on deep relational machines (DRMs, first proposed by H. Lodhi). A DRM is a deep network in which the input layer consists of Boolean-valued functions (features) that are defined in terms of relations provided as domain, or background, knowledge. Our DRMs differ from those proposed by Lodhi, which use an Inductive Logic Programming (ILP) engine to first select features (we use random selections from a space of features that satisfies some approximate constraints on logical relevance and non-redundancy). But why do the DRMs predict what they do? One way of answering this is the LIME setting, in which readable proxies are constructed for a black-box predictor. The proxies are intended only to model the predictions of the black-box in local regions of the instance-space. But readability alone may not be enough: to be understandable, the local models must use relevant concepts in a meaningful manner. We investigate the use of a Bayes-like approach to identify logical proxies for local predictions of a DRM. We show: (a) DRMs with our randomised propositionalization method achieve state-of-the-art predictive performance; (b) Models in first-order logic can approximate the DRM's prediction closely in a small local region; and (c) Expert-provided relevance information can play the role of a prior to distinguish between logical explanations that perform equivalently on prediction alone.

Submitted 2 July, 2018; originally announced July 2018.

arXiv:1709.09890 [pdf, other]

B-CNN: Branch Convolutional Neural Network for Hierarchical Classification

Authors: Xinqi Zhu, Michael Bain

Abstract: Convolutional Neural Network (CNN) image classifiers are traditionally designed to have sequential convolutional layers with a single output layer. This is based on the assumption that all target classes should be treated equally and exclusively. However, some classes can be more difficult to distinguish than others, and classes may be organized in a hierarchy of categories. At the same time, a CNN is designed to learn internal representations that abstract from the input data based on its hierarchical layered structure. So it is natural to ask if an inverse of this idea can be applied to learn a model that can predict over a classification hierarchy using multiple output layers in decreasing order of class abstraction. In this paper, we introduce a variant of the traditional CNN model named the Branch Convolutional Neural Network (B-CNN). A B-CNN model outputs multiple predictions ordered from coarse to fine along the concatenated convolutional layers corresponding to the hierarchical structure of the target classes, which can be regarded as a form of prior knowledge on the output. To train B-CNNs, we introduce a novel training strategy, the Branch Training strategy (BT-strategy), which balances the strictness of the prior with the freedom to adjust parameters on the output layers to minimize the loss.
In this way we show that CNN-based models can be forced to learn successively coarse to fine concepts in the internal layers at the output stage, and that hierarchical prior knowledge can be adopted to boost CNN models' classification performance. Our models are evaluated to show that the B-CNN extensions improve over the corresponding baseline CNN on the benchmark datasets MNIST, CIFAR-10 and CIFAR-100.

Submitted 5 October, 2017; v1 submitted 28 September, 2017; originally announced September 2017.

Comments: 9 pages, 8 figures
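One way to read the BT-strategy is as a multi-output loss whose weights move from the coarse branch to the fine branch during training. The minimal sketch below rests on two assumptions. First, each branch contributes a cross-entropy term to a weighted sum. Second, the strategy is expressed as an epoch-indexed schedule of those weights. The schedule values and all names are hypothetical illustrations, not the paper's settings.

```python
import math


def cross_entropy(probs, target):
    """Negative log-likelihood of the target class under a probability vector."""
    return -math.log(max(probs[target], 1e-12))


def branch_loss(level_probs, level_targets, loss_weights):
    """Weighted sum of per-branch cross-entropies, ordered coarse to fine."""
    assert len(level_probs) == len(level_targets) == len(loss_weights)
    return sum(
        w * cross_entropy(p, t)
        for p, t, w in zip(level_probs, level_targets, loss_weights)
    )


# Hypothetical schedule: (start epoch, [coarse, middle, fine] loss weights).
# Early epochs emphasise the coarse branch; later epochs shift to the fine one.
HYPOTHETICAL_SCHEDULE = [
    (0, [0.98, 0.01, 0.01]),
    (10, [0.10, 0.80, 0.10]),
    (20, [0.10, 0.20, 0.70]),
    (30, [0.00, 0.00, 1.00]),
]


def weights_for_epoch(epoch, schedule=HYPOTHETICAL_SCHEDULE):
    """Return the loss weights of the last schedule entry whose start epoch has passed."""
    current = schedule[0][1]
    for start, weights in schedule:
        if epoch >= start:
            current = weights
    return current
```

In a real training loop the per-branch probabilities would come from the B-CNN's output layers. Only the weighting logic is shown here.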

arXiv:1709.05778 [pdf, other]

Word Vector Enrichment of Low Frequency Words in the Bag-of-Words Model for Short Text Multi-class Classification Problems

Authors: Bradford Heap, Michael Bain, Wayne Wobcke, Alfred Krzywicki, Susanne Schmeidl

Abstract: The bag-of-words model is a standard representation of text for many linear classifier learners. In many problem domains, linear classifiers are preferred over more complex models due to their efficiency, robustness and interpretability, and the bag-of-words text representation can capture sufficient information for linear classifiers to make highly accurate predictions. However, in settings where there is a large vocabulary, large variance in the frequency of terms in the training corpus, many classes and very short text (e.g., single sentences or document titles), the bag-of-words representation becomes extremely sparse, and this can reduce the accuracy of classifiers. A particular issue in such settings is that short texts tend to contain infrequently occurring or rare terms which lack class-conditional evidence. In this work we introduce a method for enriching the bag-of-words model by complementing such rare term information with related terms from both general and domain-specific Word Vector models. By reducing sparseness in the bag-of-words models, our enrichment approach achieves improved classification over several baseline classifiers in a variety of text classification problems. Our approach is also efficient because it requires no change to the linear classifier before or during training, since bag-of-words enrichment applies only to text being classified.

Submitted 18 September, 2017; originally announced September 2017.

Comments: 8 pages

ACM Class: I.2.7; I.2.6
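Here is a minimal sketch of this kind of enrichment. It assumes a term counts as "rare" when its training-corpus document frequency falls below a threshold. Its `k` nearest neighbours by cosine similarity in a word-vector space are then added to the bag with a fractional weight. The threshold, `k`, the weight and the toy vectors are hypothetical. The sketch uses a single vector model, whereas the abstract combines general and domain-specific ones. Only the text being classified is modified, so the trained classifier is left unchanged.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two dense vectors (0.0 if either is all zeros)."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0 or nb == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def enrich_bow(tokens, doc_freq, word_vectors, rare_threshold=2, k=2, weight=0.5):
    """Hypothetical sketch: build a bag of words and complement each rare token
    with its k nearest neighbours in the word-vector space."""
    bow = Counter()
    for tok in tokens:
        bow[tok] += 1.0
    for tok in set(tokens):
        # Only rare tokens that have a vector are enriched.
        if doc_freq.get(tok, 0) >= rare_threshold or tok not in word_vectors:
            continue
        neighbours = sorted(
            (w for w in word_vectors if w != tok),
            key=lambda w: cosine(word_vectors[tok], word_vectors[w]),
            reverse=True,
        )[:k]
        for n in neighbours:
            bow[n] += weight
    return bow
```

A brute-force neighbour search like this is only practical for small vocabularies. A real system would presumably precompute neighbours or use an approximate index.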

arXiv:1609.08558 [pdf, other]

doi 10.1117/12.2233859

First flight of the Gamma-Ray Imager/Polarimeter for Solar flares (GRIPS) instrument

Authors: Nicole Duncan, P. Saint-Hilaire, A. Y. Shih, G. J. Hurford, H. M. Bain, M. Amman, B. A. Mochizuki, J. Hoberman, J. Olson, B. A. Maruca, N. M. Godbole, D. M. Smith, J. Sample, N. A. Kelley, A. Zoglauer, A. Caspi, P. Kaufmann, S. Boggs, R. P. Lin

Abstract: The Gamma-Ray Imager/Polarimeter for Solar flares (GRIPS) is a balloon-borne telescope designed to study solar-flare particle acceleration and transport. We describe GRIPS's first Antarctic long-duration flight in Jan 2016 and report preliminary calibration and science results. Electron and ion dynamics, particle abundances and the ambient plasma conditions in solar flares can be understood by examining hard X-ray (HXR) and gamma-ray emission (20 keV to 10 MeV) with enhanced imaging, spectroscopy and polarimetry. GRIPS is specifically designed to answer questions including: What causes the spatial separation between energetic electrons producing HXRs and energetic ions producing gamma-ray lines? How anisotropic are the relativistic electrons, and why can they dominate in the corona? How do the compositions of accelerated and ambient material vary with space and time, and why? GRIPS's key technological improvements over the Reuven Ramaty High Energy Solar Spectroscopic Imager (RHESSI) include 3D position-sensitive germanium detectors (3D-GeDs) and a single-grid, multi-pitch rotating modulator (MPRM) collimator. The 3D-GeDs have spectral FWHM resolution of a few keV and spatial resolution <1 mm³. For photons that Compton scatter, usually ≳150 keV, the energy deposition sites can be tracked, providing polarization measurements as well as enhanced background reduction. The MPRM single-grid design provides twice the throughput of a bi-grid imaging system like RHESSI.
The grid is composed of 2.5 cm thick W/Cu slats with 1-13 mm variable slit pitch, achieving quasi-continuous FWHM angular coverage over 12.5-162 arcsecs. This resolution is capable of imaging the separate magnetic loop footpoint emissions in a variety of flare sizes. (Abstract edited down from source.)

Submitted 27 September, 2016; originally announced September 2016.

Comments: 17 pages, 15 figures; presented at SPIE 9905 (Space Telescopes and Instrumentation 2016: Ultraviolet to Gamma Ray) in Edinburgh, Scotland

Journal ref: Proc. SPIE 9905, Space Telescopes and Instrumentation 2016: Ultraviolet to Gamma Ray, 99052Q (July 18, 2016)

arXiv:1406.4919 [pdf, other]

doi 10.1007/s11207-014-0585-8

Bridging EUV and white-light observations to inspect the initiation phase of a "two-stage" solar eruptive event

Authors: Jason P. Byrne, Huw Morgan, Dan B. Seaton, Hazel M. Bain, Shadia R. Habbal

Abstract: The initiation phase of CMEs is a very important aspect of solar physics, as these phenomena ultimately drive space weather in the heliosphere. This phase is known to occur between the photosphere and low corona, where many models introduce an instability and/or magnetic reconnection that triggers a CME, often with associated flaring activity. To this end, it is important to obtain a variety of observations of the low corona in order to build as clear a picture as possible of the dynamics that occur therein. Here, we combine the EUV imagery of the SWAP instrument on board PROBA2 with the white-light imagery of the ground-based Mk4 coronameter at MLSO in order to bridge the observational gap that exists between the disk imagery of AIA on board SDO and the coronal imagery of LASCO on board SOHO. Methods of multiscale image analysis were applied to the observations to better reveal the coronal signal while suppressing noise and other features. This allowed an investigation into the initiation phase of a CME that was driven by a rising flux rope structure from a "two-stage" flaring active region underlying an extended helmet streamer. It was found that the initial outward motion of the erupting loop system in the EUV observations coincided with the first X-ray flare peak, and led to a plasma pile-up of the white-light CME core material. The characterized CME core then underwent a strong jerk in its motion, as the early acceleration increased abruptly, simultaneous with the second X-ray flare peak.
The overall system expanded into the helmet streamer to become the larger CME structure observed in the LASCO coronagraph images, which later became concave-outward in shape. Theoretical models for the event are discussed in light of these unique observations, and it is concluded that the formation of either a kink-unstable or torus-unstable flux rope may be the likeliest scenario.

Submitted 18 June, 2014; originally announced June 2014.

Comments: 21 pages, 9 figures, 3 movies

arXiv:1202.2375 [pdf, other]

doi 10.1088/0004-637X/748/1/66

The 2010 August 01 type II burst: A CME-CME Interaction, and its radio and white-light manifestations

Authors: Juan Carlos Martínez Oliveros, Claire L. Raftery, Hazel M. Bain, Ying Liu, Vratislav Krupar, Stuart Bale, Säm Krucker

Abstract: We present observational results of a type II burst associated with a CME-CME interaction observed in the radio and white-light wavelength range. We applied radio direction-finding techniques to observations from the STEREO and Wind spacecraft, the results of which were interpreted using white-light coronagraphic measurements for context. The results of the multiple radio direction-finding techniques applied were found to be consistent both with each other and with those derived from the white-light observations of coronal mass ejections (CMEs). The results suggest that the type II burst radio emission is causally related to the CME-CME interaction.

Submitted 10 February, 2012; originally announced February 2012.

Comments: 7 pages, 6 figures, Accepted to ApJ: January 16, 2012

Showing 1–32 of 32 results for author: Bain, M