Can Long-Context Language Models Subsume Retrieval, RAG, SQL, and More?

Authors: Jinhyuk Lee, Anthony Chen, Zhuyun Dai, Dheeru Dua, Devendra Singh Sachan, Michael Boratko, Yi Luan, Sébastien M. R. Arnold, Vincent Perot, Siddharth Dalmia, Hexiang Hu, Xudong Lin, Panupong Pasupat, Aida Amini, Jeremy R. Cole, Sebastian Riedel, Iftekhar Naim, Ming-Wei Chang, Kelvin Guu

Abstract: Long-context language models (LCLMs) have the potential to revolutionize our approach to tasks traditionally reliant on external tools like retrieval systems or databases. Leveraging LCLMs' ability to natively ingest and process entire corpora of information offers numerous advantages. It enhances user-friendliness by eliminating the need for specialized knowledge of tools, provides robust end-to-end modeling that minimizes cascading errors in complex pipelines, and allows for the application of sophisticated prompting techniques across the entire system. To assess this paradigm shift, we introduce LOFT, a benchmark of real-world tasks requiring context up to millions of tokens designed to evaluate LCLMs' performance on in-context retrieval and reasoning. Our findings reveal LCLMs' surprising ability to rival state-of-the-art retrieval and RAG systems, despite never having been explicitly trained for these tasks. However, LCLMs still face challenges in areas like compositional reasoning that are required in SQL-like tasks. Notably, prompting strategies significantly influence performance, emphasizing the need for continued research as context lengths grow. Overall, LOFT provides a rigorous testing ground for LCLMs, showcasing their potential to supplant existing paradigms and tackle novel tasks as model capabilities scale.

Submitted 18 June, 2024; originally announced June 2024.

Comments: 29 pages. Dataset available at https://github.com/google-deepmind/loft

arXiv:2405.05658 [pdf]

Artificial intelligence for abnormality detection in high volume neuroimaging: a systematic review and meta-analysis

Authors: Siddharth Agarwal, David A. Wood, Mariusz Grzeda, Chandhini Suresh, Munaib Din, James Cole, Marc Modat, Thomas C Booth

Abstract: Purpose: Most studies evaluating artificial intelligence (AI) models that detect abnormalities in neuroimaging are either tested on unrepresentative patient cohorts or are insufficiently well-validated, leading to poor generalisability to real-world tasks. The aim was to determine the diagnostic test accuracy and summarise the evidence supporting the use of AI models performing first-line, high-volume neuroimaging tasks. Methods: Medline, Embase, Cochrane library and Web of Science were searched until September 2021 for studies that temporally or externally validated AI capable of detecting abnormalities in first-line CT or MR neuroimaging. A bivariate random-effects model was used for meta-analysis where appropriate. PROSPERO: CRD42021269563. Results: Only 16 studies were eligible for inclusion. Included studies were not compromised by unrepresentative datasets or inadequate validation methodology. Direct comparison with radiologists was available in 4/16 studies. 15/16 had a high risk of bias. Meta-analysis was only suitable for intracranial haemorrhage detection in CT imaging (10/16 studies), where AI systems had a pooled sensitivity and specificity of 0.90 (95% CI 0.85 - 0.94) and 0.90 (95% CI 0.83 - 0.95) respectively. Other AI studies using CT and MRI detected target conditions other than haemorrhage (2/16), or multiple target conditions (4/16). Only 3/16 studies implemented AI in clinical pathways, either for pre-read triage or as post-read discrepancy identifiers.
Conclusion: The paucity of eligible studies reflects that most abnormality detection AI studies were not adequately validated in representative clinical cohorts. The few studies describing how abnormality detection AI could impact patients and clinicians did not explore the full ramifications of clinical implementation.

Submitted 9 May, 2024; originally announced May 2024.

arXiv:2405.02782 [pdf]

A self-supervised text-vision framework for automated brain abnormality detection

Authors: David A. Wood, Emily Guilhem, Sina Kafiabadi, Ayisha Al Busaidi, Kishan Dissanayake, Ahmed Hammam, Nina Mansoor, Matthew Townend, Siddharth Agarwal, Yiran Wei, Asif Mazumder, Gareth J. Barker, Peter Sasieni, Sebastien Ourselin, James H. Cole, Thomas C. Booth

Abstract: Artificial neural networks trained on large, expert-labelled datasets are considered state-of-the-art for a range of medical image recognition tasks. However, categorically labelled datasets are time-consuming to generate and constrain classification to a pre-defined, fixed set of classes. For neuroradiological applications in particular, this represents a barrier to clinical adoption. To address these challenges, we present a self-supervised text-vision framework that learns to detect clinically relevant abnormalities in brain MRI scans by directly leveraging the rich information contained in accompanying free-text neuroradiology reports. Our training approach consisted of two steps. First, a dedicated neuroradiological language model - NeuroBERT - was trained to generate fixed-dimensional vector representations of neuroradiology reports (N = 50,523) via domain-specific self-supervised learning tasks. Next, convolutional neural networks (one per MRI sequence) learnt to map individual brain scans to their corresponding text vector representations by optimising a mean square error loss. Once trained, our text-vision framework can be used to detect abnormalities in unreported brain MRI examinations by scoring scans against suitable query sentences (e.g., 'there is an acute stroke', 'there is hydrocephalus' etc.), enabling a range of classification-based applications including automated triage.
Potentially, our framework could also serve as a clinical decision support tool, not only by suggesting findings to radiologists and detecting errors in provisional reports, but also by retrieving and displaying examples of pathologies from historical examinations that could be relevant to the current case based on textual descriptors.

Submitted 11 June, 2024; v1 submitted 4 May, 2024; originally announced May 2024.

Comments: Under Review

arXiv:2403.20327 [pdf, other]

Gecko: Versatile Text Embeddings Distilled from Large Language Models

Authors: Jinhyuk Lee, Zhuyun Dai, Xiaoqi Ren, Blair Chen, Daniel Cer, Jeremy R. Cole, Kai Hui, Michael Boratko, Rajvi Kapadia, Wen Ding, Yi Luan, Sai Meher Karthik Duddu, Gustavo Hernandez Abrego, Weiqiang Shi, Nithi Gupta, Aditya Kusupati, Prateek Jain, Siddhartha Reddy Jonnalagadda, Ming-Wei Chang, Iftekhar Naim

Abstract: We present Gecko, a compact and versatile text embedding model. Gecko achieves strong retrieval performance by leveraging a key idea: distilling knowledge from large language models (LLMs) into a retriever. Our two-step distillation process begins with generating diverse, synthetic paired data using an LLM. Next, we further refine the data quality by retrieving a set of candidate passages for each query, and relabeling the positive and hard negative passages using the same LLM. The effectiveness of our approach is demonstrated by the compactness of Gecko. On the Massive Text Embedding Benchmark (MTEB), Gecko with 256 embedding dimensions outperforms all existing entries with 768 embedding size. Gecko with 768 embedding dimensions achieves an average score of 66.31, competing with 7x larger models and 5x higher dimensional embeddings.

Submitted 29 March, 2024; originally announced March 2024.

Comments: 18 pages

arXiv:2402.09137 [pdf, other]

Semi-Supervised Diffusion Model for Brain Age Prediction

Authors: Ayodeji Ijishakin, Sophie Martin, Florence Townend, Federica Agosta, Edoardo Gioele Spinelli, Silvia Basaia, Paride Schito, Yuri Falzone, Massimo Filippi, James Cole, Andrea Malaspina

Abstract: Brain age prediction models have succeeded in predicting clinical outcomes in neurodegenerative diseases, but can struggle with tasks involving faster progressing diseases and low quality data. To enhance their performance, we employ a semi-supervised diffusion model, obtaining a 0.83 (p<0.01) correlation between chronological and predicted age on low quality T1w MR images. This was competitive with state-of-the-art non-generative methods. Furthermore, the predictions produced by our model were significantly associated with survival length (r=0.24, p<0.05) in Amyotrophic Lateral Sclerosis. Thus, our approach demonstrates the value of diffusion-based architectures for the task of brain age prediction.

Submitted 14 February, 2024; originally announced February 2024.

Journal ref: Deep Generative Models for Health Workshop, NeurIPS 2023

arXiv:2312.11805 [pdf, other]

Gemini: A Family of Highly Capable Multimodal Models

Authors: Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Millican, David Silver, Melvin Johnson, Ioannis Antonoglou, Julian Schrittwieser, Amelia Glaese, Jilin Chen, Emily Pitler, Timothy Lillicrap, Angeliki Lazaridou, Orhan Firat, James Molloy, Michael Isard, Paul R. Barham, Tom Hennigan, Benjamin Lee, et al. (1325 additional authors not shown)

Abstract: This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultra model advances the state of the art in 30 of 32 of these benchmarks - notably being the first model to achieve human-expert performance on the well-studied exam benchmark MMLU, and improving the state of the art in every one of the 20 multimodal benchmarks we examined. We believe that the new capabilities of the Gemini family in cross-modal reasoning and language understanding will enable a wide variety of use cases. We discuss our approach toward post-training and deploying Gemini models responsibly to users through services including Gemini, Gemini Advanced, Google AI Studio, and Cloud Vertex AI.

Submitted 17 June, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

arXiv:2310.08464 [pdf, other]

Crowdsourced and Automatic Speech Prominence Estimation

Authors: Max Morrison, Pranav Pawar, Nathan Pruyne, Jennifer Cole, Bryan Pardo

Abstract: The prominence of a spoken word is the degree to which an average native listener perceives the word as salient or emphasized relative to its context. Speech prominence estimation is the process of assigning a numeric value to the prominence of each word in an utterance. These prominence labels are useful for linguistic analysis, as well as training automated systems to perform emphasis-controlled text-to-speech or emotion recognition. Manually annotating prominence is time-consuming and expensive, which motivates the development of automated methods for speech prominence estimation. However, developing such an automated system using machine-learning methods requires human-annotated training data. Using our system for acquiring such human annotations, we collect and open-source crowdsourced annotations of a portion of the LibriTTS dataset. We use these annotations as ground truth to train a neural speech prominence estimator that generalizes to unseen speakers, datasets, and speaking styles. We investigate design decisions for neural prominence estimation as well as how neural prominence estimation improves as a function of two key factors of annotation cost: dataset size and the number of annotations per utterance.

Submitted 22 December, 2023; v1 submitted 12 October, 2023; originally announced October 2023.

Comments: Published as a conference paper at ICASSP 2024

arXiv:2307.07072 [pdf, other]

Rician likelihood loss for quantitative MRI using self-supervised deep learning

Authors: Christopher S. Parker, Anna Schroder, Sean C. Epstein, James Cole, Daniel C. Alexander, Hui Zhang

Abstract: Purpose: Previous quantitative MR imaging studies using self-supervised deep learning have reported biased parameter estimates at low SNR. Such systematic errors arise from the choice of Mean Squared Error (MSE) loss function for network training, which is incompatible with Rician-distributed MR magnitude signals. To address this issue, we introduce the negative log Rician likelihood (NLR) loss. Methods: A numerically stable and accurate implementation of the NLR loss was developed to estimate quantitative parameters of the apparent diffusion coefficient (ADC) model and intra-voxel incoherent motion (IVIM) model. Parameter estimation accuracy, precision and overall error were evaluated in terms of bias, variance and root mean squared error and compared against the MSE loss over a range of SNRs (5 - 30). Results: Networks trained with NLR loss show higher estimation accuracy than MSE for the ADC and IVIM diffusion coefficients as SNR decreases, with minimal loss of precision or total error. At high effective SNR (high SNR and small diffusion coefficients), both losses show comparable accuracy and precision for all parameters of both models. Conclusion: The proposed NLR loss is numerically stable and accurate across the full range of tested SNRs and improves parameter estimation accuracy of diffusion coefficients using self-supervised deep learning. We expect the development to benefit quantitative MR imaging techniques broadly, enabling more accurate parameter estimation from noisy data.

Submitted 13 July, 2023; originally announced July 2023.

Comments: 16 pages, 6 figures

arXiv:2306.03022 [pdf, other]

Interpretable Alzheimer's Disease Classification Via a Contrastive Diffusion Autoencoder

Authors: Ayodeji Ijishakin, Ahmed Abdulaal, Adamos Hadjivasiliou, Sophie Martin, James Cole

Abstract: In visual object classification, humans often justify their choices by comparing objects to prototypical examples within that class. We may therefore increase the interpretability of deep learning models by imbuing them with a similar style of reasoning. In this work, we apply this principle by classifying Alzheimer's Disease based on the similarity of images to training examples within the latent space. We use a contrastive loss combined with a diffusion autoencoder backbone, to produce a semantically meaningful latent space, such that neighbouring latents have similar image-level features. We achieve a classification accuracy comparable to black box approaches on a dataset of 2D MRI images, whilst producing human interpretable model explanations. Therefore, this work stands as a contribution to the pertinent development of accurate and interpretable deep learning within medical imaging.

Submitted 25 October, 2023; v1 submitted 5 June, 2023; originally announced June 2023.

Journal ref: ICML (2023), 3rd Workshop on Interpretable Machine Learning in Healthcare (IMLH)

arXiv:2305.14613 [pdf, other]

Selectively Answering Ambiguous Questions

Authors: Jeremy R. Cole, Michael J. Q. Zhang, Daniel Gillick, Julian Martin Eisenschlos, Bhuwan Dhingra, Jacob Eisenstein

Abstract: Trustworthy language models should abstain from answering questions when they do not know the answer. However, the answer to a question can be unknown for a variety of reasons. Prior research has focused on the case in which the question is clear and the answer is unambiguous but possibly unknown, but the answer to a question can also be unclear due to uncertainty of the questioner's intent or context. We investigate question answering from this perspective, focusing on answering a subset of questions with a high degree of accuracy, from a set of questions in which many are inherently ambiguous. In this setting, we find that the most reliable approach to decide when to abstain involves quantifying repetition within sampled model outputs, rather than the model's likelihood or self-verification as used in prior work. We find this to be the case across different types of uncertainty and model scales, and with or without instruction tuning. Our results suggest that sampling-based confidence scores help calibrate answers to relatively unambiguous questions, with more dramatic improvements on ambiguous questions.

Submitted 14 November, 2023; v1 submitted 23 May, 2023; originally announced May 2023.

Comments: To appear in EMNLP 2023. 9 pages, 5 figures, 2 pages of appendix

arXiv:2305.14499 [pdf, other]

NAIL: Lexical Retrieval Indices with Efficient Non-Autoregressive Decoders

Authors: Livio Baldini Soares, Daniel Gillick, Jeremy R. Cole, Tom Kwiatkowski

Abstract: Neural document rerankers are extremely effective in terms of accuracy. However, the best models require dedicated hardware for serving, which is costly and often not feasible. To avoid this serving-time requirement, we present a method of capturing up to 86% of the gains of a Transformer cross-attention model with a lexicalized scoring function that only requires 10^-6% of the Transformer's FLOPs per document and can be served using commodity CPUs. When combined with a BM25 retriever, this approach matches the quality of a state-of-the-art dual encoder retriever, that still requires an accelerator for query encoding. We introduce NAIL (Non-Autoregressive Indexing with Language models) as a model architecture that is compatible with recent encoder-decoder and decoder-only large language models, such as T5, GPT-3 and PaLM. This model architecture can leverage existing pre-trained checkpoints and can be fine-tuned for efficiently constructing document representations that do not require neural processing of queries.

Submitted 23 October, 2023; v1 submitted 23 May, 2023; originally announced May 2023.

Comments: To appear at EMNLP 2023

arXiv:2303.12860 [pdf, other]

Salient Span Masking for Temporal Understanding

Authors: Jeremy R. Cole, Aditi Chaudhary, Bhuwan Dhingra, Partha Talukdar

Abstract: Salient Span Masking (SSM) has shown itself to be an effective strategy to improve closed-book question answering performance. SSM extends general masked language model pretraining by creating additional unsupervised training sentences that mask a single entity or date span, thus oversampling factual information. Despite the success of this paradigm, the span types and sampling strategies are relatively arbitrary and not widely studied for other tasks. Thus, we investigate SSM from the perspective of temporal tasks, where learning a good representation of various temporal expressions is important. To that end, we introduce Temporal Span Masking (TSM) intermediate training. First, we find that SSM alone improves the downstream performance on three temporal tasks by an avg. +5.8 points. Further, we are able to achieve additional improvements (avg. +0.29 points) by adding the TSM task. These comprise the new best reported results on the targeted tasks. Our analysis suggests that the effectiveness of SSM stems from the sentences chosen in the training data rather than the mask choice: sentences with entities frequently also contain temporal expressions. Nonetheless, the additional targeted spans of TSM can still improve performance, especially in a zero-shot context.

Submitted 22 March, 2023; originally announced March 2023.

Comments: 5 pages, 1 figure, to appear in EACL 2023

arXiv:2303.00242 [pdf, other]

DIFFQG: Generating Questions to Summarize Factual Changes

Authors: Jeremy R. Cole, Palak Jain, Julian Martin Eisenschlos, Michael J. Q. Zhang, Eunsol Choi, Bhuwan Dhingra

Abstract: Identifying the difference between two versions of the same article is useful to update knowledge bases and to understand how articles evolve. Paired texts occur naturally in diverse situations: reporters write similar news stories and maintainers of authoritative websites must keep their information up to date. We propose representing factual changes between paired documents as question-answer pairs, where the answer to the same question differs between two versions. We find that question-answer pairs can flexibly and concisely capture the updated contents. Provided with paired documents, annotators identify questions that are answered by one passage but answered differently or cannot be answered by the other. We release DIFFQG which consists of 759 QA pairs and 1153 examples of paired passages with no factual change. These questions are intended to be both unambiguous and information-seeking and involve complex edits, pushing beyond the capabilities of current question generation and factual change detection systems. Our dataset summarizes the changes between two versions of the document as questions and answers, studying automatic update summarization in a novel way.

Submitted 1 March, 2023; originally announced March 2023.

Comments: 14 pages. Accepted at EACL 2023 (main, long)

arXiv:2301.00504 [pdf]

Spectral Bandwidth Recovery of Optical Coherence Tomography Images using Deep Learning

Authors: Timothy T. Yu, Da Ma, Jayden Cole, Myeong Jin Ju, Mirza F. Beg, Marinko V. Sarunic

Abstract: Optical coherence tomography (OCT) captures cross-sectional data and is used for the screening, monitoring, and treatment planning of retinal diseases. Technological developments to increase the speed of acquisition often result in systems with a narrower spectral bandwidth, and hence a lower axial resolution. Traditionally, image-processing-based techniques have been utilized to reconstruct subsampled OCT data and more recently, deep-learning-based methods have been explored. In this study, we simulate reduced axial scan (A-scan) resolution by Gaussian windowing in the spectral domain and investigate the use of a learning-based approach for image feature reconstruction. In anticipation of the reduced resolution that accompanies wide-field OCT systems, we build upon super-resolution techniques to explore methods to better aid clinicians in their decision-making to improve patient outcomes, by reconstructing lost features using a pixel-to-pixel approach with an altered super-resolution generative adversarial network (SRGAN) architecture.

Submitted 1 January, 2023; originally announced January 2023.

arXiv:2209.12786 [pdf, other]

Do ever larger octopi still amplify reporting biases? Evidence from judgments of typical colour

Authors: Fangyu Liu, Julian Martin Eisenschlos, Jeremy R. Cole, Nigel Collier

Abstract: Language models (LMs) trained on raw texts have no direct access to the physical world. Gordon and Van Durme (2013) point out that LMs can thus suffer from reporting bias: texts rarely report on common facts, instead focusing on the unusual aspects of a situation. If LMs are only trained on text corpora and naively memorise local co-occurrence statistics, they thus naturally would learn a biased view of the physical world. While prior studies have repeatedly verified that LMs of smaller scales (e.g., RoBERTa, GPT-2) amplify reporting bias, it remains unknown whether such trends continue when models are scaled up. We investigate reporting bias from the perspective of colour in larger language models (LLMs) such as PaLM and GPT-3. Specifically, we query LLMs for the typical colour of objects, which is one simple type of perceptually grounded physical common sense. Surprisingly, we find that LLMs significantly outperform smaller LMs in determining an object's typical colour and more closely track human judgments, instead of overfitting to surface patterns stored in texts. This suggests that very large models of language alone are able to overcome certain types of reporting bias that are characterized by local co-occurrences.

Submitted 26 September, 2022; originally announced September 2022.

Comments: AACL 2022

arXiv:2209.12153 [pdf, other]

WinoDict: Probing language models for in-context word acquisition

Authors: Julian Martin Eisenschlos, Jeremy R. Cole, Fangyu Liu, William W. Cohen

Abstract: We introduce a new in-context learning paradigm to measure Large Language Models' (LLMs) ability to learn novel words during inference. In particular, we rewrite Winograd-style co-reference resolution problems by replacing the key concept word with a synthetic but plausible word that the model must understand to complete the task. Solving this task requires the model to make use of the dictionary definition of the new word given in the prompt. This benchmark addresses word acquisition, one important aspect of the diachronic degradation known to afflict LLMs. As LLMs are frozen in time at the moment they are trained, they are normally unable to reflect the way language changes over time. We show that the accuracy of LLMs compared to the original Winograd tasks decreases radically in our benchmark, thus identifying a limitation of current models and providing a benchmark to measure future improvements in LLMs' ability to do in-context learning.

Submitted 25 September, 2022; originally announced September 2022.

arXiv:2206.13346 [pdf, other]

Distributional Gaussian Processes Layers for Out-of-Distribution Detection

Authors: Sebastian G. Popescu, David J. Sharp, James H. Cole, Konstantinos Kamnitsas, Ben Glocker

Abstract: Machine learning models deployed on medical imaging tasks must be equipped with out-of-distribution detection capabilities in order to avoid erroneous predictions. It is unclear whether out-of-distribution detection models reliant on deep neural networks are suitable for detecting domain shifts in medical imaging. Gaussian Processes can reliably separate in-distribution data points from out-of-distribution data points via their mathematical construction. Hence, we propose a parameter efficient Bayesian layer for hierarchical convolutional Gaussian Processes that incorporates Gaussian Processes operating in Wasserstein-2 space to reliably propagate uncertainty. This directly replaces convolving Gaussian Processes with a distance-preserving affine operator on distributions. Our experiments on brain tissue-segmentation show that the resulting architecture approaches the performance of well-established deterministic segmentation algorithms (U-Net), which has not been achieved with previous hierarchical Gaussian Processes. Moreover, by applying the same segmentation model to out-of-distribution data (i.e., images with pathology such as brain tumors), we show that our uncertainty estimates result in out-of-distribution detection that outperforms the capabilities of previous Bayesian networks and reconstruction-based approaches that learn normative distributions. To facilitate future work, our code is publicly available.

Submitted 27 June, 2022; originally announced June 2022.

Comments: Published in Journal of Machine Learning for Biomedical Imaging: Special Issue: Information Processing in Medical Imaging (IPMI) 2021

arXiv:2203.17019 [pdf, other]

DeepFry: Identifying Vocal Fry Using Deep Neural Networks

Authors: Bronya R. Chernyak, Talia Ben Simon, Yael Segal, Jeremy Steffman, Eleanor Chodroff, Jennifer S. Cole, Joseph Keshet

Abstract: Vocal fry or creaky voice refers to a voice quality characterized by irregular glottal opening and low pitch. It occurs in diverse languages and is prevalent in American English, where it is used not only to mark phrase finality, but also sociolinguistic factors and affect. Due to its irregular periodicity, creaky voice challenges automatic speech processing and recognition systems, particularly for languages where creak is frequently used. This paper proposes a deep learning model to detect creaky voice in fluent speech. The model is composed of an encoder and a classifier trained together. The encoder takes the raw waveform and learns a representation using a convolutional neural network. The classifier is implemented as a multi-headed fully-connected network trained to detect creaky voice, voicing, and pitch, where the last two are used to refine creak prediction. The model is trained and tested on speech of American English speakers, annotated for creak by trained phoneticians. We evaluated the performance of our system using two encoders: one is tailored for the task, and the other is based on a state-of-the-art unsupervised representation. Results suggest our best-performing system has improved recall and F1 scores compared to previous methods on unseen data.

Submitted 26 June, 2022; v1 submitted 31 March, 2022; originally announced March 2022.

Comments: Accepted to Interspeech 2022

arXiv:2201.10553 [pdf]

Online discussion forums for monitoring the need for targeted psychological health support: an observational case study of r/COVID19_support

Authors: Fathima Rushda Balabaskaran, Annabel Jones-Gammon, Rebecca How, Jennifer Cole

Abstract: The COVID-19 pandemic has placed a severe mental strain on people in general, and on young people in particular. Online support forums offer opportunities for peer-to-peer health support, which can ease pressure on professional and established volunteer services when demand is high. Such forums can also be used to monitor at-risk communities to identify concerns and causes of psychological stress. We created and monitored r/COVID19_support, an online forum for people seeking support during the COVID-19 pandemic, on the platform Reddit. We identified posts made by users self-identifying as students or posting about college/university life, then coded these posts to identify emerging themes that related to triggers of psychological anxiety and distress. 147 posts were made to the forum by 111 unique users during the study period. A number of themes were identified by manual coding, including: feelings of grief associated with the loss of college-related life experiences, such as graduation ceremonies or proms; difficulties with focussing on online and self-guided learning; and fears for the future, in particular of graduating into a constrained job market. The identification of specific issues enabled users to be signposted to information to help them cope with or address those particular concerns. Monitoring peer-to-peer forums can help to identify specific issues with which vulnerable groups may require additional support, enabling users to be signposted on to high-quality information to address specific issues.

Submitted 25 January, 2022; originally announced January 2022.

Comments: 27 pages, 1 table, 5 figures

Journal ref: Global Journal of Medicine and Public Health Vol 10 Issue 5 2021

arXiv:2111.14671 [pdf, other]

ClimART: A Benchmark Dataset for Emulating Atmospheric Radiative Transfer in Weather and Climate Models

Authors: Salva Rühling Cachay, Venkatesh Ramesh, Jason N. S. Cole, Howard Barker, David Rolnick

Abstract: Numerical simulations of Earth's weather and climate require substantial amounts of computation. This has led to a growing interest in replacing subroutines that explicitly compute physical processes with approximate machine learning (ML) methods that are fast at inference time. Within weather and climate models, atmospheric radiative transfer (RT) calculations are especially expensive. This has made them a popular target for neural network-based emulators. However, prior work is hard to compare due to the lack of a comprehensive dataset and standardized best practices for ML benchmarking. To fill this gap, we build a large dataset, ClimART, with more than 10 million samples from present, pre-industrial, and future climate conditions, based on the Canadian Earth System Model. ClimART poses several methodological challenges for the ML community, such as multiple out-of-distribution test sets, underlying domain physics, and a trade-off between accuracy and inference speed. We also present several novel baselines that indicate shortcomings of datasets and network architectures used in prior work. Download instructions, baselines, and code are available at: https://github.com/RolnickLab/climart

Submitted 29 November, 2021; originally announced November 2021.

Journal ref: 35th Conference on Neural Information Processing Systems (NeurIPS 2021) Track on Datasets and Benchmarks

arXiv:2109.04587 [pdf, other]

Graph-Based Decoding for Task Oriented Semantic Parsing

Authors: Jeremy R. Cole, Nanjiang Jiang, Panupong Pasupat, Luheng He, Peter Shaw

Abstract: The dominant paradigm for semantic parsing in recent years is to formulate parsing as a sequence-to-sequence task, generating predictions with auto-regressive sequence decoders. In this work, we explore an alternative paradigm. We formulate semantic parsing as a dependency parsing task, applying graph-based decoding techniques developed for syntactic parsing. We compare various decoding techniques given the same pre-trained Transformer encoder on the TOP dataset, including settings where training data is limited or contains only partially-annotated examples. We find that our graph-based approach is competitive with sequence decoders on the standard setting, and offers significant improvements in data efficiency and settings where partially-annotated data is available.

Submitted 9 September, 2021; originally announced September 2021.

Comments: To appear in EMNLP; 5 pages, 4 figures

arXiv:2107.07977 [pdf, other]

An Uncertainty-Aware, Shareable and Transparent Neural Network Architecture for Brain-Age Modeling

Authors: Tim Hahn, Jan Ernsting, Nils R. Winter, Vincent Holstein, Ramona Leenings, Marie Beisemann, Lukas Fisch, Kelvin Sarink, Daniel Emden, Nils Opel, Ronny Redlich, Jonathan Repple, Dominik Grotegerd, Susanne Meinert, Jochen G. Hirsch, Thoralf Niendorf, Beate Endemann, Fabian Bamberg, Thomas Kröncke, Robin Bülow, Henry Völzke, Oyunbileg von Stackelberg, Ramona Felizitas Sowade, Lale Umutlu, Börge Schmidt, et al. (9 additional authors not shown)

Abstract: The deviation between chronological age and age predicted from neuroimaging data has been identified as a sensitive risk-marker of cross-disorder brain changes, growing into a cornerstone of biological age-research. However, Machine Learning models underlying the field do not consider uncertainty, thereby confounding results with training data density and variability. Also, existing models are commonly based on homogeneous training sets, often not independently validated, and cannot be shared due to data protection issues. Here, we introduce an uncertainty-aware, shareable, and transparent Monte-Carlo Dropout Composite-Quantile-Regression (MCCQR) Neural Network trained on N=10,691 datasets from the German National Cohort. The MCCQR model provides robust, distribution-free uncertainty quantification in high-dimensional neuroimaging data, achieving lower error rates compared to existing models across ten recruitment centers and in three independent validation samples (N=4,004). In two examples, we demonstrate that it prevents spurious associations and increases power to detect accelerated brain-aging. We make the pre-trained model publicly available.

Submitted 16 July, 2021; originally announced July 2021.

arXiv:2106.15110 [pdf, other]

doi 10.1162/tacl_a_00459

Time-Aware Language Models as Temporal Knowledge Bases

Authors: Bhuwan Dhingra, Jeremy R. Cole, Julian Martin Eisenschlos, Daniel Gillick, Jacob Eisenstein, William W. Cohen

Abstract: Many facts come with an expiration date, from the name of the President to the basketball team Lebron James plays for. But language models (LMs) are trained on snapshots of data collected at a specific moment in time, and this can limit their utility, especially in the closed-book setting where the pretraining corpus must contain the facts the model should memorize. We introduce a diagnostic dataset aimed at probing LMs for factual knowledge that changes over time and highlight problems with LMs at either end of the spectrum -- those trained on specific slices of temporal data, as well as those trained on a wide range of temporal data. To mitigate these problems, we propose a simple technique for jointly modeling text with its timestamp. This improves memorization of seen facts from the training time period, as well as calibration on predictions about unseen facts from future time periods. We also show that models trained with temporal context can be efficiently "refreshed" as new data arrives, without the need for retraining from scratch.

Submitted 23 April, 2022; v1 submitted 29 June, 2021; originally announced June 2021.

Comments: Version accepted to TACL

Journal ref: Transactions of the Association for Computational Linguistics 2022; 10 257-273

arXiv:2106.08176 [pdf, other]

Automated triaging of head MRI examinations using convolutional neural networks

Authors: David A. Wood, Sina Kafiabadi, Ayisha Al Busaidi, Emily Guilhem, Antanas Montvila, Siddharth Agarwal, Jeremy Lynch, Matthew Townend, Gareth Barker, Sebastien Ourselin, James H. Cole, Thomas C. Booth

Abstract: The growing demand for head magnetic resonance imaging (MRI) examinations, along with a global shortage of radiologists, has led to an increase in the time taken to report head MRI scans around the world. For many neurological conditions, this delay can result in increased morbidity and mortality. An automated triaging tool could reduce reporting times for abnormal examinations by identifying abnormalities at the time of imaging and prioritizing the reporting of these scans. In this work, we present a convolutional neural network for detecting clinically-relevant abnormalities in $\text{T}_2$-weighted head MRI scans. Using a validated neuroradiology report classifier, we generated a labelled dataset of 43,754 scans from two large UK hospitals for model training, and demonstrate accurate classification (area under the receiver operating curve (AUC) = 0.943) on a test set of 800 scans labelled by a team of neuroradiologists. Importantly, when trained on scans from only a single hospital the model generalized to scans from the other hospital ($Δ$AUC $\leq$ 0.02). A simulation study demonstrated that our model would reduce the mean reporting time for abnormal examinations from 28 days to 14 days and from 9 days to 5 days at the two hospitals, demonstrating feasibility for use in a clinical triage environment.

Submitted 28 June, 2022; v1 submitted 15 June, 2021; originally announced June 2021.

Comments: Accepted as an oral presentation at Medical Imaging with Deep Learning (MIDL) 2021

arXiv:2105.12235 [pdf]

Acquisition and analysis of crowd-sourced traffic data

Authors: Markus Hilpert, Jenni A. Shearston, Jemaleddin Cole, Steven N. Chillrud, Micaela E. Martinez

Abstract: Crowd-sourced traffic data offer great promise in environmental modeling. However, archives of such traffic data are typically not made available for research; instead, the data must be acquired in real time. The objective of this paper is to present methods we developed for acquiring and analyzing time series of real-time crowd-sourced traffic data. We present scripts, which can be run in Unix/Linux-like computational environments, to automatically download tiles of crowd-sourced Google traffic congestion maps for a user-specifiable region of interest. Broad and international applicability of our method is demonstrated for Manhattan in New York City and Mexico City. We also demonstrate that Google traffic data can be used to quantify decreases in traffic congestion due to social distancing policies implemented to curb the COVID-19 pandemic in the South Bronx in New York City.

Submitted 25 May, 2021; originally announced May 2021.

Comments: 19 pages, 8 figures, 1 table

ACM Class: D.1; J.2; J.3

arXiv:2104.13756 [pdf, other]

Distributional Gaussian Process Layers for Outlier Detection in Image Segmentation

Authors: Sebastian G. Popescu, David J. Sharp, James H. Cole, Konstantinos Kamnitsas, Ben Glocker

Abstract: We propose a parameter efficient Bayesian layer for hierarchical convolutional Gaussian Processes that incorporates Gaussian Processes operating in Wasserstein-2 space to reliably propagate uncertainty. This directly replaces convolving Gaussian Processes with a distance-preserving affine operator on distributions. Our experiments on brain tissue-segmentation show that the resulting architecture approaches the performance of well-established deterministic segmentation algorithms (U-Net), which has never been achieved with previous hierarchical Gaussian Processes. Moreover, by applying the same segmentation model to out-of-distribution data (i.e., images with pathology such as brain tumors), we show that our uncertainty estimates result in out-of-distribution detection that outperforms the capabilities of previous Bayesian networks and reconstruction-based approaches that learn normative distributions.

Submitted 28 April, 2021; originally announced April 2021.

arXiv:2011.08787 [pdf]

The COVID19 infodemic. The role and place of academics in science communication

Authors: Jennifer Cole

Abstract: As the COVID19 pandemic has spread across the world, a concurrent pandemic of information has spread with it. Deemed an infodemic by the World Health Organization, and described as an overabundance of information, some accurate, some not, that occurs during an epidemic, this proliferation of data, research and opinions provides both opportunities and challenges for academics. Academics and scientists have a key role to play in the solutions to the infodemic challenge: as educators, influences and communicators, even where their expertise and experience does not align precisely with the SARS-Cov2 virus and its impacts. Successful communication requires a better understanding of how the public seeks, understands and processes scientific information, however, in order to maximise the ways in which experts engage with traditional and social media and to make sure that such engagement does not add to confusion and misinformation alongside efforts to counter or challenge it. This paper will outline the key advantages to be had from greater engagement with COVID19 discussions, the popular channels through which such discussions take place and through which information is disseminated. It also warns against the common pitfalls those who choose to engage might encounter, whilst stressing that the disadvantages of doing so are far outweighed by the advantages such engagement offers.

Submitted 17 November, 2020; originally announced November 2020.

Comments: 17 Pages

Journal ref: Global Journal of Medicine and Public Health Vol 9 Issue 2 2020

arXiv:2010.14877 [pdf, other]

Hierarchical Gaussian Processes with Wasserstein-2 Kernels

Authors: Sebastian Popescu, David Sharp, James Cole, Ben Glocker

Abstract: Stacking Gaussian Processes severely diminishes the model's ability to detect outliers, which when combined with non-zero mean functions, further extrapolates low non-parametric variance to low training data density regions. We propose a hybrid kernel inspired from Varifold theory, operating in both Euclidean and Wasserstein space. We posit that directly taking into account the variance in the computation of Wasserstein-2 distances is of key importance towards maintaining outlier status throughout the hierarchy. We show improved performance on medium and large scale datasets and enhanced out-of-distribution detection on both toy and real data.

Submitted 1 February, 2022; v1 submitted 28 October, 2020; originally announced October 2020.

arXiv:2007.04226 [pdf, other]

Labelling imaging datasets on the basis of neuroradiology reports: a validation study

Authors: David A. Wood, Sina Kafiabadi, Aisha Al Busaidi, Emily Guilhem, Jeremy Lynch, Matthew Townend, Antanas Montvila, Juveria Siddiqui, Naveen Gadapa, Matthew Benger, Gareth Barker, Sebastian Ourselin, James H. Cole, Thomas C. Booth

Abstract: Natural language processing (NLP) shows promise as a means to automate the labelling of hospital-scale neuroradiology magnetic resonance imaging (MRI) datasets for computer vision applications. To date, however, there has been no thorough investigation into the validity of this approach, including determining the accuracy of report labels compared to image labels as well as examining the performance of non-specialist labellers. In this work, we draw on the experience of a team of neuroradiologists who labelled over 5000 MRI neuroradiology reports as part of a project to build a dedicated deep learning-based neuroradiology report classifier. We show that, in our experience, assigning binary labels (i.e. normal vs abnormal) to images from reports alone is highly accurate. In contrast to the binary labels, however, the accuracy of more granular labelling is dependent on the category, and we highlight reasons for this discrepancy. We also show that downstream model performance is reduced when labelling of training reports is performed by a non-specialist. To allow other researchers to accelerate their research, we make our refined abnormality definitions and labelling rules available, as well as our easy-to-use radiology report labelling app which helps streamline this process.

Submitted 8 March, 2021; v1 submitted 8 July, 2020; originally announced July 2020.

arXiv:2004.11123 [pdf]

doi 10.1016/j.jhydrol.2020.125126

Imputation of missing sub-hourly precipitation data in a large sensor network: a machine learning approach

Authors: Benedict Delahaye Chivers, John Wallbank, Steven J. Cole, Ondrej Sebek, Simon Stanley, Matthew Fry, Georgios Leontidis

Abstract: Precipitation data collected at sub-hourly resolution represents specific challenges for missing data recovery by being largely stochastic in nature and highly unbalanced in the duration of rain vs non-rain. Here we present a two-step analysis utilising current machine learning techniques for imputing precipitation data sampled at 30-minute intervals by devolving the task into (a) the classification of rain or non-rain samples, and (b) regressing the absolute values of predicted rain samples. Investigating 37 weather stations in the UK, this machine learning process produces more accurate predictions for recovering precipitation data than an established surface fitting technique utilising neighbouring rain gauges. Increasing available features for the training of machine learning algorithms increases performance with the integration of weather data at the target site with externally sourced rain gauges providing the highest performance. This method informs machine learning models by utilising information in concurrently collected environmental data to make accurate predictions of missing rain data. Capturing complex non-linear relationships from weakly correlated variables is critical for data recovery at sub-hourly resolutions. Such pipelines for data recovery can be developed and deployed for highly automated and near instantaneous imputation of missing values in ongoing datasets at high temporal resolutions.

Submitted 2 May, 2020; v1 submitted 30 March, 2020; originally announced April 2020.

Comments: 24 pages, 7 figures, 5 tables

Journal ref: Journal of Hydrology 2020

arXiv:2002.06588 [pdf, other]

Automated Labelling using an Attention model for Radiology reports of MRI scans (ALARM)

Authors: David A. Wood, Jeremy Lynch, Sina Kafiabadi, Emily Guilhem, Aisha Al Busaidi, Antanas Montvila, Thomas Varsavsky, Juveria Siddiqui, Naveen Gadapa, Matthew Townend, Martin Kiik, Keena Patel, Gareth Barker, Sebastian Ourselin, James H. Cole, Thomas C. Booth

Abstract: Labelling large datasets for training high-capacity neural networks is a major obstacle to the development of deep learning-based medical imaging applications. Here we present a transformer-based network for magnetic resonance imaging (MRI) radiology report classification which automates this task by assigning image labels on the basis of free-text expert radiology reports. Our model's performance is comparable to that of an expert radiologist, and better than that of an expert physician, demonstrating the feasibility of this approach. We make code available online for researchers to label their own MRI datasets for medical imaging applications.

Submitted 16 February, 2020; originally announced February 2020.

arXiv:1910.04721 [pdf, other]

NEURO-DRAM: a 3D recurrent visual attention model for interpretable neuroimaging classification

Authors: David Wood, James Cole, Thomas Booth

Abstract: Deep learning is attracting significant interest in the neuroimaging community as a means to diagnose psychiatric and neurological disorders from structural magnetic resonance images. However, there is a tendency amongst researchers to adopt architectures optimized for traditional computer vision tasks, rather than design networks customized for neuroimaging data. We address this by introducing NEURO-DRAM, a 3D recurrent visual attention model tailored for neuroimaging classification. The model comprises an agent which, trained by reinforcement learning, learns to navigate through volumetric images, selectively attending to the most informative regions for a given task. When applied to Alzheimer's disease prediction, NEURO-DRAM achieves state-of-the-art classification accuracy on an out-of-sample dataset, significantly outperforming a baseline convolutional neural network. When further applied to the task of predicting which patients with mild cognitive impairment will be diagnosed with Alzheimer's disease within two years, the model achieves state-of-the-art accuracy with no additional training. Encouragingly, the agent learns, without explicit instruction, a search policy in agreement with standardized radiological hallmarks of Alzheimer's disease, suggesting a route to automated biomarker discovery for more poorly understood disorders.

Submitted 18 October, 2019; v1 submitted 10 October, 2019; originally announced October 2019.

Comments: Improved network figure

arXiv:1901.08180 [pdf]

Vision-based Obstacle Removal System for Autonomous Ground Vehicles Using a Robotic Arm

Authors: Khashayar Asadi, Rahul Jain, Ziqian Qin, Mingda Sun, Mojtaba Noghabaei, Jeremy Cole, Kevin Han, Edgar Lobaton

Abstract: Over the past few years, the use of camera-equipped robotic platforms for data collection and visually monitoring applications has exponentially grown. Cluttered construction sites with many objects (e.g., bricks, pipes, etc.) on the ground are challenging environments for a mobile unmanned ground vehicle (UGV) to navigate. To address this issue, this study presents a mobile UGV equipped with a stereo camera and a robotic arm that can remove obstacles along the UGV's path. To achieve this objective, the surrounding environment is captured by the stereo camera and obstacles are detected. The obstacle's relative location to the UGV is sent to the robotic arm module through Robot Operating System (ROS). Then, the robotic arm picks up and removes the obstacle. The proposed method will greatly enhance the degree of automation and the frequency of data collection for construction monitoring. The proposed system is validated through two case studies. The results successfully demonstrate the detection and removal of obstacles, serving as one of the enabling factors for developing an autonomous UGV with various construction operating applications.

Submitted 23 January, 2019; originally announced January 2019.

Comments: The 2019 ASCE International Conference on Computing in Civil Engineering

arXiv:1810.12646 [pdf, other]

Prosodic entrainment in dialog acts

Authors: Uwe D. Reichel, Katalin Mády, Jennifer Cole

Abstract: We examined prosodic entrainment in spoken dialogs separately for several dialog acts in cooperative and competitive games. Entrainment was measured for intonation features derived from a superpositional intonation stylization as well as for rhythm features. The found differences can be related to the cooperative or competitive nature of the game, as well as to dialog act properties as its intrinsic authority, supportiveness and distributional characteristics. In cooperative games dialog acts with a high authority given by knowledge and with a high frequency showed the most entrainment. The results are discussed amongst others with respect to the degree of active entrainment control in cooperative behavior.

Submitted 30 October, 2018; originally announced October 2018.

Comments: This manuscript is under revision. Please contact the authors for information about updates

arXiv:1705.10312 [pdf]

Classification of Major Depressive Disorder via Multi-Site Weighted LASSO Model

Authors: Dajiang Zhu, Brandalyn C. Riedel, Neda Jahanshad, Nynke A. Groenewold, Dan J. Stein, Ian H. Gotlib, Matthew D. Sacchet, Danai Dima, James H. Cole, Cynthia H. Y. Fu, Henrik Walter, Ilya M. Veer, Thomas Frodl, Lianne Schmaal, Dick J. Veltman, Paul M. Thompson

Abstract: Large-scale collaborative analysis of brain imaging data, in psychiatry and neurology, offers a new source of statistical power to discover features that boost accuracy in disease classification, differential diagnosis, and outcome prediction. However, due to data privacy regulations or limited accessibility to large datasets across the world, it is challenging to efficiently integrate distributed information. Here we propose a novel classification framework through multi-site weighted LASSO: each site performs an iterative weighted LASSO for feature selection separately. Within each iteration, the classification result and the selected features are collected to update the weighting parameters for each feature. This new weight is used to guide the LASSO process at the next iteration. Only the features that help to improve the classification accuracy are preserved. In tests on data from five sites (299 patients with major depressive disorder (MDD) and 258 normal controls), our method boosted classification accuracy for MDD by 4.9% on average. This result shows the potential of the proposed new strategy as an effective and practical collaborative platform for machine learning on large scale distributed imaging and biobank data.

Submitted 3 June, 2017; v1 submitted 26 May, 2017; originally announced May 2017.

Comments: Accepted by MICCAI 2017

arXiv:1612.02572 [pdf]

Predicting brain age with deep learning from raw imaging data results in a reliable and heritable biomarker

Authors: James H Cole, Rudra PK Poudel, Dimosthenis Tsagkrasoulis, Matthan WA Caan, Claire Steves, Tim D Spector, Giovanni Montana

Abstract: Machine learning analysis of neuroimaging data can accurately predict chronological age in healthy people and deviations from healthy brain ageing have been associated with cognitive impairment and disease. Here we sought to further establish the credentials of "brain-predicted age" as a biomarker of individual differences in the brain ageing process, using a predictive modelling approach based on deep learning, and specifically convolutional neural networks (CNN), and applied to both pre-processed and raw T1-weighted MRI data. Firstly, we aimed to demonstrate the accuracy of CNN brain-predicted age using a large dataset of healthy adults (N = 2001). Next, we sought to establish the heritability of brain-predicted age using a sample of monozygotic and dizygotic female twins (N = 62). Thirdly, we examined the test-retest and multi-centre reliability of brain-predicted age using two samples (within-scanner N = 20; between-scanner N = 11). CNN brain-predicted ages were generated and compared to a Gaussian Process Regression (GPR) approach, on all datasets. Input data were grey matter (GM) or white matter (WM) volumetric maps generated by Statistical Parametric Mapping (SPM) or raw data. Brain-predicted age represents an accurate, highly reliable and genetically-valid phenotype, that has potential to be used as a biomarker of brain ageing. Moreover, age predictions can be accurately generated on raw T1-MRI data, substantially reducing computation time for novel data, bringing the process closer to giving real-time information on brain health in clinical settings.

Submitted 8 December, 2016; originally announced December 2016.

Showing 1–36 of 36 results for author: Cole, J