-
Foundational GPT Model for MEG
Authors:
Richard Csaky,
Mats W. J. van Es,
Oiwi Parker Jones,
Mark Woolrich
Abstract:
Deep learning techniques can be used to first train unsupervised models on large amounts of unlabelled data, before fine-tuning the models on specific tasks. This approach has seen massive success for various kinds of data, e.g. images, language, audio, and holds the promise of improving performance in various downstream tasks (e.g. encoding or decoding brain data). However, there has been limited progress in applying this approach to modelling brain signals, such as magneto-/electroencephalography (M/EEG). Here we propose two classes of deep learning foundational models that can be trained using forecasting of unlabelled MEG. First, we consider a modified Wavenet; and second, we consider a modified Transformer-based (GPT2) model. The modified GPT2 includes a novel application of tokenisation and embedding methods, allowing a model developed initially for the discrete domain of language to be applied to continuous multichannel time series data. We also extend the forecasting framework to include condition labels as inputs, enabling better modelling (encoding) of task data. We compare the performance of these deep learning models with standard linear autoregressive (AR) modelling on MEG data. This shows that GPT2-based models provide better modelling capabilities than Wavenet and linear AR models, by better reproducing the temporal, spatial and spectral characteristics of real data and evoked activity in task data. We show how the GPT2 model scales well to multiple subjects, while adapting its model to each subject through subject embedding. Finally, we show how such a model can be useful in downstream decoding tasks through data simulation. All code is available on GitHub (https://github.com/ricsinaruto/MEG-transfer-decoding).
Submitted 14 April, 2024;
originally announced April 2024.
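The abstract above mentions tokenising continuous multichannel time series so that a language-style model can consume it, but does not spell out the scheme. As a minimal sketch only, assuming plain uniform quantisation per channel (the names `quantise` and `dequantise`, the bin count, and the amplitude range are all hypothetical and not taken from the paper), the idea can be illustrated as:

```python
def quantise(signal, n_bins, lo, hi):
    """Map each continuous sample to an integer token in [0, n_bins - 1].

    Assumption: uniform bins over [lo, hi]; samples outside are clipped.
    """
    width = (hi - lo) / n_bins
    tokens = []
    for x in signal:
        clipped = min(max(x, lo), hi)
        idx = int((clipped - lo) / width)
        tokens.append(min(idx, n_bins - 1))  # the value hi falls into the last bin
    return tokens


def dequantise(tokens, n_bins, lo, hi):
    """Map tokens back to the centre of their bin (lossy inverse)."""
    width = (hi - lo) / n_bins
    return [lo + (t + 0.5) * width for t in tokens]


# Toy two-channel recording (made-up values, not MEG data).
channels = {
    "ch0": [-1.0, -0.2, 0.0, 0.4, 1.0],
    "ch1": [0.9, 0.1, -0.5, -2.0, 3.0],
}
# Each channel is tokenised independently with the same vocabulary size.
tokenised = {name: quantise(x, 8, -1.0, 1.0) for name, x in channels.items()}
```

Each channel becomes a sequence of discrete tokens that an embedding layer could look up, which is the property a GPT-style model needs; the actual tokenisation and embedding used by the authors may differ.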
-
Mai Ho'omāuna i ka 'Ai: Language Models Improve Automatic Speech Recognition in Hawaiian
Authors:
Kaavya Chaparala,
Guido Zarrella,
Bruce Torres Fischer,
Larry Kimura,
Oiwi Parker Jones
Abstract:
In this paper we address the challenge of improving Automatic Speech Recognition (ASR) for a low-resource language, Hawaiian, by incorporating large amounts of independent text data into an ASR foundation model, Whisper. To do this, we train an external language model (LM) on ~1.5M words of Hawaiian text. We then use the LM to rescore Whisper and compute word error rates (WERs) on a manually curated test set of labeled Hawaiian data. As a baseline, we use Whisper without an external LM. Experimental results reveal a small but significant improvement in WER when ASR outputs are rescored with a Hawaiian LM. The results support leveraging all available data in the development of ASR systems for underrepresented languages.
Submitted 3 April, 2024;
originally announced April 2024.
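The abstract above describes rescoring ASR output with an external language model and comparing word error rates. A minimal sketch of that idea follows, assuming n-best rescoring with a weighted sum of log-probabilities; the abstract does not state the exact rescoring scheme, and `rescore`, `word_error_rate`, the toy scores, and the placeholder tokens are all hypothetical.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    prev = list(range(len(hyp) + 1))  # distances for the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


def rescore(nbest, lm_logprob, lm_weight):
    """Pick the hypothesis maximising ASR score + lm_weight * LM score."""
    best = max(nbest, key=lambda item: item[1] + lm_weight * lm_logprob(item[0]))
    return best[0]


# Toy n-best list of (text, ASR log-probability); tokens are placeholders.
nbest = [("w1 w2 wX", -1.0), ("w1 w2 w3", -1.2)]
toy_lm_scores = {"w1 w2 wX": -6.0, "w1 w2 w3": -2.0}  # made-up LM log-probs
lm = lambda text: toy_lm_scores[text]

reference = "w1 w2 w3"
baseline = rescore(nbest, lm, lm_weight=0.0)  # ASR score only
rescored = rescore(nbest, lm, lm_weight=0.5)  # ASR score plus weighted LM score
baseline_wer = word_error_rate(reference, baseline)
rescored_wer = word_error_rate(reference, rescored)
```

With `lm_weight=0.0` the selection reduces to the ASR-only baseline, so the same function covers both conditions being compared.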
-
VAE-IF: Deep feature extraction with averaging for unsupervised artifact detection in routine acquired ICU time-series
Authors:
Hollan Haule,
Ian Piper,
Patricia Jones,
Chen Qin,
Tsz-Yan Milly Lo,
Javier Escudero
Abstract:
Artifacts are a common problem in physiological time-series data collected from intensive care units (ICU) and other settings. They affect the quality and reliability of clinical research and patient care. Manual annotation of artifacts is costly and time-consuming, rendering it impractical. Automated methods are desired. Here, we propose a novel unsupervised approach to detect artifacts in clinical-standard minute-by-minute resolution ICU data without any prior labeling or signal-specific knowledge. Our approach combines a variational autoencoder (VAE) and an isolation forest (iForest) model to learn features and identify anomalies in different types of vital signs, such as blood pressure, heart rate, and intracranial pressure. We evaluate our approach on a real-world ICU dataset and compare it with supervised models based on long short-term memory (LSTM) and XGBoost. We show that our approach achieves comparable sensitivity and generalizes well to an external dataset. We also visualize the latent space learned by the VAE and demonstrate its ability to disentangle clean and noisy samples. Our approach offers a promising solution for cleaning ICU data in clinical research and practice without the need for any labels whatsoever.
Submitted 10 December, 2023;
originally announced December 2023.
-
Group-level Brain Decoding with Deep Learning
Authors:
Richard Csaky,
Mats Van Es,
Oiwi Parker Jones,
Mark Woolrich
Abstract:
Decoding brain imaging data is gaining popularity, with applications in brain-computer interfaces and the study of neural representations. Decoding is typically subject-specific and does not generalise well over subjects, due to high amounts of between-subject variability. Techniques that overcome this will not only provide richer neuroscientific insights but also make it possible for group-level models to outperform subject-specific models. Here, we propose a method that uses subject embedding, analogous to word embedding in natural language processing, to learn and exploit the structure in between-subject variability as part of a decoding model, our adaptation of the WaveNet architecture for classification. We apply this to magnetoencephalography data, where 15 subjects viewed 118 different images, with 30 examples per image; to classify images using the entire 1 s window following image presentation. We show that the combination of deep learning and subject embedding is crucial to closing the performance gap between subject- and group-level decoding models. Importantly, group models outperform subject models on low-accuracy subjects (although slightly impair high-accuracy subjects) and can be helpful for initialising subject models. While we have not generally found group-level models to perform better than subject-level models, the performance of group modelling is expected to be even higher with bigger datasets. In order to provide physiological interpretation at the group level, we make use of permutation feature importance. This provides insights into the spatiotemporal and spectral information encoded in the models. All code is available on GitHub (https://github.com/ricsinaruto/MEG-group-decode).
Submitted 19 January, 2024; v1 submitted 27 May, 2022;
originally announced May 2022.
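The abstract above relies on permutation feature importance for interpretation. As a generic sketch of that technique (not the authors' implementation; the toy model, data, and function names are hypothetical), importance can be estimated as the average drop in accuracy when one feature is shuffled across examples:

```python
import random


def accuracy(model, X, y):
    """Fraction of rows for which the model predicts the correct label."""
    return sum(model(row) == label for row, label in zip(X, y)) / len(y)


def permutation_importance(model, X, y, feature, n_repeats=20, seed=0):
    """Mean accuracy drop after shuffling one feature column across rows."""
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    drops = []
    for _ in range(n_repeats):
        column = [row[feature] for row in X]
        rng.shuffle(column)  # break the link between this feature and the labels
        X_perm = [row[:feature] + [v] + row[feature + 1:]
                  for row, v in zip(X, column)]
        drops.append(baseline - accuracy(model, X_perm, y))
    return sum(drops) / n_repeats


# Toy classifier that only looks at feature 0 (a stand-in for a trained decoder).
model = lambda row: int(row[0] > 0)
X = [[1.0, 5.0], [-1.0, 3.0], [2.0, -4.0], [-2.0, 0.5]]
y = [1, 0, 1, 0]

importance_used = permutation_importance(model, X, y, feature=0)
importance_unused = permutation_importance(model, X, y, feature=1)
```

Here `importance_unused` is exactly zero because the toy model never reads feature 1, while shuffling feature 0 degrades accuracy. In an MEG setting the "features" would plausibly be channels, time windows, or frequency bands, which is an assumption about how the abstract's spatiotemporal and spectral insights are obtained.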
-
There and Back Again: Learning to Simulate Radar Data for Real-World Applications
Authors:
Rob Weston,
Oiwi Parker Jones,
Ingmar Posner
Abstract:
Simulating realistic radar data has the potential to significantly accelerate the development of data-driven approaches to radar processing. However, it is fraught with difficulty due to the notoriously complex image formation process. Here we propose to learn a radar sensor model capable of synthesising faithful radar observations based on simulated elevation maps. In particular, we adopt an adversarial approach to learning a forward sensor model from unaligned radar examples. In addition, modelling the backward model encourages the output to remain aligned to the world state through a cyclical consistency criterion. The backward model is further constrained to predict elevation maps from real radar data that are grounded by partial measurements obtained from corresponding lidar scans. Both models are trained in a joint optimisation. We demonstrate the efficacy of our approach by evaluating a down-stream segmentation model trained purely on simulated data in a real-world deployment. This achieves performance within four percentage points of the same model trained entirely on real data.
Submitted 29 November, 2020;
originally announced November 2020.
-
Comparing Energy Efficiency of CPU, GPU and FPGA Implementations for Vision Kernels
Authors:
Murad Qasaimeh,
Kristof Denolf,
Jack Lo,
Kees Vissers,
Joseph Zambreno,
Phillip H. Jones
Abstract:
Developing high performance embedded vision applications requires balancing run-time performance with energy constraints. Given the mix of hardware accelerators that exist for embedded computer vision (e.g. multi-core CPUs, GPUs, and FPGAs), and their associated vendor optimized vision libraries, it becomes a challenge for developers to navigate this fragmented solution space. To aid with determining which embedded platform is most suitable for their application, we conduct a comprehensive benchmark of the run-time performance and energy efficiency of a wide range of vision kernels. We discuss rationales for why a given underlying hardware architecture innately performs well or poorly based on the characteristics of a range of vision kernel categories. Specifically, our study is performed for three commonly used HW accelerators for embedded vision applications: ARM57 CPU, Jetson TX2 GPU and ZCU102 FPGA, using their vendor optimized vision libraries: OpenCV, VisionWorks and xfOpenCV. Our results show that the GPU achieves an energy/frame reduction ratio of 1.1-3.2x compared to the others for simple kernels. For more complicated kernels and complete vision pipelines, however, the FPGA outperforms the others with energy/frame reduction ratios of 1.2-22.3x. It is also observed that the FPGA performs increasingly better as a vision application's pipeline complexity grows.
Submitted 31 May, 2019;
originally announced June 2019.
-
Lagrangian Reachability
Authors:
Jacek Cyranka,
Md. Ariful Islam,
Greg Byrne,
Paul Jones,
Scott A. Smolka,
Radu Grosu
Abstract:
We introduce LRT, a new Lagrangian-based ReachTube computation algorithm that conservatively approximates the set of reachable states of a nonlinear dynamical system. LRT makes use of the Cauchy-Green stretching factor (SF), which is derived from an over-approximation of the gradient of the solution flows. The SF measures the discrepancy between two states propagated by the system solution from two initial states lying in a well-defined region, thereby allowing LRT to compute a reachtube with a ball-overestimate in a metric where the computed enclosure is as tight as possible. To evaluate its performance, we implemented a prototype of LRT in C++/Matlab, and ran it on a set of well-established benchmarks. Our results show that LRT compares very favorably with respect to the CAPD and Flow* tools.
Submitted 3 July, 2017; v1 submitted 16 May, 2017;
originally announced May 2017.