-
Transforming LLMs into Cross-modal and Cross-lingual Retrieval Systems
Authors:
Frank Palma Gomez,
Ramon Sanabria,
Yun-hsuan Sung,
Daniel Cer,
Siddharth Dalmia,
Gustavo Hernandez Abrego
Abstract:
Large language models (LLMs) are trained on text-only data that cover far more languages than those with paired speech and text data. At the same time, Dual Encoder (DE) based retrieval systems project queries and documents into the same embedding space and have demonstrated their success in retrieval and bi-text mining. To match speech and text in many languages, we propose using LLMs to initialize multi-modal DE retrieval systems. Unlike traditional methods, our system doesn't require speech data during LLM pre-training and can exploit LLMs' multilingual text understanding capabilities to match speech and text in languages unseen during retrieval training. Our multi-modal LLM-based retrieval system is capable of matching speech and text in 102 languages despite only training on 21 languages. Our system outperforms previous systems trained explicitly on all 102 languages, achieving a 10% absolute improvement in Recall@1 averaged across these languages. Additionally, our model demonstrates cross-lingual speech and text matching, which is further enhanced by readily available machine translation data.
Submitted 3 April, 2024; v1 submitted 1 April, 2024;
originally announced April 2024.
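The abstract evaluates speech-to-text matching in a shared dual-encoder embedding space with Recall@1. A minimal sketch of that evaluation step follows, assuming the speech and text embeddings have already been produced by the two encoders and that query `i` is paired with document `i`. The function names are hypothetical; this is not the authors' code.

```python
import numpy as np


def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale each row to unit L2 norm so dot products equal cosine similarities."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def recall_at_1(query_emb: np.ndarray, doc_emb: np.ndarray) -> float:
    """Fraction of queries whose highest-scoring document is their paired one.

    Hypothetical helper: assumes row i of query_emb (e.g. a speech clip)
    is paired with row i of doc_emb (e.g. its transcript).
    """
    scores = l2_normalize(query_emb) @ l2_normalize(doc_emb).T  # (n_queries, n_docs)
    top1 = scores.argmax(axis=1)
    return float(np.mean(top1 == np.arange(len(query_emb))))


# Toy usage: random vectors stand in for real speech and text embeddings.
rng = np.random.default_rng(0)
text = rng.normal(size=(5, 8))
speech = text + 0.01 * rng.normal(size=(5, 8))
print(recall_at_1(speech, text))
```

Because both encoders map into one space, retrieval reduces to a nearest-neighbour search over these similarity scores; averaging Recall@1 per language gives the kind of figure the abstract reports.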
-
A Self-Commissioning Edge Computing Method for Data-Driven Anomaly Detection in Power Electronic Systems
Authors:
Pere Izquierdo Gomez,
Miguel E. Lopez Gajardo,
Nenad Mijatovic,
Tomislav Dragicevic
Abstract:
Ensuring the reliability of power electronic converters is a matter of great importance, and data-driven condition monitoring techniques are cementing themselves as an important tool for this purpose. However, translating methods that work well in controlled lab environments to field applications presents significant challenges, notably because of the limited diversity and accuracy of the lab training data. By enabling the use of field data, online machine learning can be a powerful tool to overcome this problem, but it introduces additional challenges in ensuring the stability and predictability of the training processes. This work presents an edge computing method that mitigates these shortcomings with minimal additional memory usage, by employing an autonomous algorithm that prioritizes the storage of training samples with larger prediction errors. The method is demonstrated on the use case of a self-commissioning condition monitoring system, in the form of a thermal anomaly detection scheme for a variable frequency motor drive, where the algorithm self-learned to distinguish normal and anomalous operation with minimal prior knowledge. The obtained results, based on experimental data, show a significant improvement in prediction accuracy and training speed, when compared to equivalent models trained online without the proposed data selection process.
Submitted 5 December, 2023;
originally announced December 2023.
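The central idea of the abstract, prioritising the storage of training samples with larger prediction errors under a fixed memory budget, can be illustrated with a min-heap. This is a minimal sketch under stated assumptions: the class name, the use of absolute error as the priority, and the evict-the-smallest-error rule are hypothetical and may differ from the authors' algorithm.

```python
import heapq
import itertools
from typing import Any, List, Tuple


class ErrorPrioritizedBuffer:
    """Fixed-capacity store that keeps the samples with the largest prediction errors.

    Hypothetical sketch; the paper's actual selection rule may differ.
    """

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._heap: List[Tuple[float, int, Any]] = []  # min-heap keyed on |error|
        self._counter = itertools.count()  # tie-breaker so samples are never compared

    def offer(self, sample: Any, error: float) -> bool:
        """Store the sample if there is room or if it beats the smallest stored error."""
        entry = (abs(error), next(self._counter), sample)
        if len(self._heap) < self.capacity:
            heapq.heappush(self._heap, entry)
            return True
        if entry[0] > self._heap[0][0]:
            heapq.heapreplace(self._heap, entry)  # evict the lowest-error sample
            return True
        return False

    def samples(self) -> List[Any]:
        """Return stored samples, largest error first (e.g. for the next training step)."""
        return [s for _, _, s in sorted(self._heap, reverse=True)]


buf = ErrorPrioritizedBuffer(capacity=3)
for x, err in [("a", 0.1), ("b", 0.9), ("c", 0.4), ("d", 0.05), ("e", 0.7)]:
    buf.offer(x, err)
print(buf.samples())
```

Memory stays bounded at `capacity` entries, and each insertion costs O(log capacity), which suits the "minimal additional memory usage" constraint of edge hardware.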
-
Robust Wake-Up Word Detection by Two-stage Multi-resolution Ensembles
Authors:
Fernando López,
Jordi Luque,
Carlos Segura,
Pablo Gómez
Abstract:
Voice-based interfaces rely on a wake-up word mechanism to initiate communication with devices. However, achieving robust, energy-efficient, and fast detection remains a challenge. This paper addresses these real production needs by enhancing data with temporal alignments and using a two-phase, multi-resolution detection scheme. It employs two models: a lightweight on-device model for real-time processing of the audio stream and a server-side verification model, which is an ensemble of heterogeneous architectures that refines detection. This scheme allows the optimization of two operating points. To protect privacy, audio features, rather than raw audio, are sent to the cloud. The study investigated different parametric configurations for feature extraction to select one for on-device detection and another for the verification model. Furthermore, thirteen different audio classifiers were compared in terms of performance and inference time. The proposed ensemble outperforms our strongest classifier in every noise condition.
Submitted 17 October, 2023;
originally announced October 2023.
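The two-stage scheme in the abstract can be sketched as a simple cascade: a cheap on-device scorer gates every window, and only accepted windows send features to a server-side ensemble. The function names, the 0.5 thresholds, and score averaging as the ensemble fusion rule are all assumptions for illustration; the paper's fusion may differ.

```python
from statistics import mean
from typing import Callable, Sequence

# A scorer maps a feature vector to a wake-word probability in [0, 1].
Scorer = Callable[[Sequence[float]], float]


def two_stage_detect(
    features: Sequence[float],
    on_device: Scorer,
    server_ensemble: Sequence[Scorer],
    device_threshold: float = 0.5,
    server_threshold: float = 0.5,
) -> bool:
    """Return True if the wake-up word is accepted (hypothetical cascade).

    Stage 1: a lightweight on-device scorer runs on every window.
    Stage 2: only windows that pass stage 1 send features (not raw audio)
    to a server ensemble whose averaged score makes the final decision.
    """
    if on_device(features) < device_threshold:
        return False  # rejected locally; nothing leaves the device
    ensemble_score = mean(model(features) for model in server_ensemble)
    return ensemble_score >= server_threshold


# Hypothetical stand-ins for trained models: each reads one feature as its score.
def tiny_model(f: Sequence[float]) -> float:
    return f[0]


def server_a(f: Sequence[float]) -> float:
    return f[1]


def server_b(f: Sequence[float]) -> float:
    return f[2]


print(two_stage_detect([0.8, 0.9, 0.7], tiny_model, [server_a, server_b]))
```

Lowering `device_threshold` favours recall on the device, while `server_threshold` controls false alarms; tuning them independently corresponds to the two operating points mentioned in the abstract.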
-
PAseos Simulates the Environment for Operating multiple Spacecraft
Authors:
Pablo Gómez,
Johan Östman,
Vinutha Magal Shreenath,
Gabriele Meoni
Abstract:
The next generation of spacecraft is anticipated to enable various new applications involving onboard processing, machine learning and decentralised operational scenarios. Even though many of these have been previously proposed and evaluated, the operational constraints of real mission scenarios are often either not considered or considered only rudimentarily. Here, we present an open-source Python module called PASEOS that is capable of modelling operational scenarios involving one or multiple spacecraft. It considers several physical phenomena, including thermal, power, bandwidth and communications constraints, as well as the impact of radiation on spacecraft. PASEOS can be run as a high-performance-oriented numerical simulation and/or in a real-time mode directly on edge hardware. We demonstrate these capabilities in three scenarios: a real-time simulation on a Unibap iX-10 100 satellite processor, a simulation modelling an entire constellation performing tasks over several hours, and the training of a machine learning model in a decentralised setting. While we demonstrate tasks in Earth orbit, PASEOS is conceptually designed to allow deep space scenarios too. Our results show that PASEOS can model the described scenarios efficiently and thus provide insight into operational considerations. We show this in terms of runtime and overhead as well as by investigating the modelled temperature, battery status and communication windows of a constellation. By running PASEOS on an actual satellite processor, we showcase how PASEOS can be directly included in hardware demonstrators for future missions. Overall, we provide the first solution to holistically model the physical constraints spacecraft encounter in Earth orbit and beyond. The PASEOS module is available open-source online together with extensive documentation to enable researchers to quickly incorporate it in their studies.
Submitted 6 February, 2023;
originally announced February 2023.
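Among the phenomena PASEOS models is the power budget. As a rough illustration of the kind of bookkeeping such a simulator performs, here is a hypothetical lumped battery model. It is not the PASEOS API: the class, its parameters, the orbit timing and the clamping rule are assumptions made for this sketch.

```python
from dataclasses import dataclass


@dataclass
class PowerState:
    """Hypothetical lumped battery model for one spacecraft (NOT the PASEOS API)."""

    charge_wh: float      # stored energy in watt-hours
    capacity_wh: float    # battery capacity in watt-hours
    solar_input_w: float  # charging power while the spacecraft is sunlit

    def can_run(self, load_w: float, duration_s: float) -> bool:
        """Check whether an activity fits in the remaining charge, ignoring charging."""
        return load_w * duration_s / 3600.0 <= self.charge_wh

    def step(self, dt_s: float, load_w: float, in_sunlight: bool) -> None:
        """Advance the battery by dt_s seconds, clamping charge to [0, capacity]."""
        net_w = (self.solar_input_w if in_sunlight else 0.0) - load_w
        self.charge_wh += net_w * dt_s / 3600.0
        self.charge_wh = min(max(self.charge_wh, 0.0), self.capacity_wh)


# Hypothetical 90-minute orbit: sunlit for 60 minutes, eclipse for 30.
battery = PowerState(charge_wh=50.0, capacity_wh=100.0, solar_input_w=60.0)
for minute in range(90):
    battery.step(dt_s=60.0, load_w=40.0, in_sunlight=minute < 60)
print(round(battery.charge_wh, 2))
```

A check such as `can_run` is the kind of operational constraint that decides whether an onboard task (for example a training step) may start at a given time.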
-
Data-Driven Thermal Modelling for Anomaly Detection in Electric Vehicle Charging Stations
Authors:
Pere Izquierdo Gómez,
Alberto Barragan Moreno,
Jun Lin,
Tomislav Dragičević
Abstract:
The rapid growth of the electric vehicle (EV) sector is giving rise to many infrastructural challenges. One such challenge is its requirement for the widespread development of EV charging stations which must be able to provide large amounts of power on an on-demand basis. This can cause large stresses on the electrical and electronic components of the charging infrastructure, negatively affecting its reliability as well as leading to increased maintenance and operation costs. This paper proposes a human-interpretable data-driven method for anomaly detection in EV charging stations, aiming to provide information for the condition monitoring and predictive maintenance of power converters within such a station. To this end, a model of a high-efficiency EV charging station is used to simulate the thermal behaviour of EV charger power converter modules, creating a data set for the training of neural network models. These machine learning models are then employed for the identification of anomalous performance.
Submitted 12 July, 2022;
originally announced July 2022.
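The abstract flags anomalies by comparing measured thermal behaviour with a model trained on normal operation. A minimal residual-threshold sketch follows. The paper trains neural networks on simulated data; to keep the example short, a linear least-squares model is swapped in, and the features, synthetic data and k-sigma threshold are all assumptions.

```python
import numpy as np


def _design(X: np.ndarray) -> np.ndarray:
    """Append a bias column to the operating-point features."""
    return np.column_stack([X, np.ones(len(X))])


def fit_thermal_model(X: np.ndarray, temp: np.ndarray) -> np.ndarray:
    """Least-squares fit of temperature from features, using healthy data only."""
    w, *_ = np.linalg.lstsq(_design(X), temp, rcond=None)
    return w


def residual_threshold(w: np.ndarray, X: np.ndarray, temp: np.ndarray, k: float = 5.0) -> float:
    """k standard deviations of the model residuals on healthy data (assumed rule)."""
    return k * float(np.std(temp - _design(X) @ w))


def anomaly_flags(w: np.ndarray, X: np.ndarray, temp: np.ndarray, threshold: float) -> np.ndarray:
    """Flag samples whose measured temperature deviates too far from the model."""
    return np.abs(temp - _design(X) @ w) > threshold


# Synthetic healthy data: hypothetical features such as load current and ambient temperature.
rng = np.random.default_rng(0)
X_train = rng.uniform(0.0, 50.0, size=(200, 2))
temp_train = 25.0 + X_train @ np.array([0.5, 0.8]) + 0.1 * rng.normal(size=200)
w = fit_thermal_model(X_train, temp_train)
thr = residual_threshold(w, X_train, temp_train)

X_new = np.array([[10.0, 20.0], [10.0, 20.0]])
temp_new = np.array([46.0, 56.0])  # second reading is 10 degrees hotter than expected
print(anomaly_flags(w, X_new, temp_new, thr))
```

Because the decision rests on a temperature residual in physical units, the result stays human-interpretable, which is the property the abstract emphasises.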
-
Speech Enhancement for Wake-Up-Word detection in Voice Assistants
Authors:
David Bonet,
Guillermo Cámbara,
Fernando López,
Pablo Gómez,
Carlos Segura,
Jordi Luque
Abstract:
Keyword spotting, and in particular Wake-Up-Word (WUW) detection, is a very important task for voice assistants. A very common issue of voice assistants is that they are easily activated by background noise, such as music, TV or background speech, that accidentally triggers the device. In this paper, we propose a Speech Enhancement (SE) model adapted to the task of WUW detection that aims at increasing the recognition rate and reducing false alarms in the presence of these types of noise. The SE model is a fully-convolutional denoising auto-encoder at waveform level and is trained using log-Mel spectrogram and waveform reconstruction losses together with the BCE loss of a simple WUW classification network. A new database has been purposely prepared for the task of recognizing the WUW in challenging conditions, containing negative samples that are very phonetically similar to the keyword. The database is extended with public databases and exhaustive data augmentation to simulate different noises and environments. The results obtained by concatenating the SE with both a simple and a state-of-the-art WUW detector show that the SE does not have a negative impact on the recognition rate in quiet environments while increasing performance in the presence of noise, especially when the SE and WUW detector are trained jointly end-to-end.
Submitted 29 January, 2021;
originally announced January 2021.
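The SE model above is trained with spectrogram and waveform reconstruction losses plus the BCE loss of a WUW classifier. A NumPy sketch of how such a joint objective can be combined follows. It assumes L1 distances, equal loss weights, and a plain log-magnitude STFT in place of the log-Mel spectrogram; it only evaluates the loss and does not show the network or its gradients.

```python
import numpy as np


def stft_magnitude(x: np.ndarray, n_fft: int = 256, hop: int = 128) -> np.ndarray:
    """Magnitude STFT with a Hann window, shape (frames, n_fft // 2 + 1)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * window for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1))


def log_spectrogram_loss(clean: np.ndarray, enhanced: np.ndarray, eps: float = 1e-5) -> float:
    """Mean absolute difference of log-magnitude spectrograms (no Mel filterbank here)."""
    diff = np.log(stft_magnitude(clean) + eps) - np.log(stft_magnitude(enhanced) + eps)
    return float(np.mean(np.abs(diff)))


def waveform_loss(clean: np.ndarray, enhanced: np.ndarray) -> float:
    """Mean absolute error between the clean and enhanced waveforms."""
    return float(np.mean(np.abs(clean - enhanced)))


def bce_loss(label: float, prob: float, eps: float = 1e-7) -> float:
    """Binary cross-entropy of the WUW classifier's output probability."""
    p = float(np.clip(prob, eps, 1.0 - eps))
    return float(-(label * np.log(p) + (1.0 - label) * np.log(1.0 - p)))


def joint_loss(clean, enhanced, label, prob, w_spec=1.0, w_wave=1.0, w_bce=1.0) -> float:
    """Weighted sum of both reconstruction losses and the classification loss (weights assumed)."""
    return (w_spec * log_spectrogram_loss(clean, enhanced)
            + w_wave * waveform_loss(clean, enhanced)
            + w_bce * bce_loss(label, prob))


t = np.arange(4000) / 16000.0
clean = np.sin(2 * np.pi * 440.0 * t)
noisy = clean + 0.3 * np.random.default_rng(0).normal(size=t.size)
print(joint_loss(clean, clean, 1.0, 0.95), joint_loss(clean, noisy, 1.0, 0.95))
```

Training the SE and the classifier jointly end-to-end, as the abstract reports, means gradients of the BCE term also reach the enhancement network, steering it toward outputs that help detection rather than only fidelity.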
-
Deep learning-based parameter mapping for joint relaxation and diffusion tensor MR Fingerprinting
Authors:
Carolin M. Pirkl,
Pedro A. Gómez,
Ilona Lipp,
Guido Buonincontri,
Miguel Molina-Romero,
Anjany Sekuboyina,
Diana Waldmannstetter,
Jonathan Dannenberg,
Sebastian Endt,
Alberto Merola,
Joseph R. Whittaker,
Valentina Tomassini,
Michela Tosetti,
Derek K. Jones,
Bjoern H. Menze,
Marion I. Menzel
Abstract:
Magnetic Resonance Fingerprinting (MRF) enables the simultaneous quantification of multiple properties of biological tissues. It relies on a pseudo-random acquisition and the matching of acquired signal evolutions to a precomputed dictionary. However, the dictionary is not scalable to higher-parametric spaces, limiting MRF to the simultaneous mapping of only a small number of parameters (proton density, T1 and T2 in general). Inspired by diffusion-weighted SSFP imaging, we present a proof-of-concept of a novel MRF sequence with embedded diffusion-encoding gradients along all three axes to efficiently encode orientational diffusion and T1 and T2 relaxation. We take advantage of a convolutional neural network (CNN) to reconstruct multiple quantitative maps from this single, highly undersampled acquisition. We bypass expensive dictionary matching by learning the implicit physical relationships between the spatiotemporal MRF data and the T1, T2 and diffusion tensor parameters. The predicted parameter maps and the derived scalar diffusion metrics agree well with state-of-the-art reference protocols. Orientational diffusion information is captured as seen from the estimated primary diffusion directions. In addition to this, the joint acquisition and reconstruction framework proves capable of preserving tissue abnormalities in multiple sclerosis lesions.
Submitted 5 May, 2020;
originally announced May 2020.
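For context, the dictionary matching that the CNN above is trained to bypass can be sketched as follows. The signal model is a deliberately simple toy (inversion recovery multiplied by T2 decay), not a Bloch or extended-phase-graph simulation of an actual MRF sequence, and the timing arrays and parameter grids are hypothetical. Note that the dictionary size is the product of the grid sizes, which is why adding parameters such as diffusion quickly becomes unscalable.

```python
import numpy as np


def toy_fingerprint(t1, t2, times, echo_times):
    """Toy signal evolution (NOT a Bloch/EPG simulation): inversion recovery x T2 decay."""
    return (1 - 2 * np.exp(-times / t1)) * np.exp(-echo_times / t2)


def build_dictionary(t1_grid, t2_grid, times, echo_times):
    """Simulate one atom per (T1, T2) pair; return unit-norm atoms, norms and parameters."""
    params = np.array([(t1, t2) for t1 in t1_grid for t2 in t2_grid])
    atoms = np.stack([toy_fingerprint(t1, t2, times, echo_times) for t1, t2 in params])
    norms = np.linalg.norm(atoms, axis=1)
    return atoms / norms[:, None], norms, params


def match(signal, atoms, norms, params):
    """Return (T1, T2, proton density) of the best-correlated normalized atom."""
    corr = atoms @ signal
    k = int(np.argmax(np.abs(corr)))
    pd = corr[k] / norms[k]  # scale factor relating the signal to the unnormalized atom
    return params[k][0], params[k][1], pd


times = np.linspace(10.0, 3000.0, 100)     # ms, hypothetical sampling times
echo_times = np.linspace(5.0, 100.0, 100)  # ms, hypothetical varying echo times
atoms, norms, params = build_dictionary(
    np.arange(300, 2001, 100), np.arange(20, 201, 10), times, echo_times)
measured = 0.7 * toy_fingerprint(800.0, 60.0, times, echo_times)
print(match(measured, atoms, norms, params))
```

Every voxel requires an inner product with every atom, so both memory and compute grow with the dictionary; a trained network replaces this search with a single forward pass.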
-
Rapid three-dimensional multiparametric MRI with quantitative transient-state imaging
Authors:
Pedro A. Gómez,
Matteo Cencini,
Mohammad Golbabaee,
Rolf F. Schulte,
Carolin Pirkl,
Izabela Horvath,
Giada Fallo,
Luca Peretti,
Michela Tosetti,
Bjoern H. Menze,
Guido Buonincontri
Abstract:
Novel methods for quantitative, transient-state multiparametric imaging are increasingly being demonstrated for assessment of disease and treatment efficacy. Here, we build on these by assessing the most common non-Cartesian readout trajectories (2D/3D radials and spirals), demonstrating efficient anti-aliasing with a k-space view-sharing technique, and proposing novel methods for parameter inference with neural networks that incorporate the estimation of proton density. Our results show good agreement with gold-standard and phantom references for all readout trajectories at 1.5T and 3T. Parameters inferred with the neural network were within 6.58% of the parameters inferred with a high-resolution dictionary. Concordance correlation coefficients were above 0.92 and the normalized root mean squared error ranged between 4.2% and 12.7% with respect to gold-standard phantom references for T1 and T2. In vivo acquisitions demonstrate sub-millimetric isotropic resolution in under five minutes with reconstruction and inference times under 7 minutes. Our 3D quantitative transient-state imaging approach could enable high-resolution multiparametric tissue quantification within clinically acceptable acquisition and reconstruction times.
Submitted 27 June, 2020; v1 submitted 20 January, 2020;
originally announced January 2020.
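The abstract above reports concordance correlation coefficients (CCC) and normalized root mean squared error (NRMSE) against reference values. A small sketch of both metrics follows. The CCC is Lin's standard definition, 2·cov(x, y) / (var(x) + var(y) + (mean(x) − mean(y))²). The abstract does not say how its NRMSE is normalised, so dividing by the root-mean-square of the reference is an assumption; dividing by the reference range or mean is also common.

```python
import numpy as np


def concordance_ccc(x, y) -> float:
    """Lin's concordance correlation coefficient between estimates x and reference y."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    mx, my = x.mean(), y.mean()
    cov = np.mean((x - mx) * (y - my))  # population covariance, matching np.var's ddof=0
    return float(2 * cov / (x.var() + y.var() + (mx - my) ** 2))


def nrmse(estimate, reference) -> float:
    """RMSE normalised by the reference's root-mean-square (assumed convention)."""
    e = np.asarray(estimate, dtype=float)
    r = np.asarray(reference, dtype=float)
    return float(np.sqrt(np.mean((e - r) ** 2)) / np.sqrt(np.mean(r ** 2)))


# Hypothetical T1 values (ms) for a few phantom vials.
t1_ref = np.array([600.0, 900.0, 1200.0, 1500.0])
t1_est = np.array([610.0, 880.0, 1230.0, 1490.0])
print(concordance_ccc(t1_est, t1_ref), nrmse(t1_est, t1_ref))
```

Unlike Pearson correlation, CCC also penalises systematic bias: a constant offset between estimate and reference lowers it below 1 even when the two are perfectly correlated.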
-
A Fully Convolutional Network for MR Fingerprinting
Authors:
Dongdong Chen,
Mohammad Golbabaee,
Pedro A. Gomez,
Marion I. Menzel,
Mike E. Davies
Abstract:
Magnetic Resonance Fingerprinting (MRF) methods typically rely on dictionary matching to map the temporal MRF signals to quantitative tissue parameters. These methods suffer from heavy storage and computation requirements as the dictionary size grows. To address these issues, we propose an end-to-end fully convolutional neural network for MRF reconstruction (MRF-FCNN), which first employs linear dimensionality reduction and then uses a neural network to project the data onto the tissue parameter manifold. Experiments on the MAGIC data demonstrate the effectiveness of the method.
Submitted 21 November, 2019;
originally announced November 2019.
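MRF-FCNN first applies linear dimensionality reduction along the time dimension. One common choice, assumed here rather than taken from the paper, is to project each fingerprint onto the dominant right singular vectors of the dictionary (an SVD/PCA subspace). The network that follows is omitted, and the function names are hypothetical.

```python
import numpy as np


def temporal_basis(dictionary: np.ndarray, rank: int) -> np.ndarray:
    """Top-`rank` right singular vectors of an (n_atoms, n_timepoints) dictionary."""
    _, _, vt = np.linalg.svd(dictionary, full_matrices=False)
    return vt[:rank]  # shape (rank, n_timepoints), orthonormal rows


def compress(signals: np.ndarray, basis: np.ndarray) -> np.ndarray:
    """Project (n_voxels, n_timepoints) signals onto the basis, giving (n_voxels, rank)."""
    return signals @ basis.T


def expand(coeffs: np.ndarray, basis: np.ndarray) -> np.ndarray:
    """Map compressed coefficients back to the time domain (to check fidelity)."""
    return coeffs @ basis


# Synthetic low-rank "dictionary": 300 atoms of 500 time points spanning a 5-D subspace.
rng = np.random.default_rng(1)
D = rng.normal(size=(300, 5)) @ rng.normal(size=(5, 500))
B = temporal_basis(D, rank=5)
coeffs = compress(D, B)
print(coeffs.shape, np.max(np.abs(expand(coeffs, B) - D)))
```

Because MRF signal evolutions are highly correlated in time, a small rank usually retains most of the signal energy, so the downstream network can operate on a few coefficients per voxel instead of hundreds of time points.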
-
Multi-shot Echo Planar Imaging for accelerated Cartesian MR Fingerprinting: an alternative to conventional spiral MR Fingerprinting
Authors:
Arnold Julian Vinoj Benjamin,
Pedro A. Gomez,
Mohammad Golbabaee,
Zaid Mahbub,
Tim Sprenger,
Marion I. Menzel,
Michael Davies,
Ian Marshall
Abstract:
Purpose: To develop an accelerated Cartesian MRF implementation using a multi-shot EPI sequence for rapid simultaneous quantification of T1 and T2 parameters. Methods: The proposed Cartesian MRF method involved the acquisition of highly subsampled MR images using a 16-shot EPI readout. A linearly varying flip angle train was used for rapid, simultaneous T1 and T2 quantification. The accuracy of the parametric map estimates was improved by using an iterative projection algorithm. The results were compared to a conventional spiral MRF implementation. The acquisition time per slice was 8 s, and the method was validated on a phantom and in vivo on the brain of a healthy volunteer. Results: Joint T1 and T2 estimates using the 16-shot EPI readout are in good agreement with the spiral implementation using the same acquisition parameters (deviation less than 3% for T1 and less than 4% for T2) for the healthy volunteer brain. The T1 and T2 values also agree with conventional values previously reported in the literature. The visual quality of the multi-parametric maps generated by the multi-shot EPI-MRF and spiral-MRF implementations was comparable. Conclusion: The multi-shot EPI-MRF method generated accurate quantitative multi-parametric maps similar to conventional spiral-MRF. This multi-shot approach provides an alternative for performing MRF using an accelerated Cartesian readout, thereby increasing the potential usability of MRF.
Submitted 19 June, 2019;
originally announced June 2019.
-
Designing contrasts for rapid, simultaneous parameter quantification and flow visualization with quantitative transient-state imaging
Authors:
Pedro A. Gómez,
Miguel Molina-Romero,
Guido Buonincontri,
Marion I. Menzel,
Bjoern H. Menze
Abstract:
Magnetic resonance imaging (MRI) is a remarkably powerful diagnostic technique: it generates wide-ranging information for the non-invasive study of tissue anatomy and physiology. Complementary data are normally obtained in separate measurements, either as contrast-weighted images, which are fast and simple to acquire, or as quantitative parametric maps, which offer an absolute quantification of underlying biophysical effects, such as relaxation times or flow. Here, we demonstrate how to acquire and reconstruct data in a transient state with a dual purpose: (1) to generate contrast-weighted images that can be adjusted to emphasise clinically relevant image biomarkers, exemplified by modulating the signal according to flow to obtain angiography information, and (2) to simultaneously infer multiple quantitative parameters with a single, highly accelerated acquisition. This is achieved by introducing three novel elements: a model that accounts for flowing blood, a method for sequence design that incorporates both parameter encoding and signal contrast, and the reconstruction of temporally resolved contrast-weighted images. From these images we simultaneously obtain angiography projections and multiple quantitative maps. By doing so, we increase the amount of clinically relevant data without adding measurement time, creating new dimensions for biomarker exploration and adding value to MR examinations for patients and clinicians alike.
Submitted 23 January, 2019;
originally announced January 2019.
-
Greedy Approximate Projection for Magnetic Resonance Fingerprinting with Partial Volumes
Authors:
Roberto Duarte,
Audrey Repetti,
Pedro A. Gómez,
Mike Davies,
Yves Wiaux
Abstract:
In quantitative Magnetic Resonance Imaging, traditional methods suffer from the so-called Partial Volume Effect (PVE) due to spatial resolution limitations. As a consequence of PVE, the parameters of voxels containing more than one tissue are not correctly estimated. Magnetic Resonance Fingerprinting (MRF) is not an exception. The existing methods addressing PVE are neither scalable nor accurate. We propose to formulate the recovery of multiple tissues per voxel as a nonconvex constrained least-squares minimisation problem. To solve this problem, we develop a memory-efficient, greedy approximate projected gradient descent algorithm, dubbed GAP-MRF. Our method adaptively finds the regions of interest on the manifold of fingerprints defined by the MRF sequence. We generalise our method to compensate for phase errors appearing in the model, using an alternating minimisation approach. We show, through simulations on synthetic data with PVE, that our algorithm outperforms state-of-the-art methods. Our approach is validated on the EUROSPIN phantom and on in vivo datasets.
Submitted 28 November, 2018; v1 submitted 18 July, 2018;
originally announced July 2018.
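The abstract frames partial-volume recovery as explaining each voxel with a few dictionary fingerprints (tissues). GAP-MRF solves this with a greedy approximate projected gradient descent; the sketch below instead uses orthogonal matching pursuit (OMP), a simpler and well-known greedy method, purely to illustrate the sparse-mixture idea. The random unit-norm atoms stand in for a real fingerprint dictionary, and the weights are not constrained to be non-negative, as a physical tissue-fraction model would usually require.

```python
import numpy as np


def greedy_mixture(signal: np.ndarray, atoms: np.ndarray, n_tissues: int = 2):
    """Orthogonal matching pursuit over unit-norm atoms (swapped in; NOT GAP-MRF).

    Returns (support, weights) such that signal is approximately
    weights @ atoms[support].
    """
    residual = signal.copy()
    support: list = []
    weights = np.zeros(0)
    for _ in range(n_tissues):
        k = int(np.argmax(np.abs(atoms @ residual)))  # atom most correlated with residual
        if k not in support:
            support.append(k)
        A = atoms[support].T  # (n_timepoints, len(support))
        weights, *_ = np.linalg.lstsq(A, signal, rcond=None)  # refit all selected weights
        residual = signal - A @ weights
    return support, weights


# Hypothetical dictionary of 20 random unit-norm "fingerprints" with 200 time points.
rng = np.random.default_rng(0)
atoms = rng.normal(size=(20, 200))
atoms /= np.linalg.norm(atoms, axis=1, keepdims=True)
voxel = 0.6 * atoms[3] + 0.4 * atoms[11]  # two-tissue partial-volume voxel
support, weights = greedy_mixture(voxel, atoms, n_tissues=2)
print(support, weights)
```

Refitting all selected weights at every iteration is what distinguishes OMP from plain matching pursuit; in a real MRF dictionary, neighbouring fingerprints are far more correlated than these random atoms, which is one reason dedicated methods such as GAP-MRF are needed.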