-
High-Resolution Maps of Left Atrial Displacements and Strains Estimated with 3D CINE MRI and Unsupervised Neural Networks
Authors:
Christoforos Galazis,
Samuel Shepperd,
Emma Brouwer,
Sandro Queirós,
Ebraham Alskaf,
Mustafa Anjari,
Amedeo Chiribiri,
Jack Lee,
Anil A. Bharath,
Marta Varela
Abstract:
The functional analysis of the left atrium (LA) is important for evaluating cardiac health and understanding diseases like atrial fibrillation. Cine MRI is ideally placed for the detailed 3D characterisation of LA motion and deformation, but it is lacking appropriate acquisition and analysis tools. In this paper, we present Analysis for Left Atrial Displacements and Deformations using unsupervIsed neural Networks, Aladdin, to automatically and reliably characterise regional LA deformations from high-resolution 3D Cine MRI. The tool includes: an online few-shot segmentation network (Aladdin-S), an online unsupervised image registration network (Aladdin-R), and a strain calculations pipeline tailored to the LA. We create maps of LA Displacement Vector Field (DVF) magnitude and LA principal strain values from images of 10 healthy volunteers and 8 patients with cardiovascular disease (CVD). We additionally create an atlas of these biomarkers using the data from the healthy volunteers. Aladdin is able to accurately track the LA wall across the cardiac cycle and characterize its motion and deformation. The overall DVF magnitude and principal strain values are significantly higher in the healthy group vs CVD patients: $2.85 \pm 1.59~mm$ and $0.09 \pm 0.05$ vs $1.96 \pm 0.74~mm$ and $0.03 \pm 0.04$, respectively. The time course of these metrics is also different in the two groups, with a more marked active contraction phase observed in the healthy cohort. Finally, utilizing the LA atlas allows us to identify regional deviations from the population distribution that may indicate focal tissue abnormalities. The proposed tool for the quantification of novel regional LA deformation biomarkers should have important clinical applications. The source code, anonymized images, generated maps and atlas are publicly available: https://github.com/cgalaz01/aladdin_cmr_la.
Submitted 14 December, 2023;
originally announced December 2023.
-
High-resolution 3D Maps of Left Atrial Displacements using an Unsupervised Image Registration Neural Network
Authors:
Christoforos Galazis,
Anil Anthony Bharath,
Marta Varela
Abstract:
Functional analysis of the left atrium (LA) plays an increasingly important role in the prognosis and diagnosis of cardiovascular diseases. Echocardiography-based measurements of LA dimensions and strains are useful biomarkers, but they provide an incomplete picture of atrial deformations. High-resolution dynamic magnetic resonance images (Cine MRI) offer the opportunity to examine LA motion and deformation in 3D, at higher spatial resolution and with full LA coverage. However, there are no dedicated tools to automatically characterise LA motion in 3D. Thus, we propose a tool that automatically segments the LA and extracts the displacement fields across the cardiac cycle. The pipeline is able to accurately track the LA wall across the cardiac cycle with an average Hausdorff distance of $2.51 \pm 1.3~mm$ and Dice score of $0.96 \pm 0.02$.
Submitted 5 September, 2023;
originally announced September 2023.
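The two tracking metrics quoted in the abstract above, the Dice score and the Hausdorff distance, can be illustrated with a minimal sketch. It assumes each segmentation is given as a set of voxel coordinates and measures distances in voxel units; the function names and toy masks are hypothetical and are not taken from the authors' pipeline, which may evaluate the metrics on surfaces and in millimetres.

```python
import math


def dice_score(a, b):
    """Dice overlap 2|A ∩ B| / (|A| + |B|) between two sets of voxels."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # two empty masks are treated as a perfect match
    return 2.0 * len(a & b) / (len(a) + len(b))


def hausdorff_distance(a, b):
    """Symmetric Hausdorff distance between two non-empty point sets."""
    def directed(src, dst):
        # Largest distance from any point in src to its nearest point in dst.
        return max(min(math.dist(p, q) for q in dst) for p in src)

    return max(directed(a, b), directed(b, a))


# Hypothetical toy masks: the prediction is the reference shifted by one voxel.
reference = {(0, 0, 0), (1, 0, 0), (0, 1, 0), (1, 1, 0)}
predicted = {(1, 0, 0), (0, 1, 0), (1, 1, 0), (2, 1, 0)}

print(dice_score(reference, predicted))          # 0.75
print(hausdorff_distance(reference, predicted))  # 1.0
```

To report the Hausdorff distance in millimetres, the voxel coordinates would first be scaled by the image spacing.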
-
Codes, Patterns and Shapes of Contemporary Online Antisemitism and Conspiracy Narratives -- an Annotation Guide and Labeled German-Language Dataset in the Context of COVID-19
Authors:
Elisabeth Steffen,
Helena Mihaljević,
Milena Pustet,
Nyco Bischoff,
María do Mar Castro Varela,
Yener Bayramoğlu,
Bahar Oghalai
Abstract:
Over the course of the COVID-19 pandemic, existing conspiracy theories were refreshed and new ones were created, often interwoven with antisemitic narratives, stereotypes and codes. The sheer volume of antisemitic and conspiracy theory content on the Internet makes data-driven algorithmic approaches essential for anti-discrimination organizations and researchers alike. However, the manifestation and dissemination of these two interrelated phenomena is still quite under-researched in scholarly empirical research of large text corpora. Algorithmic approaches for the detection and classification of specific contents usually require labeled datasets, annotated based on conceptually sound guidelines. While there is a growing number of datasets for the more general phenomenon of hate speech, the development of corpora and annotation guidelines for antisemitic and conspiracy content is still in its infancy, especially for languages other than English. We contribute to closing this gap by developing an annotation guide for antisemitic and conspiracy theory online content in the context of the COVID-19 pandemic. We provide working definitions, including specific forms of antisemitism such as encoded and post-Holocaust antisemitism. We use these to annotate a German-language dataset consisting of ~3,700 Telegram messages sent between 03/2020 and 12/2021.
Submitted 13 October, 2022;
originally announced October 2022.
-
How to Configure Masked Event Anomaly Detection on Software Logs?
Authors:
Jesse Nyyssölä,
Mika Mäntylä,
Martín Varela
Abstract:
Software log anomaly event detection with masked event prediction has various technical approaches with countless configurations and parameters. Our objective is to provide a baseline of settings for similar studies in the future. The models we use are the N-Gram model, which is a classic approach in the field of natural language processing (NLP), and two deep learning (DL) models: long short-term memory (LSTM) and convolutional neural network (CNN). We used four datasets: Profilence, BlueGene/L (BGL), Hadoop Distributed File System (HDFS) and Hadoop. Other settings are the size of the sliding window, which determines how many surrounding events we are using to predict a given event; the mask position (the position within the window we are predicting); the usage of only unique sequences; and the portion of data that is used for training. The results show clear indications of settings that can be generalized across datasets. The performance of the DL models does not deteriorate as the window size increases, while the N-Gram model shows worse performance with large window sizes on the BGL and Profilence datasets. Despite the popularity of Next Event Prediction, the results show that in this context it is better not to predict events at the edges of the subsequence, i.e., the first or last event, with the best result coming from predicting the fourth event when the window size is five. Regarding the amount of data used for training, the results show differences across datasets and models. For example, the N-Gram model appears to be more sensitive to the lack of data than the DL models. Overall, for similar experimental setups we suggest the following general baseline: window size 10, mask position second to last, do not filter out non-unique sequences, and use half of the total data for training.
Submitted 3 August, 2022;
originally announced August 2022.
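The roles of the sliding window and the mask position described in the abstract above can be sketched with a simple count-based predictor. This is an illustrative lookup table in the spirit of an N-Gram model, not the authors' implementation; the function names, event names and toy log are all hypothetical.

```python
from collections import Counter, defaultdict


def sliding_windows(events, size):
    """Return every contiguous window of `size` events as a tuple."""
    return [tuple(events[i:i + size]) for i in range(len(events) - size + 1)]


def train(sequences, window_size, mask_pos):
    """Count which event fills the masked slot for each surrounding context."""
    table = defaultdict(Counter)
    for seq in sequences:
        for window in sliding_windows(seq, window_size):
            context = window[:mask_pos] + window[mask_pos + 1:]
            table[context][window[mask_pos]] += 1
    return table


def predict(table, window, mask_pos):
    """Most frequent event seen for this context, or None if it is unseen."""
    window = tuple(window)
    context = window[:mask_pos] + window[mask_pos + 1:]
    counts = table.get(context)
    return counts.most_common(1)[0][0] if counts else None


# Hypothetical "normal" log: a repeating open/read/write/close cycle.
normal_logs = [["open", "read", "write", "close"] * 3]
table = train(normal_logs, window_size=3, mask_pos=1)

# The middle event of this window is unexpected given its neighbours.
observed = ("open", "delete", "write")
expected = predict(table, observed, mask_pos=1)
is_anomalous = observed[1] != expected
```

Moving `mask_pos` to the last index of the window turns the same table into a next-event predictor, which is the edge case the abstract advises against.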
-
Tempera: Spatial Transformer Feature Pyramid Network for Cardiac MRI Segmentation
Authors:
Christoforos Galazis,
Huiyi Wu,
Zhuoyu Li,
Camille Petri,
Anil A. Bharath,
Marta Varela
Abstract:
Assessing the structure and function of the right ventricle (RV) is important in the diagnosis of several cardiac pathologies. However, it remains more challenging to segment the RV than the left ventricle (LV). In this paper, we focus on segmenting the RV in both short (SA) and long-axis (LA) cardiac MR images simultaneously. For this task, we propose a new multi-input/output architecture, hybrid 2D/3D geometric spatial TransformEr Multi-Pass fEature pyRAmid (Tempera). Our feature pyramid extends current designs by allowing not only a multi-scale feature output but multi-scale SA and LA input images as well. Tempera transfers learned features between SA and LA images via layer weight sharing and incorporates a geometric target transformer to map the predicted SA segmentation to LA space. Our model achieves an average Dice score of 0.836 and 0.798 for the SA and LA, respectively, and 26.31 mm and 31.19 mm Hausdorff distances. This opens up the potential for the incorporation of RV segmentation models into clinical workflows.
Submitted 1 March, 2022;
originally announced March 2022.
-
Pinpointing Anomaly Events in Logs from Stability Testing -- N-Grams vs. Deep-Learning
Authors:
Mika Mäntylä,
Martín Varela,
Shayan Hashemi
Abstract:
As stability testing execution logs can be very long, software engineers need help in locating anomalous events. We develop and evaluate two models for scoring individual log-events for anomalousness, namely an N-Gram model and a Deep Learning model with LSTM (Long short-term memory). Both are trained on normal log sequences only. We evaluate the models with long log sequences of Android stability testing in our company case and with short log sequences from HDFS (Hadoop Distributed File System) public dataset. We evaluate next event prediction accuracy and computational efficiency. The LSTM model is more accurate in stability testing logs (0.848 vs 0.865), whereas in HDFS logs the N-Gram is slightly more accurate (0.904 vs 0.900). The N-Gram model has far superior computational efficiency compared to the Deep model (4 to 13 seconds vs 16 minutes to nearly 4 hours), making it the preferred choice for our case company. Scoring individual log events for anomalousness seems like a good aid for root cause analysis of failing test cases, and our case company plans to add it to its online services. Despite the recent surge in using deep learning in software system anomaly detection, we found limited benefits in doing so. However, future work should consider whether our finding holds with different LSTM-model hyper-parameters, other datasets, and with other deep-learning approaches that promise better accuracy and computational efficiency than LSTM based models.
Submitted 23 February, 2022; v1 submitted 18 February, 2022;
originally announced February 2022.
-
Deep Learning methods for automatic evaluation of delayed enhancement-MRI. The results of the EMIDEC challenge
Authors:
Alain Lalande,
Zhihao Chen,
Thibaut Pommier,
Thomas Decourselle,
Abdul Qayyum,
Michel Salomon,
Dominique Ginhac,
Youssef Skandarani,
Arnaud Boucher,
Khawla Brahim,
Marleen de Bruijne,
Robin Camarasa,
Teresa M. Correia,
Xue Feng,
Kibrom B. Girum,
Anja Hennemuth,
Markus Huellebrand,
Raabid Hussain,
Matthias Ivantsits,
Jun Ma,
Craig Meyer,
Rishabh Sharma,
Jixi Shi,
Nikolaos V. Tsekos,
Marta Varela
, et al. (8 additional authors not shown)
Abstract:
A key factor for assessing the state of the heart after myocardial infarction (MI) is to measure whether the myocardium segment is viable after reperfusion or revascularization therapy. Delayed enhancement-MRI or DE-MRI, which is performed several minutes after injection of the contrast agent, provides high contrast between viable and nonviable myocardium and is therefore a method of choice to evaluate the extent of MI. To automatically assess myocardial status, the results of the EMIDEC challenge that focused on this task are presented in this paper. The challenge's main objectives were twofold. First, to evaluate if deep learning methods can distinguish between normal and pathological cases. Second, to automatically calculate the extent of myocardial infarction. The publicly available database consists of 150 exams divided into 50 cases with normal MRI after injection of a contrast agent and 100 cases with myocardial infarction (and then with a hyperenhanced area on DE-MRI), whatever their inclusion in the cardiac emergency department. Along with MRI, clinical characteristics are also provided. The obtained results issued from several works show that the automatic classification of an exam is a reachable task (the best method providing an accuracy of 0.92), and the automatic segmentation of the myocardium is possible. However, the segmentation of the diseased area needs to be improved, mainly due to the small size of these areas and the lack of contrast with the surrounding structures.
Submitted 10 August, 2021; v1 submitted 9 August, 2021;
originally announced August 2021.
-
From QoS Distributions to QoE Distributions: a System's Perspective
Authors:
Tobias Hossfeld,
Poul E. Heegaard,
Martin Varela,
Lea Skorin-Kapov,
Markus Fiedler
Abstract:
In the context of QoE management, network and service providers commonly rely on models that map system QoS conditions (e.g., system response time, packet loss, etc.) to estimated end user QoE values. Observable QoS conditions in the system may be assumed to follow a certain distribution, meaning that different end users will experience different conditions. On the other hand, drawing from the results of subjective user studies, we know that user diversity leads to distributions of user scores for any given test conditions (in this case referring to the QoS parameters of interest). Our previous studies have shown that to correctly derive various QoE metrics (e.g., Mean Opinion Score (MOS), quantiles, probability of users rating "good or better", etc.) in a system under given conditions, there is a need to consider rating distributions obtained from user studies, which are oftentimes not available. In this paper we extend these findings to show how to approximate user rating distributions given a QoS-to-MOS mapping function and second order statistics. Such a user rating distribution may then be combined with a QoS distribution observed in a system to finally derive corresponding distributions of QoE scores. We provide two examples to illustrate this process: 1) analytical results using a Web QoE model relating waiting times to QoE, and 2) numerical results using measurements relating packet losses to video stall pattern, which are in turn mapped to QoE estimates. The results in this paper provide a solution to the problem of understanding the QoE distribution in a system, in cases where the necessary data is not directly available in the form of models going beyond the MOS, or where the full details of subjective experiments are not available.
Submitted 28 March, 2020;
originally announced March 2020.
-
Confidence Interval Estimators for MOS Values
Authors:
Tobias Hossfeld,
Poul E. Heegaard,
Martin Varela,
Lea Skorin-Kapov
Abstract:
For the quantification of QoE, subjects often provide individual rating scores on certain rating scales which are then aggregated into Mean Opinion Scores (MOS). From the observed sample data, the expected value is to be estimated. While the sample average only provides a point estimator, confidence intervals (CI) are an interval estimate which contains the desired expected value with a given confidence level. In subjective studies, the number of subjects performing the test is typically small, especially in lab environments. The used rating scales are bounded and often discrete like the 5-point ACR rating scale. Therefore, we review statistical approaches in the literature for their applicability in the QoE domain for MOS interval estimation (instead of having only a point estimator, which is the MOS). We provide a conservative estimator based on the SOS hypothesis and binomial distributions and compare its performance (CI width, outlier ratio of CI violating the rating scale bounds) and coverage probability with well known CI estimators. We show that the provided CI estimator works very well in practice for MOS interval estimators, while the commonly used studentized CIs suffer from a positive outlier ratio, i.e., CIs beyond the bounds of the rating scale. As an alternative, bootstrapping, i.e., random sampling of the subjective ratings with replacement, is an efficient CI estimator leading to typically smaller CIs, but lower coverage than the proposed estimator.
Submitted 4 June, 2018;
originally announced June 2018.
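The bootstrap alternative mentioned in the abstract above, resampling the subjective ratings with replacement, can be sketched as a percentile bootstrap interval for the MOS. This is a minimal illustration under stated assumptions, not the estimator proposed in the paper: the function name, the resample count, the fixed seed and the example ratings are all hypothetical.

```python
import random
import statistics


def bootstrap_mos_ci(ratings, confidence=0.95, n_resamples=2000, seed=0):
    """Percentile bootstrap confidence interval for the mean opinion score."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(ratings)
    # Resample the ratings with replacement and record each resample's mean.
    means = sorted(
        statistics.fmean(rng.choices(ratings, k=n)) for _ in range(n_resamples)
    )
    alpha = 1.0 - confidence
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[int((1 - alpha / 2) * n_resamples) - 1]
    return statistics.fmean(ratings), lower, upper


# Hypothetical ratings from 12 subjects on a 5-point ACR scale.
ratings = [4, 5, 3, 4, 4, 2, 5, 4, 3, 4, 5, 4]
mos, lower, upper = bootstrap_mos_ci(ratings)
print(f"MOS = {mos:.2f}, 95% CI = [{lower:.2f}, {upper:.2f}]")
```

Because every resampled mean is an average of observed ratings, the interval endpoints cannot fall outside the rating scale, unlike the studentized intervals criticised in the abstract.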
-
Formal Definition of QoE Metrics
Authors:
Tobias Hossfeld,
Poul E. Heegaard,
Martin Varela,
Sebastian Möller
Abstract:
This technical report formally defines the QoE metrics which are introduced and discussed in the article "QoE Beyond the MOS: An In-Depth Look at QoE via Better Metrics and their Relation to MOS" by Tobias Hoßfeld, Poul E. Heegaard, Martin Varela, Sebastian Möller, accepted for publication in the Springer journal "Quality and User Experience". Matlab scripts for computing the QoE metrics for given data sets are available in GitHub.
Submitted 1 July, 2016;
originally announced July 2016.
-
Single-sided Real-time PESQ Score Estimation
Authors:
Sebastián Basterrech,
Gerardo Rubino,
Martín Varela
Abstract:
For several years now, the ITU-T's Perceptual Evaluation of Speech Quality (PESQ) has been the reference for objective speech quality assessment. It is widely deployed in commercial QoE measurement products, and it has been well studied in the literature. While PESQ does provide reasonably good correlation with subjective scores for VoIP applications, the algorithm itself is not usable in a real-time context, since it requires a reference signal, which is usually not available in normal conditions. In this paper we provide an alternative technique for estimating PESQ scores in a single-sided fashion, based on the Pseudo Subjective Quality Assessment (PSQA) technique.
Submitted 27 December, 2012;
originally announced December 2012.