Search | arXiv e-print repository

arXiv:2312.09387 [pdf, other]

High-Resolution Maps of Left Atrial Displacements and Strains Estimated with 3D CINE MRI and Unsupervised Neural Networks

Authors: Christoforos Galazis, Samuel Shepperd, Emma Brouwer, Sandro Queirós, Ebraham Alskaf, Mustafa Anjari, Amedeo Chiribiri, Jack Lee, Anil A. Bharath, Marta Varela

Abstract: The functional analysis of the left atrium (LA) is important for evaluating cardiac health and understanding diseases like atrial fibrillation. Cine MRI is ideally placed for the detailed 3D characterisation of LA motion and deformation, but it is lacking appropriate acquisition and analysis tools. In this paper, we present Analysis for Left Atrial Displacements and Deformations using unsupervIsed neural Networks, Aladdin, to automatically and reliably characterise regional LA deformations from high-resolution 3D Cine MRI. The tool includes: an online few-shot segmentation network (Aladdin-S), an online unsupervised image registration network (Aladdin-R), and a strain calculations pipeline tailored to the LA. We create maps of LA Displacement Vector Field (DVF) magnitude and LA principal strain values from images of 10 healthy volunteers and 8 patients with cardiovascular disease (CVD). We additionally create an atlas of these biomarkers using the data from the healthy volunteers. Aladdin is able to accurately track the LA wall across the cardiac cycle and characterize its motion and deformation. The overall DVF magnitude and principal strain values are significantly higher in the healthy group vs CVD patients: $2.85 \pm 1.59$ mm and $0.09 \pm 0.05$ vs $1.96 \pm 0.74$ mm and $0.03 \pm 0.04$, respectively. The time course of these metrics is also different in the two groups, with a more marked active contraction phase observed in the healthy cohort.
Finally, utilizing the LA atlas allows us to identify regional deviations from the population distribution that may indicate focal tissue abnormalities. The proposed tool for the quantification of novel regional LA deformation biomarkers should have important clinical applications. The source code, anonymized images, generated maps and atlas are publicly available: https://github.com/cgalaz01/aladdin_cmr_la.

Submitted 14 December, 2023; originally announced December 2023.

arXiv:2309.02179 [pdf, other]

High-resolution 3D Maps of Left Atrial Displacements using an Unsupervised Image Registration Neural Network

Authors: Christoforos Galazis, Anil Anthony Bharath, Marta Varela

Abstract: Functional analysis of the left atrium (LA) plays an increasingly important role in the prognosis and diagnosis of cardiovascular diseases. Echocardiography-based measurements of LA dimensions and strains are useful biomarkers, but they provide an incomplete picture of atrial deformations. High-resolution dynamic magnetic resonance images (Cine MRI) offer the opportunity to examine LA motion and deformation in 3D, at higher spatial resolution and with full LA coverage. However, there are no dedicated tools to automatically characterise LA motion in 3D. Thus, we propose a tool that automatically segments the LA and extracts the displacement fields across the cardiac cycle. The pipeline is able to accurately track the LA wall across the cardiac cycle with an average Hausdorff distance of $2.51 \pm 1.3$ mm and Dice score of $0.96 \pm 0.02$.

Submitted 5 September, 2023; originally announced September 2023.

Journal ref: Medical Imaging with Deep Learning, short paper track, 2023

arXiv:2210.07934 [pdf, ps, other]

Codes, Patterns and Shapes of Contemporary Online Antisemitism and Conspiracy Narratives -- an Annotation Guide and Labeled German-Language Dataset in the Context of COVID-19

Authors: Elisabeth Steffen, Helena Mihaljević, Milena Pustet, Nyco Bischoff, María do Mar Castro Varela, Yener Bayramoğlu, Bahar Oghalai

Abstract: Over the course of the COVID-19 pandemic, existing conspiracy theories were refreshed and new ones were created, often interwoven with antisemitic narratives, stereotypes and codes. The sheer volume of antisemitic and conspiracy theory content on the Internet makes data-driven algorithmic approaches essential for anti-discrimination organizations and researchers alike. However, the manifestation and dissemination of these two interrelated phenomena are still quite under-researched in scholarly empirical research of large text corpora. Algorithmic approaches for the detection and classification of specific contents usually require labeled datasets, annotated based on conceptually sound guidelines. While there is a growing number of datasets for the more general phenomenon of hate speech, the development of corpora and annotation guidelines for antisemitic and conspiracy content is still in its infancy, especially for languages other than English. We contribute to closing this gap by developing an annotation guide for antisemitic and conspiracy theory online content in the context of the COVID-19 pandemic. We provide working definitions, including specific forms of antisemitism such as encoded and post-Holocaust antisemitism. We use these to annotate a German-language dataset consisting of ~3,700 Telegram messages sent between 03/2020 and 12/2021.

Submitted 13 October, 2022; originally announced October 2022.

Comments: Link to the data sheet of the dataset: https://doi.org/10.5281/zenodo.6412114

arXiv:2208.01991 [pdf, other]

How to Configure Masked Event Anomaly Detection on Software Logs?

Authors: Jesse Nyyssölä, Mika Mäntylä, Martín Varela

Abstract: Software Log anomaly event detection with masked event prediction has various technical approaches with countless configurations and parameters. Our objective is to provide a baseline of settings for similar studies in the future. The models we use are the N-Gram model, which is a classic approach in the field of natural language processing (NLP), and two deep learning (DL) models: long short-term memory (LSTM) and convolutional neural network (CNN). We used four datasets: Profilence, BlueGene/L (BGL), Hadoop Distributed File System (HDFS) and Hadoop. Other settings are the size of the sliding window, which determines how many surrounding events we are using to predict a given event, mask position (the position within the window we are predicting), the usage of only unique sequences, and the portion of data that is used for training. The results show clear indications of settings that can be generalized across datasets. The performance of the DL models does not deteriorate as the window size increases while the N-Gram model shows worse performance with large window sizes on the BGL and Profilence datasets. Despite the popularity of Next Event Prediction, the results show that in this context it is better not to predict events at the edges of the subsequence, i.e., first or last event, with the best result coming from predicting the fourth event when the window size is five. Regarding the amount of data used for training, the results show differences across datasets and models.
For example, the N-Gram model appears to be more sensitive toward the lack of data than the DL models. Overall, for similar experimental setups we suggest the following general baseline: window size 10, mask position second to last, do not filter out non-unique sequences, and use half of the total data for training.

Submitted 3 August, 2022; originally announced August 2022.

Comments: Accepted to the New Ideas and Emerging Results (NIER) track of the 38th IEEE International Conference on Software Maintenance and Evolution (ICSME)
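
Illustrative note: the two settings named in the abstract above, window size and mask position, can be made concrete with a small sketch. This is a simplified count-based stand-in written for this listing, not the paper's N-Gram, LSTM or CNN models; the function names `train_masked_counts` and `predict_masked` and the toy event names are hypothetical.

```python
from collections import Counter, defaultdict


def sliding_windows(events, size):
    """Return every contiguous window of `size` events as a tuple."""
    return [tuple(events[i:i + size]) for i in range(len(events) - size + 1)]


def split_window(window, mask_pos):
    """Separate a window into (surrounding context, event at the mask position)."""
    context = tuple(window[:mask_pos]) + tuple(window[mask_pos + 1:])
    return context, window[mask_pos]


def train_masked_counts(sequences, size, mask_pos):
    """Count which event appears at `mask_pos` for each surrounding context."""
    table = defaultdict(Counter)
    for events in sequences:
        for window in sliding_windows(events, size):
            context, target = split_window(window, mask_pos)
            table[context][target] += 1
    return table


def predict_masked(table, window, mask_pos):
    """Return the most frequent event for this context, or None if unseen."""
    context, _ = split_window(window, mask_pos)
    counts = table.get(context)
    if not counts:
        return None
    return counts.most_common(1)[0][0]


# Toy "normal" log: a repeating open/read/write/close cycle (made-up events).
normal_log = ["open", "read", "write", "close"] * 3
table = train_masked_counts([normal_log], size=4, mask_pos=2)

# The third event is masked; an observed event that differs from the
# prediction could be flagged as a candidate anomaly.
observed = ("open", "read", "delete", "close")
predicted = predict_masked(table, observed, mask_pos=2)
is_anomalous = observed[2] != predicted
```

Moving `mask_pos` to the last index turns the same sketch into next-event prediction, which is the edge case the abstract advises against in this context.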

arXiv:2203.00355 [pdf, other]

doi 10.1007/978-3-030-93722-5_29

Tempera: Spatial Transformer Feature Pyramid Network for Cardiac MRI Segmentation

Authors: Christoforos Galazis, Huiyi Wu, Zhuoyu Li, Camille Petri, Anil A. Bharath, Marta Varela

Abstract: Assessing the structure and function of the right ventricle (RV) is important in the diagnosis of several cardiac pathologies. However, it remains more challenging to segment the RV than the left ventricle (LV). In this paper, we focus on segmenting the RV in both short (SA) and long-axis (LA) cardiac MR images simultaneously. For this task, we propose a new multi-input/output architecture, hybrid 2D/3D geometric spatial TransformEr Multi-Pass fEature pyRAmid (Tempera). Our feature pyramid extends current designs by allowing not only a multi-scale feature output but multi-scale SA and LA input images as well. Tempera transfers learned features between SA and LA images via layer weight sharing and incorporates a geometric target transformer to map the predicted SA segmentation to LA space. Our model achieves an average Dice score of 0.836 and 0.798 for the SA and LA, respectively, and 26.31 mm and 31.19 mm Hausdorff distances. This opens up the potential for the incorporation of RV segmentation models into clinical workflows.

Submitted 1 March, 2022; originally announced March 2022.

Journal ref: Statistical Atlases and Computational Models of the Heart. Multi-Disease, Multi-View, and Multi-Center Right Ventricular Segmentation in Cardiac MRI Challenge. STACOM 2021. Lecture Notes in Computer Science, vol 13131

arXiv:2202.09214 [pdf, other]

Pinpointing Anomaly Events in Logs from Stability Testing -- N-Grams vs. Deep-Learning

Authors: Mika Mäntylä, Martín Varela, Shayan Hashemi

Abstract: As stability testing execution logs can be very long, software engineers need help in locating anomalous events. We develop and evaluate two models for scoring individual log-events for anomalousness, namely an N-Gram model and a Deep Learning model with LSTM (Long short-term memory). Both are trained on normal log sequences only. We evaluate the models with long log sequences of Android stability testing in our company case and with short log sequences from HDFS (Hadoop Distributed File System) public dataset. We evaluate next event prediction accuracy and computational efficiency. The LSTM model is more accurate in stability testing logs (0.848 vs 0.865), whereas in HDFS logs the N-Gram is slightly more accurate (0.904 vs 0.900). The N-Gram model has far superior computational efficiency compared to the Deep model (4 to 13 seconds vs 16 minutes to nearly 4 hours), making it the preferred choice for our case company. Scoring individual log events for anomalousness seems like a good aid for root cause analysis of failing test cases, and our case company plans to add it to its online services. Despite the recent surge in using deep learning in software system anomaly detection, we found limited benefits in doing so. However, future work should consider whether our finding holds with different LSTM-model hyper-parameters, other datasets, and with other deep-learning approaches that promise better accuracy and computational efficiency than LSTM based models.

Submitted 23 February, 2022; v1 submitted 18 February, 2022; originally announced February 2022.

Comments: Accepted to 5th Workshop on NEXt level of Test Automation (NEXTA), ICST Workshops 2022

arXiv:2108.04016 [pdf, other]

Deep Learning methods for automatic evaluation of delayed enhancement-MRI. The results of the EMIDEC challenge

Authors: Alain Lalande, Zhihao Chen, Thibaut Pommier, Thomas Decourselle, Abdul Qayyum, Michel Salomon, Dominique Ginhac, Youssef Skandarani, Arnaud Boucher, Khawla Brahim, Marleen de Bruijne, Robin Camarasa, Teresa M. Correia, Xue Feng, Kibrom B. Girum, Anja Hennemuth, Markus Huellebrand, Raabid Hussain, Matthias Ivantsits, Jun Ma, Craig Meyer, Rishabh Sharma, Jixi Shi, Nikolaos V. Tsekos, Marta Varela, et al. (8 additional authors not shown)

Abstract: A key factor for assessing the state of the heart after myocardial infarction (MI) is to measure whether the myocardium segment is viable after reperfusion or revascularization therapy. Delayed enhancement-MRI or DE-MRI, which is performed several minutes after injection of the contrast agent, provides high contrast between viable and nonviable myocardium and is therefore a method of choice to evaluate the extent of MI. To automatically assess myocardial status, the results of the EMIDEC challenge that focused on this task are presented in this paper. The challenge's main objectives were twofold. First, to evaluate if deep learning methods can distinguish between normal and pathological cases. Second, to automatically calculate the extent of myocardial infarction. The publicly available database consists of 150 exams divided into 50 cases with normal MRI after injection of a contrast agent and 100 cases with myocardial infarction (and then with a hyperenhanced area on DE-MRI), whatever their inclusion in the cardiac emergency department. Along with MRI, clinical characteristics are also provided. The obtained results issued from several works show that the automatic classification of an exam is a reachable task (the best method providing an accuracy of 0.92), and the automatic segmentation of the myocardium is possible. However, the segmentation of the diseased area needs to be improved, mainly due to the small size of these areas and the lack of contrast with the surrounding structures.

Submitted 10 August, 2021; v1 submitted 9 August, 2021; originally announced August 2021.

Comments: Submitted to Medical Image Analysis

arXiv:2003.12742 [pdf, other]

From QoS Distributions to QoE Distributions: a System's Perspective

Authors: Tobias Hossfeld, Poul E. Heegaard, Martin Varela, Lea Skorin-Kapov, Markus Fiedler

Abstract: In the context of QoE management, network and service providers commonly rely on models that map system QoS conditions (e.g., system response time, packet loss, etc.) to estimated end user QoE values. Observable QoS conditions in the system may be assumed to follow a certain distribution, meaning that different end users will experience different conditions. On the other hand, drawing from the results of subjective user studies, we know that user diversity leads to distributions of user scores for any given test conditions (in this case referring to the QoS parameters of interest). Our previous studies have shown that to correctly derive various QoE metrics (e.g., Mean Opinion Score (MOS), quantiles, probability of users rating "good or better", etc.) in a system under given conditions, there is a need to consider rating distributions obtained from user studies, which are oftentimes not available. In this paper we extend these findings to show how to approximate user rating distributions given a QoS-to-MOS mapping function and second order statistics. Such a user rating distribution may then be combined with a QoS distribution observed in a system to finally derive corresponding distributions of QoE scores. We provide two examples to illustrate this process: 1) analytical results using a Web QoE model relating waiting times to QoE, and 2) numerical results using measurements relating packet losses to video stall pattern, which are in turn mapped to QoE estimates.
The results in this paper provide a solution to the problem of understanding the QoE distribution in a system, in cases where the necessary data is not directly available in the form of models going beyond the MOS, or where the full details of subjective experiments are not available.

Submitted 28 March, 2020; originally announced March 2020.

Comments: 4th International Workshop on Quality of Experience Management (QoE Management 2020), featured by IEEE Conference on Network Softwarization (IEEE NetSoft 2020), Ghent, Belgium

arXiv:1806.01126 [pdf, ps, other]

Confidence Interval Estimators for MOS Values

Authors: Tobias Hossfeld, Poul E. Heegaard, Martin Varela, Lea Skorin-Kapov

Abstract: For the quantification of QoE, subjects often provide individual rating scores on certain rating scales which are then aggregated into Mean Opinion Scores (MOS). From the observed sample data, the expected value is to be estimated. While the sample average only provides a point estimator, confidence intervals (CI) are an interval estimate which contains the desired expected value with a given confidence level. In subjective studies, the number of subjects performing the test is typically small, especially in lab environments. The used rating scales are bounded and often discrete like the 5-point ACR rating scale. Therefore, we review statistical approaches in the literature for their applicability in the QoE domain for MOS interval estimation (instead of having only a point estimator, which is the MOS). We provide a conservative estimator based on the SOS hypothesis and binomial distributions and compare its performance (CI width, outlier ratio of CI violating the rating scale bounds) and coverage probability with well known CI estimators. We show that the provided CI estimator works very well in practice for MOS interval estimators, while the commonly used studentized CIs suffer from a positive outlier ratio, i.e., CIs beyond the bounds of the rating scale. As an alternative, bootstrapping, i.e., random sampling of the subjective ratings with replacement, is an efficient CI estimator leading to typically smaller CIs, but lower coverage than the proposed estimator.

Submitted 4 June, 2018; originally announced June 2018.
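
Illustrative note: the bootstrapping alternative mentioned in the abstract above (random sampling of the subjective ratings with replacement) can be sketched as a percentile bootstrap. This is a generic textbook variant, not necessarily the procedure evaluated in the paper, and it does not implement the paper's SOS-based estimator; the function name `bootstrap_mos_ci`, the resample count, the fixed seed and the ratings are all assumptions made for the example.

```python
import random
import statistics


def bootstrap_mos_ci(ratings, confidence=0.95, n_resamples=10_000, seed=0):
    """Percentile-bootstrap confidence interval for a Mean Opinion Score.

    Returns (mos, lower, upper). The seed is fixed only so the sketch is
    reproducible; drop it for real use.
    """
    if not ratings:
        raise ValueError("ratings must not be empty")
    rng = random.Random(seed)
    n = len(ratings)
    # Resample the ratings with replacement and record each resample's mean.
    resampled_means = sorted(
        statistics.fmean(rng.choices(ratings, k=n)) for _ in range(n_resamples)
    )
    # Read the interval bounds off the sorted bootstrap means.
    alpha = 1.0 - confidence
    lower = resampled_means[int((alpha / 2) * (n_resamples - 1))]
    upper = resampled_means[int((1 - alpha / 2) * (n_resamples - 1))]
    return statistics.fmean(ratings), lower, upper


# Hypothetical ratings from ten subjects on a 5-point ACR scale.
ratings = [5, 4, 4, 3, 5, 2, 4, 4, 3, 5]
mos, lower, upper = bootstrap_mos_ci(ratings)
print(f"MOS = {mos:.2f}, 95% CI = [{lower:.2f}, {upper:.2f}]")
```

Because every resampled mean is an average of ratings that lie on the scale, the interval bounds produced this way cannot fall outside the rating scale.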

arXiv:1607.00321 [pdf, ps, other]

doi 10.1007/s41233-016-0002-1

Formal Definition of QoE Metrics

Authors: Tobias Hossfeld, Poul E. Heegaard, Martin Varela, Sebastian Möller

Abstract: This technical report formally defines the QoE metrics which are introduced and discussed in the article "QoE Beyond the MOS: An In-Depth Look at QoE via Better Metrics and their Relation to MOS" by Tobias Hoßfeld, Poul E. Heegaard, Martin Varela, Sebastian Möller, accepted for publication in the Springer journal "Quality and User Experience". Matlab scripts for computing the QoE metrics for given data sets are available in GitHub.

Submitted 1 July, 2016; originally announced July 2016.

Journal ref: Quality and User Experience (2016) 1: 2

arXiv:1212.6350 [pdf, other]

Single-sided Real-time PESQ Score Estimation

Authors: Sebastián Basterrech, Gerardo Rubino, Martín Varela

Abstract: For several years now, the ITU-T's Perceptual Evaluation of Speech Quality (PESQ) has been the reference for objective speech quality assessment. It is widely deployed in commercial QoE measurement products, and it has been well studied in the literature. While PESQ does provide reasonably good correlation with subjective scores for VoIP applications, the algorithm itself is not usable in a real-time context, since it requires a reference signal, which is usually not available in normal conditions. In this paper we provide an alternative technique for estimating PESQ scores in a single-sided fashion, based on the Pseudo Subjective Quality Assessment (PSQA) technique.

Submitted 27 December, 2012; originally announced December 2012.

Comments: In Proceedings of Measurement of Speech, Audio and Video Quality in Networks (MESAQIN'09), Prague, Czech Republic, June 2009, pp. 94-99

MSC Class: 82C32; 62P30; 62M20

ACM Class: C.4; D.4.4; I.5.1; B.4

Showing 1–11 of 11 results for author: Varela, M