-
Learning to Taste: A Multimodal Wine Dataset
Authors:
Thoranna Bender,
Simon Moe Sørensen,
Alireza Kashani,
K. Eldjarn Hjorleifsson,
Grethe Hyldig,
Søren Hauberg,
Serge Belongie,
Frederik Warburg
Abstract:
We present WineSensed, a large multimodal wine dataset for studying the relations between visual perception, language, and flavor. The dataset encompasses 897k images of wine labels and 824k reviews of wines curated from the Vivino platform. It has over 350k unique bottlings, annotated with year, region, rating, alcohol percentage, price, and grape composition. We obtained fine-grained flavor anno…
▽ More
We present WineSensed, a large multimodal wine dataset for studying the relations between visual perception, language, and flavor. The dataset encompasses 897k images of wine labels and 824k reviews of wines curated from the Vivino platform. It has over 350k unique bottlings, annotated with year, region, rating, alcohol percentage, price, and grape composition. We obtained fine-grained flavor annotations on a subset by conducting a wine-tasting experiment with 256 participants who were asked to rank wines based on their similarity in flavor, resulting in more than 5k pairwise flavor distances. We propose a low-dimensional concept embedding algorithm that combines human experience with automatic machine similarity kernels. We demonstrate that this shared concept embedding space improves upon separate embedding spaces for coarse flavor classification (alcohol percentage, country, grape, price, rating) and aligns with the intricate human perception of flavor.
△ Less
Submitted 15 January, 2024; v1 submitted 31 August, 2023;
originally announced August 2023.
-
Benchmarking the Impact of Noise on Deep Learning-based Classification of Atrial Fibrillation in 12-Lead ECG
Authors:
Theresa Bender,
Philip Gemke,
Ennio Idrobo-Avila,
Henning Dathe,
Dagmar Krefting,
Nicolai Spicher
Abstract:
Electrocardiography analysis is widely used in various clinical applications and Deep Learning models for classification tasks are currently in the focus of research. Due to their data-driven character, they bear the potential to handle signal noise efficiently, but its influence on the accuracy of these methods is still unclear. Therefore, we benchmark the influence of four types of noise on the…
▽ More
Electrocardiography analysis is widely used in various clinical applications and Deep Learning models for classification tasks are currently in the focus of research. Due to their data-driven character, they bear the potential to handle signal noise efficiently, but its influence on the accuracy of these methods is still unclear. Therefore, we benchmark the influence of four types of noise on the accuracy of a Deep Learning-based method for atrial fibrillation detection in 12-lead electrocardiograms. We use a subset of a publicly available dataset (PTBXL) and use the metadata provided by human experts regarding noise for assigning a signal quality to each electrocardiogram. Furthermore, we compute a quantitative signal-to-noise ratio for each electrocardiogram. We analyze the accuracy of the Deep Learning model with respect to both metrics and observe that the method can robustly identify atrial fibrillation, even in cases signals are labelled by human experts as being noisy on multiple leads. False positive and false negative rates are slightly worse for data being labelled as noisy. Interestingly, data annotated as showing baseline drift noise results in an accuracy very similar to data without. We conclude that the issue of processing noisy electrocardiography data can be addressed successfully by Deep Learning methods that might not need preprocessing as many conventional methods do.
△ Less
Submitted 24 March, 2023;
originally announced March 2023.
-
Analysis of a Deep Learning Model for 12-Lead ECG Classification Reveals Learned Features Similar to Diagnostic Criteria
Authors:
Theresa Bender,
Jacqueline Michelle Beinecke,
Dagmar Krefting,
Carolin Müller,
Henning Dathe,
Tim Seidler,
Nicolai Spicher,
Anne-Christin Hauschild
Abstract:
Despite their remarkable performance, deep neural networks remain unadopted in clinical practice, which is considered to be partially due to their lack in explainability. In this work, we apply attribution methods to a pre-trained deep neural network (DNN) for 12-lead electrocardiography classification to open this "black box" and understand the relationship between model prediction and learned fe…
▽ More
Despite their remarkable performance, deep neural networks remain unadopted in clinical practice, which is considered to be partially due to their lack in explainability. In this work, we apply attribution methods to a pre-trained deep neural network (DNN) for 12-lead electrocardiography classification to open this "black box" and understand the relationship between model prediction and learned features. We classify data from a public data set and the attribution methods assign a "relevance score" to each sample of the classified signals. This allows analyzing what the network learned during training, for which we propose quantitative methods: average relevance scores over a) classes, b) leads, and c) average beats. The analyses of relevance scores for atrial fibrillation (AF) and left bundle branch block (LBBB) compared to healthy controls show that their mean values a) increase with higher classification probability and correspond to false classifications when around zero, and b) correspond to clinical recommendations regarding which lead to consider. Furthermore, c) visible P-waves and concordant T-waves result in clearly negative relevance scores in AF and LBBB classification, respectively. In summary, our analysis suggests that the DNN learned features similar to cardiology textbook knowledge.
△ Less
Submitted 3 July, 2023; v1 submitted 3 November, 2022;
originally announced November 2022.
-
menoci: Lightweight Extensible Web Portal enabling FAIR Data Management for Biomedical Research Projects
Authors:
Markus Suhr,
Christoph Lehmann,
Christian Robert Bauer,
Theresa Bender,
Cornelius Knopp,
Luca Freckmann,
Björn Öst Hansen,
Christian Henke,
Georg Aschenbrandt,
Lea Kühlborn,
Sophia Rheinländer,
Linus Weber,
Bartlomiej Marzec,
Marcel Hellkamp,
Philipp Wieder,
Harald Kusch,
Ulrich Sax,
Sara Yasemin Nussbeck
Abstract:
Background: Biomedical research projects deal with data management requirements from multiple sources like funding agencies' guidelines, publisher policies, discipline best practices, and their own users' needs. We describe functional and quality requirements based on many years of experience implementing data management for the CRC 1002 and CRC 1190. A fully equipped data management software shou…
▽ More
Background: Biomedical research projects deal with data management requirements from multiple sources like funding agencies' guidelines, publisher policies, discipline best practices, and their own users' needs. We describe functional and quality requirements based on many years of experience implementing data management for the CRC 1002 and CRC 1190. A fully equipped data management software should improve documentation of experiments and materials, enable data storage and sharing according to the FAIR Guiding Principles while maximizing usability, information security, as well as software sustainability and reusability. Results: We introduce the modular web portal software menoci for data collection, experiment documentation, data publication, sharing, and preservation in biomedical research projects. Menoci modules are based on the Drupal content management system which enables lightweight deployment and setup, and creates the possibility to combine research data management with a customisable project home page or collaboration platform. Conclusions: Management of research data and digital research artefacts is transforming from individual researcher or groups best practices towards project- or organisation-wide service infrastructures. To enable and support this structural transformation process, a vital ecosystem of open source software tools is needed. Menoci is a contribution to this ecosystem of research data management tools that is specifically designed to support biomedical research projects.
△ Less
Submitted 7 February, 2020;
originally announced February 2020.