-
Verbesserung des Record Linkage für die Gesundheitsforschung in Deutschland [Improving Record Linkage for Health Research in Germany]
Authors:
Timm Intemann,
Knut Kaulke,
Dennis-Kenji Kipker,
Vanessa Lettieri,
Christoph Stallmann,
Carsten O. Schmidt,
Lars Geidel,
Martin Bialke,
Christopher Hampf,
Dana Stahl,
Martin Lablans,
Florens Rohde,
Martin Franke,
Klaus Kraywinkel,
Joachim Kieschke,
Sebastian Bartholomäus,
Anatol-Fiete Näher,
Galina Tremper,
Mohamed Lambarki,
Stefanie March,
Fabian Prasser,
Anna Christine Haber,
Johannes Drepper,
Irene Schlünder,
Toralf Kirsten
, et al. (5 additional authors not shown)
Abstract:
Record linkage means linking data from multiple sources. This approach makes it possible to answer scientific questions that cannot be addressed using single data sources due to limited variables. The potential of linked data for health research is enormous, as it can enhance prevention, treatment, and population health policies. Due to the sensitivity of health data, there are strict legal requirements to prevent potential misuse. However, these requirements also limit the use of health data for research, thereby hindering innovations in prevention and care. Also, comprehensive record linkage in Germany is often challenging due to the lack of unique personal identifiers or interoperable solutions. Rather, the need to protect data is often weighed against the importance of research aiming at healthcare enhancements: for instance, data protection officers may demand the informed consent of individual study participants for data linkage, even when this is not mandatory. Furthermore, legal frameworks may be interpreted differently on varying occasions. Given both technical and legal challenges, record linkage for health research in Germany falls behind the standards of other European countries. To ensure successful record linkage, case-specific solutions must be developed, tested, and modified as necessary before implementation. This paper discusses limitations and possibilities of various data linkage approaches tailored to different use cases in compliance with the European General Data Protection Regulation. It further describes requirements for achieving a more research-friendly approach to linking health data records in Germany. Additionally, it provides recommendations to legislators. The objective of this work is to improve record linkage for health research in Germany.
Submitted 14 December, 2023;
originally announced December 2023.
-
Privacy-Preserving Linkage of Distributed Datasets using the Personal Health Train
Authors:
Maximilian Jugl,
Sascha Welten,
Yongli Mou,
Yeliz Ucer Yediel,
Oya Deniz Beyan,
Ulrich Sax,
Toralf Kirsten
Abstract:
With the generation of personal and medical data at several locations, medical data science faces unique challenges when working on distributed datasets. Growing data protection requirements in recent years drastically limit the use of personally identifiable information. Distributed data analysis aims to provide solutions for securely working on highly sensitive data while minimizing the risk of information leaks, which would not be possible to the same degree in a centralized approach. A novel concept in this field is the Personal Health Train (PHT), which encapsulates the idea of bringing the analysis to the data, not vice versa. Data sources are represented as train stations. Trains containing analysis tasks move between stations and aggregate results. Train executions are coordinated by a central station with which data analysts can interact. Data remain at their respective stations and analysis results are only stored inside the train, providing a safe and secure environment for distributed data analysis.
Duplicate records across multiple locations can skew results in a distributed data analysis. On the other hand, merging information from several datasets referring to the same real-world entities may improve data completeness and therefore data quality. In this paper, we present an approach for record linkage on distributed datasets using the Personal Health Train. We verify this approach and evaluate its effectiveness by applying it to two datasets based on real-world data and outline its possible applications in the context of distributed data analysis tasks.
Submitted 12 September, 2023;
originally announced September 2023.
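The train-and-station idea in the abstract above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class names `Station` and `Train`, the function `run_train`, and the use of a salted SHA-256 hash over a normalized name and birth date are all assumptions made for illustration. Exact-match hashing is a deliberate simplification; it finds duplicates only when the identifying fields agree after normalization and does not tolerate typos.

```python
import hashlib
from dataclasses import dataclass, field


def pseudonym(name: str, birth_date: str, salt: str) -> str:
    """Hash normalized identifiers so that raw values never enter the train."""
    normalized = f"{name.strip().lower()}|{birth_date}"
    return hashlib.sha256((salt + normalized).encode("utf-8")).hexdigest()


@dataclass
class Train:
    """Hypothetical train: carries only pseudonyms and station names."""
    salt: str
    seen: dict = field(default_factory=dict)  # pseudonym -> set of station names

    def duplicates(self) -> dict:
        """Pseudonyms observed at more than one station."""
        return {p: sorted(s) for p, s in self.seen.items() if len(s) > 1}


@dataclass
class Station:
    """Hypothetical station: raw records stay here."""
    name: str
    records: list  # list of (person name, birth date) tuples

    def execute(self, train: Train) -> None:
        # The analysis runs locally; only hashed identifiers are stored in the train.
        for person, birth_date in self.records:
            key = pseudonym(person, birth_date, train.salt)
            train.seen.setdefault(key, set()).add(self.name)


def run_train(train: Train, stations: list) -> Train:
    """Move the train from station to station, aggregating results on the way."""
    for station in stations:
        station.execute(train)
    return train
```

After `run_train` has visited every station, `train.duplicates()` lists the pseudonyms that occur at more than one station, which is the information needed to deduplicate or merge records before a distributed analysis.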
-
Distributed Learning for Melanoma Classification using Personal Health Train
Authors:
Yongli Mou,
Sascha Welten,
Yeliz Ucer Yediel,
Toralf Kirsten,
Oya Deniz Beyan
Abstract:
Skin cancer is the most common cancer type. Usually, patients with suspicion of cancer are treated by doctors without any aided visual inspection. At this point, dermoscopy has become a suitable tool to support physicians in their decision-making. However, clinicians need years of expertise to classify possibly malignant skin lesions correctly. Therefore, research has applied image processing and analysis tools to improve the treatment process. In order to perform image analysis and train a model on dermoscopic images, data needs to be centralized. Nevertheless, data centralization often does not comply with local data protection regulations due to the sensitive nature of the data and due to the loss of sovereignty if data providers allow unlimited access to the data. One way to circumvent the privacy-related challenges of data centralization is Distributed Analytics (DA), which brings the analysis to the data instead of vice versa. This paradigm shift enables data analyses - in our case, image analysis - with data remaining inside institutional borders, i.e., the origin. In this documentation, we describe a straightforward use case including a model training for skin lesion classification based on decentralised data.
Submitted 24 March, 2021;
originally announced March 2021.
-
Data Partitioning for Parallel Entity Matching
Authors:
Toralf Kirsten,
Lars Kolb,
Michael Hartung,
Anika Groß,
Hanna Köpcke,
Erhard Rahm
Abstract:
Entity matching is an important and difficult step for integrating web data. To reduce the typically high execution time for matching, we investigate how we can perform entity matching in parallel on a distributed infrastructure. We propose different strategies to partition the input data and generate multiple match tasks that can be independently executed. One of our strategies supports both blocking to reduce the search space for matching and parallel matching to improve efficiency. Special attention is given to the number and size of data partitions as they impact the overall communication overhead and memory requirements of individual match tasks. We have developed a service-based distributed infrastructure for the parallel execution of match workflows. We evaluate our approach in detail for different match strategies for matching real-world product data of different web shops. We also consider caching of input entities and affinity-based scheduling of match tasks.
Submitted 28 June, 2010;
originally announced June 2010.
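The partitioning idea in the abstract above, splitting the input into partitions and deriving match tasks that can run independently, can be sketched as follows. This is an illustration under stated assumptions, not the paper's implementation: the function names (`size_based_partitions`, `match_tasks`, `candidate_pairs`, `blocks_by_key`) and the choice of equally sized partitions are hypothetical, and the paper's own strategies may differ in detail.

```python
from collections import defaultdict
from itertools import combinations, combinations_with_replacement, product


def size_based_partitions(entities, max_size):
    """Split the input into consecutive partitions of at most max_size entities."""
    if max_size < 1:
        raise ValueError("max_size must be positive")
    return [entities[i:i + max_size] for i in range(0, len(entities), max_size)]


def match_tasks(num_partitions):
    """One task per unordered pair of partitions, including each partition with itself."""
    return list(combinations_with_replacement(range(num_partitions), 2))


def candidate_pairs(partitions, task):
    """Entity pairs that a single match task has to compare."""
    i, j = task
    if i == j:
        # Within one partition, compare each unordered pair once.
        return list(combinations(partitions[i], 2))
    # Across two partitions, compare every cross pair.
    return list(product(partitions[i], partitions[j]))


def blocks_by_key(entities, key):
    """Blocking: group entities by a key so that only same-block pairs are compared."""
    blocks = defaultdict(list)
    for entity in entities:
        blocks[key(entity)].append(entity)
    return dict(blocks)
```

With p partitions, `match_tasks` yields p + p(p-1)/2 tasks that together cover every entity pair exactly once, so smaller partitions lower the memory needed per task but increase the number of tasks and the communication overhead. Blocking reduces the work further by restricting comparisons to entities that share a key.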
-
Reanalysis of the GALLEX solar neutrino flux and source experiments
Authors:
F. Kaether,
W. Hampel,
G. Heusser,
J. Kiko,
T. Kirsten
Abstract:
After the completion of the gallium solar neutrino experiments at the Laboratori Nazionali del Gran Sasso (GALLEX: 1991-1997; GNO: 1998-2003) we have retrospectively updated the GALLEX results with the help of new technical data that were impossible to acquire for principle reasons before the completion of the low rate measurement phase (that is, before the end of the GNO solar runs). Subsequent high rate experiments have allowed the calibration of absolute internal counter efficiencies and of an advanced pulse shape analysis for counter background discrimination. The updated overall result for GALLEX (only) is (73.4 +7.1 -7.3) SNU. This is 5.3% below the old value of (77.5 +7.5 -7.8) SNU (PLB 447 (1999) 127-133) with a substantially reduced error. A similar reduction is obtained from the reanalysis of the 51Cr neutrino source experiments of 1994/1995.
Submitted 15 January, 2010;
originally announced January 2010.
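The 5.3% figure quoted in the abstract above follows directly from the two central values. A small Python check, in which the helper name `relative_decrease_percent` is introduced here for illustration only:

```python
def relative_decrease_percent(old: float, new: float) -> float:
    """Decrease of `new` relative to `old`, in percent."""
    return (old - new) / old * 100.0


# Central values in SNU, taken from the abstract.
old_gallex = 77.5
new_gallex = 73.4

shift = relative_decrease_percent(old_gallex, new_gallex)
print(f"{shift:.1f}% below the old value")
```

The calculation uses the central values only; the asymmetric uncertainties do not enter the percentage.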
-
Complete results for five years of GNO solar neutrino observations
Authors:
GNO COLLABORATION,
M. Altmann,
M. Balata,
P. Belli,
E. Bellotti,
R. Bernabei,
E. Burkert,
C. Cattadori,
R. Cerulli,
M. Chiarini,
M. Cribier,
S. d'Angelo,
G. Del Re,
K. H. Ebert,
F. v. Feilitzsch,
N. Ferrari,
W. Hampel,
F. X. Hartmann,
E. Henrich,
G. Heusser,
F. Kaether,
J. Kiko,
T. Kirsten,
T. Lachenmaier,
J. Lanfranchi
, et al. (15 additional authors not shown)
Abstract:
We report the complete GNO solar neutrino results for the measuring periods GNO III, GNO II, and GNO I. The result for GNO III (last 15 solar runs) is [54.3 + 9.9 - 9.3 (stat.) ± 2.3 (syst.)] SNU (1 sigma) or [54.3 + 10.2 - 9.6 (incl. syst.)] SNU (1 sigma) with errors combined. The GNO experiment is now terminated after altogether 58 solar exposure runs that were performed between May 20, 1998 and April 9, 2003. The combined result for GNO (I+II+III) is [62.9 + 5.5 - 5.3 (stat.) ± 2.5 (syst.)] SNU (1 sigma) or [62.9 + 6.0 - 5.9] SNU (1 sigma) with errors combined in quadrature. Overall, gallium based solar observations at LNGS (first in GALLEX, later in GNO) lasted from May 14, 1991 through April 9, 2003. The joint result from 123 runs in GNO and GALLEX is [69.3 ± 5.5 (incl. syst.)] SNU (1 sigma). The distribution of the individual run results is consistent with the hypothesis of a neutrino flux that is constant in time. Implications from the data in particle- and astrophysics are reiterated.
Submitted 19 April, 2005;
originally announced April 2005.
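The abstract above states that the combined GNO errors were obtained by adding statistical and systematic errors in quadrature. The short Python check below reproduces the quoted combined values from the quoted components; the helper name `combine_in_quadrature` is introduced here for illustration, and applying the same rule to the GNO III figures is an assumption, since the abstract only says "with errors combined" for that period.

```python
import math


def combine_in_quadrature(stat: float, syst: float) -> float:
    """Total 1-sigma error from independent statistical and systematic parts."""
    return math.hypot(stat, syst)  # sqrt(stat**2 + syst**2)


# GNO (I+II+III): +5.5 / -5.3 (stat.), 2.5 (syst.), all in SNU.
gno_up = round(combine_in_quadrature(5.5, 2.5), 1)
gno_down = round(combine_in_quadrature(5.3, 2.5), 1)

# GNO III: +9.9 / -9.3 (stat.), 2.3 (syst.), all in SNU.
gno3_up = round(combine_in_quadrature(9.9, 2.3), 1)
gno3_down = round(combine_in_quadrature(9.3, 2.3), 1)

print(gno_up, gno_down, gno3_up, gno3_down)
```

Rounded to one decimal place, the results agree with the combined errors quoted in the abstract for both GNO (I+II+III) and GNO III.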
-
Base pair interactions and hybridization isotherms of matched and mismatched oligonucleotide probes on microarrays
Authors:
Hans Binder,
Stephan Preibisch,
Toralf Kirsten
Abstract:
The lack of specificity in microarray experiments due to non-specific hybridization raises a serious problem for the analysis of microarray data because the residual chemical background intensity is not related to the expression degree of the gene of interest. We analyzed the concentration dependence of the signal intensity of perfect match (PM) and mismatch (MM) probes in terms of a microscopic binding model that combines mean hybridization isotherms with single-base-related affinity terms. The signal intensities of the PM and MM probes and their difference are assessed with regard to their sensitivity, specificity and resolution for gene expression measures. The presented theory implies the refinement of existing algorithms of probe level analysis to correct microarray data for non-specific background intensities.
Submitted 11 May, 2005; v1 submitted 6 January, 2005;
originally announced January 2005.
-
GNO Solar Neutrino Observations: Results for GNOI
Authors:
GNO Collaboration,
M. Altmann,
M. Balata,
P. Belli,
E. Bellotti,
R. Bernabei,
E. Burkert,
C. Cattadori,
G. Cerichelli,
M. Chiarini,
M. Cribier,
S. d'Angelo,
G. Del Re,
K. H. Ebert,
F. v. Feilitzsch,
N. Ferrari,
W. Hampel,
J. Handt,
E. Henrich,
G. Heusser,
J. Kiko,
T. Kirsten,
T. Lachenmaier,
J. Lanfranchi,
M. Laubenstein
, et al. (6 additional authors not shown)
Abstract:
We report the first GNO solar neutrino results for the measuring period GNO I, solar exposure time May 20, 1998 till January 12, 2000. In the present analysis, counting results for solar runs SR1 - SR19 were used till April 4, 2000. With counting completed for all but the last 3 runs (SR17 - SR19), the GNO I result is [65.8 +10.2 -9.6 (stat.) +3.4 -3.6 (syst.)] SNU (1 sigma) or [65.8 +10.7 -10.2 (incl. syst.)] SNU (1 sigma) with errors combined. This may be compared to the result for Gallex(I-IV), which is [77.5 +7.6 -7.8 (incl. syst.)] SNU (1 sigma). A combined result from both GNO I and Gallex(I-IV) together is [74.1 +6.7 -6.8 (incl. syst.)] SNU (1 sigma).
Submitted 29 June, 2000;
originally announced June 2000.