Practices and challenges in clinical data sharing
Authors:
Fida K. Dankar
Abstract:
The debate on data access and privacy is an ongoing one. It is kept alive by the never-ending changes/upgrades in (i) the shape of the data collected (in terms of size, diversity, sensitivity and quality), (ii) the laws governing data sharing, (iii) the amount of free public data available on individuals (social media, blogs, population-based databases, etc.), as well as (iv) the available privacy-enhancing technologies. This paper identifies current directions, challenges and best practices in constructing a clinical data-sharing framework for research purposes. Specifically, we create a taxonomy for the framework, identify the design choices available within each taxon, and demonstrate these choices using current legal frameworks. The purpose is to devise best practices for the implementation of an effective, safe and transparent research access framework.
Submitted 17 March, 2023;
originally announced April 2023.
A new PCA-based utility measure for synthetic data evaluation
Authors:
F. K. Dankar, M. K. Ibrahim
Abstract:
Data synthesis is a privacy-enhancing technology aiming to produce realistic and timely data when real data is hard to obtain. The utility of synthetic data generators (SDGs) has been investigated through different utility metrics. These metrics have been found to generate conflicting conclusions, making direct comparison of SDGs surprisingly difficult. Moreover, prior research found no correlation between popular metrics, concluding that they tackle different utility dimensions. This paper aggregates four popular utility metrics (representing different utility dimensions) into one using principal component analysis and checks whether the new measure can generate synthetic data that perform well in real life. The new measure is used to compare four well-recognized SDGs.
Submitted 26 November, 2022;
originally announced December 2022.
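
The abstract of this entry describes aggregating four utility metrics into a single measure with principal component analysis. The listing does not give the exact procedure, so the following is only a minimal sketch of one common way such an aggregation could be done: standardize the metrics and project each synthetic dataset onto the first principal component. The function name `pca_composite_score`, the sign convention, and the placeholder values are assumptions made for illustration; they are not taken from the paper and are not results.

```python
import numpy as np


def pca_composite_score(metrics):
    """Collapse an (n_datasets, n_metrics) matrix into one score per row.

    Hypothetical helper: projects the standardized metrics onto their
    first principal component. This is an assumed aggregation scheme,
    not necessarily the one used by the authors.
    """
    metrics = np.asarray(metrics, dtype=float)
    # Standardize each metric so that no single scale dominates the PCA.
    centered = metrics - metrics.mean(axis=0)
    std = metrics.std(axis=0, ddof=1)
    std[std == 0] = 1.0  # guard against constant columns
    z = centered / std
    # Rows of vt are the principal axes of the standardized data.
    _, _, vt = np.linalg.svd(z, full_matrices=False)
    first_axis = vt[0]
    # The sign of a principal axis is arbitrary; orient it so that
    # larger metric values tend to give a larger composite score.
    if first_axis.sum() < 0:
        first_axis = -first_axis
    return z @ first_axis


# Placeholder input: 6 synthetic datasets x 4 utility metrics.
# These are random numbers for illustration only, not measured utilities.
rng = np.random.default_rng(0)
placeholder_metrics = rng.random((6, 4))
scores = pca_composite_score(placeholder_metrics)
print(scores)
```

Under these assumptions, a higher composite score indicates a dataset that ranks higher along the direction in which the four metrics vary the most. This orientation is only meaningful if all metrics are scaled so that larger means better.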