Data privacy protection in microscopic image analysis for material data mining
Authors:
Boyuan Ma,
Xiang Yin,
Xiaojuan Ban,
Haiyou Huang,
Neng Zhang,
Hao Wang,
Weihua Xue
Abstract:
Recent progress in material data mining has been driven by high-capacity models trained on large datasets. However, collecting experimental data is extremely costly owing to the human effort and expertise required. Material researchers are therefore often reluctant to disclose their private data, which leads to the problem of data islands and makes it difficult to collect enough data to train high-quality models. In this study, FedTransfer, a material microstructure image feature extraction algorithm based on data privacy protection, is proposed. The core contributions are as follows: 1) federated learning is introduced into the polycrystalline microstructure image segmentation task, so that data from different users can be used jointly for machine learning, breaking down data islands and improving model generalization while preserving the privacy and security of user data; 2) a data sharing strategy based on style transfer is proposed. By sharing the style information of images, which users do not need to keep confidential, the strategy reduces the performance penalty caused by differences in data distribution among users.
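The abstract does not say which aggregation rule FedTransfer uses. As a rough illustration of how federated learning can combine models without moving raw images, the sketch below assumes the common federated averaging scheme, in which each user trains locally and a server averages the resulting parameters weighted by local dataset size. The function name `federated_average` and the dictionary-of-lists weight layout are hypothetical and are not taken from the paper.

```python
from typing import Dict, List

# Hypothetical layout: parameter name -> flat list of values.
Weights = Dict[str, List[float]]


def federated_average(client_weights: List[Weights],
                      client_sizes: List[int]) -> Weights:
    """Average client parameters, weighted by each client's sample count.

    Only model parameters reach the server; the microscopic images
    themselves stay with each user.
    """
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need one sample count per client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")

    averaged: Weights = {}
    for name, reference in client_weights[0].items():
        averaged[name] = [
            sum(w[name][i] * n for w, n in zip(client_weights, client_sizes)) / total
            for i in range(len(reference))
        ]
    return averaged


# Two users with 1 and 3 local images; the larger dataset has more influence.
global_weights = federated_average(
    [{"conv": [1.0, 2.0]}, {"conv": [3.0, 4.0]}],
    [1, 3],
)
```

In a full training loop the server would send `global_weights` back to every user for the next round of local training.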
Submitted 9 November, 2021;
originally announced November 2021.
Data augmentation in microscopic images for material data mining
Authors:
Boyuan Ma,
Xiaoyan Wei,
Chuni Liu,
Xiaojuan Ban,
Haiyou Huang,
Hao Wang,
Weihua Xue,
Stephen Wu,
Mingfei Gao,
Qing Shen,
Adnan Omer Abuassba,
Haokai Shen,
Yanjing Su
Abstract:
Recent progress in material data mining has been driven by high-capacity models trained on large datasets. However, collecting experimental data (real data) is extremely costly owing to the human effort and expertise required. Here, we develop a novel transfer learning strategy to address the problem of small or insufficient data. The strategy fuses real and simulated data to augment the training data used in the data mining procedure. For the specific task of image segmentation, it generates synthetic images by fusing the physical mechanism of simulated images with the "image style" of real images. The results show that a model trained on the synthetic images plus 35% of the real images outperforms a model trained on all of the real images. Because the time required to generate synthetic data is almost negligible, the strategy can reduce the time cost of real data preparation by roughly 65%.
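The abstract does not describe how the "image style" of real images is fused with simulated images. Purely to illustrate the idea, the sketch below uses a much simpler stand-in than a learned style transfer model: it rescales a simulated grayscale image so that its mean and standard deviation match those of a real image, leaving the simulated structure in place. The function name `match_style_statistics`, the list-of-lists image layout, and the choice of intensity statistics as "style" are all assumptions made for this example.

```python
from statistics import fmean, pstdev
from typing import List

# Hypothetical layout: a grayscale image as rows of pixel intensities.
Image = List[List[float]]


def match_style_statistics(simulated: Image, real: Image,
                           eps: float = 1e-8) -> Image:
    """Shift and scale the simulated image to the real image's statistics.

    The spatial layout (e.g. grain boundaries) comes from the simulation;
    only the global brightness and contrast come from the real image.
    """
    sim_pixels = [p for row in simulated for p in row]
    real_pixels = [p for row in real for p in row]
    sim_mean, sim_std = fmean(sim_pixels), pstdev(sim_pixels)
    real_mean, real_std = fmean(real_pixels), pstdev(real_pixels)

    # eps guards against division by zero for a constant simulated image.
    scale = real_std / (sim_std + eps)
    return [[(p - sim_mean) * scale + real_mean for p in row]
            for row in simulated]


simulated = [[0.0, 1.0], [1.0, 0.0]]    # binary boundary map from a simulation
real = [[10.0, 30.0], [30.0, 10.0]]     # intensities from a real micrograph
synthetic = match_style_statistics(simulated, real)
```

The resulting `synthetic` image keeps the simulated pattern but takes on the intensity range of the real image, so its simulated boundary map can serve as a segmentation label.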
Submitted 28 October, 2019; v1 submitted 12 May, 2019;
originally announced May 2019.