Search | arXiv e-print repository

Achieving Transparency in Distributed Machine Learning with Explainable Data Collaboration

Authors: Anna Bogdanova, Akira Imakura, Tetsuya Sakurai, Tomoya Fujii, Teppei Sakamoto, Hiroyuki Abe

Abstract: Transparency of Machine Learning models used for decision support in various industries becomes essential for ensuring their ethical use. To that end, feature attribution methods such as SHAP (SHapley Additive exPlanations) are widely used to explain the predictions of black-box machine learning models to customers and developers. However, a parallel trend has been to train machine learning models in collaboration with other data holders without accessing their data. Such models, trained over horizontally or vertically partitioned data, present a challenge for explainable AI because the explaining party may have a biased view of background data or a partial view of the feature space. As a result, explanations obtained from different participants of distributed machine learning might not be consistent with one another, undermining trust in the product. This paper presents an Explainable Data Collaboration Framework based on a model-agnostic additive feature attribution algorithm (KernelSHAP) and Data Collaboration method of privacy-preserving distributed machine learning. In particular, we present three algorithms for different scenarios of explainability in Data Collaboration and verify their consistency with experiments on open-access datasets. Our results demonstrated a significant (by at least a factor of 1.75) decrease in feature attribution discrepancies among the users of distributed machine learning.

Submitted 6 December, 2022; originally announced December 2022.

Comments: Presented at PKAW 2022 (arXiv:2211.03888)

Report number: PKAW/2022/03

arXiv:2208.14611 [pdf, other]

Non-readily identifiable data collaboration analysis for multiple datasets including personal information

Authors: Akira Imakura, Tetsuya Sakurai, Yukihiko Okada, Tomoya Fujii, Teppei Sakamoto, Hiroyuki Abe

Abstract: Multi-source data fusion, in which multiple data sources are jointly analyzed to obtain improved information, has attracted considerable research attention. For the datasets of multiple medical institutions, data confidentiality and cross-institutional communication are critical. In such cases, data collaboration (DC) analysis by sharing dimensionality-reduced intermediate representations without iterative cross-institutional communications may be appropriate. Identifiability of the shared data is essential when analyzing data including personal information. In this study, the identifiability of the DC analysis is investigated. The results reveal that the shared intermediate representations are readily identifiable to the original data for supervised learning. This study then proposes a non-readily identifiable DC analysis that shares only non-readily identifiable data for multiple medical datasets including personal information. The proposed method solves identifiability concerns based on a random sample permutation, the concept of interpretable DC analysis, and usage of functions that cannot be reconstructed. In numerical experiments on medical datasets, the proposed method is non-readily identifiable while maintaining the high recognition performance of the conventional DC analysis. For a hospital dataset, the proposed method exhibits a nine percentage point improvement in recognition performance over the local analysis that uses only the local dataset.

Submitted 30 August, 2022; originally announced August 2022.

Comments: 19 pages, 3 figures, 7 tables

arXiv:1711.02977 [pdf]

Cross-National Measurement of Polarization in Political Discourse: Analyzing floor debate in the U.S. and the Japanese legislatures

Authors: Takuto Sakamoto, Hiroki Takikawa

Abstract: Political polarization in public space can seriously hamper the function and the integrity of contemporary democratic societies. In this paper, we propose a novel measure of such polarization, which, by way of simple topic modelling, quantifies differences in collective articulation of public agendas among relevant political actors. Unlike most other polarization measures, our measure allows cross-national comparison. Analyzing a large amount of speech records of legislative debate in the United States Congress and the Japanese Diet over a long period of time, we have reached two intriguing findings. First, on average, Japanese political actors are far more polarized in their issue articulation than their counterparts in the U.S., which is somewhat surprising given the recent notion of U.S. politics as highly polarized. Second, the polarization in each country shows its own temporal dynamics in response to a different set of factors. In Japan, structural factors such as the roles of the ruling party and the opposition often dominate such dynamics, whereas the U.S. legislature suffers from persistent ideological differences over particular issues between major political parties. The analysis confirms a strong influence of institutional differences on legislative debate in parliamentary democracies.

Submitted 8 November, 2017; originally announced November 2017.

Comments: To be published in the Proceedings of the 2017 IEEE International Conference on Big Data; 7 pages, 6 figures, 2 tables

arXiv:1704.06903 [pdf]

Moral Foundations of Political Discourse: Comparative Analysis of the Speech Records of the US Congress and the Japanese Diet

Authors: Hiroki Takikawa, Takuto Sakamoto

Abstract: There has been a growing body of study on the relationship between public/political discourse and its moral-emotional foundations. Most of the studies, however, have been confined to a single country's context, lacking cross-cultural perspectives. Taking a comparative perspective, we examined the emotional and moral structures of political and public discussion observed in the U.S. and Japan by employing extensive text data that cover these two countries. Specifically, we conducted dictionary-based sentiment and moral analyses of floor debate in the U.S. Congress and the Japanese Diet over a long period of time. The analyses revealed intriguing cross-national patterns in the moral-emotional framework employed in parliamentary deliberations, which cast doubt on some of the dominant arguments in the field, including, among others, J. Haidt's moral foundation hypothesis.

Submitted 23 April, 2017; originally announced April 2017.

Comments: Originally submitted to the 3rd International Conference on Computational Social Science (IC2S2), July 10-13, 2017; 4 pages

Showing 1–4 of 4 results for author: Sakamoto, T