Search | arXiv e-print repository

Multi-VFL: A Vertical Federated Learning System for Multiple Data and Label Owners

Authors: Vaikkunth Mugunthan, Pawan Goyal, Lalana Kagal

Abstract: Vertical Federated Learning (VFL) refers to the collaborative training of a model on a dataset where the features of the dataset are split among multiple data owners, while label information is owned by a single data owner. In this paper, we propose a novel method, Multi Vertical Federated Learning (Multi-VFL), to train VFL models when there are multiple data and label owners. Our approach is the… ▽ More Vertical Federated Learning (VFL) refers to the collaborative training of a model on a dataset where the features of the dataset are split among multiple data owners, while label information is owned by a single data owner. In this paper, we propose a novel method, Multi Vertical Federated Learning (Multi-VFL), to train VFL models when there are multiple data and label owners. Our approach is the first to consider the setting where $D$-data owners (across which features are distributed) and $K$-label owners (across which labels are distributed) exist. This proposed configuration allows different entities to train and learn optimal models without having to share their data. Our framework makes use of split learning and adaptive federated optimizers to solve this problem. For empirical evaluation, we run experiments on the MNIST and FashionMNIST datasets. Our results show that using adaptive optimizers for model aggregation fastens convergence and improves accuracy. △ Less

Submitted 16 June, 2021; v1 submitted 9 June, 2021; originally announced June 2021.

arXiv:2103.09876 [pdf, other]

Bias-Free FedGAN: A Federated Approach to Generate Bias-Free Datasets

Authors: Vaikkunth Mugunthan, Vignesh Gokul, Lalana Kagal, Shlomo Dubnov

Abstract: Federated Generative Adversarial Network (FedGAN) is a communication-efficient approach to train a GAN across distributed clients without clients having to share their sensitive training data. In this paper, we experimentally show that FedGAN generates biased data points under non-independent-and-identically-distributed (non-iid) settings. Also, we propose Bias-Free FedGAN, an approach to generate… ▽ More Federated Generative Adversarial Network (FedGAN) is a communication-efficient approach to train a GAN across distributed clients without clients having to share their sensitive training data. In this paper, we experimentally show that FedGAN generates biased data points under non-independent-and-identically-distributed (non-iid) settings. Also, we propose Bias-Free FedGAN, an approach to generate bias-free synthetic datasets using FedGAN. Our approach generates metadata at the aggregator using the models received from clients and retrains the federated model to achieve bias-free results for image synthesis. Bias-Free FedGAN has the same communication cost as that of FedGAN. Experimental results on image datasets (MNIST and FashionMNIST) validate our claims. △ Less

Submitted 15 April, 2021; v1 submitted 17 March, 2021; originally announced March 2021.

arXiv:2012.05463 [pdf, other]

Investigating Bias in Image Classification using Model Explanations

Authors: Schrasing Tong, Lalana Kagal

Abstract: We evaluated whether model explanations could efficiently detect bias in image classification by highlighting discriminating features, thereby removing the reliance on sensitive attributes for fairness calculations. To this end, we formulated important characteristics for bias detection and observed how explanations change as the degree of bias in models change. The paper identifies strengths and… ▽ More We evaluated whether model explanations could efficiently detect bias in image classification by highlighting discriminating features, thereby removing the reliance on sensitive attributes for fairness calculations. To this end, we formulated important characteristics for bias detection and observed how explanations change as the degree of bias in models change. The paper identifies strengths and best practices for detecting bias using explanations, as well as three main weaknesses: explanations poorly estimate the degree of bias, could potentially introduce additional bias into the analysis, and are sometimes inefficient in terms of human effort involved. △ Less

Submitted 10 December, 2020; originally announced December 2020.

arXiv:2010.11398 [pdf, other]

DPD-InfoGAN: Differentially Private Distributed InfoGAN

Authors: Vaikkunth Mugunthan, Vignesh Gokul, Lalana Kagal, Shlomo Dubnov

Abstract: Generative Adversarial Networks (GANs) are deep learning architectures capable of generating synthetic datasets. Despite producing high-quality synthetic images, the default GAN has no control over the kinds of images it generates. The Information Maximizing GAN (InfoGAN) is a variant of the default GAN that introduces feature-control variables that are automatically learned by the framework, henc… ▽ More Generative Adversarial Networks (GANs) are deep learning architectures capable of generating synthetic datasets. Despite producing high-quality synthetic images, the default GAN has no control over the kinds of images it generates. The Information Maximizing GAN (InfoGAN) is a variant of the default GAN that introduces feature-control variables that are automatically learned by the framework, hence providing greater control over the different kinds of images produced. Due to the high model complexity of InfoGAN, the generative distribution tends to be concentrated around the training data points. This is a critical problem as the models may inadvertently expose the sensitive and private information present in the dataset. To address this problem, we propose a differentially private version of InfoGAN (DP-InfoGAN). We also extend our framework to a distributed setting (DPD-InfoGAN) to allow clients to learn different attributes present in other clients' datasets in a privacy-preserving manner. In our experiments, we show that both DP-InfoGAN and DPD-InfoGAN can synthesize high-quality images with flexible control over image attributes while preserving privacy. △ Less

Submitted 22 March, 2021; v1 submitted 21 October, 2020; originally announced October 2020.

arXiv:2007.03856 [pdf, other]

BlockFLow: An Accountable and Privacy-Preserving Solution for Federated Learning

Authors: Vaikkunth Mugunthan, Ravi Rahman, Lalana Kagal

Abstract: Federated learning enables the development of a machine learning model among collaborating agents without requiring them to share their underlying data. However, malicious agents who train on random data, or worse, on datasets with the result classes inverted, can weaken the combined model. BlockFLow is an accountable federated learning system that is fully decentralized and privacy-preserving. It… ▽ More Federated learning enables the development of a machine learning model among collaborating agents without requiring them to share their underlying data. However, malicious agents who train on random data, or worse, on datasets with the result classes inverted, can weaken the combined model. BlockFLow is an accountable federated learning system that is fully decentralized and privacy-preserving. Its primary goal is to reward agents proportional to the quality of their contribution while protecting the privacy of the underlying datasets and being resilient to malicious adversaries. Specifically, BlockFLow incorporates differential privacy, introduces a novel auditing mechanism for model contribution, and uses Ethereum smart contracts to incentivize good behavior. Unlike existing auditing and accountability methods for federated learning systems, our system does not require a centralized test dataset, sharing of datasets between the agents, or one or more trusted auditors; it is fully decentralized and resilient up to a 50% collusion attack in a malicious trust model. When run on the public Ethereum blockchain, BlockFLow uses the results from the audit to reward parties with cryptocurrency based on the quality of their contribution. We evaluated BlockFLow on two datasets that offer classification tasks solvable via logistic regression models. Our results show that the resultant auditing scores reflect the quality of the honest agents' datasets. Moreover, the scores from dishonest agents are statistically lower than those from the honest agents. These results, along with the reasonable blockchain costs, demonstrate the effectiveness of BlockFLow as an accountable federated learning system. △ Less

Submitted 7 July, 2020; originally announced July 2020.

arXiv:2002.08423 [pdf, other]

doi 10.1145/3340531.3412771

PrivacyFL: A simulator for privacy-preserving and secure federated learning

Authors: Vaikkunth Mugunthan, Anton Peraire-Bueno, Lalana Kagal

Abstract: Federated learning is a technique that enables distributed clients to collaboratively learn a shared machine learning model while kee** their training data localized. This reduces data privacy risks, however, privacy concerns still exist since it is possible to leak information about the training dataset from the trained model's weights or parameters. Setting up a federated learning environment,… ▽ More Federated learning is a technique that enables distributed clients to collaboratively learn a shared machine learning model while kee** their training data localized. This reduces data privacy risks, however, privacy concerns still exist since it is possible to leak information about the training dataset from the trained model's weights or parameters. Setting up a federated learning environment, especially with security and privacy guarantees, is a time-consuming process with numerous configurations and parameters that can be manipulated. In order to help clients ensure that collaboration is feasible and to check that it improves their model accuracy, a real-world simulator for privacy-preserving and secure federated learning is required. In this paper, we introduce PrivacyFL, which is an extensible, easily configurable and scalable simulator for federated learning environments. Its key features include latency simulation, robustness to client departure, support for both centralized and decentralized learning, and configurable privacy and security mechanisms based on differential privacy and secure multiparty computation. In this paper, we motivate our research, describe the architecture of the simulator and associated protocols, and discuss its evaluation in numerous scenarios that highlight its wide range of functionality and its advantages. Our paper addresses a significant real-world problem: checking the feasibility of participating in a federated learning environment under a variety of circumstances. It also has a strong practical impact because organizations such as hospitals, banks, and research institutes, which have large amounts of sensitive data and would like to collaborate, would greatly benefit from having a system that enables them to do so in a privacy-preserving and secure manner. △ Less

Submitted 7 July, 2020; v1 submitted 19 February, 2020; originally announced February 2020.

arXiv:2001.02479 [pdf, other]

doi 10.1145/3313831.3376321

Dark Patterns after the GDPR: Scra** Consent Pop-ups and Demonstrating their Influence

Authors: Midas Nouwens, Ilaria Liccardi, Michael Veale, David Karger, Lalana Kagal

Abstract: New consent management platforms (CMPs) have been introduced to the web to conform with the EU's General Data Protection Regulation, particularly its requirements for consent when companies collect and process users' personal data. This work analyses how the most prevalent CMP designs affect people's consent choices. We scraped the designs of the five most popular CMPs on the top 10,000 websites i… ▽ More New consent management platforms (CMPs) have been introduced to the web to conform with the EU's General Data Protection Regulation, particularly its requirements for consent when companies collect and process users' personal data. This work analyses how the most prevalent CMP designs affect people's consent choices. We scraped the designs of the five most popular CMPs on the top 10,000 websites in the UK (n=680). We found that dark patterns and implied consent are ubiquitous; only 11.8% meet the minimal requirements that we set based on European law. Second, we conducted a field experiment with 40 participants to investigate how the eight most common designs affect consent choices. We found that notification style (banner or barrier) has no effect; removing the opt-out button from the first page increases consent by 22--23 percentage points; and providing more granular controls on the first page decreases consent by 8--20 percentage points. This study provides an empirical basis for the necessary regulatory action to enforce the GDPR, in particular the possibility of focusing on the centralised, third-party CMP services as an effective way to increase compliance. △ Less

Submitted 8 January, 2020; originally announced January 2020.

Comments: 13 pages, 3 figures. To appear in the Proceedings of CHI '20 CHI Conference on Human Factors in Computing Systems, April 25--30, 2020, Honolulu, HI, USA

arXiv:1806.00069 [pdf, ps, other]

Explaining Explanations: An Overview of Interpretability of Machine Learning

Authors: Leilani H. Gilpin, David Bau, Ben Z. Yuan, Ayesha Bajwa, Michael Specter, Lalana Kagal

Abstract: There has recently been a surge of work in explanatory artificial intelligence (XAI). This research area tackles the important problem that complex machines and algorithms often cannot provide insights into their behavior and thought processes. XAI allows users and parts of the internal system to be more transparent, providing explanations of their decisions in some level of detail. These explanat… ▽ More There has recently been a surge of work in explanatory artificial intelligence (XAI). This research area tackles the important problem that complex machines and algorithms often cannot provide insights into their behavior and thought processes. XAI allows users and parts of the internal system to be more transparent, providing explanations of their decisions in some level of detail. These explanations are important to ensure algorithmic fairness, identify potential bias/problems in the training data, and to ensure that the algorithms perform as expected. However, explanations produced by these systems is neither standardized nor systematically assessed. In an effort to create best practices and identify open challenges, we provide our definition of explainability and show how it can be used to classify existing literature. We discuss why current approaches to explanatory methods especially for deep neural networks are insufficient. Finally, based on our survey, we conclude with suggested future research directions for explanatory artificial intelligence. △ Less

Submitted 3 February, 2019; v1 submitted 31 May, 2018; originally announced June 2018.

Comments: The 5th IEEE International Conference on Data Science and Advanced Analytics (DSAA 2018). [Research Track]

arXiv:1804.10645 [pdf, other]

Enforceable Data Sharing Agreements Using Smart Contracts

Authors: Kevin Liu, Harsh Desai, Lalana Kagal, Murat Kantarcioglu

Abstract: As more and more data is collected for various reasons, the sharing of such data becomes paramount to increasing its value. Many applications ranging from smart cities to personalized health care require individuals and organizations to share data at an unprecedented scale. Data sharing is crucial in today's world, but due to privacy reasons, security concerns and regulation issues, the conditions… ▽ More As more and more data is collected for various reasons, the sharing of such data becomes paramount to increasing its value. Many applications ranging from smart cities to personalized health care require individuals and organizations to share data at an unprecedented scale. Data sharing is crucial in today's world, but due to privacy reasons, security concerns and regulation issues, the conditions under which the sharing occurs needs to be carefully specified. Currently, this process is done by lawyers and requires the costly signing of legal agreements. In many cases, these data sharing agreements are hard to track, manage or enforce. In this work, we propose a novel alternative for tracking, managing and especially enforcing such data sharing agreements using smart contracts and blockchain technology. We design a framework that generates smart contracts from parameters based on legal data sharing agreements. The terms in these agreements are automatically enforced by the system. Monetary punishment can be employed using secure voting by external auditors to hold the violators accountable. Our experimental evaluation shows that our proposed framework is efficient and low-cost. △ Less

Submitted 27 April, 2018; originally announced April 2018.

Comments: 8 pages

arXiv:1611.04967 [pdf, other]

Iterative Orthogonal Feature Projection for Diagnosing Bias in Black-Box Models

Authors: Julius Adebayo, Lalana Kagal

Abstract: Predictive models are increasingly deployed for the purpose of determining access to services such as credit, insurance, and employment. Despite potential gains in productivity and efficiency, several potential problems have yet to be addressed, particularly the potential for unintentional discrimination. We present an iterative procedure, based on orthogonal projection of input attributes, for en… ▽ More Predictive models are increasingly deployed for the purpose of determining access to services such as credit, insurance, and employment. Despite potential gains in productivity and efficiency, several potential problems have yet to be addressed, particularly the potential for unintentional discrimination. We present an iterative procedure, based on orthogonal projection of input attributes, for enabling interpretability of black-box predictive models. Through our iterative procedure, one can quantify the relative dependence of a black-box model on its input attributes.The relative significance of the inputs to a predictive model can then be used to assess the fairness (or discriminatory extent) of such a model. △ Less

Submitted 15 November, 2016; originally announced November 2016.

arXiv:0711.1533 [pdf, ps, other]

N3Logic: A Logical Framework For the World Wide Web

Authors: Tim Berners-Lee, Dan Connolly, Lalana Kagal, Yosi Scharf, Jim Hendler

Abstract: The Semantic Web drives towards the use of the Web for interacting with logically interconnected data. Through knowledge models such as Resource Description Framework (RDF), the Semantic Web provides a unifying representation of richly structured data. Adding logic to the Web implies the use of rules to make inferences, choose courses of action, and answer questions. This logic must be powerful… ▽ More The Semantic Web drives towards the use of the Web for interacting with logically interconnected data. Through knowledge models such as Resource Description Framework (RDF), the Semantic Web provides a unifying representation of richly structured data. Adding logic to the Web implies the use of rules to make inferences, choose courses of action, and answer questions. This logic must be powerful enough to describe complex properties of objects but not so powerful that agents can be tricked by being asked to consider a paradox. The Web has several characteristics that can lead to problems when existing logics are used, in particular, the inconsistencies that inevitably arise due to the openness of the Web, where anyone can assert anything. N3Logic is a logic that allows rules to be expressed in a Web environment. It extends RDF with syntax for nested graphs and quantified variables and with predicates for implication and accessing resources on the Web, and functions including cryptographic, string, math. The main goal of N3Logic is to be a minimal extension to the RDF data model such that the same language can be used for logic and data. In this paper, we describe N3Logic and illustrate through examples why it is an appropriate logic for the Web. △ Less

Submitted 9 November, 2007; originally announced November 2007.

Comments: To appear in Theory and Practice of Logic Programming (TPLP)

Showing 1–11 of 11 results for author: Kagal, L