Search | arXiv e-print repository

The Intelligible and Effective Graph Neural Additive Networks

Authors: Maya Bechler-Speicher, Amir Globerson, Ran Gilad-Bachrach

Abstract: Graph Neural Networks (GNNs) have emerged as the predominant approach for learning over graph-structured data. However, most GNNs operate as black-box models and require post-hoc explanations, which may not suffice in high-stakes scenarios where transparency is crucial. In this paper, we present a GNN that is interpretable by design. Our model, Graph Neural Additive Network (GNAN), is a novel extension of the interpretable class of Generalized Additive Models, and can be visualized and fully understood by humans. GNAN is designed to be fully interpretable, allowing both global and local explanations at the feature and graph levels through direct visualization of the model. These visualizations describe the exact way the model uses the relationships between the target variable, the features, and the graph. We demonstrate the intelligibility of GNANs in a series of examples on different tasks and datasets. In addition, we show that the accuracy of GNAN is on par with black-box GNNs, making it suitable for critical applications where transparency is essential, alongside high accuracy.

Submitted 28 June, 2024; v1 submitted 3 June, 2024; originally announced June 2024.

arXiv:2402.07639 [pdf, other]

Tighter Bounds on the Information Bottleneck with Application to Deep Learning

Authors: Nir Weingarten, Zohar Yakhini, Moshe Butman, Ran Gilad-Bachrach

Abstract: Deep Neural Nets (DNNs) learn latent representations induced by their downstream task, objective function, and other parameters. The quality of the learned representations impacts the DNN's generalization ability and the coherence of the emerging latent space. The Information Bottleneck (IB) provides a hypothetically optimal framework for data modeling, yet it is often intractable. Recent efforts combined DNNs with the IB by applying VAE-inspired variational methods to approximate bounds on mutual information, resulting in improved robustness to adversarial attacks. This work introduces a new and tighter variational bound for the IB, improving performance of previous IB-inspired DNNs. These advancements strengthen the case for the IB and its variational approximations as a data modeling framework, and provide a simple method to significantly enhance the adversarial robustness of classifier DNNs.

Submitted 12 February, 2024; originally announced February 2024.

Comments: 10 pages, 5 figures, code included in github repo

MSC Class: 94A08; 94A10; 94A11; 68T06; 62B04; 62B08 ACM Class: I.2; E.4; I.4; I.7
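
For readers unfamiliar with the objective named in the abstract above, the standard Information Bottleneck Lagrangian and the usual VAE-style variational bounds can be written as follows. This is a sketch of the well-known baseline formulation only; the symbols $q(y \mid z)$ (a variational decoder), $r(z)$ (a variational prior) and $\lambda$ are notational assumptions, and the paper's new, tighter bound is not reproduced here because the abstract does not state it.

```latex
% Information Bottleneck: compress X into Z while keeping information about Y.
\min_{p(z \mid x)} \; I(X;Z) \;-\; \beta\, I(Z;Y)

% Standard variational bounds used by VAE-inspired IB methods:
I(Z;Y) \;\ge\; H(Y) + \mathbb{E}_{p(x,y)}\,\mathbb{E}_{p(z \mid x)}\bigl[\log q(y \mid z)\bigr]
\qquad
I(X;Z) \;\le\; \mathbb{E}_{p(x)}\Bigl[\mathrm{KL}\bigl(p(z \mid x)\,\|\,r(z)\bigr)\Bigr]

% Substituting both bounds and dropping the constant H(Y) gives a trainable loss
% (cross-entropy plus a KL penalty), with \lambda = 1/\beta:
\mathcal{L} \;=\; \mathbb{E}_{p(x,y)}\,\mathbb{E}_{p(z \mid x)}\bigl[-\log q(y \mid z)\bigr]
\;+\; \lambda\, \mathbb{E}_{p(x)}\Bigl[\mathrm{KL}\bigl(p(z \mid x)\,\|\,r(z)\bigr)\Bigr]
```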

arXiv:2309.04332 [pdf, other]

Graph Neural Networks Use Graphs When They Shouldn't

Authors: Maya Bechler-Speicher, Ido Amos, Ran Gilad-Bachrach, Amir Globerson

Abstract: Predictions over graphs play a crucial role in various domains, including social networks and medicine. Graph Neural Networks (GNNs) have emerged as the dominant approach for learning on graph data. Although a graph-structure is provided as input to the GNN, in some cases the best solution can be obtained by ignoring it. While GNNs have the ability to ignore the graph-structure in such cases, it is not clear that they will. In this work, we show that GNNs actually tend to overfit the given graph-structure. Namely, they use it even when a better solution can be obtained by ignoring it. We analyze the implicit bias of gradient-descent learning of GNNs and prove that when the ground truth function does not use the graphs, GNNs are not guaranteed to learn a solution that ignores the graph, even with infinite data. We examine this phenomenon with respect to different graph distributions and find that regular graphs are more robust to this overfitting. We also prove that within the family of regular graphs, GNNs are guaranteed to extrapolate when learning with gradient descent. Finally, based on our empirical and theoretical findings, we demonstrate on real data how regular graphs can be leveraged to reduce graph overfitting and enhance performance.

Submitted 25 February, 2024; v1 submitted 8 September, 2023; originally announced September 2023.

arXiv:2305.12167 [pdf, other]

The Case Against Explainability

Authors: Hofit Wasserman Rozen, Niva Elkin-Koren, Ran Gilad-Bachrach

Abstract: As artificial intelligence (AI) becomes more prevalent there is a growing demand from regulators to accompany decisions made by such systems with explanations. However, a persistent gap exists between the need to execute a meaningful right to explanation vs. the ability of Machine Learning systems to deliver on such a legal requirement. The regulatory appeal towards "a right to explanation" of AI systems can be attributed to the significant role of explanations, part of the notion called reason-giving, in law. Therefore, in this work we examine reason-giving's purposes in law to analyze whether reasons provided by end-user Explainability can adequately fulfill them. We find that reason-giving's legal purposes include: (a) making a better and more just decision, (b) facilitating due-process, (c) authenticating human agency, and (d) enhancing the decision makers' authority. Using this methodology, we demonstrate end-user Explainability's inadequacy to fulfil reason-giving's role in law, given reason-giving's functions rely on its impact over a human decision maker. Thus, end-user Explainability fails, or is unsuitable, to fulfil the first, second and third legal functions. In contrast we find that end-user Explainability excels in the fourth function, a quality which raises serious risks considering recent end-user Explainability research trends, Large Language Models' capabilities, and the ability to manipulate end-users by both humans and machines. Hence, we suggest that in some cases the right to explanation of AI systems could bring more harm than good to end users. Accordingly, this study carries some important policy ramifications, as it calls upon regulators and Machine Learning practitioners to reconsider the widespread pursuit of end-user Explainability and a right to explanation of AI systems.

Submitted 20 May, 2023; originally announced May 2023.

arXiv:2207.02760 [pdf, other]

TREE-G: Decision Trees Contesting Graph Neural Networks

Authors: Maya Bechler-Speicher, Amir Globerson, Ran Gilad-Bachrach

Abstract: When dealing with tabular data, models based on decision trees are a popular choice due to their high accuracy on these data types, their ease of application, and explainability properties. However, when it comes to graph-structured data, it is not clear how to apply them effectively, in a way that incorporates the topological information with the tabular data available on the vertices of the graph. To address this challenge, we introduce TREE-G. TREE-G modifies standard decision trees, by introducing a novel split function that is specialized for graph data. Not only does this split function incorporate the node features and the topological information, but it also uses a novel pointer mechanism that allows split nodes to use information computed in previous splits. Therefore, the split function adapts to the predictive task and the graph at hand. We analyze the theoretical properties of TREE-G and demonstrate its benefits empirically on multiple graph and vertex prediction benchmarks. In these experiments, TREE-G consistently outperforms other tree-based models and often outperforms other graph-learning algorithms such as Graph Neural Networks (GNNs) and Graph Kernels, sometimes by large margins. Moreover, TREE-G models and their predictions can be explained and visualized.

Submitted 25 February, 2024; v1 submitted 6 July, 2022; originally announced July 2022.

arXiv:2206.08204 [pdf, ps, other]

Inherent Inconsistencies of Feature Importance

Authors: Nimrod Harel, Uri Obolski, Ran Gilad-Bachrach

Abstract: The rapid advancement and widespread adoption of machine learning-driven technologies have underscored the practical and ethical need for creating interpretable artificial intelligence systems. Feature importance, a method that assigns scores to the contribution of individual features on prediction outcomes, seeks to bridge this gap as a tool for enhancing human comprehension of these systems. Feature importance serves as an explanation of predictions in diverse contexts, whether by providing a global interpretation of a phenomenon across the entire dataset or by offering a localized explanation for the outcome of a specific data point. Furthermore, feature importance is being used both for explaining models and for identifying plausible causal relations in the data, independently from the model. However, it is worth noting that these various contexts have traditionally been explored in isolation, with limited theoretical foundations. This paper presents an axiomatic framework designed to establish coherent relationships among the different contexts of feature importance scores. Notably, our work unveils a surprising conclusion: when we combine the proposed properties with those previously outlined in the literature, we demonstrate the existence of an inconsistency. This inconsistency highlights that certain essential properties of feature importance scores cannot coexist harmoniously within a single framework.

Submitted 5 December, 2023; v1 submitted 16 June, 2022; originally announced June 2022.

arXiv:2112.02144 [pdf, ps, other]

PyBryt: auto-assessment and auto-grading for computational thinking

Authors: Christopher Pyles, Francois van Schalkwyk, Gerard J. Gorman, Marijan Beg, Lee Stott, Nir Levy, Ran Gilad-Bachrach

Abstract: We continuously interact with computerized systems to achieve goals and perform tasks in our personal and professional lives. Therefore, the ability to program such systems is a skill needed by everyone. Consequently, computational thinking skills are essential for everyone, which creates a challenge for the educational system to teach these skills at scale and allow students to practice these skills. To address this challenge, we present a novel approach to providing formative feedback to students on programming assignments. Our approach uses dynamic evaluation to trace intermediate results generated by students' code and compares them to the reference implementation provided by their teachers. We have implemented this method as a Python library and demonstrate its use to give students relevant feedback on their work while allowing teachers to challenge their students' computational thinking skills.

Submitted 3 December, 2021; originally announced December 2021.

arXiv:2110.11819 [pdf, other]

A Last Switch Dependent Analysis of Satiation and Seasonality in Bandits

Authors: Pierre Laforgue, Giulia Clerici, Nicolò Cesa-Bianchi, Ran Gilad-Bachrach

Abstract: Motivated by the fact that humans like some level of unpredictability or novelty, and might therefore get quickly bored when interacting with a stationary policy, we introduce a novel non-stationary bandit problem, where the expected reward of an arm is fully determined by the time elapsed since the arm last took part in a switch of actions. Our model generalizes previous notions of delay-dependent rewards, and also relaxes most assumptions on the reward function. This enables the modeling of phenomena such as progressive satiation and periodic behaviours. Building upon the Combinatorial Semi-Bandits (CSB) framework, we design an algorithm and prove a bound on its regret with respect to the optimal non-stationary policy (which is NP-hard to compute). Similarly to previous works, our regret analysis is based on defining and solving an appropriate trade-off between approximation and estimation. Preliminary experiments confirm the superiority of our algorithm over both the oracle greedy approach and a vanilla CSB solver.

Submitted 7 March, 2022; v1 submitted 22 October, 2021; originally announced October 2021.

arXiv:2103.07668 [pdf, ps, other]

Robust Model Compression Using Deep Hypotheses

Authors: Omri Armstrong, Ran Gilad-Bachrach

Abstract: Machine Learning models should ideally be compact and robust. Compactness provides efficiency and comprehensibility whereas robustness provides resilience. Both topics have been studied in recent years but in isolation. Here we present a robust model compression scheme which is independent of model types: it can compress ensembles, neural networks and other types of models into diverse types of small models. The main building block is the notion of depth derived from robust statistics. Originally, depth was introduced as a measure of the centrality of a point in a sample such that the median is the deepest point. This concept was extended to classification functions which makes it possible to define the depth of a hypothesis and the median hypothesis. Algorithms have been suggested to approximate the median but they have been limited to binary classification. In this study, we present a new algorithm, the Multiclass Empirical Median Optimization (MEMO) algorithm that finds a deep hypothesis in multi-class tasks, and prove its correctness. This leads to our Compact Robust Estimated Median Belief Optimization (CREMBO) algorithm for robust model compression. We demonstrate the success of this algorithm empirically by compressing neural networks and random forests into small decision trees, which are interpretable models, and show that they are more accurate and robust than other comparable methods. In addition, our empirical study shows that our method outperforms Knowledge Distillation on DNN to DNN compression.

Submitted 13 March, 2021; originally announced March 2021.

arXiv:2010.07910 [pdf, other]

Marginal Contribution Feature Importance -- an Axiomatic Approach for The Natural Case

Authors: Amnon Catav, Boyang Fu, Jason Ernst, Sriram Sankararaman, Ran Gilad-Bachrach

Abstract: When training a predictive model over medical data, the goal is sometimes to gain insights about a certain disease. In such cases, it is common to use feature importance as a tool to highlight significant factors contributing to that disease. As there are many existing methods for computing feature importance scores, understanding their relative merits is not trivial. Further, the diversity of scenarios in which they are used leads to different expectations from the feature importance scores. While it is common to make the distinction between local scores that focus on individual predictions and global scores that look at the contribution of a feature to the model, another important division distinguishes model scenarios, in which the goal is to understand predictions of a given model, from natural scenarios, in which the goal is to understand a phenomenon such as a disease. We develop a set of axioms that represent the properties expected from a feature importance function in the natural scenario and prove that there exists only one function that satisfies all of them, the Marginal Contribution Feature Importance (MCI). We analyze this function for its theoretical and empirical properties and compare it to other feature importance scores. While our focus is the natural scenario, we suggest that our axiomatic approach could be carried out in other scenarios too.

Submitted 15 October, 2020; originally announced October 2020.
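
The abstract above names the MCI function but does not state its formula. The sketch below assumes the definition usually attributed to the paper: the importance of a feature is its largest marginal contribution over all subsets of the other features, for a monotone evaluation function. The names `mci` and `toy_value` and the toy evaluation function are hypothetical and for illustration only; enumerating all subsets is exponential and only viable for a handful of features.

```python
from itertools import combinations


def mci(features, value):
    """Brute-force MCI sketch: max over subsets S of value(S + {f}) - value(S).

    `value` maps a frozenset of feature names to a number and is assumed to be
    monotone (adding a feature never lowers it). This definition is an
    assumption; the abstract does not spell it out.
    """
    scores = {}
    for f in features:
        others = [g for g in features if g != f]
        best = 0.0
        for size in range(len(others) + 1):
            for subset in combinations(others, size):
                s = frozenset(subset)
                best = max(best, value(s | {f}) - value(s))
        scores[f] = best
    return scores


def toy_value(subset):
    """Hypothetical evaluation function: 'a' and 'a_copy' carry the same signal."""
    signal = 1.0 if ("a" in subset or "a_copy" in subset) else 0.0
    extra = 0.5 if "b" in subset else 0.0
    return signal + extra


scores = mci(["a", "a_copy", "b"], toy_value)
```

In this toy setting the duplicated feature does not dilute the score of the original: both `a` and `a_copy` receive the full contribution of the shared signal, which is the kind of behaviour one would want when the goal is to understand the phenomenon rather than a particular model.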

arXiv:1910.12274 [pdf, other]

Algorithmic Copywriting: Automated Generation of Health-Related Advertisements to Improve their Performance

Authors: Brit Youngmann, Ran Gilad-Bachrach, Danny Karmon, Elad Yom-Tov

Abstract: Search advertising, a popular method for online marketing, has been employed to improve health by eliciting positive behavioral change. However, writing effective advertisements requires expertise and experimentation, which may not be available to health authorities wishing to elicit such changes, especially when dealing with public health crises such as epidemic outbreaks. Here we develop a framework, comprising two neural network models, that automatically generates ads. First, it employs a generator model, which creates ads from web pages. It then employs a translation model, which transcribes ads to improve performance. We trained the networks using 114K health-related ads shown on Microsoft Advertising. We measure ad performance using click-through rates (CTR). Our experiments show that the generated advertisements received approximately the same CTR as human-authored ads. The marginal contribution of the generator model was, on average, 28% lower than that of human-authored ads, while the translator model received, on average, 32% more clicks than human-authored ads. Our analysis shows that the translator model produces ads reflecting higher values of psychological attributes associated with a user action, including higher valence and arousal, and more calls to action. In contrast, levels of these attributes in ads produced by the generator model are similar to those of human-authored ads. Our results demonstrate the ability to automatically generate useful advertisements for the health domain. We believe that our work offers health authorities an improved ability to nudge people towards healthier behaviors while saving the time and cost needed to build effective advertising campaigns.

Submitted 12 July, 2020; v1 submitted 27 October, 2019; originally announced October 2019.

arXiv:1904.12268 [pdf, other]

E-Gotsky: Sequencing Content using the Zone of Proximal Development

Authors: Oded Vainas, Ori Bar-Ilan, Yossi Ben-David, Ran Gilad-Bachrach, Galit Lukin, Meitar Ronen, Roi Shillo, Daniel Sitton

Abstract: Vygotsky's notions of Zone of Proximal Development and Dynamic Assessment emphasize the importance of personalized learning that adapts to the needs and abilities of the learners and enables more efficient learning. In this work we introduce a novel adaptive learning engine called E-Gotsky that builds on these concepts to personalize the learning path within an e-learning system. E-Gotsky uses machine learning techniques to select the next content item that will challenge the student but will not be overwhelming, keeping students in their Zone of Proximal Development. To evaluate the system, we conducted an experiment where hundreds of students from several different elementary schools used our engine to learn fractions for five months. Our results show that using E-Gotsky can significantly reduce the time required to reach similar mastery. Specifically, in our experiment, it took students who were using the adaptive learning engine 17% less time to reach a similar level of mastery as those who did not. Moreover, students made greater efforts to find the correct answer rather than guessing and class teachers reported that even students with learning disabilities showed higher engagement.

Submitted 28 April, 2019; originally announced April 2019.

Comments: A short version of this paper was accepted for publication at the Educational Data Mining (EDM) 2019 conference

arXiv:1812.10659 [pdf, other]

Low Latency Privacy Preserving Inference

Authors: Alon Brutzkus, Oren Elisha, Ran Gilad-Bachrach

Abstract: When applying machine learning to sensitive data, one has to find a balance between accuracy, information security, and computational complexity. Recent studies combined Homomorphic Encryption with neural networks to make inferences while protecting against information leakage. However, these methods are limited by the width and depth of neural networks that can be used (and hence the accuracy) and exhibit high latency even for relatively simple networks. In this study we provide two solutions that address these limitations. In the first solution, we present more than 10× improvement in latency and enable inference on wider networks compared to prior attempts with the same level of security. The improved performance is achieved by novel methods to represent the data during the computation. In the second solution, we apply the method of transfer learning to provide private inference services using deep networks with latency of ~0.16 seconds. We demonstrate the efficacy of our methods on several computer vision tasks.

Submitted 6 June, 2019; v1 submitted 27 December, 2018; originally announced December 2018.

arXiv:1810.02066 [pdf, ps, other]

Turning Lemons into Peaches using Secure Computation

Authors: Stav Buchsbaum, Ran Gilad-Bachrach, Yehuda Lindell

Abstract: In many cases, assessing the quality of goods is hard. For example, when purchasing a car, it is hard to measure how polluting the car is since there are infinitely many driving conditions to be tested. Typically, these situations are considered under the umbrella of information asymmetry and, as Akerlof showed, may lead to a market of lemons. However, we argue that in many of these situations, the problem is not the missing information but the computational challenge of obtaining it. In a nutshell, if verifying the value of goods requires a large amount of computation or even infinite amounts of computation, the buyer is forced to use a finite test that samples, in some sense, the quality of the goods. However, if the seller knows the test, then the seller can over-fit the test and create goods that pass the quality test despite not having the desired quality. We show different solutions to this situation including a novel approach that uses secure computation to hide the test from the seller to prevent over-fitting.

Submitted 4 October, 2018; originally announced October 2018.

arXiv:1804.06909 [pdf, other]

Modeling and Simultaneously Removing Bias via Adversarial Neural Networks

Authors: John Moore, Joel Pfeiffer, Kai Wei, Rishabh Iyer, Denis Charles, Ran Gilad-Bachrach, Levi Boyles, Eren Manavoglu

Abstract: In real world systems, the predictions of deployed Machine Learned models affect the training data available to build subsequent models. This introduces a bias in the training data that needs to be addressed. Existing solutions attempt to resolve this problem by either casting it in the reinforcement learning framework or by quantifying the bias and re-weighting the loss functions. In this work, we develop a novel Adversarial Neural Network (ANN) model, an alternative approach which creates a representation of the data that is invariant to the bias. We take the Paid Search auction as our working example and ad display position features as the confounding features for this setting. We show the success of this approach empirically on both synthetic data and real world paid search auction data from a major search engine.

Submitted 18 April, 2018; originally announced April 2018.

arXiv:1710.10556 [pdf, ps, other]

Smooth Sensitivity Based Approach for Differentially Private Principal Component Analysis

Authors: Ran Gilad-Bachrach, Alon Gonen

Abstract: Currently known methods for this task either employ the computationally intensive exponential mechanism or require access to the covariance matrix, and therefore fail to utilize potential sparsity of the data. The problem of designing simpler and more efficient methods for this task has been raised as an open problem in \cite{kapralov2013differentially}. In this paper we address this problem by employing the output perturbation mechanism. Despite being arguably the simplest and most straightforward technique, it has been overlooked due to the large global sensitivity associated with publishing the leading eigenvector. We tackle this issue by adopting a smooth sensitivity based approach, which allows us to establish differential privacy (in a worst-case manner) and near-optimal sample complexity results under an eigengap assumption. We consider both the pure and the approximate notions of differential privacy, and demonstrate a tradeoff between privacy level and sample complexity. We conclude by suggesting how our results can be extended to related problems.

Submitted 29 February, 2020; v1 submitted 28 October, 2017; originally announced October 2017.

arXiv:1505.01866 [pdf, other]

DART: Dropouts meet Multiple Additive Regression Trees

Authors: K. V. Rashmi, Ran Gilad-Bachrach

Abstract: Multiple Additive Regression Trees (MART), an ensemble model of boosted regression trees, is known to deliver high prediction accuracy for diverse tasks, and it is widely used in practice. However, it suffers from an issue which we call over-specialization, wherein trees added at later iterations tend to impact the prediction of only a few instances, and make negligible contribution towards the remaining instances. This negatively affects the performance of the model on unseen data, and also makes the model over-sensitive to the contributions of the few, initially added trees. We show that the commonly used tool to address this issue, that of shrinkage, alleviates the problem only to a certain extent and the fundamental issue of over-specialization still remains. In this work, we explore a different approach to address the problem, that of employing dropouts, a tool that has been recently proposed in the context of learning deep neural networks. We propose a novel way of employing dropouts in MART, resulting in the DART algorithm. We evaluate DART on ranking, regression and classification tasks, using large scale, publicly available datasets, and show that DART outperforms MART in each of the tasks, with a significant margin. We also show that DART overcomes the issue of over-specialization to a considerable extent.

Submitted 7 May, 2015; originally announced May 2015.

Comments: AIStats 2015

arXiv:1412.6181 [pdf, other]

Crypto-Nets: Neural Networks over Encrypted Data

Authors: Pengtao Xie, Misha Bilenko, Tom Finley, Ran Gilad-Bachrach, Kristin Lauter, Michael Naehrig

Abstract: The problem we address is the following: how can a user employ a predictive model that is held by a third party, without compromising private information? For example, a hospital may wish to use a cloud service to predict the readmission risk of a patient. However, due to regulations, the patient's medical files cannot be revealed. The goal is to make an inference using the model, without jeopardizing the accuracy of the prediction or the privacy of the data. To achieve high accuracy, we use neural networks, which have been shown to outperform other learning models for many tasks. To achieve the privacy requirements, we use homomorphic encryption in the following protocol: the data owner encrypts the data and sends the ciphertexts to the third party to obtain a prediction from a trained model. The model operates on these ciphertexts and sends back the encrypted prediction. In this protocol, not only does the data remain private, but even the predicted values are available only to the data owner. Using homomorphic encryption and modifications to the activation functions and training algorithms of neural networks, we show that this protocol is possible and may be feasible. This method paves the way to building secure cloud-based neural network prediction services without invading users' privacy.

Submitted 24 December, 2014; v1 submitted 18 December, 2014; originally announced December 2014.

arXiv:1311.7184 [pdf, other]

Using Multiple Samples to Learn Mixture Models

Authors: Jason D Lee, Ran Gilad-Bachrach, Rich Caruana

Abstract: In the mixture models problem it is assumed that there are $K$ distributions $θ_{1},\ldots,θ_{K}$ and one gets to observe a sample from a mixture of these distributions with unknown coefficients. The goal is to associate instances with their generating distributions, or to identify the parameters of the hidden distributions. In this work we make the assumption that we have access to several samples drawn from the same $K$ underlying distributions, but with different mixing weights. As with topic modeling, having multiple samples is often a reasonable assumption. Instead of pooling the data into one sample, we prove that it is possible to use the differences between the samples to better recover the underlying structure. We present algorithms that recover the underlying structure under milder assumptions than the current state of the art when either the dimensionality or the separation is high. The methods, when applied to topic modeling, allow generalization to words not present in the training data.

Submitted 27 November, 2013; originally announced November 2013.

Comments: Published in Neural Information Processing Systems (NIPS) 2013

arXiv:1012.1370 [pdf, ps, other]

Robust Distributed Online Prediction

Authors: Ofer Dekel, Ran Gilad-Bachrach, Ohad Shamir, Lin Xiao

Abstract: The standard model of online prediction deals with serial processing of inputs by a single processor. However, in large-scale online prediction problems, where inputs arrive at a high rate, an increasingly common necessity is to distribute the computation across several processors. A non-trivial challenge is to design distributed algorithms for online prediction, which maintain good regret guarantees. In \cite{DMB}, we presented the DMB algorithm, which is a generic framework to convert any serial gradient-based online prediction algorithm into a distributed algorithm. Moreover, its regret guarantee is asymptotically optimal for smooth convex loss functions and stochastic inputs. On the flip side, it is fragile to many types of failures that are common in distributed environments. In this companion paper, we present variants of the DMB algorithm, which are resilient to many types of network failures, and tolerant to varying performance of the computing nodes.

Submitted 6 December, 2010; originally announced December 2010.

arXiv:1012.1367 [pdf, ps, other]

Optimal Distributed Online Prediction using Mini-Batches

Authors: Ofer Dekel, Ran Gilad-Bachrach, Ohad Shamir, Lin Xiao

Abstract: Online prediction methods are typically presented as serial algorithms running on a single processor. However, in the age of web-scale prediction problems, it is increasingly common to encounter situations where a single processor cannot keep up with the high rate at which inputs arrive. In this work, we present the \emph{distributed mini-batch} algorithm, a method of converting many serial gradient-based online prediction algorithms into distributed algorithms. We prove a regret bound for this method that is asymptotically optimal for smooth convex loss functions and stochastic inputs. Moreover, our analysis explicitly takes into account communication latencies between nodes in the distributed environment. We show how our method can be used to solve the closely-related distributed stochastic optimization problem, achieving an asymptotically linear speed-up over multiple processors. Finally, we demonstrate the merits of our approach on a web-scale online prediction problem.

Submitted 31 January, 2012; v1 submitted 6 December, 2010; originally announced December 2010.

Comments: Final version of paper to appear in Journal of Machine Learning Research (JMLR)

arXiv:0903.1125 [pdf, ps, other]

Efficient Human Computation

Authors: Ran Gilad-Bachrach, Aharon Bar-Hillel, Liat Ein-Dor

Abstract: Collecting large labeled data sets is a laborious and expensive task, whose scaling up requires division of the labeling workload between many teachers. When the number of classes is large, miscorrespondences between the labels given by the different teachers are likely to occur, which, in the extreme case, may reach total inconsistency. In this paper we describe how globally consistent labels can be obtained, despite the absence of teacher coordination, and discuss the possible efficiency of this process in terms of human labor. We define a notion of label efficiency, measuring the ratio between the number of globally consistent labels obtained and the number of labels provided by distributed teachers. We show that the efficiency depends critically on the ratio alpha between the number of data instances seen by a single teacher, and the number of classes. We suggest several algorithms for the distributed labeling problem, and analyze their efficiency as a function of alpha. In addition, we provide an upper bound on label efficiency for the case of completely uncoordinated teachers, and show that efficiency approaches 0 as the ratio between the number of labels each teacher provides and the number of classes drops (i.e. alpha goes to 0).

Submitted 5 March, 2009; originally announced March 2009.

Comments: 8 pages, 1 figure

Showing 1–22 of 22 results for author: Gilad-Bachrach, R