-
Minimax Excess Risk of First-Order Methods for Statistical Learning with Data-Dependent Oracles
Authors:
Kevin Scaman,
Mathieu Even,
Batiste Le Bars,
Laurent Massoulié
Abstract:
In this paper, our aim is to analyse the generalization capabilities of first-order methods for statistical learning in multiple, different yet related, scenarios including supervised learning, transfer learning, robust learning and federated learning. To do so, we provide sharp upper and lower bounds for the minimax excess risk of strongly convex and smooth statistical learning when the gradient is accessed through partial observations given by a data-dependent oracle. This novel class of oracles can query the gradient with any given data distribution, and is thus well suited to scenarios in which the training data distribution does not match the target (or test) distribution. In particular, our upper and lower bounds are proportional to the smallest mean square error achievable by gradient estimators, thus allowing us to easily derive multiple sharp bounds in the aforementioned scenarios using the extensive literature on parameter estimation.
Submitted 1 July, 2024; v1 submitted 10 July, 2023;
originally announced July 2023.
-
Improved Stability and Generalization Guarantees of the Decentralized SGD Algorithm
Authors:
Batiste Le Bars,
Aurélien Bellet,
Marc Tommasi,
Kevin Scaman,
Giovanni Neglia
Abstract:
This paper presents a new generalization error analysis for Decentralized Stochastic Gradient Descent (D-SGD) based on algorithmic stability. The obtained results overhaul a series of recent works that suggested an increased instability due to decentralization and a detrimental impact of poorly-connected communication graphs on generalization. On the contrary, we show, for convex, strongly convex and non-convex functions, that D-SGD can always recover generalization bounds analogous to those of classical SGD, suggesting that the choice of graph does not matter. We then argue that this result is coming from a worst-case analysis, and we provide a refined optimization-dependent generalization bound for general convex functions. This new bound reveals that the choice of graph can in fact improve the worst-case bound in certain regimes, and that surprisingly, a poorly-connected graph can even be beneficial for generalization.
Submitted 13 June, 2024; v1 submitted 5 June, 2023;
originally announced June 2023.
-
Black-box Acceleration of Las Vegas Algorithms and Algorithmic Reverse Jensen's Inequalities
Authors:
Kevin Scaman
Abstract:
Let $\mathcal{A}$ be a Las Vegas algorithm, i.e. an algorithm whose running time $T$ is a random variable drawn according to a certain probability distribution $p$. In 1993, Luby, Sinclair and Zuckerman [LSZ93] proved that a simple universal restart strategy can, for any probability distribution $p$, provide an algorithm executing $\mathcal{A}$ and whose expected running time is $O(\ell^\star_p\log\ell^\star_p)$, where $\ell^\star_p=\Theta\left(\inf_{q\in (0,1]}Q_p(q)/q\right)$ is the minimum expected running time achievable with full prior knowledge of the probability distribution $p$, and $Q_p(q)$ is the $q$-quantile of $p$. Moreover, the authors showed that the logarithmic term could not be removed for universal restart strategies and was, in a certain sense, optimal. In this work, we show that, quite surprisingly, the logarithmic term can be replaced by a smaller quantity, thus reducing the expected running time in practical settings of interest. More precisely, we propose a novel restart strategy that executes $\mathcal{A}$ and whose expected running time is $O\big(\inf_{q\in (0,1]}\frac{Q_p(q)}{q}\,\psi\big(\log Q_p(q),\,\log (1/q)\big)\big)$ where $\psi(a,b)=1+\min\left\{a+b,a\log^2 a,\,b\log^2 b\right\}$. This quantity is, up to a multiplicative factor, better than: 1) the universal restart strategy of [LSZ93], 2) any $q$-quantile of $p$ for $q\in(0,1]$, 3) the original algorithm, and 4) any quantity of the form $\varphi^{-1}(\mathbb{E}[\varphi(T)])$ for a large class of concave functions $\varphi$. The latter extends the recent restart strategy of [Zam22] achieving $O\left(e^{\mathbb{E}[\ln(T)]}\right)$, and can be thought of as algorithmic reverse Jensen's inequalities. Finally, we show that the behavior of $\frac{t\varphi''(t)}{\varphi'(t)}$ at infinity controls the existence of reverse Jensen's inequalities by providing a necessary and a sufficient condition for these inequalities to hold.
Submitted 10 July, 2023; v1 submitted 21 April, 2023;
originally announced April 2023.
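As an illustration of the baseline that the abstract above compares against, here is a minimal sketch of the classical universal restart schedule usually attributed to [LSZ93] (run lengths 1, 1, 2, 1, 1, 2, 4, ...). It is not the novel strategy proposed in the paper. The names `luby`, `run_with_restarts` and `attempt` are hypothetical, and the cost accounting (charging the full budget of every run, including the successful one) is a simplifying assumption.

```python
def luby(i):
    """Return the i-th term (1-indexed) of the sequence 1, 1, 2, 1, 1, 2, 4, ..."""
    k = 1
    # Find the smallest k such that 2^k - 1 >= i.
    while (1 << k) - 1 < i:
        k += 1
    if i == (1 << k) - 1:
        # Last position of a block: the run length doubles.
        return 1 << (k - 1)
    # Otherwise the block repeats the sequence from its start.
    return luby(i - (1 << (k - 1)) + 1)


def run_with_restarts(attempt, max_restarts=10_000):
    """Restart `attempt` with budgets following the schedule above.

    `attempt(budget)` is a hypothetical callback that runs the Las Vegas
    algorithm for at most `budget` time units and returns its result,
    or None if it did not finish in time.
    Returns (result, total budget spent).
    """
    total = 0
    for i in range(1, max_restarts + 1):
        budget = luby(i)
        total += budget  # simplification: charge the whole budget
        result = attempt(budget)
        if result is not None:
            return result, total
    raise RuntimeError("no run finished within max_restarts restarts")
```

For instance, `run_with_restarts(lambda budget: "done" if budget >= 4 else None)` stops at the seventh run, the first one whose budget reaches 4.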
-
Convergence beyond the over-parameterized regime using Rayleigh quotients
Authors:
David A. R. Robin,
Kevin Scaman,
Marc Lelarge
Abstract:
In this paper, we present a new strategy to prove the convergence of deep learning architectures to a zero training (or even testing) loss by gradient flow. Our analysis is centered on the notion of Rayleigh quotients in order to prove Kurdyka-Łojasiewicz inequalities for a broader set of neural network architectures and loss functions. We show that Rayleigh quotients provide a unified view for several convergence analysis techniques in the literature. Our strategy produces a proof of convergence for various examples of parametric learning. In particular, our analysis does not require the number of parameters to tend to infinity, nor the number of samples to be finite, thus extending to test loss minimization and beyond the over-parameterized regime.
Submitted 19 January, 2023;
originally announced January 2023.
-
SIFU: Sequential Informed Federated Unlearning for Efficient and Provable Client Unlearning in Federated Optimization
Authors:
Yann Fraboni,
Martin Van Waerebeke,
Kevin Scaman,
Richard Vidal,
Laetitia Kameni,
Marco Lorenzi
Abstract:
Machine Unlearning (MU) is an increasingly important topic in machine learning safety, aiming at removing the contribution of a given data point from a training procedure. Federated Unlearning (FU) consists in extending MU to unlearn a given client's contribution from a federated training routine. While several FU methods have been proposed, we currently lack a general approach providing formal unlearning guarantees to the FedAvg routine, while ensuring scalability and generalization beyond the convex assumption on the clients' loss functions. We aim at filling this gap by proposing SIFU (Sequential Informed Federated Unlearning), a new FU method applying to both convex and non-convex optimization regimes. SIFU naturally applies to FedAvg without additional computational cost for the clients and provides formal guarantees on the quality of the unlearning task. We provide a theoretical analysis of the unlearning properties of SIFU, and practically demonstrate its effectiveness as compared to a panel of unlearning methods from the state-of-the-art.
Submitted 15 March, 2024; v1 submitted 21 November, 2022;
originally announced November 2022.
-
Tight High Probability Bounds for Linear Stochastic Approximation with Fixed Stepsize
Authors:
Alain Durmus,
Eric Moulines,
Alexey Naumov,
Sergey Samsonov,
Kevin Scaman,
Hoi-To Wai
Abstract:
This paper provides a non-asymptotic analysis of linear stochastic approximation (LSA) algorithms with fixed stepsize. This family of methods arises in many machine learning tasks and is used to obtain approximate solutions of a linear system $\bar{A}\theta = \bar{b}$ for which $\bar{A}$ and $\bar{b}$ can only be accessed through random estimates $\{({\bf A}_n, {\bf b}_n): n \in \mathbb{N}^*\}$. Our analysis is based on new results regarding moments and high probability bounds for products of matrices which are shown to be tight. We derive high probability bounds on the performance of LSA under weaker conditions on the sequence $\{({\bf A}_n, {\bf b}_n): n \in \mathbb{N}^*\}$ than previous works. However, in contrast, we establish polynomial concentration bounds with order depending on the stepsize. We show that our conclusions cannot be improved without additional assumptions on the sequence of random matrices $\{{\bf A}_n: n \in \mathbb{N}^*\}$, and in particular that no Gaussian or exponential high probability bounds can hold. Finally, we pay particular attention to establishing bounds with sharp order with respect to the number of iterations and the stepsize and whose leading terms contain the covariance matrices appearing in the central limit theorems.
Submitted 2 June, 2021;
originally announced June 2021.
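To make the setting of the abstract above concrete, the following is a minimal sketch of the usual fixed-stepsize LSA recursion, $\theta_{n+1} = \theta_n - \alpha({\bf A}_{n+1}\theta_n - {\bf b}_{n+1})$, written in plain Python. The function names, the Gaussian noise model and the `noise` parameter are assumptions made for illustration only; the abstract does not specify how the random estimates are generated.

```python
import random


def lsa(sample, theta0, step, n_iters):
    """Fixed-stepsize LSA: theta <- theta - step * (A_n theta - b_n).

    `sample()` is a hypothetical callback returning one random estimate
    (A_n, b_n), with A_n a list of rows and b_n a list.
    """
    theta = list(theta0)
    d = len(theta)
    for _ in range(n_iters):
        A, b = sample()
        residual = [sum(A[i][j] * theta[j] for j in range(d)) - b[i]
                    for i in range(d)]
        theta = [theta[i] - step * residual[i] for i in range(d)]
    return theta


def make_noisy_sampler(A_bar, b_bar, noise, rng):
    """Return a sampler adding i.i.d. Gaussian noise to (A_bar, b_bar).

    This noise model is an assumption chosen for the example.
    """
    d = len(b_bar)

    def sample():
        A = [[A_bar[i][j] + noise * rng.gauss(0.0, 1.0) for j in range(d)]
             for i in range(d)]
        b = [b_bar[i] + noise * rng.gauss(0.0, 1.0) for i in range(d)]
        return A, b

    return sample
```

With `noise=0.0` the recursion reduces to a deterministic iteration that converges to the solution of the linear system for a small enough stepsize; with `noise > 0` the iterates keep fluctuating around it, which is the regime the high probability bounds describe.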
-
Lipschitz Normalization for Self-Attention Layers with Application to Graph Neural Networks
Authors:
George Dasoulas,
Kevin Scaman,
Aladin Virmaux
Abstract:
Attention-based neural networks are state of the art in a large range of applications. However, their performance tends to degrade when the number of layers increases. In this work, we show that enforcing Lipschitz continuity by normalizing the attention scores can significantly improve the performance of deep attention models. First, we show that, for deep graph attention networks (GAT), gradient explosion appears during training, leading to poor performance of gradient-based training algorithms. To address this issue, we derive a theoretical analysis of the Lipschitz continuity of attention modules and introduce LipschitzNorm, a simple and parameter-free normalization for self-attention mechanisms that enforces the model to be Lipschitz continuous. We then apply LipschitzNorm to GAT and Graph Transformers and show that their performance is substantially improved in the deep setting (10 to 30 layers). More specifically, we show that a deep GAT model with LipschitzNorm achieves state-of-the-art results for node label prediction tasks that exhibit long-range dependencies, while showing consistent improvements over its unnormalized counterparts in benchmark node classification tasks.
Submitted 13 September, 2021; v1 submitted 8 March, 2021;
originally announced March 2021.
-
Improving Hierarchical Adversarial Robustness of Deep Neural Networks
Authors:
Avery Ma,
Aladin Virmaux,
Kevin Scaman,
Juwei Lu
Abstract:
Do all adversarial examples have the same consequences? An autonomous driving system misclassifying a pedestrian as a car may induce a far more dangerous -- and even potentially lethal -- behavior than, for instance, misclassifying a car as a bus. In order to better tackle this important problem, we introduce the concept of hierarchical adversarial robustness. Given a dataset whose classes can be grouped into coarse-level labels, we define hierarchical adversarial examples as the ones leading to a misclassification at the coarse level. To improve the resistance of neural networks to hierarchical attacks, we introduce a hierarchical adversarially robust (HAR) network design that decomposes a single classification task into one coarse and multiple fine classification tasks, before being specifically trained by adversarial defense techniques. As an alternative to an end-to-end learning approach, we show that HAR significantly improves the robustness of the network against $\ell_2$ and $\ell_{\infty}$ bounded hierarchical attacks on the CIFAR-10 and CIFAR-100 datasets.
Submitted 17 February, 2021;
originally announced February 2021.
-
Ego-based Entropy Measures for Structural Representations on Graphs
Authors:
George Dasoulas,
Giannis Nikolentzos,
Kevin Scaman,
Aladin Virmaux,
Michalis Vazirgiannis
Abstract:
Machine learning on graph-structured data has attracted high research interest due to the emergence of Graph Neural Networks (GNNs). Most of the proposed GNNs are based on node homophily, i.e. neighboring nodes share similar characteristics. However, in many complex networks, nodes that lie in distant parts of the graph share structurally equivalent characteristics and exhibit similar roles (e.g. chemical properties of distant atoms in a molecule, type of social network users). A growing literature proposed representations that identify structurally equivalent nodes. However, most of the existing methods require high time and space complexity. In this paper, we propose VNEstruct, a simple approach, based on entropy measures of the neighborhood's topology, for generating low-dimensional structural representations, that is time-efficient and robust to graph perturbations. Empirically, we observe that VNEstruct exhibits robustness on structural role identification tasks. Moreover, VNEstruct can achieve state-of-the-art performance on graph classification, without incorporating the graph structure information in the optimization, in contrast to GNN competitors.
Submitted 17 February, 2021;
originally announced February 2021.
-
Ego-based Entropy Measures for Structural Representations
Authors:
George Dasoulas,
Giannis Nikolentzos,
Kevin Scaman,
Aladin Virmaux,
Michalis Vazirgiannis
Abstract:
In complex networks, nodes that share similar structural characteristics often exhibit similar roles (e.g. type of users in a social network or the hierarchical position of employees in a company). In order to leverage this relationship, a growing literature proposed latent representations that identify structurally equivalent nodes. However, most of the existing methods require high time and space complexity. In this paper, we propose VNEstruct, a simple approach for generating low-dimensional structural node embeddings, that is both time efficient and robust to perturbations of the graph structure. The proposed approach focuses on the local neighborhood of each node and employs the Von Neumann entropy, an information-theoretic tool, to extract features that capture the neighborhood's topology. Moreover, on graph classification tasks, we suggest the utilization of the generated structural embeddings for the transformation of an attributed graph structure into a set of augmented node attributes. Empirically, we observe that the proposed approach exhibits robustness on structural role identification tasks and state-of-the-art performance on graph classification tasks, while maintaining very high computational speed.
Submitted 1 March, 2020;
originally announced March 2020.
-
Coloring graph neural networks for node disambiguation
Authors:
George Dasoulas,
Ludovic Dos Santos,
Kevin Scaman,
Aladin Virmaux
Abstract:
In this paper, we show that a simple coloring scheme can improve, both theoretically and empirically, the expressive power of Message Passing Neural Networks (MPNNs). More specifically, we introduce a graph neural network called Colored Local Iterative Procedure (CLIP) that uses colors to disambiguate identical node attributes, and show that this representation is a universal approximator of continuous functions on graphs with node attributes. Our method relies on separability, a key topological characteristic that allows well-chosen neural networks to be extended into universal representations. Finally, we show experimentally that CLIP is capable of capturing structural characteristics that traditional MPNNs fail to distinguish, while being state-of-the-art on benchmark graph classification datasets.
Submitted 12 December, 2019;
originally announced December 2019.
-
Theoretical Limits of Pipeline Parallel Optimization and Application to Distributed Deep Learning
Authors:
Igor Colin,
Ludovic Dos Santos,
Kevin Scaman
Abstract:
We investigate the theoretical limits of pipeline parallel learning of deep learning architectures, a distributed setup in which the computation is distributed per layer instead of per example. For smooth convex and non-convex objective functions, we provide matching lower and upper complexity bounds and show that a naive pipeline parallelization of Nesterov's accelerated gradient descent is optimal. For non-smooth convex functions, we provide a novel algorithm coined Pipeline Parallel Random Smoothing (PPRS) that is within a $d^{1/4}$ multiplicative factor of the optimal convergence rate, where $d$ is the underlying dimension. While the method still obeys a slow $\varepsilon^{-2}$ convergence rate, the depth-dependent part is accelerated, resulting in a near-linear speed-up and convergence time that only slightly depends on the depth of the deep learning architecture. Finally, we perform an empirical analysis of the non-smooth non-convex case and show that, for difficult and highly non-smooth problems, PPRS outperforms more traditional optimization algorithms such as gradient descent and Nesterov's accelerated gradient descent for problems where the sample size is limited, such as few-shot or adversarial learning.
Submitted 11 October, 2019;
originally announced October 2019.
-
Lipschitz regularity of deep neural networks: analysis and efficient estimation
Authors:
Kevin Scaman,
Aladin Virmaux
Abstract:
Deep neural networks are notorious for being sensitive to small well-chosen perturbations, and estimating the regularity of such architectures is of utmost importance for safe and robust practical applications. In this paper, we investigate one of the key characteristics to assess the regularity of such methods: the Lipschitz constant of deep learning architectures. First, we show that, even for two-layer neural networks, the exact computation of this quantity is NP-hard and state-of-the-art methods may significantly overestimate it. Then, we both extend and improve previous estimation methods by providing AutoLip, the first generic algorithm for upper bounding the Lipschitz constant of any automatically differentiable function. We provide a power method algorithm working with automatic differentiation, allowing efficient computations even on large convolutions. Second, for sequential neural networks, we propose an improved algorithm named SeqLip that takes advantage of the linear computation graph to split the computation per pair of consecutive layers. Third, we propose heuristics on SeqLip in order to tackle very large networks. Our experiments show that SeqLip can significantly improve on the existing upper bounds. Finally, we provide an implementation of AutoLip in the PyTorch environment that may be used to better estimate the robustness of a given neural network to small perturbations or regularize it using more precise Lipschitz estimations.
Submitted 25 October, 2019; v1 submitted 28 May, 2018;
originally announced May 2018.
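The abstract above mentions a power method for bounding Lipschitz constants. As a rough illustration of the underlying idea, and not of the authors' AutoLip or SeqLip implementations, the sketch below estimates the spectral norm of each weight matrix by power iteration and multiplies the results, which gives the classical $\ell_2$ upper bound for a feed-forward network with 1-Lipschitz activations such as ReLU. The function names are hypothetical, matrices are plain lists of rows, and since power iteration approaches each norm from below, the product is only an estimate of the bound after finitely many iterations.

```python
import math
import random


def spectral_norm(W, n_iters=200, seed=0):
    """Estimate the largest singular value of W by power iteration on W^T W."""
    rng = random.Random(seed)
    rows, cols = len(W), len(W[0])
    v = [rng.gauss(0.0, 1.0) for _ in range(cols)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    sigma = 0.0
    for _ in range(n_iters):
        # u = W v; since ||v|| = 1, ||u|| is the current estimate.
        u = [sum(W[i][j] * v[j] for j in range(cols)) for i in range(rows)]
        sigma = math.sqrt(sum(x * x for x in u))
        # w = W^T u, then renormalise to get the next direction.
        w = [sum(W[i][j] * u[i] for i in range(rows)) for j in range(cols)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return 0.0
        v = [x / norm for x in w]
    return sigma


def lipschitz_upper_bound(weights):
    """Product of the layers' spectral norms.

    Assumes a sequential network whose activations are 1-Lipschitz.
    """
    bound = 1.0
    for W in weights:
        bound *= spectral_norm(W)
    return bound
```

For two layers with weights `[[3, 0], [0, 1]]` and `[[0, 2], [1, 0]]`, the spectral norms are 3 and 2, so the bound is close to 6. As the abstract notes, such product bounds can significantly overestimate the true Lipschitz constant.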
-
KONG: Kernels for ordered-neighborhood graphs
Authors:
Moez Draief,
Konstantin Kutzkov,
Kevin Scaman,
Milan Vojnovic
Abstract:
We present novel graph kernels for graphs with node and edge labels that have ordered neighborhoods, i.e. when neighbor nodes follow an order. Graphs with ordered neighborhoods are a natural data representation for evolving graphs where edges are created over time, which induces an order. Combining convolutional subgraph kernels and string kernels, we design new scalable algorithms for generation of explicit graph feature maps using sketching techniques. We obtain precise bounds for the approximation accuracy and computational complexity of the proposed approaches and demonstrate their applicability on real datasets. In particular, our experiments demonstrate that neighborhood ordering results in more informative features. For the special case of general graphs, i.e. graphs without ordered neighborhoods, the new graph kernels yield efficient and simple algorithms for the comparison of label distributions between graphs.
Submitted 29 May, 2018; v1 submitted 25 May, 2018;
originally announced May 2018.
-
A Spectral Method for Activity Shaping in Continuous-Time Information Cascades
Authors:
Kevin Scaman,
Argyris Kalogeratos,
Luca Corinzia,
Nicolas Vayatis
Abstract:
The Information Cascades Model captures dynamical properties of user activity in a social network. In this work, we develop a novel framework for activity shaping under the Continuous-Time Information Cascades Model which allows the administrator to take local control actions by allocating targeted resources that can alter the spread of the process. Our framework employs the optimization of the spectral radius of the Hazard matrix, a quantity that has been shown to drive the maximum influence in a network, while enjoying a simple convex relaxation when used to minimize the influence of the cascade. In addition, use-cases such as quarantine and node immunization are discussed to highlight the generality of the proposed activity shaping framework. Finally, we present the NetShape influence minimization method, which compares favorably to baseline and state-of-the-art approaches in simulations on real social networks.
Submitted 15 September, 2017;
originally announced September 2017.
-
What Makes a Good Plan? An Efficient Planning Approach to Control Diffusion Processes in Networks
Authors:
Kevin Scaman,
Argyris Kalogeratos,
Nicolas Vayatis
Abstract:
In this paper, we analyze the quality of a large class of simple dynamic resource allocation (DRA) strategies which we name priority planning. Their aim is to control an undesired diffusion process by distributing resources to the contagious nodes of the network according to a predefined priority-order. In our analysis, we reduce the DRA problem to the linear arrangement of the nodes of the network. Under this perspective, we shed light on the role of a fundamental characteristic of this arrangement, the maximum cutwidth, for assessing the quality of any priority planning strategy. Our theoretical analysis validates the role of the maximum cutwidth by deriving bounds for the extinction time of the diffusion process. Finally, using the results of our analysis, we propose a novel and efficient DRA strategy, called Maximum Cutwidth Minimization, that outperforms other competing strategies in our simulations.
Submitted 17 July, 2014;
originally announced July 2014.
-
Tight Bounds for Influence in Diffusion Networks and Application to Bond Percolation and Epidemiology
Authors:
Remi Lemonnier,
Kevin Scaman,
Nicolas Vayatis
Abstract:
In this paper, we derive theoretical bounds for the long-term influence of a node in an Independent Cascade Model (ICM). We relate these bounds to the spectral radius of a particular matrix and show that the behavior is sub-critical when this spectral radius is lower than $1$. More specifically, we point out that, in general networks, the sub-critical regime behaves in $O(\sqrt{n})$ where $n$ is the size of the network, and that this upper bound is met for star-shaped networks. We apply our results to epidemiology and percolation on arbitrary networks, and derive a bound for the critical value beyond which a giant connected component arises. Finally, we show empirically the tightness of our bounds for a large family of networks.
Submitted 17 July, 2014;
originally announced July 2014.