Search | arXiv e-print repository

Choice of PEFT Technique in Continual Learning: Prompt Tuning is Not All You Need

Authors: Martin Wistuba, Prabhu Teja Sivaprasad, Lukas Balles, Giovanni Zappella

Abstract: Recent Continual Learning (CL) methods have combined pretrained Transformers with prompt tuning, a parameter-efficient fine-tuning (PEFT) technique. We argue that the choice of prompt tuning in prior works was an undefended and unablated decision, which has been uncritically adopted by subsequent research, but warrants further research to understand its implications. In this paper, we conduct this… ▽ More Recent Continual Learning (CL) methods have combined pretrained Transformers with prompt tuning, a parameter-efficient fine-tuning (PEFT) technique. We argue that the choice of prompt tuning in prior works was an undefended and unablated decision, which has been uncritically adopted by subsequent research, but warrants further research to understand its implications. In this paper, we conduct this research and find that the choice of prompt tuning as a PEFT method hurts the overall performance of the CL system. To illustrate this, we replace prompt tuning with LoRA in two state-of-the-art continual learning methods: Learning to Prompt and S-Prompts. These variants consistently achieve higher accuracy across a wide range of domain-incremental and class-incremental benchmarks, while being competitive in inference speed. Our work highlights a crucial argument: unexamined choices can hinder progress in the field, and rigorous ablations, such as the PEFT method, are required to drive meaningful adoption of CL techniques in real-world applications. △ Less

Submitted 5 June, 2024; originally announced June 2024.

arXiv:2312.05021 [pdf, other]

A Negative Result on Gradient Matching for Selective Backprop

Authors: Lukas Balles, Cedric Archambeau, Giovanni Zappella

Abstract: With increasing scale in model and dataset size, the training of deep neural networks becomes a massive computational burden. One approach to speed up the training process is Selective Backprop. For this approach, we perform a forward pass to obtain a loss value for each data point in a minibatch. The backward pass is then restricted to a subset of that minibatch, prioritizing high-loss examples.… ▽ More With increasing scale in model and dataset size, the training of deep neural networks becomes a massive computational burden. One approach to speed up the training process is Selective Backprop. For this approach, we perform a forward pass to obtain a loss value for each data point in a minibatch. The backward pass is then restricted to a subset of that minibatch, prioritizing high-loss examples. We build on this approach, but seek to improve the subset selection mechanism by choosing the (weighted) subset which best matches the mean gradient over the entire minibatch. We use the gradients w.r.t. the model's last layer as a cheap proxy, resulting in virtually no overhead in addition to the forward pass. At the same time, for our experiments we add a simple random selection baseline which has been absent from prior work. Surprisingly, we find that both the loss-based as well as the gradient-matching strategy fail to consistently outperform the random baseline. △ Less

Submitted 8 December, 2023; originally announced December 2023.

Comments: Paper accepted at the ICBINB Workshop at NeurIPS 2023

arXiv:2311.17601 [pdf, ps, other]

Continual Learning with Low Rank Adaptation

Authors: Martin Wistuba, Prabhu Teja Sivaprasad, Lukas Balles, Giovanni Zappella

Abstract: Recent work using pretrained transformers has shown impressive performance when fine-tuned with data from the downstream problem of interest. However, they struggle to retain that performance when the data characteristics changes. In this paper, we focus on continual learning, where a pre-trained transformer is updated to perform well on new data, while retaining its performance on data it was pre… ▽ More Recent work using pretrained transformers has shown impressive performance when fine-tuned with data from the downstream problem of interest. However, they struggle to retain that performance when the data characteristics changes. In this paper, we focus on continual learning, where a pre-trained transformer is updated to perform well on new data, while retaining its performance on data it was previously trained on. Earlier works have tackled this primarily through methods inspired from prompt tuning. We question this choice, and investigate the applicability of Low Rank Adaptation (LoRA) to continual learning. On a range of domain-incremental learning benchmarks, our LoRA-based solution, CoLoR, yields state-of-the-art performance, while still being as parameter efficient as the prompt tuning based methods. △ Less

Submitted 29 November, 2023; originally announced November 2023.

Comments: Accepted at Workshop on Distribution Shifts (DistShift), NeurIPS 2023

arXiv:2304.12067 [pdf, other]

Renate: A Library for Real-World Continual Learning

Authors: Martin Wistuba, Martin Ferianc, Lukas Balles, Cedric Archambeau, Giovanni Zappella

Abstract: Continual learning enables the incremental training of machine learning models on non-stationary data streams.While academic interest in the topic is high, there is little indication of the use of state-of-the-art continual learning algorithms in practical machine learning deployment. This paper presents Renate, a continual learning library designed to build real-world updating pipelines for PyTor… ▽ More Continual learning enables the incremental training of machine learning models on non-stationary data streams.While academic interest in the topic is high, there is little indication of the use of state-of-the-art continual learning algorithms in practical machine learning deployment. This paper presents Renate, a continual learning library designed to build real-world updating pipelines for PyTorch models. We discuss requirements for the use of continual learning algorithms in practice, from which we derive design principles for Renate. We give a high-level description of the library components and interfaces. Finally, we showcase the strengths of the library by presenting experimental results. Renate may be found at https://github.com/awslabs/renate. △ Less

Submitted 24 April, 2023; originally announced April 2023.

Comments: Paper accepted at the CLVision workshop at CVPR 2023

arXiv:2207.06940 [pdf, other]

PASHA: Efficient HPO and NAS with Progressive Resource Allocation

Authors: Ondrej Bohdal, Lukas Balles, Martin Wistuba, Beyza Ermis, Cédric Archambeau, Giovanni Zappella

Abstract: Hyperparameter optimization (HPO) and neural architecture search (NAS) are methods of choice to obtain the best-in-class machine learning models, but in practice they can be costly to run. When models are trained on large datasets, tuning them with HPO or NAS rapidly becomes prohibitively expensive for practitioners, even when efficient multi-fidelity methods are employed. We propose an approach t… ▽ More Hyperparameter optimization (HPO) and neural architecture search (NAS) are methods of choice to obtain the best-in-class machine learning models, but in practice they can be costly to run. When models are trained on large datasets, tuning them with HPO or NAS rapidly becomes prohibitively expensive for practitioners, even when efficient multi-fidelity methods are employed. We propose an approach to tackle the challenge of tuning machine learning models trained on large datasets with limited computational resources. Our approach, named PASHA, extends ASHA and is able to dynamically allocate maximum resources for the tuning procedure depending on the need. The experimental comparison shows that PASHA identifies well-performing hyperparameter configurations and architectures while consuming significantly fewer computational resources than ASHA. △ Less

Submitted 8 March, 2023; v1 submitted 14 July, 2022; originally announced July 2022.

Comments: Accepted at ICLR 2023

arXiv:2206.14085 [pdf, other]

Continual Learning with Transformers for Image Classification

Authors: Beyza Ermis, Giovanni Zappella, Martin Wistuba, Aditya Rawal, Cedric Archambeau

Abstract: In many real-world scenarios, data to train machine learning models become available over time. However, neural network models struggle to continually learn new concepts without forgetting what has been learnt in the past. This phenomenon is known as catastrophic forgetting and it is often difficult to prevent due to practical constraints, such as the amount of data that can be stored or the limit… ▽ More In many real-world scenarios, data to train machine learning models become available over time. However, neural network models struggle to continually learn new concepts without forgetting what has been learnt in the past. This phenomenon is known as catastrophic forgetting and it is often difficult to prevent due to practical constraints, such as the amount of data that can be stored or the limited computation sources that can be used. Moreover, training large neural networks, such as Transformers, from scratch is very costly and requires a vast amount of training data, which might not be available in the application domain of interest. A recent trend indicates that dynamic architectures based on an expansion of the parameters can reduce catastrophic forgetting efficiently in continual learning, but this needs complex tuning to balance the growing number of parameters and barely share any information across tasks. As a result, they struggle to scale to a large number of tasks without significant overhead. In this paper, we validate in the computer vision domain a recent solution called Adaptive Distillation of Adapters (ADA), which is developed to perform continual learning using pre-trained Transformers and Adapters on text classification tasks. We empirically demonstrate on different classification tasks that this method maintains a good predictive performance without retraining the model or increasing the number of model parameters over the time. Besides it is significantly faster at inference time compared to the state-of-the-art methods. △ Less

Submitted 28 June, 2022; originally announced June 2022.

Comments: Appeared in CVPR CLVision workshop. arXiv admin note: substantial text overlap with arXiv:2203.04640

arXiv:2203.14544 [pdf, other]

Gradient-Matching Coresets for Rehearsal-Based Continual Learning

Authors: Lukas Balles, Giovanni Zappella, Cédric Archambeau

Abstract: The goal of continual learning (CL) is to efficiently update a machine learning model with new data without forgetting previously-learned knowledge. Most widely-used CL methods rely on a rehearsal memory of data points to be reused while training on new data. Curating such a rehearsal memory to maintain a small, informative subset of all the data seen so far is crucial to the success of these meth… ▽ More The goal of continual learning (CL) is to efficiently update a machine learning model with new data without forgetting previously-learned knowledge. Most widely-used CL methods rely on a rehearsal memory of data points to be reused while training on new data. Curating such a rehearsal memory to maintain a small, informative subset of all the data seen so far is crucial to the success of these methods. We devise a coreset selection method for rehearsal-based continual learning. Our method is based on the idea of gradient matching: The gradients induced by the coreset should match, as closely as possible, those induced by the original training dataset. Inspired by the neural tangent kernel theory, we perform this gradient matching across the model's initialization distribution, allowing us to extract a coreset without having to train the model first. We evaluate the method on a wide range of continual learning scenarios and demonstrate that it improves the performance of rehearsal-based CL methods compared to competing memory management strategies such as reservoir sampling. △ Less

Submitted 28 March, 2022; originally announced March 2022.

Comments: A short version of this paper has been presented at the NeurIPS '21 Workshop on Distribution Shifts

arXiv:2203.04640 [pdf, other]

Memory Efficient Continual Learning with Transformers

Authors: Beyza Ermis, Giovanni Zappella, Martin Wistuba, Aditya Rawal, Cedric Archambeau

Abstract: In many real-world scenarios, data to train machine learning models becomes available over time. Unfortunately, these models struggle to continually learn new concepts without forgetting what has been learnt in the past. This phenomenon is known as catastrophic forgetting and it is difficult to prevent due to practical constraints. For instance, the amount of data that can be stored or the computa… ▽ More In many real-world scenarios, data to train machine learning models becomes available over time. Unfortunately, these models struggle to continually learn new concepts without forgetting what has been learnt in the past. This phenomenon is known as catastrophic forgetting and it is difficult to prevent due to practical constraints. For instance, the amount of data that can be stored or the computational resources that can be used might be limited. Moreover, applications increasingly rely on large pre-trained neural networks, such as pre-trained Transformers, since the resources or data might not be available in sufficiently large quantities to practitioners to train the model from scratch. In this paper, we devise a method to incrementally train a model on a sequence of tasks using pre-trained Transformers and extending them with Adapters. Different than the existing approaches, our method is able to scale to a large number of tasks without significant overhead and allows sharing information across tasks. On both image and text classification tasks, we empirically demonstrate that our method maintains a good predictive performance without retraining the model or increasing the number of model parameters over time. The resulting model is also significantly faster at inference time compared to Adapter-based state-of-the-art methods. △ Less

Submitted 13 January, 2023; v1 submitted 9 March, 2022; originally announced March 2022.

Comments: This paper was published at NeurIPS 2022

arXiv:2112.05025 [pdf, other]

Gradient-matching coresets for continual learning

Authors: Lukas Balles, Giovanni Zappella, Cédric Archambeau

Abstract: We devise a coreset selection method based on the idea of gradient matching: The gradients induced by the coreset should match, as closely as possible, those induced by the original training dataset. We evaluate the method in the context of continual learning, where it can be used to curate a rehearsal memory. Our method performs strong competitors such as reservoir sampling across a range of memo… ▽ More We devise a coreset selection method based on the idea of gradient matching: The gradients induced by the coreset should match, as closely as possible, those induced by the original training dataset. We evaluate the method in the context of continual learning, where it can be used to curate a rehearsal memory. Our method performs strong competitors such as reservoir sampling across a range of memory sizes. △ Less

Submitted 9 December, 2021; originally announced December 2021.

Comments: Accepted at the NeurIPS '21 Workshop on Distribution Shifts

arXiv:2103.16111 [pdf, other]

A resource-efficient method for repeated HPO and NAS problems

Authors: Giovanni Zappella, David Salinas, Cédric Archambeau

Abstract: In this work we consider the problem of repeated hyperparameter and neural architecture search (HNAS). We propose an extension of Successive Halving that is able to leverage information gained in previous HNAS problems with the goal of saving computational resources. We empirically demonstrate that our solution is able to drastically decrease costs while maintaining accuracy and being robust to ne… ▽ More In this work we consider the problem of repeated hyperparameter and neural architecture search (HNAS). We propose an extension of Successive Halving that is able to leverage information gained in previous HNAS problems with the goal of saving computational resources. We empirically demonstrate that our solution is able to drastically decrease costs while maintaining accuracy and being robust to negative transfer. Our method is significantly simpler than competing transfer learning approaches, setting a new baseline for transfer learning in HNAS. △ Less

Submitted 13 July, 2021; v1 submitted 30 March, 2021; originally announced March 2021.

Comments: Accepted at AutoML workshop @ ICML 2021

arXiv:2012.08483 [pdf, other]

Amazon SageMaker Autopilot: a white box AutoML solution at scale

Authors: Piali Das, Valerio Perrone, Nikita Ivkin, Tanya Bansal, Zohar Karnin, Huibin Shen, Iaroslav Shcherbatyi, Yotam Elor, Wilton Wu, Aida Zolic, Thibaut Lienart, Alex Tang, Amr Ahmed, Jean Baptiste Faddoul, Rodolphe Jenatton, Fela Winkelmolen, Philip Gautier, Leo Dirac, Andre Perunicic, Miroslav Miladinovic, Giovanni Zappella, Cédric Archambeau, Matthias Seeger, Bhaskar Dutt, Laurence Rouesnel

Abstract: AutoML systems provide a black-box solution to machine learning problems by selecting the right way of processing features, choosing an algorithm and tuning the hyperparameters of the entire pipeline. Although these systems perform well on many datasets, there is still a non-negligible number of datasets for which the one-shot solution produced by each particular system would provide sub-par perfo… ▽ More AutoML systems provide a black-box solution to machine learning problems by selecting the right way of processing features, choosing an algorithm and tuning the hyperparameters of the entire pipeline. Although these systems perform well on many datasets, there is still a non-negligible number of datasets for which the one-shot solution produced by each particular system would provide sub-par performance. In this paper, we present Amazon SageMaker Autopilot: a fully managed system providing an automated ML solution that can be modified when needed. Given a tabular dataset and the target column name, Autopilot identifies the problem type, analyzes the data and produces a diverse set of complete ML pipelines including feature preprocessing and ML algorithms, which are tuned to generate a leaderboard of candidate models. In the scenario where the performance is not satisfactory, a data scientist is able to view and edit the proposed ML pipelines in order to infuse their expertise and business knowledge without having to revert to a fully manual solution. This paper describes the different components of Autopilot, emphasizing the infrastructure choices that allow scalability, high quality models, editable ML pipelines, consumption of artifacts of offline meta-learning, and a convenient integration with the entire SageMaker suite allowing these trained models to be used in a production setting. △ Less

Submitted 16 December, 2020; v1 submitted 15 December, 2020; originally announced December 2020.

arXiv:2004.13576 [pdf, other]

A Linear Bandit for Seasonal Environments

Authors: Giuseppe Di Benedetto, Vito Bellini, Giovanni Zappella

Abstract: Contextual bandit algorithms are extremely popular and widely used in recommendation systems to provide online personalised recommendations. A recurrent assumption is the stationarity of the reward function, which is rather unrealistic in most of the real-world applications. In the music recommendation scenario for instance, people's music taste can abruptly change during certain events, such as H… ▽ More Contextual bandit algorithms are extremely popular and widely used in recommendation systems to provide online personalised recommendations. A recurrent assumption is the stationarity of the reward function, which is rather unrealistic in most of the real-world applications. In the music recommendation scenario for instance, people's music taste can abruptly change during certain events, such as Halloween or Christmas, and revert to the previous music taste soon after. We would therefore need an algorithm which can promptly react to these changes. Moreover, we would like to leverage already observed rewards collected during different stationary periods which can potentially reoccur, without the need of restarting the learning process from scratch. A growing literature has addressed the problem of reward's non-stationarity, providing algorithms that could quickly adapt to the changing environment. However, up to our knowledge, there is no algorithm which deals with seasonal changes of the reward function. Here we present a contextual bandit algorithm which detects and adapts to abrupt changes of the reward function and leverages previous estimations whenever the environment falls back to a previously observed state. We show that the proposed method can outperform state-of-the-art algorithms for non-stationary environments. We ran our experiment on both synthetic and real datasets. △ Less

Submitted 28 April, 2020; originally announced April 2020.

arXiv:2004.13106 [pdf, other]

Learning to Rank in the Position Based Model with Bandit Feedback

Authors: Beyza Ermis, Patrick Ernst, Yannik Stein, Giovanni Zappella

Abstract: Personalization is a crucial aspect of many online experiences. In particular, content ranking is often a key component in delivering sophisticated personalization results. Commonly, supervised learning-to-rank methods are applied, which suffer from bias introduced during data collection by production systems in charge of producing the ranking. To compensate for this problem, we leverage contextua… ▽ More Personalization is a crucial aspect of many online experiences. In particular, content ranking is often a key component in delivering sophisticated personalization results. Commonly, supervised learning-to-rank methods are applied, which suffer from bias introduced during data collection by production systems in charge of producing the ranking. To compensate for this problem, we leverage contextual multi-armed bandits. We propose novel extensions of two well-known algorithms viz. LinUCB and Linear Thompson Sampling to the ranking use-case. To account for the biases in a production environment, we employ the position-based click model. Finally, we show the validity of the proposed algorithms by conducting extensive offline experiments on synthetic datasets as well as customer facing online A/B experiments. △ Less

Submitted 27 April, 2020; originally announced April 2020.

arXiv:1807.02089 [pdf, other]

Linear Bandits with Stochastic Delayed Feedback

Authors: Claire Vernade, Alexandra Carpentier, Tor Lattimore, Giovanni Zappella, Beyza Ermis, Michael Brueckner

Abstract: Stochastic linear bandits are a natural and well-studied model for structured exploration/exploitation problems and are widely used in applications such as online marketing and recommendation. One of the main challenges faced by practitioners ho** to apply existing algorithms is that usually the feedback is randomly delayed and delays are only partially observable. For example, while a purchase… ▽ More Stochastic linear bandits are a natural and well-studied model for structured exploration/exploitation problems and are widely used in applications such as online marketing and recommendation. One of the main challenges faced by practitioners ho** to apply existing algorithms is that usually the feedback is randomly delayed and delays are only partially observable. For example, while a purchase is usually observable some time after the display, the decision of not buying is never explicitly sent to the system. In other words, the learner only observes delayed positive events. We formalize this problem as a novel stochastic delayed linear bandit and propose ${\tt OTFLinUCB}$ and ${\tt OTFLinTS}$, two computationally efficient algorithms able to integrate new information as it becomes available and to deal with the permanently censored feedback. We prove optimal $\tilde O(\smash{d\sqrt{T}})$ bounds on the regret of the first algorithm and study the dependency on delay-dependent parameters. Our model, assumptions and results are validated by experiments on simulated and real data. △ Less

Submitted 2 March, 2020; v1 submitted 5 July, 2018; originally announced July 2018.

arXiv:1608.03544 [pdf, other]

On Context-Dependent Clustering of Bandits

Authors: Claudio Gentile, Shuai Li, Purushottam Kar, Alexandros Karatzoglou, Evans Etrue, Giovanni Zappella

Abstract: We investigate a novel cluster-of-bandit algorithm CAB for collaborative recommendation tasks that implements the underlying feedback sharing mechanism by estimating the neighborhood of users in a context-dependent manner. CAB makes sharp departures from the state of the art by incorporating collaborative effects into inference as well as learning processes in a manner that seamlessly interleaving… ▽ More We investigate a novel cluster-of-bandit algorithm CAB for collaborative recommendation tasks that implements the underlying feedback sharing mechanism by estimating the neighborhood of users in a context-dependent manner. CAB makes sharp departures from the state of the art by incorporating collaborative effects into inference as well as learning processes in a manner that seamlessly interleaving explore-exploit tradeoffs and collaborative steps. We prove regret bounds under various assumptions on the data, which exhibit a crisp dependence on the expected number of clusters over the users, a natural measure of the statistical difficulty of the learning task. Experiments on production and real-world datasets show that CAB offers significantly increased prediction performance against a representative pool of state-of-the-art methods. △ Less

Submitted 27 February, 2017; v1 submitted 6 August, 2016; originally announced August 2016.

arXiv:1401.8257 [pdf, other]

Online Clustering of Bandits

Authors: Claudio Gentile, Shuai Li, Giovanni Zappella

Abstract: We introduce a novel algorithmic approach to content recommendation based on adaptive clustering of exploration-exploitation ("bandit") strategies. We provide a sharp regret analysis of this algorithm in a standard stochastic noise setting, demonstrate its scalability properties, and prove its effectiveness on a number of artificial and real-world datasets. Our experiments show a significant incre… ▽ More We introduce a novel algorithmic approach to content recommendation based on adaptive clustering of exploration-exploitation ("bandit") strategies. We provide a sharp regret analysis of this algorithm in a standard stochastic noise setting, demonstrate its scalability properties, and prove its effectiveness on a number of artificial and real-world datasets. Our experiments show a significant increase in prediction performance over state-of-the-art methods for bandit problems. △ Less

Submitted 6 June, 2014; v1 submitted 31 January, 2014; originally announced January 2014.

Comments: In E. Xing and T. Jebara (Eds.), Proceedings of 31st International Conference on Machine Learning, Journal of Machine Learning Research Workshop and Conference Proceedings, Vol.32 (JMLR W&CP-32), Bei**g, China, Jun. 21-26, 2014 (ICML 2014), Submitted by Shuai Li (https://sites.google.com/site/shuailidotsli)

arXiv:1306.0811 [pdf, other]

A Gang of Bandits

Authors: Nicolò Cesa-Bianchi, Claudio Gentile, Giovanni Zappella

Abstract: Multi-armed bandit problems are receiving a great deal of attention because they adequately formalize the exploration-exploitation trade-offs arising in several industrially relevant applications, such as online advertisement and, more generally, recommendation systems. In many cases, however, these applications have a strong social component, whose integration in the bandit algorithm could lead t… ▽ More Multi-armed bandit problems are receiving a great deal of attention because they adequately formalize the exploration-exploitation trade-offs arising in several industrially relevant applications, such as online advertisement and, more generally, recommendation systems. In many cases, however, these applications have a strong social component, whose integration in the bandit algorithm could lead to a dramatic performance increase. For instance, we may want to serve content to a group of users by taking advantage of an underlying network of social relationships among them. In this paper, we introduce novel algorithmic approaches to the solution of such networked bandit problems. More specifically, we design and analyze a global strategy which allocates a bandit algorithm to each network node (user) and allows it to "share" signals (contexts and payoffs) with the neghboring nodes. We then derive two more scalable variants of this strategy based on different ways of clustering the graph nodes. We experimentally compare the algorithm and its variants to state-of-the-art methods for contextual bandits that do not use the relational information. Our experiments, carried out on synthetic and real-world datasets, show a marked increase in prediction performance obtained by exploiting the network structure. △ Less

Submitted 4 November, 2013; v1 submitted 4 June, 2013; originally announced June 2013.

Comments: NIPS 2013

arXiv:1301.6630 [pdf, other]

Political Disaffection: a case study on the Italian Twitter community

Authors: Corrado Monti, Alessandro Rozza, Giovanni Zappella, Matteo Zignani, Adam Arvidsson, Monica Poletti

Abstract: In our work we analyse the political disaffection or "the subjective feeling of powerlessness, cynicism, and lack of confidence in the political process, politicians, and democratic institutions, but with no questioning of the political regime" by exploiting Twitter data through machine learning techniques. In order to validate the quality of the time-series generated by the Twitter data, we highl… ▽ More In our work we analyse the political disaffection or "the subjective feeling of powerlessness, cynicism, and lack of confidence in the political process, politicians, and democratic institutions, but with no questioning of the political regime" by exploiting Twitter data through machine learning techniques. In order to validate the quality of the time-series generated by the Twitter data, we highlight the relations of these data with political disaffection as measured by means of public opinion surveys. Moreover, we show that important political news of Italian newspapers are often correlated with the highest peaks of the produced time-series. △ Less

Submitted 8 February, 2013; v1 submitted 28 January, 2013; originally announced January 2013.

arXiv:1301.5160 [pdf, other]

See the Tree Through the Lines: The Shazoo Algorithm -- Full Version --

Authors: Fabio Vitale, Nicolo Cesa-Bianchi, Claudio Gentile, Giovanni Zappella

Abstract: Predicting the nodes of a given graph is a fascinating theoretical problem with applications in several domains. Since graph sparsification via spanning trees retains enough information while making the task much easier, trees are an important special case of this problem. Although it is known how to predict the nodes of an unweighted tree in a nearly optimal way, in the weighted case a fully sati… ▽ More Predicting the nodes of a given graph is a fascinating theoretical problem with applications in several domains. Since graph sparsification via spanning trees retains enough information while making the task much easier, trees are an important special case of this problem. Although it is known how to predict the nodes of an unweighted tree in a nearly optimal way, in the weighted case a fully satisfactory algorithm is not available yet. We fill this hole and introduce an efficient node predictor, Shazoo, which is nearly optimal on any weighted tree. Moreover, we show that Shazoo can be viewed as a common nontrivial generalization of both previous approaches for unweighted trees and weighted lines. Experiments on real-world datasets confirm that Shazoo performs well in that it fully exploits the structure of the input tree, and gets very close to (and sometimes better than) less scalable energy minimization methods. △ Less

Submitted 28 February, 2013; v1 submitted 22 January, 2013; originally announced January 2013.

arXiv:1301.5112 [pdf, ps, other]

Active Learning on Trees and Graphs

Authors: Nicolo Cesa-Bianchi, Claudio Gentile, Fabio Vitale, Giovanni Zappella

Abstract: We investigate the problem of active learning on a given tree whose nodes are assigned binary labels in an adversarial way. Inspired by recent results by Guillory and Bilmes, we characterize (up to constant factors) the optimal placement of queries so to minimize the mistakes made on the non-queried nodes. Our query selection algorithm is extremely efficient, and the optimal number of mistakes on… ▽ More We investigate the problem of active learning on a given tree whose nodes are assigned binary labels in an adversarial way. Inspired by recent results by Guillory and Bilmes, we characterize (up to constant factors) the optimal placement of queries so to minimize the mistakes made on the non-queried nodes. Our query selection algorithm is extremely efficient, and the optimal number of mistakes on the non-queried nodes is achieved by a simple and efficient mincut classifier. Through a simple modification of the query selection algorithm we also show optimality (up to constant factors) with respect to the trade-off between number of queries and number of mistakes on non-queried nodes. By using spanning trees, our algorithms can be efficiently applied to general graphs, although the problem of finding optimal and efficient active learning algorithms for general graphs remains open. Towards this end, we provide a lower bound on the number of mistakes made on arbitrary graphs by any active learning algorithm using a number of queries which is up to a constant fraction of the graph size. △ Less

Submitted 22 January, 2013; originally announced January 2013.

arXiv:1301.4769 [pdf, other]

A Correlation Clustering Approach to Link Classification in Signed Networks -- Full Version --

Authors: Nicolo Cesa-Bianchi, Claudio Gentile, Fabio Vitale, Giovanni Zappella

Abstract: Motivated by social balance theory, we develop a theory of link classification in signed networks using the correlation clustering index as measure of label regularity. We derive learning bounds in terms of correlation clustering within three fundamental transductive learning settings: online, batch and active. Our main algorithmic contribution is in the active setting, where we introduce a new fa… ▽ More Motivated by social balance theory, we develop a theory of link classification in signed networks using the correlation clustering index as measure of label regularity. We derive learning bounds in terms of correlation clustering within three fundamental transductive learning settings: online, batch and active. Our main algorithmic contribution is in the active setting, where we introduce a new family of efficient link classifiers based on covering the input graph with small circuits. These are the first active algorithms for link classification with mistake bounds that hold for arbitrary signed networks. △ Less

Submitted 28 February, 2013; v1 submitted 21 January, 2013; originally announced January 2013.

arXiv:1301.4767 [pdf, other]

A Linear Time Active Learning Algorithm for Link Classification -- Full Version --

Authors: Nicolo Cesa-Bianchi, Claudio Gentile, Fabio Vitale, Giovanni Zappella

Abstract: We present very efficient active learning algorithms for link classification in signed networks. Our algorithms are motivated by a stochastic model in which edge labels are obtained through perturbations of a initial sign assignment consistent with a two-clustering of the nodes. We provide a theoretical analysis within this model, showing that we can achieve an optimal (to whithin a constant facto… ▽ More We present very efficient active learning algorithms for link classification in signed networks. Our algorithms are motivated by a stochastic model in which edge labels are obtained through perturbations of a initial sign assignment consistent with a two-clustering of the nodes. We provide a theoretical analysis within this model, showing that we can achieve an optimal (to whithin a constant factor) number of mistakes on any graph G = (V,E) such that |E| = Ω(|V|^{3/2}) by querying O(|V|^{3/2}) edge labels. More generally, we show an algorithm that achieves optimality to within a factor of O(k) by querying at most order of |V| + (|V|/k)^{3/2} edge labels. The running time of this algorithm is at most of order |E| + |V|\log|V|. △ Less

Submitted 28 February, 2013; v1 submitted 21 January, 2013; originally announced January 2013.

arXiv:1212.5637 [pdf, other]

Random Spanning Trees and the Prediction of Weighted Graphs

Authors: Nicolo' Cesa-Bianchi, Claudio Gentile, Fabio Vitale, Giovanni Zappella

Abstract: We investigate the problem of sequentially predicting the binary labels on the nodes of an arbitrary weighted graph. We show that, under a suitable parametrization of the problem, the optimal number of prediction mistakes can be characterized (up to logarithmic factors) by the cutsize of a random spanning tree of the graph. The cutsize is induced by the unknown adversarial labeling of the graph no… ▽ More We investigate the problem of sequentially predicting the binary labels on the nodes of an arbitrary weighted graph. We show that, under a suitable parametrization of the problem, the optimal number of prediction mistakes can be characterized (up to logarithmic factors) by the cutsize of a random spanning tree of the graph. The cutsize is induced by the unknown adversarial labeling of the graph nodes. In deriving our characterization, we obtain a simple randomized algorithm achieving in expectation the optimal mistake bound on any polynomially connected weighted graph. Our algorithm draws a random spanning tree of the original graph and then predicts the nodes of this tree in constant expected amortized time and linear space. Experiments on real-world datasets show that our method compares well to both global (Perceptron) and local (label propagation) methods, while being generally faster in practice. △ Less

Submitted 21 December, 2012; originally announced December 2012.

Comments: Appeared in ICML 2010

arXiv:1112.4344 [pdf, other]

A Scalable Multiclass Algorithm for Node Classification

Authors: Giovanni Zappella

Abstract: We introduce a scalable algorithm, MUCCA, for multiclass node classification in weighted graphs. Unlike previously proposed methods for the same task, MUCCA works in time linear in the number of nodes. Our approach is based on a game-theoretic formulation of the problem in which the test labels are expressed as a Nash Equilibrium of a certain game. However, in order to achieve scalability, we find… ▽ More We introduce a scalable algorithm, MUCCA, for multiclass node classification in weighted graphs. Unlike previously proposed methods for the same task, MUCCA works in time linear in the number of nodes. Our approach is based on a game-theoretic formulation of the problem in which the test labels are expressed as a Nash Equilibrium of a certain game. However, in order to achieve scalability, we find the equilibrium on a spanning tree of the original graph. Experiments on real-world data reveal that MUCCA is much faster than its competitors while achieving a similar predictive performance. △ Less

Submitted 19 December, 2011; originally announced December 2011.

Showing 1–24 of 24 results for author: Zappella, G