Search | arXiv e-print repository

arXiv:2401.12062 [pdf]

doi 10.1039/D4CP00293H

Electronic Structure and Transport in the Potential Luttinger Liquids CsNb$_3$Br$_7$S and RbNb$_3$Br$_7$S

Authors: Fabian Grahlow, Fabian Strauß, Marcus Scheele, Markus Ströbele, Alberto Carta, Sophie F. Weber, Scott Kroeker, Carl P. Romao, H. -Jürgen Meyer

Abstract: The crystal structures of ANb$_3$Br$_7$S (A = Rb and Cs) have been refined by single crystal X-ray diffraction, and are found to form highly anisotropic materials based on chains of the triangular Nb$_3$ cluster core. The Nb$_3$ cluster core contains seven valence electrons, six of them being assigned to Nb-Nb bonds within the Nb$_3$ triangle and one unpaired d electron. The presence of this surplus electron gives rise to the formation of correlated electronic states. The connectivity in the structures is represented by one-dimensional [Nb$_3$Br$_7$S]$^-$ chains, containing a sulphur atom capping one face ($μ_3$) of the triangular niobium cluster, which is believed to induce an important electronic feature. Several types of studies are undertaken to obtain deeper insight into this unusual type of material: the crystal structure, morphology and elastic properties are analysed, as well as the (photo-)electrical properties and NMR relaxation. Electronic structure (DFT) calculations are performed in order to understand the electronic structure and transport in these compounds, and, based on the experimental and theoretical results, we propose that the electronic interactions along the Nb chains are sufficiently one-dimensional to give rise to Luttinger liquid (rather than Fermi liquid) behaviour of the metallic electrons.

Submitted 3 April, 2024; v1 submitted 22 January, 2024; originally announced January 2024.

Comments: 14 pages, 19 figures

Journal ref: Phys. Chem. Chem. Phys., 2024, 26, 11789-11797
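
Worked electron count: the seven valence electrons quoted in the abstract above can be checked with a short formal-charge calculation. This is only a sketch, assuming the usual formal oxidation states A$^+$, Br$^-$ and S$^{2-}$ and five valence electrons per neutral Nb atom; the symbols $q$ and $N_e$ are introduced here for illustration and do not come from the paper.

```latex
% Formal charge on the Nb_3 core of the [Nb_3Br_7S]^- chain unit
q(\mathrm{Nb}_3) + 7\,(-1) + (-2) = -1
\quad\Longrightarrow\quad q(\mathrm{Nb}_3) = +8
% Electrons left on the cluster: 3 Nb atoms with 5 valence electrons each
N_e = 3 \times 5 - 8 = 7 = \underbrace{6}_{\text{Nb-Nb bonds}} + \underbrace{1}_{\text{unpaired } d \text{ electron}}
```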

arXiv:2312.12033 [pdf, other]

Emergence of a potential charge disproportionated insulating state in SrCrO$_{3}$

Authors: Alberto Carta, Anwesha Panda, Claude Ederer

Abstract: We use a combination of density functional theory (DFT) and dynamical mean-field theory (DMFT) to investigate the potential emergence of a charge-disproportionated insulating phase in SrCrO$_{3}$, whereby the Cr cations disproportionate according to $3\text{Cr}^{4+} \rightarrow 2\text{Cr}^{3+} + \text{Cr}^{6+}$ and arrange in ordered planes perpendicular to the cubic [111] direction. We show that the charge disproportionation couples to a structural distortion where the oxygen octahedra around the nominal Cr$^{6+}$ sites contract, while the octahedra surrounding the Cr$^{3+}$ sites expand and distort. Our results indicate that the charge-disproportionated phase can be stabilized for realistic values of the Hubbard $U$ and Hund's $J$ parameters, but this requires a scaling of the DFT+DMFT double counting correction by at least 35%. The disproportionated state can also be stabilized in DFT+U calculations for a specific magnetic configuration with antiparallel magnetic moments on adjacent Cr$^{3+}$ sites and no magnetic moment on the Cr$^{6+}$ sites. While this phase is higher in energy than other magnetic states that can arise in SrCrO$_{3}$, the presence of an energy minimum suggests that the charge disproportionated state represents a metastable state of SrCrO$_{3}$.

Submitted 20 December, 2023; v1 submitted 19 December, 2023; originally announced December 2023.

Comments: 10 pages, 8 figures
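
Worked check: the disproportionation reaction in the abstract above conserves both total charge and the number of $d$ electrons, which a short bookkeeping calculation makes explicit. The nominal $d$-electron counts ($d^2$, $d^3$, $d^0$) follow from the formal Cr valences and are an idealisation, not results quoted from the paper.

```latex
% Charge balance for 3 Cr^{4+} -> 2 Cr^{3+} + Cr^{6+}
3 \times (+4) = 12 = 2 \times (+3) + (+6)
% Nominal d-electron count: Cr^{4+} is d^2, Cr^{3+} is d^3, Cr^{6+} is d^0
3 \times 2 = 6 = 2 \times 3 + 0
```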

arXiv:2308.10328 [pdf, other]

A Comprehensive Empirical Evaluation on Online Continual Learning

Authors: Albin Soutif--Cormerais, Antonio Carta, Andrea Cossu, Julio Hurtado, Hamed Hemati, Vincenzo Lomonaco, Joost Van de Weijer

Abstract: Online continual learning aims to get closer to a live learning experience by learning directly on a stream of data with temporally shifting distribution and by storing a minimum amount of data from that stream. In this empirical evaluation, we evaluate various methods from the literature that tackle online continual learning. More specifically, we focus on the class-incremental setting in the context of image classification, where the learner must learn new classes incrementally from a stream of data. We compare these methods on the Split-CIFAR100 and Split-TinyImagenet benchmarks, and measure their average accuracy, forgetting, stability, and quality of the representations, to evaluate various aspects of the algorithm at the end but also during the whole training period. We find that most methods suffer from stability and underfitting issues. However, the learned representations are comparable to i.i.d. training under the same computational budget. No clear winner emerges from the results and basic experience replay, when properly tuned and implemented, is a very strong baseline. We release our modular and extensible codebase at https://github.com/AlbinSou/ocl_survey based on the avalanche framework to reproduce our results and encourage future research.

Submitted 23 September, 2023; v1 submitted 20 August, 2023; originally announced August 2023.

Comments: ICCV Visual Continual Learning Workshop 2023 accepted paper

arXiv:2306.16817 [pdf, other]

Improving Online Continual Learning Performance and Stability with Temporal Ensembles

Authors: Albin Soutif--Cormerais, Antonio Carta, Joost Van de Weijer

Abstract: Neural networks are very effective when trained on large datasets for a large number of iterations. However, when they are trained on non-stationary streams of data and in an online fashion, their performance is reduced (1) by the online setup, which limits the availability of data, (2) due to catastrophic forgetting because of the non-stationary nature of the data. Furthermore, several recent works (Caccia et al., 2022; Lange et al., 2023) arXiv:2205.13452 showed that replay methods used in continual learning suffer from the stability gap, encountered when evaluating the model continually (rather than only on task boundaries). In this article, we study the effect of model ensembling as a way to improve performance and stability in online continual learning. We notice that naively ensembling models coming from a variety of training tasks increases the performance in online continual learning considerably. Starting from this observation, and drawing inspirations from semi-supervised learning ensembling methods, we use a lightweight temporal ensemble that computes the exponential moving average of the weights (EMA) at test time, and show that it can drastically increase the performance and stability when used in combination with several methods from the literature.

Submitted 3 July, 2023; v1 submitted 29 June, 2023; originally announced June 2023.

Comments: CoLLAs 2023 accepted paper
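
Illustrative sketch: the abstract above relies on an exponential moving average (EMA) of model weights. The snippet below shows only the generic EMA update rule on a flat list of floats; it is not the authors' implementation, and the function name `ema_update`, the `decay` parameter and the toy loop are all hypothetical.

```python
def ema_update(ema_weights, new_weights, decay=0.99):
    """Return the updated EMA of a flat list of weights.

    Generic rule: ema <- decay * ema + (1 - decay) * new.
    """
    return [decay * e + (1.0 - decay) * w
            for e, w in zip(ema_weights, new_weights)]


# Toy "training" loop: the online weights drift, the EMA copy follows smoothly.
online = [0.0, 1.0]
ema = list(online)
for step in range(3):
    online = [w + 1.0 for w in online]   # stand-in for an optimizer step
    ema = ema_update(ema, online, decay=0.5)
```

In a setting like the one described, the averaged copy would be the one used for evaluation while the online weights keep training; how often the average is updated and which decay is used are choices this sketch does not take from the paper.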

arXiv:2306.03941 [pdf, other]

A scientometric analysis of the effect of COVID-19 on the spread of research outputs

Authors: Gianpaolo Zammarchi, Andrea Carta, Silvia Columbu, Luca Frigau, Monica Musio

Abstract: The spread of the SARS-CoV-2 pandemic in 2020 had a huge impact on the life course of all of us. This rapid spread has also caused an increase in the research production in topics related to COVID-19 with regard to different aspects. Italy has, unfortunately, been one of the first countries to be massively involved in the outbreak of the disease. In this paper we present an extensive scientometric analysis of the research production both at global (entire literature produced in the first 2 years after the beginning of the pandemic) and local level (COVID-19 literature produced by authors with an Italian affiliation). Our results showed that the US and China are the most active countries in terms of number of publications and that the number of collaborations between institutions varies according to geographical distance. Moreover, we identified the medical-biological fields as those with the greatest growth in terms of literature production. Furthermore, we also better explored the relationship between the number of citations and variables obtained from the data set (e.g. number of authors per article). Using multiple correspondence analysis and quantile regression we shed light on the role of journal topics and impact factor, the type of article, the field of study and how these elements affect citations.

Submitted 1 June, 2023; originally announced June 2023.

arXiv:2303.15888 [pdf, other]

Projected Latent Distillation for Data-Agnostic Consolidation in Distributed Continual Learning

Authors: Antonio Carta, Andrea Cossu, Vincenzo Lomonaco, Davide Bacciu, Joost van de Weijer

Abstract: Distributed learning on the edge often comprises self-centered devices (SCD) which learn local tasks independently and are unwilling to contribute to the performance of other SCDs. How do we achieve forward transfer at zero cost for the single SCDs? We formalize this problem as a Distributed Continual Learning scenario, where SCDs adapt to local tasks and a CL model consolidates the knowledge from the resulting stream of models without looking at the SCD's private data. Unfortunately, current CL methods are not directly applicable to this scenario. We propose Data-Agnostic Consolidation (DAC), a novel double knowledge distillation method that consolidates the stream of SC models without using the original data. DAC performs distillation in the latent space via a novel Projected Latent Distillation loss. Experimental results show that DAC enables forward transfer between SCDs and reaches state-of-the-art accuracy on Split CIFAR100, CORe50 and Split TinyImageNet, both in rehearsal-free and distributed CL scenarios. Somewhat surprisingly, even a single out-of-distribution image is sufficient as the only source of data during consolidation.

Submitted 28 March, 2023; originally announced March 2023.

arXiv:2302.01766 [pdf, other]

Avalanche: A PyTorch Library for Deep Continual Learning

Authors: Antonio Carta, Lorenzo Pellegrini, Andrea Cossu, Hamed Hemati, Vincenzo Lomonaco

Abstract: Continual learning is the problem of learning from a nonstationary stream of data, a fundamental issue for sustainable and efficient training of deep neural networks over time. Unfortunately, deep learning libraries only provide primitives for offline training, assuming that the model's architecture and data are fixed. Avalanche is an open source library maintained by the ContinualAI non-profit organization that extends PyTorch by providing first-class support for dynamic architectures, streams of datasets, and incremental training and evaluation methods. Avalanche provides a large set of predefined benchmarks and training algorithms and it is easy to extend and modular while supporting a wide range of continual learning scenarios. Documentation is available at https://avalanche.continualai.org.

Submitted 2 February, 2023; originally announced February 2023.

arXiv:2301.11396 [pdf, other]

Class-Incremental Learning with Repetition

Authors: Hamed Hemati, Andrea Cossu, Antonio Carta, Julio Hurtado, Lorenzo Pellegrini, Davide Bacciu, Vincenzo Lomonaco, Damian Borth

Abstract: Real-world data streams naturally include the repetition of previous concepts. From a Continual Learning (CL) perspective, repetition is a property of the environment and, unlike replay, cannot be controlled by the agent. Nowadays, the Class-Incremental (CI) scenario represents the leading test-bed for assessing and comparing CL strategies. This scenario type is very easy to use, but it never allows revisiting previously seen classes, thus completely neglecting the role of repetition. We focus on the family of Class-Incremental with Repetition (CIR) scenarios, where repetition is embedded in the definition of the stream. We propose two stochastic stream generators that produce a wide range of CIR streams starting from a single dataset and a few interpretable control parameters. We conduct the first comprehensive evaluation of repetition in CL by studying the behavior of existing CL strategies under different CIR streams. We then present a novel replay strategy that exploits repetition and counteracts the natural imbalance present in the stream. On both CIFAR100 and TinyImageNet, our strategy outperforms other replay approaches, which are not designed for environments with repetition.

Submitted 19 June, 2023; v1 submitted 26 January, 2023; originally announced January 2023.

Comments: Accepted to the 2nd Conference on Lifelong Learning Agents (CoLLAs), 2023; 19 pages

arXiv:2212.06833 [pdf, other]

3rd Continual Learning Workshop Challenge on Egocentric Category and Instance Level Object Understanding

Authors: Lorenzo Pellegrini, Chenchen Zhu, Fanyi Xiao, Zhicheng Yan, Antonio Carta, Matthias De Lange, Vincenzo Lomonaco, Roshan Sumbaly, Pau Rodriguez, David Vazquez

Abstract: Continual Learning, also known as Lifelong or Incremental Learning, has recently gained renewed interest among the Artificial Intelligence research community. Recent research efforts have quickly led to the design of novel algorithms able to reduce the impact of the catastrophic forgetting phenomenon in deep neural networks. Due to this surge of interest in the field, many competitions have been held in recent years, as they are an excellent opportunity to stimulate research in promising directions. This paper summarizes the ideas, design choices, rules, and results of the challenge held at the 3rd Continual Learning in Computer Vision (CLVision) Workshop at CVPR 2022. The focus of this competition is the complex continual object detection task, which is still underexplored in literature compared to classification tasks. The challenge is based on the challenge version of the novel EgoObjects dataset, a large-scale egocentric object dataset explicitly designed to benchmark continual learning algorithms for egocentric category-/instance-level object understanding, which covers more than 1k unique main objects and 250+ categories in around 100k video frames.

Submitted 13 December, 2022; originally announced December 2022.

Comments: 21 pages, 12 figures, 5 tables

arXiv:2206.11849 [pdf, other]

Sample Condensation in Online Continual Learning

Authors: Mattia Sangermano, Antonio Carta, Andrea Cossu, Davide Bacciu

Abstract: Online continual learning is a challenging learning scenario where the model must learn from a non-stationary stream of data where each sample is seen only once. The main challenge is to incrementally learn while avoiding catastrophic forgetting, namely the problem of forgetting previously acquired knowledge while learning from new data. A popular solution in this scenario is to use a small memory to retain old data and rehearse them over time. Unfortunately, due to the limited memory size, the quality of the memory will deteriorate over time. In this paper we propose OLCGM, a novel replay-based continual learning strategy that uses knowledge condensation techniques to continuously compress the memory and achieve a better use of its limited size. The sample condensation step compresses old samples, instead of removing them like other replay strategies. As a result, the experiments show that, whenever the memory budget is limited compared to the complexity of the data, OLCGM improves the final accuracy compared to state-of-the-art replay strategies.

Submitted 23 June, 2022; originally announced June 2022.

Comments: Accepted as a conference paper at 2022 International Joint Conference on Neural Networks (IJCNN 2022). Part of 2022 IEEE World Congress on Computational Intelligence (IEEE WCCI 2022)

arXiv:2205.09357 [pdf, other]

Continual Pre-Training Mitigates Forgetting in Language and Vision

Authors: Andrea Cossu, Tinne Tuytelaars, Antonio Carta, Lucia Passaro, Vincenzo Lomonaco, Davide Bacciu

Abstract: Pre-trained models are nowadays a fundamental component of machine learning research. In continual learning, they are commonly used to initialize the model before training on the stream of non-stationary data. However, pre-training is rarely applied during continual learning. We formalize and investigate the characteristics of the continual pre-training scenario in both language and vision environments, where a model is continually pre-trained on a stream of incoming data and only later fine-tuned to different downstream tasks. We show that continually pre-trained models are robust against catastrophic forgetting and we provide strong empirical evidence supporting the fact that self-supervised pre-training is more effective in retaining previous knowledge than supervised protocols. Code is provided at https://github.com/AndreaCossu/continual-pretraining-nlp-vision .

Submitted 19 May, 2022; originally announced May 2022.

Comments: under review

arXiv:2204.06465 [pdf, other]

Evidence for Jahn-Teller-driven metal-insulator transition in strained SrCrO3 from first principles calculations

Authors: Alberto Carta, Claude Ederer

Abstract: Using density-functional theory (DFT) and its extension to DFT+$U$, we propose a possible scenario for a strain-induced metal-insulator transition which has been reported recently in thin films of SrCrO$_3$. The metal-insulator transition involves the emergence of a Jahn-Teller (JT) distortion similar to the case of the related rare-earth vanadates, which also exhibit a nominal $d^2$ occupation of the transition metal cation. Our calculations indicate that, for realistic values of the Hubbard $U$ parameter, the unstrained system exhibits a C-type antiferromagnetically ordered ground state that is already rather close to a JT instability. However, the emergence of the JT distortion is disfavored by the large energetic overlap of the $d_{xz}$/$d_{yz}$ band with the lower lying $d_{xy}$ band. Tensile epitaxial strain lowers the energy of the $d_{xy}$ band relative to $d_{xz}$/$d_{yz}$ and thus brings the system closer to the nominal filling of $d_{xy}^1(d_{xz}d_{yz})^1$. The JT distortion then lifts the degeneracy between the $d_{xz}$ and $d_{yz}$ orbitals and thus allows a gap to open up in the electronic band structure.

Submitted 13 April, 2022; originally announced April 2022.

Comments: 7 pages, 6 figures

arXiv:2203.10317 [pdf, ps, other]

doi 10.1007/978-3-031-13324-4_47

Practical Recommendations for Replay-based Continual Learning Methods

Authors: Gabriele Merlin, Vincenzo Lomonaco, Andrea Cossu, Antonio Carta, Davide Bacciu

Abstract: Continual Learning requires the model to learn from a stream of dynamic, non-stationary data without forgetting previous knowledge. Several approaches have been developed in the literature to tackle the Continual Learning challenge. Among them, Replay approaches have empirically proved to be the most effective ones. Replay operates by saving some samples in memory which are then used to rehearse knowledge during training in subsequent tasks. However, an extensive comparison and deeper understanding of different replay implementation subtleties is still missing in the literature. The aim of this work is to compare and analyze existing replay-based strategies and provide practical recommendations on developing efficient, effective and generally applicable replay-based strategies. In particular, we investigate the role of the memory size value and of different weighting policies, and discuss the impact of data augmentation, which allows reaching better performance with lower memory sizes.

Submitted 19 March, 2022; originally announced March 2022.

Journal ref: ICIAP 2022 Workshops

arXiv:2202.13657 [pdf, other]

Avalanche RL: a Continual Reinforcement Learning Library

Authors: Nicolò Lucchesi, Antonio Carta, Vincenzo Lomonaco, Davide Bacciu

Abstract: Continual Reinforcement Learning (CRL) is a challenging setting where an agent learns to interact with an environment that is constantly changing over time (the stream of experiences). In this paper, we describe Avalanche RL, a library for Continual Reinforcement Learning which makes it easy to train agents on a continuous stream of tasks. Avalanche RL is based on PyTorch and supports any OpenAI Gym environment. Its design is based on Avalanche, one of the more popular continual learning libraries, which allows us to reuse a large number of continual learning strategies and improve the interaction between reinforcement learning and continual learning researchers. Additionally, we propose Continual Habitat-Lab, a novel benchmark and a high-level library which enables the usage of the photorealistic simulator Habitat-Sim for CRL research. Overall, Avalanche RL attempts to unify under a common framework continual reinforcement learning applications, which we hope will foster the growth of the field.

Submitted 24 March, 2022; v1 submitted 28 February, 2022; originally announced February 2022.

Comments: Presented at the 21st International Conference on Image Analysis and Processing (ICIAP 2021)

arXiv:2202.01645 [pdf, other]

AI-as-a-Service Toolkit for Human-Centered Intelligence in Autonomous Driving

Authors: Valerio De Caro, Saira Bano, Achilles Machumilane, Alberto Gotta, Pietro Cassará, Antonio Carta, Rudy Semola, Christos Sardianos, Christos Chronis, Iraklis Varlamis, Konstantinos Tserpes, Vincenzo Lomonaco, Claudio Gallicchio, Davide Bacciu

Abstract: This paper presents a proof-of-concept implementation of the AI-as-a-Service toolkit developed within the H2020 TEACHING project and designed to implement an autonomous driving personalization system according to the output of an automatic driver's stress recognition algorithm, both of them realizing a Cyber-Physical System of Systems. In addition, we implemented a data-gathering subsystem to collect data from different sensors, i.e., wearables and cameras, to automatize stress recognition. The system was attached for testing to a driving simulation software, CARLA, which allows testing the approach's feasibility with minimum cost and without putting at risk drivers and passengers. At the core of the relative subsystems, different learning algorithms were implemented using Deep Neural Networks, Recurrent Neural Networks, and Reinforcement Learning.

Submitted 9 February, 2022; v1 submitted 3 February, 2022; originally announced February 2022.

arXiv:2112.06511 [pdf, other]

Ex-Model: Continual Learning from a Stream of Trained Models

Authors: Antonio Carta, Andrea Cossu, Vincenzo Lomonaco, Davide Bacciu

Abstract: Learning continually from non-stationary data streams is a challenging research topic of growing popularity in the last few years. Being able to learn, adapt, and generalize continually in an efficient, effective, and scalable way is fundamental for a sustainable development of Artificial Intelligence systems. However, an agent-centric view of continual learning requires learning directly from raw data, which limits the interaction between independent agents, the efficiency, and the privacy of current approaches. Instead, we argue that continual learning systems should exploit the availability of compressed information in the form of trained models. In this paper, we introduce and formalize a new paradigm named "Ex-Model Continual Learning" (ExML), where an agent learns from a sequence of previously trained models instead of raw data. We further contribute with three ex-model continual learning algorithms and an empirical setting comprising three datasets (MNIST, CIFAR-10 and CORe50), and eight scenarios, where the proposed algorithms are extensively tested. Finally, we highlight the peculiarities of the ex-model paradigm and we point out interesting future research directions.

Submitted 13 December, 2021; originally announced December 2021.

arXiv:2112.02925 [pdf, other]

Is Class-Incremental Enough for Continual Learning?

Authors: Andrea Cossu, Gabriele Graffieti, Lorenzo Pellegrini, Davide Maltoni, Davide Bacciu, Antonio Carta, Vincenzo Lomonaco

Abstract: The ability of a model to learn continually can be empirically assessed in different continual learning scenarios. Each scenario defines the constraints and the opportunities of the learning environment. Here, we challenge the current trend in the continual learning literature to experiment mainly on class-incremental scenarios, where classes present in one experience are never revisited. We posit that an excessive focus on this setting may be limiting for future research on continual learning, since class-incremental scenarios artificially exacerbate catastrophic forgetting, at the expense of other important objectives like forward transfer and computational efficiency. In many real-world environments, in fact, repetition of previously encountered concepts occurs naturally and contributes to softening the disruption of previous knowledge. We advocate for a more in-depth study of alternative continual learning scenarios, in which repetition is integrated by design in the stream of incoming information. Starting from already existing proposals, we describe the advantages such class-incremental with repetition scenarios could offer for a more comprehensive assessment of continual learning models.

Submitted 6 December, 2021; originally announced December 2021.

Comments: Under review

arXiv:2107.06543 [pdf, other]

TEACHING -- Trustworthy autonomous cyber-physical applications through human-centred intelligence

Authors: Davide Bacciu, Siranush Akarmazyan, Eric Armengaud, Manlio Bacco, George Bravos, Calogero Calandra, Emanuele Carlini, Antonio Carta, Pietro Cassara, Massimo Coppola, Charalampos Davalas, Patrizio Dazzi, Maria Carmela Degennaro, Daniele Di Sarli, Jürgen Dobaj, Claudio Gallicchio, Sylvain Girbal, Alberto Gotta, Riccardo Groppo, Vincenzo Lomonaco, Georg Macher, Daniele Mazzei, Gabriele Mencagli, Dimitrios Michail, Alessio Micheli, et al. (10 additional authors not shown)

Abstract: This paper discusses the perspective of the H2020 TEACHING project on the next generation of autonomous applications running in a distributed and highly heterogeneous environment comprising both virtual and physical resources spanning the edge-cloud continuum. TEACHING puts forward a human-centred vision leveraging the physiological, emotional, and cognitive state of the users as a driver for the adaptation and optimization of the autonomous applications. It does so by building a distributed, embedded and federated learning system complemented by methods and tools to enforce its dependability, security and privacy preservation. The paper discusses the main concepts of the TEACHING approach and singles out the main AI-related research challenges associated with it. Further, we provide a discussion of the design choices for the TEACHING system to tackle the aforementioned challenges.

Submitted 14 July, 2021; originally announced July 2021.

arXiv:2105.07674 [pdf, ps, other]

Continual Learning with Echo State Networks

Authors: Andrea Cossu, Davide Bacciu, Antonio Carta, Claudio Gallicchio, Vincenzo Lomonaco

Abstract: Continual Learning (CL) refers to a learning setup where data is non stationary and the model has to learn without forgetting existing knowledge. The study of CL for sequential patterns revolves around trained recurrent networks. In this work, instead, we introduce CL in the context of Echo State Networks (ESNs), where the recurrent component is kept fixed. We provide the first evaluation of catastrophic forgetting in ESNs and we highlight the benefits in using CL strategies which are not applicable to trained recurrent models. Our results confirm the ESN as a promising model for CL and open to its use in streaming scenarios.

Submitted 17 August, 2021; v1 submitted 17 May, 2021; originally announced May 2021.

Comments: Accepted as oral at ESANN 2021

arXiv:2104.00405 [pdf, other]

Avalanche: an End-to-End Library for Continual Learning

Authors: Vincenzo Lomonaco, Lorenzo Pellegrini, Andrea Cossu, Antonio Carta, Gabriele Graffieti, Tyler L. Hayes, Matthias De Lange, Marc Masana, Jary Pomponi, Gido van de Ven, Martin Mundt, Qi She, Keiland Cooper, Jeremy Forest, Eden Belouadah, Simone Calderara, German I. Parisi, Fabio Cuzzolin, Andreas Tolias, Simone Scardapane, Luca Antiga, Subutai Amhad, Adrian Popescu, Christopher Kanan, Joost van de Weijer, et al. (3 additional authors not shown)

Abstract: Learning continually from non-stationary data streams is a long-standing goal and a challenging problem in machine learning. Recently, we have witnessed a renewed and fast-growing interest in continual learning, especially within the deep learning community. However, algorithmic solutions are often difficult to re-implement, evaluate and port across different settings, where even results on standard benchmarks are hard to reproduce. In this work, we propose Avalanche, an open-source end-to-end library for continual learning research based on PyTorch. Avalanche is designed to provide a shared and collaborative codebase for fast prototyping, training, and reproducible evaluation of continual learning algorithms.

Submitted 1 April, 2021; originally announced April 2021.

Comments: Official Website: https://avalanche.continualai.org

arXiv:2103.15851 [pdf, other]

Distilled Replay: Overcoming Forgetting through Synthetic Samples

Authors: Andrea Rosasco, Antonio Carta, Andrea Cossu, Vincenzo Lomonaco, Davide Bacciu

Abstract: Replay strategies are Continual Learning techniques which mitigate catastrophic forgetting by keeping a buffer of patterns from previous experiences, which are interleaved with new data during training. The amount of patterns stored in the buffer is a critical parameter which largely influences the final performance and the memory footprint of the approach. This work introduces Distilled Replay, a novel replay strategy for Continual Learning which is able to mitigate forgetting by keeping a very small buffer (1 pattern per class) of highly informative samples. Distilled Replay builds the buffer through a distillation process which compresses a large dataset into a tiny set of informative examples. We show the effectiveness of our Distilled Replay against popular replay-based strategies on four Continual Learning benchmarks.

Submitted 22 June, 2021; v1 submitted 29 March, 2021; originally announced March 2021.

arXiv:2103.11750 [pdf, other]

Catastrophic Forgetting in Deep Graph Networks: an Introductory Benchmark for Graph Classification

Authors: Antonio Carta, Andrea Cossu, Federico Errica, Davide Bacciu

Abstract: In this work, we study the phenomenon of catastrophic forgetting in the graph representation learning scenario. The primary objective of the analysis is to understand whether classical continual learning techniques for flat and sequential data have a tangible impact on performances when applied to graph data. To do so, we experiment with a structure-agnostic model and a deep graph network in a robust and controlled environment on three different datasets. The benchmark is complemented by an investigation on the effect of structure-preserving regularization techniques on catastrophic forgetting. We find that replay is the most effective strategy in so far, which also benefits the most from the use of regularization. Our findings suggest interesting future research at the intersection of the continual and graph representation learning fields. Finally, we provide researchers with a flexible software framework to reproduce our results and carry out further experiments.

Submitted 22 March, 2021; originally announced March 2021.

Comments: Accepted at the 2021 Web Conference Workshop on Graph Learning Benchmarks (GLB 2021). Code available at https://github.com/diningphil/continual_learning_for_graphs

ACM Class: I.2.6

arXiv:2103.07492 [pdf, other]

doi 10.1016/j.neunet.2021.07.021

Continual Learning for Recurrent Neural Networks: an Empirical Evaluation

Authors: Andrea Cossu, Antonio Carta, Vincenzo Lomonaco, Davide Bacciu

Abstract: Learning continuously during all model lifetime is fundamental to deploy machine learning solutions robust to drifts in the data distribution. Advances in Continual Learning (CL) with recurrent neural networks could pave the way to a large number of applications where incoming data is non stationary, like natural language processing and robotics. However, the existing body of work on the topic is still fragmented, with approaches which are application-specific and whose assessment is based on heterogeneous learning protocols and datasets. In this paper, we organize the literature on CL for sequential data processing by providing a categorization of the contributions and a review of the benchmarks. We propose two new benchmarks for CL with sequential data based on existing datasets, whose characteristics resemble real-world applications. We also provide a broad empirical evaluation of CL and Recurrent Neural Networks in class-incremental scenario, by testing their ability to mitigate forgetting with a number of different strategies which are not specific to sequential data processing. Our results highlight the key role played by the sequence length and the importance of a clear specification of the CL scenario.

Submitted 2 August, 2021; v1 submitted 12 March, 2021; originally announced March 2021.

Comments: Published in Neural Networks

Journal ref: Neural Networks, Volume 143, 2021, pages 607-627

arXiv:2011.02886 [pdf, other]

Short-Term Memory Optimization in Recurrent Neural Networks by Autoencoder-based Initialization

Authors: Antonio Carta, Alessandro Sperduti, Davide Bacciu

Abstract: Training RNNs to learn long-term dependencies is difficult due to vanishing gradients. We explore an alternative solution based on explicit memorization using linear autoencoders for sequences, which allows to maximize the short-term memory and that can be solved with a closed-form solution without backpropagation. We introduce an initialization schema that pretrains the weights of a recurrent neural network to approximate the linear autoencoder of the input sequences and we show how such pretraining can better support solving hard classification tasks with long sequences. We test our approach on sequential and permuted MNIST. We show that the proposed approach achieves a much lower reconstruction error for long sequences and a better gradient propagation during the finetuning phase.

Submitted 5 November, 2020; originally announced November 2020.

Comments: Accepted at NeurIPS 2020 workshop "Beyond Backpropagation: Novel Ideas for Training Neural Architectures"

arXiv:2006.16800 [pdf, other]

Incremental Training of a Recurrent Neural Network Exploiting a Multi-Scale Dynamic Memory

Authors: Antonio Carta, Alessandro Sperduti, Davide Bacciu

Abstract: The effectiveness of recurrent neural networks can be largely influenced by their ability to store into their dynamical memory information extracted from input sequences at different frequencies and timescales. Such a feature can be introduced into a neural architecture by an appropriate modularization of the dynamic memory. In this paper we propose a novel incrementally trained recurrent architecture targeting explicitly multi-scale learning. First, we show how to extend the architecture of a simple RNN by separating its hidden state into different modules, each subsampling the network hidden activations at different frequencies. Then, we discuss a training algorithm where new modules are iteratively added to the model to learn progressively longer dependencies. Each new module works at a slower frequency than the previous ones and it is initialized to encode the subsampled sequence of hidden activations. Experimental results on synthetic and real-world datasets on speech recognition and handwritten characters show that the modular architecture and the incremental training algorithm improve the ability of recurrent neural networks to capture long-term dependencies.

Submitted 29 June, 2020; originally announced June 2020.

Comments: accepted @ ECML 2020. arXiv admin note: substantial text overlap with arXiv:2001.11771

arXiv:2004.04077 [pdf, other]

doi 10.1109/IJCNN48605.2020.9207550

Continual Learning with Gated Incremental Memories for sequential data processing

Authors: Andrea Cossu, Antonio Carta, Davide Bacciu

Abstract: The ability to learn in dynamic, nonstationary environments without forgetting previous knowledge, also known as Continual Learning (CL), is a key enabler for scalable and trustworthy deployments of adaptive solutions. While the importance of continual learning is largely acknowledged in machine vision and reinforcement learning problems, this is mostly under-documented for sequence processing tasks. This work proposes a Recurrent Neural Network (RNN) model for CL that is able to deal with concept drift in input distribution without forgetting previously acquired knowledge. We also implement and test a popular CL approach, Elastic Weight Consolidation (EWC), on top of two different types of RNNs. Finally, we compare the performances of our enhanced architecture against EWC and RNNs on a set of standard CL benchmarks, adapted to the sequential data processing scenario. Results show the superior performance of our architecture and highlight the need for special solutions designed to address CL in RNNs.

Submitted 8 April, 2020; originally announced April 2020.

Comments: Accepted as a conference paper at 2020 International Joint Conference on Neural Networks (IJCNN 2020). Part of 2020 IEEE World Congress on Computational Intelligence (IEEE WCCI 2020)

arXiv:2001.11771 [pdf, other]

Encoding-based Memory Modules for Recurrent Neural Networks

Authors: Antonio Carta, Alessandro Sperduti, Davide Bacciu

Abstract: Learning to solve sequential tasks with recurrent models requires the ability to memorize long sequences and to extract task-relevant features from them. In this paper, we study the memorization subtask from the point of view of the design and training of recurrent neural networks. We propose a new model, the Linear Memory Network, which features an encoding-based memorization component built with a linear autoencoder for sequences. We extend the memorization component with a modular memory that encodes the hidden state sequence at different sampling frequencies. Additionally, we provide a specialized training algorithm that initializes the memory to efficiently encode the hidden activations of the network. The experimental results on synthetic and real-world datasets show that specializing the training algorithm to train the memorization component always improves the final performance whenever the memorization of long sequences is necessary to solve the problem.

Submitted 31 January, 2020; originally announced January 2020.

Comments: preprint submitted at Elsevier Neural Networks

arXiv:2001.05494 [pdf, other]

Learning Style-Aware Symbolic Music Representations by Adversarial Autoencoders

Authors: Andrea Valenti, Antonio Carta, Davide Bacciu

Abstract: We address the challenging open problem of learning an effective latent space for symbolic music data in generative music modeling. We focus on leveraging adversarial regularization as a flexible and natural mean to imbue variational autoencoders with context information concerning music genre and style. Through the paper, we show how Gaussian mixtures taking into account music metadata information can be used as an effective prior for the autoencoder latent space, introducing the first Music Adversarial Autoencoder (MusAE). The empirical analysis on a large scale benchmark shows that our model has a higher reconstruction accuracy than state-of-the-art models based on standard variational autoencoders. It is also able to create realistic interpolations between two musical sequences, smoothly changing the dynamics of the different tracks. Experiments show that the model can organise its latent space accordingly to low-level properties of the musical pieces, as well as to embed into the latent variables the high-level genre information injected from the prior distribution to increase its overall performance. This allows us to perform changes to the generated pieces in a principled way.

Submitted 20 February, 2020; v1 submitted 15 January, 2020; originally announced January 2020.

Comments: Accepted for publication at the 24th European Conference on Artificial Intelligence (ECAI2020)

arXiv:1811.03356 [pdf, other]

Linear Memory Networks

Authors: Davide Bacciu, Antonio Carta, Alessandro Sperduti

Abstract: Recurrent neural networks can learn complex transduction problems that require maintaining and actively exploiting a memory of their inputs. Such models traditionally consider memory and input-output functionalities indissolubly entangled. We introduce a novel recurrent architecture based on the conceptual separation between the functional input-output transformation and the memory mechanism, showing how they can be implemented through different neural components. By building on such conceptualization, we introduce the Linear Memory Network, a recurrent model comprising a feedforward neural network, realizing the non-linear functional transformation, and a linear autoencoder for sequences, implementing the memory component. The resulting architecture can be efficiently trained by building on closed-form solutions to linear optimization problems. Further, by exploiting equivalence results between feedforward and recurrent neural networks we devise a pretraining schema for the proposed architecture. Experiments on polyphonic music datasets show competitive results against gated recurrent networks and other state of the art models.

Submitted 8 November, 2018; originally announced November 2018.

arXiv:1807.01577 [pdf, other]

VideoKifu, or the automatic transcription of a Go game

Authors: Mario Corsolini, Andrea Carta

Abstract: In two previous papers [arXiv:1508.03269, arXiv:1701.05419] we described the techniques we employed for reconstructing the whole move sequence of a Go game. That task was at first accomplished by means of a series of photographs, manually shot, as explained during the scientific conference held within the LIX European Go Congress (Liberec, CZ). The photographs were subsequently replaced by a possibly unattended video live stream (provided by webcams, videocameras, smartphones and so on) or, were the live stream not available, by means of a pre-recorded video of the game itself, on condition that the goban and the stones were clearly visible more often than not. As we hinted in the latter paper, in the last two years we have improved both the algorithms employed for reconstructing the grid and detecting the stones, making extensive usage of the multicore capabilities offered by modern CPUs. Those capabilities prompted us to develop some asynchronous routines, capable of double-checking the position of the grid and the number and colour of any stone previously detected, in order to get rid of minor errors possibly occurred during the main analysis, and that may pass undetected especially in the course of an unattended live streaming.
Those routines will be described in details, as they address some problems that are of general interest when reconstructing the move sequence, for example what to do when large movements of the whole goban occur (deliberate or not) and how to deal with captures of dead stones $-$ that could be wrongly detected and recorded as "fresh" moves if not promptly removed.

Submitted 4 July, 2018; originally announced July 2018.

Comments: 14 pages, 6 figures. Accepted for the "International Conference on Research in Mind Games" (August 7-8, 2018) at the EGC in Pisa, Italy. Datasets available from http://www.oipaz.net/VideoKifu.html

ACM Class: I.2.10; I.4.8; I.5.5

arXiv:1701.05419 [pdf, other]

Moving to VideoKifu: the last steps toward a fully automatic record-keeping of a Go game

Authors: Mario Corsolini, Andrea Carta

Abstract: In a previous paper [arXiv:1508.03269] we described the techniques we successfully employed for automatically reconstructing the whole move sequence of a Go game by means of a set of pictures. Now we describe how it is possible to reconstruct the move sequence by means of a video stream (which may be provided by an unattended webcam), possibly in real-time. Although the basic algorithms remain t… built on the ideas here outlined.

Submitted 19 January, 2017; originally announced January 2017.

Comments: 20 pages, 14 figures. Accepted for publication in the "Journal of Baduk Studies", datasets available from http://www.oipaz.net/PhotoKifu.html

ACM Class: I.2.10; I.4.8; I.5.5

Journal ref: Journal of Baduk Studies, 13(2):45-63, December 2016

arXiv:1508.03269 [pdf, other]

A New Approach to an Old Problem: The Reconstruction of a Go Game through a Series of Photographs

Authors: Mario Corsolini, Andrea Carta

Abstract: Given a series of photographs taken during a Go game, we describe the techniques we successfully employ for pinpointing the grid lines of the Go board and for tracking their small movements between consecutive photographs; then we discuss how to approximate the location and orientation of the observer's point of view, in order to compensate for projection effects. Finally we describe the different criteria that jointly form the algorithm for stones' detection, thus enabling us to automatically reconstruct the whole move sequence.

Submitted 20 April, 2018; v1 submitted 13 August, 2015; originally announced August 2015.

Comments: 13 pages, 18 figures, datasets available from http://www.oipaz.net/VideoKifu.html - added references in section 1, updated addresses, added indication that both authors contributed equally

ACM Class: I.2.10; I.4.8; I.5.5

Journal ref: "Proceedings of the Second International Go Game Science Conference", Liberec, Czech Republic, 2015/07/30. ISBN 978-80-7378-299-3

Showing 1–32 of 32 results for author: Carta, A