-
A Survey on Knowledge Editing of Neural Networks
Authors:
Vittorio Mazzia,
Alessandro Pedrani,
Andrea Caciolai,
Kay Rottmann,
Davide Bernardi
Abstract:
Deep neural networks are becoming increasingly pervasive in academia and industry, matching and surpassing human performance on a wide variety of fields and related tasks. However, just as humans, even the largest artificial neural networks make mistakes, and once-correct predictions can become invalid as the world progresses in time. Augmenting datasets with samples that account for mistakes or up-to-date information has become a common workaround in practical applications. However, the well-known phenomenon of catastrophic forgetting poses a challenge in achieving precise changes in the implicitly memorized knowledge of neural network parameters, often requiring a full model re-training to achieve desired behaviors. That is expensive, unreliable, and incompatible with the current trend of large self-supervised pre-training, making it necessary to find more efficient and effective methods for adapting neural network models to changing data. To address this need, knowledge editing is emerging as a novel area of research that aims to enable reliable, data-efficient, and fast changes to a pre-trained target model, without affecting model behaviors on previously learned tasks. In this survey, we provide a brief review of this recent artificial intelligence field of research. We first introduce the problem of editing neural networks, formalize it in a common framework and differentiate it from more notorious branches of research such as continuous learning. Next, we provide a review of the most relevant knowledge editing approaches and datasets proposed so far, grouping works under four different families: regularization techniques, meta-learning, direct model editing, and architectural strategies. Finally, we outline some intersections with other fields of research and potential directions for future works.
Submitted 14 December, 2023; v1 submitted 30 October, 2023;
originally announced October 2023.
-
Alexa Teacher Model: Pretraining and Distilling Multi-Billion-Parameter Encoders for Natural Language Understanding Systems
Authors:
Jack FitzGerald,
Shankar Ananthakrishnan,
Konstantine Arkoudas,
Davide Bernardi,
Abhishek Bhagia,
Claudio Delli Bovi,
** Cao,
Rakesh Chada,
Amit Chauhan,
Luoxin Chen,
Anurag Dwarakanath,
Satyam Dwivedi,
Turan Gojayev,
Karthik Gopalakrishnan,
Thomas Gueudre,
Dilek Hakkani-Tur,
Wael Hamza,
Jonathan Hueser,
Kevin Martin Jose,
Haidar Khan,
Beiye Liu,
Jianhua Lu,
Alessandro Manzotti,
Pradeep Natarajan,
Karolina Owczarzak
, et al. (16 additional authors not shown)
Abstract:
We present results from a large-scale experiment on pretraining encoders with non-embedding parameter counts ranging from 700M to 9.3B, their subsequent distillation into smaller models ranging from 17M to 170M parameters, and their application to the Natural Language Understanding (NLU) component of a virtual assistant system. Though we train using 70% spoken-form data, our teacher models perform comparably to XLM-R and mT5 when evaluated on the written-form Cross-lingual Natural Language Inference (XNLI) corpus. We perform a second stage of pretraining on our teacher models using in-domain data from our system, improving error rates by 3.86% relative for intent classification and 7.01% relative for slot filling. We find that even a 170M-parameter model distilled from our Stage 2 teacher model has 2.88% better intent classification and 7.69% better slot filling error rates when compared to the 2.3B-parameter teacher trained only on public data (Stage 1), emphasizing the importance of in-domain data for pretraining. When evaluated offline using labeled NLU data, our 17M-parameter Stage 2 distilled model outperforms both XLM-R Base (85M params) and DistilBERT (42M params) by 4.23% to 6.14%, respectively. Finally, we present results from a full virtual assistant experimentation platform, where we find that models trained using our pretraining and distillation pipeline outperform models distilled from 85M-parameter teachers by 3.74%-4.91% on an automatic measurement of full-system user dissatisfaction.
Submitted 15 June, 2022;
originally announced June 2022.
-
Mammographic density: Comparison of visual assessment with fully automatic calculation on a multivendor dataset
Authors:
Daniela Sacchetto,
Lia Morra,
Silvano Agliozzo,
Daniela Bernardi,
Tomas Bjorklund,
Beniamino Brancato,
Patrizia Bravetti,
Luca A. Carbonaro,
Loredana Correale,
Carmen Fantò,
Elisabetta Favettini,
Laura Martincich,
Luisella Milanesio,
Sara Mombelloni,
Francesco Monetti,
Doralba Morrone,
Marco Pellegrini,
Barbara Pesce,
Antonella Petrillo,
Gianni Saguatti,
Carmen Stevanin,
Rubina M. Trimboli,
Paola Tuttobene,
Marvi Valentini,
Vincenzo Marra
, et al. (3 additional authors not shown)
Abstract:
Objectives: To compare breast density (BD) assessment provided by an automated BD evaluator (ABDE) with that provided by a panel of experienced breast radiologists, on a multivendor dataset.
Methods: Twenty-one radiologists assessed 613 screening/diagnostic digital mammograms from 9 centers and 6 different vendors, using the BI-RADS a, b, c, and d density classification. The same mammograms were also evaluated by an ABDE providing the ratio between fibroglandular and total breast area on a continuous scale and, automatically, the BI-RADS score. Panel majority report (PMR) was used as reference standard. Agreement (k) and accuracy (proportion of cases correctly classified) were calculated for binary (BI-RADS a-b versus c-d) and 4-class classification.
Results: While the agreement of individual radiologists with PMR ranged from k=0.483 to k=0.885, the ABDE correctly classified 563/613 mammograms (92%). A substantial agreement for binary classification was found for individual reader pairs (k=0.620, standard deviation [SD]=0.140), individual versus PMR (k=0.736, SD=0.117), and individual versus ABDE (k=0.674, SD=0.095). Agreement between ABDE and PMR was almost perfect (k=0.831).
Conclusions: The ABDE showed an almost perfect agreement with a 21-radiologist panel in binary BD classification on a multivendor dataset, earning a chance as a reproducible alternative to visual evaluation.
Submitted 13 November, 2018;
originally announced November 2018.
-
Fast Flux Service Network Detection via Data Mining on Passive DNS Traffic
Authors:
Pierangelo Lombardo,
Salvatore Saeli,
Federica Bisio,
Davide Bernardi,
Danilo Massa
Abstract:
In the last decade, the use of the fast flux technique has become established as a common practice to organise botnets in Fast Flux Service Networks (FFSNs), which are platforms able to sustain illegal online services with very high availability. In this paper, we report on an effective fast flux detection algorithm based on the passive analysis of the Domain Name System (DNS) traffic of a corporate network. The proposed method is based on the near-real-time identification of different metrics that measure a wide range of fast flux key features; the metrics are combined via a simple but effective mathematical and data mining approach. The proposed solution has been evaluated in a one-month experiment over an enterprise network, with the injection of pcaps associated with different malware campaigns that leverage FFSNs and cover a wide variety of attack scenarios. An in-depth analysis of a list of fast flux domains confirmed the reliability of the metrics used in the proposed algorithm and allowed for the identification of many IPs that turned out to be part of two notorious FFSNs, namely Dark Cloud and SandiFlux, to the description of which we therefore contribute. All the fast flux domains were detected with a very low false positive rate; a comparison of performance indicators with previous works shows a remarkable improvement.
Submitted 18 September, 2018; v1 submitted 17 April, 2018;
originally announced April 2018.