SoK: Memorisation in machine learning

Usynin, Dmitrii; Knolle, Moritz; Kaissis, Georgios

Abstract:Quantifying the impact of individual data samples on machine learning models is an open research problem. This is particularly relevant when complex and high-dimensional relationships have to be learned from a limited sample of the data generating distribution, such as in deep learning. It was previously shown that, in these cases, models rely not only on extracting patterns which are helpful for generalisation, but also seem to be required to incorporate some of the training data more or less as is, in a process often termed memorisation. This raises the question: if some memorisation is a requirement for effective learning, what are its privacy implications? In this work we unify a broad range of previous definitions and perspectives on memorisation in ML, discuss their interplay with model generalisation and their implications of these phenomena on data privacy. Moreover, we systematise methods allowing practitioners to detect the occurrence of memorisation or quantify it and contextualise our findings in a broad range of ML learning settings. Finally, we discuss memorisation in the context of privacy attacks, differential privacy (DP) and adversarial actors.

Subjects:	Machine Learning (cs.LG); Cryptography and Security (cs.CR); Information Theory (cs.IT)
Cite as:	arXiv:2311.03075 [cs.LG]
	(or arXiv:2311.03075v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2311.03075

Computer Science > Machine Learning

Title:SoK: Memorisation in machine learning

Submission history

Access Paper:

References & Citations

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators