Showing 1–10 of 10 results for author: Vero, M

  1. arXiv:2406.07217  [pdf, other]

    cs.LG cs.AI cs.CL

    A Synthetic Dataset for Personal Attribute Inference

    Authors: Hanna Yukhymenko, Robin Staab, Mark Vero, Martin Vechev

    Abstract: Recently, powerful Large Language Models (LLMs) have become easily accessible to hundreds of millions of users worldwide. However, their strong capabilities and vast world knowledge do not come without associated privacy risks. In this work, we focus on the emerging privacy threat LLMs pose - the ability to accurately infer personal information from online texts. Despite the growing importance of…

    Submitted 11 June, 2024; originally announced June 2024.

  2. arXiv:2405.18161  [pdf, other]

    cs.LG cs.AI

    Back to the Drawing Board for Fair Representation Learning

    Authors: Angéline Pouget, Nikola Jovanović, Mark Vero, Robin Staab, Martin Vechev

    Abstract: The goal of Fair Representation Learning (FRL) is to mitigate biases in machine learning models by learning data representations that enable high accuracy on downstream tasks while minimizing discrimination based on sensitive attributes. The evaluation of FRL methods in many recent works primarily focuses on the tradeoff between downstream fairness and accuracy with respect to a single task that w…

    Submitted 28 May, 2024; originally announced May 2024.

  3. arXiv:2405.18137  [pdf, other]

    cs.LG cs.AI cs.CR

    Exploiting LLM Quantization

    Authors: Kazuki Egashira, Mark Vero, Robin Staab, Jingxuan He, Martin Vechev

    Abstract: Quantization leverages lower-precision weights to reduce the memory usage of large language models (LLMs) and is a key technique for enabling their deployment on commodity hardware. While LLM quantization's impact on utility has been extensively explored, this work for the first time studies its adverse effects from a security perspective. We reveal that widely used quantization methods can be exp…

    Submitted 28 May, 2024; originally announced May 2024.

  4. arXiv:2404.10618  [pdf, other]

    cs.AI cs.CV cs.LG

    Private Attribute Inference from Images with Vision-Language Models

    Authors: Batuhan Tömekçe, Mark Vero, Robin Staab, Martin Vechev

    Abstract: As large language models (LLMs) become ubiquitous in our daily tasks and digital interactions, associated privacy risks are increasingly in focus. While LLM privacy research has primarily focused on the leakage of model training data, it has recently been shown that the increase in models' capabilities has enabled LLMs to make accurate privacy-infringing inferences from previously unseen texts. Wi…

    Submitted 16 April, 2024; originally announced April 2024.

  5. arXiv:2402.13846  [pdf, other]

    cs.AI cs.CL cs.CR

    Large Language Models are Advanced Anonymizers

    Authors: Robin Staab, Mark Vero, Mislav Balunović, Martin Vechev

    Abstract: Recent work in privacy research on large language models has shown that they achieve near human-level performance at inferring personal data from real-world online texts. With consistently increasing model capabilities, existing text anonymization methods are currently lagging behind regulatory requirements and adversarial threats. This raises the question of how individuals can effectively protec…

    Submitted 21 February, 2024; originally announced February 2024.

    ACM Class: I.2.7

  6. arXiv:2402.09497  [pdf, other]

    cs.CR cs.AI cs.LG cs.SE

    Instruction Tuning for Secure Code Generation

    Authors: Jingxuan He, Mark Vero, Gabriela Krasnopolska, Martin Vechev

    Abstract: Modern language models (LMs) have gained widespread acceptance in everyday and professional contexts, particularly in programming. An essential procedure enabling this adoption is instruction tuning, which substantially enhances LMs' practical utility by training them to follow user instructions and human preferences. However, existing instruction tuning schemes overlook a crucial aspect: the secu…

    Submitted 14 February, 2024; originally announced February 2024.

  7. arXiv:2310.07298  [pdf, other]

    cs.AI cs.LG

    Beyond Memorization: Violating Privacy Via Inference with Large Language Models

    Authors: Robin Staab, Mark Vero, Mislav Balunović, Martin Vechev

    Abstract: Current privacy research on large language models (LLMs) primarily focuses on the issue of extracting memorized training data. At the same time, models' inference capabilities have increased drastically. This raises the key question of whether current LLMs could violate individuals' privacy by inferring personal attributes from text given at inference time. In this work, we present the first compr…

    Submitted 6 May, 2024; v1 submitted 11 October, 2023; originally announced October 2023.

    ACM Class: I.2.7

  8. arXiv:2307.03577  [pdf, other]

    cs.LG cs.DB cs.PL

    CuTS: Customizable Tabular Synthetic Data Generation

    Authors: Mark Vero, Mislav Balunović, Martin Vechev

    Abstract: Privacy, data quality, and data sharing concerns pose a key limitation for tabular data applications. While generating synthetic data resembling the original distribution addresses some of these issues, most applications would benefit from additional customization on the generated data. However, existing synthetic data approaches are limited to particular constraints, e.g., differential privacy (D…

    Submitted 2 June, 2024; v1 submitted 7 July, 2023; originally announced July 2023.

  9. arXiv:2210.01785  [pdf, other]

    cs.LG cs.CR cs.DC

    TabLeak: Tabular Data Leakage in Federated Learning

    Authors: Mark Vero, Mislav Balunović, Dimitar I. Dimitrov, Martin Vechev

    Abstract: While federated learning (FL) promises to preserve privacy, recent works in the image and text domains have shown that training updates leak private client data. However, most high-stakes applications of FL (e.g., in healthcare and finance) use tabular data, where the risk of data leakage has not yet been explored. A successful attack for tabular data must address two key challenges unique to the…

    Submitted 7 July, 2023; v1 submitted 4 October, 2022; originally announced October 2022.

    ACM Class: I.2.11

  10. Reducing Neural Architecture Search Spaces with Training-Free Statistics and Computational Graph Clustering

    Authors: Thorir Mar Ingolfsson, Mark Vero, Xiaying Wang, Lorenzo Lamberti, Luca Benini, Matteo Spallanzani

    Abstract: The computational demands of neural architecture search (NAS) algorithms are usually directly proportional to the size of their target search spaces. Thus, limiting the search to high-quality subsets can greatly reduce the computational load of NAS algorithms. In this paper, we present Clustering-Based REDuction (C-BRED), a new technique to reduce the size of NAS search spaces. C-BRED reduces a NA…

    Submitted 29 April, 2022; originally announced April 2022.

    ACM Class: I.m