
Showing 1–3 of 3 results for author: Yang, M Y R

  1. arXiv:2404.06737  [pdf, other]

    cs.LG cs.CR

    Disguised Copyright Infringement of Latent Diffusion Models

    Authors: Yiwei Lu, Matthew Y. R. Yang, Zuoqiu Liu, Gautam Kamath, Yaoliang Yu

    Abstract: Copyright infringement may occur when a generative model produces samples substantially similar to some copyrighted data that it had access to during the training phase. The notion of access usually refers to including copyrighted samples directly in the training dataset, which one may inspect to identify an infringement. We argue that such visual auditing largely overlooks a concealed copyright i…

    Submitted 3 June, 2024; v1 submitted 10 April, 2024; originally announced April 2024.

    Comments: Accepted to ICML 2024

  2. arXiv:2402.12626  [pdf, other]

    cs.LG cs.CR

    Indiscriminate Data Poisoning Attacks on Pre-trained Feature Extractors

    Authors: Yiwei Lu, Matthew Y. R. Yang, Gautam Kamath, Yaoliang Yu

    Abstract: Machine learning models have achieved great success in supervised learning tasks for end-to-end training, which requires a large amount of labeled data that is not always feasible. Recently, many practitioners have shifted to self-supervised learning methods that utilize cheap unlabeled data to learn a general feature extractor via pre-training, which can be further applied to personalized downstr…

    Submitted 19 February, 2024; originally announced February 2024.

    Comments: Accepted to SaTML 2024

  3. arXiv:2212.09410  [pdf, other]

    cs.CL

    Less is More: Parameter-Free Text Classification with Gzip

    Authors: Zhiying Jiang, Matthew Y. R. Yang, Mikhail Tsirlin, Raphael Tang, Jimmy Lin

    Abstract: Deep neural networks (DNNs) are often used for text classification tasks as they usually achieve high levels of accuracy. However, DNNs can be computationally intensive with billions of parameters and large amounts of labeled data, which can make them expensive to use, to optimize and to transfer to out-of-distribution (OOD) cases in practice. In this paper, we propose a non-parametric alternative…

    Submitted 19 December, 2022; originally announced December 2022.