
arXiv:2402.02081

Risk-Sensitive Diffusion for Perturbation-Robust Optimization

Authors: Yangming Li, Max Ruiz Luyten, Mihaela van der Schaar

Abstract: The essence of score-based generative models (SGM) is to optimize a score-based model towards the score function. However, we show that noisy samples induce a different objective function from the one defined by the score function, which misguides the optimization of the model. To address this problem, we first consider a new setting in which every noisy sample is paired with a risk vector indicating its data quality (e.g., noise level). This setting is very common in real-world applications, especially for medical and sensor data. Then, we introduce the risk-sensitive SDE, a type of stochastic differential equation (SDE) parameterized by the risk vector. With this tool, we aim to minimize perturbation instability, a measure we define to quantify the negative impact of noisy samples on optimization. We prove that zero instability is achievable only when noisy samples are caused by Gaussian perturbation. For non-Gaussian cases, we also provide the optimal coefficients that minimize the misguidance of noisy samples. To apply risk-sensitive SDEs in practice, we extend widely used diffusion models to their risk-sensitive versions and derive a risk-free loss that is efficient to compute. We also conduct numerical experiments that confirm the validity of our theorems and show that they make SGM robust to noisy samples during optimization.

Submitted 5 April, 2024; v1 submitted 3 February, 2024; originally announced February 2024.

Comments: Paper under review

arXiv:2310.02003

L2MAC: Large Language Model Automatic Computer for Extensive Code Generation

Authors: Samuel Holt, Max Ruiz Luyten, Mihaela van der Schaar

Abstract: Transformer-based large language models (LLMs) are constrained by the fixed context window of the underlying transformer architecture, which hinders their ability to produce long and coherent outputs. Memory-augmented LLMs are a promising solution, but current approaches cannot handle long output generation tasks because they either (1) focus only on reading memory and reduce its evolution to the concatenation of new memories, or (2) use highly specialized memories that cannot adapt to other domains. This paper presents L2MAC, the first practical LLM-based general-purpose stored-program automatic computer (von Neumann architecture) framework, an LLM-based multi-agent system for long and consistent output generation. Its memory has two components: the instruction registry, which is populated with a prompt program to solve the user-given task, and a file store, which contains the final and intermediate outputs. Each instruction is executed in turn by a separate LLM agent, whose context is managed by a control unit capable of precise memory reading and writing to ensure effective interaction with the file store. These components enable L2MAC to generate extensive outputs, bypassing the constraints of the finite context window while producing outputs that fulfill a complex user-specified task.
We empirically demonstrate that L2MAC achieves state-of-the-art performance in generating large codebases for system design tasks, significantly outperforming other coding methods in implementing the detailed user-specified task; we show that L2MAC works for general-purpose extensive text-based tasks, such as writing an entire book; and we provide insights into L2MAC's performance improvement over existing methods.

Submitted 10 April, 2024; v1 submitted 2 October, 2023; originally announced October 2023.

ACM Class: I.2.7; I.2.6; I.2.5; D.2.2; D.2.3; D.3.4

arXiv:2211.00227

Transfer Learning with Kernel Methods

Authors: Adityanarayanan Radhakrishnan, Max Ruiz Luyten, Neha Prasad, Caroline Uhler

Abstract: Transfer learning refers to the process of adapting a model trained on a source task to a target task. While kernel methods are conceptually and computationally simple machine learning models that are competitive on a variety of tasks, it has been unclear how to perform transfer learning for kernel methods. In this work, we propose a transfer learning framework for kernel methods that projects and translates the source model to the target task. We demonstrate the effectiveness of our framework in applications to image classification and virtual drug screening. In particular, we show that transferring modern kernels trained on large-scale image datasets can substantially improve performance compared with using the same kernel trained directly on the target task. In addition, we show that transfer-learned kernels allow more accurate prediction of the effect of drugs on cancer cell lines. For both applications, we identify simple scaling laws that characterize the performance of transfer-learned kernels as a function of the number of target examples. We explain this phenomenon in a simplified linear setting, where we are able to derive the exact scaling laws. By providing a simple and effective transfer learning framework for kernel methods, our work enables kernel methods trained on large datasets to be easily adapted to a variety of downstream target tasks.

Submitted 31 October, 2022; originally announced November 2022.

Showing 1–3 of 3 results for author: Luyten, M R