Showing 1–9 of 9 results for author: Gokul, A

  1. arXiv:2404.04465  [pdf, other]

    cs.CV

    Aligning Diffusion Models by Optimizing Human Utility

    Authors: Shufan Li, Konstantinos Kallidromitis, Akash Gokul, Yusuke Kato, Kazuki Kozuka

    Abstract: We present Diffusion-KTO, a novel approach for aligning text-to-image diffusion models by formulating the alignment objective as the maximization of expected human utility. Since this objective applies to each generation independently, Diffusion-KTO does not require collecting costly pairwise preference data nor training a complex reward model. Instead, our objective requires simple per-image bina…

    Submitted 5 April, 2024; originally announced April 2024.

    Comments: 27 pages, 11 figures

  2. arXiv:2401.13974  [pdf, other]

    cs.CV cs.AI cs.GR

    BootPIG: Bootstrapping Zero-shot Personalized Image Generation Capabilities in Pretrained Diffusion Models

    Authors: Senthil Purushwalkam, Akash Gokul, Shafiq Joty, Nikhil Naik

    Abstract: Recent text-to-image generation models have demonstrated incredible success in generating images that faithfully follow input prompts. However, the requirement of using words to describe a desired concept provides limited control over the appearance of the generated concepts. In this work, we address this shortcoming by proposing an approach to enable personalization capabilities in existing text-…

    Submitted 25 January, 2024; originally announced January 2024.

  3. arXiv:2310.08992  [pdf, other]

    cs.AI cs.CL cs.PL

    CodeChain: Towards Modular Code Generation Through Chain of Self-revisions with Representative Sub-modules

    Authors: Hung Le, Hailin Chen, Amrita Saha, Akash Gokul, Doyen Sahoo, Shafiq Joty

    Abstract: Large Language Models (LLMs) have already become quite proficient at solving simpler programming tasks like those in HumanEval or MBPP benchmarks. However, solving more complex and competitive programming tasks is still quite challenging for these models - possibly due to their tendency to generate solutions as monolithic code blocks instead of decomposing them into logical sub-tasks and sub-modul…

    Submitted 13 March, 2024; v1 submitted 13 October, 2023; originally announced October 2023.

    Comments: Accepted to ICLR 2024

  4. arXiv:2307.08962  [pdf, other]

    cs.AI cs.LG

    REX: Rapid Exploration and eXploitation for AI Agents

    Authors: Rithesh Murthy, Shelby Heinecke, Juan Carlos Niebles, Zhiwei Liu, Le Xue, Weiran Yao, Yihao Feng, Zeyuan Chen, Akash Gokul, Devansh Arpit, Ran Xu, Phil Mui, Huan Wang, Caiming Xiong, Silvio Savarese

    Abstract: In this paper, we propose an enhanced approach for Rapid Exploration and eXploitation for AI Agents called REX. Existing AutoGPT-style techniques have inherent limitations, such as a heavy reliance on precise descriptions for decision-making, and the lack of a systematic approach to leverage try-and-fail procedures akin to traditional Reinforcement Learning (RL). REX introduces an additional layer…

    Submitted 26 January, 2024; v1 submitted 18 July, 2023; originally announced July 2023.

  5. arXiv:2303.13703  [pdf, other]

    cs.CV cs.AI cs.LG

    End-to-End Diffusion Latent Optimization Improves Classifier Guidance

    Authors: Bram Wallace, Akash Gokul, Stefano Ermon, Nikhil Naik

    Abstract: Classifier guidance -- using the gradients of an image classifier to steer the generations of a diffusion model -- has the potential to dramatically expand the creative control over image generation and editing. However, currently classifier guidance requires either training new noise-aware models to obtain accurate gradients or using a one-step denoising approximation of the final generation, whi…

    Submitted 31 May, 2023; v1 submitted 23 March, 2023; originally announced March 2023.

  6. arXiv:2211.12446  [pdf, other]

    cs.CV cs.AI cs.LG

    EDICT: Exact Diffusion Inversion via Coupled Transformations

    Authors: Bram Wallace, Akash Gokul, Nikhil Naik

    Abstract: Finding an initial noise vector that produces an input image when fed into the diffusion process (known as inversion) is an important problem in denoising diffusion models (DDMs), with applications for real image editing. The state-of-the-art approach for real image editing with inversion uses denoising diffusion implicit models (DDIMs) to deterministically noise the image to the intermediate stat…

    Submitted 22 December, 2022; v1 submitted 22 November, 2022; originally announced November 2022.

    Comments: 24 pages, 22 figures. Code now available

  7. arXiv:2209.03745  [pdf, other]

    cs.CV

    Prior Knowledge-Guided Attention in Self-Supervised Vision Transformers

    Authors: Kevin Miao, Akash Gokul, Raghav Singh, Suzanne Petryk, Joseph Gonzalez, Kurt Keutzer, Trevor Darrell, Colorado Reed

    Abstract: Recent trends in self-supervised representation learning have focused on removing inductive biases from training pipelines. However, inductive biases can be useful in settings when limited data are available or provide additional insight into the underlying data distribution. We present spatial prior attention (SPAN), a framework that takes advantage of consistent spatial and semantic structure in…

    Submitted 6 September, 2022; originally announced September 2022.

  8. arXiv:2208.11821  [pdf, other]

    cs.CV

    Refine and Represent: Region-to-Object Representation Learning

    Authors: Akash Gokul, Konstantinos Kallidromitis, Shufan Li, Yusuke Kato, Kazuki Kozuka, Trevor Darrell, Colorado J Reed

    Abstract: Recent works in self-supervised learning have demonstrated strong performance on scene-level dense prediction tasks by pretraining with object-centric or region-based correspondence objectives. In this paper, we present Region-to-Object Representation Learning (R2O) which unifies region-based and object-centric pretraining. R2O operates by training an encoder to dynamically refine region-based seg…

    Submitted 20 December, 2022; v1 submitted 24 August, 2022; originally announced August 2022.

  9. arXiv:2010.01528  [pdf, other]

    cs.CV cs.AI cs.LG

    Remembering for the Right Reasons: Explanations Reduce Catastrophic Forgetting

    Authors: Sayna Ebrahimi, Suzanne Petryk, Akash Gokul, William Gan, Joseph E. Gonzalez, Marcus Rohrbach, Trevor Darrell

    Abstract: The goal of continual learning (CL) is to learn a sequence of tasks without suffering from the phenomenon of catastrophic forgetting. Previous work has shown that leveraging memory in the form of a replay buffer can reduce performance degradation on prior tasks. We hypothesize that forgetting can be further reduced when the model is encouraged to remember the evidence for previously made…

    Submitted 2 May, 2021; v1 submitted 4 October, 2020; originally announced October 2020.

    Comments: Accepted at ICLR 2021