
Showing 1–21 of 21 results for author: Yanardag, P

  1. arXiv:2406.14599  [pdf, other]

    cs.CV

    Stylebreeder: Exploring and Democratizing Artistic Styles through Text-to-Image Models

    Authors: Matthew Zheng, Enis Simsar, Hidir Yesiltepe, Federico Tombari, Joel Simon, Pinar Yanardag

    Abstract: Text-to-image models are becoming increasingly popular, revolutionizing the landscape of digital art creation by enabling highly detailed and creative visual content generation. These models have been widely employed across various domains, particularly in art generation, where they facilitate a broad spectrum of creative expression and democratize access to artistic creation. In this paper, we in…

    Submitted 20 June, 2024; originally announced June 2024.

  2. arXiv:2406.02820  [pdf, other]

    cs.CV cs.LG

    ORACLE: Leveraging Mutual Information for Consistent Character Generation with LoRAs in Diffusion Models

    Authors: Kiymet Akdemir, Pinar Yanardag

    Abstract: Text-to-image diffusion models have recently taken center stage as pivotal tools in promoting visual creativity across an array of domains such as comic book artistry, children's literature, game development, and web design. These models harness the power of artificial intelligence to convert textual descriptions into vivid images, thereby enabling artists and creators to bring their imaginative c…

    Submitted 4 June, 2024; originally announced June 2024.

  3. arXiv:2406.00457  [pdf, other]

    cs.CV

    The Curious Case of End Token: A Zero-Shot Disentangled Image Editing using CLIP

    Authors: Hidir Yesiltepe, Yusuf Dalva, Pinar Yanardag

    Abstract: Diffusion models have become prominent in creating high-quality images. However, unlike GAN models celebrated for their ability to edit images in a disentangled manner, diffusion-based text-to-image models struggle to achieve the same level of precise attribute manipulation without compromising image coherence. In this paper, CLIP which is often used in popular text-to-image diffusion models such…

    Submitted 1 June, 2024; originally announced June 2024.

  4. arXiv:2403.19776  [pdf, other]

    cs.CV cs.LG

    CLoRA: A Contrastive Approach to Compose Multiple LoRA Models

    Authors: Tuna Han Salih Meral, Enis Simsar, Federico Tombari, Pinar Yanardag

    Abstract: Low-Rank Adaptations (LoRAs) have emerged as a powerful and popular technique in the field of image generation, offering a highly effective way to adapt and refine pre-trained deep learning models for specific tasks without the need for comprehensive retraining. By employing pre-trained LoRA models, such as those representing a specific cat and a particular dog, the objective is to generate an ima…

    Submitted 28 March, 2024; originally announced March 2024.

  5. arXiv:2403.19738  [pdf, other]

    cs.CV

    MIST: Mitigating Intersectional Bias with Disentangled Cross-Attention Editing in Text-to-Image Diffusion Models

    Authors: Hidir Yesiltepe, Kiymet Akdemir, Pinar Yanardag

    Abstract: Diffusion-based text-to-image models have rapidly gained popularity for their ability to generate detailed and realistic images from textual descriptions. However, these models often reflect the biases present in their training data, especially impacting marginalized groups. While prior efforts to debias language models have focused on addressing specific biases, such as racial or gender biases, e…

    Submitted 28 March, 2024; originally announced March 2024.

  6. arXiv:2403.19645  [pdf, other]

    cs.CV

    GANTASTIC: GAN-based Transfer of Interpretable Directions for Disentangled Image Editing in Text-to-Image Diffusion Models

    Authors: Yusuf Dalva, Hidir Yesiltepe, Pinar Yanardag

    Abstract: The rapid advancement in image generation models has predominantly been driven by diffusion models, which have demonstrated unparalleled success in generating high-fidelity, diverse images from textual prompts. Despite their success, diffusion models encounter substantial challenges in the domain of image editing, particularly in executing disentangled edits: changes that target specific attributes…

    Submitted 28 March, 2024; originally announced March 2024.

    Comments: Project page: https://gantastic.github.io

  7. arXiv:2312.06059  [pdf, other]

    cs.CV cs.AI cs.LG

    CONFORM: Contrast is All You Need For High-Fidelity Text-to-Image Diffusion Models

    Authors: Tuna Han Salih Meral, Enis Simsar, Federico Tombari, Pinar Yanardag

    Abstract: Images produced by text-to-image diffusion models might not always faithfully represent the semantic intent of the provided text prompt, where the model might overlook or entirely fail to produce certain objects. Existing solutions often require customly tailored functions for each of these problems, leading to sub-optimal results, especially for complex prompts. Our work introduces a novel perspe…

    Submitted 10 December, 2023; originally announced December 2023.

  8. arXiv:2312.05390  [pdf, other]

    cs.CV

    NoiseCLR: A Contrastive Learning Approach for Unsupervised Discovery of Interpretable Directions in Diffusion Models

    Authors: Yusuf Dalva, Pinar Yanardag

    Abstract: Generative models have been very popular in the recent years for their image generation capabilities. GAN-based models are highly regarded for their disentangled latent space, which is a key feature contributing to their success in controlled image editing. On the other hand, diffusion models have emerged as powerful tools for generating high-quality images. However, the latent space of diffusion…

    Submitted 8 December, 2023; originally announced December 2023.

    Comments: Project page: https://noiseclr.github.io/

  9. arXiv:2312.04524  [pdf, other]

    cs.CV

    RAVE: Randomized Noise Shuffling for Fast and Consistent Video Editing with Diffusion Models

    Authors: Ozgur Kara, Bariscan Kurtkaya, Hidir Yesiltepe, James M. Rehg, Pinar Yanardag

    Abstract: Recent advancements in diffusion-based models have demonstrated significant success in generating images from text. However, video editing models have not yet reached the same level of visual quality and user control. To address this, we introduce RAVE, a zero-shot video editing method that leverages pre-trained text-to-image diffusion models without additional training. RAVE takes an input video…

    Submitted 7 December, 2023; originally announced December 2023.

    Comments: Project webpage: https://rave-video.github.io , Github: http://github.com/rehg-lab/RAVE

  10. arXiv:2212.02184  [pdf, other]

    cs.CV cs.AI

    3D-LatentMapper: View Agnostic Single-View Reconstruction of 3D Shapes

    Authors: Alara Dirik, Pinar Yanardag

    Abstract: Computer graphics, 3D computer vision and robotics communities have produced multiple approaches to represent and generate 3D shapes, as well as a vast number of use cases. However, single-view reconstruction remains a challenging topic that can unlock various interesting use cases such as interactive design. In this work, we propose a novel framework that leverages the intermediate latent spaces…

    Submitted 5 December, 2022; originally announced December 2022.

    Comments: Accepted to NeurIPS - WiML workshop 2022

  11. arXiv:2203.08516  [pdf, other]

    cs.CV

    Fantastic Style Channels and Where to Find Them: A Submodular Framework for Discovering Diverse Directions in GANs

    Authors: Enis Simsar, Umut Kocasari, Ezgi Gülperi Er, Pinar Yanardag

    Abstract: The discovery of interpretable directions in the latent spaces of pre-trained GAN models has recently become a popular topic. In particular, StyleGAN2 has enabled various image generation and manipulation tasks due to its rich and disentangled latent spaces. The discovery of such directions is typically done either in a supervised manner, which requires annotated data for each desired manipulation…

    Submitted 31 March, 2022; v1 submitted 16 March, 2022; originally announced March 2022.

  12. arXiv:2202.11772  [pdf, other]

    cs.CV cs.LG

    Discovering Multiple and Diverse Directions for Cognitive Image Properties

    Authors: Umut Kocasari, Alperen Bag, Oguz Kaan Yuksel, Pinar Yanardag

    Abstract: Recent research has shown that it is possible to find interpretable directions in the latent spaces of pre-trained GANs. These directions enable controllable generation and support a variety of semantic editing operations. While previous work has focused on discovering a single direction that performs a desired editing operation such as zoom-in, limited work has been done on the discovery of multi…

    Submitted 23 February, 2022; originally announced February 2022.

  13. arXiv:2202.06240  [pdf, other]

    cs.CV cs.LG

    FairStyle: Debiasing StyleGAN2 with Style Channel Manipulations

    Authors: Cemre Karakas, Alara Dirik, Eylul Yalcinkaya, Pinar Yanardag

    Abstract: Recent advances in generative adversarial networks have shown that it is possible to generate high-resolution and hyperrealistic images. However, the images produced by GANs are only as fair and representative as the datasets on which they are trained. In this paper, we propose a method for directly modifying a pre-trained StyleGAN2 model that can be used to generate a balanced set of images with…

    Submitted 13 February, 2022; originally announced February 2022.

  14. arXiv:2202.06079  [pdf, other]

    cs.CV cs.GR cs.LG

    Text and Image Guided 3D Avatar Generation and Manipulation

    Authors: Zehranaz Canfes, M. Furkan Atasoy, Alara Dirik, Pinar Yanardag

    Abstract: The manipulation of latent space has recently become an interesting topic in the field of generative models. Recent research shows that latent directions can be used to manipulate images towards certain attributes. However, controlling the generation process of 3D generative models remains a challenge. In this work, we propose a novel 3D manipulation method that can manipulate both the shape and t…

    Submitted 12 February, 2022; originally announced February 2022.

  15. arXiv:2112.08493  [pdf, other]

    cs.CV cs.LG

    StyleMC: Multi-Channel Based Fast Text-Guided Image Generation and Manipulation

    Authors: Umut Kocasari, Alara Dirik, Mert Tiftikci, Pinar Yanardag

    Abstract: Discovering meaningful directions in the latent space of GANs to manipulate semantic attributes typically requires large amounts of labeled data. Recent work aims to overcome this limitation by leveraging the power of Contrastive Language-Image Pre-training (CLIP), a joint text-image model. While promising, these methods require several hours of preprocessing or training to achieve the desired man…

    Submitted 15 December, 2021; originally announced December 2021.

    Comments: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV 2022)

  16. arXiv:2112.06978  [pdf, other]

    cs.CV cs.LG

    Exploring Latent Dimensions of Crowd-sourced Creativity

    Authors: Umut Kocasari, Alperen Bag, Efehan Atici, Pinar Yanardag

    Abstract: Recently, the discovery of interpretable directions in the latent spaces of pre-trained GANs has become a popular topic. While existing works mostly consider directions for semantic image manipulations, we focus on an abstract property: creativity. Can we manipulate an image to be more or less creative? We build our work on the largest AI-based creativity platform, Artbreeder, where users can gene…

    Submitted 13 December, 2021; originally announced December 2021.

    Comments: 5th Workshop on Machine Learning for Creativity and Design (NeurIPS 2021), Sydney, Australia

  17. arXiv:2112.06953  [pdf, other]

    cs.CL cs.AI cs.LG

    Controlled Cue Generation for Play Scripts

    Authors: Alara Dirik, Hilal Donmez, Pinar Yanardag

    Abstract: In this paper, we use a large-scale play scripts dataset to propose the novel task of theatrical cue generation from dialogues. Using over one million lines of dialogue and cues, we approach the problem of cue generation as a controlled text generation task, and show how cues can be used to enhance the impact of dialogue using a language model conditioned on a dialogue/cue discriminator. In additi…

    Submitted 13 December, 2021; originally announced December 2021.

  18. arXiv:2108.09752  [pdf, other]

    cs.CV

    Graph2Pix: A Graph-Based Image to Image Translation Framework

    Authors: Dilara Gokay, Enis Simsar, Efehan Atici, Alper Ahmetoglu, Atif Emre Yuksel, Pinar Yanardag

    Abstract: In this paper, we propose a graph-based image-to-image translation framework for generating images. We use rich data collected from the popular creativity platform Artbreeder (http://artbreeder.com), where users interpolate multiple GAN-generated images to create artworks. This unique approach of creating new images leads to a tree-like structure where one can track historical data about the creat…

    Submitted 22 August, 2021; originally announced August 2021.

  19. arXiv:2104.00820  [pdf, other]

    cs.LG cs.CV

    LatentCLR: A Contrastive Learning Approach for Unsupervised Discovery of Interpretable Directions

    Authors: Oğuz Kaan Yüksel, Enis Simsar, Ezgi Gülperi Er, Pinar Yanardag

    Abstract: Recent research has shown that it is possible to find interpretable directions in the latent spaces of pre-trained Generative Adversarial Networks (GANs). These directions enable controllable image generation and support a wide range of semantic editing operations, such as zoom or rotation. The discovery of such directions is often done in a supervised or semi-supervised manner and requires manual…

    Submitted 6 October, 2021; v1 submitted 1 April, 2021; originally announced April 2021.

  20. arXiv:1506.02761  [pdf, other]

    cs.CL cs.LG stat.ML

    WordRank: Learning Word Embeddings via Robust Ranking

    Authors: Shihao Ji, Hyokun Yun, Pinar Yanardag, Shin Matsushima, S. V. N. Vishwanathan

    Abstract: Embedding words in a vector space has gained a lot of attention in recent years. While state-of-the-art methods provide efficient computation of word similarities via a low-dimensional matrix embedding, their motivation is often left unclear. In this paper, we argue that word embedding can be naturally viewed as a ranking problem due to the ranking nature of the evaluation metrics. Then, based on…

    Submitted 27 September, 2016; v1 submitted 8 June, 2015; originally announced June 2015.

    Comments: Conference on Empirical Methods in Natural Language Processing (EMNLP), November 1-5, 2016, Austin, Texas, USA

  21. arXiv:1403.0598  [pdf, other]

    cs.LG

    The Structurally Smoothed Graphlet Kernel

    Authors: Pinar Yanardag, S. V. N. Vishwanathan

    Abstract: A commonly used paradigm for representing graphs is to use a vector that contains normalized frequencies of occurrence of certain motifs or sub-graphs. This vector representation can be used in a variety of applications, such as, for computing similarity between graphs. The graphlet kernel of Shervashidze et al. [32] uses induced sub-graphs of k nodes (christened as graphlets by Przulj [28]) as mo…

    Submitted 3 March, 2014; originally announced March 2014.