Showing 1–50 of 54 results for author: Elgammal, A

  1. arXiv:2404.05674  [pdf, other]

    cs.CV

    MoMA: Multimodal LLM Adapter for Fast Personalized Image Generation

    Authors: Kunpeng Song, Yizhe Zhu, Bingchen Liu, Qing Yan, Ahmed Elgammal, Xiao Yang

    Abstract: In this paper, we present MoMA: an open-vocabulary, training-free personalized image model that boasts flexible zero-shot capabilities. As foundational text-to-image models rapidly evolve, the demand for robust image-to-image translation grows. Addressing this need, MoMA specializes in subject-driven personalized image generation. Utilizing an open-source, Multimodal Large Language Model (MLLM), w…

    Submitted 8 April, 2024; originally announced April 2024.

  2. arXiv:2402.02453  [pdf, other]

    cs.CV

    AI Art Neural Constellation: Revealing the Collective and Contrastive State of AI-Generated and Human Art

    Authors: Faizan Farooq Khan, Diana Kim, Divyansh Jha, Youssef Mohamed, Hanna H Chang, Ahmed Elgammal, Luba Elliott, Mohamed Elhoseiny

    Abstract: Discovering the creative potentials of a random signal to various artistic expressions in aesthetic and conceptual richness is a ground for the recent success of generative machine learning as a way of art creation. To understand the new artistic medium better, we conduct a comprehensive analysis to position AI-generated art within the context of human art heritage. Our comparative analysis is bas…

    Submitted 4 February, 2024; originally announced February 2024.

  3. arXiv:2212.04473  [pdf, other]

    cs.CV

    Diffusion Guided Domain Adaptation of Image Generators

    Authors: Kunpeng Song, Ligong Han, Bingchen Liu, Dimitris Metaxas, Ahmed Elgammal

    Abstract: Can a text-to-image diffusion model be used as a training objective for adapting a GAN generator to another domain? In this paper, we show that the classifier-free guidance can be leveraged as a critic and enable generators to distill knowledge from large-scale text-to-image diffusion models. Generators can be efficiently shifted into new domains indicated by text prompts without access to groundt…

    Submitted 9 December, 2022; v1 submitted 8 December, 2022; originally announced December 2022.

    Comments: Project website: https://styleganfusion.github.io/

  4. arXiv:2201.01819  [pdf, other]

    cs.LG cs.CL cs.CV

    Formal Analysis of Art: Proxy Learning of Visual Concepts from Style Through Language Models

    Authors: Diana Kim, Ahmed Elgammal, Marian Mazzone

    Abstract: We present a machine learning system that can quantify fine art paintings with a set of visual elements and principles of art. This formal analysis is fundamental for understanding art, but developing such a system is challenging. Paintings have high visual complexities, but it is also difficult to collect enough training data with direct labels. To resolve these practical limitations, we introduc…

    Submitted 5 January, 2022; originally announced January 2022.

    Comments: 23 pages. This paper is an extended version of a paper that will be published at the 36th AAAI Conference on Artificial Intelligence, to be held in Vancouver, BC, Canada, February 22 - March 1, 2022

    ACM Class: I.2.6; I.2.7; I.2.10; J.5

  5. arXiv:2101.04775  [pdf, other]

    cs.CV cs.AI

    Towards Faster and Stabilized GAN Training for High-fidelity Few-shot Image Synthesis

    Authors: Bingchen Liu, Yizhe Zhu, Kunpeng Song, Ahmed Elgammal

    Abstract: Training Generative Adversarial Networks (GAN) on high-fidelity images usually requires large-scale GPU-clusters and a vast number of training images. In this paper, we study the few-shot image synthesis task for GAN with minimum computing cost. We propose a light-weight GAN structure that gains superior quality on 1024*1024 resolution. Notably, the model converges from scratch with just a few hou…

    Submitted 12 January, 2021; originally announced January 2021.

    Comments: ICLR-2021

  6. arXiv:2012.09290  [pdf, other]

    cs.CV cs.GR cs.MM

    Self-Supervised Sketch-to-Image Synthesis

    Authors: Bingchen Liu, Yizhe Zhu, Kunpeng Song, Ahmed Elgammal

    Abstract: Imagining a colored realistic image from an arbitrarily drawn sketch is one of the human capabilities that we eager machines to mimic. Unlike previous methods that either requires the sketch-image pairs or utilize low-quantity detected edges as sketches, we study the exemplar-based sketch-to-image (s2i) synthesis task in a self-supervised learning manner, eliminating the necessity of the paired sk…

    Submitted 22 December, 2020; v1 submitted 16 December, 2020; originally announced December 2020.

    Comments: AAAI-2021

  7. arXiv:2010.01473  [pdf, other]

    cs.LG eess.IV stat.ML

    Spatial Frequency Bias in Convolutional Generative Adversarial Networks

    Authors: Mahyar Khayatkhoei, Ahmed Elgammal

    Abstract: As the success of Generative Adversarial Networks (GANs) on natural images quickly propels them into various real-life applications across different domains, it becomes more and more important to clearly understand their limitations. Specifically, understanding GANs' capability across the full spectrum of spatial frequencies, i.e. beyond the low-frequency dominant spectrum of natural images, is cr…

    Submitted 18 December, 2020; v1 submitted 3 October, 2020; originally announced October 2020.

  8. arXiv:2005.13192  [pdf, other]

    cs.CV

    TIME: Text and Image Mutual-Translation Adversarial Networks

    Authors: Bingchen Liu, Kunpeng Song, Yizhe Zhu, Gerard de Melo, Ahmed Elgammal

    Abstract: Focusing on text-to-image (T2I) generation, we propose Text and Image Mutual-Translation Adversarial Networks (TIME), a lightweight but effective model that jointly learns a T2I generator G and an image captioning discriminator D under the Generative Adversarial Network framework. While previous methods tackle the T2I problem as a uni-directional task and use pre-trained language models to enforce…

    Submitted 22 December, 2020; v1 submitted 27 May, 2020; originally announced May 2020.

    Comments: AAAI-2021

  9. arXiv:2002.12888  [pdf, other]

    cs.CV eess.IV

    Sketch-to-Art: Synthesizing Stylized Art Images From Sketches

    Authors: Bingchen Liu, Kunpeng Song, Ahmed Elgammal

    Abstract: We propose a new approach for synthesizing fully detailed art-stylized images from sketches. Given a sketch, with no semantic tagging, and a reference image of a specific style, the model can synthesize meaningful details with colors and textures. The model consists of three modules designed explicitly for better artistic style capturing and generation. Based on a GAN framework, a dual-masked mech…

    Submitted 2 October, 2020; v1 submitted 26 February, 2020; originally announced February 2020.

    Comments: 24 pages

    Journal ref: ACCV 2020

  10. arXiv:1907.06794  [pdf, other]

    cs.CV

    2nd Place Solution to the GQA Challenge 2019

    Authors: Shijie Geng, Ji Zhang, Hang Zhang, Ahmed Elgammal, Dimitris N. Metaxas

    Abstract: We present a simple method that achieves unexpectedly superior performance for Complex Reasoning involved Visual Question Answering. Our solution collects statistical features from high-frequency words of all the questions asked about an image and use them as accurate knowledge for answering further questions of the same image. We are fully aware that this setting is not ubiquitously applicable, a…

    Submitted 16 August, 2019; v1 submitted 15 July, 2019; originally announced July 2019.

  11. arXiv:1905.10836  [pdf, other]

    cs.CV

    OOGAN: Disentangling GAN with One-Hot Sampling and Orthogonal Regularization

    Authors: Bingchen Liu, Yizhe Zhu, Zuohui Fu, Gerard de Melo, Ahmed Elgammal

    Abstract: Exploring the potential of GANs for unsupervised disentanglement learning, this paper proposes a novel GAN-based disentanglement framework with One-Hot Sampling and Orthogonal Regularization (OOGAN). While previous works mostly attempt to tackle disentanglement learning through VAE and seek to implicitly minimize the Total Correlation (TC) objective with various sorts of approximation methods, we…

    Submitted 10 March, 2020; v1 submitted 26 May, 2019; originally announced May 2019.

    Comments: AAAI 2020

  12. arXiv:1904.10056  [pdf, other]

    cs.CV

    Learning Feature-to-Feature Translator by Alternating Back-Propagation for Generative Zero-Shot Learning

    Authors: Yizhe Zhu, Jianwen Xie, Bingchen Liu, Ahmed Elgammal

    Abstract: We investigate learning feature-to-feature translator networks by alternating back-propagation as a general-purpose solution to zero-shot learning (ZSL) problems. It is a generative model-based ZSL framework. In contrast to models based on generative adversarial networks (GAN) or variational autoencoders (VAE) that require auxiliary networks to assist the training, our model consists of a single c…

    Submitted 10 November, 2019; v1 submitted 22 April, 2019; originally announced April 2019.

    Comments: accepted to ICCV'19

  13. arXiv:1903.02728  [pdf, other]

    cs.CV

    Graphical Contrastive Losses for Scene Graph Parsing

    Authors: Ji Zhang, Kevin J. Shih, Ahmed Elgammal, Andrew Tao, Bryan Catanzaro

    Abstract: Most scene graph parsers use a two-stage pipeline to detect visual relationships: the first stage detects entities, and the second predicts the predicate for each entity pair using a softmax distribution. We find that such pipelines, trained with only a cross entropy loss over predicate classes, suffer from two common errors. The first, Entity Instance Confusion, occurs when the model confuses mul…

    Submitted 16 August, 2019; v1 submitted 7 March, 2019; originally announced March 2019.

  14. arXiv:1903.00502  [pdf, other]

    cs.CV

    Semantic-Guided Multi-Attention Localization for Zero-Shot Learning

    Authors: Yizhe Zhu, Jianwen Xie, Zhiqiang Tang, Xi Peng, Ahmed Elgammal

    Abstract: Zero-shot learning extends the conventional object classification to the unseen class recognition by introducing semantic representations of classes. Existing approaches predominantly focus on learning the proper mapping function for visual-semantic embedding, while neglecting the effect of learning discriminative visual features. In this paper, we study the significance of the discriminative regi…

    Submitted 1 December, 2019; v1 submitted 1 March, 2019; originally announced March 2019.

    Comments: accepted to NeurIPS'19

  15. arXiv:1811.09543  [pdf, other]

    cs.CV

    An Interpretable Model for Scene Graph Generation

    Authors: Ji Zhang, Kevin Shih, Andrew Tao, Bryan Catanzaro, Ahmed Elgammal

    Abstract: We propose an efficient and interpretable scene graph generator. We consider three types of features: visual, spatial and semantic, and we use a late fusion strategy such that each feature's contribution can be explicitly investigated. We study the key factors about these features that have the most impact on the performance, and also visualize the learned visual features for relationships and inv…

    Submitted 21 November, 2018; originally announced November 2018.

    Comments: arXiv admin note: substantial text overlap with arXiv:1811.00662

  16. arXiv:1811.00662  [pdf, other]

    cs.CV

    Introduction to the 1st Place Winning Model of OpenImages Relationship Detection Challenge

    Authors: Ji Zhang, Kevin Shih, Andrew Tao, Bryan Catanzaro, Ahmed Elgammal

    Abstract: This article describes the model we built that achieved 1st place in the OpenImage Visual Relationship Detection Challenge on Kaggle. Three key factors contribute the most to our success: 1) language bias is a powerful baseline for this task. We build the empirical distribution $P(predicate|subject,object)$ in the training set and directly use that in testing. This baseline achieved the 2nd place…

    Submitted 7 November, 2018; v1 submitted 1 November, 2018; originally announced November 2018.

  17. arXiv:1806.00880  [pdf, other]

    cs.LG cs.CV stat.ML

    Disconnected Manifold Learning for Generative Adversarial Networks

    Authors: Mahyar Khayatkhoei, Ahmed Elgammal, Maneesh Singh

    Abstract: Natural images may lie on a union of disjoint manifolds rather than one globally connected manifold, and this can cause several difficulties for the training of common Generative Adversarial Networks (GANs). In this work, we first show that single generator GANs are unable to correctly model a distribution supported on a disconnected manifold, and investigate how sample quality, mode dropping and…

    Submitted 10 January, 2019; v1 submitted 3 June, 2018; originally announced June 2018.

    Comments: NeurIPS 2018

  18. arXiv:1804.10660  [pdf, other]

    cs.CV

    Large-Scale Visual Relationship Understanding

    Authors: Ji Zhang, Yannis Kalantidis, Marcus Rohrbach, Manohar Paluri, Ahmed Elgammal, Mohamed Elhoseiny

    Abstract: Large scale visual understanding is challenging, as it requires a model to handle the widely-spread and imbalanced distribution of <subject, relation, object> triples. In real-world scenarios with large numbers of objects and relations, some are seen very commonly while others are barely seen. We develop a new relationship detection model that embeds objects and relations into two vector spaces wh…

    Submitted 16 August, 2019; v1 submitted 27 April, 2018; originally announced April 2018.

  19. arXiv:1801.07729  [pdf, other]

    cs.AI cs.CV

    The Shape of Art History in the Eyes of the Machine

    Authors: Ahmed Elgammal, Marian Mazzone, Bingchen Liu, Diana Kim, Mohamed Elhoseiny

    Abstract: How does the machine classify styles in art? And how does it relate to art historians' methods for analyzing style? Several studies have shown the ability of the machine to learn and predict style categories, such as Renaissance, Baroque, Impressionism, etc., from images of paintings. This implies that the machine can learn an internal representation encoding discriminative features through its vi…

    Submitted 12 February, 2018; v1 submitted 23 January, 2018; originally announced January 2018.

  20. arXiv:1712.01381  [pdf, other]

    cs.CV

    A Generative Adversarial Approach for Zero-Shot Learning from Noisy Texts

    Authors: Yizhe Zhu, Mohamed Elhoseiny, Bingchen Liu, Xi Peng, Ahmed Elgammal

    Abstract: Most existing zero-shot learning methods consider the problem as a visual semantic embedding one. Given the demonstrated capability of Generative Adversarial Networks(GANs) to generate images, we instead leverage GANs to imagine unseen categories from text descriptions and hence recognize novel classes with no examples being seen. Specifically, we propose a simple yet effective generative model th…

    Submitted 18 May, 2018; v1 submitted 4 December, 2017; originally announced December 2017.

    Comments: To appear in CVPR18

  21. arXiv:1711.03536  [pdf, other]

    eess.IV cs.AI cs.CV

    Picasso, Matisse, or a Fake? Automated Analysis of Drawings at the Stroke Level for Attribution and Authentication

    Authors: Ahmed Elgammal, Yan Kang, Milko Den Leeuw

    Abstract: This paper proposes a computational approach for analysis of strokes in line drawings by artists. We aim at developing an AI methodology that facilitates attribution of drawings of unknown authors in a way that is not easy to be deceived by forged art. The methodology used is based on quantifying the characteristics of individual strokes in drawings. We propose a novel algorithm for segmenting ind…

    Submitted 8 November, 2017; originally announced November 2017.

  22. arXiv:1709.01148  [pdf, other]

    cs.CV

    Link the head to the "beak": Zero Shot Learning from Noisy Text Description at Part Precision

    Authors: Mohamed Elhoseiny, Yizhe Zhu, Han Zhang, Ahmed Elgammal

    Abstract: In this paper, we study learning visual classifiers from unstructured text descriptions at part precision with no training images. We propose a learning framework that is able to connect text terms to its relevant parts and suppress connections to non-visual text terms without any part-text annotations. For instance, this learning process enables terms like "beak" to be sparsely linked to the visu…

    Submitted 4 September, 2017; originally announced September 2017.

    Comments: Accepted by CVPR'17

  23. arXiv:1709.01140  [pdf, other]

    cs.CV

    A Multilayer-Based Framework for Online Background Subtraction with Freely Moving Cameras

    Authors: Yizhe Zhu, Ahmed Elgammal

    Abstract: The exponentially increasing use of moving platforms for video capture introduces the urgent need to develop the general background subtraction algorithms with the capability to deal with the moving background. In this paper, we propose a multilayer-based framework for online background subtraction for videos captured by moving cameras. Unlike the previous treatments of the problem, the proposed m…

    Submitted 4 September, 2017; originally announced September 2017.

    Comments: Accepted by ICCV'17

  24. arXiv:1706.07068  [pdf, other]

    cs.AI

    CAN: Creative Adversarial Networks, Generating "Art" by Learning About Styles and Deviating from Style Norms

    Authors: Ahmed Elgammal, Bingchen Liu, Mohamed Elhoseiny, Marian Mazzone

    Abstract: We propose a new system for generating art. The system generates art by looking at art and learning about style; and becomes creative by increasing the arousal potential of the generated art by deviating from the learned styles. We build over Generative Adversarial Networks (GAN), which have shown the ability to learn to generate novel images simulating a given distribution. We argue that such net…

    Submitted 21 June, 2017; originally announced June 2017.

    Comments: This paper is an extended version of a paper published on the eighth International Conference on Computational Creativity (ICCC), held in Atlanta, GA, June 20th-June 22nd, 2017

  25. arXiv:1701.01218  [pdf, other]

    cs.LG cs.CV

    Overlapping Cover Local Regression Machines

    Authors: Mohamed Elhoseiny, Ahmed Elgammal

    Abstract: We present the Overlapping Domain Cover (ODC) notion for kernel machines, as a set of overlapping subsets of the data that covers the entire training set and optimized to be spatially cohesive as possible. We show how this notion benefit the speed of local kernel machines for regression in terms of both speed while achieving while minimizing the prediction error. We propose an efficient ODC framew…

    Submitted 5 January, 2017; originally announced January 2017.

    Comments: Long Article with more experiments and analysis of conference paper "Overlapping Domain Cover for Scalable and Accurate Regression Kernel Machines", presented orally 2015 at the British Machine Vision Conference 2015 (BMVC)

  26. arXiv:1609.09240  [pdf, other]

    cs.CV

    Modelling depth for nonparametric foreground segmentation using RGBD devices

    Authors: Gabriel Moyà-Alcover, Ahmed Elgammal, Antoni Jaume-i-Capó, Javier Varona

    Abstract: The problem of detecting changes in a scene and segmenting the foreground from background is still challenging, despite previous work. Moreover, new RGBD capturing devices include depth cues, which could be incorporated to improve foreground segmentation. In this work, we present a new nonparametric approach where a unified model mixes the device multiple information cues. In order to unify all th…

    Submitted 29 September, 2016; originally announced September 2016.

    Comments: Accepted in Pattern Recognition Letters. Will update the info

  27. arXiv:1604.00466  [pdf, other]

    cs.CL cs.CV

    Automatic Annotation of Structured Facts in Images

    Authors: Mohamed Elhoseiny, Scott Cohen, Walter Chang, Brian Price, Ahmed Elgammal

    Abstract: Motivated by the application of fact-level image understanding, we present an automatic method for data collection of structured visual facts from images with captions. Example structured facts include attributed objects (e.g., <flower, red>), actions (e.g., <baby, smile>), interactions (e.g., <man, walking, dog>), and positional information (e.g., <vase, on, table>). The collected annotations are…

    Submitted 7 April, 2016; v1 submitted 2 April, 2016; originally announced April 2016.

  28. arXiv:1602.02865  [pdf, other]

    cs.CV cs.LG cs.NE

    The Role of Typicality in Object Classification: Improving The Generalization Capacity of Convolutional Neural Networks

    Authors: Babak Saleh, Ahmed Elgammal, Jacob Feldman

    Abstract: Deep artificial neural networks have made remarkable progress in different tasks in the field of computer vision. However, the empirical analysis of these models and investigation of their failure cases has received attention recently. In this work, we show that deep learning models cannot generalize to atypical images that are substantially different from training images. This is in contrast to t…

    Submitted 9 February, 2016; originally announced February 2016.

    Comments: In Submission

  29. arXiv:1601.05861  [pdf, other]

    cs.CV

    Manifold-Kernels Comparison in MKPLS for Visual Speech Recognition

    Authors: Amr Bakry, Ahmed Elgammal

    Abstract: Speech recognition is a challenging problem. Due to the acoustic limitations, using visual information is essential for improving the recognition accuracy in real-life unconstraint situations. One common approach is to model the visual recognition as nonlinear optimization problem. Measuring the distances between visual units is essential for solving this problem. Embedding the visual units on a m…

    Submitted 21 January, 2016; originally announced January 2016.

  30. arXiv:1601.01411  [pdf, other]

    cs.LG stat.ML

    Learning Kernels for Structured Prediction using Polynomial Kernel Transformations

    Authors: Chetan Tonde, Ahmed Elgammal

    Abstract: Learning the kernel functions used in kernel methods has been a vastly explored area in machine learning. It is now widely accepted that to obtain 'good' performance, learning a kernel function is the key challenge. In this work we focus on learning kernel representations for structured regression. We propose use of polynomials expansion of kernels, referred to as Schoenberg transforms and Gegenba…

    Submitted 7 January, 2016; originally announced January 2016.

    Report number: 21 pages, 10 figures

  31. arXiv:1601.00236  [pdf, other]

    cs.LG stat.ML

    Supervised Dimensionality Reduction via Distance Correlation Maximization

    Authors: Praneeth Vepakomma, Chetan Tonde, Ahmed Elgammal

    Abstract: In our work, we propose a novel formulation for supervised dimensionality reduction based on a nonlinear dependency criterion called Statistical Distance Correlation, Szekely et. al. (2007). We propose an objective which is free of distributional assumptions on regression variables and regression model assumptions. Our proposed formulation is based on learning a low-dimensional feature representat…

    Submitted 2 January, 2016; originally announced January 2016.

    Comments: 23 pages, 6 figures

  32. arXiv:1601.00025  [pdf, other]

    cs.CV cs.CL cs.LG

    Write a Classifier: Predicting Visual Classifiers from Unstructured Text

    Authors: Mohamed Elhoseiny, Ahmed Elgammal, Babak Saleh

    Abstract: People typically learn through exposure to visual concepts associated with linguistic descriptions. For instance, teaching visual object categories to children is often accompanied by descriptions in text or speech. In a machine learning context, these observations motivates us to ask whether this learning process could be computationally modeled to learn visual classifiers. More specifically, the…

    Submitted 27 December, 2016; v1 submitted 31 December, 2015; originally announced January 2016.

    Comments: (TPAMI) Transactions on Pattern Analysis and Machine Intelligence 2017

  33. arXiv:1512.01325  [pdf, other]

    cs.CV cs.AI cs.HC cs.IT cs.LG

    Toward a Taxonomy and Computational Models of Abnormalities in Images

    Authors: Babak Saleh, Ahmed Elgammal, Jacob Feldman, Ali Farhadi

    Abstract: The human visual system can spot an abnormal image, and reason about what makes it strange. This task has not received enough attention in computer vision. In this paper we study various types of atypicalities in images in a more comprehensive way than has been done before. We propose a new dataset of abnormal images showing a wide range of atypicalities. We design human subject experiments to dis…

    Submitted 4 December, 2015; originally announced December 2015.

    Comments: To appear in the Thirtieth AAAI Conference on Artificial Intelligence (AAAI 2016)

  34. arXiv:1512.00818  [pdf, other]

    cs.CV cs.CL cs.LG

    Zero-Shot Event Detection by Multimodal Distributional Semantic Embedding of Videos

    Authors: Mohamed Elhoseiny, Jingen Liu, Hui Cheng, Harpreet Sawhney, Ahmed Elgammal

    Abstract: We propose a new zero-shot Event Detection method by Multi-modal Distributional Semantic embedding of videos. Our model embeds object and action concepts as well as other available modalities from videos into a distributional semantic space. To our knowledge, this is the first Zero-Shot event detection model that is built on top of distributional semantics and extends it in the following direction…

    Submitted 15 December, 2015; v1 submitted 2 December, 2015; originally announced December 2015.

    Comments: To appear in AAAI 2016

  35. arXiv:1511.05175  [pdf, other]

    cs.CV cs.AI cs.LG

    Convolutional Models for Joint Object Categorization and Pose Estimation

    Authors: Mohamed Elhoseiny, Tarek El-Gaaly, Amr Bakry, Ahmed Elgammal

    Abstract: In the task of Object Recognition, there exists a dichotomy between the categorization of objects and estimating object pose, where the former necessitates a view-invariant representation, while the latter requires a representation capable of capturing pose information over different categories of objects. With the rise of deep architectures, the prime focus has been on object category recognition…

    Submitted 19 April, 2016; v1 submitted 16 November, 2015; originally announced November 2015.

    Comments: only for workshop presentation at ICLR

  36. arXiv:1511.04891  [pdf, other]

    cs.CV cs.CL cs.LG

    Sherlock: Scalable Fact Learning in Images

    Authors: Mohamed Elhoseiny, Scott Cohen, Walter Chang, Brian Price, Ahmed Elgammal

    Abstract: We study scalable and uniform understanding of facts in images. Existing visual recognition systems are typically modeled differently for each fact type such as objects, actions, and interactions. We propose a setting where all these facts can be modeled simultaneously with a capacity to understand unbounded number of facts in a structured way. The training data comes as structured facts in images…

    Submitted 2 April, 2016; v1 submitted 16 November, 2015; originally announced November 2015.

    Comments: Jan 7 Update

  37. arXiv:1508.01983  [pdf, other]

    cs.CV

    Digging Deep into the layers of CNNs: In Search of How CNNs Achieve View Invariance

    Authors: Amr Bakry, Mohamed Elhoseiny, Tarek El-Gaaly, Ahmed Elgammal

    Abstract: This paper is focused on studying the view-manifold structure in the feature spaces implied by the different layers of Convolutional Neural Networks (CNN). There are several questions that this paper aims to answer: Does the learned CNN representation achieve viewpoint invariance? How does it achieve viewpoint invariance? Is it achieved by collapsing the view manifolds, or separating them while pr…

    Submitted 20 June, 2016; v1 submitted 9 August, 2015; originally announced August 2015.

    Comments: This paper accepted in ICLR 2016 main conference

  38. arXiv:1506.08529  [pdf, other]

    cs.CV

    Tell and Predict: Kernel Classifier Prediction for Unseen Visual Classes from Unstructured Text Descriptions

    Authors: Mohamed Elhoseiny, Ahmed Elgammal, Babak Saleh

    Abstract: In this paper we propose a framework for predicting kernelized classifiers in the visual domain for categories with no training images where the knowledge comes from textual description about these categories. Through our optimization framework, the proposed approach is capable of embedding the class-level knowledge from the text domain as kernel classifiers in the visual domain. We also proposed…

    Submitted 29 June, 2015; originally announced June 2015.

  39. arXiv:1506.00711  [pdf, other]

    cs.AI cs.CV cs.CY cs.MM cs.SI

    Quantifying Creativity in Art Networks

    Authors: Ahmed Elgammal, Babak Saleh

    Abstract: Can we develop a computer algorithm that assesses the creativity of a painting given its context within art history? This paper proposes a novel computational framework for assessing the creativity of creative products, such as paintings, sculptures, poetry, etc. We use the most common definition of creativity, which emphasizes the originality of the product and its influential value. The proposed…

    Submitted 1 June, 2015; originally announced June 2015.

    Comments: This paper will be published in the sixth International Conference on Computational Creativity (ICCC) June 29-July 2nd 2015, Park City, Utah, USA. This arXiv version is an extended version of the conference paper

  40. arXiv:1505.00855  [pdf, other]

    cs.CV cs.IR cs.LG cs.MM

    Large-scale Classification of Fine-Art Paintings: Learning The Right Metric on The Right Feature

    Authors: Babak Saleh, Ahmed Elgammal

    Abstract: In the past few years, the number of fine-art collections that are digitized and publicly available has been growing rapidly. With the availability of such large collections of digitized artworks comes the need to develop multimedia systems to archive and retrieve this pool of data. Measuring the visual similarity between artistic items is an essential step for such multimedia systems, which can b…

    Submitted 4 May, 2015; originally announced May 2015.

    Comments: 21 pages

  41. arXiv:1503.06813  [pdf, other]

    cs.CV

    Factorization of View-Object Manifolds for Joint Object Recognition and Pose Estimation

    Authors: Haopeng Zhang, Tarek El-Gaaly, Ahmed Elgammal, Zhiguo Jiang

    Abstract: Due to large variations in shape, appearance, and viewing conditions, object recognition is a key precursory challenge in the fields of object manipulation and robotic/AI visual reasoning in general. Recognizing object categories, particular instances of objects and viewpoints/poses of objects are three critical subproblems robots must solve in order to accurately grasp/manipulate objects and reas…

    Submitted 12 April, 2015; v1 submitted 23 March, 2015; originally announced March 2015.

  42. arXiv:1503.05782  [pdf, other]

    cs.CV cs.LG

    Learning Hypergraph-regularized Attribute Predictors

    Authors: Sheng Huang, Mohamed Elhoseiny, Ahmed Elgammal, Dan Yang

    Abstract: We present a novel attribute learning framework named Hypergraph-based Attribute Predictor (HAP). In HAP, a hypergraph is leveraged to depict the attribute relations in the data. Then the attribute prediction problem is casted as a regularized hypergraph cut problem in which HAP jointly learns a collection of attribute projections from the feature space to a hypergraph embedding space aligned with…

    Submitted 19 March, 2015; originally announced March 2015.

    Comments: This is an attribute learning paper accepted by CVPR 2015

  43. arXiv:1411.6714  [pdf]

    cs.CY

    The Digital Humanities Unveiled: Perceptions Held by Art Historians and Computer Scientists about Computer Vision Technology

    Authors: Emily L. Spratt, Ahmed Elgammal

    Abstract: Although computer scientists are generally familiar with the achievements of computer vision technology in art history, these accomplishments are little known and often misunderstood by scholars in the humanities. To clarify the parameters of this seeming disjuncture, we have addressed the concerns that one example of the digitization of the humanities poses on social, philosophical, and practical…

    Submitted 24 November, 2014; originally announced November 2014.

    Comments: arXiv admin note: substantial text overlap with arXiv:1410.2488

  44. arXiv:1411.2214  [pdf, other]

    cs.CV

    Abnormal Object Recognition: A Comprehensive Study

    Authors: Babak Saleh, Ali Farhadi, Ahmed Elgammal

    Abstract: When describing images, humans tend not to talk about the obvious, but rather mention what they find interesting. We argue that abnormalities and deviations from typicalities are among the most important components that form what is worth mentioning. In this paper we introduce abnormality detection as a recognition problem and show how to model typicalities and, consequently, meaningful deviat…

    Submitted 9 November, 2014; originally announced November 2014.

  45. arXiv:1410.6736  [pdf, other]

    cs.CV

    On The Effect of Hyperedge Weights On Hypergraph Learning

    Authors: Sheng Huang, Ahmed Elgammal, Dan Yang

    Abstract: The hypergraph is a powerful representation in several computer vision, machine learning and pattern recognition problems. In the last decade, many researchers have been keen to develop different hypergraph models. In contrast, not much attention has been paid to the design of hyperedge weights. However, many studies on pairwise graphs show that the choice of edge weight can significantly influence the…

    Submitted 24 October, 2014; originally announced October 2014.

  46. arXiv:1410.2488  [pdf, other]

    cs.CV physics.hist-ph

    Computational Beauty: Aesthetic Judgment at the Intersection of Art and Science

    Authors: Emily L. Spratt, Ahmed Elgammal

    Abstract: In part one of the Critique of Judgment, Immanuel Kant wrote that "the judgment of taste...is not a cognitive judgment, and so not logical, but is aesthetic." While the condition of aesthetic discernment has long been the subject of philosophical discourse, the role of the arbiters of that judgment has more often been assumed than questioned. The art historian, critic, connoisseur, and…

    Submitted 29 September, 2014; originally announced October 2014.

  47. arXiv:1409.7480  [pdf, ps, other]

    cs.LG cs.CV stat.ML

    Generalized Twin Gaussian Processes using Sharma-Mittal Divergence

    Authors: Mohamed Elhoseiny, Ahmed Elgammal

    Abstract: There has been a growing interest in mutual information measures due to their wide range of applications in Machine Learning and Computer Vision. In this paper, we present a generalized structured regression framework based on Sharma-Mittal divergence, a relative entropy measure, which is introduced to the Machine Learning community in this work. Sharma-Mittal (SM) divergence is a generalized mutua…

    Submitted 1 June, 2015; v1 submitted 26 September, 2014; originally announced September 2014.

    Comments: This work got accepted for Publication in the Machine Learning Journal 2015. The work is scheduled for presentation at ECML-PKDD 2015 journal track papers

  48. arXiv:1408.3218  [pdf, other]

    cs.CV cs.LG

    Toward Automated Discovery of Artistic Influence

    Authors: Babak Saleh, Kanako Abe, Ravneet Singh Arora, Ahmed Elgammal

    Abstract: Considering the huge number of art pieces that exist, there is valuable information to be discovered. Examining a painting, an expert can determine its style, genre, and the time period to which the painting belongs. One important task for art historians is to find influences and connections between artists. Is influence a task that a computer can measure? The contribution of this paper is in explorin…

    Submitted 14 August, 2014; originally announced August 2014.

    Comments: 29 pages, 14 figures and 12 tables

  49. arXiv:1408.1031  [pdf, other]

    cs.CL cs.HC

    Text to Multi-level MindMaps: A Novel Method for Hierarchical Visual Abstraction of Natural Language Text

    Authors: Mohamed Elhoseiny, Ahmed Elgammal

    Abstract: MindMapping is a well-known technique used in note taking, which encourages learning and studying. MindMapping has been manually adopted to help present knowledge and concepts in a visual form. Unfortunately, there is no reliable automated approach to generate MindMaps from Natural Language text. This work first introduces the MindMap Multilevel Visualization concept, which is to jointly visualize an…

    Submitted 23 December, 2014; v1 submitted 31 July, 2014; originally announced August 2014.

    Comments: 31 pages

  50. arXiv:1312.7469  [pdf, other]

    cs.CV

    Collaborative Discriminant Locality Preserving Projections With its Application to Face Recognition

    Authors: Sheng Huang, Dan Yang, Dong Yang, Ahmed Elgammal

    Abstract: We present a novel Discriminant Locality Preserving Projections (DLPP) algorithm named Collaborative Discriminant Locality Preserving Projection (CDLPP). In our algorithm, the discriminating power of DLPP is further exploited from two aspects. On the one hand, the global optimum of class scattering is guaranteed by using the between-class scatter matrix to replace the original denominator of DLP…

    Submitted 8 February, 2014; v1 submitted 28 December, 2013; originally announced December 2013.

    Comments: second version