-
Gesture-to-Gesture Translation in the Wild via Category-Independent Conditional Maps
Authors:
Yahui Liu,
Marco De Nadai,
Gloria Zen,
Nicu Sebe,
Bruno Lepri
Abstract:
Recent works have shown Generative Adversarial Networks (GANs) to be particularly effective in image-to-image translations. However, in tasks such as body pose and hand gesture translation, existing methods usually require precise annotations, e.g. key-points or skeletons, which are time-consuming to draw. In this work, we propose a novel GAN architecture that decouples the required annotations into a category label, which specifies the gesture type, and a simple-to-draw category-independent conditional map, which expresses the location, rotation and size of the hand gesture. Our architecture synthesizes the target gesture while preserving the background context, thus effectively dealing with gesture translation in the wild. To this end, we use an attention module and a rolling guidance approach, which loops the generated images back into the network and produces higher quality images compared to competing works. Thus, our GAN learns to generate new images from simple annotations without requiring key-points or skeleton labels. Results on two public datasets show that our method outperforms state-of-the-art approaches both quantitatively and qualitatively. To the best of our knowledge, no work so far has addressed gesture-to-gesture translation in the wild using only user-friendly annotations.
Submitted 31 July, 2019; v1 submitted 12 July, 2019;
originally announced July 2019.
-
Enhancing Perceptual Attributes with Bayesian Style Generation
Authors:
Aliaksandr Siarohin,
Gloria Zen,
Nicu Sebe,
Elisa Ricci
Abstract:
Deep learning has brought unprecedented progress in computer vision, and significant advances have been made in predicting subjective properties inherent to visual data (e.g., memorability, aesthetic quality, evoked emotions). Recently, some research works have even proposed deep learning approaches to modify images so as to appropriately alter these properties. Following this line of research, this paper introduces a novel deep learning framework for synthesizing images in order to enhance a predefined perceptual attribute. Our approach takes as input a natural image and exploits recent models for deep style transfer and generative adversarial networks to change its style in order to modify a specific high-level attribute. Unlike previous works that focus on enhancing a specific property of visual content, we propose a general framework and demonstrate its effectiveness in two use cases, i.e. increasing image memorability and generating scary pictures. We evaluate the proposed approach on publicly available benchmarks, demonstrating its advantages over state-of-the-art methods.
Submitted 3 December, 2018;
originally announced December 2018.
-
How to Make an Image More Memorable? A Deep Style Transfer Approach
Authors:
Aliaksandr Siarohin,
Gloria Zen,
Cveta Majtanovic,
Xavier Alameda-Pineda,
Elisa Ricci,
Nicu Sebe
Abstract:
Recent works have shown that it is possible to automatically predict intrinsic image properties like memorability. In this paper, we take a step forward by addressing the question: "Can we make an image more memorable?". Methods for automatically increasing image memorability would have an impact in many application fields like education, gaming or advertising. Our work is inspired by the popular editing-by-applying-filters paradigm adopted in photo editing applications, like Instagram and Prisma. In this context, the problem of increasing image memorability maps to that of retrieving "memorabilizing" filters or style "seeds". Still, users generally have to go through most of the available filters before finding the desired solution, thus turning the editing process into a resource- and time-consuming task. In this work, we show that it is possible to automatically retrieve the best style seeds for a given image, thus remarkably reducing the number of human attempts needed to find a good match. Our approach leverages recent advances in the field of image synthesis and adopts a deep architecture for generating a memorable picture from a given input image and a style seed. Importantly, to automatically select the best style, a novel learning-based solution, also relying on deep models, is proposed. Our experimental evaluation, conducted on publicly available benchmarks, demonstrates the effectiveness of the proposed approach for generating memorable images through automatic style seed selection.
Submitted 6 April, 2017;
originally announced April 2017.
-
Are Safer Looking Neighborhoods More Lively? A Multimodal Investigation into Urban Life
Authors:
Marco De Nadai,
Radu L. Vieriu,
Gloria Zen,
Stefan Dragicevic,
Nikhil Naik,
Michele Caraviello,
Cesar A. Hidalgo,
Nicu Sebe,
Bruno Lepri
Abstract:
Policy makers, urban planners, architects, sociologists, and economists are interested in creating urban areas that are both lively and safe. But are the safety and liveliness of neighborhoods independent characteristics? Or are they just two sides of the same coin? In a world where people avoid unsafe-looking places, neighborhoods that look unsafe will be less lively, and will fail to harness the natural surveillance of human activity. But in a world where the preference for safe-looking neighborhoods is small, the connection between the perception of safety and liveliness will be either weak or nonexistent. In this paper we explore the connection between the levels of activity and the perception of safety of neighborhoods in two major Italian cities by combining mobile phone data (as a proxy for activity or liveliness) with scores of perceived safety estimated using a Convolutional Neural Network trained on a dataset of Google Street View images scored using a crowdsourced visual perception survey. We find that: (i) safer-looking neighborhoods are more active than expected from their population density, employee density, and distance to the city centre; and (ii) the correlation between appearance of safety and activity is positive, strong, and significant for females and people over 50, but negative for people under 30, suggesting that the behavioral impact of perception depends on the demographic of the population. Finally, we use occlusion techniques to identify the urban features that contribute to the appearance of safety, finding that greenery and street-facing windows contribute to a positive appearance of safety (in agreement with Oscar Newman's defensible space theory). These results suggest that urban appearance modulates levels of human activity and, consequently, a neighborhood's rate of natural surveillance.
Submitted 1 August, 2016;
originally announced August 2016.
-
Self-Learning Camera: Autonomous Adaptation of Object Detectors to Unlabeled Video Streams
Authors:
Adrien Gaidon,
Gloria Zen,
Jose A. Rodriguez-Serrano
Abstract:
Learning object detectors requires massive amounts of labeled training samples from the specific data source of interest. This is impractical when dealing with many different sources (e.g., in camera networks), or constantly changing ones such as mobile cameras (e.g., in robotics or driving assistant systems). In this paper, we address the problem of self-learning detectors in an autonomous manner, i.e. (i) detectors continuously updating themselves to efficiently adapt to streaming data sources (contrary to transductive algorithms), (ii) without any labeled data strongly related to the target data stream (contrary to self-paced learning), and (iii) without manual intervention to set and update hyper-parameters. To that end, we propose an unsupervised, on-line, and self-tuning learning algorithm to optimize a multi-task learning convex objective. Our method uses confident but laconic oracles (high-precision but low-recall off-the-shelf generic detectors), and exploits the structure of the problem to jointly learn on-line an ensemble of instance-level trackers, from which we derive an adapted category-level object detector. Our approach is validated on real-world publicly available video object datasets.
Submitted 18 June, 2014; v1 submitted 17 June, 2014;
originally announced June 2014.