Showing 1–50 of 53 results for author: Pavlovic, V

Searching in archive cs.
  1. arXiv:2309.07914  [pdf, other]

    cs.CV

    ALWOD: Active Learning for Weakly-Supervised Object Detection

    Authors: Yuting Wang, Velibor Ilic, Jiatong Li, Branislav Kisacanin, Vladimir Pavlovic

    Abstract: Object detection (OD), a crucial vision task, remains challenged by the lack of large training datasets with precise object localization labels. In this work, we propose ALWOD, a new framework that addresses this problem by fusing active learning (AL) with weakly and semi-supervised object detection paradigms. Because the performance of AL critically depends on the model initialization, we propose…

    Submitted 14 September, 2023; originally announced September 2023.

    Comments: published in ICCV 2023

  2. arXiv:2308.02866  [pdf, other]

    cs.CV cs.LG

    NP-SemiSeg: When Neural Processes meet Semi-Supervised Semantic Segmentation

    Authors: Jianfeng Wang, Daniela Massiceti, Xiaolin Hu, Vladimir Pavlovic, Thomas Lukasiewicz

    Abstract: Semi-supervised semantic segmentation involves assigning pixel-wise labels to unlabeled images at training time. This is useful in a wide range of real-world applications where collecting pixel-wise labels is not feasible in time or cost. Current approaches to semi-supervised semantic segmentation work by predicting pseudo-labels for each pixel from a class-wise probability distribution output by…

    Submitted 5 August, 2023; originally announced August 2023.

    Comments: Appear at ICML2023. Source codes are available at: https://github.com/Jianf-Wang/NP-SemiSeg

  3. arXiv:2306.16772  [pdf, other]

    cs.CV cs.AI cs.LG

    M3Act: Learning from Synthetic Human Group Activities

    Authors: Che-Jui Chang, Danrui Li, Deep Patel, Parth Goel, Honglu Zhou, Seonghyeon Moon, Samuel S. Sohn, Sejong Yoon, Vladimir Pavlovic, Mubbasir Kapadia

    Abstract: The study of complex human interactions and group activities has become a focal point in human-centric computer vision. However, progress in related tasks is often hindered by the challenges of obtaining large-scale labeled datasets from real-world scenarios. To address the limitation, we introduce M3Act, a synthetic data generator for multi-view multi-group multi-person human atomic actions and g…

    Submitted 2 May, 2024; v1 submitted 29 June, 2023; originally announced June 2023.

  4. arXiv:2212.04673  [pdf, other]

    cs.CV

    MSI: Maximize Support-Set Information for Few-Shot Segmentation

    Authors: Seonghyeon Moon, Samuel S. Sohn, Honglu Zhou, Sejong Yoon, Vladimir Pavlovic, Muhammad Haris Khan, Mubbasir Kapadia

    Abstract: FSS (Few-shot segmentation) aims to segment a target class using a small number of labeled images (support set). To extract information relevant to the target class, a dominant approach in best-performing FSS methods removes background features using a support mask. We observe that this feature excision through a limiting support mask introduces an information bottleneck in several challenging FSS c…

    Submitted 10 November, 2023; v1 submitted 9 December, 2022; originally announced December 2022.

    Comments: ICCV 2023

  5. arXiv:2212.01376  [pdf, other]

    cs.CV

    D2DF2WOD: Learning Object Proposals for Weakly-Supervised Object Detection via Progressive Domain Adaptation

    Authors: Yuting Wang, Ricardo Guerrero, Vladimir Pavlovic

    Abstract: Weakly-supervised object detection (WSOD) models attempt to leverage image-level annotations in lieu of accurate but costly-to-obtain object localization labels. This oftentimes leads to substandard object detection and localization at inference time. To tackle this issue, we propose D2DF2WOD, a Dual-Domain Fully-to-Weakly Supervised Object Detection framework that leverages synthetic data, annota…

    Submitted 2 December, 2022; originally announced December 2022.

    Comments: published in WACV 2023

  6. arXiv:2211.00817  [pdf, other]

    cs.LG cs.MA

    An Information-Theoretic Approach for Estimating Scenario Generalization in Crowd Motion Prediction

    Authors: Gang Qiao, Kaidong Hu, Seonghyeon Moon, Samuel S. Sohn, Sejong Yoon, Mubbasir Kapadia, Vladimir Pavlovic

    Abstract: Learning-based approaches to modeling crowd motion have become increasingly successful but require training and evaluation on large datasets, coupled with complex model selection and parameter tuning. To circumvent this tremendously time-consuming process, we propose a novel scoring method, which characterizes generalization of models trained on source crowd scenarios and applied to target crowd s…

    Submitted 1 November, 2022; originally announced November 2022.

  7. arXiv:2210.14862  [pdf, other]

    cs.CV cs.CL cs.LG

    Visual Semantic Parsing: From Images to Abstract Meaning Representation

    Authors: Mohamed Ashraf Abdelsalam, Zhan Shi, Federico Fancellu, Kalliopi Basioti, Dhaivat J. Bhatt, Vladimir Pavlovic, Afsaneh Fazly

    Abstract: The success of scene graphs for visual scene understanding has brought attention to the benefits of abstracting a visual input (e.g., image) into a structured representation, where entities (people and objects) are nodes connected by edges specifying their relations. Building these representations, however, requires expensive manual annotation in the form of images paired with their scene graphs o…

    Submitted 27 October, 2022; v1 submitted 26 October, 2022; originally announced October 2022.

    Comments: published in CoNLL 2022

  8. arXiv:2207.01066  [pdf, other]

    cs.LG cs.CV

    NP-Match: When Neural Processes meet Semi-Supervised Learning

    Authors: Jianfeng Wang, Thomas Lukasiewicz, Daniela Massiceti, Xiaolin Hu, Vladimir Pavlovic, Alexandros Neophytou

    Abstract: Semi-supervised learning (SSL) has been widely explored in recent years, and it is an effective way of leveraging unlabeled data to reduce the reliance on labeled data. In this work, we adjust neural processes (NPs) to the semi-supervised image classification task, resulting in a new method named NP-Match. NP-Match is suited to this task for two reasons. Firstly, NP-Match implicitly compares data…

    Submitted 3 July, 2022; originally announced July 2022.

    Comments: To appear at ICML 2022. The source codes are at https://github.com/Jianf-Wang/NP-Match

  9. arXiv:2206.09265  [pdf, ps, other]

    cs.CV

    SAViR-T: Spatially Attentive Visual Reasoning with Transformers

    Authors: Pritish Sahu, Kalliopi Basioti, Vladimir Pavlovic

    Abstract: We present a novel computational model, "SAViR-T", for the family of visual reasoning problems embodied in the Raven's Progressive Matrices (RPM). Our model considers explicit spatial semantics of visual elements within each image in the puzzle, encoded as spatio-visual tokens, and learns the intra-image as well as the inter-image token dependencies, highly relevant for the visual reasoning task.…

    Submitted 21 June, 2022; v1 submitted 18 June, 2022; originally announced June 2022.

  10. arXiv:2203.12826  [pdf, other]

    cs.CV

    HM: Hybrid Masking for Few-Shot Segmentation

    Authors: Seonghyeon Moon, Samuel S. Sohn, Honglu Zhou, Sejong Yoon, Vladimir Pavlovic, Muhammad Haris Khan, Mubbasir Kapadia

    Abstract: We study few-shot semantic segmentation that aims to segment a target object from a query image when provided with a few annotated support images of the target class. Several recent methods resort to a feature masking (FM) technique to discard irrelevant feature activations which eventually facilitates the reliable prediction of segmentation mask. A fundamental limitation of FM is the inability to…

    Submitted 24 July, 2022; v1 submitted 23 March, 2022; originally announced March 2022.

    Comments: 14 pages

    MSC Class: 68T45

  11. arXiv:2201.07189  [pdf, other]

    cs.CV

    MUSE-VAE: Multi-Scale VAE for Environment-Aware Long Term Trajectory Prediction

    Authors: Mihee Lee, Samuel S. Sohn, Seonghyeon Moon, Sejong Yoon, Mubbasir Kapadia, Vladimir Pavlovic

    Abstract: Accurate long-term trajectory prediction in complex scenes, where multiple agents (e.g., pedestrians or vehicles) interact with each other and the environment while attempting to accomplish diverse and often unknown goals, is a challenging stochastic forecasting problem. In this work, we propose MUSE, a new probabilistic modeling framework based on a cascade of Conditional VAEs, which tackles the…

    Submitted 18 January, 2022; originally announced January 2022.

  12. arXiv:2110.11830  [pdf, other]

    cs.CV

    Multi-attribute Pizza Generator: Cross-domain Attribute Control with Conditional StyleGAN

    Authors: Fangda Han, Guoyao Hao, Ricardo Guerrero, Vladimir Pavlovic

    Abstract: Multi-attribute conditional image generation is a challenging problem in computer vision. We propose Multi-attribute Pizza Generator (MPG), a conditional Generative Neural Network (GAN) framework for synthesizing images from a trichotomy of attributes: content, view-geometry, and implicit visual style. We design MPG by extending the state-of-the-art StyleGAN2, using a new conditioning technique tha…

    Submitted 22 October, 2021; originally announced October 2021.

    Comments: To appear in British Machine Vision Conference (BMVC) 2021. arXiv admin note: text overlap with arXiv:2012.02821

  13. arXiv:2109.13156  [pdf, ps, other]

    cs.LG cs.CV

    DAReN: A Collaborative Approach Towards Reasoning And Disentangling

    Authors: Pritish Sahu, Kalliopi Basioti, Vladimir Pavlovic

    Abstract: Computational learning approaches to solving visual reasoning tests, such as Raven's Progressive Matrices (RPM), critically depend on the ability to identify the visual concepts used in the test (i.e., the representation) as well as the latent rules based on those concepts (i.e., the reasoning). However, learning of representation and reasoning is a challenging and ill-posed task, often approached…

    Submitted 29 June, 2022; v1 submitted 27 September, 2021; originally announced September 2021.

  14. arXiv:2109.11047  [pdf, other]

    cs.CV

    Cross-Modal Coherence for Text-to-Image Retrieval

    Authors: Malihe Alikhani, Fangda Han, Hareesh Ravi, Mubbasir Kapadia, Vladimir Pavlovic, Matthew Stone

    Abstract: Common image-text joint understanding techniques presume that images and the associated text can universally be characterized by a single implicit model. However, co-occurring images and text can be related in qualitatively different ways, and explicitly modeling it could improve the performance of current joint understanding models. In this paper, we train a Cross-Modal Coherence Model for text-to…

    Submitted 15 April, 2022; v1 submitted 22 September, 2021; originally announced September 2021.

    Comments: This paper is published in AAAI-2022

  15. arXiv:2102.03151  [pdf, other]

    cs.LG

    Reducing the Amortization Gap in Variational Autoencoders: A Bayesian Random Function Approach

    Authors: Minyoung Kim, Vladimir Pavlovic

    Abstract: Variational autoencoder (VAE) is a very successful generative model whose key element is the so called amortized inference network, which can perform test time inference using a single feed forward pass. Unfortunately, this comes at the cost of degraded accuracy in posterior approximation, often underperforming the instance-wise variational optimization. Although the latest semi-amortized approach…

    Submitted 5 February, 2021; originally announced February 2021.

  16. arXiv:2102.02547  [pdf, other]

    cs.CV

    CHEF: Cross-modal Hierarchical Embeddings for Food Domain Retrieval

    Authors: Hai X. Pham, Ricardo Guerrero, Jiatong Li, Vladimir Pavlovic

    Abstract: Despite the abundance of multi-modal data, such as image-text pairs, there has been little effort in understanding the individual entities and their different roles in the construction of these data instances. In this work, we endeavour to discover the entities and their corresponding importance in cooking recipes automatically as a visual-linguistic association problem. More specifically, we intr…

    Submitted 4 February, 2021; originally announced February 2021.

    Comments: 22 pages, accepted in AAAI 2021

  17. arXiv:2012.13024  [pdf, other]

    cs.CV cs.LG

    Private-Shared Disentangled Multimodal VAE for Learning of Hybrid Latent Representations

    Authors: Mihee Lee, Vladimir Pavlovic

    Abstract: Multi-modal generative models represent an important family of deep models, whose goal is to facilitate representation learning on data with multiple views or modalities. However, current deep multi-modal models focus on the inference of shared representations, while neglecting the important private aspects of data within individual modalities. In this paper, we introduce a disentangled multi-moda…

    Submitted 23 December, 2020; originally announced December 2020.

  18. arXiv:2012.02821  [pdf, other]

    cs.CV cs.CL

    MPG: A Multi-ingredient Pizza Image Generator with Conditional StyleGANs

    Authors: Fangda Han, Guoyao Hao, Ricardo Guerrero, Vladimir Pavlovic

    Abstract: Multilabel conditional image generation is a challenging problem in computer vision. In this work we propose Multi-ingredient Pizza Generator (MPG), a conditional Generative Neural Network (GAN) framework for synthesizing multilabel images. We design MPG based on a state-of-the-art GAN structure called StyleGAN2, in which we develop a new conditioning technique by enforcing intermediate feature ma…

    Submitted 5 October, 2021; v1 submitted 4 December, 2020; originally announced December 2020.

  19. arXiv:2012.01345  [pdf, other]

    cs.CV cs.IR cs.LG

    Cross-Modal Retrieval and Synthesis (X-MRS): Closing the Modality Gap in Shared Representation Learning

    Authors: Ricardo Guerrero, Hai Xuan Pham, Vladimir Pavlovic

    Abstract: Computational food analysis (CFA) naturally requires multi-modal evidence of a particular food, e.g., images, recipe text, etc. A key to making CFA possible is multi-modal shared representation learning, which aims to create a joint representation of the multiple views (text and image) of the data. In this work we propose a method for food domain cross-modal shared representation learning that pre…

    Submitted 30 September, 2021; v1 submitted 2 December, 2020; originally announced December 2020.

  20. arXiv:2012.00682  [pdf, other]

    cs.LG cs.CV

    Learning Disentangled Latent Factors from Paired Data in Cross-Modal Retrieval: An Implicit Identifiable VAE Approach

    Authors: Minyoung Kim, Ricardo Guerrero, Vladimir Pavlovic

    Abstract: We deal with the problem of learning the underlying disentangled latent factors that are shared between the paired bi-modal data in cross-modal retrieval. Our assumption is that the data in both modalities are complex, structured, and high dimensional (e.g., image and text), for which the conventional deep auto-encoding latent variable models such as the Variational Autoencoder (VAE) often suffer…

    Submitted 1 December, 2020; originally announced December 2020.

  21. arXiv:2011.08544  [pdf, other]

    cs.LG stat.ML

    Recursive Inference for Variational Autoencoders

    Authors: Minyoung Kim, Vladimir Pavlovic

    Abstract: Inference networks of traditional Variational Autoencoders (VAEs) are typically amortized, resulting in relatively inaccurate posterior approximation compared to instance-wise variational optimization. Recent semi-amortized approaches were proposed to address this drawback; however, their iterative gradient update procedures can be computationally demanding. To address these issues, in this paper…

    Submitted 17 November, 2020; originally announced November 2020.

  22. arXiv:2010.08727  [pdf, other]

    cs.CV

    Picture-to-Amount (PITA): Predicting Relative Ingredient Amounts from Food Images

    Authors: Jiatong Li, Fangda Han, Ricardo Guerrero, Vladimir Pavlovic

    Abstract: Increased awareness of the impact of food consumption on health and lifestyle today has given rise to novel data-driven food analysis systems. Although these systems may recognize the ingredients, a detailed analysis of their amounts in the meal, which is paramount for estimating the correct nutrition, is usually ignored. In this paper, we study the novel and challenging problem of predicting the…

    Submitted 17 October, 2020; originally announced October 2020.

  23. arXiv:2009.03034  [pdf, other]

    cs.LG cs.AI stat.ML

    Ordinal-Content VAE: Isolating Ordinal-Valued Content Factors in Deep Latent Variable Models

    Authors: Minyoung Kim, Vladimir Pavlovic

    Abstract: In deep representational learning, it is often desired to isolate a particular factor (termed content) from other factors (referred to as style). What constitutes the content is typically specified by users through explicit labels in the data, while all unlabeled/unknown factors are regarded as style. Recently, it has been shown that such content-labeled data can be effectively exploit…

    Submitted 7 September, 2020; originally announced September 2020.

  24. arXiv:2002.11493  [pdf, other]

    cs.CV

    CookGAN: Meal Image Synthesis from Ingredients

    Authors: Fangda Han, Ricardo Guerrero, Vladimir Pavlovic

    Abstract: In this work we propose a new computational framework, based on generative deep models, for synthesis of photo-realistic food meal images from textual list of its ingredients. Previous works on synthesis of images from text typically rely on pre-trained text models to extract text features, followed by generative neural networks (GAN) aimed to generate realistic images conditioned on the text feat…

    Submitted 24 February, 2020; originally announced February 2020.

    Comments: 10 pages, 5 figures, accepted by WACV 2020. arXiv admin note: substantial text overlap with arXiv:1905.13149

  25. arXiv:1910.05810  [pdf, other]

    cs.AI cs.CV

    Deep Crowd-Flow Prediction in Built Environments

    Authors: Samuel S. Sohn, Seonghyeon Moon, Honglu Zhou, Sejong Yoon, Vladimir Pavlovic, Mubbasir Kapadia

    Abstract: Predicting the behavior of crowds in complex environments is a key requirement in a multitude of application areas, including crowd and disaster management, architectural design, and urban planning. Given a crowd's immediate state, current approaches simulate crowd movement to arrive at a future state. However, most applications require the ability to predict hundreds of possible simulation outcom…

    Submitted 13 October, 2019; originally announced October 2019.

  26. arXiv:1910.00738  [pdf, other]

    cs.MA

    Scenario Generalization of Data-driven Imitation Models in Crowd Simulation

    Authors: Gang Qiao, Honglu Zhou, Mubbasir Kapadia, Sejong Yoon, Vladimir Pavlovic

    Abstract: Crowd simulation, the study of the movement of multiple agents in complex environments, presents a unique application domain for machine learning. One challenge in crowd simulation is to imitate the movement of expert agents in highly dense crowds. An imitation model could substitute an expert agent if the model behaves as well as the expert. This will bring many exciting applications. However, we…

    Submitted 1 October, 2019; originally announced October 2019.

    Comments: 12 pages, MIG 2019 - ACM SIGGRAPH Conference on Motion, Interaction and Games

  27. Deep Cooking: Predicting Relative Food Ingredient Amounts from Images

    Authors: Jiatong Li, Ricardo Guerrero, Vladimir Pavlovic

    Abstract: In this paper, we study the novel problem of not only predicting ingredients from a food image, but also predicting the relative amounts of the detected ingredients. We propose two prediction-based models using deep learning that output sparse and dense predictions, coupled with important semi-automatic multi-database integrative data pre-processing, to solve the problem. Experiments on a dataset…

    Submitted 26 September, 2019; originally announced October 2019.

  28. arXiv:1909.12366  [pdf, other]

    cs.CV

    Task-Discriminative Domain Alignment for Unsupervised Domain Adaptation

    Authors: Behnam Gholami, Pritish Sahu, Minyoung Kim, Vladimir Pavlovic

    Abstract: Domain Adaptation (DA), the process of effectively adapting task models learned on one domain, the source, to other related but distinct domains, the targets, with no or minimal retraining, is typically accomplished using the process of source-to-target manifold alignment. However, this process often leads to unsatisfactory adaptation performance, in part because it ignores the task-specific struc…

    Submitted 26 September, 2019; originally announced September 2019.

    Comments: This paper is accepted for ORAL presentation at the ICCV 2019 MDALC Workshop

  29. arXiv:1909.12158  [pdf, other]

    cs.CV

    Fast and Effective Adaptation of Facial Action Unit Detection Deep Model

    Authors: Mihee Lee, Ognjen Rudovic, Vladimir Pavlovic, Maja Pantic

    Abstract: Detecting facial action units (AU) is one of the fundamental steps in automatic recognition of facial expression of emotions and cognitive states. Though there have been a variety of approaches proposed for this task, most of these models are trained only for the specific target AUs, and as such they fail to easily adapt to the task of recognition of new AUs (i.e., those not initially used to trai…

    Submitted 27 November, 2019; v1 submitted 26 September, 2019; originally announced September 2019.

    Comments: Presented at 2019 IJCAI Affective Computing Workshop

  30. arXiv:1909.02820  [pdf, other]

    cs.LG cs.CV stat.ML

    Bayes-Factor-VAE: Hierarchical Bayesian Deep Auto-Encoder Models for Factor Disentanglement

    Authors: Minyoung Kim, Yuting Wang, Pritish Sahu, Vladimir Pavlovic

    Abstract: We propose a family of novel hierarchical Bayesian deep auto-encoder models capable of identifying disentangled factors of variability in data. While many recent attempts at factor disentanglement have focused on sophisticated learning objectives within the VAE framework, their choice of a standard normal as the latent factor prior is both suboptimal and detrimental to performance. Our key observa…

    Submitted 6 September, 2019; originally announced September 2019.

    Comments: International Conference on Computer Vision (ICCV) 2019

  31. arXiv:1905.13149  [pdf, other]

    cs.CV cs.GR cs.LG stat.ML

    The Art of Food: Meal Image Synthesis from Ingredients

    Authors: Fangda Han, Ricardo Guerrero, Vladimir Pavlovic

    Abstract: In this work we propose a new computational framework, based on generative deep models, for synthesis of photo-realistic food meal images from textual descriptions of its ingredients. Previous works on synthesis of images from text typically rely on pre-trained text models to extract text features, followed by a generative neural networks (GANs) aimed to generate realistic images conditioned on th…

    Submitted 9 May, 2019; originally announced May 2019.

    Comments: 12 pages, 6 figures, 2 tables, under review as a conference paper at BMVC 2019

  32. arXiv:1905.06982  [pdf, other]

    cs.LG stat.ML

    Efficient Deep Gaussian Process Models for Variable-Sized Input

    Authors: Issam H. Laradji, Mark Schmidt, Vladimir Pavlovic, Minyoung Kim

    Abstract: Deep Gaussian processes (DGP) have appealing Bayesian properties, can handle variable-sized data, and learn deep features. Their limitation is that they do not scale well with the size of the data. Existing approaches address this using a deep random feature (DRF) expansion model, which makes inference tractable by approximating DGPs. However, DRF is not suitable for variable-sized input data such…

    Submitted 16 May, 2019; originally announced May 2019.

    Comments: Accepted in IJCNN 2019

  33. arXiv:1905.02114  [pdf, other]

    cs.CV

    Visibility Constrained Generative Model for Depth-based 3D Facial Pose Tracking

    Authors: Lu Sheng, Jianfei Cai, Tat-Jen Cham, Vladimir Pavlovic, King Ngi Ngan

    Abstract: In this paper, we propose a generative framework that unifies depth-based 3D facial pose tracking and face model adaptation on-the-fly, in the unconstrained scenarios with heavy occlusions and arbitrary facial expression variations. Specifically, we introduce a statistical 3D morphable model that flexibly describes the distribution of points on the surface of the face model, with an efficient swit…

    Submitted 6 May, 2019; originally announced May 2019.

  34. arXiv:1902.08727  [pdf, other]

    cs.LG stat.ML

    Unsupervised Visual Domain Adaptation: A Deep Max-Margin Gaussian Process Approach

    Authors: Minyoung Kim, Pritish Sahu, Behnam Gholami, Vladimir Pavlovic

    Abstract: In unsupervised domain adaptation, it is widely known that the target domain error can be provably reduced by having a shared input representation that makes the source and target domains indistinguishable from each other. Very recently it has been studied that not just matching the marginal input distributions, but the alignment of output (class) distributions is also critical. The latter can be…

    Submitted 22 February, 2019; originally announced February 2019.

  35. arXiv:1902.01568  [pdf, other]

    cs.LG stat.ML

    Relevance Factor VAE: Learning and Identifying Disentangled Factors

    Authors: Minyoung Kim, Yuting Wang, Pritish Sahu, Vladimir Pavlovic

    Abstract: We propose a novel VAE-based deep auto-encoder model that can learn disentangled latent representations in a fully unsupervised manner, endowed with the ability to identify all meaningful sources of variation and their cardinality. Our model, dubbed Relevance-Factor-VAE, leverages the total correlation (TC) in the latent space to achieve the disentanglement goal, but also addresses the key issue o…

    Submitted 5 February, 2019; originally announced February 2019.

  36. arXiv:1810.11547  [pdf, other]

    cs.CV

    Unsupervised Multi-Target Domain Adaptation: An Information Theoretic Approach

    Authors: Behnam Gholami, Pritish Sahu, Ognjen Rudovic, Konstantinos Bousmalis, Vladimir Pavlovic

    Abstract: Unsupervised domain adaptation (uDA) models focus on pairwise adaptation settings where there is a single, labeled, source and a single target domain. However, in many real-world settings one seeks to adapt to multiple, but somewhat similar, target domains. Applying pairwise adaptation approaches to this setting may be suboptimal, as they fail to leverage shared information among multiple domains.…

    Submitted 26 October, 2018; originally announced October 2018.

    Comments: 19 pages, 5 Figures, 5 Tables

  37. arXiv:1803.07716  [pdf, other]

    cs.CV

    Generative Adversarial Talking Head: Bringing Portraits to Life with a Weakly Supervised Neural Network

    Authors: Hai X. Pham, Yuting Wang, Vladimir Pavlovic

    Abstract: This paper presents Generative Adversarial Talking Head (GATH), a novel deep generative neural network that enables fully automatic facial expression synthesis of an arbitrary portrait with continuous action unit (AU) coefficients. Specifically, our model directly manipulates image pixels to make the unseen subject in the still photo express various emotions controlled by values of facial AU coeff…

    Submitted 28 March, 2018; v1 submitted 20 March, 2018; originally announced March 2018.

    Comments: Fix typos, add youtube link of supplementary video

  38. arXiv:1710.03354  [pdf, other]

    cs.MA

    The Role of Data-driven Priors in Multi-agent Crowd Trajectory Estimation

    Authors: Gang Qiao, Sejong Yoon, Mubbasir Kapadia, Vladimir Pavlovic

    Abstract: Trajectory interpolation, the process of filling-in the gaps and removing noise from observed agent trajectories, is an essential task for the motion inference in multi-agent setting. A desired trajectory interpolation method should be robust to noise, changes in environments or agent densities, while also yielding realistic group movement behaviors. Such realistic behaviors are, however, ch…

    Submitted 9 October, 2017; originally announced October 2017.

    Comments: 11 pages including supplementary, 6 figures in total, 6 tables in total, 1 algorithm

  39. arXiv:1710.00920  [pdf, other]

    cs.CV

    End-to-end Learning for 3D Facial Animation from Raw Waveforms of Speech

    Authors: Hai X. Pham, Yuting Wang, Vladimir Pavlovic

    Abstract: We present a deep learning framework for real-time speech-driven 3D facial animation from just raw waveforms. Our deep neural network directly maps an input sequence of speech audio to a series of micro facial action unit activations and head rotations to drive a 3D blendshape face model. In particular, our deep model is able to learn the latent representations of time-varying contextual informati…

    Submitted 7 December, 2017; v1 submitted 2 October, 2017; originally announced October 2017.

  40. arXiv:1710.00018  [pdf, ps, other]

    cs.LG cs.CV stat.ML

    Unsupervised Domain Adaptation with Copula Models

    Authors: Cuong D. Tran, Ognjen Rudovic, Vladimir Pavlovic

    Abstract: We study the task of unsupervised domain adaptation, where no labeled data from the target domain is provided during training time. To deal with the potential discrepancy between the source and target distributions, both in features and labels, we exploit a copula-based regression framework. The benefits of this approach are two-fold: (a) it allows us to model a broader range of conditional predic…

    Submitted 29 September, 2017; originally announced October 2017.

    Comments: IEEE International Workshop On Machine Learning for Signal Processing 2017

  41. arXiv:1704.04481  [pdf, other]

    cs.CV

    Deep Structured Learning for Facial Action Unit Intensity Estimation

    Authors: Robert Walecki, Ognjen Rudovic, Vladimir Pavlovic, Björn Schuller, Maja Pantic

    Abstract: We consider the task of automated estimation of facial expression intensity. This involves estimation of multiple output variables (facial action units, AUs) that are structurally dependent. Their structure arises from statistically induced co-occurrence patterns of AU intensity levels. Modeling this structure is critical for improving the estimation performance; however, this performance is bo…

    Submitted 14 April, 2017; originally announced April 2017.

  42. Cartoonish sketch-based face editing in videos using identity deformation transfer

    Authors: Long Zhao, Fangda Han, Xi Peng, Xun Zhang, Mubbasir Kapadia, Vladimir Pavlovic, Dimitris N. Metaxas

    Abstract: We address the problem of using hand-drawn sketches to create exaggerated deformations to faces in videos, such as enlarging the shape or modifying the position of eyes or mouth. This task is formulated as a 3D face model reconstruction and deformation problem. We first recover the facial identity and expressions from the video by fitting a face morphable model for each frame. At the same time, us…

    Submitted 25 January, 2019; v1 submitted 25 March, 2017; originally announced March 2017.

    Comments: In Computers & Graphics, 2019. (12 pages, 10 figures)

  43. arXiv:1609.08201  [pdf, other]

    cs.DB cs.IR stat.ML

    Robust Time-Series Retrieval Using Probabilistic Adaptive Segmental Alignment

    Authors: Shahriar Shariat, Vladimir Pavlovic

    Abstract: Traditional pairwise sequence alignment is based on matching individual samples from two sequences, under time monotonicity constraints. However, in many application settings matching subsequences (segments) instead of individual samples may bring in additional robustness to noise or local non-causal perturbations. This paper presents an approach to segmental sequence alignment that jointly segmen…

    Submitted 26 September, 2016; originally announced September 2016.

    Journal ref: Knowl Inf Syst (2016) 49: 91. doi:10.1007/s10115-015-0898-4

  44. arXiv:1510.03909  [pdf, other

    cs.CV cs.HC

    Variable-state Latent Conditional Random Fields for Facial Expression Recognition and Action Unit Detection

    Authors: Robert Walecki, Ognjen Rudovic, Vladimir Pavlovic, Maja Pantic

    Abstract: Automated recognition of facial expressions of emotions, and detection of facial action units (AUs), from videos depends critically on modeling of their dynamics. These dynamics are characterized by changes in temporal phases (onset-apex-offset) and intensity of emotion expressions and AUs, the appearance of which may vary considerably among target subjects, making the recognition/detection task v… ▽ More

    Submitted 13 October, 2015; originally announced October 2015.

  45. arXiv:1507.02779  [pdf, other

    cs.CV

    Robust Performance-driven 3D Face Tracking in Long Range Depth Scenes

    Authors: Hai X. Pham, Chongyu Chen, Luc N. Dao, Vladimir Pavlovic, Jianfei Cai, Tat-jen Cham

    Abstract: We introduce a novel robust hybrid 3D face tracking framework from RGBD video streams, which is capable of tracking head pose and facial actions without pre-calibration or intervention from a user. In particular, we emphasize on improving the tracking performance in instances where the tracked subject is at a large distance from the cameras, and the quality of point cloud deteriorates severely. Th… ▽ More

    Submitted 10 July, 2015; originally announced July 2015.

    Comments: 10 pages, 8 figures, 4 tables

  46. arXiv:1507.02356  [pdf, other

    stat.ML cs.LG

    Intrinsic Non-stationary Covariance Function for Climate Modeling

    Authors: Chintan A. Dalal, Vladimir Pavlovic, Robert E. Kopp

    Abstract: Designing a covariance function that represents the underlying correlation is a crucial step in modeling complex natural systems, such as climate models. Geospatial datasets at a global scale usually suffer from non-stationarity and non-uniformly smooth spatial boundaries. A Gaussian process regression using a non-stationary covariance function has shown promise for this task, as this covariance f… ▽ More

    Submitted 8 July, 2015; originally announced July 2015.

    Comments: 9 pages, 3 figures

  47. arXiv:1507.00824  [pdf, ps, other

    cs.LG stat.ML

    D-MFVI: Distributed Mean Field Variational Inference using Bregman ADMM

    Authors: Behnam Babagholami-Mohamadabadi, Sejong Yoon, Vladimir Pavlovic

    Abstract: Bayesian models provide a framework for probabilistic modelling of complex datasets. However, many of such models are computationally demanding especially in the presence of large datasets. On the other hand, in sensor network applications, statistical (Bayesian) parameter estimation usually needs distributed algorithms, in which both data and computation are distributed across the nodes of the ne… ▽ More

    Submitted 3 July, 2015; originally announced July 2015.

    Comments: 19 pages, 6 figures

  48. Discovering Characteristic Landmarks on Ancient Coins using Convolutional Networks

    Authors: Jongpil Kim, Vladimir Pavlovic

    Abstract: In this paper, we propose a novel method to find characteristic landmarks on ancient Roman imperial coins using deep convolutional neural network models (CNNs). We formulate an optimization problem to discover class-specific regions while guaranteeing specific controlled loss of accuracy. Analysis on visualization of the discovered region confirms that not only can the proposed method successfully… ▽ More

    Submitted 30 June, 2015; v1 submitted 30 June, 2015; originally announced June 2015.

  49. arXiv:1506.09124  [pdf, other

    cs.CV

    Multi-Cue Structure Preserving MRF for Unconstrained Video Segmentation

    Authors: Saehoon Yi, Vladimir Pavlovic

    Abstract: Video segmentation is a stepping stone to understanding video context. Video segmentation enables one to represent a video by decomposing it into coherent regions which comprise whole or parts of objects. However, the challenge originates from the fact that most of the video segmentation algorithms are based on unsupervised learning due to expensive cost of pixelwise video annotation and intra-cla… ▽ More

    Submitted 30 June, 2015; originally announced June 2015.

  50. arXiv:1506.08928  [pdf, other

    cs.LG cs.CV math.OC

    Fast ADMM Algorithm for Distributed Optimization with Adaptive Penalty

    Authors: Changkyu Song, Sejong Yoon, Vladimir Pavlovic

    Abstract: We propose new methods to speed up convergence of the Alternating Direction Method of Multipliers (ADMM), a common optimization tool in the context of large scale and distributed learning. The proposed method accelerates the speed of convergence by automatically deciding the constraint penalty needed for parameter consensus in each iteration. In addition, we also propose an extension of the method… ▽ More

    Submitted 29 June, 2015; originally announced June 2015.

    Comments: 8 pages manuscript, 2 pages appendix, 5 figures