A Compass for Navigating the World of Sentence Embeddings for the Telecom Domain

Authors: Sujoy Roychowdhury, Sumit Soman, H. G. Ranjani, Vansh Chhabra, Neeraj Gunda, Subhadip Bandyopadhyay, Sai Krishna Bala

Abstract: A plethora of sentence embedding models makes it challenging to choose one, especially for domains such as telecom that are rich in specialized vocabulary. We evaluate multiple embeddings obtained from publicly available models and their domain-adapted variants, on both point retrieval accuracies and their 95% confidence intervals. We establish a systematic method to obtain similarity-score thresholds for different embeddings. We observe that fine-tuning improves mean bootstrapped accuracies and tightens confidence intervals. Pre-training combined with fine-tuning makes the confidence intervals even tighter. To understand these variations, we analyze and report significant correlations of the distributional overlap between top-K, correct, and random sentence similarities with retrieval accuracies and similarity thresholds. Following current literature, we analyze whether retrieval accuracy variations can be attributed to the isotropy of embeddings. We conclude that better retrieval performance cannot be attributed to the isotropy of embeddings (as measured by two independent state-of-the-art isotropy metric definitions). However, domain adaptation, which improves retrieval accuracies, also improves isotropy. We establish that domain adaptation moves domain-specific embeddings further away from general-domain embeddings.

Submitted 18 June, 2024; originally announced June 2024.

Comments: 10 pages, 3 figures, 4 tables

MSC Class: 68T50 ACM Class: I.2.7
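
The abstract above reports retrieval accuracies together with 95% bootstrapped confidence intervals. The following is a minimal sketch of one common way to obtain such an interval, assuming top-1 retrieval by cosine similarity and a percentile bootstrap over per-query hit/miss outcomes. The paper's exact retrieval setup, resampling scheme, and number of resamples are not stated here, and the function names `top1_hits` and `bootstrap_ci` are hypothetical.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)


def top1_hits(queries, corpus, gold):
    """Per query: 1 if the most similar corpus item is the gold index, else 0."""
    hits = []
    for q, g in zip(queries, gold):
        sims = [cosine(q, c) for c in corpus]
        best = max(range(len(corpus)), key=lambda i: sims[i])
        hits.append(1 if best == g else 0)
    return hits


def bootstrap_ci(hits, n_boot=1000, alpha=0.05, seed=0):
    """Point accuracy plus a percentile-bootstrap (1 - alpha) interval."""
    rng = random.Random(seed)
    n = len(hits)
    # Resample the hit vector with replacement and record each resample's mean.
    means = sorted(
        sum(hits[rng.randrange(n)] for _ in range(n)) / n for _ in range(n_boot)
    )
    lo = means[int(n_boot * alpha / 2)]
    hi = means[int(n_boot * (1 - alpha / 2)) - 1]
    return sum(hits) / n, lo, hi
```

For example, `bootstrap_ci([1] * 70 + [0] * 30)` returns a point accuracy of 0.7 with an interval around it. Fixing the seed keeps intervals reproducible when comparing different embedding models on the same queries.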

arXiv:2312.06960

Remote Sensing Vision-Language Foundation Models without Annotations via Ground Remote Alignment

Authors: Utkarsh Mall, Cheng Perng Phoo, Meilin Kelsey Liu, Carl Vondrick, Bharath Hariharan, Kavita Bala

Abstract: We introduce a method to train vision-language models for remote-sensing images without using any textual annotations. Our key insight is to use co-located internet imagery taken on the ground as an intermediary for connecting remote-sensing images and language. Specifically, we train an image encoder for remote-sensing images to align with the image encoder of CLIP using a large amount of paired internet and satellite images. Our unsupervised approach enables the training of a first-of-its-kind large-scale vision-language model (VLM) for remote-sensing images at two different resolutions. We show that these VLMs enable zero-shot, open-vocabulary image classification, retrieval, segmentation, and visual question answering for satellite images. On each of these tasks, our VLM trained without textual annotations outperforms existing VLMs trained with supervision, with gains of up to 20% for classification and 80% for segmentation.

Submitted 11 December, 2023; originally announced December 2023.

arXiv:2204.07030

Activation Regression for Continuous Domain Generalization with Applications to Crop Classification

Authors: Samar Khanna, Bram Wallace, Kavita Bala, Bharath Hariharan

Abstract: Geographic variance in satellite imagery impacts the ability of machine learning models to generalise to new regions. In this paper, we model geographic generalisation in medium-resolution Landsat-8 satellite imagery as a continuous domain adaptation problem, demonstrating how models generalise better with appropriate domain knowledge. We develop a dataset spatially distributed across the entire continental United States, providing macroscopic insight into the effects of geography on crop classification in multi-spectral and temporally distributed satellite imagery. Our method demonstrates improved generalisability from (1) passing geographically correlated climate variables along with the satellite data to a Transformer model and (2) regressing on the model features to reconstruct these domain variables. Overall, we provide a novel perspective on geographic generalisation in satellite imagery and a simple yet effective approach to leverage domain knowledge. Code is available at https://github.com/samar-khanna/cropmap.

Submitted 14 April, 2022; originally announced April 2022.
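
The abstract above describes regressing on model features to reconstruct the domain (climate) variables alongside crop classification. A minimal sketch of such a joint objective follows, assuming cross-entropy for the classification term, a linear regression head with a mean-squared-error reconstruction term, and a hypothetical weighting coefficient `beta`. The paper's actual head, loss form, and weighting are not given here; all function names are illustrative.

```python
import math


def softmax_cross_entropy(logits, label):
    """Numerically stable cross-entropy for a single example."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[label]


def linear_head(features, weights, bias):
    """Hypothetical linear regressor: one output per domain variable."""
    return [sum(w * f for w, f in zip(row, features)) + b
            for row, b in zip(weights, bias)]


def joint_loss(logits, label, features, domain_vars, weights, bias, beta=0.1):
    """Classification loss plus beta * MSE of the reconstructed domain variables."""
    ce = softmax_cross_entropy(logits, label)
    pred = linear_head(features, weights, bias)
    mse = sum((p - t) ** 2 for p, t in zip(pred, domain_vars)) / len(domain_vars)
    return ce + beta * mse
```

The reconstruction term only adds to the loss when the features fail to encode the domain variables, which is one way to read "leveraging domain knowledge" as an auxiliary signal; in practice both terms would be minimized jointly by gradient descent.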

arXiv:2109.09923

AutoPhoto: Aesthetic Photo Capture using Reinforcement Learning

Authors: Hadi AlZayer, Hubert Lin, Kavita Bala

Abstract: Capturing a well-composed photo is difficult, and it takes years of experience to master. We propose a novel pipeline for an autonomous agent to automatically capture an aesthetic photograph by navigating within a local region in a scene. Instead of classical optimization over heuristics such as the rule of thirds, we adopt a data-driven aesthetics estimator to assess photo quality. A reinforcement learning framework is used to optimize the model with respect to the learned aesthetics metric. We train our model in simulation with indoor scenes, and we demonstrate that our system can capture aesthetic photos in both simulated and real-world environments on a ground robot. To our knowledge, this is the first system that can automatically explore an environment to capture an aesthetic photo with respect to a learned aesthetic estimator.

Submitted 20 September, 2021; originally announced September 2021.

Comments: Accepted to IROS 2021

arXiv:2108.10967

Field-Guide-Inspired Zero-Shot Learning

Authors: Utkarsh Mall, Bharath Hariharan, Kavita Bala

Abstract: Modern recognition systems require large amounts of supervision to achieve accuracy. Adapting to new domains requires significant data from experts, which is onerous and can become too expensive. Zero-shot learning requires an annotated set of attributes for a novel category. Annotating the full set of attributes for a novel category proves to be a tedious and expensive task in deployment. This is especially the case when the recognition domain is an expert domain. We introduce a new field-guide-inspired approach to zero-shot annotation where the learner model interactively asks for the most useful attributes that define a class. We evaluate our method on classification benchmarks with attribute annotations, such as CUB, SUN, and AWA2, and show that our model matches the performance of a model with full annotations while using significantly fewer annotations. Since experts' time is precious, decreasing annotation cost can be very valuable for real-world deployment.

Submitted 24 August, 2021; originally announced August 2021.

Comments: Accepted to ICCV 2021

arXiv:2104.00674

PhySG: Inverse Rendering with Spherical Gaussians for Physics-based Material Editing and Relighting

Authors: Kai Zhang, Fujun Luan, Qianqian Wang, Kavita Bala, Noah Snavely

Abstract: We present PhySG, an end-to-end inverse rendering pipeline that includes a fully differentiable renderer and can reconstruct geometry, materials, and illumination from scratch from a set of RGB input images. Our framework represents specular BRDFs and environmental illumination using mixtures of spherical Gaussians, and represents geometry as a signed distance function parameterized as a Multi-Layer Perceptron. The use of spherical Gaussians allows us to efficiently solve for approximate light transport, and our method works on scenes with challenging non-Lambertian reflectance captured under natural, static illumination. We demonstrate, with both synthetic and real data, that our reconstructions enable not only rendering of novel viewpoints but also physics-based appearance editing of materials and illumination.

Submitted 1 April, 2021; originally announced April 2021.

Comments: Accepted to CVPR 2021; Project page: https://kai-46.github.io/PhySG-website/
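
The PhySG abstract above represents specular BRDFs and environment illumination as mixtures of spherical Gaussians (SGs). A minimal sketch of evaluating such a mixture follows, assuming the standard SG form G(v; ξ, λ, μ) = μ·exp(λ(v·ξ − 1)) with unit direction v, lobe axis ξ, sharpness λ, and amplitude μ (scalar here for brevity; RGB amplitudes are typical in practice). It only evaluates lobes and does not reproduce PhySG's renderer or its light-transport approximation; the function names are hypothetical.

```python
import math


def normalize(v):
    """Scale a 3-vector to unit length."""
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)


def sg_eval(v, axis, sharpness, amplitude):
    """Spherical Gaussian: amplitude * exp(sharpness * (dot(v, axis) - 1))."""
    v, axis = normalize(v), normalize(axis)
    cos_angle = sum(a * b for a, b in zip(v, axis))
    return amplitude * math.exp(sharpness * (cos_angle - 1.0))


def sg_mixture(v, lobes):
    """Sum of SG lobes; each lobe is a tuple (axis, sharpness, amplitude)."""
    return sum(sg_eval(v, axis, lam, mu) for axis, lam, mu in lobes)
```

A lobe peaks at its amplitude when v points along its axis and decays smoothly as the angle grows; larger sharpness gives a tighter lobe, which is how a mixture can approximate both broad sky light and sharp highlights.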

arXiv:2103.17070

PiCIE: Unsupervised Semantic Segmentation using Invariance and Equivariance in Clustering

Authors: Jang Hyun Cho, Utkarsh Mall, Kavita Bala, Bharath Hariharan

Abstract: We present a new framework for semantic segmentation without annotations via clustering. Off-the-shelf clustering methods are limited to curated, single-label, and object-centric images, yet real-world data are predominantly uncurated, multi-label, and scene-centric. We extend clustering from images to pixels and assign separate cluster membership to different instances within each image. However, relying solely on pixel-wise feature similarity fails to learn high-level semantic concepts and overfits to low-level visual cues. We propose a method to incorporate geometric consistency as an inductive bias to learn invariance and equivariance for photometric and geometric variations. With our novel learning objective, our framework can learn high-level semantic concepts. Our method, PiCIE (Pixel-level feature Clustering using Invariance and Equivariance), is the first method capable of segmenting both things and stuff categories without any hyperparameter tuning or task-specific pre-processing. Our method outperforms existing baselines on COCO and Cityscapes by a large margin (+17.5 Acc. and +4.5 mIoU). We show that PiCIE gives a better initialization for standard supervised training. The code is available at https://github.com/janghyuncho/PiCIE.

Submitted 29 March, 2021; originally announced March 2021.

Comments: CVPR 2021

arXiv:2103.15208

Unified Shape and SVBRDF Recovery using Differentiable Monte Carlo Rendering

Authors: Fujun Luan, Shuang Zhao, Kavita Bala, Zhao Dong

Abstract: Reconstructing the shape and appearance of real-world objects using measured 2D images has been a long-standing problem in computer vision. In this paper, we introduce a new analysis-by-synthesis technique capable of producing high-quality reconstructions through robust coarse-to-fine optimization and physics-based differentiable rendering. Unlike most previous methods that handle geometry and reflectance largely separately, our method unifies the optimization of both by leveraging image gradients with respect to both object reflectance and geometry. To obtain physically accurate gradient estimates, we develop a new GPU-based Monte Carlo differentiable renderer leveraging recent advances in differentiable rendering theory to offer unbiased gradients while enjoying better performance than existing tools like PyTorch3D and redner. To further improve robustness, we utilize several shape and material priors as well as a coarse-to-fine optimization strategy to reconstruct geometry. We demonstrate that our technique can produce reconstructions with higher quality than previous methods such as COLMAP and Kinect Fusion.

Submitted 24 June, 2021; v1 submitted 28 March, 2021; originally announced March 2021.

arXiv:2012.02996

doi 10.1371/journal.pone.0255109

Materials In Paintings (MIP): An interdisciplinary dataset for perception, art history, and computer vision

Authors: Mitchell J. P. van Zuijlen, Hubert Lin, Kavita Bala, Sylvia C. Pont, Maarten W. A. Wijntjes

Abstract: A painter is free to modify how components of a natural scene are depicted, which can lead to a perceptually convincing image of the distal world. This signals a major difference between photos and paintings: paintings are explicitly created for human perception. Studying these painterly depictions could be beneficial to a multidisciplinary audience. In this paper, we capture and explore the painterly depictions of materials to enable the study of depiction and perception of materials through the artists' eye. We annotated a dataset of 19k paintings with 200k+ bounding boxes from which polygon segments were automatically extracted. Each bounding box was assigned a coarse label (e.g., fabric) and a fine-grained label (e.g., velvety, silky). We demonstrate the cross-disciplinary utility of our dataset by presenting novel findings across art history, human perception, and computer vision. Our experiments include analyzing the distribution of materials depicted in paintings, showing how painters create convincing depictions using a stylized approach, and demonstrating how paintings can be used to build more robust computer vision models. We conclude that our dataset of painterly material depictions is a rich source for gaining insights into the depiction and perception of materials across multiple disciplines. The MIP dataset is freely accessible at https://materialsinpaintings.tudelft.nl

Submitted 10 December, 2020; v1 submitted 5 December, 2020; originally announced December 2020.

arXiv:2012.02897

Discovering Underground Maps from Fashion

Authors: Utkarsh Mall, Kavita Bala, Tamara Berg, Kristen Grauman

Abstract: The fashion sense in a geographical region, meaning the clothing styles people wear there, can reveal information about that region. For example, it can reflect the kinds of activities people do there, or the type of crowds that frequently visit the region (e.g., tourist hot spot, student neighborhood, business center). We propose a method to automatically create underground neighborhood maps of cities by analyzing how people dress. Using publicly available images from across a city, our method finds neighborhoods with a similar fashion sense and segments the map without supervision. For 37 cities worldwide, we show promising results in creating good underground maps, as evaluated using experiments with human judges and underground map benchmarks derived from non-image data. Our approach further allows detecting distinct neighborhoods (what is the most unique region of LA?) and answering analogy questions between cities (what is the "Downtown LA" of Bogota?).

Submitted 4 December, 2020; originally announced December 2020.

arXiv:2011.14477

What Can Style Transfer and Paintings Do For Model Robustness?

Authors: Hubert Lin, Mitchell van Zuijlen, Sylvia C. Pont, Maarten W. A. Wijntjes, Kavita Bala

Abstract: A common strategy for improving model robustness is data augmentation. Data augmentations encourage models to learn desired invariances, such as invariance to horizontal flips or small changes in color. Recent work has shown that arbitrary style transfer can be used as a form of data augmentation to encourage invariance to textures by creating painting-like images from photographs. However, a stylized photograph is not quite the same as an artist-created painting. Artists depict perceptually meaningful cues in paintings so that humans can recognize salient components in scenes, an emphasis which is not enforced in style transfer. Therefore, we study how style transfer and paintings differ in their impact on model robustness. First, we investigate the role of paintings as style images for stylization-based data augmentation. We find that style transfer functions well even without paintings as style images. Second, we show that learning from paintings as a form of perceptual data augmentation can improve model robustness. Finally, we investigate the invariances learned from stylization and from paintings, and show that models learn different invariances from these differing forms of data. Our results provide insights into how stylization improves model robustness, and provide evidence that artist-created paintings can be a valuable source of data for model robustness.

Submitted 27 May, 2021; v1 submitted 29 November, 2020; originally announced November 2020.

Comments: CVPR 2021

arXiv:2011.12276

Insights From A Large-Scale Database of Material Depictions In Paintings

Authors: Hubert Lin, Mitchell Van Zuijlen, Maarten W. A. Wijntjes, Sylvia C. Pont, Kavita Bala

Abstract: Deep learning has paved the way for strong recognition systems which are often both trained on and applied to natural images. In this paper, we examine the give-and-take relationship between such visual recognition systems and the rich information available in the fine arts. First, we find that visual recognition systems designed for natural images can work surprisingly well on paintings. In particular, we find that interactive segmentation tools can be used to cleanly annotate polygonal segments within paintings, a task which is time-consuming to undertake by hand. We also find that FasterRCNN, a model which has been designed for object recognition in natural scenes, can be quickly repurposed for detection of materials in paintings. Second, we show that learning from paintings can be beneficial for neural networks that are intended to be used on natural images. We find that training on paintings instead of natural images can improve the quality of learned features, and we further find that a large number of paintings can be a valuable source of test data for evaluating domain adaptation algorithms. Our experiments are based on a novel large-scale annotated database of material depictions in paintings which we detail in a separate manuscript.

Submitted 24 November, 2020; originally announced November 2020.

Comments: International Workshop on Fine Art Pattern Extraction and Recognition, ICPR 2020

arXiv:2003.03464

DeepSemanticHPPC: Hypothesis-based Planning over Uncertain Semantic Point Clouds

Authors: Yutao Han, Hubert Lin, Jacopo Banfi, Kavita Bala, Mark Campbell

Abstract: Planning in unstructured environments is challenging: it relies on sensing, perception, scene reconstruction, and reasoning about various uncertainties. We propose DeepSemanticHPPC, a novel uncertainty-aware hypothesis-based planner for unstructured environments. Our algorithmic pipeline consists of a deep Bayesian neural network which segments surfaces with uncertainty estimates; a flexible point cloud scene representation; a next-best-view planner which minimizes the uncertainty of scene semantics using sparse visual measurements; and a hypothesis-based path planner that proposes multiple kinematically feasible paths with evolving safety confidences given next-best-view measurements. Our pipeline iteratively decreases semantic uncertainty along planned paths, filtering out unsafe paths with high confidence. We show that our framework plans safe paths in real-world environments where existing path planners typically fail.

Submitted 6 March, 2020; originally announced March 2020.

Comments: Accepted by the IEEE International Conference on Robotics and Automation (ICRA) 2020. Video Link: https://youtu.be/_SVEZx5vbiQ. The first three authors contributed equally to this work

arXiv:2002.06626

Block Annotation: Better Image Annotation for Semantic Segmentation with Sub-Image Decomposition

Authors: Hubert Lin, Paul Upchurch, Kavita Bala

Abstract: Image datasets with high-quality pixel-level annotations are valuable for semantic segmentation: labelling every pixel in an image ensures that rare classes and small objects are annotated. However, full-image annotations are expensive, with experts spending up to 90 minutes per image. We propose block sub-image annotation as a replacement for full-image annotation. Despite the attention cost of frequent task switching, we find that block annotations can be crowdsourced at higher quality than full-image annotation at equal monetary cost, using existing annotation tools developed for full-image annotation. Surprisingly, we find that annotating 50% of pixels with blocks allows semantic segmentation to achieve performance equivalent to annotating 100% of pixels. Furthermore, annotating as little as 12% of pixels yields up to 98% of the performance achieved with dense annotation. In weakly supervised settings, block annotation outperforms existing methods by 3-4% (absolute) given equivalent annotation time. To recover the necessary global structure for applications such as characterizing spatial context and affordance relationships, we propose an effective method to inpaint block-annotated images with high-quality labels without additional human effort. As a result, these applications also require fewer annotations than full-image annotation.

Submitted 16 February, 2020; originally announced February 2020.

Comments: ICCV 2019; http://www.cs.cornell.edu/~hubert/block_annotation/
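
The Block Annotation abstract above compares annotating only selected sub-image blocks against dense full-image annotation, at budgets such as 50% or 12% of pixels. A minimal sketch of the bookkeeping follows, assuming the image is split into a regular grid and a random subset of blocks is chosen to reach a target fraction of annotated pixels. The grid layout, random selection, and the names `block_mask` and `annotated_fraction` are hypothetical and are not the paper's block-selection protocol.

```python
import random


def block_mask(height, width, grid_rows, grid_cols, fraction, seed=0):
    """Boolean mask (list of rows) marking pixels inside randomly chosen blocks."""
    rng = random.Random(seed)
    blocks = [(r, c) for r in range(grid_rows) for c in range(grid_cols)]
    k = round(fraction * len(blocks))  # number of blocks to annotate
    chosen = set(rng.sample(blocks, k))
    mask = []
    for y in range(height):
        r = y * grid_rows // height  # block row containing pixel row y
        row = []
        for x in range(width):
            c = x * grid_cols // width  # block column containing pixel column x
            row.append((r, c) in chosen)
        mask.append(row)
    return mask


def annotated_fraction(mask):
    """Fraction of pixels covered by annotated blocks."""
    total = sum(len(row) for row in mask)
    return sum(sum(row) for row in mask) / total
```

With a 100 x 100 image and a 5 x 5 grid, a 12% budget selects 3 of the 25 blocks, so exactly 12% of pixels fall inside annotated blocks. When the image size is not divisible by the grid, blocks differ slightly in size and the covered fraction only approximates the target.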

arXiv:1908.11412

GeoStyle: Discovering Fashion Trends and Events

Authors: Utkarsh Mall, Kevin Matzen, Bharath Hariharan, Noah Snavely, Kavita Bala

Abstract: Understanding fashion styles and trends is of great potential interest to retailers and consumers alike. The photos people upload to social media are a historical and public data source of how people dress across the world and at different times. While we now have tools to automatically recognize the clothing and style attributes of what people are wearing in these photographs, we lack the ability to analyze spatial and temporal trends in these attributes or make predictions about the future. In this paper, we address this need by providing an automatic framework that analyzes large corpora of street imagery to (a) discover and forecast long-term trends of various fashion attributes as well as automatically discovered styles, and (b) identify spatio-temporally localized events that affect what people wear. We show that our framework makes long-term trend forecasts that are more than 20% more accurate than the prior art, and identifies hundreds of socially meaningful events that impact fashion across the globe.

Submitted 29 August, 2019; originally announced August 2019.

Comments: Accepted in ICCV 2019

arXiv:1810.08729

doi 10.1109/3DV.2018.00027

Learning Material-Aware Local Descriptors for 3D Shapes

Authors: Hubert Lin, Melinos Averkiou, Evangelos Kalogerakis, Balazs Kovacs, Siddhant Ranade, Vladimir G. Kim, Siddhartha Chaudhuri, Kavita Bala

Abstract: Material understanding is critical for design, geometric modeling, and analysis of functional objects. We enable material-aware 3D shape analysis by employing a projective convolutional neural network architecture to learn material-aware descriptors from view-based representations of 3D points for point-wise material classification or material-aware retrieval. Unfortunately, only a small fraction of shapes in 3D repositories are labeled with physical materials, posing a challenge for learning methods. To address this challenge, we crowdsource a dataset of 3080 3D shapes with part-wise material labels. We focus on furniture models which exhibit interesting structure and material variability. In addition, we contribute a high-quality expert-labeled benchmark of 115 shapes from Herman-Miller and IKEA for evaluation. We further apply a mesh-aware conditional random field, which incorporates rotational and reflective symmetries, to smooth our local material predictions across neighboring surface patches. We demonstrate the effectiveness of our learned descriptors for automatic texturing, material-aware retrieval, and physical simulation. The dataset and code will be publicly available.

Submitted 19 October, 2018; originally announced October 2018.

Comments: 3DV 2018

arXiv:1809.10820

Inverse Transport Networks

Authors: Chengqian Che, Fujun Luan, Shuang Zhao, Kavita Bala, Ioannis Gkioulekas

Abstract: We introduce inverse transport networks as a learning architecture for inverse rendering problems where, given input image measurements, we seek to infer physical scene parameters such as shape, material, and illumination. During training, these networks are evaluated not only in terms of how closely they predict ground-truth parameters, but also in terms of whether the parameters they produce can be used, together with physically accurate graphics renderers, to reproduce the input image measurements. To enable training of inverse transport networks using stochastic gradient descent, we additionally create a general-purpose, physically accurate differentiable renderer, which can be used to estimate derivatives of images with respect to arbitrary physical scene parameters. Our experiments demonstrate that inverse transport networks can be trained efficiently using differentiable rendering, and that they generalize to scenes with completely unseen geometry and illumination better than networks trained without appearance-matching regularization.

Submitted 27 September, 2018; originally announced September 2018.
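
The abstract above evaluates predicted scene parameters both against ground truth and by whether re-rendering them reproduces the input measurements. A minimal sketch of such a combined objective follows, assuming squared-error terms, a hypothetical weight `gamma`, and a toy stand-in renderer (Lambertian shading of one pixel per surface normal, parameterized by albedo and light intensity) in place of the physically accurate differentiable renderer the paper builds. Gradients are omitted, and all names are illustrative.

```python
def toy_render(params, normals, light_dir):
    """Hypothetical stand-in renderer: Lambertian shading, one pixel per normal."""
    albedo, intensity = params
    return [albedo * intensity * max(0.0, sum(n * l for n, l in zip(normal, light_dir)))
            for normal in normals]


def inverse_transport_loss(pred_params, gt_params, observed, normals, light_dir, gamma=1.0):
    """Parameter loss plus gamma-weighted appearance-matching (re-rendering) loss."""
    param_loss = sum((p - g) ** 2 for p, g in zip(pred_params, gt_params))
    rendered = toy_render(pred_params, normals, light_dir)
    appearance_loss = sum((r - o) ** 2 for r, o in zip(rendered, observed)) / len(observed)
    return param_loss + gamma * appearance_loss
```

In this toy setting, predicting albedo 1.0 and intensity 1.0 instead of the ground-truth 0.5 and 2.0 reproduces the observed pixels exactly, so only the parameter term penalizes it. This illustrates why the two terms carry complementary information, without implying anything about the paper's specific loss.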

arXiv:1804.03189

Deep Painterly Harmonization

Authors: Fujun Luan, Sylvain Paris, Eli Shechtman, Kavita Bala

Abstract: Copying an element from a photo and pasting it into a painting is a challenging task. Applying photo compositing techniques in this context yields subpar results that look like a collage, and existing painterly stylization algorithms, which are global, perform poorly when applied locally. We address these issues with a dedicated algorithm that carefully determines the local statistics to be transferred. We ensure both spatial and inter-scale statistical consistency and demonstrate that both aspects are key to generating high-quality results. To cope with the diversity of abstraction levels and types of paintings, we introduce a technique to adjust the parameters of the transfer depending on the painting. We show that our algorithm produces significantly better results than photo compositing or global stylization techniques and that it enables creative painterly edits that would otherwise be difficult to achieve.

Submitted 26 June, 2018; v1 submitted 9 April, 2018; originally announced April 2018.

arXiv:1706.01869

StreetStyle: Exploring world-wide clothing styles from millions of photos

Authors: Kevin Matzen, Kavita Bala, Noah Snavely

Abstract: Each day billions of photographs are uploaded to photo-sharing services and social media platforms. These images are packed with information about how people live around the world. In this paper we exploit this rich trove of data to understand fashion and style trends worldwide. We present a framework for visual discovery at scale, analyzing clothing and fashion across millions of images of people around the world and spanning several years. We introduce a large-scale dataset of photos of people annotated with clothing attributes, and use this dataset to train attribute classifiers via deep learning. We also present a method for discovering visually consistent style clusters that capture useful visual correlations in this massive dataset. Using these tools, we analyze millions of photos to derive visual insight, producing a first-of-its-kind analysis of global and per-city fashion choices and spatio-temporal trends.

Submitted 6 June, 2017; originally announced June 2017.

arXiv:1705.01156

Shading Annotations in the Wild

Authors: Balazs Kovacs, Sean Bell, Noah Snavely, Kavita Bala

Abstract: Understanding shading effects in images is critical for a variety of vision and graphics problems, including intrinsic image decomposition, shadow removal, image relighting, and inverse rendering. As is the case with other vision tasks, machine learning is a promising approach to understanding shading, but little ground-truth shading data is available for real-world images. We introduce Shading Annotations in the Wild (SAW), a new large-scale, public dataset of shading annotations in indoor scenes, consisting of multiple forms of shading judgments obtained via crowdsourcing, along with shading annotations automatically generated from RGB-D imagery. We use this data to train a convolutional neural network to predict per-pixel shading information in an image. We demonstrate the value of our data and network in an application to intrinsic images, where we can reduce decomposition artifacts produced by existing algorithms. Our database is available at http://opensurfaces.cs.cornell.edu/saw/.

Submitted 2 May, 2017; originally announced May 2017.

Comments: CVPR 2017

arXiv:1703.07511

Deep Photo Style Transfer

Authors: Fujun Luan, Sylvain Paris, Eli Shechtman, Kavita Bala

Abstract: This paper introduces a deep-learning approach to photographic style transfer that handles a large variety of image content while faithfully transferring the reference style. Our approach builds upon the recent work on painterly transfer that separates style from the content of an image by considering different layers of a neural network. However, as is, this approach is not suitable for photorealistic style transfer. Even when both the input and reference images are photographs, the output still exhibits distortions reminiscent of a painting. Our contribution is to constrain the transformation from the input to the output to be locally affine in colorspace, and to express this constraint as a custom fully differentiable energy term. We show that this approach successfully suppresses distortion and yields satisfying photorealistic style transfers in a broad variety of scenarios, including transfer of the time of day, weather, season, and artistic edits.

Submitted 10 April, 2017; v1 submitted 22 March, 2017; originally announced March 2017.

arXiv:1611.05507

Deep Feature Interpolation for Image Content Changes

Authors: Paul Upchurch, Jacob Gardner, Geoff Pleiss, Robert Pless, Noah Snavely, Kavita Bala, Kilian Weinberger

Abstract: We propose Deep Feature Interpolation (DFI), a new data-driven baseline for automatic high-resolution image transformation. As the name suggests, it relies only on simple linear interpolation of deep convolutional features from pre-trained convnets. We show that despite its simplicity, DFI can perform high-level semantic transformations like "make older/younger", "make bespectacled", "add smile", among others, surprisingly well, sometimes even matching or outperforming the state of the art. This is particularly unexpected as DFI requires no specialized network architecture or even any deep network to be trained for these tasks. DFI therefore can be used as a new baseline to evaluate more complex algorithms and provides a practical answer to the question of which image transformation tasks are still challenging with the rise of deep learning.

Submitted 19 June, 2017; v1 submitted 16 November, 2016; originally announced November 2016.

Comments: First two authors contributed equally. Accepted by CVPR 2017. Code at https://github.com/paulu/deepfeatinterp
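As a rough illustration of "simple linear interpolation of deep convolutional features", the sketch below assumes one common attribute-vector reading: the attribute direction is the difference between the mean features of examples with the target attribute and examples with the source attribute, and the test image's features are shifted along that direction by a hypothetical strength `alpha`. Feature vectors are plain Python lists; extracting them from a pretrained convnet and mapping the shifted features back to an image are outside this sketch, and the repository linked above remains the authoritative implementation.

```python
from typing import List, Sequence


def mean_vector(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Element-wise mean of a non-empty list of equal-length vectors."""
    if not vectors:
        raise ValueError("need at least one vector")
    n, dim = len(vectors), len(vectors[0])
    return [sum(v[i] for v in vectors) / n for i in range(dim)]


def interpolate_features(
    phi_x: Sequence[float],
    source_feats: Sequence[Sequence[float]],
    target_feats: Sequence[Sequence[float]],
    alpha: float,
) -> List[float]:
    """Shift phi_x along the (assumed) mean-difference attribute direction."""
    source_mean = mean_vector(source_feats)
    target_mean = mean_vector(target_feats)
    # Attribute direction: from the source-attribute mean to the target-attribute mean.
    direction = [t - s for t, s in zip(target_mean, source_mean)]
    # Linear step of size alpha along that direction in feature space.
    return [f + alpha * d for f, d in zip(phi_x, direction)]
```

For example, with source features `[[0, 0], [2, 0]]` (mean `[1, 0]`) and target features `[[1, 2], [3, 2]]` (mean `[2, 2]`), the direction is `[1, 2]`, so `alpha=0.5` moves `[0, 0]` to `[0.5, 1.0]`.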

arXiv:1603.02003

From A to Z: Supervised Transfer of Style and Content Using Deep Neural Network Generators

Authors: Paul Upchurch, Noah Snavely, Kavita Bala

Abstract: We propose a new neural network architecture for solving single-image analogies: the generation of an entire set of stylistically similar images from just a single input image. Solving this problem requires separating image style from content. Our network is a modified variational autoencoder (VAE) that supports supervised training of single-image analogies and in-network evaluation of outputs with a structured similarity objective that captures pixel covariances. On the challenging task of generating a 62-letter font from a single example letter, we produce images with 22.4% lower dissimilarity to the ground truth than the state of the art.

Submitted 7 March, 2016; originally announced March 2016.

arXiv:1512.04143

Inside-Outside Net: Detecting Objects in Context with Skip Pooling and Recurrent Neural Networks

Authors: Sean Bell, C. Lawrence Zitnick, Kavita Bala, Ross Girshick

Abstract: It is well known that contextual and multi-scale representations are important for accurate visual recognition. In this paper we present the Inside-Outside Net (ION), an object detector that exploits information both inside and outside the region of interest. Contextual information outside the region of interest is integrated using spatial recurrent neural networks. Inside, we use skip pooling to extract information at multiple scales and levels of abstraction. Through extensive experiments we evaluate the design space and provide readers with an overview of which tricks of the trade are important. ION improves the state of the art on PASCAL VOC 2012 object detection from 73.9% to 76.4% mAP. On the new and more challenging MS COCO dataset, we improve the state of the art from 19.7% to 33.1% mAP. In the 2015 MS COCO Detection Challenge, our ION model won the Best Student Entry and finished 3rd place overall. As intuition suggests, our detection results provide strong evidence that context and multi-scale representations improve small object detection.

Submitted 13 December, 2015; originally announced December 2015.

arXiv:1511.06421

Deep Manifold Traversal: Changing Labels with Convolutional Features

Authors: Jacob R. Gardner, Paul Upchurch, Matt J. Kusner, Yixuan Li, Kilian Q. Weinberger, Kavita Bala, John E. Hopcroft

Abstract: Many tasks in computer vision can be cast as a "label changing" problem, where the goal is to make a semantic change to the appearance of an image or some subject in an image in order to alter the class membership. Although successful task-specific methods have been developed for some label changing applications, to date no general-purpose method exists. Motivated by this, we propose deep manifold traversal, a method that addresses the problem in its most general form: it first approximates the manifold of natural images and then morphs a test image along a traversal path away from a source class and towards a target class while staying near the manifold throughout. The resulting algorithm is surprisingly effective and versatile. It is completely data driven, requiring only an example set of images from the desired source and target domains. We demonstrate deep manifold traversal on highly diverse label changing tasks: changing an individual's appearance (age and hair color), changing the season of an outdoor image, and transforming a city skyline towards nighttime.

Submitted 17 March, 2016; v1 submitted 19 November, 2015; originally announced November 2015.

arXiv:1509.07473

Learning Visual Clothing Style with Heterogeneous Dyadic Co-occurrences

Authors: Andreas Veit, Balazs Kovacs, Sean Bell, Julian McAuley, Kavita Bala, Serge Belongie

Abstract: With the rapid proliferation of smart mobile devices, users now take millions of photos every day. These include large numbers of clothing and accessory images. We would like to answer questions like "What outfit goes well with this pair of shoes?" To answer these types of questions, one has to go beyond learning visual similarity and learn a visual notion of compatibility across categories. In this paper, we propose a novel learning framework to help answer these types of questions. The main idea of this framework is to learn a feature transformation from images of items into a latent space that expresses compatibility. For the feature transformation, we use a Siamese Convolutional Neural Network (CNN) architecture, where training examples are pairs of items that are either compatible or incompatible. We model compatibility based on co-occurrence in large-scale user behavior data, in particular co-purchase data from Amazon.com. To learn cross-category fit, we introduce a strategic method to sample training data, where pairs of items are heterogeneous dyads, i.e., the two elements of a pair belong to different high-level categories. While this approach is applicable to a wide variety of settings, we focus on the representative problem of learning compatible clothing style. Our results indicate that the proposed framework is capable of learning semantic information about visual style and is able to generate outfits of clothes, with items from different categories, that go well together.

Submitted 24 September, 2015; originally announced September 2015.

Comments: ICCV 2015
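The abstract above does not state which loss trains the Siamese network on compatible and incompatible pairs. A standard contrastive loss is one plausible choice for that setup, and the sketch below uses it purely as an assumption: embeddings of compatible pairs are pulled together, while incompatible pairs are pushed apart until their Euclidean distance exceeds a hypothetical `margin`. The embeddings are plain lists standing in for the outputs of the two weight-shared CNN branches.

```python
import math
from typing import Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length embeddings."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def contrastive_loss(
    emb_a: Sequence[float],
    emb_b: Sequence[float],
    compatible: bool,
    margin: float = 1.0,  # hypothetical margin for incompatible pairs
) -> float:
    """Assumed contrastive loss for one pair from the two Siamese branches."""
    d = euclidean(emb_a, emb_b)
    if compatible:
        # Compatible (e.g. co-purchased) items: penalise any distance.
        return 0.5 * d ** 2
    # Incompatible items: penalise only if closer than the margin.
    return 0.5 * max(0.0, margin - d) ** 2
```

Under this assumption, a compatible pair at distance 5 incurs a loss of 12.5, an incompatible pair at distance 5 incurs none, and an incompatible pair at distance 0.5 incurs 0.125.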

arXiv:1412.0623

Material Recognition in the Wild with the Materials in Context Database

Authors: Sean Bell, Paul Upchurch, Noah Snavely, Kavita Bala

Abstract: Recognizing materials in real-world images is a challenging task. Real-world materials have rich surface texture, geometry, lighting conditions, and clutter, which combine to make the problem particularly difficult. In this paper, we introduce a new, large-scale, open dataset of materials in the wild, the Materials in Context Database (MINC), and combine this dataset with deep learning to achieve material recognition and segmentation of images in the wild. MINC is an order of magnitude larger than previous material databases, while being more diverse and well-sampled across its 23 categories. Using MINC, we train convolutional neural networks (CNNs) for two tasks: classifying materials from patches, and simultaneous material recognition and segmentation in full images. For patch-based classification on MINC, we found that the best-performing CNN architectures can achieve 85.2% mean class accuracy. We convert these trained CNN classifiers into an efficient fully convolutional framework combined with a fully connected conditional random field (CRF) to predict the material at every pixel in an image, achieving 73.1% mean class accuracy. Our experiments demonstrate that having a large, well-sampled dataset such as MINC is crucial for real-world material recognition and segmentation.

Submitted 14 April, 2015; v1 submitted 1 December, 2014; originally announced December 2014.

Comments: CVPR 2015. Sean Bell and Paul Upchurch contributed equally.
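The MINC results above are reported as mean class accuracy, which averages accuracy within each class so that rare material categories count as much as common ones. A minimal Python sketch of that metric follows; the label names in the example are hypothetical, and MINC's own evaluation code may differ in details such as how classes absent from the test set are handled (here they are simply not counted).

```python
from collections import defaultdict
from typing import Dict, Hashable, Sequence


def mean_class_accuracy(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Average of per-class accuracy over the classes present in y_true."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    total: Dict[Hashable, int] = defaultdict(int)
    correct: Dict[Hashable, int] = defaultdict(int)
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1
    # Each class contributes equally, regardless of how many samples it has.
    return sum(correct[c] / total[c] for c in total) / len(total)
```

For four "wood" patches and one "glass" patch, all predicted as "wood", plain accuracy would be 0.8, but mean class accuracy is 0.5 because the glass class scores 0.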

arXiv:1107.3671

Impact of Mobility On QoS of Mobile WiMax Network With CBR Application

Authors: Kranti Bala, Kiran Ahuja

Abstract: Mobility is an important issue in wireless networks because Internet connectivity is only effective if it remains available while a node is moving. To support mobility, wireless access systems such as IEEE 802.16e are designed to operate on the move without any disruption of service. In this paper, we analyze the impact of mobility on the QoS parameters (throughput, average jitter, and average end-to-end delay) of a mobile WiMAX network (IEEE 802.16e) with a constant bit rate (CBR) application.

Submitted 19 July, 2011; originally announced July 2011.

Comments: 7 pages, 5 figures, 1 table

Journal ref: International Journal of Advancements in Technology, Vol. 2, No. 3, July 2011
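The three QoS parameters named in the WiMAX abstract can be computed from per-packet timestamps. The sketch below rests on assumed definitions, since network simulators differ: average end-to-end delay is the mean of receive time minus send time, average jitter is the mean absolute difference between consecutive packet delays, and throughput is received bits divided by the span from the first send to the last receive. The `Packet` type and `qos_metrics` function are hypothetical and not taken from the paper.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass
class Packet:
    sent_at: float      # send timestamp, seconds
    received_at: float  # receive timestamp, seconds
    size_bytes: int     # payload size; constant for a CBR flow


def qos_metrics(packets: Sequence[Packet]) -> Tuple[float, float, float]:
    """Return (throughput in bit/s, average jitter in s, average delay in s)."""
    if not packets:
        raise ValueError("need at least one received packet")
    ordered = sorted(packets, key=lambda p: p.sent_at)
    delays = [p.received_at - p.sent_at for p in ordered]
    avg_delay = sum(delays) / len(delays)
    # Assumed jitter: mean absolute change in delay between consecutive packets.
    if len(delays) > 1:
        diffs = [abs(b - a) for a, b in zip(delays, delays[1:])]
        avg_jitter = sum(diffs) / len(diffs)
    else:
        avg_jitter = 0.0
    duration = max(p.received_at for p in ordered) - ordered[0].sent_at
    throughput_bps = sum(p.size_bytes * 8 for p in ordered) / duration
    return throughput_bps, avg_jitter, avg_delay
```

For a CBR flow of three 512-byte packets sent at 0, 1, and 2 s and received at 0.25, 1.5, and 2.25 s, the delays are 0.25, 0.5, and 0.25 s, giving an average delay of about 0.333 s, an average jitter of 0.25 s, and a throughput of 12288 bits / 2.25 s, or roughly 5461 bit/s.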

Showing 1–28 of 28 results for author: Bala, K