Search | arXiv e-print repository

Enhancing Pollinator Conservation towards Agriculture 4.0: Monitoring of Bees through Object Recognition

Authors: Ajay John Alex, Chloe M. Barnes, Pedro Machado, Isibor Ihianle, Gábor Markó, Martin Bencsik, Jordan J. Bird

Abstract: In an era of rapid climate change and its adverse effects on food production, technological intervention to monitor pollinator conservation is of paramount importance for environmental monitoring and conservation for global food security. The survival of the human species depends on the conservation of pollinators. This article explores the use of Computer Vision and Object Recognition to autonomo… ▽ More In an era of rapid climate change and its adverse effects on food production, technological intervention to monitor pollinator conservation is of paramount importance for environmental monitoring and conservation for global food security. The survival of the human species depends on the conservation of pollinators. This article explores the use of Computer Vision and Object Recognition to autonomously track and report bee behaviour from images. A novel dataset of 9664 images containing bees is extracted from video streams and annotated with bounding boxes. With training, validation and testing sets (6722, 1915, and 997 images, respectively), the results of the COCO-based YOLO model fine-tuning approaches show that YOLOv5m is the most effective approach in terms of recognition accuracy. However, YOLOv5s was shown to be the most optimal for real-time bee detection with an average processing and inference time of 5.1ms per video frame at the cost of slightly lower ability. The trained model is then packaged within an explainable AI interface, which converts detection events into timestamped reports and charts, with the aim of facilitating use by non-technical users such as expert stakeholders from the apiculture industry towards informing responsible consumption and production. △ Less

Submitted 24 May, 2024; originally announced May 2024.

arXiv:2405.05967 [pdf, other]

Distilling Diffusion Models into Conditional GANs

Authors: Minguk Kang, Richard Zhang, Connelly Barnes, Sylvain Paris, Suha Kwak, Jaesik Park, Eli Shechtman, Jun-Yan Zhu, Taesung Park

Abstract: We propose a method to distill a complex multistep diffusion model into a single-step conditional GAN student model, dramatically accelerating inference, while preserving image quality. Our approach interprets diffusion distillation as a paired image-to-image translation task, using noise-to-image pairs of the diffusion model's ODE trajectory. For efficient regression loss computation, we propose… ▽ More We propose a method to distill a complex multistep diffusion model into a single-step conditional GAN student model, dramatically accelerating inference, while preserving image quality. Our approach interprets diffusion distillation as a paired image-to-image translation task, using noise-to-image pairs of the diffusion model's ODE trajectory. For efficient regression loss computation, we propose E-LatentLPIPS, a perceptual loss operating directly in diffusion model's latent space, utilizing an ensemble of augmentations. Furthermore, we adapt a diffusion model to construct a multi-scale discriminator with a text alignment loss to build an effective conditional GAN-based formulation. E-LatentLPIPS converges more efficiently than many existing distillation methods, even accounting for dataset construction costs. We demonstrate that our one-step generator outperforms cutting-edge one-step diffusion distillation models -- DMD, SDXL-Turbo, and SDXL-Lightning -- on the zero-shot COCO benchmark. △ Less

Submitted 13 June, 2024; v1 submitted 9 May, 2024; originally announced May 2024.

Comments: Project page: https://mingukkang.github.io/Diffusion2GAN/

arXiv:2310.05590 [pdf, other]

Perceptual Artifacts Localization for Image Synthesis Tasks

Authors: Lingzhi Zhang, Zhengjie Xu, Connelly Barnes, Yuqian Zhou, Qing Liu, He Zhang, Sohrab Amirghodsi, Zhe Lin, Eli Shechtman, Jianbo Shi

Abstract: Recent advancements in deep generative models have facilitated the creation of photo-realistic images across various tasks. However, these generated images often exhibit perceptual artifacts in specific regions, necessitating manual correction. In this study, we present a comprehensive empirical examination of Perceptual Artifacts Localization (PAL) spanning diverse image synthesis endeavors. We i… ▽ More Recent advancements in deep generative models have facilitated the creation of photo-realistic images across various tasks. However, these generated images often exhibit perceptual artifacts in specific regions, necessitating manual correction. In this study, we present a comprehensive empirical examination of Perceptual Artifacts Localization (PAL) spanning diverse image synthesis endeavors. We introduce a novel dataset comprising 10,168 generated images, each annotated with per-pixel perceptual artifact labels across ten synthesis tasks. A segmentation model, trained on our proposed dataset, effectively localizes artifacts across a range of tasks. Additionally, we illustrate its proficiency in adapting to previously unseen models using minimal training samples. We further propose an innovative zoom-in inpainting pipeline that seamlessly rectifies perceptual artifacts in the generated images. Through our experimental analyses, we elucidate several practical downstream applications, such as automated artifact rectification, non-referential image quality evaluation, and abnormal region detection in images. The dataset and code are released. △ Less

Submitted 9 October, 2023; originally announced October 2023.

arXiv:2305.17624 [pdf, other]

SimpSON: Simplifying Photo Cleanup with Single-Click Distracting Object Segmentation Network

Authors: Chuong Huynh, Yuqian Zhou, Zhe Lin, Connelly Barnes, Eli Shechtman, Sohrab Amirghodsi, Abhinav Shrivastava

Abstract: In photo editing, it is common practice to remove visual distractions to improve the overall image quality and highlight the primary subject. However, manually selecting and removing these small and dense distracting regions can be a laborious and time-consuming task. In this paper, we propose an interactive distractor selection method that is optimized to achieve the task with just a single click… ▽ More In photo editing, it is common practice to remove visual distractions to improve the overall image quality and highlight the primary subject. However, manually selecting and removing these small and dense distracting regions can be a laborious and time-consuming task. In this paper, we propose an interactive distractor selection method that is optimized to achieve the task with just a single click. Our method surpasses the precision and recall achieved by the traditional method of running panoptic segmentation and then selecting the segments containing the clicks. We also showcase how a transformer-based module can be used to identify more distracting regions similar to the user's click position. Our experiments demonstrate that the model can effectively and accurately segment unknown distracting objects interactively and in groups. By significantly simplifying the photo cleaning and retouching process, our proposed model provides inspiration for exploring rare object segmentation and group selection with a single click. △ Less

Submitted 28 May, 2023; originally announced May 2023.

Comments: CVPR 2023. Project link: https://simpson-cvpr23.github.io

arXiv:2304.00221 [pdf, other]

Automatic High Resolution Wire Segmentation and Removal

Authors: Mang Tik Chiu, Xuaner Zhang, Zijun Wei, Yuqian Zhou, Eli Shechtman, Connelly Barnes, Zhe Lin, Florian Kainz, Sohrab Amirghodsi, Humphrey Shi

Abstract: Wires and powerlines are common visual distractions that often undermine the aesthetics of photographs. The manual process of precisely segmenting and removing them is extremely tedious and may take up hours, especially on high-resolution photos where wires may span the entire space. In this paper, we present an automatic wire clean-up system that eases the process of wire segmentation and removal… ▽ More Wires and powerlines are common visual distractions that often undermine the aesthetics of photographs. The manual process of precisely segmenting and removing them is extremely tedious and may take up hours, especially on high-resolution photos where wires may span the entire space. In this paper, we present an automatic wire clean-up system that eases the process of wire segmentation and removal/inpainting to within a few seconds. We observe several unique challenges: wires are thin, lengthy, and sparse. These are rare properties of subjects that common segmentation tasks cannot handle, especially in high-resolution images. We thus propose a two-stage method that leverages both global and local contexts to accurately segment wires in high-resolution images efficiently, and a tile-based inpainting strategy to remove the wires given our predicted segmentation masks. We also introduce the first wire segmentation benchmark dataset, WireSegHR. Finally, we demonstrate quantitatively and qualitatively that our wire clean-up system enables fully automated wire removal with great generalization to various wire appearances. △ Less

Submitted 1 April, 2023; originally announced April 2023.

Comments: https://github.com/adobe-research/auto-wire-removal

arXiv:2212.06310 [pdf, other]

Structure-Guided Image Completion with Image-level and Object-level Semantic Discriminators

Authors: Haitian Zheng, Zhe Lin, **gwan Lu, Scott Cohen, Eli Shechtman, Connelly Barnes, Jianming Zhang, Qing Liu, Yuqian Zhou, Sohrab Amirghodsi, Jiebo Luo

Abstract: Structure-guided image completion aims to inpaint a local region of an image according to an input guidance map from users. While such a task enables many practical applications for interactive editing, existing methods often struggle to hallucinate realistic object instances in complex natural scenes. Such a limitation is partially due to the lack of semantic-level constraints inside the hole reg… ▽ More Structure-guided image completion aims to inpaint a local region of an image according to an input guidance map from users. While such a task enables many practical applications for interactive editing, existing methods often struggle to hallucinate realistic object instances in complex natural scenes. Such a limitation is partially due to the lack of semantic-level constraints inside the hole region as well as the lack of a mechanism to enforce realistic object generation. In this work, we propose a learning paradigm that consists of semantic discriminators and object-level discriminators for improving the generation of complex semantics and objects. Specifically, the semantic discriminators leverage pretrained visual features to improve the realism of the generated visual concepts. Moreover, the object-level discriminators take aligned instances as inputs to enforce the realism of individual objects. Our proposed scheme significantly improves the generation quality and achieves state-of-the-art results on various tasks, including segmentation-guided completion, edge-guided manipulation and panoptically-guided manipulation on Places2 datasets. Furthermore, our trained model is flexible and can support multiple editing use cases, such as object insertion, replacement, removal and standard inpainting. In particular, our trained model combined with a novel automatic image completion pipeline achieves state-of-the-art results on the standard inpainting task. △ Less

Submitted 23 April, 2024; v1 submitted 12 December, 2022; originally announced December 2022.

Comments: 18 pages, 16 figures

arXiv:2208.03552 [pdf, other]

Inpainting at Modern Camera Resolution by Guided PatchMatch with Auto-Curation

Authors: Lingzhi Zhang, Connelly Barnes, Kevin Wampler, Sohrab Amirghodsi, Eli Shechtman, Zhe Lin, Jianbo Shi

Abstract: Recently, deep models have established SOTA performance for low-resolution image inpainting, but they lack fidelity at resolutions associated with modern cameras such as 4K or more, and for large holes. We contribute an inpainting benchmark dataset of photos at 4K and above representative of modern sensors. We demonstrate a novel framework that combines deep learning and traditional methods. We us… ▽ More Recently, deep models have established SOTA performance for low-resolution image inpainting, but they lack fidelity at resolutions associated with modern cameras such as 4K or more, and for large holes. We contribute an inpainting benchmark dataset of photos at 4K and above representative of modern sensors. We demonstrate a novel framework that combines deep learning and traditional methods. We use an existing deep inpainting model LaMa to fill the hole plausibly, establish three guide images consisting of structure, segmentation, depth, and apply a multiply-guided PatchMatch to produce eight candidate upsampled inpainted images. Next, we feed all candidate inpaintings through a novel curation module that chooses a good inpainting by column summation on an 8x8 antisymmetric pairwise preference matrix. Our framework's results are overwhelmingly preferred by users over 8 strong baselines, with improvements of quantitative metrics up to 7.4 over the best baseline LaMa, and our technique when paired with 4 different SOTA inpainting backbones improves each such that ours is overwhelmingly preferred by users over a strong super-res baseline. △ Less

Submitted 6 August, 2022; originally announced August 2022.

Comments: 34 pages, 15 figures, ECCV 2022

arXiv:2208.03357 [pdf, other]

Perceptual Artifacts Localization for Inpainting

Authors: Lingzhi Zhang, Yuqian Zhou, Connelly Barnes, Sohrab Amirghodsi, Zhe Lin, Eli Shechtman, Jianbo Shi

Abstract: Image inpainting is an essential task for multiple practical applications like object removal and image editing. Deep GAN-based models greatly improve the inpainting performance in structures and textures within the hole, but might also generate unexpected artifacts like broken structures or color blobs. Users perceive these artifacts to judge the effectiveness of inpainting models, and retouch th… ▽ More Image inpainting is an essential task for multiple practical applications like object removal and image editing. Deep GAN-based models greatly improve the inpainting performance in structures and textures within the hole, but might also generate unexpected artifacts like broken structures or color blobs. Users perceive these artifacts to judge the effectiveness of inpainting models, and retouch these imperfect areas to inpaint again in a typical retouching workflow. Inspired by this workflow, we propose a new learning task of automatic segmentation of inpainting perceptual artifacts, and apply the model for inpainting model evaluation and iterative refinement. Specifically, we first construct a new inpainting artifacts dataset by manually annotating perceptual artifacts in the results of state-of-the-art inpainting models. Then we train advanced segmentation networks on this dataset to reliably localize inpainting artifacts within inpainted images. Second, we propose a new interpretable evaluation metric called Perceptual Artifact Ratio (PAR), which is the ratio of objectionable inpainted regions to the entire inpainted area. PAR demonstrates a strong correlation with real user preference. Finally, we further apply the generated masks for iterative image inpainting by combining our approach with multiple recent inpainting methods. Extensive experiments demonstrate the consistent decrease of artifact regions and inpainting quality improvement across the different methods. △ Less

Submitted 5 August, 2022; originally announced August 2022.

arXiv:2208.00776 [pdf, other]

Deep 360$^\circ$ Optical Flow Estimation Based on Multi-Projection Fusion

Authors: Yiheng Li, Connelly Barnes, Kun Huang, Fang-Lue Zhang

Abstract: Optical flow computation is essential in the early stages of the video processing pipeline. This paper focuses on a less explored problem in this area, the 360$^\circ$ optical flow estimation using deep neural networks to support increasingly popular VR applications. To address the distortions of panoramic representations when applying convolutional neural networks, we propose a novel multi-projec… ▽ More Optical flow computation is essential in the early stages of the video processing pipeline. This paper focuses on a less explored problem in this area, the 360$^\circ$ optical flow estimation using deep neural networks to support increasingly popular VR applications. To address the distortions of panoramic representations when applying convolutional neural networks, we propose a novel multi-projection fusion framework that fuses the optical flow predicted by the models trained using different projection methods. It learns to combine the complementary information in the optical flow results under different projections. We also build the first large-scale panoramic optical flow dataset to support the training of neural networks and the evaluation of panoramic optical flow estimation methods. The experimental results on our dataset demonstrate that our method outperforms the existing methods and other alternative deep networks that were developed for processing 360° content. △ Less

Submitted 27 July, 2022; originally announced August 2022.

Comments: Accepted to ECCV2022

arXiv:2206.04271 [pdf, other]

DeepVerge: Classification of Roadside Verge Biodiversity and Conservation Potential

Authors: Andrew Perrett, Charlie Barnes, Mark Schofield, Lan Qie, Petra Bosilj, James M. Brown

Abstract: Open space grassland is being increasingly farmed or built upon, leading to a ram** up of conservation efforts targeting roadside verges. Approximately half of all UK grassland species can be found along the country's 500,000 km of roads, with some 91 species either threatened or near threatened. Careful management of these "wildlife corridors" is therefore essential to preventing species extinc… ▽ More Open space grassland is being increasingly farmed or built upon, leading to a ram** up of conservation efforts targeting roadside verges. Approximately half of all UK grassland species can be found along the country's 500,000 km of roads, with some 91 species either threatened or near threatened. Careful management of these "wildlife corridors" is therefore essential to preventing species extinction and maintaining biodiversity in grassland habitats. Wildlife trusts have often enlisted the support of volunteers to survey roadside verges and identify new "Local Wildlife Sites" as areas of high conservation potential. Using volunteer survey data from 3,900 km of roadside verges alongside publicly available street-view imagery, we present DeepVerge; a deep learning-based method that can automatically survey sections of roadside verges by detecting the presence of positive indicator species. Using images and ground truth survey data from the rural county of Lincolnshire, DeepVerge achieved a mean accuracy of 88%. Such a method may be used by local authorities to identify new local wildlife sites, and aid management and environmental planning in line with legal and government policy obligations, saving thousands of hours of manual labour. △ Less

Submitted 9 June, 2022; originally announced June 2022.

ACM Class: I.4

arXiv:2203.11947 [pdf, other]

CM-GAN: Image Inpainting with Cascaded Modulation GAN and Object-Aware Training

Authors: Haitian Zheng, Zhe Lin, **gwan Lu, Scott Cohen, Eli Shechtman, Connelly Barnes, Jianming Zhang, Ning Xu, Sohrab Amirghodsi, Jiebo Luo

Abstract: Recent image inpainting methods have made great progress but often struggle to generate plausible image structures when dealing with large holes in complex images. This is partially due to the lack of effective network structures that can capture both the long-range dependency and high-level semantics of an image. We propose cascaded modulation GAN (CM-GAN), a new network design consisting of an e… ▽ More Recent image inpainting methods have made great progress but often struggle to generate plausible image structures when dealing with large holes in complex images. This is partially due to the lack of effective network structures that can capture both the long-range dependency and high-level semantics of an image. We propose cascaded modulation GAN (CM-GAN), a new network design consisting of an encoder with Fourier convolution blocks that extract multi-scale feature representations from the input image with holes and a dual-stream decoder with a novel cascaded global-spatial modulation block at each scale level. In each decoder block, global modulation is first applied to perform coarse and semantic-aware structure synthesis, followed by spatial modulation to further adjust the feature map in a spatially adaptive fashion. In addition, we design an object-aware training scheme to prevent the network from hallucinating new objects inside holes, fulfilling the needs of object removal tasks in real-world scenarios. Extensive experiments are conducted to show that our method significantly outperforms existing methods in both quantitative and qualitative evaluation. Please refer to the project page: \url{https://github.com/htzheng/CM-GAN-Inpainting}. △ Less

Submitted 20 July, 2022; v1 submitted 22 March, 2022; originally announced March 2022.

Comments: 32 pages, 19 figures

arXiv:2201.08131 [pdf, other]

GeoFill: Reference-Based Image Inpainting with Better Geometric Understanding

Authors: Yunhan Zhao, Connelly Barnes, Yuqian Zhou, Eli Shechtman, Sohrab Amirghodsi, Charless Fowlkes

Abstract: Reference-guided image inpainting restores image pixels by leveraging the content from another single reference image. The primary challenge is how to precisely place the pixels from the reference image into the hole region. Therefore, understanding the 3D geometry that relates pixels between two views is a crucial step towards building a better model. Given the complexity of handling various type… ▽ More Reference-guided image inpainting restores image pixels by leveraging the content from another single reference image. The primary challenge is how to precisely place the pixels from the reference image into the hole region. Therefore, understanding the 3D geometry that relates pixels between two views is a crucial step towards building a better model. Given the complexity of handling various types of reference images, we focus on the scenario where the images are captured by freely moving the same camera around. Compared to the previous work, we propose a principled approach that does not make heuristic assumptions about the planarity of the scene. We leverage a monocular depth estimate and predict relative pose between cameras, then align the reference image to the target by a differentiable 3D reprojection and a joint optimization of relative pose and depth map scale and offset. Our approach achieves state-of-the-art performance on both RealEstate10K and MannequinChallenge dataset with large baselines, complex geometry and extreme camera motions. We experimentally verify our approach is also better at handling large holes. △ Less

Submitted 8 October, 2022; v1 submitted 20 January, 2022; originally announced January 2022.

Comments: Accepted to WACV 2023

arXiv:2104.05647 [pdf, other]

Fruit Quality and Defect Image Classification with Conditional GAN Data Augmentation

Authors: Jordan J. Bird, Chloe M. Barnes, Luis J. Manso, Anikó Ekárt, Diego R. Faria

Abstract: Contemporary Artificial Intelligence technologies allow for the employment of Computer Vision to discern good crops from bad, providing a step in the pipeline of selecting healthy fruit from undesirable fruit, such as those which are mouldy or gangrenous. State-of-the-art works in the field report high accuracy results on small datasets (<1000 images), which are not representative of the populatio… ▽ More Contemporary Artificial Intelligence technologies allow for the employment of Computer Vision to discern good crops from bad, providing a step in the pipeline of selecting healthy fruit from undesirable fruit, such as those which are mouldy or gangrenous. State-of-the-art works in the field report high accuracy results on small datasets (<1000 images), which are not representative of the population regarding real-world usage. The goals of this study are to further enable real-world usage by improving generalisation with data augmentation as well as to reduce overfitting and energy usage through model pruning. In this work, we suggest a machine learning pipeline that combines the ideas of fine-tuning, transfer learning, and generative model-based training data augmentation towards improving fruit quality image classification. A linear network topology search is performed to tune a VGG16 lemon quality classification model using a publicly-available dataset of 2690 images. We find that appending a 4096 neuron fully connected layer to the convolutional layers leads to an image classification accuracy of 83.77%. We then train a Conditional Generative Adversarial Network on the training data for 2000 epochs, and it learns to generate relatively realistic images. Grad-CAM analysis of the model trained on real photographs shows that the synthetic images can exhibit classifiable characteristics such as shape, mould, and gangrene. A higher image classification accuracy of 88.75% is then attained by augmenting the training with synthetic images, arguing that Conditional Generative Adversarial Networks have the ability to produce new data to alleviate issues of data scarcity. Finally, model pruning is performed via polynomial decay, where we find that the Conditional GAN-augmented classification network can retain 81.16% classification accuracy when compressed to 50% of its original size. △ Less

Submitted 12 April, 2021; originally announced April 2021.

Comments: 16 pages, 12 figures, 3 tables

arXiv:2104.03960 [pdf, other]

Modulated Periodic Activations for Generalizable Local Functional Representations

Authors: Ishit Mehta, Michaël Gharbi, Connelly Barnes, Eli Shechtman, Ravi Ramamoorthi, Manmohan Chandraker

Abstract: Multi-Layer Perceptrons (MLPs) make powerful functional representations for sampling and reconstruction problems involving low-dimensional signals like images,shapes and light fields. Recent works have significantly improved their ability to represent high-frequency content by using periodic activations or positional encodings. This often came at the expense of generalization: modern methods are t… ▽ More Multi-Layer Perceptrons (MLPs) make powerful functional representations for sampling and reconstruction problems involving low-dimensional signals like images,shapes and light fields. Recent works have significantly improved their ability to represent high-frequency content by using periodic activations or positional encodings. This often came at the expense of generalization: modern methods are typically optimized for a single signal. We present a new representation that generalizes to multiple instances and achieves state-of-the-art fidelity. We use a dual-MLP architecture to encode the signals. A synthesis network creates a functional map** from a low-dimensional input (e.g. pixel-position) to the output domain (e.g. RGB color). A modulation network maps a latent code corresponding to the target signal to parameters that modulate the periodic activations of the synthesis network. We also propose a local-functional representation which enables generalization. The signal's domain is partitioned into a regular grid,with each tile represented by a latent code. At test time, the signal is encoded with high-fidelity by inferring (or directly optimizing) the latent code-book. Our approach produces generalizable functional representations of images, videos and shapes, and achieves higher reconstruction quality than prior works that are optimized for a single signal. △ Less

Submitted 8 April, 2021; originally announced April 2021.

Comments: Project Page at https://ishit.github.io/modsine/

arXiv:2103.15982 [pdf, other]

TransFill: Reference-guided Image Inpainting by Merging Multiple Color and Spatial Transformations

Authors: Yuqian Zhou, Connelly Barnes, Eli Shechtman, Sohrab Amirghodsi

Abstract: Image inpainting is the task of plausibly restoring missing pixels within a hole region that is to be removed from a target image. Most existing technologies exploit patch similarities within the image, or leverage large-scale training data to fill the hole using learned semantic and texture information. However, due to the ill-posed nature of the inpainting task, such methods struggle to complete… ▽ More Image inpainting is the task of plausibly restoring missing pixels within a hole region that is to be removed from a target image. Most existing technologies exploit patch similarities within the image, or leverage large-scale training data to fill the hole using learned semantic and texture information. However, due to the ill-posed nature of the inpainting task, such methods struggle to complete larger holes containing complicated scenes. In this paper, we propose TransFill, a multi-homography transformed fusion method to fill the hole by referring to another source image that shares scene contents with the target image. We first align the source image to the target image by estimating multiple homographies guided by different depth levels. We then learn to adjust the color and apply a pixel-level war** to each homography-warped source image to make it more consistent with the target. Finally, a pixel-level fusion module is learned to selectively merge the different proposals. Our method achieves state-of-the-art performance on pairs of images across a variety of wide baselines and color differences, and generalizes to user-provided image pairs. △ Less

Submitted 29 March, 2021; originally announced March 2021.

Comments: Accepted by CVPR2021

arXiv:2102.04533 [pdf, other]

Learning from Shader Program Traces

Authors: Yuting Yang, Connelly Barnes, Adam Finkelstein

Abstract: Deep learning for image processing typically treats input imagery as pixels in some color space. This paper proposes instead to learn from program traces of procedural fragment shaders -- programs that generate images. At each pixel, we collect the intermediate values computed at program execution, and these data form the input to the learned model. We investigate this learning task for a variety… ▽ More Deep learning for image processing typically treats input imagery as pixels in some color space. This paper proposes instead to learn from program traces of procedural fragment shaders -- programs that generate images. At each pixel, we collect the intermediate values computed at program execution, and these data form the input to the learned model. We investigate this learning task for a variety of applications: our model can learn to predict a low-noise output image from shader programs that exhibit sampling noise; this model can also learn from a simplified shader program that approximates the reference solution with less computation, as well as learn the output of postprocessing filters like defocus blur and edge-aware sharpening. Finally we show that the idea of learning from program traces can even be applied to non-imagery simulations of flocks of boids. Our experiments on a variety of shaders show quantitatively and qualitatively that models learned from program traces outperform baseline models learned from RGB color augmented with hand-picked shader-specific features like normals, depth, and diffuse and specular color. △ Less

Submitted 24 April, 2022; v1 submitted 8 February, 2021; originally announced February 2021.

arXiv:2010.08437 [pdf, other]

doi 10.1109/ACCESS.2020.3012417.

Deep Learning based Automated Forest Health Diagnosis from Aerial Images

Authors: Chia-Yen Chiang, Chloe Barnes, Plamen Angelov, Richard Jiang

Abstract: Global climate change has had a drastic impact on our environment. Previous study showed that pest disaster occured from global climate change may cause a tremendous number of trees died and they inevitably became a factor of forest fire. An important portent of the forest fire is the condition of forests. Aerial image-based forest analysis can give an early detection of dead trees and living tree… ▽ More Global climate change has had a drastic impact on our environment. Previous study showed that pest disaster occured from global climate change may cause a tremendous number of trees died and they inevitably became a factor of forest fire. An important portent of the forest fire is the condition of forests. Aerial image-based forest analysis can give an early detection of dead trees and living trees. In this paper, we applied a synthetic method to enlarge imagery dataset and present a new framework for automated dead tree detection from aerial images using a re-trained Mask RCNN (Mask Region-based Convolutional Neural Network) approach, with a transfer learning scheme. We apply our framework to our aerial imagery datasets,and compare eight fine-tuned models. The mean average precision score (mAP) for the best of these models reaches 54%. Following the automated detection, we are able to automatically produce and calculate number of dead tree masks to label the dead trees in an image, as an indicator of forest health that could be linked to the causal analysis of environmental changes and the predictive likelihood of forest fire. △ Less

Submitted 16 October, 2020; originally announced October 2020.

Comments: 16 pages

ACM Class: I.4.6; I.4.9; I.2.6; J.2

Journal ref: IEEE Access, vol. 8, pp. 144064-144076, 2020

arXiv:2007.15068 [pdf, other]

Unselfie: Translating Selfies to Neutral-pose Portraits in the Wild

Authors: Liqian Ma, Zhe Lin, Connelly Barnes, Alexei A. Efros, **gwan Lu

Abstract: Due to the ubiquity of smartphones, it is popular to take photos of one's self, or "selfies." Such photos are convenient to take, because they do not require specialized equipment or a third-party photographer. However, in selfies, constraints such as human arm length often make the body pose look unnatural. To address this issue, we introduce $\textit{unselfie}$, a novel photographic transformati… ▽ More Due to the ubiquity of smartphones, it is popular to take photos of one's self, or "selfies." Such photos are convenient to take, because they do not require specialized equipment or a third-party photographer. However, in selfies, constraints such as human arm length often make the body pose look unnatural. To address this issue, we introduce $\textit{unselfie}$, a novel photographic transformation that automatically translates a selfie into a neutral-pose portrait. To achieve this, we first collect an unpaired dataset, and introduce a way to synthesize paired training data for self-supervised learning. Then, to $\textit{unselfie}$ a photo, we propose a new three-stage pipeline, where we first find a target neutral pose, inpaint the body texture, and finally refine and composite the person on the background. To obtain a suitable target neutral pose, we propose a novel nearest pose search module that makes the reposing task easier and enables the generation of multiple neutral-pose results among which users can choose the best one they like. Qualitative and quantitative evaluations show the superiority of our pipeline over alternatives. △ Less

Submitted 29 July, 2020; originally announced July 2020.

Comments: To appear in ECCV 2020

arXiv:2005.08891 [pdf, other]

Generative Tweening: Long-term Inbetweening of 3D Human Motions

Authors: Yi Zhou, **gwan Lu, Connelly Barnes, Jimei Yang, Sitao Xiang, Hao li

Abstract: The ability to generate complex and realistic human body animations at scale, while following specific artistic constraints, has been a fundamental goal for the game and animation industry for decades. Popular techniques include key-framing, physics-based simulation, and database methods via motion graphs. Recently, motion generators based on deep learning have been introduced. Although these lear… ▽ More The ability to generate complex and realistic human body animations at scale, while following specific artistic constraints, has been a fundamental goal for the game and animation industry for decades. Popular techniques include key-framing, physics-based simulation, and database methods via motion graphs. Recently, motion generators based on deep learning have been introduced. Although these learning models can automatically generate highly intricate stylized motions of arbitrary length, they still lack user control. To this end, we introduce the problem of long-term inbetweening, which involves automatically synthesizing complex motions over a long time interval given very sparse keyframes by users. We identify a number of challenges related to this problem, including maintaining biomechanical and keyframe constraints, preserving natural motions, and designing the entire motion sequence holistically while considering all constraints. We introduce a biomechanically constrained generative adversarial network that performs long-term inbetweening of human motions, conditioned on keyframe constraints. This network uses a novel two-stage approach where it first predicts local motion in the form of joint angles, and then predicts global motion, i.e. the global path that the character follows. Since there are typically a number of possible motions that could satisfy the given user constraints, we also enable our network to generate a variety of outputs with a scheme that we call Motion DNA. This approach allows the user to manipulate and influence the output content by feeding seed motions (DNA) to the network. Trained with 79 classes of captured motion data, our network performs robustly on a variety of highly complex motion styles. △ Less

Submitted 28 May, 2020; v1 submitted 18 May, 2020; originally announced May 2020.

arXiv:2004.14071 [pdf, other]

doi 10.1111/cgf.14027

Image Morphing with Perceptual Constraints and STN Alignment

Authors: Noa Fish, Richard Zhang, Lilach Perry, Daniel Cohen-Or, Eli Shechtman, Connelly Barnes

Abstract: In image morphing, a sequence of plausible frames are synthesized and composited together to form a smooth transformation between given instances. Intermediates must remain faithful to the input, stand on their own as members of the set, and maintain a well-paced visual transition from one to the next. In this paper, we propose a conditional GAN morphing framework operating on a pair of input imag… ▽ More In image morphing, a sequence of plausible frames are synthesized and composited together to form a smooth transformation between given instances. Intermediates must remain faithful to the input, stand on their own as members of the set, and maintain a well-paced visual transition from one to the next. In this paper, we propose a conditional GAN morphing framework operating on a pair of input images. The network is trained to synthesize frames corresponding to temporal samples along the transformation, and learns a proper shape prior that enhances the plausibility of intermediate frames. While individual frame plausibility is boosted by the adversarial setup, a special training protocol producing sequences of frames, combined with a perceptual similarity loss, promote smooth transformation over time. Explicit stating of correspondences is replaced with a grid-based freeform deformation spatial transformer that predicts the geometric warp between the inputs, instituting the smooth geometric effect by bringing the shapes into an initial alignment. We provide comparisons to classic as well as latent space morphing techniques, and demonstrate that, given a set of images for self-supervision, our network learns to generate visually pleasing morphing effects featuring believable in-betweens, with robustness to changes in shape and texture, requiring no correspondence annotation. △ Less

Submitted 29 April, 2020; originally announced April 2020.

ACM Class: I.3.3

arXiv:1912.03457 [pdf, other]

Unsung Challenges of Building and Deploying Language Technologies for Low Resource Language Communities

Authors: Pratik Joshi, Christain Barnes, Sebastin Santy, Simran Khanuja, Sanket Shah, Anirudh Srinivasan, Satwik Bhattamishra, Sunayana Sitaram, Monojit Choudhury, Kalika Bali

Abstract: In this paper, we examine and analyze the challenges associated with develo** and introducing language technologies to low-resource language communities. While doing so, we bring to light the successes and failures of past work in this area, challenges being faced in doing so, and what they have achieved. Throughout this paper, we take a problem-facing approach and describe essential factors whi… ▽ More In this paper, we examine and analyze the challenges associated with develo** and introducing language technologies to low-resource language communities. While doing so, we bring to light the successes and failures of past work in this area, challenges being faced in doing so, and what they have achieved. Throughout this paper, we take a problem-facing approach and describe essential factors which the success of such technologies hinges upon. We present the various aspects in a manner which clarify and lay out the different tasks involved, which can aid organizations looking to make an impact in this area. We take the example of Gondi, an extremely-low resource Indian language, to reinforce and complement our discussion. △ Less

Submitted 7 December, 2019; originally announced December 2019.

Comments: Accepted at ICON 2019; 9 pages

arXiv:1901.05945 [pdf, other]

Foreground-aware Image Inpainting

Authors: Wei Xiong, Jiahui Yu, Zhe Lin, Jimei Yang, Xin Lu, Connelly Barnes, Jiebo Luo

Abstract: Existing image inpainting methods typically fill holes by borrowing information from surrounding pixels. They often produce unsatisfactory results when the holes overlap with or touch foreground objects due to lack of information about the actual extent of foreground and background regions within the holes. These scenarios, however, are very important in practice, especially for applications such… ▽ More Existing image inpainting methods typically fill holes by borrowing information from surrounding pixels. They often produce unsatisfactory results when the holes overlap with or touch foreground objects due to lack of information about the actual extent of foreground and background regions within the holes. These scenarios, however, are very important in practice, especially for applications such as the removal of distracting objects. To address the problem, we propose a foreground-aware image inpainting system that explicitly disentangles structure inference and content completion. Specifically, our model learns to predict the foreground contour first, and then inpaints the missing region using the predicted contour as guidance. We show that by such disentanglement, the contour completion model predicts reasonable contours of objects, and further substantially improves the performance of image inpainting. Experiments show that our method significantly outperforms existing methods and achieves superior inpainting results on challenging cases with complex compositions. △ Less

Submitted 22 April, 2019; v1 submitted 17 January, 2019; originally announced January 2019.

Comments: Camera Ready version of CVPR 2019 with supplementary materials

arXiv:1901.03447 [pdf, other]

Texture Mixer: A Network for Controllable Synthesis and Interpolation of Texture

Authors: Ning Yu, Connelly Barnes, Eli Shechtman, Sohrab Amirghodsi, Michal Lukac

Abstract: This paper addresses the problem of interpolating visual textures. We formulate this problem by requiring (1) by-example controllability and (2) realistic and smooth interpolation among an arbitrary number of texture samples. To solve it we propose a neural network trained simultaneously on a reconstruction task and a generation task, which can project texture examples onto a latent space where th… ▽ More This paper addresses the problem of interpolating visual textures. We formulate this problem by requiring (1) by-example controllability and (2) realistic and smooth interpolation among an arbitrary number of texture samples. To solve it we propose a neural network trained simultaneously on a reconstruction task and a generation task, which can project texture examples onto a latent space where they can be linearly interpolated and projected back onto the image domain, thus ensuring both intuitive control and realistic results. We show our method outperforms a number of baselines according to a comprehensive suite of metrics as well as a user study. We further show several applications based on our technique, which include texture brush, texture dissolve, and animal hybridization. △ Less

Submitted 16 April, 2019; v1 submitted 10 January, 2019; originally announced January 2019.

Comments: Accepted to CVPR'19

arXiv:1812.07035 [pdf, other]

On the Continuity of Rotation Representations in Neural Networks

Authors: Yi Zhou, Connelly Barnes, **gwan Lu, Jimei Yang, Hao Li

Abstract: In neural networks, it is often desirable to work with various representations of the same space. For example, 3D rotations can be represented with quaternions or Euler angles. In this paper, we advance a definition of a continuous representation, which can be helpful for training deep neural networks. We relate this to topological concepts such as homeomorphism and embedding. We then investigate… ▽ More In neural networks, it is often desirable to work with various representations of the same space. For example, 3D rotations can be represented with quaternions or Euler angles. In this paper, we advance a definition of a continuous representation, which can be helpful for training deep neural networks. We relate this to topological concepts such as homeomorphism and embedding. We then investigate what are continuous and discontinuous representations for 2D, 3D, and n-dimensional rotations. We demonstrate that for 3D rotations, all representations are discontinuous in the real Euclidean spaces of four or fewer dimensions. Thus, widely used representations such as quaternions and Euler angles are discontinuous and difficult for neural networks to learn. We show that the 3D rotations have continuous representations in 5D and 6D, which are more suitable for learning. We also present continuous representations for the general case of the n-dimensional rotation group SO(n). While our main focus is on rotations, we also show that our constructions apply to other groups such as the orthogonal group and similarity transforms. We finally present empirical results, which show that our continuous rotation representations outperform discontinuous ones for several practical problems in graphics and vision, including a simple autoencoder sanity test, a rotation estimator for 3D point clouds, and an inverse kinematics solver for 3D human poses. △ Less

Submitted 8 June, 2020; v1 submitted 17 December, 2018; originally announced December 2018.

arXiv:1808.07269 [pdf, other]

doi 10.1103/PhysRevD.99.092001

A Deep Neural Network for Pixel-Level Electromagnetic Particle Identification in the MicroBooNE Liquid Argon Time Projection Chamber

Authors: MicroBooNE collaboration, C. Adams, M. Alrashed, R. An, J. Anthony, J. Asaadi, A. Ashkenazi, M. Auger, S. Balasubramanian, B. Baller, C. Barnes, G. Barr, M. Bass, F. Bay, A. Bhat, K. Bhattacharya, M. Bishai, A. Blake, T. Bolton, L. Camilleri, D. Caratelli, I. Caro Terrazas, R. Carr, R. Castillo Fernandez, F. Cavanna , et al. (148 additional authors not shown)

Abstract: We have developed a convolutional neural network (CNN) that can make a pixel-level prediction of objects in image data recorded by a liquid argon time projection chamber (LArTPC) for the first time. We describe the network design, training techniques, and software tools developed to train this network. The goal of this work is to develop a complete deep neural network based data reconstruction cha… ▽ More We have developed a convolutional neural network (CNN) that can make a pixel-level prediction of objects in image data recorded by a liquid argon time projection chamber (LArTPC) for the first time. We describe the network design, training techniques, and software tools developed to train this network. The goal of this work is to develop a complete deep neural network based data reconstruction chain for the MicroBooNE detector. We show the first demonstration of a network's validity on real LArTPC data using MicroBooNE collection plane images. The demonstration is performed for stop** muon and a $ν_μ$ charged current neutral pion data samples. △ Less

Submitted 22 August, 2018; originally announced August 2018.

Journal ref: Phys. Rev. D 99, 092001 (2019)

arXiv:1706.01208 [pdf, other]

Approximate Program Smoothing Using Mean-Variance Statistics, with Application to Procedural Shader Bandlimiting

Authors: Yuting Yang, Connelly Barnes

Abstract: This paper introduces a general method to approximate the convolution of an arbitrary program with a Gaussian kernel. This process has the effect of smoothing out a program. Our compiler framework models intermediate values in the program as random variables, by using mean and variance statistics. Our approach breaks the input program into parts and relates the statistics of the different parts, u… ▽ More This paper introduces a general method to approximate the convolution of an arbitrary program with a Gaussian kernel. This process has the effect of smoothing out a program. Our compiler framework models intermediate values in the program as random variables, by using mean and variance statistics. Our approach breaks the input program into parts and relates the statistics of the different parts, under the smoothing process. We give several approximations that can be used for the different parts of the program. These include the approximation of Dorn et al., a novel adaptive Gaussian approximation, Monte Carlo sampling, and compactly supported kernels. Our adaptive Gaussian approximation is accurate up to the second order in the standard deviation of the smoothing kernel, and mathematically smooth. We show how to construct a compiler that applies chosen approximations to given parts of the input program. Because each expression can have multiple approximation choices, we use a genetic search to automatically select the best approximations. We apply this framework to the problem of automatically bandlimiting procedural shader programs. We evaluate our method on a variety of complex shaders, including shaders with parallax map**, animation, and spatially varying statistics. The resulting smoothed shader programs outperform previous approaches both numerically, and aesthetically, due to the smoothing properties of our approximations. △ Less

Submitted 5 June, 2017; originally announced June 2017.

Comments: 13 pages, 6 figures

ACM Class: D.3.4; I.3.7

arXiv:1706.01021 [pdf, other]

Where and Who? Automatic Semantic-Aware Person Composition

Authors: Fuwen Tan, Crispin Bernier, Benjamin Cohen, Vicente Ordonez, Connelly Barnes

Abstract: Image compositing is a method used to generate realistic yet fake imagery by inserting contents from one image to another. Previous work in compositing has focused on improving appearance compatibility of a user selected foreground segment and a background image (i.e. color and illumination consistency). In this work, we instead develop a fully automated compositing model that additionally learns… ▽ More Image compositing is a method used to generate realistic yet fake imagery by inserting contents from one image to another. Previous work in compositing has focused on improving appearance compatibility of a user selected foreground segment and a background image (i.e. color and illumination consistency). In this work, we instead develop a fully automated compositing model that additionally learns to select and transform compatible foreground segments from a large collection given only an input image background. To simplify the task, we restrict our problem by focusing on human instance composition, because human segments exhibit strong correlations with their background and because of the availability of large annotated data. We develop a novel branching Convolutional Neural Network (CNN) that jointly predicts candidate person locations given a background image. We then use pre-trained deep feature representations to retrieve person instances from a large segment database. Experimental results show that our model can generate composite images that look visually convincing. We also develop a user interface to demonstrate the potential application of our method. △ Less

Submitted 2 December, 2017; v1 submitted 3 June, 2017; originally announced June 2017.

Comments: 10 pages, 9 figures

arXiv:1701.08893 [pdf, other]

Stable and Controllable Neural Texture Synthesis and Style Transfer Using Histogram Losses

Authors: Eric Risser, Pierre Wilmot, Connelly Barnes

Abstract: Recently, methods have been proposed that perform texture synthesis and style transfer by using convolutional neural networks (e.g. Gatys et al. [2015,2016]). These methods are exciting because they can in some cases create results with state-of-the-art quality. However, in this paper, we show these methods also have limitations in texture quality, stability, requisite parameter tuning, and lack o… ▽ More Recently, methods have been proposed that perform texture synthesis and style transfer by using convolutional neural networks (e.g. Gatys et al. [2015,2016]). These methods are exciting because they can in some cases create results with state-of-the-art quality. However, in this paper, we show these methods also have limitations in texture quality, stability, requisite parameter tuning, and lack of user controls. This paper presents a multiscale synthesis pipeline based on convolutional neural networks that ameliorates these issues. We first give a mathematical explanation of the source of instabilities in many previous approaches. We then improve these instabilities by using histogram losses to synthesize textures that better statistically match the exemplar. We also show how to integrate localized style losses in our multiscale framework. These losses can improve the quality of large features, improve the separation of content and style, and offer artistic controls such as paint by numbers. We demonstrate that our approach offers improved quality, convergence in fewer iterations, and more stability over the optimization. △ Less

Submitted 1 February, 2017; v1 submitted 30 January, 2017; originally announced January 2017.

arXiv:1612.01635 [pdf, other]

Learning to Detect Multiple Photographic Defects

Authors: Ning Yu, Xiaohui Shen, Zhe Lin, Radomir Mech, Connelly Barnes

Abstract: In this paper, we introduce the problem of simultaneously detecting multiple photographic defects. We aim at detecting the existence, severity, and potential locations of common photographic defects related to color, noise, blur and composition. The automatic detection of such defects could be used to provide users with suggestions for how to improve photos without the need to laboriously try vari… ▽ More In this paper, we introduce the problem of simultaneously detecting multiple photographic defects. We aim at detecting the existence, severity, and potential locations of common photographic defects related to color, noise, blur and composition. The automatic detection of such defects could be used to provide users with suggestions for how to improve photos without the need to laboriously try various correction methods. Defect detection could also help users select photos of higher quality while filtering out those with severe defects in photo curation and summarization. To investigate this problem, we collected a large-scale dataset of user annotations on seven common photographic defects, which allows us to evaluate algorithms by measuring their consistency with human judgments. Our new dataset enables us to formulate the problem as a multi-task learning problem and train a multi-column deep convolutional neural network (CNN) to simultaneously predict the severity of all the defects. Unlike some existing single-defect estimation methods that rely on low-level statistics and may fail in many cases on natural photographs, our model is able to understand image contents and quality at a higher level. As a result, in our experiments, we show that our model has predictions with much higher consistency with human judgments than low-level methods as well as several baseline CNN models. Our model also performs better than an average human from our user study. △ Less

Submitted 8 March, 2018; v1 submitted 5 December, 2016; originally announced December 2016.

Comments: Accepted to WACV'18

Showing 1–29 of 29 results for author: Barnes, C