Search | arXiv e-print repository

Conditional Synthetic Food Image Generation

Authors: Wen** Fu, Yue Han, Jiangpeng He, Sriram Baireddy, Mridul Gupta, Fengqing Zhu

Abstract: Generative Adversarial Networks (GAN) have been widely investigated for image synthesis based on their powerful representation learning ability. In this work, we explore the StyleGAN and its application of synthetic food image generation. Despite the impressive performance of GAN for natural image generation, food images suffer from high intra-class diversity and inter-class similarity, resulting… ▽ More Generative Adversarial Networks (GAN) have been widely investigated for image synthesis based on their powerful representation learning ability. In this work, we explore the StyleGAN and its application of synthetic food image generation. Despite the impressive performance of GAN for natural image generation, food images suffer from high intra-class diversity and inter-class similarity, resulting in overfitting and visual artifacts for synthetic images. Therefore, we aim to explore the capability and improve the performance of GAN methods for food image generation. Specifically, we first choose StyleGAN3 as the baseline method to generate synthetic food images and analyze the performance. Then, we identify two issues that can cause performance degradation on food images during the training phase: (1) inter-class feature entanglement during multi-food classes training and (2) loss of high-resolution detail during image downsampling. To address both issues, we propose to train one food category at a time to avoid feature entanglement and leverage image patches cropped from high-resolution datasets to retain fine details. We evaluate our method on the Food-101 dataset and show improved quality of generated synthetic food images compared with the baseline. Finally, we demonstrate the great potential of improving the performance of downstream tasks, such as food image classification by including high-quality synthetic training samples in the data augmentation. △ Less

Submitted 15 March, 2023; originally announced March 2023.

arXiv:2205.03947 [pdf, other]

High-Resolution UAV Image Generation for Sorghum Panicle Detection

Authors: Enyu Cai, Zhankun Luo, Sriram Baireddy, Jiaqi Guo, Changye Yang, Edward J. Delp

Abstract: The number of panicles (or heads) of Sorghum plants is an important phenotypic trait for plant development and grain yield estimation. The use of Unmanned Aerial Vehicles (UAVs) enables the capability of collecting and analyzing Sorghum images on a large scale. Deep learning can provide methods for estimating phenotypic traits from UAV images but requires a large amount of labeled data. The lack o… ▽ More The number of panicles (or heads) of Sorghum plants is an important phenotypic trait for plant development and grain yield estimation. The use of Unmanned Aerial Vehicles (UAVs) enables the capability of collecting and analyzing Sorghum images on a large scale. Deep learning can provide methods for estimating phenotypic traits from UAV images but requires a large amount of labeled data. The lack of training data due to the labor-intensive ground truthing of UAV images causes a major bottleneck in develo** methods for Sorghum panicle detection and counting. In this paper, we present an approach that uses synthetic training images from generative adversarial networks (GANs) for data augmentation to enhance the performance of Sorghum panicle detection and counting. Our method can generate synthetic high-resolution UAV RGB images with panicle labels by using image-to-image translation GANs with a limited ground truth dataset of real UAV RGB images. The results show the improvements in panicle detection and counting using our data augmentation approach. △ Less

Submitted 8 May, 2022; originally announced May 2022.

arXiv:2205.00952 [pdf, other]

Leaf Tar Spot Detection Using RGB Images

Authors: Sriram Baireddy, Da-Young Lee, Carlos Gongora-Canul, Christian D. Cruz, Edward J. Delp

Abstract: Tar spot disease is a fungal disease that appears as a series of black circular spots containing spores on corn leaves. Tar spot has proven to be an impactful disease in terms of reducing crop yield. To quantify disease progression, experts usually have to visually phenotype leaves from the plant. This process is very time-consuming and is difficult to incorporate in any high-throughput phenotypin… ▽ More Tar spot disease is a fungal disease that appears as a series of black circular spots containing spores on corn leaves. Tar spot has proven to be an impactful disease in terms of reducing crop yield. To quantify disease progression, experts usually have to visually phenotype leaves from the plant. This process is very time-consuming and is difficult to incorporate in any high-throughput phenoty** system. Deep neural networks could provide quick, automated tar spot detection with sufficient ground truth. However, manually labeling tar spots in images to serve as ground truth is also tedious and time-consuming. In this paper we first describe an approach that uses automated image analysis tools to generate ground truth images that are then used for training a Mask R-CNN. We show that a Mask R-CNN can be used effectively to detect tar spots in close-up images of leaf surfaces. We additionally show that the Mask R-CNN can also be used for in-field images of whole leaves to capture the number of tar spots and area of the leaf infected by the disease. △ Less

Submitted 2 May, 2022; originally announced May 2022.

arXiv:2204.12067 [pdf, other]

An Overview of Recent Work in Media Forensics: Methods and Threats

Authors: Kratika Bhagtani, Amit Kumar Singh Yadav, Emily R. Bartusiak, Ziyue Xiang, Ruiting Shao, Sriram Baireddy, Edward J. Delp

Abstract: In this paper, we review recent work in media forensics for digital images, video, audio (specifically speech), and documents. For each data modality, we discuss synthesis and manipulation techniques that can be used to create and modify digital media. We then review technological advancements for detecting and quantifying such manipulations. Finally, we consider open issues and suggest directions… ▽ More In this paper, we review recent work in media forensics for digital images, video, audio (specifically speech), and documents. For each data modality, we discuss synthesis and manipulation techniques that can be used to create and modify digital media. We then review technological advancements for detecting and quantifying such manipulations. Finally, we consider open issues and suggest directions for future research. △ Less

Submitted 12 May, 2022; v1 submitted 26 April, 2022; originally announced April 2022.

Comments: This is a longer version of a paper accepted to the 2022 IEEE International Conference on Multimedia Information Processing and Retrieval entitled "An Overview of Recent Work in Multimedia Forensics"

arXiv:2109.03961 [pdf, other]

Improving Building Segmentation for Off-Nadir Satellite Imagery

Authors: Hanxiang Hao, Sriram Baireddy, Kevin LaTourette, Latisha Konz, Moses Chan, Mary L. Comer, Edward J. Delp

Abstract: Automatic building segmentation is an important task for satellite imagery analysis and scene understanding. Most existing segmentation methods focus on the case where the images are taken from directly overhead (i.e., low off-nadir/viewing angle). These methods often fail to provide accurate results on satellite images with larger off-nadir angles due to the higher noise level and lower spatial r… ▽ More Automatic building segmentation is an important task for satellite imagery analysis and scene understanding. Most existing segmentation methods focus on the case where the images are taken from directly overhead (i.e., low off-nadir/viewing angle). These methods often fail to provide accurate results on satellite images with larger off-nadir angles due to the higher noise level and lower spatial resolution. In this paper, we propose a method that is able to provide accurate building segmentation for satellite imagery captured from a large range of off-nadir angles. Based on Bayesian deep learning, we explicitly design our method to learn the data noise via aleatoric and epistemic uncertainty modeling. Satellite image metadata (e.g., off-nadir angle and ground sample distance) is also used in our model to further improve the result. We show that with uncertainty modeling and metadata injection, our method achieves better performance than the baseline method, especially for noisy images taken from large off-nadir angles. △ Less

Submitted 8 September, 2021; originally announced September 2021.

Comments: This is an extended version of our ACM SIGSPATIAL'21 conference paper

arXiv:2109.00632 [pdf, other]

Field-Based Plot Extraction Using UAV RGB Images

Authors: Changye Yang, Sriram Baireddy, Enyu Cai, Melba Crawford, Edward J. Delp

Abstract: Unmanned Aerial Vehicles (UAVs) have become popular for use in plant phenoty** of field based crops, such as maize and sorghum, due to their ability to acquire high resolution data over field trials. Field experiments, which may comprise thousands of plants, are planted according to experimental designs to evaluate varieties or management practices. For many types of phenoty** analysis, we exa… ▽ More Unmanned Aerial Vehicles (UAVs) have become popular for use in plant phenoty** of field based crops, such as maize and sorghum, due to their ability to acquire high resolution data over field trials. Field experiments, which may comprise thousands of plants, are planted according to experimental designs to evaluate varieties or management practices. For many types of phenoty** analysis, we examine smaller groups of plants known as "plots." In this paper, we propose a new plot extraction method that will segment a UAV image into plots. We will demonstrate that our method achieves higher plot extraction accuracy than existing approaches. △ Less

Submitted 1 September, 2021; originally announced September 2021.

arXiv:2105.12926 [pdf, other]

Image-Based Plant Wilting Estimation

Authors: Changye Yang, Sriram Baireddy, Enyu Cai, Valerian Meline, Denise Caldwell, Anjali S. Iyer-Pascuzzi, Edward J. Delp

Abstract: Many plants become limp or droop through heat, loss of water, or disease. This is also known as wilting. In this paper, we examine plant wilting caused by bacterial infection. In particular, we want to design a metric for wilting based on images acquired of the plant. A quantifiable wilting metric will be useful in studying bacterial wilt and identifying resistance genes. Since there is no standar… ▽ More Many plants become limp or droop through heat, loss of water, or disease. This is also known as wilting. In this paper, we examine plant wilting caused by bacterial infection. In particular, we want to design a metric for wilting based on images acquired of the plant. A quantifiable wilting metric will be useful in studying bacterial wilt and identifying resistance genes. Since there is no standard way to estimate wilting, it is common to use ad hoc visual scores. This is very subjective and requires expert knowledge of the plants and the disease mechanism. Our solution consists of using various wilting metrics acquired from RGB images of the plants. We also designed several experiments to demonstrate that our metrics are effective at estimating wilting in plants. △ Less

Submitted 26 May, 2021; originally announced May 2021.

arXiv:2105.06361 [pdf, other]

doi 10.1109/CVPRW53098.2021.00115

Forensic Analysis of Video Files Using Metadata

Authors: Ziyue Xiang, János Horváth, Sriram Baireddy, Paolo Bestagini, Stefano Tubaro, Edward J. Delp

Abstract: The unprecedented ease and ability to manipulate video content has led to a rapid spread of manipulated media. The availability of video editing tools greatly increased in recent years, allowing one to easily generate photo-realistic alterations. Such manipulations can leave traces in the metadata embedded in video files. This metadata information can be used to determine video manipulations, bran… ▽ More The unprecedented ease and ability to manipulate video content has led to a rapid spread of manipulated media. The availability of video editing tools greatly increased in recent years, allowing one to easily generate photo-realistic alterations. Such manipulations can leave traces in the metadata embedded in video files. This metadata information can be used to determine video manipulations, brand of video recording device, the type of video editing tool, and other important evidence. In this paper, we focus on the metadata contained in the popular MP4 video wrapper/container. We describe our method for metadata extractor that uses the MP4's tree structure. Our approach for analyzing the video metadata produces a more compact representation. We will describe how we construct features from the metadata and then use dimensionality reduction and nearest neighbor classification for forensic analysis of a video file. Our approach allows one to visually inspect the distribution of metadata features and make decisions. The experimental results confirm that the performance of our approach surpasses other methods. △ Less

Submitted 22 January, 2022; v1 submitted 13 May, 2021; originally announced May 2021.

Comments: v2: fixed a typo in Section 3.4; added page number; added IEEE copyright notice

arXiv:2005.06402 [pdf, other]

FaR-GAN for One-Shot Face Reenactment

Authors: Hanxiang Hao, Sriram Baireddy, Amy R. Reibman, Edward J. Delp

Abstract: Animating a static face image with target facial expressions and movements is important in the area of image editing and movie production. This face reenactment process is challenging due to the complex geometry and movement of human faces. Previous work usually requires a large set of images from the same person to model the appearance. In this paper, we present a one-shot face reenactment model,… ▽ More Animating a static face image with target facial expressions and movements is important in the area of image editing and movie production. This face reenactment process is challenging due to the complex geometry and movement of human faces. Previous work usually requires a large set of images from the same person to model the appearance. In this paper, we present a one-shot face reenactment model, FaR-GAN, that takes only one face image of any given source identity and a target expression as input, and then produces a face image of the same source identity but with the target expression. The proposed method makes no assumptions about the source identity, facial expression, head pose, or even image background. We evaluate our method on the VoxCeleb1 dataset and show that our method is able to generate a higher quality face image than the compared methods. △ Less

Submitted 13 May, 2020; originally announced May 2020.

Comments: This paper has been accepted to the AI for content creation workshop at CVPR 2020

arXiv:2004.13973 [pdf, other]

Deep Transfer Learning For Plant Center Localization

Authors: Enyu Cai, Sriram Baireddy, Changye Yang, Melba Crawford, Edward J. Delp

Abstract: Plant phenoty** focuses on the measurement of plant characteristics throughout the growing season, typically with the goal of evaluating genotypes for plant breeding. Estimating plant location is important for identifying genotypes which have low emergence, which is also related to the environment and management practices such as fertilizer applications. The goal of this paper is to investigate… ▽ More Plant phenoty** focuses on the measurement of plant characteristics throughout the growing season, typically with the goal of evaluating genotypes for plant breeding. Estimating plant location is important for identifying genotypes which have low emergence, which is also related to the environment and management practices such as fertilizer applications. The goal of this paper is to investigate methods that estimate plant locations for a field-based crop using RGB aerial images captured using Unmanned Aerial Vehicles (UAVs). Deep learning approaches provide promising capability for locating plants observed in RGB images, but they require large quantities of labeled data (ground truth) for training. Using a deep learning architecture fine-tuned on a single field or a single type of crop on fields in other geographic areas or with other crops may not have good results. The problem of generating ground truth for each new field is labor-intensive and tedious. In this paper, we propose a method for estimating plant centers by transferring an existing model to a new scenario using limited ground truth data. We describe the use of transfer learning using a model fine-tuned for a single field or a single type of plant on a varied set of similar crops and fields. We show that transfer learning provides promising results for detecting plant locations. △ Less

Submitted 29 April, 2020; originally announced April 2020.

arXiv:2004.12027 [pdf, other]

Deepfakes Detection with Automatic Face Weighting

Authors: Daniel Mas Montserrat, Hanxiang Hao, S. K. Yarlagadda, Sriram Baireddy, Ruiting Shao, János Horváth, Emily Bartusiak, Justin Yang, David Güera, Fengqing Zhu, Edward J. Delp

Abstract: Altered and manipulated multimedia is increasingly present and widely distributed via social media platforms. Advanced video manipulation tools enable the generation of highly realistic-looking altered multimedia. While many methods have been presented to detect manipulations, most of them fail when evaluated with data outside of the datasets used in research environments. In order to address this… ▽ More Altered and manipulated multimedia is increasingly present and widely distributed via social media platforms. Advanced video manipulation tools enable the generation of highly realistic-looking altered multimedia. While many methods have been presented to detect manipulations, most of them fail when evaluated with data outside of the datasets used in research environments. In order to address this problem, the Deepfake Detection Challenge (DFDC) provides a large dataset of videos containing realistic manipulations and an evaluation system that ensures that methods work quickly and accurately, even when faced with challenging data. In this paper, we introduce a method based on convolutional neural networks (CNNs) and recurrent neural networks (RNNs) that extracts visual and temporal features from faces present in videos to accurately detect manipulations. The method is evaluated with the DFDC dataset, providing competitive results compared to other techniques. △ Less

Submitted 4 May, 2020; v1 submitted 24 April, 2020; originally announced April 2020.

arXiv:2004.06643 [pdf, other]

An Attention-Based System for Damage Assessment Using Satellite Imagery

Authors: Hanxiang Hao, Sriram Baireddy, Emily R. Bartusiak, Latisha Konz, Kevin LaTourette, Michael Gribbons, Moses Chan, Mary L. Comer, Edward J. Delp

Abstract: When disaster strikes, accurate situational information and a fast, effective response are critical to save lives. Widely available, high resolution satellite images enable emergency responders to estimate locations, causes, and severity of damage. Quickly and accurately analyzing the extensive amount of satellite imagery available, though, requires an automatic approach. In this paper, we present… ▽ More When disaster strikes, accurate situational information and a fast, effective response are critical to save lives. Widely available, high resolution satellite images enable emergency responders to estimate locations, causes, and severity of damage. Quickly and accurately analyzing the extensive amount of satellite imagery available, though, requires an automatic approach. In this paper, we present Siam-U-Net-Attn model - a multi-class deep learning model with an attention mechanism - to assess damage levels of buildings given a pair of satellite images depicting a scene before and after a disaster. We evaluate the proposed method on xView2, a large-scale building damage assessment dataset, and demonstrate that the proposed approach achieves accurate damage scale classification and building segmentation results simultaneously. △ Less

Submitted 14 April, 2020; originally announced April 2020.

Comments: 10 pages, 9 figures

arXiv:2001.08854 [pdf, other]

Plant Stem Segmentation Using Fast Ground Truth Generation

Authors: Changye Yang, Sriram Baireddy, Yuhao Chen, Enyu Cai, Denise Caldwell, Valérian Méline, Anjali S. Iyer-Pascuzzi, Edward J. Delp

Abstract: Accurately phenoty** plant wilting is important for understanding responses to environmental stress. Analysis of the shape of plants can potentially be used to accurately quantify the degree of wilting. Plant shape analysis can be enhanced by locating the stem, which serves as a consistent reference point during wilting. In this paper, we show that deep learning methods can accurately segment to… ▽ More Accurately phenoty** plant wilting is important for understanding responses to environmental stress. Analysis of the shape of plants can potentially be used to accurately quantify the degree of wilting. Plant shape analysis can be enhanced by locating the stem, which serves as a consistent reference point during wilting. In this paper, we show that deep learning methods can accurately segment tomato plant stems. We also propose a control-point-based ground truth method that drastically reduces the resources needed to create a training dataset for a deep learning approach. Experimental results show the viability of both our proposed ground truth approach and deep learning based stem segmentation. △ Less

Submitted 23 January, 2020; originally announced January 2020.

arXiv:1910.11367 [pdf, other]

Learning eating environments through scene clustering

Authors: Sri Kalyan Yarlagadda, Sriram Baireddy, David Güera, Carol J. Boushey, Deborah A. Kerr, Fengqing Zhu

Abstract: It is well known that dietary habits have a significant influence on health. While many studies have been conducted to understand this relationship, little is known about the relationship between eating environments and health. Yet researchers and health agencies around the world have recognized the eating environment as a promising context for improving diet and health. In this paper, we propose… ▽ More It is well known that dietary habits have a significant influence on health. While many studies have been conducted to understand this relationship, little is known about the relationship between eating environments and health. Yet researchers and health agencies around the world have recognized the eating environment as a promising context for improving diet and health. In this paper, we propose an image clustering method to automatically extract the eating environments from eating occasion images captured during a community dwelling dietary study. Specifically, we are interested in learning how many different environments an individual consumes food in. Our method clusters images by extracting features at both global and local scales using a deep neural network. The variation in the number of clusters and images captured by different individual makes this a very challenging problem. Experimental results show that our method performs significantly better compared to several existing clustering approaches. △ Less

Submitted 9 November, 2019; v1 submitted 24 October, 2019; originally announced October 2019.

arXiv:1906.08743 [pdf, other]

We Need No Pixels: Video Manipulation Detection Using Stream Descriptors

Authors: David Güera, Sriram Baireddy, Paolo Bestagini, Stefano Tubaro, Edward J. Delp

Abstract: Manipulating video content is easier than ever. Due to the misuse potential of manipulated content, multiple detection techniques that analyze the pixel data from the videos have been proposed. However, clever manipulators should also carefully forge the metadata and auxiliary header information, which is harder to do for videos than images. In this paper, we propose to identify forged videos by a… ▽ More Manipulating video content is easier than ever. Due to the misuse potential of manipulated content, multiple detection techniques that analyze the pixel data from the videos have been proposed. However, clever manipulators should also carefully forge the metadata and auxiliary header information, which is harder to do for videos than images. In this paper, we propose to identify forged videos by analyzing their multimedia stream descriptors with simple binary classifiers, completely avoiding the pixel space. Using well-known datasets, our results show that this scalable approach can achieve a high manipulation detection score if the manipulators have not done a careful data sanitization of the multimedia stream descriptors. △ Less

Submitted 20 June, 2019; originally announced June 2019.

Comments: 7 pages, 6 figures, presented at the ICML 2019 Worksop on Synthetic Realities: Deep Learning for Detecting AudioVisual Fakes

Showing 1–15 of 15 results for author: Baireddy, S