-
Motion-Conditioned Image Animation for Video Editing
Authors:
Wilson Yan,
Andrew Brown,
Pieter Abbeel,
Rohit Girdhar,
Samaneh Azadi
Abstract:
We introduce MoCA, a Motion-Conditioned Image Animation approach for video editing. It leverages a simple decomposition of the video editing problem into image editing followed by motion-conditioned image animation. Furthermore, given the lack of robust evaluation datasets for video editing, we introduce a new benchmark that measures edit capability across a wide variety of tasks, such as object replacement, background changes, style changes, and motion edits. We present a comprehensive human evaluation of the latest video editing methods along with MoCA, on our proposed benchmark. MoCA establishes a new state-of-the-art, demonstrating greater human preference win-rate, and outperforming notable recent approaches including Dreamix (63%), MasaCtrl (75%), and Tune-A-Video (72%), with especially significant improvements for motion edits.
Submitted 30 November, 2023;
originally announced November 2023.
-
Emu Video: Factorizing Text-to-Video Generation by Explicit Image Conditioning
Authors:
Rohit Girdhar,
Mannat Singh,
Andrew Brown,
Quentin Duval,
Samaneh Azadi,
Sai Saketh Rambhatla,
Akbar Shah,
Xi Yin,
Devi Parikh,
Ishan Misra
Abstract:
We present Emu Video, a text-to-video generation model that factorizes the generation into two steps: first generating an image conditioned on the text, and then generating a video conditioned on the text and the generated image. We identify critical design decisions--adjusted noise schedules for diffusion, and multi-stage training--that enable us to directly generate high quality and high resolution videos, without requiring a deep cascade of models as in prior work. In human evaluations, our generated videos are strongly preferred in quality compared to all prior work--81% vs. Google's Imagen Video, 90% vs. Nvidia's PYOCO, and 96% vs. Meta's Make-A-Video. Our model outperforms commercial solutions such as RunwayML's Gen2 and Pika Labs. Finally, our factorizing approach naturally lends itself to animating images based on a user's text prompt, where our generations are preferred 96% over prior work.
Submitted 17 November, 2023;
originally announced November 2023.
-
Augmented Computational Design: Methodical Application of Artificial Intelligence in Generative Design
Authors:
Pirouz Nourian,
Shervin Azadi,
Roy Uijtendaal,
Nan Bai
Abstract:
This chapter presents methodological reflections on the necessity and utility of artificial intelligence in generative design. Specifically, the chapter discusses how generative design processes can be augmented by AI to deliver in terms of a few outcomes of interest or performance indicators while dealing with hundreds or thousands of small decisions. The core of the performance-based generative design paradigm is about making statistical or simulation-driven associations between these choices and consequences for mapping and navigating such a complex decision space. This chapter will discuss promising directions in Artificial Intelligence for augmenting decision-making processes in architectural design for mapping and navigating complex design spaces.
Submitted 13 October, 2023;
originally announced October 2023.
-
Voxel Graph Operators: Topological Voxelization, Graph Generation, and Derivation of Discrete Differential Operators from Voxel Complexes
Authors:
Pirouz Nourian,
Shervin Azadi
Abstract:
In this paper, we present a novel workflow consisting of algebraic algorithms and data structures for fast and topologically accurate conversion of vector data models such as Boundary Representations into voxels (topological voxelization); spatially indexing them; constructing connectivity graphs from voxels; and constructing a coherent set of multivariate differential and integral operators from these graphs. Topological Voxelization is revisited and presented in the paper as a reversible mapping of geometric models from $\mathbb{R}^3$ to $\mathbb{Z}^3$ to $\mathbb{N}^3$ and eventually to an index space created by Morton Codes in $\mathbb{N}$ while ensuring the topological validity of the voxel models; namely their topological thinness and their geometrical consistency. In addition, we present algorithms for constructing graphs and hyper-graph connectivity models on voxel data for graph traversal and field interpolations and utilize them algebraically in elegantly discretizing differential and integral operators for geometric, graphical, or spatial analyses and digital simulations. The multi-variate differential and integral operators presented in this paper can be used particularly in the formulation of Partial Differential Equations for physics simulations.
Submitted 27 September, 2023;
originally announced September 2023.
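The last step of the mapping described in the abstract above, from voxel coordinates in $\mathbb{N}^3$ to a single Morton code in $\mathbb{N}$, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the 21-bit default, and the bit order (x in the lowest position) are assumptions made for illustration only.

```python
def morton_encode(x: int, y: int, z: int, bits: int = 21) -> int:
    """Interleave the low `bits` bits of non-negative x, y, z into one integer.

    Assumed convention (hypothetical): bit i of x goes to position 3*i,
    bit i of y to 3*i + 1, and bit i of z to 3*i + 2.
    """
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (3 * i)
        code |= ((y >> i) & 1) << (3 * i + 1)
        code |= ((z >> i) & 1) << (3 * i + 2)
    return code


def morton_decode(code: int, bits: int = 21) -> tuple[int, int, int]:
    """Invert morton_encode, recovering (x, y, z) from the interleaved code."""
    x = y = z = 0
    for i in range(bits):
        x |= ((code >> (3 * i)) & 1) << i
        y |= ((code >> (3 * i + 1)) & 1) << i
        z |= ((code >> (3 * i + 2)) & 1) << i
    return x, y, z
```

Because the interleaving is a bijection on coordinates that fit in `bits` bits, decoding an encoded voxel returns the original coordinates, which is the sense in which this step of the mapping is reversible.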
-
EquiCity Game: A mathematical serious game for participatory design of spatial configurations
Authors:
Pirouz Nourian,
Shervin Azadi,
Nan Bai,
Bruno de Andrade,
Nour Abu Zaid,
Samaneh Rezvani,
Ana Pereira Roders
Abstract:
We propose mechanisms for a mathematical social-choice game that is designed to mediate decision-making processes for city planning, urban area redevelopment, and architectural design (massing) of urban housing complexes. The proposed game is effectively a multi-player generative configurator equipped with automated appraisal/scoring mechanisms for revealing the aggregate impact of alternatives; featuring a participatory digital process to support transparent and inclusive decision-making processes in spatial design for ensuring an equitable balance of sustainable development goals. As such, the game effectively empowers a group of decision-makers to reach a fair consensus by mathematically simulating many rounds of trade-offs between their decisions, with different levels of interest or control over various types of investments. Our proposed gamified design process encompasses decision-making about the most idiosyncratic aspects of a site related to its heritage status and cultural significance to the physical aspects such as balancing access to sunlight and the right to sunlight of the neighbours of the site, ensuring coherence of the entire configuration with regards to a network of desired closeness ratings, the satisfaction of a programme of requirements, and intricately balancing individual development goals in conjunction with communal goals and environmental design codes. The game is developed fully based on an algebraic computational process on our own digital twinning platform, using open geospatial data and open-source computational tools such as NumPy. The mathematical process consists of a Markovian design machine for balancing the decisions of actors, a massing configurator equipped with Fuzzy Logic and Multi-Criteria Decision Analysis, algebraic graph-theoretical accessibility evaluators, and automated solar-climatic evaluators using geospatial computational geometry.
Submitted 30 September, 2023; v1 submitted 23 September, 2023;
originally announced September 2023.
-
Make-An-Animation: Large-Scale Text-conditional 3D Human Motion Generation
Authors:
Samaneh Azadi,
Akbar Shah,
Thomas Hayes,
Devi Parikh,
Sonal Gupta
Abstract:
Text-guided human motion generation has drawn significant interest because of its impactful applications spanning animation and robotics. Recently, application of diffusion models for motion generation has enabled improvements in the quality of generated motions. However, existing approaches are limited by their reliance on relatively small-scale motion capture data, leading to poor performance on more diverse, in-the-wild prompts. In this paper, we introduce Make-An-Animation, a text-conditioned human motion generation model which learns more diverse poses and prompts from large-scale image-text datasets, enabling significant improvement in performance over prior works. Make-An-Animation is trained in two stages. First, we train on a curated large-scale dataset of (text, static pseudo-pose) pairs extracted from image-text datasets. Second, we fine-tune on motion capture data, adding additional layers to model the temporal dimension. Unlike prior diffusion models for motion generation, Make-An-Animation uses a U-Net architecture similar to recent text-to-video generation models. Human evaluation of motion realism and alignment with input text shows that our model reaches state-of-the-art performance on text-to-motion generation.
Submitted 16 May, 2023;
originally announced May 2023.
-
Text-Conditional Contextualized Avatars For Zero-Shot Personalization
Authors:
Samaneh Azadi,
Thomas Hayes,
Akbar Shah,
Guan Pang,
Devi Parikh,
Sonal Gupta
Abstract:
Recent large-scale text-to-image generation models have made significant improvements in the quality, realism, and diversity of the synthesized images and enable users to control the created content through language. However, the personalization aspect of these generative models is still challenging and under-explored. In this work, we propose a pipeline that enables personalization of image generation with avatars capturing a user's identity in a delightful way. Our pipeline is zero-shot, avatar texture and style agnostic, and does not require training on the avatar at all - it is scalable to millions of users who can generate a scene with their avatar. To render the avatar in a pose faithful to the given text prompt, we propose a novel text-to-3D pose diffusion model trained on a curated large-scale dataset of in-the-wild human poses improving the performance of the SOTA text-to-motion models significantly. We show, for the first time, how to leverage large-scale image datasets to learn human 3D pose parameters and overcome the limitations of motion capture datasets.
Submitted 14 April, 2023;
originally announced April 2023.
-
Shape-Guided Diffusion with Inside-Outside Attention
Authors:
Dong Huk Park,
Grace Luo,
Clayton Toste,
Samaneh Azadi,
Xihui Liu,
Maka Karalashvili,
Anna Rohrbach,
Trevor Darrell
Abstract:
We introduce precise object silhouette as a new form of user control in text-to-image diffusion models, which we dub Shape-Guided Diffusion. Our training-free method uses an Inside-Outside Attention mechanism during the inversion and generation process to apply a shape constraint to the cross- and self-attention maps. Our mechanism designates which spatial region is the object (inside) vs. background (outside) then associates edits to the correct region. We demonstrate the efficacy of our method on the shape-guided editing task, where the model must replace an object according to a text prompt and object mask. We curate a new ShapePrompts benchmark derived from MS-COCO and achieve SOTA results in shape faithfulness without a degradation in text alignment or image realism according to both automatic metrics and annotator ratings. Our data and code will be made available at https://shape-guided-diffusion.github.io.
Submitted 1 April, 2024; v1 submitted 30 November, 2022;
originally announced December 2022.
-
Discovering Quantum Phase Transitions with Fermionic Neural Networks
Authors:
G. Cassella,
H. Sutterud,
S. Azadi,
N. D. Drummond,
D. Pfau,
J. S. Spencer,
W. M. C. Foulkes
Abstract:
Deep neural networks have been extremely successful as highly accurate wave function ansätze for variational Monte Carlo calculations of molecular ground states. We present an extension of one such ansatz, FermiNet, to calculations of the ground states of periodic Hamiltonians, and study the homogeneous electron gas. FermiNet calculations of the ground-state energies of small electron gas systems are in excellent agreement with previous initiator full configuration interaction quantum Monte Carlo and diffusion Monte Carlo calculations. We investigate the spin-polarized homogeneous electron gas and demonstrate that the same neural network architecture is capable of accurately representing both the delocalized Fermi liquid state and the localized Wigner crystal state. The network is given no a priori knowledge that a phase transition exists, but converges on the translationally invariant ground state at high density and spontaneously breaks the symmetry to produce the crystalline ground state at low density.
Submitted 5 July, 2022; v1 submitted 10 February, 2022;
originally announced February 2022.
-
More Control for Free! Image Synthesis with Semantic Diffusion Guidance
Authors:
Xihui Liu,
Dong Huk Park,
Samaneh Azadi,
Gong Zhang,
Arman Chopikyan,
Yuxiao Hu,
Humphrey Shi,
Anna Rohrbach,
Trevor Darrell
Abstract:
Controllable image synthesis models allow creation of diverse images based on text instructions or guidance from a reference image. Recently, denoising diffusion probabilistic models have been shown to generate more realistic imagery than prior methods, and have been successfully demonstrated in unconditional and class-conditional settings. We investigate fine-grained, continuous control of this model class, and introduce a novel unified framework for semantic diffusion guidance, which allows either language or image guidance, or both. Guidance is injected into a pretrained unconditional diffusion model using the gradient of image-text or image matching scores, without re-training the diffusion model. We explore CLIP-based language guidance as well as both content and style-based image guidance in a unified framework. Our text-guided synthesis approach can be applied to datasets without associated text annotations. We conduct experiments on FFHQ and LSUN datasets, and show results on fine-grained text-guided image synthesis, synthesis of images related to a style or content reference image, and examples with both textual and image guidance.
Submitted 5 December, 2022; v1 submitted 10 December, 2021;
originally announced December 2021.
-
A Computational Approach for Checking Compliance with European View and Sunlight Exposure Criteria
Authors:
Eleonora Brembilla,
Shervin Azadi,
Pirouz Nourian
Abstract:
The paper presents open-source computational workflows for assessing the "Exposure to sunlight" and "View out" criteria as defined in the European standard EN 17037 "Daylight in Buildings", issued by the European Committee for Standardization. In addition to these factors, the standard document also addresses daylight provision and protection from glare, both of which fall outside the scope of this paper. The purpose of the standard is stated as 'encouraging building designers to assess and ensure successfully daylit spaces'. The standard document proposes verification methods for performing such assessments, albeit without recommending a simulation procedure for computing the aforementioned criteria. The workflows proposed in this paper are arguably the first attempt to standardize these assessment methods using de-facto open-source standard technologies currently used in practice. The approach of this work is twofold: establishing that the compliance check can be systematically performed on a 3D model by a novel simulation tool developed by the authors, and highlighting the additional assumptions that need to be implemented to build a robust and unambiguous tool within existing open-source frameworks.
Submitted 21 September, 2021;
originally announced September 2021.
-
Semantic Bottleneck Scene Generation
Authors:
Samaneh Azadi,
Michael Tschannen,
Eric Tzeng,
Sylvain Gelly,
Trevor Darrell,
Mario Lucic
Abstract:
Coupling the high-fidelity generation capabilities of label-conditional image synthesis methods with the flexibility of unconditional generative models, we propose a semantic bottleneck GAN model for unconditional synthesis of complex scenes. We assume pixel-wise segmentation labels are available during training and use them to learn the scene structure. During inference, our model first synthesizes a realistic segmentation layout from scratch, then synthesizes a realistic scene conditioned on that layout. For the former, we use an unconditional progressive segmentation generation network that captures the distribution of realistic semantic scene layouts. For the latter, we use a conditional segmentation-to-image synthesis network that captures the distribution of photo-realistic images conditioned on the semantic layout. When trained end-to-end, the resulting model outperforms state-of-the-art generative models in unsupervised image synthesis on two challenging domains in terms of the Frechet Inception Distance and user-study evaluations. Moreover, we demonstrate the generated segmentation maps can be used as additional training data to strongly improve recent segmentation-to-image synthesis networks.
Submitted 26 November, 2019;
originally announced November 2019.
-
Time Distance: A Novel Collision Prediction and Path Planning Method
Authors:
Ali Analooee,
Shahram Azadi,
Reza Kazemi
Abstract:
In this paper, a new fast algorithm for path planning and a collision prediction framework for two-dimensional dynamically changing environments are introduced. The method is called Time Distance (TD) and benefits from the space-time space idea. First, the TD concept is defined as the time interval that must be spent in order for an object to reach another object or a location. Next, TD functions are derived as a function of location, velocity and geometry of objects. To construct the configuration-time space, TD functions in conjunction with another function named "Z-Infinity" are exploited. Finally, an explicit formula for creating the length-optimal collision-free path is presented. Length optimization in this formula is achieved using a function named "Route Function" which minimizes a cost function. Performance of the path planning algorithm is evaluated in simulations. Comparisons indicate that the algorithm is fast enough and capable of generating length-optimal paths, as the most effective methods do. Finally, as another usage of the TD functions, a collision prediction framework is presented. This framework consists of an explicit function which is a function of TD functions and calculates the TD of the vehicle with respect to all objects of the environment.
Submitted 6 April, 2023; v1 submitted 7 July, 2019;
originally announced July 2019.
-
Discriminator Rejection Sampling
Authors:
Samaneh Azadi,
Catherine Olsson,
Trevor Darrell,
Ian Goodfellow,
Augustus Odena
Abstract:
We propose a rejection sampling scheme using the discriminator of a GAN to approximately correct errors in the GAN generator distribution. We show that under quite strict assumptions, this will allow us to recover the data distribution exactly. We then examine where those strict assumptions break down and design a practical algorithm - called Discriminator Rejection Sampling (DRS) - that can be used on real data-sets. Finally, we demonstrate the efficacy of DRS on a mixture of Gaussians and on the SAGAN model, state-of-the-art in the image generation task at the time of developing this work. On ImageNet, we train an improved baseline that increases the Inception Score from 52.52 to 62.36 and reduces the Frechet Inception Distance from 18.65 to 14.79. We then use DRS to further improve on this baseline, improving the Inception Score to 76.08 and the FID to 13.75.
Submitted 26 February, 2019; v1 submitted 15 October, 2018;
originally announced October 2018.
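The idealized scheme mentioned in the abstract above can be sketched as follows. This is a minimal illustration, not the practical DRS algorithm from the paper: it assumes a perfect discriminator whose pre-sigmoid output (logit) equals the log density ratio between data and generator, so that a sample is accepted with probability exp(logit - max_logit). The function name `idealized_rejection_filter` and the use of the batch maximum as the bound are assumptions for illustration; the paper's adjustments for real models are not reproduced here.

```python
import math
import random


def idealized_rejection_filter(samples, logits, rng=None):
    """Keep each sample with probability exp(logit - max(logits)).

    samples: generator outputs (any objects).
    logits:  assumed discriminator pre-sigmoid scores, one per sample,
             treated here as exact log density ratios (a strong assumption).
    rng:     optional random.Random instance for reproducibility.
    """
    if not samples:
        return []
    rng = rng or random.Random(0)
    max_logit = max(logits)  # batch maximum stands in for the true upper bound
    kept = []
    for sample, logit in zip(samples, logits):
        accept_prob = math.exp(logit - max_logit)  # always in (0, 1]
        if rng.random() < accept_prob:
            kept.append(sample)
    return kept
```

Samples the discriminator scores as most data-like are always kept, while low-scoring samples are kept only rarely, which shifts the retained set toward the data distribution under the stated assumption.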
-
Compositional GAN: Learning Image-Conditional Binary Composition
Authors:
Samaneh Azadi,
Deepak Pathak,
Sayna Ebrahimi,
Trevor Darrell
Abstract:
Generative Adversarial Networks (GANs) can produce images of remarkable complexity and realism but are generally structured to sample from a single latent source ignoring the explicit spatial interaction between multiple entities that could be present in a scene. Capturing such complex interactions between different objects in the world, including their relative scaling, spatial layout, occlusion, or viewpoint transformation is a challenging problem. In this work, we propose a novel self-consistent Composition-by-Decomposition (CoDe) network to compose a pair of objects. Given object images from two distinct distributions, our model can generate a realistic composite image from their joint distribution following the texture and shape of the input objects. We evaluate our approach through qualitative experiments and user evaluations. Our results indicate that the learned model captures potential interactions between the two object domains, and generates realistic composed scenes at test time.
Submitted 28 March, 2019; v1 submitted 19 July, 2018;
originally announced July 2018.
-
Multi-Content GAN for Few-Shot Font Style Transfer
Authors:
Samaneh Azadi,
Matthew Fisher,
Vladimir Kim,
Zhaowen Wang,
Eli Shechtman,
Trevor Darrell
Abstract:
In this work, we focus on the challenge of taking partial observations of highly-stylized text and generalizing the observations to generate unobserved glyphs in the ornamented typeface. To generate a set of multi-content images following a consistent style from very few examples, we propose an end-to-end stacked conditional GAN model considering content along channels and style along network layers. Our proposed network transfers the style of given glyphs to the contents of unseen ones, capturing highly stylized fonts found in the real world such as those on movie posters or infographics. We seek to transfer both the typographic stylization (e.g., serifs and ears) as well as the textual stylization (e.g., color gradients and effects). We base our experiments on our collected data set including 10,000 fonts with different styles and demonstrate effective generalization from a very small number of observed glyphs.
Submitted 1 December, 2017;
originally announced December 2017.
-
Learning Detection with Diverse Proposals
Authors:
Samaneh Azadi,
Jiashi Feng,
Trevor Darrell
Abstract:
To predict a set of diverse and informative proposals with enriched representations, this paper introduces a differentiable Determinantal Point Process (DPP) layer that is able to augment the object detection architectures. Most modern object detection architectures, such as Faster R-CNN, learn to localize objects by minimizing deviations from the ground-truth but ignore correlation between multiple proposals and object categories. Non-Maximum Suppression (NMS) as a widely used proposal pruning scheme ignores label- and instance-level relations between object candidates resulting in multi-labeled detections. In the multi-class case, NMS selects boxes with the largest prediction scores ignoring the semantic relation between categories of potential election. In contrast, our trainable DPP layer, allowing for Learning Detection with Diverse Proposals (LDDP), considers both label-level contextual information and spatial layout relationships between proposals without increasing the number of parameters of the network, and thus improves location and category specifications of final detected bounding boxes substantially during both training and inference schemes. Furthermore, we show that LDDP keeps its superiority over Faster R-CNN even if the number of proposals generated by LDDP is only ~30% as many as those for Faster R-CNN.
Submitted 11 April, 2017;
originally announced April 2017.
-
Auxiliary Image Regularization for Deep CNNs with Noisy Labels
Authors:
Samaneh Azadi,
Jiashi Feng,
Stefanie Jegelka,
Trevor Darrell
Abstract:
Precisely-labeled data sets with a sufficient number of samples are very important for training deep convolutional neural networks (CNNs). However, many of the available real-world data sets contain erroneously labeled samples and those errors substantially hinder the learning of very accurate CNN models. In this work, we consider the problem of training a deep CNN model for image classification with mislabeled training samples - an issue that is common in real image data sets with tags supplied by amateur users. To solve this problem, we propose an auxiliary image regularization technique, optimized by the stochastic Alternating Direction Method of Multipliers (ADMM) algorithm, that automatically exploits the mutual context information among training images and encourages the model to select reliable images to robustify the learning process. Comprehensive experiments on benchmark data sets clearly demonstrate our proposed regularized CNN model is resistant to label noise in training data.
Submitted 2 March, 2016; v1 submitted 22 November, 2015;
originally announced November 2015.