-
Self-adaptive weights based on balanced residual decay rate for physics-informed neural networks and deep operator networks
Authors:
Wenqian Chen,
Amanda A. Howard,
Panos Stinis
Abstract:
Physics-informed deep learning has emerged as a promising alternative for solving partial differential equations. However, for complex problems, training these networks can still be challenging, often resulting in unsatisfactory accuracy and efficiency. In this work, we demonstrate that the failure of plain physics-informed neural networks arises from the significant discrepancy in the convergence…
▽ More
Physics-informed deep learning has emerged as a promising alternative for solving partial differential equations. However, for complex problems, training these networks can still be challenging, often resulting in unsatisfactory accuracy and efficiency. In this work, we demonstrate that the failure of plain physics-informed neural networks arises from the significant discrepancy in the convergence speed of residuals at different training points, where the slowest convergence speed dominates the overall solution convergence. Based on these observations, we propose a point-wise adaptive weighting method that balances the residual decay rate across different training points. The performance of our proposed adaptive weighting method is compared with current state-of-the-art adaptive weighting methods on benchmark problems for both physics-informed neural networks and physics-informed deep operator networks. Through extensive numerical results we demonstrate that our proposed approach of balanced residual decay rates offers several advantages, including bounded weights, high prediction accuracy, fast convergence speed, low training uncertainty, low computational cost and ease of hyperparameter tuning.
△ Less
Submitted 27 June, 2024;
originally announced July 2024.
-
Finite basis Kolmogorov-Arnold networks: domain decomposition for data-driven and physics-informed problems
Authors:
Amanda A. Howard,
Bruno Jacob,
Sarah H. Murphy,
Alexander Heinlein,
Panos Stinis
Abstract:
Kolmogorov-Arnold networks (KANs) have attracted attention recently as an alternative to multilayer perceptrons (MLPs) for scientific machine learning. However, KANs can be expensive to train, even for relatively small networks. Inspired by finite basis physics-informed neural networks (FBPINNs), in this work, we develop a domain decomposition method for KANs that allows for several small KANs to…
▽ More
Kolmogorov-Arnold networks (KANs) have attracted attention recently as an alternative to multilayer perceptrons (MLPs) for scientific machine learning. However, KANs can be expensive to train, even for relatively small networks. Inspired by finite basis physics-informed neural networks (FBPINNs), in this work, we develop a domain decomposition method for KANs that allows for several small KANs to be trained in parallel to give accurate solutions for multiscale problems. We show that finite basis KANs (FBKANs) can provide accurate results with noisy data and for physics-informed training.
△ Less
Submitted 28 June, 2024;
originally announced June 2024.
-
Custom Gradient Estimators are Straight-Through Estimators in Disguise
Authors:
Matt Schoenbauer,
Daniele Moro,
Lukasz Lew,
Andrew Howard
Abstract:
Quantization-aware training comes with a fundamental challenge: the derivative of quantization functions such as rounding are zero almost everywhere and nonexistent elsewhere. Various differentiable approximations of quantization functions have been proposed to address this issue. In this paper, we prove that when the learning rate is sufficiently small, a large class of weight gradient estimators…
▽ More
Quantization-aware training comes with a fundamental challenge: the derivative of quantization functions such as rounding are zero almost everywhere and nonexistent elsewhere. Various differentiable approximations of quantization functions have been proposed to address this issue. In this paper, we prove that when the learning rate is sufficiently small, a large class of weight gradient estimators is equivalent with the straight through estimator (STE). Specifically, after swap** in the STE and adjusting both the weight initialization and the learning rate in SGD, the model will train in almost exactly the same way as it did with the original gradient estimator. Moreover, we show that for adaptive learning rate algorithms like Adam, the same result can be seen without any modifications to the weight initialization and learning rate. We experimentally show that these results hold for both a small convolutional model trained on the MNIST dataset and for a ResNet50 model trained on ImageNet.
△ Less
Submitted 22 May, 2024; v1 submitted 8 May, 2024;
originally announced May 2024.
-
MobileNetV4 -- Universal Models for the Mobile Ecosystem
Authors:
Danfeng Qin,
Chas Leichner,
Manolis Delakis,
Marco Fornoni,
Shixin Luo,
Fan Yang,
Weijun Wang,
Colby Banbury,
Chengxi Ye,
Berkin Akin,
Vaibhav Aggarwal,
Tenghui Zhu,
Daniele Moro,
Andrew Howard
Abstract:
We present the latest generation of MobileNets, known as MobileNetV4 (MNv4), featuring universally efficient architecture designs for mobile devices. At its core, we introduce the Universal Inverted Bottleneck (UIB) search block, a unified and flexible structure that merges Inverted Bottleneck (IB), ConvNext, Feed Forward Network (FFN), and a novel Extra Depthwise (ExtraDW) variant. Alongside UIB,…
▽ More
We present the latest generation of MobileNets, known as MobileNetV4 (MNv4), featuring universally efficient architecture designs for mobile devices. At its core, we introduce the Universal Inverted Bottleneck (UIB) search block, a unified and flexible structure that merges Inverted Bottleneck (IB), ConvNext, Feed Forward Network (FFN), and a novel Extra Depthwise (ExtraDW) variant. Alongside UIB, we present Mobile MQA, an attention block tailored for mobile accelerators, delivering a significant 39% speedup. An optimized neural architecture search (NAS) recipe is also introduced which improves MNv4 search effectiveness. The integration of UIB, Mobile MQA and the refined NAS recipe results in a new suite of MNv4 models that are mostly Pareto optimal across mobile CPUs, DSPs, GPUs, as well as specialized accelerators like Apple Neural Engine and Google Pixel EdgeTPU - a characteristic not found in any other models tested. Finally, to further boost accuracy, we introduce a novel distillation technique. Enhanced by this technique, our MNv4-Hybrid-Large model delivers 87% ImageNet-1K accuracy, with a Pixel 8 EdgeTPU runtime of just 3.8ms.
△ Less
Submitted 16 April, 2024;
originally announced April 2024.
-
PikeLPN: Mitigating Overlooked Inefficiencies of Low-Precision Neural Networks
Authors:
Marina Neseem,
Conor McCullough,
Randy Hsin,
Chas Leichner,
Shan Li,
In Suk Chong,
Andrew G. Howard,
Lukasz Lew,
Sherief Reda,
Ville-Mikko Rautio,
Daniele Moro
Abstract:
Low-precision quantization is recognized for its efficacy in neural network optimization. Our analysis reveals that non-quantized elementwise operations which are prevalent in layers such as parameterized activation functions, batch normalization, and quantization scaling dominate the inference cost of low-precision models. These non-quantized elementwise operations are commonly overlooked in SOTA…
▽ More
Low-precision quantization is recognized for its efficacy in neural network optimization. Our analysis reveals that non-quantized elementwise operations which are prevalent in layers such as parameterized activation functions, batch normalization, and quantization scaling dominate the inference cost of low-precision models. These non-quantized elementwise operations are commonly overlooked in SOTA efficiency metrics such as Arithmetic Computation Effort (ACE). In this paper, we propose ACEv2 - an extended version of ACE which offers a better alignment with the inference cost of quantized models and their energy consumption on ML hardware. Moreover, we introduce PikeLPN, a model that addresses these efficiency issues by applying quantization to both elementwise operations and multiply-accumulate operations. In particular, we present a novel quantization technique for batch normalization layers named QuantNorm which allows for quantizing the batch normalization parameters without compromising the model performance. Additionally, we propose applying Double Quantization where the quantization scaling parameters are quantized. Furthermore, we recognize and resolve the issue of distribution mismatch in Separable Convolution layers by introducing Distribution-Heterogeneous Quantization which enables quantizing them to low-precision. PikeLPN achieves Pareto-optimality in efficiency-accuracy trade-off with up to 3X efficiency improvement compared to SOTA low-precision models.
△ Less
Submitted 29 March, 2024;
originally announced April 2024.
-
Energy Flexibility Potential in the Brewery Sector: A Multi-agent Based Simulation of 239 Danish Breweries
Authors:
Daniel Anthony Howard,
Zheng Grace Ma,
Jacob Alstrup Engvang,
Morten Hagenau,
Kathrine Lau Jorgensen,
Jonas Fausing Olesen,
Bo Nørregaard Jørgensen
Abstract:
The beverage industry is a typical food processing industry, accounts for significant energy consumption, and has flexible demands. However, the deployment of energy flexibility in the beverage industry is complex and challenging. Furthermore, activation of energy flexibility from the whole brewery industry is necessary to ensure grid stability. Therefore, this paper assesses the energy flexibilit…
▽ More
The beverage industry is a typical food processing industry, accounts for significant energy consumption, and has flexible demands. However, the deployment of energy flexibility in the beverage industry is complex and challenging. Furthermore, activation of energy flexibility from the whole brewery industry is necessary to ensure grid stability. Therefore, this paper assesses the energy flexibility potential of Denmark's brewery sector based on a multi-agent-based simulation. 239 individual brewery facilities are simulated, and each facility, as an agent, can interact with the energy system market and make decisions based on its underlying parameters and operational restrictions. The results show that the Danish breweries could save 1.56 % of electricity costs annually while maintaining operational security and reducing approximately 1745 tonnes of CO2 emissions. Furthermore, medium-size breweries could obtain higher relative benefits by providing energy flexibility, especially those producing lager and ale. The result also shows that the breweries' relative saving potential is electricity market-dependent.
△ Less
Submitted 26 January, 2024;
originally announced January 2024.
-
Challenge design roadmap
Authors:
Hugo Jair Escalante Balderas,
Isabelle Guyon,
Addison Howard,
Walter Reade,
Sebastien Treguer
Abstract:
Challenges can be seen as a type of game that motivates participants to solve serious tasks. As a result, competition organizers must develop effective game rules. However, these rules have multiple objectives beyond making the game enjoyable for participants. These objectives may include solving real-world problems, advancing scientific or technical areas, making scientific discoveries, and educa…
▽ More
Challenges can be seen as a type of game that motivates participants to solve serious tasks. As a result, competition organizers must develop effective game rules. However, these rules have multiple objectives beyond making the game enjoyable for participants. These objectives may include solving real-world problems, advancing scientific or technical areas, making scientific discoveries, and educating the public. In many ways, creating a challenge is similar to launching a product. It requires the same level of excitement and rigorous testing, and the goal is to attract ''customers'' in the form of participants. The process begins with a solid plan, such as a competition proposal that will eventually be submitted to an international conference and subjected to peer review. Although peer review does not guarantee quality, it does force organizers to consider the impact of their challenge, identify potential oversights, and generally improve its quality. This chapter provides guidelines for creating a strong plan for a challenge. The material draws on the preparation guidelines from organizations such as Kaggle 1 , ChaLearn 2 and Tailor 3 , as well as the NeurIPS proposal template, which some of the authors contributed to.
△ Less
Submitted 15 January, 2024;
originally announced January 2024.
-
Multifidelity domain decomposition-based physics-informed neural networks and operators for time-dependent problems
Authors:
Alexander Heinlein,
Amanda A. Howard,
Damien Beecroft,
Panos Stinis
Abstract:
Multiscale problems are challenging for neural network-based discretizations of differential equations, such as physics-informed neural networks (PINNs). This can be (partly) attributed to the so-called spectral bias of neural networks. To improve the performance of PINNs for time-dependent problems, a combination of multifidelity stacking PINNs and domain decomposition-based finite basis PINNs is…
▽ More
Multiscale problems are challenging for neural network-based discretizations of differential equations, such as physics-informed neural networks (PINNs). This can be (partly) attributed to the so-called spectral bias of neural networks. To improve the performance of PINNs for time-dependent problems, a combination of multifidelity stacking PINNs and domain decomposition-based finite basis PINNs is employed. In particular, to learn the high-fidelity part of the multifidelity model, a domain decomposition in time is employed. The performance is investigated for a pendulum and a two-frequency problem as well as the Allen-Cahn equation. It can be observed that the domain decomposition approach clearly improves the PINN and stacking PINN approaches. Finally, it is demonstrated that the FBPINN approach can be extended to multifidelity physics-informed deep operator networks.
△ Less
Submitted 6 June, 2024; v1 submitted 15 January, 2024;
originally announced January 2024.
-
Identifying Best Practice Melting Patterns in Induction Furnaces: A Data-Driven Approach Using Time Series KMeans Clustering and Multi-Criteria Decision Making
Authors:
Daniel Anthony Howard,
Bo Nørregaard Jørgensen,
Zheng Ma
Abstract:
Improving energy efficiency in industrial production processes is crucial for competitiveness, and compliance with climate policies. This paper introduces a data-driven approach to identify optimal melting patterns in induction furnaces. Through time-series K-means clustering the melting patterns could be classified into distinct clusters based on temperature profiles. Using the elbow method, 12 c…
▽ More
Improving energy efficiency in industrial production processes is crucial for competitiveness, and compliance with climate policies. This paper introduces a data-driven approach to identify optimal melting patterns in induction furnaces. Through time-series K-means clustering the melting patterns could be classified into distinct clusters based on temperature profiles. Using the elbow method, 12 clusters were identified, representing the range of melting patterns. Performance parameters such as melting time, energy-specific performance, and carbon cost were established for each cluster, indicating furnace efficiency and environmental impact. Multiple criteria decision-making methods including Simple Additive Weighting, Multiplicative Exponential Weighting, Technique for Order of Preference by Similarity to Ideal Solution, modified TOPSIS, and VlseKriterijumska Optimizacija I Kompromisno Resenje were utilized to determine the best-practice cluster. The study successfully identified the cluster with the best performance. Implementing the best practice operation resulted in an 8.6 % reduction in electricity costs, highlighting the potential energy savings in the foundry.
△ Less
Submitted 9 January, 2024;
originally announced January 2024.
-
Stacked networks improve physics-informed training: applications to neural networks and deep operator networks
Authors:
Amanda A Howard,
Sarah H Murphy,
Shady E Ahmed,
Panos Stinis
Abstract:
Physics-informed neural networks and operator networks have shown promise for effectively solving equations modeling physical systems. However, these networks can be difficult or impossible to train accurately for some systems of equations. We present a novel multifidelity framework for stacking physics-informed neural networks and operator networks that facilitates training. We successively build…
▽ More
Physics-informed neural networks and operator networks have shown promise for effectively solving equations modeling physical systems. However, these networks can be difficult or impossible to train accurately for some systems of equations. We present a novel multifidelity framework for stacking physics-informed neural networks and operator networks that facilitates training. We successively build a chain of networks, where the output at one step can act as a low-fidelity input for training the next step, gradually increasing the expressivity of the learned model. The equations imposed at each step of the iterative process can be the same or different (akin to simulated annealing). The iterative (stacking) nature of the proposed method allows us to progressively learn features of a solution that are hard to learn directly. Through benchmark problems including a nonlinear pendulum, the wave equation, and the viscous Burgers equation, we show how stacking can be used to improve the accuracy and reduce the required size of physics-informed neural networks and operator networks.
△ Less
Submitted 20 November, 2023; v1 submitted 11 November, 2023;
originally announced November 2023.
-
Efficient kernel surrogates for neural network-based regression
Authors:
Saad Qadeer,
Andrew Engel,
Amanda Howard,
Adam Tsou,
Max Vargas,
Panos Stinis,
Tony Chiang
Abstract:
Despite their immense promise in performing a variety of learning tasks, a theoretical understanding of the limitations of Deep Neural Networks (DNNs) has so far eluded practitioners. This is partly due to the inability to determine the closed forms of the learned functions, making it harder to study their generalization properties on unseen datasets. Recent work has shown that randomly initialize…
▽ More
Despite their immense promise in performing a variety of learning tasks, a theoretical understanding of the limitations of Deep Neural Networks (DNNs) has so far eluded practitioners. This is partly due to the inability to determine the closed forms of the learned functions, making it harder to study their generalization properties on unseen datasets. Recent work has shown that randomly initialized DNNs in the infinite width limit converge to kernel machines relying on a Neural Tangent Kernel (NTK) with known closed form. These results suggest, and experimental evidence corroborates, that empirical kernel machines can also act as surrogates for finite width DNNs. The high computational cost of assembling the full NTK, however, makes this approach infeasible in practice, motivating the need for low-cost approximations. In the current work, we study the performance of the Conjugate Kernel (CK), an efficient approximation to the NTK that has been observed to yield fairly similar results. For the regression problem of smooth functions and logistic regression classification, we show that the CK performance is only marginally worse than that of the NTK and, in certain cases, is shown to be superior. In particular, we establish bounds for the relative test losses, verify them with numerical tests, and identify the regularity of the kernel as the key determinant of performance. In addition to providing a theoretical grounding for using CKs instead of NTKs, our framework suggests a recipe for improving DNN accuracy inexpensively. We present a demonstration of this on the foundation model GPT-2 by comparing its performance on a classification task using a conventional approach and our prescription. We also show how our approach can be used to improve physics-informed operator network training for regression tasks as well as convolutional neural network training for vision classification tasks.
△ Less
Submitted 24 January, 2024; v1 submitted 28 October, 2023;
originally announced October 2023.
-
ReMaX: Relaxing for Better Training on Efficient Panoptic Segmentation
Authors:
Shuyang Sun,
Weijun Wang,
Qihang Yu,
Andrew Howard,
Philip Torr,
Liang-Chieh Chen
Abstract:
This paper presents a new mechanism to facilitate the training of mask transformers for efficient panoptic segmentation, democratizing its deployment. We observe that due to its high complexity, the training objective of panoptic segmentation will inevitably lead to much higher false positive penalization. Such unbalanced loss makes the training process of the end-to-end mask-transformer based arc…
▽ More
This paper presents a new mechanism to facilitate the training of mask transformers for efficient panoptic segmentation, democratizing its deployment. We observe that due to its high complexity, the training objective of panoptic segmentation will inevitably lead to much higher false positive penalization. Such unbalanced loss makes the training process of the end-to-end mask-transformer based architectures difficult, especially for efficient models. In this paper, we present ReMaX that adds relaxation to mask predictions and class predictions during training for panoptic segmentation. We demonstrate that via these simple relaxation techniques during training, our model can be consistently improved by a clear margin \textbf{without} any extra computational cost on inference. By combining our method with efficient backbones like MobileNetV3-Small, our method achieves new state-of-the-art results for efficient panoptic segmentation on COCO, ADE20K and Cityscapes. Code and pre-trained checkpoints will be available at \url{https://github.com/google-research/deeplab2}.
△ Less
Submitted 29 June, 2023;
originally announced June 2023.
-
Adversarial Nibbler: A Data-Centric Challenge for Improving the Safety of Text-to-Image Models
Authors:
Alicia Parrish,
Hannah Rose Kirk,
Jessica Quaye,
Charvi Rastogi,
Max Bartolo,
Oana Inel,
Juan Ciro,
Rafael Mosquera,
Addison Howard,
Will Cukierski,
D. Sculley,
Vijay Janapa Reddi,
Lora Aroyo
Abstract:
The generative AI revolution in recent years has been spurred by an expansion in compute power and data quantity, which together enable extensive pre-training of powerful text-to-image (T2I) models. With their greater capabilities to generate realistic and creative content, these T2I models like DALL-E, MidJourney, Imagen or Stable Diffusion are reaching ever wider audiences. Any unsafe behaviors…
▽ More
The generative AI revolution in recent years has been spurred by an expansion in compute power and data quantity, which together enable extensive pre-training of powerful text-to-image (T2I) models. With their greater capabilities to generate realistic and creative content, these T2I models like DALL-E, MidJourney, Imagen or Stable Diffusion are reaching ever wider audiences. Any unsafe behaviors inherited from pretraining on uncurated internet-scraped datasets thus have the potential to cause wide-reaching harm, for example, through generated images which are violent, sexually explicit, or contain biased and derogatory stereotypes. Despite this risk of harm, we lack systematic and structured evaluation datasets to scrutinize model behavior, especially adversarial attacks that bypass existing safety filters. A typical bottleneck in safety evaluation is achieving a wide coverage of different types of challenging examples in the evaluation set, i.e., identifying 'unknown unknowns' or long-tail problems. To address this need, we introduce the Adversarial Nibbler challenge. The goal of this challenge is to crowdsource a diverse set of failure modes and reward challenge participants for successfully finding safety vulnerabilities in current state-of-the-art T2I models. Ultimately, we aim to provide greater awareness of these issues and assist developers in improving the future safety and reliability of generative AI models. Adversarial Nibbler is a data-centric challenge, part of the DataPerf challenge suite, organized and supported by Kaggle and MLCommons.
△ Less
Submitted 22 May, 2023;
originally announced May 2023.
-
A multifidelity approach to continual learning for physical systems
Authors:
Amanda Howard,
Yucheng Fu,
Panos Stinis
Abstract:
We introduce a novel continual learning method based on multifidelity deep neural networks. This method learns the correlation between the output of previously trained models and the desired output of the model on the current training dataset, limiting catastrophic forgetting. On its own the multifidelity continual learning method shows robust results that limit forgetting across several datasets.…
▽ More
We introduce a novel continual learning method based on multifidelity deep neural networks. This method learns the correlation between the output of previously trained models and the desired output of the model on the current training dataset, limiting catastrophic forgetting. On its own the multifidelity continual learning method shows robust results that limit forgetting across several datasets. Additionally, we show that the multifidelity method can be combined with existing continual learning methods, including replay and memory aware synapses, to further limit catastrophic forgetting. The proposed continual learning method is especially suited for physical problems where the data satisfy the same physical laws on each domain, or for physics-informed neural networks, because in these cases we expect there to be a strong correlation between the output of the previous model and the model on the current training domain.
△ Less
Submitted 9 February, 2024; v1 submitted 7 April, 2023;
originally announced April 2023.
-
DeepSpace: Dynamic Spatial and Source Cue Based Source Separation for Dialog Enhancement
Authors:
Aaron Master,
Lie Lu,
Jonas Samuelsson,
Heidi-Maria Lehtonen,
Scott Norcross,
Nathan Swedlow,
Audrey Howard
Abstract:
Dialog Enhancement (DE) is a feature which allows a user to increase the level of dialog in TV or movie content relative to non-dialog sounds. When only the original mix is available, DE is "unguided," and requires source separation. In this paper, we describe the DeepSpace system, which performs source separation using both dynamic spatial cues and source cues to support unguided DE. Its technolo…
▽ More
Dialog Enhancement (DE) is a feature which allows a user to increase the level of dialog in TV or movie content relative to non-dialog sounds. When only the original mix is available, DE is "unguided," and requires source separation. In this paper, we describe the DeepSpace system, which performs source separation using both dynamic spatial cues and source cues to support unguided DE. Its technologies include spatio-level filtering (SLF) and deep-learning based dialog classification and denoising. Using subjective listening tests, we show that DeepSpace demonstrates significantly improved overall performance relative to state-of-the-art systems available for testing. We explore the feasibility of using existing automated metrics to evaluate unguided DE systems.
△ Less
Submitted 22 February, 2023; v1 submitted 16 February, 2023;
originally announced February 2023.
-
A Hybrid Deep Neural Operator/Finite Element Method for Ice-Sheet Modeling
Authors:
QiZhi He,
Mauro Perego,
Amanda A. Howard,
George Em Karniadakis,
Panos Stinis
Abstract:
One of the most challenging and consequential problems in climate modeling is to provide probabilistic projections of sea level rise. A large part of the uncertainty of sea level projections is due to uncertainty in ice sheet dynamics. At the moment, accurate quantification of the uncertainty is hindered by the cost of ice sheet computational models. In this work, we develop a hybrid approach to a…
▽ More
One of the most challenging and consequential problems in climate modeling is to provide probabilistic projections of sea level rise. A large part of the uncertainty of sea level projections is due to uncertainty in ice sheet dynamics. At the moment, accurate quantification of the uncertainty is hindered by the cost of ice sheet computational models. In this work, we develop a hybrid approach to approximate existing ice sheet computational models at a fraction of their cost. Our approach consists of replacing the finite element model for the momentum equations for the ice velocity, the most expensive part of an ice sheet model, with a Deep Operator Network, while retaining a classic finite element discretization for the evolution of the ice thickness. We show that the resulting hybrid model is very accurate and it is an order of magnitude faster than the traditional finite element model. Further, a distinctive feature of the proposed model compared to other neural network approaches, is that it can handle high-dimensional parameter spaces (parameter fields) such as the basal friction at the bed of the glacier, and can therefore be used for generating samples for uncertainty quantification. We study the impact of hyper-parameters, number of unknowns and correlation length of the parameter distribution on the training and accuracy of the Deep Operator Network on a synthetic ice sheet model. We then target the evolution of the Humboldt glacier in Greenland and show that our hybrid model can provide accurate statistics of the glacier mass loss and can be effectively used to accelerate the quantification of uncertainty.
△ Less
Submitted 26 January, 2023;
originally announced January 2023.
-
On Label Granularity and Object Localization
Authors:
Elijah Cole,
Kimberly Wilber,
Grant Van Horn,
Xuan Yang,
Marco Fornoni,
Pietro Perona,
Serge Belongie,
Andrew Howard,
Oisin Mac Aodha
Abstract:
Weakly supervised object localization (WSOL) aims to learn representations that encode object location using only image-level category labels. However, many objects can be labeled at different levels of granularity. Is it an animal, a bird, or a great horned owl? Which image-level labels should we use? In this paper we study the role of label granularity in WSOL. To facilitate this investigation w…
▽ More
Weakly supervised object localization (WSOL) aims to learn representations that encode object location using only image-level category labels. However, many objects can be labeled at different levels of granularity. Is it an animal, a bird, or a great horned owl? Which image-level labels should we use? In this paper we study the role of label granularity in WSOL. To facilitate this investigation we introduce iNatLoc500, a new large-scale fine-grained benchmark dataset for WSOL. Surprisingly, we find that choosing the right training label granularity provides a much larger performance boost than choosing the best WSOL algorithm. We also show that changing the label granularity can significantly improve data efficiency.
△ Less
Submitted 20 July, 2022;
originally announced July 2022.
-
Multifidelity Deep Operator Networks For Data-Driven and Physics-Informed Problems
Authors:
Amanda A. Howard,
Mauro Perego,
George E. Karniadakis,
Panos Stinis
Abstract:
Operator learning for complex nonlinear systems is increasingly common in modeling multi-physics and multi-scale systems. However, training such high-dimensional operators requires a large amount of expensive, high-fidelity data, either from experiments or simulations. In this work, we present a composite Deep Operator Network (DeepONet) for learning using two datasets with different levels of fid…
▽ More
Operator learning for complex nonlinear systems is increasingly common in modeling multi-physics and multi-scale systems. However, training such high-dimensional operators requires a large amount of expensive, high-fidelity data, either from experiments or simulations. In this work, we present a composite Deep Operator Network (DeepONet) for learning using two datasets with different levels of fidelity to accurately learn complex operators when sufficient high-fidelity data is not available. Additionally, we demonstrate that the presence of low-fidelity data can improve the predictions of physics-informed learning with DeepONets. We demonstrate the new multi-fidelity training in diverse examples, including modeling of the ice-sheet dynamics of the Humboldt glacier, Greenland, using two different fidelity models and also using the same physical model at two different resolutions.
△ Less
Submitted 21 November, 2023; v1 submitted 19 April, 2022;
originally announced April 2022.
-
Machine Learning in Heterogeneous Porous Materials
Authors:
Marta D'Elia,
Hang Deng,
Cedric Fraces,
Krishna Garikipati,
Lori Graham-Brady,
Amanda Howard,
George Karniadakis,
Vahid Keshavarzzadeh,
Robert M. Kirby,
Nathan Kutz,
Chunhui Li,
Xing Liu,
Hannah Lu,
Pania Newell,
Daniel O'Malley,
Masa Prodanovic,
Gowri Srinivasan,
Alexandre Tartakovsky,
Daniel M. Tartakovsky,
Hamdi Tchelepi,
Bozo Vazic,
Hari Viswanathan,
Hongkyu Yoon,
Piotr Zarzycki
Abstract:
The "Workshop on Machine learning in heterogeneous porous materials" brought together international scientific communities of applied mathematics, porous media, and material sciences with experts in the areas of heterogeneous materials, machine learning (ML) and applied mathematics to identify how ML can advance materials research. Within the scope of ML and materials research, the goal of the wor…
▽ More
The "Workshop on Machine learning in heterogeneous porous materials" brought together international scientific communities of applied mathematics, porous media, and material sciences with experts in the areas of heterogeneous materials, machine learning (ML) and applied mathematics to identify how ML can advance materials research. Within the scope of ML and materials research, the goal of the workshop was to discuss the state-of-the-art in each community, promote crosstalk and accelerate multi-disciplinary collaborative research, and identify challenges and opportunities. As the end result, four topic areas were identified: ML in predicting materials properties, and discovery and design of novel materials, ML in porous and fractured media and time-dependent phenomena, Multi-scale modeling in heterogeneous porous materials via ML, and Discovery of materials constitutive laws and new governing equations. This workshop was part of the AmeriMech Symposium series sponsored by the National Academies of Sciences, Engineering and Medicine and the U.S. National Committee on Theoretical and Applied Mechanics.
△ Less
Submitted 4 February, 2022;
originally announced February 2022.
-
MOSAIC: Mobile Segmentation via decoding Aggregated Information and encoded Context
Authors:
Weijun Wang,
Andrew Howard
Abstract:
We present a next-generation neural network architecture, MOSAIC, for efficient and accurate semantic image segmentation on mobile devices. MOSAIC is designed using commonly supported neural operations by diverse mobile hardware platforms for flexible deployment across various mobile platforms. With a simple asymmetric encoder-decoder structure which consists of an efficient multi-scale context en…
▽ More
We present a next-generation neural network architecture, MOSAIC, for efficient and accurate semantic image segmentation on mobile devices. MOSAIC is designed using commonly supported neural operations by diverse mobile hardware platforms for flexible deployment across various mobile platforms. With a simple asymmetric encoder-decoder structure which consists of an efficient multi-scale context encoder and a light-weight hybrid decoder to recover spatial details from aggregated information, MOSAIC achieves new state-of-the-art performance while balancing accuracy and computational cost. Deployed on top of a tailored feature extraction backbone based on a searched classification network, MOSAIC achieves a 5% absolute accuracy gain surpassing the current industry standard MLPerf models and state-of-the-art architectures.
△ Less
Submitted 21 December, 2021;
originally announced December 2021.
-
The self-supervised spectral-spatial attention-based transformer network for automated, accurate prediction of crop nitrogen status from UAV imagery
Authors:
Xin Zhang,
Liangxiu Han,
Tam Sobeih,
Lewis Lappin,
Mark Lee,
Andew Howard,
Aron Kisdi
Abstract:
Nitrogen (N) fertilizer is routinely applied by farmers to increase crop yields. At present, farmers often over-apply N fertilizer in some locations or at certain times because they do not have high-resolution crop N status data. N-use efficiency can be low, with the remaining N lost to the environment, resulting in higher production costs and environmental pollution. Accurate and timely estimatio…
▽ More
Nitrogen (N) fertilizer is routinely applied by farmers to increase crop yields. At present, farmers often over-apply N fertilizer in some locations or at certain times because they do not have high-resolution crop N status data. N-use efficiency can be low, with the remaining N lost to the environment, resulting in higher production costs and environmental pollution. Accurate and timely estimation of N status in crops is crucial to improving crop** systems' economic and environmental sustainability. Destructive approaches based on plant tissue analysis are time consuming and impractical over large fields. Recent advances in remote sensing and deep learning have shown promise in addressing the aforementioned challenges in a non-destructive way. In this work, we propose a novel deep learning framework: a self-supervised spectral-spatial attention-based vision transformer (SSVT). The proposed SSVT introduces a Spectral Attention Block (SAB) and a Spatial Interaction Block (SIB), which allows for simultaneous learning of both spatial and spectral features from UAV digital aerial imagery, for accurate N status prediction in wheat fields. Moreover, the proposed framework introduces local-to-global self-supervised learning to help train the model from unlabelled data. The proposed SSVT has been compared with five state-of-the-art models including: ResNet, RegNet, EfficientNet, EfficientNetV2 and the original vision transformer on both testing and independent datasets. The proposed approach achieved high accuracy (0.96) with good generalizability and reproducibility for wheat N status estimation.
△ Less
Submitted 15 February, 2022; v1 submitted 12 November, 2021;
originally announced November 2021.
-
Mitigating Racial Biases in Toxic Language Detection with an Equity-Based Ensemble Framework
Authors:
Matan Halevy,
Camille Harris,
Amy Bruckman,
Diyi Yang,
Ayanna Howard
Abstract:
Recent research has demonstrated how racial biases against users who write African American English exists in popular toxic language datasets. While previous work has focused on a single fairness criteria, we propose to use additional descriptive fairness metrics to better understand the source of these biases. We demonstrate that different benchmark classifiers, as well as two in-process bias-rem…
▽ More
Recent research has demonstrated how racial biases against users who write African American English exists in popular toxic language datasets. While previous work has focused on a single fairness criteria, we propose to use additional descriptive fairness metrics to better understand the source of these biases. We demonstrate that different benchmark classifiers, as well as two in-process bias-remediation techniques, propagate racial biases even in a larger corpus. We then propose a novel ensemble-framework that uses a specialized classifier that is fine-tuned to the African American English dialect. We show that our proposed framework substantially reduces the racial biases that the model learns from these datasets. We demonstrate how the ensemble framework improves fairness metrics across all sample datasets with minimal impact on the classification performance, and provide empirical evidence in its ability to unlearn the annotation biases towards authors who use African American English.
** Please note that this work may contain examples of offensive words and phrases.
△ Less
Submitted 27 September, 2021;
originally announced September 2021.
-
Bridging the Gap Between Object Detection and User Intent via Query-Modulation
Authors:
Marco Fornoni,
Chaochao Yan,
Liangchen Luo,
Kimberly Wilber,
Alex Stark,
Yin Cui,
Boqing Gong,
Andrew Howard
Abstract:
When interacting with objects through cameras, or pictures, users often have a specific intent. For example, they may want to perform a visual search. With most object detection models relying on image pixels as their sole input, undesired results are not uncommon. Most typically: lack of a high-confidence detection on the object of interest, or detection with a wrong class label. The issue is esp…
▽ More
When interacting with objects through cameras, or pictures, users often have a specific intent. For example, they may want to perform a visual search. With most object detection models relying on image pixels as their sole input, undesired results are not uncommon. Most typically: lack of a high-confidence detection on the object of interest, or detection with a wrong class label. The issue is especially severe when operating capacity-constrained mobile object detectors on-device. In this paper we investigate techniques to modulate mobile detectors to explicitly account for the user intent, expressed as an embedding of a simple query. Compared to standard detectors, query-modulated detectors show superior performance at detecting objects for a given user query. Thanks to large-scale training data synthesized from standard object detection annotations, query-modulated detectors also outperform a specialized referring expression recognition system. Query-modulated detectors can also be trained to simultaneously solve for both localizing a user query and standard detection, even outperforming standard mobile detectors at the canonical COCO task.
△ Less
Submitted 3 August, 2022; v1 submitted 18 June, 2021;
originally announced June 2021.
-
Physics-informed CoKriging model of a redox flow battery
Authors:
Amanda A. Howard,
Alexandre M. Tartakovsky
Abstract:
Redox flow batteries (RFBs) offer the capability to store large amounts of energy cheaply and efficiently, however, there is a need for fast and accurate models of the charge-discharge curve of a RFB to potentially improve the battery capacity and performance. We develop a multifidelity model for predicting the charge-discharge curve of a RFB. In the multifidelity model, we use the Physics-informe…
▽ More
Redox flow batteries (RFBs) offer the capability to store large amounts of energy cheaply and efficiently, however, there is a need for fast and accurate models of the charge-discharge curve of a RFB to potentially improve the battery capacity and performance. We develop a multifidelity model for predicting the charge-discharge curve of a RFB. In the multifidelity model, we use the Physics-informed CoKriging (CoPhIK) machine learning method that is trained on experimental data and constrained by the so-called "zero-dimensional" physics-based model. Here we demonstrate that the model shows good agreement with experimental results and significant improvements over existing zero-dimensional models. We show that the proposed model is robust as it is not sensitive to the input parameters in the zero-dimensional model. We also show that only a small amount of high-fidelity experimental datasets are needed for accurate predictions for the range of considered input parameters, which include current density, flow rate, and initial concentrations.
△ Less
Submitted 16 June, 2021;
originally announced June 2021.
-
BasisNet: Two-stage Model Synthesis for Efficient Inference
Authors:
Mingda Zhang,
Chun-Te Chu,
Andrey Zhmoginov,
Andrew Howard,
Brendan Jou,
Yukun Zhu,
Li Zhang,
Rebecca Hwa,
Adriana Kovashka
Abstract:
In this work, we present BasisNet which combines recent advancements in efficient neural network architectures, conditional computation, and early termination in a simple new form. Our approach incorporates a lightweight model to preview the input and generate input-dependent combination coefficients, which later controls the synthesis of a more accurate specialist model to make final prediction.…
▽ More
In this work, we present BasisNet which combines recent advancements in efficient neural network architectures, conditional computation, and early termination in a simple new form. Our approach incorporates a lightweight model to preview the input and generate input-dependent combination coefficients, which later controls the synthesis of a more accurate specialist model to make final prediction. The two-stage model synthesis strategy can be applied to any network architectures and both stages are jointly trained. We also show that proper training recipes are critical for increasing generalizability for such high capacity neural networks. On ImageNet classification benchmark, our BasisNet with MobileNets as backbone demonstrated clear advantage on accuracy-efficiency trade-off over several strong baselines. Specifically, BasisNet-MobileNetV3 obtained 80.3% top-1 accuracy with only 290M Multiply-Add operations, halving the computational cost of previous state-of-the-art without sacrificing accuracy. With early termination, the average cost can be further reduced to 198M MAdds while maintaining accuracy of 80.0% on ImageNet.
△ Less
Submitted 6 May, 2021;
originally announced May 2021.
-
SpotPatch: Parameter-Efficient Transfer Learning for Mobile Object Detection
Authors:
Keren Ye,
Adriana Kovashka,
Mark Sandler,
Menglong Zhu,
Andrew Howard,
Marco Fornoni
Abstract:
Deep learning based object detectors are commonly deployed on mobile devices to solve a variety of tasks. For maximum accuracy, each detector is usually trained to solve one single specific task, and comes with a completely independent set of parameters. While this guarantees high performance, it is also highly inefficient, as each model has to be separately downloaded and stored. In this paper we…
▽ More
Deep learning based object detectors are commonly deployed on mobile devices to solve a variety of tasks. For maximum accuracy, each detector is usually trained to solve one single specific task, and comes with a completely independent set of parameters. While this guarantees high performance, it is also highly inefficient, as each model has to be separately downloaded and stored. In this paper we address the question: can task-specific detectors be trained and represented as a shared set of weights, plus a very small set of additional weights for each task? The main contributions of this paper are the following: 1) we perform the first systematic study of parameter-efficient transfer learning techniques for object detection problems; 2) we propose a technique to learn a model patch with a size that is dependent on the difficulty of the task to be learned, and validate our approach on 10 different object detection tasks. Our approach achieves similar accuracy as previously proposed approaches, while being significantly more compact.
△ Less
Submitted 4 January, 2021;
originally announced January 2021.
-
Large-Scale Generative Data-Free Distillation
Authors:
Liangchen Luo,
Mark Sandler,
Zi Lin,
Andrey Zhmoginov,
Andrew Howard
Abstract:
Knowledge distillation is one of the most popular and effective techniques for knowledge transfer, model compression and semi-supervised learning. Most existing distillation approaches require the access to original or augmented training samples. But this can be problematic in practice due to privacy, proprietary and availability concerns. Recent work has put forward some methods to tackle this pr…
▽ More
Knowledge distillation is one of the most popular and effective techniques for knowledge transfer, model compression and semi-supervised learning. Most existing distillation approaches require the access to original or augmented training samples. But this can be problematic in practice due to privacy, proprietary and availability concerns. Recent work has put forward some methods to tackle this problem, but they are either highly time-consuming or unable to scale to large datasets. To this end, we propose a new method to train a generative image model by leveraging the intrinsic normalization layers' statistics of the trained teacher network. This enables us to build an ensemble of generators without training data that can efficiently produce substitute inputs for subsequent distillation. The proposed method pushes forward the data-free distillation performance on CIFAR-10 and CIFAR-100 to 95.02% and 77.02% respectively. Furthermore, we are able to scale it to ImageNet dataset, which to the best of our knowledge, has never been done using generative models in a data-free setting.
△ Less
Submitted 10 December, 2020;
originally announced December 2020.
-
Multi-path Neural Networks for On-device Multi-domain Visual Classification
Authors:
Qifei Wang,
Junjie Ke,
Joshua Greaves,
Grace Chu,
Gabriel Bender,
Luciano Sbaiz,
Alec Go,
Andrew Howard,
Feng Yang,
Ming-Hsuan Yang,
Jeff Gilbert,
Peyman Milanfar
Abstract:
Learning multiple domains/tasks with a single model is important for improving data efficiency and lowering inference cost for numerous vision tasks, especially on resource-constrained mobile devices. However, hand-crafting a multi-domain/task model can be both tedious and challenging. This paper proposes a novel approach to automatically learn a multi-path network for multi-domain visual classifi…
▽ More
Learning multiple domains/tasks with a single model is important for improving data efficiency and lowering inference cost for numerous vision tasks, especially on resource-constrained mobile devices. However, hand-crafting a multi-domain/task model can be both tedious and challenging. This paper proposes a novel approach to automatically learn a multi-path network for multi-domain visual classification on mobile devices. The proposed multi-path network is learned from neural architecture search by applying one reinforcement learning controller for each domain to select the best path in the super-network created from a MobileNetV3-like search space. An adaptive balanced domain prioritization algorithm is proposed to balance optimizing the joint model on multiple domains simultaneously. The determined multi-path model selectively shares parameters across domains in shared nodes while kee** domain-specific parameters within non-shared nodes in individual domain paths. This approach effectively reduces the total number of parameters and FLOPS, encouraging positive knowledge transfer while mitigating negative interference across domains. Extensive evaluations on the Visual Decathlon dataset demonstrate that the proposed multi-path model achieves state-of-the-art performance in terms of accuracy, model size, and FLOPS against other approaches using MobileNetV3-like architectures. Furthermore, the proposed method improves average accuracy over learning single-domain models individually, and reduces the total number of parameters and FLOPS by 78% and 32% respectively, compared to the approach that simply bundles single-domain models for multi-domain learning.
△ Less
Submitted 8 January, 2021; v1 submitted 10 October, 2020;
originally announced October 2020.
-
Learning Unknown Physics of non-Newtonian Fluids
Authors:
Brandon Reyes,
Amanda A. Howard,
Paris Perdikaris,
Alexandre M. Tartakovsky
Abstract:
We extend the physics-informed neural network (PINN) method to learn viscosity models of two non-Newtonian systems (polymer melts and suspensions of particles) using only velocity measurements. The PINN-inferred viscosity models agree with the empirical models for shear rates with large absolute values but deviate for shear rates near zero where the analytical models have an unphysical singularity…
▽ More
We extend the physics-informed neural network (PINN) method to learn viscosity models of two non-Newtonian systems (polymer melts and suspensions of particles) using only velocity measurements. The PINN-inferred viscosity models agree with the empirical models for shear rates with large absolute values but deviate for shear rates near zero where the analytical models have an unphysical singularity. Once a viscosity model is learned, we use the PINN method to solve the momentum conservation equation for non-Newtonian fluid flow using only the boundary conditions.
△ Less
Submitted 26 August, 2020;
originally announced September 2020.
-
Discovering Multi-Hardware Mobile Models via Architecture Search
Authors:
Grace Chu,
Okan Arikan,
Gabriel Bender,
Weijun Wang,
Achille Brighton,
Pieter-Jan Kindermans,
Hanxiao Liu,
Berkin Akin,
Suyog Gupta,
Andrew Howard
Abstract:
Hardware-aware neural architecture designs have been predominantly focusing on optimizing model performance on single hardware and model development complexity, where another important factor, model deployment complexity, has been largely ignored. In this paper, we argue that, for applications that may be deployed on multiple hardware, having different single-hardware models across the deployed ha…
▽ More
Hardware-aware neural architecture designs have been predominantly focusing on optimizing model performance on single hardware and model development complexity, where another important factor, model deployment complexity, has been largely ignored. In this paper, we argue that, for applications that may be deployed on multiple hardware, having different single-hardware models across the deployed hardware makes it hard to guarantee consistent outputs across hardware and duplicates engineering work for debugging and fixing. To minimize such deployment cost, we propose an alternative solution, multi-hardware models, where a single architecture is developed for multiple hardware. With thoughtful search space design and incorporating the proposed multi-hardware metrics in neural architecture search, we discover multi-hardware models that give state-of-the-art (SoTA) performance across multiple hardware in both average and worse case scenarios. For performance on individual hardware, the single multi-hardware model yields similar or better results than SoTA performance on accelerators like GPU, DSP and EdgeTPU which was achieved by different models, while having similar performance with MobilenetV3 Large Minimalistic model on mobile CPU.
△ Less
Submitted 23 April, 2021; v1 submitted 18 August, 2020;
originally announced August 2020.
-
A community-powered search of machine learning strategy space to find NMR property prediction models
Authors:
Lars A. Bratholm,
Will Gerrard,
Brandon Anderson,
Shaojie Bai,
Sunghwan Choi,
Lam Dang,
Pavel Hanchar,
Addison Howard,
Guillaume Huard,
Sanghoon Kim,
Zico Kolter,
Risi Kondor,
Mordechai Kornbluth,
Youhan Lee,
Youngsoo Lee,
Jonathan P. Mailoa,
Thanh Tu Nguyen,
Milos Popovic,
Goran Rakocevic,
Walter Reade,
Wonho Song,
Luka Stojanovic,
Erik H. Thiede,
Nebojsa Tijanic,
Andres Torrubia
, et al. (4 additional authors not shown)
Abstract:
The rise of machine learning (ML) has created an explosion in the potential strategies for using data to make scientific predictions. For physical scientists wishing to apply ML strategies to a particular domain, it can be difficult to assess in advance what strategy to adopt within a vast space of possibilities. Here we outline the results of an online community-powered effort to swarm search the…
▽ More
The rise of machine learning (ML) has created an explosion in the potential strategies for using data to make scientific predictions. For physical scientists wishing to apply ML strategies to a particular domain, it can be difficult to assess in advance what strategy to adopt within a vast space of possibilities. Here we outline the results of an online community-powered effort to swarm search the space of ML strategies and develop algorithms for predicting atomic-pairwise nuclear magnetic resonance (NMR) properties in molecules. Using an open-source dataset, we worked with Kaggle to design and host a 3-month competition which received 47,800 ML model predictions from 2,700 teams in 84 countries. Within 3 weeks, the Kaggle community produced models with comparable accuracy to our best previously published "in-house" efforts. A meta-ensemble model constructed as a linear combination of the top predictions has a prediction accuracy which exceeds that of any individual model, 7-19x better than our previous state-of-the-art. The results highlight the potential of transformer architectures for predicting quantum mechanical (QM) molecular properties.
△ Less
Submitted 13 August, 2020;
originally announced August 2020.
-
Accessible Computer Science for K-12 Students with Hearing Impairments
Authors:
Meenakshi Das,
Daniela Marghitu,
Fatemeh Jamshidi,
Mahender Mandala,
Ayanna Howard
Abstract:
An inclusive science, technology, engineering and mathematics (STEM) workforce is needed to maintain America's leadership in the scientific enterprise. Increasing the participation of underrepresented groups in STEM, including persons with disabilities, requires national attention to fully engage the nation's citizens in transforming its STEM enterprise. To address this need, a number of initiativ…
▽ More
An inclusive science, technology, engineering and mathematics (STEM) workforce is needed to maintain America's leadership in the scientific enterprise. Increasing the participation of underrepresented groups in STEM, including persons with disabilities, requires national attention to fully engage the nation's citizens in transforming its STEM enterprise. To address this need, a number of initiatives, such as AccessCSforALL, Bootstrap, and CSforAll, are making efforts to make Computer Science inclusive to the 7.4 million K-12 students with disabilities in the U.S. Of special interest to our project are those K-12 students with hearing impairments. American Sign Language (ASL) is the primary means of communication for an estimated 500,000 people in the United States, yet there are limited online resources providing Computer Science instruction in ASL. This paper introduces a new project designed to support Deaf and Hard of Hearing (DHH) K-12 students and sign interpreters in acquiring knowledge of complex Computer Science concepts. We discuss the motivation for the project and an early design of the accessible block-based Computer Science curriculum to engage DHH students in hands-on computing education.
△ Less
Submitted 16 July, 2020;
originally announced July 2020.
-
A Bayesian Framework for Nash Equilibrium Inference in Human-Robot Parallel Play
Authors:
Shray Bansal,
** Xu,
Ayanna Howard,
Charles Isbell
Abstract:
We consider shared workspace scenarios with humans and robots acting to achieve independent goals, termed as parallel play. We model these as general-sum games and construct a framework that utilizes the Nash equilibrium solution concept to consider the interactive effect of both agents while planning. We find multiple Pareto-optimal equilibria in these tasks. We hypothesize that people act by cho…
▽ More
We consider shared workspace scenarios with humans and robots acting to achieve independent goals, termed as parallel play. We model these as general-sum games and construct a framework that utilizes the Nash equilibrium solution concept to consider the interactive effect of both agents while planning. We find multiple Pareto-optimal equilibria in these tasks. We hypothesize that people act by choosing an equilibrium based on social norms and their personalities. To enable coordination, we infer the equilibrium online using a probabilistic model that includes these two factors and use it to select the robot's action. We apply our approach to a close-proximity pick-and-place task involving a robot and a simulated human with three potential behaviors - defensive, selfish, and norm-following. We showed that using a Bayesian approach to infer the equilibrium enables the robot to complete the task with less than half the number of collisions while also reducing the task execution time as compared to the best baseline. We also performed a study with human participants interacting either with other humans or with different robot agents and observed that our proposed approach performs similar to human-human parallel play interactions. The code is available at https://github.com/shray/bayes-nash
△ Less
Submitted 10 June, 2020;
originally announced June 2020.
-
Non-discriminative data or weak model? On the relative importance of data and model resolution
Authors:
Mark Sandler,
Jonathan Baccash,
Andrey Zhmoginov,
Andrew Howard
Abstract:
We explore the question of how the resolution of the input image ("input resolution") affects the performance of a neural network when compared to the resolution of the hidden layers ("internal resolution"). Adjusting these characteristics is frequently used as a hyperparameter providing a trade-off between model performance and accuracy. An intuitive interpretation is that the reduced information…
▽ More
We explore the question of how the resolution of the input image ("input resolution") affects the performance of a neural network when compared to the resolution of the hidden layers ("internal resolution"). Adjusting these characteristics is frequently used as a hyperparameter providing a trade-off between model performance and accuracy. An intuitive interpretation is that the reduced information content in the low-resolution input causes decay in the accuracy. In this paper, we show that up to a point, the input resolution alone plays little role in the network performance, and it is the internal resolution that is the critical driver of model quality. We then build on these insights to develop novel neural network architectures that we call \emph{Isometric Neural Networks}. These models maintain a fixed internal resolution throughout their entire depth. We demonstrate that they lead to high accuracy models with low activation footprint and parameter count.
△ Less
Submitted 17 October, 2019; v1 submitted 7 September, 2019;
originally announced September 2019.
-
Visual Wake Words Dataset
Authors:
Aakanksha Chowdhery,
Pete Warden,
Jonathon Shlens,
Andrew Howard,
Rocky Rhodes
Abstract:
The emergence of Internet of Things (IoT) applications requires intelligence on the edge. Microcontrollers provide a low-cost compute platform to deploy intelligent IoT applications using machine learning at scale, but have extremely limited on-chip memory and compute capability. To deploy computer vision on such devices, we need tiny vision models that fit within a few hundred kilobytes of memory…
▽ More
The emergence of Internet of Things (IoT) applications requires intelligence on the edge. Microcontrollers provide a low-cost compute platform to deploy intelligent IoT applications using machine learning at scale, but have extremely limited on-chip memory and compute capability. To deploy computer vision on such devices, we need tiny vision models that fit within a few hundred kilobytes of memory footprint in terms of peak usage and model size on device storage. To facilitate the development of microcontroller friendly models, we present a new dataset, Visual Wake Words, that represents a common microcontroller vision use-case of identifying whether a person is present in the image or not, and provides a realistic benchmark for tiny vision models. Within a limited memory footprint of 250 KB, several state-of-the-art mobile models achieve accuracy of 85-90% on the Visual Wake Words dataset. We anticipate the proposed dataset will advance the research on tiny vision models that can push the pareto-optimal boundary in terms of accuracy versus memory usage for microcontroller applications.
△ Less
Submitted 12 June, 2019;
originally announced June 2019.
-
Geo-Aware Networks for Fine-Grained Recognition
Authors:
Grace Chu,
Brian Potetz,
Weijun Wang,
Andrew Howard,
Yang Song,
Fernando Brucher,
Thomas Leung,
Hartwig Adam
Abstract:
Fine-grained recognition distinguishes among categories with subtle visual differences. In order to differentiate between these challenging visual categories, it is helpful to leverage additional information. Geolocation is a rich source of additional information that can be used to improve fine-grained classification accuracy, but has been understudied. Our contributions to this field are twofold…
▽ More
Fine-grained recognition distinguishes among categories with subtle visual differences. In order to differentiate between these challenging visual categories, it is helpful to leverage additional information. Geolocation is a rich source of additional information that can be used to improve fine-grained classification accuracy, but has been understudied. Our contributions to this field are twofold. First, to the best of our knowledge, this is the first paper which systematically examined various ways of incorporating geolocation information into fine-grained image classification through the use of geolocation priors, post-processing or feature modulation. Secondly, to overcome the situation where no fine-grained dataset has complete geolocation information, we release two fine-grained datasets with geolocation by providing complementary information to existing popular datasets - iNaturalist and YFCC100M. By leveraging geolocation information we improve top-1 accuracy in iNaturalist from 70.1% to 79.0% for a strong baseline image-only model. Comparing several models, we found that best performance was achieved by a post-processing model that consumed the output of the image-only baseline alongside geolocation. However, for a resource-constrained model (MobileNetV2), performance was better with a feature modulation model that trains jointly over pixels and geolocation: accuracy increased from 59.6% to 72.2%. Our work makes a strong case for incorporating geolocation information in fine-grained recognition models for both server and on-device.
△ Less
Submitted 4 September, 2019; v1 submitted 4 June, 2019;
originally announced June 2019.
-
Searching for MobileNetV3
Authors:
Andrew Howard,
Mark Sandler,
Grace Chu,
Liang-Chieh Chen,
Bo Chen,
Mingxing Tan,
Weijun Wang,
Yukun Zhu,
Ruoming Pang,
Vijay Vasudevan,
Quoc V. Le,
Hartwig Adam
Abstract:
We present the next generation of MobileNets based on a combination of complementary search techniques as well as a novel architecture design. MobileNetV3 is tuned to mobile phone CPUs through a combination of hardware-aware network architecture search (NAS) complemented by the NetAdapt algorithm and then subsequently improved through novel architecture advances. This paper starts the exploration…
▽ More
We present the next generation of MobileNets based on a combination of complementary search techniques as well as a novel architecture design. MobileNetV3 is tuned to mobile phone CPUs through a combination of hardware-aware network architecture search (NAS) complemented by the NetAdapt algorithm and then subsequently improved through novel architecture advances. This paper starts the exploration of how automated search algorithms and network design can work together to harness complementary approaches improving the overall state of the art. Through this process we create two new MobileNet models for release: MobileNetV3-Large and MobileNetV3-Small which are targeted for high and low resource use cases. These models are then adapted and applied to the tasks of object detection and semantic segmentation. For the task of semantic segmentation (or any dense pixel prediction), we propose a new efficient segmentation decoder Lite Reduced Atrous Spatial Pyramid Pooling (LR-ASPP). We achieve new state of the art results for mobile classification, detection and segmentation. MobileNetV3-Large is 3.2\% more accurate on ImageNet classification while reducing latency by 15\% compared to MobileNetV2. MobileNetV3-Small is 4.6\% more accurate while reducing latency by 5\% compared to MobileNetV2. MobileNetV3-Large detection is 25\% faster at roughly the same accuracy as MobileNetV2 on COCO detection. MobileNetV3-Large LR-ASPP is 30\% faster than MobileNetV2 R-ASPP at similar accuracy for Cityscapes segmentation.
△ Less
Submitted 20 November, 2019; v1 submitted 6 May, 2019;
originally announced May 2019.
-
Low-Power Computer Vision: Status, Challenges, Opportunities
Authors:
Sergei Alyamkin,
Matthew Ardi,
Alexander C. Berg,
Achille Brighton,
Bo Chen,
Yiran Chen,
Hsin-Pai Cheng,
Zichen Fan,
Chen Feng,
Bo Fu,
Kent Gauen,
Abhinav Goel,
Alexander Goncharenko,
Xuyang Guo,
Soonhoi Ha,
Andrew Howard,
Xiao Hu,
Yuanjun Huang,
Donghyun Kang,
Jaeyoun Kim,
Jong Gook Ko,
Alexander Kondratyev,
Junhyeok Lee,
Seungjae Lee,
Suwoong Lee
, et al. (19 additional authors not shown)
Abstract:
Computer vision has achieved impressive progress in recent years. Meanwhile, mobile phones have become the primary computing platforms for millions of people. In addition to mobile phones, many autonomous systems rely on visual data for making decisions and some of these systems have limited energy (such as unmanned aerial vehicles also called drones and mobile robots). These systems rely on batte…
▽ More
Computer vision has achieved impressive progress in recent years. Meanwhile, mobile phones have become the primary computing platforms for millions of people. In addition to mobile phones, many autonomous systems rely on visual data for making decisions and some of these systems have limited energy (such as unmanned aerial vehicles also called drones and mobile robots). These systems rely on batteries and energy efficiency is critical. This article serves two main purposes: (1) Examine the state-of-the-art for low-power solutions to detect objects in images. Since 2015, the IEEE Annual International Low-Power Image Recognition Challenge (LPIRC) has been held to identify the most energy-efficient computer vision solutions. This article summarizes 2018 winners' solutions. (2) Suggest directions for research as well as opportunities for low-power computer vision.
△ Less
Submitted 15 April, 2019;
originally announced April 2019.
-
An Interactive Robotic Framework to Facilitate Sensory Experiences for Children with ASD
Authors:
Hifza Javed,
Rachael Burns,
Myounghoon Jeon,
Ayanna M. Howard,
Chung Hyuk Park
Abstract:
The diagnosis of Autism Spectrum Disorder (ASD) in children is commonly accompanied by a diagnosis of sensory processing disorders as well. Abnormalities are usually reported in multiple sensory processing domains, showing a higher prevalence of unusual responses, particularly to tactile, auditory and visual stimuli. This paper discusses a novel robot-based framework designed to target sensory dif…
▽ More
The diagnosis of Autism Spectrum Disorder (ASD) in children is commonly accompanied by a diagnosis of sensory processing disorders as well. Abnormalities are usually reported in multiple sensory processing domains, showing a higher prevalence of unusual responses, particularly to tactile, auditory and visual stimuli. This paper discusses a novel robot-based framework designed to target sensory difficulties faced by children with ASD in a controlled setting. The setup consists of a number of sensory stations, together with robotic agents that navigate the stations and interact with the stimuli as they are presented. These stimuli are designed to resemble real world scenarios that form a common part of one's everyday experiences. Given the strong interest of children with ASD in technology in general and robots in particular, we attempt to utilize our robotic platform to demonstrate socially acceptable responses to the stimuli in an interactive, pedagogical setting that encourages the child's social, motor and vocal skills, while providing a diverse sensory experience. A user study was conducted to evaluate the efficacy of the proposed framework, with a total of 18 participants (5 with ASD and 13 typically develo**) between the ages of 4 and 12 years. We describe our methods of data collection, coding of video data and the analysis of the results obtained from the study. We also discuss the limitations of the current work and detail our plans for the future work to improve the validity of the obtained results.
△ Less
Submitted 3 January, 2019;
originally announced January 2019.
-
K for the Price of 1: Parameter-efficient Multi-task and Transfer Learning
Authors:
Pramod Kaushik Mudrakarta,
Mark Sandler,
Andrey Zhmoginov,
Andrew Howard
Abstract:
We introduce a novel method that enables parameter-efficient transfer and multi-task learning with deep neural networks. The basic approach is to learn a model patch - a small set of parameters - that will specialize to each task, instead of fine-tuning the last layer or the entire network. For instance, we show that learning a set of scales and biases is sufficient to convert a pretrained network…
▽ More
We introduce a novel method that enables parameter-efficient transfer and multi-task learning with deep neural networks. The basic approach is to learn a model patch - a small set of parameters - that will specialize to each task, instead of fine-tuning the last layer or the entire network. For instance, we show that learning a set of scales and biases is sufficient to convert a pretrained network to perform well on qualitatively different problems (e.g. converting a Single Shot MultiBox Detection (SSD) model into a 1000-class image classification model while reusing 98% of parameters of the SSD feature extractor). Similarly, we show that re-learning existing low-parameter layers (such as depth-wise convolutions) while kee** the rest of the network frozen also improves transfer-learning accuracy significantly. Our approach allows both simultaneous (multi-task) as well as sequential transfer learning. In several multi-task learning problems, despite using much fewer parameters than traditional logits-only fine-tuning, we match single-task performance.
△ Less
Submitted 23 February, 2019; v1 submitted 24 October, 2018;
originally announced October 2018.
-
2018 Low-Power Image Recognition Challenge
Authors:
Sergei Alyamkin,
Matthew Ardi,
Achille Brighton,
Alexander C. Berg,
Yiran Chen,
Hsin-Pai Cheng,
Bo Chen,
Zichen Fan,
Chen Feng,
Bo Fu,
Kent Gauen,
Jongkook Go,
Alexander Goncharenko,
Xuyang Guo,
Hong Hanh Nguyen,
Andrew Howard,
Yuanjun Huang,
Donghyun Kang,
Jaeyoun Kim,
Alexander Kondratyev,
Seungjae Lee,
Suwoong Lee,
Junhyeok Lee,
Zhiyu Liang,
Xin Liu
, et al. (16 additional authors not shown)
Abstract:
The Low-Power Image Recognition Challenge (LPIRC, https://rebootingcomputing.ieee.org/lpirc) is an annual competition started in 2015. The competition identifies the best technologies that can classify and detect objects in images efficiently (short execution time and low energy consumption) and accurately (high precision). Over the four years, the winners' scores have improved more than 24 times.…
▽ More
The Low-Power Image Recognition Challenge (LPIRC, https://rebootingcomputing.ieee.org/lpirc) is an annual competition started in 2015. The competition identifies the best technologies that can classify and detect objects in images efficiently (short execution time and low energy consumption) and accurately (high precision). Over the four years, the winners' scores have improved more than 24 times. As computer vision is widely used in many battery-powered systems (such as drones and mobile phones), the need for low-power computer vision will become increasingly important. This paper summarizes LPIRC 2018 by describing the three different tracks and the winners' solutions.
△ Less
Submitted 3 October, 2018;
originally announced October 2018.
-
MnasNet: Platform-Aware Neural Architecture Search for Mobile
Authors:
Mingxing Tan,
Bo Chen,
Ruoming Pang,
Vijay Vasudevan,
Mark Sandler,
Andrew Howard,
Quoc V. Le
Abstract:
Designing convolutional neural networks (CNN) for mobile devices is challenging because mobile models need to be small and fast, yet still accurate. Although significant efforts have been dedicated to design and improve mobile CNNs on all dimensions, it is very difficult to manually balance these trade-offs when there are so many architectural possibilities to consider. In this paper, we propose a…
▽ More
Designing convolutional neural networks (CNN) for mobile devices is challenging because mobile models need to be small and fast, yet still accurate. Although significant efforts have been dedicated to design and improve mobile CNNs on all dimensions, it is very difficult to manually balance these trade-offs when there are so many architectural possibilities to consider. In this paper, we propose an automated mobile neural architecture search (MNAS) approach, which explicitly incorporate model latency into the main objective so that the search can identify a model that achieves a good trade-off between accuracy and latency. Unlike previous work, where latency is considered via another, often inaccurate proxy (e.g., FLOPS), our approach directly measures real-world inference latency by executing the model on mobile phones. To further strike the right balance between flexibility and search space size, we propose a novel factorized hierarchical search space that encourages layer diversity throughout the network. Experimental results show that our approach consistently outperforms state-of-the-art mobile CNN models across multiple vision tasks. On the ImageNet classification task, our MnasNet achieves 75.2% top-1 accuracy with 78ms latency on a Pixel phone, which is 1.8x faster than MobileNetV2 [29] with 0.5% higher accuracy and 2.3x faster than NASNet [36] with 1.2% higher accuracy. Our MnasNet also achieves better mAP quality than MobileNets for COCO object detection. Code is at https://github.com/tensorflow/tpu/tree/master/models/official/mnasnet
△ Less
Submitted 28 May, 2019; v1 submitted 30 July, 2018;
originally announced July 2018.
-
Does Removing Stereotype Priming Remove Bias? A Pilot Human-Robot Interaction Study
Authors:
Tobi Ogunyale,
De'Aira Bryant,
Ayanna Howard
Abstract:
Robots capable of participating in complex social interactions have shown great potential in a variety of applications. As these robots grow more popular, it is essential to continuously evaluate the dynamics of the human-robot relationship. One factor shown to have potential impacts on this critical relationship is the human projection of stereotypes onto social robots, a practice that is implici…
▽ More
Robots capable of participating in complex social interactions have shown great potential in a variety of applications. As these robots grow more popular, it is essential to continuously evaluate the dynamics of the human-robot relationship. One factor shown to have potential impacts on this critical relationship is the human projection of stereotypes onto social robots, a practice that is implicitly known to effect both developers and users of this technology. As such, in this research, we wished to investigate the difference in participants' perceptions of the robot interaction if we removed stereotype priming. This has not yet been a common practice in similar studies. Given the stereotypes of emotions among ethnic groups, especially in the U.S., this study specifically sought to investigate the impact that robot "skin color" could potentially have on the human perception of a robot's emotional expressive behavior. A between-subject experiment with 198 individuals was conducted. The results showed no significant differences in the overall emotion classification or intensity ratings for the different robot skin colors. These results lend credence to our hypothesis that when individuals are not primed with information related to human stereotypes, robots are evaluated based on functional attributes versus stereotypical attributes. This provides some confidence that robots, if designed correctly, can potentially be used as a tool to override stereotype-based biases associated with human behavior.
△ Less
Submitted 2 July, 2018;
originally announced July 2018.
-
Large Scale Fine-Grained Categorization and Domain-Specific Transfer Learning
Authors:
Yin Cui,
Yang Song,
Chen Sun,
Andrew Howard,
Serge Belongie
Abstract:
Transferring the knowledge learned from large scale datasets (e.g., ImageNet) via fine-tuning offers an effective solution for domain-specific fine-grained visual categorization (FGVC) tasks (e.g., recognizing bird species or car make and model). In such scenarios, data annotation often calls for specialized domain knowledge and thus is difficult to scale. In this work, we first tackle a problem i…
▽ More
Transferring the knowledge learned from large scale datasets (e.g., ImageNet) via fine-tuning offers an effective solution for domain-specific fine-grained visual categorization (FGVC) tasks (e.g., recognizing bird species or car make and model). In such scenarios, data annotation often calls for specialized domain knowledge and thus is difficult to scale. In this work, we first tackle a problem in large scale FGVC. Our method won first place in iNaturalist 2017 large scale species classification challenge. Central to the success of our approach is a training scheme that uses higher image resolution and deals with the long-tailed distribution of training data. Next, we study transfer learning via fine-tuning from large scale datasets to small scale, domain-specific FGVC datasets. We propose a measure to estimate domain similarity via Earth Mover's Distance and demonstrate that transfer learning benefits from pre-training on a source domain that is similar to the target domain by this measure. Our proposed transfer learning outperforms ImageNet pre-training and obtains state-of-the-art results on multiple commonly used FGVC datasets.
△ Less
Submitted 16 June, 2018;
originally announced June 2018.
-
NetAdapt: Platform-Aware Neural Network Adaptation for Mobile Applications
Authors:
Tien-Ju Yang,
Andrew Howard,
Bo Chen,
Xiao Zhang,
Alec Go,
Mark Sandler,
Vivienne Sze,
Hartwig Adam
Abstract:
This work proposes an algorithm, called NetAdapt, that automatically adapts a pre-trained deep neural network to a mobile platform given a resource budget. While many existing algorithms simplify networks based on the number of MACs or weights, optimizing those indirect metrics may not necessarily reduce the direct metrics, such as latency and energy consumption. To solve this problem, NetAdapt in…
▽ More
This work proposes an algorithm, called NetAdapt, that automatically adapts a pre-trained deep neural network to a mobile platform given a resource budget. While many existing algorithms simplify networks based on the number of MACs or weights, optimizing those indirect metrics may not necessarily reduce the direct metrics, such as latency and energy consumption. To solve this problem, NetAdapt incorporates direct metrics into its adaptation algorithm. These direct metrics are evaluated using empirical measurements, so that detailed knowledge of the platform and toolchain is not required. NetAdapt automatically and progressively simplifies a pre-trained network until the resource budget is met while maximizing the accuracy. Experiment results show that NetAdapt achieves better accuracy versus latency trade-offs on both mobile CPU and mobile GPU, compared with the state-of-the-art automated network simplification algorithms. For image classification on the ImageNet dataset, NetAdapt achieves up to a 1.7$\times$ speedup in measured inference latency with equal or higher accuracy on MobileNets (V1&V2).
△ Less
Submitted 28 September, 2018; v1 submitted 9 April, 2018;
originally announced April 2018.
-
MobileNetV2: Inverted Residuals and Linear Bottlenecks
Authors:
Mark Sandler,
Andrew Howard,
Menglong Zhu,
Andrey Zhmoginov,
Liang-Chieh Chen
Abstract:
In this paper we describe a new mobile architecture, MobileNetV2, that improves the state of the art performance of mobile models on multiple tasks and benchmarks as well as across a spectrum of different model sizes. We also describe efficient ways of applying these mobile models to object detection in a novel framework we call SSDLite. Additionally, we demonstrate how to build mobile semantic se…
▽ More
In this paper we describe a new mobile architecture, MobileNetV2, that improves the state of the art performance of mobile models on multiple tasks and benchmarks as well as across a spectrum of different model sizes. We also describe efficient ways of applying these mobile models to object detection in a novel framework we call SSDLite. Additionally, we demonstrate how to build mobile semantic segmentation models through a reduced form of DeepLabv3 which we call Mobile DeepLabv3.
The MobileNetV2 architecture is based on an inverted residual structure where the input and output of the residual block are thin bottleneck layers opposite to traditional residual models which use expanded representations in the input an MobileNetV2 uses lightweight depthwise convolutions to filter features in the intermediate expansion layer. Additionally, we find that it is important to remove non-linearities in the narrow layers in order to maintain representational power. We demonstrate that this improves performance and provide an intuition that led to this design. Finally, our approach allows decoupling of the input/output domains from the expressiveness of the transformation, which provides a convenient framework for further analysis. We measure our performance on Imagenet classification, COCO object detection, VOC image segmentation. We evaluate the trade-offs between accuracy, and number of operations measured by multiply-adds (MAdd), as well as the number of parameters
△ Less
Submitted 21 March, 2019; v1 submitted 12 January, 2018;
originally announced January 2018.
-
Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference
Authors:
Benoit Jacob,
Skirmantas Kligys,
Bo Chen,
Menglong Zhu,
Matthew Tang,
Andrew Howard,
Hartwig Adam,
Dmitry Kalenichenko
Abstract:
The rising popularity of intelligent mobile devices and the daunting computational cost of deep learning-based models call for efficient and accurate on-device inference schemes. We propose a quantization scheme that allows inference to be carried out using integer-only arithmetic, which can be implemented more efficiently than floating point inference on commonly available integer-only hardware.…
▽ More
The rising popularity of intelligent mobile devices and the daunting computational cost of deep learning-based models call for efficient and accurate on-device inference schemes. We propose a quantization scheme that allows inference to be carried out using integer-only arithmetic, which can be implemented more efficiently than floating point inference on commonly available integer-only hardware. We also co-design a training procedure to preserve end-to-end model accuracy post quantization. As a result, the proposed quantization scheme improves the tradeoff between accuracy and on-device latency. The improvements are significant even on MobileNets, a model family known for run-time efficiency, and are demonstrated in ImageNet classification and COCO detection on popular CPUs.
△ Less
Submitted 15 December, 2017;
originally announced December 2017.
-
MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications
Authors:
Andrew G. Howard,
Menglong Zhu,
Bo Chen,
Dmitry Kalenichenko,
Weijun Wang,
Tobias Weyand,
Marco Andreetto,
Hartwig Adam
Abstract:
We present a class of efficient models called MobileNets for mobile and embedded vision applications. MobileNets are based on a streamlined architecture that uses depth-wise separable convolutions to build light weight deep neural networks. We introduce two simple global hyper-parameters that efficiently trade off between latency and accuracy. These hyper-parameters allow the model builder to choo…
▽ More
We present a class of efficient models called MobileNets for mobile and embedded vision applications. MobileNets are based on a streamlined architecture that uses depth-wise separable convolutions to build light weight deep neural networks. We introduce two simple global hyper-parameters that efficiently trade off between latency and accuracy. These hyper-parameters allow the model builder to choose the right sized model for their application based on the constraints of the problem. We present extensive experiments on resource and accuracy tradeoffs and show strong performance compared to other popular models on ImageNet classification. We then demonstrate the effectiveness of MobileNets across a wide range of applications and use cases including object detection, finegrain classification, face attributes and large scale geo-localization.
△ Less
Submitted 16 April, 2017;
originally announced April 2017.
-
The Unreasonable Effectiveness of Noisy Data for Fine-Grained Recognition
Authors:
Jonathan Krause,
Benjamin Sapp,
Andrew Howard,
Howard Zhou,
Alexander Toshev,
Tom Duerig,
James Philbin,
Li Fei-Fei
Abstract:
Current approaches for fine-grained recognition do the following: First, recruit experts to annotate a dataset of images, optionally also collecting more structured data in the form of part annotations and bounding boxes. Second, train a model utilizing this data. Toward the goal of solving fine-grained recognition, we introduce an alternative approach, leveraging free, noisy data from the web and…
▽ More
Current approaches for fine-grained recognition do the following: First, recruit experts to annotate a dataset of images, optionally also collecting more structured data in the form of part annotations and bounding boxes. Second, train a model utilizing this data. Toward the goal of solving fine-grained recognition, we introduce an alternative approach, leveraging free, noisy data from the web and simple, generic methods of recognition. This approach has benefits in both performance and scalability. We demonstrate its efficacy on four fine-grained datasets, greatly exceeding existing state of the art without the manual collection of even a single label, and furthermore show first results at scaling to more than 10,000 fine-grained categories. Quantitatively, we achieve top-1 accuracies of 92.3% on CUB-200-2011, 85.4% on Birdsnap, 93.4% on FGVC-Aircraft, and 80.8% on Stanford Dogs without using their annotated training sets. We compare our approach to an active learning approach for expanding fine-grained datasets.
△ Less
Submitted 18 October, 2016; v1 submitted 20 November, 2015;
originally announced November 2015.
-
Some Improvements on Deep Convolutional Neural Network Based Image Classification
Authors:
Andrew G. Howard
Abstract:
We investigate multiple techniques to improve upon the current state of the art deep convolutional neural network based image classification pipeline. The techiques include adding more image transformations to training data, adding more transformations to generate additional predictions at test time and using complementary models applied to higher resolution images. This paper summarizes our entry…
▽ More
We investigate multiple techniques to improve upon the current state of the art deep convolutional neural network based image classification pipeline. The techiques include adding more image transformations to training data, adding more transformations to generate additional predictions at test time and using complementary models applied to higher resolution images. This paper summarizes our entry in the Imagenet Large Scale Visual Recognition Challenge 2013. Our system achieved a top 5 classification error rate of 13.55% using no external data which is over a 20% relative improvement on the previous year's winner.
△ Less
Submitted 18 December, 2013;
originally announced December 2013.