-
Hierarchical Decision Mamba
Authors:
André Correia,
Luís A. Alexandre
Abstract:
Recent advancements in imitation learning have been largely fueled by the integration of sequence models, which provide a structured flow of information to effectively mimic task behaviours. Currently, Decision Transformer (DT) and subsequently, the Hierarchical Decision Transformer (HDT), presented Transformer-based approaches to learn task policies. Recently, the Mamba architecture has shown to…
▽ More
Recent advancements in imitation learning have been largely fueled by the integration of sequence models, which provide a structured flow of information to effectively mimic task behaviours. Currently, Decision Transformer (DT) and subsequently, the Hierarchical Decision Transformer (HDT), presented Transformer-based approaches to learn task policies. Recently, the Mamba architecture has shown to outperform Transformers across various task domains. In this work, we introduce two novel methods, Decision Mamba (DM) and Hierarchical Decision Mamba (HDM), aimed at enhancing the performance of the Transformer models. Through extensive experimentation across diverse environments such as OpenAI Gym and D4RL, leveraging varying demonstration data sets, we demonstrate the superiority of Mamba models over their Transformer counterparts in a majority of tasks. Results show that HDM outperforms other methods in most settings. The code can be found at https://github.com/meowatthemoon/HierarchicalDecisionMamba.
△ Less
Submitted 13 May, 2024;
originally announced May 2024.
-
Music to Dance as Language Translation using Sequence Models
Authors:
André Correia,
Luís A. Alexandre
Abstract:
Synthesising appropriate choreographies from music remains an open problem. We introduce MDLT, a novel approach that frames the choreography generation problem as a translation task. Our method leverages an existing data set to learn to translate sequences of audio into corresponding dance poses. We present two variants of MDLT: one utilising the Transformer architecture and the other employing th…
▽ More
Synthesising appropriate choreographies from music remains an open problem. We introduce MDLT, a novel approach that frames the choreography generation problem as a translation task. Our method leverages an existing data set to learn to translate sequences of audio into corresponding dance poses. We present two variants of MDLT: one utilising the Transformer architecture and the other employing the Mamba architecture. We train our method on AIST++ and PhantomDance data sets to teach a robotic arm to dance, but our method can be applied to a full humanoid robot. Evaluation metrics, including Average Joint Error and Frechet Inception Distance, consistently demonstrate that, when given a piece of music, MDLT excels at producing realistic and high-quality choreography. The code can be found at github.com/meowatthemoon/MDLT.
△ Less
Submitted 22 March, 2024;
originally announced March 2024.
-
Robust Energy Consumption Prediction with a Missing Value-Resilient Metaheuristic-based Neural Network in Mobile App Development
Authors:
Seyed Jalaleddin Mousavirad,
Luís A. Alexandre
Abstract:
Energy consumption is a fundamental concern in mobile application development, bearing substantial significance for both developers and end-users. Main objective of this research is to propose a novel neural network-based framework, enhanced by a metaheuristic approach, to achieve robust energy prediction in the context of mobile app development. The metaheuristic approach here aims to achieve two…
▽ More
Energy consumption is a fundamental concern in mobile application development, bearing substantial significance for both developers and end-users. Main objective of this research is to propose a novel neural network-based framework, enhanced by a metaheuristic approach, to achieve robust energy prediction in the context of mobile app development. The metaheuristic approach here aims to achieve two goals: 1) identifying suitable learning algorithms and their corresponding hyperparameters, and 2) determining the optimal number of layers and neurons within each layer. Moreover, due to limitations in accessing certain aspects of a mobile phone, there might be missing data in the data set, and the proposed framework can handle this. In addition, we conducted an optimal algorithm selection strategy, employing 13 base and advanced metaheuristic algorithms, to identify the best algorithm based on accuracy and resistance to missing values. The representation in our proposed metaheuristic algorithm is variable-size, meaning that the length of the candidate solutions changes over time. We compared the algorithms based on the architecture found by each algorithm at different levels of missing values, accuracy, F-measure, and stability analysis. Additionally, we conducted a Wilcoxon signed-rank test for statistical comparison of the results. The extensive experiments show that our proposed approach significantly improves energy consumption prediction. Particularly, the JADE algorithm, a variant of Differential Evolution (DE), DE, and the Covariance Matrix Adaptation Evolution Strategy deliver superior results under various conditions and across different missing value levels.
△ Less
Submitted 4 June, 2024; v1 submitted 21 September, 2023;
originally announced September 2023.
-
A Metaheuristic-based Machine Learning Approach for Energy Prediction in Mobile App Development
Authors:
Seyed Jalaleddin Mousavirad,
Luís A. Alexandre
Abstract:
Energy consumption plays a vital role in mobile App development for developers and end-users, and it is considered one of the most crucial factors for purchasing a smartphone. In addition, in terms of sustainability, it is essential to find methods to reduce the energy consumption of mobile devices since the extensive use of billions of smartphones worldwide significantly impacts the environment.…
▽ More
Energy consumption plays a vital role in mobile App development for developers and end-users, and it is considered one of the most crucial factors for purchasing a smartphone. In addition, in terms of sustainability, it is essential to find methods to reduce the energy consumption of mobile devices since the extensive use of billions of smartphones worldwide significantly impacts the environment. Despite the existence of several energy-efficient programming practices in Android, the leading mobile ecosystem, machine learning-based energy prediction algorithms for mobile App development have yet to be reported. Therefore, this paper proposes a histogram-based gradient boosting classification machine (HGBC), boosted by a metaheuristic approach, for energy prediction in mobile App development. Our metaheuristic approach is responsible for two issues. First, it finds redundant and irrelevant features without any noticeable change in performance. Second, it performs a hyper-parameter tuning for the HGBC algorithm. Since our proposed metaheuristic approach is algorithm-independent, we selected 12 algorithms for the search strategy to find the optimal search algorithm. Our finding shows that a success-history-based parameter adaption for differential evolution with linear population size (L-SHADE) offers the best performance. It can improve performance and decrease the number of features effectively. Our extensive set of experiments clearly shows that our proposed approach can provide significant results for energy consumption prediction.
△ Less
Submitted 16 June, 2023;
originally announced June 2023.
-
Are Neural Architecture Search Benchmarks Well Designed? A Deeper Look Into Operation Importance
Authors:
Vasco Lopes,
Bruno Degardin,
Luís A. Alexandre
Abstract:
Neural Architecture Search (NAS) benchmarks significantly improved the capability of develo** and comparing NAS methods while at the same time drastically reduced the computational overhead by providing meta-information about thousands of trained neural networks. However, tabular benchmarks have several drawbacks that can hinder fair comparisons and provide unreliable results. These usually focu…
▽ More
Neural Architecture Search (NAS) benchmarks significantly improved the capability of develo** and comparing NAS methods while at the same time drastically reduced the computational overhead by providing meta-information about thousands of trained neural networks. However, tabular benchmarks have several drawbacks that can hinder fair comparisons and provide unreliable results. These usually focus on providing a small pool of operations in heavily constrained search spaces -- usually cell-based neural networks with pre-defined outer-skeletons. In this work, we conducted an empirical analysis of the widely used NAS-Bench-101, NAS-Bench-201 and TransNAS-Bench-101 benchmarks in terms of their generability and how different operations influence the performance of the generated architectures. We found that only a subset of the operation pool is required to generate architectures close to the upper-bound of the performance range. Also, the performance distribution is negatively skewed, having a higher density of architectures in the upper-bound range. We consistently found convolution layers to have the highest impact on the architecture's performance, and that specific combination of operations favors top-scoring architectures. These findings shed insights on the correct evaluation and comparison of NAS methods using NAS benchmarks, showing that directly searching on NAS-Bench-201, ImageNet16-120 and TransNAS-Bench-101 produces more reliable results than searching only on CIFAR-10. Furthermore, with this work we provide suggestions for future benchmark evaluations and design. The code used to conduct the evaluations is available at https://github.com/VascoLopes/NAS-Benchmark-Evaluation.
△ Less
Submitted 29 March, 2023;
originally announced March 2023.
-
A Survey on Task Allocation and Scheduling in Robotic Network Systems
Authors:
Saeid Alirezazadeh,
Luís A. Alexandre
Abstract:
Cloud Robotics is hel** to create a new generation of robots that leverage the nearly unlimited resources of large data centers (i.e., the cloud), overcoming the limitations imposed by on-board resources. Different processing power, capabilities, resource sizes, energy consumption, and so forth, make scheduling and task allocation critical components. The basic idea of task allocation and schedu…
▽ More
Cloud Robotics is hel** to create a new generation of robots that leverage the nearly unlimited resources of large data centers (i.e., the cloud), overcoming the limitations imposed by on-board resources. Different processing power, capabilities, resource sizes, energy consumption, and so forth, make scheduling and task allocation critical components. The basic idea of task allocation and scheduling is to optimize performance by minimizing completion time, energy consumption, delays between two consecutive tasks, along with others, and maximizing resource utilization, number of completed tasks in a given time interval, and suchlike. In the past, several works have addressed various aspects of task allocation and scheduling. In this paper, we provide a comprehensive overview of task allocation and scheduling strategies and related metrics suitable for robotic network cloud systems. We discuss the issues related to allocation and scheduling methods and the limitations that need to be overcome. The literature review is organized according to three different viewpoints: Architectures and Applications, Methods and Parameters. In addition, the limitations of each method are highlighted for future research.
△ Less
Submitted 22 March, 2023;
originally announced March 2023.
-
A Survey of Demonstration Learning
Authors:
André Correia,
Luís A. Alexandre
Abstract:
With the fast improvement of machine learning, reinforcement learning (RL) has been used to automate human tasks in different areas. However, training such agents is difficult and restricted to expert users. Moreover, it is mostly limited to simulation environments due to the high cost and safety concerns of interactions in the real world. Demonstration Learning is a paradigm in which an agent lea…
▽ More
With the fast improvement of machine learning, reinforcement learning (RL) has been used to automate human tasks in different areas. However, training such agents is difficult and restricted to expert users. Moreover, it is mostly limited to simulation environments due to the high cost and safety concerns of interactions in the real world. Demonstration Learning is a paradigm in which an agent learns to perform a task by imitating the behavior of an expert shown in demonstrations. It is a relatively recent area in machine learning, but it is gaining significant traction due to having tremendous potential for learning complex behaviors from demonstrations. Learning from demonstration accelerates the learning process by improving sample efficiency, while also reducing the effort of the programmer. Due to learning without interacting with the environment, demonstration learning would allow the automation of a wide range of real world applications such as robotics and healthcare. This paper provides a survey of demonstration learning, where we formally introduce the demonstration problem along with its main challenges and provide a comprehensive overview of the process of learning from demonstrations from the creation of the demonstration data set, to learning methods from demonstrations, and optimization by combining demonstration learning with different machine learning methods. We also review the existing benchmarks and identify their strengths and limitations. Additionally, we discuss the advantages and disadvantages of the paradigm as well as its main applications. Lastly, we discuss our perspective on open problems and research directions for this rapidly growing field.
△ Less
Submitted 20 March, 2023;
originally announced March 2023.
-
Metaheuristic-based Energy-aware Image Compression for Mobile App Development
Authors:
Seyed Jalaleddin Mousavirad,
Luís A Alexandre
Abstract:
The JPEG standard is widely used in different image processing applications. One of the main components of the JPEG standard is the quantisation table (QT) since it plays a vital role in the image properties such as image quality and file size. In recent years, several efforts based on population-based metaheuristic (PBMH) algorithms have been performed to find the proper QT(s) for a specific imag…
▽ More
The JPEG standard is widely used in different image processing applications. One of the main components of the JPEG standard is the quantisation table (QT) since it plays a vital role in the image properties such as image quality and file size. In recent years, several efforts based on population-based metaheuristic (PBMH) algorithms have been performed to find the proper QT(s) for a specific image, although they do not take into consideration the user opinion in advance. Take an android developer as an example, who prefers a small-size image, while the optimisation process results in a high-quality image, leading to a huge file size. Another pitfall of the current works is a lack of comprehensive coverage, meaning that the QT(s) can not provide all possible combinations of file size and quality. Therefore, this paper aims to propose three distinct contributions. First, to include the user opinion in the compression process, the file size of the output image can be controlled by a user in advance. To this end, we propose a novel objective function for population-based JPEG image compression. Second, to tackle the lack of comprehensive coverage, we suggest a novel representation. Our proposed representation can not only provide more comprehensive coverage but also find the proper value for the quality factor for a specific image without any background knowledge. Both changes in representation and objective function are independent of the search strategies and can be used with any type of population-based metaheuristic (PBMH) algorithm. Therefore, as the third contribution, we also provide a comprehensive benchmark on 22 state-of-the-art and recently-introduced PBMH algorithms. Our extensive experiments on different benchmark images and in terms of different criteria show that our novel formulation for JPEG image compression can work effectively.
△ Less
Submitted 20 April, 2023; v1 submitted 12 December, 2022;
originally announced December 2022.
-
Hierarchical Decision Transformer
Authors:
André Correia,
Luís A. Alexandre
Abstract:
Sequence models in reinforcement learning require task knowledge to estimate the task policy. This paper presents a hierarchical algorithm for learning a sequence model from demonstrations. The high-level mechanism guides the low-level controller through the task by selecting sub-goals for the latter to reach. This sequence replaces the returns-to-go of previous methods, improving its performance…
▽ More
Sequence models in reinforcement learning require task knowledge to estimate the task policy. This paper presents a hierarchical algorithm for learning a sequence model from demonstrations. The high-level mechanism guides the low-level controller through the task by selecting sub-goals for the latter to reach. This sequence replaces the returns-to-go of previous methods, improving its performance overall, especially in tasks with longer episodes and scarcer rewards. We validate our method in multiple tasks of OpenAIGym, D4RL and RoboMimic benchmarks. Our method outperforms the baselines in eight out of ten tasks of varied horizons and reward frequencies without prior task knowledge, showing the advantages of the hierarchical model approach for learning from demonstrations using a sequence model.
△ Less
Submitted 21 September, 2022;
originally announced September 2022.
-
Energy-Aware JPEG Image Compression: A Multi-Objective Approach
Authors:
Seyed Jalaleddin Mousavirad,
Luís A. Alexandre
Abstract:
Customer satisfaction is crucially affected by energy consumption in mobile devices. One of the most energy-consuming parts of an application is images. While different images with different quality consume different amounts of energy, there are no straightforward methods to calculate the energy consumption of an operation in a typical image. This paper, first, investigates that there is a correla…
▽ More
Customer satisfaction is crucially affected by energy consumption in mobile devices. One of the most energy-consuming parts of an application is images. While different images with different quality consume different amounts of energy, there are no straightforward methods to calculate the energy consumption of an operation in a typical image. This paper, first, investigates that there is a correlation between energy consumption and image quality as well as image file size. Therefore, these two can be considered as a proxy for energy consumption. Then, we propose a multi-objective strategy to enhance image quality and reduce image file size based on the quantisation tables in JPEG image compression. To this end, we have used two general multi-objective metaheuristic approaches: scalarisation and Pareto-based. Scalarisation methods find a single optimal solution based on combining different objectives, while Pareto-based techniques aim to achieve a set of solutions. In this paper, we embed our strategy into five scalarisation algorithms, including energy-aware multi-objective genetic algorithm (EnMOGA), energy-aware multi-objective particle swarm optimisation (EnMOPSO), energy-aware multi-objective differential evolution (EnMODE), energy-aware multi-objective evolutionary strategy (EnMOES), and energy-aware multi-objective pattern search (EnMOPS). Also, two Pareto-based methods, including a non-dominated sorting genetic algorithm (NSGA-II) and a reference-point-based NSGA-II (NSGA-III) are used for the embedding scheme, and two Pareto-based algorithms, EnNSGAII and EnNSGAIII, are presented. Experimental studies show that the performance of the baseline algorithm is improved by embedding the proposed strategy into metaheuristic algorithms.
△ Less
Submitted 9 September, 2022;
originally announced September 2022.
-
Exploration, Path Planning with Obstacle and Collision Avoidance in a Dynamic Environment
Authors:
Saeid Alirezazadeh,
Luís A. Alexandre
Abstract:
If we give a robot the task of moving an object from its current position to another location in an unknown environment, the robot must explore the map, identify all types of obstacles, and then determine the best route to complete the task. We proposed a mathematical model to find an optimal path planning that avoids collisions with all static and moving obstacles and has the minimum completion t…
▽ More
If we give a robot the task of moving an object from its current position to another location in an unknown environment, the robot must explore the map, identify all types of obstacles, and then determine the best route to complete the task. We proposed a mathematical model to find an optimal path planning that avoids collisions with all static and moving obstacles and has the minimum completion time and the minimum distance traveled. In this model, the bounding box around obstacles and robots is not considered, so the robot can move very close to the obstacles without colliding with them. We considered two types of obstacles: deterministic, which include all static obstacles such as walls that do not move and all moving obstacles whose movements have a fixed pattern, and non-deterministic, which include all obstacles whose movements can occur in any direction with some probability distribution at any time. We also consider the acceleration and deceleration of the robot to improve collision avoidance.
△ Less
Submitted 19 August, 2022;
originally announced August 2022.
-
Guided Evolutionary Neural Architecture Search With Efficient Performance Estimation
Authors:
Vasco Lopes,
Miguel Santos,
Bruno Degardin,
Luís A. Alexandre
Abstract:
Neural Architecture Search (NAS) methods have been successfully applied to image tasks with excellent results. However, NAS methods are often complex and tend to converge to local minima as soon as generated architectures seem to yield good results. This paper proposes GEA, a novel approach for guided NAS. GEA guides the evolution by exploring the search space by generating and evaluating several…
▽ More
Neural Architecture Search (NAS) methods have been successfully applied to image tasks with excellent results. However, NAS methods are often complex and tend to converge to local minima as soon as generated architectures seem to yield good results. This paper proposes GEA, a novel approach for guided NAS. GEA guides the evolution by exploring the search space by generating and evaluating several architectures in each generation at initialisation stage using a zero-proxy estimator, where only the highest-scoring architecture is trained and kept for the next generation. Subsequently, GEA continuously extracts knowledge about the search space without increased complexity by generating several off-springs from an existing architecture at each generation. More, GEA forces exploitation of the most performant architectures by descendant generation while simultaneously driving exploration through parent mutation and favouring younger architectures to the detriment of older ones. Experimental results demonstrate the effectiveness of the proposed method, and extensive ablation studies evaluate the importance of different parameters. Results show that GEA achieves state-of-the-art results on all data sets of NAS-Bench-101, NAS-Bench-201 and TransNAS-Bench-101 benchmarks.
△ Less
Submitted 22 July, 2022;
originally announced August 2022.
-
LudVision -- Remote Detection of Exotic Invasive Aquatic Floral Species using Drone-Mounted Multispectral Data
Authors:
António J. Abreu,
Luís A. Alexandre,
João A. Santos,
Filippo Basso
Abstract:
Remote sensing is the process of detecting and monitoring the physical characteristics of an area by measuring its reflected and emitted radiation at a distance. It is being broadly used to monitor ecosystems, mainly for their preservation. Ever-growing reports of invasive species have affected the natural balance of ecosystems. Exotic invasive species have a critical impact when introduced into n…
▽ More
Remote sensing is the process of detecting and monitoring the physical characteristics of an area by measuring its reflected and emitted radiation at a distance. It is being broadly used to monitor ecosystems, mainly for their preservation. Ever-growing reports of invasive species have affected the natural balance of ecosystems. Exotic invasive species have a critical impact when introduced into new ecosystems and may lead to the extinction of native species. In this study, we focus on Ludwigia peploides, considered by the European Union as an aquatic invasive species. Its presence can negatively impact the surrounding ecosystem and human activities such as agriculture, fishing, and navigation. Our goal was to develop a method to identify the presence of the species. We used images collected by a drone-mounted multispectral sensor to achieve this, creating our LudVision data set. To identify the targeted species on the collected images, we propose a new method for detecting Ludwigia p. in multispectral images. The method is based on existing state-of-the-art semantic segmentation methods modified to handle multispectral data. The proposed method achieved a producer's accuracy of 79.9% and a user's accuracy of 95.5%.
△ Less
Submitted 13 July, 2022; v1 submitted 12 July, 2022;
originally announced July 2022.
-
Towards Less Constrained Macro-Neural Architecture Search
Authors:
Vasco Lopes,
Luís A. Alexandre
Abstract:
Networks found with Neural Architecture Search (NAS) achieve state-of-the-art performance in a variety of tasks, out-performing human-designed networks. However, most NAS methods heavily rely on human-defined assumptions that constrain the search: architecture's outer-skeletons, number of layers, parameter heuristics and search spaces. Additionally, common search spaces consist of repeatable modul…
▽ More
Networks found with Neural Architecture Search (NAS) achieve state-of-the-art performance in a variety of tasks, out-performing human-designed networks. However, most NAS methods heavily rely on human-defined assumptions that constrain the search: architecture's outer-skeletons, number of layers, parameter heuristics and search spaces. Additionally, common search spaces consist of repeatable modules (cells) instead of fully exploring the architecture's search space by designing entire architectures (macro-search). Imposing such constraints requires deep human expertise and restricts the search to pre-defined settings. In this paper, we propose LCMNAS, a method that pushes NAS to less constrained search spaces by performing macro-search without relying on pre-defined heuristics or bounded search spaces. LCMNAS introduces three components for the NAS pipeline: i) a method that leverages information about well-known architectures to autonomously generate complex search spaces based on Weighted Directed Graphs with hidden properties, ii) an evolutionary search strategy that generates complete architectures from scratch, and iii) a mixed-performance estimation approach that combines information about architectures at initialization stage and lower fidelity estimates to infer their trainability and capacity to model complex functions. We present experiments in 13 different data sets showing that LCMNAS is capable of generating both cell and macro-based architectures with minimal GPU computation and state-of-the-art results. More, we conduct extensive studies on the importance of different NAS components in both cell and macro-based settings. Code for reproducibility is public at https://github.com/VascoLopes/LCMNAS.
△ Less
Submitted 6 January, 2023; v1 submitted 10 March, 2022;
originally announced March 2022.
-
Contrastive Learning from Demonstrations
Authors:
André Correia,
Luís A. Alexandre
Abstract:
This paper presents a framework for learning visual representations from unlabeled video demonstrations captured from multiple viewpoints. We show that these representations are applicable for imitating several robotic tasks, including pick and place. We optimize a recently proposed self-supervised learning algorithm by applying contrastive learning to enhance task-relevant information while suppr…
▽ More
This paper presents a framework for learning visual representations from unlabeled video demonstrations captured from multiple viewpoints. We show that these representations are applicable for imitating several robotic tasks, including pick and place. We optimize a recently proposed self-supervised learning algorithm by applying contrastive learning to enhance task-relevant information while suppressing irrelevant information in the feature embeddings. We validate the proposed method on the publicly available Multi-View Pouring and a custom Pick and Place data sets and compare it with the TCN triplet baseline. We evaluate the learned representations using three metrics: viewpoint alignment, stage classification and reinforcement learning, and in all cases the results improve when compared to state-of-the-art approaches, with the added benefit of reduced number of training iterations.
△ Less
Submitted 7 November, 2022; v1 submitted 30 January, 2022;
originally announced January 2022.
-
MPF6D: Masked Pyramid Fusion 6D Pose Estimation
Authors:
Nuno Pereira,
Luís A. Alexandre
Abstract:
Object pose estimation has multiple important applications, such as robotic gras** and augmented reality. We present a new method to estimate the 6D pose of objects that improves upon the accuracy of current proposals and can still be used in real-time. Our method uses RGB-D data as input to segment objects and estimate their pose. It uses a neural network with multiple heads to identify the obj…
▽ More
Object pose estimation has multiple important applications, such as robotic gras** and augmented reality. We present a new method to estimate the 6D pose of objects that improves upon the accuracy of current proposals and can still be used in real-time. Our method uses RGB-D data as input to segment objects and estimate their pose. It uses a neural network with multiple heads to identify the objects in the scene, generate the appropriate masks and estimate the values of the translation vectors and the quaternion that represents the objects' rotation. These heads leverage a pyramid architecture used during feature extraction and feature fusion. We conduct an empirical evaluation using the two most common datasets in the area, and compare against state-of-the-art approaches, illustrating the capabilities of MPF6D. Our method can be used in real-time with its low inference time and high accuracy.
△ Less
Submitted 4 February, 2022; v1 submitted 17 November, 2021;
originally announced November 2021.
-
Guided Evolution for Neural Architecture Search
Authors:
Vasco Lopes,
Miguel Santos,
Bruno Degardin,
Luís A. Alexandre
Abstract:
Neural Architecture Search (NAS) methods have been successfully applied to image tasks with excellent results. However, NAS methods are often complex and tend to converge to local minima as soon as generated architectures seem to yield good results. In this paper, we propose G-EA, a novel approach for guided evolutionary NAS. The rationale behind G-EA, is to explore the search space by generating…
▽ More
Neural Architecture Search (NAS) methods have been successfully applied to image tasks with excellent results. However, NAS methods are often complex and tend to converge to local minima as soon as generated architectures seem to yield good results. In this paper, we propose G-EA, a novel approach for guided evolutionary NAS. The rationale behind G-EA, is to explore the search space by generating and evaluating several architectures in each generation at initialization stage using a zero-proxy estimator, where only the highest-scoring network is trained and kept for the next generation. This evaluation at initialization stage allows continuous extraction of knowledge from the search space without increasing computation, thus allowing the search to be efficiently guided. Moreover, G-EA forces exploitation of the most performant networks by descendant generation while at the same time forcing exploration by parent mutation and by favouring younger architectures to the detriment of older ones. Experimental results demonstrate the effectiveness of the proposed method, showing that G-EA achieves state-of-the-art results in NAS-Bench-201 search space in CIFAR-10, CIFAR-100 and ImageNet16-120, with mean accuracies of 93.98%, 72.12% and 45.94% respectively.
△ Less
Submitted 28 October, 2021;
originally announced October 2021.
-
Optimal Algorithm Allocation for Robotic Network Cloud Systems
Authors:
Saeid Alirezazadeh,
André Correia,
Luís A. Alexandre
Abstract:
A robotic network is a system with multiple robots connected by a communication network. Certain tasks that cannot be accomplished with available robotic resources are candidates for the use of cloud robotics, which overcomes the limitations of the robot network by adding to the network, either local or remote servers or cloud infrastructure, to aid in computational demanding tasks or storage. Pre…
▽ More
A robotic network is a system with multiple robots connected by a communication network. Certain tasks that cannot be accomplished with available robotic resources are candidates for the use of cloud robotics, which overcomes the limitations of the robot network by adding to the network, either local or remote servers or cloud infrastructure, to aid in computational demanding tasks or storage. Previous studies have mainly focused on minimizing the cost of the robots in retrieving resources by knowing the resource allocation in advance. We develop a method for a robotic network cloud system that includes robots, fog and cloud nodes, to determine where each algorithm should be allocated so that the system achieves optimal performance, regardless of which robot initiates the request. We can find the minimum required memory for the robots and the optimal way to allocate the algorithms with the shortest time to complete each task. We experimentally compare our method with a state-of-the-art method, using real-world data, showing the improvements that can be obtained.
△ Less
Submitted 27 May, 2022; v1 submitted 26 April, 2021;
originally announced April 2021.
-
EPE-NAS: Efficient Performance Estimation Without Training for Neural Architecture Search
Authors:
Vasco Lopes,
Saeid Alirezazadeh,
Luís A. Alexandre
Abstract:
Neural Architecture Search (NAS) has shown excellent results in designing architectures for computer vision problems. NAS alleviates the need for human-defined settings by automating architecture design and engineering. However, NAS methods tend to be slow, as they require large amounts of GPU computation. This bottleneck is mainly due to the performance estimation strategy, which requires the eva…
▽ More
Neural Architecture Search (NAS) has shown excellent results in designing architectures for computer vision problems. NAS alleviates the need for human-defined settings by automating architecture design and engineering. However, NAS methods tend to be slow, as they require large amounts of GPU computation. This bottleneck is mainly due to the performance estimation strategy, which requires the evaluation of the generated architectures, mainly by training them, to update the sampler method. In this paper, we propose EPE-NAS, an efficient performance estimation strategy, that mitigates the problem of evaluating networks, by scoring untrained networks and creating a correlation with their trained performance. We perform this process by looking at intra and inter-class correlations of an untrained network. We show that EPE-NAS can produce a robust correlation and that by incorporating it into a simple random sampling strategy, we are able to search for competitive networks, without requiring any training, in a matter of seconds using a single GPU. Moreover, EPE-NAS is agnostic to the search method, since it focuses on the evaluation of untrained networks, making it easy to integrate into almost any NAS method.
△ Less
Submitted 16 February, 2021;
originally announced February 2021.
-
An AutoML-based Approach to Multimodal Image Sentiment Analysis
Authors:
Vasco Lopes,
António Gaspar,
Luís A. Alexandre,
João Cordeiro
Abstract:
Sentiment analysis is a research topic focused on analysing data to extract information related to the sentiment that it causes. Applications of sentiment analysis are wide, ranging from recommendation systems, and marketing to customer satisfaction. Recent approaches evaluate textual content using Machine Learning techniques that are trained over large corpora. However, as social media grown, oth…
▽ More
Sentiment analysis is a research topic focused on analysing data to extract information related to the sentiment that it causes. Applications of sentiment analysis are wide, ranging from recommendation systems, and marketing to customer satisfaction. Recent approaches evaluate textual content using Machine Learning techniques that are trained over large corpora. However, as social media grown, other data types emerged in large quantities, such as images. Sentiment analysis in images has shown to be a valuable complement to textual data since it enables the inference of the underlying message polarity by creating context and connections. Multimodal sentiment analysis approaches intend to leverage information of both textual and image content to perform an evaluation. Despite recent advances, current solutions still flounder in combining both image and textual information to classify social media data, mainly due to subjectivity, inter-class homogeneity and fusion data differences. In this paper, we propose a method that combines both textual and image individual sentiment analysis into a final fused classification based on AutoML, that performs a random search to find the best model. Our method achieved state-of-the-art performance in the B-T4SA dataset, with 95.19% accuracy.
△ Less
Submitted 16 February, 2021;
originally announced February 2021.
-
Improving Makespan in Dynamic Task Scheduling for Cloud Robotic Systems with Time Window Constraints
Authors:
Saeid Alirezazadeh,
Luís A. Alexandre
Abstract:
A scheduling method in a robotic network cloud system with minimal makespan is beneficial as the system can complete all the tasks assigned to it in the fastest way. Robotic network cloud systems can be translated into graphs where nodes represent hardware with independent computing power and edges represent data transmissions between nodes. Time window constraints on tasks are a natural way to or…
▽ More
A scheduling method in a robotic network cloud system with minimal makespan is beneficial as the system can complete all the tasks assigned to it in the fastest way. Robotic network cloud systems can be translated into graphs where nodes represent hardware with independent computing power and edges represent data transmissions between nodes. Time window constraints on tasks are a natural way to order tasks. The makespan is the maximum amount of time between when the first node to receive a task starts executing its first scheduled task and when all nodes have completed their last scheduled task. Load balancing allocation and scheduling ensures that the time between when the first node completes its scheduled tasks and when all other nodes complete their scheduled tasks is as short as possible. We propose a grid of all tasks to ensure that the time window constraints for tasks are met. We propose grid of all tasks balancing algorithm for distributing and scheduling tasks with minimum makespan. We theoretically prove the correctness of the proposed algorithm and present simulations illustrating the obtained results.
△ Less
Submitted 7 June, 2022; v1 submitted 7 December, 2020;
originally announced December 2020.
-
Auto-Classifier: A Robust Defect Detector Based on an AutoML Head
Authors:
Vasco Lopes,
Luís A. Alexandre
Abstract:
The dominant approach for surface defect detection is the use of hand-crafted feature-based methods. However, this falls short when conditions vary that affect extracted images. So, in this paper, we sought to determine how well several state-of-the-art Convolutional Neural Networks perform in the task of surface defect detection. Moreover, we propose two methods: CNN-Fusion, that fuses the predic…
▽ More
The dominant approach for surface defect detection is the use of hand-crafted feature-based methods. However, this falls short when conditions vary that affect extracted images. So, in this paper, we sought to determine how well several state-of-the-art Convolutional Neural Networks perform in the task of surface defect detection. Moreover, we propose two methods: CNN-Fusion, that fuses the prediction of all the networks into a final one, and Auto-Classifier, which is a novel proposal that improves a Convolutional Neural Network by modifying its classification component using AutoML. We carried out experiments to evaluate the proposed methods in the task of surface defect detection using different datasets from DAGM2007. We show that the use of Convolutional Neural Networks achieves better results than traditional methods, and also, that Auto-Classifier out-performs all other methods, by achieving 100% accuracy and 100% AUC results throughout all the datasets.
△ Less
Submitted 3 September, 2020;
originally announced September 2020.
-
HMCNAS: Neural Architecture Search using Hidden Markov Chains and Bayesian Optimization
Authors:
Vasco Lopes,
Luís A. Alexandre
Abstract:
Neural Architecture Search has achieved state-of-the-art performance in a variety of tasks, out-performing human-designed networks. However, many assumptions, that require human definition, related with the problems being solved or the models generated are still needed: final model architectures, number of layers to be sampled, forced operations, small search spaces, which ultimately contributes t…
▽ More
Neural Architecture Search has achieved state-of-the-art performance in a variety of tasks, out-performing human-designed networks. However, many assumptions, that require human definition, related with the problems being solved or the models generated are still needed: final model architectures, number of layers to be sampled, forced operations, small search spaces, which ultimately contributes to having models with higher performances at the cost of inducing bias into the system. In this paper, we propose HMCNAS, which is composed of two novel components: i) a method that leverages information about human-designed models to autonomously generate a complex search space, and ii) an Evolutionary Algorithm with Bayesian Optimization that is capable of generating competitive CNNs from scratch, without relying on human-defined parameters or small search spaces. The experimental results show that the proposed approach results in competitive architectures obtained in a very short time. HMCNAS provides a step towards generalizing NAS, by providing a way to create competitive models, without requiring any human knowledge about the specific task.
△ Less
Submitted 31 July, 2020;
originally announced July 2020.
-
Dynamic Task Allocation for Robotic Network Cloud Systems
Authors:
Saeid Alirezazadeh,
Luís A. Alexandre
Abstract:
Every robotic network cloud system can be seen as a graph with nodes as hardware with independent computational processing powers and edges as data transmissions between nodes. When assigning a task to a node we may change several values corresponding to the node such as distance to other nodes, the time to complete all of its tasks, the energy level of the node, energy consumed while performing a…
▽ More
Every robotic network cloud system can be seen as a graph with nodes as hardware with independent computational processing powers and edges as data transmissions between nodes. When assigning a task to a node we may change several values corresponding to the node such as distance to other nodes, the time to complete all of its tasks, the energy level of the node, energy consumed while performing all of its tasks, geometrical position, communication with other nodes, and so on. These values can be seen as fingerprints for the current state of the node which can be evaluated as a subspace of a hyperspace. We proposed a theoretical model describing how assigning tasks to a node will change the subspace of the hyperspace, and from that, we show how to obtain the optimal task allocation. We described the communication instability between nodes and the capability of nodes as subspaces of a hyperspace. We translate task scheduling to nodes as finding the maximum volume of the hyperspace.
△ Less
Submitted 22 July, 2020;
originally announced July 2020.
-
Optimal Algorithm Allocation for Single Robot Cloud Systems
Authors:
Saeid Alirezazadeh,
Luís A. Alexandre
Abstract:
In order for a robot to perform a task, several algorithms need to be executed, sometimes, simultaneously. Algorithms can be run either on the robot itself or, upon request, be performed on cloud infrastructure. The term cloud infrastructure is used to describe hardware, storage, abstracted resources, and network resources related to cloud computing. Depending on the decisions on where to execute…
▽ More
In order for a robot to perform a task, several algorithms need to be executed, sometimes, simultaneously. Algorithms can be run either on the robot itself or, upon request, be performed on cloud infrastructure. The term cloud infrastructure is used to describe hardware, storage, abstracted resources, and network resources related to cloud computing. Depending on the decisions on where to execute the algorithms, the overall execution time and necessary memory space for the robot will change accordingly. The price of a robot depends, among other things, on its memory capacity and computational power. We answer the question of how to keep a given performance and use a cheaper robot (lower resources) by assigning computational tasks to the cloud infrastructure, depending on memory, computational power, and communication constraints. Also, for a fixed robot, our model provides a way to have optimal overall performance. We provide a general model for the optimal decision of algorithm allocation under certain constraints. We exemplify the model with simulation results. The main advantage of our model is that it provides an optimal task allocation simultaneously for memory and time.
△ Less
Submitted 8 February, 2022; v1 submitted 19 March, 2020;
originally announced March 2020.
-
MaskedFusion: Mask-based 6D Object Pose Estimation
Authors:
Nuno Pereira,
Luís A. Alexandre
Abstract:
MaskedFusion is a framework to estimate the 6D pose of objects using RGB-D data, with an architecture that leverages multiple sub-tasks in a pipeline to achieve accurate 6D poses. 6D pose estimation is an open challenge due to complex world objects and many possible problems when capturing data from the real world, e.g., occlusions, truncations, and noise in the data. Achieving accurate 6D poses w…
▽ More
MaskedFusion is a framework to estimate the 6D pose of objects using RGB-D data, with an architecture that leverages multiple sub-tasks in a pipeline to achieve accurate 6D poses. 6D pose estimation is an open challenge due to complex world objects and many possible problems when capturing data from the real world, e.g., occlusions, truncations, and noise in the data. Achieving accurate 6D poses will improve results in other open problems like robot gras** or positioning objects in augmented reality. MaskedFusion improves the state-of-the-art by using object masks to eliminate non-relevant data. With the inclusion of the masks on the neural network that estimates the 6D pose of an object we also have features that represent the object shape. MaskedFusion is a modular pipeline where each sub-task can have different methods that achieve the objective. MaskedFusion achieved 97.3% on average using the ADD metric on the LineMOD dataset and 93.3% using the ADD-S AUC metric on YCB-Video Dataset, which is an improvement, compared to the state-of-the-art methods. The code is available on GitHub (https://github.com/kroglice/MaskedFusion).
△ Less
Submitted 18 March, 2020; v1 submitted 18 November, 2019;
originally announced November 2019.
-
A crossinggram for random fields on lattices
Authors:
Helena Ferreira,
Marta Ferreira,
Luís A. Alexandre
Abstract:
The modeling of risk situations that occur in a space-time framework can be done using max-stable random fields on lattices. Although the summary coefficients for the spatial and temporal behaviour do not characterize the finite-dimensional distributions of the random field, they have the advantage of being immediate to interpret and easier to estimate. The coefficients that we propose give us inf…
▽ More
The modeling of risk situations that occur in a space-time framework can be done using max-stable random fields on lattices. Although the summary coefficients for the spatial and temporal behaviour do not characterize the finite-dimensional distributions of the random field, they have the advantage of being immediate to interpret and easier to estimate. The coefficients that we propose give us information about the tendency of a random field for local oscillations of its values in relation to real valued high levels. It is not the magnitude of the oscillations that is being evaluated, but rather the greater or lesser number of oscillations, that is, the tendency of the trajectories to oscillate. We can observe surface trajectories more smooth over a region according to higher crossinggram value. It takes value in $[0,1]$ and increases with the concordance of the variables of the random field.
△ Less
Submitted 13 February, 2020; v1 submitted 26 June, 2019;
originally announced June 2019.
-
A Time-Segmented Consortium Blockchain for Robotic Event Registration
Authors:
Miguel Fernandes,
Luís A. Alexandre
Abstract:
A blockchain, during its lifetime, records large amounts of data, that in a common usage its kept on its entirety. In a robotics environment, the old information is useful for human evaluation, or oracles interfacing with the blockchain but it is not useful for the robots that require only current information in order to continue their work. This causes a storage problem in blockchain nodes that h…
▽ More
A blockchain, during its lifetime, records large amounts of data, that in a common usage its kept on its entirety. In a robotics environment, the old information is useful for human evaluation, or oracles interfacing with the blockchain but it is not useful for the robots that require only current information in order to continue their work. This causes a storage problem in blockchain nodes that have limited storage capacity, such as in the case of nodes attached to robots that are usually built around embedded solutions. This paper presents a time-segmentation solution for devices with limited storage capacity, integrated in a particular robot-directed blockchain called RobotChain. Results are presented regarding the proposed solution that show that the goal of restricting each node's capacity is reached without compromising all the benefits that arise from the use of blockchains in these contexts, and on the contrary, it allows for cheap nodes to use this blockchain, reduces storage costs and allows faster deployment of new nodes.
△ Less
Submitted 14 May, 2019; v1 submitted 8 April, 2019;
originally announced April 2019.
-
Controlling Robots using Artificial Intelligence and a Consortium Blockchain
Authors:
Vasco Lopes,
Luís A. Alexandre,
Nuno Pereira
Abstract:
Blockchain is a disruptive technology that is normally used within financial applications, however it can be very beneficial also in certain robotic contexts, such as when an immutable register of events is required. Among the several properties of Blockchain that can be useful within robotic environments, we find not just immutability but also decentralization of the data, irreversibility, access…
▽ More
Blockchain is a disruptive technology that is normally used within financial applications, however it can be very beneficial also in certain robotic contexts, such as when an immutable register of events is required. Among the several properties of Blockchain that can be useful within robotic environments, we find not just immutability but also decentralization of the data, irreversibility, accessibility and non-repudiation. In this paper, we propose an architecture that uses blockchain as a ledger and smart-contract technology for robotic control by using external parties, Oracles, to process data. We show how to register events in a secure way, how it is possible to use smart-contracts to control robots and how to interface with external Artificial Intelligence algorithms for image analysis. The proposed architecture is modular and can be used in multiple contexts such as in manufacturing, network control, robot control, and others, since it is easy to integrate, adapt, maintain and extend to new domains.
△ Less
Submitted 2 March, 2019;
originally announced March 2019.
-
An Overview of Blockchain Integration with Robotics and Artificial Intelligence
Authors:
Vasco Lopes,
Luís A. Alexandre
Abstract:
Blockchain technology is growing everyday at a fast-passed rhythm and it's possible to integrate it with many systems, namely Robotics with AI services. However, this is still a recent field and there isn't yet a clear understanding of what it could potentially become. In this paper, we conduct an overview of many different methods and platforms that try to leverage the power of blockchain into ro…
▽ More
Blockchain technology is growing everyday at a fast-passed rhythm and it's possible to integrate it with many systems, namely Robotics with AI services. However, this is still a recent field and there isn't yet a clear understanding of what it could potentially become. In this paper, we conduct an overview of many different methods and platforms that try to leverage the power of blockchain into robotic systems, to improve AI services or to solve problems that are present in the major blockchains, which can lead to the ability of creating robotic systems with increased capabilities and security. We present an overview, discuss the methods and conclude the paper with our view on the future of the integration of these technologies.
△ Less
Submitted 30 September, 2018;
originally announced October 2018.
-
SeNA-CNN: Overcoming Catastrophic Forgetting in Convolutional Neural Networks by Selective Network Augmentation
Authors:
Abel S. Zacarias,
Luís A. Alexandre
Abstract:
Lifelong learning aims to develop machine learning systems that can learn new tasks while preserving the performance on previous learned tasks. In this paper we present a method to overcome catastrophic forgetting on convolutional neural networks, that learns new tasks and preserves the performance on old tasks without accessing the data of the original model, by selective network augmentation. Th…
▽ More
Lifelong learning aims to develop machine learning systems that can learn new tasks while preserving the performance on previous learned tasks. In this paper we present a method to overcome catastrophic forgetting on convolutional neural networks, that learns new tasks and preserves the performance on old tasks without accessing the data of the original model, by selective network augmentation. The experiment results showed that SeNA-CNN, in some scenarios, outperforms the state-of-art Learning without Forgetting algorithm. Results also showed that in some situations it is better to use SeNA-CNN instead of training a neural network using isolated learning.
△ Less
Submitted 9 May, 2018; v1 submitted 22 February, 2018;
originally announced February 2018.
-
Stacked Denoising Autoencoders and Transfer Learning for Immunogold Particles Detection and Recognition
Authors:
Ricardo Gamelas Sousa,
Jorge M. Santos,
Luís M. Silva,
Luís A. Alexandre,
Tiago Esteves,
Sara Rocha,
Paulo Monjardino,
Joaquim Marques de Sá,
Francisco Figueiredo,
Pedro Quelhas
Abstract:
In this paper we present a system for the detection of immunogold particles and a Transfer Learning (TL) framework for the recognition of these immunogold particles. Immunogold particles are part of a high-magnification method for the selective localization of biological molecules at the subcellular level only visible through Electron Microscopy. The number of immunogold particles in the cell wall…
▽ More
In this paper we present a system for the detection of immunogold particles and a Transfer Learning (TL) framework for the recognition of these immunogold particles. Immunogold particles are part of a high-magnification method for the selective localization of biological molecules at the subcellular level only visible through Electron Microscopy. The number of immunogold particles in the cell walls allows the assessment of the differences in their compositions providing a tool to analise the quality of different plants. For its quantization one requires a laborious manual labeling (or annotation) of images containing hundreds of particles. The system that is proposed in this paper can leverage significantly the burden of this manual task.
For particle detection we use a LoG filter coupled with a SDA. In order to improve the recognition, we also study the applicability of TL settings for immunogold recognition. TL reuses the learning model of a source problem on other datasets (target problems) containing particles of different sizes. The proposed system was developed to solve a particular problem on maize cells, namely to determine the composition of cell wall ingrowths in endosperm transfer cells. This novel dataset as well as the code for reproducing our experiments is made publicly available.
We determined that the LoG detector alone attained more than 84\% of accuracy with the F-measure. Develo** immunogold recognition with TL also provided superior performance when compared with the baseline models augmenting the accuracy rates by 10\%.
△ Less
Submitted 7 December, 2017;
originally announced December 2017.
-
Distributed learning of CNNs on heterogeneous CPU/GPU architectures
Authors:
Jose Marques,
Gabriel Falcao,
Luís A. Alexandre
Abstract:
Convolutional Neural Networks (CNNs) have shown to be powerful classification tools in tasks that range from check reading to medical diagnosis, reaching close to human perception, and in some cases surpassing it. However, the problems to solve are becoming larger and more complex, which translates to larger CNNs, leading to longer training times that not even the adoption of Graphics Processing U…
▽ More
Convolutional Neural Networks (CNNs) have shown to be powerful classification tools in tasks that range from check reading to medical diagnosis, reaching close to human perception, and in some cases surpassing it. However, the problems to solve are becoming larger and more complex, which translates to larger CNNs, leading to longer training times that not even the adoption of Graphics Processing Units (GPUs) could keep up to. This problem is partially solved by using more processing units and distributed training methods that are offered by several frameworks dedicated to neural network training. However, these techniques do not take full advantage of the possible parallelization offered by CNNs and the cooperative use of heterogeneous devices with different processing capabilities, clock speeds, memory size, among others. This paper presents a new method for the parallel training of CNNs that can be considered as a particular instantiation of model parallelism, where only the convolutional layer is distributed. In fact, the convolutions processed during training (forward and backward propagation included) represent from $60$-$90$\% of global processing time. The paper analyzes the influence of network size, bandwidth, batch size, number of devices, including their processing capabilities, and other parameters. Results show that this technique is capable of diminishing the training time without affecting the classification performance for both CPUs and GPUs. For the CIFAR-10 dataset, using a CNN with two convolutional layers, and $500$ and $1500$ kernels, respectively, best speedups achieve $3.28\times$ using four CPUs and $2.45\times$ with three GPUs. Modern imaging datasets, larger and more complex than CIFAR-10 will certainly require more than $60$-$90$\% of processing time calculating convolutions, and speedups will tend to increase accordingly.
△ Less
Submitted 7 December, 2017;
originally announced December 2017.
-
Distribution-Based Categorization of Classifier Transfer Learning
Authors:
Ricardo Gamelas Sousa,
Luís A. Alexandre,
Jorge M. Santos,
Luís M. Silva,
Joaquim Marques de Sá
Abstract:
Transfer Learning (TL) aims to transfer knowledge acquired in one problem, the source problem, onto another problem, the target problem, dispensing with the bottom-up construction of the target model. Due to its relevance, TL has gained significant interest in the Machine Learning community since it paves the way to devise intelligent learning models that can easily be tailored to many different a…
▽ More
Transfer Learning (TL) aims to transfer knowledge acquired in one problem, the source problem, onto another problem, the target problem, dispensing with the bottom-up construction of the target model. Due to its relevance, TL has gained significant interest in the Machine Learning community since it paves the way to devise intelligent learning models that can easily be tailored to many different applications. As it is natural in a fast evolving area, a wide variety of TL methods, settings and nomenclature have been proposed so far. However, a wide range of works have been reporting different names for the same concepts. This concept and terminology mixture contribute however to obscure the TL field, hindering its proper consideration. In this paper we present a review of the literature on the majority of classification TL methods, and also a distribution-based categorization of TL with a common nomenclature suitable to classification problems. Under this perspective three main TL categories are presented, discussed and illustrated with examples.
△ Less
Submitted 6 December, 2017;
originally announced December 2017.
-
Understanding trained CNNs by indexing neuron selectivity
Authors:
Ivet Rafegas,
Maria Vanrell,
Luis A. Alexandre,
Guillem Arias
Abstract:
The impressive performance of Convolutional Neural Networks (CNNs) when solving different vision problems is shadowed by their black-box nature and our consequent lack of understanding of the representations they build and how these representations are organized. To help understanding these issues, we propose to describe the activity of individual neurons by their Neuron Feature visualization and…
▽ More
The impressive performance of Convolutional Neural Networks (CNNs) when solving different vision problems is shadowed by their black-box nature and our consequent lack of understanding of the representations they build and how these representations are organized. To help understanding these issues, we propose to describe the activity of individual neurons by their Neuron Feature visualization and quantify their inherent selectivity with two specific properties. We explore selectivity indexes for: an image feature (color); and an image label (class membership). Our contribution is a framework to seek or classify neurons by indexing on these selectivity properties. It helps to find color selective neurons, such as a red-mushroom neuron in layer Conv4 or class selective neurons such as dog-face neurons in layer Conv5 in VGG-M, and establishes a methodology to derive other selectivity properties. Indexing on neuron selectivity can statistically draw how features and classes are represented through layers in a moment when the size of trained nets is growing and automatic tools to index neurons can be helpful.
△ Less
Submitted 21 May, 2019; v1 submitted 1 February, 2017;
originally announced February 2017.