-
Lightweight Learner for Shared Knowledge Lifelong Learning
Authors:
Yunhao Ge,
Yuecheng Li,
Di Wu,
Ao Xu,
Adam M. Jones,
Amanda Sofie Rios,
Iordanis Fostiropoulos,
Shixian Wen,
Po-Hsuan Huang,
Zachary William Murdock,
Gozde Sahin,
Shuo Ni,
Kiran Lekkala,
Sumedh Anand Sontakke,
Laurent Itti
Abstract:
In Lifelong Learning (LL), agents continually learn as they encounter new conditions and tasks. Most current LL is limited to a single agent that learns tasks sequentially. Dedicated LL machinery is then deployed to mitigate the forgetting of old tasks as new tasks are learned. This is inherently slow. We propose a new Shared Knowledge Lifelong Learning (SKILL) challenge, which deploys a decentral…
▽ More
In Lifelong Learning (LL), agents continually learn as they encounter new conditions and tasks. Most current LL is limited to a single agent that learns tasks sequentially. Dedicated LL machinery is then deployed to mitigate the forgetting of old tasks as new tasks are learned. This is inherently slow. We propose a new Shared Knowledge Lifelong Learning (SKILL) challenge, which deploys a decentralized population of LL agents that each sequentially learn different tasks, with all agents operating independently and in parallel. After learning their respective tasks, agents share and consolidate their knowledge over a decentralized communication network, so that, in the end, all agents can master all tasks. We present one solution to SKILL which uses Lightweight Lifelong Learning (LLL) agents, where the goal is to facilitate efficient sharing by minimizing the fraction of the agent that is specialized for any given task. Each LLL agent thus consists of a common task-agnostic immutable part, where most parameters are, and individual task-specific modules that contain fewer parameters but are adapted to each task. Agents share their task-specific modules, plus summary information ("task anchors") representing their tasks in the common task-agnostic latent space of all agents. Receiving agents register each received task-specific module using the corresponding anchor. Thus, every agent improves its ability to solve new tasks each time new task-specific modules and anchors are received. On a new, very challenging SKILL-102 dataset with 102 image classification tasks (5,033 classes in total, 2,041,225 training, 243,464 validation, and 243,464 test images), we achieve much higher (and SOTA) accuracy over 8 LL baselines, while also achieving near perfect parallelization. Code and data can be found at https://github.com/gyhandy/Shared-Knowledge-Lifelong-Learning
△ Less
Submitted 24 May, 2023;
originally announced May 2023.
-
Ferroelectric FET based Context-Switching FPGA Enabling Dynamic Reconfiguration for Adaptive Deep Learning Machines
Authors:
Yixin Xu,
Zijian Zhao,
Yi Xiao,
Tongguang Yu,
Halid Mulaosmanovic,
Dominik Kleimaier,
Stefan Duenkel,
Sven Beyer,
Xiao Gong,
Rajiv Joshi,
X. Sharon Hu,
Shixian Wen,
Amanda Sofie Rios,
Kiran Lekkala,
Laurent Itti,
Eric Homan,
Sumitha George,
Vijaykrishnan Narayanan,
Kai Ni
Abstract:
Field Programmable Gate Array (FPGA) is widely used in acceleration of deep learning applications because of its reconfigurability, flexibility, and fast time-to-market. However, conventional FPGA suffers from the tradeoff between chip area and reconfiguration latency, making efficient FPGA accelerations that require switching between multiple configurations still elusive. In this paper, we perfor…
▽ More
Field Programmable Gate Array (FPGA) is widely used in acceleration of deep learning applications because of its reconfigurability, flexibility, and fast time-to-market. However, conventional FPGA suffers from the tradeoff between chip area and reconfiguration latency, making efficient FPGA accelerations that require switching between multiple configurations still elusive. In this paper, we perform technology-circuit-architecture co-design to break this tradeoff with no additional area cost and lower power consumption compared with conventional designs while providing dynamic reconfiguration, which can hide the reconfiguration time behind the execution time. Leveraging the intrinsic transistor structure and non-volatility of ferroelectric FET (FeFET), compact FPGA primitives are proposed and experimentally verified, including 1FeFET look-up table (LUT) cell, 1FeFET routing cell for connection blocks (CBs) and switch boxes (SBs). To support dynamic reconfiguration, two local copies of primitives are placed in parallel, which enables loading of arbitrary configuration without interrupting the active configuration execution. A comprehensive evaluation shows that compared with the SRAM-based FPGA, our dynamic reconfiguration design shows 63.0%/71.1% reduction in LUT/CB area and 82.7%/53.6% reduction in CB/SB power consumption with minimal penalty in the critical path delay (9.6%). We further implement a Super-Sub network model to show the benefit from the context-switching capability of our design. We also evaluate the timing performance of our design over conventional FPGA in various application scenarios. In one scenario that users switch between two preloaded configurations, our design yields significant time saving by 78.7% on average. In the other scenario of implementing multiple configurations with dynamic reconfiguration, our design offers time saving of 20.3% on average.
△ Less
Submitted 30 November, 2022;
originally announced December 2022.
-
What can we learn from misclassified ImageNet images?
Authors:
Shixian Wen,
Amanda Sofie Rios,
Kiran Lekkala,
Laurent Itti
Abstract:
Understanding the patterns of misclassified ImageNet images is particularly important, as it could guide us to design deep neural networks (DNN) that generalize better. However, the richness of ImageNet imposes difficulties for researchers to visually find any useful patterns of misclassification. Here, to help find these patterns, we propose "Superclassing ImageNet dataset". It is a subset of Ima…
▽ More
Understanding the patterns of misclassified ImageNet images is particularly important, as it could guide us to design deep neural networks (DNN) that generalize better. However, the richness of ImageNet imposes difficulties for researchers to visually find any useful patterns of misclassification. Here, to help find these patterns, we propose "Superclassing ImageNet dataset". It is a subset of ImageNet which consists of 10 superclasses, each containing 7-116 related subclasses (e.g., 52 bird types, 116 dog types). By training neural networks on this dataset, we found that: (i) Misclassifications are rarely across superclasses, but mainly among subclasses within a superclass. (ii) Ensemble networks trained each only on subclasses of a given superclass perform better than the same network trained on all subclasses of all superclasses. Hence, we propose a two-stage Super-Sub framework, and demonstrate that: (i) The framework improves overall classification performance by 3.3%, by first inferring a superclass using a generalist superclass-level network, and then using a specialized network for final subclass-level classification. (ii) Although the total parameter storage cost increases to a factor N+1 for N superclasses compared to using a single network, with finetuning, delta and quantization aware training techniques this can be reduced to 0.2N+1. Another advantage of this efficient implementation is that the memory cost on the GPU during inference is equivalent to using only one network. The reason is we initiate each subclass-level network through addition of small parameter variations (deltas) to the superclass-level network. (iii) Finally, our framework promises to be more scalable and generalizable than the common alternative of simply scaling up a vanilla network in size, since very large networks often suffer from overfitting and gradient vanishing.
△ Less
Submitted 20 January, 2022;
originally announced January 2022.
-
ShapeY: Measuring Shape Recognition Capacity Using Nearest Neighbor Matching
Authors:
Jong Woo Nam,
Amanda S. Rios,
Bartlett W. Mel
Abstract:
Object recognition in humans depends primarily on shape cues. We have developed a new approach to measuring the shape recognition performance of a vision system based on nearest neighbor view matching within the system's embedding space. Our performance benchmark, ShapeY, allows for precise control of task difficulty, by enforcing that view matching span a specified degree of 3D viewpoint change a…
▽ More
Object recognition in humans depends primarily on shape cues. We have developed a new approach to measuring the shape recognition performance of a vision system based on nearest neighbor view matching within the system's embedding space. Our performance benchmark, ShapeY, allows for precise control of task difficulty, by enforcing that view matching span a specified degree of 3D viewpoint change and/or appearance change. As a first test case we measured the performance of ResNet50 pre-trained on ImageNet. Matching error rates were high. For example, a 27 degree change in object pitch led ResNet50 to match the incorrect object 45% of the time. Appearance changes were also highly disruptive. Examination of false matches indicates that ResNet50's embedding space is severely "tangled". These findings suggest ShapeY can be a useful tool for charting the progress of artificial vision systems towards human-level shape recognition capabilities.
△ Less
Submitted 15 November, 2021;
originally announced November 2021.