Search | arXiv e-print repository

Premonition: Using Generative Models to Preempt Future Data Changes in Continual Learning

Authors: Mark D. McDonnell, Dong Gong, Ehsan Abbasnejad, Anton van den Hengel

Abstract: Continual learning requires a model to adapt to ongoing changes in the data distribution, and often to the set of tasks to be performed. It is rare, however, that the data and task changes are completely unpredictable. Given a description of an overarching goal or data theme, which we call a realm, humans can often guess what concepts are associated with it. We show here that the combination of a large language model and an image generation model can similarly provide useful premonitions as to how a continual learning challenge might develop over time. We use the large language model to generate text descriptions of semantically related classes that might potentially appear in the data stream in future. These descriptions are then rendered using Stable Diffusion to generate new labelled image samples. The resulting synthetic dataset is employed for supervised pre-training, but is discarded prior to commencing continual learning, along with the pre-training classification head. We find that the backbone of our pre-trained networks can learn representations useful for the downstream continual learning problem, thus becoming a valuable input to any existing continual learning method. Although there are complexities arising from the domain gap between real and synthetic images, we show that pre-training models in this manner improves multiple Class Incremental Learning (CIL) methods on fine-grained image classification benchmarks. Supporting code can be found at https://github.com/cl-premonition/premonition.

Submitted 12 March, 2024; originally announced March 2024.

Comments: 31 pages total (14 main paper, 5 references, 12 appendices)

arXiv:2307.02251 [pdf, other]

RanPAC: Random Projections and Pre-trained Models for Continual Learning

Authors: Mark D. McDonnell, Dong Gong, Amin Parveneh, Ehsan Abbasnejad, Anton van den Hengel

Abstract: Continual learning (CL) aims to incrementally learn different tasks (such as classification) in a non-stationary data stream without forgetting old ones. Most CL works focus on tackling catastrophic forgetting under a learning-from-scratch paradigm. However, with the increasing prominence of foundation models, pre-trained models equipped with informative representations have become available for various downstream requirements. Several CL methods based on pre-trained models have been explored, either utilizing pre-extracted features directly (which makes bridging distribution gaps challenging) or incorporating adaptors (which may be subject to forgetting). In this paper, we propose a concise and effective approach for CL with pre-trained models. Given that forgetting occurs during parameter updating, we contemplate an alternative approach that exploits training-free random projectors and class-prototype accumulation, which thus bypasses the issue. Specifically, we inject a frozen Random Projection layer with nonlinear activation between the pre-trained model's feature representations and output head, which captures interactions between features with expanded dimensionality, providing enhanced linear separability for class-prototype-based CL. We also demonstrate the importance of decorrelating the class-prototypes to reduce the distribution disparity when using pre-trained representations. These techniques prove to be effective and circumvent the problem of forgetting for both class- and domain-incremental continual learning.
Compared to previous methods applied to pre-trained ViT-B/16 models, we reduce final error rates by between 20% and 62% on seven class-incremental benchmarks, despite not using any rehearsal memory. We conclude that the full potential of pre-trained models for simple, effective, and fast CL has not hitherto been fully tapped. Code is at github.com/RanPAC/RanPAC.

Submitted 15 January, 2024; v1 submitted 5 July, 2023; originally announced July 2023.

Comments: 32 pages, 11 figures

Journal ref: 37th Annual Conference on Neural Information Processing Systems (NeurIPS 2023), Dec 2023, New Orleans, United States
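
The idea in the RanPAC abstract (a frozen random projection with a nonlinearity, followed by accumulated class prototypes that are decorrelated) can be illustrated with a minimal sketch. This is not the authors' implementation: the class name `RandomProjectionPrototypeHead`, the ReLU activation, the ridge-regularised Gram-matrix solve used for decorrelation, and the synthetic Gaussian "features" standing in for a pre-trained backbone are all assumptions made for illustration.

```python
import numpy as np

class RandomProjectionPrototypeHead:
    """Hypothetical sketch: frozen random projection + accumulated prototypes."""

    def __init__(self, feat_dim, n_classes, proj_dim=128, ridge=1.0, seed=0):
        rng = np.random.default_rng(seed)
        # Random projection weights are drawn once and never trained.
        self.W = rng.standard_normal((feat_dim, proj_dim))
        self.G = np.zeros((proj_dim, proj_dim))   # Gram matrix of projected features
        self.C = np.zeros((proj_dim, n_classes))  # per-class sums (class prototypes)
        self.ridge = ridge
        self.n_classes = n_classes

    def _project(self, feats):
        # Nonlinear activation after the projection (ReLU is an assumption).
        return np.maximum(feats @ self.W, 0.0)

    def update(self, feats, labels):
        # Only running sums are updated, so earlier tasks are not overwritten.
        H = self._project(feats)
        Y = np.eye(self.n_classes)[labels]
        self.G += H.T @ H
        self.C += H.T @ Y

    def predict(self, feats):
        # Decorrelate the prototypes with a ridge-regularised Gram inverse (assumed).
        Wo = np.linalg.solve(self.G + self.ridge * np.eye(len(self.G)), self.C)
        return np.argmax(self._project(feats) @ Wo, axis=1)

# Synthetic stand-in for backbone features: four well-separated Gaussian classes.
rng = np.random.default_rng(1)
centers = 3.0 * rng.standard_normal((4, 16))

def make_split(classes, n_per_class):
    labels = np.repeat(classes, n_per_class)
    feats = centers[labels] + rng.standard_normal((len(labels), 16))
    return feats, labels

head = RandomProjectionPrototypeHead(feat_dim=16, n_classes=4)
head.update(*make_split([0, 1], 100))  # task 1 introduces classes 0 and 1
head.update(*make_split([2, 3], 100))  # task 2 introduces classes 2 and 3
test_x, test_y = make_split([0, 1, 2, 3], 50)
accuracy = float(np.mean(head.predict(test_x) == test_y))
print(f"accuracy over all four classes after two tasks: {accuracy:.2f}")
```

Because `update` only adds to `G` and `C`, the statistics after two tasks are the same as if all data had been seen at once, which is one way to read the abstract's claim that the approach bypasses forgetting.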

arXiv:1907.06916 [pdf, other]

Single-bit-per-weight deep convolutional neural networks without batch-normalization layers for embedded systems

Authors: Mark D. McDonnell, Hesham Mostafa, Runchun Wang, Andre van Schaik

Abstract: Batch-normalization (BN) layers are thought to be an integrally important layer type in today's state-of-the-art deep convolutional neural networks for computer vision tasks such as classification and detection. However, BN layers introduce complexity and computational overheads that are highly undesirable for training and/or inference on low-power custom hardware implementations of real-time embedded vision systems such as UAVs, robots and Internet of Things (IoT) devices. They are also problematic when batch sizes need to be very small during training, and innovations such as residual connections introduced more recently than BN layers could potentially have lessened their impact. In this paper we aim to quantify the benefits BN layers offer in image classification networks, in comparison with alternative choices. In particular, we study networks that use shifted-ReLU layers instead of BN layers. We found, following experiments with wide residual networks applied to the ImageNet, CIFAR 10 and CIFAR 100 image classification datasets, that BN layers do not consistently offer a significant advantage. We found that the accuracy margin offered by BN layers depends on the data set, the network size, and the bit-depth of weights. We conclude that in situations where BN layers are undesirable due to speed, memory or complexity costs, that using shifted-ReLU layers instead should be considered; we found they can offer advantages in all these areas, and often do not impose a significant accuracy cost.

Submitted 22 July, 2019; v1 submitted 16 July, 2019; originally announced July 2019.

Comments: 8 pages, published IEEE conference paper

arXiv:1810.03241 [pdf, ps, other]

Diagnosing Convolutional Neural Networks using their Spectral Response

Authors: Victor Stamatescu, Mark D. McDonnell

Abstract: Convolutional Neural Networks (CNNs) are a class of artificial neural networks whose computational blocks use convolution, together with other linear and non-linear operations, to perform classification or regression. This paper explores the spectral response of CNNs and its potential use in diagnosing problems with their training. We measure the gain of CNNs trained for image classification on ImageNet and observe that the best models are also the most sensitive to perturbations of their input. Further, we perform experiments on MNIST and CIFAR-10 to find that the gain rises as the network learns and then saturates as the network converges. Moreover, we find that strong gain fluctuations can point to overfitting and learning problems caused by a poor choice of learning rate. We argue that the gain of CNNs can act as a diagnostic tool and potential replacement for the validation loss when hold-out validation data are not available.

Submitted 7 October, 2018; originally announced October 2018.

ACM Class: I.5.2; I.4.7

arXiv:1802.08530 [pdf, other]

Training wide residual networks for deployment using a single bit for each weight

Authors: Mark D. McDonnell

Abstract: For fast and energy-efficient deployment of trained deep neural networks on resource-constrained embedded hardware, each learned weight parameter should ideally be represented and stored using a single bit. Error-rates usually increase when this requirement is imposed. Here, we report large improvements in error rates on multiple datasets, for deep convolutional neural networks deployed with 1-bit-per-weight. Using wide residual networks as our main baseline, our approach simplifies existing methods that binarize weights by applying the sign function in training; we apply scaling factors for each layer with constant unlearned values equal to the layer-specific standard deviations used for initialization. For CIFAR-10, CIFAR-100 and ImageNet, and models with 1-bit-per-weight requiring less than 10 MB of parameter memory, we achieve error rates of 3.9%, 18.5% and 26.0% / 8.5% (Top-1 / Top-5) respectively. We also considered MNIST, SVHN and ImageNet32, achieving 1-bit-per-weight test results of 0.27%, 1.9%, and 41.3% / 19.1% respectively. For CIFAR, our error rates halve previously reported values, and are within about 1% of our error-rates for the same network with full-precision weights. For networks that overfit, we also show significant improvements in error rate by not learning batch normalization scale and offset parameters. This applies to both full precision and 1-bit-per-weight networks.
Using a warm-restart learning-rate schedule, we found that training for 1-bit-per-weight is just as fast as full-precision networks, with better accuracy than standard schedules, and achieved about 98%-99% of peak performance in just 62 training epochs for CIFAR-10/100. For full training code and trained models in MATLAB, Keras and PyTorch see https://github.com/McDonnell-Lab/1-bit-per-weight/ .

Submitted 23 February, 2018; originally announced February 2018.

Comments: Published as a conference paper at ICLR 2018

Journal ref: ICLR 2018 - International Conference on Learning Representations, Apr 2018, Vancouver, Canada. 2018
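
The binarization described in this abstract (the sign of each weight, multiplied by a fixed per-layer scale equal to the initialization standard deviation) might be sketched as follows. The He-style formula `sqrt(2 / fan_in)` for the initialization standard deviation, the mapping of zero to the positive sign, and the helper names `init_std`, `binarize` and `pack_signs` are assumptions for illustration, not details taken from the paper or its repository.

```python
import math
import random

def init_std(fan_in):
    # Assumption: He-style initialisation, std = sqrt(2 / fan_in).
    return math.sqrt(2.0 / fan_in)

def binarize(weights, fan_in):
    """Replace each weight by +scale or -scale, with a constant unlearned scale."""
    scale = init_std(fan_in)
    # Zero is mapped to +scale so that one bit per weight is enough (assumption).
    return [scale if w >= 0.0 else -scale for w in weights]

def pack_signs(weights):
    """The single bit stored per weight: 1 for non-negative, 0 for negative."""
    return [1 if w >= 0.0 else 0 for w in weights]

# Example: one 3x3 convolution filter over 64 input channels.
rng = random.Random(0)
fan_in = 3 * 3 * 64
latent = [rng.gauss(0.0, init_std(fan_in)) for _ in range(fan_in)]
deployed = binarize(latent, fan_in)
bits = pack_signs(latent)
print(f"{len(bits)} weights stored as bits, shared scale {init_std(fan_in):.4f}")
```

At deployment only `bits` and the one per-layer scale would need to be stored, which is where the memory saving comes from.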

arXiv:1704.06415 [pdf, other]

doi 10.1109/TIP.2017.2696744

Track Everything: Limiting Prior Knowledge in Online Multi-Object Recognition

Authors: Sebastien C. Wong, Victor Stamatescu, Adam Gatt, David Kearney, Ivan Lee, Mark D. McDonnell

Abstract: This paper addresses the problem of online tracking and classification of multiple objects in an image sequence. Our proposed solution is to first track all objects in the scene without relying on object-specific prior knowledge, which in other systems can take the form of hand-crafted features or user-based track initialization. We then classify the tracked objects with a fast-learning image classifier that is based on a shallow convolutional neural network architecture and demonstrate that object recognition improves when this is combined with object state information from the tracking algorithm. We argue that by transferring the use of prior knowledge from the detection and tracking stages to the classification stage we can design a robust, general purpose object recognition system with the ability to detect and track a variety of object types. We describe our biologically inspired implementation, which adaptively learns the shape and motion of tracked objects, and apply it to the Neovision2 Tower benchmark data set, which contains multiple object types. An experimental evaluation demonstrates that our approach is competitive with state-of-the-art video object recognition systems that do make use of object-specific prior knowledge in detection and tracking, while providing additional practical advantages by virtue of its generality.

Submitted 21 April, 2017; originally announced April 2017.

Comments: 15 pages

ACM Class: I.4.8

arXiv:1609.08764 [pdf, ps, other]

Understanding data augmentation for classification: when to warp?

Authors: Sebastien C. Wong, Adam Gatt, Victor Stamatescu, Mark D. McDonnell

Abstract: In this paper we investigate the benefit of augmenting data with synthetically created samples when training a machine learning classifier. Two approaches for creating additional training samples are data warping, which generates additional samples through transformations applied in the data-space, and synthetic over-sampling, which creates additional samples in feature-space. We experimentally evaluate the benefits of data augmentation for a convolutional backpropagation-trained neural network, a convolutional support vector machine and a convolutional extreme learning machine classifier, using the standard MNIST handwritten digit dataset. We found that while it is possible to perform generic augmentation in feature-space, if plausible transforms for the data are known then augmentation in data-space provides a greater benefit for improving performance and reducing overfitting.

Submitted 26 November, 2016; v1 submitted 28 September, 2016; originally announced September 2016.

Comments: 6 pages, 6 figures, DICTA 2016 conference

ACM Class: I.5.2; I.4.7

arXiv:1503.04596 [pdf, other]

Enhanced Image Classification With a Fast-Learning Shallow Convolutional Neural Network

Authors: Mark D. McDonnell, Tony Vladusich

Abstract: We present a neural network architecture and training method designed to enable very rapid training and low implementation complexity. Due to its training speed and very few tunable parameters, the method has strong potential for applications requiring frequent retraining or online training. The approach is characterized by (a) convolutional filters based on biologically inspired visual processing filters, (b) randomly-valued classifier-stage input weights, (c) use of least squares regression to train the classifier output weights in a single batch, and (d) linear classifier-stage output units. We demonstrate the efficacy of the method by applying it to image classification. Our results match existing state-of-the-art results on the MNIST (0.37% error) and NORB-small (2.2% error) image classification databases, but with very fast training times compared to standard deep network approaches. The network's performance on the Google Street View House Number (SVHN) (4% error) database is also competitive with state-of-the-art methods.

Submitted 15 August, 2015; v1 submitted 16 March, 2015; originally announced March 2015.

Comments: 7 pages, 2 figures, Paper at IJCNN 2015 (International Joint Conference on Neural Networks, 2015)

arXiv:1412.8307 [pdf, other]

doi 10.1371/journal.pone.0134254

Fast, simple and accurate handwritten digit classification by training shallow neural network classifiers with the 'extreme learning machine' algorithm

Authors: Mark D. McDonnell, Migel D. Tissera, Tony Vladusich, André van Schaik, Jonathan Tapson

Abstract: Recent advances in training deep (multi-layer) architectures have inspired a renaissance in neural network use. For example, deep convolutional networks are becoming the default option for difficult tasks on large datasets, such as image and speech recognition. However, here we show that error rates below 1% on the MNIST handwritten digit benchmark can be replicated with shallow non-convolutional neural networks. This is achieved by training such networks using the 'Extreme Learning Machine' (ELM) approach, which also enables a very rapid training time (~10 minutes). Adding distortions, as is common practise for MNIST, reduces error rates even further. Our methods are also shown to be capable of achieving less than 5.5% error rates on the NORB image database. To achieve these results, we introduce several enhancements to the standard ELM algorithm, which individually and in combination can significantly improve performance. The main innovation is to ensure each hidden-unit operates only on a randomly sized and positioned patch of each image. This form of random 'receptive field' sampling of the input ensures the input weight matrix is sparse, with about 90% of weights equal to zero. Furthermore, combining our methods with a small number of iterations of a single-batch backpropagation method can significantly reduce the number of hidden-units required to achieve a particular performance.
Our close to state-of-the-art results for MNIST and NORB suggest that the ease of use and accuracy of the ELM algorithm for designing a single-hidden-layer neural network classifier should cause it to be given greater consideration either as a standalone method for simpler problems, or as the final classification stage in deep neural networks applied to more difficult problems.

Submitted 22 July, 2015; v1 submitted 29 December, 2014; originally announced December 2014.

Comments: Accepted for publication; 9 pages of text, 6 figures and 1 table
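
The random receptive-field idea from this abstract, where each hidden unit only sees a randomly sized and positioned image patch, can be sketched as a masked random weight matrix. The uniform distribution over patch sizes and positions, the Gaussian weights, and the function names are assumptions for illustration only; with these choices the fraction of zero weights will generally not match the roughly 90% quoted in the abstract, since that figure depends on how patch sizes are drawn.

```python
import random

def receptive_field_mask(height, width, rng):
    """Binary mask (flattened row-major) that is 1 inside one random rectangle."""
    patch_h = rng.randint(1, height)          # random patch size (uniform, assumed)
    patch_w = rng.randint(1, width)
    top = rng.randint(0, height - patch_h)    # random patch position
    left = rng.randint(0, width - patch_w)
    mask = [0] * (height * width)
    for r in range(top, top + patch_h):
        for c in range(left, left + patch_w):
            mask[r * width + c] = 1
    return mask

def sparse_input_weights(n_hidden, height, width, seed=0):
    """One row of random input weights per hidden unit, zeroed outside its patch."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n_hidden):
        mask = receptive_field_mask(height, width, rng)
        rows.append([rng.gauss(0.0, 1.0) * m for m in mask])
    return rows

# Example with MNIST-sized 28x28 inputs and 200 hidden units.
weights = sparse_input_weights(n_hidden=200, height=28, width=28)
n_zero = sum(1 for row in weights for w in row if w == 0.0)
sparsity = n_zero / (len(weights) * len(weights[0]))
print(f"fraction of zero input weights: {sparsity:.2f}")
```

In an ELM these input weights would stay fixed, and only the output weights would be fitted, for example by least squares.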

arXiv:1404.3104 [pdf, other]

doi 10.1109/INFCOMW.2014.6849229

Transmit Pulse Shaping for Molecular Communications

Authors: Siyi Wang, Weisi Guo, Mark D. McDonnell

Abstract: This paper presents a method for shaping the transmit pulse of a molecular signal such that the diffusion channel's response is a sharp pulse. The impulse response of a diffusion channel is typically characterised as having an infinitely long transient response. This can cause severe inter-symbol-interference, and reduce the achievable reliable bit rate. We achieve the desired chemical channel response by poisoning the channel with a secondary compound, such that it chemically cancels aspects of the primary information signal. We use two independent methods to show that the chemical concentration of the \emph{information signal} should be $\propto \delta(t)$ and that of the \emph{poison signal} should be $\propto t^{-3/2}$.

Submitted 11 April, 2014; originally announced April 2014.

Comments: 2 pages, 1 figure, IEEE Conference on Computer Communications (INFOCOM)

arXiv:1404.3099 [pdf, other]

doi 10.1109/INFCOMW.2014.6849215

Distance Distributions for Real Cellular Networks

Authors: Siyi Wang, Weisi Guo, Mark D. McDonnell

Abstract: This paper presents the general distribution for the distance between a mobile user and any base station (BS). We show that a random variable proportional to the distance squared is Gamma distributed. In the case of the nearest BS, it can be reduced to the well established result of the distance being Rayleigh distributed. We validate our results using a random node simulation and real Vodafone 3G network data, and go on to show how the distribution is tractable by deriving the average aggregate interference power.

Submitted 11 April, 2014; originally announced April 2014.

Comments: 2 pages, 1 figure, IEEE Conference on Computer Communications (INFOCOM)
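
The Gamma result stated in this abstract can be checked numerically under a standard modelling assumption that the abstract itself does not spell out: base stations scattered as a homogeneous Poisson point process of density λ. Under that assumption, the distance R_n to the n-th nearest BS satisfies πλR_n² ~ Gamma(n, 1), whose mean is n; for n = 1 this is an exponential variable, which makes R_1 Rayleigh distributed. The sketch below approximates the Poisson process by a fixed number of uniformly placed points in a square around the user; the function name and all parameter values are illustrative.

```python
import math
import random

def scaled_squared_distances(n_max, n_points, side, rng):
    """Return pi * density * R_n^2 for the n_max nearest points to the origin."""
    density = n_points / side ** 2
    half = side / 2.0
    # Squared distances from a user at the centre to uniformly placed points.
    squared = sorted(
        rng.uniform(-half, half) ** 2 + rng.uniform(-half, half) ** 2
        for _ in range(n_points)
    )
    return [math.pi * density * d for d in squared[:n_max]]

rng = random.Random(0)
trials = 2000
totals = [0.0, 0.0, 0.0]
for _ in range(trials):
    for i, value in enumerate(scaled_squared_distances(3, 300, 10.0, rng)):
        totals[i] += value

# Gamma(n, 1) has mean n, so each sample mean should be close to n.
sample_means = {n: totals[n - 1] / trials for n in (1, 2, 3)}
for n, mean in sample_means.items():
    print(f"n = {n}: sample mean of pi*lambda*R_n^2 = {mean:.2f}")
```

Using a fixed point count instead of a Poisson-distributed one introduces a small bias, which is negligible here because only the three nearest of 300 points are used.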

arXiv:1404.0127 [pdf, other]

doi 10.1109/ICT.2014.6845140

Performance of Macro-Scale Molecular Communications with Sensor Cleanse Time

Authors: Siyi Wang, Weisi Guo, Song Qiu, Mark D. McDonnell

Abstract: In this paper, we consider a molecular diffusion based communications link that conveys information on the macro-scale (several metres). The motivation is to apply molecular-based communications to challenging electromagnetic environments. We first derive a novel capture probability expression of a finite sized receiver. The paper then introduces the concept of time-aggregated molecular noise at the receiver as a function of the rate at which the sensor can self-cleanse. The resulting inter-symbol-interference is expressed as a function of the sensor cleanse time, and the performance metrics of bit error rate, throughput and round-trip-time are derived. The results show that the performance is very sensitive to the sensor cleanse time and the drift velocity. The paper concludes with recommendations on the design of a real communication link based on these findings and applies the concepts to a test-bed.

Submitted 1 April, 2014; originally announced April 2014.

Comments: 6 pages, 6 figures, IEEE International Conference on Telecommunications (ICT)

arXiv:1404.0123 [pdf, other]

doi 10.1109/ICT.2014.6845085

Downlink Interference Estimation without Feedback for Heterogeneous Network Interference Avoidance

Authors: Siyi Wang, Weisi Guo, Mark D. McDonnell

Abstract: In this paper, we present a novel method for a base station (BS) to estimate the total downlink interference power received by any given mobile receiver, without information feedback from the user or information exchange between neighbouring BSs. The prediction method is deterministic and can be computed rapidly. This is achieved by first abstracting the cellular network into a mathematical model, and then inferring the interference power received at any location based on the power spectrum measurements taken at the observing BS. The analysis expands the methodology to a $\mathsf{K}$-tier heterogeneous network and demonstrates the accuracy of the technique for a variety of sampling densities. The paper demonstrates the methodology by applying it to an opportunistic transmission technique that avoids transmissions to channels which are overwhelmed by interference. The simulation results show that the proposed technique performs closely or better than existing interference avoidance techniques that require information exchange, and yields a 30% throughput improvement over baseline configurations.

Submitted 1 April, 2014; originally announced April 2014.

Comments: 6 pages, 5 figures, IEEE International Conference on Telecommunications (ICT)

arXiv:1107.2984 [pdf, ps, other]

doi 10.1007/s00422-011-0451-9

An Introductory Review of Information Theory in the Context of Computational Neuroscience

Authors: Mark D. McDonnell, Shiro Ikeda, Jonathan H. Manton

Abstract: This paper introduces several fundamental concepts in information theory from the perspective of their origins in engineering. Understanding such concepts is important in neuroscience for two reasons. Simply applying formulae from information theory without understanding the assumptions behind their definitions can lead to erroneous results and conclusions. Furthermore, this century will see a convergence of information theory and neuroscience; information theory will expand its foundations to incorporate more comprehensively biological processes thereby helping reveal how neuronal networks achieve their remarkable information processing abilities.

Submitted 14 July, 2011; originally announced July 2011.

Comments: 18 pages, 7 figures, to appear in Biological Cybernetics

Journal ref: Biological Cybernetics, 105(1), 1-16, 2011

arXiv:0911.3668 [pdf, ps, other]

doi 10.1103/PhysRevE.80.060102

Signal acquisition via polarization modulation in single photon sources

Authors: Mark D. McDonnell, Adrian P. Flitney

Abstract: A simple model system is introduced for demonstrating how a single photon source might be used to transduce classical analog information. The theoretical scheme results in measurements of analog source samples that are (i) quantized in the sense of analog-to-digital conversion and (ii) corrupted by random noise that is solely due to the quantum uncertainty in detecting the polarization state of each photon. This noise is unavoidable if more than one bit per sample is to be transmitted, and we show how it may be exploited in a manner inspired by suprathreshold stochastic resonance. The system is analyzed information theoretically, as it can be modeled as a noisy optical communication channel, although unlike classical Poisson channels, the detector's photon statistics are binomial. Previous results on binomial channels are adapted to demonstrate numerically that the classical information capacity, and thus the accuracy of the transduction, increases logarithmically with the square root of the number of photons, N. Although the capacity is shown to be reduced when an additional detector nonideality is present, the logarithmic increase with N remains.

Submitted 25 November, 2009; v1 submitted 18 November, 2009; originally announced November 2009.

Comments: 7 pages, 2 figures, accepted by Physical Review E. This version adds a reference

Journal ref: Physical Review E 80, 060102(R) (2009)

Showing 1–15 of 15 results for author: McDonnell, M D