-
An NMF-Based Building Block for Interpretable Neural Networks With Continual Learning
Authors:
Brian K. Vogel
Abstract:
Existing learning methods often struggle to balance interpretability and predictive performance. While models like nearest neighbors and non-negative matrix factorization (NMF) offer high interpretability, their predictive performance on supervised learning tasks is often limited. In contrast, neural networks based on the multi-layer perceptron (MLP) support the modular construction of expressive…
▽ More
Existing learning methods often struggle to balance interpretability and predictive performance. While models like nearest neighbors and non-negative matrix factorization (NMF) offer high interpretability, their predictive performance on supervised learning tasks is often limited. In contrast, neural networks based on the multi-layer perceptron (MLP) support the modular construction of expressive architectures and tend to have better recognition accuracy but are often regarded as black boxes in terms of interpretability. Our approach aims to strike a better balance between these two aspects through the use of a building block based on NMF that incorporates supervised neural network training methods to achieve high predictive performance while retaining the desirable interpretability properties of NMF. We evaluate our Predictive Factorized Coupling (PFC) block on small datasets and show that it achieves competitive predictive performance with MLPs while also offering improved interpretability. We demonstrate the benefits of this approach in various scenarios, such as continual learning, training on non-i.i.d. data, and knowledge removal after training. Additionally, we show examples of using the PFC block to build more expressive architectures, including a fully-connected residual network as well as a factorized recurrent neural network (RNN) that performs competitively with vanilla RNNs while providing improved interpretability. The PFC block uses an iterative inference algorithm that converges to a fixed point, making it possible to trade off accuracy vs computation after training but also currently preventing its use as a general MLP replacement in some scenarios such as training on very large datasets. We provide source code at https://github.com/bkvogel/pfc
△ Less
Submitted 19 November, 2023;
originally announced November 2023.
-
Adaptive Loss Scaling for Mixed Precision Training
Authors:
Ruizhe Zhao,
Brian Vogel,
Tanvir Ahmed
Abstract:
Mixed precision training (MPT) is becoming a practical technique to improve the speed and energy efficiency of training deep neural networks by leveraging the fast hardware support for IEEE half-precision floating point that is available in existing GPUs. MPT is typically used in combination with a technique called loss scaling, that works by scaling up the loss value up before the start of backpr…
▽ More
Mixed precision training (MPT) is becoming a practical technique to improve the speed and energy efficiency of training deep neural networks by leveraging the fast hardware support for IEEE half-precision floating point that is available in existing GPUs. MPT is typically used in combination with a technique called loss scaling, that works by scaling up the loss value up before the start of backpropagation in order to minimize the impact of numerical underflow on training. Unfortunately, existing methods make this loss scale value a hyperparameter that needs to be tuned per-model, and a single scale cannot be adapted to different layers at different training stages. We introduce a loss scaling-based training method called adaptive loss scaling that makes MPT easier and more practical to use, by removing the need to tune a model-specific loss scale hyperparameter. We achieve this by introducing layer-wise loss scale values which are automatically computed during training to deal with underflow more effectively than existing methods. We present experimental results on a variety of networks and tasks that show our approach can shorten the time to convergence and improve accuracy compared to the existing state-of-the-art MPT and single-precision floating point
△ Less
Submitted 27 October, 2019;
originally announced October 2019.
-
Chainer: A Deep Learning Framework for Accelerating the Research Cycle
Authors:
Seiya Tokui,
Ryosuke Okuta,
Takuya Akiba,
Yusuke Niitani,
Toru Ogawa,
Shunta Saito,
Shuji Suzuki,
Kota Uenishi,
Brian Vogel,
Hiroyuki Yamazaki Vincent
Abstract:
Software frameworks for neural networks play a key role in the development and application of deep learning methods. In this paper, we introduce the Chainer framework, which intends to provide a flexible, intuitive, and high performance means of implementing the full range of deep learning models needed by researchers and practitioners. Chainer provides acceleration using Graphics Processing Units…
▽ More
Software frameworks for neural networks play a key role in the development and application of deep learning methods. In this paper, we introduce the Chainer framework, which intends to provide a flexible, intuitive, and high performance means of implementing the full range of deep learning models needed by researchers and practitioners. Chainer provides acceleration using Graphics Processing Units with a familiar NumPy-like API through CuPy, supports general and dynamic models in Python through Define-by-Run, and also provides add-on packages for state-of-the-art computer vision models as well as distributed training.
△ Less
Submitted 1 August, 2019;
originally announced August 2019.
-
Parameter Reference Loss for Unsupervised Domain Adaptation
Authors:
Jiren **,
Richard G. Calland,
Takeru Miyato,
Brian K. Vogel,
Hideki Nakayama
Abstract:
The success of deep learning in computer vision is mainly attributed to an abundance of data. However, collecting large-scale data is not always possible, especially for the supervised labels. Unsupervised domain adaptation (UDA) aims to utilize labeled data from a source domain to learn a model that generalizes to a target domain of unlabeled data. A large amount of existing work uses Siamese net…
▽ More
The success of deep learning in computer vision is mainly attributed to an abundance of data. However, collecting large-scale data is not always possible, especially for the supervised labels. Unsupervised domain adaptation (UDA) aims to utilize labeled data from a source domain to learn a model that generalizes to a target domain of unlabeled data. A large amount of existing work uses Siamese network-based models, where two streams of neural networks process the source and the target domain data respectively. Nevertheless, most of these approaches focus on minimizing the domain discrepancy, overlooking the importance of preserving the discriminative ability for target domain features. Another important problem in UDA research is how to evaluate the methods properly. Common evaluation procedures require target domain labels for hyper-parameter tuning and model selection, contradicting the definition of the UDA task. Hence we propose a more reasonable evaluation principle that avoids this contradiction by simply adopting the latest snapshot of a model for evaluation. This adds an extra requirement for UDA methods besides the main performance criteria: the stability during training. We design a novel method that connects the target domain stream to the source domain stream with a Parameter Reference Loss (PRL) to solve these problems simultaneously. Experiments on various datasets show that the proposed PRL not only improves the performance on the target domain, but also stabilizes the training procedure. As a result, PRL based models do not need the contradictory model selection, and thus are more suitable for practical applications.
△ Less
Submitted 5 December, 2017; v1 submitted 20 November, 2017;
originally announced November 2017.
-
War** Peirce Quincuncial Panoramas
Authors:
Chamberlain Fong,
Brian K. Vogel
Abstract:
The Peirce quincuncial projection is a map** of the surface of a sphere to the interior of a square. It is a conformal map except for four points on the equator. These points of non-conformality cause significant artifacts in photographic applications. In this paper, we propose an algorithm and user-interface to mitigate these artifacts. Moreover, in order to facilitate an interactive user-inter…
▽ More
The Peirce quincuncial projection is a map** of the surface of a sphere to the interior of a square. It is a conformal map except for four points on the equator. These points of non-conformality cause significant artifacts in photographic applications. In this paper, we propose an algorithm and user-interface to mitigate these artifacts. Moreover, in order to facilitate an interactive user-interface, we present a fast algorithm for calculating the Peirce quincuncial projection of spherical imagery. We then promote the Peirce quincuncial projection as a viable alternative to the more popular stereographic projection in some scenarios.
△ Less
Submitted 26 September, 2015; v1 submitted 14 November, 2010;
originally announced November 2010.
-
Positive factor networks: A graphical framework for modeling non-negative sequential data
Authors:
Brian K. Vogel
Abstract:
We present a novel graphical framework for modeling non-negative sequential data with hierarchical structure. Our model corresponds to a network of coupled non-negative matrix factorization (NMF) modules, which we refer to as a positive factor network (PFN). The data model is linear, subject to non-negativity constraints, so that observation data consisting of an additive combination of individu…
▽ More
We present a novel graphical framework for modeling non-negative sequential data with hierarchical structure. Our model corresponds to a network of coupled non-negative matrix factorization (NMF) modules, which we refer to as a positive factor network (PFN). The data model is linear, subject to non-negativity constraints, so that observation data consisting of an additive combination of individually representable observations is also representable by the network. This is a desirable property for modeling problems in computational auditory scene analysis, since distinct sound sources in the environment are often well-modeled as combining additively in the corresponding magnitude spectrogram. We propose inference and learning algorithms that leverage existing NMF algorithms and that are straightforward to implement. We present a target tracking example and provide results for synthetic observation data which serve to illustrate the interesting properties of PFNs and motivate their potential usefulness in applications such as music transcription, source separation, and speech recognition. We show how a target process characterized by a hierarchical state transition model can be represented as a PFN. Our results illustrate that a PFN which is defined in terms of a single target observation can then be used to effectively track the states of multiple simultaneous targets. Our results show that the quality of the inferred target states degrades gradually as the observation noise is increased. We also present results for an example in which meaningful hierarchical features are extracted from a spectrogram. Such a hierarchical representation could be useful for music transcription and source separation applications. We also propose a network for language modeling.
△ Less
Submitted 15 July, 2009; v1 submitted 25 July, 2008;
originally announced July 2008.