
Showing 1–50 of 52 results for author: Storkey, A

Searching in archive stat.
  1. arXiv:2405.20838

    cs.LG cs.AI cs.CV stat.ML

    einspace: Searching for Neural Architectures from Fundamental Operations

    Authors: Linus Ericsson, Miguel Espinosa, Chenhongyi Yang, Antreas Antoniou, Amos Storkey, Shay B. Cohen, Steven McDonagh, Elliot J. Crowley

    Abstract: Neural architecture search (NAS) finds high performing networks for a given task. Yet the results of NAS are fairly prosaic; they did not e.g. create a shift from convolutional structures to transformers. This is not least because the search spaces in NAS often aren't diverse enough to include such transformations a priori. Instead, for NAS to provide greater potential for fundamental design shift…

    Submitted 31 May, 2024; originally announced May 2024.

    Comments: Project page at https://linusericsson.github.io/einspace/

  2. arXiv:2404.06466

    cs.LG stat.ML

    Hyperparameter Selection in Continual Learning

    Authors: Thomas L. Lee, Sigrid Passano Hellan, Linus Ericsson, Elliot J. Crowley, Amos Storkey

    Abstract: In continual learning (CL) -- where a learner trains on a stream of data -- standard hyperparameter optimisation (HPO) cannot be applied, as a learner does not have access to all of the data at the same time. This has prompted the development of CL-specific HPO frameworks. The most popular way to tune hyperparameters in CL is to repeatedly train over the whole data stream with different hyperparam…

    Submitted 9 April, 2024; originally announced April 2024.

    Comments: Preprint, 9 pages

  3. arXiv:2310.02206

    cs.LG stat.ML

    Chunking: Forgetting Matters in Continual Learning even without Changing Tasks

    Authors: Thomas L. Lee, Amos Storkey

    Abstract: Work on continual learning (CL) has largely focused on the problems arising from the dynamically-changing data distribution. However, CL can be decomposed into two sub-problems: (a) shifts in the data distribution, and (b) dealing with the fact that the data is split into chunks and so only a part of the data is available to be trained on at any point in time. In this work, we look at the latter s…

    Submitted 3 October, 2023; originally announced October 2023.

    Comments: 9 pages, 11 figures, preprint

  4. arXiv:2305.19076

    cs.LG stat.ML

    Approximate Bayesian Class-Conditional Models under Continuous Representation Shift

    Authors: Thomas L. Lee, Amos Storkey

    Abstract: For models consisting of a classifier in some representation space, learning online from a non-stationary data stream often necessitates changes in the representation. So, the question arises of what is the best way to adapt the classifier to shifts in representation. Current methods only slowly change the classifier to representation shift, introducing noise into learning as the classifier is mis…

    Submitted 7 May, 2024; v1 submitted 30 May, 2023; originally announced May 2023.

    Comments: Published at AISTATS 2024, 9 pages

  5. arXiv:2201.12570

    q-bio.BM cs.AI cs.LG cs.NE stat.ML

    AntBO: Towards Real-World Automated Antibody Design with Combinatorial Bayesian Optimisation

    Authors: Asif Khan, Alexander I. Cowen-Rivers, Antoine Grosnit, Derrick-Goh-Xin Deik, Philippe A. Robert, Victor Greiff, Eva Smorodina, Puneet Rawat, Kamil Dreczkowski, Rahmad Akbar, Rasul Tutunov, Dany Bou-Ammar, Jun Wang, Amos Storkey, Haitham Bou-Ammar

    Abstract: Antibodies are canonically Y-shaped multimeric proteins capable of highly specific molecular recognition. The CDRH3 region located at the tip of variable chains of an antibody dominates antigen-binding specificity. Therefore, it is a priority to design optimal antigen-specific CDRH3 regions to develop therapeutic antibodies. However, the combinatorial nature of CDRH3 sequence space makes it imposs…

    Submitted 14 October, 2022; v1 submitted 29 January, 2022; originally announced January 2022.

  6. arXiv:2106.10704

    cs.LG stat.ML

    Better Training using Weight-Constrained Stochastic Dynamics

    Authors: Benedict Leimkuhler, Tiffany Vlaar, Timothée Pouchon, Amos Storkey

    Abstract: We employ constraints to control the parameter space of deep neural networks throughout training. The use of customized, appropriately designed constraints can reduce the vanishing/exploding gradients problem, improve smoothness of classification boundaries, control weight magnitudes and stabilize deep neural networks, and thus enhance the robustness of training algorithms and the generalization c…

    Submitted 20 June, 2021; originally announced June 2021.

    Comments: ICML 2021 camera-ready. arXiv admin note: substantial text overlap with arXiv:2006.10114

    Journal ref: Proceedings of the 38th International Conference on Machine Learning, PMLR 139, 2021

  7. arXiv:2007.07869

    cs.LG cs.CV stat.ML

    Gradient-based Hyperparameter Optimization Over Long Horizons

    Authors: Paul Micaelli, Amos Storkey

    Abstract: Gradient-based hyperparameter optimization has earned widespread popularity in the context of few-shot meta-learning, but remains broadly impractical for tasks with long horizons (many gradient steps), due to memory scaling and gradient degradation issues. A common workaround is to learn hyperparameters online, but this introduces greediness which comes with a significant performance drop. We pr…

    Submitted 30 September, 2021; v1 submitted 15 July, 2020; originally announced July 2020.

  8. arXiv:2006.10114

    cs.LG stat.ML

    Constraint-Based Regularization of Neural Networks

    Authors: Benedict Leimkuhler, Timothée Pouchon, Tiffany Vlaar, Amos Storkey

    Abstract: We propose a method for efficiently incorporating constraints into a stochastic gradient Langevin framework for the training of deep neural networks. Constraints allow direct control of the parameter space of the model. Appropriately designed, they reduce the vanishing/exploding gradient problem, control weight magnitudes and stabilize deep neural networks and thus improve the robustness of traini…

    Submitted 20 June, 2021; v1 submitted 17 June, 2020; originally announced June 2020.

    Comments: T. Vlaar won best student paper award at OPT2020

    Journal ref: OPT2020: 12th Annual Workshop on Optimization for Machine Learning, NeurIPS 2020

  9. arXiv:2006.09791

    cs.LG cs.CV cs.DC stat.ML

    Optimizing Grouped Convolutions on Edge Devices

    Authors: Perry Gibson, José Cano, Jack Turner, Elliot J. Crowley, Michael O'Boyle, Amos Storkey

    Abstract: When deploying a deep neural network on constrained hardware, it is possible to replace the network's standard convolutions with grouped convolutions. This allows for substantial memory savings with minimal loss of accuracy. However, current implementations of grouped convolutions in modern deep learning frameworks are far from performing optimally in terms of speed. In this paper we propose Group…

    Submitted 17 June, 2020; originally announced June 2020.

    Comments: Camera ready version to be published at ASAP 2020 - The 31st IEEE International Conference on Application-specific Systems, Architectures and Processors. 8 pages, 6 figures

    ACM Class: I.2.6; D.3.4; C.1.4
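The memory saving from grouped convolutions described in entry 9 follows from weight counting: splitting a convolution into g groups means each output channel only sees c_in/g input channels, so the weight count drops by a factor of g. The sketch below shows only that arithmetic; the helper name `conv_params` is hypothetical, and the code does not model the speed problems or the optimised implementation the paper itself addresses.

```python
def conv_params(c_in: int, c_out: int, k: int, groups: int = 1, bias: bool = False) -> int:
    """Weight count of a k x k 2-D convolution split into `groups` groups."""
    if c_in % groups or c_out % groups:
        raise ValueError("c_in and c_out must both be divisible by groups")
    # Each group maps c_in/groups input channels to c_out/groups output channels.
    weights = groups * (c_in // groups) * (c_out // groups) * k * k
    return weights + (c_out if bias else 0)


standard = conv_params(256, 256, 3)           # 256 * 256 * 9
grouped = conv_params(256, 256, 3, groups=8)  # 8 groups of 32 -> 32 channels
print(standard, grouped, standard / grouped)  # 589824 73728 8.0
```

The channel sizes are arbitrary; any choice divisible by `groups` gives the same factor-of-g reduction.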

  10. arXiv:2006.05849

    cs.LG stat.ML

    Self-Supervised Relational Reasoning for Representation Learning

    Authors: Massimiliano Patacchiola, Amos Storkey

    Abstract: In self-supervised learning, a system is tasked with achieving a surrogate objective by defining alternative targets on a set of unlabeled data. The aim is to build useful representations that can be used in downstream tasks, without costly manual annotation. In this work, we propose a novel self-supervised formulation of relational reasoning that allows a learner to bootstrap a signal from inform…

    Submitted 10 November, 2020; v1 submitted 10 June, 2020; originally announced June 2020.

    Comments: Advances in Neural Information Processing Systems (NeurIPS 2020, Spotlight)

  11. arXiv:2006.04647

    cs.LG cs.CV stat.ML

    Neural Architecture Search without Training

    Authors: Joseph Mellor, Jack Turner, Amos Storkey, Elliot J. Crowley

    Abstract: The time and effort involved in hand-designing deep neural networks is immense. This has prompted the development of Neural Architecture Search (NAS) techniques to automate this design. However, NAS algorithms tend to be slow and expensive; they need to train vast numbers of candidate networks to inform the search process. This could be alleviated if we could partially predict a network's trained…

    Submitted 11 June, 2021; v1 submitted 8 June, 2020; originally announced June 2020.

    Comments: Accepted at ICML 2021 for a long presentation

  12. arXiv:2004.11967

    cs.CV cs.LG stat.ML

    Defining Benchmarks for Continual Few-Shot Learning

    Authors: Antreas Antoniou, Massimiliano Patacchiola, Mateusz Ochal, Amos Storkey

    Abstract: Both few-shot and continual learning have seen substantial progress in the last years due to the introduction of proper benchmarks. That being said, the field has still to frame a suite of benchmarks for the highly desirable setting of continual few-shot learning, where the learner is presented with a number of few-shot tasks, one after the other, and then asked to perform well on a validation set stem…

    Submitted 15 April, 2020; originally announced April 2020.

  13. arXiv:2004.05439

    cs.LG stat.ML

    Meta-Learning in Neural Networks: A Survey

    Authors: Timothy Hospedales, Antreas Antoniou, Paul Micaelli, Amos Storkey

    Abstract: The field of meta-learning, or learning-to-learn, has seen a dramatic rise in interest in recent years. Contrary to conventional approaches to AI where tasks are solved from scratch using a fixed learning algorithm, meta-learning aims to improve the learning algorithm itself, given the experience of multiple learning episodes. This paradigm provides an opportunity to tackle many conventional chall…

    Submitted 7 November, 2020; v1 submitted 11 April, 2020; originally announced April 2020.

  14. arXiv:2003.08821

    cs.CV cs.LG stat.ML

    DHOG: Deep Hierarchical Object Grouping

    Authors: Luke Nicholas Darlow, Amos Storkey

    Abstract: Recently, a number of competitive methods have tackled unsupervised representation learning by maximising the mutual information between the representations produced from augmentations. The resulting representations are then invariant to stochastic augmentation strategies, and can be used for downstream tasks such as clustering or classification. Yet data augmentations preserve many properties of…

    Submitted 13 March, 2020; originally announced March 2020.

    Comments: 15 pages, submitted to ECCV 2020

  15. arXiv:2003.06254

    cs.LG cs.CV stat.ML

    What Information Does a ResNet Compress?

    Authors: Luke Nicholas Darlow, Amos Storkey

    Abstract: The information bottleneck principle (Shwartz-Ziv & Tishby, 2017) suggests that SGD-based training of deep neural networks results in optimally compressed hidden layers, from an information theoretic perspective. However, this claim was established on toy data. The goal of the work we present here is to test whether the information bottleneck principle is applicable to a realistic setting using a…

    Submitted 13 March, 2020; originally announced March 2020.

    Comments: 10 pages + appendices; submitted to ICLR 2019

  16. arXiv:2002.08981

    cs.LG cs.CV stat.ML

    Comparing recurrent and convolutional neural networks for predicting wave propagation

    Authors: Stathi Fotiadis, Eduardo Pignatelli, Mario Lino Valencia, Chris Cantwell, Amos Storkey, Anil A. Bharath

    Abstract: Dynamical systems can be modelled by partial differential equations and numerical computations are used everywhere in science and engineering. In this work, we investigate the performance of recurrent and convolutional deep neural network architectures to predict the surface waves. The system is governed by the Saint-Venant equations. We improve on the long-term prediction over previous methods wh…

    Submitted 20 April, 2020; v1 submitted 20 February, 2020; originally announced February 2020.

  17. arXiv:2002.08697

    cs.LG stat.ML

    Performance Aware Convolutional Neural Network Channel Pruning for Embedded GPUs

    Authors: Valentin Radu, Kuba Kaszyk, Yuan Wen, Jack Turner, Jose Cano, Elliot J. Crowley, Bjorn Franke, Amos Storkey, Michael O'Boyle

    Abstract: Convolutional Neural Networks (CNN) are becoming a common presence in many applications and services, due to their superior recognition accuracy. They are increasingly being used on mobile devices, many times just by porting large models designed for server space, although several model compression techniques have been considered. One model compression technique intended to reduce computations is…

    Submitted 20 February, 2020; originally announced February 2020.

    Comments: A copy of this was published in IISWC'19

  18. arXiv:1910.05199

    cs.LG stat.ML

    Bayesian Meta-Learning for the Few-Shot Setting via Deep Kernels

    Authors: Massimiliano Patacchiola, Jack Turner, Elliot J. Crowley, Michael O'Boyle, Amos Storkey

    Abstract: Recently, different machine learning methods have been introduced to tackle the challenging few-shot learning scenario, that is, learning from a small labeled dataset related to a specific task. Common approaches have taken the form of meta-learning: learning to learn on the new problem given the old. Following the recognition that meta-learning is implementing learning in a multi-level model, we p…

    Submitted 13 October, 2020; v1 submitted 11 October, 2019; originally announced October 2019.

    Comments: Advances in Neural Information Processing Systems (NeurIPS 2020, Spotlight)

  19. arXiv:1906.04113

    cs.LG stat.ML

    BlockSwap: Fisher-guided Block Substitution for Network Compression on a Budget

    Authors: Jack Turner, Elliot J. Crowley, Michael O'Boyle, Amos Storkey, Gavin Gray

    Abstract: The desire to map neural networks to varying-capacity devices has led to the development of a wealth of compression techniques, many of which involve replacing standard convolutional blocks in a large network with cheap alternative blocks. However, not all blocks are created equally; for a required compute budget there may exist a potent combination of many different cheap blocks, though exhaustiv…

    Submitted 23 January, 2020; v1 submitted 10 June, 2019; originally announced June 2019.

    Comments: ICLR 2020

  20. arXiv:1906.00859

    stat.ML cs.LG

    Separable Layers Enable Structured Efficient Linear Substitutions

    Authors: Gavin Gray, Elliot J. Crowley, Amos Storkey

    Abstract: In response to the development of recent efficient dense layers, this paper shows that something as simple as replacing linear components in pointwise convolutions with structured linear decompositions also produces substantial gains in the efficiency/accuracy tradeoff. Pointwise convolutions are fully connected layers and are thus prepared for replacement by structured transforms. Networks using…

    Submitted 3 June, 2019; originally announced June 2019.
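Entry 20's statement that pointwise convolutions are fully connected layers can be checked numerically: a 1x1 convolution applies the same C_out x C_in matrix at every spatial position, which becomes a single matrix product once the spatial dimensions are flattened. This NumPy sketch assumes an HWC layout and arbitrary sizes; it does not implement the structured decompositions the paper studies.

```python
import numpy as np

rng = np.random.default_rng(0)
H, W, C_in, C_out = 4, 5, 3, 6
x = rng.standard_normal((H, W, C_in))   # feature map in HWC layout
w = rng.standard_normal((C_out, C_in))  # 1x1 kernel with its spatial dims squeezed out

# Pointwise (1x1) convolution written out explicitly, pixel by pixel.
conv = np.empty((H, W, C_out))
for i in range(H):
    for j in range(W):
        conv[i, j] = w @ x[i, j]

# The same operation as one fully connected layer applied to every pixel.
fc = (x.reshape(-1, C_in) @ w.T).reshape(H, W, C_out)

print(np.allclose(conv, fc))  # True
```

Because the layer reduces to `x @ w.T`, a structured factorisation of `w` (a low-rank product is the simplest example) can be substituted without changing the rest of the network; which structured transforms work best is the subject of the paper and is not shown here.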

  21. arXiv:1905.10295

    cs.LG stat.ML

    Learning to learn via Self-Critique

    Authors: Antreas Antoniou, Amos Storkey

    Abstract: In few-shot learning, a machine learning system learns from a small set of labelled examples relating to a specific task, such that it can generalize to new examples of the same task. Given the limited availability of labelled examples in such tasks, we wish to make use of all the information we can. Usually a model learns task-specific information from a small training-set (support-set) to predic…

    Submitted 30 January, 2020; v1 submitted 24 May, 2019; originally announced May 2019.

    Comments: Accepted in NeurIPS 2019

  22. arXiv:1905.09768

    cs.LG stat.ML

    Zero-shot Knowledge Transfer via Adversarial Belief Matching

    Authors: Paul Micaelli, Amos Storkey

    Abstract: Performing knowledge transfer from a large teacher network to a smaller student is a popular task in modern deep learning applications. However, due to growing dataset sizes and stricter privacy regulations, it is increasingly common not to have access to the data that was used to train the teacher. We propose a novel method which trains a student to match the predictions of its teacher without us…

    Submitted 25 November, 2019; v1 submitted 23 May, 2019; originally announced May 2019.

  23. arXiv:1902.09884

    stat.ML cs.LG

    Assume, Augment and Learn: Unsupervised Few-Shot Meta-Learning via Random Labels and Data Augmentation

    Authors: Antreas Antoniou, Amos Storkey

    Abstract: The field of few-shot learning has been laboriously explored in the supervised setting, where per-class labels are available. On the other hand, the unsupervised few-shot learning setting, where no labels of any kind are required, has seen little investigation. We propose a method, named Assume, Augment and Learn or AAL, for generating few-shot tasks using unlabeled data. We randomly label a rando…

    Submitted 5 March, 2019; v1 submitted 26 February, 2019; originally announced February 2019.

    Comments: Work in Progress - Under Review in ICML 2019
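The task-generation idea in entry 23 can be illustrated roughly as follows. The abstract is cut off after "We randomly label a rando…", so this is an assumption-laden sketch rather than the paper's procedure: it samples n_way unlabelled points, gives each a distinct random label to form a one-shot support set, and builds a query set from augmented copies that inherit those labels. The one-shot setting, the Gaussian-noise `add_noise` augmentation, and the function names are hypothetical stand-ins for real image augmentations.

```python
import numpy as np


def add_noise(x, rng, scale=0.1):
    """Hypothetical augmentation: a small Gaussian perturbation."""
    return x + scale * rng.standard_normal(x.shape)


def make_unsupervised_task(unlabelled, n_way, rng, augment=add_noise):
    """Build an n_way one-shot task from unlabelled data using random labels."""
    idx = rng.choice(len(unlabelled), size=n_way, replace=False)
    support_x = unlabelled[idx]
    support_y = rng.permutation(n_way)  # each sampled point is assumed to be its own class
    query_x = np.stack([augment(x, rng) for x in support_x])
    query_y = support_y.copy()          # augmented copies keep the assumed label
    return support_x, support_y, query_x, query_y


rng = np.random.default_rng(0)
data = rng.standard_normal((100, 2))    # stand-in for an unlabelled image set
sx, sy, qx, qy = make_unsupervised_task(data, n_way=5, rng=rng)
print(sx.shape, qx.shape, sorted(sy.tolist()))  # (5, 2) (5, 2) [0, 1, 2, 3, 4]
```

The resulting tasks can then be fed to any supervised few-shot meta-learner, since from the learner's point of view they look like ordinary labelled episodes.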

  24. arXiv:1811.00410

    stat.ML cs.LG

    Dilated DenseNets for Relational Reasoning

    Authors: Antreas Antoniou, Agnieszka Słowik, Elliot J. Crowley, Amos Storkey

    Abstract: Despite their impressive performance in many tasks, deep neural networks often struggle at relational reasoning. This has recently been remedied with the introduction of a plug-in relational module that considers relations between pairs of objects. Unfortunately, this is combinatorially expensive. In this extended abstract, we show that a DenseNet incorporating dilated convolutions excels at relat…

    Submitted 1 November, 2018; originally announced November 2018.

    Comments: Extended Abstract

  25. arXiv:1810.12894

    cs.LG cs.AI stat.ML

    Exploration by Random Network Distillation

    Authors: Yuri Burda, Harrison Edwards, Amos Storkey, Oleg Klimov

    Abstract: We introduce an exploration bonus for deep reinforcement learning methods that is easy to implement and adds minimal overhead to the computation performed. The bonus is the error of a neural network predicting features of the observations given by a fixed randomly initialized neural network. We also introduce a method to flexibly combine intrinsic and extrinsic rewards. We find that the random net…

    Submitted 30 October, 2018; originally announced October 2018.
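The exploration bonus in entry 25 (the error of a trained network predicting the features of a fixed, randomly initialised network) can be sketched in a few lines. This is a minimal NumPy illustration under stated assumptions: the target is a single random `tanh` layer, the predictor is linear and trained by plain gradient descent, and the sizes are arbitrary. It shows only why familiar observations end up with a smaller bonus, not the full agent or the intrinsic/extrinsic reward combination.

```python
import numpy as np

rng = np.random.default_rng(0)
obs_dim, feat_dim = 8, 4

# Fixed, randomly initialised target network: never trained.
W_target = rng.standard_normal((obs_dim, feat_dim))


def target(obs):
    return np.tanh(obs @ W_target)


# Predictor network (linear here) trained to imitate the target's features.
W_pred = np.zeros((obs_dim, feat_dim))


def bonus(obs):
    """Exploration bonus: squared prediction error for each observation."""
    err = obs @ W_pred - target(obs)
    return np.sum(err ** 2, axis=-1)


def train_step(obs, lr=0.05):
    """One gradient-descent step on the mean squared prediction error."""
    global W_pred
    err = obs @ W_pred - target(obs)  # shape (batch, feat_dim)
    grad = 2 * obs.T @ err / len(obs)
    W_pred -= lr * grad


familiar = rng.standard_normal((64, obs_dim))  # observations the agent keeps seeing
before = bonus(familiar).mean()
for _ in range(500):
    train_step(familiar)
after = bonus(familiar).mean()
print(after < before)  # True: frequently visited observations earn a smaller bonus
```

Observations unlike `familiar` have not been fitted by the predictor, so they tend to keep a larger error, which is what makes the bonus act as a novelty signal.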

  26. arXiv:1810.10460

    stat.ML cs.LG cs.PF

    Distilling with Performance Enhanced Students

    Authors: Jack Turner, Elliot J. Crowley, Valentin Radu, José Cano, Amos Storkey, Michael O'Boyle

    Abstract: The task of accelerating large neural networks on general purpose hardware has, in recent years, prompted the use of channel pruning to reduce network size. However, the efficacy of pruning based approaches has since been called into question. In this paper, we turn to distillation for model compression---specifically, attention transfer---and develop a simple method for discovering performance en…

    Submitted 7 March, 2019; v1 submitted 24 October, 2018; originally announced October 2018.

    Comments: Preprint. Paper title has changed

  27. arXiv:1810.09502

    cs.LG stat.ML

    How to train your MAML

    Authors: Antreas Antoniou, Harrison Edwards, Amos Storkey

    Abstract: The field of few-shot learning has recently seen substantial advancements. Most of these advancements came from casting few-shot learning as a meta-learning problem. Model Agnostic Meta Learning or MAML is currently one of the best approaches for few-shot learning via meta-learning. MAML is simple, elegant and very powerful, however, it has a variety of issues, such as being very sensitive to neur…

    Submitted 5 March, 2019; v1 submitted 22 October, 2018; originally announced October 2018.

    Comments: Published in ICLR 2019
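For readers unfamiliar with MAML (entry 27), its two-level structure can be shown on a toy problem where every derivative is available in closed form. This sketch assumes scalar tasks with loss L_i(theta) = 0.5 * (theta - c_i)**2, uses the same loss for the support and query step, and takes a single inner gradient step; it differentiates through that inner step exactly (second-order MAML) but includes none of the fixes the paper proposes.

```python
import numpy as np


def maml_scalar(task_centres, alpha=0.1, beta=0.5, steps=200, theta=5.0):
    """Second-order MAML on toy tasks with loss L_i(theta) = 0.5 * (theta - c_i)**2.

    Inner step: theta_i' = theta - alpha * (theta - c_i),
    so d(theta_i') / d(theta) = 1 - alpha, which the meta-gradient includes.
    """
    c = np.asarray(task_centres, dtype=float)
    for _ in range(steps):
        adapted = theta - alpha * (theta - c)                 # one inner gradient step per task
        meta_grad = np.mean((adapted - c) * (1.0 - alpha))    # chain rule through the inner step
        theta -= beta * meta_grad                             # outer (meta) update
    return float(theta)


print(round(maml_scalar([-1.0, 0.0, 4.0]), 6))  # 1.0
```

In this quadratic toy the meta-gradient simplifies to (1 - alpha)**2 * (theta - mean(c)), so the learned initialisation converges to the mean task optimum; with neural networks the inner loop has no closed form and the second-order term is obtained by backpropagating through the inner updates.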

  28. arXiv:1810.04622

    stat.ML cs.CV cs.LG

    A Closer Look at Structured Pruning for Neural Network Compression

    Authors: Elliot J. Crowley, Jack Turner, Amos Storkey, Michael O'Boyle

    Abstract: Structured pruning is a popular method for compressing a neural network: given a large trained network, one alternates between removing channel connections and fine-tuning; reducing the overall width of the network. However, the efficacy of structured pruning has largely evaded scrutiny. In this paper, we examine ResNets and DenseNets obtained through structured pruning-and-tuning and make two int…

    Submitted 7 June, 2019; v1 submitted 10 October, 2018; originally announced October 2018.

    Comments: Preprint. First two authors contributed equally. Paper title has changed
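The "removing channel connections" half of the prune-and-tune loop described in entry 28 can be sketched as follows. The abstract does not say how channels are chosen, so this uses the smallest-L1-norm filter criterion, a common heuristic that is assumed here rather than taken from the paper; `prune_channels` is a hypothetical helper, and the fine-tuning step that would follow each removal is omitted.

```python
import numpy as np


def prune_channels(w1, w2, n_remove):
    """Drop the n_remove output channels of conv weight w1 with the smallest L1 norm,
    together with the matching input channels of the next conv weight w2.

    w1: (c_mid, c_in, k, k), w2: (c_out, c_mid, k, k)
    """
    norms = np.abs(w1).sum(axis=(1, 2, 3))           # one L1 norm per output channel
    keep = np.sort(np.argsort(norms)[n_remove:])     # surviving channels, original order
    return w1[keep], w2[:, keep], keep


rng = np.random.default_rng(0)
w1 = rng.standard_normal((8, 3, 3, 3))
w1[5] *= 0.01                                        # make channel 5 clearly the weakest
w2 = rng.standard_normal((16, 8, 3, 3))
p1, p2, keep = prune_channels(w1, w2, n_remove=1)
print(p1.shape, p2.shape, 5 in keep)                 # (7, 3, 3, 3) (16, 7, 3, 3) False
```

Removing whole channels, rather than individual weights, shrinks both tensors, which is why structured pruning reduces the width of the network and not merely its number of non-zero weights.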

  29. arXiv:1810.03505

    cs.CV cs.LG stat.ML

    CINIC-10 is not ImageNet or CIFAR-10

    Authors: Luke N. Darlow, Elliot J. Crowley, Antreas Antoniou, Amos J. Storkey

    Abstract: In this brief technical report we introduce the CINIC-10 dataset as a plug-in extended alternative for CIFAR-10. It was compiled by combining CIFAR-10 with images selected and downsampled from the ImageNet database. We present the approach to compiling the dataset, illustrate the example images for different classes, give pixel distributions for each part of the repository, and give some standard…

    Submitted 2 October, 2018; originally announced October 2018.

    Comments: Dataset compilation, 9 pages, 11 figures, technical report

    Report number: EDI-INF-ANC-1802

  30. arXiv:1810.01860

    cs.LG stat.ML

    GINN: Geometric Illustration of Neural Networks

    Authors: Luke N. Darlow, Amos J. Storkey

    Abstract: This informal technical report details the geometric illustration of decision boundaries for ReLU units in a three layer fully connected neural network. The network is designed and trained to predict pixel intensity from an (x, y) input location. The Geometric Illustration of Neural Networks (GINN) tool was built to visualise and track the points at which ReLU units switch from being active to off…

    Submitted 2 October, 2018; originally announced October 2018.

    Comments: 8 pages, 9 figures, technical report

    Report number: EDI-INF-ANC-1901

  31. arXiv:1809.07196

    stat.ML cs.CV cs.LG cs.PF

    Characterising Across-Stack Optimisations for Deep Convolutional Neural Networks

    Authors: Jack Turner, José Cano, Valentin Radu, Elliot J. Crowley, Michael O'Boyle, Amos Storkey

    Abstract: Convolutional Neural Networks (CNNs) are extremely computationally demanding, presenting a large barrier to their deployment on resource-constrained devices. Since such systems are where some of their most useful applications lie (e.g. obstacle detection for mobile robots, vision-based medical assistive technology), significant bodies of work from both machine learning and systems communities have…

    Submitted 19 September, 2018; originally announced September 2018.

    Comments: IISWC 2018

  32. arXiv:1808.04355

    cs.LG cs.AI cs.CV cs.RO stat.ML

    Large-Scale Study of Curiosity-Driven Learning

    Authors: Yuri Burda, Harri Edwards, Deepak Pathak, Amos Storkey, Trevor Darrell, Alexei A. Efros

    Abstract: Reinforcement learning algorithms rely on carefully engineering environment rewards that are extrinsic to the agent. However, annotating each environment with hand-designed, dense rewards is not scalable, motivating the need for developing reward functions that are intrinsic to the agent. Curiosity is a type of intrinsic reward function which uses prediction error as reward signal. In this paper:…

    Submitted 13 August, 2018; originally announced August 2018.

    Comments: First three authors contributed equally and ordered alphabetically. Website at https://pathak22.github.io/large-scale-curiosity/

  33. arXiv:1807.05031

    stat.ML cs.LG

    On the Relation Between the Sharpest Directions of DNN Loss and the SGD Step Length

    Authors: Stanisław Jastrzębski, Zachary Kenton, Nicolas Ballas, Asja Fischer, Yoshua Bengio, Amos Storkey

    Abstract: Stochastic Gradient Descent (SGD) based training of neural networks with a large learning rate or a small batch-size typically ends in well-generalizing, flat regions of the weight space, as indicated by small eigenvalues of the Hessian of the training loss. However, the curvature along the SGD trajectory is poorly understood. An empirical investigation shows that initially SGD visits increasingly…

    Submitted 23 December, 2019; v1 submitted 13 July, 2018; originally announced July 2018.

    Journal ref: International Conference on Learning Representations (ICLR) 2019
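The "sharpest directions" in entry 33 are the top eigenvectors of the Hessian of the training loss. One standard way to estimate the largest eigenvalue without forming the Hessian is power iteration on Hessian-vector products; the sketch below uses an explicit diagonal matrix as a stand-in Hessian, whereas for a real network the product would come from automatic differentiation. The helper name is hypothetical, and this is a generic estimator, not necessarily the procedure used in the paper.

```python
import numpy as np


def top_hessian_eigenvalue(hvp, dim, iters=100, seed=0):
    """Estimate the largest-magnitude eigenvalue of a symmetric Hessian
    using only Hessian-vector products (power iteration)."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(dim)
    v /= np.linalg.norm(v)
    eig = 0.0
    for _ in range(iters):
        hv = hvp(v)
        eig = float(v @ hv)          # Rayleigh quotient for the current unit vector
        v = hv / np.linalg.norm(hv)  # move towards the dominant eigenvector
    return eig, v


# Toy quadratic loss 0.5 * theta^T A theta, whose Hessian is exactly A.
A = np.diag([10.0, 1.0, 0.1])
eig, direction = top_hessian_eigenvalue(lambda v: A @ v, dim=3)
print(round(eig, 4))  # 10.0
```

Power iteration returns the eigenvalue of largest magnitude, which equals the sharpest positive curvature only when no negative eigenvalue is larger in magnitude. For a quadratic, plain gradient descent along a direction with curvature lambda is stable only when the learning rate is below 2 / lambda, which is one simple way to see why step size and the sharpest directions are linked.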

  34. arXiv:1711.04623

    cs.LG cs.AI cs.CV stat.ML

    Three Factors Influencing Minima in SGD

    Authors: Stanisław Jastrzębski, Zachary Kenton, Devansh Arpit, Nicolas Ballas, Asja Fischer, Yoshua Bengio, Amos Storkey

    Abstract: We investigate the dynamical and convergent properties of stochastic gradient descent (SGD) applied to Deep Neural Networks (DNNs). Characterizing the relation between learning rate, batch size and the properties of the final minima, such as width or generalization, remains an open question. In order to tackle this problem we investigate the previously proposed approximation of SGD by a stochastic…

    Submitted 13 September, 2018; v1 submitted 13 November, 2017; originally announced November 2017.

    Comments: First two authors contributed equally. Short version accepted into ICLR workshop. Accepted to Artificial Neural Networks and Machine Learning, ICANN 2018

  35. arXiv:1711.04340

    stat.ML cs.CV cs.LG cs.NE

    Data Augmentation Generative Adversarial Networks

    Authors: Antreas Antoniou, Amos Storkey, Harrison Edwards

    Abstract: Effective training of neural networks requires much data. In the low-data regime, parameters are underdetermined, and learnt networks generalise poorly. Data Augmentation alleviates this by using existing data more effectively. However standard data augmentation produces only limited plausible alternative data. Given there is potential to generate a much broader set of augmentations, we design and…

    Submitted 21 March, 2018; v1 submitted 12 November, 2017; originally announced November 2017.

    Comments: 10 pages

  36. arXiv:1711.02613

    stat.ML cs.CV cs.LG

    Moonshine: Distilling with Cheap Convolutions

    Authors: Elliot J. Crowley, Gavin Gray, Amos Storkey

    Abstract: Many engineers wish to deploy modern neural networks in memory-limited settings; but the development of flexible methods for reducing memory use is in its infancy, and there is little knowledge of the resulting cost-benefit. We propose structural model distillation for memory reduction using a strategy that produces a student architecture that is a simple transformation of the teacher architecture…

    Submitted 17 January, 2019; v1 submitted 7 November, 2017; originally announced November 2017.

    Comments: 32nd Conference on Neural Information Processing Systems (NeurIPS 2018)

  37. arXiv:1704.03338

    stat.CO

    Continuously tempered Hamiltonian Monte Carlo

    Authors: Matthew M. Graham, Amos J. Storkey

    Abstract: Hamiltonian Monte Carlo (HMC) is a powerful Markov chain Monte Carlo (MCMC) method for performing approximate inference in complex probabilistic models of continuous variables. In common with many MCMC methods, however, the standard HMC approach performs poorly in distributions with multiple isolated modes. We present a method for augmenting the Hamiltonian system with an extra continuous temperat…

    Submitted 11 April, 2017; originally announced April 2017.

    Comments: 16 pages, 7 figures

  38. arXiv:1606.02185

    stat.ML cs.LG

    Towards a Neural Statistician

    Authors: Harrison Edwards, Amos Storkey

    Abstract: An efficient learner is one who reuses what they already know to tackle a new problem. For a machine learner, this means understanding the similarities amongst datasets. In order to do this, one must take seriously the idea of working with datasets, rather than datapoints, as the key objects to model. Towards this goal, we demonstrate an extension of a variational autoencoder that can learn a meth…

    Submitted 20 March, 2017; v1 submitted 7 June, 2016; originally announced June 2016.

    Comments: Updated to camera ready version for ICLR 2017

  39. arXiv:1605.07826

    stat.CO stat.ML

    Asymptotically exact inference in differentiable generative models

    Authors: Matthew M. Graham, Amos J. Storkey

    Abstract: Many generative models can be expressed as a differentiable function of random inputs drawn from some simple probability density. This framework includes both deep generative architectures such as Variational Autoencoders and a large class of procedurally defined simulator models. We present a method for performing efficient MCMC inference in such models when conditioning on observations of the mo…

    Submitted 2 March, 2017; v1 submitted 25 May, 2016; originally announced May 2016.

    Comments: 14 pages, 5 figures. Accepted for AISTATS 2017, camera-ready version

  40. arXiv:1511.07294

    stat.ML

    Stochastic Parallel Block Coordinate Descent for Large-scale Saddle Point Problems

    Authors: Zhanxing Zhu, Amos J. Storkey

    Abstract: We consider convex-concave saddle point problems with a separable structure and non-strongly convex functions. We propose an efficient stochastic block coordinate descent method using adaptive primal-dual updates, which enables flexible parallel optimization for large-scale problems. Our method shares the efficiency and flexibility of block coordinate descent methods with the simplicity of primal-…

    Submitted 23 November, 2015; originally announced November 2015.

    Comments: Accepted by AAAI 2016

  41. arXiv:1511.05897

    cs.LG cs.AI stat.ML

    Censoring Representations with an Adversary

    Authors: Harrison Edwards, Amos Storkey

    Abstract: In practice, there are often explicit constraints on what representations or decisions are acceptable in an application of machine learning. For example it may be a legal requirement that a decision must not favour a particular group. Alternatively it can be that the representation of data must not have identifying information. We address these two related issues by learning flexible representati…

    Submitted 4 March, 2016; v1 submitted 18 November, 2015; originally announced November 2015.

    Comments: Paper accepted to ICLR

  42. arXiv:1510.08692

    stat.ML cs.LG

    Covariance-Controlled Adaptive Langevin Thermostat for Large-Scale Bayesian Sampling

    Authors: Xiaocheng Shang, Zhanxing Zhu, Benedict Leimkuhler, Amos J. Storkey

    Abstract: Monte Carlo sampling for Bayesian posterior inference is a common approach used in machine learning. The Markov Chain Monte Carlo procedures that are used are often discrete-time analogues of associated stochastic differential equations (SDEs). These SDEs are guaranteed to leave invariant the required posterior distribution. An area of current research addresses the computational benefits of stoch…

    Submitted 12 February, 2020; v1 submitted 29 October, 2015; originally announced October 2015.

    Journal ref: Advances in Neural Information Processing Systems, 28, 37-45, (2015)

  43. arXiv:1506.04093  [pdf, ps, other]

    stat.ML cs.LG

    Adaptive Stochastic Primal-Dual Coordinate Descent for Separable Saddle Point Problems

    Authors: Zhanxing Zhu, Amos J. Storkey

    Abstract: We consider a generic convex-concave saddle point problem with separable structure, a form that covers a wide range of machine learning applications. Under this problem structure, we follow the framework of primal-dual updates for saddle point problems, and incorporate stochastic block coordinate descent with adaptive stepsize into this framework. We theoretically show that our proposal of adaptive…

    Submitted 12 June, 2015; originally announced June 2015.

    Comments: Accepted by ECML/PKDD2015

  44. The supervised hierarchical Dirichlet process

    Authors: Andrew M. Dai, Amos J. Storkey

    Abstract: We propose the supervised hierarchical Dirichlet process (sHDP), a nonparametric generative model for the joint distribution of a group of observations and a response variable directly associated with that whole group. We compare the sHDP with another leading method for regression on grouped data, the supervised latent Dirichlet allocation (sLDA) model. We evaluate our method on two real-world cla…

    Submitted 16 December, 2014; originally announced December 2014.

    Comments: 14 pages

  45. arXiv:1403.0648  [pdf, other

    cs.GT cs.LG q-fin.TR stat.ML

    Multi-period Trading Prediction Markets with Connections to Machine Learning

    Authors: Jinli Hu, Amos Storkey

    Abstract: We present a new model for prediction markets, in which we use risk measures to model agents and introduce a market maker to describe the trading process. This specific choice of modelling tools brings us mathematical convenience. The analysis shows that the whole market effectively approaches a global objective, even though the market is designed such that each agent only cares about its own goa…

    Submitted 3 March, 2014; originally announced March 2014.

  46. Bayesian Inference in Sparse Gaussian Graphical Models

    Authors: Peter Orchard, Felix Agakov, Amos Storkey

    Abstract: One of the fundamental tasks of science is to find explainable relationships between observed phenomena. One approach to this task that has received attention in recent years is based on probabilistic graphical modelling with sparsity constraints on model structures. In this paper, we describe two new approaches to Bayesian inference of sparse structures of Gaussian graphical models (GGMs). One is…

    Submitted 27 September, 2013; originally announced September 2013.

  47. Series Expansion Approximations of Brownian Motion for Non-Linear Kalman Filtering of Diffusion Processes

    Authors: Simon Lyons, Simo Särkkä, Amos Storkey

    Abstract: In this paper, we describe a novel application of sigma-point methods to continuous-discrete filtering. In principle, the nonlinear continuous-discrete filtering problem can be solved exactly. In practice, the solution contains terms that are computationally intractable. Assumed density filtering methods attempt to match statistics of the filtering distribution to some set of more tractable proba…

    Submitted 18 February, 2014; v1 submitted 21 February, 2013; originally announced February 2013.

  48. arXiv:1301.3895  [pdf]

    cs.LG cs.AI stat.ML

    Dynamic Trees: A Structured Variational Method Giving Efficient Propagation Rules

    Authors: Amos J. Storkey

    Abstract: Dynamic trees are mixtures of tree structured belief networks. They solve some of the problems of fixed tree networks at the cost of making exact inference intractable. For this reason approximate methods such as sampling or mean field approaches have been used. However, mean field approximations assume a factorized distribution over node states. Such a distribution seems unlikely in the posterio…

    Submitted 16 January, 2013; originally announced January 2013.

    Comments: Appears in Proceedings of the Sixteenth Conference on Uncertainty in Artificial Intelligence (UAI2000)

    Report number: UAI-P-2000-PG-566-573

  49. arXiv:1206.6443  [pdf]

    cs.LG cs.GT stat.ML

    Isoelastic Agents and Wealth Updates in Machine Learning Markets

    Authors: Amos Storkey, Jono Millin, Krzysztof Geras

    Abstract: Recently, prediction markets have shown considerable promise for developing flexible mechanisms for machine learning. In this paper, agents with isoelastic utilities are considered. It is shown that the costs associated with homogeneous markets of agents with isoelastic utilities produce equilibrium prices corresponding to alpha-mixtures, with a particular form of mixing component relating to each…

    Submitted 4 September, 2012; v1 submitted 27 June, 2012; originally announced June 2012.

    Comments: Appears in Proceedings of the 29th International Conference on Machine Learning (ICML 2012)

  50. arXiv:1206.6441  [pdf]

    cs.LG cs.IR stat.ML

    A Topic Model for Melodic Sequences

    Authors: Athina Spiliopoulou, Amos Storkey

    Abstract: We examine the problem of learning a probabilistic model for melody directly from musical sequences belonging to the same genre. This is a challenging task as one needs to capture not only the rich temporal structure evident in music, but also the complex statistical dependencies among different music components. To address this problem we introduce the Variable-gram Topic Model, which couples the…

    Submitted 27 June, 2012; originally announced June 2012.

    Comments: Appears in Proceedings of the 29th International Conference on Machine Learning (ICML 2012)