-
ensemblQueryR: fast, flexible and high-throughput querying of Ensembl LD API endpoints in R
Authors:
Aine Fairbrother-Browne,
Sonia GarcĂa-Ruiz,
Regina H Reynolds,
Mina Ryten,
Alan Hodgkinson
Abstract:
We present ensemblQueryR, a package providing an R interface to the Ensembl REST API that facilitates flexible, fast, user-friendly and R workflow integrable querying of Ensembl REST API linkage disequilibrium (LD) endpoints, optimised for high-throughput querying. ensemblQueryR achieves this through functions that are intuitive and amenable to custom code integration, use of familiar R object typ…
▽ More
We present ensemblQueryR, a package providing an R interface to the Ensembl REST API that facilitates flexible, fast, user-friendly and R workflow integrable querying of Ensembl REST API linkage disequilibrium (LD) endpoints, optimised for high-throughput querying. ensemblQueryR achieves this through functions that are intuitive and amenable to custom code integration, use of familiar R object types as inputs and outputs, code optimisation and optional parallelisation functionality. For each LD endpoint, ensemblQueryR provides two functions, permitting both single-query and multi-query modes of operation. The multi-query functions are optimised for large query sizes and provide optional parallelisation to leverage available computational resources and minimise processing time. We demonstrate that ensemblQueryR has improved performance in terms of random access memory (RAM) usage and speed, delivering a 10-fold speed increase over analogous software whilst using a third of the RAM. Finally, ensemblQueryR is near-agnostic to operating system and computational architecture through availability of Docker and singularity images, making this tool widely accessible to the scientific community.
△ Less
Submitted 13 August, 2023;
originally announced August 2023.
-
Improving the Segmentation of Scanning Probe Microscope Images using Convolutional Neural Networks
Authors:
Steff Farley,
Jo E. A. Hodgkinson,
Oliver M. Gordon,
Joanna Turner,
Andrea Soltoggio,
Philip J. Moriarty,
Eugenie Hunsicker
Abstract:
A wide range of techniques can be considered for segmentation of images of nanostructured surfaces. Manually segmenting these images is time-consuming and results in a user-dependent segmentation bias, while there is currently no consensus on the best automated segmentation methods for particular techniques, image classes, and samples. Any image segmentation approach must minimise the noise in the…
▽ More
A wide range of techniques can be considered for segmentation of images of nanostructured surfaces. Manually segmenting these images is time-consuming and results in a user-dependent segmentation bias, while there is currently no consensus on the best automated segmentation methods for particular techniques, image classes, and samples. Any image segmentation approach must minimise the noise in the images to ensure accurate and meaningful statistical analysis can be carried out. Here we develop protocols for the segmentation of images of 2D assemblies of gold nanoparticles formed on silicon surfaces via deposition from an organic solvent. The evaporation of the solvent drives far-from-equilibrium self-organisation of the particles, producing a wide variety of nano- and micro-structured patterns. We show that a segmentation strategy using the U-Net convolutional neural network outperforms traditional automated approaches and has particular potential in the processing of images of nanostructured systems.
△ Less
Submitted 27 August, 2020;
originally announced August 2020.
-
Deep Active Learning for Biased Datasets via Fisher Kernel Self-Supervision
Authors:
Denis Gudovskiy,
Alec Hodgkinson,
Takuya Yamaguchi,
Sotaro Tsukizawa
Abstract:
Active learning (AL) aims to minimize labeling efforts for data-demanding deep neural networks (DNNs) by selecting the most representative data points for annotation. However, currently used methods are ill-equipped to deal with biased data. The main motivation of this paper is to consider a realistic setting for pool-based semi-supervised AL, where the unlabeled collection of train data is biased…
▽ More
Active learning (AL) aims to minimize labeling efforts for data-demanding deep neural networks (DNNs) by selecting the most representative data points for annotation. However, currently used methods are ill-equipped to deal with biased data. The main motivation of this paper is to consider a realistic setting for pool-based semi-supervised AL, where the unlabeled collection of train data is biased. We theoretically derive an optimal acquisition function for AL in this setting. It can be formulated as distribution shift minimization between unlabeled train data and weakly-labeled validation dataset. To implement such acquisition function, we propose a low-complexity method for feature density matching using self-supervised Fisher kernel (FK) as well as several novel pseudo-label estimators. Our FK-based method outperforms state-of-the-art methods on MNIST, SVHN, and ImageNet classification while requiring only 1/10th of processing. The conducted experiments show at least 40% drop in labeling efforts for the biased class-imbalanced data compared to existing methods.
△ Less
Submitted 29 February, 2020;
originally announced March 2020.
-
DoorGym: A Scalable Door Opening Environment And Baseline Agent
Authors:
Yusuke Urakami,
Alec Hodgkinson,
Casey Carlin,
Randall Leu,
Luca Rigazio,
Pieter Abbeel
Abstract:
In order to practically implement the door opening task, a policy ought to be robust to a wide distribution of door types and environment settings. Reinforcement Learning (RL) with Domain Randomization (DR) is a promising technique to enforce policy generalization, however, there are only a few accessible training environments that are inherently designed to train agents in domain randomized envir…
▽ More
In order to practically implement the door opening task, a policy ought to be robust to a wide distribution of door types and environment settings. Reinforcement Learning (RL) with Domain Randomization (DR) is a promising technique to enforce policy generalization, however, there are only a few accessible training environments that are inherently designed to train agents in domain randomized environments. We introduce DoorGym, an open-source door opening simulation framework designed to utilize domain randomization to train a stable policy. We intend for our environment to lie at the intersection of domain transfer, practical tasks, and realism. We also provide baseline Proximal Policy Optimization and Soft Actor-Critic implementations, which achieves success rates between 0% up to 95% for opening various types of doors in this environment. Moreover, the real-world transfer experiment shows the trained policy is able to work in the real world. Environment kit available here: https://github.com/PSVL/DoorGym/
△ Less
Submitted 24 May, 2022; v1 submitted 5 August, 2019;
originally announced August 2019.
-
Explain to Fix: A Framework to Interpret and Correct DNN Object Detector Predictions
Authors:
Denis Gudovskiy,
Alec Hodgkinson,
Takuya Yamaguchi,
Yasunori Ishii,
Sotaro Tsukizawa
Abstract:
Explaining predictions of deep neural networks (DNNs) is an important and nontrivial task. In this paper, we propose a practical approach to interpret decisions made by a DNN object detector that has fidelity comparable to state-of-the-art methods and sufficient computational efficiency to process large datasets. Our method relies on recent theory and approximates Shapley feature importance values…
▽ More
Explaining predictions of deep neural networks (DNNs) is an important and nontrivial task. In this paper, we propose a practical approach to interpret decisions made by a DNN object detector that has fidelity comparable to state-of-the-art methods and sufficient computational efficiency to process large datasets. Our method relies on recent theory and approximates Shapley feature importance values. We qualitatively and quantitatively show that the proposed explanation method can be used to find image features which cause failures in DNN object detection. The developed software tool combined into the "Explain to Fix" (E2X) framework has a factor of 10 higher computational efficiency than prior methods and can be used for cluster processing using graphics processing units (GPUs). Lastly, we propose a potential extension of the E2X framework where the discovered missing features can be added into training dataset to overcome failures after model retraining.
△ Less
Submitted 19 November, 2018;
originally announced November 2018.
-
DNN Feature Map Compression using Learned Representation over GF(2)
Authors:
Denis A. Gudovskiy,
Alec Hodgkinson,
Luca Rigazio
Abstract:
In this paper, we introduce a method to compress intermediate feature maps of deep neural networks (DNNs) to decrease memory storage and bandwidth requirements during inference. Unlike previous works, the proposed method is based on converting fixed-point activations into vectors over the smallest GF(2) finite field followed by nonlinear dimensionality reduction (NDR) layers embedded into a DNN. S…
▽ More
In this paper, we introduce a method to compress intermediate feature maps of deep neural networks (DNNs) to decrease memory storage and bandwidth requirements during inference. Unlike previous works, the proposed method is based on converting fixed-point activations into vectors over the smallest GF(2) finite field followed by nonlinear dimensionality reduction (NDR) layers embedded into a DNN. Such an end-to-end learned representation finds more compact feature maps by exploiting quantization redundancies within the fixed-point activations along the channel or spatial dimensions. We apply the proposed network architectures derived from modified SqueezeNet and MobileNetV2 to the tasks of ImageNet classification and PASCAL VOC object detection. Compared to prior approaches, the conducted experiments show a factor of 2 decrease in memory requirements with minor degradation in accuracy while adding only bitwise computations.
△ Less
Submitted 15 August, 2018;
originally announced August 2018.