Search | arXiv e-print repository

ensemblQueryR: fast, flexible and high-throughput querying of Ensembl LD API endpoints in R

Authors: Aine Fairbrother-Browne, Sonia García-Ruiz, Regina H Reynolds, Mina Ryten, Alan Hodgkinson

Abstract: We present ensemblQueryR, a package providing an R interface to the Ensembl REST API that facilitates flexible, fast, user-friendly and R workflow integrable querying of Ensembl REST API linkage disequilibrium (LD) endpoints, optimised for high-throughput querying. ensemblQueryR achieves this through functions that are intuitive and amenable to custom code integration, use of familiar R object typ… ▽ More We present ensemblQueryR, a package providing an R interface to the Ensembl REST API that facilitates flexible, fast, user-friendly and R workflow integrable querying of Ensembl REST API linkage disequilibrium (LD) endpoints, optimised for high-throughput querying. ensemblQueryR achieves this through functions that are intuitive and amenable to custom code integration, use of familiar R object types as inputs and outputs, code optimisation and optional parallelisation functionality. For each LD endpoint, ensemblQueryR provides two functions, permitting both single-query and multi-query modes of operation. The multi-query functions are optimised for large query sizes and provide optional parallelisation to leverage available computational resources and minimise processing time. We demonstrate that ensemblQueryR has improved performance in terms of random access memory (RAM) usage and speed, delivering a 10-fold speed increase over analogous software whilst using a third of the RAM. Finally, ensemblQueryR is near-agnostic to operating system and computational architecture through availability of Docker and singularity images, making this tool widely accessible to the scientific community. △ Less

Submitted 13 August, 2023; originally announced August 2023.

arXiv:2008.12371 [pdf, ps, other]

Improving the Segmentation of Scanning Probe Microscope Images using Convolutional Neural Networks

Authors: Steff Farley, Jo E. A. Hodgkinson, Oliver M. Gordon, Joanna Turner, Andrea Soltoggio, Philip J. Moriarty, Eugenie Hunsicker

Abstract: A wide range of techniques can be considered for segmentation of images of nanostructured surfaces. Manually segmenting these images is time-consuming and results in a user-dependent segmentation bias, while there is currently no consensus on the best automated segmentation methods for particular techniques, image classes, and samples. Any image segmentation approach must minimise the noise in the… ▽ More A wide range of techniques can be considered for segmentation of images of nanostructured surfaces. Manually segmenting these images is time-consuming and results in a user-dependent segmentation bias, while there is currently no consensus on the best automated segmentation methods for particular techniques, image classes, and samples. Any image segmentation approach must minimise the noise in the images to ensure accurate and meaningful statistical analysis can be carried out. Here we develop protocols for the segmentation of images of 2D assemblies of gold nanoparticles formed on silicon surfaces via deposition from an organic solvent. The evaporation of the solvent drives far-from-equilibrium self-organisation of the particles, producing a wide variety of nano- and micro-structured patterns. We show that a segmentation strategy using the U-Net convolutional neural network outperforms traditional automated approaches and has particular potential in the processing of images of nanostructured systems. △ Less

Submitted 27 August, 2020; originally announced August 2020.

Comments: 21 pages, 10 figures

arXiv:2003.00393 [pdf, other]

Deep Active Learning for Biased Datasets via Fisher Kernel Self-Supervision

Authors: Denis Gudovskiy, Alec Hodgkinson, Takuya Yamaguchi, Sotaro Tsukizawa

Abstract: Active learning (AL) aims to minimize labeling efforts for data-demanding deep neural networks (DNNs) by selecting the most representative data points for annotation. However, currently used methods are ill-equipped to deal with biased data. The main motivation of this paper is to consider a realistic setting for pool-based semi-supervised AL, where the unlabeled collection of train data is biased… ▽ More Active learning (AL) aims to minimize labeling efforts for data-demanding deep neural networks (DNNs) by selecting the most representative data points for annotation. However, currently used methods are ill-equipped to deal with biased data. The main motivation of this paper is to consider a realistic setting for pool-based semi-supervised AL, where the unlabeled collection of train data is biased. We theoretically derive an optimal acquisition function for AL in this setting. It can be formulated as distribution shift minimization between unlabeled train data and weakly-labeled validation dataset. To implement such acquisition function, we propose a low-complexity method for feature density matching using self-supervised Fisher kernel (FK) as well as several novel pseudo-label estimators. Our FK-based method outperforms state-of-the-art methods on MNIST, SVHN, and ImageNet classification while requiring only 1/10th of processing. The conducted experiments show at least 40% drop in labeling efforts for the biased class-imbalanced data compared to existing methods. △ Less

Submitted 29 February, 2020; originally announced March 2020.

Comments: Accepted to CVPR 2020. Preprint

arXiv:1908.01887 [pdf, other]

DoorGym: A Scalable Door Opening Environment And Baseline Agent

Authors: Yusuke Urakami, Alec Hodgkinson, Casey Carlin, Randall Leu, Luca Rigazio, Pieter Abbeel

Abstract: In order to practically implement the door opening task, a policy ought to be robust to a wide distribution of door types and environment settings. Reinforcement Learning (RL) with Domain Randomization (DR) is a promising technique to enforce policy generalization, however, there are only a few accessible training environments that are inherently designed to train agents in domain randomized envir… ▽ More In order to practically implement the door opening task, a policy ought to be robust to a wide distribution of door types and environment settings. Reinforcement Learning (RL) with Domain Randomization (DR) is a promising technique to enforce policy generalization, however, there are only a few accessible training environments that are inherently designed to train agents in domain randomized environments. We introduce DoorGym, an open-source door opening simulation framework designed to utilize domain randomization to train a stable policy. We intend for our environment to lie at the intersection of domain transfer, practical tasks, and realism. We also provide baseline Proximal Policy Optimization and Soft Actor-Critic implementations, which achieves success rates between 0% up to 95% for opening various types of doors in this environment. Moreover, the real-world transfer experiment shows the trained policy is able to work in the real world. Environment kit available here: https://github.com/PSVL/DoorGym/ △ Less

Submitted 24 May, 2022; v1 submitted 5 August, 2019; originally announced August 2019.

Comments: Accepted to NeurIPS2019 Deep Reinforcement Learning Workshop. Full version

arXiv:1811.08011 [pdf, other]

Explain to Fix: A Framework to Interpret and Correct DNN Object Detector Predictions

Authors: Denis Gudovskiy, Alec Hodgkinson, Takuya Yamaguchi, Yasunori Ishii, Sotaro Tsukizawa

Abstract: Explaining predictions of deep neural networks (DNNs) is an important and nontrivial task. In this paper, we propose a practical approach to interpret decisions made by a DNN object detector that has fidelity comparable to state-of-the-art methods and sufficient computational efficiency to process large datasets. Our method relies on recent theory and approximates Shapley feature importance values… ▽ More Explaining predictions of deep neural networks (DNNs) is an important and nontrivial task. In this paper, we propose a practical approach to interpret decisions made by a DNN object detector that has fidelity comparable to state-of-the-art methods and sufficient computational efficiency to process large datasets. Our method relies on recent theory and approximates Shapley feature importance values. We qualitatively and quantitatively show that the proposed explanation method can be used to find image features which cause failures in DNN object detection. The developed software tool combined into the "Explain to Fix" (E2X) framework has a factor of 10 higher computational efficiency than prior methods and can be used for cluster processing using graphics processing units (GPUs). Lastly, we propose a potential extension of the E2X framework where the discovered missing features can be added into training dataset to overcome failures after model retraining. △ Less

Submitted 19 November, 2018; originally announced November 2018.

Comments: Systems for ML Workshop @ NIPS 2018

arXiv:1808.05285 [pdf, other]

DNN Feature Map Compression using Learned Representation over GF(2)

Authors: Denis A. Gudovskiy, Alec Hodgkinson, Luca Rigazio

Abstract: In this paper, we introduce a method to compress intermediate feature maps of deep neural networks (DNNs) to decrease memory storage and bandwidth requirements during inference. Unlike previous works, the proposed method is based on converting fixed-point activations into vectors over the smallest GF(2) finite field followed by nonlinear dimensionality reduction (NDR) layers embedded into a DNN. S… ▽ More In this paper, we introduce a method to compress intermediate feature maps of deep neural networks (DNNs) to decrease memory storage and bandwidth requirements during inference. Unlike previous works, the proposed method is based on converting fixed-point activations into vectors over the smallest GF(2) finite field followed by nonlinear dimensionality reduction (NDR) layers embedded into a DNN. Such an end-to-end learned representation finds more compact feature maps by exploiting quantization redundancies within the fixed-point activations along the channel or spatial dimensions. We apply the proposed network architectures derived from modified SqueezeNet and MobileNetV2 to the tasks of ImageNet classification and PASCAL VOC object detection. Compared to prior approaches, the conducted experiments show a factor of 2 decrease in memory requirements with minor degradation in accuracy while adding only bitwise computations. △ Less

Submitted 15 August, 2018; originally announced August 2018.

Comments: CEFRL2018

Showing 1–6 of 6 results for author: Hodgkinson, A