Showing 1–5 of 5 results for author: Justus, D

  1. arXiv:2211.12281  [pdf, other]

    cs.LG cs.AI

    BESS: Balanced Entity Sampling and Sharing for Large-Scale Knowledge Graph Completion

    Authors: Alberto Cattaneo, Daniel Justus, Harry Mellor, Douglas Orr, Jerome Maloberti, Zhenying Liu, Thorin Farnsworth, Andrew Fitzgibbon, Blazej Banaszewski, Carlo Luschi

    Abstract: We present the award-winning submission to the WikiKG90Mv2 track of OGB-LSC@NeurIPS 2022. The task is link-prediction on the large-scale knowledge graph WikiKG90Mv2, consisting of 90M+ nodes and 600M+ edges. Our solution uses a diverse ensemble of $85$ Knowledge Graph Embedding models combining five different scoring functions (TransE, TransH, RotatE, DistMult, ComplEx) and two different loss func…

    Submitted 22 November, 2022; originally announced November 2022.

    Comments: First place in the WikiKG90Mv2 track of the Open Graph Benchmark Large-Scale Challenge @NeurIPS2022

  2. arXiv:2206.02915  [pdf, other]

    cs.LG

    8-bit Numerical Formats for Deep Neural Networks

    Authors: Badreddine Noune, Philip Jones, Daniel Justus, Dominic Masters, Carlo Luschi

    Abstract: Given the current trend of increasing size and complexity of machine learning architectures, it has become of critical importance to identify new approaches to improve the computational efficiency of model training. In this context, we address the advantages of floating-point over fixed-point representation, and present an in-depth study on the use of 8-bit floating-point number formats for activa…

    Submitted 6 June, 2022; originally announced June 2022.

  3. arXiv:2108.06277  [pdf, other]

    cs.CL cs.LG

    Towards Structured Dynamic Sparse Pre-Training of BERT

    Authors: Anastasia Dietrich, Frithjof Gressmann, Douglas Orr, Ivan Chelombiev, Daniel Justus, Carlo Luschi

    Abstract: Identifying algorithms for computationally efficient unsupervised training of large language models is an important and active area of research. In this work, we develop and study a straightforward, dynamic always-sparse pre-training approach for the BERT language modeling task, which leverages periodic compression steps based on magnitude pruning followed by random parameter re-allocation. This approac…

    Submitted 13 August, 2021; originally announced August 2021.

  4. arXiv:2106.05822  [pdf, other]

    cs.CL cs.LG

    GroupBERT: Enhanced Transformer Architecture with Efficient Grouped Structures

    Authors: Ivan Chelombiev, Daniel Justus, Douglas Orr, Anastasia Dietrich, Frithjof Gressmann, Alexandros Koliousis, Carlo Luschi

    Abstract: Attention-based language models have become a critical component in state-of-the-art natural language processing systems. However, these models have significant computational requirements, due to long training times, dense operations and large parameter counts. In this work we demonstrate a set of modifications to the structure of a Transformer layer, producing a more efficient architecture. First,…

    Submitted 10 June, 2021; originally announced June 2021.

  5. arXiv:1811.11880  [pdf, other]

    cs.LG cs.AI stat.ML

    Predicting the Computational Cost of Deep Learning Models

    Authors: Daniel Justus, John Brennan, Stephen Bonner, Andrew Stephen McGough

    Abstract: Deep learning is rapidly becoming a go-to tool for many artificial intelligence problems due to its ability to outperform other approaches and even humans at many problems. Despite its popularity, we are still unable to accurately predict the time it will take to train a deep learning network to solve a given problem. This training time can be seen as the product of the training time per epoch and…

    Submitted 28 November, 2018; originally announced November 2018.

    Comments: Accepted for publication at the IEEE International Conference on Big Data, (C) IEEE