Search | arXiv e-print repository

Implementing scalable matrix-vector products for the exact diagonalization methods in quantum many-body physics

Authors: Tom Westerhout, Bradford L. Chamberlain

Abstract: Exact diagonalization is a well-established method for simulating small quantum systems. Its applicability is limited by the exponential growth of the so-called Hamiltonian matrix that needs to be diagonalized. Physical symmetries are usually utilized to reduce the matrix dimension, and distributed-memory parallelism is employed to explore larger systems. This paper focuses on the implementation t… ▽ More Exact diagonalization is a well-established method for simulating small quantum systems. Its applicability is limited by the exponential growth of the so-called Hamiltonian matrix that needs to be diagonalized. Physical symmetries are usually utilized to reduce the matrix dimension, and distributed-memory parallelism is employed to explore larger systems. This paper focuses on the implementation the core distributed algorithms, with a special emphasis on the matrix-vector product operation. Instead of the conventional MPI+X paradigm, Chapel is chosen as the language for these distributed algorithms. We provide a comprehensive description of the algorithms and present performance and scalability tests. Our implementation outperforms the state-of-the-art MPI-based solution by a factor of 7--8 on 32 compute nodes or 4096 cores and exhibits very good scaling on up to 256 nodes or 32768 cores. The implementation has 3 times fewer software lines of code than the current state of the art while remaining fully generic. △ Less

Submitted 31 August, 2023; originally announced August 2023.

Comments: 11 pages, 9 figures

arXiv:2211.04279 [pdf, other]

Detecting Shortcuts in Medical Images -- A Case Study in Chest X-rays

Authors: Amelia Jiménez-Sánchez, Dovile Juodelyte, Bethany Chamberlain, Veronika Cheplygina

Abstract: The availability of large public datasets and the increased amount of computing power have shifted the interest of the medical community to high-performance algorithms. However, little attention is paid to the quality of the data and their annotations. High performance on benchmark datasets may be reported without considering possible shortcuts or artifacts in the data, besides, models are not tes… ▽ More The availability of large public datasets and the increased amount of computing power have shifted the interest of the medical community to high-performance algorithms. However, little attention is paid to the quality of the data and their annotations. High performance on benchmark datasets may be reported without considering possible shortcuts or artifacts in the data, besides, models are not tested on subpopulation groups. With this work, we aim to raise awareness about shortcuts problems. We validate previous findings, and present a case study on chest X-rays using two publicly available datasets. We share annotations for a subset of pneumothorax images with drains. We conclude with general recommendations for medical image classification. △ Less

Submitted 9 November, 2022; v1 submitted 8 November, 2022; originally announced November 2022.

Comments: Submitted to ISBI 2023

arXiv:2210.01542 [pdf, other]

Hyperbolic Deep Reinforcement Learning

Authors: Edoardo Cetin, Benjamin Chamberlain, Michael Bronstein, Jonathan J Hunt

Abstract: We propose a new class of deep reinforcement learning (RL) algorithms that model latent representations in hyperbolic space. Sequential decision-making requires reasoning about the possible future consequences of current behavior. Consequently, capturing the relationship between key evolving features for a given task is conducive to recovering effective policies. To this end, hyperbolic geometry p… ▽ More We propose a new class of deep reinforcement learning (RL) algorithms that model latent representations in hyperbolic space. Sequential decision-making requires reasoning about the possible future consequences of current behavior. Consequently, capturing the relationship between key evolving features for a given task is conducive to recovering effective policies. To this end, hyperbolic geometry provides deep RL models with a natural basis to precisely encode this inherently hierarchical information. However, applying existing methodologies from the hyperbolic deep learning literature leads to fatal optimization instabilities due to the non-stationarity and variance characterizing RL gradient estimators. Hence, we design a new general method that counteracts such optimization challenges and enables stable end-to-end learning with deep hyperbolic representations. We empirically validate our framework by applying it to popular on-policy and off-policy RL algorithms on the Procgen and Atari 100K benchmarks, attaining near universal performance and generalization benefits. Given its natural fit, we hope future RL research will consider hyperbolic representations as a standard tool. △ Less

Submitted 4 October, 2022; originally announced October 2022.

Comments: Preprint

arXiv:2210.00513 [pdf, other]

Gradient Gating for Deep Multi-Rate Learning on Graphs

Authors: T. Konstantin Rusch, Benjamin P. Chamberlain, Michael W. Mahoney, Michael M. Bronstein, Siddhartha Mishra

Abstract: We present Gradient Gating (G$^2$), a novel framework for improving the performance of Graph Neural Networks (GNNs). Our framework is based on gating the output of GNN layers with a mechanism for multi-rate flow of message passing information across nodes of the underlying graph. Local gradients are harnessed to further modulate message passing updates. Our framework flexibly allows one to use any… ▽ More We present Gradient Gating (G$^2$), a novel framework for improving the performance of Graph Neural Networks (GNNs). Our framework is based on gating the output of GNN layers with a mechanism for multi-rate flow of message passing information across nodes of the underlying graph. Local gradients are harnessed to further modulate message passing updates. Our framework flexibly allows one to use any basic GNN layer as a wrapper around which the multi-rate gradient gating mechanism is built. We rigorously prove that G$^2$ alleviates the oversmoothing problem and allows the design of deep GNNs. Empirical results are presented to demonstrate that the proposed framework achieves state-of-the-art performance on a variety of graph learning tasks, including on large-scale heterophilic graphs. △ Less

Submitted 15 March, 2023; v1 submitted 2 October, 2022; originally announced October 2022.

arXiv:2209.15486 [pdf, other]

Graph Neural Networks for Link Prediction with Subgraph Sketching

Authors: Benjamin Paul Chamberlain, Sergey Shirobokov, Emanuele Rossi, Fabrizio Frasca, Thomas Markovich, Nils Hammerla, Michael M. Bronstein, Max Hansmire

Abstract: Many Graph Neural Networks (GNNs) perform poorly compared to simple heuristics on Link Prediction (LP) tasks. This is due to limitations in expressive power such as the inability to count triangles (the backbone of most LP heuristics) and because they can not distinguish automorphic nodes (those having identical structural roles). Both expressiveness issues can be alleviated by learning link (rath… ▽ More Many Graph Neural Networks (GNNs) perform poorly compared to simple heuristics on Link Prediction (LP) tasks. This is due to limitations in expressive power such as the inability to count triangles (the backbone of most LP heuristics) and because they can not distinguish automorphic nodes (those having identical structural roles). Both expressiveness issues can be alleviated by learning link (rather than node) representations and incorporating structural features such as triangle counts. Since explicit link representations are often prohibitively expensive, recent works resorted to subgraph-based methods, which have achieved state-of-the-art performance for LP, but suffer from poor efficiency due to high levels of redundancy between subgraphs. We analyze the components of subgraph GNN (SGNN) methods for link prediction. Based on our analysis, we propose a novel full-graph GNN called ELPH (Efficient Link Prediction with Hashing) that passes subgraph sketches as messages to approximate the key components of SGNNs without explicit subgraph construction. ELPH is provably more expressive than Message Passing GNNs (MPNNs). It outperforms existing SGNN models on many standard LP benchmarks while being orders of magnitude faster. However, it shares the common GNN limitation that it is only efficient when the dataset fits in GPU memory. Accordingly, we develop a highly scalable model, called BUDDY, which uses feature precomputation to circumvent this limitation without sacrificing predictive performance. Our experiments show that BUDDY also outperforms SGNNs on standard LP benchmarks while being highly scalable and faster than ELPH. △ Less

Submitted 2 May, 2023; v1 submitted 30 September, 2022; originally announced September 2022.

Comments: 29 pages, 19 figures, 6 appendices

Journal ref: The Eleventh International Conference on Learning Representations 2023 (oral - top 5%)

arXiv:2206.10991 [pdf, other]

Understanding convolution on graphs via energies

Authors: Francesco Di Giovanni, James Rowbottom, Benjamin P. Chamberlain, Thomas Markovich, Michael M. Bronstein

Abstract: Graph Neural Networks (GNNs) typically operate by message-passing, where the state of a node is updated based on the information received from its neighbours. Most message-passing models act as graph convolutions, where features are mixed by a shared, linear transformation before being propagated over the edges. On node-classification tasks, graph convolutions have been shown to suffer from two li… ▽ More Graph Neural Networks (GNNs) typically operate by message-passing, where the state of a node is updated based on the information received from its neighbours. Most message-passing models act as graph convolutions, where features are mixed by a shared, linear transformation before being propagated over the edges. On node-classification tasks, graph convolutions have been shown to suffer from two limitations: poor performance on heterophilic graphs, and over-smoothing. It is common belief that both phenomena occur because such models behave as low-pass filters, meaning that the Dirichlet energy of the features decreases along the layers incurring a smoothing effect that ultimately makes features no longer distinguishable. In this work, we rigorously prove that simple graph-convolutional models can actually enhance high frequencies and even lead to an asymptotic behaviour we refer to as over-sharpening, opposite to over-smoothing. We do so by showing that linear graph convolutions with symmetric weights minimize a multi-particle energy that generalizes the Dirichlet energy; in this setting, the weight matrices induce edge-wise attraction (repulsion) through their positive (negative) eigenvalues, thereby controlling whether the features are being smoothed or sharpened. We also extend the analysis to non-linear GNNs, and demonstrate that some existing time-continuous GNNs are instead always dominated by the low frequencies. Finally, we validate our theoretical findings through ablations and real-world experiments. △ Less

Submitted 6 September, 2023; v1 submitted 22 June, 2022; originally announced June 2022.

Comments: Accepted at TMLR; First two authors equal contribution; 35 pages

arXiv:2202.04579 [pdf, other]

Neural Sheaf Diffusion: A Topological Perspective on Heterophily and Oversmoothing in GNNs

Authors: Cristian Bodnar, Francesco Di Giovanni, Benjamin Paul Chamberlain, Pietro Liò, Michael M. Bronstein

Abstract: Cellular sheaves equip graphs with a "geometrical" structure by assigning vector spaces and linear maps to nodes and edges. Graph Neural Networks (GNNs) implicitly assume a graph with a trivial underlying sheaf. This choice is reflected in the structure of the graph Laplacian operator, the properties of the associated diffusion equation, and the characteristics of the convolutional models that dis… ▽ More Cellular sheaves equip graphs with a "geometrical" structure by assigning vector spaces and linear maps to nodes and edges. Graph Neural Networks (GNNs) implicitly assume a graph with a trivial underlying sheaf. This choice is reflected in the structure of the graph Laplacian operator, the properties of the associated diffusion equation, and the characteristics of the convolutional models that discretise this equation. In this paper, we use cellular sheaf theory to show that the underlying geometry of the graph is deeply linked with the performance of GNNs in heterophilic settings and their oversmoothing behaviour. By considering a hierarchy of increasingly general sheaves, we study how the ability of the sheaf diffusion process to achieve linear separation of the classes in the infinite time limit expands. At the same time, we prove that when the sheaf is non-trivial, discretised parametric diffusion processes have greater control than GNNs over their asymptotic behaviour. On the practical side, we study how sheaves can be learned from data. The resulting sheaf diffusion models have many desirable properties that address the limitations of classical graph diffusion equations (and corresponding GNN models) and obtain competitive results in heterophilic settings. Overall, our work provides new connections between GNNs and algebraic topology and would be of interest to both fields. △ Less

Submitted 6 January, 2023; v1 submitted 9 February, 2022; originally announced February 2022.

Comments: Accepted to NeurIPS 2022. Contains 29 pages, 10 figures

arXiv:2202.02296 [pdf, other]

Graph-Coupled Oscillator Networks

Authors: T. Konstantin Rusch, Benjamin P. Chamberlain, James Rowbottom, Siddhartha Mishra, Michael M. Bronstein

Abstract: We propose Graph-Coupled Oscillator Networks (GraphCON), a novel framework for deep learning on graphs. It is based on discretizations of a second-order system of ordinary differential equations (ODEs), which model a network of nonlinear controlled and damped oscillators, coupled via the adjacency structure of the underlying graph. The flexibility of our framework permits any basic GNN layer (e.g.… ▽ More We propose Graph-Coupled Oscillator Networks (GraphCON), a novel framework for deep learning on graphs. It is based on discretizations of a second-order system of ordinary differential equations (ODEs), which model a network of nonlinear controlled and damped oscillators, coupled via the adjacency structure of the underlying graph. The flexibility of our framework permits any basic GNN layer (e.g. convolutional or attentional) as the coupling function, from which a multi-layer deep neural network is built up via the dynamics of the proposed ODEs. We relate the oversmoothing problem, commonly encountered in GNNs, to the stability of steady states of the underlying ODE and show that zero-Dirichlet energy steady states are not stable for our proposed ODEs. This demonstrates that the proposed framework mitigates the oversmoothing problem. Moreover, we prove that GraphCON mitigates the exploding and vanishing gradients problem to facilitate training of deep multi-layer GNNs. Finally, we show that our approach offers competitive performance with respect to the state-of-the-art on a variety of graph-based learning tasks. △ Less

Submitted 23 June, 2022; v1 submitted 4 February, 2022; originally announced February 2022.

Comments: ICML 2022

arXiv:2111.14522 [pdf, other]

Understanding over-squashing and bottlenecks on graphs via curvature

Authors: Jake Top**, Francesco Di Giovanni, Benjamin Paul Chamberlain, Xiaowen Dong, Michael M. Bronstein

Abstract: Most graph neural networks (GNNs) use the message passing paradigm, in which node features are propagated on the input graph. Recent works pointed to the distortion of information flowing from distant nodes as a factor limiting the efficiency of message passing for tasks relying on long-distance interactions. This phenomenon, referred to as 'over-squashing', has been heuristically attributed to gr… ▽ More Most graph neural networks (GNNs) use the message passing paradigm, in which node features are propagated on the input graph. Recent works pointed to the distortion of information flowing from distant nodes as a factor limiting the efficiency of message passing for tasks relying on long-distance interactions. This phenomenon, referred to as 'over-squashing', has been heuristically attributed to graph bottlenecks where the number of $k$-hop neighbors grows rapidly with $k$. We provide a precise description of the over-squashing phenomenon in GNNs and analyze how it arises from bottlenecks in the graph. For this purpose, we introduce a new edge-based combinatorial curvature and prove that negatively curved edges are responsible for the over-squashing issue. We also propose and experimentally test a curvature-based graph rewiring method to alleviate the over-squashing. △ Less

Submitted 12 November, 2022; v1 submitted 29 November, 2021; originally announced November 2021.

Comments: First two authors equal contribution; outstanding paper honorable mention at ICLR22; 30 pages. v2: corrected Corollary 3 and rephrased in terms of graph diameter. Any other result is unaffected

arXiv:2111.12128 [pdf, other]

On the Unreasonable Effectiveness of Feature propagation in Learning on Graphs with Missing Node Features

Authors: Emanuele Rossi, Henry Kenlay, Maria I. Gorinova, Benjamin Paul Chamberlain, Xiaowen Dong, Michael Bronstein

Abstract: While Graph Neural Networks (GNNs) have recently become the de facto standard for modeling relational data, they impose a strong assumption on the availability of the node or edge features of the graph. In many real-world applications, however, features are only partially available; for example, in social networks, age and gender are available only for a small subset of users. We present a general… ▽ More While Graph Neural Networks (GNNs) have recently become the de facto standard for modeling relational data, they impose a strong assumption on the availability of the node or edge features of the graph. In many real-world applications, however, features are only partially available; for example, in social networks, age and gender are available only for a small subset of users. We present a general approach for handling missing features in graph machine learning applications that is based on minimization of the Dirichlet energy and leads to a diffusion-type differential equation on the graph. The discretization of this equation produces a simple, fast and scalable algorithm which we call Feature Propagation. We experimentally show that the proposed approach outperforms previous methods on seven common node-classification benchmarks and can withstand surprisingly high rates of missing features: on average we observe only around 4% relative accuracy drop when 99% of the features are missing. Moreover, it takes only 10 seconds to run on a graph with $\sim$2.5M nodes and $\sim$123M edges on a single GPU. △ Less

Submitted 23 May, 2022; v1 submitted 23 November, 2021; originally announced November 2021.

arXiv:2110.09443 [pdf, other]

Beltrami Flow and Neural Diffusion on Graphs

Authors: Benjamin Paul Chamberlain, James Rowbottom, Davide Eynard, Francesco Di Giovanni, Xiaowen Dong, Michael M Bronstein

Abstract: We propose a novel class of graph neural networks based on the discretised Beltrami flow, a non-Euclidean diffusion PDE. In our model, node features are supplemented with positional encodings derived from the graph topology and jointly evolved by the Beltrami flow, producing simultaneously continuous feature learning and topology evolution. The resulting model generalises many popular graph neural… ▽ More We propose a novel class of graph neural networks based on the discretised Beltrami flow, a non-Euclidean diffusion PDE. In our model, node features are supplemented with positional encodings derived from the graph topology and jointly evolved by the Beltrami flow, producing simultaneously continuous feature learning and topology evolution. The resulting model generalises many popular graph neural networks and achieves state-of-the-art results on several benchmarks. △ Less

Submitted 18 October, 2021; originally announced October 2021.

Comments: 21 pages, 5 figures. Proceedings of the Thirty-fifth Conference on Neural Information Processing Systems (NeurIPS) 2021

arXiv:2109.08245 [pdf, other]

The 2021 RecSys Challenge Dataset: Fairness is not optional

Authors: Luca Belli, Alykhan Tejani, Frank Portman, Alexandre Lung-Yut-Fong, Ben Chamberlain, Yuanpu Xie, Kristian Lum, Jonathan Hunt, Michael Bronstein, Vito Walter Anelli, Saikishore Kalloori, Bruce Ferwerda, Wenzhe Shi

Abstract: After the success the RecSys 2020 Challenge, we are describing a novel and bigger dataset that was released in conjunction with the ACM RecSys Challenge 2021. This year's dataset is not only bigger (~ 1B data points, a 5 fold increase), but for the first time it take into consideration fairness aspects of the challenge. Unlike many static datsets, a lot of effort went into making sure that the dat… ▽ More After the success the RecSys 2020 Challenge, we are describing a novel and bigger dataset that was released in conjunction with the ACM RecSys Challenge 2021. This year's dataset is not only bigger (~ 1B data points, a 5 fold increase), but for the first time it take into consideration fairness aspects of the challenge. Unlike many static datsets, a lot of effort went into making sure that the dataset was synced with the Twitter platform: if a user deleted their content, the same content would be promptly removed from the dataset too. In this paper, we introduce the dataset and challenge, highlighting some of the issues that arise when creating recommender systems at Twitter scale. △ Less

Submitted 21 September, 2021; v1 submitted 16 September, 2021; originally announced September 2021.

arXiv:2106.10934 [pdf, other]

GRAND: Graph Neural Diffusion

Authors: Benjamin Paul Chamberlain, James Rowbottom, Maria Gorinova, Stefan Webb, Emanuele Rossi, Michael M. Bronstein

Abstract: We present Graph Neural Diffusion (GRAND) that approaches deep learning on graphs as a continuous diffusion process and treats Graph Neural Networks (GNNs) as discretisations of an underlying PDE. In our model, the layer structure and topology correspond to the discretisation choices of temporal and spatial operators. Our approach allows a principled development of a broad new class of GNNs that a… ▽ More We present Graph Neural Diffusion (GRAND) that approaches deep learning on graphs as a continuous diffusion process and treats Graph Neural Networks (GNNs) as discretisations of an underlying PDE. In our model, the layer structure and topology correspond to the discretisation choices of temporal and spatial operators. Our approach allows a principled development of a broad new class of GNNs that are able to address the common plights of graph learning models such as depth, oversmoothing, and bottlenecks. Key to the success of our models are stability with respect to perturbations in the data and this is addressed for both implicit and explicit discretisation schemes. We develop linear and nonlinear versions of GRAND, which achieve competitive results on many standard graph benchmarks. △ Less

Submitted 22 September, 2021; v1 submitted 21 June, 2021; originally announced June 2021.

arXiv:2009.12192 [pdf, other]

doi 10.1145/3383313.3418486

Tuning Word2vec for Large Scale Recommendation Systems

Authors: Benjamin P. Chamberlain, Emanuele Rossi, Dan Shiebler, Suvash Sedhain, Michael M. Bronstein

Abstract: Word2vec is a powerful machine learning tool that emerged from Natural Lan-guage Processing (NLP) and is now applied in multiple domains, including recom-mender systems, forecasting, and network analysis. As Word2vec is often used offthe shelf, we address the question of whether the default hyperparameters are suit-able for recommender systems. The answer is emphatically no. In this paper, wefirst… ▽ More Word2vec is a powerful machine learning tool that emerged from Natural Lan-guage Processing (NLP) and is now applied in multiple domains, including recom-mender systems, forecasting, and network analysis. As Word2vec is often used offthe shelf, we address the question of whether the default hyperparameters are suit-able for recommender systems. The answer is emphatically no. In this paper, wefirst elucidate the importance of hyperparameter optimization and show that un-constrained optimization yields an average 221% improvement in hit rate over thedefault parameters. However, unconstrained optimization leads to hyperparametersettings that are very expensive and not feasible for large scale recommendationtasks. To this end, we demonstrate 138% average improvement in hit rate with aruntime budget-constrained hyperparameter optimization. Furthermore, to makehyperparameter optimization applicable for large scale recommendation problemswhere the target dataset is too large to search over, we investigate generalizinghyperparameters settings from samples. We show that applying constrained hy-perparameter optimization using only a 10% sample of the data still yields a 91%average improvement in hit rate over the default parameters when applied to thefull datasets. Finally, we apply hyperparameters learned using our method of con-strained optimization on a sample to the Who To Follow recommendation serviceat Twitter and are able to increase follow rates by 15%. △ Less

Submitted 24 September, 2020; originally announced September 2020.

Comments: 11 pages, 4 figures, Fourteenth ACM Conference on Recommender Systems

Journal ref: Fourteenth ACM Conference on Recommender Systems (RecSys '20), September 22--26, 2020, Virtual Event, Brazil

arXiv:2006.10637 [pdf, other]

Temporal Graph Networks for Deep Learning on Dynamic Graphs

Authors: Emanuele Rossi, Ben Chamberlain, Fabrizio Frasca, Davide Eynard, Federico Monti, Michael Bronstein

Abstract: Graph Neural Networks (GNNs) have recently become increasingly popular due to their ability to learn complex systems of relations or interactions arising in a broad spectrum of problems ranging from biology and particle physics to social networks and recommendation systems. Despite the plethora of different models for deep learning on graphs, few approaches have been proposed thus far for dealing… ▽ More Graph Neural Networks (GNNs) have recently become increasingly popular due to their ability to learn complex systems of relations or interactions arising in a broad spectrum of problems ranging from biology and particle physics to social networks and recommendation systems. Despite the plethora of different models for deep learning on graphs, few approaches have been proposed thus far for dealing with graphs that present some sort of dynamic nature (e.g. evolving features or connectivity over time). In this paper, we present Temporal Graph Networks (TGNs), a generic, efficient framework for deep learning on dynamic graphs represented as sequences of timed events. Thanks to a novel combination of memory modules and graph-based operators, TGNs are able to significantly outperform previous approaches being at the same time more computationally efficient. We furthermore show that several previous models for learning on dynamic graphs can be cast as specific instances of our framework. We perform a detailed ablation study of different components of our framework and devise the best configuration that achieves state-of-the-art performance on several transductive and inductive prediction tasks for dynamic graphs. △ Less

Submitted 9 October, 2020; v1 submitted 18 June, 2020; originally announced June 2020.

arXiv:2004.11198 [pdf, other]

SIGN: Scalable Inception Graph Neural Networks

Authors: Fabrizio Frasca, Emanuele Rossi, Davide Eynard, Ben Chamberlain, Michael Bronstein, Federico Monti

Abstract: Graph representation learning has recently been applied to a broad spectrum of problems ranging from computer graphics and chemistry to high energy physics and social media. The popularity of graph neural networks has sparked interest, both in academia and in industry, in develo** methods that scale to very large graphs such as Facebook or Twitter social networks. In most of these approaches, th… ▽ More Graph representation learning has recently been applied to a broad spectrum of problems ranging from computer graphics and chemistry to high energy physics and social media. The popularity of graph neural networks has sparked interest, both in academia and in industry, in develo** methods that scale to very large graphs such as Facebook or Twitter social networks. In most of these approaches, the computational cost is alleviated by a sampling strategy retaining a subset of node neighbors or subgraphs at training time. In this paper we propose a new, efficient and scalable graph deep learning architecture which sidesteps the need for graph sampling by using graph convolutional filters of different size that are amenable to efficient precomputation, allowing extremely fast training and inference. Our architecture allows using different local graph operators (e.g. motif-induced adjacency matrices or Personalized Page Rank diffusion matrix) to best suit the task at hand. We conduct extensive experimental evaluation on various open benchmarks and show that our approach is competitive with other state-of-the-art architectures, while requiring a fraction of the training and inference time. Moreover, we obtain state-of-the-art results on ogbn-papers100M, the largest public graph dataset, with over 110 million nodes and 1.5 billion edges. △ Less

Submitted 3 November, 2020; v1 submitted 23 April, 2020; originally announced April 2020.

Comments: Extended experiments to ogbn-papers100M

arXiv:1909.03457 [pdf, other]

What is the value of experimentation & measurement?

Authors: C. H. Bryan Liu, Benjamin Paul Chamberlain

Abstract: Experimentation and Measurement (E&M) capabilities allow organizations to accurately assess the impact of new propositions and to experiment with many variants of existing products. However, until now, the question of measuring the measurer, or valuing the contribution of an E&M capability to organizational success has not been addressed. We tackle this problem by analyzing how, by decreasing esti… ▽ More Experimentation and Measurement (E&M) capabilities allow organizations to accurately assess the impact of new propositions and to experiment with many variants of existing products. However, until now, the question of measuring the measurer, or valuing the contribution of an E&M capability to organizational success has not been addressed. We tackle this problem by analyzing how, by decreasing estimation uncertainty, E&M platforms allow for better prioritization. We quantify this benefit in terms of expected relative improvement in the performance of all new propositions and provide guidance for how much an E&M capability is worth and when organizations should invest in one. △ Less

Submitted 8 September, 2019; originally announced September 2019.

Comments: Accepted into IEEE International Conference on Data Mining (ICDM) 2019. Main paper: 6 pages, 3 figures; Supplementary document: 7 pages, 2 figures. Code available on: https://github.com/liuchbryan/value_of_experimentation

arXiv:1904.00741 [pdf, other]

Fashion Outfit Generation for E-commerce

Authors: Elaine M. Bettaney, Stephen R. Hardwick, Odysseas Zisimopoulos, Benjamin Paul Chamberlain

Abstract: Combining items of clothing into an outfit is a major task in fashion retail. Recommending sets of items that are compatible with a particular seed item is useful for providing users with guidance and inspiration, but is currently a manual process that requires expert stylists and is therefore not scalable or easy to personalise. We use a multilayer neural network fed by visual and textual feature… ▽ More Combining items of clothing into an outfit is a major task in fashion retail. Recommending sets of items that are compatible with a particular seed item is useful for providing users with guidance and inspiration, but is currently a manual process that requires expert stylists and is therefore not scalable or easy to personalise. We use a multilayer neural network fed by visual and textual features to learn embeddings of items in a latent style space such that compatible items of different types are embedded close to one another. We train our model using the ASOS outfits dataset, which consists of a large number of outfits created by professional stylists and which we release to the research community. Our model shows strong performance in an offline outfit compatibility prediction task. We use our model to generate outfits and for the first time in this field perform an AB test, comparing our generated outfits to those produced by a baseline model which matches appropriate product types but uses no information on style. Users approved of outfits generated by our model 21% and 34% more frequently than those generated by the baseline model for womenswear and menswear respectively. △ Less

Submitted 18 March, 2019; originally announced April 2019.

Comments: 9 pages, 9 figures, 4 tables

arXiv:1902.08648 [pdf, other]

Scalable Hyperbolic Recommender Systems

Authors: Benjamin Paul Chamberlain, Stephen R. Hardwick, David R. Wardrope, Fabon Dzogang, Fabio Daolio, Saúl Vargas

Abstract: We present a large scale hyperbolic recommender system. We discuss why hyperbolic geometry is a more suitable underlying geometry for many recommendation systems and cover the fundamental milestones and insights that we have gained from its development. In doing so, we demonstrate the viability of hyperbolic geometry for recommender systems, showing that they significantly outperform Euclidean mod… ▽ More We present a large scale hyperbolic recommender system. We discuss why hyperbolic geometry is a more suitable underlying geometry for many recommendation systems and cover the fundamental milestones and insights that we have gained from its development. In doing so, we demonstrate the viability of hyperbolic geometry for recommender systems, showing that they significantly outperform Euclidean models on datasets with the properties of complex networks. Key to the success of our approach are the novel choice of underlying hyperbolic model and the use of the Einstein midpoint to define an asymmetric recommender system in hyperbolic space. These choices allow us to scale to millions of users and hundreds of thousands of items. △ Less

Submitted 22 February, 2019; originally announced February 2019.

Comments: 11 pages, 8 figures, 2 tables

arXiv:1807.04098 [pdf, other]

doi 10.1007/978-3-030-10997-4_10

A Recurrent Neural Network Survival Model: Predicting Web User Return Time

Authors: Georg L. Grob, Ângelo Cardoso, C. H. Bryan Liu, Duncan A. Little, Benjamin Paul Chamberlain

Abstract: The size of a website's active user base directly affects its value. Thus, it is important to monitor and influence a user's likelihood to return to a site. Essential to this is predicting when a user will return. Current state of the art approaches to solve this problem come in two flavors: (1) Recurrent Neural Network (RNN) based solutions and (2) survival analysis methods. We observe that both… ▽ More The size of a website's active user base directly affects its value. Thus, it is important to monitor and influence a user's likelihood to return to a site. Essential to this is predicting when a user will return. Current state of the art approaches to solve this problem come in two flavors: (1) Recurrent Neural Network (RNN) based solutions and (2) survival analysis methods. We observe that both techniques are severely limited when applied to this problem. Survival models can only incorporate aggregate representations of users instead of automatically learning a representation directly from a raw time series of user actions. RNNs can automatically learn features, but can not be directly trained with examples of non-returning users who have no target value for their return time. We develop a novel RNN survival model that removes the limitations of the state of the art methods. We demonstrate that this model can successfully be applied to return time prediction on a large e-commerce dataset with a superior ability to discriminate between returning and non-returning users than either method applied in isolation. △ Less

Submitted 11 July, 2018; originally announced July 2018.

Comments: Accepted into ECML PKDD 2018; 8 figures and 1 table

Journal ref: Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2018. Lecture Notes in Computer Science, vol 11053. pp 152-168

arXiv:1806.02588 [pdf, other]

Designing Experiments to Measure Incrementality on Facebook

Authors: C. H. Bryan Liu, Elaine M. Bettaney, Benjamin Paul Chamberlain

Abstract: The importance of Facebook advertising has risen dramatically in recent years, with the platform accounting for almost 20% of the global online ad spend in 2017. An important consideration in advertising is incrementality: how much of the change in an experimental metric is an advertising campaign responsible for. To measure incrementality, Facebook provide lift studies. As Facebook lift studies d… ▽ More The importance of Facebook advertising has risen dramatically in recent years, with the platform accounting for almost 20% of the global online ad spend in 2017. An important consideration in advertising is incrementality: how much of the change in an experimental metric is an advertising campaign responsible for. To measure incrementality, Facebook provide lift studies. As Facebook lift studies differ from standard A/B tests, the online experimentation literature does not describe how to calculate parameters such as power and minimum sample size. Facebook also offer multi-cell lift tests, which can be used to compare campaigns that don't have statistically identical audiences. In this case, there is no literature describing how to measure the significance of the difference in incrementality between cells, or how to estimate the power or minimum sample size. We fill these gaps in the literature by providing the statistical power and required sample size calculation for Facebook lift studies. We then generalise the statistical significance, power, and required sample size calculation to multi-cell lift studies. We represent our results theoretically in terms of the distributions of test metrics and in practical terms relating to the metrics used by practitioners, making all of our code publicly available. △ Less

Submitted 11 July, 2018; v1 submitted 7 June, 2018; originally announced June 2018.

Comments: Accepted into 2018 AdKDD & TargetAd Workshop in conjunction with KDD 2018; 6 pages, 4 figures, and 2 tables

arXiv:1804.04095 [pdf, other]

Predicting Twitter User Socioeconomic Attributes with Network and Language Information

Authors: Nikolaos Aletras, Benjamin Paul Chamberlain

Abstract: Inferring socioeconomic attributes of social media users such as occupation and income is an important problem in computational social science. Automated inference of such characteristics has applications in personalised recommender systems, targeted computational advertising and online political campaigning. While previous work has shown that language features can reliably predict socioeconomic a… ▽ More Inferring socioeconomic attributes of social media users such as occupation and income is an important problem in computational social science. Automated inference of such characteristics has applications in personalised recommender systems, targeted computational advertising and online political campaigning. While previous work has shown that language features can reliably predict socioeconomic attributes on Twitter, employing information coming from users' social networks has not yet been explored for such complex user characteristics. In this paper, we describe a method for predicting the occupational class and the income of Twitter users given information extracted from their extended networks by learning a low-dimensional vector representation of users, i.e. graph embeddings. We use this representation to train predictive models for occupational class and income. Results on two publicly available datasets show that our method consistently outperforms the state-of-the-art methods in both tasks. We also obtain further significant improvements when we combine graph embeddings with textual features, demonstrating that social network and language information are complementary. △ Less

Submitted 11 April, 2018; originally announced April 2018.

Comments: Accepted at ACM HT 2018

arXiv:1803.06258 [pdf, other]

Online Controlled Experiments for Personalised e-Commerce Strategies: Design, Challenges, and Pitfalls

Authors: C. H. Bryan Liu, Benjamin Paul Chamberlain

Abstract: Online controlled experiments are the primary tool for measuring the causal impact of product changes in digital businesses. It is increasingly common for digital products and services to interact with customers in a personalised way. Using online controlled experiments to optimise personalised interaction strategies is challenging because the usual assumption of statistically equivalent user grou… ▽ More Online controlled experiments are the primary tool for measuring the causal impact of product changes in digital businesses. It is increasingly common for digital products and services to interact with customers in a personalised way. Using online controlled experiments to optimise personalised interaction strategies is challenging because the usual assumption of statistically equivalent user groups is violated. Additionally, challenges are introduced by users qualifying for strategies based on dynamic, stochastic attributes. Traditional A/B tests can salvage statistical equivalence by pre-allocating users to control and exposed groups, but this dilutes the experimental metrics and reduces the test power. We present a stacked incrementality test framework that addresses problems with running online experiments for personalised user strategies. We derive bounds that show that our framework is superior to the best simple A/B test given enough users and that this condition is easily met for large scale online experiments. In addition, we provide a test power calculator and describe a selection of pitfalls and lessons learnt from our experience using it. △ Less

Submitted 1 July, 2021; v1 submitted 16 March, 2018; originally announced March 2018.

Comments: Not peer-reviewed but retained for historic interest. Removed an erroneous statement on Welch's t-test assumptions in Section 3.2. 9 pages, 7 figures

arXiv:1712.01209 [pdf, other]

doi 10.4230/OASIcs.ICCSW.2018.1

Speeding Up BigClam Implementation on SNAP

Authors: C. H. Bryan Liu, Benjamin Paul Chamberlain

Abstract: We perform a detailed analysis of the C++ implementation of the Cluster Affiliation Model for Big Networks (BigClam) on the Stanford Network Analysis Project (SNAP). BigClam is a popular graph mining algorithm that is capable of finding overlap** communities in networks containing millions of nodes. Our analysis shows a key stage of the algorithm - determining if a node belongs to a community -… ▽ More We perform a detailed analysis of the C++ implementation of the Cluster Affiliation Model for Big Networks (BigClam) on the Stanford Network Analysis Project (SNAP). BigClam is a popular graph mining algorithm that is capable of finding overlap** communities in networks containing millions of nodes. Our analysis shows a key stage of the algorithm - determining if a node belongs to a community - dominates the runtime of the implementation, yet the computation is not parallelized. We show that by parallelizing computations across multiple threads using OpenMP we can speed up the algorithm by 5.3 times when solving large networks for communities, while preserving the integrity of the program and the result. △ Less

Submitted 4 September, 2018; v1 submitted 4 December, 2017; originally announced December 2017.

Comments: To appear in 2018 Imperial College Computing Student Workshop (ICCSW'18); 12 pages, 4 figures, and 3 tables

Journal ref: 2018 Imperial College Computing Student Workshop (ICCSW 2018). OpenAccess Series in Informatics (OASIcs), vol. 66, pp. 1:1-1:13

arXiv:1706.09865 [pdf, other]

doi 10.1007/978-3-319-71273-4_9

Generalising Random Forest Parameter Optimisation to Include Stability and Cost

Authors: C. H. Bryan Liu, Benjamin Paul Chamberlain, Duncan A. Little, Angelo Cardoso

Abstract: Random forests are among the most popular classification and regression methods used in industrial applications. To be effective, the parameters of random forests must be carefully tuned. This is usually done by choosing values that minimize the prediction error on a held out dataset. We argue that error reduction is only one of several metrics that must be considered when optimizing random forest… ▽ More Random forests are among the most popular classification and regression methods used in industrial applications. To be effective, the parameters of random forests must be carefully tuned. This is usually done by choosing values that minimize the prediction error on a held out dataset. We argue that error reduction is only one of several metrics that must be considered when optimizing random forest parameters for commercial applications. We propose a novel metric that captures the stability of random forests predictions, which we argue is key for scenarios that require successive predictions. We motivate the need for multi-criteria optimization by showing that in practical applications, simply choosing the parameters that lead to the lowest error can introduce unnecessary costs and produce predictions that are not stable across independent runs. To optimize this multi-criteria trade-off, we present a new framework that efficiently finds a principled balance between these three considerations using Bayesian optimisation. The pitfalls of optimising forest parameters purely for error reduction are demonstrated using two publicly available real world datasets. We show that our framework leads to parameter settings that are markedly different from the values discovered by error reduction metrics. △ Less

Submitted 13 July, 2017; v1 submitted 29 June, 2017; originally announced June 2017.

Comments: To appear in ECML-PKDD 2017

Journal ref: Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2017. LNCS vol 10536, pp. 102-113 (2017)

arXiv:1705.10359 [pdf, other]

Neural Embeddings of Graphs in Hyperbolic Space

Authors: Benjamin Paul Chamberlain, James Clough, Marc Peter Deisenroth

Abstract: Neural embeddings have been used with great success in Natural Language Processing (NLP). They provide compact representations that encapsulate word similarity and attain state-of-the-art performance in a range of linguistic tasks. The success of neural embeddings has prompted significant amounts of research into applications in domains other than language. One such domain is graph-structured data… ▽ More Neural embeddings have been used with great success in Natural Language Processing (NLP). They provide compact representations that encapsulate word similarity and attain state-of-the-art performance in a range of linguistic tasks. The success of neural embeddings has prompted significant amounts of research into applications in domains other than language. One such domain is graph-structured data, where embeddings of vertices can be learned that encapsulate vertex similarity and improve performance on tasks including edge prediction and vertex labelling. For both NLP and graph based tasks, embeddings have been learned in high-dimensional Euclidean spaces. However, recent work has shown that the appropriate isometric space for embedding complex networks is not the flat Euclidean space, but negatively curved, hyperbolic space. We present a new concept that exploits these recent insights and propose learning neural embeddings of graphs in hyperbolic space. We provide experimental evidence that embedding graphs in their natural geometry significantly improves performance on downstream tasks for several real-world public datasets. △ Less

Submitted 29 May, 2017; originally announced May 2017.

Comments: 7 pages, 5 figures

Journal ref: 13th international workshop on mining and learning from graphs held in conjunction with KDD, 2017

arXiv:1703.02596 [pdf, other]

doi 10.1145/3097983.3098123

Customer Lifetime Value Prediction Using Embeddings

Authors: Benjamin Paul Chamberlain, Angelo Cardoso, C. H. Bryan Liu, Roberto Pagliari, Marc Peter Deisenroth

Abstract: We describe the Customer LifeTime Value (CLTV) prediction system deployed at ASOS.com, a global online fashion retailer. CLTV prediction is an important problem in e-commerce where an accurate estimate of future value allows retailers to effectively allocate marketing spend, identify and nurture high value customers and mitigate exposure to losses. The system at ASOS provides daily estimates of th… ▽ More We describe the Customer LifeTime Value (CLTV) prediction system deployed at ASOS.com, a global online fashion retailer. CLTV prediction is an important problem in e-commerce where an accurate estimate of future value allows retailers to effectively allocate marketing spend, identify and nurture high value customers and mitigate exposure to losses. The system at ASOS provides daily estimates of the future value of every customer and is one of the cornerstones of the personalised shop** experience. The state of the art in this domain uses large numbers of handcrafted features and ensemble regressors to forecast value, predict churn and evaluate customer loyalty. Recently, domains including language, vision and speech have shown dramatic advances by replacing handcrafted features with features that are learned automatically from data. We detail the system deployed at ASOS and show that learning feature representations is a promising extension to the state of the art in CLTV modelling. We propose a novel way to generate embeddings of customers, which addresses the issue of the ever changing product catalogue and obtain a significant improvement over an exhaustive set of handcrafted features. △ Less

Submitted 6 July, 2017; v1 submitted 7 March, 2017; originally announced March 2017.

Comments: 10 pages, 11 figures

Journal ref: Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining Pages 1753-1762, 2017

arXiv:1601.04621 [pdf, other]

Probabilistic Inference of Twitter Users' Age based on What They Follow

Authors: Benjamin Paul Chamberlain, Clive Humby, Marc Peter Deisenroth

Abstract: Twitter provides an open and rich source of data for studying human behaviour at scale and is widely used in social and network sciences. However, a major criticism of Twitter data is that demographic information is largely absent. Enhancing Twitter data with user ages would advance our ability to study social network structures, information flows and the spread of contagions. Approaches toward ag… ▽ More Twitter provides an open and rich source of data for studying human behaviour at scale and is widely used in social and network sciences. However, a major criticism of Twitter data is that demographic information is largely absent. Enhancing Twitter data with user ages would advance our ability to study social network structures, information flows and the spread of contagions. Approaches toward age detection of Twitter users typically focus on specific properties of tweets, e.g., linguistic features, which are language dependent. In this paper, we devise a language-independent methodology for determining the age of Twitter users from data that is native to the Twitter ecosystem. The key idea is to use a Bayesian framework to generalise ground-truth age information from a few Twitter users to the entire network based on what/whom they follow. Our approach scales to inferring the age of 700 million Twitter accounts with high accuracy. △ Less

Submitted 24 February, 2017; v1 submitted 18 January, 2016; originally announced January 2016.

Comments: 9 pages, 9 figures

arXiv:1601.03958 [pdf, other]

Real-Time Community Detection in Large Social Networks on a Laptop

Authors: Benjamin Paul Chamberlain, Josh Levy-Kramer, Clive Humby, Marc Peter Deisenroth

Abstract: For a broad range of research, governmental and commercial applications it is important to understand the allegiances, communities and structure of key players in society. One promising direction towards extracting this information is to exploit the rich relational data in digital social networks (the social graph). As social media data sets are very large, most approaches make use of distributed… ▽ More For a broad range of research, governmental and commercial applications it is important to understand the allegiances, communities and structure of key players in society. One promising direction towards extracting this information is to exploit the rich relational data in digital social networks (the social graph). As social media data sets are very large, most approaches make use of distributed computing systems for this purpose. Distributing graph processing requires solving many difficult engineering problems, which has lead some researchers to look at single-machine solutions that are faster and easier to maintain. In this article, we present a single-machine real-time system for large-scale graph processing that allows analysts to interactively explore graph structures. The key idea is that the aggregate actions of large numbers of users can be compressed into a data structure that encapsulates user similarities while being robust to noise and queryable in real-time. We achieve single machine real-time performance by compressing the neighbourhood of each vertex using minhash signatures and facilitate rapid queries through Locality Sensitive Hashing. These techniques reduce query times from hours using industrial desktop machines operating on the full graph to milliseconds on standard laptops. Our method allows exploration of strongly associated regions (i.e. communities) of large graphs in real-time on a laptop. It has been deployed in software that is actively used by social network analysts and offers another channel for media owners to monetise their data, hel** them to continue to provide free services that are valued by billions of people globally. △ Less

Submitted 4 September, 2016; v1 submitted 15 January, 2016; originally announced January 2016.

arXiv:1304.1146 [pdf]

Analysis in HUGIN of Data Conflict

Authors: Bo Chamberlain, Finn Verner Jensen, Frank Jensen, Torsten Nordahl

Abstract: After a brief introduction to causal probabilistic networks and the HUGIN approach, the problem of conflicting data is discussed. A measure of conflict is defined, and it is used in the medical diagnostic system MUNIN. Finally, it is discussed how to distinguish between conflicting data and a rare case. After a brief introduction to causal probabilistic networks and the HUGIN approach, the problem of conflicting data is discussed. A measure of conflict is defined, and it is used in the medical diagnostic system MUNIN. Finally, it is discussed how to distinguish between conflicting data and a rare case. △ Less

Submitted 27 March, 2013; originally announced April 2013.

Comments: Appears in Proceedings of the Sixth Conference on Uncertainty in Artificial Intelligence (UAI1990)

Report number: UAI-P-1990-PG-546-554

arXiv:1208.3538 [pdf, other]

Inverse Modeling of Dynamical Systems: Multi-Dimensional Extensions of a Stochastic Switching Problem

Authors: Erik Bates, Blake Chamberlain, Rachel Gettinger

Abstract: The Buridan's ass paradox is characterized by perpetual indecision between two states, which are never attained. When this problem is formulated as a dynamical system, indecision is modeled by a discrete-state Markov process determined by the system's unknown parameters. Interest lies in estimating these parameters from a limited number of observations. We compare estimation methods and examine ho… ▽ More The Buridan's ass paradox is characterized by perpetual indecision between two states, which are never attained. When this problem is formulated as a dynamical system, indecision is modeled by a discrete-state Markov process determined by the system's unknown parameters. Interest lies in estimating these parameters from a limited number of observations. We compare estimation methods and examine how well each can be generalized to multi-dimensional extensions of this system. By quantifying statistics such as mean, variance, frequency, and cumulative power, we construct both method of moments type estimators and likelihood-based estimators. We show, however, why these techniques become intractable in higher dimensions, and thus develop a geometric approach to reveal the parameters underlying the Markov process. We also examine the robustness of this method to the presence of noise. △ Less

Submitted 17 August, 2012; originally announced August 2012.

Comments: 40 pages, 6 figures. Work was completed by the authors while attending the SURIEM REU at Michigan State University. The authors were awarded an MAA Best Presentation Award at MathFest 2012 (Madison, WI) relating this material

MSC Class: 62M86

Showing 1–31 of 31 results for author: Chamberlain, B