-
Implementing scalable matrix-vector products for the exact diagonalization methods in quantum many-body physics
Authors:
Tom Westerhout,
Bradford L. Chamberlain
Abstract:
Exact diagonalization is a well-established method for simulating small quantum systems. Its applicability is limited by the exponential growth of the so-called Hamiltonian matrix that needs to be diagonalized. Physical symmetries are usually utilized to reduce the matrix dimension, and distributed-memory parallelism is employed to explore larger systems. This paper focuses on the implementation of the core distributed algorithms, with a special emphasis on the matrix-vector product operation. Instead of the conventional MPI+X paradigm, Chapel is chosen as the language for these distributed algorithms.
We provide a comprehensive description of the algorithms and present performance and scalability tests. Our implementation outperforms the state-of-the-art MPI-based solution by a factor of 7--8 on 32 compute nodes or 4096 cores and exhibits very good scaling on up to 256 nodes or 32768 cores. The implementation has 3 times fewer software lines of code than the current state of the art while remaining fully generic.
Submitted 31 August, 2023;
originally announced August 2023.
-
Detecting Shortcuts in Medical Images -- A Case Study in Chest X-rays
Authors:
Amelia Jiménez-Sánchez,
Dovile Juodelyte,
Bethany Chamberlain,
Veronika Cheplygina
Abstract:
The availability of large public datasets and the increased amount of computing power have shifted the interest of the medical community to high-performance algorithms. However, little attention is paid to the quality of the data and their annotations. High performance on benchmark datasets may be reported without considering possible shortcuts or artifacts in the data; moreover, models are not tested on subpopulation groups. With this work, we aim to raise awareness about shortcut problems. We validate previous findings and present a case study on chest X-rays using two publicly available datasets. We share annotations for a subset of pneumothorax images with drains. We conclude with general recommendations for medical image classification.
Submitted 9 November, 2022; v1 submitted 8 November, 2022;
originally announced November 2022.
-
Hyperbolic Deep Reinforcement Learning
Authors:
Edoardo Cetin,
Benjamin Chamberlain,
Michael Bronstein,
Jonathan J Hunt
Abstract:
We propose a new class of deep reinforcement learning (RL) algorithms that model latent representations in hyperbolic space. Sequential decision-making requires reasoning about the possible future consequences of current behavior. Consequently, capturing the relationship between key evolving features for a given task is conducive to recovering effective policies. To this end, hyperbolic geometry provides deep RL models with a natural basis to precisely encode this inherently hierarchical information. However, applying existing methodologies from the hyperbolic deep learning literature leads to fatal optimization instabilities due to the non-stationarity and variance characterizing RL gradient estimators. Hence, we design a new general method that counteracts such optimization challenges and enables stable end-to-end learning with deep hyperbolic representations. We empirically validate our framework by applying it to popular on-policy and off-policy RL algorithms on the Procgen and Atari 100K benchmarks, attaining near universal performance and generalization benefits. Given its natural fit, we hope future RL research will consider hyperbolic representations as a standard tool.
Submitted 4 October, 2022;
originally announced October 2022.
-
Gradient Gating for Deep Multi-Rate Learning on Graphs
Authors:
T. Konstantin Rusch,
Benjamin P. Chamberlain,
Michael W. Mahoney,
Michael M. Bronstein,
Siddhartha Mishra
Abstract:
We present Gradient Gating (G$^2$), a novel framework for improving the performance of Graph Neural Networks (GNNs). Our framework is based on gating the output of GNN layers with a mechanism for multi-rate flow of message passing information across nodes of the underlying graph. Local gradients are harnessed to further modulate message passing updates. Our framework flexibly allows one to use any basic GNN layer as a wrapper around which the multi-rate gradient gating mechanism is built. We rigorously prove that G$^2$ alleviates the oversmoothing problem and allows the design of deep GNNs. Empirical results are presented to demonstrate that the proposed framework achieves state-of-the-art performance on a variety of graph learning tasks, including on large-scale heterophilic graphs.
Submitted 15 March, 2023; v1 submitted 2 October, 2022;
originally announced October 2022.
-
Graph Neural Networks for Link Prediction with Subgraph Sketching
Authors:
Benjamin Paul Chamberlain,
Sergey Shirobokov,
Emanuele Rossi,
Fabrizio Frasca,
Thomas Markovich,
Nils Hammerla,
Michael M. Bronstein,
Max Hansmire
Abstract:
Many Graph Neural Networks (GNNs) perform poorly compared to simple heuristics on Link Prediction (LP) tasks. This is due to limitations in expressive power such as the inability to count triangles (the backbone of most LP heuristics) and because they cannot distinguish automorphic nodes (those having identical structural roles). Both expressiveness issues can be alleviated by learning link (rather than node) representations and incorporating structural features such as triangle counts. Since explicit link representations are often prohibitively expensive, recent works resorted to subgraph-based methods, which have achieved state-of-the-art performance for LP, but suffer from poor efficiency due to high levels of redundancy between subgraphs. We analyze the components of subgraph GNN (SGNN) methods for link prediction. Based on our analysis, we propose a novel full-graph GNN called ELPH (Efficient Link Prediction with Hashing) that passes subgraph sketches as messages to approximate the key components of SGNNs without explicit subgraph construction. ELPH is provably more expressive than Message Passing GNNs (MPNNs). It outperforms existing SGNN models on many standard LP benchmarks while being orders of magnitude faster. However, it shares the common GNN limitation that it is only efficient when the dataset fits in GPU memory. Accordingly, we develop a highly scalable model, called BUDDY, which uses feature precomputation to circumvent this limitation without sacrificing predictive performance. Our experiments show that BUDDY also outperforms SGNNs on standard LP benchmarks while being highly scalable and faster than ELPH.
Submitted 2 May, 2023; v1 submitted 30 September, 2022;
originally announced September 2022.
-
Understanding convolution on graphs via energies
Authors:
Francesco Di Giovanni,
James Rowbottom,
Benjamin P. Chamberlain,
Thomas Markovich,
Michael M. Bronstein
Abstract:
Graph Neural Networks (GNNs) typically operate by message-passing, where the state of a node is updated based on the information received from its neighbours. Most message-passing models act as graph convolutions, where features are mixed by a shared, linear transformation before being propagated over the edges. On node-classification tasks, graph convolutions have been shown to suffer from two limitations: poor performance on heterophilic graphs, and over-smoothing. It is a common belief that both phenomena occur because such models behave as low-pass filters, meaning that the Dirichlet energy of the features decreases along the layers, incurring a smoothing effect that ultimately makes features no longer distinguishable. In this work, we rigorously prove that simple graph-convolutional models can actually enhance high frequencies and even lead to an asymptotic behaviour we refer to as over-sharpening, opposite to over-smoothing. We do so by showing that linear graph convolutions with symmetric weights minimize a multi-particle energy that generalizes the Dirichlet energy; in this setting, the weight matrices induce edge-wise attraction (repulsion) through their positive (negative) eigenvalues, thereby controlling whether the features are being smoothed or sharpened. We also extend the analysis to non-linear GNNs, and demonstrate that some existing time-continuous GNNs are instead always dominated by the low frequencies. Finally, we validate our theoretical findings through ablations and real-world experiments.
Submitted 6 September, 2023; v1 submitted 22 June, 2022;
originally announced June 2022.
-
Neural Sheaf Diffusion: A Topological Perspective on Heterophily and Oversmoothing in GNNs
Authors:
Cristian Bodnar,
Francesco Di Giovanni,
Benjamin Paul Chamberlain,
Pietro Liò,
Michael M. Bronstein
Abstract:
Cellular sheaves equip graphs with a "geometrical" structure by assigning vector spaces and linear maps to nodes and edges. Graph Neural Networks (GNNs) implicitly assume a graph with a trivial underlying sheaf. This choice is reflected in the structure of the graph Laplacian operator, the properties of the associated diffusion equation, and the characteristics of the convolutional models that discretise this equation. In this paper, we use cellular sheaf theory to show that the underlying geometry of the graph is deeply linked with the performance of GNNs in heterophilic settings and their oversmoothing behaviour. By considering a hierarchy of increasingly general sheaves, we study how the ability of the sheaf diffusion process to achieve linear separation of the classes in the infinite time limit expands. At the same time, we prove that when the sheaf is non-trivial, discretised parametric diffusion processes have greater control than GNNs over their asymptotic behaviour. On the practical side, we study how sheaves can be learned from data. The resulting sheaf diffusion models have many desirable properties that address the limitations of classical graph diffusion equations (and corresponding GNN models) and obtain competitive results in heterophilic settings. Overall, our work provides new connections between GNNs and algebraic topology and would be of interest to both fields.
Submitted 6 January, 2023; v1 submitted 9 February, 2022;
originally announced February 2022.
-
Graph-Coupled Oscillator Networks
Authors:
T. Konstantin Rusch,
Benjamin P. Chamberlain,
James Rowbottom,
Siddhartha Mishra,
Michael M. Bronstein
Abstract:
We propose Graph-Coupled Oscillator Networks (GraphCON), a novel framework for deep learning on graphs. It is based on discretizations of a second-order system of ordinary differential equations (ODEs), which model a network of nonlinear controlled and damped oscillators, coupled via the adjacency structure of the underlying graph. The flexibility of our framework permits any basic GNN layer (e.g. convolutional or attentional) as the coupling function, from which a multi-layer deep neural network is built up via the dynamics of the proposed ODEs. We relate the oversmoothing problem, commonly encountered in GNNs, to the stability of steady states of the underlying ODE and show that zero-Dirichlet energy steady states are not stable for our proposed ODEs. This demonstrates that the proposed framework mitigates the oversmoothing problem. Moreover, we prove that GraphCON mitigates the exploding and vanishing gradients problem to facilitate training of deep multi-layer GNNs. Finally, we show that our approach offers competitive performance with respect to the state-of-the-art on a variety of graph-based learning tasks.
Submitted 23 June, 2022; v1 submitted 4 February, 2022;
originally announced February 2022.
-
Understanding over-squashing and bottlenecks on graphs via curvature
Authors:
Jake Topping,
Francesco Di Giovanni,
Benjamin Paul Chamberlain,
Xiaowen Dong,
Michael M. Bronstein
Abstract:
Most graph neural networks (GNNs) use the message passing paradigm, in which node features are propagated on the input graph. Recent works pointed to the distortion of information flowing from distant nodes as a factor limiting the efficiency of message passing for tasks relying on long-distance interactions. This phenomenon, referred to as 'over-squashing', has been heuristically attributed to graph bottlenecks where the number of $k$-hop neighbors grows rapidly with $k$. We provide a precise description of the over-squashing phenomenon in GNNs and analyze how it arises from bottlenecks in the graph. For this purpose, we introduce a new edge-based combinatorial curvature and prove that negatively curved edges are responsible for the over-squashing issue. We also propose and experimentally test a curvature-based graph rewiring method to alleviate the over-squashing.
Submitted 12 November, 2022; v1 submitted 29 November, 2021;
originally announced November 2021.
-
On the Unreasonable Effectiveness of Feature propagation in Learning on Graphs with Missing Node Features
Authors:
Emanuele Rossi,
Henry Kenlay,
Maria I. Gorinova,
Benjamin Paul Chamberlain,
Xiaowen Dong,
Michael Bronstein
Abstract:
While Graph Neural Networks (GNNs) have recently become the de facto standard for modeling relational data, they impose a strong assumption on the availability of the node or edge features of the graph. In many real-world applications, however, features are only partially available; for example, in social networks, age and gender are available only for a small subset of users. We present a general approach for handling missing features in graph machine learning applications that is based on minimization of the Dirichlet energy and leads to a diffusion-type differential equation on the graph. The discretization of this equation produces a simple, fast and scalable algorithm which we call Feature Propagation. We experimentally show that the proposed approach outperforms previous methods on seven common node-classification benchmarks and can withstand surprisingly high rates of missing features: on average we observe only around 4% relative accuracy drop when 99% of the features are missing. Moreover, it takes only 10 seconds to run on a graph with $\sim$2.5M nodes and $\sim$123M edges on a single GPU.
Submitted 23 May, 2022; v1 submitted 23 November, 2021;
originally announced November 2021.
-
Beltrami Flow and Neural Diffusion on Graphs
Authors:
Benjamin Paul Chamberlain,
James Rowbottom,
Davide Eynard,
Francesco Di Giovanni,
Xiaowen Dong,
Michael M Bronstein
Abstract:
We propose a novel class of graph neural networks based on the discretised Beltrami flow, a non-Euclidean diffusion PDE. In our model, node features are supplemented with positional encodings derived from the graph topology and jointly evolved by the Beltrami flow, producing simultaneously continuous feature learning and topology evolution. The resulting model generalises many popular graph neural networks and achieves state-of-the-art results on several benchmarks.
Submitted 18 October, 2021;
originally announced October 2021.
-
The 2021 RecSys Challenge Dataset: Fairness is not optional
Authors:
Luca Belli,
Alykhan Tejani,
Frank Portman,
Alexandre Lung-Yut-Fong,
Ben Chamberlain,
Yuanpu Xie,
Kristian Lum,
Jonathan Hunt,
Michael Bronstein,
Vito Walter Anelli,
Saikishore Kalloori,
Bruce Ferwerda,
Wenzhe Shi
Abstract:
After the success of the RecSys 2020 Challenge, we describe a novel and bigger dataset that was released in conjunction with the ACM RecSys Challenge 2021. This year's dataset is not only bigger (~1B data points, a 5-fold increase), but for the first time it takes into consideration fairness aspects of the challenge. Unlike many static datasets, a lot of effort went into making sure that the dataset was synced with the Twitter platform: if a user deleted their content, the same content would be promptly removed from the dataset too. In this paper, we introduce the dataset and challenge, highlighting some of the issues that arise when creating recommender systems at Twitter scale.
Submitted 21 September, 2021; v1 submitted 16 September, 2021;
originally announced September 2021.
-
GRAND: Graph Neural Diffusion
Authors:
Benjamin Paul Chamberlain,
James Rowbottom,
Maria Gorinova,
Stefan Webb,
Emanuele Rossi,
Michael M. Bronstein
Abstract:
We present Graph Neural Diffusion (GRAND) that approaches deep learning on graphs as a continuous diffusion process and treats Graph Neural Networks (GNNs) as discretisations of an underlying PDE. In our model, the layer structure and topology correspond to the discretisation choices of temporal and spatial operators. Our approach allows a principled development of a broad new class of GNNs that are able to address the common plights of graph learning models such as depth, oversmoothing, and bottlenecks. Key to the success of our models is stability with respect to perturbations in the data, which is addressed for both implicit and explicit discretisation schemes. We develop linear and nonlinear versions of GRAND, which achieve competitive results on many standard graph benchmarks.
Submitted 22 September, 2021; v1 submitted 21 June, 2021;
originally announced June 2021.
-
Tuning Word2vec for Large Scale Recommendation Systems
Authors:
Benjamin P. Chamberlain,
Emanuele Rossi,
Dan Shiebler,
Suvash Sedhain,
Michael M. Bronstein
Abstract:
Word2vec is a powerful machine learning tool that emerged from Natural Language Processing (NLP) and is now applied in multiple domains, including recommender systems, forecasting, and network analysis. As Word2vec is often used off the shelf, we address the question of whether the default hyperparameters are suitable for recommender systems. The answer is emphatically no. In this paper, we first elucidate the importance of hyperparameter optimization and show that unconstrained optimization yields an average 221% improvement in hit rate over the default parameters. However, unconstrained optimization leads to hyperparameter settings that are very expensive and not feasible for large scale recommendation tasks. To this end, we demonstrate 138% average improvement in hit rate with a runtime budget-constrained hyperparameter optimization. Furthermore, to make hyperparameter optimization applicable for large scale recommendation problems where the target dataset is too large to search over, we investigate generalizing hyperparameters settings from samples. We show that applying constrained hyperparameter optimization using only a 10% sample of the data still yields a 91% average improvement in hit rate over the default parameters when applied to the full datasets. Finally, we apply hyperparameters learned using our method of constrained optimization on a sample to the Who To Follow recommendation service at Twitter and are able to increase follow rates by 15%.
Submitted 24 September, 2020;
originally announced September 2020.
-
Temporal Graph Networks for Deep Learning on Dynamic Graphs
Authors:
Emanuele Rossi,
Ben Chamberlain,
Fabrizio Frasca,
Davide Eynard,
Federico Monti,
Michael Bronstein
Abstract:
Graph Neural Networks (GNNs) have recently become increasingly popular due to their ability to learn complex systems of relations or interactions arising in a broad spectrum of problems ranging from biology and particle physics to social networks and recommendation systems. Despite the plethora of different models for deep learning on graphs, few approaches have been proposed thus far for dealing with graphs that present some sort of dynamic nature (e.g. evolving features or connectivity over time). In this paper, we present Temporal Graph Networks (TGNs), a generic, efficient framework for deep learning on dynamic graphs represented as sequences of timed events. Thanks to a novel combination of memory modules and graph-based operators, TGNs are able to significantly outperform previous approaches while at the same time being more computationally efficient. We furthermore show that several previous models for learning on dynamic graphs can be cast as specific instances of our framework. We perform a detailed ablation study of different components of our framework and devise the best configuration that achieves state-of-the-art performance on several transductive and inductive prediction tasks for dynamic graphs.
Submitted 9 October, 2020; v1 submitted 18 June, 2020;
originally announced June 2020.
-
SIGN: Scalable Inception Graph Neural Networks
Authors:
Fabrizio Frasca,
Emanuele Rossi,
Davide Eynard,
Ben Chamberlain,
Michael Bronstein,
Federico Monti
Abstract:
Graph representation learning has recently been applied to a broad spectrum of problems ranging from computer graphics and chemistry to high energy physics and social media. The popularity of graph neural networks has sparked interest, both in academia and in industry, in developing methods that scale to very large graphs such as Facebook or Twitter social networks. In most of these approaches, the computational cost is alleviated by a sampling strategy retaining a subset of node neighbors or subgraphs at training time. In this paper we propose a new, efficient and scalable graph deep learning architecture which sidesteps the need for graph sampling by using graph convolutional filters of different size that are amenable to efficient precomputation, allowing extremely fast training and inference. Our architecture allows using different local graph operators (e.g. motif-induced adjacency matrices or Personalized Page Rank diffusion matrix) to best suit the task at hand. We conduct extensive experimental evaluation on various open benchmarks and show that our approach is competitive with other state-of-the-art architectures, while requiring a fraction of the training and inference time. Moreover, we obtain state-of-the-art results on ogbn-papers100M, the largest public graph dataset, with over 110 million nodes and 1.5 billion edges.
Submitted 3 November, 2020; v1 submitted 23 April, 2020;
originally announced April 2020.
-
What is the value of experimentation & measurement?
Authors:
C. H. Bryan Liu,
Benjamin Paul Chamberlain
Abstract:
Experimentation and Measurement (E&M) capabilities allow organizations to accurately assess the impact of new propositions and to experiment with many variants of existing products. However, until now, the question of measuring the measurer, or valuing the contribution of an E&M capability to organizational success has not been addressed. We tackle this problem by analyzing how, by decreasing estimation uncertainty, E&M platforms allow for better prioritization. We quantify this benefit in terms of expected relative improvement in the performance of all new propositions and provide guidance for how much an E&M capability is worth and when organizations should invest in one.
Submitted 8 September, 2019;
originally announced September 2019.
-
Fashion Outfit Generation for E-commerce
Authors:
Elaine M. Bettaney,
Stephen R. Hardwick,
Odysseas Zisimopoulos,
Benjamin Paul Chamberlain
Abstract:
Combining items of clothing into an outfit is a major task in fashion retail. Recommending sets of items that are compatible with a particular seed item is useful for providing users with guidance and inspiration, but is currently a manual process that requires expert stylists and is therefore not scalable or easy to personalise. We use a multilayer neural network fed by visual and textual features to learn embeddings of items in a latent style space such that compatible items of different types are embedded close to one another. We train our model using the ASOS outfits dataset, which consists of a large number of outfits created by professional stylists and which we release to the research community. Our model shows strong performance in an offline outfit compatibility prediction task. We use our model to generate outfits and for the first time in this field perform an AB test, comparing our generated outfits to those produced by a baseline model which matches appropriate product types but uses no information on style. Users approved of outfits generated by our model 21% and 34% more frequently than those generated by the baseline model for womenswear and menswear respectively.
Submitted 18 March, 2019;
originally announced April 2019.
-
Scalable Hyperbolic Recommender Systems
Authors:
Benjamin Paul Chamberlain,
Stephen R. Hardwick,
David R. Wardrope,
Fabon Dzogang,
Fabio Daolio,
Saúl Vargas
Abstract:
We present a large scale hyperbolic recommender system. We discuss why hyperbolic geometry is a more suitable underlying geometry for many recommendation systems and cover the fundamental milestones and insights that we have gained from its development. In doing so, we demonstrate the viability of hyperbolic geometry for recommender systems, showing that they significantly outperform Euclidean models on datasets with the properties of complex networks. Key to the success of our approach are the novel choice of underlying hyperbolic model and the use of the Einstein midpoint to define an asymmetric recommender system in hyperbolic space. These choices allow us to scale to millions of users and hundreds of thousands of items.
Submitted 22 February, 2019;
originally announced February 2019.
-
A Recurrent Neural Network Survival Model: Predicting Web User Return Time
Authors:
Georg L. Grob,
Ângelo Cardoso,
C. H. Bryan Liu,
Duncan A. Little,
Benjamin Paul Chamberlain
Abstract:
The size of a website's active user base directly affects its value. Thus, it is important to monitor and influence a user's likelihood to return to a site. Essential to this is predicting when a user will return. Current state of the art approaches to solve this problem come in two flavors: (1) Recurrent Neural Network (RNN) based solutions and (2) survival analysis methods. We observe that both techniques are severely limited when applied to this problem. Survival models can only incorporate aggregate representations of users instead of automatically learning a representation directly from a raw time series of user actions. RNNs can automatically learn features, but can not be directly trained with examples of non-returning users who have no target value for their return time. We develop a novel RNN survival model that removes the limitations of the state of the art methods. We demonstrate that this model can successfully be applied to return time prediction on a large e-commerce dataset with a superior ability to discriminate between returning and non-returning users than either method applied in isolation.
Submitted 11 July, 2018;
originally announced July 2018.
-
Designing Experiments to Measure Incrementality on Facebook
Authors:
C. H. Bryan Liu,
Elaine M. Bettaney,
Benjamin Paul Chamberlain
Abstract:
The importance of Facebook advertising has risen dramatically in recent years, with the platform accounting for almost 20% of the global online ad spend in 2017. An important consideration in advertising is incrementality: how much of the change in an experimental metric is an advertising campaign responsible for. To measure incrementality, Facebook provide lift studies. As Facebook lift studies differ from standard A/B tests, the online experimentation literature does not describe how to calculate parameters such as power and minimum sample size. Facebook also offer multi-cell lift tests, which can be used to compare campaigns that don't have statistically identical audiences. In this case, there is no literature describing how to measure the significance of the difference in incrementality between cells, or how to estimate the power or minimum sample size. We fill these gaps in the literature by providing the statistical power and required sample size calculation for Facebook lift studies. We then generalise the statistical significance, power, and required sample size calculation to multi-cell lift studies. We represent our results theoretically in terms of the distributions of test metrics and in practical terms relating to the metrics used by practitioners, making all of our code publicly available.
Submitted 11 July, 2018; v1 submitted 7 June, 2018;
originally announced June 2018.
-
Predicting Twitter User Socioeconomic Attributes with Network and Language Information
Authors:
Nikolaos Aletras,
Benjamin Paul Chamberlain
Abstract:
Inferring socioeconomic attributes of social media users such as occupation and income is an important problem in computational social science. Automated inference of such characteristics has applications in personalised recommender systems, targeted computational advertising and online political campaigning. While previous work has shown that language features can reliably predict socioeconomic attributes on Twitter, employing information coming from users' social networks has not yet been explored for such complex user characteristics. In this paper, we describe a method for predicting the occupational class and the income of Twitter users given information extracted from their extended networks by learning a low-dimensional vector representation of users, i.e. graph embeddings. We use this representation to train predictive models for occupational class and income. Results on two publicly available datasets show that our method consistently outperforms the state-of-the-art methods in both tasks. We also obtain further significant improvements when we combine graph embeddings with textual features, demonstrating that social network and language information are complementary.
Submitted 11 April, 2018;
originally announced April 2018.
-
Online Controlled Experiments for Personalised e-Commerce Strategies: Design, Challenges, and Pitfalls
Authors:
C. H. Bryan Liu,
Benjamin Paul Chamberlain
Abstract:
Online controlled experiments are the primary tool for measuring the causal impact of product changes in digital businesses. It is increasingly common for digital products and services to interact with customers in a personalised way. Using online controlled experiments to optimise personalised interaction strategies is challenging because the usual assumption of statistically equivalent user groups is violated. Additionally, challenges are introduced by users qualifying for strategies based on dynamic, stochastic attributes. Traditional A/B tests can salvage statistical equivalence by pre-allocating users to control and exposed groups, but this dilutes the experimental metrics and reduces the test power. We present a stacked incrementality test framework that addresses problems with running online experiments for personalised user strategies. We derive bounds that show that our framework is superior to the best simple A/B test given enough users and that this condition is easily met for large scale online experiments. In addition, we provide a test power calculator and describe a selection of pitfalls and lessons learnt from our experience using it.
Submitted 1 July, 2021; v1 submitted 16 March, 2018;
originally announced March 2018.
-
Speeding Up BigClam Implementation on SNAP
Authors:
C. H. Bryan Liu,
Benjamin Paul Chamberlain
Abstract:
We perform a detailed analysis of the C++ implementation of the Cluster Affiliation Model for Big Networks (BigClam) on the Stanford Network Analysis Project (SNAP). BigClam is a popular graph mining algorithm that is capable of finding overlapping communities in networks containing millions of nodes. Our analysis shows a key stage of the algorithm - determining if a node belongs to a community - dominates the runtime of the implementation, yet the computation is not parallelized. We show that by parallelizing computations across multiple threads using OpenMP we can speed up the algorithm by 5.3 times when solving large networks for communities, while preserving the integrity of the program and the result.
Submitted 4 September, 2018; v1 submitted 4 December, 2017;
originally announced December 2017.
-
Generalising Random Forest Parameter Optimisation to Include Stability and Cost
Authors:
C. H. Bryan Liu,
Benjamin Paul Chamberlain,
Duncan A. Little,
Angelo Cardoso
Abstract:
Random forests are among the most popular classification and regression methods used in industrial applications. To be effective, the parameters of random forests must be carefully tuned. This is usually done by choosing values that minimize the prediction error on a held out dataset. We argue that error reduction is only one of several metrics that must be considered when optimizing random forest parameters for commercial applications. We propose a novel metric that captures the stability of random forest predictions, which we argue is key for scenarios that require successive predictions. We motivate the need for multi-criteria optimization by showing that in practical applications, simply choosing the parameters that lead to the lowest error can introduce unnecessary costs and produce predictions that are not stable across independent runs. To optimize this multi-criteria trade-off, we present a new framework that efficiently finds a principled balance between these three considerations using Bayesian optimisation. The pitfalls of optimising forest parameters purely for error reduction are demonstrated using two publicly available real world datasets. We show that our framework leads to parameter settings that are markedly different from the values discovered by error reduction metrics.
Submitted 13 July, 2017; v1 submitted 29 June, 2017;
originally announced June 2017.
-
Neural Embeddings of Graphs in Hyperbolic Space
Authors:
Benjamin Paul Chamberlain,
James Clough,
Marc Peter Deisenroth
Abstract:
Neural embeddings have been used with great success in Natural Language Processing (NLP). They provide compact representations that encapsulate word similarity and attain state-of-the-art performance in a range of linguistic tasks. The success of neural embeddings has prompted significant amounts of research into applications in domains other than language. One such domain is graph-structured data, where embeddings of vertices can be learned that encapsulate vertex similarity and improve performance on tasks including edge prediction and vertex labelling. For both NLP and graph based tasks, embeddings have been learned in high-dimensional Euclidean spaces. However, recent work has shown that the appropriate isometric space for embedding complex networks is not the flat Euclidean space, but negatively curved, hyperbolic space. We present a new concept that exploits these recent insights and propose learning neural embeddings of graphs in hyperbolic space. We provide experimental evidence that embedding graphs in their natural geometry significantly improves performance on downstream tasks for several real-world public datasets.
Submitted 29 May, 2017;
originally announced May 2017.
-
Customer Lifetime Value Prediction Using Embeddings
Authors:
Benjamin Paul Chamberlain,
Angelo Cardoso,
C. H. Bryan Liu,
Roberto Pagliari,
Marc Peter Deisenroth
Abstract:
We describe the Customer LifeTime Value (CLTV) prediction system deployed at ASOS.com, a global online fashion retailer. CLTV prediction is an important problem in e-commerce where an accurate estimate of future value allows retailers to effectively allocate marketing spend, identify and nurture high value customers and mitigate exposure to losses. The system at ASOS provides daily estimates of the future value of every customer and is one of the cornerstones of the personalised shopping experience. The state of the art in this domain uses large numbers of handcrafted features and ensemble regressors to forecast value, predict churn and evaluate customer loyalty. Recently, domains including language, vision and speech have shown dramatic advances by replacing handcrafted features with features that are learned automatically from data. We detail the system deployed at ASOS and show that learning feature representations is a promising extension to the state of the art in CLTV modelling. We propose a novel way to generate embeddings of customers, which addresses the issue of the ever changing product catalogue and obtain a significant improvement over an exhaustive set of handcrafted features.
Submitted 6 July, 2017; v1 submitted 7 March, 2017;
originally announced March 2017.
-
Probabilistic Inference of Twitter Users' Age based on What They Follow
Authors:
Benjamin Paul Chamberlain,
Clive Humby,
Marc Peter Deisenroth
Abstract:
Twitter provides an open and rich source of data for studying human behaviour at scale and is widely used in social and network sciences. However, a major criticism of Twitter data is that demographic information is largely absent. Enhancing Twitter data with user ages would advance our ability to study social network structures, information flows and the spread of contagions. Approaches toward age detection of Twitter users typically focus on specific properties of tweets, e.g., linguistic features, which are language dependent. In this paper, we devise a language-independent methodology for determining the age of Twitter users from data that is native to the Twitter ecosystem. The key idea is to use a Bayesian framework to generalise ground-truth age information from a few Twitter users to the entire network based on what/whom they follow. Our approach scales to inferring the age of 700 million Twitter accounts with high accuracy.
Submitted 24 February, 2017; v1 submitted 18 January, 2016;
originally announced January 2016.
-
Real-Time Community Detection in Large Social Networks on a Laptop
Authors:
Benjamin Paul Chamberlain,
Josh Levy-Kramer,
Clive Humby,
Marc Peter Deisenroth
Abstract:
For a broad range of research, governmental and commercial applications it is important to understand the allegiances, communities and structure of key players in society. One promising direction towards extracting this information is to exploit the rich relational data in digital social networks (the social graph). As social media data sets are very large, most approaches make use of distributed computing systems for this purpose. Distributing graph processing requires solving many difficult engineering problems, which has led some researchers to look at single-machine solutions that are faster and easier to maintain. In this article, we present a single-machine real-time system for large-scale graph processing that allows analysts to interactively explore graph structures. The key idea is that the aggregate actions of large numbers of users can be compressed into a data structure that encapsulates user similarities while being robust to noise and queryable in real-time. We achieve single machine real-time performance by compressing the neighbourhood of each vertex using minhash signatures and facilitate rapid queries through Locality Sensitive Hashing. These techniques reduce query times from hours using industrial desktop machines operating on the full graph to milliseconds on standard laptops. Our method allows exploration of strongly associated regions (i.e. communities) of large graphs in real-time on a laptop. It has been deployed in software that is actively used by social network analysts and offers another channel for media owners to monetise their data, helping them to continue to provide free services that are valued by billions of people globally.
Submitted 4 September, 2016; v1 submitted 15 January, 2016;
originally announced January 2016.
-
Analysis in HUGIN of Data Conflict
Authors:
Bo Chamberlain,
Finn Verner Jensen,
Frank Jensen,
Torsten Nordahl
Abstract:
After a brief introduction to causal probabilistic networks and the HUGIN approach, the problem of conflicting data is discussed. A measure of conflict is defined, and it is used in the medical diagnostic system MUNIN. Finally, it is discussed how to distinguish between conflicting data and a rare case.
Submitted 27 March, 2013;
originally announced April 2013.
-
Inverse Modeling of Dynamical Systems: Multi-Dimensional Extensions of a Stochastic Switching Problem
Authors:
Erik Bates,
Blake Chamberlain,
Rachel Gettinger
Abstract:
The Buridan's ass paradox is characterized by perpetual indecision between two states, which are never attained. When this problem is formulated as a dynamical system, indecision is modeled by a discrete-state Markov process determined by the system's unknown parameters. Interest lies in estimating these parameters from a limited number of observations. We compare estimation methods and examine how well each can be generalized to multi-dimensional extensions of this system. By quantifying statistics such as mean, variance, frequency, and cumulative power, we construct both method of moments type estimators and likelihood-based estimators. We show, however, why these techniques become intractable in higher dimensions, and thus develop a geometric approach to reveal the parameters underlying the Markov process. We also examine the robustness of this method to the presence of noise.
Submitted 17 August, 2012;
originally announced August 2012.