Search | arXiv e-print repository

Online Learning in Betting Markets: Profit versus Prediction

Authors: Haiqing Zhu, Alexander Soen, Yun Kuen Cheung, Lexing Xie

Abstract: We examine two types of binary betting markets, whose primary goal is for profit (such as sports gambling) or to gain information (such as prediction markets). We articulate the interplay between belief and price-setting to analyse both types of markets, and show that the goals of maximising bookmaker profit and eliciting information are fundamentally incompatible. A key insight is that profit hinges on the deviation between (the distribution of) bettor and true beliefs, and that heavier tails in bettor belief distribution imply higher profit. Our algorithmic contribution is to introduce online learning methods for price-setting. Traditionally, bookmakers update their prices rather infrequently; we present two algorithms that guide price updates upon seeing each bet, assuming very little of bettor belief distributions. The online pricing algorithm achieves stochastic regret of $\mathcal{O}(\sqrt{T})$ against the worst local maximum, or $\mathcal{O}(\sqrt{T \log T})$ with high probability against the global maximum under fair odds. More broadly, the inherent trade-off between profit and information-seeking in binary betting may inspire new understandings of large-scale multi-agent behaviour.

Submitted 6 June, 2024; originally announced June 2024.

Comments: ICML 2024

arXiv:2405.18686 [pdf, other]

Rejection via Learning Density Ratios

Authors: Alexander Soen, Hisham Husain, Philip Schulz, Vu Nguyen

Abstract: Classification with rejection emerges as a learning paradigm which allows models to abstain from making predictions. The predominant approach is to alter the supervised learning pipeline by augmenting typical loss functions, letting model rejection incur a lower loss than an incorrect prediction. Instead, we propose a different distributional perspective, where we seek to find an idealized data distribution which maximizes a pretrained model's performance. This can be formalized via the optimization of a loss's risk with a $φ$-divergence regularization term. Through this idealized distribution, a rejection decision can be made by utilizing the density ratio between this distribution and the data distribution. We focus on the setting where our $φ$-divergences are specified by the family of $α$-divergences. Our framework is tested empirically over clean and noisy datasets.
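As a rough illustration of the density-ratio idea in the abstract above, the sketch below assumes the regularizer is the KL divergence, one member of the $α$-divergence family. Under that assumption, minimizing the expected loss plus `lam` times KL(Q || P) over distributions Q has the standard Gibbs solution, whose density ratio to the data distribution is proportional to exp(-loss / lam). The function names `density_ratios` and `reject`, the parameters `lam` and `tau`, and the loss values are hypothetical and are not taken from the paper; how the ratio would be estimated for unlabeled inputs is outside this sketch.

```python
import math

def density_ratios(losses, lam):
    """Ratio dQ/dP at each sample for the KL-regularized idealized distribution.

    Q minimizes E_Q[loss] + lam * KL(Q || P); its ratio to P is
    exp(-loss / lam), normalized so the ratios average to 1 over the sample.
    """
    # Subtract the minimum loss before exponentiating for numerical stability.
    base = min(losses)
    weights = [math.exp(-(l - base) / lam) for l in losses]
    mean_w = sum(weights) / len(weights)
    return [w / mean_w for w in weights]

def reject(losses, lam=1.0, tau=0.5):
    """Flag samples whose density ratio falls below the threshold tau."""
    return [r < tau for r in density_ratios(losses, lam)]

# Hypothetical per-sample losses of a pretrained model.
losses = [0.1, 0.2, 0.15, 3.0]
ratios = density_ratios(losses, lam=1.0)
flags = reject(losses, lam=1.0, tau=0.5)
```

In this toy sample only the high-loss point receives a small ratio, so it is the only one flagged for rejection; a smaller `lam` would concentrate the idealized distribution more sharply on low-loss points.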

Submitted 28 May, 2024; originally announced May 2024.

arXiv:2402.05379 [pdf, other]

Tradeoffs of Diagonal Fisher Information Matrix Estimators

Authors: Alexander Soen, Ke Sun

Abstract: The Fisher information matrix characterizes the local geometry in the parameter space of neural networks. It elucidates insightful theories and useful tools to understand and optimize neural networks. Given its high computational cost, practitioners often use random estimators and evaluate only the diagonal entries. We examine two such estimators, whose accuracy and sample complexity depend on their associated variances. We derive bounds of the variances and instantiate them in regression and classification networks. We navigate trade-offs of both estimators based on analytical and numerical studies. We find that the variance quantities depend on the non-linearity with respect to different parameter groups and should not be neglected when estimating the Fisher information.

Submitted 2 April, 2024; v1 submitted 7 February, 2024; originally announced February 2024.

arXiv:2402.04163 [pdf, ps, other]

Tempered Calculus for ML: Application to Hyperbolic Model Embedding

Authors: Richard Nock, Ehsan Amid, Frank Nielsen, Alexander Soen, Manfred K. Warmuth

Abstract: Most mathematical distortions used in ML are fundamentally integral in nature: $f$-divergences, Bregman divergences, (regularized) optimal transport distances, integral probability metrics, geodesic distances, etc. In this paper, we unveil a grounded theory and tools which can help improve these distortions to better cope with ML requirements. We start with a generalization of Riemann integration that also encapsulates functions that are not strictly additive but are, more generally, $t$-additive, as in nonextensive statistical mechanics. Notably, this recovers Volterra's product integral as a special case. We then generalize the Fundamental Theorem of calculus using an extension of the (Euclidean) derivative. This, along with a series of more specific Theorems, serves as a basis for results showing how one can specifically design, alter, or change fundamental properties of distortion measures in a simple way, with a special emphasis on geometric- and ML-related properties that are the metricity, hyperbolicity, and encoding. We show how to apply it to a problem that has recently gained traction in ML: hyperbolic embeddings with a "cheap" and accurate encoding along the hyperbolic vs Euclidean scale. We unveil a new application for which the Poincaré disk model has very appealing features, and our theory comes in handy: model embeddings for boosted combinations of decision trees, trained using the log-loss (trees) and logistic loss (combinations).

Submitted 8 February, 2024; v1 submitted 6 February, 2024; originally announced February 2024.

ACM Class: I.2.6

arXiv:2302.14346 [pdf, other]

Sampled Transformer for Point Sets

Authors: Shidi Li, Christian Walder, Alexander Soen, Lexing Xie, Miaomiao Liu

Abstract: The sparse transformer can reduce the computational complexity of the self-attention layers to $O(n)$, whilst still being a universal approximator of continuous sequence-to-sequence functions. However, this permutation variant operation is not appropriate for direct application to sets. In this paper, we propose an $O(n)$ complexity sampled transformer that can process point set elements directly without any additional inductive bias. Our sampled transformer introduces random element sampling, which randomly splits point sets into subsets, followed by applying a shared Hamiltonian self-attention mechanism to each subset. The overall attention mechanism can be viewed as a Hamiltonian cycle in the complete attention graph, and the permutation of point set elements is equivalent to randomly sampling Hamiltonian cycles. This mechanism implements a Monte Carlo simulation of the $O(n^2)$ dense attention connections. We show that it is a universal approximator for continuous set-to-set functions. Experimental results on point-clouds show comparable or better accuracy with significantly reduced computational complexity compared to the dense transformer or alternative sparse attention schemes.

Submitted 28 February, 2023; originally announced February 2023.

arXiv:2201.12947 [pdf, other]

Fair Wrapping for Black-box Predictions

Authors: Alexander Soen, Ibrahim Alabdulmohsin, Sanmi Koyejo, Yishay Mansour, Nyalleng Moorosi, Richard Nock, Ke Sun, Lexing Xie

Abstract: We introduce a new family of techniques to post-process ("wrap") a black-box classifier in order to reduce its bias. Our technique builds on the recent analysis of improper loss functions whose optimization can correct any twist in prediction, unfairness being treated as a twist. In the post-processing, we learn a wrapper function which we define as an $α$-tree, which modifies the prediction. We provide two generic boosting algorithms to learn $α$-trees. We show that our modification has appealing properties in terms of composition of $α$-trees, generalization, interpretability, and KL divergence between modified and original predictions. We exemplify the use of our technique in three fairness notions: conditional value-at-risk, equality of opportunity, and statistical parity; and provide experiments on several readily available datasets.

Submitted 1 November, 2022; v1 submitted 30 January, 2022; originally announced January 2022.

Comments: Published in Advances in Neural Information Processing Systems 35 (NeurIPS 2022)

arXiv:2111.02062 [pdf, other]

Linking Across Data Granularity: Fitting Multivariate Hawkes Processes to Partially Interval-Censored Data

Authors: Pio Calderon, Alexander Soen, Marian-Andrei Rizoiu

Abstract: The multivariate Hawkes process (MHP) is widely used for analyzing data streams that interact with each other, where events generate new events within their own dimension (via self-excitation) or across different dimensions (via cross-excitation). However, in certain applications, the timestamps of individual events in some dimensions are unobservable, and only event counts within intervals are known, referred to as partially interval-censored data. The MHP is unsuitable for handling such data since its estimation requires event timestamps. In this study, we introduce the Partial Mean Behavior Poisson (PMBP) process, a novel point process which shares parameter equivalence with the MHP and can effectively model both timestamped and interval-censored data. We demonstrate the capabilities of the PMBP process using synthetic and real-world datasets. Firstly, we illustrate that the PMBP process can approximate MHP parameters and recover the spectral radius using synthetic event histories. Next, we assess the performance of the PMBP process in predicting YouTube popularity and find that it surpasses state-of-the-art methods. Lastly, we leverage the PMBP process to gain qualitative insights from a dataset comprising daily COVID-19 case counts from multiple countries and COVID-19-related news articles. By clustering the PMBP-modeled countries, we unveil hidden interaction patterns between occurrences of COVID-19 cases and news reporting.

Submitted 5 October, 2023; v1 submitted 3 November, 2021; originally announced November 2021.

Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

arXiv:2108.09940 [pdf]

Patterns of ICT usage in disaster in Samoa

Authors: Ioana Chan Mow, Agnes Wong Soon, Elisapeta Maua'i, Ainsley Anesone

Abstract: The study discussed in this paper focuses on ICT use during disasters in Samoa and is a replication of a study carried out in 2015. The study used a survey to explore how Samoan citizens use technology, act on different types of information, and how the information source or media affects decisions to act during a disaster. Findings revealed that traditional broadcasting was still the most prominent and most important medium, and still predominated in early warning and disaster response. However, there was now increasing usage of mobile and social media in disaster communications. Findings also revealed that people trust official reporters the most as a source of information in times of crisis. The intent is that findings from this study can contribute to a people-centred approach to early warning and disaster response, empowering affected individuals to act in a timely and appropriate manner to ensure survival in times of disaster.

Submitted 23 August, 2021; originally announced August 2021.

Comments: In proceedings of the 1st Virtual Conference on Implications of Information and Digital Technologies for Development, 2021

arXiv:2107.04205 [pdf, other]

On the Variance of the Fisher Information for Deep Learning

Authors: Alexander Soen, Ke Sun

Abstract: In the realm of deep learning, the Fisher information matrix (FIM) gives novel insights and useful tools to characterize the loss landscape, perform second-order optimization, and build geometric learning theories. The exact FIM is either unavailable in closed form or too expensive to compute. In practice, it is almost always estimated based on empirical samples. We investigate two such estimators based on two equivalent representations of the FIM -- both unbiased and consistent. Their estimation quality is naturally gauged by their variance given in closed form. We analyze how the parametric structure of a deep neural network can affect the variance. The meaning of this variance measure and its upper bounds are then discussed in the context of deep learning.
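To make the "two equivalent representations" concrete, the sketch below assumes they are the standard ones: the expected outer product of the score and the negative expected Hessian of the log-likelihood. It estimates only the diagonal entries for a plain logistic regression at a single fixed input, which is a deliberately simple stand-in for a deep network. The function name `fim_diag_estimators` and all parameter values are hypothetical.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fim_diag_estimators(theta, x, n_samples, rng):
    """Monte Carlo estimates of the diagonal FIM of logistic regression at input x.

    Returns (score_based, hessian_based), each a list with one entry per parameter.
    """
    p = sigmoid(sum(t * xi for t, xi in zip(theta, x)))
    score_sq = [0.0] * len(theta)
    neg_hess = [0.0] * len(theta)
    for _ in range(n_samples):
        # Sample the label from the model itself, as the FIM expectation requires.
        y = 1.0 if rng.random() < p else 0.0
        for i, xi in enumerate(x):
            # Estimator 1: squared score, d/dtheta_i log p(y|x) = (y - p) * x_i.
            score_sq[i] += ((y - p) * xi) ** 2
            # Estimator 2: negative second derivative, p * (1 - p) * x_i^2.
            # For this model it does not depend on y, so it has zero variance.
            neg_hess[i] += p * (1.0 - p) * xi ** 2
    return ([s / n_samples for s in score_sq],
            [h / n_samples for h in neg_hess])

rng = random.Random(0)
theta, x = [0.5, -1.0], [1.0, 2.0]
est_score, est_hess = fim_diag_estimators(theta, x, 20000, rng)
p = sigmoid(sum(t * xi for t, xi in zip(theta, x)))
exact = [p * (1.0 - p) * xi ** 2 for xi in x]
```

Both estimates converge to the same exact diagonal, but in this toy model the Hessian-based one is exact for any sample size while the score-based one fluctuates. That difference in variance is specific to this simple model and should not be read as a general ranking of the two estimators.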

Submitted 27 October, 2021; v1 submitted 9 July, 2021; originally announced July 2021.

Comments: Published in Advances in Neural Information Processing Systems 34 (NeurIPS 2021)

arXiv:2106.00764 [pdf, other]

HisVA: A Visual Analytics System for Studying History

Authors: Dongyun Han, Gorakh Parsad, Hwiyeon Kim, Jaekyom Shim, Oh-Sang Kwon, Kyung A Son, Jooyoung Lee, Isaac Cho, Sungahn Ko

Abstract: Studying history involves many difficult tasks. Examples include searching for proper data in a large event space, understanding stories of historical events by time and space, and finding relationships among events that may not be apparent. Instructors who extensively use well-organized and well-argued materials (e.g., textbooks and online resources) can lead students to a narrow perspective in understanding history and prevent spontaneous investigation of historical events, with the students asking their own questions. In this work, we propose HisVA, a visual analytics system that allows the efficient exploration of historical events from Wikipedia using three views: event, map, and resource. HisVA provides an effective event exploration space, where users can investigate relationships among historical events by reviewing and linking them in terms of space and time. To evaluate our system, we present two usage scenarios, a user study with a qualitative analysis of user exploration strategies, and expert feedback with in-class deployment results.

Submitted 2 June, 2021; v1 submitted 1 June, 2021; originally announced June 2021.

arXiv:2104.07932 [pdf, other]

Interval-censored Hawkes processes

Authors: Marian-Andrei Rizoiu, Alexander Soen, Shidi Li, Pio Calderon, Leanne Dong, Aditya Krishna Menon, Lexing Xie

Abstract: Interval-censored data solely records the aggregated counts of events during specific time intervals - such as the number of patients admitted to the hospital or the volume of vehicles passing traffic loop detectors - and not the exact occurrence time of the events. It is currently not understood how to fit the Hawkes point processes to this kind of data. Its typical loss function (the point process log-likelihood) cannot be computed without exact event times. Furthermore, it does not have the independent increments property to use the Poisson likelihood. This work builds a novel point process, a set of tools, and approximations for fitting Hawkes processes within interval-censored data scenarios. First, we define the Mean Behavior Poisson process (MBPP), a novel Poisson process with a direct parameter correspondence to the popular self-exciting Hawkes process. We fit MBPP in the interval-censored setting using an interval-censored Poisson log-likelihood (IC-LL). We use the parameter equivalence to uncover the parameters of the associated Hawkes process. Second, we introduce two novel exogenous functions to distinguish the exogenous from the endogenous events. We propose the multi-impulse exogenous function - for when the exogenous events are observed as event time - and the latent homogeneous Poisson process exogenous function - for when the exogenous events are presented as interval-censored volumes. Third, we provide several approximation methods to estimate the intensity and compensator function of MBPP when no analytical solution exists. Fourth and finally, we connect the interval-censored loss of MBPP to a broader class of Bregman divergence-based functions. Using the connection, we show that the popularity estimation algorithm Hawkes Intensity Process (HIP) is a particular case of the MBPP. We verify our models through empirical testing on synthetic data and real-world data.
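The interval-censored Poisson log-likelihood mentioned in the abstract above can be sketched for a generic inhomogeneous Poisson process: counts in disjoint intervals are independent Poisson variables whose means are differences of the compensator (the integrated intensity). The example below is a minimal sketch under the assumption of a hypothetical intensity `mu + a * exp(-b * t)`; it does not reproduce the MBPP intensity from the paper, and the names `compensator` and `ic_poisson_loglik` are illustrative only.

```python
import math

def compensator(t, mu, a, b):
    """Integral over [0, t] of the intensity mu + a * exp(-b * s)."""
    return mu * t + (a / b) * (1.0 - math.exp(-b * t))

def ic_poisson_loglik(edges, counts, mu, a, b):
    """Log-likelihood of counts observed in the intervals (edges[i], edges[i+1]]."""
    total = 0.0
    for lo, hi, c in zip(edges[:-1], edges[1:], counts):
        mean = compensator(hi, mu, a, b) - compensator(lo, mu, a, b)
        # Poisson log-pmf: c * log(mean) - mean - log(c!)
        total += c * math.log(mean) - mean - math.lgamma(c + 1)
    return total

# Hypothetical interval boundaries and the event counts observed in each interval.
edges = [0.0, 1.0, 2.0, 3.0]
counts = [5, 3, 1]
ll_decay = ic_poisson_loglik(edges, counts, mu=1.0, a=5.0, b=1.0)
ll_flat = ic_poisson_loglik(edges, counts, mu=1.0, a=1e-9, b=1.0)
```

Because the counts decay over time, the decaying intensity scores a higher log-likelihood than the nearly constant one. Fitting would amount to maximizing this quantity over the parameters, with any numerical optimizer the reader prefers.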

Submitted 25 November, 2022; v1 submitted 16 April, 2021; originally announced April 2021.

Journal ref: Journal of Machine Learning Research, 23(338):1-84, 2022. https://jmlr.org/papers/v23/21-0917.html

arXiv:2012.00188 [pdf, other]

Fair Densities via Boosting the Sufficient Statistics of Exponential Families

Authors: Alexander Soen, Hisham Husain, Richard Nock

Abstract: We introduce a boosting algorithm to pre-process data for fairness. Starting from an initial fair but inaccurate distribution, our approach shifts towards better data fitting while still ensuring a minimal fairness guarantee. To do so, it learns the sufficient statistics of an exponential family with boosting-compliant convergence. Importantly, we are able to theoretically prove that the learned distribution will have a representation rate and statistical rate data fairness guarantee. Unlike recent optimization-based pre-processing methods, our approach can be easily adapted for continuous domain features. Furthermore, when the weak learners are specified to be decision trees, the sufficient statistics of the learned distribution can be examined to provide clues on sources of (un)fairness. Empirical results are presented to demonstrate the quality of the results on real-world data.

Submitted 15 August, 2023; v1 submitted 30 November, 2020; originally announced December 2020.

Comments: Published in Proceedings of the 40th International Conference on Machine Learning (ICML2023)

arXiv:2007.14082 [pdf, other]

UNIPoint: Universally Approximating Point Processes Intensities

Authors: Alexander Soen, Alexander Mathews, Daniel Grixti-Cheng, Lexing Xie

Abstract: Point processes are a useful mathematical tool for describing events over time, and so there are many recent approaches for representing and learning them. One notable open question is how to precisely describe the flexibility of point process models and whether there exists a general model that can represent all point processes. Our work bridges this gap. Focusing on the widely used event intensity function representation of point processes, we provide a proof that a class of learnable functions can universally approximate any valid intensity function. The proof connects the well known Stone-Weierstrass Theorem for function approximation, the uniform density of non-negative continuous functions using a transfer function, the formulation of the parameters of piece-wise continuous functions as a dynamic system, and a recurrent neural network implementation for capturing the dynamics. Using these insights, we design and implement UNIPoint, a novel neural point process model, using recurrent neural networks to parameterise sums of basis functions upon each event. Evaluations on synthetic and real world datasets show that this simpler representation performs better than Hawkes process variants and more complex neural network-based approaches. We expect this result will provide a practical basis for selecting and tuning models, as well as furthering theoretical work on representational complexity and learnability.

Submitted 2 March, 2021; v1 submitted 28 July, 2020; originally announced July 2020.

arXiv:1907.12748 [pdf, other]

Influence Flowers of Academic Entities

Authors: Minjeong Shin, Alexander Soen, Benjamin T. Readshaw, Stephen M. Blackburn, Mitchell Whitelaw, Lexing Xie

Abstract: We present the Influence Flower, a new visual metaphor for the influence profile of academic entities, including people, projects, institutions, conferences, and journals. While many tools quantify influence, we aim to expose the flow of influence between entities. The Influence Flower is an ego-centric graph, with a query entity placed in the centre. The petals are styled to reflect the strength of influence to and from other entities of the same or different type. For example, one can break down the incoming and outgoing influences of a research lab by research topics. The Influence Flower uses a recent snapshot of Microsoft Academic Graph, consisting of 212 million authors, their 176 million publications, and 1.2 billion citations. An interactive web app, Influence Map, is constructed around this central metaphor for searching and curating visualisations. We also propose a visual comparison method that highlights change in influence patterns over time. We demonstrate through several case studies that the Influence Flower supports data-driven inquiries about the following: researchers' careers over time; paper(s) and projects, including those with delayed recognition; the interdisciplinary profile of a research institution; and the shifting topical trends in conferences. We also use this tool on influence data beyond academic citations, by contrasting the academic and Twitter activities of a researcher.

Submitted 30 July, 2019; originally announced July 2019.

Comments: VAST 2019

Showing 1–14 of 14 results for author: Soen, A