Search | arXiv e-print repository

arXiv:2404.17857 [pdf, other]

Bayesian analysis of biomarker levels can predict time of recurrence of prostate cancer with strictly positive apparent Shannon information against an exponential attrition prior

Authors: Roger Sewell, Elisabeth Crowe, Sharokh F. Shariat

Abstract: Shariat et al previously investigated the possibility of predicting from clinical data (including Gleason grade and stage) and preoperative biomarkers, which of any pair of patients would suffer recurrence of prostate cancer first. We wished to establish the extent to which predictions of time of relapse from such a model could be improved upon using Bayesian methods. The same dataset was reanalysed with a Bayesian skew-Student mixture model. Predictions were made of which of any pair of patients would relapse first and of the time of relapse. The benefit of using these biomarkers relative to predictions made without them was measured by the apparent Shannon information, using as prior an exponential attrition model of relapse time independent of input variables. Using half the dataset for training and the other half for testing, predictions of relapse time from the strict Cox model gave $-\infty$ nepers of apparent Shannon information (it predicts that relapse can only occur at times when patients in the training set relapsed). Deliberately smoothed predictions from the Cox model gave -0.001 (-0.131 to +0.120) nepers, while the Bayesian model gave +0.109 (+0.021 to +0.192) nepers (mean, 2.5 to 97.5 centiles), being positive with posterior probability 0.993 and beating the blurred Cox model with posterior probability 0.927. These predictions from the Bayesian model thus outperform those of the Cox model, but the overall yield of predictive information leaves scope for improvement of the range of biomarkers in use.
The Bayesian model presented here is the first such model for prostate cancer to consider the variation of relapse hazard with biomarker concentrations to be smooth, as is intuitive. It is also the first to be shown to provide more apparent Shannon information than the Cox model or to be shown to provide positive apparent information relative to an exponential prior.

Submitted 28 June, 2024; v1 submitted 27 April, 2024; originally announced April 2024.

Comments: 16 pages, 8 figures; v2 clarifies that Gleason grade and stage were also included in the data vector for each patient

MSC Class: 62P10 (primary) 62N02 (secondary)

arXiv:2404.15764 [pdf, other]

Assessment of the quality of a prediction

Authors: Roger Sewell, Elisabeth Crowe, Sharokh F. Shariat

Abstract: Shannon defined the mutual information between two variables. We illustrate why the true mutual information between a variable and the predictions made by a prediction algorithm is not a suitable measure of prediction quality, but the apparent Shannon mutual information (ASI) is; indeed it is the unique prediction quality measure with either of two very different lists of desirable properties, as previously shown by de Finetti and other authors. However, estimating the uncertainty of the ASI is a difficult problem, because of long and non-symmetric heavy tails to the distribution of the individual values of $j(x,y)=\log\frac{Q_y(x)}{P(x)}$. We propose a Bayesian modelling method for the distribution of $j(x,y)$, from the posterior distribution of which the uncertainty in the ASI can be inferred. This method is based on Dirichlet-based mixtures of skew-Student distributions. We illustrate its use on data from a Bayesian model for prediction of the recurrence time of prostate cancer. We believe that this approach is generally appropriate for most problems, where it is infeasible to derive the explicit distribution of the samples of $j(x,y)$, though the precise modelling parameters may need adjustment to suit particular cases.

Submitted 3 July, 2024; v1 submitted 24 April, 2024; originally announced April 2024.

Comments: 16 pages, 3 figures; v4 fixes two minus signs and corrects "inverse of the mean of" to "mean of the inverse of" in appendix B.6

MSC Class: 62B10 (Primary) 62F15; 62J20; 62P10 (Secondary)
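
The quantity $j(x,y)=\log\frac{Q_y(x)}{P(x)}$ from the abstract above can be illustrated with a minimal Python sketch. It assumes that the ASI is estimated as the sample mean of $j$ over test cases, measured in nepers (natural logarithm). The function name and the two input lists are hypothetical and are not taken from the paper.

```python
import math

def apparent_shannon_information(q_values, p_values):
    """Sample mean of j = log(Q_y(x) / P(x)), in nepers.

    q_values[i]: density the predictor assigned to the outcome that
                 actually occurred in test case i (assumed input).
    p_values[i]: density the prior assigned to that same outcome.
    """
    if len(q_values) != len(p_values) or not q_values:
        raise ValueError("need two non-empty lists of equal length")
    total = 0.0
    for q, p in zip(q_values, p_values):
        if p <= 0.0 or q < 0.0:
            raise ValueError("prior must be positive, prediction non-negative")
        if q == 0.0:
            # The predictor ruled out something that happened:
            # log(0) makes the whole average minus infinity.
            return -math.inf
        total += math.log(q / p)
    return total / len(q_values)
```

Under these assumptions, a predictor that assigns zero density to any observed outcome scores minus infinity, which is consistent with the $-\infty$ nepers reported for the strict Cox model in the first entry of this listing. A predictor that merely reproduces the prior scores zero.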

arXiv:1711.11175 [pdf, ps, other]

Towards Data Quality Assessment in Online Advertising

Authors: Sahin Cem Geyik, Jianqiang Shen, Shahriar Shariat, Ali Dasdan, Santanu Kolay

Abstract: In online advertising, our aim is to match the advertisers with the most relevant users to optimize the campaign performance. In the pursuit of achieving this goal, multiple data sources provided by the advertisers or third-party data providers are utilized to choose the set of users according to the advertisers' targeting criteria. In this paper, we present a framework that can be applied to assess the quality of such data sources in large scale. This framework efficiently evaluates the similarity of a specific data source categorization to that of the ground truth, especially for those cases when the ground truth is accessible only in aggregate, and the user-level information is anonymized or unavailable due to privacy reasons. We propose multiple methodologies within this framework, present some preliminary assessment results, and evaluate how the methodologies compare to each other. We also present two use cases where we can utilize the data quality assessment results: the first use case is targeting specific user categories, and the second one is forecasting the desirable audiences we can reach for an online advertising campaign with pre-set targeting criteria.

Submitted 29 November, 2017; originally announced November 2017.

Comments: 10 pages, 7 Figures. This work has been presented in the KDD 2016 Workshop on Enterprise Intelligence

Journal ref: KDD 2016 Workshop on Enterprise Intelligence

arXiv:1609.08201 [pdf, other]

doi 10.1007/s10115-015-0898-4

Robust Time-Series Retrieval Using Probabilistic Adaptive Segmental Alignment

Authors: Shahriar Shariat, Vladimir Pavlovic

Abstract: Traditional pairwise sequence alignment is based on matching individual samples from two sequences, under time monotonicity constraints. However, in many application settings matching subsequences (segments) instead of individual samples may bring in additional robustness to noise or local non-causal perturbations. This paper presents an approach to segmental sequence alignment that jointly segments and aligns two sequences, generalizing the traditional per-sample alignment. To accomplish this task, we introduce a distance metric between segments based on average pairwise distances and then present a modified pair-HMM (PHMM) that incorporates the proposed distance metric to solve the joint segmentation and alignment task. We also propose a relaxation to our model that improves the computational efficiency of the generic segmental PHMM. Our results demonstrate that this new measure of sequence similarity can lead to improved classification performance, while being resilient to noise, on a variety of sequence retrieval problems, from EEG to motion sequence classification.

Submitted 26 September, 2016; originally announced September 2016.

Journal ref: Knowl Inf Syst (2016) 49: 91. doi:10.1007/s10115-015-0898-4
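
The abstract above mentions a distance between segments based on average pairwise distances. As a rough sketch of that idea only, the Python function below averages the Euclidean distance over every pair of samples drawn from two segments. The choice of Euclidean distance, the function name, and the tuple-based sample format are assumptions made for illustration; the paper's actual metric and the pair-HMM that uses it are not reproduced here.

```python
import math

def average_pairwise_distance(segment_a, segment_b):
    """Mean Euclidean distance over all cross-segment sample pairs.

    Each segment is a non-empty list of equal-dimension points
    (tuples of floats). This is an illustrative stand-in, not the
    metric defined in the paper.
    """
    if not segment_a or not segment_b:
        raise ValueError("both segments must contain at least one sample")
    total = sum(math.dist(a, b) for a in segment_a for b in segment_b)
    return total / (len(segment_a) * len(segment_b))
```

Because every sample in one segment is compared with every sample in the other, a single noisy sample shifts the average only slightly, which gives an intuition for the robustness the abstract describes.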

arXiv:1602.07057 [pdf, other]

Finding Needle in a Million Metrics: Anomaly Detection in a Large-scale Computational Advertising Platform

Authors: Bowen Zhou, Shahriar Shariat

Abstract: Online media offers opportunities to marketers to deliver brand messages to a large audience. Advertising technology platforms enable advertisers to find the proper group of audiences and deliver ad impressions to them in real time. The recent growth of real-time bidding has posed a significant challenge for monitoring such a complicated system. With so many components, we need a reliable system that detects possible changes in the system and alerts the engineering team. In this paper we describe the mechanism that we invented for recovering the representative metrics and detecting changes in their behavior. We show, by describing some incident cases, that this mechanism is able to detect possible problems in time.

Submitted 23 February, 2016; originally announced February 2016.

arXiv:1508.07678 [pdf, other]

doi 10.1109/ICDM.2015.32

Online Model Evaluation in a Large-Scale Computational Advertising Platform

Authors: Shahriar Shariat, Burkay Orten, Ali Dasdan

Abstract: Online media provides opportunities for marketers through which they can deliver effective brand messages to a wide range of audiences. Advertising technology platforms enable advertisers to reach their target audience by delivering ad impressions to online users in real time. In order to identify the best marketing message for a user and to purchase impressions at the right price, we rely heavily on bid prediction and optimization models. Even though the bid prediction models are well studied in the literature, the equally important subject of model evaluation is usually overlooked. Effective and reliable evaluation of an online bidding model is crucial for making faster model improvements as well as for utilizing the marketing budgets more efficiently. In this paper, we present an experimentation framework for bid prediction models where our focus is on the practical aspects of model evaluation. Specifically, we outline the unique challenges we encounter in our platform due to a variety of factors such as heterogeneous goal definitions, varying budget requirements across different campaigns, high seasonality and the auction-based environment for inventory purchasing. Then, we introduce return on investment (ROI) as a unified model performance (i.e., success) metric and explain its merits over more traditional metrics such as click-through rate (CTR) or conversion rate (CVR).
Most importantly, we discuss commonly used evaluation and metric summarization approaches in detail and propose a more accurate method for online evaluation of new experimental models against the baseline. Our meta-analysis-based approach addresses various shortcomings of other methods and yields statistically robust conclusions that allow us to conclude experiments more quickly in a reliable manner. We demonstrate the effectiveness of our evaluation strategy on real campaign data through some experiments.

Submitted 31 August, 2015; originally announced August 2015.

Comments: Accepted to ICDM2015

Journal ref: ICDM (2015) pp. 369 - 378
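
The abstract above describes a meta-analysis-based way of summarizing experiment results across campaigns without giving details. As a generic illustration of what such a summary can look like, the Python sketch below applies standard fixed-effect inverse-variance pooling to per-campaign effect estimates. This is a textbook technique chosen as an assumption and is not claimed to be the paper's method; the function name and the inputs (for example, a per-campaign ROI lift with its variance) are hypothetical.

```python
import math

def fixed_effect_summary(effects, variances):
    """Inverse-variance weighted mean of per-campaign effects.

    effects[i]:   estimated effect in campaign i (hypothetical input).
    variances[i]: variance of that estimate; must be positive.
    Returns (pooled_effect, standard_error).
    """
    if len(effects) != len(variances) or not effects:
        raise ValueError("need two non-empty lists of equal length")
    if any(v <= 0.0 for v in variances):
        raise ValueError("variances must be positive")
    # Precise campaigns (small variance) receive large weights.
    weights = [1.0 / v for v in variances]
    weight_sum = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, effects)) / weight_sum
    standard_error = math.sqrt(1.0 / weight_sum)
    return pooled, standard_error
```

Weighting by inverse variance lets campaigns with noisy estimates contribute less than campaigns with tight estimates, which is one common reason to prefer a meta-analytic summary over a simple average of per-campaign metrics.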

Showing 1–6 of 6 results for author: Shariat, S