
doi 10.1109/BigData59044.2023.10386162

Improving conversion rate prediction via self-supervised pre-training in online advertising

Authors: Alex Shtoff, Yohay Kaplan, Ariel Raviv

Abstract: The task of predicting conversion rates (CVR) lies at the heart of online advertising systems aiming to optimize bids to meet advertiser performance requirements. Even with the recent rise of deep neural networks, these predictions are often made by factorization machines (FM), especially in commercial settings where inference latency is key. These models are trained using the logistic regression framework on labeled tabular data formed from past user activity that is relevant to the task at hand. Many advertisers only care about click-attributed conversions. A major challenge in training models that predict conversions-given-clicks comes from data sparsity: clicks are rare, and conversions attributed to clicks are even rarer. However, mitigating sparsity by adding conversions that are not click-attributed to the training set impairs model calibration. Since calibration is critical to achieving advertiser goals, this approach is infeasible. In this work we use the well-known idea of self-supervised pre-training, and use an auxiliary auto-encoder model trained on all conversion events, both click-attributed and not, as a feature extractor to enrich the main CVR prediction model. Since the main model does not train on non-click-attributed conversions, this does not impair calibration.
We adapt the basic self-supervised pre-training idea to our online advertising setup by using a loss function designed for tabular data, facilitating continual learning by ensuring auto-encoder stability, and incorporating a neural network into a large-scale real-time ad auction that ranks tens of thousands of ads, under strict latency constraints, and without incurring a major engineering cost. We show improvements both offline, during training, and in an online A/B test. Following its success in A/B tests, our solution is now fully deployed to the Yahoo native advertising system.

Submitted 25 January, 2024; originally announced January 2024.
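The abstract states that CVR predictions are often made by factorization machines trained in the logistic regression framework, enriched with features from a pre-trained auto-encoder. As a generic illustration only (not the authors' model), the sketch below computes a standard second-order FM score in O(nk) time via the usual pairwise-interaction identity and maps it through a logistic link. The `enrich` helper is hypothetical: it simply appends encoder outputs to the raw feature vector, which is one simple way such extracted features could be fed to the main model.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fm_logit(x: Sequence[float], w0: float, w: Sequence[float],
             v: Sequence[Sequence[float]]) -> float:
    """Second-order factorization machine score.

    Computes w0 + sum_i w_i x_i + sum_{i<j} <v_i, v_j> x_i x_j, where the
    pairwise term uses the O(n*k) identity
    0.5 * sum_f [(sum_i v_if x_i)^2 - sum_i (v_if x_i)^2].
    """
    linear = w0 + sum(wi * xi for wi, xi in zip(w, x))
    k = len(v[0]) if v else 0
    pairwise = 0.0
    for f in range(k):
        s = sum(v[i][f] * x[i] for i in range(len(x)))
        s_sq = sum((v[i][f] * x[i]) ** 2 for i in range(len(x)))
        pairwise += 0.5 * (s * s - s_sq)
    return linear + pairwise


def fm_predict(x: Sequence[float], w0: float, w: Sequence[float],
               v: Sequence[Sequence[float]]) -> float:
    """Logistic link: maps the FM score to a probability such as pCVR."""
    return sigmoid(fm_logit(x, w0, w, v))


def enrich(x: Sequence[float], encoder_features: Sequence[float]) -> List[float]:
    """Hypothetical helper: append auto-encoder outputs to the raw features."""
    return list(x) + list(encoder_features)
```

The identity avoids the explicit double loop over feature pairs, which is one reason FMs remain attractive when inference latency matters.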

arXiv:2312.07160 [pdf, other]

Audience Prospecting for Dynamic-Product-Ads in Native Advertising

Authors: Eliran Abutbul, Yohay Kaplan, Naama Krasne, Oren Somekh, Or David, Omer Duvdevany, Evgeny Segal

Abstract: With yearly revenue exceeding one billion USD, the Yahoo Gemini native advertising marketplace serves more than two billion impressions daily to hundreds of millions of unique users. One of the fastest-growing segments of Gemini native is dynamic-product-ads (DPA), where major advertisers, such as Amazon and Walmart, provide catalogs with millions of products for the system to choose from and present to users. The subject of this work is finding and expanding the right audience for each DPA ad, which is one of the many challenges DPA presents. Approaches such as targeting various user groups, e.g., users who already visited the advertisers' websites (Retargeting), users who searched for certain products (Search-Prospecting), or users who reside in preferred locations (Location-Prospecting), have limited audience expansion capabilities. In this work we present two new approaches for audience expansion that also maintain predefined performance goals. The Conversion-Prospecting approach predicts DPA conversion rates based on Gemini native logged data, and calculates the expected cost-per-action (CPA) for determining users' eligibility for products and optimizing DPA bids in Gemini native auctions. To support new advertisers and products, the Trending-Prospecting approach matches trending products to users by learning their tendency towards products from events logged on advertisers' sites. The tendency scores indicate the popularity of the product and the similarity of the user to those who have previously engaged with this product.
The two new prospecting approaches were tested online, serving real Gemini native traffic, demonstrating impressive DPA delivery and DPA revenue lifts while maintaining most traffic within the acceptable CPA range (i.e., the performance goal). After a successful testing phase, the proposed approaches are currently in production and serve all Gemini native traffic.

Submitted 13 December, 2023; v1 submitted 12 December, 2023; originally announced December 2023.

Comments: In Proc. IEEE BigData'2023 (Industry and Government Program)
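The Conversion-Prospecting abstract says predicted conversion rates are turned into an expected cost-per-action for eligibility and bidding decisions, but it does not give the formula. The sketch below is a minimal illustration under stated assumptions: the ad is billed per click at `cpc`, and `pcvr` is the predicted conversion probability given a click, so the expected spend per conversion is `cpc / pcvr`. The `tolerance` parameter and the eligibility rule itself are hypothetical, not the authors' method.

```python
import math


def expected_cpa(cpc: float, pcvr: float) -> float:
    """Expected cost per action for a click-billed ad (assumed billing model).

    With cost `cpc` per click and conversion probability `pcvr` per click,
    the expected spend per conversion is cpc / pcvr.
    """
    if pcvr <= 0.0:
        return math.inf
    return cpc / pcvr


def is_eligible(cpc: float, pcvr: float, target_cpa: float,
                tolerance: float = 1.0) -> bool:
    """Hypothetical rule: keep a (user, product) pair only if its expected
    CPA stays within `tolerance` times the advertiser's target CPA."""
    return expected_cpa(cpc, pcvr) <= target_cpa * tolerance


def cpa_goal_bid(target_cpa: float, pcvr: float) -> float:
    """Largest click bid whose expected CPA equals the target CPA."""
    return target_cpa * pcvr
```

For example, a 0.5 USD click with a 1% predicted CVR has an expected CPA of 50 USD, so under this rule it would be eligible for a 60 USD target but not for a 40 USD one.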

arXiv:2312.05052 [pdf, other]

doi 10.1145/3357384.3357801

Soft Frequency Capping for Improved Ad Click Prediction in Yahoo Gemini Native

Authors: Michal Aharon, Yohay Kaplan, Rina Levy, Oren Somekh, Ayelet Blanc, Neetai Eshel, Avi Shahar, Assaf Singer, Alex Zlotnik

Abstract: Yahoo's native advertising (also known as Gemini native) serves billions of ad impressions daily, reaching a yearly run-rate of many hundreds of millions of USD. Driving the Gemini native models that are used to predict both click probability (pCTR) and conversion probability (pCONV) is OFFSET, a feature-enhanced collaborative-filtering (CF) based event prediction algorithm. OFFSET is a one-pass algorithm that updates its model for every new batch of logged data using a stochastic gradient descent (SGD) based approach. Since OFFSET represents its users by their features (i.e., a user-less model) due to sparsity issues, rule-based hard frequency capping (HFC) is used to control the number of times a certain user views a certain ad. Moreover, related statistics reveal that user ad fatigue results in a dramatic drop in click-through rate (CTR). Therefore, to improve click prediction accuracy, we propose a soft frequency capping (SFC) approach, where the frequency feature is incorporated into the OFFSET model as a user-ad feature and its weight vector is learned via logistic regression as part of OFFSET training. Online evaluation of the soft frequency capping algorithm via bucket testing showed a significant 7.3% revenue lift. Since then, the model enhanced with the frequency feature has been pushed to production, serving all traffic, and is generating a hefty revenue lift for Yahoo Gemini native. We also report related statistics that reveal, among other things, that while users' gender does not affect ad fatigue, the latter seems to increase with users' age.

Submitted 8 December, 2023; originally announced December 2023.

Comments: In Proc. CIKM'2019. arXiv admin note: text overlap with arXiv:2111.07866 by other authors
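The soft frequency capping idea is that, instead of a hard rule, the number of prior views becomes a feature whose weight is learned by logistic regression with one-pass SGD. OFFSET itself is a collaborative-filtering model with latent vectors; the sketch below deliberately simplifies it to plain logistic regression over sparse binary features, only to show how a frequency feature's weight can be learned jointly with other features. The bucket cap of 10 and the feature names are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def frequency_feature(views: int, max_bucket: int = 10) -> str:
    """Bucket how many times this user has already seen this ad (capped)."""
    return f"freq={min(views, max_bucket)}"


class OnlineLogisticRegression:
    """One-pass SGD logistic regression over sparse binary features."""

    def __init__(self, lr: float = 0.1) -> None:
        self.lr = lr
        self.w: Dict[str, float] = defaultdict(float)

    def predict(self, feats: List[str]) -> float:
        return sigmoid(sum(self.w[f] for f in feats))

    def update(self, feats: List[str], clicked: int) -> None:
        # For binary features, the log-loss gradient w.r.t. each active
        # weight is (prediction - label).
        grad = self.predict(feats) - clicked
        for f in feats:
            self.w[f] -= self.lr * grad
```

Because high-frequency buckets accumulate more skips, their learned weights drop, so predicted CTR decays with repeated exposure without any hard cutoff.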

arXiv:2312.05017 [pdf, other]

doi 10.1145/3459637.3481958

Unbiased Filtering Of Accidental Clicks in Verizon Media Native Advertising

Authors: Yohay Kaplan, Naama Krasne, Alex Shtoff, Oren Somekh

Abstract: Verizon Media (VZM) native advertising is one of VZM's largest and fastest-growing businesses, reaching a run-rate of several hundred million USD in the past year. Driving the VZM native models that are used to predict event probabilities, such as click and conversion probabilities, is OFFSET, a feature-enhanced collaborative-filtering based event-prediction algorithm. In this work we focus on the challenge of predicting click-through rates (CTR) when we are aware that some of the clicks have short dwell-time and are defined as accidental clicks. An accidental click implies little affinity between the user and the ad, so predicting that similar users will click on the ad is inaccurate. Therefore, it may be beneficial to remove clicks with dwell-time lower than a predefined threshold from the training set. However, we cannot ignore these positive events, as filtering them out will cause the model to under-predict. Previous approaches have tried applying filtering and then adding corrective biases to the CTR predictions, but did not yield revenue lifts and therefore were not adopted. In this work, we present a new approach where the positive weight of the accidental clicks is distributed among all of the negative events (skips), based on their likelihood of causing accidental clicks, as predicted by an auxiliary model. These likelihoods are taken as the correct labels of the negative events, which shifts our training away from using only binary labels; accordingly, we adopt a binary cross-entropy loss function in our training process.
After showing offline performance improvements, the modified model was tested online serving VZM native users, and provided a 1.18% revenue lift over the production model, which is agnostic to accidental clicks.

Submitted 8 December, 2023; originally announced December 2023.

Comments: In Proc. CIKM'2021
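The abstract describes removing short-dwell clicks and giving each skip a soft label equal to its auxiliary-model likelihood of causing an accidental click, then training with binary cross-entropy. A minimal sketch of that label construction, assuming each event is a `(clicked, dwell_seconds, aux_prob)` tuple where `aux_prob` is the auxiliary model's output; the tuple layout and threshold handling are illustrative assumptions. If the auxiliary model is well calibrated, the soft labels on skips sum to roughly the number of removed accidental clicks, which is how the positive mass is redistributed rather than dropped.

```python
import math
from typing import List, Tuple

# (clicked, dwell_seconds, aux_accidental_prob) -- assumed event layout.
Event = Tuple[bool, float, float]


def build_training_labels(events: List[Event],
                          dwell_threshold: float) -> List[Tuple[int, float]]:
    """Return (event_index, label) pairs for training."""
    out: List[Tuple[int, float]] = []
    for i, (clicked, dwell, aux_p) in enumerate(events):
        if clicked and dwell >= dwell_threshold:
            out.append((i, 1.0))    # genuine click: hard positive label
        elif clicked:
            continue                # accidental click: removed from training
        else:
            out.append((i, aux_p))  # skip: soft label from auxiliary model
    return out


def bce(y: float, p: float, eps: float = 1e-12) -> float:
    """Binary cross-entropy for a (possibly soft) label y in [0, 1]."""
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))
```

With a soft label y, the cross-entropy is minimized at p = y, so a skip labeled 0.05 pushes the CTR prediction toward 0.05 instead of toward 0.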

arXiv:2211.11524 [pdf, other]

Conversion-Based Dynamic-Creative-Optimization in Native Advertising

Authors: Yohay Kaplan, Yair Koren, Alex Shtoff, Tomer Shadi, Oren Somekh

Abstract: The Yahoo Gemini native advertising marketplace serves billions of impressions daily to hundreds of millions of unique users, and reaches a yearly revenue of many hundreds of millions of USD. Powering the Gemini native models for predicting advertisement (ad) event probabilities, such as conversions and clicks, is OFFSET, a feature-enhanced collaborative-filtering (CF) based event prediction algorithm. The predicted probabilities are then used in Gemini native auctions to determine which ads to present for every serving event (impression). Dynamic creative optimization (DCO) is a Gemini native product that was launched two years ago and is increasingly gaining attention from advertisers. The DCO product enables advertisers to issue several assets for each native ad attribute, creating multiple combinations for each DCO ad. Since different combinations may appeal to different crowds, it may be beneficial to present certain combinations more frequently than others to maximize revenue while keeping advertisers and users satisfied. The initial DCO offering was designed to optimize click-through rates (CTR); however, as the marketplace shifts more towards conversion-based campaigns, advertisers also ask for a conversion-based solution. To accommodate this request, we present a post-auction solution, where DCO ad combinations are favored according to their predicted conversion rate (CVR). The predictions are provided by an auxiliary OFFSET-based combination CVR prediction model, and are used to generate the combination distributions for DCO ad rendering during serving time.
An online evaluation of this explore-exploit solution, via online bucket A/B testing serving Gemini native DCO traffic, showed a 53.5% CVR lift compared to a control bucket serving all combinations uniformly at random.

Submitted 13 November, 2022; originally announced November 2022.

Comments: Accepted to IEEE Big Data 2022 conference
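The DCO abstract describes enumerating asset combinations per ad and rendering them according to a distribution derived from predicted CVR, in an explore-exploit fashion, but does not specify how the distribution is formed. The sketch below uses one simple scheme chosen for illustration: an epsilon-mixture of a uniform distribution (exploration) and a distribution proportional to pCVR (exploitation). The `epsilon` value, the proportional rule, and all function names are assumptions, not the paper's method.

```python
import random
from itertools import product
from typing import Dict, List, Tuple

Combination = Tuple[str, ...]


def enumerate_combinations(assets: Dict[str, List[str]]) -> List[Combination]:
    """All renderable combinations: one asset per attribute.

    Tuple positions follow the attribute names in sorted order.
    """
    attributes = sorted(assets)
    return list(product(*(assets[a] for a in attributes)))


def combination_distribution(pcvr: Dict[Combination, float],
                             epsilon: float = 0.1) -> Dict[Combination, float]:
    """Mix uniform exploration with exploitation proportional to pCVR."""
    n = len(pcvr)
    total = sum(pcvr.values())
    dist: Dict[Combination, float] = {}
    for combo, p in pcvr.items():
        exploit = p / total if total > 0 else 1.0 / n
        dist[combo] = epsilon / n + (1.0 - epsilon) * exploit
    return dist


def sample_combination(dist: Dict[Combination, float],
                       rng: random.Random) -> Combination:
    """Draw one combination to render for the current impression."""
    combos = list(dist)
    return rng.choices(combos, weights=[dist[c] for c in combos], k=1)[0]
```

Keeping a nonzero `epsilon` guarantees every combination is still shown occasionally, so the auxiliary CVR model keeps receiving data on combinations it currently ranks low.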

arXiv:2202.10310 [pdf, other]

Slow relaxation and aging in the model of randomly connected cycles network

Authors: S. Reich, S. Maoz, Y. Kaplan, H. Rappeport, N. Q. Balaban, O. Agam

Abstract: We propose a statistical model of a large random network with high connectivity in order to describe the behavior of E. coli cells after exposure to acute stress. The building blocks of this network are feedback cycles typical of the genetic and metabolic networks of a cell. Each node on the cycles is a spin degree of freedom representing a component in the cell's network that can be in one of two states: active or inactive. The cycles are interconnected by regulation or by the exchange of metabolites. Stress is realized by an external magnetic field that drives the nodes into an inactive state, and the time at which the magnetization first crosses zero represents the first division event of the cell after the stress period. The numerical and analytical solutions for this first-passage problem reproduce the aging dynamics observed in the experimental data.

Submitted 21 February, 2022; originally announced February 2022.

arXiv:1510.00189 [pdf, other]

doi 10.1039/C5SM02415C

Oscillatory elastic instabilities in an extensional viscoelastic flow

Authors: Atul Varshney, Eldad Afik, Yoav Kaplan, Victor Steinberg

Abstract: Dilute polymer solutions are known to exhibit purely elastic instabilities even when fluid inertia is negligible. Here we report quantitative evidence of two consecutive oscillatory elastic instabilities in an elongational flow of a dilute polymer solution, realized in a T-junction geometry with a long recirculating cavity. The main result reported here is the observation and characterization of the first transition as a forward Hopf bifurcation, resulting in a uniformly oscillating state due to the breaking of time-translational invariance. This unexpected finding contrasts with previous experiments and numerical simulations performed in similar ranges of the Weissenberg (Wi) and Reynolds (Re) numbers, where a forward fork bifurcation into a steady asymmetric flow, due to broken spatial inversion symmetry, was reported. We discuss a plausible explanation for the discrepancy between our findings and previous studies, which could be attributed to the long recirculating cavity: its length plays a crucial role in breaking time-translational invariance instead of spatial inversion symmetry. The second transition is manifested via time-aperiodic transverse fluctuations of the interface between the dyed and undyed fluid streams, which arise at the channel junction and are advected downstream by the mean flow. Both instabilities are characterized by measurements of the fluid discharge rate and simultaneous imaging of the interface between the dyed and undyed fluid streams in the outflow channel.

Submitted 1 October, 2015; originally announced October 2015.

Comments: 6 pages, 7 figures

Journal ref: Soft Matter 12, 2186 (2016)

arXiv:1107.1884 [pdf, other]

doi 10.1016/j.physa.2011.09.032

Analysis of cross-correlations in electroencephalogram signals as an approach to proactive diagnosis of schizophrenia

Authors: Serge F. Timashev, Oleg Yu. Panischev, Yuriy S. Polyakov, Sergey A. Demin, Alexander Ya. Kaplan

Abstract: We apply flicker-noise spectroscopy (FNS), a time series analysis method operating on structure functions and power spectrum estimates, to study clinical electroencephalogram (EEG) signals recorded in children/adolescents (11 to 14 years of age) with diagnosed schizophrenia-spectrum symptoms at the National Center for Psychiatric Health (NCPH) of the Russian Academy of Medical Sciences. The EEG signals for these subjects were compared with the signals for a control sample of chronically depressed children/adolescents. The purpose of the study is to look for diagnostic signs of subjects' susceptibility to schizophrenia in the FNS parameters for specific electrodes and in cross-correlations between the signals simultaneously measured at different points on the scalp. Our analysis of EEG signals from scalp-mounted electrodes at locations F3 and F4, which are symmetrically positioned in the left and right frontal areas of the cerebral cortex, respectively, demonstrates an essential role of frequency-phase synchronization, a phenomenon representing specific correlations between the characteristic frequencies and phases of excitations in the brain. We introduce quantitative measures of frequency-phase synchronization and systematize the values of FNS parameters for the EEG data. Comparing our results with the medical diagnoses for 84 subjects performed at NCPH makes it possible to group the EEG signals into 4 categories corresponding to different levels of risk of susceptibility to schizophrenia.
We suggest that the introduced quantitative characteristics and classification of cross-correlations may be used for the diagnosis of schizophrenia at the early stages of its development.

Submitted 27 June, 2012; v1 submitted 10 July, 2011; originally announced July 2011.

Comments: 36 pages, 6 figures, 2 tables; to be published in "Physica A"

Journal ref: Physica A, 2012, Vol. 391, No. 4, pp. 1179-1194
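FNS builds on structure functions (difference moments) of a signal, whose standard second-order form is Φ⁽²⁾(τ) = ⟨[V(t) − V(t+τ)]²⟩, and the abstract also relies on cross-correlations between simultaneously recorded electrodes such as F3 and F4. The sketch below computes that second-order structure function and a lagged cross-moment of increments between two channels, assuming evenly sampled signals and integer lags in samples. It is a generic illustration in the spirit of the method; the paper's normalizations and full parameterization are not reproduced.

```python
from typing import Sequence


def structure_function(v: Sequence[float], tau: int) -> float:
    """Second-order difference moment: mean over t of [v(t) - v(t+tau)]^2."""
    if not 0 < tau < len(v):
        raise ValueError("tau must satisfy 0 < tau < len(v)")
    diffs = [v[t] - v[t + tau] for t in range(len(v) - tau)]
    return sum(d * d for d in diffs) / len(diffs)


def increment_cross_moment(vi: Sequence[float], vj: Sequence[float],
                           tau: int, theta: int) -> float:
    """Mean over t of [vi(t) - vi(t+tau)] * [vj(t+theta) - vj(t+theta+tau)].

    `theta` >= 0 is the time shift (in samples) applied to the second channel,
    so a peak over theta hints at a delayed coupling between electrodes.
    """
    n = min(len(vi), len(vj)) - theta - tau
    if tau <= 0 or theta < 0 or n <= 0:
        raise ValueError("need tau > 0, theta >= 0 and enough samples")
    total = sum((vi[t] - vi[t + tau]) * (vj[t + theta] - vj[t + theta + tau])
                for t in range(n))
    return total / n
```

With theta = 0 and identical channels the cross-moment reduces to the structure function itself, which is a quick sanity check on any such estimator.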

arXiv:1105.1039 [pdf, ps, other]

doi 10.1103/PhysRevE.84.011147

Velocity fluctuations of population fronts propagating into metastable states

Authors: Baruch Meerson, Pavel V. Sasorov, Yitzhak Kaplan

Abstract: The position of propagating population fronts fluctuates because of the discreteness of the individuals and the stochastic character of birth, death and migration processes. Here we consider a Markov model of a population front propagating into a metastable state, and focus on the weak-noise limit. For typical, small fluctuations the front motion is diffusive, and we calculate the front diffusion coefficient. We also determine the probability distribution of rare, large fluctuations of the front position and, for a given average front velocity, find the most likely population density profile of the front. Implications of the theory for population extinction risk are briefly considered.

Submitted 29 August, 2011; v1 submitted 5 May, 2011; originally announced May 2011.

Comments: 8 pages, 3 figures

Journal ref: Phys. Rev. E 84, 011147 (2011)

arXiv:1103.4236 [pdf, other]

doi 10.1088/1751-8113/44/28/282001

Fermi Edge Resonances in Non-equilibrium States of Fermi Gases

Authors: E. Bettelheim, Y. Kaplan, P. Wiegmann

Abstract: We formulate the problem of the Fermi Edge Singularity in non-equilibrium states of a Fermi gas as a matrix Riemann-Hilbert problem with an integrable kernel. This formulation is the most suitable for studying the singular behavior at each edge of non-equilibrium Fermi states by means of the method of steepest descent, and also reveals the integrable structure of the problem. We supplement this result by extending the familiar approach to the Fermi Edge Singularity, via the bosonic representation of the electronic operators, to non-equilibrium settings. This extension provides a compact way to extract the leading asymptotics.

Submitted 27 May, 2011; v1 submitted 22 March, 2011; originally announced March 2011.

Comments: Accepted for publication, J. Phys. A

Journal ref: J. Phys. A: Math. Theor. (2011) 44: 28. 282001

arXiv:1011.1993 [pdf, other]

doi 10.1103/PhysRevLett.106.166804

Gradient Catastrophe and Fermi Edge Resonances in Fermi Gas

Authors: Eldad Bettelheim, Yitzhak Kaplan, Paul B. Wiegmann

Abstract: A smooth spatial disturbance of the Fermi surface in a Fermi gas inevitably becomes sharp. This phenomenon, called the gradient catastrophe, causes the breakdown of a Fermi sea into disconnected parts with multiple Fermi points. We study how the gradient catastrophe affects probing the Fermi system via a Fermi edge singularity (FES) measurement. We show that the gradient catastrophe transforms the single-peaked Fermi-edge singularity of the tunneling (or absorption) spectrum into a set of multiple asymmetric singular resonances. We also give a mathematical formulation of the FES as a matrix Riemann-Hilbert problem.

Submitted 22 March, 2011; v1 submitted 9 November, 2010; originally announced November 2010.

Journal ref: Phys. Rev. Lett. 106, 166804 (2011)

Showing 1–11 of 11 results for author: Kaplan, Y