-
Drifter: Efficient Online Feature Monitoring for Improved Data Integrity in Large-Scale Recommendation Systems
Authors:
Blaž Škrlj,
Nir Ki-Tov,
Lee Edelist,
Natalia Silberstein,
Hila Weisman-Zohar,
Blaž Mramor,
Davorin Kopič,
Naama Ziporin
Abstract:
Real-world production systems often grapple with maintaining data quality in large-scale, dynamic streams. We introduce Drifter, an efficient and lightweight system for online feature monitoring and verification in recommendation use cases. Drifter addresses limitations of existing methods by delivering agile, responsive, and adaptable data quality monitoring, enabling real-time root cause analysis, drift detection and insights into problematic production events. Integrating state-of-the-art online feature ranking for sparse data and anomaly detection ideas, Drifter is highly scalable and resource-efficient, requiring only two threads and less than a gigabyte of RAM per production deployment handling millions of instances per minute. Evaluation on real-world data sets demonstrates Drifter's effectiveness in alerting and mitigating data quality issues, substantially improving reliability and performance of real-time live recommender systems.
Submitted 20 September, 2023; v1 submitted 4 September, 2023;
originally announced September 2023.
-
Unleash the Power of Context: Enhancing Large-Scale Recommender Systems with Context-Based Prediction Models
Authors:
Jan Hartman,
Assaf Klein,
Davorin Kopič,
Natalia Silberstein
Abstract:
In this work, we introduce the notion of Context-Based Prediction Models. A Context-Based Prediction Model determines the probability of a user's action (such as a click or a conversion) solely by relying on user and contextual features, without considering any specific features of the item itself. We have identified numerous valuable applications for this modeling approach, including training an auxiliary context-based model to estimate click probability and incorporating its prediction as a feature in CTR prediction models. Our experiments indicate that this enhancement brings significant improvements in offline and online business metrics while having minimal impact on the cost of serving. Overall, our work offers a simple and scalable, yet powerful approach for enhancing the performance of large-scale commercial recommender systems, with broad implications for the field of personalized recommendations.
Submitted 25 July, 2023;
originally announced August 2023.
-
Dynamic Surrogate Switching: Sample-Efficient Search for Factorization Machine Configurations in Online Recommendations
Authors:
Blaž Škrlj,
Adi Schwartz,
Jure Ferlež,
Davorin Kopič,
Naama Ziporin
Abstract:
Hyperparameter optimization is the process of identifying the appropriate hyperparameter configuration of a given machine learning model with regard to a given learning task. For smaller data sets, an exhaustive search is possible; however, when the data size and model complexity increase, the number of configuration evaluations becomes the main computational bottleneck. A promising paradigm for tackling this type of problem is surrogate-based optimization. The main idea underlying this paradigm considers an incrementally updated model of the relation between the hyperparameter space and the output (target) space; the data for this model are obtained by evaluating the main learning engine, which is, for example, a factorization machine-based model. By learning to approximate the hyperparameter-target relation, the surrogate (machine learning) model can be used to score large amounts of hyperparameter configurations, exploring parts of the configuration space beyond the reach of direct machine learning engine evaluation. Commonly, a surrogate is selected prior to optimization initialization and remains the same during the search. We investigated whether dynamic switching of surrogates during the optimization itself is a sensible idea of practical relevance for selecting the most appropriate factorization machine-based models for large-scale online recommendation. We conducted benchmarks on data sets containing hundreds of millions of instances against established baselines such as Random Forest- and Gaussian process-based surrogates. The results indicate that surrogate switching can offer good performance while considering fewer learning engine evaluations.
Submitted 29 September, 2022;
originally announced September 2022.
-
Feature embedding in click-through rate prediction
Authors:
Samo Pahor,
Davorin Kopič,
Jure Demšar
Abstract:
We tackle the challenge of feature embedding for the purposes of improving the click-through rate prediction process. We select three models: logistic regression, factorization machines and deep factorization machines, as our baselines and propose five different feature embedding modules: embedding scaling, FM embedding, embedding encoding, NN embedding and the embedding reweighting module. The embedding modules act as a way to improve baseline model feature embeddings and are trained alongside the rest of the model parameters in an end-to-end manner. Each module is individually added to a baseline model to obtain a new augmented model. We test the predictive performance of our augmented models on a publicly accessible dataset used for benchmarking click-through rate prediction models. Our results show that several proposed embedding modules provide an important increase in predictive performance without a drastic increase in training time.
Submitted 20 September, 2022;
originally announced September 2022.
-
Exploration with Model Uncertainty at Extreme Scale in Real-Time Bidding
Authors:
Jan Hartman,
Davorin Kopič
Abstract:
In this work, we present a scalable and efficient system for exploring the supply landscape in real-time bidding. The system directs exploration based on the predictive uncertainty of models used for click-through rate prediction and works in a high-throughput, low-latency environment. Through online A/B testing, we demonstrate that exploration with model uncertainty has a positive impact on model performance and business KPIs.
Submitted 3 August, 2022;
originally announced August 2022.
-
Scaling TensorFlow to 300 million predictions per second
Authors:
Jan Hartman,
Davorin Kopič
Abstract:
We present the process of transitioning machine learning models to the TensorFlow framework at a large scale in an online advertising ecosystem. In this talk we address the key challenges we faced and describe how we successfully tackled them; notably, implementing the models in TF and serving them efficiently with low latency using various optimization techniques.
Submitted 20 September, 2021;
originally announced September 2021.
-
High repetition rate Time-Resolved VUV ARPES at 10.8 eV photon energy
Authors:
Simone Peli,
Denny Puntel,
Damir Kopic,
Benjamin Sockol,
Fulvio Parmigiani,
Federico Cilento
Abstract:
The quest for mapping the femtosecond dynamics of the electronic band structure of complex materials via Time- and Angle-Resolved Photoelectron Spectroscopy (TR-ARPES) over their full First Brillouin Zone is pushing the development of schemes to efficiently generate ultrashort photon pulses in the VUV-range of photon energies. At present, the critical aspect is to combine a high photon energy with high photoemission count rates and a small pulse-bandwidth, necessary to achieve high energy resolution in ARPES, while preserving a good time resolution and mitigating space-charge effects. Here we describe a novel approach to produce light pulses at 10.8 eV, combining high repetition rate operation (1-4 MHz), high energy resolution ($\sim26$ meV) and space-charge free operation, with a time-resolution of $\sim$700 fs. These results have been achieved by generating the 9th harmonic of a Yb fiber laser, through a phase-matched process of third harmonic generation in Xenon of the laser third harmonic. The full up-conversion process is driven by a seed pulse energy as low as 10 $μ$J, hence is easily scalable to multi-MHz operation. This source opens the way to TR-ARPES experiments for the investigation of the electron dynamics over the full first Brillouin zone of most complex materials, with unprecedented energy and momentum resolutions and high count rates. The performances of our setup are tested in a number of experiments on WTe$_2$ and Bi$_2$Se$_3$, of which we measure the electronic band structure in energy, two-dimensional momentum and time.
Submitted 13 November, 2019;
originally announced November 2019.
-
Space Charge Free Ultrafast Photoelectron Spectroscopy on Solids by a Narrowband Tunable Extreme Ultraviolet Light Source
Authors:
Riccardo Cucini,
Tommaso Pincelli,
Giancarlo Panaccione,
Damir Kopic,
Fabio Frassetto,
Paolo Miotti,
Gian Marco Pierantozzi,
Simone Peli,
Andrea Fondacaro,
Aleksander De Luisa,
Alessandro De Vita,
Pietro Carrara,
Damjan Krizmancic,
Daniel T. Payne,
Federico Salvador,
Andrea Sterzi,
Luca Poletto,
Fulvio Parmigiani,
Giorgio Rossi,
Federico Cilento
Abstract:
Here we report on a novel High Harmonic Generation (HHG) light source designed for space charge free ultrafast photoelectron spectroscopy (PES) on solids. The ultimate overall energy resolution achieved on a polycrystalline Au sample is ~22 meV at 40 K. These results have been obtained at a photon energy of 16.9 eV with a pulse bandwidth of ~19 meV, by varying, up to 200 kHz, the photon pulses repetition rate and the photon fluence on the sample. These features set a new benchmark for tunable narrowband HHG sources. By comparing the PES energy resolution and the photon pulse bandwidth with a pulse duration of ~105 fs, as retrieved from time-resolved (TR) angle resolved (AR) PES experiments on Bi$_2$Se$_3$, we validate a way for a space charge free photoelectric process close to Fourier transform limit conditions for ultrafast TR-PES experiments on solids.
Submitted 11 October, 2019;
originally announced October 2019.
-
Photoinduced nematic state in FeSe$_{0.4}$Te$_{0.6}$
Authors:
Laura Fanfarillo,
Damir Kopić,
Andrea Sterzi,
Giulia Manzoni,
Alberto Crepaldi,
Vladimir Tsurkan,
Dorina Croitori,
Joachim Deisenhofer,
Fulvio Parmigiani,
Massimo Capone,
Federico Cilento
Abstract:
FeSe$_{x}$Te$_{1-x}$ compounds present a complex phase diagram, ranging from the nematicity of FeSe to the $(π, π)$ magnetism of FeTe. We focus on FeSe$_{0.4}$Te$_{0.6}$, where the nematic ordering is absent at equilibrium. We use a time-resolved approach based on femtosecond light pulses to study the dynamics following photoexcitation in this system. The use of polarization-dependent time- and angle-resolved photoelectron spectroscopy allows us to reveal a photoinduced nematic metastable state, whose stabilization cannot be interpreted in terms of an effective photodoping. We argue that the 1.55 eV photon-energy-pump-pulse perturbs the $C_4$ symmetry of the system triggering the realization of the nematic state. The possibility to induce nematicity using an ultra-short pulse sheds a new light on the driving force behind the nematic symmetry breaking in iron-based superconductors. Our results weaken the idea that a low-energy coupling with fluctuations is a necessary condition to stabilize the nematic order and ascribe the origin of the nematic order in iron-based superconductors to a clear tendency of those systems towards orbital differentiation due to strong electronic correlations induced by the Hund's coupling.
Submitted 29 May, 2019;
originally announced May 2019.
-
Electronic properties of type-II Weyl semimetal WTe$_2$. A review perspective
Authors:
P. K. Das,
D. Di Sante,
F. Cilento,
C. Bigi,
D. Kopic,
D. Soranzio,
A. Sterzi,
J. A. Krieger,
I. Vobornik,
J. Fujii,
T. Okuda,
V. N. Strocov,
M. B. H. Breese,
F. Parmigiani,
G. Rossi,
S. Picozzi,
R. Thomale,
G. Sangiovanni,
R. J. Cava,
G. Panaccione
Abstract:
Currently, there is a flurry of research interest on materials with an unconventional electronic structure, and we have already seen significant progress in their understanding and engineering towards real-life applications. The interest erupted with the discovery of graphene and topological insulators in the previous decade. The electrons in graphene simulate massless Dirac Fermions with a linearly dispersing Dirac cone in their band structure, while in topological insulators, the electronic bands wind non-trivially in momentum space giving rise to gapless surface states and bulk bandgap. Weyl semimetals in condensed matter systems are the latest addition to this growing family of topological materials. Weyl Fermions are known in the context of high energy physics since almost the beginning of quantum mechanics. They apparently violate charge conservation rules, displaying the "chiral anomaly", with such remarkable properties recently theoretically predicted and experimentally verified to exist as low energy quasiparticle states in certain condensed matter systems. Not only are these new materials extremely important for our fundamental understanding of quantum phenomena, but also they exhibit completely different transport phenomena. For example, massless Fermions are susceptible to scattering from non-magnetic impurities. Dirac semimetals exhibit non-saturating extremely large magnetoresistance as a consequence of their robust electronic bands being protected by time reversal symmetry. These open up whole new possibilities for materials engineering and applications including quantum computing. In this review, we recapitulate some of the outstanding properties of WTe$_2$, namely, its non-saturating titanic magnetoresistance due to perfect electron and hole carrier balance up to a very high magnetic field observed for the very first time. (Continued. Please see the main article).
Submitted 18 December, 2018;
originally announced December 2018.