-
Naming the Pain in Machine Learning-Enabled Systems Engineering
Authors:
Marcos Kalinowski,
Daniel Mendez,
Görkem Giray,
Antonio Pedro Santos Alves,
Kelly Azevedo,
Tatiana Escovedo,
Hugo Villamizar,
Helio Lopes,
Teresa Baldassarre,
Stefan Wagner,
Stefan Biffl,
Jürgen Musil,
Michael Felderer,
Niklas Lavesson,
Tony Gorschek
Abstract:
Context: Machine learning (ML)-enabled systems are being increasingly adopted by companies aiming to enhance their products and operational processes. Objective: This paper aims to deliver a comprehensive overview of the current status quo of engineering ML-enabled systems and lay the foundation to steer practically relevant and problem-driven academic research. Method: We conducted an international survey to collect insights from practitioners on the current practices and problems in engineering ML-enabled systems. We received 188 complete responses from 25 countries. We conducted quantitative statistical analyses on contemporary practices using bootstrapping with confidence intervals and qualitative analyses on the reported problems using open and axial coding procedures. Results: Our survey results reinforce and extend existing empirical evidence on engineering ML-enabled systems, providing additional insights into typical ML-enabled systems project contexts, the perceived relevance and complexity of ML life cycle phases, and current practices related to problem understanding, model deployment, and model monitoring. Furthermore, the qualitative analysis provides a detailed map of the problems practitioners face within each ML life cycle phase and the problems causing overall project failure. Conclusions: The results contribute to a better understanding of the status quo and problems in practical environments. We advocate for the further adaptation and dissemination of software engineering practices to enhance the engineering of ML-enabled systems.
Submitted 20 May, 2024;
originally announced June 2024.
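The abstract above mentions bootstrapping with confidence intervals but does not say which variant was used. As a rough illustration only, the following is a minimal sketch of a percentile bootstrap for a survey proportion. The function name, the resample count, and the example data (120 of 188 respondents answering "yes") are assumptions made for this sketch and are not taken from the paper.

```python
import random


def bootstrap_proportion_ci(responses, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile-bootstrap confidence interval for the share of 1s in `responses`.

    Hypothetical helper: the paper's exact procedure is not given in the abstract.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(responses)
    estimates = []
    for _ in range(n_resamples):
        # Draw n answers with replacement and record the resampled proportion.
        resample = [responses[rng.randrange(n)] for _ in range(n)]
        estimates.append(sum(resample) / n)
    estimates.sort()
    # Take the empirical alpha/2 and 1 - alpha/2 quantiles of the estimates.
    lower = estimates[int((alpha / 2) * n_resamples)]
    upper = estimates[int((1 - alpha / 2) * n_resamples) - 1]
    return sum(responses) / n, lower, upper


# Made-up data for illustration: 188 respondents, 120 of whom answered "yes".
responses = [1] * 120 + [0] * 68
point, lower, upper = bootstrap_proportion_ci(responses)
print(f"proportion = {point:.3f}, 95% CI = [{lower:.3f}, {upper:.3f}]")
```

The percentile method is used here because it is the simplest to state; other bootstrap intervals (for example, bias-corrected ones) would follow the same resampling loop.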
-
ML-Enabled Systems Model Deployment and Monitoring: Status Quo and Problems
Authors:
Eduardo Zimelewicz,
Marcos Kalinowski,
Daniel Mendez,
Görkem Giray,
Antonio Pedro Santos Alves,
Niklas Lavesson,
Kelly Azevedo,
Hugo Villamizar,
Tatiana Escovedo,
Helio Lopes,
Stefan Biffl,
Juergen Musil,
Michael Felderer,
Stefan Wagner,
Teresa Baldassarre,
Tony Gorschek
Abstract:
[Context] Systems incorporating Machine Learning (ML) models, often called ML-enabled systems, have become commonplace. However, empirical evidence on how ML-enabled systems are engineered in practice is still limited, especially for activities surrounding ML model dissemination. [Goal] We investigate contemporary industrial practices and problems related to ML model dissemination, focusing on the model deployment and the monitoring of ML life cycle phases. [Method] We conducted an international survey to gather practitioner insights on how ML-enabled systems are engineered. We gathered a total of 188 complete responses from 25 countries. We analyze the status quo and problems reported for the model deployment and monitoring phases. We analyzed contemporary practices using bootstrapping with confidence intervals and conducted qualitative analyses on the reported problems applying open and axial coding procedures. [Results] Practitioners perceive the model deployment and monitoring phases as relevant and difficult. With respect to model deployment, models are typically deployed as separate services, with limited adoption of MLOps principles. Reported problems include difficulties in designing the architecture of the infrastructure for production deployment and legacy application integration. Concerning model monitoring, many models in production are not monitored. The main monitored aspects are inputs, outputs, and decisions. Reported problems involve the absence of monitoring practices, the need to create custom monitoring tools, and the selection of suitable metrics. [Conclusion] Our results help provide a better understanding of the adopted practices and problems in practice and support guiding ML deployment and monitoring research in a problem-driven manner.
Submitted 7 February, 2024;
originally announced February 2024.
-
Status Quo and Problems of Requirements Engineering for Machine Learning: Results from an International Survey
Authors:
Antonio Pedro Santos Alves,
Marcos Kalinowski,
Görkem Giray,
Daniel Mendez,
Niklas Lavesson,
Kelly Azevedo,
Hugo Villamizar,
Tatiana Escovedo,
Helio Lopes,
Stefan Biffl,
Jürgen Musil,
Michael Felderer,
Stefan Wagner,
Teresa Baldassarre,
Tony Gorschek
Abstract:
Systems that use Machine Learning (ML) have become commonplace for companies that want to improve their products and processes. Literature suggests that Requirements Engineering (RE) can help address many problems when engineering ML-enabled systems. However, the state of empirical evidence on how RE is applied in practice in the context of ML-enabled systems is mainly dominated by isolated case studies with limited generalizability. We conducted an international survey to gather practitioner insights into the status quo and problems of RE in ML-enabled systems. We gathered 188 complete responses from 25 countries. We conducted quantitative statistical analyses on contemporary practices using bootstrapping with confidence intervals and qualitative analyses on the reported problems involving open and axial coding procedures. We found significant differences in RE practices within ML projects. For instance, (i) RE-related activities are mostly conducted by project leaders and data scientists, (ii) the prevalent requirements documentation format concerns interactive Notebooks, (iii) the main focus of non-functional requirements includes data quality, model reliability, and model explainability, and (iv) main challenges include managing customer expectations and aligning requirements with data. The qualitative analyses revealed that practitioners face problems related to lack of business domain understanding, unclear goals and requirements, low customer engagement, and communication issues. These results help to provide a better understanding of the adopted practices and of which problems exist in practical environments. We put forward the need to adapt further and disseminate RE-related practices for engineering ML-enabled systems.
Submitted 10 October, 2023;
originally announced October 2023.
-
Large-scale information retrieval in software engineering -- an experience report from industrial application
Authors:
Michael Unterkalmsteiner,
Tony Gorschek,
Robert Feldt,
Niklas Lavesson
Abstract:
Software Engineering activities are information intensive. Research proposes Information Retrieval (IR) techniques to support engineers in their daily tasks, such as establishing and maintaining traceability links, fault identification, and software maintenance. We describe an engineering task, test case selection, and illustrate our problem analysis and solution discovery process. The objective of the study is to gain an understanding of to what extent IR techniques (one potential solution) can be applied to test case selection and provide decision support in a large-scale, industrial setting. We analyze, in the context of the studied company, how test case selection is performed and design a series of experiments evaluating the performance of different IR techniques. Each experiment provides lessons learned from implementation, execution, and results, feeding to its successor. The three experiments led to the following observations: 1) there is a lack of research on scalable parameter optimization of IR techniques for software engineering problems; 2) scaling IR techniques to industry data is challenging, in particular for latent semantic analysis; 3) the IR context poses constraints on the empirical evaluation of IR techniques, requiring more research on developing valid statistical approaches. We believe that our experiences in conducting a series of IR experiments with industry grade data are valuable for peer researchers so that they can avoid the pitfalls that we have encountered. Furthermore, we identified challenges that need to be addressed in order to bridge the gap between laboratory IR experiments and real applications of IR in the industry.
Submitted 22 August, 2023;
originally announced August 2023.
-
Green Accelerated Hoeffding Tree
Authors:
Eva Garcia-Martin,
Albert Bifet,
Niklas Lavesson,
Rikard König,
Henrik Linusson
Abstract:
State-of-the-art machine learning solutions mainly focus on creating highly accurate models without constraints on hardware resources. Stream mining algorithms are designed to run on resource-constrained devices, so a focus on low power consumption and on energy and memory efficiency is essential. The Hoeffding tree algorithm is able to create energy-efficient models, but at the cost of less accurate trees in comparison to their ensemble counterparts. Ensembles of Hoeffding trees, on the other hand, create a highly accurate forest of trees but consume five times more energy on average. An extension that tried to obtain similar results to ensembles of Hoeffding trees was the Extremely Fast Decision Tree (EFDT). This paper presents the Green Accelerated Hoeffding Tree (GAHT) algorithm, an extension of the EFDT algorithm with a lower energy and memory footprint and the same (or higher for some datasets) accuracy levels. GAHT grows the tree setting individual splitting criteria for each node, based on the distribution of the number of instances over each particular leaf. The results show that GAHT is able to achieve the same competitive accuracy results compared to EFDT and ensembles of Hoeffding trees while reducing the energy consumption up to 70%.
Submitted 6 May, 2022;
originally announced May 2022.
-
Hoeffding Trees with nmin adaptation
Authors:
Eva García-Martín,
Niklas Lavesson,
Håkan Grahn,
Emiliano Casalicchio,
Veselka Boeva
Abstract:
Machine learning software accounts for a significant amount of energy consumed in data centers. These algorithms are usually optimized towards predictive performance, i.e. accuracy, and scalability. This is the case of data stream mining algorithms. Although these algorithms are adaptive to the incoming data, they have fixed parameters from the beginning of the execution. We have observed that having fixed parameters leads to unnecessary computations, thus making the algorithm energy inefficient. In this paper we present the nmin adaptation method for Hoeffding trees. This method adapts the value of the nmin parameter, which significantly affects the energy consumption of the algorithm. The method reduces unnecessary computations and memory accesses, thus reducing the energy, while the accuracy is only marginally affected. We experimentally compared VFDT (Very Fast Decision Tree, the first Hoeffding tree algorithm) and CVFDT (Concept-adapting VFDT) with the VFDT-nmin (VFDT with nmin adaptation). The results show that VFDT-nmin consumes up to 27% less energy than the standard VFDT, and up to 92% less energy than CVFDT, trading off a few percent of accuracy in a few datasets.
Submitted 3 August, 2018;
originally announced August 2018.
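For readers unfamiliar with the role of nmin in the entry above, the following minimal sketch shows the standard Hoeffding bound used by Hoeffding trees and how a fixed nmin (the number of instances a leaf accumulates between split evaluations) controls how often that bound is checked. The function names and example values are assumptions made for illustration. The adaptation rule proposed in the paper is not described in the abstract and is not reproduced here.

```python
import math


def hoeffding_bound(value_range, delta, n):
    """Hoeffding bound: epsilon = sqrt(R^2 * ln(1/delta) / (2n)).

    value_range (R) is the range of the split metric, delta is the allowed
    error probability, and n is the number of instances seen at the leaf.
    """
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(best_gain, second_gain, value_range, delta, n):
    """Split when the gap between the two best attributes exceeds the bound."""
    return (best_gain - second_gain) > hoeffding_bound(value_range, delta, n)


def count_split_checks(n_instances, nmin):
    """Number of split evaluations a leaf performs with a fixed nmin.

    Illustrative helper: a leaf re-evaluates its split only once every
    nmin instances, so a larger nmin means fewer evaluations.
    """
    return n_instances // nmin


# Example values (assumed): binary classification with information gain, so R = 1.
eps = hoeffding_bound(value_range=1.0, delta=1e-7, n=200)
print(f"epsilon after 200 instances: {eps:.4f}")
print("checks with nmin=200: ", count_split_checks(10_000, 200))
print("checks with nmin=1000:", count_split_checks(10_000, 1000))
```

Each split evaluation recomputes the metric for every candidate attribute, which is why fewer evaluations can plausibly save computation; how much energy that saves in practice is what the paper measures.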
-
Is it ethical to avoid error analysis?
Authors:
Eva García-Martín,
Niklas Lavesson
Abstract:
Machine learning algorithms tend to create more accurate models with the availability of large datasets. In some cases, highly accurate models can hide the presence of bias in the data. There are several studies published that tackle the development of discriminatory-aware machine learning algorithms. We center on the further evaluation of machine learning models by doing error analysis, to understand under what conditions the model is not working as expected. We focus on the ethical implications of avoiding error analysis, from a falsification of results and discrimination perspective. Finally, we show different ways to approach error analysis in non-interpretable machine learning algorithms such as deep learning.
Submitted 30 June, 2017;
originally announced June 2017.
-
Extending CKKW-merging to One-Loop Matrix Elements
Authors:
Nils Lavesson,
Leif Lonnblad
Abstract:
We extend earlier schemes for merging tree-level matrix elements with parton showers to include also merging with one-loop matrix elements. In this paper we make a first study on how to include one-loop corrections, not only for events with a given jet multiplicity, but simultaneously for several different jet multiplicities. Results are presented for the simplest non-trivial case of hadronic events at LEP as a proof-of-concept.
Submitted 18 November, 2008;
originally announced November 2008.
-
Merging parton showers and matrix elements -- back to basics
Authors:
Nils Lavesson,
Leif Lonnblad
Abstract:
We make a thorough comparison between different schemes of merging fixed-order tree-level matrix element generators with parton-shower models. We use the most basic benchmark of the O(alpha_S) correction to e+e- -> jets, where the simple kinematics allows us to study in detail the transition between the matrix-element and parton-shower regions. We find that the CKKW-based schemes give a reasonably smooth transition between these regions, although problems may occur if the parton shower used is not ordered in transverse momentum. However, the so-called Pseudo-Shower and MLM schemes turn out to have potentially serious problems due to different scale definitions in different regions of phase space, and due to sensitivity to the details in the initial conditions of the parton shower programs used.
Submitted 23 April, 2008; v1 submitted 18 December, 2007;
originally announced December 2007.
-
Comparative study of various algorithms for the merging of parton showers and matrix elements in hadronic collisions
Authors:
J. Alwall,
S. Hoeche,
F. Krauss,
N. Lavesson,
L. Lonnblad,
F. Maltoni,
M. L. Mangano,
M. Moretti,
C. G. Papadopoulos,
F. Piccinini,
S. Schumann,
M. Treccani,
J. Winter,
M. Worek
Abstract:
We compare different procedures for combining fixed-order tree-level matrix-element generators with parton showers. We use the case of W-production at the Tevatron and the LHC to compare different implementations of the so-called CKKW and MLM schemes using different matrix-element generators and different parton cascades. We find that although similar results are obtained in all cases, there are important differences.
Submitted 16 January, 2008; v1 submitted 18 June, 2007;
originally announced June 2007.
-
A standard format for Les Houches Event Files
Authors:
J. Alwall,
A. Ballestrero,
P. Bartalini,
S. Belov,
E. Boos,
A. Buckley,
J. M. Butterworth,
L. Dudko,
S. Frixione,
L. Garren,
S. Gieseke,
A. Gusev,
I. Hinchliffe,
J. Huston,
B. Kersevan,
F. Krauss,
N. Lavesson,
L. Lönnblad,
E. Maina,
F. Maltoni,
M. L. Mangano,
F. Moortgat,
S. Mrenna,
C. G. Papadopoulos,
R. Pittau
, et al. (10 additional authors not shown)
Abstract:
A standard file format is proposed to store process and event information, primarily output from parton-level event generators for further use by general-purpose ones. The information content is identical with what was already defined by the Les Houches Accord five years ago, but then in terms of Fortran commonblocks. This information is embedded in a minimal XML-style structure, for clarity and to simplify parsing.
Submitted 3 September, 2006;
originally announced September 2006.
-
Matching Parton Showers and Matrix Elements
Authors:
Stefan Hoeche,
Frank Krauss,
Nils Lavesson,
Leif Lonnblad,
Michelangelo Mangano,
Andreas Schaelicke,
Steffen Schumann
Abstract:
We compare different procedures for combining fixed-order tree-level matrix element generators with parton showers. We use the case of W-production at the Tevatron and the LHC to compare different implementations of the so-called CKKW scheme and one based on the so-called MLM scheme using different matrix element generators and different parton cascades. We find that although similar results are obtained in all cases, there are important differences.
Submitted 3 February, 2006;
originally announced February 2006.
-
HERA and the LHC - A workshop on the implications of HERA for LHC physics: Proceedings - Part B
Authors:
S. Alekhin,
G. Altarelli,
N. Amapane,
J. Andersen,
V. Andreev,
M. Arneodo,
V. Avati,
J. Baines,
R. D. Ball,
A. Banfi,
S. P. Baranov,
J. Bartels,
O. Behnke,
R. Bellan,
J. Blumlein,
H. Bottcher,
S. Bolognesi,
M. Boonekamp,
D. Bourilkov,
J. Bracinik,
A. Bruni,
G. Bruni,
A. Buckley,
A. Bunyatyan,
C. M. Buttar
, et al. (169 additional authors not shown)
Abstract:
The HERA electron--proton collider has collected 100 pb$^{-1}$ of data since its start-up in 1992, and recently moved into a high-luminosity operation mode, with upgraded detectors, aiming to increase the total integrated luminosity per experiment to more than 500 pb$^{-1}$. HERA has been a machine of excellence for the study of QCD and the structure of the proton. The Large Hadron Collider (LHC), which will collide protons with a centre-of-mass energy of 14 TeV, will be completed at CERN in 2007. The main mission of the LHC is to discover and study the mechanisms of electroweak symmetry breaking, possibly via the discovery of the Higgs particle, and search for new physics in the TeV energy scale, such as supersymmetry or extra dimensions. Besides these goals, the LHC will also make a substantial number of precision measurements and will offer a new regime to study the strong force via perturbative QCD processes and diffraction. For the full LHC physics programme a good understanding of QCD phenomena and the structure function of the proton is essential. Therefore, in March 2004, a one-year-long workshop started to study the implications of HERA on LHC physics. This included proposing new measurements to be made at HERA, extracting the maximum information from the available data, and developing/improving the theoretical and experimental tools. This report summarizes the results achieved during this workshop.
Submitted 19 March, 2007; v1 submitted 2 January, 2006;
originally announced January 2006.
-
HERA and the LHC - A workshop on the implications of HERA for LHC physics: Proceedings - Part A
Authors:
S. Alekhin,
G. Altarelli,
N. Amapane,
J. Andersen,
V. Andreev,
M. Arneodo,
V. Avati,
J. Baines,
R. D. Ball,
A. Banfi,
S. P. Baranov,
J. Bartels,
O. Behnke,
R. Bellan,
J. Blumlein,
H. Bottcher,
S. Bolognesi,
M. Boonekamp,
D. Bourilkov,
J. Bracinik,
A. Bruni,
G. Bruni,
A. Buckley,
A. Bunyatyan,
C. M. Buttar
, et al. (169 additional authors not shown)
Abstract:
The HERA electron--proton collider has collected 100 pb$^{-1}$ of data since its start-up in 1992, and recently moved into a high-luminosity operation mode, with upgraded detectors, aiming to increase the total integrated luminosity per experiment to more than 500 pb$^{-1}$. HERA has been a machine of excellence for the study of QCD and the structure of the proton. The Large Hadron Collider (LHC), which will collide protons with a centre-of-mass energy of 14 TeV, will be completed at CERN in 2007. The main mission of the LHC is to discover and study the mechanisms of electroweak symmetry breaking, possibly via the discovery of the Higgs particle, and search for new physics in the TeV energy scale, such as supersymmetry or extra dimensions. Besides these goals, the LHC will also make a substantial number of precision measurements and will offer a new regime to study the strong force via perturbative QCD processes and diffraction. For the full LHC physics programme a good understanding of QCD phenomena and the structure function of the proton is essential. Therefore, in March 2004, a one-year-long workshop started to study the implications of HERA on LHC physics. This included proposing new measurements to be made at HERA, extracting the maximum information from the available data, and developing/improving the theoretical and experimental tools. This report summarizes the results achieved during this workshop.
Submitted 31 January, 2006; v1 submitted 2 January, 2006;
originally announced January 2006.
-
W+jets Matrix Elements and the Dipole Cascade
Authors:
Nils Lavesson,
Leif Lonnblad
Abstract:
We extend the algorithm for matching fixed-order tree-level matrix element generators with the Dipole Cascade Model in Ariadne to apply to processes with incoming hadrons. We test the algorithm on the process W+n jets at the Tevatron, and find that the results are fairly insensitive to the cutoff used to regularize the soft and collinear divergences in the tree-level matrix elements. We also investigate a few observables to check the sensitivity to the matrix element correction.
Submitted 5 April, 2005; v1 submitted 31 March, 2005;
originally announced March 2005.