-
Interpretable machine learning approach for electron antineutrino selection in a large liquid scintillator detector
Authors:
A. Gavrikov,
V. Cerrone,
A. Serafini,
R. Brugnera,
A. Garfagnini,
M. Grassi,
B. Jelmini,
L. Lastrucci,
S. Aiello,
G. Andronico,
V. Antonelli,
A. Barresi,
D. Basilico,
M. Beretta,
A. Bergnoli,
M. Borghesi,
A. Brigatti,
R. Bruno,
A. Budano,
B. Caccianiga,
A. Cammi,
R. Caruso,
D. Chiesa,
C. Clementi,
S. Dusini, et al. (43 additional authors not shown)
Abstract:
Several neutrino detectors, KamLAND, Daya Bay, Double Chooz, RENO, and the forthcoming large-scale JUNO, rely on liquid scintillator to detect reactor antineutrino interactions. In this context, inverse beta decay represents the golden channel for antineutrino detection, providing a pair of correlated events, thus a strong experimental signature to distinguish the signal from a variety of backgrounds. However, given the low cross-section of antineutrino interactions, the development of a powerful event selection algorithm becomes imperative to achieve effective discrimination between signal and backgrounds. In this study, we introduce a machine learning (ML) model to achieve this goal: a fully connected neural network as a powerful signal-background discriminator for a large liquid scintillator detector. We demonstrate, using the JUNO detector as an example, that, despite the already high efficiency of a cut-based approach, the presented ML model can further improve the overall event selection efficiency. Moreover, it allows for the retention of signal events at the detector edges that would otherwise be rejected because of the overwhelming amount of background events in that region. We also present the first interpretable analysis of the ML approach for event selection in reactor neutrino experiments. This method provides insights into the decision-making process of the model and offers valuable information for improving and updating traditional event selection approaches.
Submitted 9 June, 2024;
originally announced June 2024.
-
Uncovering large inconsistencies between machine learning derived gridded settlement datasets
Authors:
Vedran Sekara,
Andrea Martini,
Manuel Garcia-Herranz,
Do-Hyung Kim
Abstract:
High-resolution human settlement maps provide detailed delineations of where people live and are vital for scientific and practical purposes, such as rapid disaster response, allocation of humanitarian resources, and international development. The increased availability of high-resolution satellite imagery, combined with powerful techniques from machine learning and artificial intelligence, has spurred the creation of a wealth of settlement datasets. However, the precise agreement and alignment between these datasets is not known. Here we quantify the overlap of high-resolution settlement maps for 42 African countries developed by Google (Open Buildings), Meta (High Resolution Population Maps) and GRID3 (Geo-Referenced Infrastructure and Demographic Data for Development). Across all studied countries we find large disagreement between datasets on how much area is considered settled. We demonstrate that there are considerable geographic and socio-economic factors at play and build a machine learning model to predict for which areas datasets disagree. It is vital to understand the shortcomings of AI-derived high-resolution settlement layers as international organizations, governments, and NGOs are already experimenting with incorporating these into programmatic work. As such, we anticipate our work to be a starting point for more critical and detailed analyses of AI-derived datasets for humanitarian, planning, policy, and scientific purposes.
Submitted 19 April, 2024;
originally announced April 2024.
-
Evaluation of DRAIN, a deep-learning approach to rain retrieval from GPM passive microwave radiometer
Authors:
Nicolas Viltard,
Vibolroth Sambath,
Pierre Lepetit,
Audrey Martini,
Laurent Barthès,
Cécile Mallet
Abstract:
Retrieval of rain from passive microwave radiometer data has been a challenge ever since the launch of the first Defense Meteorological Satellite Program in the late 70s. Enormous progress has been made since the launch of the Tropical Rainfall Measuring Mission (TRMM) in 1997, but until recently the data were processed pixel-by-pixel or taking a few neighboring pixels into account. Deep learning has obtained remarkable improvement in the computer vision field, and offers a whole new way to tackle the rain retrieval problem. The Global Precipitation Measurement (GPM) Core satellite carries, similarly to TRMM, a passive microwave radiometer and a radar that share part of their swath. The brightness temperatures measured in the 37 and 89 GHz channels are used like the RGB components of a regular image while rain rate from Dual Frequency radar provides the surface rain. A U-net is then trained on these data to develop a retrieval algorithm: Deep-learning RAIN (DRAIN). With only four brightness temperatures as an input and no other a priori information, DRAIN offers similar or slightly better performance than GPROF, the GPM official algorithm, in most situations. This performance is assumed to be due to the fact that DRAIN works on an image basis instead of the classical pixel-by-pixel basis.
Submitted 2 March, 2023;
originally announced March 2023.
-
Agile Elicitation of Scalability Requirements for Open Systems: A Case Study
Authors:
Gunnar Brataas,
Antonio Martini,
Geir Kjetil Hanssen,
Georg Ræder
Abstract:
Eliciting scalability requirements during agile software development is complicated and poorly described in previous research. This article presents a lightweight artifact for eliciting scalability requirements during agile software development: the ScrumScale model. The ScrumScale model is a simple spreadsheet. The scalability concepts underlying the ScrumScale model are clarified in this design science research, which also utilizes coordination theory. This paper describes the open banking case study, where a legacy banking system becomes open. This challenges the scalability of this legacy system. The first step in understanding this challenge is to elicit the new scalability requirements. In the open banking case study, key stakeholders from TietoEVRY spent 55 hours eliciting TietoEVRY's open banking project's scalability requirements. According to TietoEVRY, the ScrumScale model provided a systematic way of producing scalability requirements. For TietoEVRY, the scalability concepts behind the ScrumScale model also offered significant advantages in dialogues with other stakeholders.
Submitted 1 August, 2021;
originally announced August 2021.
-
The use of incentives to promote Technical Debt management
Authors:
Terese Besker,
Antonio Martini,
Jan Bosch
Abstract:
When developing software, it is vitally important to keep the level of technical debt down since it is well established from several studies that technical debt can, e.g., lower the development productivity, decrease the developers' morale, and compromise the overall quality of the software. However, even if researchers and practitioners working in today's software development industry are quite familiar with the concept of technical debt and its related negative consequences, there has been no empirical research focusing specifically on how software managers actively communicate and manage the need to keep the level of technical debt as low as possible.
Submitted 5 January, 2021;
originally announced January 2021.
-
Measuring affective states from technical debt: A psychoempirical software engineering experiment
Authors:
Jesper Olsson,
Erik Risfelt,
Terese Besker,
Antonio Martini,
Richard Torkar
Abstract:
Software engineering is a human activity. Despite this, human aspects are under-represented in technical debt research, perhaps because they are challenging to evaluate.
This study's objective was to investigate the relationship between technical debt and affective states (feelings, emotions, and moods) from software practitioners. Forty participants (N = 40) from twelve companies took part in a mixed-methods approach, consisting of a repeated-measures (r = 5) experiment (n = 200), a survey, and semi-structured interviews.
The statistical analysis shows that different design smells (strong indicators of technical debt) negatively or positively impact affective states. From the qualitative data, it is clear that technical debt activates a substantial portion of the emotional spectrum and is psychologically taxing. Further, the practitioners' reactions to technical debt appear to fall in different levels of maturity.
We argue that human aspects in technical debt are important factors to consider, as they may result in, e.g., procrastination, apprehension, and burnout.
Submitted 2 May, 2021; v1 submitted 22 September, 2020;
originally announced September 2020.
-
Towards Surgically-Precise Technical Debt Estimation: Early Results and Research Roadmap
Authors:
Valentina Lenarduzzi,
Antonio Martini,
Davide Taibi,
Damian Andrew Tamburri
Abstract:
The concept of technical debt has been explored from many perspectives but its precise estimation is still under heavy empirical and experimental inquiry. We aim to understand whether, by harnessing approximate, data-driven, machine-learning approaches, it is possible to improve the current techniques for technical debt estimation, as represented by a top industry quality analysis tool such as SonarQube. For the sake of simplicity, we focus on relatively simple regression modelling techniques and apply them to modelling the additional project cost connected to the sub-optimal conditions existing in the projects under study. Our results show that current techniques can be improved towards a more precise estimation of technical debt and the case study shows promising results towards the identification of more accurate estimation of technical debt.
Submitted 2 August, 2019;
originally announced August 2019.
-
Technical Debt Prioritization: State of the Art. A Systematic Literature Review
Authors:
Valentina Lenarduzzi,
Terese Besker,
Davide Taibi,
Antonio Martini,
Francesca Arcelli Fontana
Abstract:
Background. Software companies need to manage and refactor Technical Debt issues. Therefore, it is necessary to understand if and when refactoring Technical Debt should be prioritized with respect to developing features or fixing bugs.
Objective. The goal of this study is to investigate the existing body of knowledge in software engineering to understand what Technical Debt prioritization approaches have been proposed in research and industry.
Method. We conducted a Systematic Literature Review among 384 unique papers published until 2018, following a consolidated methodology applied in Software Engineering. We included 38 primary studies.
Results. Different approaches have been proposed for Technical Debt prioritization, all having different goals and optimizing on different criteria. The proposed measures capture only a small part of the plethora of factors used to prioritize Technical Debt qualitatively in practice. We report an impact map of such factors. However, there is a lack of an empirical and validated set of tools.
Conclusion. We observed that Technical Debt prioritization research is preliminary and there is no consensus on what the important factors are and how to measure them. Consequently, we cannot consider current research conclusive and in this paper, we outline different directions for necessary future investigations.
Submitted 30 January, 2020; v1 submitted 29 April, 2019;
originally announced April 2019.
-
An adaptive stigmergy-based system for evaluating technological indicator dynamics in the context of smart specialization
Authors:
A. L. Alfeo,
F. P. Appio,
M. G. C. A. Cimino,
A. Lazzeri,
A. Martini,
G. Vaglini
Abstract:
Regional innovation is more and more considered an important enabler of welfare. It is no coincidence that the European Commission has started looking at regional peculiarities and dynamics, in order to focus Research and Innovation Strategies for Smart Specialization towards effective investment policies. In this context, this work aims to support policy makers in the analysis of innovation-relevant trends. We exploit a European database of the regional patent application to determine the dynamics of a set of technological innovation indicators. For this purpose, we design and develop a software system for assessing unfolding trends in such indicators. In contrast with conventional knowledge-based design, our approach is biologically-inspired and based on self-organization of information. This means that a functional structure, called track, appears and stays spontaneous at runtime when local dynamism in data occurs. A further prototyping of tracks allows a better distinction of the critical phenomena during unfolding events, with a better assessment of the progressing levels. The proposed mechanism works if structural parameters are correctly tuned for the given historical context. Determining such correct parameters is not a simple task since different indicators may have different dynamics. For this purpose, we adopt an adaptation mechanism based on differential evolution. The study includes the problem statement and its characterization in the literature, as well as the proposed solving approach, experimental setting and results.
Submitted 2 January, 2019;
originally announced January 2019.
-
Nanoseconds Timing System Based on IEEE 1588 FPGA Implementation
Authors:
D. Pedretti,
M. Bellato,
R. Isocrate,
A. Bergnoli,
R. Brugnera,
D. Corti,
F. Dal Corso,
G. Galet,
A. Garfagnini,
A. Giaz,
I. Lippi,
F. Marini,
G. Andronico,
V. Antonelli,
M. Baldoncini,
E. Bernieri,
A. Brigatti,
A. Budano,
M. Buscemi,
S. Bussino,
R. Caruso,
D. Chiesa,
C. Clementi,
X. F. Ding,
S. Dusini, et al. (32 additional authors not shown)
Abstract:
Clock synchronization procedures are mandatory in most physical experiments where event fragments are read out by spatially dislocated sensors and must be glued together to reconstruct key parameters (e.g. energy, interaction vertex etc.) of the process under investigation. These distributed data readout topologies rely on accurate time information available at the frontend, where raw data are acquired and tagged with a precise timestamp prior to data buffering and central data collecting. This makes the network complexity and latency, between frontend and backend electronics, negligible within upper bounds imposed by the frontend data buffer capability. The proposed research work describes an FPGA implementation of IEEE 1588 Precision Time Protocol (PTP) that exploits the CERN Timing, Trigger and Control (TTC) system as a multicast messaging physical and data link layer. The hardware implementation extends the clock synchronization to the nanoseconds range, overcoming the typical accuracy limitations of computer Ethernet-based Local Area Networks (LAN). Establishing a reliable communication between master and timing receiver nodes is essential in a message-based synchronization system. In the backend electronics, the synchronization of the serial data streams with the global clock domain is guaranteed by a hardware-based finite state machine that scans the bit period using a variable delay chain and finds the optimal sampling point. The validity of the proposed timing system has been proved in point-to-point data links as well as in star topology configurations over standard CAT-5e cables. The results achieved together with weaknesses and possible improvements are hereby detailed.
Submitted 19 June, 2018; v1 submitted 4 June, 2018;
originally announced June 2018.
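As a rough illustration of the message-based synchronization that IEEE 1588 PTP relies on, the sketch below applies the standard delay request-response arithmetic to four timestamps. It is not the authors' FPGA implementation: the class name `PtpExchange`, the function `offset_and_delay`, and the example timestamps are hypothetical, and the calculation assumes the path delay is the same in both directions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PtpExchange:
    """Four timestamps of one exchange, in nanoseconds (hypothetical container)."""
    t1: int  # master sends Sync, read on the master clock
    t2: int  # receiver gets Sync, read on the receiver clock
    t3: int  # receiver sends Delay_Req, read on the receiver clock
    t4: int  # master gets Delay_Req, read on the master clock


def offset_and_delay(x: PtpExchange) -> tuple[float, float]:
    """Return (receiver clock offset, one-way path delay).

    Assumes a symmetric path, so that
      t2 - t1 = delay + offset
      t4 - t3 = delay - offset
    """
    forward = x.t2 - x.t1
    backward = x.t4 - x.t3
    offset = (forward - backward) / 2
    delay = (forward + backward) / 2
    return offset, delay


# Illustrative numbers only: receiver clock runs 250 ns ahead, cable delay is 40 ns.
example = PtpExchange(t1=1000, t2=1290, t3=1500, t4=1290)
offset_ns, delay_ns = offset_and_delay(example)
```

The receiver would then subtract `offset_ns` from its local clock. Any asymmetry between the two directions shows up directly as an offset error, which is one reason a deterministic physical layer matters for nanosecond-level accuracy.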
-
Entropic selection of concepts unveils hidden topics in documents corpora
Authors:
Andrea Martini,
Alessio Cardillo,
Paolo De Los Rios
Abstract:
The organization and evolution of science has recently become itself an object of scientific quantitative investigation, thanks to the wealth of information that can be extracted from scientific documents, such as citations between papers and co-authorship between researchers. However, only a few studies have focused on the concepts that characterize full documents and that can be extracted and analyzed, revealing the deeper organization of scientific knowledge. Unfortunately, several concepts can be so common across documents that they hinder the emergence of the underlying topical structure of the document corpus, because they give rise to a large amount of spurious and trivial relations among documents. To identify and remove common concepts, we introduce a method to gauge their relevance according to an objective information-theoretic measure related to the statistics of their occurrence across the document corpus. After progressively removing concepts that, according to this metric, can be considered as generic, we find that the topic organization displays a correspondingly more refined structure.
Submitted 11 May, 2018; v1 submitted 18 May, 2017;
originally announced May 2017.
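The abstract above does not specify the information-theoretic measure, so the following is only a sketch of the general idea under a stated assumption: a concept is scored by the Shannon entropy of its occurrence counts across documents, normalized by the logarithm of the corpus size, and concepts spread evenly over many documents are flagged as generic. The function names, the threshold, and the toy corpus are all hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def normalized_entropy(counts: list[int], n_docs: int) -> float:
    """Shannon entropy of one concept's occurrence counts over documents,
    divided by log(n_docs) so the score lies in [0, 1]."""
    total = sum(counts)
    if total == 0 or n_docs < 2:
        return 0.0
    h = -sum((c / total) * math.log(c / total) for c in counts if c > 0)
    return h / math.log(n_docs)


def generic_concepts(corpus: list[list[str]], threshold: float) -> set[str]:
    """Return concepts whose normalized entropy is at least `threshold`.

    `corpus` is a list of documents, each given as a list of concept strings.
    """
    per_concept: dict[str, list[int]] = {}
    for doc in corpus:
        for concept, count in Counter(doc).items():
            per_concept.setdefault(concept, []).append(count)
    n_docs = len(corpus)
    return {
        concept
        for concept, counts in per_concept.items()
        if normalized_entropy(counts, n_docs) >= threshold
    }


# Toy corpus: "energy" appears in every document, the other concepts do not.
toy_corpus = [
    ["energy", "quark"],
    ["energy", "graph"],
    ["energy", "graph"],
    ["energy", "lattice"],
]
flagged = generic_concepts(toy_corpus, threshold=0.9)
```

In this toy corpus only "energy" is flagged, because it is the only concept distributed uniformly over all four documents. Removing flagged concepts before building a document similarity network is the kind of filtering step the abstract describes.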
-
ScienceWISE: Topic Modeling over Scientific Literature Networks
Authors:
Andrea Martini,
Artem Lutov,
Valerio Gemmetto,
Andrii Magalich,
Alessio Cardillo,
Alex Constantin,
Vasyl Palchykov,
Mourad Khayati,
Philippe Cudré-Mauroux,
Alexey Boyarsky,
Oleg Ruchayskiy,
Diego Garlaschelli,
Paolo De Los Rios,
Karl Aberer
Abstract:
We provide an up-to-date view on the knowledge management system ScienceWISE (SW) and address issues related to the automatic assignment of articles to research topics. So far, SW has been proven to be an effective platform for managing large volumes of technical articles by means of ontological concept-based browsing. However, as the publication of research articles accelerates, the expressivity and the richness of the SW ontology turn into a double-edged sword: a more fine-grained characterization of articles is possible, but at the cost of introducing more spurious relations among them. In this context, the challenge of continuously recommending relevant articles to users lies in tackling a network partitioning problem, where nodes represent articles and co-occurring concepts create edges between them. In this paper, we discuss the three research directions we have taken for solving this issue: i) the identification of generic concepts to reinforce inter-article similarities; ii) the adoption of a bipartite network representation to improve scalability; iii) the design of a clustering algorithm to identify concepts for cross-disciplinary articles and obtain fine-grained topics for all articles.
Submitted 22 December, 2016;
originally announced December 2016.