Deep learning for inferring cause of data anomalies
Authors:
V. Azzolini,
M. Borisyak,
G. Cerminara,
D. Derkach,
G. Franzoni,
F. De Guio,
O. Koval,
M. Pierini,
A. Pol,
F. Ratnikov,
F. Siroky,
A. Ustyuzhanin,
J-R. Vlimant
Abstract:
Daily operation of a large-scale experiment is a resource consuming task, particularly from perspectives of routine data quality monitoring. Typically, data comes from different sub-detectors and the global quality of data depends on the combinatorial performance of each of them. In this paper, the problem of identifying channels in which anomalies occurred is considered. We introduce a generic de…
▽ More
Daily operation of a large-scale experiment is a resource consuming task, particularly from perspectives of routine data quality monitoring. Typically, data comes from different sub-detectors and the global quality of data depends on the combinatorial performance of each of them. In this paper, the problem of identifying channels in which anomalies occurred is considered. We introduce a generic deep learning model and prove that, under reasonable assumptions, the model learns to identify 'channels' which are affected by an anomaly. Such model could be used for data quality manager cross-check and assistance and identifying good channels in anomalous data samples. The main novelty of the method is that the model does not require ground truth labels for each channel, only global flag is used. This effectively distinguishes the model from classical classification methods. Being applied to CMS data collected in the year 2010, this approach proves its ability to decompose anomaly by separate channels.
△ Less
Submitted 19 November, 2017;
originally announced November 2017.
On Multiple Hypothesis Testing with Rejection Option
Authors:
Naira Grigoryan,
Ashot Harutyunyan,
Svyatoslav Voloshynovskiy,
Oleksiy Koval
Abstract:
We study the problem of multiple hypothesis testing (HT) in view of a rejection option. That model of HT has many different applications. Errors in testing of M hypotheses regarding the source distribution with an option of rejecting all those hypotheses are considered. The source is discrete and arbitrarily varying (AVS). The tradeoffs among error probability exponents/reliabilities associated wi…
▽ More
We study the problem of multiple hypothesis testing (HT) in view of a rejection option. That model of HT has many different applications. Errors in testing of M hypotheses regarding the source distribution with an option of rejecting all those hypotheses are considered. The source is discrete and arbitrarily varying (AVS). The tradeoffs among error probability exponents/reliabilities associated with false acceptance of rejection decision and false rejection of true distribution are investigated and the optimal decision strategies are outlined. The main result is specialized for discrete memoryless sources (DMS) and studied further. An interesting insight that the analysis implies is the phenomenon (comprehensible in terms of supervised/unsupervised learning) that in optimal discrimination within M hypothetical distributions one permits always lower error than in deciding to decline the set of hypotheses. Geometric interpretations of the optimal decision schemes are given for the current and known bounds in multi-HT for AVS's.
△ Less
Submitted 25 May, 2011; v1 submitted 17 February, 2011;
originally announced February 2011.