Explaining Classifiers Trained on Raw Hierarchical Multiple-Instance Data

Authors: Tomáš Pevný, Viliam Lisý, Branislav Bošanský, Petr Somol, Michal Pěchouček

Abstract: Learning from raw data input, thus limiting the need for feature engineering, is a component of many successful applications of machine learning methods in various domains. While many problems naturally translate into a vector representation directly usable in standard classifiers, a number of data sources have the natural form of structured data interchange formats (e.g., security logs in JSON/XML format). Existing methods, such as those in Hierarchical Multiple Instance Learning (HMIL), allow learning from such data in their raw form. However, the explanation of the classifiers trained on raw structured data remains largely unexplored. By treating these models as subset selection problems, we demonstrate how interpretable explanations, with favourable properties, can be generated using computationally efficient algorithms. We compare against an explanation technique adopted from graph neural networks, showing an order-of-magnitude speed-up and higher-quality explanations.

Submitted 4 August, 2022; originally announced August 2022.

arXiv:2110.04776 [pdf, ps, other]

Fitting large mixture models using stochastic component selection

Authors: Milan Papež, Tomáš Pevný, Václav Šmídl

Abstract: Traditional methods for unsupervised learning of finite mixture models require evaluating the likelihood of all components of the mixture. This becomes computationally prohibitive when the number of components is large, as it is, for example, in the sum-product (transform) networks. Therefore, we propose to apply a combination of the expectation maximization and the Metropolis-Hastings algorithm to evaluate only a small number of stochastically sampled components, thus substantially reducing the computational cost. The Markov chain of component assignments is sequentially generated across the algorithm's iterations, having a non-stationary target distribution whose parameters vary via a gradient-descent scheme. We put emphasis on the generality of our method, equipping it with the ability to train both shallow and deep mixture models which involve complex, and possibly nonlinear, transformations. The performance of our method is illustrated in a variety of synthetic and real-data contexts, considering deep models, such as mixtures of normalizing flows and sum-product (transform) networks.

Submitted 10 October, 2021; originally announced October 2021.

arXiv:2105.09107 [pdf, ps, other]

Mill.jl and JsonGrinder.jl: automated differentiable feature extraction for learning from raw JSON data

Authors: Simon Mandlik, Matej Racinsky, Viliam Lisy, Tomas Pevny

Abstract: Learning from raw data input, thus limiting the need for manual feature engineering, is one of the key components of many successful applications of machine learning methods. While machine learning problems are often formulated on data that naturally translate into a vector representation suitable for classifiers, there are data sources, for example in cybersecurity, that are naturally represented in diverse files with a unifying hierarchical structure, such as XML, JSON, and Protocol Buffers. Converting this data to vector (tensor) representation is generally done by manual feature engineering, which is laborious, lossy, and prone to human bias about the importance of particular features. Mill and JsonGrinder are a tandem of libraries that fully automates the conversion. Starting with an arbitrary set of JSON samples, they create a differentiable machine learning model capable of inference on further JSON samples in their raw form.

Submitted 19 May, 2021; originally announced May 2021.

Comments: 5 pages, 2 figures, 1 table, submitted to section on open-source software of Journal of Machine Learning Research

arXiv:2006.01681 [pdf, other]

Neural Power Units

Authors: Niklas Heim, Tomáš Pevný, Václav Šmídl

Abstract: Conventional Neural Networks can approximate simple arithmetic operations, but fail to generalize beyond the range of numbers that were seen during training. Neural Arithmetic Units aim to overcome this difficulty, but current arithmetic units are either limited to operate on positive numbers or can only represent a subset of arithmetic operations. We introduce the Neural Power Unit (NPU) that operates on the full domain of real numbers and is capable of learning arbitrary power functions in a single layer. The NPU thus fixes the shortcomings of existing arithmetic units and extends their expressivity. We achieve this by using complex arithmetic without requiring a conversion of the network to complex numbers. A simplification of the unit to the RealNPU yields a highly transparent model. We show that the NPUs outperform their competitors in terms of accuracy and sparsity on artificial arithmetic datasets, and that the RealNPU can discover the governing equations of a dynamical system only from data.

Submitted 17 December, 2020; v1 submitted 2 June, 2020; originally announced June 2020.
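
As a rough illustration of why complex arithmetic lets a power unit accept negative inputs, the sketch below evaluates a product of powers through the complex logarithm and keeps only the real part. It is not the NPU defined in the paper: the function name `power_unit`, the fixed (non-learned) exponents, and the restriction to nonzero inputs are all assumptions made for this example.

```python
import cmath


def power_unit(xs, ws):
    """Real part of prod_i xs[i] ** ws[i], computed as exp(sum_i w_i * log(x_i)).

    Hypothetical helper for illustration only. Inputs must be nonzero,
    because log(0) is undefined.
    """
    total = 0j
    for x, w in zip(xs, ws):
        # cmath.log accepts negative reals: log(x) = log|x| + i*pi for x < 0.
        total += w * cmath.log(complex(x, 0.0))
    return cmath.exp(total).real


# Exponents 1 and -1 turn the unit into a division; an odd integer
# exponent on a negative input keeps the sign.
quotient = power_unit([6.0, 3.0], [1.0, -1.0])
signed = power_unit([-2.0, 4.0], [3.0, 0.5])
```

With a plain real logarithm the second call would fail on the negative input; routing the computation through the complex plane and projecting back to the reals is what avoids that in this sketch.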

arXiv:2005.01297 [pdf, other]

Sum-Product-Transform Networks: Exploiting Symmetries using Invertible Transformations

Authors: Tomas Pevny, Vasek Smidl, Martin Trapp, Ondrej Polacek, Tomas Oberhuber

Abstract: In this work, we propose Sum-Product-Transform Networks (SPTN), an extension of sum-product networks that uses invertible transformations as additional internal nodes. The type and placement of transformations determine properties of the resulting SPTN with many interesting special cases. Importantly, SPTNs with Gaussian leaves and affine transformations keep tractable the same inference tasks that can be computed efficiently in SPNs. We propose to store affine transformations in their SVD decompositions using an efficient parametrization of unitary matrices by a set of Givens rotations. Last but not least, we demonstrate that G-SPTNs achieve state-of-the-art results on the density estimation task and are competitive with state-of-the-art methods for anomaly detection.

Submitted 4 May, 2020; originally announced May 2020.
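
The Givens-rotation parametrization mentioned in the abstract above can be illustrated in the real case with a small sketch: a product of plane rotations is always orthogonal, so the rotation angles act as unconstrained parameters. The helper names and the ordering of the rotations are assumptions for illustration and are not taken from the paper's implementation.

```python
import math


def identity(n):
    return [[1.0 if r == c else 0.0 for c in range(n)] for r in range(n)]


def givens(n, i, j, theta):
    """Identity matrix with a rotation by `theta` embedded in the (i, j) plane, i != j."""
    g = identity(n)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    g[i][i], g[j][j] = cos_t, cos_t
    g[i][j], g[j][i] = -sin_t, sin_t
    return g


def matmul(a, b):
    n = len(a)
    return [[sum(a[r][k] * b[k][c] for k in range(n)) for c in range(n)]
            for r in range(n)]


def orthogonal_from_angles(n, angles):
    """Multiply one Givens rotation per index pair (i, j) with i < j.

    `angles` must contain n * (n - 1) / 2 values, one per plane.
    """
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    if len(angles) != len(pairs):
        raise ValueError("expected n * (n - 1) / 2 angles")
    q = identity(n)
    for (i, j), theta in zip(pairs, angles):
        q = matmul(q, givens(n, i, j, theta))
    return q


q = orthogonal_from_angles(3, [0.3, -1.2, 2.0])
```

A product of rotations has determinant +1, so this sketch only reaches rotation matrices; covering reflections as well would need an extra sign flip.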

arXiv:2002.10923 [pdf, other]

General Framework for Binary Classification on Top Samples

Authors: Lukáš Adam, Václav Mácha, Václav Šmídl, Tomáš Pevný

Abstract: Many binary classification problems minimize misclassification above (or below) a threshold. We show that instances of ranking problems, accuracy at the top or hypothesis testing may be written in this form. We propose a general framework to handle these classes of problems and show which methods (both known and newly proposed) fall into this framework. We provide a theoretical analysis of this framework and mention selected possible pitfalls the methods may encounter. We suggest several numerical improvements including the implicit derivative and stochastic gradient descent. We provide an extensive numerical study. Based both on the theoretical properties and numerical experiments, we conclude the paper by suggesting which method should be used in which situation.

Submitted 25 February, 2020; originally announced February 2020.

MSC Class: 90C15; 90C26; 49M05

arXiv:1912.00656 [pdf, other]

Rodent: Relevance determination in differential equations

Authors: Niklas Heim, Václav Šmídl, Tomáš Pevný

Abstract: We aim to identify the generating ordinary differential equation (ODE) from a set of trajectories of a partially observed system. Our approach does not need prescribed basis functions to learn the ODE model, but only a rich set of Neural Arithmetic Units. For maximal explainability of the learnt model, we minimise the state size of the ODE as well as the number of non-zero parameters that are needed to solve the problem. This sparsification is realized through a combination of the Variational Auto-Encoder (VAE) and Automatic Relevance Determination (ARD). We show that it is possible to learn not only one specific model for a single process, but a manifold of models representing harmonic signals as well as a manifold of Lotka-Volterra systems.

Submitted 12 March, 2020; v1 submitted 2 December, 2019; originally announced December 2019.

arXiv:1911.08756 [pdf, other]

Classification with Costly Features in Hierarchical Deep Sets

Authors: Jaromír Janisch, Tomáš Pevný, Viliam Lisý

Abstract: Classification with Costly Features (CwCF) is a classification problem that includes the cost of features in the optimization criteria. Individually for each sample, its features are sequentially acquired to maximize accuracy while minimizing the acquired features' cost. However, existing approaches can only process data that can be expressed as vectors of fixed length. In real life, the data often possesses rich and complex structure, which can be more precisely described with formats such as XML or JSON. The data is hierarchical and often contains nested lists of objects. In this work, we extend an existing deep reinforcement learning-based algorithm with hierarchical deep sets and hierarchical softmax, so that it can directly process this data. The extended method has greater control over which features it can acquire and, in experiments with seven datasets, we show that this leads to superior performance. To showcase the real usage of the new method, we apply it to a real-life problem of classifying malicious web domains, using an online service.

Submitted 29 February, 2024; v1 submitted 20 November, 2019; originally announced November 2019.

Comments: formerly Hierarchical Multiple-Instance Data Classification with Costly Features; RL4RealLife @ ICML2021; code available at https://github.com/jaromiru/rcwcf

arXiv:1909.02564 [pdf, other]

doi 10.1007/s10994-020-05874-8

Classification with Costly Features as a Sequential Decision-Making Problem

Authors: Jaromír Janisch, Tomáš Pevný, Viliam Lisý

Abstract: This work focuses on a specific classification problem, where the information about a sample is not readily available, but has to be acquired for a cost, and there is a per-sample budget. Inspired by real-world use-cases, we analyze average and hard variations of a directly specified budget. We postulate the problem in its explicit formulation and then convert it into an equivalent MDP that can be solved with deep reinforcement learning. Also, we evaluate a real-world inspired setting with a sparse training dataset with missing features. The presented method performs robustly well in all settings across several distinct datasets, outperforming other prior-art algorithms. The method is flexible, as showcased with all mentioned modifications, and can be improved with any domain-independent advancement in RL.

Submitted 5 September, 2019; originally announced September 2019.

Journal ref: Machine Learning (2020): 1-29

arXiv:1906.09084 [pdf, other]

doi 10.1007/s10994-019-05789-z

Joint Detection of Malicious Domains and Infected Clients

Authors: Paul Prasse, Rene Knaebel, Lukas Machlica, Tomas Pevny, Tobias Scheffer

Abstract: Detection of malware-infected computers and detection of malicious web domains based on their encrypted HTTPS traffic are challenging problems, because only addresses, timestamps, and data volumes are observable. The detection problems are coupled, because infected clients tend to interact with malicious domains. Traffic data can be collected at a large scale, and antivirus tools can be used to identify infected clients in retrospect. Domains, by contrast, have to be labeled individually after forensic analysis. We explore transfer learning based on sluice networks; this allows the detection models to bootstrap each other. In a large-scale experimental study, we find that the model outperforms known reference models and detects previously unknown malware, previously unknown malware families, and previously unknown malicious domains.

Submitted 21 June, 2019; originally announced June 2019.

Comments: Mach Learn (2019)

arXiv:1906.00764 [pdf, ps, other]

Approximation capability of neural networks on spaces of probability measures and tree-structured domains

Authors: Tomas Pevny, Vojtech Kovarik

Abstract: This paper extends the proof of density of neural networks in the space of continuous (or even measurable) functions on Euclidean spaces to functions on compact sets of probability measures. By doing so, the work parallels more than decade-old results on mean-map embedding of probability measures in reproducing kernel Hilbert spaces. The work has wide practical consequences for multi-instance learning, where it theoretically justifies some recently proposed constructions. The result is then extended to Cartesian products, yielding a universal approximation theorem for tree-structured domains, which naturally occur in data-exchange formats like JSON, XML, YAML, AVRO, and ProtoBuffer. This has important practical implications, as it enables automatic creation of neural network architectures for processing structured data (AutoML paradigms), as demonstrated by an accompanying library for the JSON format.

Submitted 3 June, 2019; originally announced June 2019.

arXiv:1905.11890 [pdf, other]

Anomaly scores for generative models

Authors: Václav Šmídl, Jan Bím, Tomáš Pevný

Abstract: Reconstruction error is a prevalent score used to identify anomalous samples when data are modeled by generative models, such as (variational) auto-encoders or generative adversarial networks. This score relies on the assumption that normal samples are located on a manifold and all anomalous samples are located outside. Since the manifold can be learned only where the training data lie, there are no guarantees on how the reconstruction error behaves elsewhere and the score, therefore, seems to be ill-defined. This work defines an anomaly score that is theoretically compatible with generative models, and very natural for (variational) auto-encoders as they seem to be prevalent. The new score can also be used to select hyper-parameters and models. Finally, we explain why reconstruction error delivers good experimental results despite weak theoretical justification.

Submitted 28 May, 2019; originally announced May 2019.

Comments: 9 pages, 3 figures, submitted to NeurIPS 2019

arXiv:1807.05027 [pdf, ps, other]

Are generative deep models for novelty detection truly better?

Authors: Vít Škvára, Tomáš Pevný, Václav Šmídl

Abstract: Many deep models have been recently proposed for anomaly detection. This paper presents a comparison of selected generative deep models and classical anomaly detection methods on an extensive number of non-image benchmark datasets. We provide a statistical comparison of the selected models, in many configurations, architectures and hyperparameters. We arrive at the conclusion that the performance of the generative models is determined by the process of selection of their hyperparameters. Specifically, the performance of the deep generative models deteriorates with a decreasing number of anomalous samples used in hyperparameter selection. In practical scenarios of anomaly detection, none of the deep generative models systematically outperforms the kNN.

Submitted 13 July, 2018; originally announced July 2018.

Comments: 7 pages, ODD v5.0 - KDD 2018 workshop
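
For readers unfamiliar with the kNN baseline named in the abstract above, a minimal sketch follows. The abstract does not say which kNN variant was evaluated, so scoring a point by the distance to its k-th nearest training sample is an assumption here (it is one common choice), and the function name is hypothetical.

```python
import math


def knn_anomaly_score(train, query, k=3):
    """Distance from `query` to its k-th nearest training point.

    Larger values mean the query lies further from the training data
    and is therefore treated as more anomalous.
    """
    if not 1 <= k <= len(train):
        raise ValueError("k must be between 1 and the number of training points")
    dists = sorted(math.dist(point, query) for point in train)
    return dists[k - 1]


# Four "normal" points on a unit square; one query inside, one far away.
train = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
inside = knn_anomaly_score(train, (0.5, 0.5), k=1)
outside = knn_anomaly_score(train, (10.0, 10.0), k=1)
```

The only hyperparameter in this sketch is `k`, which is part of why such a baseline is comparatively insensitive to the hyperparameter-selection issue the abstract raises.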

arXiv:1807.00173 [pdf, other]

Algorithms for solving optimization problems arising from deep neural net models: nonsmooth problems

Authors: Vyacheslav Kungurtsev, Tomas Pevny

Abstract: Machine Learning models incorporating multiple layered learning networks have been seen to provide effective models for various classification problems. The resulting optimization problem to solve for the optimal vector minimizing the empirical risk is, however, highly nonconvex. This alone presents a challenge to application and development of appropriate optimization algorithms for solving the problem. However, in addition, there are a number of interesting problems for which the objective function is nonsmooth and nonseparable. In this paper, we summarize the primary challenges involved, the state of the art, and present some numerical results on an interesting and representative class of problems.

Submitted 30 June, 2018; originally announced July 2018.

Report number: Cisco Prague WP5 Report 2016-02

arXiv:1807.00172 [pdf, other]

Algorithms for solving optimization problems arising from deep neural net models: smooth problems

Authors: Vyacheslav Kungurtsev, Tomas Pevny

Abstract: Machine Learning models incorporating multiple layered learning networks have been seen to provide effective models for various classification problems. The resulting optimization problem to solve for the optimal vector minimizing the empirical risk is, however, highly nonlinear. This presents a challenge to application and development of appropriate optimization algorithms for solving the problem. In this paper, we summarize the primary challenges involved and present the case for a Newton-based method incorporating directions of negative curvature, including promising numerical results on data arising from security anomaly detection.

Submitted 30 June, 2018; originally announced July 2018.

Report number: Cisco Prague WP5 Project Report 2016-01

arXiv:1711.07364 [pdf, other]

Classification with Costly Features using Deep Reinforcement Learning

Authors: Jaromír Janisch, Tomáš Pevný, Viliam Lisý

Abstract: We study a classification problem where each feature can be acquired for a cost and the goal is to optimize a trade-off between the expected classification error and the feature cost. We revisit a former approach that has framed the problem as a sequential decision-making problem and solved it by Q-learning with a linear approximation, where individual actions are either requests for feature values or terminate the episode by providing a classification decision. On a set of eight problems, we demonstrate that by replacing the linear approximation with neural networks the approach becomes comparable to the state-of-the-art algorithms developed specifically for this problem. The approach is flexible: it can be improved with any new reinforcement learning enhancement, it allows inclusion of a pre-trained high-performance classifier, and, unlike prior art, its performance is robust across all evaluated datasets.

Submitted 12 November, 2018; v1 submitted 20 November, 2017; originally announced November 2017.

Comments: AAAI 2019
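
The sequential decision-making framing in the abstract above (actions either request a feature value or end the episode with a classification) can be sketched as a tiny single-sample environment. The class name, the reward values (a cost penalty scaled by `lam` for each acquisition, 0 for a correct and -1 for a wrong classification), and the use of `None` for hidden features are assumptions for illustration, not details confirmed by the abstract.

```python
class CostlyFeatureEpisode:
    """One episode for one sample: acquire features one by one, then classify."""

    def __init__(self, features, label, costs, lam=1.0):
        self.features = list(features)
        self.label = label
        self.costs = list(costs)
        self.lam = lam  # trade-off between feature cost and classification error
        self.acquired = [False] * len(self.features)
        self.done = False

    def observe(self):
        # Features that have not been paid for stay hidden as None.
        return [x if seen else None
                for x, seen in zip(self.features, self.acquired)]

    def acquire(self, index):
        """Reveal one feature; the reward is the negative scaled cost."""
        if self.done:
            raise RuntimeError("episode already finished")
        if self.acquired[index]:
            raise ValueError("feature already acquired")
        self.acquired[index] = True
        return -self.lam * self.costs[index]

    def classify(self, predicted):
        """Terminate the episode; reward 0 if correct, -1 otherwise (assumed)."""
        if self.done:
            raise RuntimeError("episode already finished")
        self.done = True
        return 0.0 if predicted == self.label else -1.0


episode = CostlyFeatureEpisode([0.2, 0.9], label=1, costs=[1.0, 2.0], lam=0.5)
total = episode.acquire(1) + episode.classify(1)
```

An agent trained on such episodes learns a per-sample policy: it keeps buying features only while the expected gain in accuracy outweighs the scaled cost.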

arXiv:1609.07257 [pdf, other]

Using Neural Network Formalism to Solve Multiple-Instance Problems

Authors: Tomas Pevny, Petr Somol

Abstract: Many objects in the real world are difficult to describe by a single numerical vector of a fixed length, whereas describing them by a set of vectors is more natural. Therefore, Multiple Instance Learning (MIL) techniques have been steadily gaining in importance in recent years. The MIL formalism represents each object (sample) by a set (bag) of feature vectors (instances) of fixed length, where knowledge about objects (e.g., class label) is available on bag level but not necessarily on instance level. Many standard tools including supervised classifiers have already been adapted to the MIL setting since the problem was formalized in the late nineties. In this work we propose a neural network (NN) based formalism that intuitively bridges the gap between the MIL problem definition and the vast existing knowledge-base of standard models and classifiers. We show that the proposed NN formalism is effectively optimizable by a modified back-propagation algorithm and can reveal unknown patterns inside bags. Comparison to eight types of classifiers from the prior art on a set of 14 publicly available benchmark datasets confirms the advantages and accuracy of the proposed solution.

Submitted 7 March, 2017; v1 submitted 23 September, 2016; originally announced September 2016.

Comments: Accepted to International Symposium on Neural Networks
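
The bag-of-instances representation described in the abstract above can be made concrete with a small forward-pass sketch: embed every instance with a shared layer, pool over the bag, then apply a bag-level layer. Mean pooling, the ReLU nonlinearity, the single-layer sizes, and all function names are assumptions chosen for brevity; the abstract does not specify these details.

```python
def relu(vector):
    return [max(0.0, x) for x in vector]


def affine(weights, bias, vector):
    """Dense layer: one output per weight row."""
    return [sum(w * x for w, x in zip(row, vector)) + b
            for row, b in zip(weights, bias)]


def bag_forward(bag, inst_w, inst_b, bag_w, bag_b):
    """Embed each instance, mean-pool across the bag, apply a bag-level layer.

    The pooling step makes the output independent of instance order and
    lets bags contain any positive number of instances.
    """
    if not bag:
        raise ValueError("a bag needs at least one instance")
    embedded = [relu(affine(inst_w, inst_b, instance)) for instance in bag]
    pooled = [sum(column) / len(embedded) for column in zip(*embedded)]
    return affine(bag_w, bag_b, pooled)


inst_w, inst_b = [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]
bag_w, bag_b = [[1.0, 1.0]], [0.0]
bag = [[1.0, 2.0], [3.0, -4.0]]
score = bag_forward(bag, inst_w, inst_b, bag_w, bag_b)
```

Because the label is attached to the bag output only, training such a model needs no instance-level labels, which matches the MIL setting the abstract describes.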

Showing 1–17 of 17 results for author: Pevny, T