Search | arXiv e-print repository

doi 10.1145/3596907

Federated Learning for Computationally-Constrained Heterogeneous Devices: A Survey

Authors: Kilian Pfeiffer, Martin Rapp, Ramin Khalili, Jörg Henkel

Abstract: With an increasing number of smart devices like internet of things (IoT) devices deployed in the field, offloading training of neural networks (NNs) to a central server becomes more and more infeasible. Recent efforts to improve users' privacy have led to on-device learning emerging as an alternative. However, a model trained only on a single device, using only local data, is unlikely to reach a high accuracy. Federated learning (FL) has been introduced as a solution, offering a privacy-preserving trade-off between communication overhead and model accuracy by sharing knowledge between devices without disclosing the devices' private data. The applicability and the benefit of applying baseline FL are, however, limited in many relevant use cases due to the heterogeneity present in such environments. In this survey, we outline the heterogeneity challenges FL has to overcome to be widely applicable in real-world applications. We especially focus on the aspect of computation heterogeneity among the participating devices and provide a comprehensive overview of recent works on heterogeneity-aware FL. We discuss two groups: works that adapt the NN architecture and works that approach heterogeneity on a system level, covering Federated Averaging (FedAvg), distillation, and split learning-based approaches, as well as synchronous and asynchronous aggregation schemes.

Submitted 18 July, 2023; originally announced July 2023.

Journal ref: ACM Comput. Surv. 55, 14s, Article 334, 2023

arXiv:2302.00985 [pdf, other]

Speed-Oblivious Online Scheduling: Knowing (Precise) Speeds is not Necessary

Authors: Alexander Lindermayr, Nicole Megow, Martin Rapp

Abstract: We consider online scheduling on unrelated (heterogeneous) machines in a speed-oblivious setting, where an algorithm is unaware of the exact job-dependent processing speeds. We show strong impossibility results for clairvoyant and non-clairvoyant algorithms and overcome them in models inspired by practical settings: (i) we provide competitive learning-augmented algorithms, assuming that (possibly erroneous) predictions on the speeds are given, and (ii) we provide competitive algorithms for the speed-ordered model, where a single global order of machines according to their unknown job-dependent speeds is known. We prove strong theoretical guarantees and evaluate our findings on a representative heterogeneous multi-core processor. These seem to be the first empirical results for scheduling algorithms with predictions that are evaluated in a non-synthetic hardware environment.

Submitted 30 May, 2023; v1 submitted 2 February, 2023; originally announced February 2023.

Comments: To appear at ICML 2023

arXiv:2206.05459 [pdf, other]

NPU-Accelerated Imitation Learning for Thermal Optimization of QoS-Constrained Heterogeneous Multi-Cores

Authors: Martin Rapp, Heba Khdr, Nikita Krohmer, Jörg Henkel

Abstract: Application migration and dynamic voltage and frequency scaling (DVFS) are indispensable means for fully exploiting the available potential in thermal optimization of a heterogeneous clustered multi-core processor under user-defined quality of service (QoS) targets. However, selecting the core to execute each application and the voltage/frequency (V/f) levels of each cluster is a complex problem because 1) the diverse characteristics and QoS targets of applications require different optimizations, and 2) per-cluster DVFS requires a global optimization considering all running applications. State-of-the-art resource management techniques for power or temperature minimization either rely on measurements that are often not available (such as power) or fail to consider all the dimensions of the problem (e.g., by using simplified analytical models). Imitation learning (IL) makes it possible to exploit the optimality of an oracle policy at low run-time overhead by training a model from oracle demonstrations. We are the first to employ IL for temperature minimization under QoS targets. We tackle the complexity by training a neural network (NN) and accelerate the NN inference using a neural processing unit (NPU). While such NN accelerators are becoming increasingly widespread on end devices, they are so far only used to accelerate user applications. In contrast, we use an existing accelerator on a real platform to accelerate NN-based resource management. Our evaluation on a HiKey 970 board with an Arm big.LITTLE CPU and an NPU shows significant temperature reductions at a negligible run-time overhead, with unseen applications and different cooling than used for training.

Submitted 11 June, 2022; originally announced June 2022.

arXiv:2203.09277 [pdf, other]

doi 10.1145/3491204.3527482

Beauty and the beast: A case study on performance prototyping of data-intensive containerized cloud applications

Authors: Floriment Klinaku, Martina Rapp, Jörg Henss, Stephan Rhode

Abstract: Data-intensive container-based cloud applications have become popular with the increased use cases in the Internet of Things domain. Challenges arise when engineering such applications to meet quality requirements, both classical ones like performance and emerging ones like elasticity and resilience. There is a lack of reference use cases, applications, and experiences when prototyping such applications that could benefit the research community. Moreover, it is hard to generate realistic and reliable workloads that exercise the resources according to a specification. Hence, designing reference applications that would exhibit similar performance behavior in such environments is hard. In this paper, we present a work in progress towards a reference use case and application for data-intensive containerized cloud applications having an industrial motivation. Moreover, to generate reliable CPU workloads we make use of ProtoCom, a well-known library for the generation of resource demands, and report the performance under various quality requirements in a Kubernetes cluster of moderate size. Finally, we present the scalability of the current solution assuming a particular autoscaling policy. Results of the calibration show high variability of the ProtoCom library when executed in a cloud environment. We observe a moderate association between the occupancy of a node and the relative variability of execution time.

Submitted 17 March, 2022; originally announced March 2022.

arXiv:2203.05468 [pdf, other]

CoCoFL: Communication- and Computation-Aware Federated Learning via Partial NN Freezing and Quantization

Authors: Kilian Pfeiffer, Martin Rapp, Ramin Khalili, Jörg Henkel

Abstract: Devices participating in federated learning (FL) typically have heterogeneous communication, computation, and memory resources. However, in synchronous FL, all devices need to finish training by the same deadline dictated by the server. Our results show that training a smaller subset of the neural network (NN) at constrained devices, i.e., dropping neurons/filters as proposed by state of the art, is inefficient, preventing these devices from making an effective contribution to the model. This causes unfairness w.r.t. the achievable accuracies of constrained devices, especially in cases with a skewed distribution of class labels across devices. We present a novel FL technique, CoCoFL, which maintains the full NN structure on all devices. To adapt to the devices' heterogeneous resources, CoCoFL freezes and quantizes selected layers, reducing communication, computation, and memory requirements, whereas other layers are still trained in full precision, making it possible to reach a high accuracy. Thereby, CoCoFL efficiently utilizes the available resources on devices and allows constrained devices to make a significant contribution to the FL system, increasing fairness among participants (accuracy parity) and significantly improving the final accuracy of the model.

Submitted 28 June, 2023; v1 submitted 10 March, 2022; originally announced March 2022.

Comments: Published at TMLR

Journal ref: Transactions on Machine Learning Research, 06/2023

arXiv:2112.08761 [pdf, other]

DISTREAL: Distributed Resource-Aware Learning in Heterogeneous Systems

Authors: Martin Rapp, Ramin Khalili, Kilian Pfeiffer, Jörg Henkel

Abstract: We study the problem of distributed training of neural networks (NNs) on devices with heterogeneous, limited, and time-varying availability of computational resources. We present an adaptive, resource-aware, on-device learning mechanism, DISTREAL, which is able to fully and efficiently utilize the available resources on devices in a distributed manner, increasing the convergence speed. This is achieved with a dropout mechanism that dynamically adjusts the computational complexity of training an NN by randomly dropping filters of convolutional layers of the model. Our main contribution is the introduction of a design space exploration (DSE) technique, which finds Pareto-optimal per-layer dropout vectors with respect to resource requirements and convergence speed of the training. Applying this technique, each device is able to dynamically select the dropout vector that fits its available resource without requiring any assistance from the server. We implement our solution in a federated learning (FL) system, where the availability of computational resources varies both between devices and over time, and show through extensive evaluation that we are able to significantly increase the convergence speed over the state of the art without compromising on the final accuracy.

Submitted 4 April, 2022; v1 submitted 16 December, 2021; originally announced December 2021.

Comments: published in AAAI Conference on Artificial Intelligence (AAAI'22)

arXiv:2110.09208 [pdf, other]

Correlation-based Discovery of Disease Patterns for Syndromic Surveillance

Authors: Michael Rapp, Moritz Kulessa, Eneldo Loza Mencía, Johannes Fürnkranz

Abstract: Early outbreak detection is a key aspect in the containment of infectious diseases, as it enables the identification and isolation of infected individuals before the disease can spread to a larger population. Instead of detecting unexpected increases of infections by monitoring confirmed cases, syndromic surveillance aims at the detection of cases with early symptoms, which allows a more timely disclosure of outbreaks. However, the definition of these disease patterns is often challenging, as early symptoms are usually shared among many diseases and a particular disease can have several clinical pictures in the early phase of an infection. To support epidemiologists in the process of defining reliable disease patterns, we present a novel, data-driven approach to discover such patterns in historic data. The key idea is to take into account the correlation between indicators in a health-related data source and the reported number of infections in the respective geographic region. In an experimental evaluation, we use data from several emergency departments to discover disease patterns for three infectious diseases. Our results suggest that the proposed approach is able to find patterns that correlate with the reported infections and often identifies indicators that are related to the respective diseases.

Submitted 18 October, 2021; originally announced October 2021.

arXiv:2109.12405 [pdf, other]

CoMeT: An Integrated Interval Thermal Simulation Toolchain for 2D, 2.5D, and 3D Processor-Memory Systems

Authors: Lokesh Siddhu, Rajesh Kedia, Shailja Pandey, Martin Rapp, Anuj Pathania, Jörg Henkel, Preeti Ranjan Panda

Abstract: Processing cores and the accompanying main memory working in tandem enable modern processors. Dissipating the heat produced by computation and memory access remains a significant problem for processors. Therefore, processor thermal management continues to be an active research topic. Most thermal management research takes place using simulations, given the challenges of measuring temperature in real processors. Since core and memory are fabricated on separate packages in most existing processors, with the memory having lower power densities, thermal management research in processors has primarily focused on the cores. Memory bandwidth limitations associated with 2D processors lead to high-density 2.5D and 3D packaging technology. 2.5D packaging places cores and memory on the same package. 3D packaging technology takes it further by stacking layers of memory on top of the cores themselves. Such packaging significantly increases the power density, making processors prone to heating. Therefore, mitigating thermal issues in high-density processors (packaged with stacked memory) becomes an even more pressing problem. However, given the lack of thermal modeling for memories in existing interval thermal simulation toolchains, they are unsuitable for studying thermal management for high-density processors. To address this issue, we present CoMeT, the first integrated Core and Memory interval Thermal simulation toolchain. CoMeT comprehensively supports thermal simulation of high- and low-density processors corresponding to four different core-memory configurations: off-chip DDR memory, off-chip 3D memory, 2.5D, and 3D. CoMeT supports several novel features that facilitate overlying system research. Compared to an equivalent state-of-the-art core-only toolchain, CoMeT adds only a ~5% simulation-time overhead. The source code of CoMeT has been made open for public use under the MIT license.

Submitted 16 March, 2022; v1 submitted 25 September, 2021; originally announced September 2021.

Comments: https://github.com/marg-tools/CoMeT

arXiv:2106.11690 [pdf, other]

Gradient-based Label Binning in Multi-label Classification

Authors: Michael Rapp, Eneldo Loza Mencía, Johannes Fürnkranz, Eyke Hüllermeier

Abstract: In multi-label classification, where a single example may be associated with several class labels at the same time, the ability to model dependencies between labels is considered crucial to effectively optimize non-decomposable evaluation measures, such as the Subset 0/1 loss. The gradient boosting framework provides a well-studied foundation for learning models that are specifically tailored to such a loss function and recent research attests the ability to achieve high predictive accuracy in the multi-label setting. The utilization of second-order derivatives, as used by many recent boosting approaches, helps to guide the minimization of non-decomposable losses, due to the information about pairs of labels it incorporates into the optimization process. On the downside, this comes with high computational costs, even if the number of labels is small. In this work, we address the computational bottleneck of such an approach -- the need to solve a system of linear equations -- by integrating a novel approximation technique into the boosting procedure. Based on the derivatives computed during training, we dynamically group the labels into a predefined number of bins to impose an upper bound on the dimensionality of the linear system. Our experiments, using an existing rule-based algorithm, suggest that this may boost the speed of training, without any significant loss in predictive performance.

Submitted 22 June, 2021; originally announced June 2021.

arXiv:2012.04377 [pdf, ps, other]

Learning Structured Declarative Rule Sets -- A Challenge for Deep Discrete Learning

Authors: Johannes Fürnkranz, Eyke Hüllermeier, Eneldo Loza Mencía, Michael Rapp

Abstract: Arguably the key reason for the success of deep neural networks is their ability to autonomously form non-linear combinations of the input features, which can be used in subsequent layers of the network. The analogon to this capability in inductive rule learning is to learn a structured rule base, where the inputs are combined to learn new auxiliary concepts, which can then be used as inputs by subsequent rules. Yet, research on rule learning algorithms that have such capabilities is still in its infancy, which is - we would argue - one of the key impediments to substantial progress in this field. In this position paper, we want to draw attention to this unsolved problem, with a particular focus on previous work in predicate invention and multi-label rule learning.

Submitted 8 December, 2020; originally announced December 2020.

Journal ref: 2nd Workshop on Deep Continuous-Discrete Machine Learning (DeCoDeML), ECML-PKDD 2020, Ghent, Belgium

arXiv:2011.07909 [pdf, other]

doi 10.1038/s41378-021-00253-2

Integrated impedance sensing of liquid sample plug flow enables automated high throughput NMR spectroscopy

Authors: Omar Nassar, Mazin Jouda, Michael Rapp, Dario Mager, Jan G. Korvink, Neil MacKinnon

Abstract: A novel approach for automated high throughput NMR spectroscopy with improved mass-sensitivity is accomplished by integrating microfluidic technologies and micro-NMR resonators. A flow system is utilized to transport a sample of interest from outside the NMR magnet through the NMR detector, circumventing the relatively vast dead volume in the supplying tube by loading a series of individual sample plugs separated by an immiscible fluid. This dual-phase flow demands a real-time robust sensing system to track the sample position and velocities and synchronize the NMR acquisition. In this contribution, we describe an NMR probe head that possesses a microfluidic system featuring: i) a micro saddle coil for NMR spectroscopy and ii) a pair of interdigitated capacitive sensors flanking the NMR detector for continuous position and velocity monitoring of the plugs with respect to the NMR detector. The system was successfully tested for automating flow-based measurement in a 500 MHz NMR system, enabling high resolution spectroscopy and NMR sensitivity of 2.18 $\mathrm{nmol}\ \mathrm{s}^{1/2}$ with the flow sensors in operation. The flow sensors featured sensitivity to an absolute difference of 0.2 in relative permittivity, enabling distinction between most common solvents. It was demonstrated that a fully automated NMR measurement of nine individual 120 $\mu$L samples could be done within 3.6 min or effectively 15.3 s per sample.

Submitted 16 November, 2020; originally announced November 2020.

Comments: Main Manuscript (22 pages, 8 figures) Supplementary (4 pages, 7 figures)

arXiv:2011.00792 [pdf, other]

A Flexible Class of Dependence-aware Multi-Label Loss Functions

Authors: Eyke Hüllermeier, Marcel Wever, Eneldo Loza Mencia, Johannes Fürnkranz, Michael Rapp

Abstract: Multi-label classification is the task of assigning a subset of labels to a given query instance. For evaluating such predictions, the set of predicted labels needs to be compared to the ground-truth label set associated with that instance, and various loss functions have been proposed for this purpose. In addition to assessing predictive accuracy, a key concern in this regard is to foster and to analyze a learner's ability to capture label dependencies. In this paper, we introduce a new class of loss functions for multi-label classification, which overcome disadvantages of commonly used losses such as Hamming and subset 0/1. To this end, we leverage the mathematical framework of non-additive measures and integrals. Roughly speaking, a non-additive measure allows for modeling the importance of correct predictions of label subsets (instead of single labels), and thereby their impact on the overall evaluation, in a flexible way - by giving full importance to single labels and the entire label set, respectively, Hamming and subset 0/1 are rather extreme in this regard. We present concrete instantiations of this class, which comprise Hamming and subset 0/1 as special cases, and which appear to be especially appealing from a modeling perspective. The assessment of multi-label classifiers in terms of these losses is illustrated in an empirical study.
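
Note: the two extremes named in this abstract can be made concrete with a small sketch. The code below is not taken from the paper; it only uses the standard textbook definitions of Hamming loss (fraction of individual labels that are wrong) and subset 0/1 loss (1 unless the whole label vector matches). The function names and the example vectors are illustrative assumptions.

```python
from typing import Sequence


def hamming_loss(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of individual labels that are predicted incorrectly."""
    if len(y_true) != len(y_pred):
        raise ValueError("label vectors must have the same length")
    mistakes = sum(t != p for t, p in zip(y_true, y_pred))
    return mistakes / len(y_true)


def subset_zero_one_loss(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """1.0 unless the entire predicted label vector matches the ground truth."""
    if len(y_true) != len(y_pred):
        raise ValueError("label vectors must have the same length")
    return 0.0 if list(y_true) == list(y_pred) else 1.0


# Hypothetical example: one of four labels is wrong.
y_true = [1, 0, 1, 0]
y_pred = [1, 0, 0, 0]
print(hamming_loss(y_true, y_pred))          # 0.25: only single labels count
print(subset_zero_one_loss(y_true, y_pred))  # 1.0: only the full set counts
```

A single wrong label costs a quarter of the maximum under Hamming loss but the full penalty under subset 0/1 loss, which is the sense in which the two measures sit at opposite ends of the spectrum.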

Submitted 2 November, 2020; originally announced November 2020.

arXiv:2007.00732 [pdf, other]

Context Graphs for Legal Reasoning and Argumentation

Authors: Max Rapp, Axel Adrian, Michael Kohlhase

Abstract: We propose a new, structured, logic-based framework for legal reasoning and argumentation: Instead of using a single, unstructured meaning space, theory graphs organize knowledge and inference into collections of modular meaning spaces organized by inheritance and interpretation. Context graphs extend theory graphs by attack relations and interpret theories as knowledge contexts of agents in argumentation. We introduce the context graph paradigm by modeling the well-studied case Popov v. Hayashi, concentrating on the role of analogical reasoning in context graphs.

Submitted 1 July, 2020; originally announced July 2020.

arXiv:2006.13346 [pdf, other]

Learning Gradient Boosted Multi-label Classification Rules

Authors: Michael Rapp, Eneldo Loza Mencía, Johannes Fürnkranz, Vu-Linh Nguyen, Eyke Hüllermeier

Abstract: In multi-label classification, where the evaluation of predictions is less straightforward than in single-label classification, various meaningful, though different, loss functions have been proposed. Ideally, the learning algorithm should be customizable towards a specific choice of the performance measure. Modern implementations of boosting, most prominently gradient boosted decision trees, appear to be appealing from this point of view. However, they are mostly limited to single-label classification, and hence not amenable to multi-label losses unless these are label-wise decomposable. In this work, we develop a generalization of the gradient boosting framework to multi-output problems and propose an algorithm for learning multi-label classification rules that is able to minimize decomposable as well as non-decomposable loss functions. Using the well-known Hamming loss and subset 0/1 loss as representatives, we analyze the abilities and limitations of our approach on synthetic data and evaluate its predictive performance on multi-label benchmarks.

Submitted 23 June, 2020; originally announced June 2020.

arXiv:2006.11916 [pdf, other]

doi 10.1007/978-3-030-61527-7_35

On Aggregation in Ensembles of Multilabel Classifiers

Authors: Vu-Linh Nguyen, Eyke Hüllermeier, Michael Rapp, Eneldo Loza Mencía, Johannes Fürnkranz

Abstract: While a variety of ensemble methods for multilabel classification have been proposed in the literature, the question of how to aggregate the predictions of the individual members of the ensemble has received little attention so far. In this paper, we introduce a formal framework of ensemble multilabel classification, in which we distinguish two principal approaches: "predict then combine" (PTC), where the ensemble members first make loss minimizing predictions which are subsequently combined, and "combine then predict" (CTP), which first aggregates information such as marginal label probabilities from the individual ensemble members, and then derives a prediction from this aggregation. While both approaches generalize voting techniques commonly used for multilabel ensembles, they allow the target performance measure to be taken into account explicitly. Therefore, concrete instantiations of CTP and PTC can be tailored to concrete loss functions. Experimentally, we show that standard voting techniques are indeed outperformed by suitable instantiations of CTP and PTC, and provide some evidence that CTP performs well for decomposable loss functions, whereas PTC is the better choice for non-decomposable losses.
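
Note: the difference between the two aggregation orders described in this abstract can be illustrated with a toy sketch. This is not the paper's method; it assumes one simple instantiation of each idea (per-member thresholding at 0.5 followed by label-wise majority vote for PTC, and averaging marginal label probabilities followed by thresholding at 0.5 for CTP). The function names, the threshold, and the probabilities are hypothetical.

```python
from typing import List


def predict_then_combine(member_probs: List[List[float]],
                         threshold: float = 0.5) -> List[int]:
    """PTC sketch: each member predicts first, then a label-wise majority vote."""
    n_members = len(member_probs)
    # Step 1: every member turns its marginal probabilities into a 0/1 prediction.
    member_preds = [[1 if p > threshold else 0 for p in probs]
                    for probs in member_probs]
    # Step 2: a label is predicted if more than half of the members voted for it.
    return [1 if 2 * sum(votes) > n_members else 0
            for votes in zip(*member_preds)]


def combine_then_predict(member_probs: List[List[float]],
                         threshold: float = 0.5) -> List[int]:
    """CTP sketch: average the marginal probabilities first, then predict."""
    n_members = len(member_probs)
    return [1 if sum(col) / n_members > threshold else 0
            for col in zip(*member_probs)]


# Hypothetical marginal label probabilities from three ensemble members
# (rows) for two labels (columns).
probs = [
    [0.90, 0.40],
    [0.45, 0.45],
    [0.45, 0.90],
]
print(predict_then_combine(probs))  # [0, 0]
print(combine_then_predict(probs))  # [1, 1]
```

In this constructed example the two orders disagree: voting discards how confident each member was, while averaging lets one very confident member outweigh two mildly doubtful ones.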

Submitted 21 June, 2020; originally announced June 2020.

Comments: 14 pages, 2 figures

Journal ref: Proc. DS 2020: 533-547

arXiv:2006.05403 [pdf, other]

Distributed Learning on Heterogeneous Resource-Constrained Devices

Authors: Martin Rapp, Ramin Khalili, Jörg Henkel

Abstract: We consider a distributed system, consisting of a heterogeneous set of devices, ranging from low-end to high-end. These devices have different profiles, e.g., different energy budgets, or different hardware specifications, determining their capabilities on performing certain learning tasks. We propose the first approach that enables distributed learning in such a heterogeneous system. Applying our approach, each device employs a neural network (NN) with a topology that fits its capabilities; however, part of these NNs share the same topology, so that their parameters can be jointly learned. This differs from current approaches, such as federated learning, which require all devices to employ the same NN, enforcing a trade-off between achievable accuracy and computational overhead of training. We evaluate heterogeneous distributed learning for reinforcement learning (RL) and observe that it greatly improves the achievable reward on more powerful devices, compared to current approaches, while still maintaining a high reward on the weaker devices. We also explore supervised learning, observing similar gains.

Submitted 9 June, 2020; originally announced June 2020.

arXiv:1911.04393 [pdf, other]

Simplifying Random Forests: On the Trade-off between Interpretability and Accuracy

Authors: Michael Rapp, Eneldo Loza Mencía, Johannes Fürnkranz

Abstract: We analyze the trade-off between model complexity and accuracy for random forests by breaking the trees up into individual classification rules and selecting a subset of them. We show experimentally that already a few rules are sufficient to achieve an acceptable accuracy close to that of the original model. Moreover, our results indicate that in many cases, this can lead to simpler models that clearly outperform the original ones.

Submitted 11 November, 2019; originally announced November 2019.

Journal ref: 1st Workshop on Deep Continuous-Discrete Machine Learning (DeCoDeML), ECML-PKDD 2019, Würzburg Germany

arXiv:1908.06874 [pdf, other]

Efficient Discovery of Expressive Multi-label Rules using Relaxed Pruning

Authors: Yannik Klein, Michael Rapp, Eneldo Loza Mencía

Abstract: Being able to model correlations between labels is considered crucial in multi-label classification. Rule-based models enable to expose such dependencies, e.g., implications, subsumptions, or exclusions, in an interpretable and human-comprehensible manner. Albeit the number of possible label combinations increases exponentially with the number of available labels, it has been shown that rules with multiple labels in their heads, which are a natural form to model local label dependencies, can be induced efficiently by exploiting certain properties of rule evaluation measures and pruning the label search space accordingly. However, experiments have revealed that multi-label heads are unlikely to be learned by existing methods due to their restrictiveness. To overcome this limitation, we propose a plug-in approach that relaxes the search space pruning used by existing methods in order to introduce a bias towards larger multi-label heads resulting in more expressive rules. We further demonstrate the effectiveness of our approach empirically and show that it does not come with drawbacks in terms of training time or predictive performance.

Submitted 19 August, 2019; originally announced August 2019.

Comments: Preprint version. To appear in Proceedings of the 22nd International Conference on Discovery Science, 2019

arXiv:1908.03032 [pdf, other]

doi 10.1007/978-3-030-33778-0_9

On the Trade-off Between Consistency and Coverage in Multi-label Rule Learning Heuristics

Authors: Michael Rapp, Eneldo Loza Mencía, Johannes Fürnkranz

Abstract: Recently, several authors have advocated the use of rule learning algorithms to model multi-label data, as rules are interpretable and can be comprehended, analyzed, or qualitatively evaluated by domain experts. Many rule learning algorithms employ a heuristic-guided search for rules that model regularities contained in the training data and it is commonly accepted that the choice of the heuristic has a significant impact on the predictive performance of the learner. Whereas the properties of rule learning heuristics have been studied in the realm of single-label classification, there is no such work taking into account the particularities of multi-label classification. This is surprising, as the quality of multi-label predictions is usually assessed in terms of a variety of different, potentially competing, performance measures that cannot all be optimized by a single learner at the same time. In this work, we show empirically that it is crucial to trade off the consistency and coverage of rules differently, depending on which multi-label measure should be optimized by a model. Based on these findings, we emphasize the need for configurable learners that can flexibly use different heuristics. As our experiments reveal, the choice of the heuristic is not straight-forward, because a search for rules that optimize a measure locally does usually not result in a model that maximizes that measure globally.

Submitted 8 August, 2019; originally announced August 2019.

Comments: Preprint version. To appear in Proceedings of the 22nd International Conference on Discovery Science, 2019

Journal ref: Proc. DS 2019: 96-111

arXiv:1812.06833 [pdf, other]

doi 10.1007/978-3-319-93034-3_3

Exploiting Anti-monotonicity of Multi-label Evaluation Measures for Inducing Multi-label Rules

Authors: Michael Rapp, Eneldo Loza Mencía, Johannes Fürnkranz

Abstract: Exploiting dependencies between labels is considered to be crucial for multi-label classification. Rules are able to expose label dependencies such as implications, subsumptions or exclusions in a human-comprehensible and interpretable manner. However, the induction of rules with multiple labels in the head is particularly challenging, as the number of label combinations which must be taken into account for each rule grows exponentially with the number of available labels. To overcome this limitation, algorithms for exhaustive rule mining typically use properties such as anti-monotonicity or decomposability in order to prune the search space. In the present paper, we examine whether commonly used multi-label evaluation metrics satisfy these properties and therefore are suited to prune the search space for multi-label heads.

Submitted 14 December, 2018; originally announced December 2018.

Comments: Preprint version. To appear in: Proceedings of the Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD) 2018. See http://www.ke.tu-darmstadt.de/bibtex/publications/show/3074 for further information. arXiv admin note: text overlap with arXiv:1812.00050

Journal ref: Proc. PAKDD (1) 2018: 29-42

arXiv:1812.00050 [pdf, other]

doi 10.1007/978-3-319-98131-4_4

Learning Interpretable Rules for Multi-label Classification

Authors: Eneldo Loza Mencía, Johannes Fürnkranz, Eyke Hüllermeier, Michael Rapp

Abstract: Multi-label classification (MLC) is a supervised learning problem in which, contrary to standard multiclass classification, an instance can be associated with several class labels simultaneously. In this chapter, we advocate a rule-based approach to multi-label classification. Rule learning algorithms are often employed when one is not only interested in accurate predictions, but also requires an interpretable theory that can be understood, analyzed, and qualitatively evaluated by domain experts. Ideally, by revealing patterns and regularities contained in the data, a rule-based theory yields new insights in the application domain. Recently, several authors have started to investigate how rule-based models can be used for modeling multi-label data. Discussing this task in detail, we highlight some of the problems that make rule learning considerably more challenging for MLC than for conventional classification. While mainly focusing on our own previous work, we also provide a short overview of related work in this area.

Submitted 7 December, 2018; v1 submitted 30 November, 2018; originally announced December 2018.

Comments: Preprint version. To appear in: Explainable and Interpretable Models in Computer Vision and Machine Learning. The Springer Series on Challenges in Machine Learning. Springer (2018). See http://www.ke.tu-darmstadt.de/bibtex/publications/show/3077 for further information

Journal ref: In Jair Escalante, H., Escalera, S., Guyon, I., Baró, X., Güçlütürk, Y., Güçlü, U., van Gerven, M.A.J. (Eds.) Explainable and Interpretable Models in Computer Vision and Machine Learning, Springer-Verlag, 2018, pp.81-113

arXiv:1801.03424 [pdf]

Cadmium Resonance Parameters from Neutron Capture and Transmission Measurements at the RPI LINAC

Authors: G. Leinweber, D. P. Barry, R. C. Block, J. A. Burke, K. E. Remley, M. J. Rapp, Y. Danon

Abstract: Cadmium has been used historically as an important component of integral experiments because of its high thermal neutron absorption cross section. Correct interpretation of such experiments depends on accurate differential neutron cross section measurements. The 60 MeV electron accelerator at the Gaerttner LINAC Center was used to generate neutrons for neutron capture and total cross section measurements of natural Cd. Measurements were performed in the thermal and epithermal resonance range with sample thicknesses ranging from 1 x 10^-4 to 4 x 10^-2 atoms per barn. A full resonance region analysis was performed in order to determine the thermal cross sections and resonance integrals of the cadmium isotopes. The Bayesian R-Matrix code SAMMY 8.0 was used to shape fit the data and extract the resonance parameters. Resonance parameter and cross section uncertainties were determined from their primary components: transmission background, capture normalization, experimental resolution function, burst width, sample thickness, and counting statistics. The experiments were analyzed for consistency within the measured capture and transmission using multiple sample thicknesses. Results are compared to previously published measurements and evaluated nuclear libraries. No major changes to the thermal cross section or the first resonance in Cd-113 were identified from the consensus achieved from measurements and evaluations over the past decade.

Submitted 10 January, 2018; originally announced January 2018.

Journal ref: Proc. 13th International Topical Meeting on the Nuclear Applications of Accelerators, July 31-August 4 2017, Quebec, Canada

arXiv:1604.04767 [pdf, other]

doi 10.1007/s11263-015-0799-8

Efficient Dictionary Learning with Sparseness-Enforcing Projections

Authors: Markus Thom, Matthias Rapp, Günther Palm

Abstract: Learning dictionaries suitable for sparse coding instead of using engineered bases has proven effective in a variety of image processing tasks. This paper studies the optimization of dictionaries on image data where the representation is enforced to be explicitly sparse with respect to a smooth, normalized sparseness measure. This involves the computation of Euclidean projections onto level sets of the sparseness measure. While previous algorithms for this optimization problem had at least quasi-linear time complexity, here the first algorithm with linear time complexity and constant space complexity is proposed. The key for this is the mathematically rigorous derivation of a characterization of the projection's result based on a soft-shrinkage function. This theory is applied in an original algorithm called Easy Dictionary Learning (EZDL), which learns dictionaries with a simple and fast-to-compute Hebbian-like learning rule. The new algorithm is efficient, expressive and particularly simple to implement. It is demonstrated that despite its simplicity, the proposed learning algorithm is able to generate a rich variety of dictionaries, in particular a topographic organization of atoms or separable atoms. Further, the dictionaries are as expressive as those of benchmark learning algorithms in terms of the reproduction quality on entire images, and result in an equivalent denoising performance. EZDL learns approximately 30 % faster than the already very efficient Online Dictionary Learning algorithm, and is therefore eligible for rapid data set analysis and problems with vast quantities of learning samples.

Submitted 16 April, 2016; originally announced April 2016.

Comments: The final publication is available at Springer via http://dx.doi.org/10.1007/s11263-015-0799-8

Journal ref: International Journal of Computer Vision, vol. 114, no. 2, pp. 168-194, 2015

arXiv:cond-mat/0508418 [pdf]

Thin films of metallic carbon nanotubes and their optical spectra

Authors: Ralph Krupke, Stefan Linden, Michael Rapp, Frank Hennrich

Abstract: We show that separating metallic from semiconducting carbon nanotubes by dielectrophoresis is developing towards a bulk separation method, which allows for the first time to produce thin films of only metallic single-walled carbon nanotubes and to measure their optical absorption spectra. The data proofs that the selectivity of the separation scheme is independent from the nanotube diameter.

Submitted 18 August, 2005; originally announced August 2005.

Comments: submitted to Appl. Phys. Lett

Showing 1–24 of 24 results for author: Rapp, M