Search | arXiv e-print repository

Portable, heterogeneous ensemble workflows at scale using libEnsemble

Authors: Stephen Hudson, Jeffrey Larson, John-Luke Navarro, Stefan M. Wild

Abstract: libEnsemble is a Python-based toolkit for running dynamic ensembles, developed as part of the DOE Exascale Computing Project. The toolkit utilizes a unique generator-simulator-allocator paradigm, where generators produce input for simulators, simulators evaluate those inputs, and allocators decide whether and when a simulator or generator should be called. The generator steers the ensemble based o… ▽ More libEnsemble is a Python-based toolkit for running dynamic ensembles, developed as part of the DOE Exascale Computing Project. The toolkit utilizes a unique generator-simulator-allocator paradigm, where generators produce input for simulators, simulators evaluate those inputs, and allocators decide whether and when a simulator or generator should be called. The generator steers the ensemble based on simulation results. libEnsemble communicates between a manager and workers. Flexibility is provided through multiple manager-worker communication substrates each of which has different benefits. These include Python's multiprocessing, mpi4py, and TCP. Multisite ensembles are supported using Balsam or Globus Compute. We overview the unique characteristics of libEnsemble as well as current and potential interoperability with other packages in the workflow ecosystem. We highlight libEnsemble's dynamic resource features: libEnsemble can detect system resources (nodes, cores, and GPUs) and assign these in a portable way. These features allow users to specify resources required for each simulation automatically on a range of systems, including Frontier, Aurora, and Perlmutter. Such ensembles can include multiple simulation types, some using GPUs and others using only CPUs, sharing nodes for maximum efficiency. We demonstrate libEnsemble's capabilities, scalability, and scientific impact via a Gaussian process surrogate training problem for the longitudinal density profile at the exit of a plasma accelerator stage using Wake-T and WarpX simulations. We also describe the benefits of libEnsemble's generator-simulator coupling, which easily exposes to the user the ability to cancel, and portably kill, running simulations. Such control can be directed from the generator or allocator based on models that are updated with intermediate simulation output. △ Less

Submitted 7 March, 2024; v1 submitted 6 March, 2024; originally announced March 2024.

arXiv:2403.00465 [pdf, other]

Polyamorous Scheduling

Authors: Leszek Gąsieniec, Benjamin Smith, Sebastian Wild

Abstract: Finding schedules for pairwise meetings between the members of a complex social group without creating interpersonal conflict is challenging, especially when different relationships have different needs. We formally define and study the underlying optimisation problem: Polyamorous Scheduling. In Polyamorous Scheduling, we are given an edge-weighted graph and try to find a periodic schedule of ma… ▽ More Finding schedules for pairwise meetings between the members of a complex social group without creating interpersonal conflict is challenging, especially when different relationships have different needs. We formally define and study the underlying optimisation problem: Polyamorous Scheduling. In Polyamorous Scheduling, we are given an edge-weighted graph and try to find a periodic schedule of matchings in this graph such that the maximal weighted waiting time between consecutive occurrences of the same edge is minimised. We show that the problem is NP-hard and that there is no efficient approximation algorithm with a better ratio than 4/3 unless P = NP. On the positive side, we obtain an $O(\log n)$-approximation algorithm; indeed, a $O(\log Δ)$-approximation for $Δ$ the maximum degree, i.e., the largest number of relationships of any individual. We also define a generalisation of density from the Pinwheel Scheduling Problem, "poly density", and ask whether there exists a poly-density threshold similar to the 5/6-density threshold for Pinwheel Scheduling [Kawamura, STOC 2024]. Polyamorous Scheduling is a natural generalisation of Pinwheel Scheduling with respect to its optimisation variant, Bamboo Garden Trimming. Our work contributes the first nontrivial hardness-of-approximation reduction for any periodic scheduling problem, and opens up numerous avenues for further study of Polyamorous Scheduling. △ Less

Submitted 26 March, 2024; v1 submitted 1 March, 2024; originally announced March 2024.

Comments: v2: stronger and simplified hardness-of-approximation results, corrected constant in layering approximation algorithm

arXiv:2402.17631 [pdf, other]

Deterministic Cache-Oblivious Funnelselect

Authors: Gerth Stølting Brodal, Sebastian Wild

Abstract: In the multiple-selection problem one is given an unsorted array $S$ of $N$ elements and an array of $q$ query ranks $r_1<\cdots<r_q$, and the task is to return, in sorted order, the $q$ elements in $S$ of rank $r_1, \ldots, r_q$, respectively. The asymptotic deterministic comparison complexity of the problem was settled by Dobkin and Munro [JACM 1981]. In the I/O model an optimal I/O complexity w… ▽ More In the multiple-selection problem one is given an unsorted array $S$ of $N$ elements and an array of $q$ query ranks $r_1<\cdots<r_q$, and the task is to return, in sorted order, the $q$ elements in $S$ of rank $r_1, \ldots, r_q$, respectively. The asymptotic deterministic comparison complexity of the problem was settled by Dobkin and Munro [JACM 1981]. In the I/O model an optimal I/O complexity was achieved by Hu et al. [SPAA 2014]. Recently [ESA 2023], we presented a cache-oblivious algorithm with matching I/O complexity, named funnelselect, since it heavily borrows ideas from the cache-oblivious sorting algorithm funnelsort from the seminal paper by Frigo, Leiserson, Prokop and Ramachandran [FOCS 1999]. Funnelselect is inherently randomized as it relies on sampling for cheaply finding many good pivots. In this paper we present deterministic funnelselect, achieving the same optional I/O complexity cache-obliviously without randomization. Our new algorithm essentially replaces a single (in expectation) reversed-funnel computation using random pivots by a recursive algorithm using multiple reversed-funnel computations. To meet the I/O bound, this requires a carefully chosen subproblem size based on the entropy of the sequence of query ranks; deterministic funnelselect thus raises distinct technical challenges not met by randomized funnelselect. The resulting worst-case I/O bound is $O\bigl(\sum_{i=1}^{q+1} \frac{Δ_i}{B} \cdot \log_{M/B} \frac{N}{Δ_i} + \frac{N}{B}\bigr)$, where $B$ is the external memory block size, $M\geq B^{1+ε}$ is the internal memory size, for some constant $ε>0$, and $Δ_i = r_{i} - r_{i-1}$ (assuming $r_0=0$ and $r_{q+1}=N + 1$). △ Less

Submitted 27 February, 2024; originally announced February 2024.

arXiv:2401.16623 [pdf, other]

Towards Optimal Grammars for RNA Structures

Authors: Evarista Onokpasa, Sebastian Wild, Prudence W. H. Wong

Abstract: In past work (Onokpasa, Wild, Wong, DCC 2023), we showed that (a) for joint compression of RNA sequence and structure, stochastic context-free grammars are the best known compressors and (b) that grammars which have better compression ability also show better performance in ab initio structure prediction. Previous grammars were manually curated by human experts. In this work, we develop a framewor… ▽ More In past work (Onokpasa, Wild, Wong, DCC 2023), we showed that (a) for joint compression of RNA sequence and structure, stochastic context-free grammars are the best known compressors and (b) that grammars which have better compression ability also show better performance in ab initio structure prediction. Previous grammars were manually curated by human experts. In this work, we develop a framework for automatic and systematic search algorithms for stochastic grammars with better compression (and prediction) ability for RNA. We perform an exhaustive search of small grammars and identify grammars that surpass the performance of human-expert grammars. △ Less

Submitted 29 January, 2024; originally announced January 2024.

Comments: to be presented at DCC 2024

arXiv:2401.06512 [pdf, other]

An Optimal Randomized Algorithm for Finding the Saddlepoint

Authors: Justin Dallant, Frederik Haagensen, Riko Jacob, László Kozma, Sebastian Wild

Abstract: A \emph{saddlepoint} of an $n \times n$ matrix is an entry that is the maximum of its row and the minimum of its column. Saddlepoints give the \emph{value} of a two-player zero-sum game, corresponding to its pure-strategy Nash equilibria; efficiently finding a saddlepoint is thus a natural and fundamental algorithmic task. For finding a \emph{strict saddlepoint} (an entry that is the strict maxi… ▽ More A \emph{saddlepoint} of an $n \times n$ matrix is an entry that is the maximum of its row and the minimum of its column. Saddlepoints give the \emph{value} of a two-player zero-sum game, corresponding to its pure-strategy Nash equilibria; efficiently finding a saddlepoint is thus a natural and fundamental algorithmic task. For finding a \emph{strict saddlepoint} (an entry that is the strict maximum of its row and the strict minimum of its column) we recently gave an $O({n\log^*{n}})$-time algorithm, improving the $O({n\log{n}})$ bounds from 1991 of Bienstock, Chung, Fredman, Schäffer, Shor, Suri and of Byrne and Vaserstein. In this paper we present an optimal $O({n})$-time algorithm for finding a strict saddlepoint based on random sampling. Our algorithm, like earlier approaches, accesses matrix entries only via unit-cost binary comparisons. For finding a (non-strict) saddlepoint, we extend an existing lower bound to randomized algorithms, showing that the trivial $O(n^2)$ runtime cannot be improved even with the use of randomness. △ Less

Submitted 12 January, 2024; originally announced January 2024.

Comments: 12 pages

ACM Class: F.2.0

arXiv:2310.16801 [pdf, other]

Finding the saddlepoint faster than sorting

Authors: Justin Dallant, Frederik Haagensen, Riko Jacob, László Kozma, Sebastian Wild

Abstract: A saddlepoint of an $n \times n$ matrix $A$ is an entry of $A$ that is a maximum in its row and a minimum in its column. Knuth (1968) gave several different algorithms for finding a saddlepoint. The worst-case running time of these algorithms is $Θ(n^2)$, and Llewellyn, Tovey, and Trick (1988) showed that this cannot be improved, as in the worst case all entries of A may need to be queried. A st… ▽ More A saddlepoint of an $n \times n$ matrix $A$ is an entry of $A$ that is a maximum in its row and a minimum in its column. Knuth (1968) gave several different algorithms for finding a saddlepoint. The worst-case running time of these algorithms is $Θ(n^2)$, and Llewellyn, Tovey, and Trick (1988) showed that this cannot be improved, as in the worst case all entries of A may need to be queried. A strict saddlepoint of $A$ is an entry that is the strict maximum in its row and the strict minimum in its column. The strict saddlepoint (if it exists) is unique, and Bienstock, Chung, Fredman, Schäffer, Shor, and Suri (1991) showed that it can be found in time $O(n \log{n})$, where a dominant runtime contribution is sorting the diagonal of the matrix. This upper bound has not been improved since 1991. In this paper we show that the strict saddlepoint can be found in $O(n \log^{*}{n})$ time, where $\log^{*}$ denotes the very slowly growing iterated logarithm function, coming close to the lower bound of $Ω(n)$. In fact, we can also compute, within the same runtime, the value of a non-strict saddlepoint, assuming one exists. Our algorithm is based on a simple recursive approach, a feasibility test inspired by searching in sorted matrices, and a relaxed notion of saddlepoint. △ Less

Submitted 25 October, 2023; originally announced October 2023.

Comments: To be presented at SOSA 2024

arXiv:2304.07445 [pdf, other]

A framework for fully autonomous design of materials via multiobjective optimization and active learning: challenges and next steps

Authors: Tyler H. Chang, Jakob R. Elias, Stefan M. Wild, Santanu Chaudhuri, Joseph A. Libera

Abstract: In order to deploy machine learning in a real-world self-driving laboratory where data acquisition is costly and there are multiple competing design criteria, systems need to be able to intelligently sample while balancing performance trade-offs and constraints. For these reasons, we present an active learning process based on multiobjective black-box optimization with continuously updated machine… ▽ More In order to deploy machine learning in a real-world self-driving laboratory where data acquisition is costly and there are multiple competing design criteria, systems need to be able to intelligently sample while balancing performance trade-offs and constraints. For these reasons, we present an active learning process based on multiobjective black-box optimization with continuously updated machine learning models. This workflow is built on open-source technologies for real-time data streaming and modular multiobjective optimization software development. We demonstrate a proof of concept for this workflow through the autonomous operation of a continuous-flow chemistry laboratory, which identifies ideal manufacturing conditions for the electrolyte 2,2,2-trifluoroethyl methyl carbonate. △ Less

Submitted 14 April, 2023; originally announced April 2023.

arXiv:2304.06881 [pdf, other]

Designing a Framework for Solving Multiobjective Simulation Optimization Problems

Authors: Tyler H. Chang, Stefan M. Wild

Abstract: Multiobjective simulation optimization (MOSO) problems are optimization problems with multiple conflicting objectives, where evaluation of at least one of the objectives depends on a black-box numerical code or real-world experiment, which we refer to as a simulation. This paper describes the design goals driving the development of the parallel MOSO library ParMOO. We derive these goals from the r… ▽ More Multiobjective simulation optimization (MOSO) problems are optimization problems with multiple conflicting objectives, where evaluation of at least one of the objectives depends on a black-box numerical code or real-world experiment, which we refer to as a simulation. This paper describes the design goals driving the development of the parallel MOSO library ParMOO. We derive these goals from the research trends and real-world requirements that arise when designing and deploying solvers for generic MOSO problems. Our specific design goals were to provide a customizable MOSO framework that allows for exploitation of simulation-based problem structures, ease of deployment in scientific workflows, maintainability, and flexibility in our support for many problem types. We explain how we have achieved these goals in the ParMOO library and provide two examples demonstrating how customized ParMOO solvers can be quickly built and deployed in real-world MOSO problems. △ Less

Submitted 6 July, 2023; v1 submitted 13 April, 2023; originally announced April 2023.

arXiv:2302.11669 [pdf, other]

RNA secondary structures: from ab initio prediction to better compression, and back

Authors: Evarista Onokpasa, Sebastian Wild, Prudence W. H. Wong

Abstract: In this paper, we use the biological domain knowledge incorporated into stochastic models for ab initio RNA secondary-structure prediction to improve the state of the art in joint compression of RNA sequence and structure data (Liu et al., BMC Bioinformatics, 2008). Moreover, we show that, conversely, compression ratio can serve as a cheap and robust proxy for comparing the prediction quality of d… ▽ More In this paper, we use the biological domain knowledge incorporated into stochastic models for ab initio RNA secondary-structure prediction to improve the state of the art in joint compression of RNA sequence and structure data (Liu et al., BMC Bioinformatics, 2008). Moreover, we show that, conversely, compression ratio can serve as a cheap and robust proxy for comparing the prediction quality of different stochastic models, which may help guide the search for better RNA structure prediction models. Our results build on expert stochastic context-free grammar models of RNA secondary structures (Dowell & Eddy, BMC Bioinformatics, 2004; Nebel & Scheid, Theory in Biosciences, 2011) combined with different (static and adaptive) models for rule probabilities and arithmetic coding. We provide a prototype implementation and an extensive empirical evaluation, where we illustrate how grammar features and probability models affect compression ratios. △ Less

Submitted 22 February, 2023; originally announced February 2023.

Comments: paper at Data Compression Conference 2023

arXiv:2302.02005 [pdf, other]

DeepAstroUDA: Semi-Supervised Universal Domain Adaptation for Cross-Survey Galaxy Morphology Classification and Anomaly Detection

Authors: A. Ćiprijanović, A. Lewis, K. Pedro, S. Madireddy, B. Nord, G. N. Perdue, S. M. Wild

Abstract: Artificial intelligence methods show great promise in increasing the quality and speed of work with large astronomical datasets, but the high complexity of these methods leads to the extraction of dataset-specific, non-robust features. Therefore, such methods do not generalize well across multiple datasets. We present a universal domain adaptation method, \textit{DeepAstroUDA}, as an approach to o… ▽ More Artificial intelligence methods show great promise in increasing the quality and speed of work with large astronomical datasets, but the high complexity of these methods leads to the extraction of dataset-specific, non-robust features. Therefore, such methods do not generalize well across multiple datasets. We present a universal domain adaptation method, \textit{DeepAstroUDA}, as an approach to overcome this challenge. This algorithm performs semi-supervised domain adaptation and can be applied to datasets with different data distributions and class overlaps. Non-overlap** classes can be present in any of the two datasets (the labeled source domain, or the unlabeled target domain), and the method can even be used in the presence of unknown classes. We apply our method to three examples of galaxy morphology classification tasks of different complexities ($3$-class and $10$-class problems), with anomaly detection: 1) datasets created after different numbers of observing years from a single survey (LSST mock data of $1$ and $10$ years of observations); 2) data from different surveys (SDSS and DECaLS); and 3) data from observing fields with different depths within one survey (wide field and Stripe 82 deep field of SDSS). For the first time, we demonstrate the successful use of domain adaptation between very discrepant observational datasets. \textit{DeepAstroUDA} is capable of bridging the gap between two astronomical surveys, increasing classification accuracy in both domains (up to $40\%$ on the unlabeled data), and making model performance consistent across datasets. Furthermore, our method also performs well as an anomaly detection algorithm and successfully clusters unknown class samples even in the unlabeled target dataset. △ Less

Submitted 22 March, 2023; v1 submitted 3 February, 2023; originally announced February 2023.

Comments: Accepted in Machine Learning Science and Technology (MLST); 24 pages, 14 figures

Report number: FERMILAB-PUB-23-034-CSAID

arXiv:2211.16551 [pdf, other]

doi 10.1103/PhysRevA.107.062417

Numerical evidence against advantage with quantum fidelity kernels on classical data

Authors: Lucas Slattery, Ruslan Shaydulin, Shouvanik Chakrabarti, Marco Pistoia, Sami Khairy, Stefan M. Wild

Abstract: Quantum machine learning techniques are commonly considered one of the most promising candidates for demonstrating practical quantum advantage. In particular, quantum kernel methods have been demonstrated to be able to learn certain classically intractable functions efficiently if the kernel is well-aligned with the target function. In the more general case, quantum kernels are known to suffer fro… ▽ More Quantum machine learning techniques are commonly considered one of the most promising candidates for demonstrating practical quantum advantage. In particular, quantum kernel methods have been demonstrated to be able to learn certain classically intractable functions efficiently if the kernel is well-aligned with the target function. In the more general case, quantum kernels are known to suffer from exponential "flattening" of the spectrum as the number of qubits grows, preventing generalization and necessitating the control of the inductive bias by hyperparameters. We show that the general-purpose hyperparameter tuning techniques proposed to improve the generalization of quantum kernels lead to the kernel becoming well-approximated by a classical kernel, removing the possibility of quantum advantage. We provide extensive numerical evidence for this phenomenon utilizing multiple previously studied quantum feature maps and both synthetic and real data. Our results show that unless novel techniques are developed to control the inductive bias of quantum kernels, they are unlikely to provide a quantum advantage on classical data. △ Less

Submitted 29 November, 2022; originally announced November 2022.

Journal ref: Phys. Rev. A 107, 062417 (2023)

arXiv:2211.00677 [pdf, other]

Semi-Supervised Domain Adaptation for Cross-Survey Galaxy Morphology Classification and Anomaly Detection

Authors: Aleksandra Ćiprijanović, Ashia Lewis, Kevin Pedro, Sandeep Madireddy, Brian Nord, Gabriel N. Perdue, Stefan M. Wild

Abstract: In the era of big astronomical surveys, our ability to leverage artificial intelligence algorithms simultaneously for multiple datasets will open new avenues for scientific discovery. Unfortunately, simply training a deep neural network on images from one data domain often leads to very poor performance on any other dataset. Here we develop a Universal Domain Adaptation method DeepAstroUDA, capabl… ▽ More In the era of big astronomical surveys, our ability to leverage artificial intelligence algorithms simultaneously for multiple datasets will open new avenues for scientific discovery. Unfortunately, simply training a deep neural network on images from one data domain often leads to very poor performance on any other dataset. Here we develop a Universal Domain Adaptation method DeepAstroUDA, capable of performing semi-supervised domain alignment that can be applied to datasets with different types of class overlap. Extra classes can be present in any of the two datasets, and the method can even be used in the presence of unknown classes. For the first time, we demonstrate the successful use of domain adaptation on two very different observational datasets (from SDSS and DECaLS). We show that our method is capable of bridging the gap between two astronomical surveys, and also performs well for anomaly detection and clustering of unknown data in the unlabeled dataset. We apply our model to two examples of galaxy morphology classification tasks with anomaly detection: 1) classifying spiral and elliptical galaxies with detection of merging galaxies (three classes including one unknown anomaly class); 2) a more granular problem where the classes describe more detailed morphological properties of galaxies, with the detection of gravitational lenses (ten classes including one unknown anomaly class). △ Less

Submitted 11 November, 2022; v1 submitted 1 November, 2022; originally announced November 2022.

Comments: 3 figures, 1 table; accepted to Machine Learning and the Physical Sciences - Workshop at the 36th conference on Neural Information Processing Systems (NeurIPS)

Report number: FERMILAB-CONF-22-791-SCD

arXiv:2209.06909 [pdf, other]

doi 10.1137/1.9781611977561.ch16

Multiway Powersort

Authors: William Cawley Gelling, Markus E. Nebel, Benjamin Smith, Sebastian Wild

Abstract: We present a stable mergesort variant, Multiway Powersort, that exploits existing runs and finds nearly-optimal merging orders for k-way merges with negligible overhead. This builds on Powersort (Munro & Wild, ESA2018), which has recently replaced Timsort's suboptimal merge policy in the CPython reference implementation of Python, as well as in PyPy and further libraries. Multiway Powersort reduce… ▽ More We present a stable mergesort variant, Multiway Powersort, that exploits existing runs and finds nearly-optimal merging orders for k-way merges with negligible overhead. This builds on Powersort (Munro & Wild, ESA2018), which has recently replaced Timsort's suboptimal merge policy in the CPython reference implementation of Python, as well as in PyPy and further libraries. Multiway Powersort reduces the number of memory transfers, which increasingly determine the cost of internal sorting (as observed with Multiway Quicksort (Kushagra et al., ALENEX 2014; Aumüller & Dietzfelbinger, TALG 2016; Wild, PhD thesis 2016) and the inclusion of Dual-Pivot Quicksort in the Java runtime library). We demonstrate that our 4-way Powersort implementation can achieve substantial speedups over standard (2-way) Powersort and other stable sorting methods without compromising the optimally run-adaptive performance of Powersort. △ Less

Submitted 16 January, 2023; v1 submitted 14 September, 2022; originally announced September 2022.

Comments: 17 pages; accompanying source code at https://github.com/sebawild/powersort; v2 adds new figure and text changes. v2 is identical to the ALENEX 2023 version

Journal ref: ALENEX 2023

arXiv:2206.06686 [pdf, other]

Bandwidth Enables Generalization in Quantum Kernel Models

Authors: Abdulkadir Canatar, Evan Peters, Cengiz Pehlevan, Stefan M. Wild, Ruslan Shaydulin

Abstract: Quantum computers are known to provide speedups over classical state-of-the-art machine learning methods in some specialized settings. For example, quantum kernel methods have been shown to provide an exponential speedup on a learning version of the discrete logarithm problem. Understanding the generalization of quantum models is essential to realizing similar speedups on problems of practical int… ▽ More Quantum computers are known to provide speedups over classical state-of-the-art machine learning methods in some specialized settings. For example, quantum kernel methods have been shown to provide an exponential speedup on a learning version of the discrete logarithm problem. Understanding the generalization of quantum models is essential to realizing similar speedups on problems of practical interest. Recent results demonstrate that generalization is hindered by the exponential size of the quantum feature space. Although these results suggest that quantum models cannot generalize when the number of qubits is large, in this paper we show that these results rely on overly restrictive assumptions. We consider a wider class of models by varying a hyperparameter that we call quantum kernel bandwidth. We analyze the large-qubit limit and provide explicit formulas for the generalization of a quantum model that can be solved in closed form. Specifically, we show that changing the value of the bandwidth can take a model from provably not being able to generalize to any target function to good generalization for well-aligned targets. Our analysis shows how the bandwidth controls the spectrum of the kernel integral operator and thereby the inductive bias of the model. We demonstrate empirically that our theory correctly predicts how varying the bandwidth affects generalization of quantum models on challenging datasets, including those far outside our theoretical assumptions. We discuss the implications of our results for quantum advantage in machine learning. △ Less

Submitted 18 June, 2023; v1 submitted 14 June, 2022; originally announced June 2022.

Comments: Accepted version

arXiv:2205.01066 [pdf, other]

Quantifying Health Inequalities Induced by Data and AI Models

Authors: Honghan Wu, Minhong Wang, Aneeta Sylolypavan, Sarah Wild

Abstract: AI technologies are being increasingly tested and applied in critical environments including healthcare. Without an effective way to detect and mitigate AI induced inequalities, AI might do more harm than good, potentially leading to the widening of underlying inequalities. This paper proposes a generic allocation-deterioration framework for detecting and quantifying AI induced inequality. Specifi… ▽ More AI technologies are being increasingly tested and applied in critical environments including healthcare. Without an effective way to detect and mitigate AI induced inequalities, AI might do more harm than good, potentially leading to the widening of underlying inequalities. This paper proposes a generic allocation-deterioration framework for detecting and quantifying AI induced inequality. Specifically, AI induced inequalities are quantified as the area between two allocation-deterioration curves. To assess the framework's performance, experiments were conducted on ten synthetic datasets (N>33,000) generated from HiRID - a real-world Intensive Care Unit (ICU) dataset, showing its ability to accurately detect and quantify inequality proportionally to controlled inequalities. Extensive analyses were carried out to quantify health inequalities (a) embedded in two real-world ICU datasets; (b) induced by AI models trained for two resource allocation scenarios. Results showed that compared to men, women had up to 33% poorer deterioration in markers of prognosis when admitted to HiRID ICUs. All four AI models assessed were shown to induce significant inequalities (2.45% to 43.2%) for non-White compared to White patients. The models exacerbated data embedded inequalities significantly in 3 out of 8 assessments, one of which was >9 times worse. The codebase is at https://github.com/knowlab/DAindex-Framework. △ Less

Submitted 3 May, 2022; v1 submitted 24 April, 2022; originally announced May 2022.

Comments: Accepted by IJCAI-ECAI 2022 AI for Good track

arXiv:2112.14299 [pdf, other]

DeepAdversaries: Examining the Robustness of Deep Learning Models for Galaxy Morphology Classification

Authors: Aleksandra Ćiprijanović, Diana Kafkes, Gregory Snyder, F. Javier Sánchez, Gabriel Nathan Perdue, Kevin Pedro, Brian Nord, Sandeep Madireddy, Stefan M. Wild

Abstract: With increased adoption of supervised deep learning methods for processing and analysis of cosmological survey data, the assessment of data perturbation effects (that can naturally occur in the data processing and analysis pipelines) and the development of methods that increase model robustness are increasingly important. In the context of morphological classification of galaxies, we study the eff… ▽ More With increased adoption of supervised deep learning methods for processing and analysis of cosmological survey data, the assessment of data perturbation effects (that can naturally occur in the data processing and analysis pipelines) and the development of methods that increase model robustness are increasingly important. In the context of morphological classification of galaxies, we study the effects of perturbations in imaging data. In particular, we examine the consequences of using neural networks when training on baseline data and testing on perturbed data. We consider perturbations associated with two primary sources: 1) increased observational noise as represented by higher levels of Poisson noise and 2) data processing noise incurred by steps such as image compression or telescope errors as represented by one-pixel adversarial attacks. We also test the efficacy of domain adaptation techniques in mitigating the perturbation-driven errors. We use classification accuracy, latent space visualizations, and latent space distance to assess model robustness. Without domain adaptation, we find that processing pixel-level errors easily flip the classification into an incorrect class and that higher observational noise makes the model trained on low-noise data unable to classify galaxy morphologies. On the other hand, we show that training with domain adaptation improves model robustness and mitigates the effects of these perturbations, improving the classification accuracy by 23% on data with higher observational noise. Domain adaptation also increases by a factor of ~2.3 the latent space distance between the baseline and the incorrectly classified one-pixel perturbed image, making the model more robust to inadvertent perturbations. △ Less

Submitted 6 July, 2022; v1 submitted 28 December, 2021; originally announced December 2021.

Comments: 20 pages, 6 figures, 5 tables; accepted in MLST

Report number: FERMILAB-PUB-21-767-SCD

arXiv:2111.05451 [pdf, other]

doi 10.1103/PhysRevA.106.042407

Importance of Kernel Bandwidth in Quantum Machine Learning

Authors: Ruslan Shaydulin, Stefan M. Wild

Abstract: Quantum kernel methods are considered a promising avenue for applying quantum computers to machine learning problems. Identifying hyperparameters controlling the inductive bias of quantum machine learning models is expected to be crucial given the central role hyperparameters play in determining the performance of classical machine learning methods. In this work we introduce the hyperparameter con… ▽ More Quantum kernel methods are considered a promising avenue for applying quantum computers to machine learning problems. Identifying hyperparameters controlling the inductive bias of quantum machine learning models is expected to be crucial given the central role hyperparameters play in determining the performance of classical machine learning methods. In this work we introduce the hyperparameter controlling the bandwidth of a quantum kernel and show that it controls the expressivity of the resulting model. We use extensive numerical experiments with multiple quantum kernels and classical datasets to show consistent change in the model behavior from underfitting (bandwidth too large) to overfitting (bandwidth too small), with optimal generalization in between. We draw a connection between the bandwidth of classical and quantum kernels and show analogous behavior in both cases. Furthermore, we show that optimizing the bandwidth can help mitigate the exponential decay of kernel values with qubit count, which is the cause behind recent observations that the performance of quantum kernel methods decreases with qubit count. We reproduce these negative results and show that if the kernel bandwidth is optimized, the performance instead improves with growing qubit count and becomes competitive with the best classical methods. △ Less

Submitted 28 September, 2022; v1 submitted 9 November, 2021; originally announced November 2021.

Comments: Camera-ready version

Journal ref: Phys. Rev. A 106, 042407 (2022)

arXiv:2111.03639 [pdf, other]

Randomized Communication and Implicit Graph Representations

Authors: Nathaniel Harms, Sebastian Wild, Viktor Zamaraev

Abstract: We study constant-cost randomized communication problems and relate them to implicit graph representations in structural graph theory. Specifically, constant-cost communication problems correspond to hereditary graph families that admit constant-size adjacency sketches, or equivalently constant-size probabilistic universal graphs (PUGs), and these graph families are a subset of families that admit… ▽ More We study constant-cost randomized communication problems and relate them to implicit graph representations in structural graph theory. Specifically, constant-cost communication problems correspond to hereditary graph families that admit constant-size adjacency sketches, or equivalently constant-size probabilistic universal graphs (PUGs), and these graph families are a subset of families that admit adjacency labeling schemes of size O(log n), which are the subject of the well-studied implicit graph question (IGQ). We initiate the study of the hereditary graph families that admit constant-size PUGs, with the two (equivalent) goals of (1) understanding randomized constant-cost communication problems, and (2) understanding a probabilistic version of the IGQ. For each family $\mathcal F$ studied in this paper (including the monogenic bipartite families, product graphs, interval and permutation graphs, families of bounded twin-width, and others), it holds that the subfamilies $\mathcal H \subseteq \mathcal F$ admit constant-size PUGs (i.e. adjacency sketches) if and only if they are stable (i.e. they forbid a half-graph as a semi-induced subgraph). The correspondence between communication problems and hereditary graph families allows for a new method of constructing adjacency labeling schemes. By this method, we show that the induced subgraphs of any Cartesian products are positive examples to the IGQ. We prove that this probabilistic construction cannot be derandomized by using an Equality oracle, i.e. the Equality oracle cannot simulate the k-Hamming Distance communication protocol. We also obtain constant-size sketches for deciding $\mathsf{dist}(x, y) \le k$ for vertices $x$, $y$ in any stable graph family with bounded twin-width. This generalizes to constant-size sketches for deciding first-order formulas over the same graphs. △ Less

Submitted 18 July, 2023; v1 submitted 5 November, 2021; originally announced November 2021.

Comments: 72 pages, 10 figures. Abstract shortened for arXiv

arXiv:2111.01784 [pdf, other]

Towards the 5/6-Density Conjecture of Pinwheel Scheduling

Authors: Leszek Gąsieniec, Benjamin Smith, Sebastian Wild

Abstract: Pinwheel Scheduling aims to find a perpetual schedule for unit-length tasks on a single machine subject to given maximal time spans (a.k.a. frequencies) between any two consecutive executions of the same task. The density of a Pinwheel Scheduling instance is the sum of the inverses of these task frequencies; the 5/6-Conjecture (Chan and Chin, 1993) states that any Pinwheel Scheduling instance with… ▽ More Pinwheel Scheduling aims to find a perpetual schedule for unit-length tasks on a single machine subject to given maximal time spans (a.k.a. frequencies) between any two consecutive executions of the same task. The density of a Pinwheel Scheduling instance is the sum of the inverses of these task frequencies; the 5/6-Conjecture (Chan and Chin, 1993) states that any Pinwheel Scheduling instance with density at most 5/6 is schedulable. We formalize the notion of Pareto surfaces for Pinwheel Scheduling and exploit novel structural insights to engineer an efficient algorithm for computing them. This allows us to (1) confirm the 5/6-Conjecture for all Pinwheel Scheduling instances with at most 12 tasks and (2) to prove that a given list of only 23 schedules solves all schedulable Pinwheel Scheduling instances with at most 5 tasks. △ Less

Submitted 2 November, 2021; originally announced November 2021.

Comments: Accepted at ALENEX 2022

arXiv:2111.00961 [pdf, other]

Robustness of deep learning algorithms in astronomy -- galaxy morphology studies

Authors: A. Ćiprijanović, D. Kafkes, G. N. Perdue, K. Pedro, G. Snyder, F. J. Sánchez, S. Madireddy, S. M. Wild, B. Nord

Abstract: Deep learning models are being increasingly adopted in wide array of scientific domains, especially to handle high-dimensionality and volume of the scientific data. However, these models tend to be brittle due to their complexity and overparametrization, especially to the inadvertent adversarial perturbations that can appear due to common image processing such as compression or blurring that are o… ▽ More Deep learning models are being increasingly adopted in wide array of scientific domains, especially to handle high-dimensionality and volume of the scientific data. However, these models tend to be brittle due to their complexity and overparametrization, especially to the inadvertent adversarial perturbations that can appear due to common image processing such as compression or blurring that are often seen with real scientific data. It is crucial to understand this brittleness and develop models robust to these adversarial perturbations. To this end, we study the effect of observational noise from the exposure time, as well as the worst case scenario of a one-pixel attack as a proxy for compression or telescope errors on performance of ResNet18 trained to distinguish between galaxies of different morphologies in LSST mock data. We also explore how domain adaptation techniques can help improve model robustness in case of this type of naturally occurring attacks and help scientists build more trustworthy and stable models. △ Less

Submitted 2 November, 2021; v1 submitted 1 November, 2021; originally announced November 2021.

Comments: Accepted in: Fourth Workshop on Machine Learning and the Physical Sciences (35th Conference on Neural Information Processing Systems; NeurIPS2021); final version

Report number: FERMILAB-CONF-21-561-SCD

arXiv:2109.12213 [pdf, other]

Adaptive Sampling Quasi-Newton Methods for Zeroth-Order Stochastic Optimization

Authors: Raghu Bollapragada, Stefan M. Wild

Abstract: We consider unconstrained stochastic optimization problems with no available gradient information. Such problems arise in settings from derivative-free simulation optimization to reinforcement learning. We propose an adaptive sampling quasi-Newton method where we estimate the gradients of a stochastic function using finite differences within a common random number framework. We develop modified ve… ▽ More We consider unconstrained stochastic optimization problems with no available gradient information. Such problems arise in settings from derivative-free simulation optimization to reinforcement learning. We propose an adaptive sampling quasi-Newton method where we estimate the gradients of a stochastic function using finite differences within a common random number framework. We develop modified versions of a norm test and an inner product quasi-Newton test to control the sample sizes used in the stochastic approximations and provide global convergence results to the neighborhood of the optimal solution. We present numerical experiments on simulation optimization problems to illustrate the performance of the proposed algorithm. When compared with classical zeroth-order stochastic gradient methods, we observe that our strategies of adapting the sample sizes significantly improve performance in terms of the number of stochastic function evaluations required. △ Less

Submitted 24 September, 2021; originally announced September 2021.

arXiv:2105.04965 [pdf, other]

Succinct Euler-Tour Trees

Authors: Travis Gagie, Sebastian Wild

Abstract: We show how a collection of Euler-tour trees for a forest on $n$ vertices can be stored in $2 n + o (n)$ bits such that simple queries take constant time, more complex queries take logarithmic time and updates take polylogarithmic amortized time. We show how a collection of Euler-tour trees for a forest on $n$ vertices can be stored in $2 n + o (n)$ bits such that simple queries take constant time, more complex queries take logarithmic time and updates take polylogarithmic amortized time. △ Less

Submitted 29 June, 2021; v1 submitted 11 May, 2021; originally announced May 2021.

arXiv:2104.13457 [pdf, other]

doi 10.4230/LIPIcs.ESA.2021.70

Hypersuccinct Trees -- New universal tree source codes for optimal compressed tree data structures and range minima

Authors: J. Ian Munro, Patrick K. Nicholson, Louisa Seelbach Benkner, Sebastian Wild

Abstract: We present a new universal source code for distributions of unlabeled binary and ordinal trees that achieves optimal compression to within lower order terms for all tree sources covered by existing universal codes. At the same time, it supports answering many navigational queries on the compressed representation in constant time on the word-RAM; this is not known to be possible for any existing tr… ▽ More We present a new universal source code for distributions of unlabeled binary and ordinal trees that achieves optimal compression to within lower order terms for all tree sources covered by existing universal codes. At the same time, it supports answering many navigational queries on the compressed representation in constant time on the word-RAM; this is not known to be possible for any existing tree compression method. The resulting data structures, "hypersuccinct trees", hence combine the compression achieved by the best known universal codes with the operation support of the best succinct tree data structures. We apply hypersuccinct trees to obtain a universal compressed data structure for range-minimum queries. It has constant query time and the optimal worst-case space usage of $2n+o(n)$ bits, but the space drops to $1.736n + o(n)$ bits on average for random permutations of $n$ elements, and $2\lg\binom nr + o(n)$ for arrays with $r$ increasing runs, respectively. Both results are optimal; the former answers an open problem of Davoodi et al. (2014) and Golin et al. (2016). Compared to prior work on succinct data structures, we do not have to tailor our data structure to specific applications; hypersuccinct trees automatically adapt to the trees at hand. We show that they simultaneously achieve the optimal space usage to within lower order terms for a wide range of distributions over tree shapes, including: binary search trees (BSTs) generated by insertions in random order / Cartesian trees of random arrays, random fringe-balanced BSTs, binary trees with a given number of binary/unary/leaf nodes, random binary tries generated from memoryless sources, full binary trees, unary paths, as well as uniformly chosen weight-balanced BSTs, AVL trees, and left-leaning red-black trees. △ Less

Submitted 3 September, 2021; v1 submitted 27 April, 2021; originally announced April 2021.

Comments: part of ESA 2021

arXiv:2104.11079 [pdf, other]

doi 10.2172/1807223

Randomized Algorithms for Scientific Computing (RASC)

Authors: Aydin Buluc, Tamara G. Kolda, Stefan M. Wild, Mihai Anitescu, Anthony DeGennaro, John Jakeman, Chandrika Kamath, Ramakrishnan Kannan, Miles E. Lopes, Per-Gunnar Martinsson, Kary Myers, Jelani Nelson, Juan M. Restrepo, C. Seshadhri, Draguna Vrabie, Brendt Wohlberg, Stephen J. Wright, Chao Yang, Peter Zwart

Abstract: Randomized algorithms have propelled advances in artificial intelligence and represent a foundational research area in advancing AI for Science. Future advancements in DOE Office of Science priority areas such as climate science, astrophysics, fusion, advanced materials, combustion, and quantum computing all require randomized algorithms for surmounting challenges of complexity, robustness, and sc… ▽ More Randomized algorithms have propelled advances in artificial intelligence and represent a foundational research area in advancing AI for Science. Future advancements in DOE Office of Science priority areas such as climate science, astrophysics, fusion, advanced materials, combustion, and quantum computing all require randomized algorithms for surmounting challenges of complexity, robustness, and scalability. This report summarizes the outcomes of that workshop, "Randomized Algorithms for Scientific Computing (RASC)," held virtually across four days in December 2020 and January 2021. △ Less

Submitted 21 March, 2022; v1 submitted 19 April, 2021; originally announced April 2021.

arXiv:2104.08322 [pdf, other]

doi 10.1109/TPDS.2021.3082815

libEnsemble: A Library to Coordinate the Concurrent Evaluation of Dynamic Ensembles of Calculations

Authors: Stephen Hudson, Jeffrey Larson, John-Luke Navarro, Stefan M. Wild

Abstract: Almost all applications stop scaling at some point; those that don't are seldom performant when considering time to solution on anything but aspirational/unicorn resources. Recognizing these tradeoffs as well as greater user functionality in a near-term exascale computing era, we present libEnsemble, a library aimed at particular scalability- and capability-stretching uses. libEnsemble enables run… ▽ More Almost all applications stop scaling at some point; those that don't are seldom performant when considering time to solution on anything but aspirational/unicorn resources. Recognizing these tradeoffs as well as greater user functionality in a near-term exascale computing era, we present libEnsemble, a library aimed at particular scalability- and capability-stretching uses. libEnsemble enables running concurrent instances of an application in dynamically allocated ensembles through an extensible Python library. We highlight the structure, execution, and capabilities of the library on leading pre-exascale environments as well as advanced capabilities for exascale environments and beyond. △ Less

Submitted 16 April, 2021; originally announced April 2021.

arXiv:2103.16041 [pdf, other]

Scalable Statistical Inference of Photometric Redshift via Data Subsampling

Authors: Arindam Fadikar, Stefan M. Wild, Jonas Chaves-Montero

Abstract: Handling big data has largely been a major bottleneck in traditional statistical models. Consequently, when accurate point prediction is the primary target, machine learning models are often preferred over their statistical counterparts for bigger problems. But full probabilistic statistical models often outperform other models in quantifying uncertainties associated with model predictions. We dev… ▽ More Handling big data has largely been a major bottleneck in traditional statistical models. Consequently, when accurate point prediction is the primary target, machine learning models are often preferred over their statistical counterparts for bigger problems. But full probabilistic statistical models often outperform other models in quantifying uncertainties associated with model predictions. We develop a data-driven statistical modeling framework that combines the uncertainties from an ensemble of statistical models learned on smaller subsets of data carefully chosen to account for imbalances in the input space. We demonstrate this method on a photometric redshift estimation problem in cosmology, which seeks to infer a distribution of the redshift -- the stretching effect in observing the light of far-away galaxies -- given multivariate color information observed for an object in the sky. Our proposed method performs balanced partitioning, graph-based data subsampling across the partitions, and training of an ensemble of Gaussian process models. △ Less

Submitted 1 April, 2021; v1 submitted 29 March, 2021; originally announced March 2021.

arXiv:2010.08840 [pdf, ps, other]

Lazy Search Trees

Authors: Bryce Sandlund, Sebastian Wild

Abstract: We introduce the lazy search tree data structure. The lazy search tree is a comparison-based data structure on the pointer machine that supports order-based operations such as rank, select, membership, predecessor, successor, minimum, and maximum while providing dynamic operations insert, delete, change-key, split, and merge. We analyze the performance of our data structure based on a partition of… ▽ More We introduce the lazy search tree data structure. The lazy search tree is a comparison-based data structure on the pointer machine that supports order-based operations such as rank, select, membership, predecessor, successor, minimum, and maximum while providing dynamic operations insert, delete, change-key, split, and merge. We analyze the performance of our data structure based on a partition of current elements into a set of gaps $\{Δ_i\}$ based on rank. A query falls into a particular gap and splits the gap into two new gaps at a rank $r$ associated with the query operation. If we define $B = \sum_i |Δ_i| \log_2(n/|Δ_i|)$, our performance over a sequence of $n$ insertions and $q$ distinct queries is $O(B + \min(n \log \log n, n \log q))$. We show $B$ is a lower bound. Effectively, we reduce the insertion time of binary search trees from $Θ(\log n)$ to $O(\min(\log(n/|Δ_i|) + \log \log |Δ_i|, \; \log q))$, where $Δ_i$ is the gap in which the inserted element falls. Over a sequence of $n$ insertions and $q$ queries, a time bound of $O(n \log q + q \log n)$ holds; better bounds are possible when queries are non-uniformly distributed. As an extreme case of non-uniformity, if all queries are for the minimum element, the lazy search tree performs as a priority queue with $O(\log \log n)$ time insert and decrease-key operations. The same data structure supports queries for any rank, interpolating between binary search trees and efficient priority queues. Lazy search trees can be implemented to operate mostly on arrays, requiring only $O(\min(q, n))$ pointers. Via direct reduction, our data structure also supports the efficient access theorems of the splay tree, providing a powerful data structure for non-uniform element access, both when the number of accesses is small and large. △ Less

Submitted 17 October, 2020; originally announced October 2020.

Comments: Accepted for publication in FOCS 2020

arXiv:2010.04108 [pdf, ps, other]

doi 10.1007/s00453-022-01039-2

Succinct Permutation Graphs

Authors: Konstantinos Tsakalidis, Sebastian Wild, Viktor Zamaraev

Abstract: We present a succinct data structure for permutation graphs, and their superclass of circular permutation graphs, i.e., data structures using optimal space up to lower order terms. Unlike concurrent work on circle graphs (Acan et al. 2022), our data structure also supports distance and shortest-path queries, as well as adjacency and neighborhood queries, all in optimal time. We present in particul… ▽ More We present a succinct data structure for permutation graphs, and their superclass of circular permutation graphs, i.e., data structures using optimal space up to lower order terms. Unlike concurrent work on circle graphs (Acan et al. 2022), our data structure also supports distance and shortest-path queries, as well as adjacency and neighborhood queries, all in optimal time. We present in particular the first succinct exact distance oracle for (circular) permutation graphs. A second succinct data structure also supports degree queries in time independent of the neighborhood's size at the expense of an $O(\log n/\log \log n)$-factor overhead in all running times. Furthermore, we develop a succinct data structure for the class of bipartite permutation graphs. We demonstrate how to run algorithms directly over our succinct representations for several problems on permutation graphs: Clique, Coloring, Independent Set, Hamiltonian Cycle, All-Pair Shortest Paths, and others. Finally, we initiate the study of semi-distributed graph representations; a concept that smoothly interpolates between distributed (labeling schemes) and centralized (standard data structures). We show how to turn some of our data structures into semi-distributed representations by storing only $O(n)$ bits of additional global information, circumventing the lower bound on distance labeling schemes for permutation graphs. △ Less

Submitted 24 September, 2022; v1 submitted 8 October, 2020; originally announced October 2020.

Comments: updated to match final Algorithmica version

arXiv:2005.07644 [pdf, other]

doi 10.4230/LIPIcs.ISAAC.2020.57

Distance Oracles for Interval Graphs via Breadth-First Rank/Select in Succinct Trees

Authors: Meng He, J. Ian Munro, Yakov Nekrich, Sebastian Wild, Kaiyu Wu

Abstract: We present the first succinct distance oracles for (unweighted) interval graphs and related classes of graphs, using a novel succinct data structure for ordinal trees that supports the map** between preorder (i.e., depth-first) ranks and level-order (breadth-first) ranks of nodes in constant time. Our distance oracles for interval graphs also support navigation queries -- testing adjacency, comp… ▽ More We present the first succinct distance oracles for (unweighted) interval graphs and related classes of graphs, using a novel succinct data structure for ordinal trees that supports the map** between preorder (i.e., depth-first) ranks and level-order (breadth-first) ranks of nodes in constant time. Our distance oracles for interval graphs also support navigation queries -- testing adjacency, computing node degrees, neighborhoods, and shortest paths -- all in optimal time. Our technique also yields optimal distance oracles for proper interval graphs (unit-interval graphs) and circular-arc graphs. Our tree data structure supports all operations provided by different approaches in previous work, as well as map** to and from level-order ranks and retrieving the last (first) internal node before (after) a given node in a level-order traversal, all in constant time. △ Less

Submitted 30 September, 2020; v1 submitted 15 May, 2020; originally announced May 2020.

Comments: to appear in ISAAC 2020

arXiv:1909.00311 [pdf, other]

doi 10.1145/3295500.3356202

Scalable Reinforcement-Learning-Based Neural Architecture Search for Cancer Deep Learning Research

Authors: Prasanna Balaprakash, Romain Egele, Misha Salim, Stefan Wild, Venkatram Vishwanath, Fangfang Xia, Tom Brettin, Rick Stevens

Abstract: Cancer is a complex disease, the understanding and treatment of which are being aided through increases in the volume of collected data and in the scale of deployed computing power. Consequently, there is a growing need for the development of data-driven and, in particular, deep learning methods for various tasks such as cancer diagnosis, detection, prognosis, and prediction. Despite recent succes… ▽ More Cancer is a complex disease, the understanding and treatment of which are being aided through increases in the volume of collected data and in the scale of deployed computing power. Consequently, there is a growing need for the development of data-driven and, in particular, deep learning methods for various tasks such as cancer diagnosis, detection, prognosis, and prediction. Despite recent successes, however, designing high-performing deep learning models for nonimage and nontext cancer data is a time-consuming, trial-and-error, manual task that requires both cancer domain and deep learning expertise. To that end, we develop a reinforcement-learning-based neural architecture search to automate deep-learning-based predictive model development for a class of representative cancer data. We develop custom building blocks that allow domain experts to incorporate the cancer-data-specific characteristics. We show that our approach discovers deep neural network architectures that have significantly fewer trainable parameters, shorter training time, and accuracy similar to or higher than those of manually designed architectures. We study and demonstrate the scalability of our approach on up to 1,024 Intel Knights Landing nodes of the Theta supercomputer at the Argonne Leadership Computing Facility. △ Less

Submitted 31 August, 2019; originally announced September 2019.

Comments: SC '19: IEEE/ACM International Conference on High Performance Computing, Networking, Storage and Analysis, November 17--22, 2019, Denver, CO

arXiv:1908.00563 [pdf, ps, other]

Dynamic Optimality Refuted -- For Tournament Heaps

Authors: J. Ian Munro, Richard Peng, Sebastian Wild, Lingyi Zhang

Abstract: We prove a separation between offline and online algorithms for finger-based tournament heaps undergoing key modifications. These heaps are implemented by binary trees with keys stored on leaves, and intermediate nodes tracking the min of their respective subtrees. They represent a natural starting point for studying self-adjusting heaps due to the need to access the root-to-leaf path upon modific… ▽ More We prove a separation between offline and online algorithms for finger-based tournament heaps undergoing key modifications. These heaps are implemented by binary trees with keys stored on leaves, and intermediate nodes tracking the min of their respective subtrees. They represent a natural starting point for studying self-adjusting heaps due to the need to access the root-to-leaf path upon modifications. We combine previous studies on the competitive ratios of unordered binary search trees by [Fredman WADS2011] and on order-by-next request by [Martínez-Roura TCS2000] and [Munro ESA2000] to show that for any number of fingers, tournament heaps cannot handle a sequence of modify-key operations with competitive ratio in $o(\sqrt{\log{n}})$. Critical to this analysis is the characterization of the modifications that a heap can undergo upon an access. There are $\exp(Θ(n \log{n}))$ valid heaps on $n$ keys, but only $\exp(Θ(n))$ binary search trees. We parameterize the modification power through the well-studied concept of fingers: additional pointers the data structure can manipulate arbitrarily. Here we demonstrate that fingers can be significantly more powerful than servers moving on a static tree by showing that access to $k$ fingers allow an offline algorithm to handle any access sequence with amortized cost $O(\log_{k}(n) + 2^{\lg^{*}n})$. △ Less

Submitted 1 August, 2019; originally announced August 2019.

arXiv:1907.11572 [pdf, other]

Sequential Learning of Active Subspaces

Authors: Nathan Wycoff, Mickael Binois, Stefan M. Wild

Abstract: In recent years, active subspace methods (ASMs) have become a popular means of performing subspace sensitivity analysis on black-box functions. Naively applied, however, ASMs require gradient evaluations of the target function. In the event of noisy, expensive, or stochastic simulators, evaluating gradients via finite differencing may be infeasible. In such cases, often a surrogate model is employ… ▽ More In recent years, active subspace methods (ASMs) have become a popular means of performing subspace sensitivity analysis on black-box functions. Naively applied, however, ASMs require gradient evaluations of the target function. In the event of noisy, expensive, or stochastic simulators, evaluating gradients via finite differencing may be infeasible. In such cases, often a surrogate model is employed, on which finite differencing is performed. When the surrogate model is a Gaussian process, we show that the ASM estimator is available in closed form, rendering the finite-difference approximation unnecessary. We use our closed-form solution to develop acquisition functions focused on sequential learning tailored to sensitivity analysis on top of ASMs. We also show that the traditional ASM estimator may be viewed as a method of moments estimator for a certain class of Gaussian processes. We demonstrate how uncertainty on Gaussian process hyperparameters may be propagated to uncertainty on the sensitivity analysis, allowing model-based confidence intervals on the active subspace. Our methodological developments are illustrated on several examples. △ Less

Submitted 20 September, 2020; v1 submitted 26 July, 2019; originally announced July 2019.

arXiv:1905.02149 [pdf, other]

Efficient Second-Order Shape-Constrained Function Fitting

Authors: David Durfee, Yu Gao, Anup B. Rao, Sebastian Wild

Abstract: We give an algorithm to compute a one-dimensional shape-constrained function that best fits given data in weighted-$L_{\infty}$ norm. We give a single algorithm that works for a variety of commonly studied shape constraints including monotonicity, Lipschitz-continuity and convexity, and more generally, any shape constraint expressible by bounds on first- and/or second-order differences. Our algori… ▽ More We give an algorithm to compute a one-dimensional shape-constrained function that best fits given data in weighted-$L_{\infty}$ norm. We give a single algorithm that works for a variety of commonly studied shape constraints including monotonicity, Lipschitz-continuity and convexity, and more generally, any shape constraint expressible by bounds on first- and/or second-order differences. Our algorithm computes an approximation with additive error $\varepsilon$ in $O\left(n \log \frac{U}{\varepsilon} \right)$ time, where $U$ captures the range of input values. We also give a simple greedy algorithm that runs in $O(n)$ time for the special case of unweighted $L_{\infty}$ convex regression. These are the first (near-)linear-time algorithms for second-order-constrained function fitting. To achieve these results, we use a novel geometric interpretation of the underlying dynamic programming problem. We further show that a generalization of the corresponding problems to directed acyclic graphs (DAGs) is as difficult as linear programming. △ Less

Submitted 28 May, 2019; v1 submitted 6 May, 2019; originally announced May 2019.

Comments: accepted for WADS 2019; (v2 fixes various typos)

arXiv:1903.02533 [pdf, other]

Entropy Trees and Range-Minimum Queries In Optimal Average-Case Space

Authors: J. Ian Munro, Sebastian Wild

Abstract: The range-minimum query (RMQ) problem is a fundamental data structuring task with numerous applications. Despite the fact that succinct solutions with worst-case optimal $2n+o(n)$ bits of space and constant query time are known, it has been unknown whether such a data structure can be made adaptive to the reduced entropy of random inputs (Davoodi et al. 2014). We construct a succinct data structur… ▽ More The range-minimum query (RMQ) problem is a fundamental data structuring task with numerous applications. Despite the fact that succinct solutions with worst-case optimal $2n+o(n)$ bits of space and constant query time are known, it has been unknown whether such a data structure can be made adaptive to the reduced entropy of random inputs (Davoodi et al. 2014). We construct a succinct data structure with the optimal $1.736n+o(n)$ bits of space on average for random RMQ instances, settling this open problem. Our solution relies on a compressed data structure for binary trees that is of independent interest. It can store a (static) binary search tree generated by random insertions in asymptotically optimal expected space and supports many queries in constant time. Using an instance-optimal encoding of subtrees, we furthermore obtain a "hyper-succinct" data structure for binary trees that improves upon the ultra-succinct representation of Jansson, Sadakane and Sung (2012). △ Less

Submitted 6 March, 2019; originally announced March 2019.

arXiv:1811.01259 [pdf, other]

QuickXsort - A Fast Sorting Scheme in Theory and Practice

Authors: Stefan Edelkamp, Armin Weiß, Sebastian Wild

Abstract: QuickXsort is a highly efficient in-place sequential sorting scheme that mixes Hoare's Quicksort algorithm with X, where X can be chosen from a wider range of other known sorting algorithms, like Heapsort, Insertionsort and Mergesort. Its major advantage is that QuickXsort can be in-place even if X is not. In this work we provide general transfer theorems expressing the number of comparisons of Qu… ▽ More QuickXsort is a highly efficient in-place sequential sorting scheme that mixes Hoare's Quicksort algorithm with X, where X can be chosen from a wider range of other known sorting algorithms, like Heapsort, Insertionsort and Mergesort. Its major advantage is that QuickXsort can be in-place even if X is not. In this work we provide general transfer theorems expressing the number of comparisons of QuickXsort in terms of the number of comparisons of X. More specifically, if pivots are chosen as medians of (not too fast) growing size samples, the average number of comparisons of QuickXsort and X differ only by $o(n)$-terms. For median-of-$k$ pivot selection for some constant $k$, the difference is a linear term whose coefficient we compute precisely. For instance, median-of-three QuickMergesort uses at most $n \lg n - 0.8358n + O(\log n)$ comparisons. Furthermore, we examine the possibility of sorting base cases with some other algorithm using even less comparisons. By doing so the average-case number of comparisons can be reduced down to $n \lg n- 1.4106n + o(n)$ for a remaining gap of only $0.0321n$ comparisons to the known lower bound (while using only $O(\log n)$ additional space and $O(n \log n)$ time overall). Implementations of these sorting strategies show that the algorithms challenge well-established library implementations like Musser's Introsort. △ Less

Submitted 3 November, 2018; originally announced November 2018.

arXiv:1810.12322 [pdf, other]

doi 10.1137/1.9781611975505.6

Sesquickselect: One and a half pivots for cache-efficient selection

Authors: Conrado Martínez, Markus Nebel, Sebastian Wild

Abstract: Because of unmatched improvements in CPU performance, memory transfers have become a bottleneck of program execution. As discovered in recent years, this also affects sorting in internal memory. Since partitioning around several pivots reduces overall memory transfers, we have seen renewed interest in multiway Quicksort. Here, we analyze in how far multiway partitioning helps in Quickselect. We… ▽ More Because of unmatched improvements in CPU performance, memory transfers have become a bottleneck of program execution. As discovered in recent years, this also affects sorting in internal memory. Since partitioning around several pivots reduces overall memory transfers, we have seen renewed interest in multiway Quicksort. Here, we analyze in how far multiway partitioning helps in Quickselect. We compute the expected number of comparisons and scanned elements (approximating memory transfers) for a generic class of (non-adaptive) multiway Quickselect and show that three or more pivots are not helpful, but two pivots are. Moreover, we consider "adaptive" variants which choose partitioning and pivot-selection methods in each recursive step from a finite set of alternatives depending on the current (relative) sought rank. We show that "Sesquickselect", a new Quickselect variant that uses either one or two pivots, makes better use of small samples w.r.t. memory transfers than other Quickselect variants. △ Less

Submitted 29 October, 2018; originally announced October 2018.

Comments: appears in ANALCO 2019

arXiv:1805.04154 [pdf, other]

doi 10.4230/lipics.esa.2018.63

Nearly-Optimal Mergesorts: Fast, Practical Sorting Methods That Optimally Adapt to Existing Runs

Authors: J. Ian Munro, Sebastian Wild

Abstract: We present two stable mergesort variants, "peeksort" and "powersort", that exploit existing runs and find nearly-optimal merging orders with practically negligible overhead. Previous methods either require substantial effort for determining the merging order (Takaoka 2009; Barbay & Navarro 2013) or do not have a constant-factor optimal worst-case guarantee (Peters 2001; Auger, Nicaud & Pivoteau 20… ▽ More We present two stable mergesort variants, "peeksort" and "powersort", that exploit existing runs and find nearly-optimal merging orders with practically negligible overhead. Previous methods either require substantial effort for determining the merging order (Takaoka 2009; Barbay & Navarro 2013) or do not have a constant-factor optimal worst-case guarantee (Peters 2001; Auger, Nicaud & Pivoteau 2015; Buss & Knop 2018). We demonstrate that our methods are competitive in terms of running time with state-of-the-art implementations of stable sorting methods. △ Less

Submitted 10 May, 2018; originally announced May 2018.

arXiv:1803.05948 [pdf, other]

doi 10.4230/lipics.aofa.2018.36

Average Cost of QuickXsort with Pivot Sampling

Authors: Sebastian Wild

Abstract: QuickXsort is a strategy to combine Quicksort with another sorting method X, so that the result has essentially the same comparison cost as X in isolation, but sorts in place even when X requires a linear-size buffer. We solve the recurrence for QuickXsort precisely up to the linear term including the optimization to choose pivots from a sample of k elements. This allows to immediately obtain over… ▽ More QuickXsort is a strategy to combine Quicksort with another sorting method X, so that the result has essentially the same comparison cost as X in isolation, but sorts in place even when X requires a linear-size buffer. We solve the recurrence for QuickXsort precisely up to the linear term including the optimization to choose pivots from a sample of k elements. This allows to immediately obtain overall average costs using only the average costs of sorting method X (as if run in isolation). We thereby extend and greatly simplify the analysis of QuickHeapsort and QuickMergesort with practically efficient pivot selection, and give the first tight upper bounds including the linear term for such methods. △ Less

Submitted 17 May, 2018; v1 submitted 15 March, 2018; originally announced March 2018.

Comments: updated to final version accepted for AofA 2018

arXiv:1610.02606 [pdf, other]

Doing Moore with Less -- Leapfrogging Moore's Law with Inexactness for Supercomputing

Authors: Sven Leyffer, Stefan M. Wild, Mike Fagan, Marc Snir, Krishna Palem, Kazutomo Yoshii, Hal Finkel

Abstract: Energy and power consumption are major limitations to continued scaling of computing systems. Inexactness, where the quality of the solution can be traded for energy savings, has been proposed as an approach to overcoming those limitations. In the past, however, inexactness necessitated the need for highly customized or specialized hardware. The current evolution of commercial off-the-shelf(COTS)… ▽ More Energy and power consumption are major limitations to continued scaling of computing systems. Inexactness, where the quality of the solution can be traded for energy savings, has been proposed as an approach to overcoming those limitations. In the past, however, inexactness necessitated the need for highly customized or specialized hardware. The current evolution of commercial off-the-shelf(COTS) processors facilitates the use of lower-precision arithmetic in ways that reduce energy consumption. We study these new opportunities in this paper, using the example of an inexact Newton algorithm for solving nonlinear equations. Moreover, we have begun develo** a set of techniques we call reinvestment that, paradoxically, use reduced precision to improve the quality of the computed result: They do so by reinvesting the energy saved by reduced precision. △ Less

Submitted 12 October, 2016; v1 submitted 8 October, 2016; originally announced October 2016.

Comments: 9 pages, 12 figures, PDFLaTeX. 12 Oct 2016: Corrected author Hal Finkel's affiliation to show ALCF/Argonne

ACM Class: F.2.1; G.1.5

arXiv:1609.08513 [pdf, other]

doi 10.1137/1.9781611975505.8

Median-of-k Jumplists and Dangling-Min BSTs

Authors: Markus E. Nebel, Elisabeth Neumann, Sebastian Wild

Abstract: We extend randomized jumplists introduced by Brönnimann et al. (STACS 2003) to choose jump-pointer targets as median of a small sample for better search costs, and present randomized algorithms with expected $O(\log n)$ time complexity that maintain the probability distribution of jump pointers upon insertions and deletions. We analyze the expected costs to search, insert and delete a random eleme… ▽ More We extend randomized jumplists introduced by Brönnimann et al. (STACS 2003) to choose jump-pointer targets as median of a small sample for better search costs, and present randomized algorithms with expected $O(\log n)$ time complexity that maintain the probability distribution of jump pointers upon insertions and deletions. We analyze the expected costs to search, insert and delete a random element, and we show that omitting jump pointers in small sublists hardly affects search costs, but significantly reduces the memory consumption. We use a bijection between jumplists and "dangling-min BSTs", a variant of (fringe-balanced) binary search trees for the analysis. Despite their similarities, some standard analysis techniques for search trees fail for dangling-min trees (and hence for jumplists). △ Less

Submitted 30 October, 2018; v1 submitted 27 September, 2016; originally announced September 2016.

Comments: appears in ANALCO 2019

ACM Class: E.1

arXiv:1608.04906 [pdf, other]

doi 10.1137/1.9781611975062.2

Quicksort Is Optimal For Many Equal Keys

Authors: Sebastian Wild

Abstract: I prove that the average number of comparisons for median-of-$k$ Quicksort (with fat-pivot a.k.a. three-way partitioning) is asymptotically only a constant $α_k$ times worse than the lower bound for sorting random multisets with $Ω(n^\varepsilon)$ duplicates of each value (for any $\varepsilon>0$). The constant is $α_k = \ln(2) / \bigl(H_{k+1}-H_{(k+1)/2} \bigr)$, which converges to 1 as… ▽ More I prove that the average number of comparisons for median-of-$k$ Quicksort (with fat-pivot a.k.a. three-way partitioning) is asymptotically only a constant $α_k$ times worse than the lower bound for sorting random multisets with $Ω(n^\varepsilon)$ duplicates of each value (for any $\varepsilon>0$). The constant is $α_k = \ln(2) / \bigl(H_{k+1}-H_{(k+1)/2} \bigr)$, which converges to 1 as $k\to\infty$, so Quicksort is asymptotically optimal for inputs with many duplicates. This resolves a conjecture by Sedgewick and Bentley (1999, 2002) and constitutes the first progress on the analysis of Quicksort with equal elements since Sedgewick's 1977 article. △ Less

Submitted 1 November, 2017; v1 submitted 17 August, 2016; originally announced August 2016.

Comments: v4 is a major reorganization of sections; a shortened version appears in the proceedings of ANALCO 2018

arXiv:1511.01138 [pdf, ps, other]

Why Is Dual-Pivot Quicksort Fast?

Authors: Sebastian Wild

Abstract: I discuss the new dual-pivot Quicksort that is nowadays used to sort arrays of primitive types in Java. I sketch theoretical analyses of this algorithm that offer a possible, and in my opinion plausible, explanation why (a) dual-pivot Quicksort is faster than the previously used (classic) Quicksort and (b) why this improvement was not already found much earlier. I discuss the new dual-pivot Quicksort that is nowadays used to sort arrays of primitive types in Java. I sketch theoretical analyses of this algorithm that offer a possible, and in my opinion plausible, explanation why (a) dual-pivot Quicksort is faster than the previously used (classic) Quicksort and (b) why this improvement was not already found much earlier. △ Less

Submitted 28 September, 2016; v1 submitted 3 November, 2015; originally announced November 2015.

Comments: extended abstract for Theorietage 2015 (https://www.uni-trier.de/index.php?id=55089) (v2 fixes a small bug in the pseudocode)

arXiv:1504.06475 [pdf, other]

A Practical and Worst-Case Efficient Algorithm for Divisor Methods of Apportionment

Authors: Raphael Reitzig, Sebastian Wild

Abstract: Proportional apportionment is the problem of assigning seats to parties according to their relative share of votes. Divisor methods are the de-facto standard solution, used in many countries. In recent literature, there are two algorithms that implement divisor methods: one by Cheng and Eppstein (ISAAC, 2014) has worst-case optimal running time but is complex, while the other (Pukelsheim, 2014)… ▽ More Proportional apportionment is the problem of assigning seats to parties according to their relative share of votes. Divisor methods are the de-facto standard solution, used in many countries. In recent literature, there are two algorithms that implement divisor methods: one by Cheng and Eppstein (ISAAC, 2014) has worst-case optimal running time but is complex, while the other (Pukelsheim, 2014) is relatively simple and fast in practice but does not offer worst-case guarantees. We demonstrate that the former algorithm is much slower than the other in practice and propose a novel algorithm that avoids the shortcomings of both. We investigate the running-time behavior of the three contenders in order to determine which is most useful in practice. △ Less

Submitted 5 December, 2017; v1 submitted 24 April, 2015; originally announced April 2015.

Comments: (v4 adds missing figures in v3)

arXiv:1502.04048 [pdf, other]

doi 10.1007/s00453-017-0392-3

Efficient Algorithms for Envy-Free Stick Division With Fewest Cuts

Authors: Raphael Reitzig, Sebastian Wild

Abstract: Given a set of n sticks of various (not necessarily different) lengths, what is the largest length so that we can cut k equally long pieces of this length from the given set of sticks? We analyze the structure of this problem and show that it essentially reduces to a single call of a selection algorithm; we thus obtain an optimal linear-time algorithm. This algorithm also solves the related envy… ▽ More Given a set of n sticks of various (not necessarily different) lengths, what is the largest length so that we can cut k equally long pieces of this length from the given set of sticks? We analyze the structure of this problem and show that it essentially reduces to a single call of a selection algorithm; we thus obtain an optimal linear-time algorithm. This algorithm also solves the related envy-free stick-division problem, which Segal-Halevi, Hassidim, and Aumann (AAMAS, 2015) recently used as their central primitive operation for the first discrete and bounded envy-free cake cutting protocol with a proportionality guarantee when pieces can be put to waste. △ Less

Submitted 3 November, 2017; v1 submitted 13 February, 2015; originally announced February 2015.

Comments: v3 adds more context about the problem

Journal ref: Reitzig, R. & Wild, S. Algorithmica (2017). https://doi.org/10.1007/s00453-017-0392-3

arXiv:1412.0193 [pdf, other]

doi 10.1007/s00453-015-0041-7

Analysis of Pivot Sampling in Dual-Pivot Quicksort

Authors: Sebastian Wild, Markus E. Nebel, Conrado Martínez

Abstract: The new dual-pivot Quicksort by Vladimir Yaroslavskiy - used in Oracle's Java runtime library since version 7 - features intriguing asymmetries. They make a basic variant of this algorithm use less comparisons than classic single-pivot Quicksort. In this paper, we extend the analysis to the case where the two pivots are chosen as fixed order statistics of a random sample. Surprisingly, dual-pivot… ▽ More The new dual-pivot Quicksort by Vladimir Yaroslavskiy - used in Oracle's Java runtime library since version 7 - features intriguing asymmetries. They make a basic variant of this algorithm use less comparisons than classic single-pivot Quicksort. In this paper, we extend the analysis to the case where the two pivots are chosen as fixed order statistics of a random sample. Surprisingly, dual-pivot Quicksort then needs more comparisons than a corresponding version of classic Quicksort, so it is clear that counting comparisons is not sufficient to explain the running time advantages observed for Yaroslavskiy's algorithm in practice. Consequently, we take a more holistic approach and give also the precise leading term of the average number of swaps, the number of executed Java Bytecode instructions and the number of scanned elements, a new simple cost measure that approximates I/O costs in the memory hierarchy. We determine optimal order statistics for each of the cost measures. It turns out that the asymmetries in Yaroslavskiy's algorithm render pivots with a systematic skew more efficient than the symmetric choice. Moreover, we finally have a convincing explanation for the success of Yaroslavskiy's algorithm in practice: Compared with corresponding versions of classic single-pivot Quicksort, dual-pivot Quicksort needs significantly less I/Os, both with and without pivot sampling. △ Less

Submitted 10 August, 2015; v1 submitted 30 November, 2014; originally announced December 2014.

Comments: This article is identical (up to typograhical details) to the Algorithmica version available from Springerlink (see DOI). It is an extended and improved version of our corresponding article at the AofA 2014 conference [arXiv:1403.6602]

arXiv:1411.2059 [pdf, other]

doi 10.1137/1.9781611973761.11

Analysis of Branch Misses in Quicksort

Authors: Conrado Martínez, Markus E. Nebel, Sebastian Wild

Abstract: The analysis of algorithms mostly relies on counting classic elementary operations like additions, multiplications, comparisons, swaps etc. This approach is often sufficient to quantify an algorithm's efficiency. In some cases, however, features of modern processor architectures like pipelined execution and memory hierarchies have significant impact on running time and need to be taken into accoun… ▽ More The analysis of algorithms mostly relies on counting classic elementary operations like additions, multiplications, comparisons, swaps etc. This approach is often sufficient to quantify an algorithm's efficiency. In some cases, however, features of modern processor architectures like pipelined execution and memory hierarchies have significant impact on running time and need to be taken into account to get a reliable picture. One such example is Quicksort: It has been demonstrated experimentally that under certain conditions on the hardware the classically optimal balanced choice of the pivot as median of a sample gets harmful. The reason lies in mispredicted branches whose rollback costs become dominating. In this paper, we give the first precise analytical investigation of the influence of pipelining and the resulting branch mispredictions on the efficiency of (classic) Quicksort and Yaroslavskiy's dual-pivot Quicksort as implemented in Oracle's Java 7 library. For the latter it is still not fully understood why experiments prove it 10% faster than a highly engineered implementation of a classic single-pivot version. For different branch prediction strategies, we give precise asymptotics for the expected number of branch misses caused by the aforementioned Quicksort variants when their pivots are chosen from a sample of the input. We conclude that the difference in branch misses is too small to explain the superiority of the dual-pivot algorithm. △ Less

Submitted 7 November, 2014; originally announced November 2014.

Comments: to be presented at ANALCO 2015

arXiv:1403.6602 [pdf, other]

Pivot Sampling in Dual-Pivot Quicksort

Authors: Markus E. Nebel, Sebastian Wild

Abstract: The new dual-pivot Quicksort by Vladimir Yaroslavskiy - used in Oracle's Java runtime library since version 7 - features intriguing asymmetries in its behavior. They were shown to cause a basic variant of this algorithm to use less comparisons than classic single-pivot Quicksort implementations. In this paper, we extend the analysis to the case where the two pivots are chosen as fixed order statis… ▽ More The new dual-pivot Quicksort by Vladimir Yaroslavskiy - used in Oracle's Java runtime library since version 7 - features intriguing asymmetries in its behavior. They were shown to cause a basic variant of this algorithm to use less comparisons than classic single-pivot Quicksort implementations. In this paper, we extend the analysis to the case where the two pivots are chosen as fixed order statistics of a random sample and give the precise leading term of the average number of comparisons, swaps and executed Java Bytecode instructions. It turns out that - unlike for classic Quicksort, where it is optimal to choose the pivot as median of the sample - the asymmetries in Yaroslavskiy's algorithm render pivots with a systematic skew more efficient than the symmetric choice. Moreover, the optimal skew heavily depends on the employed cost measure; most strikingly, abstract costs like the number of swaps and comparisons yield a very different result than counting Java Bytecode instructions, which can be assumed most closely related to actual running time. △ Less

Submitted 13 June, 2014; v1 submitted 26 March, 2014; originally announced March 2014.

Comments: presented at AofA 2014 (http://www.aofa14.upmc.fr/)

arXiv:1310.7409 [pdf, other]

doi 10.1007/978-3-642-33090-2_71

Average Case Analysis of Java 7's Dual Pivot Quicksort

Authors: Sebastian Wild, Markus E. Nebel

Abstract: Recently, a new Quicksort variant due to Yaroslavskiy was chosen as standard sorting method for Oracle's Java 7 runtime library. The decision for the change was based on empirical studies showing that on average, the new algorithm is faster than the formerly used classic Quicksort. Surprisingly, the improvement was achieved by using a dual pivot approach, an idea that was considered not promising… ▽ More Recently, a new Quicksort variant due to Yaroslavskiy was chosen as standard sorting method for Oracle's Java 7 runtime library. The decision for the change was based on empirical studies showing that on average, the new algorithm is faster than the formerly used classic Quicksort. Surprisingly, the improvement was achieved by using a dual pivot approach, an idea that was considered not promising by several theoretical studies in the past. In this paper, we identify the reason for this unexpected success. Moreover, we present the first precise average case analysis of the new algorithm showing e.g. that a random permutation of length $n$ is sorted using $1.9n\ln n-2.46n+\mathcal{O}(\ln n)$ key comparisons and $0.6n\ln n+0.08n+\mathcal{O}(\ln n)$ swaps. △ Less

Submitted 28 October, 2013; originally announced October 2013.

Comments: Best paper award at ESA 2012, recorded talk: http://www.slideshare.net/sebawild/average-case-analysis-of-java-7s-dual-pivot-quicksort

Journal ref: In L. Epstein & P. Ferragina (Eds.), ESA 2012 (LNCS 7501, pp. 825-836). Springer Berlin/Heidelberg

arXiv:1306.3819 [pdf, ps, other]

doi 10.1007/s00453-014-9953-x

Analysis of Quickselect under Yaroslavskiy's Dual-Pivoting Algorithm

Authors: Sebastian Wild, Markus E. Nebel, Hosam Mahmoud

Abstract: There is excitement within the algorithms community about a new partitioning method introduced by Yaroslavskiy. This algorithm renders Quicksort slightly faster than the case when it runs under classic partitioning methods. We show that this improved performance in Quicksort is not sustained in Quickselect; a variant of Quicksort for finding order statistics. We investigate the number of compariso… ▽ More There is excitement within the algorithms community about a new partitioning method introduced by Yaroslavskiy. This algorithm renders Quicksort slightly faster than the case when it runs under classic partitioning methods. We show that this improved performance in Quicksort is not sustained in Quickselect; a variant of Quicksort for finding order statistics. We investigate the number of comparisons made by Quickselect to find a key with a randomly selected rank under Yaroslavskiy's algorithm. This grand averaging is a smoothing operator over all individual distributions for specific fixed order statistics. We give the exact grand average. The grand distribution of the number of comparison (when suitably scaled) is given as the fixed-point solution of a distributional equation of a contraction in the Zolotarev metric space. Our investigation shows that Quickselect under older partitioning methods slightly outperforms Quickselect under Yaroslavskiy's algorithm, for an order statistic of a random rank. Similar results are obtained for extremal order statistics, where again we find the exact average, and the distribution for the number of comparisons (when suitably scaled). Both limiting distributions are of perpetuities (a sum of products of independent mixed continuous random variables). △ Less

Submitted 15 November, 2014; v1 submitted 17 June, 2013; originally announced June 2013.

Comments: full version with appendices; otherwise identical to Algorithmica version

MSC Class: 60C05 (Primary); 68P10 (Secondary); 68P20

arXiv:1304.0988 [pdf, other]

doi 10.1145/2629340

Average Case and Distributional Analysis of Dual-Pivot Quicksort

Authors: Sebastian Wild, Markus E. Nebel, Ralph Neininger

Abstract: In 2009, Oracle replaced the long-serving sorting algorithm in its Java 7 runtime library by a new dual-pivot Quicksort variant due to Vladimir Yaroslavskiy. The decision was based on the strikingly good performance of Yaroslavskiy's implementation in running time experiments. At that time, no precise investigations of the algorithm were available to explain its superior performance - on the contr… ▽ More In 2009, Oracle replaced the long-serving sorting algorithm in its Java 7 runtime library by a new dual-pivot Quicksort variant due to Vladimir Yaroslavskiy. The decision was based on the strikingly good performance of Yaroslavskiy's implementation in running time experiments. At that time, no precise investigations of the algorithm were available to explain its superior performance - on the contrary: Previous theoretical studies of other dual-pivot Quicksort variants even discouraged the use of two pivots. Only in 2012, two of the authors gave an average case analysis of a simplified version of Yaroslavskiy's algorithm, proving that savings in the number of comparisons are possible. However, Yaroslavskiy's algorithm needs more swaps, which renders the analysis inconclusive. To force the issue, we herein extend our analysis to the fully detailed style of Knuth: We determine the exact number of executed Java Bytecode instructions. Surprisingly, Yaroslavskiy's algorithm needs sightly more Bytecode instructions than a simple implementation of classic Quicksort - contradicting observed running times. Like in Oracle's library implementation we incorporate the use of Insertionsort on small subproblems and show that it indeed speeds up Yaroslavskiy's Quicksort in terms of Bytecodes; but even with optimal Insertionsort thresholds the new Quicksort variant needs slightly more Bytecode instructions on average. Finally, we show that the (suitably normalized) costs of Yaroslavskiy's algorithm converge to a random variable whose distribution is characterized by a fixed-point equation. From that, we compute variances of costs and show that for large n, costs are concentrated around their mean. △ Less

Submitted 13 February, 2015; v1 submitted 3 April, 2013; originally announced April 2013.

Comments: v3 is content-wise identical to TALG version

ACM Class: F.2.2; G.2.1; G.3; F.2.3; D.3.2

Journal ref: ACM Transactions on Algorithms 11, 3, Article 22 (Jan 2015)

Showing 1–50 of 51 results for author: Wild, S