Showing 1–50 of 51 results for author: Wild, S

Searching in archive cs. Search in all archives.
  1. arXiv:2403.03709  [pdf, other]

    cs.DC

    Portable, heterogeneous ensemble workflows at scale using libEnsemble

    Authors: Stephen Hudson, Jeffrey Larson, John-Luke Navarro, Stefan M. Wild

    Abstract: libEnsemble is a Python-based toolkit for running dynamic ensembles, developed as part of the DOE Exascale Computing Project. The toolkit utilizes a unique generator-simulator-allocator paradigm, where generators produce input for simulators, simulators evaluate those inputs, and allocators decide whether and when a simulator or generator should be called. The generator steers the ensemble based o…

    Submitted 7 March, 2024; v1 submitted 6 March, 2024; originally announced March 2024.

  2. arXiv:2403.00465  [pdf, other]

    cs.DS cs.SI math.OC

    Polyamorous Scheduling

    Authors: Leszek Gąsieniec, Benjamin Smith, Sebastian Wild

    Abstract: Finding schedules for pairwise meetings between the members of a complex social group without creating interpersonal conflict is challenging, especially when different relationships have different needs. We formally define and study the underlying optimisation problem: Polyamorous Scheduling. In Polyamorous Scheduling, we are given an edge-weighted graph and try to find a periodic schedule of ma…

    Submitted 26 March, 2024; v1 submitted 1 March, 2024; originally announced March 2024.

    Comments: v2: stronger and simplified hardness-of-approximation results, corrected constant in layering approximation algorithm

  3. arXiv:2402.17631  [pdf, other]

    cs.DS

    Deterministic Cache-Oblivious Funnelselect

    Authors: Gerth Stølting Brodal, Sebastian Wild

    Abstract: In the multiple-selection problem one is given an unsorted array $S$ of $N$ elements and an array of $q$ query ranks $r_1<\cdots<r_q$, and the task is to return, in sorted order, the $q$ elements in $S$ of rank $r_1, \ldots, r_q$, respectively. The asymptotic deterministic comparison complexity of the problem was settled by Dobkin and Munro [JACM 1981]. In the I/O model an optimal I/O complexity w…

    Submitted 27 February, 2024; originally announced February 2024.

  4. arXiv:2401.16623  [pdf, other]

    cs.DS cs.IT

    Towards Optimal Grammars for RNA Structures

    Authors: Evarista Onokpasa, Sebastian Wild, Prudence W. H. Wong

    Abstract: In past work (Onokpasa, Wild, Wong, DCC 2023), we showed that (a) for joint compression of RNA sequence and structure, stochastic context-free grammars are the best known compressors and (b) that grammars which have better compression ability also show better performance in ab initio structure prediction. Previous grammars were manually curated by human experts. In this work, we develop a framewor…

    Submitted 29 January, 2024; originally announced January 2024.

    Comments: to be presented at DCC 2024

  5. arXiv:2401.06512  [pdf, other]

    cs.CC cs.DS math.CO

    An Optimal Randomized Algorithm for Finding the Saddlepoint

    Authors: Justin Dallant, Frederik Haagensen, Riko Jacob, László Kozma, Sebastian Wild

    Abstract: A \emph{saddlepoint} of an $n \times n$ matrix is an entry that is the maximum of its row and the minimum of its column. Saddlepoints give the \emph{value} of a two-player zero-sum game, corresponding to its pure-strategy Nash equilibria; efficiently finding a saddlepoint is thus a natural and fundamental algorithmic task. For finding a \emph{strict saddlepoint} (an entry that is the strict maxi…

    Submitted 12 January, 2024; originally announced January 2024.

    Comments: 12 pages

    ACM Class: F.2.0

  6. arXiv:2310.16801  [pdf, other]

    cs.DS math.CO

    Finding the saddlepoint faster than sorting

    Authors: Justin Dallant, Frederik Haagensen, Riko Jacob, László Kozma, Sebastian Wild

    Abstract: A saddlepoint of an $n \times n$ matrix $A$ is an entry of $A$ that is a maximum in its row and a minimum in its column. Knuth (1968) gave several different algorithms for finding a saddlepoint. The worst-case running time of these algorithms is $Θ(n^2)$, and Llewellyn, Tovey, and Trick (1988) showed that this cannot be improved, as in the worst case all entries of $A$ may need to be queried. A st…

    Submitted 25 October, 2023; originally announced October 2023.

    Comments: To be presented at SOSA 2024
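
The saddlepoint definition quoted in the abstract above can be checked directly. The following is a minimal brute-force sketch in Python, assuming a rectangular list-of-lists matrix; the function name `find_saddlepoint` is hypothetical, and this quadratic scan is not the algorithm of the listed papers.

```python
from typing import List, Optional, Tuple


def find_saddlepoint(a: List[List[float]]) -> Optional[Tuple[int, int]]:
    """Return (row, col) of an entry that is a maximum in its row and a
    minimum in its column, or None if the matrix has no such entry.

    Brute-force illustration only: it reads every entry of the matrix.
    """
    if not a or not a[0]:
        return None
    # Precompute the maximum of every row and the minimum of every column.
    row_max = [max(row) for row in a]
    col_min = [min(col) for col in zip(*a)]
    # An entry is a saddlepoint exactly when it attains both values.
    for i, row in enumerate(a):
        for j, x in enumerate(row):
            if x == row_max[i] and x == col_min[j]:
                return (i, j)
    return None


if __name__ == "__main__":
    print(find_saddlepoint([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))
    print(find_saddlepoint([[1, 2], [2, 1]]))
```

The first call locates the entry 3 in the top-right corner, which is the largest value in its row and the smallest in its column; the second matrix has no saddlepoint.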

  7. arXiv:2304.07445  [pdf, other]

    cs.LG

    A framework for fully autonomous design of materials via multiobjective optimization and active learning: challenges and next steps

    Authors: Tyler H. Chang, Jakob R. Elias, Stefan M. Wild, Santanu Chaudhuri, Joseph A. Libera

    Abstract: In order to deploy machine learning in a real-world self-driving laboratory where data acquisition is costly and there are multiple competing design criteria, systems need to be able to intelligently sample while balancing performance trade-offs and constraints. For these reasons, we present an active learning process based on multiobjective black-box optimization with continuously updated machine…

    Submitted 14 April, 2023; originally announced April 2023.

  8. arXiv:2304.06881  [pdf, other]

    math.OC cs.MS

    Designing a Framework for Solving Multiobjective Simulation Optimization Problems

    Authors: Tyler H. Chang, Stefan M. Wild

    Abstract: Multiobjective simulation optimization (MOSO) problems are optimization problems with multiple conflicting objectives, where evaluation of at least one of the objectives depends on a black-box numerical code or real-world experiment, which we refer to as a simulation. This paper describes the design goals driving the development of the parallel MOSO library ParMOO. We derive these goals from the r…

    Submitted 6 July, 2023; v1 submitted 13 April, 2023; originally announced April 2023.

  9. arXiv:2302.11669  [pdf, other]

    q-bio.BM cs.IT

    RNA secondary structures: from ab initio prediction to better compression, and back

    Authors: Evarista Onokpasa, Sebastian Wild, Prudence W. H. Wong

    Abstract: In this paper, we use the biological domain knowledge incorporated into stochastic models for ab initio RNA secondary-structure prediction to improve the state of the art in joint compression of RNA sequence and structure data (Liu et al., BMC Bioinformatics, 2008). Moreover, we show that, conversely, compression ratio can serve as a cheap and robust proxy for comparing the prediction quality of d…

    Submitted 22 February, 2023; originally announced February 2023.

    Comments: paper at Data Compression Conference 2023

  10. arXiv:2302.02005  [pdf, other]

    astro-ph.GA cs.AI cs.CV

    DeepAstroUDA: Semi-Supervised Universal Domain Adaptation for Cross-Survey Galaxy Morphology Classification and Anomaly Detection

    Authors: A. Ćiprijanović, A. Lewis, K. Pedro, S. Madireddy, B. Nord, G. N. Perdue, S. M. Wild

    Abstract: Artificial intelligence methods show great promise in increasing the quality and speed of work with large astronomical datasets, but the high complexity of these methods leads to the extraction of dataset-specific, non-robust features. Therefore, such methods do not generalize well across multiple datasets. We present a universal domain adaptation method, \textit{DeepAstroUDA}, as an approach to o…

    Submitted 22 March, 2023; v1 submitted 3 February, 2023; originally announced February 2023.

    Comments: Accepted in Machine Learning Science and Technology (MLST); 24 pages, 14 figures

    Report number: FERMILAB-PUB-23-034-CSAID

  11. Numerical evidence against advantage with quantum fidelity kernels on classical data

    Authors: Lucas Slattery, Ruslan Shaydulin, Shouvanik Chakrabarti, Marco Pistoia, Sami Khairy, Stefan M. Wild

    Abstract: Quantum machine learning techniques are commonly considered one of the most promising candidates for demonstrating practical quantum advantage. In particular, quantum kernel methods have been demonstrated to be able to learn certain classically intractable functions efficiently if the kernel is well-aligned with the target function. In the more general case, quantum kernels are known to suffer fro…

    Submitted 29 November, 2022; originally announced November 2022.

    Journal ref: Phys. Rev. A 107, 062417 (2023)

  12. arXiv:2211.00677  [pdf, other]

    astro-ph.GA cs.AI cs.CV cs.LG

    Semi-Supervised Domain Adaptation for Cross-Survey Galaxy Morphology Classification and Anomaly Detection

    Authors: Aleksandra Ćiprijanović, Ashia Lewis, Kevin Pedro, Sandeep Madireddy, Brian Nord, Gabriel N. Perdue, Stefan M. Wild

    Abstract: In the era of big astronomical surveys, our ability to leverage artificial intelligence algorithms simultaneously for multiple datasets will open new avenues for scientific discovery. Unfortunately, simply training a deep neural network on images from one data domain often leads to very poor performance on any other dataset. Here we develop a Universal Domain Adaptation method DeepAstroUDA, capabl…

    Submitted 11 November, 2022; v1 submitted 1 November, 2022; originally announced November 2022.

    Comments: 3 figures, 1 table; accepted to Machine Learning and the Physical Sciences - Workshop at the 36th conference on Neural Information Processing Systems (NeurIPS)

    Report number: FERMILAB-CONF-22-791-SCD

  13. Multiway Powersort

    Authors: William Cawley Gelling, Markus E. Nebel, Benjamin Smith, Sebastian Wild

    Abstract: We present a stable mergesort variant, Multiway Powersort, that exploits existing runs and finds nearly-optimal merging orders for k-way merges with negligible overhead. This builds on Powersort (Munro & Wild, ESA2018), which has recently replaced Timsort's suboptimal merge policy in the CPython reference implementation of Python, as well as in PyPy and further libraries. Multiway Powersort reduce…

    Submitted 16 January, 2023; v1 submitted 14 September, 2022; originally announced September 2022.

    Comments: 17 pages; accompanying source code at https://github.com/sebawild/powersort; v2 adds new figure and text changes. v2 is identical to the ALENEX 2023 version

    Journal ref: ALENEX 2023

  14. arXiv:2206.06686  [pdf, other]

    quant-ph cs.LG

    Bandwidth Enables Generalization in Quantum Kernel Models

    Authors: Abdulkadir Canatar, Evan Peters, Cengiz Pehlevan, Stefan M. Wild, Ruslan Shaydulin

    Abstract: Quantum computers are known to provide speedups over classical state-of-the-art machine learning methods in some specialized settings. For example, quantum kernel methods have been shown to provide an exponential speedup on a learning version of the discrete logarithm problem. Understanding the generalization of quantum models is essential to realizing similar speedups on problems of practical int…

    Submitted 18 June, 2023; v1 submitted 14 June, 2022; originally announced June 2022.

    Comments: Accepted version

  15. arXiv:2205.01066  [pdf, other]

    cs.CY cs.AI

    Quantifying Health Inequalities Induced by Data and AI Models

    Authors: Honghan Wu, Minhong Wang, Aneeta Sylolypavan, Sarah Wild

    Abstract: AI technologies are being increasingly tested and applied in critical environments including healthcare. Without an effective way to detect and mitigate AI induced inequalities, AI might do more harm than good, potentially leading to the widening of underlying inequalities. This paper proposes a generic allocation-deterioration framework for detecting and quantifying AI induced inequality. Specifi…

    Submitted 3 May, 2022; v1 submitted 24 April, 2022; originally announced May 2022.

    Comments: Accepted by IJCAI-ECAI 2022 AI for Good track

  16. arXiv:2112.14299  [pdf, other]

    cs.LG astro-ph.GA cs.AI cs.CV

    DeepAdversaries: Examining the Robustness of Deep Learning Models for Galaxy Morphology Classification

    Authors: Aleksandra Ćiprijanović, Diana Kafkes, Gregory Snyder, F. Javier Sánchez, Gabriel Nathan Perdue, Kevin Pedro, Brian Nord, Sandeep Madireddy, Stefan M. Wild

    Abstract: With increased adoption of supervised deep learning methods for processing and analysis of cosmological survey data, the assessment of data perturbation effects (that can naturally occur in the data processing and analysis pipelines) and the development of methods that increase model robustness are increasingly important. In the context of morphological classification of galaxies, we study the eff…

    Submitted 6 July, 2022; v1 submitted 28 December, 2021; originally announced December 2021.

    Comments: 20 pages, 6 figures, 5 tables; accepted in MLST

    Report number: FERMILAB-PUB-21-767-SCD

  17. Importance of Kernel Bandwidth in Quantum Machine Learning

    Authors: Ruslan Shaydulin, Stefan M. Wild

    Abstract: Quantum kernel methods are considered a promising avenue for applying quantum computers to machine learning problems. Identifying hyperparameters controlling the inductive bias of quantum machine learning models is expected to be crucial given the central role hyperparameters play in determining the performance of classical machine learning methods. In this work we introduce the hyperparameter con…

    Submitted 28 September, 2022; v1 submitted 9 November, 2021; originally announced November 2021.

    Comments: Camera-ready version

    Journal ref: Phys. Rev. A 106, 042407 (2022)

  18. arXiv:2111.03639  [pdf, other]

    cs.DS cs.CC cs.DM

    Randomized Communication and Implicit Graph Representations

    Authors: Nathaniel Harms, Sebastian Wild, Viktor Zamaraev

    Abstract: We study constant-cost randomized communication problems and relate them to implicit graph representations in structural graph theory. Specifically, constant-cost communication problems correspond to hereditary graph families that admit constant-size adjacency sketches, or equivalently constant-size probabilistic universal graphs (PUGs), and these graph families are a subset of families that admit…

    Submitted 18 July, 2023; v1 submitted 5 November, 2021; originally announced November 2021.

    Comments: 72 pages, 10 figures. Abstract shortened for arXiv

  19. arXiv:2111.01784  [pdf, other]

    cs.DS

    Towards the 5/6-Density Conjecture of Pinwheel Scheduling

    Authors: Leszek Gąsieniec, Benjamin Smith, Sebastian Wild

    Abstract: Pinwheel Scheduling aims to find a perpetual schedule for unit-length tasks on a single machine subject to given maximal time spans (a.k.a. frequencies) between any two consecutive executions of the same task. The density of a Pinwheel Scheduling instance is the sum of the inverses of these task frequencies; the 5/6-Conjecture (Chan and Chin, 1993) states that any Pinwheel Scheduling instance with…

    Submitted 2 November, 2021; originally announced November 2021.

    Comments: Accepted at ALENEX 2022
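
The density defined in the abstract above (the sum of the inverses of the task frequencies) is easy to compute exactly. A minimal sketch in Python, assuming frequencies are given as positive integers; the function name `pinwheel_density` is hypothetical and the code only evaluates the definition, it does not decide schedulability.

```python
from fractions import Fraction
from typing import Iterable


def pinwheel_density(frequencies: Iterable[int]) -> Fraction:
    """Return the density of a Pinwheel Scheduling instance, i.e. the sum
    of 1/f over all task frequencies f, as an exact rational number."""
    total = Fraction(0)
    for f in frequencies:
        if f <= 0:
            raise ValueError("frequencies must be positive integers")
        total += Fraction(1, f)
    return total


if __name__ == "__main__":
    # Exact arithmetic avoids rounding issues when comparing to 5/6.
    d = pinwheel_density([2, 4, 8])
    print(d, d <= Fraction(5, 6))
```

Using `Fraction` rather than floating point keeps comparisons against a threshold such as 5/6 exact.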

  20. arXiv:2111.00961  [pdf, other]

    astro-ph.GA cs.CV cs.LG

    Robustness of deep learning algorithms in astronomy -- galaxy morphology studies

    Authors: A. Ćiprijanović, D. Kafkes, G. N. Perdue, K. Pedro, G. Snyder, F. J. Sánchez, S. Madireddy, S. M. Wild, B. Nord

    Abstract: Deep learning models are being increasingly adopted in a wide array of scientific domains, especially to handle high-dimensionality and volume of the scientific data. However, these models tend to be brittle due to their complexity and overparametrization, especially to the inadvertent adversarial perturbations that can appear due to common image processing such as compression or blurring that are o…

    Submitted 2 November, 2021; v1 submitted 1 November, 2021; originally announced November 2021.

    Comments: Accepted in: Fourth Workshop on Machine Learning and the Physical Sciences (35th Conference on Neural Information Processing Systems; NeurIPS2021); final version

    Report number: FERMILAB-CONF-21-561-SCD

  21. arXiv:2109.12213  [pdf, other]

    math.OC cs.AI stat.ML

    Adaptive Sampling Quasi-Newton Methods for Zeroth-Order Stochastic Optimization

    Authors: Raghu Bollapragada, Stefan M. Wild

    Abstract: We consider unconstrained stochastic optimization problems with no available gradient information. Such problems arise in settings from derivative-free simulation optimization to reinforcement learning. We propose an adaptive sampling quasi-Newton method where we estimate the gradients of a stochastic function using finite differences within a common random number framework. We develop modified ve…

    Submitted 24 September, 2021; originally announced September 2021.

  22. arXiv:2105.04965  [pdf, other]

    cs.DS

    Succinct Euler-Tour Trees

    Authors: Travis Gagie, Sebastian Wild

    Abstract: We show how a collection of Euler-tour trees for a forest on $n$ vertices can be stored in $2 n + o (n)$ bits such that simple queries take constant time, more complex queries take logarithmic time and updates take polylogarithmic amortized time.

    Submitted 29 June, 2021; v1 submitted 11 May, 2021; originally announced May 2021.

  23. Hypersuccinct Trees -- New universal tree source codes for optimal compressed tree data structures and range minima

    Authors: J. Ian Munro, Patrick K. Nicholson, Louisa Seelbach Benkner, Sebastian Wild

    Abstract: We present a new universal source code for distributions of unlabeled binary and ordinal trees that achieves optimal compression to within lower order terms for all tree sources covered by existing universal codes. At the same time, it supports answering many navigational queries on the compressed representation in constant time on the word-RAM; this is not known to be possible for any existing tr…

    Submitted 3 September, 2021; v1 submitted 27 April, 2021; originally announced April 2021.

    Comments: part of ESA 2021

  24. arXiv:2104.11079  [pdf, other]

    cs.AI cs.CE

    Randomized Algorithms for Scientific Computing (RASC)

    Authors: Aydin Buluc, Tamara G. Kolda, Stefan M. Wild, Mihai Anitescu, Anthony DeGennaro, John Jakeman, Chandrika Kamath, Ramakrishnan Kannan, Miles E. Lopes, Per-Gunnar Martinsson, Kary Myers, Jelani Nelson, Juan M. Restrepo, C. Seshadhri, Draguna Vrabie, Brendt Wohlberg, Stephen J. Wright, Chao Yang, Peter Zwart

    Abstract: Randomized algorithms have propelled advances in artificial intelligence and represent a foundational research area in advancing AI for Science. Future advancements in DOE Office of Science priority areas such as climate science, astrophysics, fusion, advanced materials, combustion, and quantum computing all require randomized algorithms for surmounting challenges of complexity, robustness, and sc…

    Submitted 21 March, 2022; v1 submitted 19 April, 2021; originally announced April 2021.

  25. libEnsemble: A Library to Coordinate the Concurrent Evaluation of Dynamic Ensembles of Calculations

    Authors: Stephen Hudson, Jeffrey Larson, John-Luke Navarro, Stefan M. Wild

    Abstract: Almost all applications stop scaling at some point; those that don't are seldom performant when considering time to solution on anything but aspirational/unicorn resources. Recognizing these tradeoffs as well as greater user functionality in a near-term exascale computing era, we present libEnsemble, a library aimed at particular scalability- and capability-stretching uses. libEnsemble enables run…

    Submitted 16 April, 2021; originally announced April 2021.

  26. arXiv:2103.16041  [pdf, other]

    stat.ME cs.LG

    Scalable Statistical Inference of Photometric Redshift via Data Subsampling

    Authors: Arindam Fadikar, Stefan M. Wild, Jonas Chaves-Montero

    Abstract: Handling big data has largely been a major bottleneck in traditional statistical models. Consequently, when accurate point prediction is the primary target, machine learning models are often preferred over their statistical counterparts for bigger problems. But full probabilistic statistical models often outperform other models in quantifying uncertainties associated with model predictions. We dev…

    Submitted 1 April, 2021; v1 submitted 29 March, 2021; originally announced March 2021.

  27. arXiv:2010.08840  [pdf, ps, other]

    cs.DS

    Lazy Search Trees

    Authors: Bryce Sandlund, Sebastian Wild

    Abstract: We introduce the lazy search tree data structure. The lazy search tree is a comparison-based data structure on the pointer machine that supports order-based operations such as rank, select, membership, predecessor, successor, minimum, and maximum while providing dynamic operations insert, delete, change-key, split, and merge. We analyze the performance of our data structure based on a partition of…

    Submitted 17 October, 2020; originally announced October 2020.

    Comments: Accepted for publication in FOCS 2020

  28. Succinct Permutation Graphs

    Authors: Konstantinos Tsakalidis, Sebastian Wild, Viktor Zamaraev

    Abstract: We present a succinct data structure for permutation graphs, and their superclass of circular permutation graphs, i.e., data structures using optimal space up to lower order terms. Unlike concurrent work on circle graphs (Acan et al. 2022), our data structure also supports distance and shortest-path queries, as well as adjacency and neighborhood queries, all in optimal time. We present in particul…

    Submitted 24 September, 2022; v1 submitted 8 October, 2020; originally announced October 2020.

    Comments: updated to match final Algorithmica version

  29. Distance Oracles for Interval Graphs via Breadth-First Rank/Select in Succinct Trees

    Authors: Meng He, J. Ian Munro, Yakov Nekrich, Sebastian Wild, Kaiyu Wu

    Abstract: We present the first succinct distance oracles for (unweighted) interval graphs and related classes of graphs, using a novel succinct data structure for ordinal trees that supports the mapping between preorder (i.e., depth-first) ranks and level-order (breadth-first) ranks of nodes in constant time. Our distance oracles for interval graphs also support navigation queries -- testing adjacency, comp…

    Submitted 30 September, 2020; v1 submitted 15 May, 2020; originally announced May 2020.

    Comments: to appear in ISAAC 2020

  30. Scalable Reinforcement-Learning-Based Neural Architecture Search for Cancer Deep Learning Research

    Authors: Prasanna Balaprakash, Romain Egele, Misha Salim, Stefan Wild, Venkatram Vishwanath, Fangfang Xia, Tom Brettin, Rick Stevens

    Abstract: Cancer is a complex disease, the understanding and treatment of which are being aided through increases in the volume of collected data and in the scale of deployed computing power. Consequently, there is a growing need for the development of data-driven and, in particular, deep learning methods for various tasks such as cancer diagnosis, detection, prognosis, and prediction. Despite recent succes…

    Submitted 31 August, 2019; originally announced September 2019.

    Comments: SC '19: IEEE/ACM International Conference on High Performance Computing, Networking, Storage and Analysis, November 17--22, 2019, Denver, CO

  31. arXiv:1908.00563  [pdf, ps, other]

    cs.DS

    Dynamic Optimality Refuted -- For Tournament Heaps

    Authors: J. Ian Munro, Richard Peng, Sebastian Wild, Lingyi Zhang

    Abstract: We prove a separation between offline and online algorithms for finger-based tournament heaps undergoing key modifications. These heaps are implemented by binary trees with keys stored on leaves, and intermediate nodes tracking the min of their respective subtrees. They represent a natural starting point for studying self-adjusting heaps due to the need to access the root-to-leaf path upon modific…

    Submitted 1 August, 2019; originally announced August 2019.

  32. arXiv:1907.11572  [pdf, other]

    stat.ML cs.LG stat.CO

    Sequential Learning of Active Subspaces

    Authors: Nathan Wycoff, Mickael Binois, Stefan M. Wild

    Abstract: In recent years, active subspace methods (ASMs) have become a popular means of performing subspace sensitivity analysis on black-box functions. Naively applied, however, ASMs require gradient evaluations of the target function. In the event of noisy, expensive, or stochastic simulators, evaluating gradients via finite differencing may be infeasible. In such cases, often a surrogate model is employ…

    Submitted 20 September, 2020; v1 submitted 26 July, 2019; originally announced July 2019.

  33. arXiv:1905.02149  [pdf, other]

    cs.DS cs.LG

    Efficient Second-Order Shape-Constrained Function Fitting

    Authors: David Durfee, Yu Gao, Anup B. Rao, Sebastian Wild

    Abstract: We give an algorithm to compute a one-dimensional shape-constrained function that best fits given data in weighted-$L_{\infty}$ norm. We give a single algorithm that works for a variety of commonly studied shape constraints including monotonicity, Lipschitz-continuity and convexity, and more generally, any shape constraint expressible by bounds on first- and/or second-order differences. Our algori…

    Submitted 28 May, 2019; v1 submitted 6 May, 2019; originally announced May 2019.

    Comments: accepted for WADS 2019; (v2 fixes various typos)

  34. arXiv:1903.02533  [pdf, other]

    cs.DS

    Entropy Trees and Range-Minimum Queries In Optimal Average-Case Space

    Authors: J. Ian Munro, Sebastian Wild

    Abstract: The range-minimum query (RMQ) problem is a fundamental data structuring task with numerous applications. Despite the fact that succinct solutions with worst-case optimal $2n+o(n)$ bits of space and constant query time are known, it has been unknown whether such a data structure can be made adaptive to the reduced entropy of random inputs (Davoodi et al. 2014). We construct a succinct data structur…

    Submitted 6 March, 2019; originally announced March 2019.

  35. arXiv:1811.01259  [pdf, other]

    cs.DS

    QuickXsort - A Fast Sorting Scheme in Theory and Practice

    Authors: Stefan Edelkamp, Armin Weiß, Sebastian Wild

    Abstract: QuickXsort is a highly efficient in-place sequential sorting scheme that mixes Hoare's Quicksort algorithm with X, where X can be chosen from a wider range of other known sorting algorithms, like Heapsort, Insertionsort and Mergesort. Its major advantage is that QuickXsort can be in-place even if X is not. In this work we provide general transfer theorems expressing the number of comparisons of Qu…

    Submitted 3 November, 2018; originally announced November 2018.

  36. Sesquickselect: One and a half pivots for cache-efficient selection

    Authors: Conrado Martínez, Markus Nebel, Sebastian Wild

    Abstract: Because of unmatched improvements in CPU performance, memory transfers have become a bottleneck of program execution. As discovered in recent years, this also affects sorting in internal memory. Since partitioning around several pivots reduces overall memory transfers, we have seen renewed interest in multiway Quicksort. Here, we analyze in how far multiway partitioning helps in Quickselect. We…

    Submitted 29 October, 2018; originally announced October 2018.

    Comments: appears in ANALCO 2019

  37. Nearly-Optimal Mergesorts: Fast, Practical Sorting Methods That Optimally Adapt to Existing Runs

    Authors: J. Ian Munro, Sebastian Wild

    Abstract: We present two stable mergesort variants, "peeksort" and "powersort", that exploit existing runs and find nearly-optimal merging orders with practically negligible overhead. Previous methods either require substantial effort for determining the merging order (Takaoka 2009; Barbay & Navarro 2013) or do not have a constant-factor optimal worst-case guarantee (Peters 2001; Auger, Nicaud & Pivoteau 20…

    Submitted 10 May, 2018; originally announced May 2018.

  38. Average Cost of QuickXsort with Pivot Sampling

    Authors: Sebastian Wild

    Abstract: QuickXsort is a strategy to combine Quicksort with another sorting method X, so that the result has essentially the same comparison cost as X in isolation, but sorts in place even when X requires a linear-size buffer. We solve the recurrence for QuickXsort precisely up to the linear term including the optimization to choose pivots from a sample of k elements. This allows to immediately obtain over…

    Submitted 17 May, 2018; v1 submitted 15 March, 2018; originally announced March 2018.

    Comments: updated to final version accepted for AofA 2018

  39. arXiv:1610.02606  [pdf, other]

    cs.OH

    Doing Moore with Less -- Leapfrogging Moore's Law with Inexactness for Supercomputing

    Authors: Sven Leyffer, Stefan M. Wild, Mike Fagan, Marc Snir, Krishna Palem, Kazutomo Yoshii, Hal Finkel

    Abstract: Energy and power consumption are major limitations to continued scaling of computing systems. Inexactness, where the quality of the solution can be traded for energy savings, has been proposed as an approach to overcoming those limitations. In the past, however, inexactness necessitated the need for highly customized or specialized hardware. The current evolution of commercial off-the-shelf (COTS)…

    Submitted 12 October, 2016; v1 submitted 8 October, 2016; originally announced October 2016.

    Comments: 9 pages, 12 figures, PDFLaTeX. 12 Oct 2016: Corrected author Hal Finkel's affiliation to show ALCF/Argonne

    ACM Class: F.2.1; G.1.5

  40. Median-of-k Jumplists and Dangling-Min BSTs

    Authors: Markus E. Nebel, Elisabeth Neumann, Sebastian Wild

    Abstract: We extend randomized jumplists introduced by Brönnimann et al. (STACS 2003) to choose jump-pointer targets as median of a small sample for better search costs, and present randomized algorithms with expected $O(\log n)$ time complexity that maintain the probability distribution of jump pointers upon insertions and deletions. We analyze the expected costs to search, insert and delete a random eleme…

    Submitted 30 October, 2018; v1 submitted 27 September, 2016; originally announced September 2016.

    Comments: appears in ANALCO 2019

    ACM Class: E.1

  41. Quicksort Is Optimal For Many Equal Keys

    Authors: Sebastian Wild

    Abstract: I prove that the average number of comparisons for median-of-$k$ Quicksort (with fat-pivot a.k.a. three-way partitioning) is asymptotically only a constant $α_k$ times worse than the lower bound for sorting random multisets with $Ω(n^\varepsilon)$ duplicates of each value (for any $\varepsilon>0$). The constant is $α_k = \ln(2) / \bigl(H_{k+1}-H_{(k+1)/2} \bigr)$, which converges to 1 as…

    Submitted 1 November, 2017; v1 submitted 17 August, 2016; originally announced August 2016.

    Comments: v4 is a major reorganization of sections; a shortened version appears in the proceedings of ANALCO 2018
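
The constant $α_k = \ln(2) / (H_{k+1}-H_{(k+1)/2})$ from the abstract above can be evaluated numerically. A minimal sketch in Python, assuming that $H_n$ denotes the $n$-th harmonic number and that $k$ is odd so that $(k+1)/2$ is an integer; the function names `harmonic` and `alpha` are hypothetical.

```python
import math
from fractions import Fraction


def harmonic(n: int) -> Fraction:
    """Return the n-th harmonic number H_n = 1 + 1/2 + ... + 1/n exactly."""
    return sum((Fraction(1, i) for i in range(1, n + 1)), Fraction(0))


def alpha(k: int) -> float:
    """Evaluate ln(2) / (H_{k+1} - H_{(k+1)/2}) for an odd sample size k.

    Restricting k to odd values is an assumption made here so that the
    index (k+1)/2 is an integer.
    """
    if k < 1 or k % 2 == 0:
        raise ValueError("k must be a positive odd integer")
    gap = harmonic(k + 1) - harmonic((k + 1) // 2)
    return math.log(2) / float(gap)


if __name__ == "__main__":
    for k in (1, 3, 5, 101):
        print(k, round(alpha(k), 4))
```

For $k = 1$ the expression reduces to $2\ln 2$, and the values decrease toward 1 as $k$ grows, matching the convergence stated in the abstract.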

  42. arXiv:1511.01138  [pdf, ps, other]

    cs.DS

    Why Is Dual-Pivot Quicksort Fast?

    Authors: Sebastian Wild

    Abstract: I discuss the new dual-pivot Quicksort that is nowadays used to sort arrays of primitive types in Java. I sketch theoretical analyses of this algorithm that offer a possible, and in my opinion plausible, explanation why (a) dual-pivot Quicksort is faster than the previously used (classic) Quicksort and (b) why this improvement was not already found much earlier.

    Submitted 28 September, 2016; v1 submitted 3 November, 2015; originally announced November 2015.

    Comments: extended abstract for Theorietage 2015 (https://www.uni-trier.de/index.php?id=55089) (v2 fixes a small bug in the pseudocode)

  43. arXiv:1504.06475  [pdf, other]

    cs.DS

    A Practical and Worst-Case Efficient Algorithm for Divisor Methods of Apportionment

    Authors: Raphael Reitzig, Sebastian Wild

    Abstract: Proportional apportionment is the problem of assigning seats to parties according to their relative share of votes. Divisor methods are the de-facto standard solution, used in many countries. In recent literature, there are two algorithms that implement divisor methods: one by Cheng and Eppstein (ISAAC, 2014) has worst-case optimal running time but is complex, while the other (Pukelsheim, 2014)…

    Submitted 5 December, 2017; v1 submitted 24 April, 2015; originally announced April 2015.

    Comments: (v4 adds missing figures in v3)

  44. Efficient Algorithms for Envy-Free Stick Division With Fewest Cuts

    Authors: Raphael Reitzig, Sebastian Wild

    Abstract: Given a set of n sticks of various (not necessarily different) lengths, what is the largest length so that we can cut k equally long pieces of this length from the given set of sticks? We analyze the structure of this problem and show that it essentially reduces to a single call of a selection algorithm; we thus obtain an optimal linear-time algorithm. This algorithm also solves the related envy…

    Submitted 3 November, 2017; v1 submitted 13 February, 2015; originally announced February 2015.

    Comments: v3 adds more context about the problem

    Journal ref: Reitzig, R. & Wild, S. Algorithmica (2017). https://doi.org/10.1007/s00453-017-0392-3

  45. Analysis of Pivot Sampling in Dual-Pivot Quicksort

    Authors: Sebastian Wild, Markus E. Nebel, Conrado Martínez

    Abstract: The new dual-pivot Quicksort by Vladimir Yaroslavskiy - used in Oracle's Java runtime library since version 7 - features intriguing asymmetries. They make a basic variant of this algorithm use less comparisons than classic single-pivot Quicksort. In this paper, we extend the analysis to the case where the two pivots are chosen as fixed order statistics of a random sample. Surprisingly, dual-pivot…

    Submitted 10 August, 2015; v1 submitted 30 November, 2014; originally announced December 2014.

    Comments: This article is identical (up to typographical details) to the Algorithmica version available from Springerlink (see DOI). It is an extended and improved version of our corresponding article at the AofA 2014 conference [arXiv:1403.6602]

  46. Analysis of Branch Misses in Quicksort

    Authors: Conrado Martínez, Markus E. Nebel, Sebastian Wild

    Abstract: The analysis of algorithms mostly relies on counting classic elementary operations like additions, multiplications, comparisons, swaps etc. This approach is often sufficient to quantify an algorithm's efficiency. In some cases, however, features of modern processor architectures like pipelined execution and memory hierarchies have significant impact on running time and need to be taken into accoun…

    Submitted 7 November, 2014; originally announced November 2014.

    Comments: to be presented at ANALCO 2015

  47. arXiv:1403.6602  [pdf, other]

    cs.DS math.PR

    Pivot Sampling in Dual-Pivot Quicksort

    Authors: Markus E. Nebel, Sebastian Wild

    Abstract: The new dual-pivot Quicksort by Vladimir Yaroslavskiy - used in Oracle's Java runtime library since version 7 - features intriguing asymmetries in its behavior. They were shown to cause a basic variant of this algorithm to use less comparisons than classic single-pivot Quicksort implementations. In this paper, we extend the analysis to the case where the two pivots are chosen as fixed order statis…

    Submitted 13 June, 2014; v1 submitted 26 March, 2014; originally announced March 2014.

    Comments: presented at AofA 2014 (http://www.aofa14.upmc.fr/)

  48. Average Case Analysis of Java 7's Dual Pivot Quicksort

    Authors: Sebastian Wild, Markus E. Nebel

    Abstract: Recently, a new Quicksort variant due to Yaroslavskiy was chosen as standard sorting method for Oracle's Java 7 runtime library. The decision for the change was based on empirical studies showing that on average, the new algorithm is faster than the formerly used classic Quicksort. Surprisingly, the improvement was achieved by using a dual pivot approach, an idea that was considered not promising…

    Submitted 28 October, 2013; originally announced October 2013.

    Comments: Best paper award at ESA 2012, recorded talk: http://www.slideshare.net/sebawild/average-case-analysis-of-java-7s-dual-pivot-quicksort

    Journal ref: In L. Epstein & P. Ferragina (Eds.), ESA 2012 (LNCS 7501, pp. 825-836). Springer Berlin/Heidelberg

  49. Analysis of Quickselect under Yaroslavskiy's Dual-Pivoting Algorithm

    Authors: Sebastian Wild, Markus E. Nebel, Hosam Mahmoud

    Abstract: There is excitement within the algorithms community about a new partitioning method introduced by Yaroslavskiy. This algorithm renders Quicksort slightly faster than the case when it runs under classic partitioning methods. We show that this improved performance in Quicksort is not sustained in Quickselect; a variant of Quicksort for finding order statistics. We investigate the number of compariso…

    Submitted 15 November, 2014; v1 submitted 17 June, 2013; originally announced June 2013.

    Comments: full version with appendices; otherwise identical to Algorithmica version

    MSC Class: 60C05 (Primary); 68P10 (Secondary); 68P20

  50. arXiv:1304.0988  [pdf, other]

    cs.DS math.PR

    Average Case and Distributional Analysis of Dual-Pivot Quicksort

    Authors: Sebastian Wild, Markus E. Nebel, Ralph Neininger

    Abstract: In 2009, Oracle replaced the long-serving sorting algorithm in its Java 7 runtime library by a new dual-pivot Quicksort variant due to Vladimir Yaroslavskiy. The decision was based on the strikingly good performance of Yaroslavskiy's implementation in running time experiments. At that time, no precise investigations of the algorithm were available to explain its superior performance - on the contr…

    Submitted 13 February, 2015; v1 submitted 3 April, 2013; originally announced April 2013.

    Comments: v3 is content-wise identical to TALG version

    ACM Class: F.2.2; G.2.1; G.3; F.2.3; D.3.2

    Journal ref: ACM Transactions on Algorithms 11, 3, Article 22 (Jan 2015)