Search | arXiv e-print repository

Fun Maximizing Search, (Non) Instance Optimality, and Video Games for Parrots

Abstract: Computerized Adaptive Testing (CAT) measures an examinee's ability while adapting to their level. Both too many questions and too many hard questions can make a test frustrating. Are there some CAT algorithms which can be proven to be theoretically better than others, and in which framework? We show that slightly extending the traditional framework yields a partial order on CAT algorithms. For uni-dimensional knowledge domains, we analyze the theoretical performance of some old and new algorithms, and we prove that none of the algorithms presented are instance optimal, conjecturing that no instance optimal algorithm can exist for the CAT problem.

Submitted 11 March, 2024; originally announced March 2024.

arXiv:2206.00734 [pdf, other]

Measuring Discrimination Abilities of Monk Parakeets Between Discreet and Continuous Quantities Through a Digital Life Enrichment Application

Authors: Jérémy Barbay, Fabián Jaña, Cristóbal Sepulveda Álvarez

Abstract: Ain et al. measured three African Grey (Psittacus erithacus) parrots' discrimination abilities between discrete and continuous quantities. Some features of their experimental protocol make it difficult to apply to other subjects and/or species without introducing a risk of bias, as subjects could read cues from the experimenter (even though the study's subjects probably did not). Can digital life enrichment techniques permit us to replicate their results with other species with less risk of experimental bias, with better precision, and at lower cost? Inspired by previous informal digital life enrichment experiments with parrots, we designed and tested a web application to digitally replicate and extend Ain et al.'s experimental setup. We were able to obtain results similar to theirs for two individuals from a distinct species, Monk Parakeets (Myiopsitta monachus), with increased guarantees against potential experimental biases, in a way which should allow such experiments to be replicated at a larger scale and at a much lower cost.

Submitted 1 June, 2022; originally announced June 2022.

Comments: Long preliminary version

arXiv:2003.10000 [pdf, other]

The Computational Complexity of Evil Hangman

Authors: Jérémy Barbay, Bernardo Subercaseaux

Abstract: The game of Hangman is a classical asymmetric two player game in which one player, the setter, chooses a secret word from a language, that the other player, the guesser, tries to discover through single letter matching queries, answered by all occurrences of this letter if any. In the Evil Hangman variant, the setter can change the secret word during the game, as long as the new choice is consistent with the information already given to the guesser. We show that a greedy strategy for Evil Hangman can perform arbitrarily far from optimal, and most importantly, that playing optimally as an Evil Hangman setter is computationally difficult. The latter result holds even assuming perfect knowledge of the language, for several classes of languages, ranging from Finite to Turing Computable. The proofs are based on reductions to Dominating Set on 3-regular graphs and to the Membership problem, combinatorial problems already known to be computationally hard.

Submitted 1 May, 2020; v1 submitted 22 March, 2020; originally announced March 2020.

Comments: 13 pages, 4 figures, Accepted at FUN2020
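
The greedy setter strategy mentioned in the abstract above can be illustrated with a minimal sketch. It assumes the usual reading of "greedy": after each guess, the setter keeps the largest family of words that answer the guess identically. The function name and the tie-breaking rule are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def greedy_setter_step(candidates, letter):
    """Answer one guess greedily: keep the largest consistent family of words.

    Each candidate word is grouped by the positions at which `letter` occurs.
    The setter then commits to the pattern shared by the most words.
    """
    families = defaultdict(list)
    for word in candidates:
        pattern = tuple(i for i, c in enumerate(word) if c == letter)
        families[pattern].append(word)
    # Assumed tie-break: among families of equal size, reveal fewer positions.
    pattern, family = max(
        families.items(), key=lambda kv: (len(kv[1]), -len(kv[0]))
    )
    return pattern, family


if __name__ == "__main__":
    print(greedy_setter_step(["cat", "car", "cot", "dog"], "a"))
```

The abstract's point is that this locally best choice can be arbitrarily far from the optimal play; the sketch only shows the greedy rule itself.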

arXiv:1909.01183 [pdf, other]

doi 10.1051/0004-6361/201935574

The Solar Orbiter SPICE instrument -- An extreme UV imaging spectrometer

Authors: The SPICE Consortium: M. Anderson, T. Appourchaux, F. Auchère, R. Aznar Cuadrado, J. Barbay, F. Baudin, S. Beardsley, K. Bocchialini, B. Borgo, D. Bruzzi, E. Buchlin, G. Burton, V. Blüchel, M. Caldwell, S. Caminade, M. Carlsson, W. Curdt, J. Davenne, J. Davila, C. E. DeForest, G. Del Zanna, D. Drummond, J. Dubau, et al. (66 additional authors not shown)

Abstract: The Spectral Imaging of the Coronal Environment (SPICE) instrument is a high-resolution imaging spectrometer operating at extreme ultraviolet (EUV) wavelengths. In this paper, we present the concept, design, and pre-launch performance of this facility instrument on the ESA/NASA Solar Orbiter mission. The goal of this paper is to give prospective users a better understanding of the possible types of observations, the data acquisition, and the sources that contribute to the instrument's signal. The paper discusses the science objectives, with a focus on the SPICE-specific aspects, before presenting the instrument's design, including optical, mechanical, thermal, and electronics aspects. This is followed by a characterisation and calibration of the instrument's performance. The paper concludes with descriptions of the operations concept and data processing. The performance measurements of the various instrument parameters meet the requirements derived from the mission's science objectives. The SPICE instrument is ready to perform measurements that will provide vital contributions to the scientific success of the Solar Orbiter mission.

Submitted 3 September, 2019; originally announced September 2019.

Comments: A&A, accepted 19 August 2019; 26 pages, 25 figures

Journal ref: A&A 642, A14 (2020)

arXiv:1806.04277 [pdf, other]

Indexed Dynamic Programming to boost Edit Distance and LCSS Computation

Authors: Jérémy Barbay, Andrés Olivares

Abstract: There are efficient dynamic programming solutions to the computation of the Edit Distance from $S\in[1..σ]^n$ to $T\in[1..σ]^m$, for many natural subsets of edit operations, typically in time within $O(nm)$ in the worst-case over strings of respective lengths $n$ and $m$ (which is likely to be optimal), and in time within $O(n{+}m)$ in some special cases (e.g. disjoint alphabets). We describe how indexing the strings (in linear time), and using such an index to refine the recurrence formulas underlying the dynamic programs, yield faster algorithms in a variety of models, on a continuum of classes of instances of intermediate difficulty between the worst and the best case, thus refining the analysis beyond the worst case analysis. As a side result, we describe similar properties for the computation of the Longest Common Sub Sequence $LCSS(S,T)$ between $S$ and $T$, since it is a particular case of Edit Distance, and we discuss the application of similar algorithmic and analysis techniques for other dynamic programming solutions. More formally, we propose a parameterized analysis of the computational complexity of the Edit Distance for various sets of operators and of the Longest Common Sub Sequence as a function of the area of the dynamic program matrix relevant to the computation.

Submitted 11 June, 2018; originally announced June 2018.
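
For reference, the classical $O(nm)$ dynamic program that the abstract above takes as its baseline can be sketched as follows. This is the textbook recurrence for insertions, deletions and substitutions with unit costs, not the indexed algorithm of the paper; the function name is hypothetical.

```python
def edit_distance(s, t):
    """Classical edit distance with unit-cost insert, delete and substitute.

    Runs in time O(len(s) * len(t)) and keeps only two rows of the matrix.
    """
    prev = list(range(len(t) + 1))  # distance from the empty prefix of s
    for i, a in enumerate(s, start=1):
        curr = [i]  # distance from s[:i] to the empty prefix of t
        for j, b in enumerate(t, start=1):
            curr.append(
                min(
                    prev[j] + 1,             # delete a
                    curr[j - 1] + 1,         # insert b
                    prev[j - 1] + (a != b),  # substitute, free if equal
                )
            )
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(edit_distance("kitten", "sitting"))
```

Every cell of the matrix is filled here regardless of the instance, which is the worst-case behavior the parameterized analysis aims to refine.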

arXiv:1806.01226 [pdf, ps, other]

Adaptive Computation of the Discrete Fréchet Distance

Authors: Jérémy Barbay

Abstract: The discrete Fréchet distance is a measure of similarity between point sequences which abstracts away differences of resolution between the two curves, approximating the original Fréchet distance between curves. Such distance between sequences of respective lengths $n$ and $m$ can be computed in time within $O(nm)$ and space within $O(n+m)$ using classical dynamic programming techniques, a complexity likely to be optimal in the worst case over sequences of similar length unless the Strong Exponential Time Hypothesis is proved incorrect. We propose a parameterized analysis of the computational complexity of the discrete Fréchet distance as a function of the area of the dynamic program matrix relevant to the computation, measured by its \emph{certificate width} $ω$. We prove that the discrete Fréchet distance can be computed in time within $O((n+m)ω)$ and space within $O(n+m+ω)$.

Submitted 4 June, 2018; originally announced June 2018.
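
The classical dynamic program referred to in the abstract above can be sketched as below. It is the standard recurrence for the discrete Fréchet distance, filled row by row, and not the adaptive algorithm of the paper. The function name is hypothetical, Euclidean distance is an assumption, and both sequences are assumed to be non-empty.

```python
import math


def discrete_frechet(p, q):
    """Discrete Fréchet distance between two non-empty point sequences.

    Uses the classical O(len(p) * len(q)) recurrence with two rows of memory.
    Points are tuples of coordinates; distances are Euclidean (assumption).
    """
    prev = []
    for i, a in enumerate(p):
        curr = []
        for j, b in enumerate(q):
            cost = math.dist(a, b)
            if i == 0 and j == 0:
                best = cost
            elif i == 0:
                best = max(curr[j - 1], cost)
            elif j == 0:
                best = max(prev[0], cost)
            else:
                # Cheapest way to reach (i, j), then pay the current pair.
                best = max(min(prev[j], prev[j - 1], curr[j - 1]), cost)
            curr.append(best)
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(discrete_frechet([(0, 0), (1, 0), (2, 0)], [(0, 1), (1, 1), (2, 1)]))
```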

arXiv:1805.04223 [pdf, other]

Computing Coverage Kernels Under Restricted Settings

Authors: Jérémy Barbay, Pablo Pérez-Lantero, Javiel Rojas-Ledesma

Abstract: We consider the Minimum Coverage Kernel problem: given a set $B$ of $d$-dimensional boxes, find a subset of $B$ of minimum size covering the same region as $B$. This problem is $\mathsf{NP}$-hard, but as for many $\mathsf{NP}$-hard problems on graphs, the problem becomes solvable in polynomial time under restrictions on the graph induced by $B$. We consider various classes of graphs, show that Minimum Coverage Kernel remains $\mathsf{NP}$-hard even for severely restricted instances, and provide two polynomial time approximation algorithms for this problem.

Submitted 15 May, 2018; v1 submitted 10 May, 2018; originally announced May 2018.

arXiv:1705.10022 [pdf, other]

Depth Distribution in High Dimensions

Authors: Jérémy Barbay, Pablo Pérez-Lantero, Javiel Rojas-Ledesma

Abstract: Motivated by the analysis of range queries in databases, we introduce the computation of the Depth Distribution of a set $\mathcal{B}$ of axis aligned boxes, whose computation generalizes that of the Klee's Measure and of the Maximum Depth. In the worst case over instances of fixed input size $n$, we describe an algorithm of complexity within $O({n^\frac{d+1}{2}\log n})$, using space within $O({n\log n})$, mixing two techniques previously used to compute the Klee's Measure. We refine this result and previous results on the Klee's Measure and the Maximum Depth for various measures of difficulty of the input, such as the profile of the input and the degeneracy of the intersection graph formed by the boxes.

Submitted 31 May, 2017; v1 submitted 28 May, 2017; originally announced May 2017.

Comments: Extended Version of Article presented at COCOON'17

ACM Class: F.2.2

arXiv:1702.08545 [pdf, other]

Synergistic Computation of Planar Maxima and Convex Hull

Authors: Jérémy Barbay, Carlos Ochoa

Abstract: Refinements of the worst case complexity over instances of fixed input size consider the input order or the input structure, but rarely both at the same time. Barbay et al. [2016] described ``synergistic'' solutions on multisets, which take advantage of the input order and the input structure, so as to asymptotically outperform any comparable solution which takes advantage only of one of those features. We consider the extension of their results to the computation of the \textsc{Maxima Set} and the \textsc{Convex Hull} of a set of planar points. After revisiting and improving previous approaches taking advantage only of the input order or of the input structure, we describe synergistic solutions taking optimal advantage of various notions of the input order and input structure in the plane. As intermediate results, we describe and analyze the first adaptive algorithms for \textsc{Merging Maxima} and \textsc{Merging Convex Hulls}.

Submitted 27 February, 2017; originally announced February 2017.

arXiv:1701.03693 [pdf, other]

Multivariate Analysis for Computing Maxima in High Dimensions

Authors: Jérémy Barbay, Javiel Rojas

Abstract: We study the problem of computing the \textsc{Maxima} of a set of $n$ $d$-dimensional points. For dimensions 2 and 3, there are algorithms to solve the problem with order-oblivious instance-optimal running time. However, in higher dimensions there is still room for improvements. We present an algorithm sensitive to the structural entropy of the input set, which improves the running time, for large classes of instances, on the best solution for \textsc{Maxima} to date for $d \ge 4$.

Submitted 13 January, 2017; originally announced January 2017.

ACM Class: F.2.2
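
To make the \textsc{Maxima} problem of the abstract above concrete, here is a minimal sketch for the planar case only. It is the simple sort-and-sweep approach in time within $O(n\log n)$, neither instance-optimal nor the entropy-sensitive algorithm of the paper. The function name is hypothetical, and a point is assumed to be dominated when another point is at least as large in both coordinates.

```python
def maxima_2d(points):
    """Return the maximal (non-dominated) points of a planar point set.

    Points are scanned by decreasing x (ties by decreasing y); a point is
    kept only if its y exceeds every y seen so far. Duplicates are reported
    once.
    """
    best_y = float("-inf")
    result = []
    for x, y in sorted(points, reverse=True):
        if y > best_y:
            result.append((x, y))
            best_y = y
    return result


if __name__ == "__main__":
    print(maxima_2d([(1, 4), (2, 3), (3, 1), (2, 2), (0, 0)]))
```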

arXiv:1608.06666 [pdf, other]

Synergistic Sorting, MultiSelection and Deferred Data Structures on MultiSets

Authors: Jérémy Barbay, Carlos Ochoa, Srinivasa Rao Satti

Abstract: Karp et al. (1988) described Deferred Data Structures for Multisets as "lazy" data structures which partially sort data to support online rank and select queries, with the minimum amount of work in the worst case over instances of size $n$ and number of queries $q$ fixed (i.e., the query size). Barbay et al. (2016) refined this approach to take advantage of the gaps between the positions hit by the queries (i.e., the structure in the queries). We develop new techniques in order to further refine this approach and to take advantage all at once of the structure (i.e., the multiplicities of the elements), the local order (i.e., the number and sizes of runs) and the global order (i.e., the number and positions of existing pivots) in the input; and of the structure and order in the sequence of queries. Our main result is a synergistic deferred data structure which performs much better on large classes of instances, while always performing asymptotically as well as previous solutions. As intermediate results, we describe two new synergistic sorting algorithms, which take advantage of the structure and order (local and global) in the input, improving upon previous results which take advantage only of the structure (Munro and Spira 1979) or of the local order (Takaoka 1997) in the input; and one new multiselection algorithm which takes advantage of not only the order and structure in the input, but also of the structure in the queries.
We describe two compressed data structures to represent a multiset taking advantage of both the local order and structure, while supporting the operators rank and select on the multiset.

Submitted 30 September, 2016; v1 submitted 23 August, 2016; originally announced August 2016.

Comments: 18 pages

arXiv:1602.03934 [pdf, other]

Bouncing Towers move faster than Hanoi Towers, but still require exponential time

Authors: Jérémy Barbay

Abstract: The problem of the Hanoi Tower is a classic exercise in recursive programming: the solution has a simple recursive definition, and its complexity and the matching lower bound are the solution of a simple recursive function (the solution is so easy that most students memorize it and regurgitate it at exams without truly understanding it). We describe how some very minor changes in the rules of the Hanoi Tower yield various increases of complexity in the solution, so that they require a deeper analysis than the classical Hanoi Tower problem while still yielding exponential solutions. In particular, we analyze the problem of the Bouncing Tower, where just changing the insertion and extraction position from the top to the middle of the tower results in a surprising increase of complexity in the solution: such a tower of $n$ disks can be optimally moved in $\sqrt{3}^n$ moves for $n$ even (i.e. less than a Hanoi Tower of same height), via $5$ recursive functions (or, equivalently, one recursive function with $5$ states).

Submitted 11 March, 2016; v1 submitted 11 February, 2016; originally announced February 2016.

Comments: 18 pages and many figures, one appendix with the disk pile problem, code in Python
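
The classical recursion that the abstract above uses as its starting point can be sketched as follows. This covers only the standard Hanoi Tower, which takes $2^n - 1$ moves; it is not the Bouncing Tower variant and not the authors' Python code. The function name and peg labels are hypothetical.

```python
def hanoi(n, source="A", target="C", spare="B"):
    """Return the list of moves solving the classical Hanoi Tower of n disks.

    Each move is a (from_peg, to_peg) pair. The recurrence
    T(n) = 2 T(n-1) + 1 with T(0) = 0 gives 2**n - 1 moves.
    """
    if n == 0:
        return []
    return (
        hanoi(n - 1, source, spare, target)    # clear the n-1 smaller disks
        + [(source, target)]                   # move the largest disk
        + hanoi(n - 1, spare, target, source)  # put the smaller disks back
    )


if __name__ == "__main__":
    for n in range(5):
        print(n, len(hanoi(n)))
```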

arXiv:1602.00023 [pdf, ps, other]

Optimal Prefix Free Codes With Partial Sorting

Authors: Jérémy Barbay

Abstract: We describe an algorithm computing an optimal prefix free code for $n$ unsorted positive weights in time within $O(n(1+\lg α))\subseteq O(n\lg n)$, where the alternation $α\in[1..n-1]$ measures the amount of sorting required by the computation. This asymptotic complexity is within a constant factor of the optimal in the algebraic decision tree computational model, in the worst case over all instances of size $n$ and alternation $α$. Such results refine the state of the art complexity of $Θ(n\lg n)$ in the worst case over instances of size $n$ in the same computational model, a landmark in compression and coding since 1952, by the mere combination of van Leeuwen's algorithm to compute optimal prefix free codes from sorted weights (known since 1976), with Deferred Data Structures to partially sort a multiset depending on the queries on it (known since 1988).

Submitted 29 January, 2016; originally announced February 2016.

Comments: 13 pages, no figures. arXiv admin note: text overlap with arXiv:1204.5801
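
The building block named in the abstract above, computing an optimal prefix free code from already sorted weights in linear time, is commonly implemented with two queues. The sketch below follows that well-known two-queue scheme and returns only the cost of the code (the sum of weight times codeword length). It does not include the partial sorting that is the paper's contribution, and the function name is hypothetical.

```python
from collections import deque


def optimal_code_cost_sorted(weights):
    """Cost of an optimal prefix free code for weights in nondecreasing order.

    Two queues are kept: the remaining leaves and the internal nodes created
    so far. Internal nodes are created in nondecreasing order of weight, so
    the two smallest items are always at the front of one of the queues.
    """
    leaves = deque(weights)
    internal = deque()

    def pop_min():
        # Prefer a leaf on ties (an arbitrary but valid choice).
        if not internal or (leaves and leaves[0] <= internal[0]):
            return leaves.popleft()
        return internal.popleft()

    cost = 0
    while len(leaves) + len(internal) > 1:
        merged = pop_min() + pop_min()
        internal.append(merged)
        cost += merged  # each merge adds one bit to every leaf below it
    return cost


if __name__ == "__main__":
    print(optimal_code_cost_sorted([1, 1, 2, 3, 5]))
```

Because the input is sorted, no priority queue is needed, which is where the $\lg n$ factor of the usual heap-based construction is saved.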

arXiv:1505.02855 [pdf, other]

Adaptive Computation of the Klee's Measure in High Dimensions

Authors: Jérémy Barbay, Pablo Pérez-Lantero, Javiel Rojas-Ledesma

Abstract: The KLEE'S MEASURE of $n$ axis-parallel boxes in $\mathbb{R}^d$ is the volume of their union. It can be computed in time within $O(n^{d/2})$ in the worst case. We describe three techniques to boost its computation: one based on some type of "degeneracy" of the input, and two based on the inherent "easiness" of the structure of the input. The first technique benefits from instances where the MAXIMA of the input is of small size $h$, and yields a solution running in time within $O(n\log^{2d-2}{h}+ h^{d/2}) \subseteq O(n^{d/2})$. The second technique takes advantage of instances where no $d$-dimensional axis-aligned hyperplane intersects more than $k$ boxes in some dimension, and yields a solution running in time within $O(n \log n + n k^{(d-2)/2}) \subseteq O(n^{d/2})$. The third technique takes advantage of instances where the \emph{intersection graph} of the input has small treewidth $ω$. It yields an algorithm running in time within $O(n^4ω\log ω+ n (ω\log ω)^{d/2})$ in general, and in time within $O(n \log n + n ω^{d/2})$ if an optimal tree decomposition of the intersection graph is given. We show how to combine these techniques in an algorithm which takes advantage of all three configurations.

Submitted 2 October, 2015; v1 submitted 11 May, 2015; originally announced May 2015.
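
As an illustration of the definition in the abstract above (the volume of the union of the boxes), here is a deliberately naive sketch for $d = 2$. It uses coordinate compression and tests every grid cell against every box, so it is far slower than any of the algorithms discussed in the paper. The function name and the `(x1, y1, x2, y2)` box encoding are assumptions.

```python
def klee_measure_2d(boxes):
    """Area of the union of axis-parallel rectangles (x1, y1, x2, y2).

    The distinct x and y coordinates split the plane into a grid; a grid
    cell counts if it lies inside at least one rectangle.
    """
    xs = sorted({x for x1, _, x2, _ in boxes for x in (x1, x2)})
    ys = sorted({y for _, y1, _, y2 in boxes for y in (y1, y2)})
    area = 0
    for xa, xb in zip(xs, xs[1:]):
        for ya, yb in zip(ys, ys[1:]):
            covered = any(
                x1 <= xa and xb <= x2 and y1 <= ya and yb <= y2
                for x1, y1, x2, y2 in boxes
            )
            if covered:
                area += (xb - xa) * (yb - ya)
    return area


if __name__ == "__main__":
    print(klee_measure_2d([(0, 0, 2, 2), (1, 1, 3, 3)]))
```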

arXiv:1505.02820 [pdf, other]

Refining the Analysis of Divide and Conquer: How and When

Authors: Jeremy Barbay, Carlos Ochoa, Pablo Perez-Lantero

Abstract: Divide-and-conquer is a central paradigm for the design of algorithms, through which some fundamental computational problems, such as sorting arrays and computing convex hulls, are solved in optimal time within $Θ(n\log{n})$ in the worst case over instances of size $n$. A finer analysis of those problems yields complexities within $O(n(1 + \mathcal{H}(n_1, \dots, n_k))) \subseteq O(n(1{+}\log{k})) \subseteq O(n\log{n})$ in the worst case over all instances of size $n$ composed of $k$ "easy" fragments of respective sizes $n_1, \dots, n_k$ summing to $n$, where the entropy function $\mathcal{H}(n_1, \dots, n_k) = \sum_{i=1}^k{\frac{n_i}{n}}\log{\frac{n}{n_i}}$ measures the "difficulty" of the instance. We consider whether such refined analysis can be applied to other algorithms based on divide-and-conquer, such as polynomial multiplication, input-order adaptive computation of convex hulls in 2D and 3D, and computation of Delaunay triangulations.

Submitted 25 September, 2015; v1 submitted 11 May, 2015; originally announced May 2015.
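
The entropy function $\mathcal{H}(n_1, \dots, n_k)$ defined in the abstract above is easy to evaluate directly. The sketch below computes it from the fragment sizes; the function name is hypothetical, and base-2 logarithms are an assumption, since the abstract does not fix the base.

```python
import math


def fragment_entropy(sizes):
    """Entropy H(n_1, ..., n_k) = sum_i (n_i / n) * log(n / n_i).

    `sizes` holds the positive fragment sizes n_i, and n is their sum.
    Logarithms are taken in base 2 (assumption).
    """
    n = sum(sizes)
    return sum((ni / n) * math.log2(n / ni) for ni in sizes)


if __name__ == "__main__":
    # Four equal fragments reach the maximum log2(k) for k = 4.
    print(fragment_entropy([4, 4, 4, 4]))
    # Unbalanced fragments of the same total size give a smaller value.
    print(fragment_entropy([1, 1, 2, 4]))
```

The value is 0 for a single fragment and at most $\log k$ for $k$ fragments, which matches the chain of inclusions $O(n(1 + \mathcal{H})) \subseteq O(n(1{+}\log{k}))$ stated in the abstract.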

arXiv:1505.00184 [pdf, ps, other]

Instance Optimal Geometric Algorithms

Authors: Peyman Afshani, Jérémy Barbay, Timothy Chan

Abstract: We prove the existence of an algorithm $A$ for computing 2-d or 3-d convex hulls that is optimal for every point set in the following sense: for every sequence $σ$ of $n$ points and for every algorithm $A'$ in a certain class $\mathcal{A}$, the running time of $A$ on input $σ$ is at most a constant factor times the maximum running time of $A'$ on the worst possible permutation of $σ$ for $A'$. We establish a stronger property: for every sequence $σ$ of points and every algorithm $A'$, the running time of $A$ on $σ$ is at most a constant factor times the average running time of $A'$ over all permutations of $σ$. We call algorithms satisfying these properties instance-optimal in the order-oblivious and random-order setting. Such instance-optimal algorithms simultaneously subsume output-sensitive algorithms and distribution-dependent average-case algorithms, and all algorithms that do not take advantage of the order of the input or that assume the input is given in a random order. The class $\mathcal{A}$ under consideration consists of all algorithms in a decision tree model where the tests involve only multilinear functions with a constant number of arguments. To establish an instance-specific lower bound, we deviate from traditional Ben-Or-style proofs and adopt a new adversary argument. For 2-d convex hulls, we prove that a version of the well known algorithm by Kirkpatrick and Seidel (1986) or Chan, Snoeyink, and Yap (1995) already attains this lower bound. For 3-d convex hulls, we propose a new algorithm.
We further obtain instance-optimal results for a few other standard problems in computational geometry. Our framework also reveals connection to distribution-sensitive data structures and yields new results as a byproduct, for example, on on-line orthogonal range searching in 2-d and on-line halfspace range reporting in 2-d and 3-d.

Submitted 1 May, 2015; originally announced May 2015.

Comments: 28 pages in fullpage

ACM Class: F.2.2

arXiv:1504.07298 [pdf, ps, other]

Adaptive Computation of the Swap-Insert Correction Distance

Authors: Jérémy Barbay, Pablo Pérez-Lantero

Abstract: The Swap-Insert Correction distance from a string $S$ of length $n$ to another string $L$ of length $m\geq n$ on the alphabet $[1..d]$ is the minimum number of insertions, and swaps of pairs of adjacent symbols, converting $S$ into $L$. Contrary to other correction distances, computing it is NP-Hard in the size $d$ of the alphabet. We describe an algorithm computing this distance in time within $O(d^2 nm g^{d-1})$, where there are $n_α$ occurrences of $α$ in $S$, $m_α$ occurrences of $α$ in $L$, and where $g=\max_{α\in[1..d]} \min\{n_α,m_α-n_α\}$ measures the difficulty of the instance. The difficulty $g$ is bounded above by various terms, such as the length of the shortest string $S$, and by the maximum number of occurrences of a single character in $S$. Those results illustrate how, in many cases, the correction distance between two strings can be easier to compute than in the worst case scenario.

Submitted 27 June, 2015; v1 submitted 27 April, 2015; originally announced April 2015.

Comments: 16 pages, no figures, long version of the extended abstract accepted to SPIRE 2015
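
The difficulty measure $g=\max_{α} \min\{n_α,m_α-n_α\}$ from the abstract above can be computed in a single pass over the two strings. The sketch below only evaluates $g$; it does not compute the Swap-Insert Correction distance itself. The function name is hypothetical, and the check that every symbol occurs at least as often in $L$ as in $S$ is an added assumption (insertions and swaps cannot remove symbols).

```python
from collections import Counter


def swap_insert_difficulty(s, l):
    """Difficulty g = max over symbols a of min(n_a, m_a - n_a).

    n_a and m_a count the occurrences of symbol a in s and l respectively.
    Raises ValueError if some symbol is rarer in l than in s, since s could
    then not be converted into l using insertions and swaps only.
    """
    n_counts, m_counts = Counter(s), Counter(l)
    for symbol, n_a in n_counts.items():
        if m_counts[symbol] < n_a:
            raise ValueError(f"symbol {symbol!r} is rarer in l than in s")
    return max(
        (min(n_counts[a], m_counts[a] - n_counts[a]) for a in m_counts),
        default=0,
    )


if __name__ == "__main__":
    print(swap_insert_difficulty("aab", "aaabbb"))
```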

arXiv:1206.5336 [pdf, ps, other]

doi 10.1016/j.jda.2015.11.001

Near-Optimal Online Multiselection in Internal and External Memory

Authors: Jérémy Barbay, Ankur Gupta, S. Srinivasa Rao, Jonathan Sorenson

Abstract: We introduce an online version of the multiselection problem, in which q selection queries are requested on an unsorted array of n elements. We provide the first online algorithm that is 1-competitive with Kaligosi et al. [ICALP 2005] in terms of comparison complexity. Our algorithm also supports online search queries efficiently. We then extend our algorithm to the dynamic setting, while retaining online functionality, by supporting arbitrary insertions and deletions on the array. Assuming that the insertion of an element is immediately preceded by a search for that element, we show that our dynamic online algorithm performs an optimal number of comparisons, up to lower order terms and an additive O(n) term. For the external memory model, we describe the first online multiselection algorithm that is O(1)-competitive. This result improves upon the work of Sibeyn [Journal of Algorithms 2006] when q > m, where m is the number of blocks that can be stored in main memory. We also extend it to support searches, insertions, and deletions of elements efficiently.

Submitted 13 July, 2013; v1 submitted 22 June, 2012; originally announced June 2012.

arXiv:1204.5801

Optimal Prefix Free Code in Linear Time

Authors: Jérémy Barbay

Abstract: We describe an algorithm computing an optimal prefix free code from $N$ unsorted positive integer weights in time linear in the number of machine words holding those weights. This algorithm takes advantage of common non-algebraic instructions, and of specific results on optimal prefix free codes. This result improves over the state of the art complexities of $O(N\lg N)$ in the algebraic decision tree model and $O(N\lg\lg N)$ in the RAM model for the computation of Huffman's codes, a landmark in compression and coding since 1952.

Submitted 1 March, 2017; v1 submitted 25 April, 2012; originally announced April 2012.

Comments: The algorithm TopDown is incorrect, and it is not clear how to correct it

arXiv:1204.2034 [pdf, other]

Adaptive Techniques to find Optimal Planar Boxes

Authors: J. Barbay, G. Navarro, P. Pérez-Lantero

Abstract: Given a set $P$ of $n$ planar points, two axes and a real-valued score function $f()$ on subsets of $P$, the Optimal Planar Box problem consists in finding a box (i.e. axis-aligned rectangle) $H$ maximizing $f(H\cap P)$. We consider the case where $f()$ is monotone decomposable, i.e. there exists a composition function $g()$ monotone in its two arguments such that $f(A)=g(f(A_1),f(A_2))$ for every subset $A\subseteq P$ and every partition $\{A_1,A_2\}$ of $A$. In this context we propose a solution for the Optimal Planar Box problem which performs in the worst case $O(n^2\lg n)$ score compositions and coordinate comparisons, and far fewer on other classes of instances defined by various measures of difficulty. A side result of independent interest is a fully dynamic MCS Splay tree data structure supporting insertions and deletions with the dynamic finger property, improving upon previous results [Cortés et al., J.Alg. 2009].

Submitted 9 April, 2012; originally announced April 2012.

Comments: 18 pages, 4 figures

arXiv:1201.3602 [pdf, other]

Compact Binary Relation Representations with Rich Functionality

Authors: Jérémy Barbay, Francisco Claude, Gonzalo Navarro

Abstract: Binary relations are an important abstraction arising in many data representation problems. The data structures proposed so far to represent them support just a few basic operations required to fit one particular application. We identify many of those operations arising in applications and generalize them into a wide set of desirable queries for a binary relation representation. We also identify reductions among those operations. We then introduce several novel binary relation representations, some simple and some quite sophisticated, that not only are space-efficient but also efficiently support a large subset of the desired queries.

Submitted 17 January, 2012; originally announced January 2012.

Comments: 32 pages

arXiv:1108.4408 [pdf, ps, other]

On Compressing Permutations and Adaptive Sorting

Authors: Jérémy Barbay, Gonzalo Navarro

Abstract: Previous compact representations of permutations have focused on adding a small index on top of the plain data $\langle π(1), π(2), \ldots, π(n) \rangle$, in order to efficiently support the application of the inverse or the iterated permutation. In this paper we initiate the study of techniques that exploit the compressibility of the data itself, while retaining efficient computation of $π(i)$ and its inverse. In particular, we focus on exploiting runs, which are subsets (contiguous or not) of the domain where the permutation is monotonic. Several variants of those types of runs arise in real applications such as inverted indexes and suffix arrays. Furthermore, our improved results on compressed data structures for permutations also yield better adaptive sorting algorithms.

Submitted 22 August, 2011; originally announced August 2011.
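
To make the notion of runs concrete, here is a minimal sketch covering only the simplest variant, contiguous ascending runs. It splits a sequence into maximal ascending runs and then sorts by merging them, so that inputs with few runs cost less. The names `ascending_runs` and `run_sort` are illustrative, and the plain k-way merge from the standard library stands in for whatever merging strategy the paper actually analyzes.

```python
import heapq


def ascending_runs(perm):
    """Split a sequence into its maximal contiguous ascending runs."""
    runs = []
    start = 0
    for i in range(1, len(perm) + 1):
        # Close the current run at the end of the input or at a descent.
        if i == len(perm) or perm[i] < perm[i - 1]:
            runs.append(perm[start:i])
            start = i
    return runs


def run_sort(perm):
    """Sort by k-way merging the ascending runs: O(n lg k) for k runs."""
    return list(heapq.merge(*ascending_runs(perm)))
```

An already sorted input forms a single run, so `run_sort` makes one linear pass over it; a reversed input forms n runs of length one and falls back to the usual $O(n\lg n)$ behaviour.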

arXiv:1009.5863 [pdf, other]

LRM-Trees: Compressed Indices, Adaptive Sorting, and Compressed Permutations

Authors: Jérémy Barbay, Johannes Fischer

Abstract: LRM-Trees are an elegant way to partition a sequence of values into sorted consecutive blocks, and to express the relative position of the first element of each block within a previous block. They were used to encode ordinal trees and to index integer arrays in order to support range minimum queries on them. We describe how they yield many other convenient results in a variety of areas, from data structures to algorithms: some compressed succinct indices for range minimum queries; a new adaptive sorting algorithm; and a compressed succinct data structure for permutations supporting direct and indirect application in time that is all the shorter as the permutation is more compressible.

Submitted 29 September, 2010; originally announced September 2010.

Comments: 13 pages, 1 figure
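
As a rough illustration of the structure, the sketch below computes the parent of each position in an LRM-tree, assuming the usual definition in which the parent of position i is the nearest preceding position holding a strictly smaller value, with a virtual root for positions that have none. That definition is not spelled out in the abstract, and the function name `lrm_parents` and the use of -1 for the virtual root are choices made here for illustration.

```python
def lrm_parents(values):
    """Return, for each position, the index of its previous smaller value.

    Positions with no smaller predecessor get -1 (the virtual root).
    Runs in O(n) time using a stack of candidate ancestors.
    """
    parents = []
    stack = []  # indices whose values strictly increase from bottom to top
    for i, v in enumerate(values):
        # Discard candidates that are not strictly smaller than v.
        while stack and values[stack[-1]] >= v:
            stack.pop()
        parents.append(stack[-1] if stack else -1)
        stack.append(i)
    return parents
```

Under this definition, consecutive positions that are children of one another form the sorted blocks mentioned in the abstract, and the parent pointer of a block's first element records where that block attaches to an earlier one.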

arXiv:0911.4981 [pdf, other]

Efficient Fully-Compressed Sequence Representations

Authors: Jeremy Barbay, Francisco Claude, Travis Gagie, Gonzalo Navarro, Yakov Nekrich

Abstract: We present a data structure that stores a sequence $s[1..n]$ over alphabet $[1..σ]$ in $nH_0(s) + o(n)(H_0(s)+1)$ bits, where $H_0(s)$ is the zero-order entropy of $s$. This structure supports the queries access, rank and select, which are fundamental building blocks for many other compressed data structures, in worst-case time $O(\lg\lg σ)$ and average time $O(\lg H_0(s))$. The worst-case complexity matches the best previous results, yet these had been achieved with data structures using $nH_0(s)+o(n\lg σ)$ bits. On highly compressible sequences the $o(n\lg σ)$ bits of the redundancy may be significant compared to the $nH_0(s)$ bits that encode the data. Our representation, instead, compresses the redundancy as well. Moreover, our average-case complexity is unprecedented. Our technique is based on partitioning the alphabet into characters of similar frequency. The subsequence corresponding to each group can then be encoded using fast uncompressed representations without harming the overall compression ratios, even in the redundancy. The result also improves upon the best current compressed representations of several other data structures. For example, we achieve $(i)$ compressed redundancy, retaining the best time complexities, for the smallest existing full-text self-indexes; $(ii)$ compressed permutations $π$ with times for $π()$ and $π^{-1}()$ improved to loglogarithmic; and $(iii)$ the first compressed representation of dynamic collections of disjoint sets. We also point out various applications to inverted indexes, suffix arrays, binary relations, and data compressors.

Submitted 1 April, 2012; v1 submitted 25 November, 2009; originally announced November 2009.
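
The space bound in this abstract is stated in terms of the zero-order entropy $H_0(s) = \sum_c (n_c/n)\lg(n/n_c)$, where $n_c$ is the number of occurrences of character $c$ in $s$. A small sketch of that quantity and of the leading $nH_0(s)$ term follows; it computes the measure only, not the data structure, and the function name `zero_order_entropy` is illustrative.

```python
from collections import Counter
from math import log2


def zero_order_entropy(s):
    """Zero-order empirical entropy H_0(s), in bits per symbol."""
    n = len(s)
    if n == 0:
        return 0.0
    # Each distinct symbol with count c contributes (c/n) * lg(n/c).
    return sum((c / n) * log2(n / c) for c in Counter(s).values())


if __name__ == "__main__":
    s = "abracadabra"
    h0 = zero_order_entropy(s)
    print(f"H_0 = {h0:.3f} bits/symbol, n*H_0 = {len(s) * h0:.1f} bits")
```

On a sequence made of a single repeated character $H_0(s)$ is 0, which is the regime where an $o(n\lg σ)$ redundancy term would dominate the $nH_0(s)$ bits of data.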

arXiv:0902.1038 [pdf, ps, other]

Compressed Representations of Permutations, and Applications

Authors: Jérémy Barbay, Gonzalo Navarro

Abstract: We explore various techniques to compress a permutation $π$ over $n$ integers, taking advantage of ordered subsequences in $π$, while supporting its application $π(i)$ and the application of its inverse $π^{-1}(i)$ in small time. Our compression schemes yield several interesting byproducts, in many cases matching, improving or extending the best existing results on applications such as the encoding of a permutation in order to support iterated applications $π^k(i)$ of it, of integer functions, and of inverted lists and suffix arrays.

Submitted 6 February, 2009; originally announced February 2009.

Journal ref: STACS 2009 (2009) 111-122

Showing 1–25 of 25 results for author: Barbay, J