-
S$^3$-TTA: Scale-Style Selection for Test-Time Augmentation in Biomedical Image Segmentation
Authors:
Kangxian Xie,
Siyu Huang,
Sebastian Andres Cajas Ordonez,
Hanspeter Pfister,
Donglai Wei
Abstract:
Deep-learning models have been successful in biomedical image segmentation. To generalize for real-world deployment, test-time augmentation (TTA) methods are often used to transform the test image into different versions that are hopefully closer to the training domain. Unfortunately, due to the vast diversity of instance scales and image styles, many augmented test images produce undesirable results, thus lowering the overall performance. This work proposes a new TTA framework, S$^3$-TTA, which selects a suitable image scale and style for each test image based on a transformation consistency metric. In addition, S$^3$-TTA constructs an end-to-end augmentation-segmentation joint-training pipeline to ensure a task-oriented augmentation. On public benchmarks for cell and lung segmentation, S$^3$-TTA demonstrates improvements over the prior art of 3.4% and 1.3%, respectively, simply by augmenting the input data in the testing phase.
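The abstract does not define the transformation consistency metric, so the following is only a toy sketch of the general idea of choosing an augmented version by how stable its prediction is under a transformation. Everything in it is an assumption made for illustration: 1-D "images" as lists of floats, a horizontal flip as the transformation, Dice overlap as the agreement score, and the hypothetical names `dice`, `consistency`, `select_augmentation` and `toy_segment`. It is not the S$^3$-TTA implementation.

```python
def dice(a, b):
    """Dice overlap between two equal-length binary masks (lists of 0/1)."""
    inter = sum(x & y for x, y in zip(a, b))
    total = sum(a) + sum(b)
    return 1.0 if total == 0 else 2.0 * inter / total


def consistency(image, segment):
    """Agreement between segment(image) and the un-flipped segment(flip(image))."""
    direct = segment(image)
    via_flip = segment(image[::-1])[::-1]
    return dice(direct, via_flip)


def select_augmentation(candidates, segment):
    """Return the name of the candidate whose prediction is most flip-consistent."""
    return max(candidates, key=lambda name: consistency(candidates[name], segment))


def toy_segment(image):
    """Hypothetical stand-in for a model: its threshold depends on the first pixel,
    so its output is not always stable when the image is flipped."""
    threshold = 0.5 + 0.4 * image[0]
    return [1 if pixel > threshold else 0 for pixel in image]


# Two hypothetical augmented versions of the same test image.
candidates = {
    "scale_a": [0.0, 0.9, 0.9, 0.0],
    "scale_b": [0.9, 0.6, 0.6, 0.0],
}
best = select_augmentation(candidates, toy_segment)
```

In this toy setting the version whose mask survives the flip unchanged is preferred; a real pipeline would replace `toy_segment` with the trained network and the flip with whatever transformations the method actually uses.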
Submitted 6 January, 2024; v1 submitted 25 October, 2023;
originally announced October 2023.
-
Grammar Compressed Sequences with Rank/Select Support
Authors:
Alberto Ordóñez,
Gonzalo Navarro,
Nieves R. Brisaboa
Abstract:
Sequence representations supporting not only direct access to their symbols, but also rank/select operations, are a fundamental building block in many compressed data structures. Several recent applications need to represent highly repetitive sequences, and classical statistical compression proves ineffective. We introduce, instead, grammar-based representations for repetitive sequences, which use up to 6% of the space needed by statistically compressed representations, and support direct access and rank/select operations within tens of microseconds. We demonstrate the impact of our structures in text indexing applications.
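For readers unfamiliar with the three operations named above, here is a minimal reference sketch of their semantics on an uncompressed Python string. The conventions are assumptions chosen for the example (0-based positions, `rank` counting occurrences strictly before position `i`, `select` taking a 1-based occurrence number); these functions run in linear time and are not the grammar-compressed structures the paper introduces.

```python
def access(seq, i):
    """Return the symbol at 0-based position i."""
    return seq[i]


def rank(seq, c, i):
    """Number of occurrences of symbol c in seq[0:i]."""
    return sum(1 for symbol in seq[:i] if symbol == c)


def select(seq, c, j):
    """0-based position of the j-th occurrence of c (j >= 1), or -1 if absent."""
    seen = 0
    for position, symbol in enumerate(seq):
        if symbol == c:
            seen += 1
            if seen == j:
                return position
    return -1


text = "abracadabra"
third_a = select(text, "a", 3)         # position of the third 'a'
count_before = rank(text, "a", third_a)  # how many 'a's precede it
```

The two operations are inverses in the sense that `rank(seq, c, select(seq, c, j))` is `j - 1` whenever the `j`-th occurrence exists.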
Submitted 21 November, 2019; v1 submitted 20 November, 2019;
originally announced November 2019.
-
Efficient and Compact Representations of Some Non-Canonical Prefix-Free Codes
Authors:
Antonio Fariña,
Travis Gagie,
Szymon Grabowski,
Giovanni Manzini,
Gonzalo Navarro,
Alberto Ordóñez
Abstract:
For many kinds of prefix-free codes there are efficient and compact alternatives to the traditional tree-based representation. Since these put the codes into canonical form, however, they can only be used when we can choose the order in which codewords are assigned to symbols. In this paper we first show how, given a probability distribution over an alphabet of $σ$ symbols, we can store an optimal alphabetic prefix-free code in $O(σ\log L)$ bits such that we can encode and decode any codeword of length $\ell$ in $O(\min (\ell, \log L))$ time, where $L$ is the maximum codeword length. With $O(2^{L^ε})$ further bits, for any constant $ε>0$, we can encode and decode in $O(\log \ell)$ time. We then show how to store a nearly optimal alphabetic prefix-free code in $o(σ)$ bits such that we can encode and decode in constant time. We also consider a kind of optimal prefix-free code introduced recently where the codewords' lengths are non-decreasing if arranged in lexicographic order of their reverses. We reduce their storage space to $O(σ\log L)$ while maintaining encoding and decoding times in $O(\ell)$. We also show how, with $O(2^{εL})$ further bits, we can encode and decode in constant time. All of our results hold in the word-RAM model.
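To illustrate what "canonical form" means here, the sketch below uses the standard textbook construction that assigns codewords from codeword lengths alone. The function name `canonical_codes` and the example lengths are assumptions made for illustration; this is not one of the paper's data structures. Because the construction itself decides which codeword each symbol receives, it cannot be applied when the assignment order is fixed in advance, as in alphabetic codes.

```python
def canonical_codes(lengths):
    """Assign canonical prefix-free codewords given a {symbol: length} mapping.

    Symbols are processed by increasing length (ties broken by symbol), and each
    codeword is the previous one plus one, left-shifted when the length grows.
    The lengths are assumed to satisfy the Kraft inequality.
    """
    codes = {}
    code = 0
    previous_length = 0
    for symbol, length in sorted(lengths.items(), key=lambda item: (item[1], item[0])):
        code <<= length - previous_length
        codes[symbol] = format(code, "0{}b".format(length))
        code += 1
        previous_length = length
    return codes


# Hypothetical codeword lengths for a four-symbol alphabet.
example = canonical_codes({"a": 1, "b": 2, "c": 3, "d": 3})
# example == {"a": "0", "b": "10", "c": "110", "d": "111"}
```

Only the lengths need to be stored to rebuild such a code, which is why canonical representations are compact.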
Submitted 1 April, 2021; v1 submitted 21 May, 2016;
originally announced May 2016.
-
Queries on LZ-Bounded Encodings
Authors:
Djamal Belazzougui,
Travis Gagie,
Paweł Gawrychowski,
Juha Kärkkäinen,
Alberto Ordóñez,
Simon J. Puglisi,
Yasuo Tabei
Abstract:
We describe a data structure that stores a string $S$ in space similar to that of its Lempel-Ziv encoding and efficiently supports access, rank and select queries. These queries are fundamental for implementing succinct and compressed data structures, such as compressed trees and graphs. We show that our data structure can be built in a scalable manner and is both small and fast in practice compared to other data structures supporting such queries.
Submitted 2 December, 2014;
originally announced December 2014.
-
Efficient and Compact Representations of Prefix Codes
Authors:
Travis Gagie,
Gonzalo Navarro,
Yakov Nekrich,
Alberto Ordóñez
Abstract:
Most of the attention in statistical compression is given to the space used by the compressed sequence, a problem completely solved with optimal prefix codes. However, in many applications, the storage space used to represent the prefix code itself can be an issue. In this paper we introduce and compare several techniques to store prefix codes. Let $N$ be the sequence length and $n$ be the alphabet size. Then a naive storage of an optimal prefix code uses $O(n\log n)$ bits. Our first technique shows how to use $O(n\log\log(N/n))$ bits to store the optimal prefix code. Then we introduce an approximate technique that, for any $0<ε<1/2$, takes $O(n \log \log (1 / ε))$ bits to store a prefix code with average codeword length within an additive $ε$ of the minimum. Finally, a second approximation takes, for any constant $c > 1$, $O(n^{1 / c} \log n)$ bits to store a prefix code with average codeword length at most $c$ times the minimum. In all cases, our data structures allow encoding and decoding of any symbol in $O(1)$ time. We experimentally compare our new techniques with the state of the art, showing that we achieve 6- to 8-fold space reductions, at the price of a slower encoding (2.5 to 8 times slower) and decoding (12 to 24 times slower). The approximations further reduce this space and improve the time significantly, up to recovering the speed of classical implementations, for a moderate penalty in the average code length. As a byproduct, we compare various heuristic, approximate, and optimal algorithms to generate length-restricted codes, showing that the optimal ones are clearly superior and practical enough to be implemented.
Submitted 29 June, 2015; v1 submitted 13 October, 2014;
originally announced October 2014.
-
Efficient Compressed Wavelet Trees over Large Alphabets
Authors:
Francisco Claude,
Gonzalo Navarro,
Alberto Ordóñez
Abstract:
The wavelet tree is a flexible data structure that permits representing sequences $S[1,n]$ of symbols over an alphabet of size $σ$, within compressed space and supporting a wide range of operations on $S$. When $σ$ is significant compared to $n$, current wavelet tree representations incur noticeable space or time overheads. In this article we introduce the wavelet matrix, an alternative representation for large alphabets that retains all the properties of wavelet trees but is significantly faster. We also show how the wavelet matrix can be compressed up to the zero-order entropy of the sequence without sacrificing, and actually improving, its time performance. Our experimental results show that the wavelet matrix outperforms all the wavelet tree variants along the space/time tradeoff map.
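As a rough illustration of the wavelet matrix layout, the sketch below builds one bitmap per bit of the symbols, most significant bit first, and stably moves the zeros before the ones between levels. It is a simplified, uncompressed version under stated assumptions: symbols are non-negative integers that fit in `bits` bits, positions are 0-based (the abstract uses $S[1,n]$), bitmaps are stored as plain prefix-count lists, and the class name `WaveletMatrix` and its method names are chosen for this example rather than taken from the article.

```python
class WaveletMatrix:
    """Uncompressed wavelet matrix over integers in [0, 2**bits)."""

    def __init__(self, seq, bits):
        self.n = len(seq)
        self.bits = bits
        self.ones = []   # per level: ones[k][i] = number of 1-bits in positions [0, i)
        self.zeros = []  # per level: total number of 0-bits
        current = list(seq)
        for level in range(bits - 1, -1, -1):
            prefix = [0]
            for value in current:
                prefix.append(prefix[-1] + ((value >> level) & 1))
            self.ones.append(prefix)
            self.zeros.append(self.n - prefix[-1])
            # Stable partition: values with bit 0 first, then values with bit 1.
            zeros_part = [v for v in current if ((v >> level) & 1) == 0]
            ones_part = [v for v in current if ((v >> level) & 1) == 1]
            current = zeros_part + ones_part

    def _rank1(self, k, i):
        return self.ones[k][i]

    def _rank0(self, k, i):
        return i - self.ones[k][i]

    def access(self, i):
        """Return the symbol at 0-based position i."""
        value = 0
        for k in range(self.bits):
            bit = self.ones[k][i + 1] - self.ones[k][i]
            value = (value << 1) | bit
            if bit:
                i = self.zeros[k] + self._rank1(k, i)
            else:
                i = self._rank0(k, i)
        return value

    def rank(self, c, i):
        """Number of occurrences of symbol c in positions [0, i)."""
        start, end = 0, i
        for k in range(self.bits):
            if (c >> (self.bits - 1 - k)) & 1:
                start = self.zeros[k] + self._rank1(k, start)
                end = self.zeros[k] + self._rank1(k, end)
            else:
                start = self._rank0(k, start)
                end = self._rank0(k, end)
        return end - start


sequence = [3, 1, 4, 1, 5, 2, 6, 5]
matrix = WaveletMatrix(sequence, bits=3)
```

Each query touches one bitmap per level, so `access` and `rank` take a number of steps proportional to `bits`; a practical implementation would replace the prefix-count lists with succinct rank-enabled bitmaps.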
Submitted 6 May, 2014;
originally announced May 2014.