
Showing 1–14 of 14 results for author: Thankachan, S V

  1. arXiv:2311.01793  [pdf, other]

    cs.DS quant-ph

    Near-Optimal Quantum Algorithms for Bounded Edit Distance and Lempel-Ziv Factorization

    Authors: Daniel Gibney, Ce **, Tomasz Kociumaka, Sharma V. Thankachan

    Abstract: Classically, the edit distance of two length-$n$ strings can be computed in $O(n^2)$ time, whereas an $O(n^{2-ε})$-time procedure would falsify the Orthogonal Vectors Hypothesis. If the edit distance does not exceed $k$, the running time can be improved to $O(n+k^2)$, which is near-optimal (conditioned on OVH) as a function of $n$ and $k$. Our first main contribution is a quantum…

    Submitted 3 November, 2023; originally announced November 2023.

    Comments: Accepted to SODA 2024. arXiv admin note: substantial text overlap with arXiv:2302.07235

  2. arXiv:2302.07235  [pdf, other]

    cs.DS

    Compressibility-Aware Quantum Algorithms on Strings

    Authors: Daniel Gibney, Sharma V. Thankachan

    Abstract: Sublinear time quantum algorithms have been established for many fundamental problems on strings. This work demonstrates that new, faster quantum algorithms can be designed when the string is highly compressible. We focus on two popular and theoretically significant compression algorithms -- the Lempel-Ziv77 algorithm (LZ77) and the Run-length-encoded Burrows-Wheeler Transform (RL-BWT), and obtain…

    Submitted 14 February, 2023; originally announced February 2023.

  3. arXiv:2201.12454  [pdf, other]

    cs.DS

    The Complexity of Approximate Pattern Matching on De Bruijn Graphs

    Authors: Daniel Gibney, Sharma V. Thankachan, Srinivas Aluru

    Abstract: Aligning a sequence to a walk in a labeled graph is a problem of fundamental importance to Computational Biology. For finding a walk in an arbitrary graph with $|E|$ edges that exactly matches a pattern of length $m$, a lower bound based on the Strong Exponential Time Hypothesis (SETH) implies an algorithm significantly faster than $O(|E|m)$ time is unlikely [Equi et al., ICALP 2019]. However, for…

    Submitted 28 January, 2022; originally announced January 2022.

  4. arXiv:2008.11786  [pdf, ps, other]

    cs.CC cs.DS

    Simple Reductions from Formula-SAT to Pattern Matching on Labeled Graphs and Subtree Isomorphism

    Authors: Daniel Gibney, Gary Hoppenworth, Sharma V. Thankachan

    Abstract: The CNF formula satisfiability problem (CNF-SAT) has been reduced to many fundamental problems in P to prove tight lower bounds under the Strong Exponential Time Hypothesis (SETH). Recently, the works of Abboud, Hansen, Vassilevska W. and Williams (STOC 16), and later, Abboud and Bringmann (ICALP 18) have proposed basing lower bounds on the hardness of general boolean formula satisfiability (Formu…

    Submitted 26 August, 2020; originally announced August 2020.

  5. arXiv:1911.03035  [pdf, other]

    cs.DS

    On the Complexity of BWT-runs Minimization via Alphabet Reordering

    Authors: Jason Bentley, Daniel Gibney, Sharma V. Thankachan

    Abstract: The Burrows-Wheeler Transform (BWT) has been an essential tool in text compression and indexing. First introduced in 1994, it went on to provide the backbone for the first encoding of the classic suffix tree data structure in space close to the entropy-based lower bound. Recently, there has been the development of compact suffix trees in space proportional to "$r$", the number of runs in the BWT,…

    Submitted 18 February, 2020; v1 submitted 7 November, 2019; originally announced November 2019.

  6. arXiv:1902.01960  [pdf, other]

    cs.CC

    On the Hardness and Inapproximability of Recognizing Wheeler Graphs

    Authors: Daniel Gibney, Sharma V. Thankachan

    Abstract: In recent years several compressed indexes based on variants of the Burrows-Wheeler transformation have been introduced. Some of these indexes cover structures far more complex than a single string, which is what the FM-index [Ferragina and Manzini, J. ACM 2005] originally indexed. As such, there has been an effort to better understand under which conditions such an indexing scheme is possible. This led to the…

    Submitted 25 February, 2019; v1 submitted 5 February, 2019; originally announced February 2019.

  7. arXiv:1805.06177  [pdf, ps, other]

    cs.DS

    On Computing Average Common Substring Over Run Length Encoded Sequences

    Authors: Sahar Hooshmand, Neda Tavakoli, Paniz Abedin, Sharma V. Thankachan

    Abstract: The Average Common Substring (ACS) is a popular alignment-free distance measure for phylogeny reconstruction. The ACS can be computed in O(n) space and time, where n=x+y is the input size. Compressed string matching is the study of string matching problems with the following twist: the input data is in a compressed format and the underlying task must be performed with little or no decompression…

    Submitted 16 May, 2018; originally announced May 2018.

  8. arXiv:1603.07457  [pdf, ps, other]

    cs.DS

    Parameterized Pattern Matching -- Succinctly

    Authors: Arnab Ganguly, Rahul Shah, Sharma V. Thankachan

    Abstract: We consider the Parameterized Pattern Matching problem, where a pattern $P$ matches some location in a text $\mathsf{T}$ iff there is a one-to-one correspondence between the alphabet symbols of the pattern and those of the text. More specifically, assume that the text $\mathsf{T}$ contains $n$ characters from a static alphabet $Σ_s$ and a parameterized alphabet $Σ_p$, where…

    Submitted 5 April, 2016; v1 submitted 24 March, 2016; originally announced March 2016.

    ACM Class: F.2.2

  9. arXiv:1512.00378  [pdf, ps, other]

    cs.DS

    An In-place Framework for Exact and Approximate Shortest Unique Substring Queries

    Authors: Wing-Kai Hon, Sharma V. Thankachan, Bojian Xu

    Abstract: We revisit the exact shortest unique substring (SUS) finding problem, and propose its approximate version where mismatches are allowed, due to its applications in subfields such as computational biology. We design a generic in-place framework that solves both the exact and the approximate $k$-mismatch SUS finding problems, using the minimum $2n$ memory words plus $n$ bytes space, where $n$ is the input…

    Submitted 1 December, 2015; originally announced December 2015.

    Comments: 15 pages. A preliminary version of this paper appears in Proceedings of the 26th International Symposium on Algorithms and Computation (ISAAC), Nagoya, Japan, 2015

  10. arXiv:1509.08608  [pdf, other]

    cs.DB cs.DS

    Probabilistic Threshold Indexing for Uncertain Strings

    Authors: Sharma V. Thankachan, Manish Patil, Rahul Shah, Sudip Biswas

    Abstract: Strings form a fundamental data type in computer systems. String searching has been extensively studied since the inception of computer science. Increasingly many applications have to deal with imprecise strings or strings with fuzzy information in them. String matching becomes a probabilistic event when a string contains uncertainty, i.e. each position of the string can have different probable ch…

    Submitted 29 September, 2015; originally announced September 2015.

    Comments: 14 pages, 10 figures

  11. arXiv:1404.2677  [pdf, ps, other]

    cs.DS

    Optimal Encodings for Range Majority Queries

    Authors: Gonzalo Navarro, Sharma V. Thankachan

    Abstract: We study the problem of designing a data structure that reports the positions of the distinct $τ$-majorities within any range of an array $A[1,n]$, without storing $A$. A $τ$-majority in a range $A[i,j]$, for $0<τ< 1$, is an element that occurs more than $τ(j-i+1)$ times in $A[i,j]$. We show that $Ω(n\log(1/τ))$ bits are necessary for any data structure able just to count the number of distinct…

    Submitted 3 October, 2014; v1 submitted 9 April, 2014; originally announced April 2014.

  12. arXiv:1207.2632  [pdf, other]

    cs.DS

    On Optimal Top-K String Retrieval

    Authors: Rahul Shah, Cheng Sheng, Sharma V. Thankachan, Jeffrey Scott Vitter

    Abstract: Let $\mathcal{D} = \{d_1, d_2, d_3, \ldots, d_D\}$ be a given set of $D$ (string) documents of total length $n$. The top-$k$ document retrieval problem is to index $\mathcal{D}$ such that when a pattern $P$ of length $p$, and a parameter $k$ come as a query, the index returns the $k$ most relevant documents to the pattern $P$. Hon et al. \cite{HSV09} gave the first linear space framework to solve this p…

    Submitted 17 November, 2012; v1 submitted 11 July, 2012; originally announced July 2012.

    Comments: 3 figures

  13. arXiv:1108.0554  [pdf, ps, other]

    cs.DS

    Towards an Optimal Space-and-Query-Time Index for Top-k Document Retrieval

    Authors: Wing-Kai Hon, Rahul Shah, Sharma V. Thankachan

    Abstract: Let $\mathcal{D} = \{d_1, d_2, \ldots, d_D\}$ be a given set of $D$ string documents of total length $n$. Our task is to index $\mathcal{D}$ such that the $k$ most relevant documents for an online query pattern $P$ of length $p$ can be retrieved efficiently. We propose an index of size $|CSA|+n\log D(2+o(1))$ bits and $O(t_{s}(p)+k\log\log n+poly\log\log n)$ query time for the basic relevance metric \emph{term-frequen…

    Submitted 30 March, 2012; v1 submitted 2 August, 2011; originally announced August 2011.

    Comments: 12 pages

  14. arXiv:1007.5110  [pdf, other]

    cs.DB cs.DS

    Fully Dynamic Data Structure for Top-k Queries on Uncertain Data

    Authors: Manish Patil, Rahul Shah, Sharma V. Thankachan

    Abstract: Top-$k$ queries allow end-users to focus on the most important (top-$k$) answers amongst those which satisfy the query. In traditional databases, a user-defined score function assigns a score value to each tuple and a top-$k$ query returns the $k$ tuples with the highest score. In an uncertain database, the top-$k$ answer depends not only on the scores but also on the membership probabilities of tuples. Seve…

    Submitted 29 July, 2010; originally announced July 2010.