Search | arXiv e-print repository

Optimal Almost-Balanced Sequences

Authors: Daniella Bar-Lev, Adir Kobovich, Orian Leitersdorf, Eitan Yaakobi

Abstract: This paper presents a novel approach to address the constrained coding challenge of generating almost-balanced sequences. While strictly balanced sequences have been well studied in the past, the problem of designing efficient algorithms with small redundancy, preferably constant or even a single bit, for almost balanced sequences has remained unsolved. A sequence is $\varepsilon(n)$-almost balanc… ▽ More This paper presents a novel approach to address the constrained coding challenge of generating almost-balanced sequences. While strictly balanced sequences have been well studied in the past, the problem of designing efficient algorithms with small redundancy, preferably constant or even a single bit, for almost balanced sequences has remained unsolved. A sequence is $\varepsilon(n)$-almost balanced if its Hamming weight is between $0.5n\pm \varepsilon(n)$. It is known that for any algorithm with a constant number of bits, $\varepsilon(n)$ has to be in the order of $Θ(\sqrt{n})$, with $O(n)$ average time complexity. However, prior solutions with a single redundancy bit required $\varepsilon(n)$ to be a linear shift from $n/2$. Employing an iterative method and arithmetic coding, our emphasis lies in constructing almost balanced codes with a single redundancy bit. Notably, our method surpasses previous approaches by achieving the optimal balanced order of $Θ(\sqrt{n})$. Additionally, we extend our method to the non-binary case considering $q$-ary almost polarity-balanced sequences for even $q$, and almost symbol-balanced for $q=4$. Our work marks the first asymptotically optimal solutions for almost-balanced sequences, for both, binary and non-binary alphabet. △ Less

Submitted 14 May, 2024; originally announced May 2024.

Comments: Accepted to The IEEE International Symposium on Information Theory (ISIT) 2024

arXiv:2405.08475 [pdf, ps, other]

Representing Information on DNA using Patterns Induced by Enzymatic Labeling

Authors: Daniella Bar-Lev, Tuvi Etzion, Eitan Yaakobi, Zohar Yakhini

Abstract: Enzymatic DNA labeling is a powerful tool with applications in biochemistry, molecular biology, biotechnology, medical science, and genomic research. This paper contributes to the evolving field of DNA-based data storage by presenting a formal framework for modeling DNA labeling in strings, specifically tailored for data storage purposes. Our approach involves a known DNA molecule as a template fo… ▽ More Enzymatic DNA labeling is a powerful tool with applications in biochemistry, molecular biology, biotechnology, medical science, and genomic research. This paper contributes to the evolving field of DNA-based data storage by presenting a formal framework for modeling DNA labeling in strings, specifically tailored for data storage purposes. Our approach involves a known DNA molecule as a template for labeling, employing patterns induced by a set of designed labels to represent information. One hypothetical implementation can use CRISPR-Cas9 and gRNA reagents for labeling. Various aspects of the general labeling channel, including fixed-length labels, are explored, and upper bounds on the maximal size of the corresponding codes are given. The study includes the development of an efficient encoder-decoder pair that is proven optimal in terms of maximum code size under specific conditions. △ Less

Submitted 14 May, 2024; originally announced May 2024.

Comments: Accepted to The IEEE International Symposium on Information Theory (ISIT) 2024

arXiv:2401.15722 [pdf, ps, other]

Reducing Coverage Depth in DNA Storage: A Combinatorial Perspective on Random Access Efficiency

Authors: Anina Gruica, Daniella Bar-Lev, Alberto Ravagnani, Eitan Yaakobi

Abstract: We investigate the fundamental limits of the recently proposed random access coverage depth problem for DNA data storage. Under this paradigm, it is assumed that the user information consists of $k$ information strands, which are encoded into $n$ strands via some generator matrix $G$. In the sequencing process, the strands are read uniformly at random, since each strand is available in a large num… ▽ More We investigate the fundamental limits of the recently proposed random access coverage depth problem for DNA data storage. Under this paradigm, it is assumed that the user information consists of $k$ information strands, which are encoded into $n$ strands via some generator matrix $G$. In the sequencing process, the strands are read uniformly at random, since each strand is available in a large number of copies. In this context, the random access coverage depth problem refers to the expected number of reads (i.e., sequenced strands) until it is possible to decode a specific information strand, which is requested by the user. The goal is to minimize the maximum expectation over all possible requested information strands, and this value is denoted by $T_{\max}(G)$. This paper introduces new techniques to investigate the random access coverage depth problem, which capture its combinatorial nature. We establish two general formulas to find $T_{max}(G)$ for arbitrary matrices. We introduce the concept of recovery balanced codes and combine all these results and notions to compute $T_{\max}(G)$ for MDS, simplex, and Hamming codes. We also study the performance of modified systematic MDS matrices and our results show that the best results for $T_{\max}(G)$ are achieved with a specific mix of encoded strands and replication of the information strands. △ Less

Submitted 28 January, 2024; originally announced January 2024.

arXiv:2305.07992 [pdf, ps, other]

On the Capacity of DNA Labeling

Authors: Dganit Hanania, Daniella Bar-Lev, Yevgeni Nogin, Yoav Shechtman, Eitan Yaakobi

Abstract: DNA labeling is a powerful tool in molecular biology and biotechnology that allows for the visualization, detection, and study of DNA at the molecular level. Under this paradigm, a DNA molecule is being labeled by specific k patterns and is then imaged. Then, the resulted image is modeled as a (k + 1)- ary sequence in which any non-zero symbol indicates on the appearance of the corresponding label… ▽ More DNA labeling is a powerful tool in molecular biology and biotechnology that allows for the visualization, detection, and study of DNA at the molecular level. Under this paradigm, a DNA molecule is being labeled by specific k patterns and is then imaged. Then, the resulted image is modeled as a (k + 1)- ary sequence in which any non-zero symbol indicates on the appearance of the corresponding label in the DNA molecule. The primary goal of this work is to study the labeling capacity, which is defined as the maximal information rate that can be obtained using this labeling process. The labeling capacity is computed for any single label and several results are provided for multiple labels as well. Moreover, we provide the optimal minimal number of labels of length one or two that are needed in order to gain labeling capacity of 2. △ Less

Submitted 22 January, 2024; v1 submitted 13 May, 2023; originally announced May 2023.

arXiv:2305.05972 [pdf, other]

Coding for IBLTs with Listing Guarantees

Authors: Daniella Bar-Lev, Avi Mizrahi, Tuvi Etzion, Ori Rottenstreich, Eitan Yaakobi

Abstract: The Invertible Bloom Lookup Table (IBLT) is a probabilistic data structure for set representation, with applications in network and traffic monitoring. It is known for its ability to list its elements, an operation that succeeds with high probability for sufficiently large table. However, listing can fail even for relatively small sets. This paper extends recent work on the worst-case analysis of… ▽ More The Invertible Bloom Lookup Table (IBLT) is a probabilistic data structure for set representation, with applications in network and traffic monitoring. It is known for its ability to list its elements, an operation that succeeds with high probability for sufficiently large table. However, listing can fail even for relatively small sets. This paper extends recent work on the worst-case analysis of IBLT, which guarantees successful listing for all sets of a certain size, by introducing more general IBLT schemes. These schemes allow for greater freedom in the implementation of the insert, delete, and listing operations and demonstrate that the IBLT memory can be reduced while still maintaining successful listing guarantees. The paper also explores the time-memory trade-off of these schemes, some of which are based on linear codes and $B_h$-sequences over finite fields. △ Less

Submitted 10 May, 2023; originally announced May 2023.

arXiv:2305.05656 [pdf, other]

Cover Your Bases: How to Minimize the Sequencing Coverage in DNA Storage Systems

Authors: Daniella Bar-Lev, Omer Sabary, Ryan Gabrys, Eitan Yaakobi

Abstract: Although the expenses associated with DNA sequencing have been rapidly decreasing, the current cost of sequencing information stands at roughly $120/GB, which is dramatically more expensive than reading from existing archival storage solutions today. In this work, we aim to reduce not only the cost but also the latency of DNA storage by initiating the study of the DNA coverage depth problem, which… ▽ More Although the expenses associated with DNA sequencing have been rapidly decreasing, the current cost of sequencing information stands at roughly $120/GB, which is dramatically more expensive than reading from existing archival storage solutions today. In this work, we aim to reduce not only the cost but also the latency of DNA storage by initiating the study of the DNA coverage depth problem, which aims to reduce the required number of reads to retrieve information from the storage system. Under this framework, our main goal is to understand the effect of error-correcting codes and retrieval algorithms on the required sequencing coverage depth. We establish that the expected number of reads that are required for information retrieval is minimized when the channel follows a uniform distribution. We also derive upper and lower bounds on the probability distribution of this number of required reads and provide a comprehensive upper and lower bound on its expected value. We further prove that for a noiseless channel and uniform distribution, MDS codes are optimal in terms of minimizing the expected number of reads. Additionally, we study the DNA coverage depth problem under the random-access setup, in which the user aims to retrieve just a specific information unit from the entire DNA storage system. We prove that the expected retrieval time is at least k for [n,k] MDS codes as well as for other families of codes. Furthermore, we present explicit code constructions that achieve expected retrieval times below k and evaluate their performance through analytical methods and simulations. Lastly, we provide lower bounds on the maximum expected retrieval time. Our findings offer valuable insights for reducing the cost and latency of DNA storage. △ Less

Submitted 29 November, 2023; v1 submitted 9 May, 2023; originally announced May 2023.

arXiv:2304.10391 [pdf, other]

DNA-Correcting Codes: End-to-end Correction in DNA Storage Systems

Authors: Avital Boruchovsky, Daniella Bar-Lev, Eitan Yaakobi

Abstract: This paper introduces a new solution to DNA storage that integrates all three steps of retrieval, namely clustering, reconstruction, and error correction. DNA-correcting codes are presented as a unique solution to the problem of ensuring that the output of the storage system is unique for any valid set of input strands. To this end, we introduce a novel distance metric to capture the unique behavi… ▽ More This paper introduces a new solution to DNA storage that integrates all three steps of retrieval, namely clustering, reconstruction, and error correction. DNA-correcting codes are presented as a unique solution to the problem of ensuring that the output of the storage system is unique for any valid set of input strands. To this end, we introduce a novel distance metric to capture the unique behavior of the DNA storage system and provide necessary and sufficient conditions for DNA-correcting codes. The paper also includes several bounds and constructions of DNA-correcting codes. △ Less

Submitted 30 June, 2024; v1 submitted 20 April, 2023; originally announced April 2023.

Comments: Extended version of the paper that appeared in ISIT 2023

arXiv:2304.01317 [pdf, other]

Universal Framework for Parametric Constrained Coding

Authors: Daniella Bar-Lev, Adir Kobovich, Orian Leitersdorf, Eitan Yaakobi

Abstract: Constrained coding is a fundamental field in coding theory that tackles efficient communication through constrained channels. While channels with fixed constraints have a general optimal solution, there is increasing demand for parametric constraints that are dependent on the message length. Several works have tackled such parametric constraints through iterative algorithms, yet they require compl… ▽ More Constrained coding is a fundamental field in coding theory that tackles efficient communication through constrained channels. While channels with fixed constraints have a general optimal solution, there is increasing demand for parametric constraints that are dependent on the message length. Several works have tackled such parametric constraints through iterative algorithms, yet they require complex constructions specific to each constraint to guarantee convergence through monotonic progression. In this paper, we propose a universal framework for tackling any parametric constrained-channel problem through a novel simple iterative algorithm. By reducing an execution of this iterative algorithm to an acyclic graph traversal, we prove a surprising result that guarantees convergence with efficient average time complexity even without requiring any monotonic progression. We demonstrate the effectiveness of this universal framework by applying it to a variety of both local and global channel constraints. We begin by exploring the local constraints involving illegal substrings of variable length, where the universal construction essentially iteratively replaces forbidden windows. We apply this local algorithm to the minimal periodicity, minimal Hamming weight, local almost-balanced Hamming weight and the previously-unsolved minimal palindrome constraints. We then continue by exploring global constraints, and demonstrate the effectiveness of the proposed construction on the repeat-free encoding, reverse-complement encoding, and the open problem of global almost-balanced encoding. For reverse-complement, we also tackle a previously-unsolved version of the constraint that addresses overlap** windows. Overall, the proposed framework generates state-of-the-art constructions with significant ease while also enabling the simultaneous integration of multiple constraints for the first time. △ Less

Submitted 3 April, 2023; originally announced April 2023.

arXiv:2212.13812 [pdf, other]

Invertible Bloom Lookup Tables with Listing Guarantees

Authors: Avi Mizrahi, Daniella Bar-Lev, Eitan Yaakobi, Ori Rottenstreich

Abstract: The Invertible Bloom Lookup Table (IBLT) is a probabilistic concise data structure for set representation that supports a listing operation as the recovery of the elements in the represented set. Its applications can be found in network synchronization and traffic monitoring as well as in error-correction codes. IBLT can list its elements with probability affected by the size of the allocated memo… ▽ More The Invertible Bloom Lookup Table (IBLT) is a probabilistic concise data structure for set representation that supports a listing operation as the recovery of the elements in the represented set. Its applications can be found in network synchronization and traffic monitoring as well as in error-correction codes. IBLT can list its elements with probability affected by the size of the allocated memory and the size of the represented set, such that it can fail with small probability even for relatively small sets. While previous works only studied the failure probability of IBLT, this work initiates the worst case analysis of IBLT that guarantees successful listing for all sets of a certain size. The worst case study is important since the failure of IBLT imposes high overhead. We describe a novel approach that guarantees successful listing when the set satisfies a tunable upper bound on its size. To allow that, we develop multiple constructions that are based on various coding techniques such as stop** sets and the stop** redundancy of error-correcting codes, Steiner systems, and covering arrays as well as new methodologies we develop. We analyze the sizes of IBLTs with listing guarantees obtained by the various methods as well as their map** memory consumption. Lastly, we study lower bounds on the achievable sizes of IBLT with listing guarantees and verify the results in the paper by simulations. △ Less

Submitted 28 December, 2022; originally announced December 2022.

arXiv:2210.04471 [pdf, ps, other]

doi 10.1109/TIT.2023.3269124

Generalized Unique Reconstruction from Substrings

Authors: Yonatan Yehezkeally, Daniella Bar-Lev, Sagi Marcovich, Eitan Yaakobi

Abstract: This paper introduces a new family of reconstruction codes which is motivated by applications in DNA data storage and sequencing. In such applications, DNA strands are sequenced by reading some subset of their substrings. While previous works considered two extreme cases in which all substrings of pre-defined lengths are read or substrings are read with no overlap for the single string case, this… ▽ More This paper introduces a new family of reconstruction codes which is motivated by applications in DNA data storage and sequencing. In such applications, DNA strands are sequenced by reading some subset of their substrings. While previous works considered two extreme cases in which all substrings of pre-defined lengths are read or substrings are read with no overlap for the single string case, this work studies two extensions of this paradigm. The first extension considers the setup in which consecutive substrings are read with some given minimum overlap. First, an upper bound is provided on the attainable rates of codes that guarantee unique reconstruction. Then, efficient constructions of codes that asymptotically meet that upper bound are presented. In the second extension, we study the setup where multiple strings are reconstructed together. Given the number of strings and their length, we first derive a lower bound on the read substrings' length $\ell$ that is necessary for the existence of multi-strand reconstruction codes with non-vanishing rates. We then present two constructions of such codes and show that their rates approach 1 for values of $\ell$ that asymptotically behave like the lower bound. △ Less

Submitted 20 April, 2023; v1 submitted 10 October, 2022; originally announced October 2022.

Comments: Author-submitted, peer-reviewed and accepted version (IEEE Trans. on Inform. Theory). arXiv admin note: text overlap with arXiv:2205.03933

arXiv:2206.07995 [pdf, ps, other]

On the Size of Balls and Anticodes of Small Diameter under the Fixed-Length Levenshtein Metric

Authors: Daniella Bar-Lev, Tuvi Etzion, Eitan Yaakobi

Abstract: The rapid development of DNA storage has brought the deletion and insertion channel to the front line of research. When the number of deletions is equal to the number of insertions, the Fixed Length Levenshtein (FLL) metric is the right measure for the distance between two words of the same length. Similar to any other metric, the size of a ball is one of the most fundamental parameters. In this w… ▽ More The rapid development of DNA storage has brought the deletion and insertion channel to the front line of research. When the number of deletions is equal to the number of insertions, the Fixed Length Levenshtein (FLL) metric is the right measure for the distance between two words of the same length. Similar to any other metric, the size of a ball is one of the most fundamental parameters. In this work, we consider the minimum, maximum, and average size of a ball with radius one, in the FLL metric. The related minimum and the maximum size of a maximal anticode with diameter one are also considered. △ Less

Submitted 16 June, 2022; originally announced June 2022.

Comments: arXiv admin note: text overlap with arXiv:2103.01681

arXiv:2205.03933 [pdf, ps, other]

Reconstruction from Substrings with Partial Overlap

Authors: Yonatan Yehezkeally, Daniella Bar-Lev, Sagi Marcovich, Eitan Yaakobi

Abstract: This paper introduces a new family of reconstruction codes which is motivated by applications in DNA data storage and sequencing. In such applications, DNA strands are sequenced by reading some subset of their substrings. While previous works considered two extreme cases in which \emph{all} substrings of some fixed length are read or substrings are read with no overlap, this work considers the set… ▽ More This paper introduces a new family of reconstruction codes which is motivated by applications in DNA data storage and sequencing. In such applications, DNA strands are sequenced by reading some subset of their substrings. While previous works considered two extreme cases in which \emph{all} substrings of some fixed length are read or substrings are read with no overlap, this work considers the setup in which consecutive substrings are read with some given minimum overlap. First, upper bounds are provided on the attainable rates of codes that guarantee unique reconstruction. Then, we present efficient constructions of asymptotically optimal codes that meet the upper bound. △ Less

Submitted 8 May, 2022; originally announced May 2022.

Comments: 6 pages, 2 figures; conference submission

arXiv:2205.03911 [pdf, other]

Codes for Constrained Periodicity

Authors: Adir Kobovich, Orian Leitersdorf, Daniella Bar-Lev, Eitan Yaakobi

Abstract: Reliability is an inherent challenge for the emerging nonvolatile technology of racetrack memories, and there exists a fundamental relationship between codes designed for racetrack memories and codes with constrained periodicity. Previous works have sought to construct codes that avoid periodicity in windows, yet have either only provided existence proofs or required high redundancy. This paper pr… ▽ More Reliability is an inherent challenge for the emerging nonvolatile technology of racetrack memories, and there exists a fundamental relationship between codes designed for racetrack memories and codes with constrained periodicity. Previous works have sought to construct codes that avoid periodicity in windows, yet have either only provided existence proofs or required high redundancy. This paper provides the first constructions for avoiding periodicity that are both efficient (average-linear time) and with low redundancy (near the lower bound). The proposed algorithms are based on iteratively repairing windows which contain periodicity until all the windows are valid. Intuitively, such algorithms should not converge as there is no monotonic progression; yet, we prove convergence with average-linear time complexity by exploiting subtle properties of the encoder. Overall, we both provide constructions that avoid periodicity in all windows, and we also study the cardinality of such constraints. △ Less

Submitted 25 August, 2022; v1 submitted 8 May, 2022; originally announced May 2022.

Comments: Accepted to The International Symposium on Information Theory and Its Applications (ISITA) 2022

arXiv:2202.03024 [pdf, other]

The Input and Output Entropies of the $k$-Deletion/Insertion Channel

Authors: Shubhransh Singhvi, Omer Sabary, Daniella Bar-Lev, Eitan Yaakobi

Abstract: The channel output entropy of a transmitted word is the entropy of the possible channel outputs and similarly, the input entropy of a received word is the entropy of all possible transmitted words. The goal of this work is to study these entropy values for the k-deletion, k-insertion channel, where exactly k symbols are deleted, and inserted in the transmitted word, respectively. If all possible w… ▽ More The channel output entropy of a transmitted word is the entropy of the possible channel outputs and similarly, the input entropy of a received word is the entropy of all possible transmitted words. The goal of this work is to study these entropy values for the k-deletion, k-insertion channel, where exactly k symbols are deleted, and inserted in the transmitted word, respectively. If all possible words are transmitted with the same probability then studying the input and output entropies is equivalent. For both the 1-insertion and 1-deletion channels, it is proved that among all words with a fixed number of runs, the input entropy is minimized for words with a skewed distribution of their run lengths and it is maximized for words with a balanced distribution of their run lengths. Among our results, we establish a conjecture by Atashpendar et al. which claims that for the binary 1-deletion, the input entropy is maximized for the alternating words. This conjecture is also verified for the 2-deletion channel, where it is proved that constant words with a single run minimize the input entropy. △ Less

Submitted 15 June, 2022; v1 submitted 7 February, 2022; originally announced February 2022.

arXiv:2201.11150 [pdf, ps, other]

doi 10.1109/TIT.2023.3292895

Adversarial Torn-paper Codes

Authors: Daniella Bar-Lev, Sagi Marcovich, Eitan Yaakobi, Yonatan Yehezkeally

Abstract: We study the adversarial torn-paper channel. This problem is motivated by applications in DNA data storage where the DNA strands that carry information may break into smaller pieces which are received out of order. Our model extends the previously researched probabilistic setting to the worst-case. We develop code constructions for any parameters of the channel for which non-vanishing asymptotic r… ▽ More We study the adversarial torn-paper channel. This problem is motivated by applications in DNA data storage where the DNA strands that carry information may break into smaller pieces which are received out of order. Our model extends the previously researched probabilistic setting to the worst-case. We develop code constructions for any parameters of the channel for which non-vanishing asymptotic rate is possible and show our constructions achieve asymptotically optimal rate while allowing for efficient encoding and decoding. Finally, we extend our results to related settings included multi-strand storage, presence of substitution errors, or incomplete coverage. △ Less

Submitted 4 July, 2023; v1 submitted 26 January, 2022; originally announced January 2022.

Comments: Author submitted, peer-reviewed version

arXiv:2201.02466 [pdf, other]

On The Decoding Error Weight of One or Two Deletion Channels

Authors: Omer Sabary, Daniella Bar-Lev, Yotam Gershon, Alexander Yucovich, Eitan Yaakobi

Abstract: This paper tackles two problems that are relevant to coding for insertions and deletions. These problems are motivated by several applications, among them is reconstructing strands in DNA-based storage systems. Under this paradigm, a word is transmitted over some fixed number of identical independent channels and the goal of the decoder is to output the transmitted word or some close approximation… ▽ More This paper tackles two problems that are relevant to coding for insertions and deletions. These problems are motivated by several applications, among them is reconstructing strands in DNA-based storage systems. Under this paradigm, a word is transmitted over some fixed number of identical independent channels and the goal of the decoder is to output the transmitted word or some close approximation of it. The first part of this paper studies the deletion channel that deletes a symbol with some fixed probability $p$, while focusing on two instances of this channel. Since operating the maximum likelihood (ML) decoder in this case is computationally unfeasible, we study a slightly degraded version of this decoder for two channels and its expected normalized distance. We identify the dominant error patterns and based on these observations, it is derived that the expected normalized distance of the degraded ML decoder is roughly $\frac{3q-1}{q-1}p^2$, when the transmitted word is any $q$-ary sequence and $p$ is the channel's deletion probability. We also study the cases when the transmitted word belongs to the Varshamov Tenengolts (VT) code or the shifted VT code. Additionally, the insertion channel is studied as well as the case of two insertion channels. These theoretical results are verified by corresponding simulations. The second part of the paper studies optimal decoding for a special case of the deletion channel, the $k$-deletion channel, which deletes exactly $k$ symbols of the transmitted word uniformly at random. In this part, the goal is to understand how an optimal decoder operates in order to minimize the expected normalized distance. A full characterization of an efficient optimal decoder for this setup, referred to as the maximum likelihood* (ML*) decoder, is given for a channel that deletes one or two symbols. △ Less

Submitted 7 January, 2022; originally announced January 2022.

Comments: arXiv admin note: text overlap with arXiv:2001.05582

arXiv:2109.00031 [pdf]

Deep DNA Storage: Scalable and Robust DNA Storage via Coding Theory and Deep Learning

Authors: Daniella Bar-Lev, Itai Orr, Omer Sabary, Tuvi Etzion, Eitan Yaakobi

Abstract: DNA-based storage is an emerging technology that enables digital information to be archived in DNA molecules. This method enjoys major advantages over magnetic and optical storage solutions such as exceptional information density, enhanced data durability, and negligible power consumption to maintain data integrity. To access the data, an information retrieval process is employed, where some of th… ▽ More DNA-based storage is an emerging technology that enables digital information to be archived in DNA molecules. This method enjoys major advantages over magnetic and optical storage solutions such as exceptional information density, enhanced data durability, and negligible power consumption to maintain data integrity. To access the data, an information retrieval process is employed, where some of the main bottlenecks are the scalability and accuracy, which have a natural tradeoff between the two. Here we show a modular and holistic approach that combines Deep Neural Networks (DNN) trained on simulated data, Tensor-Product (TP) based Error-Correcting Codes (ECC), and a safety margin mechanism into a single coherent pipeline. We demonstrated our solution on 3.1MB of information using two different sequencing technologies. Our work improves upon the current leading solutions by up to x3200 increase in speed, 40% improvement in accuracy, and offers a code rate of 1.6 bits per base in a high noise regime. In a broader sense, our work shows a viable path to commercial DNA storage solutions hindered by current information retrieval processes. △ Less

Submitted 11 March, 2024; v1 submitted 31 August, 2021; originally announced September 2021.

arXiv:2103.01681 [pdf, ps, other]

On Levenshtein Balls with Radius One

Authors: Daniella Bar-Lev, Tuvi Etzion, Eitan Yaakobi

Abstract: The rapid development of DNA storage has brought the deletion and insertion channel, once again, to the front line of research. When the number of deletions is equal to the number of insertions, the Fixed Length Levenshtein (FLL) metric is the right measure for the distance between two words of the same length. The size of a ball is one of the most fundamental parameters in any metric. The size of… ▽ More The rapid development of DNA storage has brought the deletion and insertion channel, once again, to the front line of research. When the number of deletions is equal to the number of insertions, the Fixed Length Levenshtein (FLL) metric is the right measure for the distance between two words of the same length. The size of a ball is one of the most fundamental parameters in any metric. The size of the ball with radius one in the FLL metric depends on the number of runs and the length of the alternating segments of the given word. In this work, we find the minimum, maximum, and average size of a ball with radius one, in the FLL metric. The related minimum and maximum sizes of a maximal anticode with diameter one are also calculated. △ Less

Submitted 29 June, 2021; v1 submitted 2 March, 2021; originally announced March 2021.

Comments: 6 pages, to be published in 2021 IEEE International Symposium on Information Theory (ISIT)

Showing 1–18 of 18 results for author: Bar-Lev, D