-
An End-to-End Coding Scheme for DNA-Based Data Storage With Nanopore-Sequenced Reads
Authors:
Lorenz Welter,
Roman Sokolovskii,
Thomas Heinis,
Antonia Wachter-Zeh,
Eirik Rosnes,
Alexandre Graell i Amat
Abstract:
We consider error-correcting coding for deoxyribonucleic acid (DNA)-based storage using nanopore sequencing. We model the DNA storage channel as a sampling noise channel where the input data is chunked into $M$ short DNA strands, which are copied a random number of times, and the channel outputs a random selection of $N$ noisy DNA strands. The retrieved DNA reads are prone to strand-dependent insertion, deletion, and substitution (IDS) errors. We construct an index-based concatenated coding scheme consisting of the concatenation of an outer code, an index code, and an inner code. We further propose a low-complexity (linear in $N$) maximum a posteriori probability decoder that takes into account the strand-dependent IDS errors and the randomness of the drawing to infer symbolwise a posteriori probabilities for the outer decoder. We present Monte-Carlo simulations for information-outage probabilities and frame error rates for different channel setups on experimental data. We finally evaluate the overall system performance using the read/write cost trade-off. A powerful combination of tailored channel modeling and soft information processing allows us to achieve excellent performance even with error-prone nanopore-sequenced reads, outperforming state-of-the-art schemes.
Submitted 18 June, 2024;
originally announced June 2024.
-
Coding Over Coupon Collector Channels for Combinatorial Motif-Based DNA Storage
Authors:
Roman Sokolovskii,
Parv Agarwal,
Luis Alberto Croquevielle,
Zijian Zhou,
Thomas Heinis
Abstract:
Encoding information in combinations of pre-synthesised deoxyribonucleic acid (DNA) strands (referred to as motifs) is an interesting approach to DNA storage that could potentially circumvent the prohibitive costs of nucleotide-by-nucleotide DNA synthesis. Based on our analysis of an empirical data set from HelixWorks, we propose two channel models for this setup (with and without interference) and analyse their fundamental limits. We propose a coding scheme that approaches those limits by leveraging all information available at the output of the channel, in contrast to earlier schemes developed for a similar setup by Preuss et al. We highlight an important connection between channel capacity curves and the fundamental trade-off between synthesis (writing) and sequencing (reading), and offer a way to mitigate an exponential growth in decoding complexity with the size of the motif library.
Submitted 13 June, 2024; v1 submitted 6 June, 2024;
originally announced June 2024.
-
Finite-Length Scaling of SC-LDPC Codes With a Limited Number of Decoding Iterations
Authors:
Roman Sokolovskii,
Alexandre Graell i Amat,
Fredrik Brännström
Abstract:
We propose four finite-length scaling laws to predict the frame error rate (FER) performance of spatially-coupled low-density parity-check codes under full belief propagation (BP) decoding with a limit on the number of decoding iterations and a scaling law for sliding window decoding, also with limited iterations. The laws for full BP decoding provide a choice between accuracy and computational complexity; a good balance between them is achieved by the law that models the number of decoded bits after a certain number of BP iterations by a time-integrated Ornstein-Uhlenbeck process. This framework is developed further to model sliding window decoding as a race between the integrated Ornstein-Uhlenbeck process and an absorbing barrier that corresponds to the left boundary of the sliding window. The proposed scaling laws yield accurate FER predictions.
Submitted 16 March, 2022;
originally announced March 2022.
-
On Doped SC-LDPC Codes for Streaming
Authors:
Roman Sokolovskii,
Alexandre Graell i Amat,
Fredrik Brännström
Abstract:
In streaming applications, doping improves the performance of spatially-coupled low-density parity-check (SC-LDPC) codes by creating reduced-degree check nodes in the coupled chain. We formulate a scaling law to predict the bit and block error rate of periodically-doped semi-infinite SC-LDPC code ensembles streamed over the binary erasure channel under sliding window decoding for a given finite component block length. The scaling law assumes that with some probability doping is equivalent to full termination and triggers two decoding waves; otherwise, decoding performs as if the coupled chain had not been doped at all. We approximate that probability and use the derived scaling laws to predict the error rates of SC-LDPC code ensembles in the presence of doping. The proposed scaling law provides accurate error rate predictions. We further use it to show that in streaming applications periodic doping can yield higher rates than periodic full termination for the same error-correcting performance.
Submitted 22 April, 2021;
originally announced April 2021.
-
Finite-Length Scaling of Spatially Coupled LDPC Codes Under Window Decoding Over the BEC
Authors:
Roman Sokolovskii,
Alexandre Graell i Amat,
Fredrik Brännström
Abstract:
We analyze the finite-length performance of spatially coupled low-density parity-check (SC-LDPC) codes under window decoding over the binary erasure channel. In particular, we propose a refinement of the scaling law by Olmos and Urbanke for the frame error rate (FER) of terminated SC-LDPC ensembles under full belief propagation (BP) decoding. The refined scaling law models the decoding process as two independent Ornstein-Uhlenbeck processes, in correspondence to the two decoding waves that propagate toward the center of the coupled chain for terminated SC-LDPC codes. We then extend the proposed scaling law to predict the performance of (terminated) SC-LDPC code ensembles under the more practical sliding window decoding. Finally, we extend this framework to predict the bit error rate (BER) and block error rate (BLER) of SC-LDPC code ensembles. The proposed scaling law yields very accurate predictions of the FER, BLER, and BER for both full BP and window decoding.
Submitted 25 August, 2020; v1 submitted 9 March, 2020;
originally announced March 2020.
-
Survey of Information Encoding Techniques for DNA
Authors:
Thomas Heinis,
Roman Sokolovskii,
Jamie J. Alnasir
Abstract:
The yearly global production of data is growing exponentially, outpacing the capacity of existing storage media, such as tape and disk, and surpassing our ability to store it. DNA storage - the representation of arbitrary information as sequences of nucleotides - offers a promising storage medium. DNA is nature's information-storage molecule of choice and has a number of key properties: it is extremely dense, offering the theoretical possibility of storing 455 EB/g; it is durable, with a half-life of approximately 520 years that can be increased to thousands of years when DNA is chilled and stored dry; and it is amenable to automated synthesis and sequencing. Furthermore, biochemical processes that act on DNA potentially enable highly parallel data manipulation.
Whilst biological information is encoded in DNA via a specific mapping from triplet sequences of nucleotides to amino acids, DNA storage is not limited to a single encoding scheme, and there are many possible ways to map data to chemical sequences of nucleotides for synthesis, storage, retrieval and data manipulation. However, there are several biological, error-tolerance and information-retrieval considerations that an encoding scheme needs to address to be viable.
This comprehensive review focuses on comparing existing work done in encoding arbitrary data within DNA in terms of their encoding schemes, methods to address biological constraints and measures to provide error correction. We compare encoding approaches on the overall information density and coverage they achieve, as well as the data-retrieval method they use (i.e., sequential or random access). We also discuss the background and evolution of the encoding schemes.
Submitted 2 October, 2023; v1 submitted 24 June, 2019;
originally announced June 2019.
-
A Refined Scaling Law for Spatially Coupled LDPC Codes Over the Binary Erasure Channel
Authors:
Roman Sokolovskii,
Fredrik Brännström,
Alexandre Graell i Amat
Abstract:
We propose a refined scaling law to predict the finite-length performance in the waterfall region of spatially coupled low-density parity-check codes over the binary erasure channel. In particular, we introduce some improvements to the scaling law proposed by Olmos and Urbanke that result in a better agreement between the predicted and simulated frame error rate. We also show how the scaling law can be extended to predict the bit error rate performance.
Submitted 2 July, 2019; v1 submitted 23 April, 2019;
originally announced April 2019.
-
A possible new phase of antagonistic nematogens in a disorienting field
Authors:
T. G. Sokolovska,
M. E. Cates,
R. O. Sokolovskii
Abstract:
A simple model is proposed for nematogenic molecules that favor perpendicular orientations as well as parallel ones. (Charged rods, for example, show this antagonistic tendency.) When a small disorienting field is applied along $z$, a low density phase $N_-$ of nematic order parameter $S_z<0$ coexists with a dense biaxial nematic $N_b$. (At zero field, $N_-$ becomes isotropic and $N_b$ uniaxial.) But at stronger fields, a new phase $N_{+4}$, invariant under $π/2$ rotations around the field axis, appears in between $N_-$ and $N_b$. Prospects for finding the $N_{+4}$ phase experimentally are briefly discussed.
Submitted 12 May, 2003;
originally announced May 2003.
-
Model fluid in a porous medium: results for a Bethe lattice
Authors:
R. O. Sokolovskii,
M. E. Cates,
T. G. Sokolovska
Abstract:
We consider a lattice gas with quenched impurities or `quenched-annealed binary mixture' on the Bethe lattice. The quenched part represents a porous matrix in which the (annealed) lattice gas resides. This model features the 3 main factors of fluids in random porous media: wetting, randomness and confinement. The recursive character of the Bethe lattice enables an exact treatment, whose key ingredient is an integral equation yielding the one-particle effective field distribution. Our analysis shows that this distribution consists of two essentially different parts. The first is a continuous spectrum and corresponds to the macroscopic volume accessible to the fluid; the second is discrete and comes from finite closed cavities in the porous medium. Those closed cavities are in equilibrium with the bulk fluid within the grand canonical ensemble we use, but are inaccessible in real experimental situations. Fortunately, we are able to isolate their contributions. Separation of the discrete spectrum also facilitates the numerical solution of the main equation. The numerical calculations show that the continuous spectrum becomes increasingly rough as the temperature decreases, and this limits the accuracy of the solution at low temperatures.
Submitted 6 May, 2003;
originally announced May 2003.
-
Nonlinear interference effects in emission, absorption, and generation spectra
Authors:
T. Ya. Popova,
A. K. Popov,
S. G. Rautian,
R. I. Sokolovskii
Abstract:
Nonlinear effects in emission and absorption spectra of gaseous systems are considered. It is shown that level splitting can be detected spectroscopically even if it is below the Doppler width. Conditions for distinguishing interference effects from those due to a nonequilibrium velocity distribution are determined. In the case of large Doppler broadening the correction for atomic motion is equivalent to the substitution of an "effective immobile atom" for the moving atom ensemble. The spectral manifestation of nonlinear effects is analyzed in detail. The influence of nonlinear interference effects on the generation characteristics in the presence of an external field is investigated.
Submitted 23 May, 2000;
originally announced May 2000.
-
The effect of an external magnetic field on the gas-liquid transition in the Ising spin fluid
Authors:
R. O. Sokolovskii
Abstract:
The theoretical phase diagrams of the magnetic (Ising) lattice fluid in an external magnetic field are presented. It is shown that, depending on the strength of the nonmagnetic interaction between particles, various effects of the external field on the Ising fluid take place. In particular, at moderate values of the nonmagnetic attraction the field effect on the gas-liquid critical temperature is nonmonotonic. A justification of such behavior is given. If short-range correlations are taken into account (within a cluster approach), the Curie temperature also depends on the nonmagnetic interaction.
Submitted 5 January, 1999;
originally announced January 1999.
-
The effect of an external magnetic field on the gas-liquid transition in the Heisenberg spin fluid
Authors:
T. G. Sokolovska,
R. O. Sokolovskii
Abstract:
We present the theoretical phase diagrams of the classical Heisenberg fluid in an external magnetic field. A consistent account of correlations is carried out by the integral equation method. A nonmonotonic effect of the field on the temperature of the gas-liquid critical point is found. Within the mean spherical approximation this nonmonotonic behavior disappears for sufficiently short-range spin-spin interactions.
Submitted 20 October, 1998;
originally announced October 1998.
-
Relaxation dynamics of disordered Ising model. Two-site cluster approximation
Authors:
R. R. Levitskii,
S. I. Sorokov,
R. O. Sokolovskii
Abstract:
Spin relaxation in a site-disordered Ising model is studied within the master equation approach. The $\vec{q},ω$-dependent susceptibility of the model is calculated and investigated. Effects described by the two-site cluster approximation and lost by the mean field approximation are discussed. A comparison of the obtained results with dielectric measurements in $Cs(H_{1-x}D_x)_2PO_4$ is presented.
Submitted 18 June, 1996;
originally announced June 1996.