-
Fundamental Limits of Multiple Sequence Reconstruction from Substrings
Authors:
Kel Levick,
Ilan Shomorony
Abstract:
The problem of reconstructing a sequence from the set of its length-$k$ substrings has received considerable attention due to its various applications in genomics. We study an uncoded version of this problem where multiple random sources are to be simultaneously reconstructed from the union of their $k$-mer sets. We consider an asymptotic regime where $m = n^{\alpha}$ i.i.d. source sequences of length $n$ are to be reconstructed from the set of their substrings of length $k = \beta \log n$, and seek to characterize the $(\alpha,\beta)$ pairs for which reconstruction is information-theoretically feasible. We show that, as $n \to \infty$, the source sequences can be reconstructed if $\beta > \max(2\alpha+1, \alpha+2)$ and cannot be reconstructed if $\beta < \max(2\alpha+1, \alpha+\tfrac{3}{2})$, characterizing the feasibility region almost completely. Interestingly, our result shows that there are feasible $(\alpha,\beta)$ pairs where repeats across the source strings abound, and non-trivial reconstruction algorithms are needed to achieve the fundamental limit.
Submitted 9 May, 2023;
originally announced May 2023.
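As an illustration of the two bounds quoted in the abstract above, the following minimal sketch classifies an $(\alpha,\beta)$ pair. The function name `classify_pair` and the three labels are hypothetical, chosen only for this example; the thresholds are copied from the abstract, and pairs lying between the bounds (or exactly on them) are reported as undetermined because the abstract does not resolve them.

```python
def classify_pair(alpha: float, beta: float) -> str:
    """Classify (alpha, beta) using only the bounds stated in the abstract.

    Hypothetical helper for illustration; not taken from the paper.
    """
    # Achievability bound: reconstruction is possible if beta exceeds this.
    achievable = max(2 * alpha + 1, alpha + 2)
    # Converse bound: reconstruction is impossible if beta is below this.
    converse = max(2 * alpha + 1, alpha + 1.5)

    if beta > achievable:
        return "feasible"
    if beta < converse:
        return "infeasible"
    # Between the two bounds (or on a boundary) the abstract gives no answer.
    return "undetermined"


if __name__ == "__main__":
    for pair in [(0.25, 3.0), (0.25, 2.0), (0.25, 1.5), (2.0, 5.5), (2.0, 4.5)]:
        print(pair, classify_pair(*pair))
```

For $\alpha \ge 1$ both bounds equal $2\alpha+1$, so in this sketch the undetermined case only arises for $\alpha < 1$ or exactly on the boundary.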
-
Achieving the Capacity of a DNA Storage Channel with Linear Coding Schemes
Authors:
Kel Levick,
Reinhard Heckel,
Ilan Shomorony
Abstract:
Due to the redundant nature of DNA synthesis and sequencing technologies, a basic model for a DNA storage system is a multi-draw "shuffling-sampling" channel. In this model, a random number of noisy copies of each sequence is observed at the channel output. Recent works have characterized the capacity of such a DNA storage channel under different noise and sequencing models, relying on sophisticated typicality-based approaches for the achievability. Here, we consider a multi-draw DNA storage channel in the setting of noise corruption by a binary erasure channel. We show that, in this setting, the capacity is achieved by linear coding schemes. This leads to a considerably simpler derivation of the capacity expression of a multi-draw DNA storage channel than existing results in the literature.
Submitted 2 December, 2021;
originally announced December 2021.
-
The Twelvefold Way of Non-Sequential Lossless Compression
Authors:
Taha Ameen ur Rahman,
Alton S. Barbehenn,
Xinan Chen,
Hassan Dbouk,
James A. Douglas,
Yuncong Geng,
Ian George,
John B. Harvill,
Sung Woo Jeon,
Kartik K. Kansal,
Kiwook Lee,
Kelly A. Levick,
Bochao Li,
Ziyue Li,
Yashaswini Murthy,
Adarsh Muthuveeru-Subramaniam,
S. Yagiz Olmez,
Matthew J. Tomei,
Tanya Veeravalli,
Xuechao Wang,
Eric A. Wayman,
Fan Wu,
Peng Xu,
Shen Yan,
Heling Zhang
, et al. (5 additional authors not shown)
Abstract:
Many information sources are not just sequences of distinguishable symbols but rather have invariances governed by alternative counting paradigms such as permutations, combinations, and partitions. We consider an entire classification of these invariances called the twelvefold way in enumerative combinatorics and develop a method to characterize lossless compression limits. Explicit computations for all twelve settings are carried out for i.i.d. uniform and Bernoulli distributions. Comparisons among settings provide quantitative insight.
Submitted 20 January, 2021; v1 submitted 8 November, 2020;
originally announced November 2020.
-
A Neural Network Detector for Spectrum Sensing under Uncertainties
Authors:
Ziyu Ye,
Qihang Peng,
Kelly Levick,
Hui Rong,
Andrew Gilman,
Pamela Cosman,
Larry Milstein
Abstract:
Spectrum sensing is of critical importance in any cognitive radio system. When the primary user's signal has uncertain parameters, the likelihood ratio test, which is the theoretically optimal detector, generally has no closed-form expression. As a result, spectrum sensing under parameter uncertainty remains an open question, though many detectors exploiting specific features of a primary signal have been proposed and have achieved reasonably good performance. In this paper, a neural network is trained as a detector for modulated signals. The results show that, by training on an appropriate dataset, the neural network gains robustness to uncertainties in system parameters, including the carrier frequency offset, carrier phase offset, and symbol time offset. This demonstrates the neural network's potential in exploiting implicit and incomplete knowledge about the signal's structure.
Submitted 6 August, 2019; v1 submitted 15 July, 2019;
originally announced July 2019.
-
Comparison of Neural Network Architectures for Spectrum Sensing
Authors:
Ziyu Ye,
Andrew Gilman,
Qihang Peng,
Kelly Levick,
Pamela Cosman,
Larry Milstein
Abstract:
Different neural network (NN) architectures have different advantages. Convolutional neural networks (CNNs) achieved enormous success in computer vision, while recurrent neural networks (RNNs) gained popularity in speech recognition. It is not known which type of NN architecture is the best fit for classification of communication signals. In this work, we compare the behavior of fully-connected NN (FC), CNN, RNN, and bi-directional RNN (BiRNN) in a spectrum sensing task. The four NN architectures are compared on their detection performance, requirement of training data, computational complexity, and memory requirement. Given abundant training data and computational and memory resources, CNN, RNN, and BiRNN are shown to achieve similar performance. The performance of FC is worse than that of the other three types, except in the case where computational complexity is stringently limited.
Submitted 15 July, 2019;
originally announced July 2019.