-
A general method for the development of constrained codes
Authors:
Boris Ryabko
Abstract:
Nowadays there are several classes of constrained codes intended for different applications. The following two large classes can be distinguished. The first class contains codes with local constraints; for example, the source data must be encoded by binary sequences containing no sub-words 00 and 111. The second class contains codes with global constraints; for example, the code-words must be binary sequences of certain even length with half zeros and half ones. It is important to note that often the necessary codes must fulfill some requirements of both classes.
In this paper we propose a general polynomial complexity method for constructing codes for both classes, as well as for combinations thereof. The proposed method uses Cover's enumerative code; the main difference from known applications of this code is that the known algorithms rely on combinatorial formulae, whereas the proposed method calculates all parameters on-the-fly using a polynomial complexity algorithm.
Submitted 23 May, 2024;
originally announced May 2024.
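The local constraint quoted in the abstract above (no sub-words 00 and 111) can be used to illustrate enumerative coding in which the counts come from dynamic programming rather than a closed-form combinatorial formula. The sketch below is only an illustration under that assumption, not the paper's algorithm; the names `count`, `rank`, `unrank`, `allowed` and `step` are hypothetical.

```python
from functools import lru_cache

FORBIDDEN = ("00", "111")  # example constraint taken from the abstract
MAX_CTX = max(len(w) for w in FORBIDDEN) - 1  # how much of the suffix must be remembered


def allowed(ctx: str, bit: str) -> bool:
    """True if appending `bit` after suffix `ctx` creates no forbidden sub-word."""
    s = ctx + bit
    return not any(s.endswith(w) for w in FORBIDDEN)


def step(ctx: str, bit: str) -> str:
    """New remembered suffix after appending `bit`."""
    return (ctx + bit)[-MAX_CTX:]


@lru_cache(maxsize=None)
def count(ctx: str, remaining: int) -> int:
    """Number of valid continuations of length `remaining` after suffix `ctx`."""
    if remaining == 0:
        return 1
    return sum(count(step(ctx, b), remaining - 1) for b in "01" if allowed(ctx, b))


def rank(word: str) -> int:
    """Index of a valid word among all valid words of its length, in lexicographic order."""
    idx, ctx = 0, ""
    for i, bit in enumerate(word):
        if not allowed(ctx, bit):
            raise ValueError("word violates the constraint")
        if bit == "1" and allowed(ctx, "0"):
            # Skip every valid word that has a 0 in this position.
            idx += count(step(ctx, "0"), len(word) - i - 1)
        ctx = step(ctx, bit)
    return idx


def unrank(idx: int, n: int) -> str:
    """Inverse of `rank`: the valid word of length n with the given index."""
    if not 0 <= idx < count("", n):
        raise ValueError("index out of range")
    ctx, out = "", []
    for i in range(n):
        zeros = count(step(ctx, "0"), n - i - 1) if allowed(ctx, "0") else 0
        if idx < zeros:
            bit = "0"
        else:
            idx -= zeros
            bit = "1"
        out.append(bit)
        ctx = step(ctx, bit)
    return "".join(out)
```

Source data would be encoded by interpreting it as an integer below `count("", n)` and calling `unrank`; decoding applies `rank`. The memoised table has one entry per (suffix, remaining length) pair, so the work grows polynomially with the word length.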
-
Building test batteries based on analysing random number generator tests within the framework of algorithmic information theory
Authors:
Boris Ryabko
Abstract:
The problem of testing random number generators is considered and it is shown that an approach based on algorithmic information theory allows us to compare the power of different tests in some cases where the available methods of mathematical statistics do not distinguish between the tests. In particular, it is shown that tests based on data compression methods using dictionaries should be included in the test batteries.
Submitted 3 April, 2024;
originally announced April 2024.
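A test based on a dictionary compressor, of the kind the abstract above recommends for test batteries, can be sketched as follows. This is a minimal illustration, not the paper's construction: it assumes zlib as the compressor, and the function name `compression_test` is hypothetical. It rests on a counting argument: fewer than 2^(n-t+1) binary strings have length at most n - t, so a lossless compressor shortens a uniformly random n-bit string by t or more bits with probability below 2^(1-t).

```python
import math
import zlib


def compression_test(data: bytes, alpha: float = 0.01) -> bool:
    """Return True if the hypothesis 'data is uniformly random' is rejected.

    Sketch only: zlib is an assumed stand-in for any lossless compressor.
    """
    n_bits = 8 * len(data)
    compressed_bits = 8 * len(zlib.compress(data, 9))
    # Choose t so that 2**(1 - t) <= alpha.
    t = math.ceil(math.log2(1 / alpha)) + 1
    return compressed_bits <= n_bits - t
```

For example, `compression_test(bytes(10000))` rejects an all-zero sequence, while the output of `os.urandom` is not compressed by zlib and therefore passes.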
-
Reduction of the secret key length in the perfect cipher by data compression and randomisation
Authors:
Boris Ryabko
Abstract:
Perfect ciphers have been a very attractive cryptographic tool ever since C. Shannon described them. Note that, by definition, if a perfect cipher is used, no one can get any information about the encrypted message without knowing the secret key. We consider the problem of reducing the key length of perfect ciphers, because in many applications the length of the secret key is a crucial parameter. This paper describes a simple method of key length reduction. This method gives a perfect cipher and is based on the use of data compression and randomisation, and the average key length can be made close to Shannon entropy (which is the key length limit). It should be noted that the method can effectively use readily available data compressors (archivers).
Submitted 18 July, 2023;
originally announced July 2023.
-
Unconditionally secure ciphers with a short key for a source with unknown statistics
Authors:
Boris Ryabko
Abstract:
We consider the problem of constructing an unconditionally secure cipher with a short key for the case where the probability distribution of encrypted messages is unknown. Note that unconditional security means that an adversary with no computational constraints can obtain only a negligible amount of information ("leakage") about an encrypted message (without knowing the key). Here we consider the case of a priori (partially) unknown message source statistics.
More specifically, the message source probability distribution belongs to a given family of distributions. We propose an unconditionally secure cipher for this case. As an example, one can consider constructing a single cipher for texts written in any of the languages of the European Union. That is, the message to be encrypted could be written in any of these languages.
Submitted 27 March, 2023;
originally announced March 2023.
-
Entropically secure cipher for messages generated by Markov chains with unknown statistics
Authors:
Boris Ryabko
Abstract:
In 2002, Russell and Wang proposed a definition of entropic security that was developed within the framework of secret key cryptography. An entropically-secure system is unconditionally secure, that is, unbreakable, regardless of the enemy's computing power. In 2004, Dodis and Smith developed the results of Russell and Wang and, in particular, stated that the concept of an entropically-secure symmetric encryption scheme is extremely important for cryptography, since it is possible to construct entropically-secure symmetric encryption schemes with keys much shorter than the length of the input data, which makes it possible to bypass Shannon's famous lower bound on the key length. In this report, we propose an entropically-secure scheme for the case where the encrypted message is generated by a Markov chain with unknown statistics. The length of the required secret key is proportional to the logarithm of the length of the message (as opposed to the length of the message itself for the one-time pad).
Submitted 8 May, 2022;
originally announced May 2022.
-
Using data compression and randomization to build an unconditionally secure short key cipher
Authors:
Boris Ryabko
Abstract:
We consider the problem of constructing an unconditionally secure cipher for the case when the key length is less than the length of the encrypted message. (Unconditional security means that a computationally unbounded adversary cannot obtain information about the encrypted message without the key.) In this article, we propose data compression and randomization techniques combined with entropically-secure encryption. The resulting cipher can be used for encryption in such a way that the key length does not depend on the entropy or the length of the encrypted message; instead, it is determined by the required security level.
Submitted 19 December, 2021;
originally announced December 2021.
-
Fast direct access to variable length codes
Authors:
Boris Ryabko
Abstract:
We consider the issue of direct access to any letter of a sequence encoded with a variable length code and stored in the computer's memory, which is a special case of the random access problem to compressed memory. The characteristics according to which methods are evaluated are the access time to one letter and the memory used. The proposed methods, with various trade-offs between the characteristics, outperform the known ones.
Submitted 30 July, 2021;
originally announced July 2021.
-
Calibrating random number generator tests
Authors:
Boris Ryabko
Abstract:
Currently, statistical tests for random number generators (RNGs) are widely used in practice, and some of them are even included in information security standards. But despite the popularity of RNGs, consistent tests are known only for stationary ergodic deviations of randomness (a test is consistent if it detects any deviations from a given class when the sample size goes to $ \infty $). However, the model of a stationary ergodic source is too narrow for some RNGs, in particular, for generators based on physical effects. In this article, we propose computable consistent tests for some classes of deviations more general than stationary ergodic and describe some general properties of statistical tests. The proposed approach and the resulting test are based on the ideas and methods of information theory.
Submitted 14 May, 2021;
originally announced May 2021.
-
Linear hash-functions and their applications to error detection and correction
Authors:
Boris Ryabko
Abstract:
We describe and explore so-called linear hash functions and show how they can be used to build error detection and correction codes. The method can be applied to different types of errors (for example, burst errors). When the method is applied to a model where the number of distorted letters is limited, the obtained estimate of its performance is slightly better than the known Varshamov-Gilbert bound. We also describe a random code whose performance is close to the same bound, but whose construction is much simpler. In some cases the obtained methods are simpler and more flexible than the known ones. In particular, the complexity of the obtained error detection code is close to that of the well-known CRC code, but the proposed code, unlike CRC, can detect with certainty errors whose number does not exceed a predetermined limit.
Submitted 20 August, 2020;
originally announced August 2020.
-
Information Theory as a Means of Determining the Main Factors Affecting the Processors Architecture
Authors:
Anton Rakitskiy,
Boris Ryabko
Abstract:
In this article we investigate the development of computers over the past decades in order to identify the factors that influence it the most. We describe such factors and use them to predict the direction of further development. To solve these problems, we use the concept of the Computer Capacity, which allows us to estimate the performance of computers theoretically, relying only on the description of their architecture.
Submitted 17 February, 2020;
originally announced February 2020.
-
The time-adaptive statistical testing for random number generators
Authors:
Boris Ryabko
Abstract:
The problem of constructing effective statistical tests for random number generators (RNG) is considered. Currently, there are hundreds of RNG statistical tests that are often combined into so-called batteries, each containing from a dozen to more than one hundred tests.
When a battery of tests is used, it is applied to a sequence generated by the RNG, and the calculation time is determined by the length of the sequence and the number of tests. Generally speaking, the longer the sequence, the smaller the deviations from randomness that can be found by a specific test. So, when a battery is applied, on the one hand, the more "good" tests there are in the battery, the greater the chance of rejecting a "bad" RNG. On the other hand, the larger the battery, the less time can be spent on each test and, therefore, the shorter the test sequence. In turn, this reduces the ability to find small deviations from randomness. To ease this trade-off, we propose an adaptive way to use batteries (and other sets) of tests, which requires less time but, in a certain sense, preserves the power of the original battery. We call this method a time-adaptive battery of tests.
Submitted 7 February, 2020; v1 submitted 30 January, 2020;
originally announced January 2020.
-
On asymptotically optimal tests for random number generators
Authors:
Boris Ryabko
Abstract:
The problem of constructing effective statistical tests for random number generators (RNG) is considered. Currently, statistical tests for RNGs are a mandatory part of cryptographic information protection systems, but their effectiveness is mainly estimated based on experiments with various RNGs.
We find an asymptotic estimate for the p-value of an optimal test in the case where the alternative hypothesis is a known stationary ergodic source, and then describe a family of tests each of which has the same asymptotic estimate of the p-value for any (unknown) stationary ergodic source.
Submitted 13 December, 2019;
originally announced December 2019.
-
Application of data compression techniques to time series forecasting
Authors:
K. S. Chirikhin,
B. Ya. Ryabko
Abstract:
In this study we show that standard well-known file compression programs (zlib, bzip2, etc.) are able to forecast real-world time series data well. The strength of our approach is its ability to use a set of data compression algorithms and "automatically" choose the best one of them during the process of forecasting. Besides, modern data-compressors are able to find many kinds of latent regularities using some methods of artificial intelligence (for example, some data-compressors are based on finding the smallest formal grammar that describes the time series). Thus, our approach makes it possible to apply some particular methods of artificial intelligence for time-series forecasting.
As examples of the application of the proposed method, we made forecasts for the monthly T-index and the Kp-index time series using standard compressors. In both cases, we used the Mean Absolute Error (MAE) as an accuracy measure.
For the monthly T-index time series, we made 18 forecasts beyond the available data for each month from January 2011 to July 2017. We show that, in comparison with the forecasts made by the Australian Bureau of Meteorology, our method more accurately predicts one value ahead.
The Kp-index time series consists of 3-hour values ranging from 0 to 9. For each day from February 4, 2018 to March 28, 2018, we made forecasts for 24 values ahead. We compared our forecasts with the forecasts made by the Space Weather Prediction Center (SWPC). The results showed that the accuracy of our method is similar to the accuracy of the SWPC's method. As in the previous case, we also obtained more accurate one-step forecasts.
Submitted 7 April, 2019;
originally announced April 2019.
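The idea of forecasting with a file compressor, as described in the abstract above, can be sketched by scoring each candidate next symbol by how well it compresses together with the history. This is a simplified illustration, not the authors' procedure: it assumes a single compressor (zlib) and a byte-valued series, and the function name `next_symbol_distribution` is hypothetical. Because zlib output lengths are whole bytes, the resulting probabilities are coarse.

```python
import zlib


def next_symbol_distribution(history: bytes, alphabet: bytes) -> dict:
    """Assign each candidate symbol a weight 2**(-code length in bits), then normalise.

    Sketch only: a symbol that makes `history + symbol` compress to fewer bits
    is treated as more probable.
    """
    lengths = {
        a: 8 * len(zlib.compress(history + bytes([a]), 9)) for a in alphabet
    }
    shortest = min(lengths.values())
    # Subtract the shortest length first to avoid underflow in 2**(-length).
    weights = {a: 2.0 ** -(length - shortest) for a, length in lengths.items()}
    total = sum(weights.values())
    return {a: w / total for a, w in weights.items()}
```

A point forecast can then be taken as the symbol with the largest probability, for example `max(dist, key=dist.get)`.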
-
Time-universal data compression and prediction
Authors:
Boris Ryabko
Abstract:
Suppose there is a large file which should be transmitted (or stored) and there are several (say, m) admissible data-compressors. It seems natural to try all the compressors and then choose the best, i.e. the one that gives the shortest compressed file. Then transfer (or store) the index number of the best compressor (it requires log m bits) and the compressed file. The only problem is the time, which increases substantially due to the need to compress the file m times (in order to find the best compressor). We propose a method that encodes the file with the optimal compressor, but uses a relatively small additional time: the ratio of this extra time to the total time of calculation can be limited by an arbitrary positive constant.
Generally speaking, in many situations it may be necessary to find the best data compressor out of a given set, which is often done by comparing them empirically. One of the goals of this work is to turn such a selection process into a part of the data compression method, automating and optimizing it.
Submitted 9 September, 2018;
originally announced September 2018.
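The selection step described in the abstract above can be illustrated by trying every compressor on a short prefix only and then compressing the whole file once with the winner. This is a simplified one-stage sketch, not the paper's method, and it does not guarantee that the chosen compressor is the best for the whole file. The compressor set, the parameter `sample_fraction` and the names `encode` and `decode` are assumptions, and the index is stored in one byte rather than log m bits.

```python
import bz2
import lzma
import zlib

# (name, compress, decompress) triples; the set itself is an assumption.
COMPRESSORS = [
    ("zlib", zlib.compress, zlib.decompress),
    ("bz2", bz2.compress, bz2.decompress),
    ("lzma", lzma.compress, lzma.decompress),
]


def encode(data: bytes, sample_fraction: float = 0.1) -> bytes:
    """Pick the compressor that does best on a prefix, then compress everything once."""
    sample = data[: max(1, int(len(data) * sample_fraction))]
    best = min(
        range(len(COMPRESSORS)),
        key=lambda i: len(COMPRESSORS[i][1](sample)),
    )
    # The first byte records which compressor was used.
    return bytes([best]) + COMPRESSORS[best][1](data)


def decode(blob: bytes) -> bytes:
    """Read the compressor index from the first byte and decompress the rest."""
    return COMPRESSORS[blob[0]][2](blob[1:])
```

The extra time is spent only on the prefix, so it is roughly `sample_fraction` times the cost of running every compressor on the full file.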
-
Application of the Computer Capacity to the Analysis of Processors Evolution
Authors:
Boris Ryabko,
Anton Rakitskiy
Abstract:
The notion of computer capacity was proposed in 2012, and this quantity has been estimated for computers of different kinds.
In this paper we show that, when designing new processors, the manufacturers change the parameters that affect the computer capacity. This allows us to predict the values of parameters of future processors. As the main example we use Intel processors, due to the accessibility of detailed descriptions of all their technical characteristics.
Submitted 14 May, 2017;
originally announced May 2017.
-
Using data-compressors for statistical analysis of problems on homogeneity testing and classification
Authors:
Boris Ryabko,
Andrey Guskov,
Irina Selivanova
Abstract:
Nowadays data compressors are applied to many problems of text analysis, but many such applications are developed outside of the framework of mathematical statistics. In this paper we overcome this obstacle and show how several methods of classical mathematical statistics can be developed based on applications of the data compressors.
Submitted 15 January, 2017;
originally announced January 2017.
-
Information-Theoretical Analysis of Two Shannon's Ciphers
Authors:
Boris Ryabko
Abstract:
We describe generalized running key ciphers and apply them for analysis of two Shannon's methods. In particular, we suggest some estimation of the cipher equivocation and the probability of correct deciphering without key.
Submitted 1 May, 2016;
originally announced May 2016.
-
Two-faced processes and random number generators
Authors:
Boris Ryabko
Abstract:
We describe random processes (with binary alphabet) whose entropy is less than 1 (per letter), but which mimic a truly random process, i.e., by definition, a generated sequence can be interpreted as the result of flips of a fair coin with sides labeled 0 and 1. This makes it possible to construct random number generators that possess theoretical guarantees. This, in turn, is important for applications such as those in cryptography.
Submitted 22 December, 2015;
originally announced December 2015.
-
Predicting the outcomes of every process for which an asymptotically accurate stationary predictor exists is impossible
Authors:
Daniil Ryabko,
Boris Ryabko
Abstract:
The problem of prediction consists in forecasting the conditional distribution of the next outcome given the past. Assume that the source generating the data is such that there is a stationary ergodic predictor whose error converges to zero (in a certain sense). The question is whether there is a universal predictor for all such sources, that is, a predictor whose error goes to zero if any of the sources that have this property is chosen to generate the data. This question is answered in the negative, contrasting a number of previously established positive results concerning related but smaller sets of processes.
Submitted 25 September, 2015;
originally announced September 2015.
-
Using Information Theory to Study the Efficiency and Capacity of Caching in the Computer Networks
Authors:
Boris Ryabko
Abstract:
Nowadays computer networks use different kinds of memory whose speeds and capacities vary widely. There exist so-called caching methods, which are intended to use the different kinds of memory in such a way that frequently used data are stored in the faster memory, whereas infrequently used data are stored in the slower memory. We address the problems of estimating the caching efficiency and its capacity. We define the efficiency and capacity of the caching and suggest a method for their estimation based on the analysis of the kinds of accessible memory.
Submitted 13 October, 2013;
originally announced October 2013.
-
The Vernam cipher is robust to small deviations from randomness
Authors:
Boris Ryabko
Abstract:
The Vernam cipher (or one-time pad) has played an important role in cryptography because it is a perfect secrecy system. For example, if an English text (represented in binary) $X_1 X_2 ... $ is enciphered according to the formula $Z_i = (X_i + Y_i) \mod 2 $, where $Y_1 Y_2 ...$ is a key sequence generated by the Bernoulli source with equal probabilities of 0 and 1, anyone who knows $Z_1 Z_2 ... $ has no information about $X_1 X_2 ... $ without the knowledge of the key $Y_1 Y_2 ...$. (The best strategy is to guess $X_1 X_2 ... $ not paying attention to $Z_1 Z_2 ... $.)
But what should one say about secrecy of an analogous method where the key sequence $Y_1 Y_2 ...$ is generated by the Bernoulli source with a small bias, say, $P(0) = 0.49, $ $ P(1) = 0.51$? To the best of our knowledge, there are no theoretical estimates for the secrecy of such a system, as well as for the general case where $X_1 X_2 ... $ (the plaintext) and key sequence are described by stationary ergodic processes. We consider the running-key ciphers where the plaintext and the key are generated by stationary ergodic sources and show how to estimate the secrecy of such systems. In particular, it is shown that, in a certain sense, the Vernam cipher is robust to small deviations from randomness.
Submitted 9 March, 2013;
originally announced March 2013.
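The enciphering formula $Z_i = (X_i + Y_i) \mod 2$ from the abstract above, together with a Bernoulli key source that may be slightly biased, can be written out as a short sketch. The function names `keystream` and `vernam` and the parameter `p_one` are hypothetical, and Python's `random` module is used only for illustration; it is not a cryptographically secure source.

```python
import random


def keystream(n: int, p_one: float = 0.5, seed=None) -> list:
    """Bernoulli key bits Y_1..Y_n with P(1) = p_one (0.5 gives the classical one-time pad)."""
    rng = random.Random(seed)
    return [1 if rng.random() < p_one else 0 for _ in range(n)]


def vernam(bits: list, key: list) -> list:
    """Z_i = (X_i + Y_i) mod 2; applying it twice with the same key restores the input."""
    if len(bits) != len(key):
        raise ValueError("key must be as long as the message")
    return [(x + y) % 2 for x, y in zip(bits, key)]
```

Setting `p_one=0.51` reproduces the biased key of the example, $P(0) = 0.49$, $P(1) = 0.51$; decryption is the same call, `vernam(ciphertext, key)`.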
-
Experimental Investigation of Forecasting Methods Based on Universal Measures
Authors:
Boris Ryabko,
Pavel Pristavka
Abstract:
We describe and experimentally investigate a method to construct forecasting algorithms for stationary and ergodic processes based on universal measures (or so-called universal data compressors). Using some geophysical and economical time series as examples, we show that the precision of thus obtained predictions is higher than that of known methods.
Submitted 12 April, 2011;
originally announced April 2011.
-
Confidence Sets in Time-Series Filtering
Authors:
Boris Ryabko,
Daniil Ryabko
Abstract:
The problem of filtering of finite-alphabet stationary ergodic time series is considered. A method for constructing a confidence set for the (unknown) signal is proposed, such that the resulting set has the following properties: First, it includes the unknown signal with probability $γ$, where $γ$ is a parameter supplied to the filter. Second, the size of the confidence sets grows exponentially with the rate that is asymptotically equal to the conditional entropy of the signal given the data. Moreover, it is shown that this rate is optimal.
Submitted 9 July, 2012; v1 submitted 14 December, 2010;
originally announced December 2010.
-
Using Information Theory to Study the Efficiency and Capacity of Computers and Similar Devices
Authors:
Boris Ryabko
Abstract:
We address the problems of estimating the computer efficiency and the computer capacity. We define the computer efficiency and capacity and suggest a method for their estimation, based on the analysis of processor instructions and kinds of accessible memory. It is shown how the suggested method can be applied to estimate the computer capacity. In particular, this consideration gives a new look at the organization of the memory of a computer. The obtained results can be of some interest for practical applications.
Submitted 18 March, 2010;
originally announced March 2010.
-
The use of ideas of Information Theory for studying "language" and intelligence in ants
Authors:
Boris Ryabko,
Zhanna Reznikova
Abstract:
In this review we integrate the results of a long-term experimental study on ant "language" and intelligence which was fully based on fundamental ideas of Information Theory, such as the Shannon entropy, the Kolmogorov complexity, and Shannon's equation connecting the length of a message ($l$) and its frequency ($p$), i.e. $l = - \log p$ for rational communication systems. This approach, new for studying biological communication systems, enabled us to obtain the following important results on ants' communication and intelligence: i) to reveal "distant homing" in ants, that is, their ability to transfer information about remote events; ii) to estimate the rate of information transmission; iii) to reveal that ants are able to grasp regularities and to use them for "compression" of information; iv) to reveal that ants are able to transfer to each other the information about the number of objects; v) to discover that ants can add and subtract small numbers. The obtained results show that Information Theory is not only a wonderful mathematical theory, but that many of its results may be regarded as laws of Nature.
Submitted 23 December, 2009;
originally announced December 2009.
-
Using Kolmogorov Complexity for Understanding Some Limitations on Steganography
Authors:
Boris Ryabko,
Daniil Ryabko
Abstract:
Recently perfectly secure steganographic systems have been described for a wide class of sources of covertexts. The speed of transmission of secret information for these stegosystems is proportional to the length of the covertext. In this work we show that there are sources of covertexts for which such stegosystems do not exist. The key observation is that if the set of possible covertexts has a maximal Kolmogorov complexity, then a high-speed perfect stegosystem has to have complexity of the same order.
Submitted 26 January, 2009;
originally announced January 2009.
-
The Imaginary Sliding Window As a New Data Structure for Adaptive Algorithms
Authors:
Boris Ryabko
Abstract:
The sliding window scheme is known in Information Theory, Computer Science, prediction problems, and statistics. Let a source with unknown statistics generate some word $... x_{-1}x_{0}x_{1}x_{2}...$ in some alphabet $A$. For every moment $t$, $t = ..., -1, 0, 1, ...$, one stores the word ("window") $x_{t-w} x_{t-w+1}... x_{t-1}$, where $w$, $w \geq 1$, is called the "window length". In the theory of universal coding, the code of $x_{t}$ depends on the source statistics estimated from the window; in the prediction problem, each letter $x_{t}$ is predicted using the information in the window; and so on. After that, the letter $x_{t}$ is included in the window on the right, while $x_{t-w}$ is removed from the window. This is the sliding window scheme. This scheme has two merits: it allows one i) to estimate the source statistics quite precisely and ii) to adapt the code when the source statistics change. However, this scheme has a defect, namely the necessity to store the window (i.e. the word $x_{t-w}... x_{t-1}$), which requires a large amount of memory for large $w$. A new scheme named "the Imaginary Sliding Window (ISW)" is constructed. The gist of this scheme is that not the last element $x_{t-w}$ but rather a random one is removed from the window. This retains both merits of the sliding window while making it possible not to store the window, thus significantly decreasing the memory size.
Submitted 27 September, 2008;
originally announced September 2008.
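The idea in the abstract above (remove a random element instead of the oldest one, so only letter counts need to be kept) can be sketched as follows. This is a minimal illustration under stated assumptions, not the construction from the paper: the class name `ImaginarySlidingWindow`, the even initial fill of the window, and the choice to remove letter $a$ with probability proportional to its current count are all assumptions made here for illustration.

```python
import random
from collections import Counter


class ImaginarySlidingWindow:
    """Keeps only per-letter counts of a notional window of length w.

    Hypothetical sketch: names and initialisation are illustrative.
    """

    def __init__(self, alphabet, w, rng=None):
        self.alphabet = list(alphabet)
        self.w = w
        self.rng = rng or random.Random()
        # Assumption: start from a window filled as evenly as possible.
        base, extra = divmod(w, len(self.alphabet))
        self.counts = Counter(
            {a: base + (1 if i < extra else 0) for i, a in enumerate(self.alphabet)}
        )

    def update(self, letter):
        """Insert `letter` and remove one randomly chosen window element."""
        if letter not in self.counts:
            raise ValueError("letter outside the alphabet")
        # Picking a uniformly random window position is the same as picking
        # letter a with probability counts[a] / w; no window is stored.
        weights = [self.counts[a] for a in self.alphabet]
        removed = self.rng.choices(self.alphabet, weights=weights)[0]
        self.counts[removed] -= 1
        self.counts[letter] += 1

    def estimate(self, letter):
        """Empirical probability of `letter` in the notional window."""
        return self.counts[letter] / self.w
```

The memory used is one counter per letter, independent of $w$, which is the saving the abstract refers to; the total of the counters always stays equal to $w$.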
-
Constructing Perfect Steganographic Systems
Authors:
Boris Ryabko,
Daniil Ryabko
Abstract:
We propose steganographic systems for the case when covertexts (containers) are generated by a finite-memory source with possibly unknown statistics. The probability distributions of covertexts with and without hidden information are the same; this means that the proposed stegosystems are perfectly secure, i.e. an observer cannot determine whether hidden information is being transmitted. The speed of transmission of hidden information can be made arbitrarily close to the theoretical limit, namely the Shannon entropy of the source of covertexts. An interesting feature of the suggested stegosystems is that they do not require any (secret or public) key.
At the same time, we outline some principled computational limitations on steganography. We show that there are sources of covertexts such that any stegosystem whose speed of transmission of hidden text is linear in the length of the covertext must have exponential Kolmogorov complexity. This shows, in particular, that some assumptions on the sources of covertexts are necessary.
Submitted 11 July, 2011; v1 submitted 9 September, 2008;
originally announced September 2008.
-
Applications of Universal Source Coding to Statistical Analysis of Time Series
Authors:
Boris Ryabko
Abstract:
We show how universal codes can be used for solving some of the most important statistical problems for time series. By definition, a universal code (or a universal lossless data compressor) can compress any sequence generated by a stationary and ergodic source asymptotically to the Shannon entropy, which, in turn, is the best achievable ratio for lossless data compressors.
We consider finite-alphabet and real-valued time series and the following problems: estimation of the limiting probabilities for finite-alphabet time series and estimation of the density for real-valued time series; on-line prediction, regression, and classification (or problems with side information) for both types of time series; and the following problems of hypothesis testing: goodness-of-fit testing (or identity testing) and testing of serial independence. It is important to note that all problems are considered in the framework of classical mathematical statistics and that, on the other hand, everyday methods of data compression (or archivers) can be used as a tool for the estimation and testing. It turns out that, quite often, the suggested methods and tests are more powerful than known ones when they are applied in practice.
Submitted 7 September, 2008;
originally announced September 2008.
-
Nonparametric Statistical Inference for Ergodic Processes
Authors:
Daniil Ryabko,
Boris Ryabko
Abstract:
In this work a method for statistical analysis of time series is proposed, which is used to obtain solutions to some classical problems of mathematical statistics under the only assumption that the process generating the data is stationary ergodic. Namely, three problems are considered: goodness-of-fit (or identity) testing, process classification, and the change point problem. For each of the problems a test is constructed that is asymptotically accurate for the case when the data is generated by stationary ergodic processes. The tests are based on empirical estimates of distributional distance.
Submitted 3 April, 2012; v1 submitted 3 April, 2008;
originally announced April 2008.
-
Fast Recursive Coding Based on Grouping of Symbols
Authors:
Nikolay Ponomarenko,
Vladimir Lukin,
Karen Egiazarian,
Jaakko Astola,
Boris Y Ryabko
Abstract:
A novel fast recursive coding technique is proposed. It operates only with integer values no longer than 8 bits and is multiplication-free. The recursion on which the algorithm is based indirectly provides rather effective coding of symbols for very large alphabets. The code length for the proposed technique can be up to 20-30% less than for arithmetic coding and, in the worst case, it is only 1-3% larger.
Submitted 21 August, 2007;
originally announced August 2007.
-
Compression-based methods for nonparametric density estimation, on-line prediction, regression and classification for time series
Authors:
Boris Ryabko
Abstract:
We address the problem of nonparametric estimation of characteristics for stationary and ergodic time series. We consider finite-alphabet time series and real-valued ones and the following four problems: i) estimation of the (limiting) probability (or estimation of the density for real-valued time series), ii) on-line prediction, iii) regression and iv) classification (or so-called problems with side information). We show that so-called archivers (or data compressors) can be used as a tool for solving these problems. In particular, first, it is proven that any so-called universal code (or universal data compressor) can be used as a basis for constructing asymptotically optimal methods for the above problems. (By definition, a universal code can "compress" any sequence generated by a stationary and ergodic source asymptotically down to the Shannon entropy of the source.) Second, we show experimentally that estimates based on data compression methods used in practice have reasonable precision.
Submitted 1 November, 2007; v1 submitted 7 January, 2007;
originally announced January 2007.
-
Provably Secure Universal Steganographic Systems
Authors:
Boris Ryabko,
Daniil Ryabko
Abstract:
We propose a simple universal (that is, distribution-free) steganographic system in which covertexts with and without hidden texts are statistically indistinguishable. The stegosystem can be applied to any source generating i.i.d. covertexts with unknown distribution, and the hidden text is transmitted exactly, with zero probability of error. Moreover, the proposed steganographic system has two important properties. First, the rate of transmission of hidden information approaches the Shannon entropy of the covertext source as the size of blocks used for hidden text encoding tends to infinity. Second, if the size of the alphabet of the covertext source and its min-entropy tend to infinity, then the number of bits of hidden text per letter of covertext tends to $\log(n!)/n$, where $n$ is the (fixed) size of blocks used for hidden text encoding. The proposed stegosystem uses randomization.
Submitted 20 June, 2006;
originally announced June 2006.
-
Universal Codes as a Basis for Time Series Testing
Authors:
Boris Ryabko,
Jaakko Astola
Abstract:
We suggest a new approach to hypothesis testing for ergodic and stationary processes. In contrast to standard methods, the suggested approach makes it possible to construct tests based on any lossless data compression method, even if the distribution law of the codeword lengths is not known. We apply this approach to the following four problems: goodness-of-fit testing (or identity testing), testing for independence, testing of serial independence, and homogeneity testing, and suggest nonparametric statistical tests for these problems. It is important to note that so-called archivers used in practice can be used for the suggested tests.
Submitted 25 February, 2006;
originally announced February 2006.
-
Fast Enumeration of Combinatorial Objects
Authors:
Boris Ryabko
Abstract:
The problem of ranking can be described as follows. We have a set of combinatorial objects $S$, such as, say, the $k$-subsets of $n$ things, and we can imagine that they have been arranged in some list, say lexicographically, and we want to have a fast method for obtaining the rank of a given object in the list. This problem is widely known in Combinatorial Analysis, Computer Science and Information Theory. Ranking is closely connected with the hashing problem, especially with perfect hashing, and with the generation of random combinatorial objects. In Information Theory the ranking problem is closely connected with so-called enumerative encoding, which may be described as follows: there is a set of words $S$, and an enumerative code has to encode every $s \in S$ one-to-one by a binary word $code(s)$. The length of $code(s)$ must be the same for all $s \in S$. Clearly, $|code(s)| \geq \log |S|$. (Here and below $\log x=\log_{2}x$.) The suggested method allows an exponential increase in the speed of encoding and decoding for all the combinatorial enumeration problems considered, including the enumeration of permutations, compositions and others.
Submitted 15 January, 2006;
originally announced January 2006.
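To make the ranking and enumerative-encoding problem in the abstract above concrete, here is a small sketch for the $k$-subsets of $\{0, \dots, n-1\}$ in lexicographic order. It uses the classical binomial-coefficient ranking, not the fast method the paper proposes; the function names `rank_subset`, `unrank_subset`, `encode_subset` and `check_roundtrip` are hypothetical and introduced only for illustration.

```python
from itertools import combinations
from math import comb


def rank_subset(subset, n):
    """Lexicographic rank of a sorted k-subset of {0, ..., n-1}."""
    k = len(subset)
    rank = 0
    prev = -1
    for i, c in enumerate(subset):
        # Count the subsets that agree on the first i elements but have a
        # smaller element v at position i.
        for v in range(prev + 1, c):
            rank += comb(n - 1 - v, k - 1 - i)
        prev = c
    return rank


def unrank_subset(rank, n, k):
    """Inverse of rank_subset: the k-subset with the given rank."""
    subset = []
    v = 0
    for i in range(k):
        while True:
            block = comb(n - 1 - v, k - 1 - i)
            if rank < block:
                break
            rank -= block
            v += 1
        subset.append(v)
        v += 1
    return tuple(subset)


def encode_subset(subset, n):
    """Fixed-length binary codeword for a k-subset (enumerative code)."""
    size = comb(n, len(subset))                 # |S|
    length = max(1, (size - 1).bit_length())    # ceil(log2 |S|), at least 1
    return format(rank_subset(subset, n), f"0{length}b")


def check_roundtrip(n, k):
    """True if unrank_subset inverts rank_subset on all k-subsets."""
    return all(
        unrank_subset(rank_subset(s, n), n, k) == s
        for s in combinations(range(n), k)
    )
```

Every codeword has the same length $\lceil \log |S| \rceil$ bits, which matches the bound $|code(s)| \geq \log |S|$ from the abstract up to rounding.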
-
Universal Codes as a Basis for Nonparametric Testing of Serial Independence for Time Series
Authors:
Boris Ryabko,
Jaakko Astola
Abstract:
We consider a stationary and ergodic source $p$ generating symbols $x_1 ... x_t$ from some finite set $A$ and a null hypothesis $H_0$ that $p$ is a Markovian source with memory (or connectivity) not larger than $m$ $(m \geq 0)$. The alternative hypothesis $H_1$ is that the sequence is generated by a stationary and ergodic source which differs from the source under $H_0$. In particular, if $m = 0$ we have the null hypothesis $H_0$ that the sequence is generated by a Bernoulli source (or the hypothesis that $x_1 ... x_t$ are independent). Some new tests, which are based on universal codes and universal predictors, are suggested.
Submitted 26 June, 2005;
originally announced June 2005.
-
Application of Kolmogorov complexity and universal codes to identity testing and nonparametric testing of serial independence for time series
Authors:
Boris Ryabko,
Jaakko Astola,
Alex Gammerman
Abstract:
We show that Kolmogorov complexity and its estimators, such as universal codes (or data compression methods), can be applied to hypothesis testing in the framework of classical mathematical statistics. Methods for identity testing and nonparametric testing of serial independence for time series are suggested.
Submitted 29 May, 2005;
originally announced May 2005.
-
Prediction of Large Alphabet Processes and Its Application to Adaptive Source Coding
Authors:
Boris Ryabko,
Jaakko Astola
Abstract:
The problem of predicting a sequence $x_1,x_2,...$ generated by a discrete source with unknown statistics is considered. Each letter $x_{t+1}$ is predicted using information on the word $x_1x_2... x_t$ only. In fact, this problem is a classical problem which has received much attention. Its history can be traced back to Laplace. We address the problem where each $x_i$ belongs to some large (or even infinite) alphabet. A method is presented for which the precision is greater than for known algorithms, where precision is estimated by the Kullback-Leibler divergence. The results can readily be translated to results about adaptive coding.
Submitted 21 April, 2005; v1 submitted 17 April, 2005;
originally announced April 2005.
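For orientation, the classical baseline that the abstract above traces back to Laplace can be sketched as follows, together with the Kullback-Leibler divergence used to measure precision. This is only the add-one (Laplace) estimator on a finite alphabet, shown as an assumption-laden illustration; it is not the large-alphabet method proposed in the paper, and the function names are hypothetical.

```python
from collections import Counter
from math import log2


def laplace_predict(history, alphabet):
    """Add-one (Laplace) prediction of the next letter.

    Returns p(a | x_1 ... x_t) = (count(a) + 1) / (t + |A|) for each a in A.
    """
    counts = Counter(history)
    t = len(history)
    m = len(alphabet)
    return {a: (counts[a] + 1) / (t + m) for a in alphabet}


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) in bits.

    Both arguments are dicts over the same alphabet; q must be positive
    wherever p is positive (true for Laplace predictions).
    """
    return sum(p[a] * log2(p[a] / q[a]) for a in p if p[a] > 0)
```

For example, after observing the word `aab` over the alphabet `{a, b}`, the predictor assigns probability $3/5$ to `a` and $2/5$ to `b`. The denominator $t + |A|$ shows why this baseline degrades when the alphabet is large relative to the sample, which is the regime the paper addresses.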
-
Using Information Theory Approach to Randomness Testing
Authors:
B. Ya. Ryabko,
V. A. Monarev
Abstract:
We address the problem of detecting deviations of a binary sequence from randomness, which is very important for random number generators (RNG) and pseudorandom number generators (PRNG). Namely, we consider a null hypothesis $H_0$ that a given bit sequence is generated by a Bernoulli source with equal probabilities of 0 and 1 and the alternative hypothesis $H_1$ that the sequence is generated by a stationary and ergodic source which differs from the source under $H_0$. We show that data compression methods can be used as a basis for such testing and describe two new tests for randomness, which are based on ideas of universal coding. Known statistical tests and the suggested ones are applied to testing PRNGs. These experiments show that the power of the new tests is greater than that of many known algorithms.
Submitted 3 April, 2005;
originally announced April 2005.
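The general idea that a lossless compressor can serve as a randomness test can be sketched with a simple counting argument. The sketch below is an illustration under stated assumptions, not either of the two tests described in the paper: `zlib` stands in for a universal code, and the function name `compression_test` and the threshold choice are introduced here. Because a lossless compressor is one-to-one, fewer than $2^{L+1}$ inputs can have a compressed length of at most $L$ bits, so under $H_0$ (all $2^n$ sequences equally likely) the probability of saving at least $t$ bits is below $2^{1-t}$.

```python
import math
import random
import zlib


def compression_test(data: bytes, alpha: float = 0.01) -> bool:
    """Return True if H0 (independent fair bits) is rejected at level alpha.

    Assumption: zlib is used as a stand-in lossless compressor.
    """
    n_bits = 8 * len(data)
    code_bits = 8 * len(zlib.compress(data, 9))
    # Under H0, P(code_bits <= n_bits - t) < 2 ** (1 - t), so this choice
    # of t keeps the probability of a false rejection at or below alpha.
    t = 1 + math.ceil(math.log2(1 / alpha))
    return code_bits <= n_bits - t


def pseudo_random_bytes(length: int, seed: int = 0) -> bytes:
    """Helper for demonstrations: bytes from Python's built-in PRNG."""
    rng = random.Random(seed)
    return bytes(rng.getrandbits(8) for _ in range(length))
```

A highly regular input such as `bytes(4096)` (all zeros) is rejected, while output of a reasonable PRNG is typically not, since a general-purpose compressor cannot shorten it. The bound holds for any lossless compressor, which is why no knowledge of the distribution of codeword lengths is needed.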
-
Fast Codes for Large Alphabets
Authors:
Boris Ryabko,
Jaakko Astola,
Karen Egiazarian
Abstract:
We address the problem of constructing a fast lossless code in the case when the source alphabet is large. The main idea of the new scheme may be described as follows. We group letters with small probabilities into subsets (acting as super letters) and use time-consuming coding for these subsets only, whereas letters within a subset have the same code length and therefore can be coded fast. The described scheme can be applied to sources with known and unknown statistics.
Submitted 2 April, 2005;
originally announced April 2005.