-
Biology-inspired joint distribution neurons based on Hierarchical Correlation Reconstruction allowing for multidirectional neural networks
Authors:
Jarek Duda
Abstract:
Biological neural networks seem qualitatively superior (e.g. in learning, flexibility, robustness) to current artificial ones such as the Multi-Layer Perceptron (MLP) or Kolmogorov-Arnold Network (KAN). In contrast to these, they also have fundamentally multidirectional signal propagation~\cite{axon}, including propagation of probability distributions e.g. for uncertainty estimation, and are believed to be unable to use standard backpropagation training~\cite{backprop}. This article proposes novel artificial neurons based on HCR (Hierarchical Correlation Reconstruction) that remove the above low-level differences: each neuron contains a local joint distribution model (of its connections), representing the joint density on normalized variables as just a linear combination of orthonormal polynomials $(f_\mathbf{j})$: $ρ(\mathbf{x})=\sum_{\mathbf{j}\in B} a_\mathbf{j} f_\mathbf{j}(\mathbf{x})$ for $\mathbf{x} \in [0,1]^d$ and some chosen basis $B$, approaching a complete description of the joint distribution as the basis grows. By various index summations of the $(a_\mathbf{j})$ tensor, which serves as the neuron parameters, we get simple formulas for e.g. conditional expected values for propagation in any direction, like $E[x|y,z]$, $E[y|x]$, which degenerate to a KAN-like parametrization when restricted to pairwise dependencies. Such an HCR network can also propagate probability distributions (including joint ones) like $ρ(y,z|x)$. It also allows for additional training approaches, like direct $(a_\mathbf{j})$ estimation, training through tensor decomposition, or more biologically plausible information bottleneck training: layers directly influence only their neighbors, optimizing content to maximize information about the next layer and to minimize information about the previous one in order to reduce noise.
Submitted 1 July, 2024; v1 submitted 8 May, 2024;
originally announced May 2024.
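The pairwise case of the abstract above can be sketched in a few lines. This is only an illustration under stated assumptions: the basis is taken to be orthonormal shifted Legendre polynomials on $[0,1]$, the coefficients are estimated as sample means $a_{jk}=\mathrm{mean}\, f_j(x) f_k(y)$, and the helper names (`f`, `estimate_coefficients`, `expected_y_given_x`) are hypothetical. With such a basis, $E[y|x] = \frac{1}{2} + \frac{1}{2\sqrt{3}} \sum_j a_{j1} f_j(x) / \sum_j a_{j0} f_j(x)$, since only $f_0$ and $f_1$ contribute to $\int y f_k(y)\,dy$.

```python
import numpy as np
from numpy.polynomial import legendre

def f(j, x):
    """Orthonormal Legendre polynomial of degree j, shifted to [0, 1]."""
    coeffs = np.zeros(j + 1)
    coeffs[j] = 1.0
    return np.sqrt(2 * j + 1) * legendre.legval(2 * np.asarray(x) - 1, coeffs)

def estimate_coefficients(x, y, degree=3):
    """a[j, k] = sample mean of f_j(x) * f_k(y), for x, y already in [0, 1]."""
    fx = np.stack([f(j, x) for j in range(degree + 1)])
    fy = np.stack([f(k, y) for k in range(degree + 1)])
    return fx @ fy.T / len(x)

def expected_y_given_x(a, x):
    """E[y|x] of the modeled density; only the k = 0 and k = 1 columns matter."""
    fx = np.array([float(f(j, x)) for j in range(a.shape[0])])
    return 0.5 + (a[:, 1] @ fx) / (a[:, 0] @ fx) / (2 * np.sqrt(3))

# Toy data (an assumption for the demo): y equals x on a regular grid,
# so the conditional expectation should be close to x itself.
x = (np.arange(1000) + 0.5) / 1000
a = estimate_coefficients(x, x)
print(expected_y_given_x(a, 0.8))
```

Swapping the roles of the two indices gives $E[x|y]$ from the same tensor, which is the multidirectional propagation the abstract refers to.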
-
Simple inexpensive vertex and edge invariants distinguishing dataset strongly regular graphs
Authors:
Jarek Duda
Abstract:
While standard Weisfeiler-Leman vertex labels are not able to distinguish even vertices of regular graphs, there is proposed and tested family of inexpensive polynomial time vertex and edge invariants, distinguishing much more difficult SRGs (strongly regular graphs), also often their vertices. Among 43717 SRGs from dataset by Edward Spence, proposed vertex invariants alone were able to distinguish all but 4 pairs of graphs, which were easily distinguished by further application of proposed edge invariants. Specifically, proposed vertex invariants are traces or sorted diagonals of $(A|_{N_a})^p$ adjacency matrix $A$ restricted to $N_a$ neighborhood of vertex $a$, already for $p=3$ distinguishing all SRGs from 6 out of 13 sets in this dataset, 8 if adding $p=4$. Proposed edge invariants are analogously traces or diagonals of powers of $\bar{A}_{ab,cd}=A_{ab} A_{ac} A_{bd}$, nonzero for $(a,b)$ being edges. As SRGs are considered the most difficult cases for graph isomorphism problem, such algebraic-combinatorial invariants bring hope that this problem is polynomial time.
Submitted 7 February, 2024;
originally announced February 2024.
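As a small illustration of the vertex invariant described above, the sketch below computes $\mathrm{Tr}\,(A|_{N_a})^p$ for every vertex and compares two well-known non-isomorphic strongly regular graphs with parameters (16, 6, 2, 2): the $4\times 4$ rook's graph and the Shrikhande graph. The choice of these two graphs and the helper names are assumptions made for the demo; they are not claimed to come from the dataset mentioned in the abstract.

```python
import itertools
import numpy as np

def vertex_invariants(A, p=3):
    """Sorted list of trace((A restricted to N_a)^p) over all vertices a."""
    values = []
    for a in range(len(A)):
        nbrs = np.flatnonzero(A[a])            # neighborhood N_a
        sub = A[np.ix_(nbrs, nbrs)]            # adjacency restricted to N_a
        values.append(int(np.trace(np.linalg.matrix_power(sub, p))))
    return sorted(values)

def graph_on_z4xz4(is_adjacent):
    """Adjacency matrix of a graph on the 16 vertices of Z_4 x Z_4."""
    verts = list(itertools.product(range(4), repeat=2))
    A = np.zeros((16, 16), dtype=np.int64)
    for i, u in enumerate(verts):
        for j, v in enumerate(verts):
            if i != j and is_adjacent(u, v):
                A[i, j] = 1
    return A

# Rook's graph: adjacent when sharing a row or a column.
rook = graph_on_z4xz4(lambda u, v: u[0] == v[0] or u[1] == v[1])

# Shrikhande graph: adjacent when the difference is +-(1,0), +-(0,1) or +-(1,1) mod 4.
steps = {(1, 0), (3, 0), (0, 1), (0, 3), (1, 1), (3, 3)}
shrikhande = graph_on_z4xz4(
    lambda u, v: ((u[0] - v[0]) % 4, (u[1] - v[1]) % 4) in steps)

print(vertex_invariants(rook), vertex_invariants(shrikhande))
```

In the rook's graph every neighborhood consists of two disjoint triangles (trace 12 for $p=3$), while in the Shrikhande graph it is a 6-cycle (trace 0), so the invariant already separates the pair.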
-
Extracting individual variable information for their decoupling, direct mutual information and multi-feature Granger causality
Authors:
Jarek Duda
Abstract:
Multiple variables usually contain complex dependencies that are difficult to control. This article proposes extraction of their individual information, e.g. $\overline{X|Y}$ as a random variable containing information from $X$, but with information about $Y$ removed, by using the reversible normalization $(x,y) \leftrightarrow (\bar{x}=\textrm{CDF}_{X|Y=y}(x),y)$. One application is decoupling of the individual information of variables: reversibly transform $(X_1,\ldots,X_n)\leftrightarrow(\tilde{X}_1,\ldots \tilde{X}_n)$ so that together they contain the same information, but are independent: $\forall_{i\neq j} \tilde{X}_i\perp \tilde{X}_j, \tilde{X}_i\perp X_j$. This requires detailed models of complex conditional probability distributions, which is generally a difficult task, but here it can be done through multiple dependency-reducing iterations using imperfect methods (here HCR: Hierarchical Correlation Reconstruction). It could also be used for direct mutual information: evaluating direct information transfer, without use of intermediate variables. For causality direction, multi-feature Granger causality is discussed, e.g. to trace various types of individual information transfers between such decoupled variables, including propagation time (delay).
Submitted 22 November, 2023;
originally announced November 2023.
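The normalization $\bar{x}=\textrm{CDF}_{X|Y=y}(x)$ can be illustrated on a case where the conditional CDF is known in closed form. The sketch below assumes a standard bivariate Gaussian with correlation $r$, for which $X|Y=y \sim N(ry, 1-r^2)$; this analytic model replaces the HCR-based estimation of the abstract and is used only for the demo. The function name is hypothetical.

```python
import math
import numpy as np

def normalize_given_y(x, y, r):
    """x_bar = CDF_{X|Y=y}(x) for a standard bivariate Gaussian with correlation r."""
    z = (x - r * y) / math.sqrt(1.0 - r * r)    # standardized conditional residual
    erf = np.vectorize(math.erf)
    return 0.5 * (1.0 + erf(z / math.sqrt(2.0)))

rng = np.random.default_rng(0)
r = 0.8
y = rng.standard_normal(20000)
x = r * y + math.sqrt(1 - r * r) * rng.standard_normal(20000)

x_bar = normalize_given_y(x, y, r)
# x is strongly correlated with y; x_bar is nearly uniform and uncorrelated with y.
print(np.corrcoef(x, y)[0, 1], np.corrcoef(x_bar, y)[0, 1], x_bar.mean())
```

The map is reversible for fixed $y$, so $(\bar{x}, y)$ carries the same information as $(x, y)$ while $\bar{x}$ no longer depends on $y$.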
-
Time delay multi-feature correlation analysis to extract subtle dependencies from EEG signals
Authors:
Jarek Duda
Abstract:
Electroencephalography (EEG) signals are resultants of extremely complex brain activity. Some details of this hidden dynamics might be accessible through e.g. joint distributions $ρ_{Δt}$ of signals of pairs of electrodes shifted by various time delays (lag $Δt$). A standard approach is monitoring a single evaluation of such joint distributions, like Pearson correlation (or mutual information), which turns out relatively uninteresting - as expected, there is usually a small peak for zero delay and nearly symmetric drop with delay. In contrast, such a complex signal might be composed of multiple types of statistical dependencies - this article proposes approach to automatically decompose and extract them. Specifically, we model such joint distributions as polynomials, estimated separately for all considered lag dependencies, then with PCA dimensionality reduction we find the dominant joint density distortion directions $f_v$. This way we get a few lag dependent features $a_i(Δt)$ describing separate dominating statistical dependencies of known contributions: $ρ_{Δt}(y,z)\approx \sum_{i=1}^r a_i(Δt)\, f_{v_i}(y,z)$. Such features complement Pearson correlation, extracting hidden more complex behavior, e.g. with asymmetry which might be related with direction of information transfer, extrema suggesting characteristic delays, or oscillatory behavior suggesting some periodicity. There is also discussed extension of Granger causality to such multi-feature joint density analysis, suggesting e.g. two separate causality waves. While this early article is initial fundamental research, in future it might help e.g. with understanding of cortex hidden dynamics, diagnosis of pathologies like epilepsy, determination of precise electrode position, or building brain-computer interface.
Submitted 29 May, 2023; v1 submitted 24 April, 2023;
originally announced May 2023.
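A minimal sketch of the lag-dependent feature extraction described above, on synthetic signals rather than EEG data. The assumptions are: rank normalization to nearly uniform marginals, a basis of the first two orthonormal shifted Legendre polynomials, mixed moments $a_{jk}(Δt)$ estimated as sample means, and PCA done with a plain SVD. All names are hypothetical.

```python
import numpy as np

def to_uniform(x):
    """Rank-normalize a signal to nearly uniform values in (0, 1)."""
    ranks = np.argsort(np.argsort(x))
    return (ranks + 0.5) / len(x)

def f(j, u):
    """Orthonormal shifted Legendre polynomials on [0, 1], for j = 1 or 2."""
    if j == 1:
        return np.sqrt(3) * (2 * u - 1)
    return np.sqrt(5) * (6 * u * u - 6 * u + 1)

def lag_features(y, z, lags):
    """Rows: lags. Columns: a_11, a_12, a_21, a_22 with a_jk = mean f_j(y_t) f_k(z_{t+lag})."""
    u, v = to_uniform(y), to_uniform(z)
    n = len(u)
    rows = []
    for d in lags:
        a, b = (u[:n - d], v[d:]) if d >= 0 else (u[-d:], v[:n + d])
        rows.append([np.mean(f(j, a) * f(k, b)) for j in (1, 2) for k in (1, 2)])
    return np.array(rows)

rng = np.random.default_rng(1)
y = rng.standard_normal(5000)
z = np.roll(y, 5) + 0.5 * rng.standard_normal(5000)   # z follows y with delay 5

lags = list(range(-10, 11))
F = lag_features(y, z, lags)

# PCA over lags: rows of Vt are dominant directions, scores[:, i] plays the role of a_i(lag).
centered = F - F.mean(axis=0)
_, s, Vt = np.linalg.svd(centered, full_matrices=False)
scores = centered @ Vt.T
print(lags[int(np.argmax(np.abs(F[:, 0])))])
```

Here the first feature peaks at the injected delay; on real signals the interest is in the additional components, which capture dependencies that a single correlation value misses.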
-
Adaptive Student's t-distribution with method of moments moving estimator for nonstationary time series
Authors:
Jarek Duda
Abstract:
The real life time series are usually nonstationary, bringing a difficult question of model adaptation. Classical approaches like ARMA-ARCH assume arbitrary type of dependence. To avoid such bias, we will focus on recently proposed agnostic philosophy of moving estimator: in time $t$ finding parameters optimizing e.g. $F_t=\sum_{τ<t} (1-η)^{t-τ} \ln(ρ_θ(x_τ))$ moving log-likelihood, evolving in time. It allows for example to estimate parameters using inexpensive exponential moving averages (EMA), like absolute central moments $E[|x-μ|^p]$ evolving for one or multiple powers $p\in\mathbb{R}^+$ using $m_{p,t+1} = m_{p,t} + η(|x_t-μ_t|^p-m_{p,t})$. Application of such general adaptive methods of moments will be presented on Student's t-distribution, popular especially in economical applications, here applied to log-returns of DJIA companies. While standard ARMA-ARCH approaches provide evolution of $μ$ and $σ$, here we also get evolution of $ν$ describing $ρ(x)\sim |x|^{-ν-1}$ tail shape, probability of extreme events - which might turn out catastrophic, destabilizing the market.
Submitted 12 April, 2023; v1 submitted 6 April, 2023;
originally announced April 2023.
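The moving-moment update $m_{p,t+1} = m_{p,t} + η(|x_t-μ_t|^p-m_{p,t})$ from the abstract above can be written as a short loop. In this sketch the center is also tracked with an exponential moving average, $μ_{t+1} = μ_t + η(x_t - μ_t)$, which is an assumption of the example, as are the initial values and the function name. The step from moments to the Student's t parameters is not shown.

```python
def ema_moments(xs, eta, powers, mu0=0.0, m0=1.0):
    """Adaptive estimates of the center mu and absolute central moments m_p."""
    mu = mu0
    m = {p: m0 for p in powers}
    for x in xs:
        dev = abs(x - mu)                      # uses the current center mu_t
        for p in powers:
            m[p] += eta * (dev ** p - m[p])    # m_{p,t+1} = m_{p,t} + eta (|x_t - mu_t|^p - m_{p,t})
        mu += eta * (x - mu)                   # assumed EMA update of the center
    return mu, m

mu, m = ema_moments([1.0, 1.0], eta=0.5, powers=[2])
print(mu, m[2])
```

A smaller `eta` gives smoother but slower-adapting estimates; several powers `p` can be tracked at once at the cost of one update each.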
-
SynthA1c: Towards Clinically Interpretable Patient Representations for Diabetes Risk Stratification
Authors:
Michael S. Yao,
Allison Chae,
Matthew T. MacLean,
Anurag Verma,
Jeffrey Duda,
James Gee,
Drew A. Torigian,
Daniel Rader,
Charles Kahn,
Walter R. Witschey,
Hersh Sagreiya
Abstract:
Early diagnosis of Type 2 Diabetes Mellitus (T2DM) is crucial to enable timely therapeutic interventions and lifestyle modifications. As the time available for clinical office visits shortens and medical imaging data become more widely available, patient image data could be used to opportunistically identify patients for additional T2DM diagnostic workup by physicians. We investigated whether image-derived phenotypic data could be leveraged in tabular learning classifier models to predict T2DM risk in an automated fashion to flag high-risk patients without the need for additional blood laboratory measurements. In contrast to traditional binary classifiers, we leverage neural networks and decision tree models to represent patient data as 'SynthA1c' latent variables, which mimic blood hemoglobin A1c empirical lab measurements, that achieve sensitivities as high as 87.6%. To evaluate how SynthA1c models may generalize to other patient populations, we introduce a novel generalizable metric that uses vanilla data augmentation techniques to predict model performance on input out-of-domain covariates. We show that image-derived phenotypes and physical examination data together can accurately predict diabetes risk as a means of opportunistic risk stratification enabled by artificial intelligence and medical imaging. Our code is available at https://github.com/allisonjchae/DMT2RiskAssessment.
Submitted 27 July, 2023; v1 submitted 20 September, 2022;
originally announced September 2022.
-
Predicting probability distributions for cancer therapy drug selection optimization
Authors:
Jarek Duda
Abstract:
Large variability between cell lines brings a difficult optimization problem of drug selection for cancer therapy. Standard approaches use prediction of value for this purpose, corresponding e.g. to expected value of their distribution. This article shows superiority of working on, predicting the entire probability distributions - proposing basic tools for this purpose. We are mostly interested in the best drug in their batch to be tested - proper optimization of their selection for extreme statistics requires knowledge of the entire probability distributions, which for distributions of drug properties among cell lines often turn out binomial, e.g. depending on corresponding gene. Hence for basic prediction mechanism there is proposed mixture of two Gaussians, trying to predict its weight based on additional information.
Submitted 13 September, 2022;
originally announced September 2022.
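Why the whole distribution matters for extreme statistics can be seen in a toy calculation. The sketch below compares two hypothetical candidates with the same expected value, one modeled by a single Gaussian and one by a mixture of two Gaussians, and evaluates the probability that the best of $n$ independent draws exceeds a threshold, $1-\mathrm{CDF}(t)^n$. The parameter values and the independence assumption are illustrative only.

```python
import math

def normal_cdf(x, mu, sigma):
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def mixture_cdf(x, w, mu1, s1, mu2, s2):
    """CDF of the mixture w * N(mu1, s1) + (1 - w) * N(mu2, s2)."""
    return w * normal_cdf(x, mu1, s1) + (1 - w) * normal_cdf(x, mu2, s2)

def prob_best_exceeds(threshold, n, cdf):
    """P(max of n independent draws > threshold) = 1 - CDF(threshold)^n."""
    return 1.0 - cdf(threshold) ** n

# Two hypothetical candidates with the same mean 0.
unimodal = lambda x: mixture_cdf(x, 1.0, 0.0, 1.0, 0.0, 1.0)
bimodal = lambda x: mixture_cdf(x, 0.5, -2.0, 0.5, 2.0, 0.5)

print(prob_best_exceeds(2.0, 1, unimodal), prob_best_exceeds(2.0, 1, bimodal))
```

A predictor of the expected value alone would rank the two candidates equally, while the chance of an extreme outcome differs by roughly an order of magnitude.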
-
Compression Optimality of Asymmetric Numeral Systems
Authors:
Josef Pieprzyk,
Jarek Duda,
Marcin Pawlowski,
Seyit Camtepe,
Arash Mahboubi,
Pawel Morawiecki
Abstract:
Compression also known as entropy coding has a rich and long history. However, a recent explosion of multimedia Internet applications (such as teleconferencing and video streaming for instance) renews an interest in fast compression that also squeezes out as much redundancy as possible. In 2009 Jarek Duda invented his asymmetric numeral system (ANS). Apart from a beautiful mathematical structure, it is very efficient and offers compression with a very low residual redundancy. ANS works well for any symbol source statistics. Besides, ANS has become a preferred compression algorithm in the IT industry. However, designing ANS instance requires a random selection of its symbol spread function. Consequently, each ANS instance offers compression with a slightly different compression rate.
The paper investigates compression optimality of ANS. It shows that ANS is optimal (i.e. the entropies of encoding and source are equal) for any symbol sources whose probability distribution is described by natural powers of 1/2. We use Markov chains to calculate ANS state probabilities. This allows us to determine ANS compression rate precisely. We present two algorithms for finding ANS instances with high compression rates. The first explores state probability approximations in order to choose ANS instances with better compression rates. The second algorithm is a probabilistic one. It finds ANS instances, whose compression rate can be made as close to the best rate as required. This is done at the expense of the number $θ$ of internal random ``coin'' tosses. The algorithm complexity is ${\cal O}(θL^3)$, where $L$ is the number of ANS states. The complexity can be reduced to ${\cal O}(θL\log{L})$ if we use a fast matrix inversion. If the algorithm is implemented on quantum computer, its complexity becomes ${\cal O}(θ(\log{L})^3)$.
Submitted 6 September, 2022;
originally announced September 2022.
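The statement that ANS is optimal for probabilities that are natural powers of 1/2 can be checked on a toy coder. The sketch below uses the range variant (rANS) with Python big integers and no renormalization, which is a simplification chosen for brevity and not the tabled variant analyzed in the paper; the symbol frequencies, the initial state, and the function names are assumptions.

```python
from math import log2

def build_tables(freqs):
    """Cumulative start of each symbol's slot range and the total M."""
    starts, total = {}, 0
    for s, fr in freqs.items():
        starts[s] = total
        total += fr
    return starts, total

def encode(msg, freqs):
    starts, M = build_tables(freqs)
    x = M                                   # assumed initial state
    for s in reversed(msg):                 # encode backwards so decoding runs forwards
        fr = freqs[s]
        x = (x // fr) * M + starts[s] + x % fr
    return x

def decode(x, n, freqs):
    starts, M = build_tables(freqs)
    out = []
    for _ in range(n):
        slot = x % M
        s = next(t for t in freqs if starts[t] <= slot < starts[t] + freqs[t])
        x = freqs[s] * (x // M) + slot - starts[s]
        out.append(s)
    return "".join(out)

freqs = {"a": 2, "b": 1, "c": 1}            # probabilities 1/2, 1/4, 1/4
msg = "abacabaa"
state = encode(msg, freqs)
ideal_bits = sum(log2(4 / freqs[s]) for s in msg)
print(decode(state, len(msg), freqs), state.bit_length() - 3, ideal_bits)
```

For these dyadic probabilities the state grows by exactly $\log_2(1/p_s)$ bits per symbol, so the bit growth over the 3-bit initial state equals the ideal code length; for other probabilities a small redundancy appears.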
-
Low cost prediction of probability distributions of molecular properties for early virtual screening
Authors:
Jarek Duda,
Sabina Podlewska
Abstract:
While there is a general focus on prediction of values, prediction of probability distributions is mathematically more appropriate, with additional possibilities like prediction of uncertainty, higher moments and quantiles. For the purpose of computer-aided drug design, this article applies the Hierarchical Correlation Reconstruction approach, previously applied in the analysis of demographic, financial and astronomical data. Instead of a single linear regression to predict values, it uses multiple linear regressions to independently predict multiple moments, finally combining them into a predicted probability distribution, here of several ADMET properties based on the substructural fingerprint developed by Klekota\&Roth. The discussed application example is inexpensive selection of a percentage of molecules with properties nearly certain to be in a predicted or chosen range during virtual screening. Such an approach can facilitate the interpretation of the results, as predictions characterized by high uncertainty are automatically detected. In addition, for each of the investigated predictive problems, we detected crucial structural features, which should be carefully considered when optimizing compounds towards a particular property. The whole methodology developed in the study therefore constitutes a great support for medicinal chemists, as it enables fast rejection of compounds with the lowest potential for the desired physicochemical/ADMET characteristics and guides the compound optimization process.
Submitted 21 July, 2022;
originally announced July 2022.
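The idea of predicting several moments with separate linear regressions and combining them into a density can be sketched as follows. Assumptions of this example: the predicted variable is already normalized to $[0,1]$, the moments are coefficients $a_j$ in an orthonormal shifted Legendre basis, the predicted density is $ρ(y|x)=1+\sum_j a_j(x) f_j(y)$, and the toy data simply has $y=x$. The function names and the feature choice are hypothetical.

```python
import numpy as np

def f(j, u):
    """Orthonormal shifted Legendre polynomials on [0, 1] for j = 0, 1, 2."""
    u = np.asarray(u, dtype=float)
    return [np.ones_like(u), np.sqrt(3) * (2 * u - 1),
            np.sqrt(5) * (6 * u * u - 6 * u + 1)][j]

def fit_moment_models(X, y, degree=2):
    """One least-squares regression per moment: column j-1 of beta predicts a_j."""
    targets = np.stack([f(j, y) for j in range(1, degree + 1)], axis=1)
    beta, *_ = np.linalg.lstsq(X, targets, rcond=None)
    return beta

def predicted_density(beta, x_features, y_grid):
    """rho(y|x) = 1 + sum_j a_j(x) f_j(y), with a_j(x) = x_features @ beta[:, j-1]."""
    a = x_features @ beta
    return 1.0 + sum(a[j - 1] * f(j, y_grid) for j in range(1, len(a) + 1))

# Toy data: y = x on a grid; features are [1, f_1(x)].
u = (np.arange(2000) + 0.5) / 2000
X = np.stack([np.ones_like(u), f(1, u)], axis=1)
beta = fit_moment_models(X, u)

grid = (np.arange(200) + 0.5) / 200
rho = predicted_density(beta, np.array([1.0, float(f(1, 0.9))]), grid)
print(beta.round(3), rho.mean())
```

The regression coefficients stay interpretable (each one links a feature to a moment), but a polynomial density of low degree can dip below zero, so in practice some calibration of the result is needed before using it as a probability.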
-
Predicting conditional probability distributions of redshifts of Active Galactic Nuclei using Hierarchical Correlation Reconstruction
Authors:
Jarek Duda
Abstract:
While there is a general focus on prediction of values, real data often only allows predicting conditional probability distributions, with capabilities bounded by conditional entropy $H(Y|X)$. If additionally estimating uncertainty, we can treat a predicted value as the center of a Gaussian or Laplace distribution - an idealization which can be far from the complex conditional distributions of real data. This article applies the Hierarchical Correlation Reconstruction (HCR) approach to inexpensively predict quite complex conditional probability distributions (e.g. multimodal): by independent MSE estimation of multiple moment-like parameters, which allow reconstructing the conditional distribution. Using linear regression for this purpose, we get interpretable models: with coefficients describing contributions of features to conditional moments. This article extends the original approach especially by using Canonical Correlation Analysis (CCA) for feature optimization and l1 "lasso" regularization, focusing on the practical problem of predicting the redshift of Active Galactic Nuclei (AGN) based on the Fourth Fermi-LAT Data Release 2 (4LAC) dataset.
Submitted 13 June, 2022;
originally announced June 2022.
-
Fast optimization of common basis for matrix set through Common Singular Value Decomposition
Authors:
Jarek Duda
Abstract:
SVD (singular value decomposition) is one of the basic tools of machine learning, allowing optimization of the basis for a given matrix. However, sometimes we have a set of matrices $\{A_k\}_k$ instead, and would like to optimize a single common basis for them: find orthogonal matrices $U$, $V$, such that the set of matrices $\{U^T A_k V\}$ is somehow simpler. For example, DCT-II is an orthonormal basis of functions commonly used in image/video compression - as discussed here, this kind of basis can be quickly and automatically optimized for a given dataset. While the also discussed gradient descent optimization might be computationally costly, this article proposes CSVD (common SVD): a fast general approach based on SVD. Specifically, we choose $U$ as built of eigenvectors of $\sum_k (w_k)^q (A_k A_k^T)^p$ and $V$ of $\sum_k (w_k)^q (A_k^T A_k)^p$, where $w_k$ are their weights and $p,q>0$ are some chosen powers, e.g. 1/2, optionally with normalization, e.g. $A \to A - rc^T$ where $r_i=\sum_j A_{ij}, c_j =\sum_i A_{ij}$.
Submitted 18 April, 2022;
originally announced April 2022.
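The construction of $U$ and $V$ from the two weighted sums can be sketched directly with NumPy. The example below omits the optional normalization step, computes the matrix power through an eigendecomposition, and orders eigenvectors by decreasing eigenvalue; these choices and the function names are assumptions of the sketch.

```python
import numpy as np

def psd_power(S, p):
    """S^p for a symmetric positive semidefinite matrix S."""
    vals, vecs = np.linalg.eigh(S)
    return (vecs * np.clip(vals, 0.0, None) ** p) @ vecs.T

def csvd(mats, weights=None, p=0.5, q=1.0):
    """Common bases: U from sum_k w_k^q (A_k A_k^T)^p, V from sum_k w_k^q (A_k^T A_k)^p."""
    weights = np.ones(len(mats)) if weights is None else np.asarray(weights, float)
    SU = sum(w ** q * psd_power(A @ A.T, p) for w, A in zip(weights, mats))
    SV = sum(w ** q * psd_power(A.T @ A, p) for w, A in zip(weights, mats))
    U = np.linalg.eigh(SU)[1][:, ::-1]      # eigenvectors, largest eigenvalue first
    V = np.linalg.eigh(SV)[1][:, ::-1]
    return U, V

# Toy check: two matrices sharing the same singular vectors.
rng = np.random.default_rng(0)
U0, _ = np.linalg.qr(rng.standard_normal((3, 3)))
V0, _ = np.linalg.qr(rng.standard_normal((3, 3)))
mats = [U0 @ np.diag(s) @ V0.T for s in ([3.0, 2.0, 1.0], [6.0, 3.0, 1.0])]

U, V = csvd(mats)
for A in mats:
    print(np.round(U.T @ A @ V, 6))
```

When the matrices really share a basis, as in this toy case, every $U^T A_k V$ comes out diagonal up to signs; for general sets the result is only approximately sparse.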
-
Context binning, model clustering and adaptivity for data compression of genetic data
Authors:
Jarek Duda
Abstract:
Rapid growth of genetic databases means huge savings from improvements in their data compression, what requires better inexpensive statistical models. This article proposes automatized optimizations e.g. of Markov-like models, especially context binning and model clustering. While it is popular to just remove low bits of the context, proposed context binning automatically optimizes such reduction as tabled: state=bin[context] determining probability distribution, this way extracting nearly all useful information also from very large contexts, into a relatively small number of states. The second proposed approach: model clustering uses k-means clustering in space of general statistical models, allowing to optimize a few models (as cluster centroids) to be chosen e.g. separately for each read. There are also briefly discussed some adaptivity techniques to include data non-stationarity.
Submitted 3 May, 2022; v1 submitted 13 January, 2022;
originally announced January 2022.
-
Encoding of probability distributions for Asymmetric Numeral Systems
Authors:
Jarek Duda
Abstract:
Many data compressors regularly encode probability distributions for entropy coding - requiring minimal description length type of optimizations. Canonical prefix/Huffman coding usually just writes lengths of bit sequences, this way approximating probabilities with powers-of-2. Operating on more accurate probabilities usually allows for better compression ratios, and is possible e.g. using arithmetic coding and Asymmetric Numeral Systems family. Especially the multiplication-free tabled variant of the latter (tANS) builds automaton often replacing Huffman coding due to better compression at similar computational cost - e.g. in popular Facebook Zstandard and Apple LZFSE compressors. There is discussed encoding of probability distributions for such applications, especially using Pyramid Vector Quantizer(PVQ)-based approach with deformation, bucket approximation, prefix trees, improving accuracy with additional bits, also tuned symbol spread for tANS.
Submitted 4 July, 2022; v1 submitted 11 June, 2021;
originally announced June 2021.
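The basic trade-off behind the abstract above, accuracy of the stored distribution versus the penalty from approximating it, can be illustrated with a simple quantizer. The sketch below rounds probabilities to integer frequencies summing to $L$ (every symbol keeping at least 1) using a greedy correction, and reports the Kullback-Leibler penalty in bits/symbol. This greedy heuristic is a stand-in chosen for brevity; it is not the PVQ-based method of the paper, and the function names are hypothetical.

```python
from math import log2

def quantize_probabilities(probs, L):
    """Integer frequencies q_i >= 1 with sum L, close to probs[i] * L (needs L >= len(probs))."""
    q = [max(1, round(p * L)) for p in probs]
    while sum(q) != L:
        if sum(q) < L:
            # give one more slot to the most under-represented symbol
            i = max(range(len(q)), key=lambda k: probs[k] * L - q[k])
            q[i] += 1
        else:
            # take one slot from the most over-represented symbol that can spare it
            i = max((k for k in range(len(q)) if q[k] > 1),
                    key=lambda k: q[k] - probs[k] * L)
            q[i] -= 1
    return q

def redundancy_bits(probs, q, L):
    """Kullback-Leibler penalty in bits/symbol for coding probs with q/L."""
    return sum(p * log2(p / (qi / L)) for p, qi in zip(probs, q) if p > 0)

probs = [0.9, 0.05, 0.03, 0.02]
q = quantize_probabilities(probs, 16)
print(q, redundancy_bits(probs, q, 16))
```

Increasing $L$ lowers the penalty but costs more bits to store the frequencies and a larger coding table, which is the optimization the abstract describes.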
-
Improving distribution and flexible quantization for DCT coefficients
Authors:
Jarek Duda
Abstract:
While it is a common knowledge that AC coefficients of Fourier-related transforms, like DCT-II of JPEG image compression, are from Laplace distribution, there was tested more general EPD (exponential power distribution) $ρ\sim \exp(-(|x-μ|/σ)^κ)$ family, leading to maximum likelihood estimated (MLE) $κ\approx 0.5$ instead of Laplace distribution $κ=1$ - such replacement gives $\approx 0.1$ bits/value mean savings (per pixel for grayscale, up to $3\times$ for RGB).
There is also discussed predicting distributions (as $μ, σ, κ$ parameters) for DCT coefficients from already decoded coefficients in the current and neighboring DCT blocks. Predicting values $(μ)$ from neighboring blocks allows to reduce blocking artifacts, also improve compression ratio - for which prediction of uncertainty/width $σ$ alone provides much larger $\approx 0.5$ bits/value mean savings opportunity (often neglected).
Especially for such continuous distributions, there is also discussed quantization approach through optimized continuous \emph{quantization density function} $q$, which inverse CDF (cumulative distribution function) $Q$ on regular lattice $\{Q^{-1}((i-1/2)/N):i=1\ldots N\}$ gives quantization nodes - allowing for flexible inexpensive choice of optimized (non-uniform) quantization - of varying size $N$, with rate-distortion control. Optimizing $q$ for distortion alone leads to significant improvement, however, at cost of increased entropy due to more uniform distribution. Optimizing both turns out leading to nearly uniform quantization here, with automatized tail handling.
Submitted 22 February, 2021; v1 submitted 23 July, 2020;
originally announced July 2020.
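The lattice $\{Q^{-1}((i-1/2)/N): i=1\ldots N\}$ from the abstract above is easy to evaluate once an inverse CDF is available. The sketch below assumes, purely for illustration, that the quantization density $q$ is a Laplace density, whose inverse CDF has a closed form; the optimized $q$ of the paper is not reproduced here, and the function names are hypothetical.

```python
import math

def laplace_inverse_cdf(u, mu=0.0, b=1.0):
    """Inverse CDF of a Laplace distribution with center mu and scale b, for 0 < u < 1."""
    if u < 0.5:
        return mu + b * math.log(2.0 * u)
    return mu - b * math.log(2.0 * (1.0 - u))

def quantization_nodes(N, inverse_cdf):
    """Nodes Q^{-1}((i - 1/2) / N) for i = 1..N."""
    return [inverse_cdf((i - 0.5) / N) for i in range(1, N + 1)]

print(quantization_nodes(3, laplace_inverse_cdf))
print(quantization_nodes(7, lambda u: laplace_inverse_cdf(u, mu=0.0, b=2.0)))
```

Changing $N$ gives a coarser or finer quantizer from the same density, which is what makes rate-distortion control inexpensive in this scheme.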
-
Exploiting context dependence for image compression with upsampling
Authors:
Jarek Duda
Abstract:
Image compression with upsampling encodes information to succeedingly increase image resolution, for example by encoding differences in FUIF and JPEG XL. It is useful for progressive decoding, also often can improve compression ratio - both for lossless compression and e.g. DC coefficients of lossy. However, the currently used solutions rather do not exploit context dependence for encoding of such upscaling information. This article discusses simple inexpensive general techniques for this purpose, which allowed to save on average $0.645$ bits/difference (between $0.138$ and $1.489$) for the last upscaling for 48 standard $512\times 512$ grayscale 8 bit images - compared to assumption of fixed Laplace distribution. Using least squares linear regression of context to predict center of Laplace distribution gave on average $0.393$ bits/difference savings. The remaining savings were obtained by additionally predicting width of this Laplace distribution, also using just the least squares linear regression.
For RGB images, optimization of color transform alone gave mean $\approx 4.6\%$ size reduction comparing to standard YCrCb if using fixed transform, $\approx 6.3\%$ if optimizing transform individually for each image. Then further mean $\approx 10\%$ reduction was obtained if predicting Laplace parameters based on context. The presented simple inexpensive general methodology can be also used for different types of data like DCT coefficients in lossy image compression.
Submitted 13 July, 2020; v1 submitted 6 April, 2020;
originally announced April 2020.
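The two-step use of least squares described above, first for the center and then for the width of a Laplace distribution, can be sketched on synthetic data. Everything about the data is an assumption of the example: a single context value per sample, a linear dependence of the center on it, and a width growing with its magnitude. The cost is measured as the mean of $-\log_2$ of the density, evaluated in-sample.

```python
import numpy as np

def laplace_bits(x, mu, b):
    """Mean of -log2 of the Laplace density with center mu and scale b."""
    return np.mean(np.log2(2 * b) + np.abs(x - mu) / b / np.log(2))

rng = np.random.default_rng(0)
n = 20000
c = rng.standard_normal(n)                           # one context value per sample
x = 2.0 * c + rng.laplace(0.0, 0.5 + np.abs(c))      # width grows with |c|

# Baseline: one fixed Laplace distribution for all values.
mu0 = np.median(x)
bits_fixed = laplace_bits(x, mu0, np.mean(np.abs(x - mu0)))

# Step 1: least-squares prediction of the center from the context.
C = np.stack([np.ones(n), c], axis=1)
mu = C @ np.linalg.lstsq(C, x, rcond=None)[0]
bits_center = laplace_bits(x, mu, np.mean(np.abs(x - mu)))

# Step 2: least-squares prediction of the width, regressing |residual| on context features.
W = np.stack([np.ones(n), np.abs(c)], axis=1)
b = np.clip(W @ np.linalg.lstsq(W, np.abs(x - mu), rcond=None)[0], 1e-3, None)
bits_width = laplace_bits(x, mu, b)

print(bits_fixed, bits_center, bits_width)
```

Regressing the absolute residual works because the mean absolute deviation of a Laplace distribution equals its scale parameter; the clip only guards against non-positive predicted widths.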
-
Nearly accurate solutions for Ising-like models using Maximal Entropy Random Walk
Authors:
Jarek Duda
Abstract:
While one-dimensional Markov processes are well understood, going to higher dimensions there are only a few analytically solved Ising-like models, in practice requiring to use relatively costly, uncontrollable and inaccurate Monte-Carlo methods. There is discussed analytical approach for e.g. $width\times \infty$ approximation of lattice, also exploiting Hammersley-Clifford theorem to generate random Gibbs/Markov field through scanning line-by-line using local statistical model as in lossless image compression. While its conditional distributions could be found with Monte-Carlo methods, there is discussed use of Maximal Entropy Random Walk (MERW) to calculate them from approximation of lattice as infinite in one direction and finite in the remaining. Specifically, in the finite directions there is built alphabet of all patterns, then transition matrix containing energy for all pairs of such patterns is built, from its dominant eigenvector getting probability distribution of pairs of patterns in Boltzmann distribution of their infinite sequences, which can be translated into local statistical model for line-by-line scan. Such inexpensive models, requiring seconds on a laptop for attached implementation and directly providing probability distributions of patterns, were tested for mean entropy and energy per node, getting maximal $\approx 0.02$ error from analytical solution near critical point, which quickly improves to extremely accurate e.g. $\approx 10^{-10}$ error for $J\approx 0.2$.
Submitted 19 April, 2021; v1 submitted 31 December, 2019;
originally announced December 2019.
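The pattern-alphabet construction described above can be sketched for a narrow strip. Assumptions of this example: spins $\pm 1$, open boundaries across the strip, Boltzmann weight $\exp(J s t)$ per bond with temperature absorbed into $J$, and bonds inside a column split evenly between neighboring matrix entries so that the matrix stays symmetric. The MERW transition probabilities are taken as $S_{ab} = M_{ab} ψ_b / (λ ψ_a)$ and the pattern distribution as $ψ_a^2$. The function name is hypothetical.

```python
import itertools
import numpy as np

def strip_model(width, J):
    """Dominant eigenvalue, MERW transition matrix and pattern probabilities for a width x infinity strip."""
    patterns = list(itertools.product([-1, 1], repeat=width))

    def inner(a):                               # bonds inside one column
        return sum(a[i] * a[i + 1] for i in range(width - 1))

    M = np.array([[np.exp(J * (0.5 * inner(a) + 0.5 * inner(b)
                               + sum(s * t for s, t in zip(a, b))))
                   for b in patterns] for a in patterns])
    vals, vecs = np.linalg.eigh(M)              # M is symmetric
    lam, psi = vals[-1], np.abs(vecs[:, -1])    # dominant eigenpair (positive eigenvector)
    S = M * psi[None, :] / (lam * psi[:, None])
    pattern_probs = psi ** 2 / np.sum(psi ** 2)
    return lam, S, pattern_probs

lam, S, probs = strip_model(width=3, J=0.3)
print(np.log(lam) / 3, probs.round(4))
```

The number of patterns grows as $2^{width}$, so this direct form is only practical for narrow strips; $\ln(λ)/width$ gives the log-partition function per node in this approximation.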
-
Modelling bid-ask spread conditional distributions using hierarchical correlation reconstruction
Authors:
Jarosław Duda,
Robert Syrek,
Henryk Gurgul
Abstract:
While we would like to predict exact values, the available incomplete information is rarely sufficient - usually allowing only prediction of conditional probability distributions. This article discusses the hierarchical correlation reconstruction (HCR) methodology for such prediction on the example of usually unavailable bid-ask spreads, predicted from more accessible data like closing price, volume, high/low price, returns. In the HCR methodology we first normalize marginal distributions to nearly uniform, as in copula theory. Then we model (joint) densities as linear combinations of orthonormal polynomials, obtaining their decomposition into (mixed) moments. Then we model each moment of the predicted variable (separately) as a linear combination of mixed moments of the known variables using least squares linear regression - getting an accurate description with interpretable coefficients describing linear relations between moments. Combining such predicted moments, we get the predicted density as a polynomial, for which we can e.g. calculate the expected value, but also the variance to evaluate uncertainty of such a prediction, or we can use the entire distribution e.g. for more accurate further calculations or for generating random values. 10-fold cross-validation log-likelihood tests were performed for 22 DAX companies, leading to very accurate predictions, especially when using individual models for each company, as large differences were found between their behaviors. An additional advantage of the discussed methodology is that it is computationally inexpensive: finding and evaluating a model with hundreds of parameters and thousands of data points takes a second on a laptop.
Submitted 4 November, 2019;
originally announced November 2019.
-
SGD momentum optimizer with step estimation by online parabola model
Authors:
Jarek Duda
Abstract:
In stochastic gradient descent, especially for neural network training, there are currently dominating first order methods: not modeling local distance to minimum. This information required for optimal step size is provided by second order methods, however, they have many difficulties, starting with full Hessian having square of dimension number of coefficients.
This article proposes a minimal step from successful first order momentum method toward second order: online parabola modelling in just a single direction: normalized $\hat{v}$ from momentum method. It is done by estimating linear trend of gradients $\vec{g}=\nabla F(\vec{θ})$ in $\hat{v}$ direction: such that $g(\vec{θ}_\bot+θ\hat{v})\approx λ(θ-p)$ for $θ= \vec{θ}\cdot \hat{v}$, $g= \vec{g}\cdot \hat{v}$, $\vec{θ}_\bot=\vec{θ}-θ\hat{v}$. Using linear regression, $λ$, $p$ are MSE estimated by just updating four averages (of $g$, $θ$, $gθ$, $θ^2$) in the considered direction. Exponential moving averages allow here for inexpensive online estimation, weakening contribution of the old gradients. Controlling sign of curvature $λ$, we can repel from saddles in contrast to attraction in standard Newton method. In the remaining directions: not considered in second order model, we can simultaneously perform e.g. gradient descent.
There is also discussed its learning rate approximation as $μ=σ_θ/ σ_g$, allowing e.g. for adaptive SGD - with learning rate separately optimized (2nd order) for each parameter.
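The estimation of $λ$ and $p$ from four averages can be sketched as below. This is only an illustration under stated assumptions, not the author's implementation: the class name `OnlineParabola`, the decay `beta`, and the bias correction by the accumulated weight are hypothetical choices. It uses the standard least-squares formulas for fitting $g \approx λ(θ-p)$: slope $λ = \mathrm{cov}(θ,g)/\mathrm{var}(θ)$ and $p = \bar{θ} - \bar{g}/λ$.

```python
class OnlineParabola:
    """Exponentially weighted linear regression of gradient g on position theta
    along one direction (hypothetical helper, not taken from the paper)."""

    def __init__(self, beta=0.9):
        self.beta = beta      # decay of old observations (assumed value)
        self.weight = 0.0     # accumulated weight, used for bias correction
        self.avg = {"g": 0.0, "t": 0.0, "gt": 0.0, "tt": 0.0}

    def update(self, theta, g):
        # Update the four exponential moving averages: g, theta, g*theta, theta^2.
        b = self.beta
        self.weight = b * self.weight + (1.0 - b)
        for key, value in (("g", g), ("t", theta),
                           ("gt", g * theta), ("tt", theta * theta)):
            self.avg[key] = b * self.avg[key] + (1.0 - b) * value

    def estimate(self):
        # Normalize the averages, then apply weighted least squares for
        # g ~ lam * (theta - p): slope lam, zero crossing p.
        m = {k: v / self.weight for k, v in self.avg.items()}
        var_t = m["tt"] - m["t"] ** 2
        lam = (m["gt"] - m["g"] * m["t"]) / var_t
        p = m["t"] - m["g"] / lam
        return lam, p


# Noise-free toy objective F(theta) = 1.5 * (theta - 2)^2, so g = 3 * (theta - 2).
model = OnlineParabola(beta=0.9)
for theta in (0.0, 0.5, 1.0, 1.5):
    model.update(theta, 3.0 * (theta - 2.0))
lam, p = model.estimate()
```

On this noise-free toy input the fit recovers the curvature and the position of the minimum; with noisy gradients the estimates would only approximate them, and at least two distinct $θ$ values are needed for the variance to be nonzero.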
Submitted 9 December, 2019; v1 submitted 16 July, 2019;
originally announced July 2019.
-
Parametric context adaptive Laplace distribution for multimedia compression
Authors:
Jarek Duda
Abstract:
Data compression often subtracts prediction and encodes the difference (residue) e.g. assuming Laplace distribution, for example for images, videos, audio, or numerical data. Its performance is strongly dependent on the proper choice of width (scale parameter) of this parametric distribution, can be improved if optimizing it based on local situation like context. For example in popular LOCO-I \cite{loco} (JPEG-LS) lossless image compressor there is used 3 dimensional context quantized into 365 discrete possibilities treated independently. This article discusses inexpensive approaches for exploiting their dependencies with autoregressive ARCH-like context dependent models for parameters of parametric distribution for residue, also evolving in time for adaptive case. For example tested such 4 or 11 parameter models turned out to provide similar performance as 365 parameter LOCO-I model for 48 tested images. Beside smaller headers, such reduction of number of parameters can lead to better generalization. In contrast to context quantization approaches, parameterized models also allow to directly use higher dimensional contexts, for example using information from all 3 color channels, further pixels, some additional region classifiers, or from interleaving multi-scale scanning - for which there is proposed Haar upscale scan combining advantages of Haar wavelets with possibility of scanning exploiting local contexts.
Submitted 14 October, 2019; v1 submitted 28 May, 2019;
originally announced June 2019.
-
Toroidal AutoEncoder
Authors:
Maciej Mikulski,
Jaroslaw Duda
Abstract:
Enforcing distributions of latent variables in neural networks is an active subject. It is vital in all kinds of generative models, where we want to be able to interpolate between points in the latent space, or sample from it. Modern generative AutoEncoders (AE) like WAE, SWAE, CWAE add a regularizer to the standard (deterministic) AE, which allows to enforce Gaussian distribution in the latent space. Enforcing different distributions, especially topologically nontrivial, might bring some new interesting possibilities, but this subject seems unexplored so far.
This article proposes a new approach to enforce uniform distribution on d-dimensional torus. We introduce a circular spring loss, which enforces minibatch points to be equally spaced and satisfy cyclic boundary conditions.
As example of application we propose multiple-path morphing. Minimal distance geodesic between two points in uniform distribution on latent space of angles becomes a line, however, torus topology allows us to choose such lines in alternative ways, going through different edges of $[-π,π]^d$.
Further applications to explore can be for example trying to learn real-life topologically nontrivial spaces of features, like rotations to automatically recognize 2D rotation of an object in picture by training on relative angles, or even 3D rotations by additionally using spherical features - this way morphing should be close to object rotation.
Submitted 28 March, 2019;
originally announced March 2019.
-
Improving SGD convergence by online linear regression of gradients in multiple statistically relevant directions
Authors:
Jarek Duda
Abstract:
Deep neural networks are usually trained with stochastic gradient descent (SGD), which minimizes objective function using very rough approximations of gradient, only averaging to the real gradient. Standard approaches like momentum or ADAM only consider a single direction, and do not try to model distance from extremum - neglecting valuable information from calculated sequence of gradients, often stagnating in some suboptimal plateau. Second order methods could exploit these missed opportunities, however, beside suffering from very large cost and numerical instabilities, many of them attract to suboptimal points like saddles due to negligence of signs of curvatures (as eigenvalues of Hessian).
Saddle-free Newton method is a rare example of addressing this issue - it changes saddle attraction into repulsion, and was shown to provide essential improvement for final value this way. However, it neglects noise while modelling second order behavior, focuses on Krylov subspace for numerical reasons, and requires costly eigendecomposition.
Maintaining SFN advantages, there are proposed inexpensive ways for exploiting these opportunities. Second order behavior is linear dependence of first derivative - we can optimally estimate it from sequence of noisy gradients with least square linear regression, in online setting here: with weakening weights of old gradients. Statistically relevant subspace is suggested by PCA of recent noisy gradients - in online setting it can be made by slowly rotating considered directions toward new gradients, gradually replacing old directions with recent statistically relevant. Eigendecomposition can be also performed online: with regularly performed step of QR method to maintain diagonal Hessian. Outside the second order modeled subspace we can simultaneously perform gradient descent.
Submitted 13 March, 2023; v1 submitted 31 January, 2019;
originally announced January 2019.
-
Credibility evaluation of income data with hierarchical correlation reconstruction
Authors:
Jarek Duda,
Adam Szulc
Abstract:
In situations like tax declarations or analyses of household budgets we would like to automatically evaluate credibility of exogenous variable (declared income) based on some available (endogenous) variables - we want to build a model and train it on provided data sample to predict (conditional) probability distribution of exogenous variable based on values of endogenous variables. Using Polish household budget survey data there will be discussed simple and systematic adaptation of hierarchical correlation reconstruction (HCR) technique for this purpose, which allows to combine interpretability of statistics with modelling of complex densities like in machine learning. For credibility evaluation we normalize marginal distribution of predicted variable to $ρ\approx 1$ uniform distribution on $[0,1]$ using empirical distribution function $(x=EDF(y)\in[0,1])$, then model density of its conditional distribution $(\textrm{Pr}(x_0|x_1 x_2\ldots))$ as a linear combination of orthonormal polynomials using coefficients modelled as linear combinations of features of the remaining variables. These coefficients can be calculated independently, have similar interpretation as cumulants, additionally allowing to directly reconstruct probability distribution. Values corresponding to high predicted density can be considered as credible, while low density suggests disagreement with statistics of data sample, for example to mark for manual verification a chosen percentage of data points evaluated as the least credible.
Submitted 21 April, 2019; v1 submitted 19 December, 2018;
originally announced December 2018.
-
Gaussian AutoEncoder
Authors:
Jarek Duda
Abstract:
Generative AutoEncoders require a chosen probability distribution in latent space, usually multivariate Gaussian. The original Variational AutoEncoder (VAE) uses randomness in encoder - causing problematic distortion, and overlaps in latent space for distinct inputs. It turned out unnecessary: we can instead use deterministic encoder with additional regularizer to ensure that sample distribution in latent space is close to the required. The original approach (WAE) uses Wasserstein metric, what required comparing with random sample and using an arbitrarily chosen kernel. Later CWAE finally derived a non-random analytic formula by averaging $L_2$ distance of Gaussian-smoothened sample over all 1D projections. However, these arbitrarily chosen regularizers do not lead to Gaussian distribution.
This article proposes approach for regularizers directly optimizing agreement between empirical distribution function and its desired CDF for chosen properties, for example radii and distances for Gaussian distribution, or coordinate-wise, to directly attract this distribution in latent space of AutoEncoder. We can also attract different distributions with this general approach, for example latent space uniform distribution on $[0,1]^D$ hypercube or torus would allow for data compression without entropy coding, increased density near codewords would optimize for the required quantization.
Submitted 14 January, 2019; v1 submitted 12 November, 2018;
originally announced November 2018.
-
Exploiting statistical dependencies of time series with hierarchical correlation reconstruction
Authors:
Jarek Duda
Abstract:
While we are usually focused on forecasting future values of time series, it is often valuable to additionally predict their entire probability distributions, e.g. to evaluate risk, Monte Carlo simulations. On example of time series of $\approx$ 30000 Dow Jones Industrial Averages, there will be presented application of hierarchical correlation reconstruction for this purpose: MSE estimating polynomial as joint density for (current value, context), where context is for example a few previous values. Then substituting the currently observed context and normalizing density to 1, we get predicted probability distribution for the current value. In contrast to standard machine learning approaches like neural networks, optimal polynomial coefficients here have inexpensive direct formula, have controllable accuracy, are unique and independently calculated, each has a specific cumulant-like interpretation, and such approximation can asymptotically approach complete description of any real joint distribution - providing universal tool to quantitatively describe and exploit statistical dependencies in time series, systematically enhancing ARMA/ARCH-like approaches, also based on different distributions than Gaussian which turns out improper for daily log returns. There is also discussed application for non-stationary time series like calculating linear time trend, or adapting coefficients to local statistical behavior.
Submitted 23 January, 2019; v1 submitted 11 July, 2018;
originally announced July 2018.
-
Hierarchical correlation reconstruction with missing data, for example for biology-inspired neuron
Authors:
Jarek Duda
Abstract:
Machine learning often needs to model density from a multidimensional data sample, including correlations between coordinates. Additionally, we often have missing data case: that data points can miss values for some of coordinates. This article adapts rapid parametric density estimation approach for this purpose: modelling density as a linear combination of orthonormal functions, for which $L^2$ optimization says that (independently) estimated coefficient for a given function is just average over the sample of value of this function. Hierarchical correlation reconstruction first models probability density for each separate coordinate using all its appearances in data sample, then adds corrections from independently modelled pairwise correlations using all samples having both coordinates, and so on independently adding correlations for growing numbers of variables using often decreasing evidence in data sample. A basic application of such modelled multidimensional density can be imputation of missing coordinates: by inserting known coordinates to the density, and taking expected values for the missing coordinates, or even their entire joint probability distribution. Presented method can be compared with cascade correlations approach, offering several advantages in flexibility and accuracy. It can be also used as artificial neuron: maximizing prediction capabilities for only local behavior - modelling and predicting local connections.
Submitted 27 May, 2018; v1 submitted 17 April, 2018;
originally announced April 2018.
-
Polynomial-based rotation invariant features
Authors:
Jarek Duda
Abstract:
One of basic difficulties of machine learning is handling unknown rotations of objects, for example in image recognition. A related problem is evaluation of similarity of shapes, for example of two chemical molecules, for which direct approach requires costly pairwise rotation alignment and comparison. Rotation invariants are useful tools for such purposes, allowing to extract features describing shape up to rotation, which can be used for example to search for similar rotated patterns, or fast evaluation of similarity of shapes e.g. for virtual screening, or machine learning including features directly describing shape. A standard approach are rotationally invariant cylindrical or spherical harmonics, which can be seen as based on polynomials on sphere, however, they provide very few invariants - only one per degree of polynomial. There will be discussed a general approach to construct arbitrarily large sets of rotation invariants of polynomials, for degree $D$ in $\mathbb{R}^n$ up to $O(n^D)$ independent invariants instead of $O(D)$ offered by standard approaches, possibly also a complete set: providing not only necessary, but also sufficient condition for differing only by rotation (and reflectional symmetry).
Submitted 3 January, 2018;
originally announced January 2018.
-
Improving Pyramid Vector Quantizer with power projection
Authors:
Jarek Duda
Abstract:
Pyramid Vector Quantizer (PVQ) is a promising technique especially for multimedia data compression, already used in Opus audio codec and considered for AV1 video codec. It quantizes vectors from Euclidean unit sphere by first projecting them to $L^1$ norm unit sphere, then quantizing and encoding there. This paper shows that the used standard radial projection is suboptimal and proposes to tune its deformations by using parameterized power projection: $x\to x^p/\|x^p\|$ instead, where the optimized power $p$ is applied coordinate-wise, getting usually $\geq 0.5\, dB$ improvement comparing to radial projection.
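The two projections named above can be compared with a short sketch. It rests on assumptions not stated in the abstract: the norm in $x^p/\|x^p\|$ is taken to be the $L^1$ norm (because the target is the $L^1$ unit sphere), and the coordinate-wise power is applied to magnitudes with signs preserved. The function names are hypothetical.

```python
import numpy as np


def radial_projection(x):
    """Standard radial projection onto the L1 unit sphere: x / ||x||_1."""
    x = np.asarray(x, dtype=float)
    return x / np.sum(np.abs(x))


def power_projection(x, p):
    """Power projection sketch: coordinate-wise sign(x) * |x|**p, then L1 normalization.
    Sign handling and the choice of the L1 norm are assumptions of this example."""
    x = np.asarray(x, dtype=float)
    y = np.sign(x) * np.abs(x) ** p
    return y / np.sum(np.abs(y))


# A vector on the Euclidean unit sphere, projected both ways.
x = np.array([0.6, -0.8, 0.0])
radial = radial_projection(x)
powered = power_projection(x, 0.8)
```

With $p=1$ the power projection reduces to the radial one, so $p$ acts as a single tunable deformation parameter around the standard choice.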
Submitted 3 May, 2017;
originally announced May 2017.
-
P?=NP as minimization of degree 4 polynomial, integration or Grassmann number problem, and new graph isomorphism problem approaches
Authors:
Jarek Duda
Abstract:
While the P vs NP problem is mainly approached from the point of view of discrete mathematics, this paper proposes reformulations into the field of abstract algebra, geometry, Fourier analysis and of continuous global optimization - whose advanced tools might bring new perspectives and approaches for this question. The first one is equivalence of satisfaction of 3-SAT problem with the question of reaching zero of a nonnegative degree 4 multivariate polynomial (sum of squares), what could be tested from the perspective of algebra by using discriminant. It could be also approached as a continuous global optimization problem inside $[0,1]^n$, for example in physical realizations like adiabatic quantum computers. However, the number of local minima usually grows exponentially. Reducing to degree 2 polynomial plus constraints of being in $\{0,1\}^n$, we get geometric formulations as the question if plane or sphere intersects with $\{0,1\}^n$. There will be also presented some non-standard perspectives for the Subset-Sum, like through convergence of a series, or zeroing of $\int_0^{2π} \prod_i \cos(\varphi k_i) d\varphi $ Fourier-type integral for some natural $k_i$. The last discussed approach is using anti-commuting Grassmann numbers $θ_i$, making $(A \cdot \textrm{diag}(θ_i))^n$ nonzero only if $A$ has a Hamilton cycle. Hence, the P$\ne$NP assumption implies exponential growth of matrix representation of Grassmann numbers. There will be also discussed a promising-looking algebraic/geometric approach to the graph isomorphism problem -- tested to successfully distinguish strongly regular graphs with up to 29 vertices.
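The Fourier-type integral can be checked numerically on small inputs. The sketch below relies on the standard product-to-sum identity $\prod_{i=1}^n \cos(\varphi k_i) = 2^{-n}\sum_{s\in\{-1,1\}^n} \cos\big(\varphi \sum_i s_i k_i\big)$, which gives $\int_0^{2π}\prod_i \cos(\varphi k_i)\,d\varphi = 2π\, 2^{-n}\,\#\{s : \sum_i s_i k_i = 0\}$, so the integral vanishes exactly when no signed combination of the $k_i$ sums to zero. How this maps onto the paper's exact Subset-Sum formulation is an assumption here, and the function names are hypothetical.

```python
import itertools

import numpy as np


def signed_zero_sums(ks):
    """Brute-force count of sign vectors s in {-1, +1}^n with sum_i s_i * k_i == 0."""
    return sum(
        1
        for signs in itertools.product((1, -1), repeat=len(ks))
        if sum(s * k for s, k in zip(signs, ks)) == 0
    )


def cosine_integral(ks):
    """Integral over [0, 2*pi] of prod_i cos(phi * k_i) for natural numbers k_i.
    The integrand is a trigonometric polynomial of degree sum(ks), so an
    equally spaced rule with sum(ks) + 1 nodes integrates it exactly."""
    n_nodes = sum(ks) + 1
    phi = 2.0 * np.pi * np.arange(n_nodes) / n_nodes
    values = np.prod(np.cos(np.outer(phi, ks)), axis=1)
    return 2.0 * np.pi * values.mean()


balanced = [1, 2, 3]     # 1 + 2 - 3 = 0, so the integral is nonzero
unbalanced = [1, 2, 4]   # no signed combination sums to zero
value_balanced = cosine_integral(balanced)
value_unbalanced = cosine_integral(unbalanced)
```

The brute-force counter is exponential in $n$ and serves only to confirm the identity on toy inputs.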
Submitted 24 October, 2022; v1 submitted 13 March, 2017;
originally announced March 2017.
-
Rapid parametric density estimation
Authors:
Jarek Duda
Abstract:
Parametric density estimation, for example as Gaussian distribution, is the base of the field of statistics. Machine learning requires inexpensive estimation of much more complex densities, and the basic approach is relatively costly maximum likelihood estimation (MLE). There will be discussed inexpensive density estimation, for example literally fitting a polynomial (or Fourier series) to the sample, whose coefficients are calculated by just averaging monomials (or sine/cosine) over the sample. Another discussed basic application is fitting distortion to some standard distribution like Gaussian - analogously to ICA, but additionally allowing to reconstruct the disturbed density. Finally, by using weighted average, it can be also applied for estimation of non-probabilistic densities, like modelling mass distribution, or for various clustering problems by using negative (or complex) weights: fitting a function whose sign (or argument) determines clusters. The estimated parameters are approaching the optimal values with error dropping like $1/\sqrt{n}$, where $n$ is the sample size.
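The averaging idea can be illustrated in one dimension. The sketch below makes assumptions beyond the abstract: it uses rescaled Legendre polynomials $f_j(x)=\sqrt{2j+1}\,P_j(2x-1)$ as an orthonormal basis on $[0,1]$, so each coefficient is simply the sample mean of $f_j$, and the function names and the toy sample are hypothetical.

```python
import numpy as np
from numpy.polynomial import legendre


def basis(x, degree):
    """Rows f_0..f_degree of orthonormal polynomials on [0, 1]:
    f_j(x) = sqrt(2j + 1) * P_j(2x - 1), with P_j the Legendre polynomial."""
    x = np.asarray(x, dtype=float)
    return np.stack([
        np.sqrt(2 * j + 1) * legendre.legval(2.0 * x - 1.0, [0] * j + [1])
        for j in range(degree + 1)
    ])


def fit_density(sample, degree):
    """Coefficient a_j is the average of f_j over the sample."""
    return basis(sample, degree).mean(axis=1)


def density(x, coeffs):
    """Evaluate rho(x) = sum_j a_j f_j(x); it may dip below zero for small samples."""
    return coeffs @ basis(x, len(coeffs) - 1)


rng = np.random.default_rng(0)
sample = rng.beta(2.0, 5.0, size=5000)   # toy data already supported on [0, 1]
coeffs = fit_density(sample, degree=4)
grid = (np.arange(20000) + 0.5) / 20000  # midpoints for a numerical integral
total_mass = density(grid, coeffs).mean()
```

Because $f_0=1$ and the higher basis functions integrate to zero, the fitted polynomial integrates to one regardless of the sample, although it is not guaranteed to be nonnegative.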
Submitted 20 February, 2017; v1 submitted 7 February, 2017;
originally announced February 2017.
-
Lightweight compression with encryption based on Asymmetric Numeral Systems
Authors:
Jarek Duda,
Marcin Niemiec
Abstract:
Data compression combined with effective encryption is a common requirement of data storage and transmission. Low cost of these operations is often a high priority in order to increase transmission speed and reduce power usage. This requirement is crucial for battery-powered devices with limited resources, such as autonomous remote sensors or implants. Well-known and popular encryption techniques are frequently too expensive. This problem is on the increase as machine-to-machine communication and the Internet of Things are becoming a reality. Therefore, there is growing demand for finding trade-offs between security, cost and performance in lightweight cryptography. This article discusses Asymmetric Numeral Systems -- an innovative approach to entropy coding which can be used for compression with encryption. It provides compression ratio comparable with arithmetic coding at similar speed as Huffman coding, hence, this coding is starting to replace them in new compressors. Additionally, by perturbing its coding tables, the Asymmetric Numeral System makes it possible to simultaneously encrypt the encoded message at nearly no additional cost. The article introduces this approach and analyzes its security level. The basic application is reducing the number of rounds of some cipher used on ANS-compressed data, or completely removing additional encryption layer if reaching a satisfactory protection level.
Submitted 14 December, 2016;
originally announced December 2016.
-
Practical estimation of rotation distance and induced partial order for binary trees
Authors:
Jarek Duda
Abstract:
Tree rotations (left and right) are basic local deformations allowing to transform between two unlabeled binary trees of the same size. Hence, there is a natural problem of practically finding such transformation path with low number of rotations, the optimal minimal number is called the rotation distance. Such distance could be used for instance to quantify similarity between two trees for various machine learning problems, for example to compare hierarchical clusterings or arbitrarily chosen spanning trees of two graphs, like in SMILES notation popular for describing chemical molecules.
There will be presented inexpensive practical greedy algorithm for finding a short rotation path, optimality of which has still to be determined. It uses introduced partial order for binary trees of the same size: $t_1 \leq t_2$ iff $t_2$ can be obtained from $t_1$ by a sequence of only right rotations. Intuitively, the shortest rotation path should go through the least upper bound or the greatest lower bound for this partial order. The algorithm finds a path through candidates for both points in representation of binary tree as stack graph: describing evolution of content of stack while processing a formula described by a given binary tree. The article is accompanied with Mathematica implementation of all used procedures (Appendix).
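The partial order defined above can be explored by brute force on small trees. The sketch below is not the paper's greedy algorithm or its stack-graph representation: it is an exhaustive search that directly applies the definition, with a hypothetical encoding where a leaf is `None` and an internal node is a pair `(left, right)`.

```python
def right_rotations(tree):
    """Yield every tree obtained from `tree` by one right rotation at some node.
    A right rotation maps ((a, b), c) to (a, (b, c))."""
    if tree is None:
        return
    left, right = tree
    if left is not None:
        a, b = left
        yield (a, (b, right))
    for new_left in right_rotations(left):
        yield (new_left, right)
    for new_right in right_rotations(right):
        yield (left, new_right)


def leq(t1, t2):
    """True iff t2 is reachable from t1 using only right rotations.
    Exhaustive search, feasible only for small trees."""
    seen, stack = {t1}, [t1]
    while stack:
        current = stack.pop()
        if current == t2:
            return True
        for nxt in right_rotations(current):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return False


# Trees with three internal nodes: fully left-leaning and fully right-leaning.
left_comb = (((None, None), None), None)
right_comb = (None, (None, (None, None)))
balanced = ((None, None), (None, None))
```

In this encoding the left comb is below every tree of the same size and the right comb is above every one, which gives a quick sanity check of the order.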
Submitted 19 October, 2016;
originally announced October 2016.
-
Nonuniform probability modulation for reducing energy consumption of remote sensors
Authors:
Jarek Duda
Abstract:
One of the main goals of 5G wireless telecommunication technology is improving energy efficiency, especially of remote sensors which should be able for example to transmit on average 1 bit/s for 10 years from a single AAA battery. There will be discussed using modulation with nonuniform probability distribution of symbols for improving energy efficiency of transmission at cost of reduced throughput. While the zero-signal (silence) has zero energy cost to emit, it can carry information if used alongside other symbols. If used more frequently than others, for example for majority of time slots or OFDM subcarriers, the number of bits transmitted per energy unit can be significantly increased. For example for hexagonal modulation and zero noise, this amount of bits per energy unit can be doubled by reducing throughput 2.7 times, thanks to using the zero-signal with probability $\approx$ 0.84. There will be discussed models and methods for such nonuniform probability modulations (NPM).
Submitted 15 August, 2016;
originally announced August 2016.
-
Distortion-Resistant Hashing for rapid search of similar DNA subsequence
Authors:
Jarek Duda
Abstract:
One of the basic tasks in bioinformatics is localizing a short subsequence $S$, read while sequencing, in a long reference sequence $R$, like the human genome. A natural rapid approach would be finding a hash value for $S$ and comparing it with a prepared database of hash values for each of length $|S|$ subsequences of $R$. The problem with such approach is that it would only spot a perfect match, while in reality there are lots of small changes: substitutions, deletions and insertions.
This issue could be repaired if having a hash function designed to tolerate some small distortion according to an alignment metric (like Needleman-Wunsch): designed to make that two similar sequences should most likely give the same hash value. This paper discusses construction of Distortion-Resistant Hashing (DRH) to generate such fingerprints for rapid search of similar subsequences. The proposed approach is based on the rate distortion theory: in a nearly uniform subset of length $|S|$ sequences, the hash value represents the closest sequence to $S$. This gives some control of the distance of collisions: sequences having the same hash value.
Submitted 18 February, 2016;
originally announced February 2016.
-
Fundamental Bounds and Approaches to Sequence Reconstruction from Nanopore Sequencers
Authors:
Jarek Duda,
Wojciech Szpankowski,
Ananth Grama
Abstract:
Nanopore sequencers are emerging as promising new platforms for high-throughput sequencing. As with other technologies, sequencer errors pose a major challenge for their effective use. In this paper, we present a novel information theoretic analysis of the impact of insertion-deletion (indel) errors in nanopore sequencers. In particular, we consider the following problems: (i) for given indel error characteristics and rate, what is the probability of accurate reconstruction as a function of sequence length; (ii) what is the number of `typical' sequences within the distortion bound induced by indel errors; (iii) using replicated extrusion (the process of passing a DNA strand through the nanopore), what is the number of replicas needed to reduce the distortion bound so that only one typical sequence exists within the distortion bound.
Our results provide a number of important insights: (i) the maximum length of a sequence that can be accurately reconstructed in the presence of indel and substitution errors is relatively small; (ii) the number of typical sequences within the distortion bound is large; and (iii) replicated extrusion is an effective technique for unique reconstruction. In particular, we show that the number of replicas is a slow function (logarithmic) of sequence length -- implying that through replicated extrusion, we can sequence large reads using nanopore sequencers. Our model considers indel and substitution errors separately. In this sense, it can be viewed as providing (tight) bounds on reconstruction lengths and repetitions for accurate reconstruction when the two error modes are considered in a single model.
Submitted 11 January, 2016;
originally announced January 2016.
-
Designing dedicated data compression for physics experiments within FPGA already used for data acquisition
Authors:
Jarek Duda,
Grzegorz Korcyl
Abstract:
Physics experiments produce enormous amounts of raw data, counted in petabytes per day. Hence, there is a large effort to reduce this amount, mainly by using filters. The situation can be improved by additionally applying data compression techniques: removing redundancy and optimally encoding the actual information. Preferably, both filtering and data compression should fit in the FPGA already used for data acquisition - reducing requirements of both data storage and networking architecture.
We will briefly explain and discuss some basic techniques, for better focus applied to the design of a dedicated data compression system based on sample data from a prototype of a tracking detector: 10000 events for 48 channels. We will focus on the time data here, which after neglecting the headers and applying data filtering, requires on average 1170 bits/event using the current coding. Encoding relative times (differences) and grouping data by channels reduces this number to 798 bits/channel, still using fixed length coding: a fixed number of bits used for a given value. Using variable length Huffman coding to encode the numbers of digital pulses for a channel and the most significant bits of values (simple binning) further reduces this number to 552 bits/event. Using adaptive binning (denser for frequent values) and an accurate entropy coder, we get further down to 455 bits/event - this option can easily fit in unused resources of the FPGA currently used for data acquisition. Finally, using separate probability distributions for different channels, which could be done by a software compressor, leads to 437 bits/event, which is 2.67 times less than the original 1170 bits/event.
Submitted 3 November, 2015;
originally announced November 2015.
-
Normalized rotation shape descriptors and lossy compression of molecular shape
Authors:
Jarek Duda
Abstract:
There is a common need to search molecular databases for compounds resembling some shape, which suggests similar biological activity when searching for new drugs. The large size of the databases requires fast methods for such initial screening, for example based on feature vectors constructed to fulfill the requirement that similar molecules should correspond to close vectors. Ultrafast Shape Recognition (USR) is a popular approach of this type. It uses vectors of 12 real numbers: the first 3 moments of distances from 4 emphasized points. These coordinates might contain unnecessary correlations and do not allow reconstructing the approximated shape. In contrast, spherical harmonic (SH) decomposition uses orthogonal coordinates, suggesting their independence and so a larger informational content of the feature vector. Usually, rotationally invariant SH descriptors are considered, which means discarding some essential information.
This article discusses a framework for descriptors with normalized rotation, for example by using principal component analysis (PCA-SH). As some of the most interesting molecules are ligands which have to slide into a protein, we will introduce descriptors optimized for such flat elongated shapes. Bent deformed cylinder (BDC) describes the molecule as a cylinder which was first bent, then deformed such that its cross-sections became ellipses of evolving shape. Legendre polynomials are used to describe the central axis of such a bent cylinder. Additional polynomials are used to define the evolution of the elliptic cross-section along the main axis. Bent cylindrical harmonics (BCH) will also be discussed, which use cross-sections described by cylindrical harmonics instead of ellipses. All these normalized rotation descriptors allow reconstructing (decoding) the approximated representation of the shape, hence they can also be used for lossy compression purposes.
Submitted 30 September, 2015;
originally announced September 2015.
-
Joint error correction enhancement of the fountain codes concept
Authors:
Jarek Duda
Abstract:
Fountain codes like LT or Raptor codes, also known as rateless erasure codes, allow encoding a message as some number of packets, such that any large enough subset of these packets is sufficient to fully reconstruct the message. This requires undamaged packets, while in real scenarios the packets which were not lost are usually damaged. Hence, an additional error correction layer is often required: adding some level of redundancy to each packet to be able to repair possible damage. This approach requires a priori knowledge of the final damage level of every packet - insufficient redundancy leads to packet loss, overprotection means a suboptimal channel rate. However, the sender may have inaccurate or even no a priori information about the final damage levels, for example in applications like broadcasting, degradation of a storage medium or damage of picture watermarking.
The Joint Reconstruction Codes (JRC) setting is introduced and discussed in this paper for the purpose of removing the need for a priori knowledge of the damage level, as well as the sub-optimality caused by overprotection and by discarding underprotected packets. It is obtained by combining both processes: reconstruction from multiple packets and forward error correction. The decoder combines the resultant informational content of all received packets according to their actual noise level, which can be estimated a posteriori individually for each packet. Assuming a binary symmetric channel (BSC) of $ε$ bit-flip probability, every potentially damaged bit carries $R_0(ε)=1-h_1(ε)$ bits of information, where $h_1$ is the Shannon entropy. The minimal requirement to fully reconstruct the message is that the sum of rates $R_0(ε)$ over all bits is at least the size of the message. We will discuss sequential decoding for the reconstruction purpose, whose statistical behavior can be estimated using Renyi entropy.
Submitted 18 August, 2015; v1 submitted 20 May, 2015;
originally announced May 2015.
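The minimal reconstruction requirement stated in the abstract above can be checked numerically. The following is a minimal sketch, assuming each received packet is described only by its length in bits and an estimated bit-flip probability; the function names and the example packet list are hypothetical and do not come from the paper.

```python
import math

def binary_entropy(eps: float) -> float:
    """Shannon entropy h_1(eps) of a bit flipped with probability eps, in bits."""
    if eps <= 0.0 or eps >= 1.0:
        return 0.0
    return -eps * math.log2(eps) - (1 - eps) * math.log2(1 - eps)

def bsc_rate(eps: float) -> float:
    """R_0(eps) = 1 - h_1(eps): information carried by one bit sent through a BSC."""
    return 1.0 - binary_entropy(eps)

def can_reconstruct(message_bits: int, packets) -> bool:
    """Necessary condition only: total carried information >= message size.

    packets is an iterable of (length_in_bits, estimated_bit_flip_probability).
    """
    total = sum(length * bsc_rate(eps) for length, eps in packets)
    return total >= message_bits

# Hypothetical example: three 1000-bit packets with different damage levels.
packets = [(1000, 0.0), (1000, 0.05), (1000, 0.11)]
print(sum(n * bsc_rate(e) for n, e in packets))  # total information in bits
print(can_reconstruct(2000, packets))
```

This only evaluates the information-theoretic bound; it says nothing about whether a practical decoder reaches it.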
-
Asymmetric numeral systems: entropy coding combining speed of Huffman coding with compression rate of arithmetic coding
Authors:
Jarek Duda
Abstract:
Modern data compression is mainly based on two approaches to entropy coding: Huffman (HC) and arithmetic/range coding (AC). The former is much faster, but approximates probabilities with powers of 2, usually leading to relatively low compression rates. The latter uses nearly exact probabilities - easily approaching the theoretical compression rate limit (Shannon entropy), but at a much larger computational cost.
Asymmetric numeral systems (ANS) is a new approach to accurate entropy coding, which allows ending this trade-off between speed and rate: the recent implementation [1] provides about $50\%$ faster decoding than HC for a 256 size alphabet, with compression rate similar to that provided by AC. This advantage is due to being simpler than AC: using a single natural number as the state, instead of two to represent a range. Besides simplifying renormalization, it allows putting the entire behavior for a given probability distribution into a relatively small table: defining an entropy coding automaton. The memory cost of such a table for a 256 size alphabet is a few kilobytes. There is large freedom in choosing a specific table - using a pseudorandom number generator initialized with a cryptographic key for this purpose allows simultaneously encrypting the data.
This article also introduces and discusses many other variants of this new entropy coding approach, which can provide direct alternatives for standard AC, for large alphabet range coding, or for approximated quasi arithmetic coding.
Submitted 6 January, 2014; v1 submitted 11 November, 2013;
originally announced November 2013.
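The idea of keeping the whole coder state in a single natural number can be illustrated with a toy range-variant step. This is a minimal sketch, not the implementation referenced in the abstract: it assumes integer symbol frequencies, relies on Python's arbitrary-precision integers instead of renormalization, and needs the message length to be known at decoding. The helper names are hypothetical.

```python
from itertools import accumulate

def build_tables(freqs):
    """Return sorted symbols, cumulative start of each symbol, and total frequency."""
    symbols = sorted(freqs)
    starts = [0] + list(accumulate(freqs[s] for s in symbols))
    return symbols, dict(zip(symbols, starts)), starts[-1]

def encode(message, freqs, state=1):
    """Fold a whole message into one natural number."""
    _, cum, total = build_tables(freqs)
    for s in message:
        f = freqs[s]
        # A symbol of frequency f grows the state by a factor of about total / f.
        state = (state // f) * total + cum[s] + (state % f)
    return state

def decode(state, n, freqs):
    """Recover n symbols; they come out in reverse order of encoding."""
    symbols, cum, total = build_tables(freqs)
    out = []
    for _ in range(n):
        slot = state % total
        # The symbol whose range [cum, cum + f) contains the slot.
        s = next(sym for sym in reversed(symbols) if cum[sym] <= slot)
        state = freqs[s] * (state // total) + slot - cum[s]
        out.append(s)
    out.reverse()
    return "".join(out), state

freqs = {"a": 3, "b": 1}          # hypothetical distribution: P(a)=3/4, P(b)=1/4
message = "abaaabaa"
x = encode(message, freqs)
print(x, x.bit_length())
print(decode(x, len(message), freqs))
```

Frequent symbols enlarge the state less than rare ones, which is where the compression comes from; a practical coder would add renormalization to keep the state in a fixed-size register.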
-
Embedding grayscale halftone pictures in QR Codes using Correction Trees
Authors:
Jarek Duda
Abstract:
Barcodes like QR Codes have brought encoded messages into our everyday life, which suggests attaching to them a second layer of information: directly available to a human receiver for informational or marketing purposes. We will discuss the general problem of using codes with chosen statistical constraints, for example reproducing a given grayscale picture using the halftone technique. If both sender and receiver know these constraints, the optimal capacity can be easily approached by an entropy coder. The problem is that this time only the sender knows them - we will refer to these scenarios as constrained coding. The Kuznetsov and Tsybakov problem, in which only the sender knows which bits are fixed, can be seen as a special case, surprisingly approaching the same capacity as if both sides knew the constraints. We will analyze Correction Trees to approach analogous capacity in the general case, using weaker, statistical constraints, which allows applying them to all bits. Finding a satisfying coding is similar to finding the proper correction in the error correction problem, but instead of a single ensured possibility, there are now statistically expected to be some. While in standard steganography we hide information in the least important bits, this time we create codes resembling a given picture - hiding information in the freedom of realizing grayness by black and white pixels using the halftone technique. We will also discuss combining this with error correction and application to the rate distortion problem.
Submitted 2 December, 2012; v1 submitted 7 November, 2012;
originally announced November 2012.
-
Optimal compression of hash-origin prefix trees
Authors:
Jarek Duda
Abstract:
There is a common problem of operating on hash values of elements of some database. This paper analyzes the informational content of this general task and how to practically approach the lower bounds found. A minimal prefix tree which distinguishes elements turns out to require asymptotically only about 2.77544 bits per element, while standard approaches use a few times more. When we are certain of working inside the database, the cost of distinguishability can be reduced further to about 2.33275 bits per element. Increasing the minimal depth of nodes to reduce the probability of false positives leads to a simple relation with the average depth of such a random tree, which is asymptotically larger by about 1.33275 bits than lg(n) of the perfect binary tree. This asymptotic case can also be seen as a way to optimally encode n large unordered numbers - saving lg(n!) bits of information about their ordering, which can be the major part of the contained information. This ability itself allows reducing memory requirements even to about 0.693 of that required by a Bloom filter for the same false positive probability.
Submitted 8 July, 2012; v1 submitted 20 June, 2012;
originally announced June 2012.
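Two of the quantities in the abstract above are easy to evaluate. The sketch below computes lg(n!), the information about ordering that an unordered encoding does not need to store, and compares the bits per element of an optimally tuned Bloom filter, $\lg(1/p)/\ln 2$, with the $\lg(1/p)$ lower bound for approximate membership. Their ratio is $\ln 2 \approx 0.693$; reading the abstract's 0.693 figure this way is an assumption, and the function names are hypothetical.

```python
import math

def log2_factorial(n: int) -> float:
    """lg(n!) computed through the log-gamma function to avoid huge integers."""
    return math.lgamma(n + 1) / math.log(2)

def bloom_bits_per_element(p: float) -> float:
    """Bits per element of a Bloom filter with the optimal number of hash functions."""
    return math.log2(1 / p) / math.log(2)

def lower_bound_bits_per_element(p: float) -> float:
    """Asymptotic lower bound for a set membership test with false positive rate p."""
    return math.log2(1 / p)

n = 1_000_000
print(log2_factorial(n) / n)  # ordering information per element, in bits

p = 0.01
ratio = lower_bound_bits_per_element(p) / bloom_bits_per_element(p)
print(ratio)  # ln 2, independent of p
```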
-
Correction Trees as an Alternative to Turbo Codes and Low Density Parity Check Codes
Authors:
Jarosław Duda,
Paweł Korus
Abstract:
The rapidly improving performance of modern hardware renders convolutional codes obsolete, and allows for the practical implementation of more sophisticated correction codes such as low density parity check (LDPC) and turbo codes (TC). Both are decoded by iterative algorithms, which require a disproportional computational effort for low channel noise. They are also unable to correct higher noise levels, still below the Shannon theoretical limit. In this paper, we discuss an enhanced version of a convolutional-like decoding paradigm which adopts very large spaces of possible system states, of the order of $2^{64}$. Under such conditions, the traditional convolution operation is rendered useless and needs to be replaced by a carefully designed state transition procedure. The size of the system state space completely changes the correction philosophy, as state collisions are virtually impossible and the decoding procedure becomes a correction tree. The proposed decoding algorithm is practically cost-free for low channel noise. As the channel noise approaches the Shannon limit, it is still possible to perform correction, although its cost increases to infinity. In many applications, the implemented decoder can essentially outperform both LDPC and TC. This paper describes the proposed correction paradigm and theoretically analyzes the asymptotic correction performance. The considered encoder and decoder were verified experimentally for the binary symmetric channel. The correction process remains practically cost-free for channel error rates below 0.05 and 0.13 for the 1/2 and 1/4 rate codes, respectively. For the considered resource limit, the output bit error rates reach the order of $10^{-3}$ for channel error rates 0.08 and 0.18. The proposed correction paradigm can be easily extended to other communication channels; the appropriate generalizations are also discussed in this study.
Submitted 24 May, 2012; v1 submitted 24 April, 2012;
originally announced April 2012.
-
Asymmetric numeral systems
Authors:
Jarek Duda
Abstract:
This paper presents a new approach to entropy coding: a family of generalizations of standard numeral systems, which are optimal for encoding sequences of equiprobable symbols, into asymmetric numeral systems - optimal for freely chosen probability distributions of symbols. It has some similarities to Range Coding, but instead of encoding a symbol by choosing a range, we spread these ranges uniformly over the whole interval. This leads to a simpler encoder - instead of using two states to define a range, we need only one. This approach is very universal - we can obtain anything from extremely precise encoding (ABS) to extremely fast encoding with the possibility of additionally encrypting the data (ANS). This encryption uses the key to initialize a random number generator, which is used to calculate the coding tables. Such preinitialized encryption has an additional advantage: it is resistant to brute force attack - to check a key we have to perform the whole initialization. There will also be presented an application to a new approach to error correction: after an error, in each step we have a chosen probability of observing that something was wrong. We can get near Shannon's limit for any noise level this way, with expected linear time of correction.
Submitted 21 May, 2009; v1 submitted 2 February, 2009;
originally announced February 2009.
-
Combinatorial invariants for graph isomorphism problem
Authors:
Jarek Duda
Abstract:
The presented approach calculates in polynomial time a large number of invariants for each vertex, which do not change under graph isomorphism and should fully determine the graph. An example is the number of closed paths of length k for a given starting vertex, which can be thought of as the diagonal terms of the k-th power of the adjacency matrix. For k=2 we get the degree of vertices invariant; higher k describe local topology more deeply. Now if two graphs are isomorphic, they have the same set of such vectors of invariants - we can sort these vectors lexicographically and compare them. If they agree, the permutations from sorting allow reconstructing the isomorphism. I'm presenting arguments that these invariants should fully determine the graph, but unfortunately I can't prove it at this moment. This approach can give hope that maybe P=NP - instead of checking all instances, we should perform arithmetic on these large numbers.
Submitted 19 May, 2008; v1 submitted 22 April, 2008;
originally announced April 2008.
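The invariants named in the abstract above, diagonal terms of powers of the adjacency matrix, can be computed in a few lines. This is a minimal sketch with hypothetical helper names, using plain nested lists. It only demonstrates the necessary direction: isomorphic graphs always yield the same sorted list of invariant vectors. Whether matching invariants imply isomorphism is exactly what the abstract leaves unproven, and the code does not assume it.

```python
def mat_mul(a, b):
    """Product of two square integer matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def closed_walk_invariants(adj, max_len):
    """Sorted per-vertex vectors (A^1[v][v], ..., A^max_len[v][v])."""
    n = len(adj)
    power = [[int(i == j) for j in range(n)] for i in range(n)]  # identity matrix
    per_vertex = [[] for _ in range(n)]
    for _ in range(max_len):
        power = mat_mul(power, adj)
        for v in range(n):
            per_vertex[v].append(power[v][v])
    # Sorting makes the result independent of vertex labelling.
    return sorted(tuple(vec) for vec in per_vertex)

path = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]        # path 0-1-2
relabelled = [[0, 1, 1], [1, 0, 0], [1, 0, 0]]  # same path, centre labelled 0
triangle = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]

print(closed_walk_invariants(path, 4) == closed_walk_invariants(relabelled, 4))
print(closed_walk_invariants(path, 4) == closed_walk_invariants(triangle, 4))
```

For k=2 the diagonal entry of a simple graph equals the vertex degree, which matches the remark in the abstract.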
-
Complex base numeral systems
Authors:
Jarek Duda
Abstract:
This paper introduces a large, probably complete family of complex base systems which are 'proper' - for each point of the space there is a representation which is unique for all but some zero measure set. The condition defining this family is periodicity - we get a periodic covering of the plane by fractals in a hexagonal-type structure, which can be used for example in image compression. A full methodology of analyzing and using this approach is introduced - both for the integer part: a periodic lattice, and the fractional part: the attractor of some IFS, for which the convex hull or properties like the dimension of the boundary can be found analytically. It is also shown how to generalize this approach to higher dimensions, and some proper systems in dimension 3 are found.
Submitted 24 February, 2008; v1 submitted 10 December, 2007;
originally announced December 2007.
-
Optimal encoding on discrete lattice with translational invariant constrains using statistical algorithms
Authors:
Jarek Duda
Abstract:
This paper presents a methodology for encoding information in valuations of a discrete lattice with some translationally invariant constraints in an asymptotically optimal way. The method is based on finding a statistical description of such valuations and changing it into a statistical algorithm, which allows constructing deterministically a valuation with given statistics. Optimal statistics allow generating valuations with uniform distribution - we get maximum information capacity this way. It will be shown that we can reach the optimum for one-dimensional models using maximal entropy random walk, and that for the general case we can practically get as close to the capacity of the model as we want (found numerically: lost 10^{-10} bit/node for Hard Square). There will also be presented a simpler alternative to the arithmetic coding method, which can be used as a cryptosystem and a data correction method too.
Submitted 2 November, 2008; v1 submitted 20 October, 2007;
originally announced October 2007.
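For the one-dimensional case mentioned in the abstract above, the capacity and the maximal entropy random walk follow from the dominant eigenvalue and eigenvector of the matrix of allowed transitions. The sketch below uses a constraint chosen here for illustration (binary sequences with no two adjacent 1s, a one-dimensional analogue of Hard Square) and a simple power iteration; the function names are hypothetical.

```python
import math

def dominant_eigen(matrix, iterations=200):
    """Power iteration: dominant eigenvalue and right eigenvector (max entry 1)."""
    n = len(matrix)
    vec = [1.0] * n
    lam = 0.0
    for _ in range(iterations):
        new = [sum(matrix[i][j] * vec[j] for j in range(n)) for i in range(n)]
        lam = max(new)
        vec = [x / lam for x in new]
    return lam, vec

def merw_transitions(matrix):
    """Maximal entropy random walk: S[i][j] = M[i][j] * psi[j] / (lam * psi[i])."""
    lam, psi = dominant_eigen(matrix)
    n = len(matrix)
    return [[matrix[i][j] * psi[j] / (lam * psi[i]) for j in range(n)]
            for i in range(n)]

# allowed[i][j] = 1 if symbol j may follow symbol i; the pair (1, 1) is forbidden.
allowed = [[1, 1], [1, 0]]
lam, _ = dominant_eigen(allowed)
capacity = math.log2(lam)       # bits per node achievable under the constraint
S = merw_transitions(allowed)
print(lam, capacity)
print(S)
```

Here the dominant eigenvalue is the golden ratio, so the capacity is its base-2 logarithm, and the rows of `S` give the transition probabilities that generate allowed sequences with uniform distribution.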