-
Learning from data with structured missingness
Authors:
Robin Mitra,
Sarah F. McGough,
Tapabrata Chakraborti,
Chris Holmes,
Ryan Cop**,
Niels Hagenbuch,
Stefanie Biedermann,
Jack Noonan,
Brieuc Lehmann,
Aditi Shenvi,
Xuan Vinh Doan,
David Leslie,
Ginestra Bianconi,
Ruben Sanchez-Garcia,
Alisha Davies,
Maxine Mackintosh,
Eleni-Rosalina Andrinopoulou,
Anahid Basiri,
Chris Harbron,
Ben D. MacArthur
Abstract:
Missing data are an unavoidable complication in many machine learning tasks. When data are 'missing at random', there exists a range of tools and techniques to deal with the issue. However, as machine learning studies become more ambitious and seek to learn from ever-larger volumes of heterogeneous data, an increasingly common problem arises in which missing values exhibit an association or structure, either explicitly or implicitly. Such 'structured missingness' raises a range of challenges that have not yet been systematically addressed, and presents a fundamental hindrance to machine learning at scale. Here, we outline the current literature and propose a set of grand challenges in learning from data with structured missingness.
Submitted 3 April, 2023;
originally announced April 2023.
-
Voting on Cyclic Orders, Group Theory, and Ballots
Authors:
Karl-Dieter Crisman,
Abraham Holleran,
Micah Martin,
Josephine Noonan
Abstract:
A cyclic order may be thought of informally as a way to seat people around a table, perhaps for a game of chance or for dinner. Given a set of agents such as $\{A,B,C\}$, we can formalize this by defining a cyclic order as a permutation or linear order on this finite set, under the equivalence relation where $A\succ B\succ C$ is identified with both $B\succ C\succ A$ and $C\succ A\succ B$. As with other collections of sets with some structure, we might want to aggregate preferences of a (possibly different) set of voters on the set of possible ways to choose a cyclic order.
However, given the combinatorial explosion in the number of full rankings of cyclic orders, one may not wish to use the usual voting machinery. This raises the question of what sort of ballots may be appropriate: a single cyclic order, a set of them, or some other ballot type? Further, there is a natural action of the group of permutations on the set of agents. A reasonable requirement for a choice procedure would be to respect this symmetry (the equivalent of neutrality in ordinary voting theory).
In this paper we will exploit the representation theory of the symmetric group to analyze several natural types of ballots for voting on cyclic orders, and points-based procedures using such ballots. We provide a full characterization of such procedures for two quite different ballot types for $n=4$, along with the most important observations for $n=5$.
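The identification of rotations described above can be made concrete with a small sketch, assuming agents have sortable labels: each cyclic order is stored as the rotation that starts with its smallest agent, and a permutation of the agents acts by relabelling and re-canonicalizing. The helper names `canonical`, `cyclic_orders`, and `act` are hypothetical; the sketch only enumerates cyclic orders and the group action, not the paper's representation-theoretic analysis or any points-based procedure.

```python
from itertools import permutations
from math import factorial


def canonical(order):
    """Rotate a linear order so that its smallest agent comes first.

    Two linear orders describe the same cyclic order exactly when one is a
    rotation of the other, so this picks one representative per class.
    """
    order = list(order)
    i = order.index(min(order))
    return tuple(order[i:] + order[:i])


def cyclic_orders(agents):
    """All distinct cyclic orders (seatings around a table) of the agents."""
    return sorted({canonical(p) for p in permutations(agents)})


def act(sigma, cyc):
    """Apply a relabelling sigma (dict agent -> agent) to a cyclic order."""
    return canonical(sigma[a] for a in cyc)


print(cyclic_orders("ABC"))  # [('A', 'B', 'C'), ('A', 'C', 'B')]

# There are (n - 1)! cyclic orders on n agents.
for n in range(3, 7):
    assert len(cyclic_orders("ABCDEF"[:n])) == factorial(n - 1)

# Swapping B and C maps one cyclic order on {A, B, C} to the other.
swap_bc = {"A": "A", "B": "C", "C": "B"}
print(act(swap_bc, ("A", "B", "C")))  # ('A', 'C', 'B')
```

For $n=4$ and $n=5$ this gives 6 and 24 cyclic orders respectively, which is why full rankings of them quickly become unwieldy as ballots.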
Submitted 8 November, 2022;
originally announced November 2022.
-
Interpretable Neuron Structuring with Graph Spectral Regularization
Authors:
Alexander Tong,
David van Dijk,
Jay S. Stanley III,
Matthew Amodio,
Kristina Yim,
Rebecca Muhle,
James Noonan,
Guy Wolf,
Smita Krishnaswamy
Abstract:
While neural networks are powerful approximators used to classify or embed data into lower-dimensional spaces, they are often regarded as black boxes with uninterpretable features. Here we propose Graph Spectral Regularization for making hidden layers more interpretable without significantly impacting performance on the primary task. Taking inspiration from the spatial organization and localization of neuron activations in biological networks, we use a graph Laplacian penalty to structure the activations within a layer. This penalty encourages activations to be smooth either on a predetermined graph or on a feature-space graph learned from the data via co-activations of a hidden layer of the neural network. We show numerous uses for this additional structure, including cluster indication and visualization, in biological and image data sets.
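A graph Laplacian penalty of this kind can be sketched as the quadratic form $a^\top L a$ of a layer's activation vector $a$, with $L = D - W$ built from a symmetric weight matrix $W$; it equals the sum over edges of $w_{ij}(a_i - a_j)^2$, so it is small when connected units have similar activations. This is a minimal sketch under stated assumptions: a fixed, predetermined graph given as a dense matrix, a single activation vector rather than a batch, and a hypothetical penalty weight `gamma`. Learning the feature-space graph from co-activations is not shown.

```python
def laplacian(weights):
    """Combinatorial graph Laplacian L = D - W for a symmetric weight matrix W."""
    n = len(weights)
    return [
        [(float(sum(weights[i])) if i == j else 0.0) - weights[i][j] for j in range(n)]
        for i in range(n)
    ]


def laplacian_penalty(activations, weights):
    """Quadratic form a^T L a, equal to the sum over edges of w_ij * (a_i - a_j)^2."""
    L = laplacian(weights)
    n = len(activations)
    return sum(activations[i] * L[i][j] * activations[j] for i in range(n) for j in range(n))


def regularized_loss(task_loss, activations, weights, gamma):
    """Task loss plus a smoothness penalty on one hidden layer (gamma is hypothetical)."""
    return task_loss + gamma * laplacian_penalty(activations, weights)


# A chain graph over four hidden units: 0 - 1 - 2 - 3.
chain = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
print(laplacian_penalty([1, 1, 1, 1], chain))  # 0.0: constant activations are perfectly smooth
print(laplacian_penalty([1, 0, 1, 0], chain))  # 3.0: every edge joins unequal activations
```

In a training loop the penalty would be added to the task loss and differentiated along with it; the chain graph here is only an illustrative choice of predetermined graph.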
Submitted 14 February, 2020; v1 submitted 30 September, 2018;
originally announced October 2018.
-
Blockwise SURE Shrinkage for Non-Local Means
Authors:
Yue Wu,
Brian Tracey,
Premkumar Natarajan,
Joseph P. Noonan
Abstract:
In this letter, we investigate the shrinkage problem for non-local means (NLM) image denoising. In particular, we derive the closed form of the optimal blockwise shrinkage for NLM that minimizes Stein's unbiased risk estimate (SURE). We also propose a constant-complexity algorithm for fast blockwise shrinkage. Simulation results show that the proposed blockwise shrinkage method improves NLM performance, attaining higher peak signal-to-noise ratio (PSNR) and structural similarity index (SSIM), and makes NLM more robust to parameter changes. Similar ideas can be applied to other patch-wise image denoising techniques.
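To illustrate how SURE yields a closed-form shrinkage, here is the textbook case of a linear shrinkage $\hat{x} = \lambda y$ of a block $y = x + \mathcal{N}(0, \sigma^2 I)$ with $n$ entries: $\mathrm{SURE}(\lambda) = -n\sigma^2 + (1-\lambda)^2\lVert y\rVert^2 + 2\sigma^2\lambda n$, minimized at $\lambda^\ast = 1 - n\sigma^2/\lVert y\rVert^2$ (clipped at 0). This is a generic sketch assuming known noise level and non-overlapping blocks; it is not the NLM-specific closed form derived in the letter, and the function names are hypothetical.

```python
def sure_linear(y, lam, sigma):
    """SURE for the linear shrinkage x_hat = lam * y, with y = x + N(0, sigma^2 I).

    SURE(h) = -n sigma^2 + ||h(y) - y||^2 + 2 sigma^2 div h(y), and div(lam * y) = lam * n.
    """
    n = len(y)
    energy = sum(v * v for v in y)
    return -n * sigma**2 + (1 - lam) ** 2 * energy + 2 * sigma**2 * lam * n


def sure_optimal_lambda(y, sigma):
    """Closed-form minimizer of sure_linear: lam = 1 - n sigma^2 / ||y||^2, clipped at 0."""
    energy = sum(v * v for v in y)
    if energy == 0:
        return 0.0
    return max(0.0, 1 - len(y) * sigma**2 / energy)


def blockwise_shrink(signal, block, sigma):
    """Shrink each length-`block` block of `signal` with its own SURE-optimal factor."""
    out = []
    for start in range(0, len(signal), block):
        y = signal[start:start + block]
        lam = sure_optimal_lambda(y, sigma)
        out.extend(lam * v for v in y)
    return out


lam = sure_optimal_lambda([3.0, 4.0], sigma=1.0)  # about 0.92 (= 1 - 2/25)
print(lam, sure_linear([3.0, 4.0], lam, 1.0))
```

Because each block gets its own factor, low-energy (noise-dominated) blocks are shrunk strongly while high-energy blocks are nearly preserved.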
Submitted 18 May, 2013;
originally announced May 2013.
-
Probabilistic Non-Local Means
Authors:
Yue Wu,
Brian Tracey,
Premkumar Natarajan,
Joseph P. Noonan
Abstract:
In this paper, we propose a probabilistic non-local means (PNLM) method for image denoising. Our main contributions are: 1) we point out defects in the weight function used in classic NLM; 2) we derive all theoretical statistics of patch-wise differences under Gaussian noise; and 3) we use this prior information to formulate probabilistic weights that truly reflect the similarity between two noisy patches. The probabilistic nature of the new weight function also provides a theoretical basis for choosing thresholds that reject dissimilar patches for fast computation. Our simulation results indicate that PNLM outperforms classic NLM and many recent NLM variants in terms of peak signal-to-noise ratio (PSNR) and the structural similarity (SSIM) index. Encouraging improvements are also found when the NLM weights in the tested NLM variants are replaced with the probabilistic weights.
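One way a distributional result on patch differences supports a rejection threshold: if two patches of $k$ pixels share the same clean content and carry independent $\mathcal{N}(0,\sigma^2)$ noise, each pixel difference is $\mathcal{N}(0, 2\sigma^2)$, so the scaled squared distance is $\chi^2_k$. The sketch below assumes known $\sigma$, uses the normal approximation $\chi^2_k \approx \mathcal{N}(k, 2k)$ with a hypothetical level `alpha`, and only decides whether to keep a candidate patch; it does not reproduce the PNLM weight function.

```python
import math
from statistics import NormalDist


def patch_statistic(p, q, sigma):
    """Squared patch distance scaled by 2 sigma^2.

    If p and q are the same clean patch corrupted by independent N(0, sigma^2)
    noise, each pixel difference is N(0, 2 sigma^2), so this statistic follows
    a chi-square distribution with k = len(p) degrees of freedom.
    """
    return sum((a - b) ** 2 for a, b in zip(p, q)) / (2 * sigma**2)


def keep_patch(p, q, sigma, alpha=0.05):
    """Reject q when the statistic is implausibly large under the chi-square null.

    Uses the normal approximation chi2_k ~ N(k, 2k) instead of exact quantiles.
    """
    k = len(p)
    z = (patch_statistic(p, q, sigma) - k) / math.sqrt(2 * k)
    return z <= NormalDist().inv_cdf(1 - alpha)


print(keep_patch([10, 12, 11, 9], [11, 11, 10, 10], sigma=5))  # True: differences look like noise
print(keep_patch([10, 12, 11, 9], [90, 95, 92, 88], sigma=5))  # False: far too dissimilar
```

Rejected patches can simply be skipped in the weighted average, which is where the computational saving comes from.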
Submitted 22 February, 2013;
originally announced February 2013.
-
James-Stein Type Center Pixel Weights for Non-Local Means Image Denoising
Authors:
Yue Wu,
Brian Tracey,
Joseph P. Noonan
Abstract:
Non-Local Means (NLM) and its variants have proven effective and robust in many image denoising tasks. In this letter, we study the parameter selection problem for center pixel weights (CPW) in NLM. Our key contributions are: 1) we give a novel formulation of the CPW problem from a statistical shrinkage perspective; 2) we introduce James-Stein type CPWs for NLM; and 3) we propose a new adaptive CPW that is locally tuned for each image pixel. Our experimental results show that, compared with existing CPW solutions, the newly proposed CPWs are more robust and effective under various noise levels. In particular, NLM with the James-Stein type CPWs attains higher means with smaller variances in peak signal-to-noise ratio, implying that these CPWs improve NLM robustness and make it less sensitive to parameter selection.
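To show where a center pixel weight enters NLM, here is a toy 1-D version in which the weight a pixel assigns to itself is an explicit parameter (classic NLM uses $e^0 = 1$). It is a sketch under assumptions: a 1-D signal, border clamping, and hypothetical parameter names `h`, `half_patch`, `half_search`, and `center_weight`. The letter's James-Stein type and adaptive CPW rules are not reproduced.

```python
import math


def nlm_1d(x, h, half_patch=1, half_search=3, center_weight=1.0):
    """Non-local means on a 1-D signal with an explicit center pixel weight (CPW).

    Each neighbour j in the search window gets weight exp(-d2 / h^2), where d2 is
    the mean squared difference between the patches around i and j. The pixel
    itself gets `center_weight` instead of the exp(0) = 1 it would otherwise get.
    """
    n = len(x)

    def patch(i):
        # Clamp indices at the borders (one simple padding choice).
        return [x[min(max(i + t, 0), n - 1)] for t in range(-half_patch, half_patch + 1)]

    out = []
    for i in range(n):
        p_i = patch(i)
        num, den = center_weight * x[i], center_weight
        for j in range(max(0, i - half_search), min(n, i + half_search + 1)):
            if j == i:
                continue
            d2 = sum((a - b) ** 2 for a, b in zip(p_i, patch(j))) / len(p_i)
            w = math.exp(-d2 / h**2)
            num += w * x[j]
            den += w
        out.append(num / den if den > 0 else x[i])
    return out


noisy = [0.0, 0.3, -0.2, 0.1, 5.2, 4.9, 5.1, 4.8]
print(nlm_1d(noisy, h=1.0))                     # classic CPW of 1
print(nlm_1d(noisy, h=1.0, center_weight=0.0))  # center pixel ignored
```

A large CPW keeps the output close to the noisy input, while a CPW of zero relies entirely on similar neighbours; choosing it well per pixel is the problem the letter addresses.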
Submitted 7 November, 2012;
originally announced November 2012.
-
A New Randomness Evaluation Method with Applications to Image Shuffling and Encryption
Authors:
Yue Wu,
Sos Agaian,
Joseph P. Noonan
Abstract:
This letter discusses the problem of testing the degree of randomness within an image, particularly a shuffled or encrypted image. Its key contributions are: 1) a mathematical model of perfectly shuffled images; 2) the derivation of the theoretical distribution of pixel differences; 3) a new $Z$-test based approach to determine whether or not a test image is perfectly shuffled; and 4) a randomized algorithm to evaluate the degree of randomness within a given image without bias. Simulation results show that the proposed method is robust and effective in evaluating the degree of randomness within an image, and may often be more suitable for image applications than commonly used test suites designed for binary data, such as NIST 800-22. The developed method may also be useful as a first step in determining whether or not a shuffling or encryption scheme is suitable for a particular cryptographic application.
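The idea of a $Z$-test on pixel differences can be sketched as follows, assuming a simplified null model in which adjacent pixels of a perfectly shuffled image behave like two independent draws from the image's own histogram. The statistic compares the mean absolute difference between horizontal neighbours with its expectation under that model. Overlapping neighbour pairs are not independent, so the variance is only approximate; the letter's exact distribution and its randomized algorithm are not reproduced, and the function names are hypothetical.

```python
import math
from collections import Counter


def horizontal_diffs(img):
    """Absolute differences between horizontally adjacent pixels."""
    return [abs(row[c + 1] - row[c]) for row in img for c in range(len(row) - 1)]


def shuffle_z_score(img):
    """Z statistic of the mean neighbour difference against a shuffled-image null.

    Null model (an assumption): in a perfectly shuffled image, adjacent pixels
    behave like two independent draws from the image's own histogram.
    """
    pixels = [v for row in img for v in row]
    total = len(pixels)
    hist = Counter(pixels)
    mean = second = 0.0
    for a, ca in hist.items():
        for b, cb in hist.items():
            prob = (ca / total) * (cb / total)
            d = abs(a - b)
            mean += prob * d
            second += prob * d * d
    var = second - mean**2
    diffs = horizontal_diffs(img)
    observed = sum(diffs) / len(diffs)
    if var <= 0:
        return 0.0  # constant image: no variation to test
    return (observed - mean) / math.sqrt(var / len(diffs))


gradient = [[c for c in range(16)] for _ in range(16)]
print(shuffle_z_score(gradient))  # strongly negative: neighbours are far too similar
```

A score near zero is consistent with good shuffling, while a large magnitude suggests residual spatial structure.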
Submitted 7 November, 2012;
originally announced November 2012.
-
Sudoku Associated Two Dimensional Bijections for Image Scrambling
Authors:
Yue Wu,
Sos S. Agaian,
Joseph P. Noonan
Abstract:
Sudoku puzzles are now popular in many countries around the world, with the simple constraint that no digit is repeated in any row, column, or block. In this paper, we demonstrate that the Sudoku configuration provides an alternative way of representing matrix elements, using a block-grid pair in addition to the conventional row-column pair. Moreover, we identify six more matrix element representations associated with a Sudoku matrix: the row-digit, digit-row, column-digit, digit-column, block-digit, and digit-block pairs. These parametric Sudoku-associated matrix element representations not only allow us to denote matrix elements in secret ways, but also provide new parametric two-dimensional bijective maps. We study these two-dimensional bijections in the problem of image scrambling and propose a simple but effective Sudoku Associated Image Scrambler that uses only Sudoku-associated two-dimensional bijections and scrambles images without bandwidth expansion. Our simulation results over a wide collection of image types and contents demonstrate the effectiveness and robustness of the proposed method. Scrambler performance analysis, with comparisons to peer algorithms under various evaluation methods including human visual inspection, gray degree of scrambling, autocorrelation coefficient of adjacent pixels, key space, and key sensitivity, suggests that the proposed method outperforms or at least matches the state of the art. Similar scrambling ideas are also applicable to other digital data forms such as audio and video.
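Two of the representations named above can be sketched on a 9×9 matrix with 0-based indices: the block-grid pair sends position $(r, c)$ to (block index, position inside the block), and the row-digit pair sends $(r, c)$ to $(r, S_{rc}-1)$ for a Sudoku matrix $S$. Each is a bijection on positions, so using it to move pixels permutes them without changing the image size. The fixed pattern Sudoku and all function names are hypothetical stand-ins; the paper's keyed, parametric Sudoku generation and its full scrambler are not reproduced.

```python
B = 3          # block side length
N = B * B      # 9x9 matrix, as in a standard Sudoku


def to_block_grid(r, c):
    """Row-column pair -> (block index, position inside the block)."""
    return B * (r // B) + c // B, B * (r % B) + c % B


def from_block_grid(b, g):
    """Inverse of to_block_grid."""
    return B * (b // B) + g // B, B * (b % B) + g % B


def pattern_sudoku():
    """A fixed valid Sudoku (digits 1..9); a keyed Sudoku would replace this."""
    return [[(B * (r % B) + r // B + c) % N + 1 for c in range(N)] for r in range(N)]


def scramble_block_grid(img):
    """Move the pixel at (r, c) to position (b, g)."""
    out = [[None] * N for _ in range(N)]
    for r in range(N):
        for c in range(N):
            b, g = to_block_grid(r, c)
            out[b][g] = img[r][c]
    return out


def unscramble_block_grid(scr):
    """Undo scramble_block_grid using the inverse map."""
    out = [[None] * N for _ in range(N)]
    for b in range(N):
        for g in range(N):
            r, c = from_block_grid(b, g)
            out[r][c] = scr[b][g]
    return out


def scramble_row_digit(img, sudoku):
    """Row-digit pair: pixel (r, c) moves to (r, sudoku[r][c] - 1)."""
    out = [[None] * N for _ in range(N)]
    for r in range(N):
        for c in range(N):
            out[r][sudoku[r][c] - 1] = img[r][c]
    return out


img = [[N * r + c for c in range(N)] for r in range(N)]
assert unscramble_block_grid(scramble_block_grid(img)) == img
print(scramble_row_digit(img, pattern_sudoku())[0])
```

The row-digit map is a bijection only because each Sudoku row contains every digit once; the column and block constraints make the other digit-based pairs bijective in the same way.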
Submitted 24 July, 2012;
originally announced July 2012.
-
A New Family of Generalized 3D Cat Maps
Authors:
Yue Wu,
Sos Agaian,
Joseph P. Noonan
Abstract:
Since the 1990s, chaotic cat maps have been widely used in data encryption because they exhibit very complicated dynamics within a simple model and have characteristics well suited to the requirements of cryptography. For security reasons, the number of cat map parameters and the map's period length after discretization are two major concerns in many applications. In this paper, we propose a new family of 36 distinct 3D cat maps with different spatial configurations, which includes existing 3D cat maps [1]-[4] as special cases. Our analysis and comparisons show that this new family of 3D cat maps has more independent map parameters and much longer average period lengths than existing 3D cat maps. The presented cat map family can be extended to higher-dimensional cases.
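The period length after discretization can be computed directly for any integer matrix map $v \mapsto A v \bmod N$ with determinant 1: it is the smallest $t \ge 1$ with $A^t \equiv I \pmod N$, at which point every voxel of the $N^3$ cube has returned to its start. The sketch builds one 3D map by composing three planar cat maps $\begin{pmatrix}1 & a\\ b & ab+1\end{pmatrix}$ in the order x, y, z. That composition order is an illustrative choice, not claimed to be one of the paper's 36 configurations, and all names are hypothetical.

```python
def mat_mul(A, B, n=None):
    """Integer matrix product, reduced modulo n when n is given."""
    rows = [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]
    return [[v % n for v in row] for row in rows] if n else rows


def identity(d):
    return [[int(i == j) for j in range(d)] for i in range(d)]


def plane_cat(axis, a, b):
    """2-D cat map [[1, a], [b, a*b + 1]] (determinant 1) on the plane orthogonal to `axis`."""
    i, j = [k for k in range(3) if k != axis]
    M = identity(3)
    M[i][i], M[i][j], M[j][i], M[j][j] = 1, a, b, a * b + 1
    return M


def cat3d(ax, bx, ay, by, az, bz):
    """One 3-D cat map: the product of three planar cat maps in a fixed order."""
    return mat_mul(mat_mul(plane_cat(0, ax, bx), plane_cat(1, ay, by)), plane_cat(2, az, bz))


def period(A, n, max_iter=10**6):
    """Smallest t >= 1 with A^t = I (mod n), for n >= 2."""
    P = [[v % n for v in row] for row in A]
    target = identity(len(A))
    M = P
    for t in range(1, max_iter + 1):
        if M == target:
            return t
        M = mat_mul(M, P, n)
    raise ValueError("no period found within max_iter")


print(period([[1, 1], [1, 2]], 2))  # 3: the classic 2-D cat map on a 2x2 grid
print(period(cat3d(1, 1, 1, 1, 1, 1), 8))
```

Changing the six parameters or the composition order changes the matrix, and hence the period, which is what makes the number of independent parameters and the achievable period lengths the quantities of interest.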
Submitted 14 May, 2012;
originally announced May 2012.
-
A Novel Latin Square Image Cipher
Authors:
Yue Wu,
Yicong Zhou,
Joseph P. Noonan,
Sos Agaian,
C. L. Philip Chen
Abstract:
In this paper, we introduce a symmetric-key Latin square image cipher (LSIC) for grayscale and color images. Our contributions to the image encryption community include: 1) we develop new Latin square image encryption primitives, including Latin Square Whitening, Latin Square S-box, and Latin Square P-box; 2) we provide a new way of integrating probabilistic encryption into image encryption by embedding random noise in the least significant image bit-plane; and 3) we construct LSIC from these Latin square image encryption primitives, all based on one keyed Latin square, in a new loom-like substitution-permutation network. Consequently, the proposed LSIC achieves many desired properties of a secure cipher, including a large key space, high key sensitivity, uniformly distributed ciphertext, excellent confusion and diffusion properties, semantic security, and robustness against channel noise. Theoretical analysis shows that LSIC has good resistance to many attack models, including brute-force, ciphertext-only, known-plaintext, and chosen-plaintext attacks. Extensive simulations on the complete USC-SIPI Miscellaneous image dataset demonstrate that LSIC outperforms or matches the state of the art represented by many peer algorithms. These analyses and results demonstrate that LSIC is very suitable for digital image encryption. Finally, we release the LSIC MATLAB code as open source at https://sites.google.com/site/tuftsyuewu/source-code.
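The reason a keyed Latin square can serve as a substitution primitive is that every row is a permutation, so $c = L_{k,p}$ can be undone row by row. The toy sketch below assumes a cyclic Latin square $L_{ij} = \pi((i+j) \bmod n)$ built from a key-seeded permutation $\pi$, Python's non-cryptographic `random.Random` as a stand-in key expansion, and a hypothetical row schedule; it is not LSIC's whitening, S-box, P-box, bit-plane noise, or substitution-permutation network, and it makes no security claims.

```python
import random


def keyed_latin_square(key, n=256):
    """Latin square L[i][j] = perm[(i + j) mod n] for a key-dependent permutation.

    Every row and every column is a permutation of 0..n-1. random.Random is
    NOT cryptographically secure; it only makes the example reproducible.
    """
    perm = list(range(n))
    random.Random(key).shuffle(perm)
    return [[perm[(i + j) % n] for j in range(n)] for i in range(n)]


def row_inverses(L):
    """inv[i][v] = j such that L[i][j] = v (each row is a permutation)."""
    inv = [[0] * len(L) for _ in L]
    for i, row in enumerate(L):
        for j, v in enumerate(row):
            inv[i][v] = j
    return inv


def ls_substitute(pixels, L, rows):
    """S-box style step: pixel p under key-selected row k becomes L[k][p]."""
    return [L[k][p] for k, p in zip(rows, pixels)]


def ls_unsubstitute(cipher, inv, rows):
    """Inverse substitution using the per-row inverse tables."""
    return [inv[k][c] for k, c in zip(rows, cipher)]


L = keyed_latin_square(key=2012)
rng = random.Random(7)                          # hypothetical key-derived row schedule
rows = [rng.randrange(256) for _ in range(8)]
pixels = [0, 0, 0, 0, 255, 128, 64, 32]
cipher = ls_substitute(pixels, L, rows)
assert ls_unsubstitute(cipher, row_inverses(L), rows) == pixels
print(cipher)
```

Note that equal plaintext pixels map to different ciphertext values when the selected rows differ, which is the kind of behaviour a Latin-square S-box is meant to provide.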
Submitted 10 April, 2012;
originally announced April 2012.
-
Shannon Entropy based Randomness Measurement and Test for Image Encryption
Authors:
Yue Wu,
Joseph P. Noonan,
Sos Agaian
Abstract:
The quality of image encryption is commonly measured by the Shannon entropy over the ciphertext image. However, this measurement does not consider the randomness of local image blocks and is inappropriate for scrambling-based image encryption methods. In this paper, a new information entropy-based randomness measurement for image encryption is introduced which, for the first time, answers the question of whether a given ciphertext image is sufficiently random-like. It measures the randomness over the ciphertext more fairly by calculating the averaged entropy of a series of small image blocks within the entire test image. To provide both quantitative and qualitative measurement, the expectation and variance of this averaged block entropy for a truly random image are rigorously derived, and corresponding numerical reference tables are provided. Moreover, a hypothesis test at a given significance level is provided to accept or reject the hypothesis that the test image is ideally encrypted (random-like). Simulation results show that the proposed test gives effective quantitative and qualitative results for image encryption. The same idea can also be applied to other digital data, such as audio and video.
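The averaged local block entropy can be sketched as follows, assuming an 8-bit grayscale image stored as a list of rows and non-overlapping square blocks sampled at random. The block size `block`, the block count `k`, and the seed are hypothetical defaults, not the paper's recommended parameters, and the reference expectation and variance tables and the hypothesis test are not reproduced. The usage example shows why global entropy alone can mislead: an image whose histogram is perfectly uniform but which is locally smooth scores the maximal 8 bits globally yet only 4 bits per 16×16 block.

```python
import math
import random
from collections import Counter


def shannon_entropy(values):
    """Shannon entropy in bits of the empirical distribution of `values`."""
    counts = Counter(values)
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def local_block_entropy(img, block=16, k=30, seed=0):
    """Average entropy over up to k randomly chosen non-overlapping square blocks."""
    rows, cols = len(img), len(img[0])
    tiles = [(r, c)
             for r in range(0, rows - block + 1, block)
             for c in range(0, cols - block + 1, block)]
    chosen = random.Random(seed).sample(tiles, min(k, len(tiles)))
    entropies = [
        shannon_entropy([img[r + i][c + j] for i in range(block) for j in range(block)])
        for r, c in chosen
    ]
    return sum(entropies) / len(entropies)


# Globally uniform histogram but locally smooth: 4x4 patches of constant value.
img = [[16 * (r // 4) + c // 4 for c in range(64)] for r in range(64)]
print(shannon_entropy([v for row in img for v in row]))  # 8.0 (global)
print(local_block_entropy(img, block=16, k=8))           # 4.0 (local)
```

In practice the averaged value would then be compared against the expectation and variance for a truly random image to decide whether the ciphertext is random-like.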
Submitted 28 March, 2011;
originally announced March 2011.