
Showing 1–27 of 27 results for author: Nobel, A B

  1. arXiv:2107.11858  [pdf, ps, other]

    math.ST math.DS stat.ML

    Estimation of Stationary Optimal Transport Plans

    Authors: Kevin O'Connor, Kevin McGoff, Andrew B Nobel

    Abstract: We study optimal transport for stationary stochastic processes taking values in finite spaces. In order to reflect the stationarity of the underlying processes, we restrict attention to stationary couplings, also known as joinings. The resulting optimal joining problem captures differences in the long run average behavior of the processes of interest. We introduce estimators of both optimal joinin…

    Submitted 10 December, 2021; v1 submitted 25 July, 2021; originally announced July 2021.

  2. arXiv:2106.07106  [pdf, other]

    cs.LG stat.ML

    Alignment and Comparison of Directed Networks via Transition Couplings of Random Walks

    Authors: Bongsoo Yi, Kevin O'Connor, Kevin McGoff, Andrew B. Nobel

    Abstract: We describe and study a transport based procedure called NetOTC (network optimal transition coupling) for the comparison and alignment of two networks. The networks of interest may be directed or undirected, weighted or unweighted, and may have distinct vertex sets of different sizes. Given two networks and a cost function relating their vertices, NetOTC finds a transition coupling of their associ…

    Submitted 5 February, 2024; v1 submitted 13 June, 2021; originally announced June 2021.

  3. arXiv:2009.05079  [pdf, other]

    stat.ME cs.LG stat.ML

    Finding Groups of Cross-Correlated Features in Bi-View Data

    Authors: Miheer Dewaskar, John Palowitch, Mark He, Michael I. Love, Andrew B. Nobel

    Abstract: Datasets in which measurements of two (or more) types are obtained from a common set of samples arise in many scientific applications. A common problem in the exploratory analysis of such data is to identify groups of features of different data types that are strongly associated. A bimodule is a pair (A,B) of feature sets from two data types such that the aggregate cross-correlation between the fe…

    Submitted 13 May, 2024; v1 submitted 10 September, 2020; originally announced September 2020.

    Comments: 30 pages, 5 figures. R package: https://github.com/miheerdew/cbce

    MSC Class: 62-04; 62H20 (Primary) 62J15; 62P10 (Secondary)

    Journal ref: Journal of Machine Learning Research Vol. 24, 2023

  4. arXiv:2006.07998  [pdf, other]

    math.OC cs.DS stat.CO

    Optimal Transport for Stationary Markov Chains via Policy Iteration

    Authors: Kevin O'Connor, Kevin McGoff, Andrew B. Nobel

    Abstract: We study the optimal transport problem for pairs of stationary finite-state Markov chains, with an emphasis on the computation of optimal transition couplings. Transition couplings are a constrained family of transport plans that capture the dynamics of Markov chains. Solutions of the optimal transition coupling (OTC) problem correspond to alignments of the two chains that minimize long-term avera…

    Submitted 16 September, 2021; v1 submitted 14 June, 2020; originally announced June 2020.

  5. arXiv:1711.10427  [pdf, other]

    stat.ME

    Latent Association Mining in Binary Data

    Authors: Carson Mosso, Kelly Bodwin, Suman Chakraborty, Kai Zhang, Andrew B. Nobel

    Abstract: We consider the problem of identifying stable sets of mutually associated features in moderate or high-dimensional binary data. In this context we develop and investigate a method called Latent Association Mining for Binary Data (LAMB). The LAMB method is based on a simple threshold model in which the observed binary values represent a random thresholding of a latent continuous vector that may hav…

    Submitted 8 January, 2021; v1 submitted 28 November, 2017; originally announced November 2017.

    Comments: 29 pages, 2 tables, 4 figures 54 page appendix/supplemental figures

  6. arXiv:1701.05426  [pdf, other]

    stat.AP

    HT-eQTL: Integrative Expression Quantitative Trait Loci Analysis in a Large Number of Human Tissues

    Authors: Gen Li, Dereje D. Jima, Fred A. Wright, Andrew B. Nobel

    Abstract: Expression quantitative trait loci (eQTL) analysis identifies genetic markers associated with the expression of a gene. Most existing eQTL analyses and methods investigate association in a single, readily available tissue, such as blood. Joint analysis of eQTL in multiple tissues has the potential to improve, and expand the scope of, single-tissue analyses. Large-scale collaborative efforts such a…

    Submitted 6 September, 2017; v1 submitted 19 January, 2017; originally announced January 2017.

    MSC Class: 92D10; 62F15

  7. arXiv:1611.06173  [pdf, ps, other]

    math.ST math.DS

    Empirical risk minimization and complexity of dynamical models

    Authors: Kevin McGoff, Andrew B. Nobel

    Abstract: A dynamical model consists of a continuous self-map $T: \mathcal{X} \to \mathcal{X}$ of a compact state space $\mathcal{X}$ and a continuous observation function $f: \mathcal{X} \to \mathbb{R}$. This paper considers the fitting of a parametrized family of dynamical models to an observed real-valued stochastic process using empirical risk minimization. The limiting behavior of the minimum risk para…

    Submitted 23 January, 2018; v1 submitted 18 November, 2016; originally announced November 2016.

  8. arXiv:1610.06511  [pdf, other]

    cs.SI physics.soc-ph stat.ME

    Community extraction in multilayer networks with heterogeneous community structure

    Authors: James D. Wilson, John Palowitch, Shankar Bhamidi, Andrew B. Nobel

    Abstract: Multilayer networks are a useful way to capture and model multiple, binary or weighted relationships among a fixed group of objects. While community detection has proven to be a useful exploratory technique for the analysis of single-layer networks, the development of community detection methods for multilayer networks is still in its infancy. We propose and investigate a procedure, called Multila…

    Submitted 7 November, 2017; v1 submitted 20 October, 2016; originally announced October 2016.

    Comments: 46 pages. Accepted at the Journal of Machine Learning Research (11/17)

  9. arXiv:1605.08799  [pdf, other]

    stat.ME

    Estimation of Interpretable eQTL Effect Sizes Using a Log of Linear Model

    Authors: John Palowitch, Andrey Shabalin, Yihui Zhou, Andrew B. Nobel, Fred A. Wright

    Abstract: The study of expression Quantitative Trait Loci (eQTL) is an important problem in genomics and biomedicine. While detection (testing) of eQTL associations has been widely studied, less work has been devoted to the estimation of eQTL effect size. To reduce false positives, detection methods frequently rely on linear modeling of rank-based normalized or log-transformed gene expression data. Unfortun…

    Submitted 7 September, 2017; v1 submitted 27 May, 2016; originally announced May 2016.

  10. arXiv:1601.05630  [pdf, other]

    cs.SI physics.soc-ph stat.ME

    Significance-based community detection in weighted networks

    Authors: John Palowitch, Shankar Bhamidi, Andrew B. Nobel

    Abstract: Community detection is the process of grouping strongly connected nodes in a network. Many community detection methods for un-weighted networks have a theoretical basis in a null model. Communities discovered by these methods therefore have interpretations in terms of statistical significance. In this paper, we introduce a null for weighted networks called the continuous configuration model. We use…

    Submitted 23 October, 2017; v1 submitted 21 January, 2016; originally announced January 2016.

    Comments: Code and supplemental info available at http://stats.johnpalowitch.com/ccme. V3 changes: based on lengthy referee revision process, new theoretical sections added, + major organizational changes. V2 changes: grant info added, 1 reference added, bibliography section moved to end, condensed bib line spacing, corrected typos

  11. arXiv:1601.05033  [pdf, ps, other]

    math.DS math.PR math.ST

    Variational analysis of inference from dynamical systems

    Authors: Kevin McGoff, Andrew B. Nobel

    Abstract: We introduce and study a variational framework for the analysis of empirical risk based inference for dynamical systems and ergodic processes. The analysis applies to a two-stage estimation procedure in which (i) the trajectory of an observed (but unknown) system is fit to a trajectory from a known reference system by minimizing cumulative per-state loss, and (ii) a parameter estimate is obtained…

    Submitted 23 January, 2018; v1 submitted 19 January, 2016; originally announced January 2016.

  12. arXiv:1403.2457  [pdf, other]

    math.DS

    Entropy and the Uniform Mean Ergodic Theorem for a Family of Sets

    Authors: Terrence M. Adams, Andrew B. Nobel

    Abstract: We define a notion of entropy for an infinite family $\mathcal{C}$ of measurable sets in a probability space. We show that the mean ergodic theorem holds uniformly for $\mathcal{C}$ under every ergodic transformation if and only if $\mathcal{C}$ has zero entropy. When the entropy of $\mathcal{C}$ is positive, we establish a strong converse showing that the uniform mean ergodic theorem fails generi…

    Submitted 10 March, 2014; originally announced March 2014.

    MSC Class: Primary 37A25; Secondary 60F05; 37A35; 37A50

  13. arXiv:1403.1876  [pdf, other]

    math.ST q-bio.QM

    Consistent Testing for Recurrent Genomic Aberrations

    Authors: Vonn Walter, Fred A. Wright, Andrew B. Nobel

    Abstract: Genomic aberrations, such as somatic copy number alterations, are frequently observed in tumor tissue. Recurrent aberrations, occurring in the same region across multiple subjects, are of interest because they may highlight genes associated with tumor development or progression. A number of tools have been proposed to assess the statistical significance of recurrent DNA copy number aberrations, bu…

    Submitted 7 March, 2014; originally announced March 2014.

    Comments: 35 pages, 7 figures

    MSC Class: 62G09; 62P10

  14. arXiv:1311.2948  [pdf, other]

    stat.ME

    An Empirical Bayes Approach for Multiple Tissue eQTL Analysis

    Authors: Gen Li, Andrey A. Shabalin, Ivan Rusyn, Fred A. Wright, Andrew B. Nobel

    Abstract: Expression quantitative trait loci (eQTL) analyses, which identify genetic markers associated with the expression of a gene, are an important tool in the understanding of diseases in human and other populations. While most eQTL studies to date consider the connection between genetic variation and expression in a single tissue, complex, multi-tissue data sets are now being generated by the GTEx ini…

    Submitted 6 September, 2017; v1 submitted 12 November, 2013; originally announced November 2013.

    Comments: accepted by Biostatistics

  15. arXiv:1308.0777  [pdf, ps, other]

    cs.SI physics.soc-ph stat.ME

    A testing based extraction algorithm for identifying significant communities in networks

    Authors: James D. Wilson, Simi Wang, Peter J. Mucha, Shankar Bhamidi, Andrew B. Nobel

    Abstract: A common and important problem arising in the study of networks is how to divide the vertices of a given network into one or more groups, called communities, in such a way that vertices of the same community are more interconnected than vertices belonging to different ones. We propose and investigate a testing based community detection procedure called Extraction of Statistically Significant Commu…

    Submitted 3 December, 2014; v1 submitted 4 August, 2013; originally announced August 2013.

    Comments: Published in at http://dx.doi.org/10.1214/14-AOAS760 the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)

    Report number: IMS-AOAS-AOAS760

    Journal ref: Annals of Applied Statistics 2014, Vol. 8, No. 3, 1853-1891

  16. arXiv:1211.2284  [pdf, other]

    math.PR math.ST

    Energy Landscape for large average submatrix detection problems in Gaussian random matrices

    Authors: Shankar Bhamidi, Partha S. Dey, Andrew B. Nobel

    Abstract: The problem of finding large average submatrices of a real-valued matrix arises in the exploratory analysis of data from a variety of disciplines, ranging from genomics to social sciences. In this paper we provide a detailed asymptotic analysis of large average submatrices of an $n \times n$ Gaussian random matrix. The first part of the paper addresses global maxima. For fixed $k$ we identify the…

    Submitted 13 June, 2013; v1 submitted 9 November, 2012; originally announced November 2012.

    Comments: Proofs simplified, 49 pages, 3 figures

    MSC Class: 62G32; 60F05; 60G70

  17. arXiv:1102.4110  [pdf, ps, other]

    stat.ML stat.AP stat.ME

    Joint and individual variation explained (JIVE) for integrated analysis of multiple data types

    Authors: Eric F. Lock, Katherine A. Hoadley, J. S. Marron, Andrew B. Nobel

    Abstract: Research in several fields now requires the analysis of data sets in which multiple high-dimensional types of data are available for a common set of objects. In particular, The Cancer Genome Atlas (TCGA) includes data from several diverse genomic technologies on the same cancerous tumor samples. In this paper we introduce Joint and Individual Variation Explained (JIVE), a general decomposition of…

    Submitted 28 May, 2013; v1 submitted 20 February, 2011; originally announced February 2011.

    Comments: Published in at http://dx.doi.org/10.1214/12-AOAS597 the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)

    Report number: IMS-AOAS-AOAS597

    Journal ref: Annals of Applied Statistics 2013, Vol. 7, No. 1, 523-542

  18. arXiv:1010.4515  [pdf, ps, other]

    math.PR math.ST stat.ML

    Uniform Approximation of Vapnik-Chervonenkis Classes

    Authors: Terrence M. Adams, Andrew B. Nobel

    Abstract: For any family of measurable sets in a probability space, we show that either (i) the family has infinite Vapnik-Chervonenkis (VC) dimension or (ii) for every epsilon > 0 there is a finite partition pi such that the pi-boundary of each set has measure at most epsilon. Immediate corollaries include the fact that a family with finite VC dimension has finite bracketing numbers, and satisfies uniform laws…

    Submitted 21 October, 2010; originally announced October 2010.

    Comments: 13 pages, no figures

    MSC Class: 60F15 (primary); 60G10; 62G05 (secondary)

  19. Uniform convergence of Vapnik--Chervonenkis classes under ergodic sampling

    Authors: Terrence M. Adams, Andrew B. Nobel

    Abstract: We show that if $\mathcal{X}$ is a complete separable metric space and $\mathcal{C}$ is a countable family of Borel subsets of $\mathcal{X}$ with finite VC dimension, then, for every stationary ergodic process with values in $\mathcal{X}$, the relative frequencies of sets $C\in\mathcal{C}$ converge uniformly to their limiting probabilities. Beyond ergodicity, no assumptions are imposed on the samp…

    Submitted 15 October, 2010; originally announced October 2010.

    Comments: Published in at http://dx.doi.org/10.1214/09-AOP511 the Annals of Probability (http://www.imstat.org/aop/) by the Institute of Mathematical Statistics (http://www.imstat.org)

    Report number: IMS-AOP-AOP511

    Journal ref: Annals of Probability 2010, Vol. 38, No. 4, 1345-1367

  20. arXiv:1009.0562  [pdf, other]

    math.ST math.PR

    On the maximal size of Large-Average and ANOVA-fit Submatrices in a Gaussian Random Matrix

    Authors: Xing Sun, Andrew B. Nobel

    Abstract: We investigate the maximal size of distinguished submatrices of a Gaussian random matrix. Of interest are submatrices whose entries have average greater than or equal to a positive constant, and submatrices whose entries are well-fit by a two-way ANOVA model. We identify size thresholds and associated (asymptotic) probability bounds for both large-average and ANOVA-fit submatrices. Results are obt…

    Submitted 2 September, 2010; originally announced September 2010.

    Comments: 25 pages, 3 figures

    MSC Class: 60B20; 60C05

  21. arXiv:1007.4037  [pdf, ps, other]

    math.PR math.ST stat.ML

    Uniform Approximation and Bracketing Properties of VC classes

    Authors: Terrence M. Adams, Andrew B. Nobel

    Abstract: We show that the sets in a family with finite VC dimension can be uniformly approximated within a given error by a finite partition. Immediate corollaries include the fact that VC classes have finite bracketing numbers, satisfy uniform laws of averages under strong dependence, and exhibit uniform mixing. Our results are based on recent work concerning uniform laws of averages for VC classes under…

    Submitted 22 July, 2010; originally announced July 2010.

    Comments: 10 pages

    MSC Class: 60F15 (primary); 60G10; 62G05 (secondary)

  22. arXiv:1007.2964  [pdf, ps, other]

    math.PR math.ST stat.ML

    The Gap Dimension and Uniform Laws of Large Numbers for Ergodic Processes

    Authors: Terrence M. Adams, Andrew B. Nobel

    Abstract: Let F be a family of Borel measurable functions on a complete separable metric space. The gap (or fat-shattering) dimension of F is a combinatorial quantity that measures the extent to which functions f in F can separate finite sets of points at a predefined resolution gamma > 0. We establish a connection between the gap dimension of F and the uniform convergence of its sample averages under ergod…

    Submitted 17 July, 2010; originally announced July 2010.

    Comments: 24 pages, submitted for publication

    MSC Class: 60F15 (primary); 60G10; 62G05 (secondary)

  23. arXiv:0905.1682  [pdf, ps, other]

    q-bio.GN q-bio.QM

    Finding large average submatrices in high dimensional data

    Authors: Andrey A. Shabalin, Victor J. Weigman, Charles M. Perou, Andrew B. Nobel

    Abstract: The search for sample-variable associations is an important problem in the exploratory analysis of high dimensional data. Biclustering methods search for sample-variable associations in the form of distinguished submatrices of the data matrix. (The rows and columns of a submatrix need not be contiguous.) In this paper we propose and evaluate a statistically motivated biclustering procedure (LAS)…

    Submitted 8 October, 2009; v1 submitted 11 May, 2009; originally announced May 2009.

    Comments: Published in at http://dx.doi.org/10.1214/09-AOAS239 the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)

    Report number: IMS-AOAS-AOAS239

    Journal ref: Annals of Applied Statistics 2009, Vol. 3, No. 3, 985-1012

  24. A statistical framework for testing functional categories in microarray data

    Authors: William T. Barry, Andrew B. Nobel, Fred A. Wright

    Abstract: Ready access to emerging databases of gene annotation and functional pathways has shifted assessments of differential expression in DNA microarray studies from single genes to groups of genes with shared biological function. This paper takes a critical look at existing methods for assessing the differential expression of a group of genes (functional category), and provides some suggestions for i…

    Submitted 27 March, 2008; originally announced March 2008.

    Comments: Published in at http://dx.doi.org/10.1214/07-AOAS146 the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)

    Report number: IMS-AOAS-AOAS146

    Journal ref: Annals of Applied Statistics 2008, Vol. 2, No. 1, 286-315

  25. arXiv:0710.2500  [pdf, ps, other]

    math.PR cs.IT math.ST

    Density estimation from an individual numerical sequence

    Authors: Andrew B. Nobel, Gusztav Morvai, Sanjeev R. Kulkarni

    Abstract: This paper considers estimation of a univariate density from an individual numerical sequence. It is assumed that (i) the limiting relative frequencies of the numerical sequence are governed by an unknown density, and (ii) there is a known upper bound for the variation of the density on an increasing sequence of intervals. A simple estimation scheme is proposed, and is shown to be $L_1$ consiste…

    Submitted 12 October, 2007; originally announced October 2007.

    Journal ref: IEEE Trans. Inform. Theory 44 (1998), no. 2, 537--541

  26. arXiv:0710.2496  [pdf, ps, other]

    math.PR cs.IT math.ST

    Regression estimation from an individual stable sequence

    Authors: Gusztav Morvai, Sanjeev R. Kulkarni, Andrew B. Nobel

    Abstract: We consider univariate regression estimation from an individual (non-random) sequence $(x_1,y_1),(x_2,y_2), ... \in \real \times \real$, which is stable in the sense that for each interval $A \subseteq \real$, (i) the limiting relative frequency of $A$ under $x_1, x_2, ...$ is governed by an unknown probability distribution $μ$, and (ii) the limiting average of those $y_i$ with $x_i \in A$ is go…

    Submitted 12 October, 2007; originally announced October 2007.

    Journal ref: Statistics 33 (1999), no. 2, 99--118

  27. arXiv:nlin/0604052  [pdf, ps, other]

    nlin.CD

    Denoising Deterministic Time Series

    Authors: Steven P. Lalley, Andrew B. Nobel

    Abstract: This paper is concerned with the problem of recovering a finite, deterministic time series from observations that are corrupted by additive, independent noise. A distinctive feature of this problem is that the available data exhibit long-range dependence and, as a consequence, existing statistical theory and methods are not readily applicable. This paper gives an analysis of the denoising proble…

    Submitted 21 April, 2006; originally announced April 2006.