Search | arXiv e-print repository

Multidimensional well-being of US households at a fine spatial scale using fused household surveys: fusionACS

Authors: Kevin Ummel, Miguel Poblete-Cazenave, Karthik Akkiraju, Nick Graetz, Hero Ashman, Cora Kingdon, Steven Herrera Tenorio, Aaryaman "Sunny" Singhal, Daniel Aldana Cohen, Narasimha D. Rao

Abstract: Social science often relies on surveys of households and individuals. Dozens of such surveys are regularly administered by the U.S. government. However, they field independent, unconnected samples with specialized questions, limiting research questions to those that can be answered by a single survey. The fusionACS project seeks to integrate data from multiple U.S. household surveys by statistical… ▽ More Social science often relies on surveys of households and individuals. Dozens of such surveys are regularly administered by the U.S. government. However, they field independent, unconnected samples with specialized questions, limiting research questions to those that can be answered by a single survey. The fusionACS project seeks to integrate data from multiple U.S. household surveys by statistically "fusing" variables from "donor" surveys onto American Community Survey (ACS) microdata. This results in an integrated microdataset of household attributes and well-being dimensions that can be analyzed to address research questions in ways that are not currently possible. The presented data comprise the fusion onto the ACS of select donor variables from the Residential Energy Consumption Survey (RECS) of 2015, the National Household Transportation Survey (NHTS) of 2017, the American Housing Survey (AHS) of 2019, and the Consumer Expenditure Survey - Interview (CEI) for the years 2015-2019. The underlying statistical techniques are included in an open-source $R$ package, fusionModel, that provides generic tools for the creation, analysis, and validation of fused microdata. △ Less

Submitted 15 September, 2023; originally announced September 2023.

Comments: 35 pages, 6 figures

arXiv:2306.04907 [pdf, other]

Estimation of Poverty Measures for Small Areas Under a Two-Fold Nested Error Linear Regression Model: Comparison of Two Methods

Authors: Maryam Sohrabi, J. N. K. Rao

Abstract: Demand for reliable statistics at a local area (small area) level has greatly increased in recent years. Traditional area-specific estimators based on probability samples are not adequate because of small sample size or even zero sample size in a local area. As a result, methods based on models linking the areas are widely used. World Bank focused on estimating poverty measures, in particular pove… ▽ More Demand for reliable statistics at a local area (small area) level has greatly increased in recent years. Traditional area-specific estimators based on probability samples are not adequate because of small sample size or even zero sample size in a local area. As a result, methods based on models linking the areas are widely used. World Bank focused on estimating poverty measures, in particular poverty incidence and poverty gap called FGT measures, using a simulated census method, called ELL, based on a one-fold nested error model for a suitable transformation of the welfare variable. Modified ELL methods leading to significant gain in efficiency over ELL also have been proposed under the one-fold model. An advantage of ELL and modified ELL methods is that distributional assumptions on the random effects in the model are not needed. In this paper, we extend ELL and modified ELL to two-fold nested error models to estimate poverty indicators for areas (say a state) and subareas (say counties within a state). Our simulation results indicate that the modified ELL estimators lead to large efficiency gains over ELL at the area level and subarea level. Further, modified ELL method retaining both area and subarea estimated effects in the model (called MELL2) performs significantly better in terms of mean squared error (MSE) for sampled subareas than the modified ELL retaining only estimated area effect in the model (called MELL1). △ Less

Submitted 7 June, 2023; originally announced June 2023.

arXiv:2206.03040 [pdf, other]

doi 10.1145/3534678.3539194

Learning Backward Compatible Embeddings

Authors: Weihua Hu, Rajas Bansal, Kaidi Cao, Nikhil Rao, Karthik Subbian, Jure Leskovec

Abstract: Embeddings, low-dimensional vector representation of objects, are fundamental in building modern machine learning systems. In industrial settings, there is usually an embedding team that trains an embedding model to solve intended tasks (e.g., product recommendation). The produced embeddings are then widely consumed by consumer teams to solve their unintended tasks (e.g., fraud detection). However… ▽ More Embeddings, low-dimensional vector representation of objects, are fundamental in building modern machine learning systems. In industrial settings, there is usually an embedding team that trains an embedding model to solve intended tasks (e.g., product recommendation). The produced embeddings are then widely consumed by consumer teams to solve their unintended tasks (e.g., fraud detection). However, as the embedding model gets updated and retrained to improve performance on the intended task, the newly-generated embeddings are no longer compatible with the existing consumer models. This means that historical versions of the embeddings can never be retired or all consumer teams have to retrain their models to make them compatible with the latest version of the embeddings, both of which are extremely costly in practice. Here we study the problem of embedding version updates and their backward compatibility. We formalize the problem where the goal is for the embedding team to keep updating the embedding version, while the consumer teams do not have to retrain their models. We develop a solution based on learning backward compatible embeddings, which allows the embedding model version to be updated frequently, while also allowing the latest version of the embedding to be quickly transformed into any backward compatible historical version of it, so that consumer teams do not have to retrain their models. Under our framework, we explore six methods and systematically evaluate them on a real-world recommender system application. We show that the best method, which we call BC-Aligner, maintains backward compatibility with existing unintended tasks even after multiple model version updates. Simultaneously, BC-Aligner achieves the intended task performance similar to the embedding model that is solely optimized for the intended task. △ Less

Submitted 7 June, 2022; originally announced June 2022.

Comments: KDD 2022, Applied Data Science Track

arXiv:2110.14011 [pdf, other]

Cluster-and-Conquer: A Framework For Time-Series Forecasting

Authors: Reese Pathak, Rajat Sen, Nikhil Rao, N. Benjamin Erichson, Michael I. Jordan, Inderjit S. Dhillon

Abstract: We propose a three-stage framework for forecasting high-dimensional time-series data. Our method first estimates parameters for each univariate time series. Next, we use these parameters to cluster the time series. These clusters can be viewed as multivariate time series, for which we then compute parameters. The forecasted values of a single time series can depend on the history of other time ser… ▽ More We propose a three-stage framework for forecasting high-dimensional time-series data. Our method first estimates parameters for each univariate time series. Next, we use these parameters to cluster the time series. These clusters can be viewed as multivariate time series, for which we then compute parameters. The forecasted values of a single time series can depend on the history of other time series in the same cluster, accounting for intra-cluster similarity while minimizing potential noise in predictions by ignoring inter-cluster effects. Our framework -- which we refer to as "cluster-and-conquer" -- is highly general, allowing for any time-series forecasting and clustering method to be used in each step. It is computationally efficient and embarrassingly parallel. We motivate our framework with a theoretical analysis in an idealized mixed linear regression setting, where we provide guarantees on the quality of the estimates. We accompany these guarantees with experimental results that demonstrate the advantages of our framework: when instantiated with simple linear autoregressive models, we are able to achieve state-of-the-art results on several benchmark datasets, sometimes outperforming deep-learning-based approaches. △ Less

Submitted 26 October, 2021; originally announced October 2021.

Comments: 25 pages, 3 figures

arXiv:2109.01965 [pdf, other]

Scalable Feature Selection for (Multitask) Gradient Boosted Trees

Authors: Cuize Han, Nikhil Rao, Daria Sorokina, Karthik Subbian

Abstract: Gradient Boosted Decision Trees (GBDTs) are widely used for building ranking and relevance models in search and recommendation. Considerations such as latency and interpretability dictate the use of as few features as possible to train these models. Feature selection in GBDT models typically involves heuristically ranking the features by importance and selecting the top few, or by performing a ful… ▽ More Gradient Boosted Decision Trees (GBDTs) are widely used for building ranking and relevance models in search and recommendation. Considerations such as latency and interpretability dictate the use of as few features as possible to train these models. Feature selection in GBDT models typically involves heuristically ranking the features by importance and selecting the top few, or by performing a full backward feature elimination routine. On-the-fly feature selection methods proposed previously scale suboptimally with the number of features, which can be daunting in high dimensional settings. We develop a scalable forward feature selection variant for GBDT, via a novel group testing procedure that works well in high dimensions, and enjoys favorable theoretical performance and computational guarantees. We show via extensive experiments on both public and proprietary datasets that the proposed method offers significant speedups in training time, while being as competitive as existing GBDT methods in terms of model performance metrics. We also extend the method to the multitask setting, allowing the practitioner to select common features across tasks, as well as selecting task-specific features. △ Less

Submitted 4 September, 2021; originally announced September 2021.

Comments: Correct a mistake in the proof of Lemma B1 in http://proceedings.mlr.press/v108/han20a.html

Journal ref: Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, PMLR 108:885-894, 2020

arXiv:2108.01152 [pdf, ps, other]

Maximizing and Satisficing in Multi-armed Bandits with Graph Information

Authors: Parth K. Thaker, Mohit Malu, Nikhil Rao, Gautam Dasarathy

Abstract: Pure exploration in multi-armed bandits has emerged as an important framework for modeling decision-making and search under uncertainty. In modern applications, however, one is often faced with a tremendously large number of options. Even obtaining one observation per option may be too costly rendering traditional pure exploration algorithms ineffective. Fortunately, one often has access to simila… ▽ More Pure exploration in multi-armed bandits has emerged as an important framework for modeling decision-making and search under uncertainty. In modern applications, however, one is often faced with a tremendously large number of options. Even obtaining one observation per option may be too costly rendering traditional pure exploration algorithms ineffective. Fortunately, one often has access to similar relationships amongst the options that can be leveraged. In this paper, we consider the pure exploration problem in stochastic multi-armed bandits where the similarities between the arms are captured by a graph and the rewards may be represented as a smooth signal on this graph. In particular, we consider the problem of finding the arm with the maximum reward (i.e., the maximizing problem) or one with a sufficiently high reward (i.e., the satisficing problem) under this model. We propose novel algorithms \textbf{\algoname{}} (GRaph-based UcB) and $ζ$-\textbf{\algoname{}} for these problems and provide a theoretical characterization of their performance which specifically elicits the benefit of the graph side information. We also prove a lower bound on the data requirement, showing a large class of problems where these algorithms are near-optimal. We complement our theory with experimental results that show the benefit of capitalizing on such side information. △ Less

Submitted 20 November, 2022; v1 submitted 2 August, 2021; originally announced August 2021.

arXiv:2005.12172 [pdf, ps, other]

Empirical Likelihood Inference With Public-Use Survey Data

Authors: Puying Zhao, J. N. K. Rao, Changbao Wu

Abstract: Public-use survey data are an important source of information for researchers in social science and health studies to build statistical models and make inferences on the target finite population. This paper presents two general inferential tools through the pseudo empirical likelihood and the sample empirical likelihood methods. Theoretical results on point estimation and linear or nonlinear hypot… ▽ More Public-use survey data are an important source of information for researchers in social science and health studies to build statistical models and make inferences on the target finite population. This paper presents two general inferential tools through the pseudo empirical likelihood and the sample empirical likelihood methods. Theoretical results on point estimation and linear or nonlinear hypothesis tests involving parameters defined through estimating equations are established, and practical issues with the implementation of the proposed methods are discussed. Results from simulation studies and an application to the 2016 General Social Survey dataset of Statistics Canada show that the proposed methods work well under different scenarios. The inferential procedures and theoretical results presented in the paper make the empirical likelihood a practically useful tool for users of complex survey data. △ Less

Submitted 25 May, 2020; originally announced May 2020.

Comments: 50 pages, including 11 pages of tables

MSC Class: 62D05; 62G05; 62G10

arXiv:1908.01794 [pdf, other]

Some Developments in Clustering Analysis on Stochastic Processes

Authors: Qidi Peng, Nan Rao, Ran Zhao

Abstract: We review some developments on clustering stochastic processes and come with the conclusion that asymptotically consistent clustering algorithms can be obtained when the processes are ergodic and the dissimilarity measure satisfies the triangle inequality. Examples are provided when the processes are distribution ergodic, covariance ergodic and locally asymptotically self-similar, respectively. We review some developments on clustering stochastic processes and come with the conclusion that asymptotically consistent clustering algorithms can be obtained when the processes are ergodic and the dissimilarity measure satisfies the triangle inequality. Examples are provided when the processes are distribution ergodic, covariance ergodic and locally asymptotically self-similar, respectively. △ Less

Submitted 5 August, 2019; originally announced August 2019.

arXiv:1905.12217 [pdf, other]

Graph DNA: Deep Neighborhood Aware Graph Encoding for Collaborative Filtering

Authors: Liwei Wu, Hsiang-Fu Yu, Nikhil Rao, James Sharpnack, Cho-Jui Hsieh

Abstract: In this paper, we consider recommender systems with side information in the form of graphs. Existing collaborative filtering algorithms mainly utilize only immediate neighborhood information and have a hard time taking advantage of deeper neighborhoods beyond 1-2 hops. The main caveat of exploiting deeper graph information is the rapidly growing time and space complexity when incorporating informa… ▽ More In this paper, we consider recommender systems with side information in the form of graphs. Existing collaborative filtering algorithms mainly utilize only immediate neighborhood information and have a hard time taking advantage of deeper neighborhoods beyond 1-2 hops. The main caveat of exploiting deeper graph information is the rapidly growing time and space complexity when incorporating information from these neighborhoods. In this paper, we propose using Graph DNA, a novel Deep Neighborhood Aware graph encoding algorithm, for exploiting deeper neighborhood information. DNA encoding computes approximate deep neighborhood information in linear time using Bloom filters, a space-efficient probabilistic data structure and results in a per-node encoding that is logarithmic in the number of nodes in the graph. It can be used in conjunction with both feature-based and graph-regularization-based collaborative filtering algorithms. Graph DNA has the advantages of being memory and time efficient and providing additional regularization when compared to directly using higher order graph information. We conduct experiments on real-world datasets, showing graph DNA can be easily used with 4 popular collaborative filtering algorithms and consistently leads to a performance boost with little computational and memory overhead. △ Less

Submitted 29 May, 2019; originally announced May 2019.

Comments: under review

arXiv:1902.08944 [pdf, ps, other]

Hypotheses Testing from Complex Survey Data Using Bootstrap Weights: A Unified Approach

Authors: Jae-kwang Kim, J. N. K. Rao, Zhonglei Wang

Abstract: Standard statistical methods that do not take proper account of the complexity of survey design can lead to erroneous inferences when applied to survey data due to unequal selection probabilities, clustering, and other design features. In particular, the actual type I error rates of tests of hypotheses using standard methods can be much bigger than the nominal significance level. Methods that take… ▽ More Standard statistical methods that do not take proper account of the complexity of survey design can lead to erroneous inferences when applied to survey data due to unequal selection probabilities, clustering, and other design features. In particular, the actual type I error rates of tests of hypotheses using standard methods can be much bigger than the nominal significance level. Methods that take account of survey design features in testing hypotheses have been proposed, including Wald tests and quasi-score tests that involve the estimated covariance matrices of parameter estimates. In this paper, we present a unified approach to hypothesis testing that does not require computing the covariance matrices by constructing bootstrap approximations to weighted likelihood ratio statistics and weighted quasi-score statistics and establish the asymptotic validity of the proposed bootstrap tests. In addition, we also consider hypothesis testing from categorical data and present a bootstrap procedure for testing simple goodness of fit and independence in a two-way table. In the simulation studies, the type I error rates of the proposed approach are much closer to their nominal significance level compared with the naive likelihood-ratio test and quasi-score test. An application to data from an educational survey under a logistic regression model is also presented. △ Less

Submitted 2 March, 2021; v1 submitted 24 February, 2019; originally announced February 2019.

arXiv:1804.06234 [pdf, other]

Cluster Analysis on Locally Asymptotically Self-similar Processes with Known Number of Clusters

Authors: Qidi Peng, Nan Rao, Ran Zhao

Abstract: We conduct cluster analysis on a class of locally asymptotically self-similar stochastic processes, which includes multifractional Brownian motion as a representative. When the true number of clusters is supposed to be known, a new covariance-based dissimilarity measure is introduced, from which we obtain the approximately asymptotically consistent clustering algorithms. In simulation studies, clu… ▽ More We conduct cluster analysis on a class of locally asymptotically self-similar stochastic processes, which includes multifractional Brownian motion as a representative. When the true number of clusters is supposed to be known, a new covariance-based dissimilarity measure is introduced, from which we obtain the approximately asymptotically consistent clustering algorithms. In simulation studies, clustering data sampled from multifractional Brownian motions with distinct functional Hurst parameters illustrates the approximated asymptotic consistency of the proposed algorithms. Clustering global financial markets' equity indexes returns and sovereign CDS spreads provides a successful real world application. △ Less

Submitted 14 January, 2020; v1 submitted 13 April, 2018; originally announced April 2018.

Comments: arXiv admin note: substantial text overlap with arXiv:1801.09049

MSC Class: 62-07; 60G10; 62M10

arXiv:1801.09049 [pdf, other]

Covariance-based Dissimilarity Measures Applied to Clustering Wide-sense Stationary Ergodic Processes

Authors: Qidi Peng, Nan Rao, Ran Zhao

Abstract: We introduce a new unsupervised learning problem: clustering wide-sense stationary ergodic stochastic processes. A covariance-based dissimilarity measure together with asymptotically consistent algorithms is designed for clustering offline and online datasets, respectively. We also suggest a formal criterion on the efficiency of dissimilarity measures, and discuss of some approach to improve the e… ▽ More We introduce a new unsupervised learning problem: clustering wide-sense stationary ergodic stochastic processes. A covariance-based dissimilarity measure together with asymptotically consistent algorithms is designed for clustering offline and online datasets, respectively. We also suggest a formal criterion on the efficiency of dissimilarity measures, and discuss of some approach to improve the efficiency of our clustering algorithms, when they are applied to cluster particular type of processes, such as self-similar processes with wide-sense stationary ergodic increments. Clustering synthetic data and real-world data are provided as examples of applications. △ Less

Submitted 1 July, 2019; v1 submitted 27 January, 2018; originally announced January 2018.

MSC Class: 62-07; 60G10; 62M10

arXiv:1711.02213 [pdf, other]

Flexpoint: An Adaptive Numerical Format for Efficient Training of Deep Neural Networks

Authors: Urs Köster, Tristan J. Webb, Xin Wang, Marcel Nassar, Arjun K. Bansal, William H. Constable, Oğuz H. Elibol, Scott Gray, Stewart Hall, Luke Hornof, Amir Khosrowshahi, Carey Kloss, Ruby J. Pai, Naveen Rao

Abstract: Deep neural networks are commonly developed and trained in 32-bit floating point format. Significant gains in performance and energy efficiency could be realized by training and inference in numerical formats optimized for deep learning. Despite advances in limited precision inference in recent years, training of neural networks in low bit-width remains a challenging problem. Here we present the F… ▽ More Deep neural networks are commonly developed and trained in 32-bit floating point format. Significant gains in performance and energy efficiency could be realized by training and inference in numerical formats optimized for deep learning. Despite advances in limited precision inference in recent years, training of neural networks in low bit-width remains a challenging problem. Here we present the Flexpoint data format, aiming at a complete replacement of 32-bit floating point format training and inference, designed to support modern deep network topologies without modifications. Flexpoint tensors have a shared exponent that is dynamically adjusted to minimize overflows and maximize available dynamic range. We validate Flexpoint by training AlexNet, a deep residual network and a generative adversarial network, using a simulator implemented with the neon deep learning framework. We demonstrate that 16-bit Flexpoint closely matches 32-bit floating point in training all three models, without any need for tuning of model hyperparameters. Our results suggest Flexpoint as a promising numerical format for future hardware for training and inference. △ Less

Submitted 2 December, 2017; v1 submitted 6 November, 2017; originally announced November 2017.

Comments: 14 pages, 5 figures, accepted in Neural Information Processing Systems 2017

arXiv:1705.02047 [pdf, other]

Matrix Completion via Factorizing Polynomials

Authors: Vatsal Shah, Nikhil Rao, Weicong Ding

Abstract: Predicting unobserved entries of a partially observed matrix has found wide applicability in several areas, such as recommender systems, computational biology, and computer vision. Many scalable methods with rigorous theoretical guarantees have been developed for algorithms where the matrix is factored into low-rank components, and embeddings are learned for the row and column entities. While ther… ▽ More Predicting unobserved entries of a partially observed matrix has found wide applicability in several areas, such as recommender systems, computational biology, and computer vision. Many scalable methods with rigorous theoretical guarantees have been developed for algorithms where the matrix is factored into low-rank components, and embeddings are learned for the row and column entities. While there has been recent research on incorporating explicit side information in the low-rank matrix factorization setting, often implicit information can be gleaned from the data, via higher-order interactions among entities. Such implicit information is especially useful in cases where the data is very sparse, as is often the case in real-world datasets. In this paper, we design a method to learn embeddings in the context of recommendation systems, using the observation that higher powers of a graph transition probability matrix encode the probability that a random walker will hit that node in a given number of steps. We develop a coordinate descent algorithm to solve the resulting optimization, that makes explicit computation of the higher order powers of the matrix redundant, preserving sparsity and making computations efficient. Experiments on several datasets show that our method, that can use higher order information, outperforms methods that only use explicitly available side information, those that use only second-order implicit information and in some cases, methods based on deep neural networks as well. △ Less

Submitted 13 February, 2018; v1 submitted 4 May, 2017; originally announced May 2017.

arXiv:1703.00607 [pdf, other]

doi 10.1145/3159652.3159703

Dynamic Word Embeddings for Evolving Semantic Discovery

Authors: Zijun Yao, Yifan Sun, Weicong Ding, Nikhil Rao, Hui Xiong

Abstract: Word evolution refers to the changing meanings and associations of words throughout time, as a byproduct of human language evolution. By studying word evolution, we can infer social trends and language constructs over different periods of human history. However, traditional techniques such as word representation learning do not adequately capture the evolving language structure and vocabulary. In… ▽ More Word evolution refers to the changing meanings and associations of words throughout time, as a byproduct of human language evolution. By studying word evolution, we can infer social trends and language constructs over different periods of human history. However, traditional techniques such as word representation learning do not adequately capture the evolving language structure and vocabulary. In this paper, we develop a dynamic statistical model to learn time-aware word vector representation. We propose a model that simultaneously learns time-aware embeddings and solves the resulting "alignment problem". This model is trained on a crawled NYTimes dataset. Additionally, we develop multiple intuitive evaluation strategies of temporal word embeddings. Our qualitative and quantitative tests indicate that our method not only reliably captures this evolution over time, but also consistently outperforms state-of-the-art temporal embedding approaches on both semantic accuracy and alignment quality. △ Less

Submitted 13 February, 2018; v1 submitted 1 March, 2017; originally announced March 2017.

Comments: 9 pages, published in the International Conference on Web Search and Data Mining (WSDM 2018)

arXiv:1603.03980 [pdf, ps, other]

On Learning High Dimensional Structured Single Index Models

Authors: Nikhil Rao, Ravi Ganti, Laura Balzano, Rebecca Willett, Robert Nowak

Abstract: Single Index Models (SIMs) are simple yet flexible semi-parametric models for machine learning, where the response variable is modeled as a monotonic function of a linear combination of features. Estimation in this context requires learning both the feature weights and the nonlinear function that relates features to observations. While methods have been described to learn SIMs in the low dimension… ▽ More Single Index Models (SIMs) are simple yet flexible semi-parametric models for machine learning, where the response variable is modeled as a monotonic function of a linear combination of features. Estimation in this context requires learning both the feature weights and the nonlinear function that relates features to observations. While methods have been described to learn SIMs in the low dimensional regime, a method that can efficiently learn SIMs in high dimensions, and under general structural assumptions, has not been forthcoming. In this paper, we propose computationally efficient algorithms for SIM inference in high dimensions with structural constraints. Our general approach specializes to sparsity, group sparsity, and low-rank assumptions among others. Experiments show that the proposed method enjoys superior predictive performance when compared to generalized linear models, and achieves results comparable to or better than single layer feedforward neural networks with significantly less computational cost. △ Less

Submitted 29 November, 2016; v1 submitted 12 March, 2016; originally announced March 2016.

Comments: 7 pages, 3 tables, 1 Figure, substantial text overlap with arXiv:1506.08910; Accepted for publication at AAAI 2017; added new experimental results comparing our method to a single layer neural network

arXiv:1602.06042 [pdf, ps, other]

Structured Sparse Regression via Greedy Hard-Thresholding

Authors: Prateek Jain, Nikhil Rao, Inderjit Dhillon

Abstract: Several learning applications require solving high-dimensional regression problems where the relevant features belong to a small number of (overlap**) groups. For very large datasets and under standard sparsity constraints, hard thresholding methods have proven to be extremely efficient, but such methods require NP hard projections when dealing with overlap** groups. In this paper, we show tha… ▽ More Several learning applications require solving high-dimensional regression problems where the relevant features belong to a small number of (overlap**) groups. For very large datasets and under standard sparsity constraints, hard thresholding methods have proven to be extremely efficient, but such methods require NP hard projections when dealing with overlap** groups. In this paper, we show that such NP-hard projections can not only be avoided by appealing to submodular optimization, but such methods come with strong theoretical guarantees even in the presence of poorly conditioned data (i.e. say when two features have correlation $\geq 0.99$), which existing analyses cannot handle. These methods exhibit an interesting computation-accuracy trade-off and can be extended to significantly harder problems such as sparse overlap** groups. Experiments on both real and synthetic data validate our claims and demonstrate that the proposed methods are orders of magnitude faster than other greedy and convex relaxation techniques for learning with group-structured sparsity. △ Less

Submitted 27 May, 2016; v1 submitted 18 February, 2016; originally announced February 2016.

arXiv:1509.08333 [pdf, other]

High-dimensional Time Series Prediction with Missing Values

Authors: Hsiang-Fu Yu, Nikhil Rao, Inderjit S. Dhillon

Abstract: High-dimensional time series prediction is needed in applications as diverse as demand forecasting and climatology. Often, such applications require methods that are both highly scalable, and deal with noisy data in terms of corruptions or missing values. Classical time series methods usually fall short of handling both these issues. In this paper, we propose to adapt matrix matrix completion appr… ▽ More High-dimensional time series prediction is needed in applications as diverse as demand forecasting and climatology. Often, such applications require methods that are both highly scalable, and deal with noisy data in terms of corruptions or missing values. Classical time series methods usually fall short of handling both these issues. In this paper, we propose to adapt matrix matrix completion approaches that have previously been successfully applied to large scale noisy data, but which fail to adequately model high-dimensional time series due to temporal dependencies. We present a novel temporal regularized matrix factorization (TRMF) framework which supports data-driven temporal dependency learning and enables forecasting ability to our new matrix factorization approach. TRMF is highly general, and subsumes many existing matrix factorization approaches for time series data. We make interesting connections to graph regularized matrix factorization methods in the context of learning the dependencies. Experiments on both real and synthetic data show that TRMF outperforms several existing approaches for common time series tasks. △ Less

Submitted 16 February, 2016; v1 submitted 28 September, 2015; originally announced September 2015.

arXiv:1506.08910 [pdf, other]

Learning Single Index Models in High Dimensions

Authors: Ravi Ganti, Nikhil Rao, Rebecca M. Willett, Robert Nowak

Abstract: Single Index Models (SIMs) are simple yet flexible semi-parametric models for classification and regression. Response variables are modeled as a nonlinear, monotonic function of a linear combination of features. Estimation in this context requires learning both the feature weights, and the nonlinear function. While methods have been described to learn SIMs in the low dimensional regime, a method t… ▽ More Single Index Models (SIMs) are simple yet flexible semi-parametric models for classification and regression. Response variables are modeled as a nonlinear, monotonic function of a linear combination of features. Estimation in this context requires learning both the feature weights, and the nonlinear function. While methods have been described to learn SIMs in the low dimensional regime, a method that can efficiently learn SIMs in high dimensions has not been forthcoming. We propose three variants of a computationally and statistically efficient algorithm for SIM inference in high dimensions. We establish excess risk bounds for the proposed algorithms and experimentally validate the advantages that our SIM learning methods provide relative to Generalized Linear Model (GLM) and low dimensional SIM based learning methods. △ Less

Submitted 29 June, 2015; originally announced June 2015.

Comments: 16 pages, 2 figures, 1 table

arXiv:1505.04085 [pdf, other]

Optimal Low-Rank Tensor Recovery from Separable Measurements: Four Contractions Suffice

Authors: Parikshit Shah, Nikhil Rao, Gongguo Tang

Abstract: Tensors play a central role in many modern machine learning and signal processing applications. In such applications, the target tensor is usually of low rank, i.e., can be expressed as a sum of a small number of rank one tensors. This motivates us to consider the problem of low rank tensor recovery from a class of linear measurements called separable measurements. As specific examples, we focus o… ▽ More Tensors play a central role in many modern machine learning and signal processing applications. In such applications, the target tensor is usually of low rank, i.e., can be expressed as a sum of a small number of rank one tensors. This motivates us to consider the problem of low rank tensor recovery from a class of linear measurements called separable measurements. As specific examples, we focus on two distinct types of separable measurement mechanisms (a) Random projections, where each measurement corresponds to an inner product of the tensor with a suitable random tensor, and (b) the completion problem where measurements constitute revelation of a random set of entries. We present a computationally efficient algorithm, with rigorous and order-optimal sample complexity results (upto logarithmic factors) for tensor recovery. Our method is based on reduction to matrix completion sub-problems and adaptation of Leurgans' method for tensor decomposition. We extend the methodology and sample complexity results to higher order tensors, and experimentally validate our theoretical results. △ Less

Submitted 15 May, 2015; originally announced May 2015.

arXiv:1407.8384 [pdf, ps, other]

doi 10.1214/13-AOAS702

Small area estimation of general parameters with application to poverty indicators: A hierarchical Bayes approach

Authors: Isabel Molina, Balgobin Nandram, J. N. K. Rao

Abstract: Poverty maps are used to aid important political decisions such as allocation of development funds by governments and international organizations. Those decisions should be based on the most accurate poverty figures. However, often reliable poverty figures are not available at fine geographical levels or for particular risk population subgroups due to the sample size limitation of current national… ▽ More Poverty maps are used to aid important political decisions such as allocation of development funds by governments and international organizations. Those decisions should be based on the most accurate poverty figures. However, often reliable poverty figures are not available at fine geographical levels or for particular risk population subgroups due to the sample size limitation of current national surveys. These surveys cannot cover adequately all the desired areas or population subgroups and, therefore, models relating the different areas are needed to 'borrow strength" from area to area. In particular, the Spanish Survey on Income and Living Conditions (SILC) produces national poverty estimates but cannot provide poverty estimates by Spanish provinces due to the poor precision of direct estimates, which use only the province specific data. It also raises the ethical question of whether poverty is more severe for women than for men in a given province. We develop a hierarchical Bayes (HB) approach for poverty map** in Spanish provinces by gender that overcomes the small province sample size problem of the SILC. The proposed approach has a wide scope of application because it can be used to estimate general nonlinear parameters. We use a Bayesian version of the nested error regression model in which Markov chain Monte Carlo procedures and the convergence monitoring therein are avoided. A simulation study reveals good frequentist properties of the HB approach. The resulting poverty maps indicate that poverty, both in frequency and intensity, is localized mostly in the southern and western provinces and it is more acute for women than for men in most of the provinces. △ Less

Submitted 31 July, 2014; originally announced July 2014.

Comments: Published in at http://dx.doi.org/10.1214/13-AOAS702 the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)

Report number: IMS-AOAS-AOAS702

Journal ref: Annals of Applied Statistics 2014, Vol. 8, No. 2, 852-885

arXiv:1404.5692 [pdf, other]

doi 10.1109/TSP.2015.2461515

Forward - Backward Greedy Algorithms for Atomic Norm Regularization

Authors: Nikhil Rao, Parikshit Shah, Stephen Wright

Abstract: In many signal processing applications, the aim is to reconstruct a signal that has a simple representation with respect to a certain basis or frame. Fundamental elements of the basis known as "atoms" allow us to define "atomic norms" that can be used to formulate convex regularizations for the reconstruction problem. Efficient algorithms are available to solve these formulations in certain specia… ▽ More In many signal processing applications, the aim is to reconstruct a signal that has a simple representation with respect to a certain basis or frame. Fundamental elements of the basis known as "atoms" allow us to define "atomic norms" that can be used to formulate convex regularizations for the reconstruction problem. Efficient algorithms are available to solve these formulations in certain special cases, but an approach that works well for general atomic norms, both in terms of speed and reconstruction accuracy, remains to be found. This paper describes an optimization algorithm called CoGEnT that produces solutions with succinct atomic representations for reconstruction problems, generally formulated with atomic-norm constraints. CoGEnT combines a greedy selection scheme based on the conditional gradient approach with a backward (or "truncation") step that exploits the quadratic nature of the objective to reduce the basis size. We establish convergence properties and validate the algorithm via extensive numerical experiments on a suite of signal processing applications. Our algorithm and analysis also allow for inexact forward steps and for occasional enhancements of the current representation to be performed. CoGEnT can outperform the basic conditional gradient method, and indeed many methods that are tailored to specific applications, when the enhancement and truncation steps are defined appropriately. We also introduce several novel applications that are enabled by the atomic-norm framework, including tensor completion, moment problems in signal processing, and graph deconvolution. △ Less

Submitted 22 July, 2015; v1 submitted 22 April, 2014; originally announced April 2014.

Comments: To appear in IEEE Transactions on Signal Processing

arXiv:1402.4512 [pdf, other]

Classification with Sparse Overlap** Groups

Authors: Nikhil Rao, Robert Nowak, Christopher Cox, Timothy Rogers

Abstract: Classification with a sparsity constraint on the solution plays a central role in many high dimensional machine learning applications. In some cases, the features can be grouped together so that entire subsets of features can be selected or not selected. In many applications, however, this can be too restrictive. In this paper, we are interested in a less restrictive form of structured sparse feat… ▽ More Classification with a sparsity constraint on the solution plays a central role in many high dimensional machine learning applications. In some cases, the features can be grouped together so that entire subsets of features can be selected or not selected. In many applications, however, this can be too restrictive. In this paper, we are interested in a less restrictive form of structured sparse feature selection: we assume that while features can be grouped according to some notion of similarity, not all features in a group need be selected for the task at hand. When the groups are comprised of disjoint sets of features, this is sometimes referred to as the "sparse group" lasso, and it allows for working with a richer class of models than traditional group lasso methods. Our framework generalizes conventional sparse group lasso further by allowing for overlap** groups, an additional flexiblity needed in many applications and one that presents further challenges. The main contribution of this paper is a new procedure called Sparse Overlap** Group (SOG) lasso, a convex optimization program that automatically selects similar features for classification in high dimensions. We establish model selection error bounds for SOGlasso classification problems under a fairly general setting. In particular, the error bounds are the first such results for classification using the sparse group lasso. Furthermore, the general SOGlasso bound specializes to results for the lasso and the group lasso, some known and some new. The SOGlasso is motivated by multi-subject fMRI studies in which functional activity is classified using brain voxels as features, source localization problems in Magnetoencephalography (MEG), and analyzing gene activation patterns in microarray data analysis. Experiments with real and synthetic data demonstrate the advantages of SOGlasso compared to the lasso and group lasso. △ Less

Submitted 4 September, 2014; v1 submitted 18 February, 2014; originally announced February 2014.

Comments: Tighter result compared to the previous version. Some additional details and justification on the problem being solved

arXiv:1311.5422 [pdf, other]

Sparse Overlap** Sets Lasso for Multitask Learning and its Application to fMRI Analysis

Authors: Nikhil Rao, Christopher Cox, Robert Nowak, Timothy Rogers

Abstract: Multitask learning can be effective when features useful in one task are also useful for other tasks, and the group lasso is a standard method for selecting a common subset of features. In this paper, we are interested in a less restrictive form of multitask learning, wherein (1) the available features can be organized into subsets according to a notion of similarity and (2) features useful in one… ▽ More Multitask learning can be effective when features useful in one task are also useful for other tasks, and the group lasso is a standard method for selecting a common subset of features. In this paper, we are interested in a less restrictive form of multitask learning, wherein (1) the available features can be organized into subsets according to a notion of similarity and (2) features useful in one task are similar, but not necessarily identical, to the features best suited for other tasks. The main contribution of this paper is a new procedure called Sparse Overlap** Sets (SOS) lasso, a convex optimization that automatically selects similar features for related learning tasks. Error bounds are derived for SOSlasso and its consistency is established for squared error loss. In particular, SOSlasso is motivated by multi- subject fMRI studies in which functional activity is classified using brain voxels as features. Experiments with real and synthetic data demonstrate the advantages of SOSlasso compared to the lasso and group lasso. △ Less

Submitted 21 November, 2013; v1 submitted 20 November, 2013; originally announced November 2013.

Comments: To appear in Advances in Neural Information Processing Systems, 2013

arXiv:1209.3079 [pdf, other]

Signal Recovery in Unions of Subspaces with Applications to Compressive Imaging

Authors: Nikhil Rao, Benjamin Recht, Robert Nowak

Abstract: In applications ranging from communications to genetics, signals can be modeled as lying in a union of subspaces. Under this model, signal coefficients that lie in certain subspaces are active or inactive together. The potential subspaces are known in advance, but the particular set of subspaces that are active (i.e., in the signal support) must be learned from measurements. We show that exploitin… ▽ More In applications ranging from communications to genetics, signals can be modeled as lying in a union of subspaces. Under this model, signal coefficients that lie in certain subspaces are active or inactive together. The potential subspaces are known in advance, but the particular set of subspaces that are active (i.e., in the signal support) must be learned from measurements. We show that exploiting knowledge of subspaces can further reduce the number of measurements required for exact signal recovery, and derive universal bounds for the number of measurements needed. The bound is universal in the sense that it only depends on the number of subspaces under consideration, and their orientation relative to each other. The particulars of the subspaces (e.g., compositions, dimensions, extents, overlaps, etc.) does not affect the results we obtain. In the process, we derive sample complexity bounds for the special case of the group lasso with overlap** groups (the latent group lasso), which is used in a variety of applications. Finally, we also show that wavelet transform coefficients of images can be modeled as lying in groups, and hence can be efficiently recovered using group lasso methods. △ Less

Submitted 13 September, 2012; originally announced September 2012.

Comments: arXiv admin note: substantial text overlap with arXiv:1106.4355

arXiv:1108.3940 [pdf, ps, other]

doi 10.1214/11-STS346REJ

Rejoinder

Authors: J. N. K. Rao

Abstract: Rejoinder of "Impact of Frequentist and Bayesian Methods on Survey Sampling Practice: A Selective Appraisal" by J. N. K. Rao [arXiv:1108.2356] Rejoinder of "Impact of Frequentist and Bayesian Methods on Survey Sampling Practice: A Selective Appraisal" by J. N. K. Rao [arXiv:1108.2356] △ Less

Submitted 19 August, 2011; originally announced August 2011.

Comments: Published in at http://dx.doi.org/10.1214/11-STS346REJ the Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org)

Report number: IMS-STS-STS346REJ

Journal ref: Statistical Science 2011, Vol. 26, No. 2, 266-270

arXiv:1108.2356 [pdf, ps, other]

doi 10.1214/10-STS346

Impact of Frequentist and Bayesian Methods on Survey Sampling Practice: A Selective Appraisal

Authors: J. N. K. Rao

Abstract: According to Hansen, Madow and Tep** [J. Amer. Statist. Assoc. 78 (1983) 776--793], "Probability sampling designs and randomization inference are widely accepted as the standard approach in sample surveys." In this article, reasons are advanced for the wide use of this design-based approach, particularly by federal agencies and other survey organizations conducting complex large scale surveys on… ▽ More According to Hansen, Madow and Tep** [J. Amer. Statist. Assoc. 78 (1983) 776--793], "Probability sampling designs and randomization inference are widely accepted as the standard approach in sample surveys." In this article, reasons are advanced for the wide use of this design-based approach, particularly by federal agencies and other survey organizations conducting complex large scale surveys on topics related to public policy. Impact of Bayesian methods in survey sampling is also discussed in two different directions: nonparametric calibrated Bayesian inferences from large samples and hierarchical Bayes methods for small area estimation based on parametric models. △ Less

Submitted 11 August, 2011; originally announced August 2011.

Comments: Published in at http://dx.doi.org/10.1214/10-STS346 the Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org)

Report number: IMS-STS-STS346

Journal ref: Statistical Science 2011, Vol. 26, No. 2, 240-256

arXiv:1106.4355 [pdf, other]

Tight Measurement Bounds for Exact Recovery of Structured Sparse Signals

Authors: Nikhil Rao, Benjamin Recht, Robert Nowak

Abstract: Standard compressive sensing results state that to exactly recover an s sparse signal in R^p, one requires O(s. log(p)) measurements. While this bound is extremely useful in practice, often real world signals are not only sparse, but also exhibit structure in the sparsity pattern. We focus on group-structured patterns in this paper. Under this model, groups of signal coefficients are active (or in… ▽ More Standard compressive sensing results state that to exactly recover an s sparse signal in R^p, one requires O(s. log(p)) measurements. While this bound is extremely useful in practice, often real world signals are not only sparse, but also exhibit structure in the sparsity pattern. We focus on group-structured patterns in this paper. Under this model, groups of signal coefficients are active (or inactive) together. The groups are predefined, but the particular set of groups that are active (i.e., in the signal support) must be learned from measurements. We show that exploiting knowledge of groups can further reduce the number of measurements required for exact signal recovery, and derive universal bounds for the number of measurements needed. The bound is universal in the sense that it only depends on the number of groups under consideration, and not the particulars of the groups (e.g., compositions, sizes, extents, overlaps, etc.). Experiments show that our result holds for a variety of overlap** group configurations. △ Less

Submitted 17 October, 2011; v1 submitted 21 June, 2011; originally announced June 2011.

Comments: Refined previous bound and added new experiments

arXiv:1104.4385 [pdf, other]

Convex Approaches to Model Wavelet Sparsity Patterns

Authors: Nikhil S Rao, Robert D. Nowak, Stephen J. Wright, Nick G. Kingsbury

Abstract: Statistical dependencies among wavelet coefficients are commonly represented by graphical models such as hidden Markov trees(HMTs). However, in linear inverse problems such as deconvolution, tomography, and compressed sensing, the presence of a sensing or observation matrix produces a linear mixing of the simple Markovian dependency structure. This leads to reconstruction problems that are non-con… ▽ More Statistical dependencies among wavelet coefficients are commonly represented by graphical models such as hidden Markov trees(HMTs). However, in linear inverse problems such as deconvolution, tomography, and compressed sensing, the presence of a sensing or observation matrix produces a linear mixing of the simple Markovian dependency structure. This leads to reconstruction problems that are non-convex optimizations. Past work has dealt with this issue by resorting to greedy or suboptimal iterative reconstruction methods. In this paper, we propose new modeling approaches based on group-sparsity penalties that leads to convex optimizations that can be solved exactly and efficiently. We show that the methods we develop perform significantly better in deconvolution and compressed sensing applications, while being as computationally efficient as standard coefficient-wise approaches such as lasso. △ Less

Submitted 22 April, 2011; originally announced April 2011.

Showing 1–29 of 29 results for author: Rao, N