Showing 1–29 of 29 results for author: Rao, N

Searching in archive stat.

  1. arXiv:2309.11512  [pdf, other]

    stat.AP cs.LG

    Multidimensional well-being of US households at a fine spatial scale using fused household surveys: fusionACS

    Authors: Kevin Ummel, Miguel Poblete-Cazenave, Karthik Akkiraju, Nick Graetz, Hero Ashman, Cora Kingdon, Steven Herrera Tenorio, Aaryaman "Sunny" Singhal, Daniel Aldana Cohen, Narasimha D. Rao

    Abstract: Social science often relies on surveys of households and individuals. Dozens of such surveys are regularly administered by the U.S. government. However, they field independent, unconnected samples with specialized questions, limiting research questions to those that can be answered by a single survey. The fusionACS project seeks to integrate data from multiple U.S. household surveys by statistical…

    Submitted 15 September, 2023; originally announced September 2023.

    Comments: 35 pages, 6 figures

  2. arXiv:2306.04907  [pdf, other]

    stat.ME stat.AP

    Estimation of Poverty Measures for Small Areas Under a Two-Fold Nested Error Linear Regression Model: Comparison of Two Methods

    Authors: Maryam Sohrabi, J. N. K. Rao

    Abstract: Demand for reliable statistics at a local area (small area) level has greatly increased in recent years. Traditional area-specific estimators based on probability samples are not adequate because of small sample size or even zero sample size in a local area. As a result, methods based on models linking the areas are widely used. World Bank focused on estimating poverty measures, in particular pove…

    Submitted 7 June, 2023; originally announced June 2023.

  3. arXiv:2206.03040  [pdf, other]

    stat.ML cs.IR cs.LG

    Learning Backward Compatible Embeddings

    Authors: Weihua Hu, Rajas Bansal, Kaidi Cao, Nikhil Rao, Karthik Subbian, Jure Leskovec

    Abstract: Embeddings, low-dimensional vector representations of objects, are fundamental in building modern machine learning systems. In industrial settings, there is usually an embedding team that trains an embedding model to solve intended tasks (e.g., product recommendation). The produced embeddings are then widely consumed by consumer teams to solve their unintended tasks (e.g., fraud detection). However…

    Submitted 7 June, 2022; originally announced June 2022.

    Comments: KDD 2022, Applied Data Science Track

  4. arXiv:2110.14011  [pdf, other]

    cs.LG stat.ML

    Cluster-and-Conquer: A Framework For Time-Series Forecasting

    Authors: Reese Pathak, Rajat Sen, Nikhil Rao, N. Benjamin Erichson, Michael I. Jordan, Inderjit S. Dhillon

    Abstract: We propose a three-stage framework for forecasting high-dimensional time-series data. Our method first estimates parameters for each univariate time series. Next, we use these parameters to cluster the time series. These clusters can be viewed as multivariate time series, for which we then compute parameters. The forecasted values of a single time series can depend on the history of other time ser…

    Submitted 26 October, 2021; originally announced October 2021.

    Comments: 25 pages, 3 figures

  5. arXiv:2109.01965  [pdf, other]

    stat.ML cs.LG

    Scalable Feature Selection for (Multitask) Gradient Boosted Trees

    Authors: Cuize Han, Nikhil Rao, Daria Sorokina, Karthik Subbian

    Abstract: Gradient Boosted Decision Trees (GBDTs) are widely used for building ranking and relevance models in search and recommendation. Considerations such as latency and interpretability dictate the use of as few features as possible to train these models. Feature selection in GBDT models typically involves heuristically ranking the features by importance and selecting the top few, or by performing a ful…

    Submitted 4 September, 2021; originally announced September 2021.

    Comments: Correct a mistake in the proof of Lemma B1 in http://proceedings.mlr.press/v108/han20a.html

    Journal ref: Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, PMLR 108:885-894, 2020

  6. arXiv:2108.01152  [pdf, ps, other]

    cs.LG stat.ML

    Maximizing and Satisficing in Multi-armed Bandits with Graph Information

    Authors: Parth K. Thaker, Mohit Malu, Nikhil Rao, Gautam Dasarathy

    Abstract: Pure exploration in multi-armed bandits has emerged as an important framework for modeling decision-making and search under uncertainty. In modern applications, however, one is often faced with a tremendously large number of options. Even obtaining one observation per option may be too costly, rendering traditional pure exploration algorithms ineffective. Fortunately, one often has access to simila…

    Submitted 20 November, 2022; v1 submitted 2 August, 2021; originally announced August 2021.

  7. arXiv:2005.12172  [pdf, ps, other]

    stat.ME

    Empirical Likelihood Inference With Public-Use Survey Data

    Authors: Puying Zhao, J. N. K. Rao, Changbao Wu

    Abstract: Public-use survey data are an important source of information for researchers in social science and health studies to build statistical models and make inferences on the target finite population. This paper presents two general inferential tools through the pseudo empirical likelihood and the sample empirical likelihood methods. Theoretical results on point estimation and linear or nonlinear hypot…

    Submitted 25 May, 2020; originally announced May 2020.

    Comments: 50 pages, including 11 pages of tables

    MSC Class: 62D05; 62G05; 62G10

  8. arXiv:1908.01794  [pdf, other]

    stat.ML cs.LG

    Some Developments in Clustering Analysis on Stochastic Processes

    Authors: Qidi Peng, Nan Rao, Ran Zhao

    Abstract: We review some developments on clustering stochastic processes and come to the conclusion that asymptotically consistent clustering algorithms can be obtained when the processes are ergodic and the dissimilarity measure satisfies the triangle inequality. Examples are provided when the processes are distribution ergodic, covariance ergodic and locally asymptotically self-similar, respectively.

    Submitted 5 August, 2019; originally announced August 2019.

  9. arXiv:1905.12217  [pdf, other]

    cs.LG cs.AI stat.ML

    Graph DNA: Deep Neighborhood Aware Graph Encoding for Collaborative Filtering

    Authors: Liwei Wu, Hsiang-Fu Yu, Nikhil Rao, James Sharpnack, Cho-Jui Hsieh

    Abstract: In this paper, we consider recommender systems with side information in the form of graphs. Existing collaborative filtering algorithms mainly utilize only immediate neighborhood information and have a hard time taking advantage of deeper neighborhoods beyond 1-2 hops. The main caveat of exploiting deeper graph information is the rapidly growing time and space complexity when incorporating informa…

    Submitted 29 May, 2019; originally announced May 2019.

    Comments: under review

  10. arXiv:1902.08944  [pdf, ps, other]

    stat.ME

    Hypotheses Testing from Complex Survey Data Using Bootstrap Weights: A Unified Approach

    Authors: Jae-kwang Kim, J. N. K. Rao, Zhonglei Wang

    Abstract: Standard statistical methods that do not take proper account of the complexity of survey design can lead to erroneous inferences when applied to survey data due to unequal selection probabilities, clustering, and other design features. In particular, the actual type I error rates of tests of hypotheses using standard methods can be much bigger than the nominal significance level. Methods that take…

    Submitted 2 March, 2021; v1 submitted 24 February, 2019; originally announced February 2019.

  11. arXiv:1804.06234  [pdf, other]

    stat.ML cs.LG

    Cluster Analysis on Locally Asymptotically Self-similar Processes with Known Number of Clusters

    Authors: Qidi Peng, Nan Rao, Ran Zhao

    Abstract: We conduct cluster analysis on a class of locally asymptotically self-similar stochastic processes, which includes multifractional Brownian motion as a representative. When the true number of clusters is supposed to be known, a new covariance-based dissimilarity measure is introduced, from which we obtain the approximately asymptotically consistent clustering algorithms. In simulation studies, clu…

    Submitted 14 January, 2020; v1 submitted 13 April, 2018; originally announced April 2018.

    Comments: arXiv admin note: substantial text overlap with arXiv:1801.09049

    MSC Class: 62-07; 60G10; 62M10

  12. arXiv:1801.09049  [pdf, other]

    stat.ML

    Covariance-based Dissimilarity Measures Applied to Clustering Wide-sense Stationary Ergodic Processes

    Authors: Qidi Peng, Nan Rao, Ran Zhao

    Abstract: We introduce a new unsupervised learning problem: clustering wide-sense stationary ergodic stochastic processes. A covariance-based dissimilarity measure together with asymptotically consistent algorithms is designed for clustering offline and online datasets, respectively. We also suggest a formal criterion on the efficiency of dissimilarity measures, and discuss some approaches to improve the e…

    Submitted 1 July, 2019; v1 submitted 27 January, 2018; originally announced January 2018.

    MSC Class: 62-07; 60G10; 62M10

  13. arXiv:1711.02213  [pdf, other]

    cs.LG math.NA stat.ML

    Flexpoint: An Adaptive Numerical Format for Efficient Training of Deep Neural Networks

    Authors: Urs Köster, Tristan J. Webb, Xin Wang, Marcel Nassar, Arjun K. Bansal, William H. Constable, Oğuz H. Elibol, Scott Gray, Stewart Hall, Luke Hornof, Amir Khosrowshahi, Carey Kloss, Ruby J. Pai, Naveen Rao

    Abstract: Deep neural networks are commonly developed and trained in 32-bit floating point format. Significant gains in performance and energy efficiency could be realized by training and inference in numerical formats optimized for deep learning. Despite advances in limited precision inference in recent years, training of neural networks in low bit-width remains a challenging problem. Here we present the F…

    Submitted 2 December, 2017; v1 submitted 6 November, 2017; originally announced November 2017.

    Comments: 14 pages, 5 figures, accepted in Neural Information Processing Systems 2017

  14. arXiv:1705.02047  [pdf, other]

    stat.ML cs.LG

    Matrix Completion via Factorizing Polynomials

    Authors: Vatsal Shah, Nikhil Rao, Weicong Ding

    Abstract: Predicting unobserved entries of a partially observed matrix has found wide applicability in several areas, such as recommender systems, computational biology, and computer vision. Many scalable methods with rigorous theoretical guarantees have been developed for algorithms where the matrix is factored into low-rank components, and embeddings are learned for the row and column entities. While ther…

    Submitted 13 February, 2018; v1 submitted 4 May, 2017; originally announced May 2017.

  15. Dynamic Word Embeddings for Evolving Semantic Discovery

    Authors: Zijun Yao, Yifan Sun, Weicong Ding, Nikhil Rao, Hui Xiong

    Abstract: Word evolution refers to the changing meanings and associations of words throughout time, as a byproduct of human language evolution. By studying word evolution, we can infer social trends and language constructs over different periods of human history. However, traditional techniques such as word representation learning do not adequately capture the evolving language structure and vocabulary. In…

    Submitted 13 February, 2018; v1 submitted 1 March, 2017; originally announced March 2017.

    Comments: 9 pages, published in the International Conference on Web Search and Data Mining (WSDM 2018)

  16. arXiv:1603.03980  [pdf, ps, other]

    stat.ML cs.AI cs.LG

    On Learning High Dimensional Structured Single Index Models

    Authors: Nikhil Rao, Ravi Ganti, Laura Balzano, Rebecca Willett, Robert Nowak

    Abstract: Single Index Models (SIMs) are simple yet flexible semi-parametric models for machine learning, where the response variable is modeled as a monotonic function of a linear combination of features. Estimation in this context requires learning both the feature weights and the nonlinear function that relates features to observations. While methods have been described to learn SIMs in the low dimension…

    Submitted 29 November, 2016; v1 submitted 12 March, 2016; originally announced March 2016.

    Comments: 7 pages, 3 tables, 1 Figure, substantial text overlap with arXiv:1506.08910; Accepted for publication at AAAI 2017; added new experimental results comparing our method to a single layer neural network

  17. arXiv:1602.06042  [pdf, ps, other]

    stat.ML cs.LG

    Structured Sparse Regression via Greedy Hard-Thresholding

    Authors: Prateek Jain, Nikhil Rao, Inderjit Dhillon

    Abstract: Several learning applications require solving high-dimensional regression problems where the relevant features belong to a small number of (overlapping) groups. For very large datasets and under standard sparsity constraints, hard thresholding methods have proven to be extremely efficient, but such methods require NP hard projections when dealing with overlapping groups. In this paper, we show tha…

    Submitted 27 May, 2016; v1 submitted 18 February, 2016; originally announced February 2016.

  18. arXiv:1509.08333  [pdf, other]

    cs.LG stat.ML

    High-dimensional Time Series Prediction with Missing Values

    Authors: Hsiang-Fu Yu, Nikhil Rao, Inderjit S. Dhillon

    Abstract: High-dimensional time series prediction is needed in applications as diverse as demand forecasting and climatology. Often, such applications require methods that are both highly scalable, and deal with noisy data in terms of corruptions or missing values. Classical time series methods usually fall short of handling both these issues. In this paper, we propose to adapt matrix completion appr…

    Submitted 16 February, 2016; v1 submitted 28 September, 2015; originally announced September 2015.

  19. arXiv:1506.08910  [pdf, other]

    stat.ML cs.LG stat.ME

    Learning Single Index Models in High Dimensions

    Authors: Ravi Ganti, Nikhil Rao, Rebecca M. Willett, Robert Nowak

    Abstract: Single Index Models (SIMs) are simple yet flexible semi-parametric models for classification and regression. Response variables are modeled as a nonlinear, monotonic function of a linear combination of features. Estimation in this context requires learning both the feature weights, and the nonlinear function. While methods have been described to learn SIMs in the low dimensional regime, a method t…

    Submitted 29 June, 2015; originally announced June 2015.

    Comments: 16 pages, 2 figures, 1 table

  20. arXiv:1505.04085  [pdf, other]

    stat.ML cs.IT cs.LG math.OC

    Optimal Low-Rank Tensor Recovery from Separable Measurements: Four Contractions Suffice

    Authors: Parikshit Shah, Nikhil Rao, Gongguo Tang

    Abstract: Tensors play a central role in many modern machine learning and signal processing applications. In such applications, the target tensor is usually of low rank, i.e., can be expressed as a sum of a small number of rank one tensors. This motivates us to consider the problem of low rank tensor recovery from a class of linear measurements called separable measurements. As specific examples, we focus o…

    Submitted 15 May, 2015; originally announced May 2015.

  21. Small area estimation of general parameters with application to poverty indicators: A hierarchical Bayes approach

    Authors: Isabel Molina, Balgobin Nandram, J. N. K. Rao

    Abstract: Poverty maps are used to aid important political decisions such as allocation of development funds by governments and international organizations. Those decisions should be based on the most accurate poverty figures. However, often reliable poverty figures are not available at fine geographical levels or for particular risk population subgroups due to the sample size limitation of current national…

    Submitted 31 July, 2014; originally announced July 2014.

    Comments: Published at http://dx.doi.org/10.1214/13-AOAS702 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)

    Report number: IMS-AOAS-AOAS702

    Journal ref: Annals of Applied Statistics 2014, Vol. 8, No. 2, 852-885

  22. arXiv:1404.5692  [pdf, other]

    cs.DS cs.LG math.OC stat.ML

    Forward - Backward Greedy Algorithms for Atomic Norm Regularization

    Authors: Nikhil Rao, Parikshit Shah, Stephen Wright

    Abstract: In many signal processing applications, the aim is to reconstruct a signal that has a simple representation with respect to a certain basis or frame. Fundamental elements of the basis known as "atoms" allow us to define "atomic norms" that can be used to formulate convex regularizations for the reconstruction problem. Efficient algorithms are available to solve these formulations in certain specia…

    Submitted 22 July, 2015; v1 submitted 22 April, 2014; originally announced April 2014.

    Comments: To appear in IEEE Transactions on Signal Processing

  23. arXiv:1402.4512  [pdf, other]

    cs.LG stat.ML

    Classification with Sparse Overlapping Groups

    Authors: Nikhil Rao, Robert Nowak, Christopher Cox, Timothy Rogers

    Abstract: Classification with a sparsity constraint on the solution plays a central role in many high dimensional machine learning applications. In some cases, the features can be grouped together so that entire subsets of features can be selected or not selected. In many applications, however, this can be too restrictive. In this paper, we are interested in a less restrictive form of structured sparse feat…

    Submitted 4 September, 2014; v1 submitted 18 February, 2014; originally announced February 2014.

    Comments: Tighter result compared to the previous version. Some additional details and justification on the problem being solved

  24. arXiv:1311.5422  [pdf, other]

    cs.LG stat.ML

    Sparse Overlapping Sets Lasso for Multitask Learning and its Application to fMRI Analysis

    Authors: Nikhil Rao, Christopher Cox, Robert Nowak, Timothy Rogers

    Abstract: Multitask learning can be effective when features useful in one task are also useful for other tasks, and the group lasso is a standard method for selecting a common subset of features. In this paper, we are interested in a less restrictive form of multitask learning, wherein (1) the available features can be organized into subsets according to a notion of similarity and (2) features useful in one…

    Submitted 21 November, 2013; v1 submitted 20 November, 2013; originally announced November 2013.

    Comments: To appear in Advances in Neural Information Processing Systems, 2013

  25. arXiv:1209.3079  [pdf, other]

    stat.ML math.OC

    Signal Recovery in Unions of Subspaces with Applications to Compressive Imaging

    Authors: Nikhil Rao, Benjamin Recht, Robert Nowak

    Abstract: In applications ranging from communications to genetics, signals can be modeled as lying in a union of subspaces. Under this model, signal coefficients that lie in certain subspaces are active or inactive together. The potential subspaces are known in advance, but the particular set of subspaces that are active (i.e., in the signal support) must be learned from measurements. We show that exploitin…

    Submitted 13 September, 2012; originally announced September 2012.

    Comments: arXiv admin note: substantial text overlap with arXiv:1106.4355

  26. Rejoinder

    Authors: J. N. K. Rao

    Abstract: Rejoinder of "Impact of Frequentist and Bayesian Methods on Survey Sampling Practice: A Selective Appraisal" by J. N. K. Rao [arXiv:1108.2356]

    Submitted 19 August, 2011; originally announced August 2011.

    Comments: Published at http://dx.doi.org/10.1214/11-STS346REJ in Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org)

    Report number: IMS-STS-STS346REJ

    Journal ref: Statistical Science 2011, Vol. 26, No. 2, 266-270

  27. Impact of Frequentist and Bayesian Methods on Survey Sampling Practice: A Selective Appraisal

    Authors: J. N. K. Rao

    Abstract: According to Hansen, Madow and Tepping [J. Amer. Statist. Assoc. 78 (1983) 776--793], "Probability sampling designs and randomization inference are widely accepted as the standard approach in sample surveys." In this article, reasons are advanced for the wide use of this design-based approach, particularly by federal agencies and other survey organizations conducting complex large scale surveys on…

    Submitted 11 August, 2011; originally announced August 2011.

    Comments: Published at http://dx.doi.org/10.1214/10-STS346 in Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org)

    Report number: IMS-STS-STS346

    Journal ref: Statistical Science 2011, Vol. 26, No. 2, 240-256

  28. arXiv:1106.4355  [pdf, other]

    stat.ML cs.LG

    Tight Measurement Bounds for Exact Recovery of Structured Sparse Signals

    Authors: Nikhil Rao, Benjamin Recht, Robert Nowak

    Abstract: Standard compressive sensing results state that to exactly recover an s-sparse signal in R^p, one requires O(s log(p)) measurements. While this bound is extremely useful in practice, often real world signals are not only sparse, but also exhibit structure in the sparsity pattern. We focus on group-structured patterns in this paper. Under this model, groups of signal coefficients are active (or in…

    Submitted 17 October, 2011; v1 submitted 21 June, 2011; originally announced June 2011.

    Comments: Refined previous bound and added new experiments

  29. arXiv:1104.4385  [pdf, other]

    cs.CV stat.ML

    Convex Approaches to Model Wavelet Sparsity Patterns

    Authors: Nikhil S Rao, Robert D. Nowak, Stephen J. Wright, Nick G. Kingsbury

    Abstract: Statistical dependencies among wavelet coefficients are commonly represented by graphical models such as hidden Markov trees (HMTs). However, in linear inverse problems such as deconvolution, tomography, and compressed sensing, the presence of a sensing or observation matrix produces a linear mixing of the simple Markovian dependency structure. This leads to reconstruction problems that are non-con…

    Submitted 22 April, 2011; originally announced April 2011.