Showing 1–50 of 53 results for author: Basu, S

  1. arXiv:2401.11128 [pdf, other]

    stat.ME stat.CO

    Regularized Estimation of Sparse Spectral Precision Matrices

    Authors: Navonil Deb, Amy Kuceyeski, Sumanta Basu

    Abstract: Spectral precision matrix, the inverse of a spectral density matrix, is an object of central interest in frequency-domain analysis of multivariate time series. Estimation of spectral precision matrix is a key step in calculating partial coherency and graphical model selection of stationary time series. When the dimension of a multivariate time series is moderate to large, traditional estimators of…

    Submitted 30 April, 2024; v1 submitted 20 January, 2024; originally announced January 2024.

    Comments: 55 pages, 8 figures

    MSC Class: 62H12; 62J07; 62M10; 62M15 ACM Class: G.3; I.5.2

  2. arXiv:2312.16241

    stat.ME stat.AP

    Analysis of Pleiotropy for Testosterone and Lipid Profiles in Males and Females

    Authors: Srijan Chattopadhyay, Swapnaneel Bhattacharyya, Sevantee Basu

    Abstract: In modern scientific studies, it is often imperative to determine whether a set of phenotypes is affected by a single factor. If such an influence is identified, it becomes essential to discern whether this effect is contingent upon categories such as sex or age group, and importantly, to understand whether this dependence is rooted in purely non-environmental reasons. The exploration of such depe…

    Submitted 21 March, 2024; v1 submitted 25 December, 2023; originally announced December 2023.

    Comments: The authors have withdrawn this manuscript owing to the work having been performed in the lab of Anasuya Chakrabarty, but the manuscript being submitted without her knowledge or consent. Therefore, the authors do not wish this work to be cited as reference for the project. If you have any questions, please contact the corresponding author

  3. arXiv:2312.10926 [pdf, other]

    stat.ME

    A Random Effects Model-based Method of Moments Estimation of Causal Effect in Mendelian Randomization Studies

    Authors: Wenhao Cao, Saonli Basu

    Abstract: Recent advances in genotyping technology have delivered a wealth of genetic data, which is rapidly advancing our understanding of the underlying genetic architecture of complex diseases. Mendelian Randomization (MR) leverages such genetic data to estimate the causal effect of an exposure factor on an outcome from observational studies. In this paper, we utilize genetic correlations to summarize in…

    Submitted 17 December, 2023; originally announced December 2023.

    Comments: 24 pages, 5 figures

  4. arXiv:2311.15384 [pdf, other]

    stat.ML cs.LG stat.ME

    Robust and Automatic Data Clustering: Dirichlet Process meets Median-of-Means

    Authors: Supratik Basu, Jyotishka Ray Choudhury, Debolina Paul, Swagatam Das

    Abstract: Clustering stands as one of the most prominent challenges within the realm of unsupervised machine learning. Among the array of centroid-based clustering algorithms, the classic $k$-means algorithm, rooted in Lloyd's heuristic, takes center stage as one of the extensively employed techniques in the literature. Nonetheless, both $k$-means and its variants grapple with noteworthy limitations. These…

    Submitted 26 November, 2023; originally announced November 2023.

  5. arXiv:2308.09166 [pdf, other]

    stat.ME math.DS

    Sparse reconstruction of ordinary differential equations with inference

    Authors: Sara Venkatraman, Sumanta Basu, Martin T. Wells

    Abstract: Sparse regression has emerged as a popular technique for learning dynamical systems from temporal data, beginning with the SINDy (Sparse Identification of Nonlinear Dynamics) framework proposed by arXiv:1509.03580. Quantifying the uncertainty inherent in differential equations learned from data remains an open problem, thus we propose leveraging recent advances in statistical inference for sparse…

    Submitted 17 August, 2023; originally announced August 2023.

  6. arXiv:2305.15343 [pdf, other]

    stat.AP

    Modeling Multiple Irregularly Spaced Financial Time Series

    Authors: Chiranjit Dutta, Nalini Ravishanker, Sumanta Basu

    Abstract: In this paper we propose univariate volatility models for irregularly spaced financial time series by modifying the regularly spaced stochastic volatility models. We also extend this approach to propose multivariate stochastic volatility (MSV) models for multiple irregularly spaced time series by modifying the MSV model that was used with daily data. We use these proposed models for modeling intra…

    Submitted 24 May, 2023; originally announced May 2023.

  7. arXiv:2305.14639 [pdf, other]

    stat.ME stat.AP

    Restricted Mean Survival Time Estimation Using Bayesian Nonparametric Dependent Mixture Models

    Authors: Ruizhe Chen, Sanjib Basu, Qian Shi

    Abstract: Restricted mean survival time (RMST) is an intuitive summary statistic for time-to-event random variables, and can be used for measuring treatment effects. Compared to hazard ratio, its estimation procedure is robust against the non-proportional hazards assumption. We propose nonparametric Bayesian (BNP) estimators for RMST using a dependent stick-breaking process prior mixture model that adjusts…

    Submitted 23 May, 2023; originally announced May 2023.

  8. arXiv:2212.06353 [pdf, ps, other]

    stat.ME math.NA

    Bayesian Arc Length Survival Analysis Model (BALSAM): Theory and Application to an HIV/AIDS Clinical Trial

    Authors: Yan Gao, Rodney A. Sparapani, Sanjib Basu

    Abstract: Stochastic volatility often implies increasing risks that are difficult to capture given the dynamic nature of real-world applications. We propose using arc length, a mathematical concept, to quantify cumulative variations (the total variability over time) to more fully characterize stochastic volatility. The hazard rate, as defined by the Cox proportional hazards model in survival analysis, is as…

    Submitted 12 December, 2022; originally announced December 2022.

  9. arXiv:2206.05374 [pdf, other]

    stat.ME q-fin.CP q-fin.ST

    Modeling Multivariate Positive-Valued Time Series Using R-INLA

    Authors: Chiranjit Dutta, Nalini Ravishanker, Sumanta Basu

    Abstract: In this paper we describe fast Bayesian statistical analysis of vector positive-valued time series, with application to interesting financial data streams. We discuss a flexible level correlated model (LCM) framework for building hierarchical models for vector positive-valued time series. The LCM allows us to combine marginal gamma distributions for the positive-valued component responses, while a…

    Submitted 2 July, 2022; v1 submitted 10 June, 2022; originally announced June 2022.

    Comments: 19 pages, 1 figure

  10. arXiv:2205.10662 [pdf, other]

    cs.LG cs.CV stat.ML

    Equivariant Mesh Attention Networks

    Authors: Sourya Basu, Jose Gallego-Posada, Francesco Viganò, James Rowbottom, Taco Cohen

    Abstract: Equivariance to symmetries has proven to be a powerful inductive bias in deep learning research. Recent works on mesh processing have concentrated on various kinds of natural symmetries, including translations, rotations, scaling, node permutations, and gauge transformations. To date, no existing architecture is equivariant to all of these transformations. In this paper, we present an attention-ba…

    Submitted 27 August, 2022; v1 submitted 21 May, 2022; originally announced May 2022.

    Comments: Published in Transactions on Machine Learning Research (08/2022). Official code made available at https://github.com/gallego-posada/eman - For the OpenReview entry, see https://openreview.net/forum?id=3IqqJh2Ycy

  11. arXiv:2204.12001 [pdf]

    stat.AP

    Measuring Discrepancies in Airbnb Guest Acceptance Rates Using Anonymized Demographic Data

    Authors: Siddhartha Basu, Ruthie Berman, Adam Bloomston, John Campbell, Anne Diaz, Nanako Era, Benjamin Evans, Sukhada Palkar, Skyler Wharton

    Abstract: In order to make technological systems and platforms more equitable, organizations must be able to measure the scale of potential inequities as well as the efficacy of proposed solutions. In this paper, we present a system that measures discrepancies in platform user experience that are attributable to perceived race (experience gaps) using anonymized data. This allows for progress to be made in t…

    Submitted 25 April, 2022; originally announced April 2022.

    Comments: 51 pages, 24 figures

    ACM Class: E.0; H.4; J.4

  12. An empirical Bayes approach to estimating dynamic models of co-regulated gene expression

    Authors: Sara Venkatraman, Sumanta Basu, Andrew G. Clark, Sofie Delbare, Myung Hee Lee, Martin T. Wells

    Abstract: Time-course gene expression datasets provide insight into the dynamics of complex biological processes, such as immune response and organ development. It is of interest to identify genes with similar temporal expression patterns because such genes are often biologically related. However, this task is challenging due to the high dimensionality of these datasets and the nonlinearity of gene expressi…

    Submitted 31 December, 2021; originally announced December 2021.

  13. arXiv:2112.05041 [pdf, other]

    stat.ME

    Bayesian Functional Data Analysis over Dependent Regions and Its Application for Identification of Differentially Methylated Regions

    Authors: Suvo Chatterjee, Shrabanti Chowdhury, Duchwan Ryu, Sanjib Basu

    Abstract: We consider a Bayesian functional data analysis for observations measured as extremely long sequences. Splitting the sequence into a number of small windows with manageable length, the windows may not be independent especially when they are neighboring to each other. We propose to utilize Bayesian smoothing splines to estimate individual functional patterns within each window and to establish tran…

    Submitted 9 December, 2021; originally announced December 2021.

  14. arXiv:2107.14754 [pdf, other]

    stat.ME

    A Survey of Estimation Methods for Sparse High-dimensional Time Series Models

    Authors: Sumanta Basu, David S. Matteson

    Abstract: High-dimensional time series datasets are becoming increasingly common in many areas of biological and social sciences. Some important applications include gene regulatory network reconstruction using time course gene expression data, brain connectivity analysis from neuroimaging data, structural analysis of a large panel of macroeconomic indicators, and studying linkages among financial firms for…

    Submitted 30 July, 2021; originally announced July 2021.

  15. arXiv:2103.07501 [pdf, other]

    cs.LG stat.ML

    Beyond $\log^2(T)$ Regret for Decentralized Bandits in Matching Markets

    Authors: Soumya Basu, Karthik Abinav Sankararaman, Abishek Sankararaman

    Abstract: We design decentralized algorithms for regret minimization in the two-sided matching market with one-sided bandit feedback that significantly improves upon the prior works (Liu et al. 2020a, 2020b, Sankararaman et al. 2020). First, for general markets, for any $\varepsilon > 0$, we design an algorithm that achieves a $O(\log^{1+\varepsilon}(T))$ regret to the agent-optimal stable matching, with un…

    Submitted 12 March, 2021; originally announced March 2021.

  16. arXiv:2102.08554 [pdf, other]

    stat.ML cs.LG

    Recoverability Landscape of Tree Structured Markov Random Fields under Symmetric Noise

    Authors: Ashish Katiyar, Soumya Basu, Vatsal Shah, Constantine Caramanis

    Abstract: We study the problem of learning tree-structured Markov random fields (MRF) on discrete random variables with common support when the observations are corrupted by a $k$-ary symmetric noise channel with unknown probability of error. For Ising models (support size = 2), past work has shown that graph structure can only be recovered up to the leaf clusters (a leaf node, its parent, and its siblings…

    Submitted 14 June, 2021; v1 submitted 16 February, 2021; originally announced February 2021.

  17. arXiv:2011.14066 [pdf, other]

    stat.ML cs.LG

    On Generalization of Adaptive Methods for Over-parameterized Linear Regression

    Authors: Vatsal Shah, Soumya Basu, Anastasios Kyrillidis, Sujay Sanghavi

    Abstract: Over-parameterization and adaptive methods have played a crucial role in the success of deep learning in the last decade. The widespread use of over-parameterization has forced us to rethink generalization by bringing forth new phenomena, such as implicit regularization of optimization algorithms and double descent with training progression. A series of recent works have started to shed light on t…

    Submitted 27 November, 2020; originally announced November 2020.

    Comments: arXiv admin note: substantial text overlap with arXiv:1811.07055

  18. arXiv:2010.12977 [pdf]

    stat.AP

    Effects of West Coast forest fire emissions on atmospheric environment: A coupled satellite and ground-based assessment

    Authors: Srikanta Sannigrahi, Qi Zhang, Francesco Pilla, Bidroha Basu, Arunima Sarkar Basu

    Abstract: Forest fires have a profound impact on the atmospheric environment and air quality across the ecosystems. The recent west coast forest fire in the United States of America (USA) has broken all the past records and caused severe environmental and public health burdens. As of middle September, nearly 6 million acres forest area were burned, and more than 25 casualties were reported so far. In this s…

    Submitted 24 October, 2020; originally announced October 2020.

  19. arXiv:2010.06090 [pdf, other]

    stat.ME stat.AP

    A Model-free Approach for Testing Association

    Authors: Saptarshi Chatterjee, Shrabanti Chowdhury, Sanjib Basu

    Abstract: The question of association between outcome and feature is generally framed in the context of a model on functional and distributional forms. Our motivating application is that of identifying serum biomarkers of angiogenesis, energy metabolism, apoptosis, and inflammation, predictive of recurrence after lung resection in node-negative non-small cell lung cancer patients with tumor stage T2a or les…

    Submitted 12 October, 2020; originally announced October 2020.

    Comments: 20 pages, 7 figures

  20. arXiv:2008.09983 [pdf, other]

    cs.LG cs.DB stat.ML

    Leveraging Organizational Resources to Adapt Models to New Data Modalities

    Authors: Sahaana Suri, Raghuveer Chanda, Neslihan Bulut, Pradyumna Narayana, Yemao Zeng, Peter Bailis, Sugato Basu, Girija Narlikar, Christopher Re, Abishek Sethi

    Abstract: As applications in large organizations evolve, the machine learning (ML) models that power them must adapt the same predictive tasks to newly arising data modalities (e.g., a new video content launch in a social media application requires existing text or image models to extend to video). To solve this problem, organizations typically create ML pipelines from scratch. However, this fails to utiliz…

    Submitted 23 August, 2020; originally announced August 2020.

    Journal ref: PVLDB,13(12): 3396-3410, 2020

  21. arXiv:2008.08993 [pdf]

    stat.AP

    Effect of COVID-19 on noise pollution change in Dublin, Ireland

    Authors: Bidroha Basu, Enda Murphy, Anna Molter, Arunima Sarkar Basu, Srikanta Sannigrahi, Miguel Belmonte, Francesco Pilla

    Abstract: Noise pollution is considered to be the third most hazardous pollution after air and water pollution by the World Health Organization (WHO). Short as well as long-term exposure to noise pollution has several adverse effects on humans, ranging from psychiatric disorders such as anxiety and depression, hypertension, hormonal dysfunction, and blood pressure rise leading to cardiovascular disease. One…

    Submitted 20 August, 2020; originally announced August 2020.

    Comments: 20 pages, 8 figures

  22. arXiv:2007.15421 [pdf, other]

    stat.ML cs.LG math.ST stat.ME

    Random Forests for dependent data

    Authors: Arkajyoti Saha, Sumanta Basu, Abhirup Datta

    Abstract: Random forest (RF) is one of the most popular methods for estimating regression functions. The local nature of the RF algorithm, based on intra-node means and variances, is ideal when errors are i.i.d. For dependent error processes like time series and spatial settings where data in all the nodes will be correlated, operating locally ignores this dependence. Also, RF will involve resampling of cor…

    Submitted 28 June, 2021; v1 submitted 30 July, 2020; originally announced July 2020.

  23. arXiv:2007.04511 [pdf, ps, other]

    stat.ME

    Causal Effects in Twin Studies: the Role of Interference

    Authors: Bonnie Smith, Elizabeth L. Ogburn, Matt McGue, Saonli Basu, Daniel O. Scharfstein

    Abstract: The use of twins designs to address causal questions is becoming increasingly popular. A standard assumption is that there is no interference between twins---that is, no twin's exposure has a causal impact on their co-twin's outcome. However, there may be settings in which this assumption would not hold, and this would (1) impact the causal interpretation of parameters obtained by commonly used ex…

    Submitted 8 July, 2020; originally announced July 2020.

  24. arXiv:2006.15166 [pdf, other]

    cs.LG cs.DS cs.GT stat.ML

    Dominate or Delete: Decentralized Competing Bandits in Serial Dictatorship

    Authors: Abishek Sankararaman, Soumya Basu, Karthik Abinav Sankararaman

    Abstract: Online learning in a two-sided matching market, with demand side agents continuously competing to be matched with supply side (arms), abstracts the complex interactions under partial information on matching platforms (e.g. UpWork, TaskRabbit). We study the decentralized serial dictatorship setting, a two-sided matching market where the demand side agents have unknown and heterogeneous valuation ov…

    Submitted 12 March, 2021; v1 submitted 26 June, 2020; originally announced June 2020.

    Comments: AISTATS, 2021

  25. arXiv:2006.14651 [pdf, other]

    cs.LG stat.ML

    Influence Functions in Deep Learning Are Fragile

    Authors: Samyadeep Basu, Philip Pope, Soheil Feizi

    Abstract: Influence functions approximate the effect of training samples in test-time predictions and have a wide variety of applications in machine learning interpretability and uncertainty estimation. A commonly-used (first-order) influence function can be implemented efficiently as a post-hoc method requiring access only to the gradients and Hessian of the model. For linear models, influence functions ar…

    Submitted 10 February, 2021; v1 submitted 25 June, 2020; originally announced June 2020.

    Comments: ICLR 2021

  26. arXiv:2003.03426 [pdf, other]

    cs.LG stat.ML

    Contextual Blocking Bandits

    Authors: Soumya Basu, Orestis Papadigenopoulos, Constantine Caramanis, Sanjay Shakkottai

    Abstract: We study a novel variant of the multi-armed bandit problem, where at each time step, the player observes an independently sampled context that determines the arms' mean rewards. However, playing an arm blocks it (across all contexts) for a fixed and known number of future time steps. The above contextual setting, which captures important scenarios such as recommendation systems or ad placement wit…

    Submitted 17 June, 2020; v1 submitted 6 March, 2020; originally announced March 2020.

  27. arXiv:2002.08405 [pdf, other]

    cs.LG stat.ML

    On Under-exploration in Bandits with Mean Bounds from Confounded Data

    Authors: Nihal Sharma, Soumya Basu, Karthikeyan Shanmugam, Sanjay Shakkottai

    Abstract: We study a variant of the multi-armed bandit problem where side information in the form of bounds on the mean of each arm is provided. We develop the novel non-optimistic Global Under-Explore (GLUE) algorithm which uses the provided mean bounds (across all the arms) to infer pseudo-variances for each arm, which in turn decide the rate of exploration for the arms. We analyze the regret of GLUE and…

    Submitted 10 June, 2021; v1 submitted 19 February, 2020; originally announced February 2020.

  28. arXiv:1911.07921 [pdf, other]

    cs.LG cs.CR stat.ML

    Privacy Leakage Avoidance with Switching Ensembles

    Authors: Rauf Izmailov, Peter Lin, Chris Mesterharm, Samyadeep Basu

    Abstract: We consider membership inference attacks, one of the main privacy issues in machine learning. These recently developed attacks have been proven successful in determining, with confidence better than a random guess, whether a given sample belongs to the dataset on which the attacked machine learning model was trained. Several approaches have been developed to mitigate this privacy leakage but the t…

    Submitted 18 November, 2019; originally announced November 2019.

  29. arXiv:1911.00418 [pdf, other]

    cs.LG stat.ML

    On Second-Order Group Influence Functions for Black-Box Predictions

    Authors: Samyadeep Basu, Xuchen You, Soheil Feizi

    Abstract: With the rapid adoption of machine learning systems in sensitive applications, there is an increasing need to make black-box models explainable. Often we want to identify an influential group of training samples in a particular test prediction for a given machine learning model. Existing influence functions tackle this problem by using first-order approximations of the effect of removing a sample…

    Submitted 6 July, 2020; v1 submitted 1 November, 2019; originally announced November 2019.

    Comments: To Appear in ICML 2020

  30. arXiv:1910.04257 [pdf, other]

    cs.LG stat.ML

    Membership Model Inversion Attacks for Deep Networks

    Authors: Samyadeep Basu, Rauf Izmailov, Chris Mesterharm

    Abstract: With the increasing adoption of AI, inherent security and privacy vulnerabilities for machine learning systems are being discovered. One such vulnerability makes it possible for an adversary to obtain private information about the types of instances used to train the targeted machine learning model. This so-called model inversion attack is based on sequential leveraging of classification scores toward…

    Submitted 9 October, 2019; originally announced October 2019.

    Comments: NeurIPS 2019, Workshop on Privacy in Machine Learning

  31. arXiv:1910.03225 [pdf, other]

    cs.LG stat.ML

    NGBoost: Natural Gradient Boosting for Probabilistic Prediction

    Authors: Tony Duan, Anand Avati, Daisy Yi Ding, Khanh K. Thai, Sanjay Basu, Andrew Y. Ng, Alejandro Schuler

    Abstract: We present Natural Gradient Boosting (NGBoost), an algorithm for generic probabilistic prediction via gradient boosting. Typical regression models return a point estimate, conditional on covariates, but probabilistic regression models output a full probability distribution over the outcome space, conditional on the covariates. This allows for predictive uncertainty estimation -- crucial in applica…

    Submitted 9 June, 2020; v1 submitted 8 October, 2019; originally announced October 2019.

    Comments: Accepted for ICML 2020

  32. arXiv:1907.11975 [pdf, other]

    cs.LG stat.ML

    Blocking Bandits

    Authors: Soumya Basu, Rajat Sen, Sujay Sanghavi, Sanjay Shakkottai

    Abstract: We consider a novel stochastic multi-armed bandit setting, where playing an arm makes it unavailable for a fixed number of time slots thereafter. This models situations where reusing an arm too often is undesirable (e.g. making the same product recommendation repeatedly) or infeasible (e.g. compute job scheduling on machines). We show that with prior knowledge of the rewards and delays of all the…

    Submitted 27 July, 2019; originally announced July 2019.

  33. arXiv:1906.10845 [pdf, other]

    stat.ML cs.LG

    A Debiased MDI Feature Importance Measure for Random Forests

    Authors: Xiao Li, Yu Wang, Sumanta Basu, Karl Kumbier, Bin Yu

    Abstract: Tree ensembles such as Random Forests have achieved impressive empirical success across a wide variety of applications. To understand how these models make predictions, people routinely turn to feature importance measures calculated from tree ensembles. It has long been known that Mean Decrease Impurity (MDI), one of the most widely used measures of feature importance, incorrectly assigns high imp…

    Submitted 26 October, 2019; v1 submitted 26 June, 2019; originally announced June 2019.

    Comments: NeurIPS'19. The first two authors contributed equally to this paper

  34. arXiv:1906.06057 [pdf, ps, other]

    cs.SI cs.LG stat.ML

    Learning Mixtures of Graphs from Epidemic Cascades

    Authors: Jessica Hoffmann, Soumya Basu, Surbhi Goel, Constantine Caramanis

    Abstract: We consider the problem of learning the weighted edges of a balanced mixture of two undirected graphs from epidemic cascades. While mixture models are popular modeling tools, algorithmic development with rigorous guarantees has lagged. Graph mixtures are apparently no exception: until now, very little is known about whether this problem is solvable. To the best of our knowledge, we establish the…

    Submitted 29 January, 2020; v1 submitted 14 June, 2019; originally announced June 2019.

    Comments: 29 pages

  35. arXiv:1904.10689 [pdf, other]

    cs.LG stat.ML

    Layer Dynamics of Linearised Neural Nets

    Authors: Saurav Basu, Koyel Mukherjee, Shrihari Vasudevan

    Abstract: Despite the phenomenal success of deep learning in recent years, there remains a gap in understanding the fundamental mechanics of neural nets. More research is focussed on handcrafting complex and larger networks, and the design decisions are often ad-hoc and based on intuition. Some recent research has aimed to demystify the learning dynamics in neural nets by attempting to build a theory from f…

    Submitted 24 April, 2019; originally announced April 2019.

  36. arXiv:1901.10061 [pdf, other]

    cs.LG stat.ML

    A Framework for Deep Constrained Clustering -- Algorithms and Advances

    Authors: Hongjing Zhang, Sugato Basu, Ian Davidson

    Abstract: The area of constrained clustering has been extensively explored by researchers and used by practitioners. Constrained clustering formulations exist for popular algorithms such as k-means, mixture models, and spectral clustering but have several limitations. A fundamental strength of deep learning is its flexibility, and here we explore a deep learning framework for constrained clustering and in p…

    Submitted 19 December, 2019; v1 submitted 28 January, 2019; originally announced January 2019.

    Comments: Updated for ECML/PKDD 2019

  37. Low Rank and Structured Modeling of High-dimensional Vector Autoregressions

    Authors: Sumanta Basu, Xianqi Li, George Michailidis

    Abstract: Network modeling of high-dimensional time series data is a key learning task due to its widespread use in a number of application areas, including macroeconomics, finance and neuroscience. While the problem of sparse modeling based on vector autoregressive models (VAR) has been investigated in depth in the literature, more complex network structures that involve low rank and group sparse component…

    Submitted 9 December, 2018; originally announced December 2018.

  38. arXiv:1812.00532 [pdf, other]

    stat.ME math.ST stat.ML

    Large Spectral Density Matrix Estimation by Thresholding

    Authors: Yiming Sun, Yige Li, Amy Kuceyeski, Sumanta Basu

    Abstract: Spectral density matrix estimation of multivariate time series is a classical problem in time series and signal processing. In modern neuroscience, spectral density based metrics are commonly used for analyzing functional connectivity among brain regions. In this paper, we develop a non-asymptotic theory for regularized estimation of high-dimensional spectral density matrices of Gaussian and linea…

    Submitted 2 December, 2018; originally announced December 2018.

  39. arXiv:1810.08223 [pdf, other]

    cs.LG cs.IR stat.ML

    Micro-Browsing Models for Search Snippets

    Authors: Muhammad Asiful Islam, Ramakrishnan Srikant, Sugato Basu

    Abstract: Click-through rate (CTR) is a key signal of relevance for search engine results, both organic and sponsored. CTR of a result has two core components: (a) the probability of examination of a result by a user, and (b) the perceived relevance of the result given that it has been examined by the user. There has been considerable work on user browsing models, to model and analyze both the examination a…

    Submitted 18 October, 2018; originally announced October 2018.

  40. arXiv:1810.07287 [pdf, other]

    stat.ML cs.LG

    Signed iterative random forests to identify enhancer-associated transcription factor binding

    Authors: Karl Kumbier, Sumanta Basu, Erwin Frise, Susan E. Celniker, James B. Brown, Susan Celniker, Bin Yu

    Abstract: Standard ChIP-seq peak calling pipelines seek to differentiate biochemically reproducible signals of individual genomic elements from background noise. However, reproducibility alone does not imply functional regulation (e.g., enhancer activation, alternative splicing). Here we present a general-purpose, interpretable machine learning method: signed iterative random forests (siRF), which we use to…

    Submitted 12 July, 2023; v1 submitted 16 October, 2018; originally announced October 2018.

  41. arXiv:1808.09521 [pdf, other]

    stat.ME

    Bounds on the conditional and average treatment effect with unobserved confounding factors

    Authors: Steve Yadlowsky, Hongseok Namkoong, Sanjay Basu, John Duchi, Lu Tian

    Abstract: For observational studies, we study the sensitivity of causal inference when treatment assignments may depend on unobserved confounders. We develop a loss minimization approach for estimating bounds on the conditional average treatment effect (CATE) when unobserved confounders have a bounded effect on the odds ratio of treatment selection. Our approach is scalable and allows flexible use of model…

    Submitted 9 March, 2022; v1 submitted 28 August, 2018; originally announced August 2018.

  42. arXiv:1806.08819 [pdf, other]

    stat.AP physics.soc-ph stat.ML

    Forecasting Internally Displaced Population Migration Patterns in Syria and Yemen

    Authors: Benjamin Q. Huynh, Sanjay Basu

    Abstract: Armed conflict has led to an unprecedented number of internally displaced persons (IDPs) - individuals who are forced out of their homes but remain within their country. IDPs often urgently require shelter, food, and healthcare, yet prediction of when large fluxes of IDPs will cross into an area remains a major challenge for aid delivery organizations. Accurate forecasting of IDP migration would e…

    Submitted 22 June, 2018; originally announced June 2018.

  43. High-Dimensional Estimation, Basis Assets, and the Adaptive Multi-Factor Model

    Authors: Liao Zhu, Sumanta Basu, Robert A. Jarrow, Martin T. Wells

    Abstract: The paper proposes a new algorithm for the high-dimensional financial data -- the Groupwise Interpretable Basis Selection (GIBS) algorithm, to estimate a new Adaptive Multi-Factor (AMF) asset pricing model, implied by the recently developed Generalized Arbitrage Pricing Theory, which relaxes the convention that the number of risk-factors is small. We first obtain an adaptive collection of basis as…

    Submitted 10 December, 2021; v1 submitted 23 April, 2018; originally announced April 2018.

    Journal ref: The Quarterly Journal of Finance. Vol. 10, No. 04, 2050017 (2020)

  44. Simultaneous Selection of Multiple Important Single Nucleotide Polymorphisms in Familial Genome Wide Association Studies Data

    Authors: Subhabrata Majumdar, Saonli Basu, Matt McGue, Snigdhansu Chatterjee

    Abstract: We propose a resampling-based fast variable selection technique for detecting relevant single nucleotide polymorphisms (SNP) in a multi-marker mixed effect model. Due to computational complexity, current practice primarily involves testing the effect of one SNP at a time, commonly termed as `single SNP association analysis'. Joint modeling of genetic variants within a gene or pathway may have bett…

    Submitted 20 May, 2023; v1 submitted 4 February, 2018; originally announced February 2018.

    Comments: Published in Scientific Reports

  45. arXiv:1711.03623  [pdf, other]

    stat.ML stat.AP

    Interpretable Vector AutoRegressions with Exogenous Time Series

    Authors: Ines Wilms, Sumanta Basu, Jacob Bien, David S. Matteson

    Abstract: The Vector AutoRegressive (VAR) model is fundamental to the study of multivariate time series. Although VAR models are intensively investigated by many researchers, practitioners often show more interest in analyzing VARX models that incorporate the impact of unmodeled exogenous variables (X) into the VAR. However, since the parameter space grows quadratically with the number of time series, estim…

    Submitted 9 November, 2017; originally announced November 2017.

    Comments: Presented at NIPS 2017 Symposium on Interpretable Machine Learning

  46. arXiv:1710.09326  [pdf, other]

    stat.ME

    A Robust and Unified Framework for Estimating Heritability in Twin Studies using Generalized Estimating Equations

    Authors: Jaron Arbet, Matt McGue, Saonli Basu

    Abstract: The development of a complex disease is an intricate interplay of genetic and environmental factors. "Heritability" is defined as the proportion of total trait variance due to genetic factors within a given population. Studies with monozygotic (MZ) and dizygotic (DZ) twins allow us to estimate heritability by fitting an "ACE" model which estimates the proportion of trait variance explained by addi…

    Submitted 17 October, 2018; v1 submitted 25 October, 2017; originally announced October 2017.

  47. arXiv:1707.09208  [pdf, other]

    stat.ME

    Sparse Identification and Estimation of Large-Scale Vector AutoRegressive Moving Averages

    Authors: Ines Wilms, Sumanta Basu, Jacob Bien, David S. Matteson

    Abstract: The Vector AutoRegressive Moving Average (VARMA) model is fundamental to the theory of multivariate time series; however, identifiability issues have led practitioners to abandon it in favor of the simpler but more restrictive Vector AutoRegressive (VAR) model. We narrow this gap with a new optimization-based approach to VARMA identification built upon the principle of parsimony. Among all equival…

    Submitted 8 June, 2021; v1 submitted 28 July, 2017; originally announced July 2017.

  48. arXiv:1706.08457  [pdf, other]

    stat.ML q-bio.GN

    Iterative Random Forests to detect predictive and stable high-order interactions

    Authors: Sumanta Basu, Karl Kumbier, James B. Brown, Bin Yu

    Abstract: Genomics has revolutionized biology, enabling the interrogation of whole transcriptomes, genome-wide binding sites for proteins, and many other molecular processes. However, individual genomic assays measure elements that interact in vivo as components of larger molecular machines. Understanding how these high-order interactions drive gene expression presents a substantial statistical challenge. B…

    Submitted 23 December, 2017; v1 submitted 26 June, 2017; originally announced June 2017.

  49. arXiv:1605.02699  [pdf, other]

    cs.CV cs.LG stat.ML

    A Theoretical Analysis of Deep Neural Networks for Texture Classification

    Authors: Saikat Basu, Manohar Karki, Robert DiBiano, Supratik Mukhopadhyay, Sangram Ganguly, Ramakrishna Nemani, Shreekant Gayaka

    Abstract: We investigate the use of Deep Neural Networks for the classification of image datasets where texture features are important for generating class-conditional discriminative representations. To this end, we first derive the size of the feature space for some standard textural features extracted from the input dataset and then use the theory of Vapnik-Chervonenkis dimension to show that hand-crafted…

    Submitted 21 June, 2016; v1 submitted 9 May, 2016; originally announced May 2016.

    Comments: Accepted in International Joint Conference on Neural Networks, IJCNN 2016

  50. arXiv:1601.00736  [pdf, other]

    stat.ME

    Penalized Maximum Likelihood Estimation of Multi-layered Gaussian Graphical Models

    Authors: Jiahe Lin, Sumanta Basu, Moulinath Banerjee, George Michailidis

    Abstract: Analyzing multi-layered graphical models provides insight into understanding the conditional relationships among nodes within layers after adjusting for and quantifying the effects of nodes from other layers. We obtain the penalized maximum likelihood estimator for Gaussian multi-layered graphical models, based on a computational approach involving screening of variables, iterative estimation of t…

    Submitted 5 January, 2016; originally announced January 2016.