Showing 1–50 of 125 results for author: Lin, L

Searching in archive stat. Search in all archives.
  1. arXiv:2406.15931  [pdf, other]

    eess.SY cs.CE cs.LG stat.AP

    Multistep Criticality Search and Power Shaping in Microreactors with Reinforcement Learning

    Authors: Majdi I. Radaideh, Leo Tunkle, Dean Price, Kamal Abdulraheem, Linyu Lin, Moutaz Elias

    Abstract: Reducing operation and maintenance costs is a key objective for advanced reactors in general and microreactors in particular. To achieve this reduction, developing robust autonomous control algorithms is essential to ensure safe and autonomous reactor operation. Recently, artificial intelligence and machine learning algorithms, specifically reinforcement learning (RL) algorithms, have seen rapid i… ▽ More

    Submitted 22 June, 2024; originally announced June 2024.

    Comments: 15 pages, 3 figures, and 2 tables

  2. arXiv:2406.08466  [pdf, other]

    cs.LG cs.AI math.ST stat.ML

    Scaling Laws in Linear Regression: Compute, Parameters, and Data

    Authors: Licong Lin, Jingfeng Wu, Sham M. Kakade, Peter L. Bartlett, Jason D. Lee

    Abstract: Empirically, large-scale deep learning models often satisfy a neural scaling law: the test error of the trained model improves polynomially as the model size and data size grow. However, conventional wisdom suggests the test error consists of approximation, bias, and variance errors, where the variance error increases with model size. This disagrees with the general form of neural scaling laws, wh… ▽ More

    Submitted 12 June, 2024; originally announced June 2024.

  3. arXiv:2405.15768  [pdf, other]

    stat.ML cs.AI cs.LG

    Canonical Variates in Wasserstein Metric Space

    Authors: Jia Li, Lin Lin

    Abstract: In this paper, we address the classification of instances each characterized not by a singular point, but by a distribution on a vector space. We employ the Wasserstein metric to measure distances between distributions, which are then used by distance-based classification algorithms such as k-nearest neighbors, k-means, and pseudo-mixture modeling. Central to our investigation is dimension reducti… ▽ More

    Submitted 24 May, 2024; originally announced May 2024.

    Comments: double space 37 pages, 6 figures

  4. arXiv:2404.05868  [pdf, other]

    cs.LG cs.AI cs.CL stat.ML

    Negative Preference Optimization: From Catastrophic Collapse to Effective Unlearning

    Authors: Ruiqi Zhang, Licong Lin, Yu Bai, Song Mei

    Abstract: Large Language Models (LLMs) often memorize sensitive, private, or copyrighted data during pre-training. LLM unlearning aims to eliminate the influence of undesirable data from the pre-trained model while preserving the model's utilities on other tasks. Several practical methods have recently been proposed for LLM unlearning, mostly based on gradient ascent (GA) on the loss of undesirable data. Ho… ▽ More

    Submitted 8 April, 2024; originally announced April 2024.

  5. arXiv:2403.17481  [pdf, ps, other]

    stat.ME

    A Type of Nonlinear Fréchet Regressions

    Authors: Lu Lin, Ze Chen

    Abstract: The existing Fréchet regression is actually defined within a linear framework, since the weight function in the Fréchet objective function is linearly defined, and the resulting Fréchet regression function is identified to be a linear model when the random object belongs to a Hilbert space. Even for nonparametric and semiparametric Fréchet regressions, which are usually nonlinear, the existing met… ▽ More

    Submitted 26 March, 2024; v1 submitted 26 March, 2024; originally announced March 2024.

  6. arXiv:2312.16607  [pdf, other]

    eess.IV cs.CV stat.ML

    A Polarization and Radiomics Feature Fusion Network for the Classification of Hepatocellular Carcinoma and Intrahepatic Cholangiocarcinoma

    Authors: Jia Dong, Yao Yao, Liyan Lin, Yang Dong, Jiachen Wan, Ran Peng, Chao Li, Hui Ma

    Abstract: Classifying hepatocellular carcinoma (HCC) and intrahepatic cholangiocarcinoma (ICC) is a critical step in treatment selection and prognosis evaluation for patients with liver diseases. Traditional histopathological diagnosis poses challenges in this context. In this study, we introduce a novel polarization and radiomics feature fusion network, which combines polarization features obtained from Mu… ▽ More

    Submitted 27 December, 2023; originally announced December 2023.

  7. arXiv:2312.09758  [pdf, other]

    cs.LG cs.AI stat.ME

    Diagnosing and Rectifying Fake OOD Invariance: A Restructured Causal Approach

    Authors: Ziliang Chen, Yongsen Zheng, Zhao-Rong Lai, Quanlong Guan, Liang Lin

    Abstract: Invariant representation learning (IRL) encourages the prediction from invariant causal features to labels de-confounded from the environments, advancing the technical roadmap of out-of-distribution (OOD) generalization. Despite spotlights around, recent theoretical results verified that some causal features recovered by IRLs merely pretend domain-invariantly in the training environments but fail… ▽ More

    Submitted 15 December, 2023; originally announced December 2023.

    Comments: AAAI-2024

  8. arXiv:2311.08442  [pdf, other]

    math.ST cs.LG stat.ML

    Mean-field variational inference with the TAP free energy: Geometric and statistical properties in linear models

    Authors: Michael Celentano, Zhou Fan, Licong Lin, Song Mei

    Abstract: We study mean-field variational inference in a Bayesian linear model when the sample size n is comparable to the dimension p. In high dimensions, the common approach of minimizing a Kullback-Leibler divergence from the posterior distribution, or maximizing an evidence lower bound, may deviate from the true posterior mean and underestimate posterior uncertainty. We study instead minimization of the… ▽ More

    Submitted 14 November, 2023; originally announced November 2023.

    Comments: 79 pages, 5 figures

  9. arXiv:2310.10121  [pdf, other]

    cs.LG cs.AI stat.ML

    From Continuous Dynamics to Graph Neural Networks: Neural Diffusion and Beyond

    Authors: Andi Han, Dai Shi, Lequan Lin, Junbin Gao

    Abstract: Graph neural networks (GNNs) have demonstrated significant promise in modelling relational data and have been widely applied in various fields of interest. The key mechanism behind GNNs is the so-called message passing where information is being iteratively aggregated to central nodes from their neighbourhood. Such a scheme has been found to be intrinsically linked to a physical process known as h… ▽ More

    Submitted 29 October, 2023; v1 submitted 16 October, 2023; originally announced October 2023.

  10. arXiv:2310.08566  [pdf, other]

    cs.LG cs.AI cs.CL math.ST stat.ML

    Transformers as Decision Makers: Provable In-Context Reinforcement Learning via Supervised Pretraining

    Authors: Licong Lin, Yu Bai, Song Mei

    Abstract: Large transformer models pretrained on offline reinforcement learning datasets have demonstrated remarkable in-context reinforcement learning (ICRL) capabilities, where they can make good decisions when prompted with interaction trajectories from unseen environments. However, when and how transformers can be trained to perform ICRL have not been theoretically well-understood. In particular, it is… ▽ More

    Submitted 26 May, 2024; v1 submitted 12 October, 2023; originally announced October 2023.

  11. arXiv:2307.09210  [pdf, other]

    stat.ME cs.SI stat.ML

    Nested stochastic block model for simultaneously clustering networks and nodes

    Authors: Nathaniel Josephs, Arash A. Amini, Marina Paez, Lizhen Lin

    Abstract: We introduce the nested stochastic block model (NSBM) to cluster a collection of networks while simultaneously detecting communities within each network. NSBM has several appealing features including the ability to work on unlabeled networks with potentially different node sets, the flexibility to model heterogeneous communities, and the means to automatically select the number of classes for the… ▽ More

    Submitted 18 July, 2023; originally announced July 2023.

  12. arXiv:2306.15380  [pdf, ps, other]

    stat.ME stat.OT

    Multivariate Rank-Based Analysis of Multiple Endpoints in Clinical Trials: A Global Test Approach

    Authors: Kexuan Li, Lingli Yang, Shaofei Zhao, Susie Sinks, Luan Lin, Peng Sun

    Abstract: Clinical trials often involve the assessment of multiple endpoints to comprehensively evaluate the efficacy and safety of interventions. In the work, we consider a global nonparametric testing procedure based on multivariate rank for the analysis of multiple endpoints in clinical trials. Unlike other existing approaches that rely on pairwise comparisons for each individual endpoint, the proposed m… ▽ More

    Submitted 27 June, 2023; v1 submitted 27 June, 2023; originally announced June 2023.

  13. arXiv:2305.18728  [pdf, other]

    cs.LG stat.ML

    Plug-in Performative Optimization

    Authors: Licong Lin, Tijana Zrnic

    Abstract: When predictions are performative, the choice of which predictor to deploy influences the distribution of future observations. The overarching goal in learning under performativity is to find a predictor that has low \emph{performative risk}, that is, good performance on its induced distribution. One family of solutions for optimizing the performative risk, including bandits and other derivative-f… ▽ More

    Submitted 28 May, 2024; v1 submitted 30 May, 2023; originally announced May 2023.

  14. arXiv:2305.18488  [pdf, other]

    stat.ML cs.LG stat.ME

    A Bayesian sparse factor model with adaptive posterior concentration

    Authors: Ilsang Ohn, Lizhen Lin, Yongdai Kim

    Abstract: In this paper, we propose a new Bayesian inference method for a high-dimensional sparse factor model that allows both the factor dimensionality and the sparse structure of the loading matrix to be inferred. The novelty is to introduce a certain dependence between the sparsity level and the factor dimensionality, which leads to adaptive posterior concentration while keeping computational tractabili… ▽ More

    Submitted 29 May, 2023; originally announced May 2023.

  15. arXiv:2304.11251  [pdf, other]

    stat.ML cs.LG

    Machine Learning and the Future of Bayesian Computation

    Authors: Steven Winter, Trevor Campbell, Lizhen Lin, Sanvesh Srivastava, David B. Dunson

    Abstract: Bayesian models are a powerful tool for studying complex data, allowing the analyst to encode rich hierarchical dependencies and leverage prior information. Most importantly, they facilitate a complete characterization of uncertainty through the posterior distribution. Practical posterior computation is commonly performed via MCMC, which can be computationally infeasible for high dimensional model… ▽ More

    Submitted 21 April, 2023; originally announced April 2023.

  16. arXiv:2303.02637  [pdf, other]

    stat.ML cs.LG stat.AP stat.ME

    A Semi-Bayesian Nonparametric Estimator of the Maximum Mean Discrepancy Measure: Applications in Goodness-of-Fit Testing and Generative Adversarial Networks

    Authors: Forough Fazeli-Asl, Michael Minyi Zhang, Lizhen Lin

    Abstract: A classic inferential statistical problem is the goodness-of-fit (GOF) test. Such a test can be challenging when the hypothesized parametric model has an intractable likelihood and its distributional form is not available. Bayesian methods for GOF can be appealing due to their ability to incorporate expert knowledge through prior distributions. However, standard Bayesian methods for this test of… ▽ More

    Submitted 10 November, 2023; v1 submitted 5 March, 2023; originally announced March 2023.

    Comments: Typos corrected, Secondary (simulation and theoretical) results added, Additional discussion added, references added

  17. arXiv:2303.02534  [pdf, other]

    math.ST cs.LG stat.ME stat.ML

    Semi-parametric inference based on adaptively collected data

    Authors: Licong Lin, Koulik Khamaru, Martin J. Wainwright

    Abstract: Many standard estimators, when applied to adaptively collected data, fail to be asymptotically normal, thereby complicating the construction of confidence intervals. We address this challenge in a semi-parametric context: estimating the parameter vector of a generalized linear regression model contaminated by a non-parametric nuisance component. We construct suitably weighted estimating equations… ▽ More

    Submitted 4 March, 2023; originally announced March 2023.

  18. arXiv:2302.12670  [pdf, ps, other]

    stat.ME cs.LG econ.EM stat.ML

    Personalized Pricing with Invalid Instrumental Variables: Identification, Estimation, and Policy Learning

    Authors: Rui Miao, Zhengling Qi, Cong Shi, Lin Lin

    Abstract: Pricing based on individual customer characteristics is widely used to maximize sellers' revenues. This work studies offline personalized pricing under endogeneity using an instrumental variable approach. Standard instrumental variable methods in causal inference/econometrics either focus on a discrete treatment space or require the exclusion restriction of instruments from having a direct effect… ▽ More

    Submitted 24 February, 2023; originally announced February 2023.

  19. arXiv:2302.11222  [pdf, other]

    stat.ME

    Source-Function Weighted-Transfer Learning for Nonparametric Regression with Seemingly Similar Sources

    Authors: Lu Lin, Weiyu Li

    Abstract: The homogeneity, or more generally, the similarity between source domains and a target domain seems to be essential to a positive transfer learning. In practice, however, the similarity condition is difficult to check and is often violated. In this paper, instead of the popularly used similarity condition, a seeming similarity is introduced, which is defined by a non-orthogonality together with a… ▽ More

    Submitted 22 February, 2023; originally announced February 2023.

  20. arXiv:2302.08606  [pdf, other]

    stat.ML cs.LG stat.ME

    Intrinsic and extrinsic deep learning on manifolds

    Authors: Yihao Fang, Ilsang Ohn, Vijay Gupta, Lizhen Lin

    Abstract: We propose extrinsic and intrinsic deep neural network architectures as general frameworks for deep learning on manifolds. Specifically, extrinsic deep neural networks (eDNNs) preserve geometric features on manifolds by utilizing an equivariant embedding from the manifold to its image in the Euclidean space. Moreover, intrinsic deep neural networks (iDNNs) incorporate the underlying intrinsic geom… ▽ More

    Submitted 16 February, 2023; originally announced February 2023.

  21. Probabilistic Model Incorporating Auxiliary Covariates to Control FDR

    Authors: Lin Qiu, Nils Murrugarra-Llerena, Vítor Silva, Lin Lin, Vernon M. Chinchilli

    Abstract: Controlling False Discovery Rate (FDR) while leveraging the side information of multiple hypothesis testing is an emerging research topic in modern data science. Existing methods rely on the test-level covariates while ignoring metrics about test-level covariates. This strategy may not be optimal for complex large-scale problems, where indirect relations often exist among test-level covariates and… ▽ More

    Submitted 6 October, 2022; originally announced October 2022.

    Comments: Short Version of NeurT-FDR, accepted at CIKM 2022. arXiv admin note: substantial text overlap with arXiv:2101.09809

  22. Renewable Composite Quantile Method and Algorithm for Nonparametric Models with Streaming Data

    Authors: Yan Chen, Shuixin Fang, Lu Lin

    Abstract: We are interested in renewable estimations and algorithms for nonparametric models with streaming data. In our method, the nonparametric function of interest is expressed through a functional depending on a weight function and a conditional distribution function (CDF). The CDF is estimated by renewable kernel estimations combined with function interpolations, based on which we propose the method o… ▽ More

    Submitted 8 October, 2022; v1 submitted 26 September, 2022; originally announced September 2022.

    Comments: 24 pages, 0 figures

  23. arXiv:2207.06561  [pdf, other]

    stat.ME

    Nonparametric Bayesian Approach to Treatment Ranking in Network Meta-Analysis with Application to Comparisons of Antidepressants

    Authors: Andrés F. Barrientos, Garritt L. Page, Lifeng Lin

    Abstract: Network meta-analysis is a powerful tool to synthesize evidence from independent studies and compare multiple treatments simultaneously. A critical task of performing a network meta-analysis is to offer ranks of all available treatment options for a specific disease outcome. Frequently, the estimated treatment rankings are accompanied by a large amount of uncertainty, suffer from multiplicity issu… ▽ More

    Submitted 13 July, 2022; originally announced July 2022.

  24. arXiv:2206.06086  [pdf, ps, other]

    stat.ML cs.LG stat.ME

    A Correlation-Ratio Transfer Learning and Variational Stein's Paradox

    Authors: Lu Lin, Weiyu Li

    Abstract: A basic condition for efficient transfer learning is the similarity between a target model and source models. In practice, however, the similarity condition is difficult to meet or is even violated. Instead of the similarity condition, a brand-new strategy, linear correlation-ratio, is introduced in this paper to build an accurate relationship between the models. Such a correlation-ratio can be ea… ▽ More

    Submitted 9 June, 2022; originally announced June 2022.

  25. arXiv:2206.02017  [pdf, ps, other]

    stat.ME

    Feature screening for multi-response linear models by empirical likelihood

    Authors: Jun Lu, Qinqin Hu, Lu Lin

    Abstract: This paper proposes a new feature screening method for the multi-response ultrahigh dimensional linear model by empirical likelihood. Through a multivariate moment condition, the empirical likelihood induced ranking statistics can exploit the joint effect among responses, and thus result in a much better performance than the methods considering responses individually. More importantly, by the use… ▽ More

    Submitted 4 June, 2022; originally announced June 2022.

  26. arXiv:2205.02719  [pdf, other]

    cs.LG cs.AI cs.DC math.OC stat.ML

    Communication-Efficient Adaptive Federated Learning

    Authors: Yujia Wang, Lu Lin, Jinghui Chen

    Abstract: Federated learning is a machine learning training paradigm that enables clients to jointly train models without sharing their own localized data. However, the implementation of federated learning in practice still faces numerous challenges, such as the large communication overhead due to the repetitive server-client synchronization and the lack of adaptivity by SGD-based model updates. Despite tha… ▽ More

    Submitted 19 April, 2023; v1 submitted 5 May, 2022; originally announced May 2022.

    Comments: Updated version. A previous version is accepted by ICML 2022 (37 pages, 7 figures, 3 tables)

  27. arXiv:2203.02090  [pdf, other]

    stat.ME cs.SI stat.CO stat.ML

    Bayesian community detection for networks with covariates

    Authors: Luyi Shen, Arash Amini, Nathaniel Josephs, Lizhen Lin

    Abstract: The increasing prevalence of network data in a vast variety of fields and the need to extract useful information out of them have spurred fast developments in related models and algorithms. Among the various learning tasks with network data, community detection, the discovery of node clusters or "communities," has arguably received the most attention in the scientific community. In many real-world… ▽ More

    Submitted 6 April, 2023; v1 submitted 3 March, 2022; originally announced March 2022.

  28. arXiv:2202.13503  [pdf, other]

    stat.ML cs.LG

    Variational Interpretable Learning from Multi-view Data

    Authors: Lin Qiu, Lynn Lin, Vernon M. Chinchilli

    Abstract: The main idea of canonical correlation analysis (CCA) is to map different views onto a common latent space with maximum correlation. We propose a deep interpretable variational canonical correlation analysis (DICCA) for multi-view learning. The developed model extends the existing latent variable model for linear CCA to nonlinear models through the use of deep generative networks. DICCA is designe… ▽ More

    Submitted 1 March, 2022; v1 submitted 27 February, 2022; originally announced February 2022.

    Comments: arXiv admin note: substantial text overlap with arXiv:2003.04292 by other authors. text overlap with arXiv:1802.06765 by other authors

  29. arXiv:2202.09353  [pdf, other]

    physics.comp-ph cs.LG physics.acc-ph stat.AP

    Model Calibration of the Liquid Mercury Spallation Target using Evolutionary Neural Networks and Sparse Polynomial Expansions

    Authors: Majdi I. Radaideh, Hoang Tran, Lianshan Lin, Hao Jiang, Drew Winder, Sarma Gorti, Guannan Zhang, Justin Mach, Sarah Cousineau

    Abstract: The mercury constitutive model predicting the strain and stress in the target vessel plays a central role in improving the lifetime prediction and future target designs of the mercury targets at the Spallation Neutron Source (SNS). We leverage the experiment strain data collected over multiple years to improve the mercury constitutive model through a combination of large-scale simulations of the t… ▽ More

    Submitted 18 February, 2022; originally announced February 2022.

    Comments: 26 pages, 10 figures, 6 tables

    Journal ref: Nucl. Instrum. Methods Phys. Res. B 525 (2022) 41-54

  30. arXiv:2202.03959  [pdf, other]

    physics.acc-ph stat.AP

    Bayesian Inverse Uncertainty Quantification of the Physical Model Parameters for the Spallation Neutron Source First Target Station

    Authors: Majdi I. Radaideh, Lianshan Lin, Hao Jiang, Sarah Cousineau

    Abstract: The reliability of the mercury spallation target is mission-critical for the neutron science program of the spallation neutron source at the Oak Ridge National Laboratory. We present an inverse uncertainty quantification (UQ) study using the Bayesian framework for the mercury equation of state model parameters, with the assistance of polynomial chaos expansion surrogate models. By leveraging high-… ▽ More

    Submitted 8 February, 2022; originally announced February 2022.

    Comments: 21 pages, 9 figures, 7 tables

    Journal ref: Results in Physics 36 (2022) 105414

  31. arXiv:2201.12597  [pdf, other]

    stat.ME math.ST

    Global Bias-Corrected Divide-and-Conquer by Quantile-Matched Composite for General Nonparametric Regressions

    Authors: Yan Chen, Lu Lin

    Abstract: The issues of bias-correction and robustness are crucial in the strategy of divide-and-conquer (DC), especially for asymmetric nonparametric models with massive data. It is known that quantile-based methods can achieve the robustness, but the quantile estimation for nonparametric regression has non-ignorable bias when the error distribution is asymmetric. This paper explores a global bias-correcte… ▽ More

    Submitted 29 January, 2022; originally announced January 2022.

    Comments: 44 pages, 2 figures

  32. arXiv:2112.09086  [pdf, ps, other]

    stat.ML cs.LG math.NA

    A new locally linear embedding scheme in light of Hessian eigenmap

    Authors: Liren Lin, Chih-Wei Chen

    Abstract: We provide a new interpretation of Hessian locally linear embedding (HLLE), revealing that it is essentially a variant way to implement the same idea of locally linear embedding (LLE). Based on the new interpretation, a substantial simplification can be made, in which the idea of "Hessian" is replaced by rather arbitrary weights. Moreover, we show by numerical examples that HLLE may produce projec… ▽ More

    Submitted 16 December, 2021; originally announced December 2021.

    Comments: 13 pages

    MSC Class: 62-07

  33. arXiv:2112.02580  [pdf, other]

    stat.ME math.ST

    Bayesian Optimal Two-sample Tests in High-dimension

    Authors: Kyoungjae Lee, Kisung You, Lizhen Lin

    Abstract: We propose optimal Bayesian two-sample tests for testing equality of high-dimensional mean vectors and covariance matrices between two populations. In many applications including genomics and medical imaging, it is natural to assume that only a few entries of two mean vectors or covariance matrices are different. Many existing tests that rely on aggregating the difference between empirical means o… ▽ More

    Submitted 5 December, 2021; originally announced December 2021.

  34. arXiv:2111.00705  [pdf, other]

    cs.LG cs.AI cs.DC math.OC stat.ML

    Communication-Compressed Adaptive Gradient Method for Distributed Nonconvex Optimization

    Authors: Yujia Wang, Lu Lin, Jinghui Chen

    Abstract: Due to the explosion in the size of the training datasets, distributed learning has received growing interest in recent years. One of the major bottlenecks is the large communication cost between the central server and the local workers. While error feedback compression has been proven to be successful in reducing communication costs with stochastic gradient descent (SGD), there are much fewer att… ▽ More

    Submitted 23 February, 2022; v1 submitted 1 November, 2021; originally announced November 2021.

    Comments: Accepted by AISTATS 2022 (29 pages, 11 figures, 2 tables)

  35. arXiv:2110.13240  [pdf, other]

    stat.ML cs.LG

    Adaptive Weighted Multi-View Clustering

    Authors: Shuo Shuo Liu, Lin Lin

    Abstract: Learning multi-view data is an emerging problem in machine learning research, and nonnegative matrix factorization (NMF) is a popular dimensionality-reduction method for integrating information from multiple views. These views often provide not only consensus but also complementary information. However, most multi-view NMF algorithms assign equal weight to each view or tune the weight via line sea… ▽ More

    Submitted 24 April, 2023; v1 submitted 25 October, 2021; originally announced October 2021.

  36. arXiv:2110.08849  [pdf, other]

    stat.AP

    A Bayesian Selection Model for Correcting Outcome Reporting Bias With Application to a Meta-analysis on Heart Failure Interventions

    Authors: Ray Bai, Xiaokang Liu, Lifeng Lin, Yulun Liu, Stephen E. Kimmel, Haitao Chu, Yong Chen

    Abstract: Multivariate meta-analysis (MMA) is a powerful tool for jointly estimating multiple outcomes' treatment effects. However, the validity of results from MMA is potentially compromised by outcome reporting bias (ORB), or the tendency for studies to selectively report outcomes. Until recently, ORB has been understudied. Since ORB can lead to biased conclusions, it is crucial to correct the estimates o… ▽ More

    Submitted 17 October, 2021; originally announced October 2021.

    Comments: 26 pages, 5 tables, 8 figures

  37. arXiv:2109.11795  [pdf, other]

    stat.ME math.ST

    Scalable Bayesian high-dimensional local dependence learning

    Authors: Kyoungjae Lee, Lizhen Lin

    Abstract: In this work, we propose a scalable Bayesian procedure for learning the local dependence structure in a high-dimensional model where the variables possess a natural ordering. The ordering of variables can be indexed by time, the vicinities of spatial locations, and so on, with the natural assumption that variables far apart tend to have weak correlations. Applications of such models abound in a va… ▽ More

    Submitted 24 September, 2021; originally announced September 2021.

  38. arXiv:2109.03204  [pdf, other]

    math.ST stat.ML

    Adaptive variational Bayes: Optimality, computation and applications

    Authors: Ilsang Ohn, Lizhen Lin

    Abstract: In this paper, we explore adaptive inference based on variational Bayes. Although several studies have been conducted to analyze the contraction properties of variational posteriors, there is still a lack of a general and computationally tractable variational Bayes method that performs adaptive inference. To fill this gap, we propose a novel adaptive variational Bayes framework, which can operate… ▽ More

    Submitted 11 March, 2024; v1 submitted 7 September, 2021; originally announced September 2021.

    Journal ref: Ann. Statist. 52(1):335-363. (2024)

  39. arXiv:2108.12680  [pdf, ps, other]

    math.NA cs.LG stat.ML

    Avoiding unwanted results in locally linear embedding: A new understanding of regularization

    Authors: Liren Lin

    Abstract: We demonstrate that locally linear embedding (LLE) inherently admits some unwanted results when no regularization is used, even for cases in which regularization is not supposed to be needed in the original algorithm. The existence of one special type of result, which we call ``projection pattern'', is mathematically proved in the situation that an exact local linear relation is achieved in each n… ▽ More

    Submitted 28 August, 2021; originally announced August 2021.

    Comments: 11 pages

    MSC Class: 62-07

  40. arXiv:2108.04035  [pdf, other]

    cs.LG stat.ML

    Mixture of Linear Models Co-supervised by Deep Neural Networks

    Authors: Beomseok Seo, Lin Lin, Jia Li

    Abstract: Deep neural network (DNN) models have achieved phenomenal success for applications in many domains, ranging from academic research in science and engineering to industry and business. The modeling power of DNN is believed to have come from the complexity and over-parameterization of the model, which on the other hand has been criticized for the lack of interpretation. Although certainly not true f… ▽ More

    Submitted 4 August, 2021; originally announced August 2021.

    Comments: Submitted to Journal of Computational and Graphical Statistics on April 19, 2021

  41. arXiv:2106.01552  [pdf, other]

    astro-ph.IM astro-ph.HE stat.AP stat.ME

    Uncertainty Quantification of a Computer Model for Binary Black Hole Formation

    Authors: Luyao Lin, Derek Bingham, Floor Broekgaarden, Ilya Mandel

    Abstract: In this paper, a fast and parallelizable method based on Gaussian Processes (GPs) is introduced to emulate computer models that simulate the formation of binary black holes (BBHs) through the evolution of pairs of massive stars. Two obstacles that arise in this application are the a priori unknown conditions of BBH formation and the large scale of the simulation data. We address them by proposing… ▽ More

    Submitted 2 June, 2021; originally announced June 2021.

    Comments: 24 pages, 11 figures

  42. arXiv:2105.04046  [pdf, other]

    stat.ML cs.LG

    A likelihood approach to nonparametric estimation of a singular distribution using deep generative models

    Authors: Minwoo Chae, Dongha Kim, Yongdai Kim, Lizhen Lin

    Abstract: We investigate statistical properties of a likelihood approach to nonparametric estimation of a singular distribution using deep generative models. More specifically, a deep generative model is used to model high-dimensional data that are assumed to concentrate around some low-dimensional structure. Estimating the distribution supported on this low-dimensional structure, such as a low-dimensional… ▽ More

    Submitted 28 March, 2023; v1 submitted 9 May, 2021; originally announced May 2021.

    Comments: 42 pages, 13 figures, 1 table

    MSC Class: 62G05 (Primary); 62G20 (Secondary)

  43. arXiv:2101.09809  [pdf, other]

    stat.ML cs.LG

    NeurT-FDR: Controlling FDR by Incorporating Feature Hierarchy

    Authors: Lin Qiu, Nils Murrugarra-Llerena, Vítor Silva, Lin Lin, Vernon M. Chinchilli

    Abstract: Controlling false discovery rate (FDR) while leveraging the side information of multiple hypothesis testing is an emerging research topic in modern data science. Existing methods rely on the test-level covariates while ignoring possible hierarchy among the covariates. This strategy may not be optimal for complex large-scale problems, where hierarchical information often exists among those test-lev… ▽ More

    Submitted 24 January, 2021; originally announced January 2021.

  44. arXiv:2101.08639  [pdf, other]

    stat.ME

    A General Framework of Online Updating Variable Selection for Generalized Linear Models with Streaming Datasets

    Authors: Xiaoyu Ma, Lu Lin, Yujie Gai

    Abstract: In the research field of big data, one of the important issues is how to recover the sequentially changing sets of true features when the data sets arrive sequentially. The paper presents a general framework for online updating variable selection and parameter estimation in generalized linear models with streaming datasets. This is a type of online updating penalty likelihoods with differentiable or n…

    Submitted 21 January, 2021; originally announced January 2021.

    Comments: 35 pages, 2 figures, 13 tables

  45. arXiv:2011.12416  [pdf, other]

    stat.ME math.ST

    A spectral-based framework for hypothesis testing in populations of networks

    Authors: Li Chen, Nathaniel Josephs, Lizhen Lin, Jie Zhou, Eric D. Kolaczyk

    Abstract: In this paper, we propose a new spectral-based approach to hypothesis testing for populations of networks. The primary goal is to develop a test to determine whether two given samples of networks come from the same random model or distribution. Our test statistic is based on the trace of the third order for a centered and scaled adjacency matrix, which we prove converges to the standard normal dis…

    Submitted 24 November, 2020; originally announced November 2020.

  46. arXiv:2011.09815  [pdf]

    stat.ME stat.AP stat.ML

    A scoping review of causal methods enabling predictions under hypothetical interventions

    Authors: Lijing Lin, Matthew Sperrin, David A. Jenkins, Glen P. Martin, Niels Peek

    Abstract: Background and Aims: The methods with which prediction models are usually developed mean that neither the parameters nor the predictions should be interpreted causally. However, when prediction models are used to support decision making, there is often a need for predicting outcomes under hypothetical interventions. We aimed to identify published methods for developing and validating prediction mo…

    Submitted 12 January, 2021; v1 submitted 19 November, 2020; originally announced November 2020.

    Journal ref: Diagnostic and Prognostic Research, 2021

  47. arXiv:2010.08908  [pdf, other]

    stat.CO cs.LG math.OC

    Accelerated Algorithms for Convex and Non-Convex Optimization on Manifolds

    Authors: Lizhen Lin, Bayan Saparbayeva, Michael Minyi Zhang, David B. Dunson

    Abstract: We propose a general scheme for solving convex and non-convex optimization problems on manifolds. The central idea is that, by adding a multiple of the squared retraction distance to the objective function in question, we "convexify" the objective function and solve a series of convex sub-problems in the optimization procedure. One of the key challenges for optimization on manifolds is the difficu…

    Submitted 17 October, 2020; originally announced October 2020.

  48. arXiv:2010.05295  [pdf, other]

    stat.ML cs.LG math.NA

    Efficient Long-Range Convolutions for Point Clouds

    Authors: Yifan Peng, Lin Lin, Lexing Ying, Leonardo Zepeda-Núñez

    Abstract: The efficient treatment of long-range interactions for point clouds is a challenging problem in many scientific machine learning applications. To extract global information, one usually needs a large window size, a large number of layers, and/or a large number of channels. This can often significantly increase the computational cost. In this work, we present a novel neural network layer that direc…

    Submitted 11 October, 2020; originally announced October 2020.

  49. arXiv:2010.05170  [pdf, other]

    stat.ML cs.LG math.ST

    What causes the test error? Going beyond bias-variance via ANOVA

    Authors: Licong Lin, Edgar Dobriban

    Abstract: Modern machine learning methods are often overparametrized, allowing adaptation to the data at a fine level. This can seem puzzling; in the worst case, such models do not need to generalize. This puzzle inspired a great amount of work, arguing when overparametrization reduces test error, in a phenomenon called "double descent". Recent work aimed to understand in greater depth why overparametrizati…

    Submitted 9 June, 2021; v1 submitted 11 October, 2020; originally announced October 2020.

  50. arXiv:2010.00435  [pdf, other]

    cs.SI cs.LG stat.AP stat.ML

    Community detection, pattern recognition, and hypergraph-based learning: approaches using metric geometry and persistent homology

    Authors: Dong Quan Ngoc Nguyen, Lin Xing, Lizhen Lin

    Abstract: Hypergraph data appear and are hidden in many places in the modern age. They are data structures that can be used to model many real data examples since their structures contain information about higher order relations among data points. One of the main contributions of our paper is to introduce a new topological structure to hypergraph data which bears a resemblance to a usual metric space structu…

    Submitted 29 September, 2020; originally announced October 2020.