Search | arXiv e-print repository

Multistep Criticality Search and Power Sha** in Microreactors with Reinforcement Learning

Authors: Majdi I. Radaideh, Leo Tunkle, Dean Price, Kamal Abdulraheem, Linyu Lin, Moutaz Elias

Abstract: Reducing operation and maintenance costs is a key objective for advanced reactors in general and microreactors in particular. To achieve this reduction, develo** robust autonomous control algorithms is essential to ensure safe and autonomous reactor operation. Recently, artificial intelligence and machine learning algorithms, specifically reinforcement learning (RL) algorithms, have seen rapid i… ▽ More Reducing operation and maintenance costs is a key objective for advanced reactors in general and microreactors in particular. To achieve this reduction, develo** robust autonomous control algorithms is essential to ensure safe and autonomous reactor operation. Recently, artificial intelligence and machine learning algorithms, specifically reinforcement learning (RL) algorithms, have seen rapid increased application to control problems, such as plasma control in fusion tokamaks and building energy management. In this work, we introduce the use of RL for intelligent control in nuclear microreactors. The RL agent is trained using proximal policy optimization (PPO) and advantage actor-critic (A2C), cutting-edge deep RL techniques, based on a high-fidelity simulation of a microreactor design inspired by the Westinghouse eVinci\textsuperscript{TM} design. We utilized a Serpent model to generate data on drum positions, core criticality, and core power distribution for training a feedforward neural network surrogate model. This surrogate model was then used to guide a PPO and A2C control policies in determining the optimal drum position across various reactor burnup states, ensuring critical core conditions and symmetrical power distribution across all six core portions. The results demonstrate the excellent performance of PPO in identifying optimal drum positions, achieving a hextant power tilt ratio of approximately 1.002 (within the limit of $<$ 1.02) and maintaining criticality within a 10 pcm range. A2C did not provide as competitive of a performance as PPO in terms of performance metrics for all burnup steps considered in the cycle. Additionally, the results highlight the capability of well-trained RL control policies to quickly identify control actions, suggesting a promising approach for enabling real-time autonomous control through digital twins. △ Less

Submitted 22 June, 2024; originally announced June 2024.

Comments: 15 pages, 3 figures, and 2 tables

arXiv:2406.08466 [pdf, other]

Scaling Laws in Linear Regression: Compute, Parameters, and Data

Authors: Licong Lin, **gfeng Wu, Sham M. Kakade, Peter L. Bartlett, Jason D. Lee

Abstract: Empirically, large-scale deep learning models often satisfy a neural scaling law: the test error of the trained model improves polynomially as the model size and data size grow. However, conventional wisdom suggests the test error consists of approximation, bias, and variance errors, where the variance error increases with model size. This disagrees with the general form of neural scaling laws, wh… ▽ More Empirically, large-scale deep learning models often satisfy a neural scaling law: the test error of the trained model improves polynomially as the model size and data size grow. However, conventional wisdom suggests the test error consists of approximation, bias, and variance errors, where the variance error increases with model size. This disagrees with the general form of neural scaling laws, which predict that increasing model size monotonically improves performance. We study the theory of scaling laws in an infinite dimensional linear regression setup. Specifically, we consider a model with $M$ parameters as a linear function of sketched covariates. The model is trained by one-pass stochastic gradient descent (SGD) using $N$ data. Assuming the optimal parameter satisfies a Gaussian prior and the data covariance matrix has a power-law spectrum of degree $a>1$, we show that the reducible part of the test error is $Θ(M^{-(a-1)} + N^{-(a-1)/a})$. The variance error, which increases with $M$, is dominated by the other errors due to the implicit regularization of SGD, thus disappearing from the bound. Our theory is consistent with the empirical neural scaling laws and verified by numerical simulation. △ Less

Submitted 12 June, 2024; originally announced June 2024.

arXiv:2405.15768 [pdf, other]

Canonical Variates in Wasserstein Metric Space

Authors: Jia Li, Lin Lin

Abstract: In this paper, we address the classification of instances each characterized not by a singular point, but by a distribution on a vector space. We employ the Wasserstein metric to measure distances between distributions, which are then used by distance-based classification algorithms such as k-nearest neighbors, k-means, and pseudo-mixture modeling. Central to our investigation is dimension reducti… ▽ More In this paper, we address the classification of instances each characterized not by a singular point, but by a distribution on a vector space. We employ the Wasserstein metric to measure distances between distributions, which are then used by distance-based classification algorithms such as k-nearest neighbors, k-means, and pseudo-mixture modeling. Central to our investigation is dimension reduction within the Wasserstein metric space to enhance classification accuracy. We introduce a novel approach grounded in the principle of maximizing Fisher's ratio, defined as the quotient of between-class variation to within-class variation. The directions in which this ratio is maximized are termed discriminant coordinates or canonical variates axes. In practice, we define both between-class and within-class variations as the average squared distances between pairs of instances, with the pairs either belonging to the same class or to different classes. This ratio optimization is achieved through an iterative algorithm, which alternates between optimal transport and maximization steps within the vector space. We conduct empirical studies to assess the algorithm's convergence and, through experimental validation, demonstrate that our dimension reduction technique substantially enhances classification performance. Moreover, our method outperforms well-established algorithms that operate on vector representations derived from distributional data. It also exhibits robustness against variations in the distributional representations of data clouds. △ Less

Submitted 24 May, 2024; originally announced May 2024.

Comments: double space 37 pages, 6 figures

arXiv:2404.05868 [pdf, other]

Negative Preference Optimization: From Catastrophic Collapse to Effective Unlearning

Authors: Ruiqi Zhang, Licong Lin, Yu Bai, Song Mei

Abstract: Large Language Models (LLMs) often memorize sensitive, private, or copyrighted data during pre-training. LLM unlearning aims to eliminate the influence of undesirable data from the pre-trained model while preserving the model's utilities on other tasks. Several practical methods have recently been proposed for LLM unlearning, mostly based on gradient ascent (GA) on the loss of undesirable data. Ho… ▽ More Large Language Models (LLMs) often memorize sensitive, private, or copyrighted data during pre-training. LLM unlearning aims to eliminate the influence of undesirable data from the pre-trained model while preserving the model's utilities on other tasks. Several practical methods have recently been proposed for LLM unlearning, mostly based on gradient ascent (GA) on the loss of undesirable data. However, on certain unlearning tasks, these methods either fail to effectively unlearn the target data or suffer from catastrophic collapse -- a drastic degradation of the model's utilities. In this paper, we propose Negative Preference Optimization (NPO), a simple alignment-inspired method that could efficiently and effectively unlearn a target dataset. We theoretically show that the progression toward catastrophic collapse by minimizing the NPO loss is exponentially slower than GA. Through experiments on synthetic data and the benchmark TOFU dataset, we demonstrate that NPO-based methods achieve a better balance between unlearning the undesirable data and maintaining the model's utilities. We also observe that NPO-based methods generate more sensible outputs than GA-based methods, whose outputs are often gibberish. Remarkably, on TOFU, NPO-based methods are the first to achieve reasonable unlearning results in forgetting 50% (or more) of the training data, whereas existing methods already struggle with forgetting 10% of training data. △ Less

Submitted 8 April, 2024; originally announced April 2024.

arXiv:2403.17481 [pdf, ps, other]

A Type of Nonlinear Fréchet Regressions

Authors: Lu Lin, Ze Chen

Abstract: The existing Fréchet regression is actually defined within a linear framework, since the weight function in the Fréchet objective function is linearly defined, and the resulting Fréchet regression function is identified to be a linear model when the random object belongs to a Hilbert space. Even for nonparametric and semiparametric Fréchet regressions, which are usually nonlinear, the existing met… ▽ More The existing Fréchet regression is actually defined within a linear framework, since the weight function in the Fréchet objective function is linearly defined, and the resulting Fréchet regression function is identified to be a linear model when the random object belongs to a Hilbert space. Even for nonparametric and semiparametric Fréchet regressions, which are usually nonlinear, the existing methods handle them by local linear (or local polynomial) technique, and the resulting Fréchet regressions are (locally) linear as well. We in this paper introduce a type of nonlinear Fréchet regressions. Such a framework can be utilized to fit the essentially nonlinear models in a general metric space and uniquely identify the nonlinear structure in a Hilbert space. Particularly, its generalized linear form can return to the standard linear Fréchet regression through a special choice of the weight function. Moreover, the generalized linear form possesses methodological and computational simplicity because the Euclidean variable and the metric space element are completely separable. The favorable theoretical properties (e.g. the estimation consistency and presentation theorem) of the nonlinear Fréchet regressions are established systemically. The comprehensive simulation studies and a human mortality data analysis demonstrate that the new strategy is significantly better than the competitors. △ Less

Submitted 26 March, 2024; v1 submitted 26 March, 2024; originally announced March 2024.

arXiv:2312.16607 [pdf, other]

A Polarization and Radiomics Feature Fusion Network for the Classification of Hepatocellular Carcinoma and Intrahepatic Cholangiocarcinoma

Authors: Jia Dong, Yao Yao, Liyan Lin, Yang Dong, Jiachen Wan, Ran Peng, Chao Li, Hui Ma

Abstract: Classifying hepatocellular carcinoma (HCC) and intrahepatic cholangiocarcinoma (ICC) is a critical step in treatment selection and prognosis evaluation for patients with liver diseases. Traditional histopathological diagnosis poses challenges in this context. In this study, we introduce a novel polarization and radiomics feature fusion network, which combines polarization features obtained from Mu… ▽ More Classifying hepatocellular carcinoma (HCC) and intrahepatic cholangiocarcinoma (ICC) is a critical step in treatment selection and prognosis evaluation for patients with liver diseases. Traditional histopathological diagnosis poses challenges in this context. In this study, we introduce a novel polarization and radiomics feature fusion network, which combines polarization features obtained from Mueller matrix images of liver pathological samples with radiomics features derived from corresponding pathological images to classify HCC and ICC. Our fusion network integrates a two-tier fusion approach, comprising early feature-level fusion and late classification-level fusion. By harnessing the strengths of polarization imaging techniques and image feature-based machine learning, our proposed fusion network significantly enhances classification accuracy. Notably, even at reduced imaging resolutions, the fusion network maintains robust performance due to the additional information provided by polarization features, which may not align with human visual perception. Our experimental results underscore the potential of this fusion network as a powerful tool for computer-aided diagnosis of HCC and ICC, showcasing the benefits and prospects of integrating polarization imaging techniques into the current image-intensive digital pathological diagnosis. We aim to contribute this innovative approach to top-tier journals, offering fresh insights and valuable tools in the fields of medical imaging and cancer diagnosis. By introducing polarization imaging into liver cancer classification, we demonstrate its interdisciplinary potential in addressing challenges in medical image analysis, promising advancements in medical imaging and cancer diagnosis. △ Less

Submitted 27 December, 2023; originally announced December 2023.

arXiv:2312.09758 [pdf, other]

Diagnosing and Rectifying Fake OOD Invariance: A Restructured Causal Approach

Authors: Ziliang Chen, Yongsen Zheng, Zhao-Rong Lai, Quanlong Guan, Liang Lin

Abstract: Invariant representation learning (IRL) encourages the prediction from invariant causal features to labels de-confounded from the environments, advancing the technical roadmap of out-of-distribution (OOD) generalization. Despite spotlights around, recent theoretical results verified that some causal features recovered by IRLs merely pretend domain-invariantly in the training environments but fail… ▽ More Invariant representation learning (IRL) encourages the prediction from invariant causal features to labels de-confounded from the environments, advancing the technical roadmap of out-of-distribution (OOD) generalization. Despite spotlights around, recent theoretical results verified that some causal features recovered by IRLs merely pretend domain-invariantly in the training environments but fail in unseen domains. The \emph{fake invariance} severely endangers OOD generalization since the trustful objective can not be diagnosed and existing causal surgeries are invalid to rectify. In this paper, we review a IRL family (InvRat) under the Partially and Fully Informative Invariant Feature Structural Causal Models (PIIF SCM /FIIF SCM) respectively, to certify their weaknesses in representing fake invariant features, then, unify their causal diagrams to propose ReStructured SCM (RS-SCM). RS-SCM can ideally rebuild the spurious and the fake invariant features simultaneously. Given this, we further develop an approach based on conditional mutual information with respect to RS-SCM, then rigorously rectify the spurious and fake invariant effects. It can be easily implemented by a small feature selection subnet introduced in the IRL family, which is alternatively optimized to achieve our goal. Experiments verified the superiority of our approach to fight against the fake invariant issue across a variety of OOD generalization benchmarks. △ Less

Submitted 15 December, 2023; originally announced December 2023.

Comments: AAAI-2024

arXiv:2311.08442 [pdf, other]

Mean-field variational inference with the TAP free energy: Geometric and statistical properties in linear models

Authors: Michael Celentano, Zhou Fan, Licong Lin, Song Mei

Abstract: We study mean-field variational inference in a Bayesian linear model when the sample size n is comparable to the dimension p. In high dimensions, the common approach of minimizing a Kullback-Leibler divergence from the posterior distribution, or maximizing an evidence lower bound, may deviate from the true posterior mean and underestimate posterior uncertainty. We study instead minimization of the… ▽ More We study mean-field variational inference in a Bayesian linear model when the sample size n is comparable to the dimension p. In high dimensions, the common approach of minimizing a Kullback-Leibler divergence from the posterior distribution, or maximizing an evidence lower bound, may deviate from the true posterior mean and underestimate posterior uncertainty. We study instead minimization of the TAP free energy, showing in a high-dimensional asymptotic framework that it has a local minimizer which provides a consistent estimate of the posterior marginals and may be used for correctly calibrated posterior inference. Geometrically, we show that the landscape of the TAP free energy is strongly convex in an extensive neighborhood of this local minimizer, which under certain general conditions can be found by an Approximate Message Passing (AMP) algorithm. We then exhibit an efficient algorithm that linearly converges to the minimizer within this local neighborhood. In settings where it is conjectured that no efficient algorithm can find this local neighborhood, we prove analogous geometric properties for a local minimizer of the TAP free energy reachable by AMP, and show that posterior inference based on this minimizer remains correctly calibrated. △ Less

Submitted 14 November, 2023; originally announced November 2023.

Comments: 79 pages, 5 figures

arXiv:2310.10121 [pdf, other]

From Continuous Dynamics to Graph Neural Networks: Neural Diffusion and Beyond

Authors: Andi Han, Dai Shi, Lequan Lin, Junbin Gao

Abstract: Graph neural networks (GNNs) have demonstrated significant promise in modelling relational data and have been widely applied in various fields of interest. The key mechanism behind GNNs is the so-called message passing where information is being iteratively aggregated to central nodes from their neighbourhood. Such a scheme has been found to be intrinsically linked to a physical process known as h… ▽ More Graph neural networks (GNNs) have demonstrated significant promise in modelling relational data and have been widely applied in various fields of interest. The key mechanism behind GNNs is the so-called message passing where information is being iteratively aggregated to central nodes from their neighbourhood. Such a scheme has been found to be intrinsically linked to a physical process known as heat diffusion, where the propagation of GNNs naturally corresponds to the evolution of heat density. Analogizing the process of message passing to the heat dynamics allows to fundamentally understand the power and pitfalls of GNNs and consequently informs better model design. Recently, there emerges a plethora of works that proposes GNNs inspired from the continuous dynamics formulation, in an attempt to mitigate the known limitations of GNNs, such as oversmoothing and oversquashing. In this survey, we provide the first systematic and comprehensive review of studies that leverage the continuous perspective of GNNs. To this end, we introduce foundational ingredients for adapting continuous dynamics to GNNs, along with a general framework for the design of graph neural dynamics. We then review and categorize existing works based on their driven mechanisms and underlying dynamics. We also summarize how the limitations of classic GNNs can be addressed under the continuous framework. We conclude by identifying multiple open research directions. △ Less

Submitted 29 October, 2023; v1 submitted 16 October, 2023; originally announced October 2023.

arXiv:2310.08566 [pdf, other]

Transformers as Decision Makers: Provable In-Context Reinforcement Learning via Supervised Pretraining

Authors: Licong Lin, Yu Bai, Song Mei

Abstract: Large transformer models pretrained on offline reinforcement learning datasets have demonstrated remarkable in-context reinforcement learning (ICRL) capabilities, where they can make good decisions when prompted with interaction trajectories from unseen environments. However, when and how transformers can be trained to perform ICRL have not been theoretically well-understood. In particular, it is… ▽ More Large transformer models pretrained on offline reinforcement learning datasets have demonstrated remarkable in-context reinforcement learning (ICRL) capabilities, where they can make good decisions when prompted with interaction trajectories from unseen environments. However, when and how transformers can be trained to perform ICRL have not been theoretically well-understood. In particular, it is unclear which reinforcement-learning algorithms transformers can perform in context, and how distribution mismatch in offline training data affects the learned algorithms. This paper provides a theoretical framework that analyzes supervised pretraining for ICRL. This includes two recently proposed training methods -- algorithm distillation and decision-pretrained transformers. First, assuming model realizability, we prove the supervised-pretrained transformer will imitate the conditional expectation of the expert algorithm given the observed trajectory. The generalization error will scale with model capacity and a distribution divergence factor between the expert and offline algorithms. Second, we show transformers with ReLU attention can efficiently approximate near-optimal online reinforcement learning algorithms like LinUCB and Thompson sampling for stochastic linear bandits, and UCB-VI for tabular Markov decision processes. This provides the first quantitative analysis of the ICRL capabilities of transformers pretrained from offline trajectories. △ Less

Submitted 26 May, 2024; v1 submitted 12 October, 2023; originally announced October 2023.

arXiv:2307.09210 [pdf, other]

Nested stochastic block model for simultaneously clustering networks and nodes

Authors: Nathaniel Josephs, Arash A. Amini, Marina Paez, Lizhen Lin

Abstract: We introduce the nested stochastic block model (NSBM) to cluster a collection of networks while simultaneously detecting communities within each network. NSBM has several appealing features including the ability to work on unlabeled networks with potentially different node sets, the flexibility to model heterogeneous communities, and the means to automatically select the number of classes for the… ▽ More We introduce the nested stochastic block model (NSBM) to cluster a collection of networks while simultaneously detecting communities within each network. NSBM has several appealing features including the ability to work on unlabeled networks with potentially different node sets, the flexibility to model heterogeneous communities, and the means to automatically select the number of classes for the networks and the number of communities within each network. This is accomplished via a Bayesian model, with a novel application of the nested Dirichlet process (NDP) as a prior to jointly model the between-network and within-network clusters. The dependency introduced by the network data creates nontrivial challenges for the NDP, especially in the development of efficient samplers. For posterior inference, we propose several Markov chain Monte Carlo algorithms including a standard Gibbs sampler, a collapsed Gibbs sampler, and two blocked Gibbs samplers that ultimately return two levels of clustering labels from both within and across the networks. Extensive simulation studies are carried out which demonstrate that the model provides very accurate estimates of both levels of the clustering structure. We also apply our model to two social network datasets that cannot be analyzed using any previous method in the literature due to the anonymity of the nodes and the varying number of nodes in each network. △ Less

Submitted 18 July, 2023; originally announced July 2023.

arXiv:2306.15380 [pdf, ps, other]

Multivariate Rank-Based Analysis of Multiple Endpoints in Clinical Trials: A Global Test Approach

Authors: Kexuan Li, Lingli Yang, Shaofei Zhao, Susie Sinks, Luan Lin, Peng Sun

Abstract: Clinical trials often involve the assessment of multiple endpoints to comprehensively evaluate the efficacy and safety of interventions. In the work, we consider a global nonparametric testing procedure based on multivariate rank for the analysis of multiple endpoints in clinical trials. Unlike other existing approaches that rely on pairwise comparisons for each individual endpoint, the proposed m… ▽ More Clinical trials often involve the assessment of multiple endpoints to comprehensively evaluate the efficacy and safety of interventions. In the work, we consider a global nonparametric testing procedure based on multivariate rank for the analysis of multiple endpoints in clinical trials. Unlike other existing approaches that rely on pairwise comparisons for each individual endpoint, the proposed method directly incorporates the multivariate ranks of the observations. By considering the joint ranking of all endpoints, the proposed approach provides robustness against diverse data distributions and censoring mechanisms commonly encountered in clinical trials. Through extensive simulations, we demonstrate the superior performance of the multivariate rank-based approach in controlling type I error and achieving higher power compared to existing rank-based methods. The simulations illustrate the advantages of leveraging multivariate ranks and highlight the robustness of the approach in various settings. The proposed method offers an effective tool for the analysis of multiple endpoints in clinical trials, enhancing the reliability and efficiency of outcome evaluations. △ Less

Submitted 27 June, 2023; v1 submitted 27 June, 2023; originally announced June 2023.

arXiv:2305.18728 [pdf, other]

Plug-in Performative Optimization

Authors: Licong Lin, Tijana Zrnic

Abstract: When predictions are performative, the choice of which predictor to deploy influences the distribution of future observations. The overarching goal in learning under performativity is to find a predictor that has low \emph{performative risk}, that is, good performance on its induced distribution. One family of solutions for optimizing the performative risk, including bandits and other derivative-f… ▽ More When predictions are performative, the choice of which predictor to deploy influences the distribution of future observations. The overarching goal in learning under performativity is to find a predictor that has low \emph{performative risk}, that is, good performance on its induced distribution. One family of solutions for optimizing the performative risk, including bandits and other derivative-free methods, is agnostic to any structure in the performative feedback, leading to exceedingly slow convergence rates. A complementary family of solutions makes use of explicit \emph{models} for the feedback, such as best-response models in strategic classification, enabling faster rates. However, these rates critically rely on the feedback model being correct. In this work we study a general protocol for making use of possibly misspecified models in performative prediction, called \emph{plug-in performative optimization}. We show this solution can be far superior to model-agnostic strategies, as long as the misspecification is not too extreme. Our results support the hypothesis that models, even if misspecified, can indeed help with learning in performative settings. △ Less

Submitted 28 May, 2024; v1 submitted 30 May, 2023; originally announced May 2023.

arXiv:2305.18488 [pdf, other]

A Bayesian sparse factor model with adaptive posterior concentration

Authors: Ilsang Ohn, Lizhen Lin, Yongdai Kim

Abstract: In this paper, we propose a new Bayesian inference method for a high-dimensional sparse factor model that allows both the factor dimensionality and the sparse structure of the loading matrix to be inferred. The novelty is to introduce a certain dependence between the sparsity level and the factor dimensionality, which leads to adaptive posterior concentration while kee** computational tractabili… ▽ More In this paper, we propose a new Bayesian inference method for a high-dimensional sparse factor model that allows both the factor dimensionality and the sparse structure of the loading matrix to be inferred. The novelty is to introduce a certain dependence between the sparsity level and the factor dimensionality, which leads to adaptive posterior concentration while kee** computational tractability. We show that the posterior distribution asymptotically concentrates on the true factor dimensionality, and more importantly, this posterior consistency is adaptive to the sparsity level of the true loading matrix and the noise variance. We also prove that the proposed Bayesian model attains the optimal detection rate of the factor dimensionality in a more general situation than those found in the literature. Moreover, we obtain a near-optimal posterior concentration rate of the covariance matrix. Numerical studies are conducted and show the superiority of the proposed method compared with other competitors. △ Less

Submitted 29 May, 2023; originally announced May 2023.

arXiv:2304.11251 [pdf, other]

Machine Learning and the Future of Bayesian Computation

Authors: Steven Winter, Trevor Campbell, Lizhen Lin, Sanvesh Srivastava, David B. Dunson

Abstract: Bayesian models are a powerful tool for studying complex data, allowing the analyst to encode rich hierarchical dependencies and leverage prior information. Most importantly, they facilitate a complete characterization of uncertainty through the posterior distribution. Practical posterior computation is commonly performed via MCMC, which can be computationally infeasible for high dimensional model… ▽ More Bayesian models are a powerful tool for studying complex data, allowing the analyst to encode rich hierarchical dependencies and leverage prior information. Most importantly, they facilitate a complete characterization of uncertainty through the posterior distribution. Practical posterior computation is commonly performed via MCMC, which can be computationally infeasible for high dimensional models with many observations. In this article we discuss the potential to improve posterior computation using ideas from machine learning. Concrete future directions are explored in vignettes on normalizing flows, Bayesian coresets, distributed Bayesian inference, and variational inference. △ Less

Submitted 21 April, 2023; originally announced April 2023.

arXiv:2303.02637 [pdf, other]

A Semi-Bayesian Nonparametric Estimator of the Maximum Mean Discrepancy Measure: Applications in Goodness-of-Fit Testing and Generative Adversarial Networks

Authors: Forough Fazeli-Asl, Michael Minyi Zhang, Lizhen Lin

Abstract: A classic inferential statistical problem is the goodness-of-fit (GOF) test. Such a test can be challenging when the hypothesized parametric model has an intractable likelihood and its distributional form is not available. Bayesian methods for GOF can be appealing due to their ability to incorporate expert knowledge through prior distributions. However, standard Bayesian methods for this test of… ▽ More A classic inferential statistical problem is the goodness-of-fit (GOF) test. Such a test can be challenging when the hypothesized parametric model has an intractable likelihood and its distributional form is not available. Bayesian methods for GOF can be appealing due to their ability to incorporate expert knowledge through prior distributions. However, standard Bayesian methods for this test often require strong distributional assumptions on the data and their relevant parameters. To address this issue, we propose a semi-Bayesian nonparametric (semi-BNP) procedure in the context of the maximum mean discrepancy (MMD) measure that can be applied to the GOF test. Our method introduces a novel Bayesian estimator for the MMD, enabling the development of a measure-based hypothesis test for intractable models. Through extensive experiments, we demonstrate that our proposed test outperforms frequentist MMD-based methods by achieving a lower false rejection and acceptance rate of the null hypothesis. Furthermore, we showcase the versatility of our approach by embedding the proposed estimator within a generative adversarial network (GAN) framework. It facilitates a robust BNP learning approach as another significant application of our method. With our BNP procedure, this new GAN approach can enhance sample diversity and improve inferential accuracy compared to traditional techniques. △ Less

Submitted 10 November, 2023; v1 submitted 5 March, 2023; originally announced March 2023.

Comments: Typos corrected, Secondary (simulation and theoretical) results added, Additional discussion added, references added

arXiv:2303.02534 [pdf, other]

Semi-parametric inference based on adaptively collected data

Authors: Licong Lin, Koulik Khamaru, Martin J. Wainwright

Abstract: Many standard estimators, when applied to adaptively collected data, fail to be asymptotically normal, thereby complicating the construction of confidence intervals. We address this challenge in a semi-parametric context: estimating the parameter vector of a generalized linear regression model contaminated by a non-parametric nuisance component. We construct suitably weighted estimating equations… ▽ More Many standard estimators, when applied to adaptively collected data, fail to be asymptotically normal, thereby complicating the construction of confidence intervals. We address this challenge in a semi-parametric context: estimating the parameter vector of a generalized linear regression model contaminated by a non-parametric nuisance component. We construct suitably weighted estimating equations that account for adaptivity in data collection, and provide conditions under which the associated estimates are asymptotically normal. Our results characterize the degree of "explorability" required for asymptotic normality to hold. For the simpler problem of estimating a linear functional, we provide similar guarantees under much weaker assumptions. We illustrate our general theory with concrete consequences for various problems, including standard linear bandits and sparse generalized bandits, and compare with other methods via simulation studies. △ Less

Submitted 4 March, 2023; originally announced March 2023.

arXiv:2302.12670 [pdf, ps, other]

Personalized Pricing with Invalid Instrumental Variables: Identification, Estimation, and Policy Learning

Authors: Rui Miao, Zhengling Qi, Cong Shi, Lin Lin

Abstract: Pricing based on individual customer characteristics is widely used to maximize sellers' revenues. This work studies offline personalized pricing under endogeneity using an instrumental variable approach. Standard instrumental variable methods in causal inference/econometrics either focus on a discrete treatment space or require the exclusion restriction of instruments from having a direct effect… ▽ More Pricing based on individual customer characteristics is widely used to maximize sellers' revenues. This work studies offline personalized pricing under endogeneity using an instrumental variable approach. Standard instrumental variable methods in causal inference/econometrics either focus on a discrete treatment space or require the exclusion restriction of instruments from having a direct effect on the outcome, which limits their applicability in personalized pricing. In this paper, we propose a new policy learning method for Personalized pRicing using Invalid iNsTrumental variables (PRINT) for continuous treatment that allow direct effects on the outcome. Specifically, relying on the structural models of revenue and price, we establish the identifiability condition of an optimal pricing strategy under endogeneity with the help of invalid instrumental variables. Based on this new identification, which leads to solving conditional moment restrictions with generalized residual functions, we construct an adversarial min-max estimator and learn an optimal pricing strategy. Furthermore, we establish an asymptotic regret bound to find an optimal pricing strategy. Finally, we demonstrate the effectiveness of the proposed method via extensive simulation studies as well as a real data application from an US online auto loan company. △ Less

Submitted 24 February, 2023; originally announced February 2023.

arXiv:2302.11222 [pdf, other]

Source-Function Weighted-Transfer Learning for Nonparametric Regression with Seemingly Similar Sources

Authors: Lu Lin, Weiyu Li

Abstract: The homogeneity, or more generally, the similarity between source domains and a target domain seems to be essential to a positive transfer learning. In practice, however, the similarity condition is difficult to check and is often violated. In this paper, instead of the popularly used similarity condition, a seeming similarity is introduced, which is defined by a non-orthogonality together with a… ▽ More The homogeneity, or more generally, the similarity between source domains and a target domain seems to be essential to a positive transfer learning. In practice, however, the similarity condition is difficult to check and is often violated. In this paper, instead of the popularly used similarity condition, a seeming similarity is introduced, which is defined by a non-orthogonality together with a smoothness. Such a condition is naturally satisfied under common situations and even implies the dissimilarity in some sense. Based on the seeming similarity together with an $L_2$-adjustment, a source-function weighted-transfer learning estimation (sw-TLE) is constructed. By source-function weighting, an adaptive transfer learning is achieved in the sense that it is applied to similar and dissimilar scenarios with a relatively high estimation efficiency. Particularly, under the case with homogenous source and target models, the sw-TLE even can be competitive with the full data estimator. The hidden relationship between the source-function weighting estimator and the James-Stein estimator is established as well, which reveals the structural reasonability of our methodology. Moreover, the strategy does apply to nonparametric and semiparametric models. The comprehensive simulation studies and real data analysis can illustrate that the new strategy is significantly better than the competitors. △ Less

Submitted 22 February, 2023; originally announced February 2023.

arXiv:2302.08606 [pdf, other]

Intrinsic and extrinsic deep learning on manifolds

Authors: Yihao Fang, Ilsang Ohn, Vijay Gupta, Lizhen Lin

Abstract: We propose extrinsic and intrinsic deep neural network architectures as general frameworks for deep learning on manifolds. Specifically, extrinsic deep neural networks (eDNNs) preserve geometric features on manifolds by utilizing an equivariant embedding from the manifold to its image in the Euclidean space. Moreover, intrinsic deep neural networks (iDNNs) incorporate the underlying intrinsic geom… ▽ More We propose extrinsic and intrinsic deep neural network architectures as general frameworks for deep learning on manifolds. Specifically, extrinsic deep neural networks (eDNNs) preserve geometric features on manifolds by utilizing an equivariant embedding from the manifold to its image in the Euclidean space. Moreover, intrinsic deep neural networks (iDNNs) incorporate the underlying intrinsic geometry of manifolds via exponential and log maps with respect to a Riemannian structure. Consequently, we prove that the empirical risk of the empirical risk minimizers (ERM) of eDNNs and iDNNs converge in optimal rates. Overall, The eDNNs framework is simple and easy to compute, while the iDNNs framework is accurate and fast converging. To demonstrate the utilities of our framework, various simulation studies, and real data analyses are presented with eDNNs and iDNNs. △ Less

Submitted 16 February, 2023; originally announced February 2023.

arXiv:2210.03178 [pdf, other]

doi 10.1145/3511808.3557672

Probabilistic Model Incorporating Auxiliary Covariates to Control FDR

Authors: Lin Qiu, Nils Murrugarra-Llerena, Vítor Silva, Lin Lin, Vernon M. Chinchilli

Abstract: Controlling False Discovery Rate (FDR) while leveraging the side information of multiple hypothesis testing is an emerging research topic in modern data science. Existing methods rely on the test-level covariates while ignoring metrics about test-level covariates. This strategy may not be optimal for complex large-scale problems, where indirect relations often exist among test-level covariates and… ▽ More Controlling False Discovery Rate (FDR) while leveraging the side information of multiple hypothesis testing is an emerging research topic in modern data science. Existing methods rely on the test-level covariates while ignoring metrics about test-level covariates. This strategy may not be optimal for complex large-scale problems, where indirect relations often exist among test-level covariates and auxiliary metrics or covariates. We incorporate auxiliary covariates among test-level covariates in a deep Black-Box framework controlling FDR (named as NeurT-FDR) which boosts statistical power and controls FDR for multiple-hypothesis testing. Our method parametrizes the test-level covariates as a neural network and adjusts the auxiliary covariates through a regression framework, which enables flexible handling of high-dimensional features as well as efficient end-to-end optimization. We show that NeurT-FDR makes substantially more discoveries in three real datasets compared to competitive baselines. △ Less

Submitted 6 October, 2022; originally announced October 2022.

Comments: Short Version of NeurT-FDR, accepted at CIKM 2022. arXiv admin note: substantial text overlap with arXiv:2101.09809

arXiv:2209.12739 [pdf, ps, other]

doi 10.1007/s11222-023-10352-x

Renewable Composite Quantile Method and Algorithm for Nonparametric Models with Streaming Data

Authors: Yan Chen, Shuixin Fang, Lu Lin

Abstract: We are interested in renewable estimations and algorithms for nonparametric models with streaming data. In our method, the nonparametric function of interest is expressed through a functional depending on a weight function and a conditional distribution function (CDF). The CDF is estimated by renewable kernel estimations combined with function interpolations, based on which we propose the method o… ▽ More We are interested in renewable estimations and algorithms for nonparametric models with streaming data. In our method, the nonparametric function of interest is expressed through a functional depending on a weight function and a conditional distribution function (CDF). The CDF is estimated by renewable kernel estimations combined with function interpolations, based on which we propose the method of renewable weighted composite quantile regression (WCQR). Then we fully use the model structure and obtain new selectors for the weight function, such that the WCQR can achieve asymptotic unbiasness when estimating specific functions in the model. We also propose practical bandwidth selectors for streaming data and find the optimal weight function minimizing the asymptotic variance. The asymptotical results show that our estimator is almost equivalent to the oracle estimator obtained from the entire data together. Besides, our method also enjoys adaptiveness to error distributions, robustness to outliers, and efficiency in both estimation and computation. Simulation studies and real data analyses further confirm our theoretical findings. △ Less

Submitted 8 October, 2022; v1 submitted 26 September, 2022; originally announced September 2022.

Comments: 24 pages, 0 figures

arXiv:2207.06561 [pdf, other]

Nonparametric Bayesian Approach to Treatment Ranking in Network Meta-Analysis with Application to Comparisons of Antidepressants

Authors: Andrés F. Barrientos, Garritt L. Page, Lifeng Lin

Abstract: Network meta-analysis is a powerful tool to synthesize evidence from independent studies and compare multiple treatments simultaneously. A critical task of performing a network meta-analysis is to offer ranks of all available treatment options for a specific disease outcome. Frequently, the estimated treatment rankings are accompanied by a large amount of uncertainty, suffer from multiplicity issu… ▽ More Network meta-analysis is a powerful tool to synthesize evidence from independent studies and compare multiple treatments simultaneously. A critical task of performing a network meta-analysis is to offer ranks of all available treatment options for a specific disease outcome. Frequently, the estimated treatment rankings are accompanied by a large amount of uncertainty, suffer from multiplicity issues, and rarely permit ties. These issues make interpreting rankings problematic as they are often treated as absolute metrics. To address these shortcomings, we formulate a ranking strategy that adapts to scenarios with high order uncertainty by producing more conservative results. This improves the interpretability while simultaneously accounting for multiple comparisons. To admit ties between treatment effects, we also develop a Bayesian Nonparametric approach for network meta-analysis. The approach capitalizes on the induced clustering mechanism of Bayesian Nonparametric methods producing a positive probability that two treatment effects are equal. We demonstrate the utility of the procedure through numerical experiments and a network meta-analysis designed to study antidepressant treatments. △ Less

Submitted 13 July, 2022; originally announced July 2022.

arXiv:2206.06086 [pdf, ps, other]

A Correlation-Ratio Transfer Learning and Variational Stein's Paradox

Authors: Lu Lin, Weiyu Li

Abstract: A basic condition for efficient transfer learning is the similarity between a target model and source models. In practice, however, the similarity condition is difficult to meet or is even violated. Instead of the similarity condition, a brand-new strategy, linear correlation-ratio, is introduced in this paper to build an accurate relationship between the models. Such a correlation-ratio can be ea… ▽ More A basic condition for efficient transfer learning is the similarity between a target model and source models. In practice, however, the similarity condition is difficult to meet or is even violated. Instead of the similarity condition, a brand-new strategy, linear correlation-ratio, is introduced in this paper to build an accurate relationship between the models. Such a correlation-ratio can be easily estimated by historical data or a part of sample. Then, a correlation-ratio transfer learning likelihood is established based on the correlation-ratio combination. On the practical side, the new framework is applied to some application scenarios, especially the areas of data streams and medical studies. Methodologically, some techniques are suggested for transferring the information from simple source models to a relatively complex target model. Theoretically, some favorable properties, including the global convergence rate, are achieved, even for the case where the source models are not similar to the target model. All in all, it can be seen from the theories and experimental results that the inference on the target model is significantly improved by the information from similar or dissimilar source models. In other words, a variational Stein's paradox is illustrated in the context of transfer learning. △ Less

Submitted 9 June, 2022; originally announced June 2022.

arXiv:2206.02017 [pdf, ps, other]

Feature screening for multi-response linear models by empirical likelihood

Authors: Jun Lu, Qinqin Hu, Lu Lin

Abstract: This paper proposes a new feature screening method for the multi-response ultrahigh dimensional linear model by empirical likelihood. Through a multivariate moment condition, the empirical likelihood induced ranking statistics can exploit the joint effect among responses, and thus result in a much better performance than the methods considering responses individually. More importantly, by the use… ▽ More This paper proposes a new feature screening method for the multi-response ultrahigh dimensional linear model by empirical likelihood. Through a multivariate moment condition, the empirical likelihood induced ranking statistics can exploit the joint effect among responses, and thus result in a much better performance than the methods considering responses individually. More importantly, by the use of empirical likelihood, the new method adapts to the heterogeneity in the conditional variance of random error. The sure screening property of the newly proposed method is proved with the model size controlled within a reasonable scale. Additionally, the new screening method is also extended to a conditional version so that it can recover the hidden predictors which are easily missed by the unconditional method. The corresponding theoretical properties are also provided. Finally, both numerical studies and real data analysis are provided to illustrate the effectiveness of the proposed methods. △ Less

Submitted 4 June, 2022; originally announced June 2022.

arXiv:2205.02719 [pdf, other]

Communication-Efficient Adaptive Federated Learning

Authors: Yujia Wang, Lu Lin, **ghui Chen

Abstract: Federated learning is a machine learning training paradigm that enables clients to jointly train models without sharing their own localized data. However, the implementation of federated learning in practice still faces numerous challenges, such as the large communication overhead due to the repetitive server-client synchronization and the lack of adaptivity by SGD-based model updates. Despite tha… ▽ More Federated learning is a machine learning training paradigm that enables clients to jointly train models without sharing their own localized data. However, the implementation of federated learning in practice still faces numerous challenges, such as the large communication overhead due to the repetitive server-client synchronization and the lack of adaptivity by SGD-based model updates. Despite that various methods have been proposed for reducing the communication cost by gradient compression or quantization, and the federated versions of adaptive optimizers such as FedAdam are proposed to add more adaptivity, the current federated learning framework still cannot solve the aforementioned challenges all at once. In this paper, we propose a novel communication-efficient adaptive federated learning method (FedCAMS) with theoretical convergence guarantees. We show that in the nonconvex stochastic optimization setting, our proposed FedCAMS achieves the same convergence rate of $O(\frac{1}{\sqrt{TKm}})$ as its non-compressed counterparts. Extensive experiments on various benchmarks verify our theoretical analysis. △ Less

Submitted 19 April, 2023; v1 submitted 5 May, 2022; originally announced May 2022.

Comments: Updated version. A previous version is accepted by ICML 2022 (37 pages, 7 figures, 3 tables)

arXiv:2203.02090 [pdf, other]

Bayesian community detection for networks with covariates

Authors: Luyi Shen, Arash Amini, Nathaniel Josephs, Lizhen Lin

Abstract: The increasing prevalence of network data in a vast variety of fields and the need to extract useful information out of them have spurred fast developments in related models and algorithms. Among the various learning tasks with network data, community detection, the discovery of node clusters or "communities," has arguably received the most attention in the scientific community. In many real-world… ▽ More The increasing prevalence of network data in a vast variety of fields and the need to extract useful information out of them have spurred fast developments in related models and algorithms. Among the various learning tasks with network data, community detection, the discovery of node clusters or "communities," has arguably received the most attention in the scientific community. In many real-world applications, the network data often come with additional information in the form of node or edge covariates that should ideally be leveraged for inference. In this paper, we add to a limited literature on community detection for networks with covariates by proposing a Bayesian stochastic block model with a covariate-dependent random partition prior. Under our prior, the covariates are explicitly expressed in specifying the prior distribution on the cluster membership. Our model has the flexibility of modeling uncertainties of all the parameter estimates including the community membership. Importantly, and unlike the majority of existing methods, our model has the ability to learn the number of the communities via posterior inference without having to assume it to be known. Our model can be applied to community detection in both dense and sparse networks, with both categorical and continuous covariates, and our MCMC algorithm is very efficient with good mixing properties. We demonstrate the superior performance of our model over existing models in a comprehensive simulation study and an application to two real datasets. △ Less

Submitted 6 April, 2023; v1 submitted 3 March, 2022; originally announced March 2022.

arXiv:2202.13503 [pdf, other]

Variational Interpretable Learning from Multi-view Data

Authors: Lin Qiu, Lynn Lin, Vernon M. Chinchilli

Abstract: The main idea of canonical correlation analysis (CCA) is to map different views onto a common latent space with maximum correlation. We propose a deep interpretable variational canonical correlation analysis (DICCA) for multi-view learning. The developed model extends the existing latent variable model for linear CCA to nonlinear models through the use of deep generative networks. DICCA is designe… ▽ More The main idea of canonical correlation analysis (CCA) is to map different views onto a common latent space with maximum correlation. We propose a deep interpretable variational canonical correlation analysis (DICCA) for multi-view learning. The developed model extends the existing latent variable model for linear CCA to nonlinear models through the use of deep generative networks. DICCA is designed to disentangle both the shared and view-specific variations for multi-view data. To further make the model more interpretable, we place a sparsity-inducing prior on the latent weight with a structured variational autoencoder that is comprised of view-specific generators. Empirical results on real-world datasets show that our methods are competitive across domains. △ Less

Submitted 1 March, 2022; v1 submitted 27 February, 2022; originally announced February 2022.

Comments: arXiv admin note: substantial text overlap with arXiv:2003.04292 by other authors. text overlap with arXiv:1802.06765 by other authors

arXiv:2202.09353 [pdf, other]

doi 10.1016/j.nimb.2022.06.001

Model Calibration of the Liquid Mercury Spallation Target using Evolutionary Neural Networks and Sparse Polynomial Expansions

Authors: Majdi I. Radaideh, Hoang Tran, Lianshan Lin, Hao Jiang, Drew Winder, Sarma Gorti, Guannan Zhang, Justin Mach, Sarah Cousineau

Abstract: The mercury constitutive model predicting the strain and stress in the target vessel plays a central role in improving the lifetime prediction and future target designs of the mercury targets at the Spallation Neutron Source (SNS). We leverage the experiment strain data collected over multiple years to improve the mercury constitutive model through a combination of large-scale simulations of the t… ▽ More The mercury constitutive model predicting the strain and stress in the target vessel plays a central role in improving the lifetime prediction and future target designs of the mercury targets at the Spallation Neutron Source (SNS). We leverage the experiment strain data collected over multiple years to improve the mercury constitutive model through a combination of large-scale simulations of the target behavior and the use of machine learning tools for parameter estimation. We present two interdisciplinary approaches for surrogate-based model calibration of expensive simulations using evolutionary neural networks and sparse polynomial expansions. The experiments and results of the two methods show a very good agreement for the solid mechanics simulation of the mercury spallation target. The proposed methods are used to calibrate the tensile cutoff threshold, mercury density, and mercury speed of sound during intense proton pulse experiments. Using strain experimental data from the mercury target sensors, the newly calibrated simulations achieve 7\% average improvement on the signal prediction accuracy and 8\% reduction in mean absolute error compared to previously reported reference parameters, with some sensors experiencing up to 30\% improvement. The proposed calibrated simulations can significantly aid in fatigue analysis to estimate the mercury target lifetime and integrity, which reduces abrupt target failure and saves a tremendous amount of costs. However, an important conclusion from this work points out to a deficiency in the current constitutive model based on the equation of state in capturing the full physics of the spallation reaction. Given that some of the calibrated parameters that show a good agreement with the experimental data can be nonphysical mercury properties, we need a more advanced two-phase flow model to capture bubble dynamics and mercury cavitation. △ Less

Submitted 18 February, 2022; originally announced February 2022.

Comments: 26 pages, 10 figures, 6 tables

Journal ref: Nucl. Instrum. Methods Phys. Res. B 525 (2022) 41-54

arXiv:2202.03959 [pdf, other]

doi 10.1016/j.rinp.2022.105414

Bayesian Inverse Uncertainty Quantification of the Physical Model Parameters for the Spallation Neutron Source First Target Station

Authors: Majdi I. Radaideh, Lianshan Lin, Hao Jiang, Sarah Cousineau

Abstract: The reliability of the mercury spallation target is mission-critical for the neutron science program of the spallation neutron source at the Oak Ridge National Laboratory. We present an inverse uncertainty quantification (UQ) study using the Bayesian framework for the mercury equation of state model parameters, with the assistance of polynomial chaos expansion surrogate models. By leveraging high-… ▽ More The reliability of the mercury spallation target is mission-critical for the neutron science program of the spallation neutron source at the Oak Ridge National Laboratory. We present an inverse uncertainty quantification (UQ) study using the Bayesian framework for the mercury equation of state model parameters, with the assistance of polynomial chaos expansion surrogate models. By leveraging high-fidelity structural mechanics simulations and real measured strain data, the inverse UQ results reveal a maximum-a-posteriori estimate, mean, and standard deviation of $6.5\times 10^4$ ($6.49\times 10^4 \pm 2.39\times 10^3$) Pa for the tensile cutoff threshold, $12112.1$ ($12111.8 \pm 14.9$) kg/m$^3$ for the mercury density, and $1850.4$ ($1849.7 \pm 5.3$) m/s for the mercury speed of sound. These values do not necessarily represent the nominal mercury physical properties, but the ones that fit the strain data and the solid mechanics model we have used, and can be explained by three reasons: The limitations of the computer model or what is known as the model-form uncertainty, the biases and errors in the experimental data, and the mercury cavitation damage that also contributes to the change in mercury behavior. Consequently, the equation of state model parameters try to compensate for these effects to improve fitness to the data. The mercury target simulations using the updated parametric values result in an excellent agreement with 88% average accuracy compared to experimental data, 6% average increase compared to reference parameters, with some sensors experiencing an increase of more than 25%. With a more accurate simulated strain response, the component fatigue analysis can utilize the comprehensive strain history data to evaluate the target vessel's lifetime closer to its real limit, saving tremendous target costs. △ Less

Submitted 8 February, 2022; originally announced February 2022.

Comments: 21 pages, 9 figures, 7 tables

Journal ref: Results in Physics 36 (2022) 105414

arXiv:2201.12597 [pdf, other]

Global Bias-Corrected Divide-and-Conquer by Quantile-Matched Composite for General Nonparametric Regressions

Authors: Yan Chen, Lu Lin

Abstract: The issues of bias-correction and robustness are crucial in the strategy of divide-and-conquer (DC), especially for asymmetric nonparametric models with massive data. It is known that quantile-based methods can achieve the robustness, but the quantile estimation for nonparametric regression has non-ignorable bias when the error distribution is asymmetric. This paper explores a global bias-correcte… ▽ More The issues of bias-correction and robustness are crucial in the strategy of divide-and-conquer (DC), especially for asymmetric nonparametric models with massive data. It is known that quantile-based methods can achieve the robustness, but the quantile estimation for nonparametric regression has non-ignorable bias when the error distribution is asymmetric. This paper explores a global bias-corrected DC by quantile-matched composite for nonparametric regressions with general error distributions. The proposed strategies can achieve the bias-correction and robustness, simultaneously. Unlike common DC quantile estimations that use an identical quantile level to construct a local estimator by each local machine, in the new methodologies, the local estimators are obtained at various quantile levels for different data batches, and then the global estimator is elaborately constructed as a weighted sum of the local estimators. In the weighted sum, the weights and quantile levels are well-matched such that the bias of the global estimator is corrected significantly, especially for the case where the error distribution is asymmetric. Based on the asymptotic properties of the global estimator, the optimal weights are attained, and the corresponding algorithms are then suggested. The behaviors of the new methods are further illustrated by various numerical examples from simulation experiments and real data analyses. Compared with the competitors, the new methods have the favorable features of estimation accuracy, robustness, applicability and computational efficiency. △ Less

Submitted 29 January, 2022; originally announced January 2022.

Comments: 44 pages, 2 figures

arXiv:2112.09086 [pdf, ps, other]

A new locally linear embedding scheme in light of Hessian eigenmap

Authors: Liren Lin, Chih-Wei Chen

Abstract: We provide a new interpretation of Hessian locally linear embedding (HLLE), revealing that it is essentially a variant way to implement the same idea of locally linear embedding (LLE). Based on the new interpretation, a substantial simplification can be made, in which the idea of "Hessian" is replaced by rather arbitrary weights. Moreover, we show by numerical examples that HLLE may produce projec… ▽ More We provide a new interpretation of Hessian locally linear embedding (HLLE), revealing that it is essentially a variant way to implement the same idea of locally linear embedding (LLE). Based on the new interpretation, a substantial simplification can be made, in which the idea of "Hessian" is replaced by rather arbitrary weights. Moreover, we show by numerical examples that HLLE may produce projection-like results when the dimension of the target space is larger than that of the data manifold, and hence one further modification concerning the manifold dimension is suggested. Combining all the observations, we finally achieve a new LLE-type method, which is called tangential LLE (TLLE). It is simpler and more robust than HLLE. △ Less

Submitted 16 December, 2021; originally announced December 2021.

Comments: 13 pages

MSC Class: 62-07

arXiv:2112.02580 [pdf, other]

Bayesian Optimal Two-sample Tests in High-dimension

Authors: Kyoungjae Lee, Kisung You, Lizhen Lin

Abstract: We propose optimal Bayesian two-sample tests for testing equality of high-dimensional mean vectors and covariance matrices between two populations. In many applications including genomics and medical imaging, it is natural to assume that only a few entries of two mean vectors or covariance matrices are different. Many existing tests that rely on aggregating the difference between empirical means o… ▽ More We propose optimal Bayesian two-sample tests for testing equality of high-dimensional mean vectors and covariance matrices between two populations. In many applications including genomics and medical imaging, it is natural to assume that only a few entries of two mean vectors or covariance matrices are different. Many existing tests that rely on aggregating the difference between empirical means or covariance matrices are not optimal or yield low power under such setups. Motivated by this, we develop Bayesian two-sample tests employing a divide-and-conquer idea, which is powerful especially when the difference between two populations is sparse but large. The proposed two-sample tests manifest closed forms of Bayes factors and allow scalable computations even in high-dimensions. We prove that the proposed tests are consistent under relatively mild conditions compared to existing tests in the literature. Furthermore, the testable regions from the proposed tests turn out to be optimal in terms of rates. Simulation studies show clear advantages of the proposed tests over other state-of-the-art methods in various scenarios. Our tests are also applied to the analysis of the gene expression data of two cancer data sets. △ Less

Submitted 5 December, 2021; originally announced December 2021.

arXiv:2111.00705 [pdf, other]

Communication-Compressed Adaptive Gradient Method for Distributed Nonconvex Optimization

Authors: Yujia Wang, Lu Lin, **ghui Chen

Abstract: Due to the explosion in the size of the training datasets, distributed learning has received growing interest in recent years. One of the major bottlenecks is the large communication cost between the central server and the local workers. While error feedback compression has been proven to be successful in reducing communication costs with stochastic gradient descent (SGD), there are much fewer att… ▽ More Due to the explosion in the size of the training datasets, distributed learning has received growing interest in recent years. One of the major bottlenecks is the large communication cost between the central server and the local workers. While error feedback compression has been proven to be successful in reducing communication costs with stochastic gradient descent (SGD), there are much fewer attempts in building communication-efficient adaptive gradient methods with provable guarantees, which are widely used in training large-scale machine learning models. In this paper, we propose a new communication-compressed AMSGrad for distributed nonconvex optimization problem, which is provably efficient. Our proposed distributed learning framework features an effective gradient compression strategy and a worker-side model update design. We prove that the proposed communication-efficient distributed adaptive gradient method converges to the first-order stationary point with the same iteration complexity as uncompressed vanilla AMSGrad in the stochastic nonconvex optimization setting. Experiments on various benchmarks back up our theory. △ Less

Submitted 23 February, 2022; v1 submitted 1 November, 2021; originally announced November 2021.

Comments: Accepted by AISTATS 2022 (29 pages, 11 figures, 2 tables)

arXiv:2110.13240 [pdf, other]

Adaptive Weighted Multi-View Clustering

Authors: Shuo Shuo Liu, Lin Lin

Abstract: Learning multi-view data is an emerging problem in machine learning research, and nonnegative matrix factorization (NMF) is a popular dimensionality-reduction method for integrating information from multiple views. These views often provide not only consensus but also complementary information. However, most multi-view NMF algorithms assign equal weight to each view or tune the weight via line sea… ▽ More Learning multi-view data is an emerging problem in machine learning research, and nonnegative matrix factorization (NMF) is a popular dimensionality-reduction method for integrating information from multiple views. These views often provide not only consensus but also complementary information. However, most multi-view NMF algorithms assign equal weight to each view or tune the weight via line search empirically, which can be infeasible without any prior knowledge of the views or computationally expensive. In this paper, we propose a weighted multi-view NMF (WM-NMF) algorithm. In particular, we aim to address the critical technical gap, which is to learn both view-specific weight and observation-specific reconstruction weight to quantify each view's information content. The introduced weighting scheme can alleviate unnecessary views' adverse effects and enlarge the positive effects of the important views by assigning smaller and larger weights, respectively. Experimental results confirm the effectiveness and advantages of the proposed algorithm in terms of achieving better clustering performance and dealing with the noisy data compared to the existing algorithms. △ Less

Submitted 24 April, 2023; v1 submitted 25 October, 2021; originally announced October 2021.

arXiv:2110.08849 [pdf, other]

A Bayesian Selection Model for Correcting Outcome Reporting Bias With Application to a Meta-analysis on Heart Failure Interventions

Authors: Ray Bai, Xiaokang Liu, Lifeng Lin, Yulun Liu, Stephen E. Kimmel, Haitao Chu, Yong Chen

Abstract: Multivariate meta-analysis (MMA) is a powerful tool for jointly estimating multiple outcomes' treatment effects. However, the validity of results from MMA is potentially compromised by outcome reporting bias (ORB), or the tendency for studies to selectively report outcomes. Until recently, ORB has been understudied. Since ORB can lead to biased conclusions, it is crucial to correct the estimates o… ▽ More Multivariate meta-analysis (MMA) is a powerful tool for jointly estimating multiple outcomes' treatment effects. However, the validity of results from MMA is potentially compromised by outcome reporting bias (ORB), or the tendency for studies to selectively report outcomes. Until recently, ORB has been understudied. Since ORB can lead to biased conclusions, it is crucial to correct the estimates of effect sizes and quantify their uncertainty in the presence of ORB. With this goal, we develop a Bayesian selection model to adjust for ORB in MMA. We further propose a measure for quantifying the impact of ORB on the results from MMA. We evaluate our approaches through a meta-evaluation of 748 bivariate meta-analyses from the Cochrane Database of Systematic Reviews. Our model is motivated by and applied to a meta-analysis of interventions on hospital readmission and quality of life for heart failure patients. In our analysis, the relative risk (RR) of hospital readmission for the intervention group changes from a significant decrease (RR: 0.931, 95% confidence interval [CI]: 0.862-0.993) to a statistically nonsignificant effect (RR: 0.955, 95% CI: 0.876-1.051) after adjusting for ORB. This study demonstrates that failing to account for ORB can lead to different conclusions in a meta-analysis. △ Less

Submitted 17 October, 2021; originally announced October 2021.

Comments: 26 pages, 5 tables, 8 figures

arXiv:2109.11795 [pdf, other]

Scalable Bayesian high-dimensional local dependence learning

Authors: Kyoungjae Lee, Lizhen Lin

Abstract: In this work, we propose a scalable Bayesian procedure for learning the local dependence structure in a high-dimensional model where the variables possess a natural ordering. The ordering of variables can be indexed by time, the vicinities of spatial locations, and so on, with the natural assumption that variables far apart tend to have weak correlations. Applications of such models abound in a va… ▽ More In this work, we propose a scalable Bayesian procedure for learning the local dependence structure in a high-dimensional model where the variables possess a natural ordering. The ordering of variables can be indexed by time, the vicinities of spatial locations, and so on, with the natural assumption that variables far apart tend to have weak correlations. Applications of such models abound in a variety of fields such as finance, genome associations analysis and spatial modeling. We adopt a flexible framework under which each variable is dependent on its neighbors or predecessors, and the neighborhood size can vary for each variable. It is of great interest to reveal this local dependence structure by estimating the covariance or precision matrix while yielding a consistent estimate of the varying neighborhood size for each variable. The existing literature on banded covariance matrix estimation, which assumes a fixed bandwidth cannot be adapted for this general setup. We employ the modified Cholesky decomposition for the precision matrix and design a flexible prior for this model through appropriate priors on the neighborhood sizes and Cholesky factors. The posterior contraction rates of the Cholesky factor are derived which are nearly or exactly minimax optimal, and our procedure leads to consistent estimates of the neighborhood size for all the variables. Another appealing feature of our procedure is its scalability to models with large numbers of variables due to efficient posterior inference without resorting to MCMC algorithms. Numerical comparisons are carried out with competitive methods, and applications are considered for some real datasets. △ Less

Submitted 24 September, 2021; originally announced September 2021.

arXiv:2109.03204 [pdf, other]

doi 10.1214/23-AOS2349

Adaptive variational Bayes: Optimality, computation and applications

Authors: Ilsang Ohn, Lizhen Lin

Abstract: In this paper, we explore adaptive inference based on variational Bayes. Although several studies have been conducted to analyze the contraction properties of variational posteriors, there is still a lack of a general and computationally tractable variational Bayes method that performs adaptive inference. To fill this gap, we propose a novel adaptive variational Bayes framework, which can operate… ▽ More In this paper, we explore adaptive inference based on variational Bayes. Although several studies have been conducted to analyze the contraction properties of variational posteriors, there is still a lack of a general and computationally tractable variational Bayes method that performs adaptive inference. To fill this gap, we propose a novel adaptive variational Bayes framework, which can operate on a collection of models. The proposed framework first computes a variational posterior over each individual model separately and then combines them with certain weights to produce a variational posterior over the entire model. It turns out that this combined variational posterior is the closest member to the posterior over the entire model in a predefined family of approximating distributions. We show that the adaptive variational Bayes attains optimal contraction rates adaptively under very general conditions. We also provide a methodology to maintain the tractability and adaptive optimality of the adaptive variational Bayes even in the presence of an enormous number of individual models, such as sparse models. We apply the general results to several examples, including deep learning and sparse factor models, and derive new and adaptive inference results. In addition, we characterize an implicit regularization effect of variational Bayes and show that the adaptive variational posterior can utilize this. △ Less

Submitted 11 March, 2024; v1 submitted 7 September, 2021; originally announced September 2021.

Journal ref: Ann. Statist. 52(1):335-363. (2024)

arXiv:2108.12680 [pdf, ps, other]

Avoiding unwanted results in locally linear embedding: A new understanding of regularization

Authors: Liren Lin

Abstract: We demonstrate that locally linear embedding (LLE) inherently admits some unwanted results when no regularization is used, even for cases in which regularization is not supposed to be needed in the original algorithm. The existence of one special type of result, which we call ``projection pattern'', is mathematically proved in the situation that an exact local linear relation is achieved in each n… ▽ More We demonstrate that locally linear embedding (LLE) inherently admits some unwanted results when no regularization is used, even for cases in which regularization is not supposed to be needed in the original algorithm. The existence of one special type of result, which we call ``projection pattern'', is mathematically proved in the situation that an exact local linear relation is achieved in each neighborhood of the data. These special patterns as well as some other bizarre results that may occur in more general situations are shown by numerical examples on the Swiss roll with a hole embedded in a high dimensional space. It is observed that all these bad results can be effectively prevented by using regularization. △ Less

Submitted 28 August, 2021; originally announced August 2021.

Comments: 11 pages

MSC Class: 62-07

arXiv:2108.04035 [pdf, other]

Mixture of Linear Models Co-supervised by Deep Neural Networks

Authors: Beomseok Seo, Lin Lin, Jia Li

Abstract: Deep neural network (DNN) models have achieved phenomenal success for applications in many domains, ranging from academic research in science and engineering to industry and business. The modeling power of DNN is believed to have come from the complexity and over-parameterization of the model, which on the other hand has been criticized for the lack of interpretation. Although certainly not true f… ▽ More Deep neural network (DNN) models have achieved phenomenal success for applications in many domains, ranging from academic research in science and engineering to industry and business. The modeling power of DNN is believed to have come from the complexity and over-parameterization of the model, which on the other hand has been criticized for the lack of interpretation. Although certainly not true for every application, in some applications, especially in economics, social science, healthcare industry, and administrative decision making, scientists or practitioners are resistant to use predictions made by a black-box system for multiple reasons. One reason is that a major purpose of a study can be to make discoveries based upon the prediction function, e.g., to reveal the relationships between measurements. Another reason can be that the training dataset is not large enough to make researchers feel completely sure about a purely data-driven result. Being able to examine and interpret the prediction function will enable researchers to connect the result with existing knowledge or gain insights about new directions to explore. Although classic statistical models are much more explainable, their accuracy often falls considerably below DNN. In this paper, we propose an approach to fill the gap between relatively simple explainable models and DNN such that we can more flexibly tune the trade-off between interpretability and accuracy. Our main idea is a mixture of discriminative models that is trained with the guidance from a DNN. Although mixtures of discriminative models have been studied before, our way of generating the mixture is quite different. △ Less

Submitted 4 August, 2021; originally announced August 2021.

Comments: Submitted to Journal of Computational and Graphical Statistics on April 19, 2021

arXiv:2106.01552 [pdf, other]

Uncertainty Quantification of a Computer Model for Binary Black Hole Formation

Authors: Luyao Lin, Derek Bingham, Floor Broekgaarden, Ilya Mandel

Abstract: In this paper, a fast and parallelizable method based on Gaussian Processes (GPs) is introduced to emulate computer models that simulate the formation of binary black holes (BBHs) through the evolution of pairs of massive stars. Two obstacles that arise in this application are the a priori unknown conditions of BBH formation and the large scale of the simulation data. We address them by proposing… ▽ More In this paper, a fast and parallelizable method based on Gaussian Processes (GPs) is introduced to emulate computer models that simulate the formation of binary black holes (BBHs) through the evolution of pairs of massive stars. Two obstacles that arise in this application are the a priori unknown conditions of BBH formation and the large scale of the simulation data. We address them by proposing a local emulator which combines a GP classifier and a GP regression model. The resulting emulator can also be utilized in planning future computer simulations through a proposed criterion for sequential design. By propagating uncertainties of simulation input through the emulator, we are able to obtain the distribution of BBH properties under the distribution of physical parameters. △ Less

Submitted 2 June, 2021; originally announced June 2021.

Comments: 24 pages, 11 figures

arXiv:2105.04046 [pdf, other]

A likelihood approach to nonparametric estimation of a singular distribution using deep generative models

Authors: Minwoo Chae, Dongha Kim, Yongdai Kim, Lizhen Lin

Abstract: We investigate statistical properties of a likelihood approach to nonparametric estimation of a singular distribution using deep generative models. More specifically, a deep generative model is used to model high-dimensional data that are assumed to concentrate around some low-dimensional structure. Estimating the distribution supported on this low-dimensional structure, such as a low-dimensional… ▽ More We investigate statistical properties of a likelihood approach to nonparametric estimation of a singular distribution using deep generative models. More specifically, a deep generative model is used to model high-dimensional data that are assumed to concentrate around some low-dimensional structure. Estimating the distribution supported on this low-dimensional structure, such as a low-dimensional manifold, is challenging due to its singularity with respect to the Lebesgue measure in the ambient space. In the considered model, a usual likelihood approach can fail to estimate the target distribution consistently due to the singularity. We prove that a novel and effective solution exists by perturbing the data with an instance noise, which leads to consistent estimation of the underlying distribution with desirable convergence rates. We also characterize the class of distributions that can be efficiently estimated via deep generative models. This class is sufficiently general to contain various structured distributions such as product distributions, classically smooth distributions and distributions supported on a low-dimensional manifold. Our analysis provides some insights on how deep generative models can avoid the curse of dimensionality for nonparametric distribution estimation. We conduct a thorough simulation study and real data analysis to empirically demonstrate that the proposed data perturbation technique improves the estimation performance significantly. △ Less

Submitted 28 March, 2023; v1 submitted 9 May, 2021; originally announced May 2021.

Comments: 42 pages, 13 figures, 1 table

MSC Class: 62G05 (Primary); 62G20 (Secondary)

arXiv:2101.09809 [pdf, other]

NeurT-FDR: Controlling FDR by Incorporating Feature Hierarchy

Authors: Lin Qiu, Nils Murrugarra-Llerena, Vítor Silva, Lin Lin, Vernon M. Chinchilli

Abstract: Controlling false discovery rate (FDR) while leveraging the side information of multiple hypothesis testing is an emerging research topic in modern data science. Existing methods rely on the test-level covariates while ignoring possible hierarchy among the covariates. This strategy may not be optimal for complex large-scale problems, where hierarchical information often exists among those test-lev… ▽ More Controlling false discovery rate (FDR) while leveraging the side information of multiple hypothesis testing is an emerging research topic in modern data science. Existing methods rely on the test-level covariates while ignoring possible hierarchy among the covariates. This strategy may not be optimal for complex large-scale problems, where hierarchical information often exists among those test-level covariates. We propose NeurT-FDR which boosts statistical power and controls FDR for multiple hypothesis testing while leveraging the hierarchy among test-level covariates. Our method parametrizes the test-level covariates as a neural network and adjusts the feature hierarchy through a regression framework, which enables flexible handling of high-dimensional features as well as efficient end-to-end optimization. We show that NeurT-FDR has strong FDR guarantees and makes substantially more discoveries in synthetic and real datasets compared to competitive baselines. △ Less

Submitted 24 January, 2021; originally announced January 2021.

arXiv:2101.08639 [pdf, other]

A General Framework of Online Updating Variable Selection for Generalized Linear Models with Streaming Datasets

Authors: Xiaoyu Ma, Lu Lin, Yujie Gai

Abstract: In the research field of big data, one of important issues is how to recover the sequentially changing sets of true features when the data sets arrive sequentially. The paper presents a general framework for online updating variable selection and parameter estimation in generalized linear models with streaming datasets. This is a type of online updating penalty likelihoods with differentiable or n… ▽ More In the research field of big data, one of important issues is how to recover the sequentially changing sets of true features when the data sets arrive sequentially. The paper presents a general framework for online updating variable selection and parameter estimation in generalized linear models with streaming datasets. This is a type of online updating penalty likelihoods with differentiable or non-differentiable penalty function. The online updating coordinate descent algorithm is proposed to solve the online updating optimization problem. Moreover, a tuning parameter selection is suggested in an online updating way. The selection and estimation consistencies, and the oracle property are established, theoretically. Our methods are further examined and illustrated by various numerical examples from both simulation experiments and a real data analysis. △ Less

Submitted 21 January, 2021; originally announced January 2021.

Comments: 35 pages, 2 figures, 13 tables

arXiv:2011.12416 [pdf, other]

A spectral-based framework for hypothesis testing in populations of networks

Authors: Li Chen, Nathaniel Josephs, Lizhen Lin, Jie Zhou, Eric D. Kolaczyk

Abstract: In this paper, we propose a new spectral-based approach to hypothesis testing for populations of networks. The primary goal is to develop a test to determine whether two given samples of networks come from the same random model or distribution. Our test statistic is based on the trace of the third order for a centered and scaled adjacency matrix, which we prove converges to the standard normal dis… ▽ More In this paper, we propose a new spectral-based approach to hypothesis testing for populations of networks. The primary goal is to develop a test to determine whether two given samples of networks come from the same random model or distribution. Our test statistic is based on the trace of the third order for a centered and scaled adjacency matrix, which we prove converges to the standard normal distribution as the number of nodes tends to infinity. The asymptotic power guarantee of the test is also provided. The proper interplay between the number of networks and the number of nodes for each network is explored in characterizing the theoretical properties of the proposed testing statistics. Our tests are applicable to both binary and weighted networks, operate under a very general framework where the networks are allowed to be large and sparse, and can be extended to multiple-sample testing. We provide an extensive simulation study to demonstrate the superior performance of our test over existing methods and apply our test to three real datasets. △ Less

Submitted 24 November, 2020; originally announced November 2020.

arXiv:2011.09815 [pdf]

A sco** review of causal methods enabling predictions under hypothetical interventions

Authors: Li**g Lin, Matthew Sperrin, David A. Jenkins, Glen P. Martin, Niels Peek

Abstract: Background and Aims: The methods with which prediction models are usually developed mean that neither the parameters nor the predictions should be interpreted causally. However, when prediction models are used to support decision making, there is often a need for predicting outcomes under hypothetical interventions. We aimed to identify published methods for develo** and validating prediction mo… ▽ More Background and Aims: The methods with which prediction models are usually developed mean that neither the parameters nor the predictions should be interpreted causally. However, when prediction models are used to support decision making, there is often a need for predicting outcomes under hypothetical interventions. We aimed to identify published methods for develo** and validating prediction models that enable risk estimation of outcomes under hypothetical interventions, utilizing causal inference: their main methodological approaches, underlying assumptions, targeted estimands, and potential pitfalls and challenges with using the method, and unresolved methodological challenges. Methods: We systematically reviewed literature published by December 2019, considering papers in the health domain that used causal considerations to enable prediction models to be used for predictions under hypothetical interventions. Results: We identified 4919 papers through database searches and a further 115 papers through manual searches, of which 13 were selected for inclusion, from both the statistical and the machine learning literature. Most of the identified methods for causal inference from observational data were based on marginal structural models and g-estimation. Conclusions: There exist two broad methodological approaches for allowing prediction under hypothetical intervention into clinical prediction models: 1) enriching prediction models derived from observational studies with estimated causal effects from clinical trials and meta-analyses; and 2) estimating prediction models and causal effects directly from observational data. These methods require extending to dynamic treatment regimes, and consideration of multiple interventions to operationalise a clinical decision support system. Techniques for validating 'causal prediction models' are still in their infancy. △ Less

Submitted 12 January, 2021; v1 submitted 19 November, 2020; originally announced November 2020.

Journal ref: Diagnostic and Prognostic Research, 2021

arXiv:2010.08908 [pdf, other]

Accelerated Algorithms for Convex and Non-Convex Optimization on Manifolds

Authors: Lizhen Lin, Bayan Saparbayeva, Michael Minyi Zhang, David B. Dunson

Abstract: We propose a general scheme for solving convex and non-convex optimization problems on manifolds. The central idea is that, by adding a multiple of the squared retraction distance to the objective function in question, we "convexify" the objective function and solve a series of convex sub-problems in the optimization procedure. One of the key challenges for optimization on manifolds is the difficu… ▽ More We propose a general scheme for solving convex and non-convex optimization problems on manifolds. The central idea is that, by adding a multiple of the squared retraction distance to the objective function in question, we "convexify" the objective function and solve a series of convex sub-problems in the optimization procedure. One of the key challenges for optimization on manifolds is the difficulty of verifying the complexity of the objective function, e.g., whether the objective function is convex or non-convex, and the degree of non-convexity. Our proposed algorithm adapts to the level of complexity in the objective function. We show that when the objective function is convex, the algorithm provably converges to the optimum and leads to accelerated convergence. When the objective function is non-convex, the algorithm will converge to a stationary point. Our proposed method unifies insights from Nesterov's original idea for accelerating gradient descent algorithms with recent developments in optimization algorithms in Euclidean space. We demonstrate the utility of our algorithms on several manifold optimization tasks such as estimating intrinsic and extrinsic Fréchet means on spheres and low-rank matrix factorization with Grassmann manifolds applied to the Netflix rating data set. △ Less

Submitted 17 October, 2020; originally announced October 2020.

arXiv:2010.05295 [pdf, other]

Efficient Long-Range Convolutions for Point Clouds

Authors: Yifan Peng, Lin Lin, Lexing Ying, Leonardo Zepeda-Núñez

Abstract: The efficient treatment of long-range interactions for point clouds is a challenging problem in many scientific machine learning applications. To extract global information, one usually needs a large window size, a large number of layers, and/or a large number of channels. This can often significantly increase the computational cost. In this work, we present a novel neural network layer that direc… ▽ More The efficient treatment of long-range interactions for point clouds is a challenging problem in many scientific machine learning applications. To extract global information, one usually needs a large window size, a large number of layers, and/or a large number of channels. This can often significantly increase the computational cost. In this work, we present a novel neural network layer that directly incorporates long-range information for a point cloud. This layer, dubbed the long-range convolutional (LRC)-layer, leverages the convolutional theorem coupled with the non-uniform Fourier transform. In a nutshell, the LRC-layer mollifies the point cloud to an adequately sized regular grid, computes its Fourier transform, multiplies the result by a set of trainable Fourier multipliers, computes the inverse Fourier transform, and finally interpolates the result back to the point cloud. The resulting global all-to-all convolution operation can be performed in nearly-linear time asymptotically with respect to the number of input points. The LRC-layer is a particularly powerful tool when combined with local convolution as together they offer efficient and seamless treatment of both short and long range interactions. We showcase this framework by introducing a neural network architecture that combines LRC-layers with short-range convolutional layers to accurately learn the energy and force associated with a $N$-body potential. We also exploit the induced two-level decomposition and propose an efficient strategy to train the combined architecture with a reduced number of samples. △ Less

Submitted 11 October, 2020; originally announced October 2020.

arXiv:2010.05170 [pdf, other]

What causes the test error? Going beyond bias-variance via ANOVA

Authors: Licong Lin, Edgar Dobriban

Abstract: Modern machine learning methods are often overparametrized, allowing adaptation to the data at a fine level. This can seem puzzling; in the worst case, such models do not need to generalize. This puzzle inspired a great amount of work, arguing when overparametrization reduces test error, in a phenomenon called "double descent". Recent work aimed to understand in greater depth why overparametrizati… ▽ More Modern machine learning methods are often overparametrized, allowing adaptation to the data at a fine level. This can seem puzzling; in the worst case, such models do not need to generalize. This puzzle inspired a great amount of work, arguing when overparametrization reduces test error, in a phenomenon called "double descent". Recent work aimed to understand in greater depth why overparametrization is helpful for generalization. This leads to discovering the unimodality of variance as a function of the level of parametrization, and to decomposing the variance into that arising from label noise, initialization, and randomness in the training data to understand the sources of the error. In this work we develop a deeper understanding of this area. Specifically, we propose using the analysis of variance (ANOVA) to decompose the variance in the test error in a symmetric way, for studying the generalization performance of certain two-layer linear and non-linear networks. The advantage of the analysis of variance is that it reveals the effects of initialization, label noise, and training data more clearly than prior approaches. Moreover, we also study the monotonicity and unimodality of the variance components. While prior work studied the unimodality of the overall variance, we study the properties of each term in variance decomposition. One key insight is that in typical settings, the interaction between training samples and initialization can dominate the variance; surprisingly being larger than their marginal effect. Also, we characterize "phase transitions" where the variance changes from unimodal to monotone. On a technical level, we leverage advanced deterministic equivalent techniques for Haar random matrices, that -- to our knowledge -- have not yet been used in the area. We also verify our results in numerical simulations and on empirical data examples. △ Less

Submitted 9 June, 2021; v1 submitted 11 October, 2020; originally announced October 2020.

arXiv:2010.00435 [pdf, other]

Community detection, pattern recognition, and hypergraph-based learning: approaches using metric geometry and persistent homology

Authors: Dong Quan Ngoc Nguyen, Lin Xing, Lizhen Lin

Abstract: Hypergraph data appear and are hidden in many places in the modern age. They are data structure that can be used to model many real data examples since their structures contain information about higher order relations among data points. One of the main contributions of our paper is to introduce a new topological structure to hypergraph data which bears a resemblance to a usual metric space structu… ▽ More Hypergraph data appear and are hidden in many places in the modern age. They are data structure that can be used to model many real data examples since their structures contain information about higher order relations among data points. One of the main contributions of our paper is to introduce a new topological structure to hypergraph data which bears a resemblance to a usual metric space structure. Using this new topological space structure of hypergraph data, we propose several approaches to study community detection problem, detecting persistent features arising from homological structure of hypergraph data. Also based on the topological space structure of hypergraph data introduced in our paper, we introduce a modified nearest neighbors methods which is a generalization of the classical nearest neighbors methods from machine learning. Our modified nearest neighbors methods have an advantage of being very flexible and applicable even for discrete structures as in hypergraphs. We then apply our modified nearest neighbors methods to study sign prediction problem in hypegraph data constructed using our method. △ Less

Submitted 29 September, 2020; originally announced October 2020.

Showing 1–50 of 125 results for author: Lin, L