Search | arXiv e-print repository

Variational Nonparametric Inference in Functional Stochastic Block Model

Authors: Zuofeng Shang, Peijun Sang, Yang Feng, Chong **

Abstract: We propose a functional stochastic block model whose vertices involve functional data information. This new model extends the classic stochastic block model with vector-valued nodal information, and finds applications in real-world networks whose nodal information could be functional curves. Examples include international trade data, in which a network vertex (country) is associated with annual or quarterly GDP over a certain time period, and MyFitnessPal data, in which a network vertex (MyFitnessPal user) is associated with daily calorie information measured over a certain time period. Two statistical tasks are executed jointly. First, we detect community structures of the network vertices assisted by the functional nodal information. Second, we propose a computationally efficient variational test to examine the significance of the functional nodal information. We show that the community detection algorithms achieve weak and strong consistency, and that the variational test is asymptotically chi-square with diverging degrees of freedom. As a byproduct, we propose pointwise confidence intervals for the slope function of the functional nodal information. Our methods are examined on both simulated and real datasets.

Submitted 29 June, 2024; originally announced July 2024.
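To make the setting concrete, here is a minimal Python sketch of simulated data of this kind: a two-community stochastic block model whose nodes each carry a discretized curve. It is a toy generator under stated assumptions (the function names, the block probabilities p_in and p_out, the sine/cosine mean curves and the Gaussian noise are all hypothetical), not the authors' model or code, and it does not implement the variational test.

```python
import math
import random

def simulate_functional_sbm(n, p_in, p_out, grid_size=20, noise_sd=0.3, seed=0):
    """Toy two-community SBM whose nodes also carry a noisy curve.

    Hypothetical illustration: block probabilities, sin/cos mean curves and
    Gaussian noise are assumptions, not the paper's model.
    """
    rng = random.Random(seed)
    labels = [i % 2 for i in range(n)]  # alternate nodes between communities 0 and 1
    # Symmetric 0/1 adjacency matrix without self-loops.
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            p = p_in if labels[i] == labels[j] else p_out
            if rng.random() < p:
                adj[i][j] = adj[j][i] = 1
    # Each node's curve: a community-specific mean function plus noise.
    grid = [k / (grid_size - 1) for k in range(grid_size)]
    curves = []
    for i in range(n):
        mean = math.sin if labels[i] == 0 else math.cos
        curves.append([mean(2 * math.pi * t) + rng.gauss(0.0, noise_sd) for t in grid])
    return labels, adj, curves

def edge_density(adj, labels, same_block):
    """Fraction of node pairs (within or between blocks) that are connected."""
    edges = pairs = 0
    n = len(adj)
    for i in range(n):
        for j in range(i + 1, n):
            if (labels[i] == labels[j]) == same_block:
                pairs += 1
                edges += adj[i][j]
    return edges / pairs
```

A community detection method can then be scored against `labels`; with p_in much larger than p_out, within-block edges are far denser than between-block edges.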

arXiv:2312.07727

Two-sample inference for sparse functional data

Authors: Chi Zhang, Peijun Sang, Yingli Qin

Abstract: We propose a novel test procedure for comparing mean functions across two groups within the reproducing kernel Hilbert space (RKHS) framework. Our proposed method is adept at handling sparsely and irregularly sampled functional data when observation times are random for each subject. Conventional approaches, which are built upon functional principal components analysis, usually assume a homogeneous covariance structure across groups. Nonetheless, justifying this assumption in real-world scenarios can be challenging. To eliminate the need for a homogeneous covariance structure, we first develop the functional Bahadur representation for the mean estimator under the RKHS framework; this representation naturally leads to the desirable pointwise limiting distributions. Moreover, we establish weak convergence for the mean estimator, allowing us to construct a test statistic for the mean difference. Our method is easily implementable and outperforms some conventional tests in controlling type I errors across various settings. We demonstrate the finite sample performance of our approach through extensive simulations and two real-world applications.

Submitted 29 December, 2023; v1 submitted 12 December, 2023; originally announced December 2023.

arXiv:2310.10952

Restricted Tweedie Stochastic Block Models

Authors: Jie Jian, Mu Zhu, Peijun Sang

Abstract: The stochastic block model (SBM) is a widely used framework for community detection in networks, where the network structure is typically represented by an adjacency matrix. However, conventional SBMs are not directly applicable to an adjacency matrix that consists of non-negative zero-inflated continuous edge weights. To model the international trading network, where edge weights represent trading values between countries, we propose an innovative SBM based on a restricted Tweedie distribution. Additionally, we incorporate nodal information, such as the geographical distance between countries, and account for its dynamic effect on edge weights. Notably, we show that given a sufficiently large number of nodes, estimating this covariate effect becomes independent of community labels of each node when computing the maximum likelihood estimator of parameters in our model. This result enables the development of an efficient two-step algorithm that separates the estimation of covariate effects from other parameters. We demonstrate the effectiveness of our proposed method through extensive simulation studies and an application to real-world international trading data.

Submitted 16 October, 2023; originally announced October 2023.
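The restricted Tweedie model builds on the Tweedie family with power parameter 1 < p < 2, which is a compound Poisson-gamma distribution: a Poisson number of gamma summands, so exact zeros occur with probability exp(-λ) and positive values are continuous. The sketch below samples from that standard family only, using the usual parameterization λ = μ^(2-p)/(φ(2-p)), gamma shape (2-p)/(p-1) and gamma scale φ(p-1)μ^(p-1), which gives mean μ and variance φμ^p. The paper's restriction, community structure and covariate effect are not reproduced, and `sample_tweedie` is a hypothetical helper.

```python
import math
import random

def sample_tweedie(mu, phi, p, rng):
    """One draw from a Tweedie distribution with 1 < p < 2 (compound Poisson-gamma form)."""
    if not 1.0 < p < 2.0:
        raise ValueError("the compound Poisson-gamma form requires 1 < p < 2")
    lam = mu ** (2 - p) / (phi * (2 - p))   # Poisson rate for the number of summands
    shape = (2 - p) / (p - 1)               # gamma shape of each summand
    scale = phi * (p - 1) * mu ** (p - 1)   # gamma scale of each summand
    # Knuth's multiplicative Poisson sampler (adequate for small rates).
    threshold, count, prod = math.exp(-lam), 0, rng.random()
    while prod > threshold:
        count += 1
        prod *= rng.random()
    # A zero count yields an exact zero edge weight; otherwise add gamma variables.
    return sum(rng.gammavariate(shape, scale) for _ in range(count))

rng = random.Random(1)
draws = [sample_tweedie(0.5, 1.0, 1.5, rng) for _ in range(20000)]
zero_share = sum(d == 0 for d in draws) / len(draws)
```

With μ = 0.5, φ = 1 and p = 1.5 the Poisson rate is λ = √2, so about exp(-√2) ≈ 24% of the simulated weights should be exactly zero, mimicking zero-inflated trade values.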

arXiv:2304.02127

A Bayesian Collocation Integral Method for Parameter Estimation in Ordinary Differential Equations

Authors: Mingwei Xu, Samuel W. K. Wong, Peijun Sang

Abstract: Inferring the parameters of ordinary differential equations (ODEs) from noisy observations is an important problem in many scientific fields. Currently, most parameter estimation methods that bypass numerical integration tend to rely on basis functions or Gaussian processes to approximate the ODE solution and its derivatives. Due to the sensitivity of the ODE solution to its derivatives, these methods can be hindered by estimation error, especially when only sparse time-course observations are available. We present a Bayesian collocation framework that operates on the integrated form of the ODEs and also avoids the expensive use of numerical solvers. Our methodology has the capability to handle general nonlinear ODE systems. We demonstrate the accuracy of the proposed method through simulation studies, where the estimated parameters and recovered system trajectories are compared with other recent methods. A real data example is also provided.

Submitted 23 October, 2023; v1 submitted 4 April, 2023; originally announced April 2023.
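The benefit of working with the integrated form can be illustrated without any ODE solver. For the scalar ODE dx/dt = -θx, integrating gives x(t) - x(t_0) = -θ ∫ x(s) ds, so θ can be fitted by least squares against trapezoidal integrals of the data. This is a deterministic, non-Bayesian integral-matching sketch on an assumed toy ODE with noiseless data; the function names are hypothetical and it is not the authors' Bayesian collocation method.

```python
import math

def cumulative_trapezoid(t, x):
    """Running trapezoidal approximation of the integral of x from t[0] to t[i]."""
    out = [0.0]
    for i in range(1, len(t)):
        out.append(out[-1] + 0.5 * (t[i] - t[i - 1]) * (x[i] + x[i - 1]))
    return out

def integral_matching_decay_rate(t, x):
    """Least-squares estimate of theta in dx/dt = -theta * x using the integrated form
    x(t) - x(t0) = -theta * integral_{t0}^{t} x(s) ds; no derivatives or solver needed."""
    integrals = cumulative_trapezoid(t, x)
    num = sum((x[0] - xi) * ii for xi, ii in zip(x, integrals))
    den = sum(ii * ii for ii in integrals)
    return num / den
```

Because only integrals of the observations enter, the estimate avoids differentiating noisy data, which is the sensitivity the abstract points to.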

arXiv:2302.02457

Scalable inference in functional linear regression with streaming data

Authors: **han Xie, Enze Shi, Peijun Sang, Zuofeng Shang, Bei Jiang, Linglong Kong

Abstract: Traditional static functional data analysis is facing new challenges due to streaming data, where data constantly flow in. A major challenge is that storing such an ever-increasing amount of data in memory is nearly impossible. In addition, existing inferential tools in online learning are mainly developed for finite-dimensional problems, while inference methods for functional data are focused on the batch learning setting. In this paper, we tackle these issues by developing functional stochastic gradient descent algorithms and proposing an online bootstrap resampling procedure to systematically study the inference problem for functional linear regression. In particular, the proposed estimation and inference procedures use only one pass over the data; thus they are easy to implement and suitable for settings where data arrive in a streaming manner. Furthermore, we establish the convergence rate as well as the asymptotic distribution of the proposed estimator. Meanwhile, the perturbed estimator from the bootstrap procedure is shown to enjoy the same theoretical properties, which provides the theoretical justification for our online inference tool. As far as we know, this is the first inference result on the functional linear regression model with streaming data. Simulation studies are conducted to investigate the finite-sample performance of the proposed procedure. An application is illustrated with the Beijing multi-site air-quality data.

Submitted 10 October, 2023; v1 submitted 5 February, 2023; originally announced February 2023.

Comments: At the request of one of the co-authors, we have tentatively withdrawn the manuscript.
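A rough Python sketch of one-pass stochastic gradient descent for the functional linear model Y = ∫β(t)X(t)dt + ε, with curves stored on a midpoint grid, follows. It is a simplified L2 version under stated assumptions (the step-size schedule step0/(i+1)^decay, the three-term sine basis and the noise level in `simulated_stream` are hypothetical), not the paper's RKHS-based algorithm, and it omits the online bootstrap. Each observation is used once and then discarded, which is what makes the approach suitable for streams.

```python
import math
import random

def functional_sgd(stream, grid_size, step0=0.1, decay=0.5):
    """One pass of SGD for Y = <beta, X> + noise, with curves stored on a midpoint grid.

    <f, g> is approximated by (1/m) * sum_j f(t_j) g(t_j).
    Simplified L2 sketch; not the paper's RKHS algorithm.
    """
    m = grid_size
    beta = [0.0] * m
    for i, (x, y) in enumerate(stream, start=1):
        gamma = step0 / (i + 1) ** decay                 # decaying step size
        residual = y - sum(b * xj for b, xj in zip(beta, x)) / m
        for j in range(m):                               # gradient step on squared loss
            beta[j] += gamma * residual * x[j]
        # (x, y) is not stored: memory stays O(m) however long the stream is.
    return beta

def simulated_stream(n, grid, rng, noise_sd=0.1):
    """Hypothetical stream: X = sum_k xi_k * phi_k, true beta = phi_1 + 0.5 * phi_2."""
    basis = [[math.sqrt(2) * math.sin(k * math.pi * t) for t in grid] for k in (1, 2, 3)]
    for _ in range(n):
        xi = [rng.gauss(0.0, 1.0) for _ in basis]
        x = [sum(c * phi[j] for c, phi in zip(xi, basis)) for j in range(len(grid))]
        y = xi[0] + 0.5 * xi[1] + rng.gauss(0.0, noise_sd)   # <beta, X> by orthonormality
        yield x, y
```

Usage: with `m = 30`, `grid = [(j + 0.5) / m for j in range(m)]` and `functional_sgd(simulated_stream(10000, grid, random.Random(0)), m)`, the estimate should lie close to the true slope function on the grid.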

arXiv:2207.08211

Nonlinear function-on-function regression by RKHS

Authors: Peijun Sang, Bing Li

Abstract: We propose a nonlinear function-on-function regression model where both the covariate and the response are random functions. The nonlinear regression is carried out in two steps: we first construct Hilbert spaces to accommodate the functional covariate and the functional response, and then build a second-layer Hilbert space for the covariate to capture nonlinearity. The second-layer space is assumed to be a reproducing kernel Hilbert space, which is generated by a positive definite kernel determined by the inner product of the first-layer Hilbert space for the covariate $X$; this structure is known as the nested Hilbert spaces. We develop estimation procedures to implement the proposed method, which allows the functional data to be observed at different time points for different subjects. Furthermore, we establish the convergence rate of our estimator as well as the weak convergence of the predicted response in the Hilbert space. Numerical studies including both simulations and a data application are conducted to investigate the performance of our estimator in finite samples.

Submitted 17 July, 2022; originally announced July 2022.
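The nested construction can be sketched as follows: the first-layer inner product between two curves is approximated by a Riemann sum, and a second-layer kernel is built from it. Here the Gaussian kernel exp(-γ‖X_i - X_j‖²) is an assumed choice, with the squared distance expanded entirely through first-layer inner products; the function names and γ are hypothetical, and the regression step itself is omitted.

```python
import math

def l2_inner(x, y, dt):
    """Riemann-sum approximation of the L2 inner product of two curves on a common grid."""
    return dt * sum(a * b for a, b in zip(x, y))

def nested_gaussian_gram(curves, dt, gamma=1.0):
    """Gram matrix K[i][j] = exp(-gamma * ||X_i - X_j||^2), where the squared norm is
    expanded through the first-layer inner product: <Xi,Xi> - 2<Xi,Xj> + <Xj,Xj>."""
    n = len(curves)
    inner = [[l2_inner(curves[i], curves[j], dt) for j in range(n)] for i in range(n)]
    return [[math.exp(-gamma * (inner[i][i] - 2 * inner[i][j] + inner[j][j]))
             for j in range(n)] for i in range(n)]
```

Because the kernel depends on the curves only through first-layer inner products, the same code works for any first-layer Hilbert space once `l2_inner` is swapped for its inner product.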

arXiv:2204.13197

Stopping time detection of wood panel compression: A functional time series approach

Authors: H. L. Shang, J. Cao, P. Sang

Abstract: We consider determining the optimal stopping time for the glue curing of wood panels in an automatic process environment. Using near-infrared spectroscopy to monitor the manufacturing process ensures substantial savings in energy and time. We collect a time series of curves from a near-infrared spectrum probe consisting of 72 spectra and aim to detect an optimal stopping time. We propose an estimation procedure to determine the optimal stopping time of wood panel compression and the estimation uncertainty associated with the estimated stopping time. Our method first divides the entire data set into a training sample and a testing sample, then iteratively computes integrated squared forecast errors based on the testing sample. We then apply a structural break detection method with one breakpoint to determine an estimated optimal stopping time from a univariate time series of the integrated squared forecast errors. We also investigate the finite-sample performance of the proposed method via a series of simulation studies.

Submitted 27 April, 2022; originally announced April 2022.

Comments: 24 pages, 5 figures, to appear in the Journal of the Royal Statistical Society: Series C

MSC Class: 62R10
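The two computational ingredients described above, an integrated squared forecast error for each curve and a single-breakpoint search on the resulting univariate series, might look like this in a minimal Python sketch. The break detector shown is an assumed least-squares mean-shift rule, which may differ from the structural break method used in the paper, and the forecasts that feed it are left hypothetical.

```python
def integrated_squared_error(forecast, observed, dt):
    """Riemann-sum approximation of the integrated squared forecast error of one curve."""
    return dt * sum((f - o) ** 2 for f, o in zip(forecast, observed))

def single_mean_break(series):
    """Index k minimising the residual sum of squares when series[:k] and series[k:]
    are each fitted by their own mean (one structural break in the mean)."""
    def ssr(seg):
        mean = sum(seg) / len(seg)
        return sum((v - mean) ** 2 for v in seg)
    n = len(series)
    return min(range(1, n), key=lambda k: ssr(series[:k]) + ssr(series[k:]))
```

In this setting, the series passed to `single_mean_break` would be the integrated squared forecast errors over successive time points, and the returned index marks where the errors settle, i.e. a candidate stopping time.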

arXiv:2203.14760

Functional principal component analysis for longitudinal observations with sampling at random

Authors: Peijun Sang, Dehan Kong, Shu Yang

Abstract: Functional principal component analysis has been shown to be invaluable for revealing variation modes of longitudinal outcomes, which serve as important building blocks for forecasting and model building. Decades of research have advanced methods for functional principal component analysis, often assuming independence between the observation times and longitudinal outcomes. Yet such assumptions are fragile in real-world settings where observation times may be driven by outcome-related reasons. Rather than ignoring the informative observation time process, we explicitly model the observation times by a counting process dependent on time-varying prognostic factors. Identification of the mean, covariance function, and functional principal components ensues via inverse intensity weighting. We propose using weighted penalized splines for estimation and establish consistency and convergence rates for the weighted estimators. Simulation studies demonstrate that the proposed estimators are substantially more accurate than the existing ones in the presence of a correlation between the observation time process and the longitudinal outcome process. We further examine the finite-sample performance of the proposed method using the Acute Infection and Early Disease Research Program study.

Submitted 28 March, 2022; originally announced March 2022.
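Inverse intensity weighting can be illustrated with a simple kernel estimate of the mean at time t, where each observation is divided by the intensity of the observation-time process at that time. This is a sketch under stated assumptions: the intensities are taken as known, `iiw_mean` is a hypothetical helper, and a Gaussian kernel smoother with a user-chosen bandwidth stands in for the weighted penalized splines the paper actually uses.

```python
import math

def iiw_mean(t, times, values, intensity, bandwidth):
    """Inverse-intensity-weighted kernel estimate of the mean outcome at time t.

    Each observation is down-weighted by the (assumed known) intensity of the
    visit process at its observation time, so periods that are sampled more
    often, possibly for outcome-related reasons, do not dominate the estimate.
    A Gaussian kernel smoother stands in for weighted penalized splines.
    """
    num = den = 0.0
    for s, y, lam in zip(times, values, intensity):
        w = math.exp(-0.5 * ((t - s) / bandwidth) ** 2) / lam
        num += w * y
        den += w
    return num / den
```

For example, two observations equidistant from t = 0.5 with values 0 and 3 and intensities 1 and 2 give weights in the ratio 2:1, so the estimate is 1 rather than the unweighted 1.5.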

arXiv:2202.11747

Statistical Inference for Functional Linear Quantile Regression

Authors: Peijun Sang, Zuofeng Shang, Pang Du

Abstract: We propose inferential tools for functional linear quantile regression where the conditional quantile of a scalar response is assumed to be a linear functional of a functional covariate. In contrast to conventional approaches, we employ kernel convolution to smooth the original loss function. The coefficient function is estimated under a reproducing kernel Hilbert space framework. A gradient descent algorithm is designed to minimize the smoothed loss function with a roughness penalty. With the aid of the Banach fixed-point theorem, we show the existence and uniqueness of our proposed estimator as the minimizer of the regularized loss function in an appropriate Hilbert space. Furthermore, we establish the convergence rate as well as the weak convergence of our estimator. As far as we know, this is the first weak convergence result for a functional quantile regression model. Pointwise confidence intervals and a simultaneous confidence band for the true coefficient function are then developed based on these theoretical properties. Numerical studies including both simulations and a data application are conducted to investigate the performance of our estimator and inference tools in finite samples.

Submitted 23 February, 2022; originally announced February 2022.
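Convolving the check loss ρ_τ(u) = u(τ - 1{u < 0}) with a Gaussian kernel of bandwidth h gives a closed form, ℓ_h(u) = u(τ - Φ(-u/h)) + hφ(u/h), whose derivative Φ(u/h) - (1 - τ) is smooth, so plain gradient descent applies. The Python sketch below assumes a Gaussian kernel and uses a scalar location model as a stand-in; it is not the paper's RKHS estimator for the coefficient function, and the helper names and tuning values (h, lr, iters) are hypothetical.

```python
import math

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def smoothed_check_loss(u, tau, h):
    """Check loss rho_tau convolved with a Gaussian kernel of bandwidth h (closed form)."""
    pdf = math.exp(-0.5 * (u / h) ** 2) / math.sqrt(2.0 * math.pi)
    return u * (tau - norm_cdf(-u / h)) + h * pdf

def smoothed_check_grad(u, tau, h):
    """Derivative of smoothed_check_loss in u; smooth, unlike the kinked check loss."""
    return norm_cdf(u / h) - (1.0 - tau)

def smoothed_quantile(ys, tau, h=0.5, lr=20.0, iters=500):
    """Gradient descent on mean_i loss(y_i - q): a scalar stand-in for the functional model."""
    q = 0.0
    for _ in range(iters):
        # d/dq of the mean loss is -mean(grad(y - q)), so descent adds lr * mean(grad).
        q += lr * sum(smoothed_check_grad(y - q, tau, h) for y in ys) / len(ys)
    return q
```

As h shrinks, ℓ_h approaches ρ_τ, while a moderate h keeps the gradient continuous, which is the motivation for smoothing the original loss.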

arXiv:2202.07099

Two Gaussian regularization methods for time-varying networks

Authors: Jie Jian, Peijun Sang, Mu Zhu

Abstract: We model time-varying network data as realizations from multivariate Gaussian distributions with precision matrices that change over time. To facilitate parameter estimation, we require not only that each precision matrix at any given time point be sparse, but also that precision matrices at neighboring time points be similar. We accomplish this with two different algorithms, by generalizing the elastic net and the fused LASSO, respectively. Our main focus is on efficient computational algorithms and convenient degree-of-freedom formulae for choosing tuning parameters. We illustrate our methods with two simulation studies. By applying them to an fMRI data set, we also detect some interesting differences in brain connectivity between healthy individuals and ADHD patients.

Submitted 9 March, 2022; v1 submitted 14 February, 2022; originally announced February 2022.
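The two structural requirements can be written as penalties on a sequence of precision matrices Ω_1, ..., Ω_T: an L1 term on off-diagonal entries for sparsity, plus a term tying neighboring matrices together. The sketch below computes a fused-lasso-style version (absolute differences) and an elastic-net-style version (squared differences). Pairing the squared-difference term with the elastic-net generalization is an assumption, the function names are hypothetical, and the Gaussian likelihood term and both optimization algorithms are omitted.

```python
def offdiag_l1(omega):
    """Sum of absolute off-diagonal entries: the sparsity (lasso) part."""
    p = len(omega)
    return sum(abs(omega[i][j]) for i in range(p) for j in range(p) if i != j)

def temporal_penalties(omegas, lam1, lam2):
    """Return (fused-lasso-style, elastic-net-style) penalties for a sequence of precision matrices."""
    sparsity = lam1 * sum(offdiag_l1(om) for om in omegas)
    fused = smooth = 0.0
    for prev, cur in zip(omegas, omegas[1:]):          # neighboring time points
        for row_prev, row_cur in zip(prev, cur):
            for a, b in zip(row_prev, row_cur):
                fused += abs(b - a)                     # encourages exactly equal entries
                smooth += (b - a) ** 2                  # encourages small changes
    return sparsity + lam2 * fused, sparsity + lam2 * smooth
```

The absolute-difference version tends to keep entries exactly constant between time points, while the squared-difference version lets them drift gradually, which is one way to read the contrast between the two algorithms.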

arXiv:2103.04504

A reproducing kernel Hilbert space framework for functional data classification

Authors: Peijun Sang, Adam B Kashlak, Linglong Kong

Abstract: We encounter a bottleneck when we try to borrow the strength of classical classifiers to classify functional data. The major issue is that functional data are intrinsically infinite dimensional, so classical classifiers either cannot be applied directly or perform poorly due to the curse of dimensionality. To address this concern, we propose to project functional data onto one specific direction and then build a distance-weighted discrimination (DWD) classifier upon the projection score. The projection direction is identified by minimizing, over a reproducing kernel Hilbert space, an empirical risk function that contains the particular loss function of a DWD classifier. Hence our proposed classifier can avoid overfitting and enjoys the appealing properties of DWD classifiers. This framework is further extended to accommodate functional data classification problems where scalar covariates are involved. In contrast to previous work, we establish a non-asymptotic estimation error bound on the relative misclassification rate. In finite samples, we demonstrate through simulation studies and a real-world application that the proposed classifiers compare favorably with some commonly used functional classifiers in terms of prediction accuracy.

Submitted 7 March, 2021; originally announced March 2021.
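For reference, the standard DWD loss (with exponent q = 1) is V(u) = 1 - u for u ≤ 1/2 and V(u) = 1/(4u) otherwise, evaluated at the margin u = y(⟨X, w⟩ + b). The sketch below computes this loss and the corresponding empirical risk for discretized curves. The direction w and intercept b are supplied by hand (hypothetical values), whereas the paper estimates the direction by minimizing a penalized version of this risk over an RKHS.

```python
def dwd_loss(u):
    """Standard DWD loss (q = 1): linear for small margins, 1/(4u) beyond u = 1/2."""
    return 1.0 - u if u <= 0.5 else 1.0 / (4.0 * u)

def projection_score(curve, direction, intercept, dt):
    """Projection <X, w> + b of a discretized curve onto a direction function w."""
    return intercept + dt * sum(x * w for x, w in zip(curve, direction))

def dwd_empirical_risk(curves, labels, direction, intercept, dt):
    """Average DWD loss of y_i * score_i with labels y_i in {-1, +1}."""
    scores = [projection_score(c, direction, intercept, dt) for c in curves]
    return sum(dwd_loss(y * s) for y, s in zip(labels, scores)) / len(curves)
```

Unlike the hinge loss, this loss never reaches zero, so every observation keeps some influence on the fitted direction, a property often cited in favor of DWD in high dimensions.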

arXiv:2102.06130

Continuum centroid classifier for functional data

Authors: Zhiyang Zhou, Peijun Sang

Abstract: Aiming at the binary classification of functional data, we propose the continuum centroid classifier (CCC), built upon projections of functional data onto one specific direction. This direction is obtained by bridging regression and classification. Because it controls the extent of supervision, our technique is neither unsupervised nor fully supervised. Thanks to the intrinsic infinite dimension of functional data, one of the two subtypes of CCC enjoys an (asymptotically) zero misclassification rate. Our proposal includes an effective algorithm that yields a consistent empirical counterpart of CCC. Simulation studies demonstrate the performance of CCC in different scenarios. Finally, we apply CCC to two real examples.

Submitted 11 February, 2021; originally announced February 2021.

Comments: 38 pages, 4 figures, 2 tables

MSC Class: 62G08; 62H30
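Once a projection direction is available, the centroid part of the classifier reduces to a one-dimensional rule: project every training curve, average the scores within each class, and assign a new curve to the class with the nearer projected centroid. In this minimal sketch the direction is taken as given (a hypothetical input) and the function names are illustrative; how CCC obtains the direction by bridging regression and classification is not reproduced.

```python
def project(curve, direction, dt):
    """Riemann-sum approximation of the projection score <X, beta>."""
    return dt * sum(x * b for x, b in zip(curve, direction))

def fit_centroid_classifier(curves, labels, direction, dt):
    """Projected class centroids for labels 0 and 1 along a given direction."""
    centroids = {}
    for cls in (0, 1):
        scores = [project(c, direction, dt) for c, y in zip(curves, labels) if y == cls]
        centroids[cls] = sum(scores) / len(scores)
    return centroids

def predict(curve, centroids, direction, dt):
    """Assign the class whose projected centroid is closer to the curve's projection score."""
    s = project(curve, direction, dt)
    return min(centroids, key=lambda cls: abs(s - centroids[cls]))
```

Because classification happens on a single score, the quality of the rule depends almost entirely on how well the chosen direction separates the two classes.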

Showing 1–12 of 12 results for author: Sang, P