
Showing 1–33 of 33 results for author: Sha, X

Searching in archive stat.

  1. arXiv:2404.07451  [pdf, other]

    stat.CO

    SNSeg: An R Package for Time Series Segmentation via Self-Normalization

    Authors: Shubo Sun, Zifeng Zhao, Feiyu Jiang, Xiaofeng Shao

    Abstract: Time series segmentation aims to identify potential change-points in a sequence of temporally dependent data, so that the original sequence can be partitioned into several homogeneous subsequences. It is useful for modeling and predicting non-stationary time series and is widely applied in natural and social sciences. Existing segmentation methods primarily focus on only one type of parameter chan…

    Submitted 10 April, 2024; originally announced April 2024.

  2. arXiv:2311.11054  [pdf, other]

    stat.ME

    Modern extreme value statistics for Utopian extremes

    Authors: Jordan Richards, Noura Alotaibi, Daniela Cisneros, Yan Gong, Matheus B. Guerrero, Paolo Redondo, Xuanjie Shao

    Abstract: Capturing the extremal behaviour of data often requires bespoke marginal and dependence models which are grounded in rigorous asymptotic theory, and hence provide reliable extrapolation into the upper tails of the data-generating distribution. We present a toolbox of four methodological frameworks, motivated by modern extreme value theory, that can be used to accurately estimate extreme exceedance…

    Submitted 1 May, 2024; v1 submitted 18 November, 2023; originally announced November 2023.

  3. arXiv:2311.09419  [pdf, other]

    stat.ME

    Change-point Inference for High-dimensional Heteroscedastic Data

    Authors: Teng Wu, Stanislav Volgushev, Xiaofeng Shao

    Abstract: We propose a bootstrap-based test to detect a mean shift in a sequence of high-dimensional observations with unknown time-varying heteroscedasticity. The proposed test builds on the U-statistic based approach in Wang et al. (2022), targets a dense alternative, and adopts a wild bootstrap procedure to generate critical values. The bootstrap-based test is free of tuning parameters and is capable of…

    Submitted 15 November, 2023; originally announced November 2023.

    Comments: Accepted by Electronic Journal of Statistics

    MSC Class: 62F40; 62H15

  4. arXiv:2307.04318  [pdf, other]

    stat.ME

    Two-Sample and Change-Point Inference for Non-Euclidean Valued Time Series

    Authors: Feiyu Jiang, Changbo Zhu, Xiaofeng Shao

    Abstract: Data objects taking value in a general metric space have become increasingly common in modern data analysis. In this paper, we study two important statistical inference problems, namely, two-sample testing and change-point detection, for such non-Euclidean data under temporal dependence. Typical examples of non-Euclidean valued time series include yearly mortality distributions, time-varying netwo…

    Submitted 9 July, 2023; originally announced July 2023.

  5. Slicing-free Inverse Regression in High-dimensional Sufficient Dimension Reduction

    Authors: Qing Mai, Xiaofeng Shao, Runmin Wang, Xin Zhang

    Abstract: Sliced inverse regression (SIR, Li 1991) is a pioneering work and the most recognized method in sufficient dimension reduction. While promising progress has been made in theory and methods of high-dimensional SIR, two remaining challenges are still nagging high-dimensional multivariate applications. First, choosing the number of slices in SIR is a difficult problem, and it depends on the sample si…

    Submitted 12 April, 2023; originally announced April 2023.

  6. arXiv:2303.10808  [pdf, other]

    stat.ME

    Dimension-agnostic Change Point Detection

    Authors: Hanjia Gao, Runmin Wang, Xiaofeng Shao

    Abstract: Change point testing for high-dimensional data has attracted a lot of attention in statistics and machine learning owing to the emergence of high-dimensional data with structural breaks from many fields. In practice, when the dimension is less than the sample size but is not small, it is often unclear whether a method that is tailored to high-dimensional data or simply a classical method that is d…

    Submitted 3 December, 2023; v1 submitted 19 March, 2023; originally announced March 2023.

  7. arXiv:2303.08197  [pdf, ps, other]

    math.ST stat.ME

    Adaptive Testing for High-dimensional Data

    Authors: Yangfan Zhang, Runmin Wang, Xiaofeng Shao

    Abstract: In this article, we propose a class of $L_q$-norm based U-statistics for a family of global testing problems related to high-dimensional data. This includes testing of mean vector and its spatial sign, simultaneous testing of linear model coefficients, and testing of component-wise independence for high-dimensional observations, among others. Under the null hypothesis, we derive asymptotic normali…

    Submitted 14 March, 2023; originally announced March 2023.

  8. arXiv:2302.12322  [pdf, other]

    stat.ME

    Testing Serial Independence of Object-Valued Time Series

    Authors: Feiyu Jiang, Hanjia Gao, Xiaofeng Shao

    Abstract: We propose a novel method for testing serial independence of object-valued time series in metric spaces, which is more general than Euclidean or Hilbert spaces. The proposed method is fully nonparametric, free of tuning parameters, and can capture all nonlinear pairwise dependence. The key concept used in this paper is the distance covariance in metric spaces, which is extended to auto distance co…

    Submitted 27 July, 2023; v1 submitted 23 February, 2023; originally announced February 2023.

  9. arXiv:2212.13686  [pdf, other]

    math.ST stat.ME

    Statistical inference for high-dimensional spectral density matrix

    Authors: Jinyuan Chang, Qing Jiang, Tucker S. McElroy, Xiaofeng Shao

    Abstract: The spectral density matrix is a fundamental object of interest in time series analysis, and it encodes both contemporary and dynamic linear relationships between component processes of the multivariate system. In this paper we develop novel inference procedures for the spectral density matrix in the high-dimensional setting. Specifically, we introduce a new global testing procedure to test the nu…

    Submitted 25 February, 2023; v1 submitted 27 December, 2022; originally announced December 2022.

  10. Dynamics of Fecal Coliform Bacteria along Canada's Coast

    Authors: Shuai You, Xiaolin Huang, Li Xing, Mary Lesperance, Charles LeBlanc, Paul Moccia, Vincent Mercier, Xiaojian Shao, Youlian Pan, Xuekui Zhang

    Abstract: The vast coastline provides Canada with a flourishing seafood industry including bivalve shellfish production. To sustain a healthy bivalve molluscan shellfish production, the Canadian Shellfish Sanitation Program was established to monitor the health of shellfish harvesting habitats, and fecal coliform bacteria data have been collected at nearly 15,000 marine sample sites across six coastal provi…

    Submitted 27 November, 2022; originally announced November 2022.

    Comments: 30 pages, 7 figures, 0 table

    Journal ref: You et al., Dynamics of fecal coliform bacteria along Canada's coast, Marine Pollution Bulletin, Volume 189, 2023, 114712

  11. arXiv:2210.05792  [pdf, other]

    stat.ME

    Flexible Modeling of Nonstationary Extremal Dependence using Spatially-Fused LASSO and Ridge Penalties

    Authors: Xuanjie Shao, Arnab Hazra, Jordan Richards, Raphaël Huser

    Abstract: Statistical modeling of a nonstationary spatial extremal dependence structure is challenging. Max-stable processes are common choices for modeling spatially-indexed block maxima, where an assumption of stationarity is usual to make inference feasible. However, this assumption is often unrealistic for data observed over a large or complex domain. We propose a computationally-efficient method for es…

    Submitted 30 April, 2024; v1 submitted 11 October, 2022; originally announced October 2022.

  12. Testing the martingale difference hypothesis in high dimension

    Authors: Jinyuan Chang, Qing Jiang, Xiaofeng Shao

    Abstract: In this paper, we consider testing the martingale difference hypothesis for high-dimensional time series. Our test is built on the sum of squares of the element-wise max-norm of the proposed matrix-valued nonlinear dependence measure at different lags. To conduct the inference, we approximate the null distribution of our test statistic by Gaussian approximation and provide a simulation-based appro…

    Submitted 30 September, 2022; v1 submitted 10 September, 2022; originally announced September 2022.

    Journal ref: Journal of Econometrics 2023, Vol. 235, pp. 972-1000

  13. arXiv:2206.02738  [pdf, ps, other]

    stat.ME

    Robust Inference for Change Points in High Dimension

    Authors: Feiyu Jiang, Runmin Wang, Xiaofeng Shao

    Abstract: This paper proposes a new test for a change point in the mean of high-dimensional data based on the spatial sign and self-normalization. The test is easy to implement with no tuning parameters, robust to heavy-tailedness and theoretically justified with both fixed-$n$ and sequential asymptotics under both null and alternatives, where $n$ is the sample size. We demonstrate that the fixed-$n$ asympt…

    Submitted 6 June, 2022; originally announced June 2022.

  14. arXiv:2202.09008  [pdf, other]

    stat.ML cs.LG stat.CO stat.ME

    On Variance Estimation of Random Forests with Infinite-Order U-statistics

    Authors: Tianning Xu, Ruoqing Zhu, Xiaofeng Shao

    Abstract: Infinite-order U-statistics (IOUS) has been used extensively on subbagging ensemble learning algorithms such as random forests to quantify its uncertainty. While normality results of IOUS have been studied extensively, its variance estimation approaches and theoretical properties remain mostly unexplored. Existing approaches mainly utilize the leading term dominance property in the Hoeffding decom…

    Submitted 14 February, 2023; v1 submitted 17 February, 2022; originally announced February 2022.

  15. arXiv:2112.05331  [pdf, ps, other]

    stat.ME math.ST

    Segmenting Time Series via Self-Normalization

    Authors: Zifeng Zhao, Feiyu Jiang, Xiaofeng Shao

    Abstract: We propose a novel and unified framework for change-point estimation in multivariate time series. The proposed method is fully nonparametric, enjoys effortless tuning and is robust to temporal dependence. One salient and distinct feature of the proposed method is its versatility, where it allows change-point detection for a broad class of parameters (such as mean, variance, correlation and quantil…

    Submitted 8 September, 2022; v1 submitted 9 December, 2021; originally announced December 2021.

  16. arXiv:2101.12357  [pdf, other]

    stat.ME

    Adaptive Inference for Change Points in High-Dimensional Data

    Authors: Yangfan Zhang, Runmin Wang, Xiaofeng Shao

    Abstract: In this article, we propose a class of test statistics for a change point in the mean of high-dimensional independent data. Our test integrates the U-statistic based approach in a recent work by \cite{hdcp} and the $L_q$-norm based high-dimensional test in \cite{he2018}, and inherits several appealing features such as being tuning parameter free and asymptotic independence for test statistics corr…

    Submitted 28 January, 2021; originally announced January 2021.

  17. arXiv:2101.06839  [pdf, other]

    stat.ME

    Adaptive Change Point Monitoring for High-Dimensional Data

    Authors: Teng Wu, Runmin Wang, Hao Yan, Xiaofeng Shao

    Abstract: In this paper, we propose a class of monitoring statistics for a mean shift in a sequence of high-dimensional observations. Inspired by the recent U-statistic based retrospective tests developed by Wang et al. (2019) and Zhang et al. (2020), we advance the U-statistic based approach to the sequential monitoring problem by developing a new adaptive monitoring procedure that can detect both dense and…

    Submitted 17 January, 2021; originally announced January 2021.

  18. arXiv:2007.04553  [pdf, ps, other]

    econ.EM physics.soc-ph stat.AP

    Time Series Analysis of COVID-19 Infection Curve: A Change-Point Perspective

    Authors: Feiyu Jiang, Zifeng Zhao, Xiaofeng Shao

    Abstract: In this paper, we model the trajectory of the cumulative confirmed cases and deaths of COVID-19 (in log scale) via a piecewise linear trend model. The model naturally captures the phase transitions of the epidemic growth rate via change-points and further enjoys great interpretability due to its semiparametric nature. On the methodological front, we advance the nascent self-normalization (SN) tech…

    Submitted 9 July, 2020; originally announced July 2020.

  19. arXiv:2003.00433  [pdf, other]

    cs.LG cs.MA math.OC stat.ML

    Fully Asynchronous Policy Evaluation in Distributed Reinforcement Learning over Networks

    Authors: Xingyu Sha, Jiaqi Zhang, Keyou You, Kaiqing Zhang, Tamer Başar

    Abstract: This paper proposes a \emph{fully asynchronous} scheme for the policy evaluation problem of distributed reinforcement learning (DisRL) over directed peer-to-peer networks. Without waiting for any other node of the network, each node can locally update its value function at any time by using (possibly delayed) information from its neighbors. This is in sharp contrast to the gossip-based scheme wher…

    Submitted 22 January, 2021; v1 submitted 1 March, 2020; originally announced March 2020.

  20. arXiv:2002.04115  [pdf, other]

    stat.ME math.ST

    Dating the Break in High-dimensional Data

    Authors: Runmin Wang, Xiaofeng Shao

    Abstract: This paper is concerned with estimation and inference for the location of a change point in the mean of independent high-dimensional data. Our change point location estimator maximizes a new U-statistic based objective function, and its convergence rate and asymptotic distribution after suitable centering and normalization are obtained under mild assumptions. Our estimator turns out to have better…

    Submitted 10 February, 2020; originally announced February 2020.

  21. arXiv:2001.05371  [pdf, other]

    cs.LG cs.AI stat.ML

    Making deep neural networks right for the right scientific reasons by interacting with their explanations

    Authors: Patrick Schramowski, Wolfgang Stammer, Stefano Teso, Anna Brugger, Xiaoting Shao, Hans-Georg Luigs, Anne-Katrin Mahlein, Kristian Kersting

    Abstract: Deep neural networks have shown excellent performances in many real-world applications. Unfortunately, they may show "Clever Hans"-like behavior -- making use of confounding factors within datasets -- to achieve high performance. In this work, we introduce the novel learning setting of "explanatory interactive learning" (XIL) and illustrate its benefits on a plant phenotyping research task. XIL ad…

    Submitted 5 March, 2024; v1 submitted 15 January, 2020; originally announced January 2020.

    Comments: arXiv admin note: text overlap with arXiv:1805.08578

  22. arXiv:1905.08550  [pdf, other]

    cs.LG stat.ML

    Conditional Sum-Product Networks: Imposing Structure on Deep Probabilistic Architectures

    Authors: Xiaoting Shao, Alejandro Molina, Antonio Vergari, Karl Stelzner, Robert Peharz, Thomas Liebig, Kristian Kersting

    Abstract: Probabilistic graphical models are a central tool in AI; however, they are generally not as expressive as deep neural models, and inference is notoriously hard and slow. In contrast, deep probabilistic models such as sum-product networks (SPNs) capture joint distributions in a tractable fashion, but still lack the expressive power of intractable models based on deep neural networks. Therefore, we…

    Submitted 29 September, 2019; v1 submitted 21 May, 2019; originally announced May 2019.

    Comments: 13 pages, 6 figures

  23. arXiv:1905.08446  [pdf, other]

    math.ST stat.ME

    Inference for Change Points in High Dimensional Data via Self-Normalization

    Authors: Runmin Wang, Changbo Zhu, Stanislav Volgushev, Xiaofeng Shao

    Abstract: This article considers change point testing and estimation for a sequence of high-dimensional data. In the case of testing for a mean shift for high-dimensional independent data, we propose a new test which is based on $U$-statistic in Chen and Qin (2010) and utilizes the self-normalization principle [Shao (2010), Shao and Zhang (2010)]. Our test targets dense alternatives in the high-dimensional…

    Submitted 8 August, 2021; v1 submitted 21 May, 2019; originally announced May 2019.

  24. arXiv:1903.06422  [pdf, ps, other]

    stat.AP cs.DL

    The $CI$-index: a new index to characterize the scientific output of researchers

    Authors: Xuehua Yin, Xiuyan Sha, Chuancun Yin

    Abstract: We propose a simple new index, named the $CI$-index, based on the Choquet integral to characterize the scientific output of researchers. This index is an improvement of the $A$-index and $R$-index and has a notable feature that highly cited papers have highly weights and lowly cited papers have lowly weights. In applications many researchers may have the same $h$-index, $g$-index or $R$-index. The…

    Submitted 15 May, 2019; v1 submitted 15 March, 2019; originally announced March 2019.

    Comments: 13 pages

  25. arXiv:1902.07279  [pdf, ps, other]

    stat.ME

    Interpoint Distance Based Two Sample Tests in High Dimension

    Authors: Changbo Zhu, Xiaofeng Shao

    Abstract: In this paper, we study a class of two sample test statistics based on inter-point distances in the high dimensional and low sample size setting. Our test statistics include the well-known energy distance and maximum mean discrepancy with Gaussian and Laplacian kernels, and the critical values are obtained via permutations. We show that all these tests are inconsistent when the two high dimensiona…

    Submitted 10 April, 2020; v1 submitted 19 February, 2019; originally announced February 2019.

  26. arXiv:1902.03291  [pdf, other]

    math.ST stat.ME

    Distance-based and RKHS-based Dependence Metrics in High Dimension

    Authors: Changbo Zhu, Shun Yao, Xianyang Zhang, Xiaofeng Shao

    Abstract: In this paper, we study distance covariance, Hilbert-Schmidt covariance (aka Hilbert-Schmidt independence criterion [Gretton et al. (2008)]) and related independence tests under the high dimensional scenario. We show that the sample distance/Hilbert-Schmidt covariance between two random vectors can be approximated by the sum of squared componentwise sample cross-covariances up to an asymptotically…

    Submitted 8 February, 2019; originally announced February 2019.

  27. arXiv:1809.10862  [pdf]

    cs.LG cs.CV stat.ML

    Semantic Segmentation for Urban Planning Maps based on U-Net

    Authors: Zhiling Guo, Hiroaki Shengoku, Guangming Wu, Qi Chen, Wei Yuan, Xiaodan Shi, Xiaowei Shao, Yongwei Xu, Ryosuke Shibasaki

    Abstract: The automatic digitizing of paper maps is a significant and challenging task for both academia and industry. As an important procedure of map digitizing, the semantic segmentation section mainly relies on manual visual interpretation with low efficiency. In this study, we select urban planning maps as a representative sample and investigate the feasibility of utilizing U-shape fully convolutional…

    Submitted 30 September, 2018; v1 submitted 28 September, 2018; originally announced September 2018.

    Comments: 4 pages, 3 figures, conference, International Geoscience and Remote Sensing Symposium (IGARSS 2018), Jul 2018, Valencia, Spain

  28. arXiv:1609.09380  [pdf, other]

    stat.ME

    Testing mutual independence in high dimension via distance covariance

    Authors: Shun Yao, Xianyang Zhang, Xiaofeng Shao

    Abstract: In this paper, we introduce a ${\mathcal L}_2$ type test for testing mutual independence and banded dependence structure for high dimensional data. The test is constructed based on the pairwise distance covariance and it accounts for the non-linear and non-monotone dependences among the data, which cannot be fully captured by the existing tests based on either Pearson correlation or rank correlati…

    Submitted 18 September, 2017; v1 submitted 29 September, 2016; originally announced September 2016.

    Comments: 30 pages, 2 figures

  29. arXiv:1508.01126  [pdf, ps, other]

    stat.ME stat.CO

    A subsampled double bootstrap for massive data

    Authors: Srijan Sengupta, Stanislav Volgushev, Xiaofeng Shao

    Abstract: The bootstrap is a popular and powerful method for assessing precision of estimators and inferential methods. However, for massive datasets which are increasingly prevalent, the bootstrap becomes prohibitively costly in computation and its feasibility is questionable even with modern parallel computing platforms. Recently Kleiner, Talwalkar, Sarkar, and Jordan (2014) proposed a method called BLB (…

    Submitted 5 August, 2015; originally announced August 2015.

  30. arXiv:1401.5002  [pdf, ps, other]

    stat.ME

    On the Coverage Bound Problem of Empirical Likelihood Methods For Time Series

    Authors: Xianyang Zhang, Xiaofeng Shao

    Abstract: The upper bounds on the coverage probabilities of the confidence regions based on blockwise empirical likelihood [Kitamura (1997)] and nonstandard expansive empirical likelihood [Nordman et al. (2013)] methods for time series data are investigated via studying the probability for the violation of the convex hull constraint. The large sample bounds are derived on the basis of the pivotal limit of t…

    Submitted 31 July, 2014; v1 submitted 20 January, 2014; originally announced January 2014.

  31. arXiv:1005.2137  [pdf, ps, other]

    stat.ME math.ST

    A self-normalized approach to confidence interval construction in time series

    Authors: Xiaofeng Shao

    Abstract: We propose a new method to construct confidence intervals for quantities that are associated with a stationary time series, which avoids direct estimation of the asymptotic variances. Unlike the existing tuning-parameter-dependent approaches, our method has the attractive convenience of being free of choosing any user-chosen number or smoothing parameter. The interval is constructed on the basis…

    Submitted 12 May, 2010; originally announced May 2010.

    Comments: 35 pages, 4 figures, 5 tables

  32. arXiv:0906.5179  [pdf, ps, other]

    math.ST math.PR stat.ME

    Testing for white noise under unknown dependence and its applications to goodness-of-fit for time series models

    Authors: Xiaofeng Shao

    Abstract: Testing for white noise has been well studied in the literature of econometrics and statistics. For most of the proposed test statistics, such as the well-known Box-Pierce's test statistic with fixed lag truncation number, the asymptotic null distributions are obtained under independent and identically distributed assumptions and may not be valid for the dependent white noise. Due to recent popu…

    Submitted 29 June, 2009; originally announced June 2009.

    Comments: 38 pages

    MSC Class: 62M10; 62M07

  33. arXiv:0903.3180  [pdf, ps, other]

    stat.ME math.ST

    Nonstationarity-extended Whittle Estimation

    Authors: Xiaofeng Shao

    Abstract: For long memory time series models with uncorrelated but dependent errors, we establish the asymptotic normality of the Whittle estimator under mild conditions. Our framework includes the widely used FARIMA models with GARCH-type innovations. To cover nonstationary fractionally integrated processes, we extend the idea of Abadir, Distaso and Giraitis (2007, Journal of Econometrics 141, 1353-1384)…

    Submitted 18 March, 2009; originally announced March 2009.

    Comments: 32 pages, 3 tables