
Hierarchical Semi-parametric Duration Models

Abstract: This research attempts to model the stochastic process of trades in a limit order book market as a marked point process. We propose a semi-parametric model for the conditional distribution given the past, attempting to capture the effect of the recent past in a nonparametric way and the effect of the more distant past using a parametric time series model. Our framework provides more flexibility than the most commonly used family of models, known as Autoregressive Conditional Duration (ACD), in terms of the shape of the density of durations and in the form of dependence across time. We also propose an online learning algorithm for intraday trends that vary from day to day. This allows us both to do prediction of future trade times and to incorporate the effects of additional explanatory variables. In this paper, we show that the framework works better than the ACD family both in the sense of prediction log-likelihood and according to various diagnostic tests using data from the New York Stock Exchange. In general, the framework can be used both to estimate the intensity of a point process and to estimate the joint density of a time series.

Submitted 4 March, 2014; originally announced March 2014.

arXiv:1212.0463 [pdf, other]

Nonparametric risk bounds for time-series forecasting

Authors: Daniel J. McDonald, Cosma Rohilla Shalizi, Mark Schervish

Abstract: We derive generalization error bounds for traditional time-series forecasting models. Our results hold for many standard forecasting tools including autoregressive models, moving average models, and, more generally, linear state-space models. These non-asymptotic bounds need only weak assumptions on the data-generating process, yet allow forecasters to select among competing models and to guarantee, with high probability, that their chosen model will perform well. We motivate our techniques with and apply them to standard economic and financial forecasting tools---a GARCH model for predicting equity volatility and a dynamic stochastic general equilibrium model (DSGE), the standard tool in macroeconomic forecasting. We demonstrate in particular how our techniques can aid forecasters and policy makers in choosing models which behave well under uncertainty and mis-specification.

Submitted 10 September, 2016; v1 submitted 3 December, 2012; originally announced December 2012.

Comments: 34 pages, 3 figures

MSC Class: 62M20 (Primary) 91B84; 62G99 (Secondary)

Journal ref: Journal of Machine Learning Research. (2017). Vol 18. p. 1-40

arXiv:1111.3404 [pdf, ps, other]

Estimated VC dimension for risk bounds

Authors: Daniel J. McDonald, Cosma Rohilla Shalizi, Mark Schervish

Abstract: Vapnik-Chervonenkis (VC) dimension is a fundamental measure of the generalization capacity of learning algorithms. However, apart from a few special cases, it is hard or impossible to calculate analytically. Vapnik et al. [10] proposed a technique for estimating the VC dimension empirically. While their approach behaves well in simulations, it could not be used to bound the generalization risk of classifiers, because there were no bounds for the estimation error of the VC dimension itself. We rectify this omission, providing high probability concentration results for the proposed estimator and deriving corresponding generalization bounds.

Submitted 14 November, 2011; originally announced November 2011.

Comments: 11 pages

arXiv:1103.0942 [pdf, other]

Generalization error bounds for stationary autoregressive models

Authors: Daniel J. McDonald, Cosma Rohilla Shalizi, Mark Schervish

Abstract: We derive generalization error bounds for stationary univariate autoregressive (AR) models. We show that imposing stationarity is enough to control the Gaussian complexity without further regularization. This lets us use structural risk minimization for model selection. We demonstrate our methods by predicting interest rate movements.

Submitted 3 June, 2011; v1 submitted 4 March, 2011; originally announced March 2011.

Comments: 10 pages, 3 figures. CMU Statistics Technical Report

arXiv:1103.0941 [pdf, ps, other]

Estimating $β$-mixing coefficients

Authors: Daniel J. McDonald, Cosma Rohilla Shalizi, Mark Schervish

Abstract: The literature on statistical learning for time series assumes the asymptotic independence or ``mixing'' of the data-generating process. These mixing assumptions are never tested, nor are there methods for estimating mixing rates from data. We give an estimator for the $β$-mixing rate based on a single stationary sample path and show it is $L_1$-risk consistent.

Submitted 4 March, 2011; originally announced March 2011.

Comments: 9 pages, accepted by AIStats. CMU Statistics Technical Report

Journal ref: Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics (AISTATS 2011), pp. 516--524

Showing 1–5 of 5 results for author: Schervish, M