Search | arXiv e-print repository

On Statistical Rates and Provably Efficient Criteria of Latent Diffusion Transformers (DiTs)

Authors: Jerry Yao-Chieh Hu, Weimin Wu, Zhuoru Li, Zhao Song, Han Liu

Abstract: We investigate the statistical and computational limits of latent Diffusion Transformers (DiTs) under the low-dimensional linear latent space assumption. Statistically, we study the universal approximation and sample complexity of the DiTs score function, as well as the distribution recovery property of the initial data. Specifically, under mild data assumptions, we derive an approximation error bound for the score network of latent DiTs, which is sub-linear in the latent space dimension. Additionally, we derive the corresponding sample complexity bound and show that the data distribution generated from the estimated score function converges toward a proximate area of the original one. Computationally, we characterize the hardness of both forward inference and backward computation of latent DiTs, assuming the Strong Exponential Time Hypothesis (SETH). For forward inference, we identify efficient criteria for all possible latent DiTs inference algorithms and showcase our theory by pushing the efficiency toward almost-linear time inference. For backward computation, we leverage the low-rank structure within the gradient computation of DiTs training for possible algorithmic speedup. Specifically, we show that such speedup achieves almost-linear time latent DiTs training by casting the DiTs gradient as a series of chained low-rank approximations with bounded error. Under the low-dimensional assumption, we show that the convergence rate and the computational efficiency are both dominated by the dimension of the subspace, suggesting that latent DiTs have the potential to bypass the challenges associated with the high dimensionality of initial data.

Submitted 1 July, 2024; originally announced July 2024.

arXiv:2406.14753 [pdf, other]

A General Control-Theoretic Approach for Reinforcement Learning: Theory and Algorithms

Authors: Weiqin Chen, Mark S. Squillante, Chai Wah Wu, Santiago Paternain

Abstract: We devise a control-theoretic reinforcement learning approach to support direct learning of the optimal policy. We establish theoretical properties of our approach and derive an algorithm based on a specific instance of this approach. Our empirical results demonstrate the significant benefits of our approach.

Submitted 20 June, 2024; originally announced June 2024.

arXiv:2406.09194 [pdf, ps, other]

Benign overfitting in Fixed Dimension via Physics-Informed Learning with Smooth Inductive Bias

Authors: Honam Wong, Wendao Wu, Fanghui Liu, Yiping Lu

Abstract: Recent advances in machine learning have inspired a surge of research into reconstructing specific quantities of interest from measurements that comply with certain physical laws. These efforts focus on inverse problems that are governed by partial differential equations (PDEs). In this work, we develop an asymptotic Sobolev norm learning curve for kernel ridge(less) regression when addressing (elliptical) linear inverse problems. Our results show that the PDE operators in the inverse problem can stabilize the variance and even exhibit benign overfitting for fixed-dimensional problems, exhibiting different behaviors from regression problems. Besides, our investigation also demonstrates the impact of various inductive biases introduced by minimizing different Sobolev norms as a form of implicit regularization. For the regularized least squares estimator, we find that all considered inductive biases can achieve the optimal convergence rate, provided the regularization parameter is appropriately chosen. The convergence rate is actually independent of the choice of (smooth enough) inductive bias for both ridge and ridgeless regression. Surprisingly, our smoothness requirement recovers the condition found in the Bayesian setting and extends the conclusion to the minimum norm interpolation estimators.

Submitted 16 June, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

arXiv:2403.17221 [pdf, other]

Are Made and Missed Different? An analysis of Field Goal Attempts of Professional Basketball Players via Depth Based Testing Procedure

Authors: Kai Qi, Guanyu Hu, Wei Wu

Abstract: In this paper, we develop a novel depth-based testing procedure on spatial point processes to examine the difference in made and missed field goal attempts for NBA players. Specifically, our testing procedure can statistically detect the differences between made and missed field goal attempts for NBA players. We first obtain the depths of two processes under the polar coordinate system. A two-dimensional Kolmogorov-Smirnov test is then performed to test the difference between the depths of the two processes. Through extensive simulation studies, we show that our testing procedure has good frequentist properties under both the null and alternative hypotheses. A comparison against the competing methods shows that our proposed procedure has better testing reliability and testing power. Application to the shot chart data of 191 NBA players in the 2017-2018 regular season offers interesting insights about these players' made and missed shot patterns.

Submitted 25 March, 2024; originally announced March 2024.

Comments: 26 pages, 6 figures

arXiv:2403.00600 [pdf, other]

Random Interval Distillation for Detecting Multiple Changes in General Dependent Data

Authors: Xinyuan Fan, Weichi Wu

Abstract: We propose a new and generic approach for detecting multiple change-points in general dependent data, termed random interval distillation (RID). By collecting random intervals with sufficient strength of signals and reassembling them into a sequence of informative short intervals, our new approach captures the shifts in signal characteristics across diverse dependent data forms including locally stationary high-dimensional time series and dynamic networks with Markov formation. We further propose a range of secondary refinements tailored to various data types to enhance the localization precision. Notably, for univariate time series and low-rank autoregressive networks, our methods achieve the minimax optimality as their independent counterparts. For practical applications, we introduce a clustering-based and data-driven procedure to determine the optimal threshold for signal strength, which is adaptable to a wide array of dependent data scenarios utilizing the connection between RID and clustering. Additionally, our method has been extended to identify kinks and changes in signals characterized by piecewise polynomial trends. We examine the effectiveness and usefulness of our methodology via extensive simulation studies and a real data example, implementing it in the R-package rid.

Submitted 1 March, 2024; originally announced March 2024.

Comments: 59 pages, 5 figures

arXiv:2401.09346 [pdf, other]

High Confidence Level Inference is Almost Free using Parallel Stochastic Optimization

Authors: Wanrong Zhu, Zhipeng Lou, Ziyang Wei, Wei Biao Wu

Abstract: Uncertainty quantification for estimation through stochastic optimization solutions in an online setting has gained popularity recently. This paper introduces a novel inference method focused on constructing confidence intervals with efficient computation and fast convergence to the nominal level. Specifically, we propose to use a small number of independent multi-runs to acquire distribution information and construct a t-based confidence interval. Our method requires minimal additional computation and memory beyond the standard updating of estimates, making the inference process almost cost-free. We provide a rigorous theoretical guarantee for the confidence interval, demonstrating that the coverage is approximately exact with an explicit convergence rate and allowing for high confidence level inference. In particular, a new Gaussian approximation result is developed for the online estimators to characterize the coverage properties of our confidence intervals in terms of relative errors. Additionally, our method also allows for leveraging parallel computing to further accelerate calculations using multiple cores. It is easy to implement and can be integrated with existing stochastic algorithms without the need for complicated modifications.

Submitted 17 January, 2024; originally announced January 2024.
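
The t-based interval built from a few independent runs, as described in the abstract above, can be illustrated with a minimal sketch. This is not the authors' implementation: the toy objective (estimating a mean by averaged SGD), the function names `averaged_sgd_mean`, `t_interval`, and `demo`, the step-size schedule, and the number of runs are all assumptions made for illustration. The critical values are standard two-sided 95% Student-t table values.

```python
import math
import random
import statistics

# Two-sided 95% Student-t critical values, keyed by degrees of freedom
# (standard table values; only a few entries are listed here).
T_CRIT_95 = {3: 3.1824, 4: 2.7764, 5: 2.5706, 7: 2.3646, 9: 2.2622}


def averaged_sgd_mean(stream, lr0=1.0, decay=0.6):
    """Averaged SGD on the loss 0.5 * (theta - x)^2 over a stream of samples."""
    theta = 0.0
    avg = 0.0
    for n, x in enumerate(stream, start=1):
        step = lr0 * n ** (-decay)
        theta -= step * (theta - x)  # stochastic gradient step
        avg += (theta - avg) / n     # running average of the iterates
    return avg


def t_interval(estimates):
    """t-based interval from K independent estimates (K - 1 degrees of freedom)."""
    k = len(estimates)
    t_crit = T_CRIT_95[k - 1]  # KeyError if the table lacks this df
    center = statistics.fmean(estimates)
    half_width = t_crit * statistics.stdev(estimates) / math.sqrt(k)
    return center - half_width, center + half_width


def demo(true_mean=2.0, n_runs=6, n_samples=5000, seed=0):
    """Run several independent SGD runs and combine them into one interval."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_runs):
        stream = (rng.gauss(true_mean, 1.0) for _ in range(n_samples))
        estimates.append(averaged_sgd_mean(stream))
    return estimates, t_interval(estimates)


if __name__ == "__main__":
    estimates, (low, high) = demo()
    print(f"95% interval from {len(estimates)} runs: [{low:.4f}, {high:.4f}]")
```

Because the runs are independent, each one could be executed on its own core; the only extra state kept beyond the usual SGD update is one final estimate per run.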

arXiv:2311.13676 [pdf, ps, other]

Depth-Based Statistical Inferences in the Spike Train Space

Authors: Xinyu Zhou, Wei Wu

Abstract: Metric-based summary statistics such as mean and covariance have been introduced in neural spike train space. They can properly describe template and variability in spike train data, but are often sensitive to outliers and expensive to compute. Recent studies also examine outlier detection and classification methods on point processes. These tools provide reasonable and efficient results, whereas the accuracy remains at a low level in certain cases. In this study, we propose to adopt a well-established notion of statistical depth to the spike train space. This framework can naturally define the median in a set of spike trains, which provides a robust description of the 'center' or 'template' of the observations. It also provides a principled method to identify 'outliers' in the data and classify data from different categories. We systematically compare the median with the state-of-the-art 'mean spike trains' in terms of robustness and efficiency. The performance of our novel outlier detection and classification tools will be compared with previous methods. The result shows that the median provides a better description of the 'template' than the mean. Moreover, the proposed outlier detection and classification perform more accurately than previous methods. The advantages and superiority are well illustrated with simulations and real data.

Submitted 22 November, 2023; originally announced November 2023.

arXiv:2311.00577 [pdf, other]

Personalized Assignment to One of Many Treatment Arms via Regularized and Clustered Joint Assignment Forests

Authors: Rahul Ladhania, Jann Spiess, Lyle Ungar, Wenbo Wu

Abstract: We consider learning personalized assignments to one of many treatment arms from a randomized controlled trial. Standard methods that estimate heterogeneous treatment effects separately for each arm may perform poorly in this case due to excess variance. We instead propose methods that pool information across treatment arms: First, we consider a regularized forest-based assignment algorithm based on greedy recursive partitioning that shrinks effect estimates across arms. Second, we augment our algorithm by a clustering scheme that combines treatment arms with consistently similar outcomes. In a simulation study, we compare the performance of these approaches to predicting arm-wise outcomes separately, and document gains of directly optimizing the treatment assignment with regularization and clustering. In a theoretical model, we illustrate how a high number of treatment arms makes finding the best arm hard, while we can achieve sizable utility gains from personalization by regularized optimization.

Submitted 1 November, 2023; originally announced November 2023.

arXiv:2309.15316 [pdf, other]

Leveraging Neural Networks to Profile Health Care Providers with Application to Medicare Claims

Authors: Wenbo Wu, Fan Li, Richard Liu, Yiting Li, Mara McAdams-DeMarco, Krzysztof J. Geras, Douglas E. Schaubel, Iván Díaz

Abstract: Encompassing numerous nationwide, statewide, and institutional initiatives in the United States, provider profiling has evolved into a major health care undertaking with ubiquitous applications, profound implications, and high-stakes consequences. In line with such a significant profile, the literature has accumulated a number of developments dedicated to enhancing the statistical paradigm of provider profiling. Tackling wide-ranging profiling issues, these methods typically adjust for risk factors using linear predictors. While this approach is simple, it can be too restrictive to characterize complex and dynamic factor-outcome associations in certain contexts. One such example arises from evaluating dialysis facilities treating Medicare beneficiaries with end-stage renal disease. It is of primary interest to consider how the coronavirus disease (COVID-19) affected 30-day unplanned readmissions in 2020. The impact of COVID-19 on the risk of readmission varied dramatically across pandemic phases. To efficiently capture the variation while profiling facilities, we develop a generalized partially linear model (GPLM) that incorporates a neural network. Considering provider-level clustering, we implement the GPLM as a stratified sampling-based stochastic optimization algorithm that features accelerated convergence. Furthermore, an exact test is designed to identify under- and over-performing facilities, with an accompanying funnel plot to visualize profiles. The advantages of the proposed methods are demonstrated through simulation experiments and profiling dialysis facilities using 2020 Medicare claims from the United States Renal Data System.

Submitted 20 January, 2024; v1 submitted 26 September, 2023; originally announced September 2023.

Comments: 8 figures, 6 tables

arXiv:2309.08488 [pdf, other]

A Random Graph-based Autoregressive Model for Networked Time Series

Authors: Weichi Wu, Chenlei Leng

Abstract: Contemporary time series data often feature objects connected by a social network that naturally induces temporal dependence involving connected neighbours. The network vector autoregressive model is useful for describing the influence of linked neighbours, while recent generalizations aim to separate influence and homophily. Existing approaches, however, require either correct specification of a time series model or accurate estimation of a network model or both, and rely exclusively on least-squares for parameter estimation. This paper proposes a new autoregressive model incorporating a flexible form for latent variables used to depict homophily. We develop a first-order differencing method for the estimation of influence requiring only the influence part of the model to be correctly specified. When the part including homophily is correctly specified admitting a semiparametric form, we leverage and generalize the recent notion of neighbour smoothing for parameter estimation, bypassing the need to specify the generative mechanism of the network. We develop new theory to show that all the estimated parameters are consistent and asymptotically normal. The efficacy of our approach is confirmed via extensive simulations and an analysis of a social media dataset.

Submitted 15 September, 2023; originally announced September 2023.

arXiv:2309.00760 [pdf, other]

Spatial Regression With Multiplicative Errors, and Its Application With Lidar Measurements

Authors: Hojun You, Wei-Ying Wu, Chae Young Lim, Kyubaek Yoon, Jongeun Choi

Abstract: Multiplicative errors in addition to spatially referenced observations often arise in geodetic applications, particularly in surface estimation with light detection and ranging (LiDAR) measurements. However, spatial regression involving multiplicative errors remains relatively unexplored in such applications. In this regard, we present a penalized modified least squares estimator to handle the complexities of a multiplicative error structure while identifying significant variables in spatially dependent observations for surface estimation. The proposed estimator can also be applied to classical additive error spatial regression. By establishing asymptotic properties of the proposed estimator under increasing domain asymptotics with stochastic sampling design, we provide a rigorous foundation for its effectiveness. A comprehensive simulation study confirms the superior performance of our proposed estimator in accurately estimating and selecting parameters, outperforming existing approaches. To demonstrate its real-world applicability, we employ our proposed method, along with other alternative techniques, to estimate a rotational landslide surface using LiDAR measurements. The results highlight the efficacy and potential of our approach in tackling complex spatial regression problems involving multiplicative errors.

Submitted 1 September, 2023; originally announced September 2023.

arXiv:2307.06915 [pdf, other]

Weighted Averaged Stochastic Gradient Descent: Asymptotic Normality and Optimality

Authors: Ziyang Wei, Wanrong Zhu, Wei Biao Wu

Abstract: Stochastic Gradient Descent (SGD) is one of the simplest and most popular algorithms in modern statistical and machine learning due to its computational and memory efficiency. Various averaging schemes have been proposed to accelerate the convergence of SGD in different settings. In this paper, we explore a general averaging scheme for SGD. Specifically, we establish the asymptotic normality of a broad range of weighted averaged SGD solutions and provide asymptotically valid online inference approaches. Furthermore, we propose an adaptive averaging scheme that exhibits both optimal statistical rate and favorable non-asymptotic convergence, drawing insights from the optimal weight for the linear model in terms of non-asymptotic mean squared error (MSE).

Submitted 18 July, 2023; v1 submitted 13 July, 2023; originally announced July 2023.
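
As a rough illustration of what a weighted average of SGD iterates looks like, the sketch below uses polynomial weights w_n = n^p on a one-dimensional linear model. The weight family, the step-size schedule, and the names `weighted_averaged_sgd` and `demo` are assumptions chosen for illustration; the adaptive scheme proposed in the paper is not reproduced here. Setting `power=0` recovers the ordinary uniform average of the iterates.

```python
import random


def weighted_averaged_sgd(samples, power=1.0, lr0=0.5, decay=0.6):
    """SGD for y ~ theta * x with a running weighted average of the iterates.

    The n-th iterate receives weight n ** power (an illustrative choice).
    Returns the last iterate and the weighted average.
    """
    theta = 0.0
    avg = 0.0
    weight_sum = 0.0
    for n, (x, y) in enumerate(samples, start=1):
        grad = (theta * x - y) * x                 # gradient of 0.5 * (theta*x - y)^2
        theta -= lr0 * n ** (-decay) * grad
        w = float(n) ** power
        weight_sum += w
        avg += (w / weight_sum) * (theta - avg)    # online weighted-mean update
    return theta, avg


def demo(slope=3.0, noise_sd=0.5, n_samples=20000, seed=0):
    """Fit a noisy line through the origin and return (last iterate, average)."""
    rng = random.Random(seed)

    def stream():
        for _ in range(n_samples):
            x = rng.uniform(-1.0, 1.0)
            yield x, slope * x + rng.gauss(0.0, noise_sd)

    return weighted_averaged_sgd(stream())


if __name__ == "__main__":
    last, averaged = demo()
    print(f"last iterate: {last:.4f}, weighted average: {averaged:.4f}")
```

The online update keeps only the running average and the running weight sum, so the memory cost stays constant regardless of the number of iterations.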

arXiv:2305.19001 [pdf, other]

High-probability sample complexities for policy evaluation with linear function approximation

Authors: Gen Li, Weichen Wu, Yuejie Chi, Cong Ma, Alessandro Rinaldo, Yuting Wei

Abstract: This paper is concerned with the problem of policy evaluation with linear function approximation in discounted infinite horizon Markov decision processes. We investigate the sample complexities required to guarantee a predefined estimation error of the best linear coefficients for two widely-used policy evaluation algorithms: the temporal difference (TD) learning algorithm and the two-timescale linear TD with gradient correction (TDC) algorithm. In both the on-policy setting, where observations are generated from the target policy, and the off-policy setting, where samples are drawn from a behavior policy potentially different from the target policy, we establish the first sample complexity bound with high-probability convergence guarantee that attains the optimal dependence on the tolerance level. We also exhibit an explicit dependence on problem-related quantities, and show in the on-policy setting that our upper bound matches the minimax lower bound on crucial problem parameters, including the choice of the feature maps and the problem dimension.

Submitted 2 May, 2024; v1 submitted 30 May, 2023; originally announced May 2023.

Comments: The first two authors contributed equally; paper accepted to IEEE Transactions on Information Theory
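
For readers unfamiliar with the first of the two algorithms named in the abstract above, the following is a minimal sketch of on-policy TD(0) with linear function approximation. The two-state reward process, the one-hot feature map, the step-size schedule, and the names `td0_linear` and `demo` are all hypothetical choices made for illustration; TDC and the off-policy setting are not covered.

```python
import random


def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def td0_linear(transitions, features, gamma, dim, lr0=0.5, decay=0.7):
    """TD(0) with value estimate V(s) = <w, features[s]>.

    `transitions` yields (state, reward, next_state) tuples generated by
    the policy being evaluated (on-policy setting).
    """
    w = [0.0] * dim
    for n, (s, r, s_next) in enumerate(transitions, start=1):
        phi, phi_next = features[s], features[s_next]
        td_error = r + gamma * dot(w, phi_next) - dot(w, phi)
        step = lr0 * n ** (-decay)
        w = [wi + step * td_error * pi for wi, pi in zip(w, phi)]
    return w


def demo(n_steps=50000, gamma=0.5, seed=0):
    """Toy chain: the next state is uniform on {0, 1}; reward is 1 in state 0.

    Solving the Bellman equations by hand gives V(0) = 1.5 and V(1) = 0.5.
    """
    rng = random.Random(seed)
    features = [[1.0, 0.0], [0.0, 1.0]]  # one-hot features (a special linear case)

    def transitions():
        s = 0
        for _ in range(n_steps):
            s_next = rng.randrange(2)
            reward = 1.0 if s == 0 else 0.0
            yield s, reward, s_next
            s = s_next

    return td0_linear(transitions(), features, gamma, dim=2)


if __name__ == "__main__":
    w = demo()
    print(f"estimated values: V(0) = {w[0]:.3f}, V(1) = {w[1]:.3f}")
```

With one-hot features the weight vector coincides with the value function itself, which makes the toy example easy to check by hand; a lower-dimensional feature map would instead converge to the best linear coefficients in the sense studied by the paper.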

arXiv:2303.16599 [pdf, other]

Difference-based covariance matrix estimate in time series nonparametric regression with applications to specification tests

Authors: Lujia Bai, Weichi Wu

Abstract: Long-run covariance matrix estimation is the building block of time series inference. The corresponding difference-based estimator, which avoids detrending, has attracted considerable interest due to its robustness to both smooth and abrupt structural breaks and its competitive finite sample performance. However, existing methods mainly focus on estimators for the univariate process while their direct and multivariate extensions for most linear models are asymptotically biased. We propose a novel difference-based and debiased long-run covariance matrix estimator for functional linear models with time-varying regression coefficients, allowing time series non-stationarity, long-range dependence, state-heteroscedasticity and their mixtures. We apply the new estimator to (i) the structural stability test, overcoming the notorious non-monotonic power phenomena caused by piecewise smooth alternatives for regression coefficients, and (ii) the nonparametric residual-based tests for long memory, improving the performance via the residual-free formula of the proposed estimator. The effectiveness of the proposed method is justified theoretically and demonstrated by superior performance in simulation studies, while its usefulness is elaborated via real data analysis. Our method is implemented in the R package mlrv.

Submitted 28 February, 2024; v1 submitted 29 March, 2023; originally announced March 2023.

Comments: arXiv admin note: text overlap with arXiv:2110.08089

arXiv:2303.10117 [pdf, other]

Estimation of Grouped Time-Varying Network Vector Autoregression Models

Authors: Degui Li, Bin Peng, Songqiao Tang, Weibiao Wu

Abstract: This paper introduces a flexible time-varying network vector autoregressive model framework for large-scale time series. A latent group structure is imposed on the heterogeneous and node-specific time-varying momentum and network spillover effects so that the number of unknown time-varying coefficients to be estimated can be reduced considerably. A classic agglomerative clustering algorithm with nonparametrically estimated distance matrix is combined with a ratio criterion to consistently estimate the latent group number and membership. A post-grouping local linear smoothing method is proposed to estimate the group-specific time-varying momentum and network effects, substantially improving the convergence rates of the preliminary estimates which ignore the latent structure. We further modify the methodology and theory to allow for structural breaks in either the group membership, group number or group-specific coefficient functions. Numerical studies including Monte-Carlo simulation and an empirical application are presented to examine the finite-sample performance of the developed model and methodology.

Submitted 10 March, 2024; v1 submitted 17 March, 2023; originally announced March 2023.

arXiv:2302.05158 [pdf, other]

Time-varying correlation network analysis of non-stationary multivariate time series with complex trends

Authors: Lujia Bai, Weichi Wu

Abstract: This paper proposes a flexible framework for inferring large-scale time-varying and time-lagged correlation networks from multivariate or high-dimensional non-stationary time series with piecewise smooth trends. Built on a novel and unified multiple-testing procedure of time-lagged cross-correlation functions with a fixed or diverging number of lags, our method can accurately disclose flexible time-varying network structures associated with complex functional structures at all time points. We broaden the applicability of our method to the structure breaks by developing difference-based nonparametric estimators of cross-correlations, achieve accurate family-wise error control via a bootstrap-assisted procedure adaptive to the complex temporal dynamics, and enhance the probability of recovering the time-varying network structures using a new uniform variance reduction technique. We prove the asymptotic validity of the proposed method and demonstrate its effectiveness in finite samples through simulation studies and empirical applications.

Submitted 10 February, 2023; originally announced February 2023.

arXiv:2301.04209 [pdf, other]

High Dimensional Analysis of Variance in Multivariate Linear Regression

Authors: Zhipeng Lou, Xianyang Zhang, Wei Biao Wu

Abstract: In this paper, we develop a systematic theory for high dimensional analysis of variance in multivariate linear regression, where the dimension and the number of coefficients can both grow with the sample size. We propose a new U type test statistic to test linear hypotheses and establish a high dimensional Gaussian approximation result under fairly mild moment assumptions. Our general framework and theory can be applied to deal with the classical one-way multivariate ANOVA and the nonparametric one-way MANOVA in high dimensions. To implement the test procedure in practice, we introduce a sample-splitting based estimator of the second moment of the error covariance and discuss its properties. A simulation study shows that our proposed test outperforms some existing tests in various settings.

Submitted 10 January, 2023; originally announced January 2023.

arXiv:2210.13550 [pdf, other]

doi 10.1007/s10463-023-00895-1

Regularized Nonlinear Regression with Dependent Errors and its Application to a Biomechanical Model

Authors: Hojun You, Kyubaek Yoon, Wei-Ying Wu, Jongeun Choi, Chae Young Lim

Abstract: A biomechanical model often requires parameter estimation and selection in a known but complicated nonlinear function. Motivated by the observation that data from a head-neck position tracking system, one such biomechanical model, show multiplicative time dependent errors, we develop a modified penalized weighted least squares estimator. The proposed method can also be applied to a model with non-zero mean time dependent additive errors. Asymptotic properties of the proposed estimator are investigated under mild conditions on a weight matrix and the error process. A simulation study demonstrates that the proposed estimation works well in both parameter estimation and selection with time dependent error. The analysis and comparison with an existing method for head-neck position tracking data show better performance of the proposed method in terms of the variance accounted for (VAF).

Submitted 11 October, 2023; v1 submitted 24 October, 2022; originally announced October 2022.

Comments: The article revised in overall

Journal ref: Annals of the Institute of Statistical Mathematics, 2024

arXiv:2209.00181 [pdf, other]

Understanding the dynamic impact of COVID-19 through competing risk modeling with bivariate varying coefficients

Authors: Wenbo Wu, John D. Kalbfleisch, Jeremy M. G. Taylor, Jian Kang, Kevin He

Abstract: The coronavirus disease 2019 (COVID-19) pandemic has exerted a profound impact on patients with end-stage renal disease relying on kidney dialysis to sustain their lives. Motivated by a request by the U.S. Centers for Medicare & Medicaid Services, our analysis of their postdischarge hospital readmissions and deaths in 2020 revealed that the COVID-19 effect has varied significantly with postdischarge time and time since the onset of the pandemic. However, the complex dynamics of the COVID-19 effect trajectories cannot be characterized by existing varying coefficient models. To address this issue, we propose a bivariate varying coefficient model for competing risks within a cause-specific hazard framework, where tensor-product B-splines are used to estimate the surface of the COVID-19 effect. An efficient proximal Newton algorithm is developed to facilitate the fitting of the new model to the massive Medicare data for dialysis patients. Difference-based anisotropic penalization is introduced to mitigate model overfitting and the wiggliness of the estimated trajectories; various cross-validation methods are considered in the determination of optimal tuning parameters. Hypothesis testing procedures are designed to examine whether the COVID-19 effect varies significantly with postdischarge time and the time since pandemic onset, either jointly or separately. Simulation experiments are conducted to evaluate the estimation accuracy, type I error rate, statistical power, and model selection procedures. Applications to Medicare dialysis patients demonstrate the real-world performance of the proposed methods.

Submitted 31 August, 2022; originally announced September 2022.

Comments: 40 pages, 8 figures, 1 table

arXiv:2208.13074 [pdf, other]

$\ell^2$ Inference for Change Points in High-Dimensional Time Series via a Two-Way MOSUM

Authors: Jiaqi Li, Likai Chen, Weining Wang, Wei Biao Wu

Abstract: We propose an inference method for detecting multiple change points in high-dimensional time series, targeting dense or spatially clustered signals. Our method aggregates moving sum (MOSUM) statistics cross-sectionally by an $\ell^2$-norm and maximizes them over time. We further introduce a novel Two-Way MOSUM, which utilizes spatial-temporal moving regions to search for breaks, with the added advantage of enhancing testing power when breaks occur in only a few groups. The limiting distribution of an $\ell^2$-aggregated statistic is established for testing break existence by extending a high-dimensional Gaussian approximation theorem to spatial-temporal non-stationary processes. Simulation studies exhibit promising performance of our test in detecting non-sparse weak signals. Two applications, analyzing equity returns and COVID-19 cases in the United States, showcase the real-world relevance of our proposed algorithms.

Submitted 3 July, 2023; v1 submitted 27 August, 2022; originally announced August 2022.

Comments: 111 pages, 10 figures

arXiv:2207.05195 [pdf, other]

Collaborative Uncertainty Benefits Multi-Agent Multi-Modal Trajectory Forecasting

Authors: Bohan Tang, Yiqi Zhong, Chenxin Xu, Wei-Tao Wu, Ulrich Neumann, Yanfeng Wang, Ya Zhang, Siheng Chen

Abstract: In multi-modal multi-agent trajectory forecasting, two major challenges have not been fully tackled: 1) how to measure the uncertainty brought by the interaction module that causes correlations among the predicted trajectories of multiple agents; 2) how to rank the multiple predictions and select the optimal predicted trajectory. In order to handle these challenges, this work first proposes a novel concept, collaborative uncertainty (CU), which models the uncertainty resulting from interaction modules. Then we build a general CU-aware regression framework with an original permutation-equivariant uncertainty estimator to do both tasks of regression and uncertainty estimation. Further, we apply the proposed framework to current SOTA multi-agent multi-modal forecasting systems as a plugin module, which enables the SOTA systems to 1) estimate the uncertainty in the multi-agent multi-modal trajectory forecasting task; 2) rank the multiple predictions and select the optimal one based on the estimated uncertainty. We conduct extensive experiments on a synthetic dataset and two public large-scale multi-agent trajectory forecasting benchmarks. Experiments show that: 1) on the synthetic dataset, the CU-aware regression framework allows the model to appropriately approximate the ground-truth Laplace distribution; 2) on the multi-agent trajectory forecasting benchmarks, the CU-aware regression framework steadily helps SOTA systems improve their performances. Specially, the proposed framework helps VectorNet improve by 262 cm regarding the Final Displacement Error of the chosen optimal prediction on the nuScenes dataset; 3) for multi-agent multi-modal trajectory forecasting systems, prediction uncertainty is positively correlated with future stochasticity; and 4) the estimated CU values are highly related to the interactive information among agents.

Submitted 11 July, 2022; originally announced July 2022.

Comments: arXiv admin note: text overlap with arXiv:2110.13947

arXiv:2205.04341 [pdf, other]

Asymptotic comparison of identifying constraints for Bradley-Terry models

Authors: Weichen Wu, Brian W. Junker, Nynke M. D. Niezink

Abstract: The Bradley-Terry model is widely used for pairwise comparison data analysis. In this paper, we analyze the asymptotic behavior of the maximum likelihood estimator of the Bradley-Terry model in its logistic parameterization, under a general class of linear identifiability constraints. We show that the constraint requiring the Bradley-Terry scores for all compared objects to sum to zero minimizes the sum of the variances of the estimated scores, and recommend using this constraint in practice.

Submitted 9 May, 2022; originally announced May 2022.

arXiv:2203.14810 [pdf, other]

Data-Driven, Soft Alignment of Functional Data Using Shapes and Landmarks

Authors: Xiaoyang Guo, Wei Wu, Anuj Srivastava

Abstract: Alignment or registration of functions is a fundamental problem in statistical analysis of functions and shapes. While there are several approaches available, a more recent approach based on Fisher-Rao metric and square-root velocity functions (SRVFs) has been shown to have good performance. However, this SRVF method has two limitations: (1) it is susceptible to over alignment, i.e., alignment of noise as well as the signal, and (2) in case there is additional information in form of landmarks, the original formulation does not prescribe a way to incorporate that information. In this paper we propose an extension that allows for incorporation of landmark information to seek a compromise between matching curves and landmarks. This results in a soft landmark alignment that pushes landmarks closer, without requiring their exact overlays, to find a compromise between contributions from functions and landmarks. The proposed method is demonstrated to be superior in certain practical scenarios.

Submitted 9 April, 2022; v1 submitted 22 March, 2022; originally announced March 2022.

arXiv:2203.04454 [pdf, ps, other]

Statistical Depth for Point Process via the Isometric Log-Ratio Transformation

Authors: Xinyu Zhou, Yijia Ma, Wei Wu

Abstract: Statistical depth, a useful tool to measure the center-outward rank of multivariate and functional data, is still under-explored in temporal point processes. Recent studies on point process depth proposed a weighted product of two terms - one indicates the depth of the cardinality of the process, and the other characterizes the conditional depth of the temporal events given the cardinality. The second term is of great challenge because of the apparent nonlinear structure of event times, and so far only basic parametric representations such as Gaussian and Dirichlet densities were adopted in the definitions. However, these simplified forms ignore the underlying distribution of the process events, which makes the methods difficult to interpret and to apply to complicated patterns. To deal with these problems, we in this paper propose a distribution-based approach to the conditional depth via the well-known Isometric Log-Ratio (ILR) transformation on the inter-event times. The new depth, called the ILR depth, is at first defined for homogeneous Poisson process by using the density function on the transformed space. The definition is then extended to any general point process via a time-rescaling transformation. We illustrate the ILR depth using simulations of Poisson and non-Poisson processes and demonstrate its superiority over previous methods. We also thoroughly examine its mathematical properties and asymptotics in large samples. Finally, we apply the ILR depth in a real dataset and the result clearly shows the effectiveness of the new method.

Submitted 8 March, 2022; originally announced March 2022.

arXiv:2201.09970 [pdf, ps, other]

A Stochastic Process Model for Time Warping Functions

Authors: Yijia Ma, Xinyu Zhou, Wei Wu

Abstract: Time warping function provides a mathematical representation to measure phase variability in functional data. Recent studies have developed various approaches to estimate optimal warping between functions and provide non-Euclidean models. However, a principled, linear, generative model on time warping functions is still under-explored. This is a highly challenging problem because the space of warping functions is non-linear with the conventional Euclidean metric. To address this problem, we propose a stochastic process model for time warping functions, where the key is to define a linear, inner-product structure on the time warping space and then transform the warping functions into a sub-space of the $\mathbb L^2$ Euclidean space. With certain constraints on the warping functions, this transformation is an isometric isomorphism. In the transformed space, we adopt the $\mathbb L^2$ basis in the Hilbert space for representation. This new framework can easily build generative model on time warping by using different types of stochastic process. It can also be used to conduct statistical inferences such as functional PCA, functional ANOVA, and functional regressions. Furthermore, we demonstrate the effectiveness of this new framework by using it as a new prior in the Bayesian registration, and propose an efficient gradient method to address the important maximum a posteriori estimation. We illustrate the new Bayesian method using simulations which properly characterize nonuniform and correlated constraints in the time domain. Finally, we apply the new framework to the famous Berkeley growth data and obtain reasonable results on modeling, resampling, group comparison, and classification analysis.

Submitted 13 April, 2022; v1 submitted 24 January, 2022; originally announced January 2022.

arXiv:2110.14177 [pdf, other]

Federated Linear Contextual Bandits

Authors: Ruiquan Huang, Weiqiang Wu, Jing Yang, Cong Shen

Abstract: This paper presents a novel federated linear contextual bandits model, where individual clients face different $K$-armed stochastic bandits coupled through common global parameters. By leveraging the geometric structure of the linear rewards, a collaborative algorithm called Fed-PE is proposed to cope with the heterogeneity across clients without exchanging local feature vectors or raw data. Fed-PE relies on a novel multi-client G-optimal design, and achieves near-optimal regrets for both disjoint and shared parameter cases with logarithmic communication costs. In addition, a new concept called collinearly-dependent policies is introduced, based on which a tight minimax regret lower bound for the disjoint parameter case is derived. Experiments demonstrate the effectiveness of the proposed algorithms on both synthetic and real-world datasets.

Submitted 27 October, 2021; originally announced October 2021.

arXiv:2107.02043 [pdf]

An extended watershed-based zonal statistical AHP model for flood risk estimation: Constraining runoff converging related indicators by sub-watersheds

Authors: Hongping Zhang, Zhenfeng Shao, Jinqi Zhao, Xiao Huang, Jie Yang, Bin Hu, Wenfu Wu

Abstract: Floods are highly uncertain events, occurring in different regions, with varying prerequisites and intensities. A highly reliable flood disaster risk map can help reduce the impact of floods for flood management, disaster decreasing, and urbanization resilience. In flood risk estimation, the widely used analytic hierarchy process (AHP) usually adopts pixel as a basic unit, so it cannot capture the similar threat caused by neighborhood source flooding cells at sub-watershed scale. Thus, an extended watershed-based zonal statistical AHP model constraining runoff converging related indicators by sub-watersheds (WZSAHP-Slope & Stream) is proposed to fill this gap. Taking the Chaohu basin as test case, we validated the proposed method with a real-flood area extracted in July 2020. The results indicate that the WZSAHP-Slope & Stream model using multiple flow direction division watersheds to calculate statistics of distance from stream and slope by maximum statistic method outperformed other tested methods. Compared with pixel-based AHP method, the proposed method can improve the correct ratio by 16% (from 67% to 83%) and fit ratio by 1% (from 13% to 14%) as in validation 1, and improve the correct ratio by 37% (from 23% to 60%) and fit ratio by 6% (from 12% to 18%) as in validation 2.

Submitted 5 July, 2021; originally announced July 2021.

Comments: This paper is a research paper, it contains 40 pages and 8 figures. This paper is a modest contribution to the ongoing discussions the accuracy of flood risk estimation via AHP model improved by adopting pixels replaced with sub-watersheds as basic unit

MSC Class: 86A05; ACM Class: H.1

arXiv:2105.08893 [pdf, ps, other]

A unified framework on defining depth for point process using function smoothing

Authors: Zishen Xu, Chenran Wang, Wei Wu

Abstract: The notion of statistical depth has been extensively studied in multivariate and functional data over the past few decades. In contrast, the depth on temporal point process is still under-explored. The problem is challenging because a point process has two types of randomness: 1) the number of events in a process, and 2) the distribution of these events. Recent studies proposed depths in a weighted product of two terms, describing the above two types of randomness, respectively. In this paper, we propose to unify these two randomnesses under one framework by a smoothing procedure. Basically, we transform the point process observations into functions using conventional kernel smoothing methods, and then adopt the well-known functional $h$-depth and its modified, center-based, version to describe the center-outward rank in the original data. To do so, we define a proper metric on the point processes with smoothed functions. We then propose an efficient algorithm to estimate the defined "center". We further explore the mathematical properties of the newly defined depths and study asymptotics. Simulation results show that the proposed depths can properly rank the point process observations. Finally, we demonstrate the new method in a classification task using a real neuronal spike train dataset.

Submitted 20 May, 2021; v1 submitted 18 May, 2021; originally announced May 2021.

arXiv:2104.14525 [pdf, other]

Testing and estimation of clustered signals

Authors: Hongyuan Cao, Wei Biao Wu

Abstract: We propose a change-point detection method for large scale multiple testing problems with data having clustered signals. Unlike the classic change-point setup, the signals can vary in size within a cluster. The clustering structure on the signals enables us to effectively delineate the boundaries between signal and non-signal segments. New test statistics are proposed for observations from one and/or multiple realizations. Their asymptotic distributions are derived. We also study the associated variance estimation problem. We allow the variances to be heteroscedastic in the multiple realization case, which substantially expands the applicability of the proposed method. Simulation studies demonstrate that the proposed approach has a favorable performance. Our procedure is applied to an array based Comparative Genomic Hybridization (aCGH) dataset.

Submitted 29 April, 2021; originally announced April 2021.

Report number: BEJ2007-047

Journal ref: Bernoulli, 2021

arXiv:2104.11426 [pdf, other]

doi 10.1016/j.engappai.2022.104974

Regularized Nonlinear Regression for Simultaneously Selecting and Estimating Key Model Parameters

Authors: Kyubaek Yoon, Hojun You, Wei-Ying Wu, Chae Young Lim, Jongeun Choi, Connor Boss, Ahmed Ramadan, John M. Popovich Jr., Jacek Cholewicki, N. Peter Reeves, Clark J. Radcliffe

Abstract: In system identification, estimating parameters of a model using limited observations results in poor identifiability. To cope with this issue, we propose a new method to simultaneously select and estimate sensitive parameters as key model parameters and fix the remaining parameters to a set of typical values. Our method is formulated as a nonlinear least squares estimator with L1-regularization on the deviation of parameters from a set of typical values. First, we provide consistency and oracle properties of the proposed estimator as a theoretical foundation. Second, we provide a novel approach based on Levenberg-Marquardt optimization to numerically find the solution to the formulated problem. Third, to show the effectiveness, we present an application identifying a biomechanical parametric model of a head position tracking task for 10 human subjects from limited data. In a simulation study, the variances of estimated parameters are decreased by 96.1% as compared to that of the estimated parameters without L1-regularization. In an experimental study, our method improves the model interpretation by reducing the number of parameters to be estimated while maintaining variance accounted for (VAF) at above 82.5%. Moreover, the variances of estimated parameters are reduced by 71.1% as compared to that of the estimated parameters without L1-regularization. Our method is 54 times faster than the standard simplex-based optimization to solve the regularized nonlinear regression.

Submitted 2 June, 2022; v1 submitted 23 April, 2021; originally announced April 2021.

Comments: 13 pages, 4 figures, 2 Tables

arXiv:2104.01114 [pdf, other]

The general conformable fractional grey system model and its applications

Authors: Wanli Xie, Mingyong Pang, Wen-Ze Wu, Chong Liu, Caixia Liu

Abstract: Grey system theory is an important mathematical tool for describing uncertain information in the real world. It has been used to solve the uncertainty problems specially caused by lack of information. As a novel theory, the theory can deal with various fields and plays an important role in modeling the small sample problems. But many modeling mechanisms of grey system need to be answered, such as why grey accumulation can be successfully applied to grey prediction model? What is the key role of grey accumulation? Some scholars have already given answers to a certain extent. In this paper, we explain the role from the perspective of complex networks. Further, we propose generalized conformable accumulation and difference, and clarify its physical meaning in the grey model. We use our newly proposed fractional accumulation and difference to our generalized conformable fractional grey model, or GCFGM(1,1), and employ practical cases to verify that GCFGM(1,1) has higher accuracy compared to traditional models.

Submitted 14 July, 2021; v1 submitted 28 March, 2021; originally announced April 2021.

arXiv:2103.07626 [pdf, other]

Helmholtzian Eigenmap: Topological feature discovery & edge flow learning from point cloud data

Authors: Yu-Chia Chen, Weicheng Wu, Marina Meilă, Ioannis G. Kevrekidis

Abstract: The manifold Helmholtzian (1-Laplacian) operator $Δ_1$ elegantly generalizes the Laplace-Beltrami operator to vector fields on a manifold $\mathcal M$. In this work, we propose the estimation of the manifold Helmholtzian from point cloud data by a weighted 1-Laplacian $\mathcal L_1$. While higher order Laplacians have been introduced and studied, this work is the first to present a graph Helmholtzian constructed from a simplicial complex as a consistent estimator for the continuous operator in a non-parametric setting. Equipped with the geometric and topological information about $\mathcal M$, the Helmholtzian is a useful tool for the analysis of flows and vector fields on $\mathcal M$ via the Helmholtz-Hodge theorem. In addition, the $\mathcal L_1$ allows the smoothing, prediction, and feature extraction of the flows. We demonstrate these possibilities on substantial sets of synthetic and real point cloud datasets with non-trivial topological structures; and provide theoretical results on the limit of $\mathcal L_1$ to $Δ_1$.

Submitted 31 October, 2023; v1 submitted 13 March, 2021; originally announced March 2021.

arXiv:2012.14708 [pdf, ps, other]

Adaptive Estimation for Non-stationary Factor Models And A Test for Static Factor Loadings

Authors: Weichi Wu, Zhou Zhou

Abstract: This paper considers the estimation and testing of a class of locally stationary time series factor models with evolutionary temporal dynamics. In particular, the entries and the dimension of the factor loading matrix are allowed to vary with time while the factors and the idiosyncratic noise components are locally stationary. We propose an adaptive sieve estimator for the span of the varying loading matrix and the locally stationary factor processes. A uniformly consistent estimator of the effective number of factors is investigated via eigenanalysis of a non-negative definite time-varying matrix. A possibly high-dimensional bootstrap-assisted test for the hypothesis of static factor loadings is proposed by comparing the kernels of the covariance matrices of the whole time series with their local counterparts. We examine our estimator and test via simulation studies and real data analysis. Finally, all our results hold at the following popular but distinct assumptions: (a) the white noise idiosyncratic errors with either fixed or diverging dimension, and (b) the correlated idiosyncratic errors with diverging dimension.

Submitted 3 February, 2024; v1 submitted 29 December, 2020; originally announced December 2020.

arXiv:2012.08223 [pdf, other]

Long-term prediction intervals with many covariates

Authors: Sayar Karmakar, Marek Chudy, Wei Biao Wu

Abstract: Accurate forecasting is one of the fundamental focus in the literature of econometric time-series. Often practitioners and policy makers want to predict outcomes of an entire time horizon in the future instead of just a single $k$-step ahead prediction. These series, apart from their own possible non-linear dependence, are often also influenced by many external predictors. In this paper, we construct prediction intervals of time-aggregated forecasts in a high-dimensional regression setting. Our approach is based on quantiles of residuals obtained by the popular LASSO routine. We allow for general heavy-tailed, long-memory, and nonlinear stationary error process and stochastic predictors. Through a series of systematically arranged consistency results we provide theoretical guarantees of our proposed quantile-based method in all of these scenarios. After validating our approach using simulations we also propose a novel bootstrap based method that can boost the coverage of the theoretical intervals. Finally analyzing the EPEX Spot data, we construct prediction intervals for hourly electricity prices over horizons spanning 17 weeks and contrast them to selected Bayesian and bootstrap interval forecasts.

Submitted 30 September, 2021; v1 submitted 15 December, 2020; originally announced December 2020.

arXiv:2011.06333 [pdf, other]

Shift identification in time varying regression quantiles

Authors: Subhra Sankar Dhar, Weichi Wu

Abstract: This article investigates whether time-varying quantile regression curves are the same up to the horizontal shift or not. The errors and the covariates involved in the regression model are allowed to be locally stationary. We formalize this issue in a corresponding non-parametric hypothesis testing problem, and develop an integrated-squared-norm based test (SIT) as well as a simultaneous confidence band (SCB) approach. The asymptotic properties of SIT and SCB under null and local alternatives are derived. Moreover, the asymptotic properties of these tests are also studied when the compared data sets are dependent. We then propose valid wild bootstrap algorithms to implement SIT and SCB. Furthermore, the usefulness of the proposed methodology is illustrated via analysing simulated and real data related to COVID-19 outbreak and climate science.

Submitted 24 December, 2021; v1 submitted 12 November, 2020; originally announced November 2020.

arXiv:2010.03130 [pdf]

Computational analysis of pathological image enables interpretable prediction for microsatellite instability

Authors: ** Zhu, Wangwei Wu, Yuting Zhang, Shiyun Lin, Yukang Jiang, Ruixian Liu, Xueqin Wang

Abstract: Microsatellite instability (MSI) is associated with several tumor types and its status has become increasingly vital in guiding patient treatment decisions. However, in clinical practice, distinguishing MSI from its counterpart is challenging since the diagnosis of MSI requires additional genetic or immunohistochemical tests. In this study, interpretable pathological image analysis strategies are established to help medical experts automatically identify MSI. The strategies only require ubiquitous haematoxylin and eosin-stained whole-slide images and can achieve decent performance in the three cohorts collected from The Cancer Genome Atlas. The strategies provide interpretability in two aspects. On the one hand, the image-level interpretability is achieved by generating localization heat maps of important regions based on the deep learning network; on the other hand, the feature-level interpretability is attained through feature importance and pathological feature interaction analysis. More interestingly, from both the image-level and feature-level interpretability, color features and texture characteristics are shown to contribute the most to the MSI predictions. Therefore, the classification models under the proposed strategies can not only serve as an efficient tool for predicting the MSI status of patients, but also provide more insights to pathologists with clinical understanding.

Submitted 6 October, 2020; originally announced October 2020.

arXiv:2008.09667 [pdf, other]

A Blockchain Transaction Graph based Machine Learning Method for Bitcoin Price Prediction

Authors: Xiao Li, Weili Wu

Abstract: Bitcoin, as one of the most popular cryptocurrencies, has recently attracted much attention from investors. The Bitcoin price prediction task is consequently a rising academic topic for providing valuable insights and suggestions. Existing Bitcoin prediction works are mostly based on trivial feature engineering, which manually designs features or factors from multiple areas, including Bitcoin blockchain information, finance, and social media sentiment. The feature engineering not only requires much human effort, but the effectiveness of the intuitively designed features cannot be guaranteed. In this paper, we aim to mine the abundant patterns encoded in Bitcoin transactions, and propose the k-order transaction graph to reveal patterns at different scopes. We propose transaction-graph-based features to automatically encode the patterns. A novel prediction method is proposed to accept the features and make price predictions, which can take advantage of particular patterns from different history periods. The results of comparison experiments demonstrate that the proposed method outperforms the most recent state-of-the-art methods.

Submitted 21 August, 2020; originally announced August 2020.

arXiv:2007.14365 [pdf, other]

Tractably Modelling Dependence in Networks Beyond Exchangeability

Authors: Weichi Wu, Sofia Olhede, Patrick Wolfe

Abstract: We propose a general framework for modelling network data that is designed to describe aspects of non-exchangeable networks. Conditional on latent (unobserved) variables, the edges of the network are generated by their finite growth history (with latent orders) while the marginal probabilities of the adjacency matrix are modeled by a generalization of a graph limit function (or a graphon). In particular, we study the estimation, clustering and degree behavior of the network in our setting. We determine (i) the minimax estimator of a composite graphon with respect to squared error loss; (ii) that spectral clustering is able to consistently detect the latent membership when the block-wise constant composite graphon is considered under additional conditions; and (iii) that we are able to construct models with heavy-tailed empirical degrees under specific scenarios and parameter choices. This explores why and under which general conditions non-exchangeable network data can be described by a stochastic block model. The new modelling framework is able to capture empirically important characteristics of network data such as sparsity combined with a heavy-tailed degree distribution, and adds understanding as to what generative mechanisms will make them arise. Keywords: statistical network analysis, exchangeable arrays, stochastic block model, nonlinear stochastic processes.

Submitted 28 July, 2020; originally announced July 2020.

MSC Class: 62G05; 62R07; 62E20; 62G20; secondary 53C20

arXiv:2006.08828 [pdf, other]

doi 10.1109/TAI.2021.3065011

Explainable AI for a No-Teardown Vehicle Component Cost Estimation: A Top-Down Approach

Authors: Ayman Moawad, Ehsan Islam, Namdoo Kim, Ram Vijayagopal, Aymeric Rousseau, Wei Biao Wu

Abstract: The broader ambition of this article is to popularize an approach for the fair distribution of the quantity of a system's output to its subsystems, while allowing for underlying complex subsystem level interactions. Particularly, we present a data-driven approach to vehicle price modeling and its component price estimation by leveraging a combination of concepts from machine learning and game theory. We show an alternative to common teardown methodologies and surveying approaches for component and vehicle price estimation at the manufacturer's suggested retail price (MSRP) level that has the advantage of bypassing the uncertainties involved in 1) the gathering of teardown data, 2) the need to perform expensive and biased surveying, and 3) the need to perform retail price equivalent (RPE) or indirect cost multiplier (ICM) adjustments to mark up direct manufacturing costs to MSRP. This novel exercise not only provides accurate pricing of the technologies at the customer level, but also shows the large, a priori known gaps in pricing strategies between manufacturers, vehicle sizes, classes, market segments, and other factors. There is also clear synergism or interaction between the price of certain technologies and other specifications present in the same vehicle. Those (unsurprising) results are an indication that old methods of manufacturer-level component costing, aggregation, and the application of a flat and rigid RPE or ICM adjustment factor should be carefully examined. The findings are based on an extensive database, developed by Argonne National Laboratory, that includes more than 64,000 vehicles covering MY1990 to MY2020 over hundreds of vehicle specs.

Submitted 15 June, 2020; originally announced June 2020.

Comments: 17 pages, 18 figures

Journal ref: IEEE Transactions on Artificial Intelligence (Volume: 2, Issue: 2, April 2021, Page(s): 185 - 199)

arXiv:2005.05117 [pdf, other]

Nearest Neighbor Classifiers over Incomplete Information: From Certain Answers to Certain Predictions

Authors: Bojan Karlaš, Peng Li, Renzhi Wu, Nezihe Merve Gürel, Xu Chu, Wentao Wu, Ce Zhang

Abstract: Machine learning (ML) applications have been thriving recently, largely attributed to the increasing availability of data. However, inconsistency and incomplete information are ubiquitous in real-world datasets, and their impact on ML applications remains elusive. In this paper, we present a formal study of this impact by extending the notion of Certain Answers for Codd tables, which has been explored by the database research community for decades, into the field of machine learning. Specifically, we focus on classification problems and propose the notion of "Certain Predictions" (CP) -- a test data example can be certainly predicted (CP'ed) if all possible classifiers trained on top of all possible worlds induced by the incompleteness of data would yield the same prediction. We study two fundamental CP queries: (Q1) a checking query that determines whether a data example can be CP'ed; and (Q2) a counting query that computes the number of classifiers that support a particular prediction (i.e., label). Given that general solutions to CP queries are, not surprisingly, hard without assumptions on the type of classifier, we further present a case study in the context of nearest neighbor (NN) classifiers, where efficient solutions to CP queries can be developed -- we show that it is possible to answer both queries in linear or polynomial time over exponentially many possible worlds. We demonstrate one example use case of CP in the important application of "data cleaning for machine learning (DC for ML)." We show that our proposed CPClean approach, built on CP, can often significantly outperform existing techniques in terms of classification accuracy with mild manual cleaning effort.

Submitted 12 May, 2020; v1 submitted 11 May, 2020; originally announced May 2020.

arXiv:2005.00397 [pdf, other]

doi 10.1016/j.jbi.2020.103547

Multi-View Self-Attention for Interpretable Drug-Target Interaction Prediction

Authors: Brighter Agyemang, Wei-** Wu, Michael Yelpengne Kpiebaareh, Zhihua Lei, Ebenezer Nanor, Lei Chen

Abstract: The drug discovery stage is a vital aspect of the drug development process and forms part of the initial stages of the development pipeline. In recent times, machine learning-based methods are actively being used to model drug-target interactions for rational drug discovery due to the successful application of these methods in other domains. In machine learning approaches, the numerical representation of molecules is critical to the performance of the model. While significant progress has been made in molecular representation engineering, this has resulted in several descriptors for both targets and compounds. Also, the interpretability of model predictions is a vital feature that could have several pharmacological applications. In this study, we propose a self-attention-based multi-view representation learning approach for modeling drug-target interactions. We evaluated our approach using three benchmark kinase datasets and compared the proposed method to some baseline models. Our experimental results demonstrate the ability of our method to achieve competitive prediction performance and offer biologically plausible drug-target interaction interpretations.

Submitted 23 August, 2020; v1 submitted 1 May, 2020; originally announced May 2020.

arXiv:2003.12628 [pdf, other]

MCFlow: Monte Carlo Flow Models for Data Imputation

Authors: Trevor W. Richardson, Wencheng Wu, Lei Lin, Beilei Xu, Edgar A. Bernal

Abstract: We consider the topic of data imputation, a foundational task in machine learning that addresses issues with missing data. To that end, we propose MCFlow, a deep framework for imputation that leverages normalizing flow generative models and Monte Carlo sampling. We address the causality dilemma that arises when training models with incomplete data by introducing an iterative learning scheme which alternately updates the density estimate and the values of the missing entries in the training data. We provide extensive empirical validation of the effectiveness of the proposed method on standard multivariate and image datasets, and benchmark its performance against state-of-the-art alternatives. We demonstrate that MCFlow is superior to competing methods in terms of the quality of the imputed data, as well as with regard to its ability to preserve the semantic structure of the data.

Submitted 27 March, 2020; originally announced March 2020.

Journal ref: 2020 Computer Vision and Pattern Recognition (CVPR)

arXiv:2003.09902 [pdf, other]

doi 10.1109/TKDE.2020.3033829

K-Core based Temporal Graph Convolutional Network for Dynamic Graphs

Authors: **gxin Liu, Chang Xu, Chang Yin, Weiqiang Wu, You Song

Abstract: Graph representation learning is a fundamental task in various applications that strives to learn low-dimensional embeddings for nodes that can preserve graph topology information. However, many existing methods focus on static graphs while ignoring evolving graph patterns. Inspired by the success of graph convolutional networks (GCNs) in static graph embedding, we propose a novel k-core based temporal graph convolutional network, the CTGCN, to learn node representations for dynamic graphs. In contrast to previous dynamic graph embedding methods, CTGCN can preserve both local connective proximity and global structural similarity while simultaneously capturing graph dynamics. In the proposed framework, the traditional graph convolution is generalized into two phases, feature transformation and feature aggregation, which gives the CTGCN more flexibility and enables the CTGCN to learn connective and structural information under the same framework. Experimental results on 7 real-world graphs demonstrate that the CTGCN outperforms existing state-of-the-art graph embedding methods in several tasks, including link prediction and structural role classification. The source code of this work can be obtained from https://github.com/jhljx/CTGCN.

Submitted 6 November, 2020; v1 submitted 22 March, 2020; originally announced March 2020.

arXiv:2003.02681 [pdf, other]

Stochastic Linear Contextual Bandits with Diverse Contexts

Authors: Weiqiang Wu, **g Yang, Cong Shen

Abstract: In this paper, we investigate the impact of context diversity on stochastic linear contextual bandits. As opposed to the previous view that contexts lead to more difficult bandit learning, we show that when the contexts are sufficiently diverse, the learner is able to utilize the information obtained during exploitation to shorten the exploration process, thus achieving reduced regret. We design the LinUCB-d algorithm, and propose a novel approach to analyze its regret performance. The main theoretical result is that under the diverse context assumption, the cumulative expected regret of LinUCB-d is bounded by a constant. As a by-product, our results improve the previous understanding of LinUCB and strengthen its performance guarantee.

Submitted 5 March, 2020; originally announced March 2020.

Comments: Accepted to AISTATS 2020

arXiv:2002.03979 [pdf, other]

Online Covariance Matrix Estimation in Stochastic Gradient Descent

Authors: Wanrong Zhu, Xi Chen, Wei Biao Wu

Abstract: The stochastic gradient descent (SGD) algorithm is widely used for parameter estimation, especially for huge data sets and online learning. While this recursive algorithm is popular for computation and memory efficiency, quantifying variability and randomness of the solutions has rarely been studied. This paper aims at conducting statistical inference of SGD-based estimates in an online setting. In particular, we propose a fully online estimator for the covariance matrix of averaged SGD iterates (ASGD) using only the iterates from SGD. We formally establish our online estimator's consistency and show that the convergence rate is comparable to offline counterparts. Based on the classic asymptotic normality results of ASGD, we construct asymptotically valid confidence intervals for model parameters. Upon receiving new observations, we can quickly update the covariance matrix estimate and the confidence intervals. This approach fits in an online setting and takes full advantage of SGD: efficiency in computation and memory.

Submitted 22 June, 2021; v1 submitted 10 February, 2020; originally announced February 2020.

arXiv:2001.00419 [pdf, other]

Prediction in locally stationary time series

Authors: Holger Dette, Weichi Wu

Abstract: We develop an estimator for the high-dimensional covariance matrix of a locally stationary process with a smoothly varying trend and use this statistic to derive consistent predictors in non-stationary time series. In contrast to the currently available methods for this problem, the predictor developed here does not rely on fitting an autoregressive model and does not require a vanishing trend. The finite sample properties of the new methodology are illustrated by means of a simulation study and a financial indices study.

Submitted 3 January, 2020; v1 submitted 2 January, 2020; originally announced January 2020.

arXiv:1912.09536 [pdf, other]

Data Science through the looking glass and what we found there

Authors: Fotis Psallidas, Yiwen Zhu, Bojan Karlas, Matteo Interlandi, Avrilia Floratou, Konstantinos Karanasos, Wentao Wu, Ce Zhang, Subru Krishnan, Carlo Curino, Markus Weimer

Abstract: The recent success of machine learning (ML) has led to an explosive growth both in terms of new systems and algorithms built in industry and academia, and new applications built by an ever-growing community of data science (DS) practitioners. This quickly shifting panorama of technologies and applications is challenging for builders and practitioners alike to follow. In this paper, we set out to capture this panorama through a wide-angle lens, by performing the largest analysis of DS projects to date, focusing on questions that can help determine investments on either side. Specifically, we download and analyze: (a) over 6M Python notebooks publicly available on GITHUB, (b) over 2M enterprise DS pipelines developed within COMPANYX, and (c) the source code and metadata of over 900 releases from 12 important DS libraries. The analysis we perform ranges from coarse-grained statistical characterizations to analysis of library imports, pipelines, and comparative studies across datasets and time. We report a large number of measurements for our readers to interpret, and dare to draw a few (actionable, yet subjective) conclusions on (a) what systems builders should focus on to better serve practitioners, and (b) what technologies practitioners should bet on given current trends. We plan to automate this analysis and release associated tools and results periodically.

Submitted 19 December, 2019; originally announced December 2019.

arXiv:1912.01163 [pdf, ps, other]

doi 10.1109/ICCWAMTIP47768.2019.9067510

Drug-Target Indication Prediction by Integrating End-to-End Learning and Fingerprints

Authors: Brighter Agyemang, Wei-** Wu, Michael Y. Kpiebaareh, Ebenezer Nanor

Abstract: Computer-Aided Drug Discovery research has proven to be a promising direction in drug discovery. In recent years, Deep Learning approaches have been applied to problems in the domain such as Drug-Target Interaction Prediction and have shown improvements over traditional screening methods. An existing challenge is how to represent compound-target pairs in deep learning models. While several representation methods exist, such descriptor schemes tend to complement one another in many instances, as reported in the literature. In this study, we propose a multi-view architecture trained adversarially to leverage this complementary behavior by integrating both differentiable and predefined molecular descriptors. We conduct experiments on clinically relevant benchmark datasets to demonstrate the potential of our approach.

Submitted 5 December, 2019; v1 submitted 2 December, 2019; originally announced December 2019.

Comments: Accepted at IEEE ICCWAMTIP 2019

arXiv:1910.08699 [pdf, ps, other]

doi 10.1016/j.jclepro.2019.118573

Application of a new information priority accumulated grey model with time power to predict short-term wind turbine capacity

Authors: Jie Xia, Xin Ma, Wenqing Wu, Baolian Huang, Wanpeng Li

Abstract: Wind energy makes a significant contribution to global power generation. Predicting wind turbine capacity is becoming increasingly crucial for cleaner production. For this purpose, a new information priority accumulated grey model with time power is proposed to predict short-term wind turbine capacity. Firstly, the computational formulas for the time response sequence and the prediction values are deduced by the grey modeling technique and the definite integral trapezoidal approximation formula. Secondly, an intelligent algorithm based on particle swarm optimization is applied to determine the optimal nonlinear parameters of the novel model. Thirdly, three real numerical examples are given to examine the accuracy of the new model by comparing it with six existing prediction models. Finally, based on the wind turbine capacity from 2007 to 2017, the proposed model is established to predict the total wind turbine capacity in Europe, North America, Asia, and the world. The numerical results reveal that the novel model is superior to other forecasting models. It has a great advantage for small samples with new characteristic behaviors. Besides, reasonable suggestions are put forward from the standpoint of the practitioners and governments, which have high potential to advance the sustainable improvement of clean energy production in the future.

Submitted 19 October, 2019; originally announced October 2019.

Journal ref: Journal of Cleaner Production, Volume 244, 2020, 118573

arXiv:1910.00727 [pdf, other]

Analyzing and Improving Neural Networks by Generating Semantic Counterexamples through Differentiable Rendering

Authors: Lakshya Jain, Varun Chandrasekaran, Uyeong Jang, Wilson Wu, Andrew Lee, Andy Yan, Steven Chen, Somesh Jha, Sanjit A. Seshia

Abstract: Even as deep neural networks (DNNs) have achieved remarkable success on vision-related tasks, their performance is brittle to transformations in the input. Of particular interest are semantic transformations that model changes that have a basis in the physical world, such as rotations, translations, changes in lighting or camera pose. In this paper, we show how differentiable rendering can be utilized to generate images that are informative, yet realistic, and which can be used to analyze DNN performance and improve its robustness through data augmentation. Given a differentiable renderer and a DNN, we show how to use off-the-shelf attacks from adversarial machine learning to generate semantic counterexamples -- images where semantic features are changed so as to produce misclassifications or misdetections. We validate our approach on DNNs for image classification and object detection. For classification, we show that semantic counterexamples, when used to augment the dataset, (i) improve generalization performance, (ii) enhance robustness to semantic transformations, and (iii) transfer between models. Additionally, in comparison to sampling-based semantic augmentation, our technique generates more informative data in a sample-efficient manner.

Submitted 17 July, 2020; v1 submitted 1 October, 2019; originally announced October 2019.

Showing 1–50 of 92 results for author: Wu, W