-
Learning with User-Level Local Differential Privacy
Authors:
Puning Zhao,
Li Shen,
Rongfei Fan,
Qingming Li,
Huiwen Wu,
Jiafei Wu,
Zhe Liu
Abstract:
User-level privacy is important in distributed systems. Previous research primarily focuses on the central model, while the local model has received much less attention. Under the central model, user-level DP is strictly stronger than the item-level one. However, under the local model, the relationship between user-level and item-level LDP becomes more complex, and thus the analysis is crucially different. In this paper, we first analyze the mean estimation problem and then apply the results to stochastic optimization, classification, and regression. In particular, we propose adaptive strategies to achieve optimal performance at all privacy levels. Moreover, we also obtain information-theoretic lower bounds, which show that the proposed methods are minimax optimal up to logarithmic factors. Unlike the central DP model, where user-level DP always leads to slower convergence, our result shows that under the local model, the convergence rates are nearly the same between user-level and item-level cases for distributions with bounded support. For heavy-tailed distributions, the user-level rate is even faster than the item-level one.
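As a point of reference for the mean estimation problem, the sketch below shows a simple user-level LDP baseline, not the adaptive strategy proposed in the paper. It assumes each user holds $m$ samples in $[-1, 1]$, averages them locally, and releases the average plus Laplace noise of scale $2/\epsilon$. Because the average also lies in $[-1, 1]$, replacing a user's entire dataset changes it by at most 2, which gives $\epsilon$-LDP at the user level. The function name and the synthetic data are hypothetical.

```python
import numpy as np

def user_level_ldp_mean(data, epsilon, rng):
    """Estimate the mean from n users, each holding m samples in [-1, 1].

    data: array of shape (n_users, m). Each user releases its local average
    plus Laplace noise. The local average lies in [-1, 1], so changing a
    user's whole dataset moves it by at most 2; Laplace noise with scale
    2 / epsilon therefore gives epsilon-LDP at the user level.
    """
    local_means = np.clip(data, -1.0, 1.0).mean(axis=1)
    noise = rng.laplace(loc=0.0, scale=2.0 / epsilon, size=local_means.shape)
    reports = local_means + noise      # what each user sends to the server
    return reports.mean()              # unbiased server-side estimate

rng = np.random.default_rng(0)
true_mean = 0.3
n_users, m = 20000, 10
# Hypothetical data: each sample is +1 with prob (1 + true_mean) / 2, else -1.
samples = np.where(rng.random((n_users, m)) < (1 + true_mean) / 2, 1.0, -1.0)
estimate = user_level_ldp_mean(samples, epsilon=1.0, rng=rng)
print(round(estimate, 2))
```

In this baseline the per-report noise variance $8/\epsilon^2$ does not shrink as $m$ grows, so it should not be read as reflecting the rates stated in the abstract.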
Submitted 27 May, 2024;
originally announced May 2024.
-
Landmark Alternating Diffusion
Authors:
Sing-Yuan Yeh,
Hau-Tieng Wu,
Ronen Talmon,
Mao-Pei Tsui
Abstract:
Alternating Diffusion (AD) is a commonly applied diffusion-based sensor fusion algorithm. While it has been successfully applied to various problems, its computational burden remains a limitation. Inspired by the landmark diffusion idea considered in the Robust and Scalable Embedding via Landmark Diffusion (ROSELAND), we propose a variation of AD, called Landmark AD (LAD), which captures the essence of AD while offering superior computational efficiency. We provide a series of theoretical analyses of LAD under the manifold setup and apply it to the automatic sleep stage annotation problem with two electroencephalogram channels to demonstrate its application.
Submitted 29 April, 2024;
originally announced April 2024.
-
Convergence analysis of t-SNE as a gradient flow for point cloud on a manifold
Authors:
Seonghyeon Jeong,
Hau-Tieng Wu
Abstract:
We present a theoretical foundation regarding the boundedness of the t-SNE algorithm. t-SNE employs gradient descent iterations with the Kullback-Leibler (KL) divergence as the objective function, aiming to identify, by minimizing the KL divergence, a set of points that closely resemble the original data points in a high-dimensional space. Investigating t-SNE properties such as perplexity and affinity under a weak convergence assumption on the sampled dataset, we examine the behavior of points generated by t-SNE under continuous gradient flow. After demonstrating that the points generated by t-SNE remain bounded, we leverage this insight to establish the existence of a minimizer for the KL divergence.
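To make the objective concrete, here is a minimal sketch of the KL divergence $\mathrm{KL}(P\|Q)$ used by standard t-SNE, its gradient $4\sum_j (p_{ij}-q_{ij})(1+\|y_i-y_j\|^2)^{-1}(y_i-y_j)$, and an explicit Euler discretization of the gradient flow. It assumes a symmetric affinity matrix $P$ with zero diagonal summing to 1 is already given (the perplexity calibration is replaced by a fixed Gaussian bandwidth) and omits the early exaggeration and momentum of practical implementations. Function names and data are hypothetical.

```python
import numpy as np

def tsne_kl_and_grad(P, Y):
    """KL(P || Q) and its gradient for a t-SNE embedding Y (n x d).

    P: symmetric (n x n) affinity matrix, zero diagonal, entries sum to 1.
    Q uses the Student-t (Cauchy) kernel of standard t-SNE.
    """
    diff = Y[:, None, :] - Y[None, :, :]              # pairwise y_i - y_j, (n, n, d)
    kernel = 1.0 / (1.0 + np.sum(diff ** 2, axis=-1))
    np.fill_diagonal(kernel, 0.0)
    Q = kernel / kernel.sum()
    mask = P > 0
    kl = np.sum(P[mask] * np.log(P[mask] / Q[mask]))
    # dC/dy_i = 4 * sum_j (p_ij - q_ij) * kernel_ij * (y_i - y_j)
    grad = 4.0 * np.einsum("ij,ijk->ik", (P - Q) * kernel, diff)
    return kl, grad

def tsne_gradient_flow(P, Y0, step=1.0, n_steps=500):
    """Explicit Euler steps along dY/dt = -grad KL(P || Q(Y))."""
    Y = Y0.copy()
    for _ in range(n_steps):
        _, grad = tsne_kl_and_grad(P, Y)
        Y -= step * grad
    return Y

rng = np.random.default_rng(1)
X = rng.normal(size=(30, 5))                          # hypothetical high-dim data
D = np.sum((X[:, None] - X[None]) ** 2, axis=-1)
P = np.exp(-D / 2.0)                                  # fixed bandwidth, no perplexity search
np.fill_diagonal(P, 0.0)
P = P / P.sum()                                       # symmetric, sums to 1
Y0 = 1e-2 * rng.normal(size=(30, 2))
Y = tsne_gradient_flow(P, Y0)
print(tsne_kl_and_grad(P, Y0)[0], tsne_kl_and_grad(P, Y)[0])
```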
Submitted 31 January, 2024;
originally announced January 2024.
-
Design a Metric Robust to Complicated High Dimensional Noise for Efficient Manifold Denoising
Authors:
Hau-Tieng Wu
Abstract:
In this manuscript, we propose an efficient manifold denoiser based on landmark diffusion and optimal shrinkage under a complicated high-dimensional noise and compact manifold setup. It is flexible enough to handle several setups, including a high ambient space dimension with a manifold embedding that occupies a subspace of high or low dimension, and noise that may be colored and dependent. A systematic comparison with other existing algorithms on both simulated and real datasets is provided. This manuscript is mainly algorithmic, and we report several existing tools and numerical results. Theoretical guarantees and more comparisons will be reported in the official paper accompanying this manuscript.
Submitted 8 January, 2024;
originally announced January 2024.
-
Energy based diffusion generator for efficient sampling of Boltzmann distributions
Authors:
Yan Wang,
Ling Guo,
Hao Wu,
Tao Zhou
Abstract:
We introduce a novel sampler called the energy based diffusion generator for generating samples from arbitrary target distributions. The sampling model employs a structure similar to a variational autoencoder, utilizing a decoder to transform latent variables from a simple distribution into random variables approximating the target distribution, and we design an encoder based on the diffusion model. Leveraging the powerful modeling capacity of the diffusion model for complex distributions, we can obtain an accurate variational estimate of the Kullback-Leibler divergence between the distributions of the generated samples and the target. Moreover, we propose a decoder based on generalized Hamiltonian dynamics to further enhance sampling performance. Through empirical evaluation, we demonstrate the effectiveness of our method across various complex distribution functions, showcasing its superiority compared to existing methods.
Submitted 4 January, 2024;
originally announced January 2024.
-
Stability of Random Forests and Coverage of Random-Forest Prediction Intervals
Authors:
Yan Wang,
Huaiqing Wu,
Dan Nettleton
Abstract:
We establish stability of random forests under the mild condition that the squared response ($Y^2$) does not have a heavy tail. In particular, our analysis holds for the practical version of random forests that is implemented in popular packages like \texttt{randomForest} in \texttt{R}. Empirical results show that stability may persist even beyond our assumption and hold for heavy-tailed $Y^2$. Using the stability property, we prove a non-asymptotic lower bound for the coverage probability of prediction intervals constructed from the out-of-bag error of random forests. With another mild condition that is typically satisfied when $Y$ is continuous, we also establish a complementary upper bound, which can be similarly established for the jackknife prediction interval constructed from an arbitrary stable algorithm. We also discuss the asymptotic coverage probability under assumptions weaker than those considered in previous literature. Our work implies that random forests, with their stability property, are an effective machine learning method that can provide not only satisfactory point predictions but also justified interval predictions at almost no extra computational cost.
Submitted 28 October, 2023;
originally announced October 2023.
-
Ridge detection for nonstationary multicomponent signals with time-varying wave-shape functions and its applications
Authors:
Yan-Wei Su,
Gi-Ren Liu,
Yuan-Chung Sheu,
Hau-Tieng Wu
Abstract:
We introduce a novel ridge detection algorithm for time-frequency (TF) analysis, particularly tailored for intricate nonstationary time series encompassing multiple non-sinusoidal oscillatory components. The algorithm is rooted in the distinctive geometric patterns that emerge in the TF domain due to such non-sinusoidal oscillations. We term this method \textit{shape-adaptive mode decomposition-based multiple harmonic ridge detection} (\textsf{SAMD-MHRD}). A swift implementation is available when supplementary information is at hand. We demonstrate the practical utility of \textsf{SAMD-MHRD} through its application to a real-world challenge. We employ it to devise a cutting-edge walking activity detection algorithm, leveraging accelerometer signals from an inertial measurement unit across diverse body locations of a moving subject.
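SAMD-MHRD itself exploits the multiple-harmonic patterns of non-sinusoidal components and is not reproduced here. For orientation, the sketch below shows a generic single-ridge extraction baseline: given a nonnegative TF magnitude matrix, it selects one frequency bin per time point by dynamic programming, trading off TF energy against a squared-jump penalty with a hypothetical weight `lam`. The function name and the synthetic TF matrix are assumptions for illustration.

```python
import numpy as np

def extract_ridge(tf_mag, lam=1.0, eps=1e-12):
    """Single-ridge extraction from a TF magnitude matrix (n_freq x n_time).

    Maximizes sum_t log|S[f_t, t]| - lam * sum_t (f_t - f_{t-1})^2 by dynamic
    programming (Viterbi). Returns the frequency-bin index f_t for each t.
    """
    n_freq, n_time = tf_mag.shape
    score = np.log(tf_mag + eps)
    bins = np.arange(n_freq)
    jump_cost = lam * (bins[:, None] - bins[None, :]) ** 2   # indexed [new, old]
    value = score[:, 0].copy()
    backptr = np.zeros((n_freq, n_time), dtype=int)
    for t in range(1, n_time):
        total = value[None, :] - jump_cost        # candidates indexed [new, old]
        backptr[:, t] = np.argmax(total, axis=1)
        value = score[:, t] + np.max(total, axis=1)
    ridge = np.empty(n_time, dtype=int)
    ridge[-1] = int(np.argmax(value))
    for t in range(n_time - 1, 0, -1):            # backtrack the best path
        ridge[t - 1] = backptr[ridge[t], t]
    return ridge

rng = np.random.default_rng(0)
n_freq, n_time = 64, 100
t = np.arange(n_time)
path = (20 + np.round(10 * np.sin(2 * np.pi * t / n_time))).astype(int)  # hypothetical ridge
tf = 0.01 + 0.2 * rng.random((n_freq, n_time))   # background TF energy
tf[path, t] += 1.0                                # energy concentrated on the ridge
ridge = extract_ridge(tf, lam=0.1)
print(np.mean(ridge == path))
```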
Submitted 12 September, 2023;
originally announced September 2023.
-
Reaction coordinate flows for model reduction of molecular kinetics
Authors:
Hao Wu,
Frank Noé
Abstract:
In this work, we introduce a flow-based machine learning approach, called reaction coordinate (RC) flow, for discovering low-dimensional kinetic models of molecular systems. The RC flow utilizes a normalizing flow to design the coordinate transformation and a Brownian dynamics model to approximate the kinetics of RC, where all model parameters can be estimated in a data-driven manner. In contrast to existing model reduction methods for molecular kinetics, RC flow offers a trainable and tractable model of reduced kinetics in continuous time and space due to the invertibility of the normalizing flow. Furthermore, the Brownian dynamics-based reduced kinetic model investigated in this work yields a readily discernible representation of metastable states within the phase space of the molecular system. Numerical experiments demonstrate how effectively the proposed method discovers interpretable and accurate low-dimensional representations of given full-state kinetics from simulations.
Submitted 11 September, 2023;
originally announced September 2023.
-
Qini Curves for Multi-Armed Treatment Rules
Authors:
Erik Sverdrup,
Han Wu,
Susan Athey,
Stefan Wager
Abstract:
Qini curves have emerged as an attractive and popular approach for evaluating the benefit of data-driven targeting rules for treatment allocation. We propose a generalization of the Qini curve to multiple costly treatment arms that quantifies the value of optimally selecting among both units and treatment arms at different budget levels. We develop an efficient algorithm for computing these curves and propose bootstrap-based confidence intervals that are exact in large samples for any point on the curve. These confidence intervals can be used to conduct hypothesis tests comparing the value of treatment targeting using an optimal combination of arms with using just a subset of arms, or with a non-targeting assignment rule ignoring covariates, at different budget levels. We demonstrate the statistical performance in a simulation experiment and an application to treatment targeting for election turnout.
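To illustrate what a point on a cost-aware Qini curve measures, the following sketch treats only the simplest case of a single treatment arm. With per-unit costs $c_i$ and estimated effects $\hat\tau_i$, the gain at an average budget $B$ per unit comes from treating units in decreasing order of $\hat\tau_i / c_i$ (fractionally at the margin), which solves the linear-programming relaxation of the allocation problem. The multi-armed generalization, the paper's algorithm for computing it, and the bootstrap confidence intervals are not shown; the function name and inputs are hypothetical.

```python
import numpy as np

def single_arm_qini(tau_hat, cost, budgets):
    """Qini curve for one treatment arm via the fractional-knapsack solution.

    tau_hat: estimated treatment effects per unit; cost: positive per-unit costs.
    budgets: average spend per unit B. Returns the average gain at each B.
    """
    tau_hat = np.asarray(tau_hat, dtype=float)
    cost = np.asarray(cost, dtype=float)
    n = len(tau_hat)
    keep = tau_hat > 0                            # never treat units with no benefit
    order = np.argsort(-(tau_hat[keep] / cost[keep]))
    t_sorted, c_sorted = tau_hat[keep][order], cost[keep][order]
    cum_cost = np.concatenate([[0.0], np.cumsum(c_sorted)])
    cum_gain = np.concatenate([[0.0], np.cumsum(t_sorted)])
    gains = []
    for b in budgets:
        spend = min(b * n, cum_cost[-1])          # total budget, capped at treating all
        k = np.searchsorted(cum_cost, spend, side="right") - 1   # units fully treated
        extra = 0.0
        if k < len(c_sorted):                     # partially treat the next unit
            extra = (spend - cum_cost[k]) / c_sorted[k] * t_sorted[k]
        gains.append((cum_gain[k] + extra) / n)
    return np.array(gains)

curve = single_arm_qini([3, 1, 2, -1], [1, 1, 2, 1], [0, 0.25, 0.5, 1, 2])
print(curve)
```

The curve flattens once every unit with positive estimated benefit is treated, which is the budget beyond which extra spending buys nothing.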
Submitted 23 April, 2024; v1 submitted 20 June, 2023;
originally announced June 2023.
-
Topological Data Analysis Assisted Automated Sleep Stage Scoring Using Airflow Signals
Authors:
Yu-Min Chung,
Whitney K. Huang,
Hau-Tieng Wu
Abstract:
Objective: Breathing pattern variability (BPV), as a universal physiological feature, encodes rich health information. We aim to show that high-quality automatic sleep stage scoring can be achieved based on a proper quantification of BPV extracted from a single airflow signal.
Methods: Topological data analysis (TDA) is applied to characterize BPV from the intrinsically nonstationary airflow signal, and the extracted features are used to train an automatic sleep stage scoring model with the XGBoost learner. The noise and artifacts commonly present in the airflow signal are recycled to enhance the performance of the trained system. The state-of-the-art approach is implemented for comparison.
Results: When applied to 30 whole-night polysomnogram signals with standard annotations to automatically score wake, rapid eye movement (REM), and non-REM (NREM), leave-one-subject-out cross-validation shows that the proposed features (overall accuracy 78.8\%$\pm$8.7\% and Cohen's kappa 0.56$\pm$0.15) outperform those considered in the state-of-the-art work (overall accuracy 75.0\%$\pm$9.6\% and Cohen's kappa 0.50$\pm$0.15). Examining feature importance shows that the TDA features contain information complementary to the traditional features commonly used in the literature. The respiratory quality index is found to be essential in the trained system.
Conclusion: The proposed TDA-assisted automatic annotation system can accurately distinguish wake, REM and NREM from the airflow signal.
Significance: Since only a single airflow channel is needed and BPV is universal, the result suggests that TDA-assisted signal processing has the potential to be applied to other biomedical signals and to homecare problems beyond sleep stage annotation.
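The results above report Cohen's kappa alongside overall accuracy. Kappa corrects the observed agreement $p_o$ for the agreement $p_e$ expected by chance from the marginal frequencies, $\kappa = (p_o - p_e)/(1 - p_e)$, which matters when one stage (typically NREM) dominates the night. A minimal sketch from a confusion matrix follows; the three-class counts are hypothetical and are not data from the study.

```python
import numpy as np

def cohens_kappa(confusion):
    """Cohen's kappa from a square confusion matrix (rows: reference, cols: predicted)."""
    confusion = np.asarray(confusion, dtype=float)
    total = confusion.sum()
    p_observed = np.trace(confusion) / total
    # Chance agreement: product of reference and predicted marginal frequencies.
    p_expected = np.sum(confusion.sum(axis=0) * confusion.sum(axis=1)) / total ** 2
    return (p_observed - p_expected) / (1.0 - p_expected)

# Hypothetical epoch counts for (wake, REM, NREM); not data from the study.
cm = [[50, 5, 15],
      [4, 30, 16],
      [10, 12, 158]]
accuracy = np.trace(cm) / np.sum(cm)
print(round(accuracy, 3), round(cohens_kappa(cm), 3))
```

Here the accuracy is about 0.79 while kappa is about 0.62, showing how the chance correction lowers the score when one class is frequent.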
Submitted 5 June, 2023;
originally announced June 2023.
-
Multi-scale wavelet coherence with its applications
Authors:
Haibo Wu,
MI Knight,
H Ombao
Abstract:
The goal in this paper is to develop a novel statistical approach to characterize functional interactions between channels in a brain network. Wavelets are effective for capturing transient properties of non-stationary signals because they have compact support that can be compressed or stretched according to the dynamic properties of the signal. Wavelets give a multi-scale decomposition of signals and thus can be useful for studying potential cross-scale interactions between signals. To achieve this, we develop the scale-specific sub-processes of a multivariate locally stationary wavelet stochastic process. Under this proposed framework, a novel cross-scale dependence measure is developed, which quantifies the dependence structure of components at different scales of a multivariate time series. Extensive simulation studies are conducted to demonstrate that the theoretical properties hold in practice. The proposed cross-scale analysis is applied to electroencephalogram (EEG) data to study alterations in the functional connectivity structure in children diagnosed with attention deficit hyperactivity disorder (ADHD). Our approach identified novel and interesting cross-scale interactions between channels in the brain network. The proposed framework can also be applied to other signals, for example to capture the statistical association between stocks at different time scales.
Submitted 18 May, 2023;
originally announced May 2023.
-
A Constrained BA Algorithm for Rate-Distortion and Distortion-Rate Functions
Authors:
Lingyi Chen,
Shitong Wu,
Wenhao Ye,
Huihui Wu,
Wenyi Zhang,
Hao Wu,
Bo Bai
Abstract:
The Blahut-Arimoto (BA) algorithm has played a fundamental role in the numerical computation of rate-distortion (RD) functions. This algorithm possesses a desirable monotonic convergence property by alternately minimizing its Lagrangian with a fixed multiplier. In this paper, we propose a novel modification of the BA algorithm, wherein the multiplier is updated through a one-dimensional root-finding step using a monotonic univariate function, efficiently implemented by Newton's method in each iteration. Consequently, the modified algorithm directly computes the RD function for a given target distortion, without exploring the entire RD curve as in the original BA algorithm. Moreover, this modification presents a versatile framework, applicable to a wide range of problems, including the computation of distortion-rate (DR) functions. Theoretical analysis shows that the outputs of the modified algorithms still converge to the solutions of the RD and DR functions with rate $O(1/n)$, where $n$ is the number of iterations. Additionally, these algorithms provide $\varepsilon$-approximation solutions with $O\left(\frac{MN\log N}{\varepsilon}(1+\log |\log \varepsilon|)\right)$ arithmetic operations, where $M,N$ are the sizes of the source and reproduction alphabets, respectively. Numerical experiments demonstrate that the modified algorithms exhibit significant acceleration compared with the original BA algorithms and showcase commendable performance across classical source distributions such as discretized Gaussian, Laplacian and uniform sources.
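The sketch below illustrates the idea of re-solving the multiplier at every iteration so that each BA step targets a fixed distortion $D$. It is a simplified reading of the abstract rather than the authors' algorithm: the one-dimensional root-finding uses bisection instead of Newton's method, and the function name, tolerances, and bracketing rule are assumptions. For a uniform binary source with Hamming distortion it can be checked against the closed form $R(D) = 1 - H_b(D)$ for $D \le 1/2$.

```python
import numpy as np

def rate_at_distortion(p_x, dist, target_d, n_iter=500, tol=1e-12):
    """Approximate R(D) in bits at one target distortion (BA with a re-solved multiplier).

    p_x: source pmf, shape (M,); dist: distortion matrix d(x, y), shape (M, N).
    Each outer iteration picks beta >= 0 so that w(y|x) ∝ q(y) exp(-beta d(x, y))
    has expected distortion target_d, then updates the output marginal q(y).
    Bisection is used for the 1-D root (a swap-in for Newton's method).
    """
    q = np.full(dist.shape[1], 1.0 / dist.shape[1])

    def conditional(beta, q):
        logits = np.log(q)[None, :] - beta * dist
        logits -= logits.max(axis=1, keepdims=True)      # numerical stability
        w = np.exp(logits)
        return w / w.sum(axis=1, keepdims=True)

    def expected_distortion(beta, q):
        return float(np.sum(p_x[:, None] * conditional(beta, q) * dist))

    for _ in range(n_iter):
        lo, hi = 0.0, 1.0
        while expected_distortion(hi, q) > target_d and hi < 1e6:   # bracket the root
            hi *= 2.0
        for _ in range(100):                   # E[d] is non-increasing in beta
            mid = 0.5 * (lo + hi)
            if expected_distortion(mid, q) > target_d:
                lo = mid
            else:
                hi = mid
        w = conditional(hi, q)
        q_new = p_x @ w                        # marginal of Y under p(x) w(y|x)
        done = np.max(np.abs(q_new - q)) < tol
        q = q_new
        if done:
            break
    # Mutual information I(X; Y) of the final joint p(x) w(y|x), in bits.
    joint = p_x[:, None] * w
    mask = joint > 0
    return float(np.sum(joint[mask] * np.log2((w / q[None, :])[mask])))

p_x = np.array([0.5, 0.5])
hamming = np.array([[0.0, 1.0], [1.0, 0.0]])
r = rate_at_distortion(p_x, hamming, 0.1)
print(r)
```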
Submitted 18 January, 2024; v1 submitted 4 May, 2023;
originally announced May 2023.
-
Detecting Interference in A/B Testing with Increasing Allocation
Authors:
Kevin Han,
Shuangning Li,
Jialiang Mao,
Han Wu
Abstract:
In the past decade, the technology industry has adopted online randomized controlled experiments (a.k.a. A/B testing) to guide product development and make business decisions. In practice, A/B tests are often implemented with increasing treatment allocation: the new treatment is gradually released to an increasing number of units through a sequence of randomized experiments. In scenarios such as experimenting in a social network setting or in a bipartite online marketplace, interference among units may exist, which can harm the validity of simple inference procedures. In this work, we introduce a widely applicable procedure to test for interference in A/B testing with increasing allocation. Our procedure can be implemented on top of an existing A/B testing platform with a separate flow and does not require specifying an interference mechanism a priori. In particular, we introduce two permutation tests that are valid under different assumptions. First, we introduce a general statistical test for interference requiring no additional assumption. Second, we introduce a testing procedure that is valid under a time fixed effect assumption. The testing procedure has very low computational complexity, is powerful, and formalizes a heuristic algorithm already implemented in industry. We demonstrate the performance of the proposed testing procedure through simulations on synthetic data. Finally, we discuss one application at LinkedIn, where a screening step is implemented to detect potential interference in all their marketplace experiments with the proposed methods in the paper.
Submitted 24 March, 2023; v1 submitted 6 November, 2022;
originally announced November 2022.
-
Geometric Scattering on Measure Spaces
Authors:
Joyce Chew,
Matthew Hirn,
Smita Krishnaswamy,
Deanna Needell,
Michael Perlmutter,
Holly Steach,
Siddharth Viswanath,
Hau-Tieng Wu
Abstract:
The scattering transform is a multilayered, wavelet-based transform initially introduced as a model of convolutional neural networks (CNNs) that has played a foundational role in our understanding of these networks' stability and invariance properties. Subsequently, there has been widespread interest in extending the success of CNNs to data sets with non-Euclidean structure, such as graphs and manifolds, leading to the emerging field of geometric deep learning. In order to improve our understanding of the architectures used in this new field, several papers have proposed generalizations of the scattering transform for non-Euclidean data structures such as undirected graphs and compact Riemannian manifolds without boundary.
In this paper, we introduce a general, unified model for geometric scattering on measure spaces. Our proposed framework includes previous work on geometric scattering as special cases but also applies to more general settings such as directed graphs, signed graphs, and manifolds with boundary. We propose a new criterion that identifies to which groups a useful representation should be invariant and show that this criterion is sufficient to guarantee that the scattering transform has desirable stability and invariance properties. Additionally, we consider finite measure spaces that are obtained from randomly sampling an unknown manifold. We propose two methods for constructing a data-driven graph on which the associated graph scattering transform approximates the scattering transform on the underlying manifold. Moreover, we use a diffusion-maps based approach to prove quantitative estimates on the rate of convergence of one of these approximations as the number of sample points tends to infinity. Lastly, we showcase the utility of our method on spherical images, directed graphs, and on high-dimensional single-cell data.
Submitted 13 October, 2022; v1 submitted 17 August, 2022;
originally announced August 2022.
-
An Optimal Transport Approach to the Computation of the LM Rate
Authors:
Wenhao Ye,
Huihui Wu,
Shitong Wu,
Yizhu Wang,
Wenyi Zhang,
Hao Wu,
Bo Bai
Abstract:
Mismatch capacity characterizes the highest information rate for a channel under a prescribed decoding metric, and is thus a highly relevant fundamental performance metric when dealing with many practically important communication scenarios. Compared with the frequently used generalized mutual information (GMI), the LM rate has been known as a tighter lower bound of the mismatch capacity. The computation of the LM rate, however, has been a difficult task, due to the fact that the LM rate involves a maximization over a function of the channel input, which becomes challenging as the input alphabet size grows, and direct numerical methods (e.g., interior point methods) suffer from intensive memory and computational resource requirements. Noting that the computation of the LM rate can also be formulated as an entropy-based optimization problem with constraints, in this work, we transform the task into an optimal transport (OT) problem with an extra constraint. This allows us to efficiently and accurately accomplish our task by using the well-known Sinkhorn algorithm. Indeed, only a few iterations are required for convergence, due to the fact that the formulated problem does not contain additional regularization terms. Moreover, we convert the extra constraint into a root-finding procedure for a one-dimensional monotonic function. Numerical experiments demonstrate the feasibility and efficiency of our OT approach to the computation of the LM rate.
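For readers unfamiliar with the Sinkhorn algorithm mentioned above, the following is a minimal sketch of the standard matrix-scaling iterations for an entropically regularized OT problem between histograms $a$ and $b$. In this generic setting the entropy enters as a regularizer with weight `eps`, whereas the abstract notes that the LM-rate formulation needs no additional regularization term; the LM-rate-specific cost and the extra constraint handled by root-finding are not shown. The function name, `eps`, and the toy cost matrix are illustrative assumptions.

```python
import numpy as np

def sinkhorn(a, b, cost, eps=0.1, n_iter=1000, tol=1e-10):
    """Entropic OT plan between histograms a (M,) and b (N,) for a cost matrix (M, N).

    a, b: nonnegative float arrays with equal total mass.
    Returns P = diag(u) K diag(v) with K = exp(-cost / eps); its row sums
    approach a and its column sums equal b after each v update.
    """
    K = np.exp(-cost / eps)
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iter):
        u = a / (K @ v)                  # rescale rows to match a
        v = b / (K.T @ u)                # rescale columns to match b
        plan = u[:, None] * K * v[None, :]
        if np.max(np.abs(plan.sum(axis=1) - a)) < tol:
            break
    return plan

a = np.array([0.5, 0.3, 0.2])
b = np.array([0.4, 0.6])
x_pts, y_pts = np.array([0.0, 0.5, 1.0]), np.array([0.25, 0.75])
cost = np.abs(x_pts[:, None] - y_pts[None, :])
plan = sinkhorn(a, b, cost)
print(plan.round(3))
```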
Submitted 16 August, 2022;
originally announced August 2022.
-
Data-Driven optimal shrinkage of singular values under high-dimensional noise with separable covariance structure with application
Authors:
Pei-Chun Su,
Hau-Tieng Wu
Abstract:
We develop a data-driven optimal shrinkage algorithm for matrix denoising in the presence of high-dimensional noise with a separable covariance structure; that is, the noise is colored and dependent across samples. The algorithm, coined {\em extended OptShrink} (eOptShrink), depends on the asymptotic behavior of the singular values and singular vectors of the random matrix associated with the noisy data. Based on the developed theory, including the sticking property of non-outlier singular values and the delocalization of the non-outlier singular vectors associated with weak signals, with a convergence rate, and the spectral behavior of outlier singular values and vectors, we develop three estimators, each of independent interest. First, we design a novel rank estimator, based on which we provide an estimator for the spectral distribution of the pure noise matrix, and hence the optimal shrinker called eOptShrink. This algorithm does not require estimating the separable covariance structure of the noise. A theoretical guarantee with a convergence rate is given for these estimators. On the application side, in addition to a series of numerical simulations comparing eOptShrink with various state-of-the-art optimal shrinkage algorithms, we apply eOptShrink to extract maternal and fetal electrocardiograms from the single-channel trans-abdominal maternal electrocardiogram.
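eOptShrink targets colored and dependent noise. As a baseline for comparison, the sketch below implements the classical Frobenius-loss optimal shrinker for white noise (as in Gavish and Donoho), $\eta(y) = \sqrt{(y^2-\beta-1)^2-4\beta}\,/\,y$ for $y > 1+\sqrt{\beta}$ and $0$ otherwise, with $\beta = m/n \le 1$. It assumes i.i.d. noise entries with known variance $1/n$, so the noise bulk edge is $1+\sqrt{\beta}$; it does not estimate the rank or the noise spectral distribution as eOptShrink does. Function names and the synthetic signal are hypothetical.

```python
import numpy as np

def frobenius_shrinker(y, beta):
    """Frobenius-loss optimal shrinker for white noise with bulk edge 1 + sqrt(beta)."""
    y = np.asarray(y, dtype=float)
    out = np.zeros_like(y)
    keep = y > 1.0 + np.sqrt(beta)           # outliers above the noise bulk
    yk = y[keep]
    out[keep] = np.sqrt((yk ** 2 - beta - 1.0) ** 2 - 4.0 * beta) / yk
    return out

def denoise(Y):
    """Shrink the singular values of Y = X + Z, with Z having i.i.d. N(0, 1/n) entries, m <= n."""
    m, n = Y.shape
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    return (U * frobenius_shrinker(s, m / n)) @ Vt

rng = np.random.default_rng(0)
m, n = 100, 200
u = rng.normal(size=m); u /= np.linalg.norm(u)
v = rng.normal(size=n); v /= np.linalg.norm(v)
X = 3.0 * np.outer(u, v)                        # hypothetical rank-1 signal
Y = X + rng.normal(size=(m, n)) / np.sqrt(n)    # white noise with entries N(0, 1/n)
err_noisy = np.linalg.norm(Y - X)
err_denoised = np.linalg.norm(denoise(Y) - X)
print(err_noisy, err_denoised)
```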
Submitted 11 May, 2024; v1 submitted 7 July, 2022;
originally announced July 2022.
-
The Effective Sample Size in Bayesian Information Criterion for Level-Specific Fixed and Random Effects Selection in a Two-Level Nested Model
Authors:
Sun-Joo Cho,
Hao Wu,
Matthew Naveiras
Abstract:
Popular statistical software provides the Bayesian information criterion (BIC) for multilevel models or linear mixed models. However, it has been observed that the combination of statistical literature and software documentation has led to discrepancies in the formulas of the BIC and uncertainties about the proper use of the BIC in selecting a multilevel model with respect to level-specific fixed and random effects. These discrepancies and uncertainties result from different specifications of sample size in the BIC's penalty term for multilevel models. In this study, we derive the BIC's penalty term for level-specific fixed and random effect selection in a two-level nested design. In this new version of BIC, called BIC_E, this penalty term is decomposed into two parts if the random effect variance-covariance matrix has full rank: (a) a term with the log of average sample size per cluster whose multiplier involves the number of overlapping dimensions between the column spaces of the random and fixed effect design matrices and (b) the total number of parameters times the log of the total number of clusters. Furthermore, we study the behavior of BIC_E in the presence of redundant random effects. The use of BIC_E is illustrated with a textbook example data set, and a numerical demonstration shows that the derived formulae adhere to empirical values.
Submitted 23 June, 2022;
originally announced June 2022.
-
The Manifold Scattering Transform for High-Dimensional Point Cloud Data
Authors:
Joyce Chew,
Holly R. Steach,
Siddharth Viswanath,
Hau-Tieng Wu,
Matthew Hirn,
Deanna Needell,
Smita Krishnaswamy,
Michael Perlmutter
Abstract:
The manifold scattering transform is a deep feature extractor for data defined on a Riemannian manifold. It is one of the first examples of extending convolutional neural network-like operators to general manifolds. The initial work on this model focused primarily on its theoretical stability and invariance properties but did not provide methods for its numerical implementation except in the case of two-dimensional surfaces with predefined meshes. In this work, we present practical schemes, based on the theory of diffusion maps, for applying the manifold scattering transform to datasets arising in naturalistic systems, such as single-cell genetics, where the data are a high-dimensional point cloud modeled as lying on a low-dimensional manifold. We show that our methods are effective for signal classification and manifold classification tasks.
Submitted 21 January, 2024; v1 submitted 20 June, 2022;
originally announced June 2022.
-
Restricted mean survival time regression model with time-dependent covariates
Authors:
Chengfeng Zhang,
Hongji Wu,
Baoyi Huang,
Hao Yuan,
Yawen Hou,
Zheng Chen
Abstract:
In clinical or epidemiological follow-up studies, methods based on time-scale indicators such as the restricted mean survival time (RMST) have seen some development. Compared with methods based on the traditional hazard rate indicator system, the RMST is easier to interpret and does not require the proportional hazards assumption. To date, RMST-based regression models, whether direct or indirect, relate the RMST only to baseline covariates. However, time-dependent covariates are becoming increasingly common in follow-up studies. Based on the inverse probability of censoring weighting (IPCW) method, we developed a regression model relating the RMST to time-dependent covariates. Through Monte Carlo simulation, we verified the estimation performance of the proposed model's regression parameters. Compared with the time-dependent Cox model and the fixed (baseline) covariate RMST model, the time-dependent RMST model has better predictive ability. Finally, a heart transplantation example was used to verify these conclusions.
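The IPCW idea can be illustrated on the simplest case. Below is a minimal sketch, assuming right-censored data with baseline information only, no covariates, censoring independent of survival, and a Kaplan-Meier estimate of the censoring survival function; it is not the authors' time-dependent regression model, and the function names are hypothetical. Each subject whose restricted time min(T, tau) is fully observed is up-weighted by the inverse probability of remaining uncensored up to that time.

```python
import numpy as np


def km_left_limit(times, indicators, t):
    """Kaplan-Meier survival just before t, for the event coded by indicators == 1."""
    surv = 1.0
    for s in np.unique(times[indicators == 1]):
        if s >= t:
            break
        at_risk = np.sum(times >= s)
        d = np.sum((times == s) & (indicators == 1))
        surv *= 1.0 - d / at_risk
    return surv


def ipcw_rmst(time, event, tau):
    """IPCW estimate of the RMST up to tau (hypothetical helper, no covariates)."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=int)
    y = np.minimum(time, tau)
    # min(T, tau) is fully observed if the event was seen or follow-up reached tau.
    observed = (event == 1) | (time >= tau)
    censored = 1 - event  # censoring plays the role of the "event" when estimating G
    weights = np.zeros_like(y)
    for i in np.flatnonzero(observed):
        weights[i] = 1.0 / km_left_limit(time, censored, y[i])
    return float(np.mean(weights * y))


# Four subjects, one censored at t = 2; RMST up to tau = 3.5.
print(ipcw_rmst([1, 2, 3, 4], [1, 0, 1, 1], tau=3.5))
```

In this small example the weighted mean coincides, up to floating-point rounding, with the area under the Kaplan-Meier curve up to 3.5, which is a useful sanity check for the weighting.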
Submitted 25 May, 2022;
originally announced May 2022.
-
Application of de-shape synchrosqueezing to estimate gait cadence from a single-sensor accelerometer placed in different body locations
Authors:
Hau-Tieng Wu,
Jacek Urbanek
Abstract:
Objective: Commercial and research-grade wearable devices have become increasingly popular over the past decade. Information extracted from devices using accelerometers is frequently summarized as "number of steps" (commercial devices) or "activity counts" (research-grade devices). Raw accelerometry data, which can be easily extracted from accelerometers used in research (for instance, the ActiGraph GT3X+), are frequently discarded. Approach: Our primary goal is to propose an innovative use of the de-shape synchrosqueezing transform to analyze raw accelerometry data recorded from a single sensor installed at different body locations, particularly the wrist, to extract gait cadence when a subject is walking. The proposed methodology is tested on data collected in a semi-controlled experiment with 32 participants walking on a one-kilometer predefined course. Walking was executed on a flat surface as well as on stairs (up and down). Main Results: The cadences of walking on a flat surface, ascending stairs, and descending stairs, determined from the wrist sensor, are 1.98±0.15 Hz, 1.99±0.26 Hz, and 2.03±0.26 Hz, respectively. If determined from the hip sensor, the cadences are 1.98±0.14 Hz, 1.97±0.25 Hz, and 2.02±0.23 Hz, respectively; from the left ankle sensor, 1.98±0.14 Hz, 1.93±0.22 Hz, and 2.06±0.24 Hz; and from the right ankle sensor, 1.98±0.14 Hz, 1.97±0.22 Hz, and 2.04±0.24 Hz. The difference is statistically significant, indicating that the cadence is fastest while descending stairs and slowest when ascending stairs. Also, the standard deviation is larger when the sensor is on the wrist. These findings are in line with our expectations. Conclusion: We show that our proposed algorithm can extract the cadence with high accuracy, even when the sensor is placed on the wrist.
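To make "cadence in Hz" concrete: cadence is the fundamental frequency of the quasi-periodic walking signal. The sketch below is a naive periodogram peak picker, not the de-shape synchrosqueezing transform; the band limits, the helper name, and the synthetic signal are our own assumptions. It works on this clean synthetic example, but on real wrist data the largest peak can sit on a harmonic or drift over time, which is the kind of non-sinusoidal, time-varying behavior de-shape time-frequency methods are designed to handle.

```python
import numpy as np


def dominant_frequency(signal, fs, fmin=0.5, fmax=4.0):
    """Frequency (Hz) of the largest periodogram peak inside [fmin, fmax]."""
    x = np.asarray(signal, dtype=float)
    x = x - x.mean()
    spectrum = np.abs(np.fft.rfft(x * np.hanning(x.size))) ** 2
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
    band = (freqs >= fmin) & (freqs <= fmax)
    return float(freqs[band][np.argmax(spectrum[band])])


# Synthetic "walking" signal: 2 Hz fundamental, a weaker 4 Hz harmonic, and noise.
fs = 100.0
t = np.arange(0, 60, 1 / fs)
rng = np.random.default_rng(0)
acc = (np.sin(2 * np.pi * 2.0 * t) + 0.5 * np.sin(2 * np.pi * 4.0 * t)
       + 0.3 * rng.standard_normal(t.size))
print(dominant_frequency(acc, fs))
```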
Submitted 5 February, 2023; v1 submitted 20 March, 2022;
originally announced March 2022.
-
Interpretable Latent Variables in Deep State Space Models
Authors:
Haoxuan Wu,
David S. Matteson,
Martin T. Wells
Abstract:
We introduce a new version of deep state-space models (DSSMs) that combines a recurrent neural network with a state-space framework to forecast time series data. The model estimates the observed series as functions of latent variables that evolve non-linearly through time. Due to the complexity and non-linearity inherent in DSSMs, previous works on DSSMs typically produced latent variables that are very difficult to interpret. Our paper focuses on producing interpretable latent parameters with two key modifications. First, we simplify the predictive decoder by restricting the response variables to be a linear transformation of the latent variables plus some noise. Second, we use shrinkage priors on the latent variables to reduce redundancy and improve robustness. These changes make the latent variables much easier to understand and allow us to interpret them as random effects in a linear mixed model. We show on two public benchmark datasets that the resulting model improves forecasting performance.
Submitted 19 May, 2022; v1 submitted 3 March, 2022;
originally announced March 2022.
-
Partial Likelihood Thompson Sampling
Authors:
Han Wu,
Stefan Wager
Abstract:
We consider the problem of deciding how best to target and prioritize existing vaccines that may offer protection against new variants of an infectious disease. Sequential experiments are a promising approach; however, challenges due to delayed feedback and the overall ebb and flow of disease prevalence make available methods inapplicable to this task. We present a method, partial likelihood Thompson sampling, that can handle these challenges. Our method involves running Thompson sampling with belief updates determined by partial likelihood each time we observe an event. To test our approach, we ran a semi-synthetic experiment based on 200 days of COVID-19 infection data in the US.
Submitted 19 June, 2022; v1 submitted 1 March, 2022;
originally announced March 2022.
-
Ensemble Method for Estimating Individualized Treatment Effects
Authors:
Kevin Wu Han,
Han Wu
Abstract:
In many medical and business applications, researchers are interested in estimating individualized treatment effects using data from a randomized experiment. For example, in medical applications, doctors learn the treatment effects from clinical trials, and in technology companies, researchers learn them from A/B testing experiments. Although dozens of machine learning models have been proposed for this task, it is challenging to determine which model will be best for the problem at hand because ground-truth treatment effects are unobservable. In contrast to several recent papers proposing methods to select one of these competing models, we propose an algorithm for aggregating the estimates from a diverse library of models. We compare ensembling to model selection on 43 benchmark datasets, and find that ensembling wins almost every time. Theoretically, we prove that our ensemble model is (asymptotically) at least as accurate as the best model under consideration, even if the number of candidate models is allowed to grow with the sample size.
Submitted 27 February, 2022; v1 submitted 24 February, 2022;
originally announced February 2022.
-
Thompson Sampling with Unrestricted Delays
Authors:
Han Wu,
Stefan Wager
Abstract:
We investigate properties of Thompson Sampling in the stochastic multi-armed bandit problem with delayed feedback. In a setting with i.i.d. delays, we establish, to our knowledge, the first regret bounds for Thompson Sampling with arbitrary delay distributions, including ones with unbounded expectation. Our bounds are qualitatively comparable to the best available bounds derived via ad-hoc algorithms, and depend on delays only via selected quantiles of the delay distributions. Furthermore, in extensive simulation experiments, we find that Thompson Sampling outperforms a number of alternative proposals, including methods specifically designed for settings with delayed feedback.
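A minimal simulation sketch of the setting: Beta-Bernoulli Thompson Sampling (one standard instance of the algorithm) where each reward is revealed to the learner only after an i.i.d. random delay. The Bernoulli rewards, Beta(1, 1) priors, geometric delays, and the helper name are our assumptions for illustration; heavier-tailed delays can be swapped in at the marked line. This shows the algorithm being analyzed, not the paper's regret analysis.

```python
import numpy as np


def thompson_with_delays(true_means, horizon, mean_delay, seed=0):
    """Beta-Bernoulli Thompson Sampling where each reward arrives after a random delay."""
    rng = np.random.default_rng(seed)
    k = len(true_means)
    alpha, beta = np.ones(k), np.ones(k)  # Beta(1, 1) prior for every arm
    pending = []  # (arrival_time, arm, reward) not yet revealed to the learner
    pulls = np.zeros(k, dtype=int)
    for t in range(horizon):
        # Reveal only the feedback whose delay has elapsed by round t.
        arrived = [p for p in pending if p[0] <= t]
        pending = [p for p in pending if p[0] > t]
        for _, arm, reward in arrived:
            alpha[arm] += reward
            beta[arm] += 1 - reward
        # Sample a mean for each arm from its posterior and play the largest.
        arm = int(np.argmax(rng.beta(alpha, beta)))
        pulls[arm] += 1
        reward = int(rng.random() < true_means[arm])
        # i.i.d. delay; geometric is an assumption (swap in a heavier tail here).
        delay = int(rng.geometric(1.0 / mean_delay))
        pending.append((t + delay, arm, reward))
    return pulls


print(thompson_with_delays([0.2, 0.8], horizon=2000, mean_delay=20))
```

Posterior updates simply wait until feedback arrives; the sampling step itself is unchanged, which is what makes delayed-feedback analysis of the plain algorithm meaningful.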
Submitted 22 May, 2022; v1 submitted 24 February, 2022;
originally announced February 2022.
-
Spatiotemporal Analysis Using Riemannian Composition of Diffusion Operators
Authors:
Tal Shnitzer,
Hau-Tieng Wu,
Ronen Talmon
Abstract:
Multivariate time-series have become abundant in recent years, as many data-acquisition systems record information through multiple sensors simultaneously. In this paper, we assume the variables pertain to some geometry and present an operator-based approach for spatiotemporal analysis. Our approach combines three components that are often considered separately: (i) manifold learning for building operators representing the geometry of the variables, (ii) Riemannian geometry of symmetric positive-definite matrices for multiscale composition of operators corresponding to different time samples, and (iii) spectral analysis of the composite operators for extracting different dynamic modes. We propose a method that is analogous to the classical wavelet analysis, which we term Riemannian multi-resolution analysis (RMRA). We provide some theoretical results on the spectral analysis of the composite operators, and we demonstrate the proposed method on simulations and on real data.
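Component (ii) can be made concrete with the standard affine-invariant geometry of symmetric positive-definite (SPD) matrices. A minimal sketch, assuming SPD operators and the affine-invariant metric, computes the point at fraction t along the geodesic between two SPD matrices; t = 1/2 gives their geometric mean. Whether this matches the paper's exact composition operators is an assumption, and the helper names are hypothetical.

```python
import numpy as np


def spd_power(m, p):
    """Matrix power of a symmetric positive-definite matrix via eigendecomposition."""
    w, v = np.linalg.eigh(m)
    return (v * w ** p) @ v.T


def spd_geodesic(a, b, t=0.5):
    """Point at fraction t along the affine-invariant geodesic from a to b."""
    a_half, a_neg_half = spd_power(a, 0.5), spd_power(a, -0.5)
    inner = a_neg_half @ b @ a_neg_half
    inner = (inner + inner.T) / 2  # remove round-off asymmetry before eigh
    return a_half @ spd_power(inner, t) @ a_half


a = np.diag([1.0, 4.0])
b = np.diag([4.0, 16.0])
print(spd_geodesic(a, b))  # commuting case: elementwise sqrt(a * b) on the diagonal
```

Unlike the arithmetic mean, this midpoint is invariant under congruence transformations, which is one reason it is a natural way to combine operators from different time samples.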
Submitted 20 January, 2022;
originally announced January 2022.
-
Drift vs Shift: Decoupling Trends and Changepoint Analysis
Authors:
Haoxuan Wu,
Toryn L. J. Schafer,
Sean Ryan,
David S. Matteson
Abstract:
We introduce a new approach for decoupling trends (drift) and changepoints (shifts) in time series. Our locally adaptive, model-based approach to robust decoupling combines Bayesian trend filtering and machine-learning-based regularization. An over-parameterized Bayesian dynamic linear model (DLM) is first applied to characterize drift. Then a weighted penalized likelihood estimator is paired with the estimated DLM posterior distribution to identify shifts. We show how Bayesian DLMs specified with so-called shrinkage priors can provide smooth estimates of underlying trends in the presence of complex noise components. However, their inability to shrink exactly to zero inhibits direct changepoint detection. In contrast, penalized likelihood methods are highly effective in locating changepoints, but they require data with simple patterns in both signal and noise. The proposed decoupling approach combines the strengths of both, i.e., the flexibility of Bayesian DLMs with the hard thresholding property of penalized likelihood estimators, to provide changepoint analysis in complex, modern settings. The proposed framework is robust to outliers and can identify a variety of changes, including changes in mean and slope. It is also easily extended to the analysis of parameter shifts in time-varying parameter models such as dynamic regressions. We illustrate the flexibility of our approach and contrast its performance and robustness with several alternative methods across a wide range of simulations and application examples.
Submitted 6 January, 2024; v1 submitted 17 January, 2022;
originally announced January 2022.
-
How do kernel-based sensor fusion algorithms behave under high dimensional noise?
Authors:
Xiucai Ding,
Hau-Tieng Wu
Abstract:
We study the behavior of two kernel-based sensor fusion algorithms, nonparametric canonical correlation analysis (NCCA) and alternating diffusion (AD), under the nonnull setting in which the clean datasets collected from two sensors are modeled by a common low-dimensional manifold embedded in a high-dimensional Euclidean space and the datasets are corrupted by high-dimensional noise. We establish the asymptotic limits and convergence rates for the eigenvalues of the associated kernel matrices, assuming that the sample dimension and sample size are comparably large, where NCCA and AD are conducted using the Gaussian kernel. It turns out that both the asymptotic limits and the convergence rates depend on the signal-to-noise ratio (SNR) of each sensor and the selected bandwidths. On the one hand, we show that if NCCA and AD are directly applied to the noisy point clouds without any sanity check, they may generate artificial information that misleads scientists' interpretation. On the other hand, we prove that if the bandwidths are selected adequately, both NCCA and AD can be made robust to high-dimensional noise when the SNRs are relatively large.
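For readers unfamiliar with AD, here is a minimal sketch of the operator whose spectrum is studied, assuming the basic construction: one Gaussian-kernel Markov (row-stochastic) matrix per sensor, multiplied together. The bandwidth convention exp(-||x - y||^2 / bandwidth), the toy two-sensor data, and the helper names are our assumptions; NCCA is not shown.

```python
import numpy as np


def markov_kernel(x, bandwidth):
    """Row-stochastic Gaussian-kernel matrix built from the rows of x."""
    sq_dists = np.sum((x[:, None, :] - x[None, :, :]) ** 2, axis=-1)
    kernel = np.exp(-sq_dists / bandwidth)
    return kernel / kernel.sum(axis=1, keepdims=True)


def alternating_diffusion(x1, x2, bw1, bw2):
    """AD operator: product of the two sensors' diffusion (Markov) matrices."""
    return markov_kernel(x1, bw1) @ markov_kernel(x2, bw2)


# Two sensors observe a shared angle theta through different noisy embeddings in R^50.
rng = np.random.default_rng(0)
n, p = 200, 50
theta = rng.uniform(0, 2 * np.pi, n)
clean = np.column_stack([np.cos(theta), np.sin(theta)])
x1 = np.hstack([clean, np.zeros((n, p - 2))]) + 0.1 * rng.standard_normal((n, p))
x2 = np.hstack([np.zeros((n, p - 2)), 2 * clean]) + 0.1 * rng.standard_normal((n, p))
eigvals = np.sort(np.abs(np.linalg.eigvals(alternating_diffusion(x1, x2, 1.0, 4.0))))[::-1]
print(eigvals[:5])
```

The product is itself row-stochastic, so its leading eigenvalue is 1; the informative structure lives in the following eigenvalues, which is where the choice of bandwidth relative to the noise level matters.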
Submitted 21 November, 2021;
originally announced November 2021.
-
Interpretable Personalized Experimentation
Authors:
Han Wu,
Sarah Tan,
Weiwei Li,
Mia Garrard,
Adam Obeng,
Drew Dimmery,
Shaun Singh,
Hanson Wang,
Daniel Jiang,
Eytan Bakshy
Abstract:
Black-box heterogeneous treatment effect (HTE) models are increasingly being used to create personalized policies that assign individuals to their optimal treatments. However, they are difficult to understand, and can be burdensome to maintain in a production environment. In this paper, we present a scalable, interpretable personalized experimentation system, implemented and deployed in production at Meta. The system works in a multiple treatment, multiple outcome setting typical at Meta to: (1) learn explanations for black-box HTE models; (2) generate interpretable personalized policies. We evaluate the methods used in the system on publicly available data and Meta use cases, and discuss lessons learnt during the development of the system.
Submitted 5 August, 2022; v1 submitted 5 November, 2021;
originally announced November 2021.
-
Deeptime: a Python library for machine learning dynamical models from time series data
Authors:
Moritz Hoffmann,
Martin Scherer,
Tim Hempel,
Andreas Mardt,
Brian de Silva,
Brooke E. Husic,
Stefan Klus,
Hao Wu,
Nathan Kutz,
Steven L. Brunton,
Frank Noé
Abstract:
Generation and analysis of time-series data is relevant to many quantitative fields ranging from economics to fluid mechanics. In the physical sciences, structures such as metastable and coherent sets, slow relaxation processes, collective variables, dominant transition pathways, or manifolds and channels of probability flow can be of great importance for understanding and characterizing the kinetic, thermodynamic and mechanistic properties of the system. Deeptime is a general-purpose Python library offering various tools to estimate dynamical models based on time-series data, including conventional linear learning methods such as Markov state models (MSMs), hidden Markov models and Koopman models, as well as kernel and deep learning approaches such as VAMPnets and deep MSMs. The library is largely compatible with scikit-learn, having a range of Estimator classes for these different models, but in contrast to scikit-learn it also provides deep Model classes, e.g., in the case of an MSM, which provide a multitude of analysis methods to compute interesting thermodynamic, kinetic and dynamical quantities, such as free energies, relaxation times and transition paths. The library is designed both for ease of use and for easily maintainable and extensible code. In this paper we introduce the main features and structure of the deeptime software.
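To make the MSM terminology concrete without depending on deeptime's API, here is a from-scratch sketch, assuming a discretized trajectory in which every state is visited: count transitions at a lag time, row-normalize (the simple non-reversible maximum-likelihood estimate), and read relaxation timescales off the eigenvalues. The helper names are hypothetical; deeptime's own estimator classes provide this and more, so treat the code only as an illustration of the underlying idea.

```python
import numpy as np


def estimate_msm(dtraj, n_states, lag=1):
    """Row-normalized transition counts at the given lag (non-reversible MLE)."""
    counts = np.zeros((n_states, n_states))
    for i, j in zip(dtraj[:-lag], dtraj[lag:]):
        counts[i, j] += 1
    return counts / counts.sum(axis=1, keepdims=True)  # every state must be visited


def implied_timescales(transition_matrix, lag=1):
    """Relaxation timescales -lag / ln|lambda_k| for all but the stationary eigenvalue."""
    moduli = np.sort(np.abs(np.linalg.eigvals(transition_matrix)))[::-1]
    return -lag / np.log(moduli[1:])


# Simulate a two-state chain whose second eigenvalue is 1 - 0.1 - 0.2 = 0.7.
rng = np.random.default_rng(0)
true_t = np.array([[0.9, 0.1], [0.2, 0.8]])
dtraj = [0]
for _ in range(100_000):
    dtraj.append(int(rng.random() < true_t[dtraj[-1], 1]))
dtraj = np.array(dtraj)
msm_t = estimate_msm(dtraj, n_states=2)
print(msm_t, implied_timescales(msm_t))  # timescale near -1 / ln(0.7), about 2.8 steps
```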
Submitted 11 December, 2021; v1 submitted 28 October, 2021;
originally announced October 2021.
-
Conjugate Energy-Based Models
Authors:
Hao Wu,
Babak Esmaeili,
Michael Wick,
Jean-Baptiste Tristan,
Jan-Willem van de Meent
Abstract:
In this paper, we propose conjugate energy-based models (CEBMs), a new class of energy-based models that define a joint density over data and latent variables. The joint density of a CEBM decomposes into an intractable distribution over data and a tractable posterior over latent variables. CEBMs have similar use cases to variational autoencoders, in the sense that they learn an unsupervised mapping from data to latent variables. However, these models omit a generator network, which allows them to learn more flexible notions of similarity between data points. Our experiments demonstrate that conjugate EBMs achieve competitive results in terms of image modelling, predictive power of latent space, and out-of-domain detection on a variety of datasets.
Submitted 25 June, 2021;
originally announced June 2021.
-
Implementation of an alternative method for assessing competing risks: restricted mean time lost
Authors:
Hongji Wu,
Hao Yuan,
Zi**g Yang,
Yawen Hou,
Zheng Chen
Abstract:
In clinical and epidemiological studies, hazard ratios are often applied to compare treatment effects between two groups for survival data. For competing risks data, the corresponding quantities of interest are cause-specific hazard ratios (cHRs) and subdistribution hazard ratios (sHRs). However, they both have some limitations related to model assumptions and clinical interpretation. Therefore, we recommend the restricted mean time lost (RMTL) as an alternative that is easy to interpret in a competing risks framework. Based on the difference in restricted mean time lost (RMTLd), we propose a new estimator, hypothesis test and sample size formula. The simulation results show that the estimation of the RMTLd is accurate and that the RMTLd test has robust statistical performance (both type I error and power). The results of three example analyses also verify the performance of the RMTLd test. From the perspectives of clinical interpretation, application conditions and statistical performance, we recommend that the RMTLd be reported with the HR in the analysis of competing risks data and that the RMTLd even be regarded as the primary outcome when the proportional hazard assumption fails. The R code (crRMTL) is publicly available from GitHub (https://github.com/chenzgz/crRMTL.1). Keywords: survival analysis, competing risks, hazard ratio, restricted mean time lost, sample size, hypothesis test
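The RMTL for a cause k up to a horizon tau is the area under that cause's cumulative incidence function (CIF) over [0, tau]. A minimal sketch, assuming right-censored competing risks data coded as 0 = censored and 1, 2, ... = cause, with the CIF estimated by the Aalen-Johansen estimator; the function names are hypothetical and this is not the crRMTL package. The RMTLd between two groups would be the difference of their rmtl values; the paper's variance estimator, test, and sample size formula are not reproduced.

```python
import numpy as np


def cumulative_incidence(time, cause, k):
    """Aalen-Johansen CIF for cause k (cause 0 = censored).
    Returns the distinct event times and the CIF value just after each one."""
    time = np.asarray(time, dtype=float)
    cause = np.asarray(cause, dtype=int)
    surv, cif = 1.0, 0.0  # surv is the all-cause KM survival just before t
    grid, values = [], []
    for t in np.unique(time[cause > 0]):
        at_risk = np.sum(time >= t)
        d_all = np.sum((time == t) & (cause > 0))
        d_k = np.sum((time == t) & (cause == k))
        cif += surv * d_k / at_risk
        surv *= 1.0 - d_all / at_risk
        grid.append(t)
        values.append(cif)
    return np.array(grid), np.array(values)


def rmtl(time, cause, k, tau):
    """Restricted mean time lost to cause k: integral of the step-function CIF on [0, tau]."""
    grid, values = cumulative_incidence(time, cause, k)
    keep = grid < tau
    knots = np.concatenate(([0.0], grid[keep], [tau]))
    heights = np.concatenate(([0.0], values[keep]))
    return float(np.sum(heights * np.diff(knots)))


# Causes 1 and 2 compete; the last subject is censored at t = 4.
print(rmtl([1, 2, 3, 4], [1, 2, 1, 0], k=1, tau=4))
```

Summing the RMTL over all causes and adding the RMST recovers tau, which is why RMTL reads naturally as "time lost" within the restriction window.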
Submitted 19 December, 2021; v1 submitted 24 June, 2021;
originally announced June 2021.
-
Nested Variational Inference
Authors:
Heiko Zimmermann,
Hao Wu,
Babak Esmaeili,
Jan-Willem van de Meent
Abstract:
We develop nested variational inference (NVI), a family of methods that learn proposals for nested importance samplers by minimizing a forward or reverse KL divergence at each level of nesting. NVI is applicable to many commonly used importance sampling strategies and provides a mechanism for learning intermediate densities, which can serve as heuristics to guide the sampler. Our experiments apply NVI to (a) sample from a multimodal distribution using a learned annealing path, (b) learn heuristics that approximate the likelihood of future observations in a hidden Markov model, and (c) perform amortized inference in hierarchical deep generative models. We observe that optimizing nested objectives leads to improved sample quality in terms of log average weight and effective sample size.
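The two sample-quality metrics mentioned above have standard definitions that are easy to compute from log importance weights. A minimal sketch, assuming the usual definitions (the log of the mean unnormalized weight, which estimates the log normalizing constant, and Kish's effective sample size); we have not checked the paper's exact conventions, and the toy target, proposal, and helper names are ours.

```python
import numpy as np


def log_average_weight(log_w):
    """log(mean(exp(log_w))), computed stably via the log-sum-exp trick."""
    log_w = np.asarray(log_w, dtype=float)
    m = log_w.max()
    return float(m + np.log(np.mean(np.exp(log_w - m))))


def effective_sample_size(log_w):
    """Kish ESS, (sum w)^2 / sum(w^2); unchanged by rescaling all weights."""
    log_w = np.asarray(log_w, dtype=float)
    w = np.exp(log_w - log_w.max())
    return float(w.sum() ** 2 / np.sum(w ** 2))


# Importance sampling for a normalized target N(1, 1) with proposal N(0, 1):
# log w(x) = log p(x) - log q(x) = x - 1/2, so the average weight estimates Z = 1.
rng = np.random.default_rng(0)
x = rng.standard_normal(100_000)
log_w = x - 0.5
print(log_average_weight(log_w), effective_sample_size(log_w) / x.size)
```

A better proposal pushes the log average weight toward the true log normalizer from below (in expectation) and the ESS fraction toward 1.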
Submitted 21 June, 2021;
originally announced June 2021.
-
Dynamic prediction and analysis based on restricted mean survival time in survival analysis with nonproportional hazards
Authors:
Zi**g Yang,
Hongji Wu,
Yawen Hou,
Hao Yuan,
Zheng Chen
Abstract:
In the process of clinical diagnosis and treatment, the restricted mean survival time (RMST), which reflects the life expectancy of patients up to a specified time, can be used as an appropriate outcome measure. However, the RMST only calculates the mean survival time of patients within a period of time after the start of follow-up and may not accurately portray the change in a patient's life expectancy over time. The life expectancy can be adjusted for the time the patient has already survived and defined as the conditional restricted mean survival time (cRMST). A dynamic RMST model based on the cRMST can be established by incorporating time-dependent covariates and covariates with time-varying effects. We analysed data from a study of primary biliary cirrhosis (PBC) to illustrate the use of the dynamic RMST model. The predictive performance was evaluated using the C-index and the prediction error. The proposed dynamic RMST model, which can explore the dynamic effects of prognostic factors on survival time, has better predictive performance than the RMST model. Three PBC patient examples were used to illustrate how the predicted cRMST changed at different prediction times during follow-up. The use of the dynamic RMST model based on the cRMST allows for optimization of evidence-based decision-making by updating personalized dynamic life expectancy for patients.
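The cRMST can be estimated nonparametrically from a Kaplan-Meier curve. A minimal sketch, assuming right-censored data, no covariates, S(s) > 0, and the definition cRMST(s, w) = integral of S(u)/S(s) for u from s to s + w, i.e., the expected time lived in the next w time units by someone who has survived to time s. The paper's dynamic model adds time-dependent covariates and time-varying effects, which this does not attempt; the function names are hypothetical.

```python
import numpy as np


def kaplan_meier(time, event):
    """Distinct event times and KM survival just after each of them."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=int)
    grid = np.unique(time[event == 1])
    surv, current = [], 1.0
    for t in grid:
        current *= 1.0 - np.sum((time == t) & (event == 1)) / np.sum(time >= t)
        surv.append(current)
    return grid, np.array(surv)


def km_eval(grid, surv, t):
    """Right-continuous KM step function evaluated at t."""
    idx = np.searchsorted(grid, t, side="right") - 1
    return 1.0 if idx < 0 else float(surv[idx])


def conditional_rmst(time, event, s, window):
    """Integral of S(u) / S(s) over [s, s + window]; requires S(s) > 0."""
    grid, surv = kaplan_meier(time, event)
    end = s + window
    inner = grid[(grid > s) & (grid < end)]
    knots = np.concatenate(([s], inner, [end]))
    heights = np.array([km_eval(grid, surv, u) for u in knots[:-1]])
    return float(np.sum(heights * np.diff(knots)) / km_eval(grid, surv, s))


# Among subjects alive at s = 1, expected time lived during the next 2 units.
print(conditional_rmst([1, 2, 3, 4], [1, 1, 1, 1], s=1.0, window=2.0))
```

With s = 0 this reduces to the ordinary RMST over [0, window], so moving the landmark s forward is what turns a static summary into an updated, dynamic one.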
Submitted 20 June, 2021;
originally announced June 2021.
-
A News-based Machine Learning Model for Adaptive Asset Pricing
Authors:
Liao Zhu,
Haoxuan Wu,
Martin T. Wells
Abstract:
The paper proposes a new asset pricing model, the News Embedding UMAP Selection (NEUS) model, to explain and predict stock returns based on financial news. Using a combination of various machine learning algorithms, we first derive a company embedding vector for each basis asset from the financial news. Then we obtain a collection of the basis assets based on their company embeddings. After that, for each stock, we select the basis assets to explain and predict the stock return with high-dimensional statistical methods. The new model is shown to have significantly better fit and predictive power than the Fama-French 5-factor model.
Submitted 13 June, 2021;
originally announced June 2021.
-
Unbalanced Optimal Transport through Non-negative Penalized Linear Regression
Authors:
Laetitia Chapel,
Rémi Flamary,
Haoran Wu,
Cédric Févotte,
Gilles Gasso
Abstract:
This paper addresses the problem of Unbalanced Optimal Transport (UOT) in which the marginal conditions are relaxed (using weighted penalties in lieu of equality) and no additional regularization is enforced on the OT plan. In this context, we show that the corresponding optimization problem can be reformulated as a non-negative penalized linear regression problem. This reformulation allows us to propose novel algorithms inspired by inverse problems and nonnegative matrix factorization. In particular, we consider majorization-minimization, which leads in our setting to efficient multiplicative updates for a variety of penalties. Furthermore, we derive for the first time an efficient algorithm to compute the regularization path of UOT with quadratic penalties. The proposed algorithm provides a continuous, piecewise-linear path of OT plans converging to the solution of balanced OT (corresponding to infinite penalty weights). We perform several numerical experiments on simulated and real data illustrating the new algorithms, and provide a detailed discussion of more sophisticated optimization tools that can further be used to solve OT problems thanks to our reformulation.
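The quadratic-penalty case gives a concrete feel for the multiplicative updates. Below is a sketch we derived with a standard majorization-minimization argument (a Lee-Seung-style bound on the quadratic term plus an AM-GM bound on the linear cost term), assuming a nonnegative cost matrix and strictly positive marginals; it may differ in details from the paper's algorithm, and the function names are hypothetical. Each update keeps the plan nonnegative and does not increase the objective.

```python
import numpy as np


def uot_quadratic_mm(cost, a, b, lam, n_iter=500):
    """Multiplicative MM updates for
    min_{T >= 0} <C, T> + lam/2 * (||T 1 - a||^2 + ||T^T 1 - b||^2).
    Assumes cost >= 0 and strictly positive marginals a, b."""
    plan = np.outer(a, b)  # strictly positive starting plan
    numer = lam * (a[:, None] + b[None, :])
    for _ in range(n_iter):
        denom = cost + lam * (plan.sum(axis=1)[:, None] + plan.sum(axis=0)[None, :])
        plan = plan * numer / denom
    return plan


def uot_objective(cost, plan, a, b, lam):
    """Value of the quadratic-penalty UOT objective for a given plan."""
    row_gap = plan.sum(axis=1) - a
    col_gap = plan.sum(axis=0) - b
    return float(np.sum(cost * plan) + lam / 2 * (row_gap @ row_gap + col_gap @ col_gap))


# Two small histograms on a 1-D grid with squared-distance cost.
grid = np.linspace(0, 1, 5)
cost = (grid[:, None] - grid[None, :]) ** 2
a = np.array([0.1, 0.2, 0.4, 0.2, 0.1])
b = np.array([0.3, 0.3, 0.2, 0.1, 0.1])
plan = uot_quadratic_mm(cost, a, b, lam=10.0)
print(uot_objective(cost, np.outer(a, b), a, b, 10.0), uot_objective(cost, plan, a, b, 10.0))
```

Larger lam penalizes marginal violations more heavily, so the plan's marginals move closer to a and b, consistent with balanced OT appearing as the infinite-penalty limit.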
Submitted 8 June, 2021;
originally announced June 2021.
-
Learning Proposals for Probabilistic Programs with Inference Combinators
Authors:
Sam Stites,
Heiko Zimmermann,
Hao Wu,
Eli Sennesh,
Jan-Willem van de Meent
Abstract:
We develop operators for construction of proposals in probabilistic programs, which we refer to as inference combinators. Inference combinators define a grammar over importance samplers that compose primitive operations such as application of a transition kernel and importance resampling. Proposals in these samplers can be parameterized using neural networks, which in turn can be trained by optimizing variational objectives. The result is a framework for user-programmable variational methods that are correct by construction and can be tailored to specific models. We demonstrate the flexibility of this framework by implementing advanced variational methods based on amortized Gibbs sampling and annealing.
Submitted 16 June, 2021; v1 submitted 28 February, 2021;
originally announced March 2021.
-
Impact of Interventional Policies Including Vaccine on Covid-19 Propagation and Socio-Economic Factors
Authors:
Haonan Wu,
Rajarshi Banerjee,
Indhumathi Venkatachalam,
Daniel Percy-Hughes,
Praveen Chougale
Abstract:
A novel coronavirus disease (later named COVID-19) has emerged and caused the world to enter a new reality, with many direct and indirect factors influencing it. Some are human-controllable (e.g., interventional policies, mobility and the vaccine); some are not (e.g., the weather). We have sought to test how a change in these human-controllable factors might influence two measures: the number of daily cases weighed against the economic impact. If such changes are applied at the right level and measured with up-to-date data, policymakers would be able to make targeted interventions and measure their cost. This study aims to provide a predictive analytics framework to model, predict and simulate COVID-19 propagation and the socio-economic impact of interventions intended to reduce the spread of the disease, such as policy and/or vaccine. It allows policymakers, government representatives and business leaders to make better-informed decisions about the potential effect of various interventions with forward-looking views via scenario planning. We have leveraged a recently launched open-source COVID-19 big data platform, used published research to find potentially relevant variables (features), and applied in-depth data quality checks and analytics for feature selection and prediction. An advanced machine learning pipeline with a self-evolving model has been developed and deployed on a modern machine learning architecture. It has high accuracy for trend prediction (back-tested with r-squared) and is augmented with interpretability for deeper insights.
Submitted 11 January, 2021;
originally announced January 2021.
-
Central and Non-central Limit Theorems arising from the Scattering Transform and its Neural Activation Generalization
Authors:
Gi-Ren Liu,
Yuan-Chung Sheu,
Hau-Tieng Wu
Abstract:
Motivated by analyzing complicated and non-stationary time series, we study a generalization of the scattering transform (ST) that includes broad neural activation functions, which is called neural activation ST (NAST). On the whole, NAST is a transform that comprises a sequence of "neural processing units", each of which applies a high pass filter to the input from the previous layer followed by a composition with a nonlinear function as the output to the next neuron. Here, the nonlinear function models how a neuron gets excited by the input signal. In addition to showing properties like non-expansion, horizontal translation invariance and insensitivity to local deformation, we explore the statistical properties of the second-order NAST of a Gaussian process with various dependence and (non-)stationarity structures, together with its interaction with the chosen high pass filters and activation functions, and provide central limit theorem (CLT) and non-CLT results. Numerical simulations are also provided. The results explain how NAST processes complicated and non-stationary time series, and pave the way towards statistical inference based on NAST under the non-null case.
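A minimal Python sketch of the cascade of neural processing units described above. Assumptions: discrete signals, NumPy convolution with `mode="same"` as the filtering step, and a hypothetical zero-sum second-difference filter standing in for the high pass filter; the paper's actual filters, activation functions and any final averaging step are not reproduced here.

```python
import numpy as np

def nast_unit(x, h, activation):
    """One 'neural processing unit': high pass filter, then a nonlinear activation."""
    return activation(np.convolve(x, h, mode="same"))

def second_order_nast(x, h1, h2, activation=np.abs):
    """Second-order output: two processing units applied in sequence.

    With activation=np.abs this reduces to the modulus used in the classical
    scattering transform; other activations (e.g. a ReLU) give NAST variants.
    """
    return nast_unit(nast_unit(x, h1, activation), h2, activation)

# Hypothetical high pass filter: its taps sum to zero, so it removes constants.
h = np.array([-1.0, 2.0, -1.0])

rng = np.random.default_rng(0)
x = rng.standard_normal(1024)  # Gaussian white-noise input
y = second_order_nast(x, h, h, activation=lambda z: np.maximum(z, 0.0))  # ReLU
print(y.shape)  # (1024,)
```

Because the filter taps sum to zero, a constant input is mapped to zero away from the boundary, which is the high pass property this sketch relies on.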
Submitted 21 November, 2020;
originally announced November 2020.
-
Trend and Variance Adaptive Bayesian Changepoint Analysis & Local Outlier Scoring
Authors:
Haoxuan Wu,
Toryn L. J. Schafer,
David S. Matteson
Abstract:
We adaptively estimate both changepoints and local outlier processes in a Bayesian dynamic linear model with global-local shrinkage priors in a novel model we call Adaptive Bayesian Changepoints with Outliers (ABCO). We utilize a state-space approach to identify a dynamic signal in the presence of outliers and measurement error with stochastic volatility. We find that global state equation parameters are inadequate for most real applications and we include local parameters to track noise at each time-step. This setup provides a flexible framework to detect unspecified changepoints in complex series, such as those with large interruptions in local trends, with robustness to outliers and heteroskedastic noise. Finally, we compare our algorithm against several alternatives to demonstrate its efficacy in diverse simulation scenarios and two empirical examples on the U.S. economy.
Submitted 13 March, 2024; v1 submitted 18 November, 2020;
originally announced November 2020.
-
Convergence of Graph Laplacian with kNN Self-tuned Kernels
Authors:
Xiuyuan Cheng,
Hau-Tieng Wu
Abstract:
Kernelized Gram matrix $W$ constructed from data points $\{x_i\}_{i=1}^N$ as $W_{ij}= k_0( \frac{ \| x_i - x_j \|^2} {\sigma^2} )$ is widely used in graph-based geometric data analysis and unsupervised learning. An important question is how to choose the kernel bandwidth $\sigma$, and a common practice called self-tuned kernel adaptively sets a $\sigma_i$ at each point $x_i$ by the $k$-nearest neighbor (kNN) distance. When the $x_i$'s are sampled from a $d$-dimensional manifold embedded in a possibly high-dimensional space, theoretical results on graph Laplacian convergence with self-tuned kernels have been incomplete, unlike those for fixed-bandwidth kernels. This paper proves the convergence of the graph Laplacian operator $L_N$ to the manifold (weighted-)Laplacian for a new family of kNN self-tuned kernels $W^{(\alpha)}_{ij} = k_0( \frac{ \| x_i - x_j \|^2}{ \epsilon \hat\rho(x_i) \hat\rho(x_j)})/\hat\rho(x_i)^\alpha \hat\rho(x_j)^\alpha$, where $\hat\rho$ is the bandwidth function estimated by kNN, and the limiting operator is also parametrized by $\alpha$. When $\alpha = 1$, the limiting operator is the weighted manifold Laplacian $\Delta_p$. Specifically, we prove the point-wise convergence of $L_N f$ and the convergence of the graph Dirichlet form with rates. Our analysis is based on first establishing a $C^0$ consistency for $\hat\rho$, which bounds the relative estimation error $|\hat\rho - \bar\rho|/\bar\rho$ uniformly with high probability, where $\bar\rho = p^{-1/d}$ and $p$ is the data density function. Our theoretical results reveal the advantage of the self-tuned kernel over the fixed-bandwidth kernel via smaller variance error in low-density regions. In the algorithm, no prior knowledge of $d$ or the data density is needed. The theoretical results are supported by numerical experiments on simulated data and hand-written digit image data.
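A NumPy sketch of the kNN self-tuned kernel $W^{(\alpha)}$ above, offered as an illustration rather than the paper's implementation. Assumptions: the profile $k_0(r) = e^{-r}$, $\hat\rho(x_i)$ taken as the raw distance from $x_i$ to its $k$-th nearest neighbor without any normalizing constants, and the unnormalized graph Laplacian $D - W$ in place of the specific scaling of $L_N$ used in the analysis.

```python
import numpy as np

def knn_self_tuned_kernel(X, k=10, eps=1.0, alpha=1.0):
    """Return W^(alpha) and the kNN bandwidths rho for data X of shape (N, D)."""
    sq = np.sum(X**2, axis=1)
    # Pairwise squared Euclidean distances, clipped at 0 against round-off.
    D2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    # Column 0 of each sorted row is the point itself (distance ~0),
    # so column k is the distance to the k-th nearest neighbor.
    rho = np.sort(np.sqrt(D2), axis=1)[:, k]
    rr = np.outer(rho, rho)
    W = np.exp(-D2 / (eps * rr))  # k0(r) = exp(-r): an assumed kernel profile
    return W / rr**alpha, rho

def graph_laplacian(W):
    """Unnormalized graph Laplacian D - W (an assumed normalization)."""
    return np.diag(W.sum(axis=1)) - W

# Hypothetical data: points on a circle (d = 1) embedded in R^3.
rng = np.random.default_rng(0)
theta = rng.uniform(0, 2 * np.pi, 500)
X = np.c_[np.cos(theta), np.sin(theta), np.zeros_like(theta)]
W, rho = knn_self_tuned_kernel(X, k=10, eps=0.5, alpha=1.0)
L = graph_laplacian(W)
```

Points in sparse regions get a larger $\hat\rho$ and therefore a wider effective bandwidth $\epsilon\hat\rho(x_i)\hat\rho(x_j)$; note that no estimate of $d$ or of the density is needed to form $W^{(\alpha)}$.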
Submitted 9 February, 2021; v1 submitted 2 November, 2020;
originally announced November 2020.
-
Classification Accuracy and Parameter Estimation in Multilevel Contexts: A Study of Conditional Nonparametric Multilevel Latent Class Analysis
Authors:
Chi Chang,
Kimberly Kelly,
M. Lee Van Horn,
Richard T. Houang,
Joseph Gardiner,
Laurie Van Egeren,
Heng-Chieh Wu
Abstract:
The current research has two aims. First, to demonstrate the utility of conditional nonparametric multilevel latent class analysis (NP-MLCA) for multi-site program evaluation using an empirical dataset. Second, to investigate how classification accuracy and parameter estimation of a conditional NP-MLCA are affected by six study factors: the quality of latent class indicators, the number of latent class indicators, level-1 covariate effects, cross-level covariate effects, the number of level-2 units, and the size of level-2 units. A total of 96 conditions were examined in a simulation study. The resulting classification accuracy rates, together with the power and type-I error of cross-level covariate effects and contextual effects, suggest that the nonparametric multilevel latent class model can be applied broadly in multilevel contexts.
Submitted 30 October, 2020; v1 submitted 29 October, 2020;
originally announced October 2020.
-
Graph Based Gaussian Processes on Restricted Domains
Authors:
David B Dunson,
Hau-Tieng Wu,
Nan Wu
Abstract:
In nonparametric regression, it is common for the inputs to fall in a restricted subset of Euclidean space. Typical kernel-based methods that do not take into account the intrinsic geometry of the domain across which observations are collected may produce sub-optimal results. In this article, we focus on solving this problem in the context of Gaussian process (GP) models, proposing a new class of Graph Laplacian based GPs (GL-GPs), which learn a covariance that respects the geometry of the input domain. As the heat kernel is computationally intractable, we approximate the covariance using finitely many eigenpairs of the Graph Laplacian (GL). The GL is constructed from a kernel which depends only on the Euclidean coordinates of the inputs. Hence, we can use full knowledge of the kernel to extend the covariance structure to newly arriving samples by a Nyström-type extension. We provide substantial theoretical support for the GL-GP methodology, and illustrate performance gains in various applications.
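A rough NumPy sketch of approximating a heat-kernel covariance with finitely many Graph Laplacian eigenpairs. Assumptions: a Gaussian affinity with bandwidth `eps`, the symmetric normalized Laplacian $I - D^{-1/2} W D^{-1/2}$, a diffusion time `t`, and a hypothetical L-shaped domain; the Nyström extension to new points is not implemented, and the paper's own construction and scaling may differ.

```python
import numpy as np

def gl_heat_covariance(X, eps=0.1, n_eig=20, t=1.0):
    """Covariance on the sample points from the n_eig smallest GL eigenpairs."""
    sq = np.sum(X**2, axis=1)
    D2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    W = np.exp(-D2 / eps)                 # kernel on Euclidean coordinates only
    d = W.sum(axis=1)
    S = W / np.sqrt(np.outer(d, d))       # D^{-1/2} W D^{-1/2}
    L = np.eye(len(X)) - S                # symmetric normalized Laplacian
    lam, phi = np.linalg.eigh(L)          # eigenvalues in ascending order
    lam, phi = lam[:n_eig], phi[:, :n_eig]
    # Truncated heat kernel: sum_l exp(-t * lam_l) * phi_l phi_l^T
    return (phi * np.exp(-t * lam)) @ phi.T

rng = np.random.default_rng(1)
X = rng.uniform(0.0, 1.0, size=(300, 2))
X = X[(X[:, 0] < 0.4) | (X[:, 1] < 0.4)]  # hypothetical L-shaped restricted domain
C = gl_heat_covariance(X, eps=0.01, n_eig=30, t=5.0)
```

A GP prior with covariance `C` correlates points according to connectivity within the sampled domain rather than ambient distance alone, and truncating to `n_eig` eigenpairs keeps the result positive semidefinite.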
Submitted 2 November, 2021; v1 submitted 14 October, 2020;
originally announced October 2020.
-
Interpretable Machine Learning for COVID-19: An Empirical Study on Severity Prediction Task
Authors:
Han Wu,
Wenjie Ruan,
Jiangtao Wang,
Dingchang Zheng,
Bei Liu,
Yayuan Gen,
Xiangfei Chai,
Jian Chen,
Kunwei Li,
Shaolin Li,
Sumi Helal
Abstract:
The black-box nature of machine learning models hinders the deployment of some high-accuracy models in medical diagnosis. It is risky to put one's life in the hands of models that medical researchers do not fully understand. However, through model interpretation, black-box models can promptly reveal significant biomarkers that medical practitioners may have overlooked due to the surge of infected patients in the COVID-19 pandemic.
This research leverages a database of 92 patients with confirmed SARS-CoV-2 laboratory tests between 18th Jan. 2020 and 5th Mar. 2020, in Zhuhai, China, to identify biomarkers indicative of disease severity. By interpreting four machine learning models (decision tree, random forests, gradient boosted trees, and neural networks) with permutation feature importance, Partial Dependence Plot (PDP), Individual Conditional Expectation (ICE), Accumulated Local Effects (ALE), Local Interpretable Model-agnostic Explanations (LIME), and Shapley Additive Explanation (SHAP), we find that increases in N-Terminal pro-Brain Natriuretic Peptide (NTproBNP), C-Reactive Protein (CRP), and lactic dehydrogenase (LDH), together with a decrease in lymphocytes (LYM), are associated with severe infection and an increased risk of death, which is consistent with recent medical research on COVID-19 and other research using dedicated models. We further validate our methods on a large open Kaggle dataset of 5644 confirmed patients from the Hospital Israelita Albert Einstein in São Paulo, Brazil, and unveil leukocytes, eosinophils, and platelets as three indicative biomarkers for COVID-19.
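Permutation feature importance, the first of the interpretation tools listed above, can be sketched in a few lines: shuffle one feature column, re-score the fitted model, and record how much the error grows. The model, data and squared-error metric below are hypothetical placeholders, not the study's clinical data or models; for a severity classifier one would swap in a classification metric.

```python
import numpy as np

def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean increase in squared error when each column of X is shuffled."""
    rng = np.random.default_rng(seed)
    base = np.mean((predict(X) - y) ** 2)
    scores = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        for _ in range(n_repeats):
            Xp = X.copy()
            Xp[:, j] = rng.permutation(Xp[:, j])  # break feature j's link to y
            scores[j] += np.mean((predict(Xp) - y) ** 2) - base
    return scores / n_repeats

# Hypothetical fitted model that uses only feature 0 (think of a single lab marker).
predict = lambda X: 2.0 * X[:, 0]
rng = np.random.default_rng(42)
X = rng.standard_normal((200, 3))
y = 2.0 * X[:, 0]
imp = permutation_importance(predict, X, y)
```

Features the model ignores score exactly zero, while the feature it relies on gets a clearly positive score; this model-agnostic property is what makes the method usable across the tree-based and neural models alike.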
Submitted 20 October, 2021; v1 submitted 30 September, 2020;
originally announced October 2020.
-
Public Health Informatics: Proposing Causal Sequence of Death Using Neural Machine Translation
Authors:
Yuanda Zhu,
Ying Sha,
Hang Wu,
Mai Li,
Ryan A. Hoffman,
May D. Wang
Abstract:
Each year there are nearly 57 million deaths around the world, with over 2.7 million in the United States. Timely, accurate and complete death reporting is critical in public health, as institutions and government agencies rely on death reports to analyze vital statistics and to formulate responses to communicable diseases. Inaccurate death reporting may misdirect public health policies. Determining the causes of death is, nevertheless, challenging even for experienced physicians. To help physicians accurately report causes of death, we present an advanced AI approach that determines a chronologically ordered sequence of clinical conditions leading to death, based on the decedent's last hospital discharge record. The sequence of clinical codes on the death report is called the causal chain of death and is coded in the tenth revision of the International Statistical Classification of Diseases (ICD-10); in line with the ICD-9-CM Official Guidelines for Coding and Reporting, the priority-ordered clinical conditions on the discharge record are coded in ICD-9. We identify three challenges in proposing the causal chain of death: two versions of the coding system in clinical codes, medical domain knowledge conflict, and data interoperability. To overcome the first challenge in this sequence-to-sequence problem, we apply neural machine translation models to generate the target sequence. Along with three accuracy metrics, we evaluate the quality of generated sequences with the BLEU (BiLingual Evaluation Understudy) score and achieve 16.04 out of 100. To address the second challenge, we incorporate expert-verified medical domain knowledge as a constraint when generating the output sequence to exclude infeasible causal chains. Lastly, we demonstrate the usability of our work in a Fast Healthcare Interoperability Resources (FHIR) interface to address the third challenge.
Submitted 9 March, 2021; v1 submitted 22 September, 2020;
originally announced September 2020.
-
QR-MIX: Distributional Value Function Factorisation for Cooperative Multi-Agent Reinforcement Learning
Authors:
Jian Hu,
Seth Austin Harding,
Haibin Wu,
Siyue Hu,
Shih-wei Liao
Abstract:
In Cooperative Multi-Agent Reinforcement Learning (MARL) and under the setting of Centralized Training with Decentralized Execution (CTDE), agents observe and interact with their environment locally and independently. With local observation and random sampling, the randomness in rewards and observations leads to randomness in long-term returns. Existing methods such as Value Decomposition Network (VDN) and QMIX estimate the value of long-term returns as a scalar that carries no information about this randomness. Our proposed model QR-MIX introduces quantile regression, modeling joint state-action values as a distribution by combining QMIX with Implicit Quantile Network (IQN). However, the monotonicity in QMIX limits the expression of the joint state-action value distribution and may lead to incorrect estimation results in non-monotonic cases. Therefore, we propose a flexible loss function to approximate the monotonicity found in QMIX. Our model is not only more tolerant of the randomness of returns, but also more tolerant of the randomness of monotonic constraints. The experimental results demonstrate that QR-MIX outperforms the previous state-of-the-art method QMIX in the StarCraft Multi-Agent Challenge (SMAC) environment.
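Quantile regression, which QR-MIX brings in through IQN, rests on the quantile (pinball) loss $\rho_\tau(u) = u\,(\tau - \mathbb{1}\{u < 0\})$, whose minimizer over a constant is a $\tau$-quantile of the data. The sketch below shows only that loss on hypothetical sampled returns; IQN itself uses a Huber-smoothed variant, and QR-MIX's mixing network and relaxed monotonicity loss are not reproduced.

```python
import numpy as np

def pinball_loss(y, q, tau):
    """Mean quantile loss rho_tau(y - q) = max(tau*u, (tau-1)*u), u = y - q."""
    u = y - q
    return np.mean(np.maximum(tau * u, (tau - 1.0) * u))

returns = np.arange(101, dtype=float)     # hypothetical sampled returns 0..100
candidates = np.arange(101, dtype=float)  # candidate values for the 0.9-quantile
losses = [pinball_loss(returns, q, tau=0.9) for q in candidates]
best = candidates[int(np.argmin(losses))]
print(best)  # 90.0
```

Learning one such quantile per sampled $\tau$ is what lets a model represent a distribution of returns instead of a single scalar value.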
Submitted 23 February, 2021; v1 submitted 9 September, 2020;
originally announced September 2020.
-
Graph Polish: A Novel Graph Generation Paradigm for Molecular Optimization
Authors:
Chaojie Ji,
Yijia Zheng,
Ruxin Wang,
Yunpeng Cai,
Hongyan Wu
Abstract:
Molecular optimization, which transforms a given input molecule X into another molecule Y with desirable properties, is essential in molecular drug discovery. The traditional translating approaches, which generate molecular graphs from scratch by adding substructures piece by piece, are prone to error because of the large set of candidate substructures and the large number of steps to the final target. In this study, we present a novel molecular optimization paradigm, Graph Polish, which changes molecular optimization from the traditional "two-language translating" task into a "single-language polishing" task. The key to this optimization paradigm is to find an optimization center subject to the conditions that the preserved areas around it are maximized and the removed and added regions are minimized. We then propose an effective and efficient learning framework, T&S polish, to capture the long-term dependencies in the optimization steps. The T component automatically identifies and annotates the optimization centers and the preservation, removal and addition of parts of the molecule, and the S component learns these behaviors and applies these actions to a new molecule. Furthermore, the proposed paradigm offers an intuitive interpretation for each molecular optimization result. Experiments with multiple optimization tasks are conducted on four benchmark datasets. The proposed T&S polish approach achieves a significant advantage over five state-of-the-art baseline methods on all the tasks. In addition, extensive studies are conducted to validate the effectiveness, explainability and time savings of the novel optimization paradigm.
Submitted 14 August, 2020;
originally announced August 2020.
-
Airflow recovery from thoracic and abdominal movements using Synchrosqueezing Transform and Locally Stationary Gaussian Process Regression
Authors:
Whitney K. Huang,
Yu-Min Chung,
Yu-Bo Wang,
Jeff E. Mandel,
Hau-Tieng Wu
Abstract:
Airflow signal encodes rich information about the respiratory system. While the gold standard for measuring airflow is a spirometer with an occlusive seal, this is not practical for ambulatory monitoring of patients. Advances in sensor technology have made measuring the motion of the thorax and abdomen feasible with small, inexpensive devices, but estimating airflow from these time series is challenging. We propose to use the nonlinear-type time-frequency analysis tool, the synchrosqueezing transform, to represent the thoracic and abdominal movement signals as features, which are then used to recover the airflow by a locally stationary Gaussian process. Using a dataset that contains respiratory signals under normal sleep conditions, we show that accurate prediction can be achieved by fitting the proposed model in the feature space in both intra- and inter-subject setups. To further demonstrate the utility of the proposed method, we also apply it to a more challenging case in which subjects under general anesthesia underwent transitions from pressure support to unassisted ventilation.
Submitted 10 August, 2020;
originally announced August 2020.
-
Strong Uniform Consistency with Rates for Kernel Density Estimators with General Kernels on Manifolds
Authors:
Hau-Tieng Wu,
Nan Wu
Abstract:
When analyzing modern machine learning algorithms, we may need to handle kernel density estimation (KDE) with intricate kernels that are not designed by the user and might even be irregular and asymmetric. To handle this emerging challenge, we provide a strong uniform consistency result with the $L^\infty$ convergence rate for KDE on Riemannian manifolds with Riemann integrable kernels (in the ambient Euclidean space). We also provide an $L^1$ consistency result for kernel density estimation on Riemannian manifolds with Lebesgue integrable kernels. The isotropic kernels considered in this paper are different from the kernels in the Vapnik-Chervonenkis class that are frequently considered in the statistics community. We illustrate the difference when applying them to estimate the probability density function. Moreover, we elaborate on the delicate difference between designing the kernel on the intrinsic manifold and on the ambient Euclidean space, both of which might be encountered in practice. Finally, we prove the necessary and sufficient condition for an isotropic kernel to be Riemann integrable on a submanifold in the Euclidean space.
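A small sketch of KDE computed in ambient coordinates for data lying on a manifold. Assumptions: an isotropic Gaussian kernel and hypothetical samples drawn uniformly on the unit circle, a 1-dimensional submanifold of $\mathbb{R}^2$. What it illustrates is that the estimator $\hat p(x) = \frac{1}{n h^d}\sum_i K(\|x - x_i\|/h)$ is normalized with the intrinsic dimension $d$, not the ambient dimension; the kernel and constants are illustrative choices, not the general kernels treated in the paper.

```python
import numpy as np

def manifold_kde(x_eval, samples, h, d):
    """Gaussian KDE in ambient coordinates, normalized by h**d where d is
    the intrinsic dimension of the manifold the samples lie on."""
    diff = x_eval[:, None, :] - samples[None, :, :]
    r2 = np.sum(diff**2, axis=-1) / h**2
    K = np.exp(-0.5 * r2) / (2 * np.pi) ** (d / 2)  # integrates to 1 over R^d
    return K.sum(axis=1) / (len(samples) * h**d)

rng = np.random.default_rng(0)
theta = rng.uniform(0, 2 * np.pi, 20000)
samples = np.c_[np.cos(theta), np.sin(theta)]  # uniform on the unit circle
angles = np.array([0.0, np.pi / 2, np.pi, 1.5 * np.pi])
x_eval = np.c_[np.cos(angles), np.sin(angles)]
est = manifold_kde(x_eval, samples, h=0.1, d=1)
# The true density with respect to arc length is 1 / (2*pi), about 0.159.
```

Normalizing with the ambient dimension ($d = 2$) instead would inflate every estimate by a factor of about $1/(h\sqrt{2\pi}) \approx 4$ here.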
Submitted 8 June, 2021; v1 submitted 13 July, 2020;
originally announced July 2020.
-
Unsupervised Ensembling of Multiple Software Sensors with Phase Synchronization: A Robust Approach For Electrocardiogram-derived Respiration
Authors:
Jacob McErlean,
John Malik,
Yu-Ting Lin,
Ronen Talmon,
Hau-Tieng Wu
Abstract:
Objective: We aimed to fuse the outputs of different electrocardiogram-derived respiration (EDR) algorithms to create one EDR signal that is of higher quality. Methods: We viewed each EDR algorithm as a software sensor that recorded breathing activity from a different vantage point, identified high-quality software sensors based on the respiratory signal quality index, aligned the highest-quality EDRs with a phase synchronization technique based on the graph connection Laplacian, and finally fused those aligned, high-quality EDRs. We refer to the output as the sync-ensembled EDR signal. The proposed algorithm was evaluated on two large-scale databases of whole-night polysomnograms. We evaluated the performance of the proposed algorithm using three respiratory signals recorded from different hardware sensors, and compared it with other existing EDR algorithms. A sensitivity analysis was carried out for a total of five cases: fusion by taking the mean of EDR signals, and the four cases of EDR signal alignment without and with synchronization and without and with signal quality selection. Results: The sync-ensembled EDR algorithm outperforms existing EDR algorithms when evaluated by the synchronized correlation (γ-score), optimal transport (OT) distance, and estimated average respiratory rate (EARR) score, all with statistical significance. The sensitivity analysis shows that the signal quality selection and EDR signal alignment are both critical for the performance, both with statistical significance. Conclusion: The sync-ensembled EDR provides robust respiratory information from the electrocardiogram. Significance: Phase synchronization is not only theoretically rigorous but also practical for designing a robust EDR.
Submitted 13 April, 2024; v1 submitted 23 June, 2020;
originally announced June 2020.
-
Consistent Second-Order Conic Integer Programming for Learning Bayesian Networks
Authors:
Simge Kucukyavuz,
Ali Shojaie,
Hasan Manzour,
Linchuan Wei,
Hao-Hsiang Wu
Abstract:
Bayesian Networks (BNs) represent conditional probability relations among a set of random variables (nodes) in the form of a directed acyclic graph (DAG), and have found diverse applications in knowledge discovery. We study the problem of learning the sparse DAG structure of a BN from continuous observational data. The central problem can be modeled as a mixed-integer program with an objective function composed of a convex quadratic loss function and a regularization penalty subject to linear constraints. The optimal solution to this mathematical program is known to have desirable statistical properties under certain conditions. However, the state-of-the-art optimization solvers are not able to obtain provably optimal solutions to the existing mathematical formulations for medium-size problems within reasonable computational times. To address this difficulty, we tackle the problem from both computational and statistical perspectives. On the one hand, we propose a concrete early stopping criterion to terminate the branch-and-bound process in order to obtain a near-optimal solution to the mixed-integer program, and establish the consistency of this approximate solution. On the other hand, we improve the existing formulations by replacing the linear "big-$M$" constraints that represent the relationship between the continuous and binary indicator variables with second-order conic constraints. Our numerical results demonstrate the effectiveness of the proposed approaches.
Submitted 6 May, 2022; v1 submitted 28 May, 2020;
originally announced May 2020.