-
Generative Modeling for Tabular Data via Penalized Optimal Transport Network
Authors:
Wenhui Sophia Lu,
Chenyang Zhong,
Wing Hung Wong
Abstract:
The task of precisely learning the probability distribution of rows within tabular data and producing authentic synthetic samples is both crucial and non-trivial. Wasserstein generative adversarial network (WGAN) marks a notable improvement in generative modeling, addressing the challenges faced by its predecessor, generative adversarial network. However, due to the mixed data types and multimodalities prevalent in tabular data, the delicate equilibrium between the generator and discriminator, as well as the inherent instability of Wasserstein distance in high dimensions, WGAN often fails to produce high-fidelity samples. To this end, we propose POTNet (Penalized Optimal Transport Network), a generative deep neural network based on a novel, robust, and interpretable marginally-penalized Wasserstein (MPW) loss. POTNet can effectively model tabular data containing both categorical and continuous features. Moreover, it offers the flexibility to condition on a subset of features. We provide theoretical justifications for the motivation behind the MPW loss. We also empirically demonstrate the effectiveness of our proposed method on four different benchmarks across a variety of real-world and simulated datasets. Our proposed model achieves orders of magnitude speedup during the sampling stage compared to state-of-the-art generative models for tabular data, thereby enabling efficient large-scale synthetic data generation.
Submitted 16 February, 2024;
originally announced February 2024.
-
CeCNN: Copula-enhanced convolutional neural networks in joint prediction of refraction error and axial length based on ultra-widefield fundus images
Authors:
Chong Zhong,
Yang Li,
Danjuan Yang,
Meiyan Li,
Xingyao Zhou,
Bo Fu,
Catherine C. Liu,
A. H. Welsh
Abstract:
Ultra-widefield (UWF) fundus images are replacing traditional fundus images in screening, detection, prediction, and treatment of complications related to myopia because their much broader visual range is advantageous for highly myopic eyes. Spherical equivalent (SE) is extensively used as the main myopia outcome measure, and axial length (AL) has drawn increasing interest as an important ocular component for assessing myopia. Cutting-edge studies show that SE and AL are strongly correlated. Using the joint information from SE and AL is potentially better than using either separately. In the deep learning community, though there is research on multiple-response tasks with a 3D image biomarker, dependence among responses is only sporadically taken into consideration. Inspired by the spirit that information extracted from the data by statistical methods can improve the prediction accuracy of deep learning models, we formulate a class of multivariate response regression models with a higher-order tensor biomarker, for the bivariate tasks of regression-classification and regression-regression. Specifically, we propose a copula-enhanced convolutional neural network (CeCNN) framework that incorporates the dependence between responses through a Gaussian copula (with parameters estimated from a warm-up CNN) and uses the induced copula-likelihood loss with the backbone CNNs. We establish the statistical framework and algorithms for the aforementioned two bivariate tasks. We show that the CeCNN has better prediction accuracy after adding the dependency information to the backbone models. The modeling and the proposed CeCNN algorithm are applicable beyond the UWF scenario and can be effective with other backbones beyond ResNet and LeNet.
Submitted 1 June, 2024; v1 submitted 7 November, 2023;
originally announced November 2023.
-
OKRidge: Scalable Optimal k-Sparse Ridge Regression
Authors:
Jiachang Liu,
Sam Rosen,
Chudi Zhong,
Cynthia Rudin
Abstract:
We consider an important problem in scientific discovery, namely identifying sparse governing equations for nonlinear dynamical systems. This involves solving sparse ridge regression problems to provable optimality in order to determine which terms drive the underlying dynamics. We propose a fast algorithm, OKRidge, for sparse ridge regression, using a novel lower bound calculation involving, first, a saddle point formulation, and from there, either solving (i) a linear system or (ii) using an ADMM-based approach, where the proximal operators can be efficiently evaluated by solving another linear system and an isotonic regression problem. We also propose a method to warm-start our solver, which leverages a beam search. Experimentally, our methods attain provable optimality with run times that are orders of magnitude faster than those of the existing MIP formulations solved by the commercial solver Gurobi.
Submitted 11 January, 2024; v1 submitted 13 April, 2023;
originally announced April 2023.
-
Exploring and Interacting with the Set of Good Sparse Generalized Additive Models
Authors:
Chudi Zhong,
Zhi Chen,
Jiachang Liu,
Margo Seltzer,
Cynthia Rudin
Abstract:
In real applications, interaction between machine learning models and domain experts is critical; however, the classical machine learning paradigm that usually produces only a single model does not facilitate such interaction. Approximating and exploring the Rashomon set, i.e., the set of all near-optimal models, addresses this practical challenge by providing the user with a searchable space containing a diverse set of models from which domain experts can choose. We present algorithms to efficiently and accurately approximate the Rashomon set of sparse, generalized additive models with ellipsoids for fixed support sets and use these ellipsoids to approximate Rashomon sets for many different support sets. The approximated Rashomon set serves as a cornerstone to solve practical challenges such as (1) studying the variable importance for the model class; (2) finding models under user-specified constraints (monotonicity, direct editing); and (3) investigating sudden changes in the shape functions. Experiments demonstrate the fidelity of the approximated Rashomon set and its effectiveness in solving practical challenges.
Submitted 17 November, 2023; v1 submitted 28 March, 2023;
originally announced March 2023.
-
Inferring urban polycentricity from the variability in human mobility patterns
Authors:
Carmen Cabrera-Arnau,
Chen Zhong,
Michael Batty,
Ricardo Silva,
Soong Moon Kang
Abstract:
The polycentric city model has gained popularity in spatial planning policy, since it is believed to overcome some of the problems often present in monocentric metropolises, ranging from congestion to difficult accessibility to jobs and services. However, the concept 'polycentric city' has a fuzzy definition and as a result, the extent to which a city is polycentric cannot be easily determined. Here, we leverage the fine spatio-temporal resolution of smart travel card data to infer urban polycentricity by examining how a city departs from a well-defined monocentric model. In particular, we analyse the human movements that arise as a result of sophisticated forms of urban structure by introducing a novel probabilistic approach which captures the complexity of these human movements. We focus on London (UK) and Seoul (South Korea) as our two case studies, and we specifically find evidence that London displays a higher degree of monocentricity than Seoul, suggesting that Seoul is likely to be more polycentric than London.
Submitted 7 December, 2022;
originally announced December 2022.
-
Non-segmental Bayesian Detection of Multiple Change-points
Authors:
Chong Zhong,
Zhihua Ma,
Xu Zhang,
Catherine C. Liu
Abstract:
We propose an original and general NOn-SEgmental (NOSE) approach for the detection of multiple change-points. NOSE identifies change-points by the non-negligibility of posterior estimates of the jump heights. Alternatively, under the Bayesian paradigm, NOSE treats the step-wise signal as a global infinite-dimensional parameter drawn from a proposed process of atomic representation, where the random jump heights determine the locations and the number of change-points simultaneously. The random jump heights are further modeled by a Gamma-Indian buffet process shrinkage prior under the form of discrete spike-and-slab. The induced maximum a posteriori estimates of the jump heights are consistent and enjoy a zero-diminishing false negative rate in discrimination under a 3-sigma rule. The success of NOSE is guaranteed by the posterior inferential results such as the minimaxity of the posterior contraction rate, and posterior consistency of both the locations and the number of abrupt changes. NOSE is applicable and effective for detecting scale shifts, mean shifts, and structural changes in regression coefficients under linear or autoregression models. Comprehensive simulations and several real-world examples demonstrate the superiority of NOSE in detecting abrupt changes under various data settings.
Submitted 16 June, 2023; v1 submitted 29 September, 2022;
originally announced September 2022.
-
Bayesian prediction via nonparametric transformation models
Authors:
Chong Zhong,
** Yang,
Junshan Shen,
Catherine Liu,
Zhaohai Li
Abstract:
This article tackles the old problem of prediction via a nonparametric transformation model (NTM) in a new Bayesian way. Estimation of NTMs is known to be challenging due to model unidentifiability, though appealing because of its robust prediction capability in survival analysis. Inspired by the uniqueness of the posterior predictive distribution, we achieve efficient prediction via the aforementioned NTM under the Bayesian paradigm. Our strategy is to assign weakly informative priors to nonparametric components rather than identifying the model by adding complicated constraints as in the existing literature. The Bayesian success is owed to i) a subtle cast of NTMs by an exponential transformation for the purpose of compressing spaces of infinite-dimensional parameters to positive quadrants considering non-negativity of the failure time; ii) a newly constructed weakly informative quantile-knots I-splines prior for the recast transformation function together with the Dirichlet process mixture model assigned to the error distribution. In addition, we provide a convenient and precise estimator for the identified parameter component subject to the general unit-norm restriction through posterior modification, enabling effective relative risks. Simulations and applications on real datasets reveal that our method is robust and outperforms the competing methods. An R package BuLTM is available to predict survival curves, estimate relative risks, and facilitate posterior checking.
Submitted 7 February, 2023; v1 submitted 28 May, 2022;
originally announced May 2022.
-
Quantum Kerr Learning
Authors:
Junyu Liu,
Changchun Zhong,
Matthew Otten,
Anirban Chandra,
Cristian L. Cortes,
Chaoyang Ti,
Stephen K Gray,
Xu Han
Abstract:
Quantum machine learning is a rapidly evolving field of research that could facilitate important applications for quantum computing and also significantly impact data-driven sciences. In our work, based on various arguments from complexity theory and physics, we demonstrate that a single Kerr mode can provide some "quantum enhancements" when dealing with kernel-based methods. Using kernel properties, neural tangent kernel theory, first-order perturbation theory of the Kerr non-linearity, and non-perturbative numerical simulations, we show that quantum enhancements could happen in terms of convergence time and generalization error. Furthermore, we make explicit indications on how higher-dimensional input data could be considered. Finally, we propose an experimental protocol, that we call \emph{quantum Kerr learning}, based on circuit QED.
Submitted 30 November, 2022; v1 submitted 20 May, 2022;
originally announced May 2022.
-
Fast Sparse Classification for Generalized Linear and Additive Models
Authors:
Jiachang Liu,
Chudi Zhong,
Margo Seltzer,
Cynthia Rudin
Abstract:
We present fast classification techniques for sparse generalized linear and additive models. These techniques can handle thousands of features and thousands of observations in minutes, even in the presence of many highly correlated features. For fast sparse logistic regression, our computational speed-up over other best-subset search techniques owes to linear and quadratic surrogate cuts for the logistic loss that allow us to efficiently screen features for elimination, as well as use of a priority queue that favors a more uniform exploration of features. As an alternative to the logistic loss, we propose the exponential loss, which permits an analytical solution to the line search at each iteration. Our algorithms are generally 2 to 5 times faster than previous approaches. They produce interpretable models that have accuracy comparable to black box models on challenging datasets.
Submitted 29 October, 2022; v1 submitted 23 February, 2022;
originally announced February 2022.
-
Mallows permutation models with $L^1$ and $L^2$ distances I: hit and run algorithms and mixing times
Authors:
Chenyang Zhong
Abstract:
Mallows permutation model, introduced by Mallows in statistical ranking theory, is a class of non-uniform probability measures on the symmetric group $S_n$. The model depends on a distance metric $d(σ,τ)$ on $S_n$, which can be chosen from a host of metrics on permutations. In this paper, we focus on Mallows permutation models with $L^1$ and $L^2$ distances, respectively known in the statistics literature as Spearman's footrule and Spearman's rank correlation.
Unlike most of the random permutation models that have been analyzed in the literature, Mallows permutation models with $L^1$ and $L^2$ distances do not have an explicit expression for their normalizing constants. This poses challenges to the task of sampling from these Mallows models. In this paper, we consider hit and run algorithms for sampling from both models. Hit and run algorithms are a unifying class of Markov chain Monte Carlo (MCMC) algorithms including the celebrated Swendsen-Wang and data augmentation algorithms. For both models, we show order $\log{n}$ mixing time upper bounds for the hit and run algorithms. This demonstrates much faster mixing of the hit and run algorithms compared to local MCMC algorithms such as the Metropolis algorithm. The proof of the results on mixing times is based on the path coupling technique, for which a novel coupling for permutations with one-sided restrictions is involved.
Extensions of the hit and run algorithms to weighted versions of the above models, a two-parameter permutation model that involves the $L^1$ distance and Cayley distance, and lattice permutation models in dimensions greater than or equal to $2$ are also discussed. The order $\log{n}$ mixing time upper bound pertains to the two-parameter permutation model.
Submitted 26 December, 2021;
originally announced December 2021.
-
Scalable and Decentralized Algorithms for Anomaly Detection via Learning-Based Controlled Sensing
Authors:
Geethu Joseph,
Chen Zhong,
M. Cenk Gursoy,
Senem Velipasalar,
Pramod K. Varshney
Abstract:
We address the problem of sequentially selecting and observing processes from a given set to find the anomalies among them. The decision-maker observes a subset of the processes at any given time instant and obtains a noisy binary indicator of whether or not the corresponding process is anomalous. In this setting, we develop an anomaly detection algorithm that chooses the processes to be observed at a given time instant, decides when to stop taking observations, and declares the decision on anomalous processes. The objective of the detection algorithm is to identify the anomalies with an accuracy exceeding the desired value while minimizing the delay in decision making. We devise a centralized algorithm where the processes are jointly selected by a common agent as well as a decentralized algorithm where the decision of whether to select a process is made independently for each process. Our algorithms rely on a Markov decision process defined using the marginal probability of each process being normal or anomalous, conditioned on the observations. We implement the detection algorithms using the deep actor-critic reinforcement learning framework. Unlike prior work on this topic that has exponential complexity in the number of processes, our algorithms have computational and memory requirements that are both polynomial in the number of processes. We demonstrate the efficacy of these algorithms using numerical experiments by comparing them with state-of-the-art methods.
Submitted 8 December, 2021;
originally announced December 2021.
-
Dependent Dirichlet Processes for Analysis of a Generalized Shared Frailty Model
Authors:
Chong Zhong,
Zhihua Ma,
Junshan Shen,
Catherine Liu
Abstract:
Bayesian paradigm takes advantage of well fitting complicated survival models and feasible computing in survival analysis owing to the superiority in tackling the complex censoring scheme, compared with the frequentist paradigm. In this chapter, we aim to display the latest tendency in Bayesian computing, in the sense of automating the posterior sampling, through Bayesian analysis of survival modeling for multivariate survival outcomes with complicated data structure. Motivated by relaxing the strong assumption of proportionality and the restriction of a common baseline population, we propose a generalized shared frailty model which includes both parametric and nonparametric frailty random effects so as to incorporate both treatment-wise and temporal variation for multiple events. We develop a survival-function version of ANOVA dependent Dirichlet process to model the dependency among the baseline survival functions. The posterior sampling is implemented by the No-U-Turn sampler in Stan, a contemporary Bayesian computing tool, automatically. The proposed model is validated by analysis of the bladder cancer recurrences data. The estimation is consistent with existing results. Our model and Bayesian inference provide evidence that the Bayesian paradigm fosters complex modeling and feasible computing in survival analysis and Stan relaxes the posterior inference.
Submitted 9 September, 2021; v1 submitted 8 September, 2021;
originally announced September 2021.
-
Convergence rate of a collapsed Gibbs sampler for crossed random effects models
Authors:
Swarnadip Ghosh,
Chenyang Zhong
Abstract:
In this paper, we analyze the convergence rate of a collapsed Gibbs sampler for crossed random effects models. Our results apply to a substantially larger range of models than previous works, including models that incorporate missingness mechanism and unbalanced level data. The theoretical tools involved in our analysis include a connection between relaxation time and autoregression matrix, concentration inequalities, and random matrix theory.
Submitted 21 October, 2021; v1 submitted 7 September, 2021;
originally announced September 2021.
-
A Bayesian group sequential schema for ordinal endpoints
Authors:
Chengxue Zhong,
Haitao Pan,
Hongyu Miao
Abstract:
The ordinal endpoint is prevalent in clinical studies. For example, for COVID-19, the most common endpoint used was a 7-point ordinal scale. Another example is phase II cancer studies, where efficacy is often assessed as an ordinal variable based on the level of response of solid tumors with four categories: complete response, partial response, stable disease, and progression, though a dichotomized approach is often used in practice. However, designs for the ordinal endpoint are lacking, with Whitehead et al. (1993, 2017) and Jaki et al. (2003) among the few exceptions. In this paper, we propose a generic group sequential schema based on Bayesian methods for ordinal endpoints, including three methods: the proportional-odds-model (PO)-based, non-proportional-odds-model (NPO)-based, and PO/NPO switch-model-based designs, which make our proposed approach generic enough to deal with various scenarios. We conducted extensive simulations to demonstrate the desirable performance of the proposed method, and an R package BayesOrdDesign has also been developed.
Submitted 9 June, 2022; v1 submitted 14 August, 2021;
originally announced August 2021.
-
Survival stacking: casting survival analysis as a classification problem
Authors:
Erin Craig,
Chenyang Zhong,
Robert Tibshirani
Abstract:
While there are many well-developed data science methods for classification and regression, there are relatively few methods for working with right-censored data. Here, we present "survival stacking": a method for casting survival analysis problems as classification problems, thereby allowing the use of general classification methods and software in a survival setting. Inspired by the Cox partial likelihood, survival stacking collects features and outcomes of survival data in a large data frame with a binary outcome. We show that survival stacking with logistic regression is approximately equivalent to the Cox proportional hazards model. We further recommend methods for evaluating model performance in the survival stacked setting, and we illustrate survival stacking on real and simulated data. By reframing survival problems as classification problems, we make it possible for data scientists to use well-known learning algorithms (including random forests, gradient boosting machines and neural networks) in a survival setting, and lower the barrier for flexible survival modeling.
Submitted 28 July, 2021;
originally announced July 2021.
-
Interpretable Machine Learning: Fundamental Principles and 10 Grand Challenges
Authors:
Cynthia Rudin,
Chaofan Chen,
Zhi Chen,
Haiyang Huang,
Lesia Semenova,
Chudi Zhong
Abstract:
Interpretability in machine learning (ML) is crucial for high stakes decisions and troubleshooting. In this work, we provide fundamental principles for interpretable ML, and dispel common misunderstandings that dilute the importance of this crucial topic. We also identify 10 technical challenge areas in interpretable machine learning and provide history and background on each problem. Some of these problems are classically important, and some are recent problems that have arisen in the last few years. These problems are: (1) Optimizing sparse logical models such as decision trees; (2) Optimization of scoring systems; (3) Placing constraints into generalized additive models to encourage sparsity and better interpretability; (4) Modern case-based reasoning, including neural networks and matching for causal inference; (5) Complete supervised disentanglement of neural networks; (6) Complete or even partial unsupervised disentanglement of neural networks; (7) Dimensionality reduction for data visualization; (8) Machine learning models that can incorporate physics and other generative or causal constraints; (9) Characterization of the "Rashomon set" of good models; and (10) Interpretable reinforcement learning. This survey is suitable as a starting point for statisticians and computer scientists interested in working in interpretable machine learning.
Submitted 9 July, 2021; v1 submitted 20 March, 2021;
originally announced March 2021.
-
Anomaly Detection and Sampling Cost Control via Hierarchical GANs
Authors:
Chen Zhong,
M. Cenk Gursoy,
Senem Velipasalar
Abstract:
Anomaly detection incurs certain sampling and sensing costs and therefore it is of great importance to strike a balance between the detection accuracy and these costs. In this work, we study anomaly detection by considering the detection of threshold crossings in a stochastic time series without the knowledge of its statistics. To reduce the sampling cost in this detection process, we propose the use of hierarchical generative adversarial networks (GANs) to perform nonuniform sampling. In order to improve the detection accuracy and reduce the delay in detection, we introduce a buffer zone in the operation of the proposed GAN-based detector. In the experiments, we analyze the performance of the proposed hierarchical GAN detector considering the metrics of detection delay, miss rates, average cost of error, and sampling ratio. We identify the tradeoffs in the performance as the buffer zone sizes and the number of GAN levels in the hierarchy vary. We also compare the performance with that of a sampling policy that approximately minimizes the sum of average costs of sampling and error given the parameters of the stochastic process. We demonstrate that the proposed GAN-based detector can have significant performance improvements in terms of detection delay and average cost of error with a larger buffer zone but at the cost of increased sampling rates.
Submitted 28 September, 2020;
originally announced September 2020.
-
Generalized and Scalable Optimal Sparse Decision Trees
Authors:
Jimmy Lin,
Chudi Zhong,
Diane Hu,
Cynthia Rudin,
Margo Seltzer
Abstract:
Decision tree optimization is notoriously difficult from a computational perspective but essential for the field of interpretable machine learning. Despite efforts over the past 40 years, only recently have optimization breakthroughs been made that have allowed practical algorithms to find optimal decision trees. These new techniques have the potential to trigger a paradigm shift where it is possible to construct sparse decision trees to efficiently optimize a variety of objective functions without relying on greedy splitting and pruning heuristics that often lead to suboptimal solutions. The contribution in this work is to provide a general framework for decision tree optimization that addresses the two significant open problems in the area: treatment of imbalanced data and fully optimizing over continuous variables. We present techniques that produce optimal decision trees over a variety of objectives including F-score, AUC, and partial area under the ROC convex hull. We also introduce a scalable algorithm that produces provably optimal results in the presence of continuous variables and speeds up decision tree construction by several orders of magnitude relative to the state of the art.
Submitted 22 November, 2022; v1 submitted 15 June, 2020;
originally announced June 2020.
-
Survival analysis as a classification problem
Authors:
Chenyang Zhong,
Robert Tibshirani
Abstract:
In this paper, we explore a method for treating survival analysis as a classification problem. The method uses a "stacking" idea that collects the features and outcomes of the survival data in a large data frame, and then treats it as a classification problem. In this framework, various statistical learning algorithms (including logistic regression, random forests, gradient boosting machines and neural networks) can be applied to estimate the parameters and make predictions. For stacking with logistic regression, we show that this approach is approximately equivalent to the Cox proportional hazards model with both theoretical analysis and simulation studies. For stacking with other machine learning algorithms, we show through simulation studies that our method can outperform the Cox proportional hazards model in terms of estimated survival curves. This idea is not new, but we believe that it should be better known by statisticians and other data scientists.
Submitted 26 September, 2019; v1 submitted 24 September, 2019;
originally announced September 2019.
-
A Deep Actor-Critic Reinforcement Learning Framework for Dynamic Multichannel Access
Authors:
Chen Zhong,
Ziyang Lu,
M. Cenk Gursoy,
Senem Velipasalar
Abstract:
To make efficient use of limited spectral resources, in this work we propose a deep actor-critic reinforcement learning based framework for dynamic multichannel access. We consider both a single-user case and a scenario in which multiple users attempt to access channels simultaneously. We employ the proposed framework as a single agent in the single-user case, and extend it to a decentralized multi-agent framework in the multi-user scenario. In both cases, we develop algorithms for the actor-critic deep reinforcement learning and evaluate the proposed learning policies via experiments and numerical results. In the single-user model, in order to evaluate the performance of the proposed channel access policy and the framework's tolerance against uncertainty, we explore different channel switching patterns and different switching probabilities. In the case of multiple users, we analyze the probabilities of each user accessing channels with favorable channel conditions and the probability of collision. We also address a time-varying environment to identify the adaptive ability of the proposed framework. Additionally, we provide comparisons (in terms of both the average reward and time efficiency) between the proposed actor-critic deep reinforcement learning framework, a deep Q-network (DQN) based approach, random access, and the optimal policy when the channel dynamics are known.
Submitted 20 August, 2019;
originally announced August 2019.