-
Contextual Dynamic Pricing: Algorithms, Optimality, and Local Differential Privacy Constraints
Authors:
Zifeng Zhao,
Feiyu Jiang,
Yi Yu
Abstract:
We study the contextual dynamic pricing problem where a firm sells products to $T$ sequentially arriving consumers that behave according to an unknown demand model. The firm aims to maximize its revenue, i.e. minimize its regret over a clairvoyant that knows the model in advance. The demand model is a generalized linear model (GLM), allowing for a stochastic feature vector in $\mathbb R^d$ that encodes product and consumer information. We first show that the optimal regret upper bound is of order $\sqrt{dT}$, up to a logarithmic factor, improving upon existing upper bounds in the literature by a $\sqrt{d}$ factor. This sharper rate is achieved by two algorithms: a confidence bound-type (supCB) algorithm and an explore-then-commit (ETC) algorithm. A key insight of our theoretical result is an intrinsic connection between dynamic pricing and the contextual multi-armed bandit problem with many arms based on a careful discretization. We further study contextual dynamic pricing under local differential privacy (LDP) constraints. In particular, we propose a stochastic gradient descent based ETC algorithm that achieves an optimal regret upper bound of order $d\sqrt{T}/\varepsilon$, up to a logarithmic factor, where $\varepsilon>0$ is the privacy parameter. The regret upper bounds with and without LDP constraints are accompanied by newly constructed minimax lower bounds, which further characterize the cost of privacy. Extensive numerical experiments and a real data application on online lending are conducted to illustrate the efficiency and practical value of the proposed algorithms in dynamic pricing.
Submitted 4 June, 2024;
originally announced June 2024.
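To make the explore-then-commit (ETC) idea concrete, here is a minimal Python sketch under assumptions that are not taken from the paper: demand follows a hypothetical logistic GLM, P(buy | x, p) = sigmoid(x·β - γp), prices are restricted to a fixed grid, and contexts are Gaussian. The names `greedy_price`, `fit_logistic` and `etc_pricing` are illustrative only; this is not the authors' ETC or supCB algorithm, and it omits the LDP variant.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def greedy_price(x, beta, gamma, grid):
    """Revenue-maximizing grid price under the hypothetical logistic demand
    P(buy | x, p) = sigmoid(x @ beta - gamma * p)."""
    revenue = grid * sigmoid(x @ beta - gamma * grid)
    return grid[np.argmax(revenue)]

def fit_logistic(Z, y, n_iter=25, ridge=1e-6):
    """Logistic-regression MLE by Newton's method; a tiny ridge keeps the Hessian invertible."""
    w = np.zeros(Z.shape[1])
    for _ in range(n_iter):
        mu = sigmoid(Z @ w)
        grad = Z.T @ (y - mu) - ridge * w
        hess = (Z * (mu * (1.0 - mu))[:, None]).T @ Z + ridge * np.eye(Z.shape[1])
        w = w + np.linalg.solve(hess, grad)
    return w

def etc_pricing(T, n_explore, beta, gamma, grid, rng):
    """Explore-then-commit: uniformly random prices for n_explore rounds, then
    greedy prices from the fitted demand model. Returns the total revenue."""
    d = beta.size
    Z, y = [], []
    w_hat, revenue = None, 0.0
    for t in range(T):
        x = rng.normal(size=d) / np.sqrt(d)          # stochastic context vector
        if t < n_explore:
            p = rng.choice(grid)                      # explore: random price
        else:
            if w_hat is None:                         # fit once, then commit
                w_hat = fit_logistic(np.array(Z), np.array(y))
            p = greedy_price(x, w_hat[:d], -w_hat[d], grid)
        buy = rng.random() < sigmoid(x @ beta - gamma * p)
        revenue += p * buy
        Z.append(np.append(x, p))                     # regressors [x, p] -> coefficients [beta, -gamma]
        y.append(float(buy))
    return revenue
```

Fitting only once, at the end of the exploration phase, is what makes this ETC rather than a continuously updated greedy policy; the exploration length `n_explore` is the knob that trades learning against earning.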
-
Conditioning diffusion models by explicit forward-backward bridging
Authors:
Adrien Corenflos,
Zheng Zhao,
Simo Särkkä,
Jens Sjölund,
Thomas B. Schön
Abstract:
Given an unconditional diffusion model $\pi(x, y)$, using it to perform conditional simulation $\pi(x \mid y)$ is still largely an open question and is typically achieved by learning conditional drifts to the denoising SDE after the fact. In this work, we express conditional simulation as an inference problem on an augmented space corresponding to a partial SDE bridge. This perspective allows us to implement efficient and principled particle Gibbs and pseudo-marginal samplers marginally targeting the conditional distribution $\pi(x \mid y)$. Contrary to existing methodology, our methods do not introduce any additional approximation to the unconditional diffusion model aside from the Monte Carlo error. We showcase the benefits and drawbacks of our approach on a series of synthetic and real data examples.
Submitted 22 May, 2024;
originally announced May 2024.
-
Predicting Future Change-points in Time Series
Authors:
Chak Fung Choi,
Chunxue Li,
Chun Yip Yau,
Zifeng Zhao
Abstract:
Change-point detection and estimation procedures have been widely developed in the literature. However, commonly used approaches in change-point analysis have mainly been focusing on detecting change-points within an entire time series (off-line methods), or quickest detection of change-points in sequentially observed data (on-line methods). Both classes of methods are concerned with change-points that have already occurred. The arguably more important question of when future change-points may occur remains largely unexplored. In this paper, we develop a novel statistical model that describes the mechanism of change-point occurrence. Specifically, the model assumes a latent process in the form of a random walk driven by non-negative innovations, and an observed process which behaves differently when the latent process belongs to different regimes. By construction, an occurrence of a change-point is equivalent to hitting a regime threshold by the latent process. Therefore, by predicting when the latent process will hit the next regime threshold, future change-points can be forecasted. The probabilistic properties of the model such as stationarity and ergodicity are established. A composite likelihood-based approach is developed for parameter estimation and model selection. Moreover, we construct the predictor and prediction interval for future change-points based on the estimated model.
Submitted 23 May, 2024; v1 submitted 15 May, 2024;
originally announced May 2024.
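The change-point mechanism described above can be simulated directly. The sketch below assumes regime thresholds at multiples of a constant `c` and a user-supplied non-negative innovation sampler (exponential in the usage example); both are hypothetical choices, and the next change-point is forecast by plain Monte Carlo rather than with the authors' composite-likelihood estimates. Function names are illustrative.

```python
import numpy as np

def simulate_change_points(T, c, draw_innovation, rng):
    """Latent random walk with non-negative innovations; a change-point occurs whenever
    the latent process crosses one of the hypothetical thresholds c, 2c, 3c, ..."""
    latent = np.cumsum(draw_innovation(rng, T))
    regime = np.floor(latent / c).astype(int)
    change_points = np.nonzero(np.diff(regime) != 0)[0] + 1   # first index of each new regime
    return latent, regime, change_points

def predict_next_change(current_latent, c, draw_innovation, rng, n_sims=20000, horizon=200):
    """Monte Carlo distribution of the number of steps until the next threshold is hit."""
    gap = (np.floor(current_latent / c) + 1.0) * c - current_latent
    paths = np.cumsum(draw_innovation(rng, (n_sims, horizon)), axis=1)
    hit = paths >= gap
    if not hit[:, -1].all():              # paths are non-decreasing, so checking the last step suffices
        raise ValueError("horizon too short: some paths never reach the next threshold")
    return hit.argmax(axis=1) + 1         # steps ahead until the first crossing
```

A point forecast and a prediction interval then follow from, for example, `np.median(waits)` and `np.percentile(waits, [5, 95])`. With exponential innovations of rate λ, the waiting time is 1 plus a Poisson(λ·gap) variable, which gives a convenient sanity check on the simulation.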
-
SNSeg: An R Package for Time Series Segmentation via Self-Normalization
Authors:
Shubo Sun,
Zifeng Zhao,
Feiyu Jiang,
Xiaofeng Shao
Abstract:
Time series segmentation aims to identify potential change-points in a sequence of temporally dependent data, so that the original sequence can be partitioned into several homogeneous subsequences. It is useful for modeling and predicting non-stationary time series and is widely applied in natural and social sciences. Existing segmentation methods primarily focus on only one type of parameter changes such as mean and variance, and they typically depend on laborious tuning or smoothing parameters, which can be challenging to choose in practice. The self-normalization based change-point estimation framework SNCP by Zhao et al. (2022), however, offers users more flexibility and convenience as it allows for change-point estimation of different types of parameters (e.g. mean, variance, quantile and autocovariance) in a unified fashion, and requires minimal tuning. In this paper, the R package SNSeg is introduced to implement SNCP for segmentation of univariate and multivariate time series. An extension of SNCP, named SNHD, is also designed and implemented for change-point estimation in the mean vector of high-dimensional time series. The estimated change-points as well as segmented time series are available with graphical tools. Detailed examples of SNSeg are given in simulations of multivariate autoregressive processes with change-points.
Submitted 10 April, 2024;
originally announced April 2024.
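For intuition about self-normalization, the following Python sketch computes a global self-normalized statistic for a single change in mean, in the style of the Shao and Zhang (2010) self-normalized test. It is not the SNSeg R API and omits SNCP's local windows, other parameter types and multiple change-point search; the function name `sn_mean_stat` is hypothetical.

```python
import numpy as np

def sn_mean_stat(x):
    """Self-normalized statistic for a single change in mean.
    Returns (max statistic, estimated change location k_hat), where the
    first segment is x[:k_hat]."""
    x = np.asarray(x, dtype=float)
    n = x.size
    S = np.concatenate(([0.0], np.cumsum(x)))       # S[j] = x_1 + ... + x_j
    stats = np.full(n - 1, -np.inf)
    for k in range(2, n - 1):                        # at least two points per segment
        left_sum, right_sum = S[k], S[n] - S[k]
        # scaled CUSUM contrast: n^{-1/2} * (S_{1,k} - (k/n) S_{1,n})
        d = (k * (n - k) / n ** 1.5) * (left_sum / k - right_sum / (n - k))
        t = np.arange(1, k + 1)
        left = S[t] - (t / k) * left_sum             # recursive sums inside the left segment
        u = np.arange(k + 1, n + 1)
        right = (S[n] - S[u - 1]) - ((n - u + 1) / (n - k)) * right_sum
        v = (np.sum(left ** 2) + np.sum(right ** 2)) / n ** 2   # self-normalizer
        stats[k - 1] = d ** 2 / v
    k_hat = int(np.argmax(stats)) + 1
    return stats[k_hat - 1], k_hat
```

The normalizer is built from recursive partial sums within each candidate segment, so no long-run variance estimate or bandwidth has to be chosen; this is the property that keeps SN-based procedures nearly tuning-free.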
-
Comparison of the LASSO and Integrative LASSO with Penalty Factors (IPF-LASSO) methods for multi-omics data: Variable selection with Type I error control
Authors:
Charlotte Castel,
Zhi Zhao,
Magne Thoresen
Abstract:
Variable selection in relation to regression modeling has constituted a methodological problem for more than 60 years. Especially in the context of high-dimensional regression, developing stable and reliable methods, algorithms, and computational tools for variable selection has become an important research topic. Omics data is one source of such high-dimensional data, characterized by diverse genomic layers, and an additional analytical challenge is how to integrate these layers into various types of analyses. While the IPF-LASSO model has previously explored the integration of multiple omics modalities for feature selection and prediction by introducing distinct penalty parameters for each modality, the challenge of incorporating heterogeneous data layers into variable selection with Type I error control remains an open problem. To address this problem, we applied stability selection as a method for variable selection with false positives control in both IPF-LASSO and regular LASSO. The objective of this study was to compare the LASSO algorithm with IPF-LASSO, investigating whether introducing different penalty parameters per omics modality could improve statistical power while controlling false positives. Two high-dimensional data structures were investigated, one with independent data and the other with correlated data. The different models were also illustrated using data from a study on breast cancer treatment, where the IPF-LASSO model was able to select some highly relevant clinical variables.
Submitted 3 April, 2024;
originally announced April 2024.
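A minimal sketch of stability selection with penalty factors, assuming a Gaussian response and a plain NumPy coordinate-descent LASSO (not the R packages the study may have used). Giving each omics block its own entries in `penalty_factors` mimics the IPF-LASSO idea; helper names, the half-sample size and the default threshold are hypothetical. Under the assumptions of Meinshausen and Bühlmann (2010), the expected number of false selections is bounded by q²/((2π_thr - 1)p), where q is the average number of variables selected per subsample and π_thr is the frequency threshold.

```python
import numpy as np

def soft_threshold(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_cd(X, y, lam, penalty_factors=None, n_iter=200):
    """Coordinate descent for (1/2n)||y - Xb||^2 + lam * sum_j pf_j * |b_j|."""
    n, p = X.shape
    pf = np.ones(p) if penalty_factors is None else np.asarray(penalty_factors, dtype=float)
    b = np.zeros(p)
    r = np.array(y, dtype=float)               # residual y - X b (b starts at zero)
    col_sq = (X ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(p):
            rho = X[:, j] @ r / n + col_sq[j] * b[j]    # partial-residual correlation
            b_new = soft_threshold(rho, lam * pf[j]) / col_sq[j]
            r -= X[:, j] * (b_new - b[j])
            b[j] = b_new
    return b

def stability_selection(X, y, lam, penalty_factors=None, n_subsamples=50, threshold=0.6, rng=None):
    """Selection frequency of each variable over random half-subsamples,
    plus the indices whose frequency reaches the threshold."""
    rng = np.random.default_rng() if rng is None else rng
    n, p = X.shape
    counts = np.zeros(p)
    for _ in range(n_subsamples):
        idx = rng.choice(n, size=n // 2, replace=False)
        counts += lasso_cd(X[idx], y[idx], lam, penalty_factors) != 0
    freq = counts / n_subsamples
    return freq, np.nonzero(freq >= threshold)[0]
```

Raising the penalty factor of a whole block makes its variables harder to select, which is how block-specific penalties can trade power in one modality against false positives in another.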
-
Integrating Active Learning in Causal Inference with Interference: A Novel Approach in Online Experiments
Authors:
Hongtao Zhu,
Sizhe Zhang,
Yang Su,
Zhenyu Zhao,
Nan Chen
Abstract:
In the domain of causal inference research, the prevalent potential outcomes framework, notably the Rubin Causal Model (RCM), often overlooks individual interference and assumes independent treatment effects. This assumption, however, is frequently misaligned with the intricate realities of real-world scenarios, where interference is not merely a possibility but a common occurrence. Our research endeavors to address this discrepancy by focusing on the estimation of direct and spillover treatment effects under two assumptions: (1) network-based interference, where treatments on neighbors within connected networks affect one's outcomes, and (2) non-random treatment assignments influenced by confounders. To improve the efficiency of estimating potentially complex effect functions, we introduce a novel active learning approach: Active Learning in Causal Inference with Interference (ACI). This approach uses Gaussian processes to flexibly model the direct and spillover treatment effects as a function of a continuous measure of neighbors' treatment assignment. The ACI framework sequentially identifies the experimental settings that demand further data. It further optimizes the treatment assignments under the network interference structure using genetic algorithms to achieve efficient learning outcomes. By applying our method to simulation data and a Tencent game dataset, we demonstrate its feasibility in achieving accurate effect estimates with reduced data requirements. This ACI approach marks a significant advancement in the realm of data efficiency for causal inference, offering a robust and efficient alternative to traditional methodologies, particularly in scenarios characterized by complex interference patterns.
Submitted 19 February, 2024;
originally announced February 2024.
-
Treatment Effect Estimation Amidst Dynamic Network Interference in Online Gaming Experiments
Authors:
Yu Zhu,
Zehang Richard Li,
Yang Su,
Zhenyu Zhao
Abstract:
The evolving landscape of online multiplayer gaming presents unique challenges in assessing the causal impacts of game features. Traditional A/B testing methodologies fall short due to complex player interactions, leading to violations of fundamental assumptions like the Stable Unit Treatment Value Assumption (SUTVA). Unlike traditional social networks with stable and long-term connections, networks in online games are often dynamic and short-lived. Players are temporarily teamed up for the duration of a game, forming transient networks that dissolve once the game ends. This fleeting nature of interactions presents a new challenge compared with running experiments in a stable social network. This study introduces a novel framework for treatment effect estimation in online gaming environments, considering the dynamic and ephemeral network interference that occurs among players. We propose an innovative estimator tailored for scenarios where a completely randomized experimental design is implemented without explicit knowledge of network structures. Notably, our method facilitates post-hoc interference adjustment on experimental data, significantly reducing the complexities and costs associated with intricate experimental designs and randomization strategies. The proposed framework stands out for its ability to accommodate varying levels of interference, thereby yielding more accurate and robust estimations. Through comprehensive simulations set against a variety of interference scenarios, along with empirical validation using real-world data from a mobile gaming environment, we demonstrate the efficacy of our approach. This study represents a pioneering effort in exploring causal inference in user-randomized experiments impacted by dynamic network effects.
Submitted 7 February, 2024;
originally announced February 2024.
-
Optimal investment, consumption and life insurance decisions for households with consumption habits under the health shock risk
Authors:
Zhen Zhao,
Wei Liu,
Xiaoyi Tang
Abstract:
This paper investigates the optimal investment, consumption, and life insurance strategies for households under the impact of health shock risk. Considering the uncertainty of the future health status of family members, a non-homogeneous Markov process is used to model the health status of the breadwinner. Drawing upon the theory of habit formation, we investigate the influence of different consumption habits on households' investment, consumption, and life insurance strategies. Based on whether the breadwinner is alive or not, we formulate and solve the corresponding Hamilton-Jacobi-Bellman (HJB) equations for the two scenarios of breadwinner survival and breadwinner's demise, respectively, and obtain explicit expressions for the optimal investment, consumption, and life insurance strategies. Through sensitivity analysis, it has been shown that the presence of health shocks within households has a negative impact on investment and consumption decisions, while the formation of consumption habits increases household propensity for precautionary savings.
Submitted 1 February, 2024;
originally announced February 2024.
-
A Unified Three-State Model Framework for Analysis of Treatment Crossover in Survival Trials
Authors:
Zile Zhao,
Ye Li,
Xiaodong Luo,
Ray Bai
Abstract:
We present a unified three-state model (TSM) framework for evaluating treatment effects in clinical trials in the presence of treatment crossover. Researchers have proposed diverse methodologies to estimate the treatment effect that would have hypothetically been observed if treatment crossover had not occurred. However, there is little work on understanding the connections between these different approaches from a statistical point of view. Our proposed TSM framework unifies existing methods, effectively identifying potential biases, model assumptions, and inherent limitations for each method. This can guide researchers in understanding when these methods are appropriate and choosing a suitable approach for their data. The TSM framework also facilitates the creation of new methods to adjust for confounding effects from treatment crossover. To illustrate this capability, we introduce a new imputation method that falls under its scope. Using a piecewise constant prior for the hazard, our proposed method directly estimates the hazard function with increased flexibility. Through simulation experiments, we demonstrate the performance of different approaches for estimating the treatment effects.
Submitted 30 January, 2024;
originally announced January 2024.
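A piecewise constant hazard with a conjugate prior can be illustrated with the standard setup: split time at cut points, give each interval an independent Gamma(a, b) prior on its hazard, and update with the interval's event count d_j and person-time at risk E_j, giving a Gamma(a + d_j, b + E_j) posterior. This is a generic sketch with hypothetical names and prior values, not the authors' TSM-based imputation method.

```python
import numpy as np

def exposure_and_events(times, events, cuts):
    """Per-interval person-time at risk E_j and event counts d_j.
    cuts: increasing boundaries starting at 0, e.g. [0, 1, 2, np.inf]."""
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=bool)
    n_int = len(cuts) - 1
    exposure, d = np.zeros(n_int), np.zeros(n_int)
    for j in range(n_int):
        lo, hi = cuts[j], cuts[j + 1]
        exposure[j] = np.sum(np.clip(times, lo, hi) - lo)        # time spent in (lo, hi]
        d[j] = np.sum(events & (times > lo) & (times <= hi))      # events observed in (lo, hi]
    return exposure, d

def posterior_hazard(times, events, cuts, a=0.01, b=0.01):
    """Independent Gamma(a, b) priors -> Gamma(a + d_j, b + E_j) posteriors.
    Returns posterior means and the (shape, rate) parameters."""
    E, d = exposure_and_events(times, events, cuts)
    shape, rate = a + d, b + E
    return shape / rate, shape, rate
```

Because each interval has its own hazard, the fitted curve can bend wherever the data demand, which is the sense in which a piecewise constant hazard adds flexibility over a single parametric form.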
-
False Discovery Rate Control For Structured Multiple Testing: Asymmetric Rules And Conformal Q-values
Authors:
Zinan Zhao,
Wenguang Sun
Abstract:
The effective utilization of structural information in data while ensuring statistical validity poses a significant challenge in false discovery rate (FDR) analyses. Conformal inference provides rigorous theory for grounding complex machine learning methods without relying on strong assumptions or highly idealized models. However, existing conformal methods have limitations in handling structured multiple testing. This is because their validity requires the deployment of symmetric rules, which assume the exchangeability of data points and permutation-invariance of fitting algorithms. To overcome these limitations, we introduce the pseudo local index of significance (PLIS) procedure, which is capable of accommodating asymmetric rules and requires only pairwise exchangeability between the null conformity scores. We demonstrate that PLIS offers finite-sample guarantees in FDR control and the ability to assign higher weights to relevant data points. Numerical results confirm the effectiveness and robustness of PLIS and show improvements in power compared to existing model-free methods in various scenarios.
Submitted 16 June, 2024; v1 submitted 26 November, 2023;
originally announced November 2023.
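For contrast with PLIS, the sketch below implements a standard symmetric conformal baseline: split-conformal p-values computed from calibration scores of known nulls, followed by the Benjamini-Hochberg step-up procedure. It assumes that larger scores indicate non-nulls and that null calibration and test scores are exchangeable; it does not implement PLIS or asymmetric rules, and the function names are illustrative.

```python
import numpy as np

def conformal_pvalues(calib_scores, test_scores):
    """p_j = (1 + #{calibration scores >= test score j}) / (n + 1)."""
    calib = np.sort(np.asarray(calib_scores, dtype=float))
    test = np.asarray(test_scores, dtype=float)
    n = calib.size
    n_geq = n - np.searchsorted(calib, test, side="left")   # count of calibration scores >= each test score
    return (1.0 + n_geq) / (n + 1.0)

def benjamini_hochberg(pvals, alpha):
    """Indices rejected by the Benjamini-Hochberg step-up procedure at level alpha."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    below = p[order] <= alpha * np.arange(1, m + 1) / m
    if not below.any():
        return np.array([], dtype=int)
    k = np.max(np.nonzero(below)[0])        # largest i with p_(i) <= alpha * i / m
    return np.sort(order[: k + 1])
```

Every test point is scored by the same fitted model and compared with the same calibration set; this symmetry is what the validity argument relies on, and it is the restriction that asymmetric rules relax.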
-
On Feynman--Kac training of partial Bayesian neural networks
Authors:
Zheng Zhao,
Sebastian Mair,
Thomas B. Schön,
Jens Sjölund
Abstract:
Recently, partial Bayesian neural networks (pBNNs), which only consider a subset of the parameters to be stochastic, were shown to perform competitively with full Bayesian neural networks. However, pBNNs are often multi-modal in the latent variable space and thus challenging to approximate with parametric models. To address this problem, we propose an efficient sampling-based training strategy, wherein the training of a pBNN is formulated as simulating a Feynman--Kac model. We then describe variations of sequential Monte Carlo samplers that allow us to simultaneously estimate the parameters and the latent posterior distribution of this model at a tractable computational cost. Using various synthetic and real-world datasets we show that our proposed training scheme outperforms the state of the art in terms of predictive performance.
Submitted 27 February, 2024; v1 submitted 30 October, 2023;
originally announced October 2023.
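Sequential Monte Carlo samplers alternate reweighting, resampling and MCMC moves along a sequence of Feynman--Kac targets. As a self-contained illustration (a 1-D toy, not a pBNN), the sketch below tempers from a N(0, 3²) prior to a bimodal potential; the toy target, the step size and all function names are assumptions. Because particles start from the prior and are reweighted rather than fitted, both modes remain represented, which a single unimodal parametric approximation would miss.

```python
import numpy as np

def normalized_weights(log_w):
    w = np.exp(log_w - np.max(log_w))
    return w / w.sum()

def ess(log_w):
    """Effective sample size of a set of log-weights."""
    w = normalized_weights(log_w)
    return 1.0 / np.sum(w ** 2)

def systematic_resample(log_w, rng):
    """Systematic resampling: one uniform draw, N evenly spaced positions."""
    n = log_w.size
    cdf = np.cumsum(normalized_weights(log_w))
    cdf[-1] = 1.0                                   # guard against round-off
    positions = (rng.random() + np.arange(n)) / n
    return np.searchsorted(cdf, positions)

def log_prior(x):
    return -0.5 * (x / 3.0) ** 2                    # N(0, 3^2), up to a constant

def log_lik(x):
    # toy bimodal potential: equal mixture of N(2, 0.5^2) and N(-2, 0.5^2), up to a constant
    return np.logaddexp(-0.5 * ((x - 2.0) / 0.5) ** 2, -0.5 * ((x + 2.0) / 0.5) ** 2)

def tempered_smc(n_particles, n_steps, rng, mh_step=0.5):
    """SMC sampler over the tempered targets prior * lik**lam, lam from 0 to 1."""
    x = rng.normal(0.0, 3.0, size=n_particles)      # exact draws from the prior
    log_w = np.zeros(n_particles)
    lams = np.linspace(0.0, 1.0, n_steps + 1)
    for lam_prev, lam in zip(lams[:-1], lams[1:]):
        log_w = log_w + (lam - lam_prev) * log_lik(x)   # reweight by the incremental potential
        if ess(log_w) < n_particles / 2:
            x = x[systematic_resample(log_w, rng)]
            log_w = np.zeros(n_particles)
        # one random-walk Metropolis move that leaves prior * lik**lam invariant
        prop = x + mh_step * rng.normal(size=n_particles)
        log_acc = (log_prior(prop) + lam * log_lik(prop)) - (log_prior(x) + lam * log_lik(x))
        x = np.where(np.log(rng.random(n_particles)) < log_acc, prop, x)
    return x, log_w
```

The returned particles and log-weights together approximate the final target; weighted averages such as `np.sum(normalized_weights(log_w) * f(x))` give posterior expectations.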
-
WASA: WAtermark-based Source Attribution for Large Language Model-Generated Data
Authors:
Jingtan Wang,
Xinyang Lu,
Zitong Zhao,
Zhongxiang Dai,
Chuan-Sheng Foo,
See-Kiong Ng,
Bryan Kian Hsiang Low
Abstract:
The impressive performances of large language models (LLMs) and their immense potential for commercialization have given rise to serious concerns over the intellectual property (IP) of their training data. In particular, the synthetic texts generated by LLMs may infringe the IP of the data being used to train the LLMs. To this end, it is imperative to be able to (a) identify the data provider who contributed to the generation of a synthetic text by an LLM (source attribution) and (b) verify whether the text data from a data provider has been used to train an LLM (data provenance). In this paper, we show that both problems can be solved by watermarking, i.e., by enabling an LLM to generate synthetic texts with embedded watermarks that contain information about their source(s). We identify the key properties of such watermarking frameworks (e.g., source attribution accuracy, robustness against adversaries), and propose a WAtermarking for Source Attribution (WASA) framework that satisfies these key properties due to our algorithmic designs. Our WASA framework enables an LLM to learn an accurate mapping from the texts of different data providers to their corresponding unique watermarks, which sets the foundation for effective source attribution (and hence data provenance). Extensive empirical evaluations show that our WASA framework achieves effective source attribution and data provenance.
Submitted 1 October, 2023;
originally announced October 2023.
-
Gaussian-Based Parametric Bijections For Automatic Projection Filters
Authors:
Muhammad F. Emzir,
Zheng Zhao,
Lahouari Cheded,
Simo Särkkä
Abstract:
The automatic projection filter is a recently developed numerical method for projection filtering that leverages sparse-grid integration and automatic differentiation. However, its accuracy is highly sensitive to the accuracy of the cumulant-generating function computed via the sparse-grid integration, which in turn is also sensitive to the choice of the bijection from the canonical hypercube to the state space. In this paper, we propose two new adaptive parametric bijections for the automatic projection filter. The first bijection relies on the minimization of Kullback--Leibler divergence, whereas the second method employs the sparse-grid Gauss--Hermite quadrature. The two new bijections allow the sparse-grid nodes to adaptively move within the high-density region of the state space, resulting in a substantially improved approximation while using only a small number of quadrature nodes. The practical applicability of the methodology is illustrated in three simulated nonlinear filtering problems.
Submitted 21 September, 2023; v1 submitted 30 March, 2023;
originally announced March 2023.
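The benefit of moving quadrature nodes into the high-density region can be seen in the simplest Gaussian case: standard Gauss--Hermite nodes z are mapped through x = μ + Lz with LLᵀ = Σ. The sketch uses a tensor-product rule built from NumPy's `hermegauss` rather than a sparse grid, and a fixed Gaussian (μ, Σ) rather than the paper's adaptive parametric bijections; it only illustrates the change of variables, and the function name is hypothetical.

```python
import itertools
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

def gauss_hermite_expectation(f, mean, cov, deg=5):
    """Approximate E[f(X)] for X ~ N(mean, cov) with a tensor-product Gauss--Hermite rule.
    Standard nodes z are moved by the affine map x = mean + L z, where L L^T = cov."""
    mean = np.atleast_1d(np.asarray(mean, dtype=float))
    d = mean.size
    L = np.linalg.cholesky(np.atleast_2d(np.asarray(cov, dtype=float)))
    z1, w1 = hermegauss(deg)                   # nodes/weights for the weight exp(-z^2 / 2)
    w1 = w1 / np.sqrt(2.0 * np.pi)             # normalize so the weights sum to 1
    total = 0.0
    for idx in itertools.product(range(deg), repeat=d):
        z = z1[list(idx)]
        total += np.prod(w1[list(idx)]) * f(mean + L @ z)
    return total
```

A `deg`-point rule integrates polynomials up to degree 2·`deg` - 1 exactly in each coordinate, so second moments are recovered to machine precision, while smooth non-polynomial integrands such as exp converge quickly.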
-
Stochastic filtering with moment representation
Authors:
Zheng Zhao,
Juha Sarmavuori
Abstract:
Stochastic filtering refers to estimating the probability distribution of the latent stochastic process conditioned on the observed measurements in time. In this paper, we introduce a new class of convergent filters that represent the filtering distributions by their moments. The key enabler is a quadrature method that uses orthonormal polynomials spanned by the moments. We prove that this moment-based filter is asymptotically exact in the order of moments, and show that the filter is also computationally efficient and is in line with the state of the art.
Submitted 24 March, 2023;
originally announced March 2023.
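One standard way to turn moments into a quadrature rule is the Golub-Welsch construction: Cholesky-factor the Hankel matrix of moments, read off the three-term recurrence coefficients of the orthonormal polynomials, and eigen-decompose the resulting Jacobi matrix. The sketch below assumes this construction; the paper's exact quadrature may differ, and the function name is hypothetical. Hankel matrices become ill-conditioned quickly, so this direct version is only suitable for a small number of nodes.

```python
import numpy as np

def quadrature_from_moments(moments, n):
    """n-point Gaussian quadrature (nodes, weights) from raw moments m_0..m_{2n}."""
    m = np.asarray(moments, dtype=float)
    if m.size < 2 * n + 1:
        raise ValueError("need moments m_0..m_{2n}")
    H = np.array([[m[i + j] for j in range(n + 1)] for i in range(n + 1)])  # Hankel matrix
    R = np.linalg.cholesky(H).T                 # upper triangular, H = R^T R
    alpha = np.empty(n)
    beta = np.empty(n - 1)
    for j in range(n):
        prev = R[j - 1, j] / R[j - 1, j - 1] if j > 0 else 0.0
        alpha[j] = R[j, j + 1] / R[j, j] - prev  # diagonal recurrence coefficients
    for j in range(n - 1):
        beta[j] = R[j + 1, j + 1] / R[j, j]      # off-diagonal recurrence coefficients
    J = np.diag(alpha) + np.diag(beta, 1) + np.diag(beta, -1)   # Jacobi matrix
    nodes, vecs = np.linalg.eigh(J)              # eigenvalues in ascending order
    weights = m[0] * vecs[0, :] ** 2
    return nodes, weights
```

With standard normal moments (1, 0, 1, 0, 3, 0, 15) and n = 3, this recovers the Gauss-Hermite nodes 0 and ±√3 with weights 2/3 and 1/6; with exponential moments (1, 1, 2, 6, 24) and n = 2 it recovers the Gauss-Laguerre nodes 2 ± √2.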
-
High-Dimensional Dynamic Pricing under Non-Stationarity: Learning and Earning with Change-Point Detection
Authors:
Zifeng Zhao,
Feiyu Jiang,
Yi Yu,
Xi Chen
Abstract:
We consider a high-dimensional dynamic pricing problem under non-stationarity, where a firm sells products to $T$ sequentially arriving consumers that behave according to an unknown demand model with potential changes at unknown times. The demand model is assumed to be a high-dimensional generalized linear model (GLM), allowing for a feature vector in $\mathbb R^d$ that encodes products and consumer information. To achieve optimal revenue (i.e., least regret), the firm needs to learn and exploit the unknown GLMs while monitoring for potential change-points. To tackle such a problem, we first design a novel penalized likelihood-based online change-point detection algorithm for high-dimensional GLMs, which is the first algorithm in the change-point literature that achieves optimal minimax localization error rate for high-dimensional GLMs. A change-point detection assisted dynamic pricing (CPDP) policy is further proposed and achieves a near-optimal regret of order $O(s\sqrt{\Upsilon_T T}\log(Td))$, where $s$ is the sparsity level and $\Upsilon_T$ is the number of change-points. This regret is accompanied by a minimax lower bound, demonstrating the optimality of CPDP (up to logarithmic factors). In particular, the optimality with respect to $\Upsilon_T$ is seen for the first time in the dynamic pricing literature, and is achieved via a novel accelerated exploration mechanism. Extensive simulation experiments and a real data application on online lending illustrate the efficiency of the proposed policy and the importance and practical value of handling non-stationarity in dynamic pricing.
Submitted 20 March, 2023; v1 submitted 13 March, 2023;
originally announced March 2023.
-
Tutorial on survival modeling with applications to omics data
Authors:
Zhi Zhao,
John Zobolas,
Manuela Zucknick,
Tero Aittokallio
Abstract:
Motivation: Identification of genomic, molecular and clinical markers prognostic of patient survival is important for developing personalized disease prevention, diagnostic and treatment approaches. Modern omics technologies have made it possible to investigate the prognostic impact of markers at multiple molecular levels, including genomics, epigenomics, transcriptomics, proteomics and metabolomics, and how these potential risk factors complement clinical characterization of patient outcomes for survival prognosis. However, the massive sizes of the omics data sets, along with their correlation structures, pose challenges for studying relationships between the molecular information and patients' survival outcomes. Results: We present a general workflow for survival analysis that is applicable to high-dimensional omics data as inputs when identifying survival-associated features and validating survival models. In particular, we focus on the commonly used Cox-type penalized regressions and hierarchical Bayesian models for feature selection in survival analysis, which are especially useful for high-dimensional data, but the framework is applicable more generally. Availability and implementation: A step-by-step R tutorial using The Cancer Genome Atlas survival and omics data for the execution and evaluation of survival models has been made available at https://ocbe-uio.github.io/survomics/survomics.html.
Submitted 4 March, 2024; v1 submitted 24 February, 2023;
originally announced February 2023.
-
Diffusion Denoising Process for Perceptron Bias in Out-of-distribution Detection
Authors:
Luping Liu,
Yi Ren,
Xize Cheng,
Rongjie Huang,
Chongxuan Li,
Zhou Zhao
Abstract:
Out-of-distribution (OOD) detection is a crucial task for ensuring the reliability and safety of deep learning. Currently, discriminator models outperform other methods in this regard. However, the feature extraction process used by discriminator models suffers from the loss of critical information, leaving room for bad cases and malicious attacks. In this paper, we introduce a new perceptron bias assumption that suggests discriminator models are more sensitive to certain features of the input, leading to the overconfidence problem. To address this issue, we propose a novel framework that combines discriminator and generation models and integrates diffusion models (DMs) into OOD detection. We demonstrate that the diffusion denoising process (DDP) of DMs serves as a novel form of asymmetric interpolation, which is well-suited to enhance the input and mitigate the overconfidence problem. The discriminator model features of OOD data exhibit sharp changes under DDP, and we utilize the norm of this change as the indicator score. Our experiments on CIFAR10, CIFAR100, and ImageNet show that our method outperforms SOTA approaches. Notably, for the challenging InD ImageNet and OOD species datasets, our method achieves an AUROC of 85.7, surpassing the previous SOTA method's score of 77.4. Our implementation is available at \url{https://github.com/luping-liu/DiffOOD}.
Submitted 3 June, 2023; v1 submitted 21 November, 2022;
originally announced November 2022.
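The indicator score described above is the norm of the change in discriminator features under the diffusion denoising process. A minimal sketch of that score follows; it is not the authors' implementation. `features` and `denoise` are hypothetical callables that stand in for a trained discriminator's feature extractor and a diffusion model's one-shot denoiser. Noising to a single timestep with cumulative schedule value `alpha_bar` is also an assumption.

```python
import numpy as np

def ddp_ood_score(x, features, denoise, alpha_bar, rng):
    """Norm of the feature change after a noise-then-denoise pass (illustrative sketch)."""
    # Forward diffusion to one timestep: x_t = sqrt(abar) * x + sqrt(1 - abar) * noise
    noise = rng.standard_normal(np.shape(x))
    x_t = np.sqrt(alpha_bar) * x + np.sqrt(1.0 - alpha_bar) * noise
    # Hypothetical denoiser: maps (x_t, alpha_bar) to an estimate of the clean input
    x_rec = denoise(x_t, alpha_bar)
    # A larger feature change is taken as evidence of out-of-distribution input
    return float(np.linalg.norm(features(x) - features(x_rec)))
```

As a toy check, take an identity feature map and a denoiser that projects onto the first coordinate axis, which plays the role of the in-distribution manifold. A point off that axis then receives a much larger score than a point on it.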
-
Learning Individual Treatment Effects under Heterogeneous Interference in Networks
Authors:
Ziyu Zhao,
Yuqi Bai,
Kun Kuang,
Ruoxuan Xiong,
Fei Wu
Abstract:
Estimating individual treatment effects from networked observational data is attracting increasing attention. One major challenge in network settings is the violation of the stable unit treatment value assumption (SUTVA), which assumes that the treatment assignment of one unit does not influence others' outcomes. In network data, due to interference, the outcome of a unit is influenced not only by its own treatment (i.e., direct effects) but also by others' treatments (i.e., spillover effects). Furthermore, the influences from other units are always heterogeneous (e.g., friends with similar interests affect a person differently than friends with different interests). In this paper, we focus on the problem of estimating individual treatment effects (both direct and spillover effects) under heterogeneous interference. To address this issue, we propose a novel Dual Weighting Regression (DWR) algorithm that simultaneously learns attention weights, which capture the heterogeneous interference, and sample weights, which eliminate the complex confounding bias in networks. We formulate the entire learning process as a bi-level optimization problem. Theoretically, we present generalization error bounds for individual treatment effect estimation. Extensive experiments on four benchmark datasets demonstrate that the proposed DWR algorithm outperforms state-of-the-art methods for estimating individual treatment effects under heterogeneous interference.
Submitted 25 January, 2024; v1 submitted 25 October, 2022;
originally announced October 2022.
-
On the testing of multiple hypothesis in sliced inverse regression
Authors:
Zhigen Zhao,
Xin Xing
Abstract:
We consider multiple testing in a general regression framework that studies the relationship between a univariate response and a p-dimensional predictor. To test whether each predictor has an effect, we construct an Angular Balanced Statistic (ABS) based on the sliced inverse regression estimator, without assuming a model for the conditional distribution of the response. Using the limiting distribution results developed in this paper, we show that ABS is asymptotically symmetric about zero under the null hypothesis. We then propose a Model-free multiple Testing procedure using Angular balanced statistics (MTA) and show theoretically that its false discovery rate is asymptotically less than or equal to a designated level. Numerical evidence shows that MTA is much more powerful than its alternatives while controlling the false discovery rate.
Submitted 16 June, 2023; v1 submitted 11 October, 2022;
originally announced October 2022.
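Statistics that are symmetric about zero under the null are commonly turned into an FDR-controlling selection rule with a mirror-type threshold. The number of statistics at or below -t serves as an estimate of the number of false discoveries among those at or above t. The sketch below shows that generic rule, in its simple variant without a "+1" offset, assuming MTA selects predictors in this way. Computing the ABS values from sliced inverse regression is not shown, and `W` is a placeholder array of statistics.

```python
import numpy as np

def mirror_threshold_select(W, q):
    """Return indices j with W_j >= t*, where t* is the smallest candidate t > 0 whose
    estimated FDP, #{W_j <= -t} / max(#{W_j >= t}, 1), is at most q."""
    W = np.asarray(W, dtype=float)
    for t in np.sort(np.abs(W[W != 0])):          # candidate thresholds, ascending
        fdp_hat = np.sum(W <= -t) / max(np.sum(W >= t), 1)
        if fdp_hat <= q:
            return np.flatnonzero(W >= t)
    return np.array([], dtype=int)                # no threshold achieves level q
```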
-
Consistent Covariance estimation for stratum imbalances under minimization method for covariate-adaptive randomization
Authors:
Zixuan Zhao,
Yanglei Song,
Wenyu Jiang,
Dongsheng Tu
Abstract:
Pocock and Simon's minimization method is a popular approach for covariate-adaptive randomization in clinical trials. Valid statistical inference with data collected under the minimization method requires knowledge of the limiting covariance matrix of within-stratum imbalances, whose existence was only recently established. In this work, we propose a bootstrap-based estimator for this limit and establish its consistency, in particular via Le Cam's third lemma. As an application, we use simulation studies to examine adjustments, based on the proposed estimator, to existing robust tests for treatment effects with survival data. The simulations show that the adjusted tests achieve a size close to the nominal level and that, unlike under other designs, the robust tests without adjustment may suffer from asymptotic size inflation under the minimization method.
Submitted 26 December, 2023; v1 submitted 26 September, 2022;
originally announced September 2022.
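For context, the Pocock-Simon minimization rule for two arms can be sketched as follows. For each hypothetical assignment of the new patient, the marginal imbalances over that patient's covariate levels are summed, and the arm with the smaller total is chosen with probability `p`. This is an illustrative sketch only. The range-based imbalance measure, the equal covariate weights and the variable names are assumptions, and the paper's bootstrap covariance estimator is not shown.

```python
import numpy as np

def minimization_assign(counts, levels, p, rng):
    """Assign a new patient to arm 0 or 1 by Pocock-Simon minimization (sketch).

    counts : list with one integer array per covariate; counts[k][l, a] is the number
             of patients with level l of covariate k already on arm a (updated in place)
    levels : the new patient's level for each covariate
    p      : probability of taking the arm that minimizes total imbalance
    """
    imbalance = np.zeros(2)
    for arm in (0, 1):
        for k, l in enumerate(levels):
            row = counts[k][l].copy()
            row[arm] += 1                            # hypothetically place the patient on `arm`
            imbalance[arm] += abs(row[0] - row[1])   # range of counts across the two arms
    if imbalance[0] == imbalance[1]:
        chosen = int(rng.integers(2))                # tie: fair coin
    else:
        best = int(np.argmin(imbalance))
        chosen = best if rng.random() < p else 1 - best
    for k, l in enumerate(levels):
        counts[k][l, chosen] += 1
    return chosen
```

With `p = 1` the rule is deterministic except for ties. In that case two consecutive patients with identical covariate levels always land on opposite arms.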
-
Change point inference in high-dimensional regression models under temporal dependence
Authors:
Haotian Xu,
Daren Wang,
Zifeng Zhao,
Yi Yu
Abstract:
This paper concerns the limiting distributions of change point estimators in a high-dimensional linear regression time series context, where a regression object $(y_t, X_t) \in \mathbb{R} \times \mathbb{R}^p$ is observed at every time point $t \in \{1, \ldots, n\}$. At unknown time points, called change points, the regression coefficients change, with the jump sizes measured in $\ell_2$-norm. We provide limiting distributions of the change point estimators in the regimes where the minimal jump size vanishes and where it remains constant. We allow both the covariate and noise sequences to be temporally dependent within the functional dependence framework, which is a first in the change point inference literature. We show that a block-type long-run variance estimator is consistent under functional dependence, which facilitates the practical implementation of our derived limiting distributions. We also present several important byproducts of our analysis, which are of independent interest. These include a novel variant of the dynamic programming algorithm that boosts computational efficiency, consistent change point localisation rates under temporal dependence, and a new Bernstein inequality for data possessing functional dependence. Extensive numerical results are provided to support our theoretical results. The proposed methods are implemented in the R package \texttt{changepoints} \citep{changepoints_R}.
Submitted 1 October, 2023; v1 submitted 25 July, 2022;
originally announced July 2022.
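A block-type long-run variance estimator of the kind mentioned above can be illustrated, for a scalar series, by the classical batch-means construction. The series is split into non-overlapping blocks, and the spread of the block sums is used. This is a generic sketch under that assumption, not the estimator from the paper, which works with high-dimensional regression data. `block_len` is a user-chosen tuning parameter.

```python
import numpy as np

def block_long_run_variance(x, block_len):
    """Batch-means estimate of the long-run variance lim_n n * Var(sample mean)."""
    x = np.asarray(x, dtype=float)
    m = len(x) // block_len                      # number of complete blocks
    if m < 2:
        raise ValueError("need at least two complete blocks")
    blocks = x[: m * block_len].reshape(m, block_len)
    centred_sums = blocks.sum(axis=1) - block_len * blocks.mean()
    # Each centred block sum divided by sqrt(block_len) has variance close to the
    # long-run variance when block_len is large, so average their squares.
    return float(np.mean(centred_sums ** 2) / block_len)
```

Consider an AR(1) series with coefficient 0.5 and unit innovation variance. Its long-run variance is 1/(1 - 0.5)^2 = 4, four times its innovation variance. The estimate should approach this value for long series and moderate block lengths.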
-
Scalable Bayesian Inference for Detection and Deblending in Astronomical Images
Authors:
Derek Hansen,
Ismael Mendoza,
Runjing Liu,
Ziteng Pang,
Zhe Zhao,
Camille Avestruz,
Jeffrey Regier
Abstract:
We present a new probabilistic method for detecting, deblending, and cataloging astronomical sources called the Bayesian Light Source Separator (BLISS). BLISS is based on deep generative models, which embed neural networks within a Bayesian model. For posterior inference, BLISS uses a new form of variational inference known as Forward Amortized Variational Inference. The BLISS inference routine is fast, requiring only a single forward pass of the encoder networks on a GPU once they are trained. BLISS can perform fully Bayesian inference on megapixel images in seconds and produces highly accurate catalogs. BLISS is highly extensible and has the potential to directly answer downstream scientific questions in addition to producing probabilistic catalogs.
Submitted 12 July, 2022;
originally announced July 2022.
-
Orthogonal Matrix Retrieval with Spatial Consensus for 3D Unknown-View Tomography
Authors:
Shuai Huang,
Mona Zehni,
Ivan Dokmanić,
Zhizhen Zhao
Abstract:
Unknown-view tomography (UVT) reconstructs a 3D density map from its 2D projections at unknown, random orientations. A line of work starting with Kam (1980) employs the method of moments (MoM) with rotation-invariant Fourier features to solve UVT in the frequency domain, assuming that the orientations are uniformly distributed. This line of work includes the recent orthogonal matrix retrieval (OMR) approaches based on matrix factorization, which, while elegant, either require side information about the density that is not available or fail to be sufficiently robust. To free OMR from these restrictions, we propose to jointly recover the density map and the orthogonal matrices by requiring that they be mutually consistent. We regularize the resulting non-convex optimization problem with a denoised reference projection and a nonnegativity constraint. This is enabled by new closed-form expressions for spatial autocorrelation features. Further, we design an easy-to-compute initial density map that effectively mitigates the non-convexity of the reconstruction problem. Experimental results show that the proposed OMR with spatial consensus is more robust and performs significantly better than the previous state-of-the-art OMR approach in the typical low-SNR scenario of 3D UVT.
Submitted 10 June, 2023; v1 submitted 6 July, 2022;
originally announced July 2022.
-
Multi-Frequency Joint Community Detection and Phase Synchronization
Authors:
Lingda Wang,
Zhizhen Zhao
Abstract:
This paper studies the joint community detection and phase synchronization problem on the \textit{stochastic block model with relative phase}, where each node is associated with an unknown phase angle. This problem, which has a variety of real-world applications, aims to recover the cluster structure and the associated phase angles simultaneously. By closely examining its maximum likelihood estimation (MLE) formulation, we show that this problem exhibits a \textit{``multi-frequency''} structure, whereas existing methods do not originate from this perspective. To this end, we propose two simple yet efficient algorithms that leverage the MLE formulation and benefit from information across multiple frequencies. The first is a spectral method based on a novel multi-frequency column-pivoted QR factorization. The factorization, applied to the top eigenvectors of the observation matrix, provides key information about the cluster structure and associated phase angles. The second is an iterative multi-frequency generalized power method, where each iteration updates the estimate in a matrix-multiplication-then-projection manner. Numerical experiments show that, compared to state-of-the-art algorithms, our proposed algorithms significantly improve both exact recovery of the cluster structure and the accuracy of the estimated phase angles.
Submitted 8 December, 2023; v1 submitted 16 June, 2022;
originally announced June 2022.
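The matrix-multiplication-then-projection update of the generalized power method can be sketched in its simplest setting. That setting has a single frequency and no community structure, which reduces to plain phase synchronization from a Hermitian observation matrix `A` whose signal part is $z z^H$. This simplification is an assumption made for illustration. The multi-frequency, multi-community version in the paper is not reproduced.

```python
import numpy as np

def phase_sync_gpm(A, n_iter=50, rng=None):
    """Generalized power method for phase synchronization (single frequency,
    one community): multiply by A, then project every entry onto the unit circle."""
    rng = np.random.default_rng() if rng is None else rng
    z = np.exp(2j * np.pi * rng.random(A.shape[0]))   # random unit-modulus start
    for _ in range(n_iter):
        w = A @ z                                     # matrix-multiplication step
        z = w / np.maximum(np.abs(w), 1e-12)          # entrywise projection to |z_i| = 1
    return z
```

Phases are identifiable only up to a global rotation, so recovery is best measured by $|\langle \hat z, z\rangle| / n$ rather than entrywise error.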
-
Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models
Authors:
Aarohi Srivastava,
Abhinav Rastogi,
Abhishek Rao,
Abu Awal Md Shoeb,
Abubakar Abid,
Adam Fisch,
Adam R. Brown,
Adam Santoro,
Aditya Gupta,
Adrià Garriga-Alonso,
Agnieszka Kluska,
Aitor Lewkowycz,
Akshat Agarwal,
Alethea Power,
Alex Ray,
Alex Warstadt,
Alexander W. Kocurek,
Ali Safaya,
Ali Tazarv,
Alice Xiang,
Alicia Parrish,
Allen Nie,
Aman Hussain,
Amanda Askell,
Amanda Dsouza
, et al. (426 additional authors not shown)
Abstract:
Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are as yet poorly characterized. In order to inform future research, prepare for disruptive new model capabilities, and ameliorate socially harmful effects, it is vital that we understand the present and near-future capabilities and limitations of language models. To address this challenge, we introduce the Beyond the Imitation Game benchmark (BIG-bench). BIG-bench currently consists of 204 tasks, contributed by 450 authors across 132 institutions. Task topics are diverse, drawing problems from linguistics, childhood development, math, common-sense reasoning, biology, physics, social bias, software development, and beyond. BIG-bench focuses on tasks that are believed to be beyond the capabilities of current language models. We evaluate the behavior of OpenAI's GPT models, Google-internal dense transformer architectures, and Switch-style sparse transformers on BIG-bench, across model sizes spanning millions to hundreds of billions of parameters. In addition, a team of human expert raters performed all tasks in order to provide a strong baseline. Findings include: model performance and calibration both improve with scale, but are poor in absolute terms (and when compared with rater performance); performance is remarkably similar across model classes, though with benefits from sparsity; tasks that improve gradually and predictably commonly involve a large knowledge or memorization component, whereas tasks that exhibit "breakthrough" behavior at a critical scale often involve multiple steps or components, or brittle metrics; social bias typically increases with scale in settings with ambiguous context, but this can be improved with prompting.
Submitted 12 June, 2023; v1 submitted 9 June, 2022;
originally announced June 2022.
-
Change-point Detection for Sparse and Dense Functional Data in General Dimensions
Authors:
Carlos Misael Madrid Padilla,
Daren Wang,
Zifeng Zhao,
Yi Yu
Abstract:
We study the problem of change-point detection and localisation for functional data sequentially observed on a general d-dimensional space, where we allow the functional curves to be either sparsely or densely sampled. Data of this form naturally arise in a wide range of applications such as biology, neuroscience, climatology, and finance. For this task, we propose a kernel-based algorithm named functional seeded binary segmentation (FSBS). FSBS is computationally efficient, can handle discretely observed functional data, and is theoretically sound for heavy-tailed and temporally dependent observations. Moreover, FSBS works for a general d-dimensional domain, which is a first in the literature on change-point estimation for functional data. We show the consistency of FSBS for multiple change-point estimation and further provide a sharp localisation error rate, which reveals an interesting phase transition phenomenon depending on the number of functional curves observed and the sampling frequency for each curve. Extensive numerical experiments illustrate the effectiveness of FSBS and its advantage over existing methods in the literature under various settings. A real data application is further conducted, where FSBS localises change-points of sea surface temperature patterns in the south Pacific attributed to El Niño.
Submitted 18 May, 2022;
originally announced May 2022.
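Seeded binary segmentation searches for change-points over a deterministic, multi-scale collection of "seeded" intervals rather than over random intervals. The sketch below generates such a collection with the seeded-interval construction of Kovács et al. and a decay factor of 1/2, which we assume FSBS follows. The kernel-based CUSUM statistic that FSBS evaluates on each interval is not shown. `min_len` is an illustrative cutoff. Intervals are returned as pairs (start, end], read as half-open index ranges on the grid {0, ..., n}.

```python
import math

def seeded_intervals(n, decay=0.5, min_len=2):
    """Seeded intervals on {0, ..., n}: layer k has intervals of length n * decay**(k-1),
    evenly shifted so that each layer covers [0, n] with overlap. Requires 0 < decay < 1."""
    intervals = set()
    k = 1
    while True:
        length = n * decay ** (k - 1)
        if length < min_len:
            break
        n_k = 2 * math.ceil((1 / decay) ** (k - 1)) - 1   # number of intervals in layer k
        shift = (n - length) / (n_k - 1) if n_k > 1 else 0.0
        for i in range(n_k):
            intervals.add((math.floor(i * shift), math.ceil(i * shift + length)))
        k += 1
    return sorted(intervals)
```

For n = 8 this gives the full interval (0, 8], three intervals of length 4 and seven of length 2, for 11 intervals in total.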
-
Probabilistic Estimation of Instantaneous Frequencies of Chirp Signals
Authors:
Zheng Zhao,
Simo Särkkä,
Jens Sjölund,
Thomas B. Schön
Abstract:
We present a continuous-time probabilistic approach for estimating the chirp signal and its instantaneous frequency function when the true forms of these functions are not accessible. Our model represents these functions by non-linearly cascaded Gaussian processes, formulated as non-linear stochastic differential equations. The posterior distribution of the functions is then estimated with stochastic filters and smoothers. We compute a (posterior) Cramér--Rao lower bound for the Gaussian process model and derive a theoretical upper bound for the estimation error in the mean squared sense. The experiments show that the proposed method outperforms a number of state-of-the-art methods on synthetic data. We also show that the method works out-of-the-box for two real-world datasets.
Submitted 13 February, 2023; v1 submitted 12 May, 2022;
originally announced May 2022.
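The signal model behind this abstract treats a chirp's phase as the running integral of its instantaneous frequency: $x(t) = \sin(2\pi \int_0^t f(s)\,ds + \phi_0)$. The sketch below synthesizes a chirp from a given frequency function using the cumulative trapezoidal rule. It illustrates only this forward model, not the paper's Gaussian-process/SDE estimator. `freq` and `phase0` are illustrative parameters.

```python
import numpy as np

def synthesize_chirp(freq, t, phase0=0.0):
    """Chirp with instantaneous frequency freq(t) in Hz; returns (signal, phase).
    The phase integral is computed with the cumulative trapezoidal rule."""
    f = freq(t)
    increments = 0.5 * (f[1:] + f[:-1]) * np.diff(t)      # trapezoid areas
    phase = 2 * np.pi * np.concatenate(([0.0], np.cumsum(increments))) + phase0
    return np.sin(phase), phase
```

For a linear frequency f(t) = 1 + 2t the trapezoidal rule is exact, so the final phase on [0, 1] equals 2π(1 + 1) = 4π. Differentiating the phase numerically and dividing by 2π recovers f.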
-
A new class of composite GBII regression models with varying threshold for modelling heavy-tailed data
Authors:
Zhengxiao Li,
Fei Wang,
Zhengtang Zhao
Abstract:
The four-parameter generalized beta distribution of the second kind (GBII) has been proposed for modelling insurance losses with heavy-tailed features. This paper presents a parametric composite GBII regression model obtained by splicing two GBII distributions with the mode-matching method. It is designed to simultaneously model small and large claims and to capture policyholder heterogeneity by introducing covariates into the location parameter. As a result, the threshold that splits the two GBII distributions varies across individual policyholders according to their risk features. The proposed regression model also accommodates a wide range of insurance loss distributions for the head and the tail, respectively, and provides closed-form expressions for parameter estimation and model prediction. A simulation study is conducted to show the accuracy of the proposed estimation method and the flexibility of the regressions. The applicability of the new class of distributions and regressions is illustrated with a Danish fire losses data set and a Chinese medical insurance claims data set, with comparisons against competing models from the literature.
Submitted 26 January, 2024; v1 submitted 22 March, 2022;
originally announced March 2022.
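For reference, the GBII density with shape parameters a, p, q and scale b is $f(y) = |a| y^{ap-1} / (b^{ap} B(p,q) [1 + (y/b)^a]^{p+q})$ for y > 0. For a > 0 its mode is $b[(ap-1)/(aq+1)]^{1/a}$ when ap > 1, and 0 otherwise; this follows from setting the derivative of log f to zero. As we understand mode matching, it ties the splicing threshold to the component modes, so these two quantities are natural building blocks. The sketch below computes only these two quantities. It does not reproduce the paper's exact splicing conditions or its covariate-dependent location parameter.

```python
import math
import numpy as np

def gb2_pdf(y, a, b, p, q):
    """GBII density |a| y^(ap-1) / (b^(ap) B(p,q) (1 + (y/b)^a)^(p+q)), y > 0,
    evaluated on the log scale for numerical stability."""
    y = np.asarray(y, dtype=float)
    log_beta = math.lgamma(p) + math.lgamma(q) - math.lgamma(p + q)
    log_f = (math.log(abs(a)) + (a * p - 1) * np.log(y) - a * p * math.log(b)
             - log_beta - (p + q) * np.log1p((y / b) ** a))
    return np.exp(log_f)

def gb2_mode(a, b, p, q):
    """Mode of the GBII for a > 0: b * ((ap - 1) / (aq + 1))^(1/a) if ap > 1, else 0."""
    if a <= 0:
        raise ValueError("this sketch assumes a > 0")
    return b * ((a * p - 1) / (a * q + 1)) ** (1 / a) if a * p > 1 else 0.0
```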
-
Mathematically Quantifying Non-responsiveness of the 2021 Georgia Congressional Districting Plan
Authors:
Zhanzhan Zhao,
Cyrus Hettle,
Swati Gupta,
Jonathan Mattingly,
Dana Randall,
Gregory Herschlag
Abstract:
To audit political district maps for partisan gerrymandering, one may determine a baseline for the expected distribution of partisan outcomes by sampling an ensemble of maps. One approach to sampling is to use redistricting policy as a guide to precisely codify preferences between maps. Such preferences give rise to a probability distribution on the space of redistricting plans, and Metropolis-Hastings methods allow one to sample ensembles of maps from the specified distribution. Although these approaches have nice theoretical properties and have successfully detected gerrymandering in legal settings, sampling from commonly used policy-driven distributions is often computationally difficult. As yet, there is no algorithm that can be used off the shelf for checking maps under generic redistricting criteria. In this work, we mitigate the computational challenges of a Metropolized sampling technique through a parallel tempering method combined with ReCom[12] and, for the first time, validate that such techniques are effective on these problems at the scale of statewide precinct graphs for more policy-informed measures. We develop these improvements through the first case study of district plans in Georgia. Our analysis projects that any election in Georgia will reliably elect 9 Republicans and 5 Democrats under the enacted plan. This result remains largely fixed even as public opinion shifts toward either party; that is, the partisan outcome of the enacted plan does not respond to the will of the people. Only 0.12% of the $\sim$160K plans in our ensemble were similarly non-responsive.
Submitted 9 October, 2022; v1 submitted 12 March, 2022;
originally announced March 2022.
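Parallel tempering runs several Metropolis chains that target tempered versions $\pi_\beta(x) \propto \exp(-\beta E(x))$ of a policy-driven distribution. It periodically proposes to exchange states between neighbouring temperatures. The generic exchange step is sketched below. A swap between chains at inverse temperatures $\beta_i$ and $\beta_j$ is accepted with probability $\min\{1, \exp[(\beta_i - \beta_j)(E(x_i) - E(x_j))]\}$. The energy function, the ReCom proposal moves within each chain and the temperature ladder used in the paper are not reproduced. `states` are opaque placeholders for districting plans.

```python
import math
import random

def swap_acceptance(beta_i, beta_j, energy_i, energy_j):
    """Metropolis acceptance probability for exchanging the states of two chains
    that target pi_beta(x) proportional to exp(-beta * E(x))."""
    log_ratio = (beta_i - beta_j) * (energy_i - energy_j)
    return 1.0 if log_ratio >= 0 else math.exp(log_ratio)

def tempering_sweep(states, energies, betas, rng=None):
    """Propose swaps between adjacent temperature levels; lists are permuted in place."""
    rng = random.Random() if rng is None else rng
    for k in range(len(betas) - 1):
        if rng.random() < swap_acceptance(betas[k], betas[k + 1], energies[k], energies[k + 1]):
            states[k], states[k + 1] = states[k + 1], states[k]
            energies[k], energies[k + 1] = energies[k + 1], energies[k]
    return states, energies
```

A swap that moves a lower-energy state to the colder chain (larger β) is always accepted. The reverse move is accepted only with probability below one.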
-
Pseudo Numerical Methods for Diffusion Models on Manifolds
Authors:
Luping Liu,
Yi Ren,
Zhijie Lin,
Zhou Zhao
Abstract:
Denoising Diffusion Probabilistic Models (DDPMs) can generate high-quality samples such as images and audio. However, DDPMs require hundreds to thousands of iterations to produce final samples. Several prior works have successfully accelerated DDPMs by adjusting the variance schedule (e.g., Improved Denoising Diffusion Probabilistic Models) or the denoising equation (e.g., Denoising Diffusion Implicit Models (DDIMs)). However, these acceleration methods cannot maintain sample quality and even introduce new noise at high speedup rates, which limits their practicality. To accelerate the inference process while keeping the sample quality, we provide a fresh perspective: DDPMs should be treated as solving differential equations on manifolds. Under this perspective, we propose pseudo numerical methods for diffusion models (PNDMs). Specifically, we work out how to solve differential equations on manifolds and show that DDIMs are simple cases of pseudo numerical methods. We convert several classical numerical methods into corresponding pseudo numerical methods and find that the pseudo linear multi-step method is the best in most situations. According to our experiments, by directly using pre-trained models on Cifar10, CelebA and LSUN, PNDMs can generate higher quality synthetic images with only 50 steps compared with 1000-step DDIMs (20x speedup), significantly outperform DDIMs with 250 steps (by around 0.4 in FID), and generalize well across different variance schedules. Our implementation is available at https://github.com/luping-liu/PNDM.
Submitted 31 October, 2022; v1 submitted 20 February, 2022;
originally announced February 2022.
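The pseudo linear multi-step idea can be sketched in two parts. The noise estimates from the four most recent steps are combined with the classical fourth-order Adams-Bashforth weights (55, -59, 37, -9)/24. The combined estimate is then passed through the deterministic DDIM update, which plays the role of the transfer part. This is a hedged sketch rather than the released code. `eps_model` is a hypothetical noise-prediction network, and `abar` is assumed to map a timestep to the cumulative schedule value $\bar\alpha_t$. As a simplification, the first three steps fall back to plain DDIM updates; the paper's handling of the initial steps may differ.

```python
import numpy as np

def ddim_transfer(x_t, eps, abar_t, abar_prev):
    """Deterministic DDIM update from schedule value abar_t to abar_prev given a noise estimate."""
    x0_hat = (x_t - np.sqrt(1.0 - abar_t) * eps) / np.sqrt(abar_t)   # predicted clean sample
    return np.sqrt(abar_prev) * x0_hat + np.sqrt(1.0 - abar_prev) * eps

def plms_eps(history):
    """Fourth-order Adams-Bashforth combination; history[-1] is the newest estimate."""
    e0, e1, e2, e3 = history[-1], history[-2], history[-3], history[-4]
    return (55 * e0 - 59 * e1 + 37 * e2 - 9 * e3) / 24

def plms_step(x_t, t, t_prev, eps_model, abar, history):
    """One sampling step; eps_model(x, t) is a hypothetical noise-prediction network."""
    history.append(eps_model(x_t, t))
    eps = plms_eps(history) if len(history) >= 4 else history[-1]   # DDIM fallback at start
    return ddim_transfer(x_t, eps, abar[t], abar[t_prev])
```

The Adams-Bashforth weights sum to one, so a constant noise estimate passes through unchanged. When `eps` is the exact noise, the transfer maps $\sqrt{\bar\alpha_t}x_0 + \sqrt{1-\bar\alpha_t}\epsilon$ to $\sqrt{\bar\alpha_{t'}}x_0 + \sqrt{1-\bar\alpha_{t'}}\epsilon$.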
-
Inform Product Change through Experimentation with Data-Driven Behavioral Segmentation
Authors:
Zhenyu Zhao,
Yan He,
Miao Chen
Abstract:
Online controlled experimentation is widely adopted for evaluating new features in the rapid development cycle of web products and mobile applications. Measuring the overall experiment sample is a common practice for quantifying the overall treatment effect. To understand why the treatment effect occurs in a certain way, segmentation is a valuable approach for a finer-grained analysis of experiment results. This paper introduces a framework for creating and utilizing user behavioral segments in online experimentation. Using data on user engagement with individual product components as input, this method defines segments that are closely related to the features being evaluated in the product development cycle. With a real-world example, we demonstrate that analysis with such behavioral segments offered deep, actionable insights that successfully informed product decision-making.
Submitted 25 January, 2022;
originally announced January 2022.
-
A Spectral Method for Joint Community Detection and Orthogonal Group Synchronization
Authors:
Yifeng Fan,
Yuehaw Khoo,
Zhizhen Zhao
Abstract:
Community detection and orthogonal group synchronization are both fundamental problems with a variety of important applications in science and engineering. In this work, we consider the joint problem of community detection and orthogonal group synchronization which aims to recover the communities and perform synchronization simultaneously. To this end, we propose a simple algorithm that consists of a spectral decomposition step followed by a blockwise column pivoted QR factorization (CPQR). The proposed algorithm is efficient and scales linearly with the number of edges in the graph. We also leverage the recently developed `leave-one-out' technique to establish a near-optimal guarantee for exact recovery of the cluster memberships and stable recovery of the orthogonal transforms. Numerical experiments demonstrate the efficiency and efficacy of our algorithm and confirm our theoretical characterization of it.
Submitted 15 September, 2022; v1 submitted 25 December, 2021;
originally announced December 2021.
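The spectral-then-CPQR pipeline can be sketched in the simplest case, where each block is of size one (pure community detection, no orthogonal transforms). The top-K eigenvectors are computed, and column-pivoted QR on their transpose picks K representative nodes. Each node is then labelled by the representative it aligns with most strongly, in the spirit of the CPQR clustering of Damle, Minden and Ying. The blockwise variant for orthogonal transforms and the paper's exact assignment rule are not reproduced. The greedy pivoting is written out in NumPy for transparency; `scipy.linalg.qr(..., pivoting=True)` provides a library CPQR.

```python
import numpy as np

def cpqr_pivots(M, k):
    """Greedy column-pivoted QR: indices of k columns of M, each chosen as the column
    with the largest norm after projecting out the columns already chosen."""
    R = np.array(M, dtype=float)
    piv = []
    for _ in range(k):
        j = int(np.argmax(np.sum(R ** 2, axis=0)))
        piv.append(j)
        q = R[:, j] / np.linalg.norm(R[:, j])
        R -= np.outer(q, q @ R)                   # remove the q-component from every column
    return piv

def spectral_cpqr_clusters(A, k):
    """Cluster labels from the top-k eigenvectors of a symmetric matrix A."""
    vals, vecs = np.linalg.eigh(A)
    V = vecs[:, np.argsort(vals)[::-1][:k]]       # n x k, largest eigenvalues first
    piv = cpqr_pivots(V.T, k)                     # one representative node per cluster
    U, _, Wt = np.linalg.svd(V.T[:, piv])         # polar factor of the k x k pivot block
    Q = U @ Wt
    return np.argmax(np.abs(Q.T @ V.T), axis=0)   # label = best-aligned representative
```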
-
Multidimensional Projection Filters via Automatic Differentiation and Sparse-Grid Integration
Authors:
Muhammad Fuady Emzir,
Zheng Zhao,
Simo Särkkä
Abstract:
The projection filter is a technique for approximating the solutions of optimal filtering problems. In projection filters, the Kushner--Stratonovich stochastic partial differential equation that governs the propagation of the optimal filtering density is projected onto a manifold of parametric densities, resulting in a finite-dimensional stochastic differential equation. Although projection filters are capable of representing complicated probability densities, their current implementations are limited to the Gaussian family or to unidimensional filtering applications. This work combines numerical integration and automatic differentiation to construct projection filter algorithms for more generic problems. Specifically, we provide a detailed exposition of this combination for the manifold of the exponential family and show how to apply the projection filter to multidimensional cases. By comparison with a finite-difference solution to the Kushner--Stratonovich equation and a bootstrap particle filter with systematic resampling, we demonstrate numerically that the proposed algorithm retains an accurate approximation of the filtering density while requiring a comparatively low number of quadrature points. Owing to the sparse-grid integration and automatic differentiation used to calculate the expected values of the natural statistics and the Fisher metric, the proposed filtering algorithms are highly scalable. They are therefore suitable for many applications in which the number of dimensions exceeds the practical limit of particle filters, but where Gaussian approximations are deemed unsatisfactory.
Submitted 14 September, 2022; v1 submitted 16 December, 2021;
originally announced December 2021.
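The abstract mentions two quantities obtained by numerical integration: the expected natural statistics and the Fisher metric of an exponential-family density $p(x) \propto \exp(\theta \cdot c(x))$. These equal the gradient and the Hessian of the log-partition function $\psi(\theta)$, that is, the mean and the covariance of $c(X)$. The sketch below evaluates them for a one-dimensional state on a bounded interval with Gauss-Legendre quadrature. The paper's sparse-grid rules and automatic differentiation are replaced by this simpler route, the filter's time propagation is omitted, and the truncation interval [lo, hi] is an assumption.

```python
import numpy as np

def expfam_moments(theta, stats, lo, hi, n_nodes=200):
    """For p(x) proportional to exp(theta . c(x)) on [lo, hi], return the log-partition
    psi(theta), the expected natural statistics E[c(X)] (= gradient of psi) and the
    Fisher metric Cov[c(X)] (= Hessian of psi), using Gauss-Legendre quadrature."""
    theta = np.asarray(theta, dtype=float)
    nodes, weights = np.polynomial.legendre.leggauss(n_nodes)
    x = 0.5 * (hi - lo) * nodes + 0.5 * (hi + lo)   # map nodes from [-1, 1] to [lo, hi]
    w = 0.5 * (hi - lo) * weights
    C = np.stack([c(x) for c in stats])             # natural statistics, shape (m, n_nodes)
    log_unnorm = theta @ C
    shift = log_unnorm.max()                        # guard against overflow
    unnorm = w * np.exp(log_unnorm - shift)
    Z = unnorm.sum()
    p = unnorm / Z                                  # normalized quadrature weights
    mean = C @ p
    centred = C - mean[:, None]
    fisher = (centred * p) @ centred.T
    return float(np.log(Z) + shift), mean, fisher
```

With c(x) = (x, x^2) and θ = (0, -1/2) the density is the standard normal. The expected statistics are then (0, 1), the Fisher metric is diag(1, 2), and ψ = log √(2π).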
-
Segmenting Time Series via Self-Normalization
Authors:
Zifeng Zhao,
Feiyu Jiang,
Xiaofeng Shao
Abstract:
We propose a novel and unified framework for change-point estimation in multivariate time series. The proposed method is fully nonparametric, enjoys effortless tuning and is robust to temporal dependence. One salient and distinct feature of the proposed method is its versatility: it allows change-point detection for a broad class of parameters (such as mean, variance, correlation and quantile) in a unified fashion. At the core of our method, we couple self-normalization (SN) based tests with a novel nested local-window segmentation algorithm, which seems new in the growing literature on change-point analysis. Due to the presence of an inconsistent long-run variance estimator in the SN test, non-standard theoretical arguments are further developed to derive the consistency and convergence rate of the proposed SN-based change-point detection method. Extensive numerical experiments and relevant real data analysis are conducted to illustrate the effectiveness and broad applicability of our proposed method in comparison with state-of-the-art approaches in the literature.
Submitted 8 September, 2022; v1 submitted 9 December, 2021;
originally announced December 2021.
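The self-normalization idea can be illustrated with the classic SN statistic for a single change in the mean: a CUSUM-type contrast is divided by a normalizer built separately from the data before and after each candidate point, so no long-run variance estimate or bandwidth is required. This is a sketch under that single-change, mean-only assumption; the nested local-window segmentation and the extension to variance, correlation and quantiles described above are not reproduced, and `sn_mean_change` and its `trim` parameter are hypothetical.

```python
import numpy as np

def sn_mean_change(x, trim=0.1):
    """Return (k_hat, max statistic) of a self-normalized CUSUM scan.

    A change at k means x[:k] and x[k:] have different means.
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    csum = np.concatenate([[0.0], np.cumsum(x)])   # csum[j] = x[0] + ... + x[j-1]
    total = csum[n]
    lo, hi = max(2, int(trim * n)), min(n - 2, int((1 - trim) * n))
    best_k, best_stat = None, -np.inf
    for k in range(lo, hi + 1):
        # CUSUM contrast: S(1, k) minus its share of the total, scaled by n^{-1/2}.
        d = (csum[k] - k / n * total) / np.sqrt(n)
        # Self-normalizer from the pre-k segment: bridge of forward partial sums.
        t_pre = np.arange(1, k + 1)
        pre = csum[1:k + 1] - t_pre / k * csum[k]
        # Post-k segment: bridge of backward partial sums S(t, n), t = k+1..n.
        t_post = np.arange(k + 1, n + 1)
        back = total - csum[t_post - 1]
        post = back - (n - t_post + 1) / (n - k) * (total - csum[k])
        v = (np.sum(pre**2) + np.sum(post**2)) / n**2
        stat = d**2 / v
        if stat > best_stat:
            best_k, best_stat = k, stat
    return best_k, best_stat

# Simulated example: the mean shifts from 0 to 2 after observation 200.
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0.0, 1.0, 200), rng.normal(2.0, 1.0, 200)])
k_hat, stat = sn_mean_change(x)
print(k_hat, stat)   # k_hat should be close to 200
```

Because each segment is normalized on its own, a genuine mean change inflates the contrast without inflating the normalizer, which is what lets the test avoid an explicit long-run variance estimate.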
-
State-space deep Gaussian processes with applications
Authors:
Zheng Zhao
Abstract:
This thesis is mainly concerned with state-space approaches for solving deep (temporal) Gaussian process (DGP) regression problems. More specifically, we represent DGPs as hierarchically composed systems of stochastic differential equations (SDEs), and we consequently solve the DGP regression problem by using state-space filtering and smoothing methods. The resulting state-space DGP (SS-DGP) models generate a rich class of priors compatible with modelling a number of irregular signals/functions. Moreover, due to their Markovian structure, SS-DGP regression problems can be solved efficiently by using Bayesian filtering and smoothing methods. The second contribution of this thesis is that we solve continuous-discrete Gaussian filtering and smoothing problems by using the Taylor moment expansion (TME) method. This induces a class of filters and smoothers that can be asymptotically exact in predicting the mean and covariance of the solutions of SDEs. Moreover, the TME method and TME filters and smoothers are compatible with simulating SS-DGPs and solving their regression problems. Lastly, this thesis features a number of applications of state-space (deep) GPs. These applications mainly include (i) estimation of unknown drift functions of SDEs from partially observed trajectories and (ii) estimation of spectro-temporal features of signals.
Submitted 24 November, 2021;
originally announced November 2021.
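The TME method approximates conditional expectations through the SDE's infinitesimal generator $\mathcal{A}$: $E[\phi(X_{t+\Delta}) \mid X_t = x] \approx \sum_{r=0}^{M} \frac{\Delta^r}{r!} (\mathcal{A}^r \phi)(x)$. A minimal sketch, assuming the scalar Ornstein-Uhlenbeck SDE $dX = -\theta X\,dt + \sigma\,dW$ and polynomial test functions $\phi$, for which the generator maps a polynomial to a polynomial of the same degree and the exact moments are known. The helper names are hypothetical, and this is not the thesis's TME filter or smoother.

```python
from math import exp, factorial

def generator_ou(coeffs, theta, sigma):
    """Apply A = -theta*x*d/dx + (sigma^2/2)*d^2/dx^2 to a polynomial.

    coeffs[j] is the coefficient of x**j; the result has the same length.
    """
    out = [0.0] * len(coeffs)
    for j, a in enumerate(coeffs):
        if j >= 1:
            out[j] += -theta * j * a                          # drift term
        if j >= 2:
            out[j - 2] += 0.5 * sigma**2 * j * (j - 1) * a    # diffusion term
    return out

def eval_poly(coeffs, x):
    return sum(a * x**j for j, a in enumerate(coeffs))

def tme_expectation(phi_coeffs, x0, dt, order, theta, sigma):
    """Order-M TME approximation of E[phi(X_{t+dt}) | X_t = x0] for the OU SDE."""
    total, term = 0.0, list(phi_coeffs)
    for r in range(order + 1):
        total += dt**r / factorial(r) * eval_poly(term, x0)
        term = generator_ou(term, theta, sigma)   # next power A^{r+1} phi
    return total

theta, sigma, x0, dt = 1.0, 1.0, 2.0, 0.5
mean_tme = tme_expectation([0.0, 1.0], x0, dt, 12, theta, sigma)         # phi(x) = x
second_tme = tme_expectation([0.0, 0.0, 1.0], x0, dt, 12, theta, sigma)  # phi(x) = x^2
mean_exact = x0 * exp(-theta * dt)
second_exact = x0**2 * exp(-2 * theta * dt) + sigma**2 / (2 * theta) * (1 - exp(-2 * theta * dt))
print(mean_tme, mean_exact)
print(second_tme, second_exact)
```

With `order=1` the mean reduces to the Euler-Maruyama prediction $x_0(1 - \theta\Delta)$; as the order grows the series approaches the exact OU moments, a simple instance of the asymptotic exactness mentioned above.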
-
Non-linear Gaussian smoothing with Taylor moment expansion
Authors:
Zheng Zhao,
Simo Särkkä
Abstract:
This letter is concerned with solving continuous-discrete Gaussian smoothing problems by using the Taylor moment expansion (TME) scheme. In the proposed smoothing method, we apply the TME method to approximate the transition density of the stochastic differential equation in the dynamic model. Furthermore, we derive a theoretical error bound (in the mean square sense) of the TME smoothing estimates showing that the smoother is stable under weak assumptions. Numerical experiments are presented to illustrate the practical use of the method.
Submitted 4 November, 2021; v1 submitted 30 September, 2021;
originally announced October 2021.
-
Identifying Hidden Visits from Sparse Call Detail Record Data
Authors:
Zhan Zhao,
Haris N. Koutsopoulos,
**hua Zhao
Abstract:
Despite a large body of literature on trip inference using call detail record (CDR) data, a fundamental understanding of their limitations is lacking. In particular, because of the sparse nature of CDR data, users may travel to a location without being revealed in the data, which we refer to as a "hidden visit". The existence of hidden visits hinders our ability to extract reliable information about human mobility and travel behavior from CDR data. In this study, we propose a data fusion approach to obtain labeled data for statistical inference of hidden visits. In the absence of complementary data, this can be accomplished by extracting labeled observations from more granular cellular data access records, and extracting features from voice call and text messaging records. The proposed approach is demonstrated using a real-world CDR dataset of 3 million users from a large Chinese city. Logistic regression, support vector machine, random forest, and gradient boosting are used to infer whether a hidden visit exists during a displacement observed from CDR data. The test results show significant improvement over the naive no-hidden-visit rule, which is an implicit assumption adopted by most existing studies. Based on the proposed model, we estimate that over 10% of the displacements extracted from CDR data involve hidden visits. The proposed data fusion method offers a systematic statistical approach to inferring individual mobility patterns based on telecommunication records.
Submitted 24 June, 2021;
originally announced June 2021.
-
DSelect-k: Differentiable Selection in the Mixture of Experts with Applications to Multi-Task Learning
Authors:
Hussein Hazimeh,
Zhe Zhao,
Aakanksha Chowdhery,
Maheswaran Sathiamoorthy,
Yihua Chen,
Rahul Mazumder,
Lichan Hong,
Ed H. Chi
Abstract:
The Mixture-of-Experts (MoE) architecture is showing promising results in improving parameter sharing in multi-task learning (MTL) and in scaling high-capacity neural networks. State-of-the-art MoE models use a trainable sparse gate to select a subset of the experts for each input example. While conceptually appealing, existing sparse gates, such as Top-k, are not smooth. The lack of smoothness can lead to convergence and statistical performance issues when training with gradient-based methods. In this paper, we develop DSelect-k: a continuously differentiable and sparse gate for MoE, based on a novel binary encoding formulation. The gate can be trained using first-order methods, such as stochastic gradient descent, and offers explicit control over the number of experts to select. We demonstrate the effectiveness of DSelect-k on both synthetic and real MTL datasets with up to $128$ tasks. Our experiments indicate that DSelect-k can achieve statistically significant improvements in prediction and expert selection over popular MoE gates. Notably, on a real-world, large-scale recommender system, DSelect-k achieves over $22\%$ improvement in predictive performance compared to Top-k. We provide an open-source implementation of DSelect-k.
Submitted 31 December, 2021; v1 submitted 7 June, 2021;
originally announced June 2021.
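A rough sketch of a binary-encoded, differentiable selector in the spirit described above. It assumes a cubic smooth-step function $S$ and assumes that expert $i$ receives the product, over the bits of $i$, of $S(z_j)$ (bit set) or $1 - S(z_j)$ (bit clear); $k$ such selectors are then mixed with softmax weights. The function names, the $\gamma$ default, and the omission of any regularizer that pushes $S(z_j)$ to exactly 0 or 1 are assumptions; this is not the authors' open-source implementation.

```python
import numpy as np

def smooth_step(t, gamma=1.0):
    """Cubic smooth-step: 0 below -gamma/2, 1 above gamma/2, C^1 in between."""
    t = np.asarray(t, dtype=float)
    mid = -2.0 / gamma**3 * t**3 + 3.0 / (2.0 * gamma) * t + 0.5
    return np.where(t <= -gamma / 2, 0.0, np.where(t >= gamma / 2, 1.0, mid))

def single_selector(z, gamma=1.0):
    """Map log2(m) logits z to a weight vector over m = 2**len(z) experts."""
    s = smooth_step(z, gamma)
    n_bits = len(z)
    weights = np.ones(2**n_bits)
    for i in range(2**n_bits):
        for j in range(n_bits):
            bit = (i >> j) & 1
            # Expert i takes S(z_j) where bit j of i is 1, else 1 - S(z_j).
            weights[i] *= s[j] if bit else 1.0 - s[j]
    return weights

def dselect_k_weights(z_list, alpha, gamma=1.0):
    """Combine k single-expert selectors with softmax(alpha) mixing weights."""
    mix = np.exp(alpha - np.max(alpha))
    mix /= mix.sum()
    return sum(w * single_selector(z, gamma) for w, z in zip(mix, z_list))

# Logits outside [-gamma/2, gamma/2] give a hard, one-hot choice:
# bits (1, 0, 1) select expert 0b101 = 5 out of 8.
w_single = single_selector(np.array([1.0, -1.0, 1.0]))
w_mix = dselect_k_weights([np.array([1.0, -1.0, 1.0]), np.array([-1.0, -1.0, -1.0])],
                          np.array([0.0, 0.0]))
print(w_single)
print(w_mix)   # half the weight on expert 5, half on expert 0
```

For any logits the selector weights sum to one, since the sum over all bit patterns factorizes as $\prod_j (S(z_j) + 1 - S(z_j)) = 1$; the gradient is nonzero only while some $z_j$ lies inside $(-\gamma/2, \gamma/2)$.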
-
Hierarchical Non-Stationary Temporal Gaussian Processes With $L^1$-Regularization
Authors:
Zheng Zhao,
Rui Gao,
Simo Särkkä
Abstract:
This paper is concerned with regularized extensions of hierarchical non-stationary temporal Gaussian processes (NSGPs) in which the parameters (e.g., length-scale) are modeled as GPs. In particular, we consider two commonly used NSGP constructions which are based on explicitly constructed non-stationary covariance functions and stochastic differential equations, respectively. We extend these NSGPs by including $L^1$-regularization on the processes in order to induce sparseness. To solve the resulting regularized NSGP (R-NSGP) regression problem, we develop a method based on the alternating direction method of multipliers (ADMM) and analyze its convergence properties theoretically. We also evaluate the performance of the proposed methods on simulated and real-world datasets.
Submitted 20 May, 2021;
originally announced May 2021.
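ADMM handles an $L^1$ penalty by splitting the objective into a smooth part and a nonsmooth part linked by a consensus constraint. The sketch below shows that splitting on a plain lasso problem, $\min_x \tfrac12\|Ax - b\|_2^2 + \lambda\|x\|_1$, not on the R-NSGP objective itself; the function names, the penalty parameter `rho`, and the fixed iteration count are illustrative assumptions.

```python
import numpy as np

def soft_threshold(v, kappa):
    """Proximal operator of kappa * ||.||_1 (elementwise shrinkage)."""
    return np.sign(v) * np.maximum(np.abs(v) - kappa, 0.0)

def admm_lasso(A, b, lam, rho=1.0, n_iter=500):
    """Solve min_x 0.5*||A x - b||^2 + lam*||x||_1 via scaled-form ADMM."""
    n = A.shape[1]
    z = np.zeros(n)
    u = np.zeros(n)
    # The x-update solves the same linear system every iteration: factor once.
    chol = np.linalg.cholesky(A.T @ A + rho * np.eye(n))
    Atb = A.T @ b
    for _ in range(n_iter):
        rhs = Atb + rho * (z - u)
        x = np.linalg.solve(chol.T, np.linalg.solve(chol, rhs))  # smooth (least-squares) step
        z = soft_threshold(x + u, lam / rho)                      # L1 proximal step
        u = u + x - z                                             # scaled dual update
    return z

# With A = I the lasso solution is soft_threshold(b, lam), which makes a handy check.
A = np.eye(3)
b = np.array([3.0, -0.5, 1.5])
x_hat = admm_lasso(A, b, lam=1.0)
print(x_hat)   # close to [2.0, 0.0, 0.5]
```

Returning `z` rather than `x` keeps the exact zeros produced by the proximal step, which is where the sparsity induced by the $L^1$ term shows up.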
-
Joint Community Detection and Rotational Synchronization via Semidefinite Programming
Authors:
Yifeng Fan,
Yuehaw Khoo,
Zhizhen Zhao
Abstract:
In the presence of heterogeneous data, where randomly rotated objects fall into multiple underlying categories, it is challenging to simultaneously classify them into clusters and synchronize them based on pairwise relations. This gives rise to the joint problem of community detection and synchronization. We propose a series of semidefinite relaxations, and prove their exact recovery when extending the celebrated stochastic block model to this new setting where both rotations and cluster identities are to be determined. Numerical experiments demonstrate the efficacy of our proposed algorithms and confirm our theoretical result which indicates a sharp phase transition for exact recovery.
Submitted 14 September, 2023; v1 submitted 12 May, 2021;
originally announced May 2021.
-
BayesSUR: An R package for high-dimensional multivariate Bayesian variable and covariance selection in linear regression
Authors:
Zhi Zhao,
Marco Banterle,
Leonardo Bottolo,
Sylvia Richardson,
Alex Lewin,
Manuela Zucknick
Abstract:
In molecular biology, advances in high-throughput technologies have made it possible to study complex multivariate phenotypes and their simultaneous associations with high-dimensional genomic and other omics data, a problem that can be studied with high-dimensional multi-response regression, where the response variables are potentially highly correlated. To this end, we recently introduced several multivariate Bayesian variable and covariance selection models, e.g., Bayesian estimation methods for sparse seemingly unrelated regression for variable and covariance selection. Several variable selection priors have been implemented in this context, in particular the hotspot detection prior for latent variable inclusion indicators, which results in sparse variable selection for associations between predictors and multiple phenotypes. We also propose an alternative, which uses a Markov random field (MRF) prior for incorporating prior knowledge about the dependence structure of the inclusion indicators. Inference of Bayesian seemingly unrelated regression (SUR) by Markov chain Monte Carlo methods is made computationally feasible by factorisation of the covariance matrix amongst the response variables. In this paper we present BayesSUR, an R package, which allows the user to easily specify and run a range of different Bayesian SUR models, which have been implemented in C++ for computational efficiency. The R package allows the specification of the models in a modular way, where the user chooses the priors for variable selection and for covariance selection separately. We demonstrate the performance of sparse SUR models with the hotspot prior and spike-and-slab MRF prior on synthetic and real data sets representing eQTL or mQTL studies and in vitro anti-cancer drug screening studies as examples for typical applications.
Submitted 28 April, 2021;
originally announced April 2021.
-
BEAUTY Powered BEAST
Authors:
Kai Zhang,
Zhigen Zhao,
Wen Zhou
Abstract:
We study distribution-free goodness-of-fit tests with the proposed Binary Expansion Approximation of UniformiTY (BEAUTY) approach. This method generalizes the renowned Euler's formula, and approximates the characteristic function of any copula through a linear combination of expectations of binary interactions from marginal binary expansions. This novel theory enables a unification of many important tests of independence via approximations from specific quadratic forms of symmetry statistics, where the deterministic weight matrix characterizes the power properties of each test. To achieve a robust power, we examine test statistics with data-adaptive weights, referred to as the Binary Expansion Adaptive Symmetry Test (BEAST). Using properties of the binary expansion filtration, we demonstrate that the Neyman-Pearson test of uniformity can be approximated by an oracle weighted sum of symmetry statistics. The BEAST with this oracle provides a useful benchmark of feasible power. To approach this oracle power, we devise the BEAST through a regularized resampling approximation of the oracle test. The BEAST improves the empirical power of many existing tests against a wide spectrum of common alternatives and delivers a clear interpretation of dependency forms when significant.
Submitted 16 October, 2023; v1 submitted 28 February, 2021;
originally announced March 2021.
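The building blocks of BEAUTY and BEAST are binary expansions of (rank-transformed) observations and the symmetry statistics formed from interactions of their $\pm 1$-coded digits. A sketch for a bivariate sample, assuming the empirical-CDF transform via ranks (ties broken arbitrarily), a fixed expansion depth, and only the cross interactions between the two coordinates; the data-adaptive weighting and resampling that define BEAST are not attempted, and all names are hypothetical. Digit indices in the output are 0-based.

```python
import numpy as np
from itertools import combinations

def binary_digits(u, depth):
    """First `depth` binary digits of u in [0, 1), coded as -1/+1."""
    u = np.asarray(u, dtype=float)
    digits = [np.floor(2**k * u) % 2 for k in range(1, depth + 1)]
    return 2 * np.array(digits) - 1   # shape (depth, n)

def nonempty_subsets(depth):
    idx = range(depth)
    return [s for r in range(1, depth + 1) for s in combinations(idx, r)]

def symmetry_statistics(x, y, depth=2):
    """Cross-interaction symmetry statistics of the rank-transformed pair (x, y)."""
    n = len(x)
    # Rank (empirical-CDF) transform gives approximately uniform margins.
    u = (np.argsort(np.argsort(x)) + 0.5) / n
    v = (np.argsort(np.argsort(y)) + 0.5) / n
    a_dot, b_dot = binary_digits(u, depth), binary_digits(v, depth)
    stats = {}
    for sa in nonempty_subsets(depth):
        for sb in nonempty_subsets(depth):
            # Product of the selected +/-1 digits of x times those of y, summed over samples.
            prod = np.prod(a_dot[list(sa)], axis=0) * np.prod(b_dot[list(sb)], axis=0)
            stats[(sa, sb)] = int(prod.sum())
    return stats

# Perfect dependence (y = x): matching interactions reach n, mismatched ones cancel.
x = np.arange(16.0)
stats_same = symmetry_statistics(x, x, depth=2)
print(stats_same)
```

Under independence each statistic has mean zero, so large absolute values point to the specific dependency form (which digits interact), the kind of interpretation the abstract attributes to BEAST.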
-
Temporal Gaussian Process Regression in Logarithmic Time
Authors:
Adrien Corenflos,
Zheng Zhao,
Simo Särkkä
Abstract:
The aim of this article is to present a novel parallelization method for temporal Gaussian process (GP) regression problems. The method allows for solving GP regression problems in logarithmic O(log N) time, where N is the number of time steps. Our approach uses the state-space representation of GPs which in its original form allows for linear O(N) time GP regression by leveraging the Kalman filtering and smoothing methods. By using a recently proposed parallelization method for Bayesian filters and smoothers, we are able to reduce the linear computational complexity of the temporal GP regression problems into logarithmic span complexity. This ensures logarithmic time complexity when run on parallel hardware such as a graphics processing unit (GPU). We experimentally demonstrate the computational benefits on simulated and real datasets via our open-source implementation leveraging the GPflow framework.
Submitted 17 May, 2021; v1 submitted 19 February, 2021;
originally announced February 2021.
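The logarithmic span comes from rewriting a sequential recursion as an associative (prefix) scan. A minimal sketch for the scalar affine recursion $x_k = a_k x_{k-1} + b_k$, which has the shape of a Kalman-filter mean update once the gains are fixed, using a Hillis-Steele inclusive scan simulated on a CPU. The method described above scans over filtering and smoothing elements that also carry covariances; the names here are hypothetical.

```python
def combine(first, second):
    """Compose affine maps: apply `first`, then `second` (associative)."""
    a1, b1 = first
    a2, b2 = second
    return (a2 * a1, a2 * b1 + b2)

def parallel_prefix(elems):
    """Hillis-Steele inclusive scan; returns the prefixes and the number of rounds.

    All updates within one round are independent, so on parallel hardware each
    round has O(1) span and there are ceil(log2 N) rounds.
    """
    out = list(elems)
    n, offset, rounds = len(out), 1, 0
    while offset < n:
        prev = list(out)   # read only previous-round values (a synchronous step)
        for i in range(offset, n):
            out[i] = combine(prev[i - offset], prev[i])
        offset *= 2
        rounds += 1
    return out, rounds

def sequential(elems, x0=0.0):
    """Reference O(N) recursion x_k = a_k * x_{k-1} + b_k."""
    xs, x = [], x0
    for a, b in elems:
        x = a * x + b
        xs.append(x)
    return xs

# Deterministic test sequence with |a_k| <= 0.8 so the recursion stays bounded.
elems = [(((k * 37) % 17) / 20.0, ((k * 11) % 7) - 3.0) for k in range(1000)]
prefixes, rounds = parallel_prefix(elems)
# Applying prefix (A, B) to x0 = 0 gives B, which should match the sequential x_k.
print(rounds, prefixes[-1][1], sequential(elems)[-1])
```

The Hillis-Steele variant does $O(N \log N)$ total work; a work-efficient Blelloch scan keeps the $O(\log N)$ span with $O(N)$ work.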
-
Multivariate Bayesian structured variable selection for pharmacogenomic studies
Authors:
Zhi Zhao,
Marco Banterle,
Alex Lewin,
Manuela Zucknick
Abstract:
Precision cancer medicine aims to determine the optimal treatment for each patient. In-vitro cancer drug sensitivity screens combined with multi-omics characterization of the cancer cells have become an important tool to achieve this aim. Analyzing such pharmacogenomic studies requires flexible and efficient joint statistical models for associating drug sensitivity with high-dimensional multi-omics data. We propose a multivariate Bayesian structured variable selection model for sparse identification of omics features associated with multiple correlated drug responses. Since many anti-cancer drugs are designed for specific molecular targets, our approach makes use of known structure between responses and predictors, e.g. molecular pathways and related omics features targeted by specific drugs, via a Markov random field (MRF) prior for the latent indicator variables of the coefficients in sparse seemingly unrelated regression. The structure information included in the MRF prior can improve the model performance, i.e. variable selection and response prediction, compared to other common priors. In addition, we employ random effects to capture heterogeneity between cancer types in a pan-cancer setting. The proposed approach is validated by simulation studies and applied to the Genomics of Drug Sensitivity in Cancer data, which includes pharmacological profiling and multi-omics characterization of a large set of heterogeneous cell lines.
Submitted 13 February, 2023; v1 submitted 14 January, 2021;
originally announced January 2021.
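The MRF prior couples the binary inclusion indicators so that predictors linked by known structure tend to be selected together. A sketch assuming the Ising-type form $p(\gamma) \propto \exp(d\,\mathbf{1}^\top\gamma + e\,\gamma^\top G\gamma)$ with a symmetric, zero-diagonal structure matrix $G$; it runs Gibbs sweeps from the prior only, whereas a full sampler would also weight each update by the likelihood of the sparse SUR model. Function names and hyperparameter values are hypothetical.

```python
import math
import random

def mrf_conditional_logodds(gamma, j, G, d, e):
    """Log-odds of gamma_j = 1 given the other indicators.

    Assumes p(gamma) proportional to exp(d * sum(gamma) + e * gamma' G gamma)
    with G symmetric and zero on the diagonal, so the j-th contribution is
    d + 2 * e * sum_{k != j} G[j][k] * gamma[k].
    """
    neighbours = sum(G[j][k] * gamma[k] for k in range(len(gamma)) if k != j)
    return d + 2.0 * e * neighbours

def gibbs_sweep(gamma, G, d, e, rng):
    """One systematic-scan Gibbs sweep over all indicators (prior only)."""
    for j in range(len(gamma)):
        p = 1.0 / (1.0 + math.exp(-mrf_conditional_logodds(gamma, j, G, d, e)))
        gamma[j] = 1 if rng.random() < p else 0
    return gamma

# Hypothetical structure: indicator 1 is linked to indicators 0 and 2.
G = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
rng = random.Random(0)
gamma = [0, 0, 0]
for _ in range(100):
    gibbs_sweep(gamma, G, d=-1.0, e=0.5, rng=rng)
print(gamma)
```

A negative `d` favours sparsity on its own, while a positive `e` raises the inclusion odds of an indicator whose structural neighbours are already included.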
-
Individual Mobility Prediction: An Interpretable Activity-based Hidden Markov Approach
Authors:
Baichuan Mo,
Zhan Zhao,
Haris N. Koutsopoulos,
**hua Zhao
Abstract:
Individual mobility is driven by demand for activities with diverse spatiotemporal patterns, but existing methods for mobility prediction often overlook the underlying activity patterns. To address this issue, this study develops an activity-based modeling framework for individual mobility prediction. Specifically, an input-output hidden Markov model (IOHMM) framework is proposed to simultaneously predict the (continuous) time and (discrete) location of an individual's next trip using transit smart card data. The prediction task can be transformed into predicting the hidden activity duration and end location. Based on a case study of Hong Kong's metro system, we show that the proposed model can achieve prediction performance comparable to that of the state-of-the-art long short-term memory (LSTM) model. Unlike LSTM, the proposed IOHMM model can also be used to analyze hidden activity patterns, which provides meaningful behavioral interpretation for why an individual makes a certain trip. Therefore, the activity-based prediction framework offers a way to preserve the predictive power of advanced machine learning methods while enhancing our ability to generate insightful behavioral explanations, which is useful for enhancing situational awareness in user-centric transportation applications such as personalized traveler information.
Submitted 11 January, 2021;
originally announced January 2021.
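In an input-output HMM, the transition probabilities between hidden activities depend on observed inputs (for instance, time of day). A minimal sketch of the normalized forward (filtering) recursion, assuming discrete hidden states, discrete observed locations, and a user-supplied function that maps each input to a transition matrix. The model described above also predicts continuous trip times and activity durations, which this sketch omits; the toy numbers are hypothetical.

```python
import numpy as np

def iohmm_filter(obs, inputs, init, trans_fn, emit):
    """Normalized forward recursion for an input-output HMM.

    obs[t]: index of the observed symbol at step t
    inputs[t]: covariate at step t
    init: initial state distribution, shape (K,)
    trans_fn(u): K x K row-stochastic transition matrix given input u
    emit: K x M matrix with emit[k, m] = P(obs = m | state k)
    Returns filtered state probabilities (T x K) and the log-likelihood.
    """
    alpha = init * emit[:, obs[0]]
    loglik = np.log(alpha.sum())
    alpha = alpha / alpha.sum()
    filtered = [alpha]
    for t in range(1, len(obs)):
        # Predict with the input-dependent transition, then weight by the emission.
        alpha = (alpha @ trans_fn(inputs[t])) * emit[:, obs[t]]
        norm = alpha.sum()            # P(obs[t] | obs[:t], inputs)
        loglik += np.log(norm)
        alpha = alpha / norm
        filtered.append(alpha)
    return np.array(filtered), loglik

def toy_trans(u):
    """Hypothetical: activities are 'stickier' when the binary input u is 0."""
    stay = 0.9 if u == 0 else 0.3
    return np.array([[stay, 1 - stay], [1 - stay, stay]])

init = np.array([0.6, 0.4])
emit = np.array([[0.7, 0.3], [0.2, 0.8]])
obs, inputs = [0, 1, 1], [0, 0, 1]
filtered, loglik = iohmm_filter(obs, inputs, init, toy_trans, emit)
print(filtered)
print(np.exp(loglik))
```

Normalizing at every step keeps the recursion numerically stable for long sequences, and the log of each normalizer accumulates into the exact log-likelihood.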
-
Functional Linear Regression with Mixed Predictors
Authors:
Daren Wang,
Zifeng Zhao,
Yi Yu,
Rebecca Willett
Abstract:
We study a functional linear regression model that deals with functional responses and allows for both functional covariates and high-dimensional vector covariates. The proposed model is flexible and nests several functional regression models in the literature as special cases. Based on the theory of reproducing kernel Hilbert spaces (RKHS), we propose a penalized least squares estimator that can accommodate functional variables observed on discrete sample points. Besides a conventional smoothness penalty, a group Lasso-type penalty is further imposed to induce sparsity in the high-dimensional vector predictors. We derive finite sample theoretical guarantees and show that the excess prediction risk of our estimator is minimax optimal. Furthermore, our analysis reveals an interesting phase transition phenomenon that the optimal excess risk is determined jointly by the smoothness and the sparsity of the functional regression coefficients. A novel efficient optimization algorithm based on iterative coordinate descent is devised to handle the smoothness and group penalties simultaneously. Simulation studies and real data applications illustrate the promising performance of the proposed approach compared to the state-of-the-art methods in the literature.
Submitted 23 August, 2022; v1 submitted 1 December, 2020;
originally announced December 2020.
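The group Lasso-type penalty on the vector predictors acts through block soft-thresholding: whole groups of coefficients are shrunk together and can be set exactly to zero. A sketch of that proximal step, assuming a penalty of the form $\kappa\sum_g\|\beta_g\|_2$ and user-specified index groups. It is one building block of a coordinate-descent or proximal-gradient scheme, not the authors' full algorithm, and the example values are hypothetical.

```python
import numpy as np

def group_soft_threshold(beta, groups, kappa):
    """Proximal operator of kappa * sum_g ||beta_g||_2 (block soft-thresholding).

    Each group is shrunk toward zero by kappa in Euclidean norm and set exactly
    to zero when its norm is at most kappa, which yields group-level sparsity.
    """
    out = np.array(beta, dtype=float)
    for g in groups:
        norm = np.linalg.norm(out[g])
        out[g] = 0.0 if norm <= kappa else (1.0 - kappa / norm) * out[g]
    return out

# Group [0, 1] has norm 5 and is scaled by 0.8; group [2, 3] is zeroed out.
beta_new = group_soft_threshold([3.0, 4.0, 0.1, 0.2], [[0, 1], [2, 3]], kappa=1.0)
print(beta_new)
```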
-
Functional Autoregressive Processes in Reproducing Kernel Hilbert Spaces
Authors:
Daren Wang,
Zifeng Zhao,
Rebecca Willett,
Chun Yip Yau
Abstract:
We study the estimation and prediction of functional autoregressive (FAR) processes, a statistical tool for modeling functional time series data. Due to the infinite-dimensional nature of FAR processes, the existing literature addresses their inference via dimension reduction, and theoretical results therein require the (unrealistic) assumption of fully observed functional time series. We propose an alternative inference framework based on Reproducing Kernel Hilbert Spaces (RKHS). Specifically, a nuclear norm regularization method is proposed for estimating the transition operators of the FAR process directly from discrete samples of the functional time series. We derive a representer theorem for the FAR process, which enables infinite-dimensional inference without dimension reduction. Sharp theoretical guarantees are established under the (more realistic) assumption that we only have finite discrete samples of the FAR process. Extensive numerical experiments and a real data application of energy consumption prediction are further conducted to illustrate the promising performance of the proposed approach compared to the state-of-the-art methods in the literature.
Submitted 27 November, 2020;
originally announced November 2020.
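Nuclear norm regularization is typically handled through its proximal operator, singular value thresholding, which soft-thresholds the singular values and thereby lowers the rank of the estimate. A sketch, assuming the transition operator has already been reduced to a finite coefficient matrix (as a representer theorem allows) and that a proximal-gradient method is used; this is not necessarily the paper's solver, and the function name is hypothetical.

```python
import numpy as np

def singular_value_threshold(M, kappa):
    """Proximal operator of kappa * ||.||_*: soft-threshold the singular values."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    s_shrunk = np.maximum(s - kappa, 0.0)
    # U * s_shrunk scales column j of U by s_shrunk[j], i.e. U @ diag(s_shrunk).
    return (U * s_shrunk) @ Vt

# Singular values (3, 2, 0.5) become (2, 1, 0): the rank drops from 3 to 2.
M = np.diag([3.0, 2.0, 0.5])
M_low = singular_value_threshold(M, 1.0)
print(M_low)
print(np.linalg.matrix_rank(M_low))
```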
-
Optimal Provable Robustness of Quantum Classification via Quantum Hypothesis Testing
Authors:
Maurice Weber,
Nana Liu,
Bo Li,
Ce Zhang,
Zhikuan Zhao
Abstract:
Quantum machine learning models have the potential to offer speedups and better predictive accuracy compared to their classical counterparts. However, these quantum algorithms, like their classical counterparts, have been shown to also be vulnerable to input perturbations, in particular for classification problems. These can arise either from noisy implementations or, as a worst-case type of noise, from adversarial attacks. In order to develop defence mechanisms and to better understand the reliability of these algorithms, it is crucial to understand their robustness properties in the presence of natural noise sources or adversarial manipulation. From the observation that measurements involved in quantum classification algorithms are naturally probabilistic, we uncover and formalize a fundamental link between binary quantum hypothesis testing and provably robust quantum classification. This link leads to a tight robustness condition which puts constraints on the amount of noise a classifier can tolerate, independent of whether the noise source is natural or adversarial. Based on this result, we develop practical protocols to optimally certify robustness. Finally, since this is a robustness condition against worst-case types of noise, our result naturally extends to scenarios where the noise source is known. Thus, we also provide a framework to study the reliability of quantum classification protocols beyond the adversarial, worst-case noise scenarios.
Submitted 26 May, 2021; v1 submitted 21 September, 2020;
originally announced September 2020.
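As background on the binary quantum hypothesis testing that the robustness condition builds on: the optimal (Helstrom) probability of correctly distinguishing two states $\rho$ and $\sigma$ with prior probabilities $p$ and $1-p$ is $\tfrac12\left(1 + \|p\rho - (1-p)\sigma\|_1\right)$, where $\|\cdot\|_1$ is the trace norm. The sketch computes this for small density matrices; it is not the paper's robustness certificate, and the helper names are hypothetical.

```python
import numpy as np

def helstrom_success(rho, sigma, prior=0.5):
    """Optimal probability of correctly distinguishing rho from sigma.

    For Hermitian input the trace norm is the sum of absolute eigenvalues.
    """
    diff = prior * rho - (1.0 - prior) * sigma
    trace_norm = np.sum(np.abs(np.linalg.eigvalsh(diff)))
    return 0.5 * (1.0 + trace_norm)

def pure_state(psi):
    """Density matrix |psi><psi| of a normalized state vector."""
    psi = np.asarray(psi, dtype=complex)
    psi = psi / np.linalg.norm(psi)
    return np.outer(psi, psi.conj())

ket0, ket1, plus = pure_state([1, 0]), pure_state([0, 1]), pure_state([1, 1])
print(helstrom_success(ket0, ket1))   # orthogonal states: perfectly distinguishable
print(helstrom_success(ket0, ket0))   # identical states: no better than guessing
print(helstrom_success(ket0, plus))   # overlapping pure states: in between
```

For pure states with equal priors this reduces to $\tfrac12\left(1 + \sqrt{1 - |\langle\psi|\phi\rangle|^2}\right)$, so the achievable accuracy is governed by the state overlap, which is the kind of quantity a robustness condition can bound.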
-
Review of Machine-Learning Methods for RNA Secondary Structure Prediction
Authors:
Qi Zhao,
Zheng Zhao,
Xiaoya Fan,
Zhengwei Yuan,
Qian Mao,
Yudong Yao
Abstract:
Secondary structure plays an important role in determining the function of non-coding RNAs. Hence, identifying RNA secondary structures is of great value to research. Computational prediction is a mainstream approach for predicting RNA secondary structure. Unfortunately, even though new methods have been proposed over the past 40 years, the performance of computational prediction methods has stagnated in the last decade. Recently, with the increasing availability of RNA structure data, new methods based on machine-learning technologies, especially deep learning, have alleviated the issue. In this review, we provide a comprehensive overview of RNA secondary structure prediction methods based on machine-learning technologies and a tabularized summary of the most important methods in this field. The current pending issues in the field of RNA secondary structure prediction and future trends are also discussed.
Submitted 31 August, 2020;
originally announced September 2020.
-
Small Towers Make Big Differences
Authors:
Yuyan Wang,
Zhe Zhao,
Bo Dai,
Christopher Fifty,
Dong Lin,
Lichan Hong,
Ed H. Chi
Abstract:
Multi-task learning aims at solving multiple machine learning tasks at the same time. A good solution to a multi-task learning problem should be generalizable in addition to being Pareto optimal. In this paper, we provide some insights on understanding the trade-off between Pareto efficiency and generalization as a result of parameterization in multi-task deep learning models. As a multi-objective optimization problem, enough parameterization is needed for handling task conflicts in a constrained solution space; however, from a multi-task generalization perspective, over-parameterization undermines the benefit of learning a shared representation which helps harder tasks or tasks with limited training examples. A delicate balance between multi-task generalization and multi-objective optimization is therefore needed for finding a better trade-off between efficiency and generalization. To this end, we propose a method of under-parameterized self-auxiliaries for multi-task models to achieve the best of both worlds. It is task-agnostic and works with other multi-task learning algorithms. Empirical results show that small towers of under-parameterized self-auxiliaries can make big differences in improving Pareto efficiency in various multi-task applications.
Submitted 13 August, 2020;
originally announced August 2020.