Search | arXiv e-print repository

Detecting and Identifying Selection Structure in Sequential Data

Authors: Yujia Zheng, Zeyu Tang, Yiwen Qiu, Bernhard Schölkopf, Kun Zhang

Abstract: We argue that the selective inclusion of data points based on latent objectives is common in practical situations, such as music sequences. Since this selection process often distorts statistical analysis, previous work primarily views it as a bias to be corrected and proposes various methods to mitigate its effect. However, while controlling this bias is crucial, selection also offers an opportunity to provide a deeper insight into the hidden generation process, as it is a fundamental mechanism underlying what we observe. In particular, overlooking selection in sequential data can lead to an incomplete or overcomplicated inductive bias in modeling, such as assuming a universal autoregressive structure for all dependencies. Therefore, rather than merely viewing it as a bias, we explore the causal structure of selection in sequential data to delve deeper into the complete causal process. Specifically, we show that selection structure is identifiable without any parametric assumptions or interventional experiments. Moreover, even in cases where selection variables coexist with latent confounders, we still establish the nonparametric identifiability under appropriate structural conditions. Meanwhile, we also propose a provably correct algorithm to detect and identify selection structures as well as other types of dependencies. The framework has been validated empirically on both synthetic data and real-world music.

Submitted 29 June, 2024; originally announced July 2024.

Comments: ICML 2024

arXiv:2406.02611 [pdf, other]

LOLA: LLM-Assisted Online Learning Algorithm for Content Experiments

Authors: Zikun Ye, Hema Yoganarasimhan, Yufeng Zheng

Abstract: In the rapidly evolving digital content landscape, media firms and news publishers require automated and efficient methods to enhance user engagement. This paper introduces the LLM-Assisted Online Learning Algorithm (LOLA), a novel framework that integrates Large Language Models (LLMs) with adaptive experimentation to optimize content delivery. Leveraging a large-scale dataset from Upworthy, which includes 17,681 headline A/B tests aimed at evaluating the performance of various headlines associated with the same article content, we first investigate three broad pure-LLM approaches: prompt-based methods, embedding-based classification models, and fine-tuned open-source LLMs. Our findings indicate that prompt-based approaches perform poorly, achieving no more than 65% accuracy in identifying the catchier headline among two options. In contrast, OpenAI-embedding-based classification models and fine-tuned Llama-3-8b models achieve comparable accuracy, around 82-84%, though still falling short of the performance of experimentation with sufficient traffic. We then introduce LOLA, which combines the best pure-LLM approach with the Upper Confidence Bound algorithm to adaptively allocate traffic and maximize clicks. Our numerical experiments on Upworthy data show that LOLA outperforms the standard A/B testing method (the current status quo at Upworthy), pure bandit algorithms, and pure-LLM approaches, particularly in scenarios with limited experimental traffic or numerous arms. Our approach is both scalable and broadly applicable to content experiments across a variety of digital settings where firms seek to optimize user engagement, including digital advertising and social media recommendations.

Submitted 3 June, 2024; originally announced June 2024.
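
Illustrative note on the entry above: the abstract says LOLA pairs an LLM-based predictor with the Upper Confidence Bound algorithm, but it does not say how the two are combined. The sketch below shows one common way to do this, offered only as an assumption: each headline's LLM-predicted click-through rate is treated as a block of pseudo-impressions that seeds a UCB1-style rule. The function name, the `prior_weight` parameter, and the simulated click rates are all hypothetical and are not taken from the paper.

```python
import math
import random


def ucb_with_prior(prior_ctr, true_ctr, horizon, prior_weight=50, seed=0):
    """UCB1-style traffic allocation seeded with predicted click rates.

    prior_ctr:    hypothetical LLM-predicted click-through rate per headline
    true_ctr:     simulated ground-truth rates (hidden from the algorithm)
    prior_weight: pseudo-impressions each prediction is worth (an assumption)
    Returns the number of real impressions given to each headline.
    """
    rng = random.Random(seed)
    n_arms = len(prior_ctr)
    # Treat each prediction as if it had already been observed prior_weight times.
    pulls = [float(prior_weight)] * n_arms
    clicks = [p * prior_weight for p in prior_ctr]
    shown = [0] * n_arms
    for _ in range(horizon):
        total = sum(pulls)
        # Empirical mean plus an exploration bonus that shrinks with more data.
        scores = [
            clicks[a] / pulls[a] + math.sqrt(2.0 * math.log(total) / pulls[a])
            for a in range(n_arms)
        ]
        arm = max(range(n_arms), key=scores.__getitem__)
        reward = 1.0 if rng.random() < true_ctr[arm] else 0.0
        pulls[arm] += 1.0
        clicks[arm] += reward
        shown[arm] += 1
    return shown


if __name__ == "__main__":
    allocation = ucb_with_prior(
        prior_ctr=[0.03, 0.05, 0.15],
        true_ctr=[0.02, 0.05, 0.20],
        horizon=2000,
    )
    print(allocation)
```

A larger `prior_weight` makes the allocation trust the predictions longer before real clicks dominate; a value near zero approaches a plain bandit.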

arXiv:2405.11720 [pdf, other]

Estimating optimal tailored active surveillance strategy under interval censoring

Authors: Muxuan Liang, Yingqi Zhao, Daniel W. Lin, Matthew Cooperberg, Yingye Zheng

Abstract: Active surveillance (AS) using repeated biopsies to monitor disease progression has been a popular alternative to immediate surgical intervention in cancer care. However, a biopsy procedure is invasive and sometimes leads to severe side effects of infection and bleeding. To reduce the burden of repeated surveillance biopsies, biomarker-assistant decision rules are sought to replace the fix-for-all regimen with tailored biopsy intensity for individual patients. Constructing or evaluating such decision rules is challenging. The key AS outcome is often ascertained subject to interval censoring. Furthermore, patients will discontinue their participation in the AS study once they receive a positive surveillance biopsy. Thus, patient dropout is affected by the outcomes of these biopsies. In this work, we propose a nonparametric kernel-based method to estimate the true positive rates (TPRs) and true negative rates (TNRs) of a tailored AS strategy, accounting for interval censoring and immediate dropouts. Based on these estimates, we develop a weighted classification framework to estimate the optimal tailored AS strategy and further incorporate the cost-benefit ratio for cost-effectiveness in medical decision-making. Theoretically, we provide a uniform generalization error bound of the derived AS strategy accommodating all possible trade-offs between TPRs and TNRs. Simulation and application to a prostate cancer surveillance study show the superiority of the proposed method.

Submitted 19 May, 2024; originally announced May 2024.

Comments: 14 pages, 4 figures, 2 tables

arXiv:2405.00626 [pdf, other]

SARMA: Scalable Low-Rank High-Dimensional Autoregressive Moving Averages via Tensor Decomposition

Authors: Feiqing Huang, Kexin Lu, Yao Zheng

Abstract: Existing models for high-dimensional time series are overwhelmingly developed within the finite-order vector autoregressive (VAR) framework, whereas the more flexible vector autoregressive moving averages (VARMA) have been much less considered. This paper introduces a high-dimensional model for capturing VARMA dynamics, namely the Scalable ARMA (SARMA) model, by combining novel reparameterization and tensor decomposition techniques. To ensure identifiability and computational tractability, we first consider a reparameterization of the VARMA model and discover that this interestingly amounts to a Tucker-low-rank structure for the AR coefficient tensor along the temporal dimension. Motivated by this finding, we further consider Tucker decomposition across the response and predictor dimensions of the AR coefficient tensor, enabling factor extraction across variables and time lags. Additionally, we consider sparsity assumptions on the factor loadings to accomplish automatic variable selection and greater estimation efficiency. For the proposed model, we develop both rank-constrained and sparsity-inducing estimators. Algorithms and model selection methods are also provided. Simulation studies and empirical examples confirm the validity of our theory and advantages of our approaches over existing competitors.

Submitted 1 May, 2024; originally announced May 2024.

arXiv:2403.18540 [pdf, other]

skscope: Fast Sparsity-Constrained Optimization in Python

Authors: Zezhi Wang, ** Zhu, Peng Chen, Huiyang Peng, Xiaoke Zhang, Anran Wang, Yu Zheng, Junxian Zhu, Xueqin Wang

Abstract: Applying iterative solvers on sparsity-constrained optimization (SCO) requires tedious mathematical deduction and careful programming/debugging that hinders these solvers' broad impact. In the paper, the library skscope is introduced to overcome such an obstacle. With skscope, users can solve the SCO by just programming the objective function. The convenience of skscope is demonstrated through two examples in the paper, where sparse linear regression and trend filtering are addressed with just four lines of code. More importantly, skscope's efficient implementation allows state-of-the-art solvers to quickly attain the sparse solution regardless of the high dimensionality of parameter space. Numerical experiments reveal the available solvers in skscope can achieve up to 80x speedup on the competing relaxation solutions obtained via the benchmarked convex solver. skscope is published on the Python Package Index (PyPI) and Conda, and its source code is available at: https://github.com/abess-team/skscope.

Submitted 27 March, 2024; originally announced March 2024.

Comments: 4 pages

arXiv:2402.05052 [pdf, other]

Causal Representation Learning from Multiple Distributions: A General Setting

Authors: Kun Zhang, Shaoan Xie, Ignavier Ng, Yujia Zheng

Abstract: In many problems, the measured variables (e.g., image pixels) are just mathematical functions of the hidden causal variables (e.g., the underlying concepts or objects). For the purpose of making predictions in changing environments or making proper changes to the system, it is helpful to recover the hidden causal variables $Z_i$ and their causal relations represented by graph $\mathcal{G}_Z$. This problem has recently been known as causal representation learning. This paper is concerned with a general, completely nonparametric setting of causal representation learning from multiple distributions (arising from heterogeneous data or nonstationary time series), without assuming hard interventions behind distribution changes. We aim to develop general solutions in this fundamental case; as a by-product, this helps see the unique benefit offered by other assumptions such as parametric causal models or hard interventions. We show that under the sparsity constraint on the recovered graph over the latent variables and suitable sufficient change conditions on the causal influences, interestingly, one can recover the moralized graph of the underlying directed acyclic graph, and the recovered latent variables and their relations are related to the underlying causal model in a specific, nontrivial way. In some cases, each latent variable can even be recovered up to component-wise transformations. Experimental results verify our theoretical claims.

Submitted 9 April, 2024; v1 submitted 7 February, 2024; originally announced February 2024.

arXiv:2312.11001 [pdf, other]

A Versatile Causal Discovery Framework to Allow Causally-Related Hidden Variables

Authors: Xinshuai Dong, Biwei Huang, Ignavier Ng, Xiangchen Song, Yujia Zheng, Songyao **, Roberto Legaspi, Peter Spirtes, Kun Zhang

Abstract: Most existing causal discovery methods rely on the assumption of no latent confounders, limiting their applicability in solving real-life problems. In this paper, we introduce a novel, versatile framework for causal discovery that accommodates the presence of causally-related hidden variables almost everywhere in the causal network (for instance, they can be effects of observed variables), based on rank information of covariance matrix over observed variables. We start by investigating the efficacy of rank in comparison to conditional independence and, theoretically, establish necessary and sufficient conditions for the identifiability of certain latent structural patterns. Furthermore, we develop a Rank-based Latent Causal Discovery algorithm, RLCD, that can efficiently locate hidden variables, determine their cardinalities, and discover the entire causal structure over both measured and hidden ones. We also show that, under certain graphical conditions, RLCD correctly identifies the Markov Equivalence Class of the whole latent causal graph asymptotically. Experimental results on both synthetic and real-world personality data sets demonstrate the efficacy of the proposed approach in finite-sample cases.

Submitted 18 December, 2023; originally announced December 2023.

arXiv:2312.09758 [pdf, other]

Diagnosing and Rectifying Fake OOD Invariance: A Restructured Causal Approach

Authors: Ziliang Chen, Yongsen Zheng, Zhao-Rong Lai, Quanlong Guan, Liang Lin

Abstract: Invariant representation learning (IRL) encourages the prediction from invariant causal features to labels de-confounded from the environments, advancing the technical roadmap of out-of-distribution (OOD) generalization. Despite spotlights around, recent theoretical results verified that some causal features recovered by IRLs merely pretend domain-invariantly in the training environments but fail in unseen domains. The \emph{fake invariance} severely endangers OOD generalization since the trustful objective cannot be diagnosed and existing causal surgeries are invalid to rectify. In this paper, we review an IRL family (InvRat) under the Partially and Fully Informative Invariant Feature Structural Causal Models (PIIF SCM / FIIF SCM) respectively, to certify their weaknesses in representing fake invariant features, then unify their causal diagrams to propose ReStructured SCM (RS-SCM). RS-SCM can ideally rebuild the spurious and the fake invariant features simultaneously. Given this, we further develop an approach based on conditional mutual information with respect to RS-SCM, then rigorously rectify the spurious and fake invariant effects. It can be easily implemented by a small feature selection subnet introduced in the IRL family, which is alternatively optimized to achieve our goal. Experiments verified the superiority of our approach to fight against the fake invariant issue across a variety of OOD generalization benchmarks.

Submitted 15 December, 2023; originally announced December 2023.

Comments: AAAI-2024

arXiv:2312.08670 [pdf, other]

Temporal-Spatial Entropy Balancing for Causal Continuous Treatment-Effect Estimation

Authors: Tao Hu, Honglong Zhang, Fan Zeng, Min Du, XiangKun Du, Yue Zheng, Quanqi Li, Mengran Zhang, Dan Yang, Jihao Wu

Abstract: In the field of intracity freight transportation, changes in order volume are significantly influenced by temporal and spatial factors. When building subsidy and pricing strategies, predicting the causal effects of these strategies on order volume is crucial. In the process of calculating causal effects, confounding variables can have an impact. Traditional methods to control confounding variables handle data from a holistic perspective, which cannot ensure the precision of causal effects in specific temporal and spatial dimensions. However, temporal and spatial dimensions are extremely critical in the logistics field, and this limitation may directly affect the precision of subsidy and pricing strategies. To address these issues, this study proposes a technique based on flexible temporal-spatial grid partitioning. Furthermore, based on the flexible grid partitioning technique, we further propose a continuous entropy balancing method in the temporal-spatial domain, which is named TS-EBCT (Temporal-Spatial Entropy Balancing for Causal Continue Treatments). The method proposed in this paper has been tested on two simulation datasets and two real datasets, all of which have achieved excellent performance. In fact, after applying the TS-EBCT method to the intracity freight transportation field, the prediction accuracy of the causal effect has been significantly improved. It brings good business benefits to the company's subsidy and pricing strategies.

Submitted 18 December, 2023; v1 submitted 14 December, 2023; originally announced December 2023.

Comments: 10 pages

arXiv:2311.00866 [pdf, other]

Generalizing Nonlinear ICA Beyond Structural Sparsity

Authors: Yujia Zheng, Kun Zhang

Abstract: Nonlinear independent component analysis (ICA) aims to uncover the true latent sources from their observable nonlinear mixtures. Despite its significance, the identifiability of nonlinear ICA is known to be impossible without additional assumptions. Recent advances have proposed conditions on the connective structure from sources to observed variables, known as Structural Sparsity, to achieve identifiability in an unsupervised manner. However, the sparsity constraint may not hold universally for all sources in practice. Furthermore, the assumptions of bijectivity of the mixing process and independence among all sources, which arise from the setting of ICA, may also be violated in many real-world scenarios. To address these limitations and generalize nonlinear ICA, we propose a set of new identifiability results in the general settings of undercompleteness, partial sparsity and source dependence, and flexible grouping structures. Specifically, we prove identifiability when there are more observed variables than sources (undercomplete), and when certain sparsity and/or source independence assumptions are not met for some changing sources. Moreover, we show that even in cases with flexible grouping structures (e.g., part of the sources can be divided into irreducible independent groups with various sizes), appropriate identifiability results can also be established. Theoretical claims are supported empirically on both synthetic and real-world datasets.

Submitted 1 November, 2023; originally announced November 2023.

arXiv:2309.09371 [pdf, other]

Gibbs Sampling using Anti-correlation Gaussian Data Augmentation, with Applications to L1-ball-type Models

Authors: Yu Zheng, Leo L. Duan

Abstract: L1-ball-type priors are a recent generalization of the spike-and-slab priors. By transforming a continuous precursor distribution to the L1-ball boundary, it induces exact zeros with positive prior and posterior probabilities. With great flexibility in choosing the precursor and threshold distributions, we can easily specify models under structured sparsity, such as those with dependent probability for zeros and smoothness among the non-zeros. Motivated to significantly accelerate the posterior computation, we propose a new data augmentation that leads to a fast block Gibbs sampling algorithm. The latent variable, named "anti-correlation Gaussian", cancels out the quadratic exponent term in the latent Gaussian distribution, making the parameters of interest conditionally independent so that they can be updated in a block. Compared to existing algorithms such as the No-U-Turn sampler, the new blocked Gibbs sampler has a very low computing cost per iteration and shows rapid mixing of Markov chains. We establish the geometric ergodicity guarantee of the algorithm in linear models. Further, we show useful extensions of our algorithm for posterior estimation of general latent Gaussian models, such as those involving multivariate truncated Gaussian or latent Gaussian process.

Submitted 5 April, 2024; v1 submitted 17 September, 2023; originally announced September 2023.
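
Illustrative note on the entry above: the abstract states that mapping a continuous precursor onto the L1-ball boundary produces exact zeros, without spelling out the map. The sketch below assumes the map is the standard Euclidean projection onto an L1 ball, computed with the well-known sort-based algorithm; that assumption, the function name, and the `radius` parameter are hypothetical and not confirmed by the listing. It is meant only to show how such a projection soft-thresholds small coordinates to exactly zero.

```python
import math


def project_l1_ball(beta, radius):
    """Euclidean projection of beta onto {x : sum(|x_i|) <= radius}, radius > 0.

    Points already inside the ball are returned unchanged. Otherwise every
    coordinate is shrunk toward zero by a common threshold, and coordinates
    smaller than the threshold become exactly zero.
    """
    if sum(abs(b) for b in beta) <= radius:
        return list(beta)
    mags = sorted((abs(b) for b in beta), reverse=True)
    cumulative = 0.0
    threshold = 0.0
    for k, m in enumerate(mags, start=1):
        cumulative += m
        candidate = (cumulative - radius) / k
        # Keep the candidate from the last prefix whose smallest entry survives.
        if m > candidate:
            threshold = candidate
    return [math.copysign(max(abs(b) - threshold, 0.0), b) for b in beta]


if __name__ == "__main__":
    # A hypothetical precursor draw; the two small coordinates are zeroed out.
    print(project_l1_ball([3.0, -1.0, 0.5], radius=2.0))
```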

arXiv:2309.07332 [pdf, other]

Reliability-based cleaning of noisy training labels with inductive conformal prediction in multi-modal biomedical data mining

Authors: Xianghao Zhan, Qinmei Xu, Yuanning Zheng, Guangming Lu, Olivier Gevaert

Abstract: Accurately labeling biomedical data presents a challenge. Traditional semi-supervised learning methods often under-utilize available unlabeled data. To address this, we propose a novel reliability-based training data cleaning method employing inductive conformal prediction (ICP). This method capitalizes on a small set of accurately labeled training data and leverages ICP-calculated reliability metrics to rectify mislabeled data and outliers within vast quantities of noisy training data. The efficacy of the method is validated across three classification tasks within distinct modalities: filtering drug-induced-liver-injury (DILI) literature with title and abstract, predicting ICU admission of COVID-19 patients through CT radiomics and electronic health records, and subtyping breast cancer using RNA-sequencing data. Varying levels of noise to the training labels were introduced through label permutation. Results show significant enhancements in classification performance: accuracy enhancement in 86 out of 96 DILI experiments (up to 11.4%), AUROC and AUPRC enhancements in all 48 COVID-19 experiments (up to 23.8% and 69.8%), and accuracy and macro-average F1 score improvements in 47 out of 48 RNA-sequencing experiments (up to 74.6% and 89.0%). Our method offers the potential to substantially boost classification performance in multi-modal biomedical machine learning tasks. Importantly, it accomplishes this without necessitating an excessive volume of meticulously curated training data.

Submitted 13 September, 2023; originally announced September 2023.
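
Illustrative note on the entry above: the abstract describes using ICP-derived reliability metrics, calibrated on a small clean set, to correct noisy labels. The sketch below shows the standard conformal p-value and then a relabeling rule built on it. The nonconformity score (one minus the predicted probability), the two thresholds, and the function names are assumptions made for illustration; the paper's actual metrics and decision rule are not given in this listing.

```python
def icp_p_value(cal_scores, score):
    """Conformal p-value of a nonconformity score against a calibration set.

    cal_scores: nonconformity scores of the clean, held-out calibration examples
    score:      nonconformity score of the (example, candidate label) pair
    """
    at_least_as_strange = sum(1 for s in cal_scores if s >= score)
    return (at_least_as_strange + 1) / (len(cal_scores) + 1)


def clean_label(probs, noisy_label, cal_scores, low=0.1, high=0.8):
    """Hypothetical cleaning rule: relabel only when the evidence is strong.

    probs: dict mapping each candidate label to the model's predicted
           probability for this example; nonconformity is taken as 1 - prob.
    The label is changed only if the given label looks implausible
    (p-value below `low`) and another label looks very plausible
    (p-value above `high`). Both thresholds are illustrative.
    """
    p_values = {
        label: icp_p_value(cal_scores, 1.0 - prob) for label, prob in probs.items()
    }
    best = max(p_values, key=p_values.get)
    if p_values[noisy_label] < low and p_values[best] > high:
        return best
    return noisy_label


if __name__ == "__main__":
    calibration = [0.05 * i for i in range(1, 20)]
    print(clean_label({"positive": 0.98, "negative": 0.02}, "negative", calibration))
```

Keeping the original label whenever the p-values are ambiguous is a conservative choice; a stricter variant could instead drop such examples as outliers.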

arXiv:2307.16405 [pdf, other]

Causal-learn: Causal Discovery in Python

Authors: Yujia Zheng, Biwei Huang, Wei Chen, Joseph Ramsey, Mingming Gong, Ruichu Cai, Shohei Shimizu, Peter Spirtes, Kun Zhang

Abstract: Causal discovery aims at revealing causal relations from observational data, which is a fundamental task in science and engineering. We describe $\textit{causal-learn}$, an open-source Python library for causal discovery. This library focuses on bringing a comprehensive collection of causal discovery methods to both practitioners and researchers. It provides easy-to-use APIs for non-specialists, modular building blocks for developers, detailed documentation for learners, and comprehensive methods for all. Different from previous packages in R or Java, $\textit{causal-learn}$ is fully developed in Python, which could be more in tune with the recent preference shift in programming languages within related communities. The library is available at https://github.com/py-why/causal-learn.

Submitted 31 July, 2023; originally announced July 2023.

Journal ref: Journal of Machine Learning Research 25 (2024)

arXiv:2306.08794 [pdf, other]

doi 10.1093/jrsssb/qkad068

Quantile autoregressive conditional heteroscedasticity

Authors: Qianqian Zhu, Songhua Tan, Yao Zheng, Guodong Li

Abstract: This paper proposes a novel conditional heteroscedastic time series model by applying the framework of quantile regression processes to the ARCH($\infty$) form of the GARCH model. This model can provide varying structures for conditional quantiles of the time series across different quantile levels, while including the commonly used GARCH model as a special case. The strict stationarity of the model is discussed. For robustness against heavy-tailed distributions, a self-weighted quantile regression (QR) estimator is proposed. While QR performs satisfactorily at intermediate quantile levels, its accuracy deteriorates at high quantile levels due to data scarcity. As a remedy, a self-weighted composite quantile regression (CQR) estimator is further introduced and, based on an approximate GARCH model with a flexible Tukey-lambda distribution for the innovations, we can extrapolate the high quantile levels by borrowing information from intermediate ones. Asymptotic properties for the proposed estimators are established. Simulation experiments are carried out to assess the finite sample performance of the proposed methods, and an empirical example is presented to illustrate the usefulness of the new model.

Submitted 12 November, 2023; v1 submitted 14 June, 2023; originally announced June 2023.

Journal ref: Journal of the Royal Statistical Society Series B: Statistical Methodology,2023,85,1099-1127
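
Illustrative note on the entry above: the estimators in the abstract build on quantile regression, whose core ingredient is the check (pinball) loss. The sketch below is not the paper's self-weighted estimator; it only shows, on a toy sample, that minimizing the check loss over a constant recovers an empirical quantile, and why few observations constrain the fit at high quantile levels. Function names and data are hypothetical.

```python
def check_loss(residual, tau):
    """Quantile-regression check loss: tau * r if r >= 0, else (tau - 1) * r."""
    return residual * (tau - (1.0 if residual < 0 else 0.0))


def best_constant(ys, tau):
    """Return the sample value that minimizes the total check loss at level tau.

    Searching over the sample points is enough here because the total loss is
    piecewise linear and convex with kinks only at the observations.
    """
    return min(ys, key=lambda q: sum(check_loss(y - q, tau) for y in ys))


if __name__ == "__main__":
    sample = [1, 2, 3, 4, 100]
    print(best_constant(sample, 0.5))  # the median of the sample
    print(best_constant(sample, 0.9))  # driven by the single extreme value
```

With only one observation in the upper tail, the fit at level 0.9 rests entirely on that point, which is the data-scarcity issue the abstract mentions for high quantile levels.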

arXiv:2306.06510 [pdf, other]

Partial Identifiability for Domain Adaptation

Authors: Ling**g Kong, Shaoan Xie, Weiran Yao, Yujia Zheng, Guangyi Chen, Petar Stojanov, Victor Akinwande, Kun Zhang

Abstract: Unsupervised domain adaptation is critical to many real-world applications where label information is unavailable in the target domain. In general, without further assumptions, the joint distribution of the features and the label is not identifiable in the target domain. To address this issue, we rely on the property of minimal changes of causal mechanisms across domains to minimize unnecessary influences of distribution shifts. To encode this property, we first formulate the data-generating process using a latent variable model with two partitioned latent subspaces: invariant components whose distributions stay the same across domains and sparse changing components that vary across domains. We further constrain the domain shift to have a restrictive influence on the changing components. Under mild conditions, we show that the latent variables are partially identifiable, from which it follows that the joint distribution of data and labels in the target domain is also identifiable. Given the theoretical insights, we propose a practical domain adaptation framework called iMSDA. Extensive experimental results reveal that iMSDA outperforms state-of-the-art domain adaptation algorithms on benchmark datasets, demonstrating the effectiveness of our framework.

Submitted 10 June, 2023; originally announced June 2023.

Comments: ICML 2022

arXiv:2305.18410 [pdf, other]

Understanding Breast Cancer Survival: Using Causality and Language Models on Multi-omics Data

Authors: Mugariya Farooq, Shahad Hardan, Aigerim Zhumbhayeva, Yujia Zheng, Preslav Nakov, Kun Zhang

Abstract: The need for more usable and explainable machine learning models in healthcare increases the importance of developing and utilizing causal discovery algorithms, which aim to discover causal relations by analyzing observational data. Explainable approaches aid clinicians and biologists in predicting the prognosis of diseases and suggesting proper treatments. However, very little research has been conducted at the crossroads between causal discovery, genomics, and breast cancer, and we aim to bridge this gap. Moreover, evaluation of causal discovery methods on real data is in general notoriously difficult because ground-truth causal relations are usually unknown, and accordingly, in this paper, we also propose to address the evaluation problem with large language models. In particular, we exploit suitable causal discovery algorithms to investigate how various perturbations in the genome can affect the survival of patients diagnosed with breast cancer. We used three main causal discovery algorithms: PC, Greedy Equivalence Search (GES), and a Generalized Precision Matrix-based one. We experiment with a subset of The Cancer Genome Atlas, which contains information about mutations, copy number variations, protein levels, and gene expressions for 705 breast cancer patients. Our findings reveal important factors related to the vital status of patients using causal discovery algorithms. However, the reliability of these results remains a concern in the medical domain. Accordingly, as another contribution of the work, the results are validated through language models trained on biomedical literature, such as BlueBERT and other large language models trained on medical corpora. Our results profess proper utilization of causal discovery algorithms and language models for revealing reliable causal relations for clinical applications.

Submitted 28 May, 2023; originally announced May 2023.

arXiv:2305.11379 [pdf, other]

Generalized Precision Matrix for Scalable Estimation of Nonparametric Markov Networks

Authors: Yujia Zheng, Ignavier Ng, Yewen Fan, Kun Zhang

Abstract: A Markov network characterizes the conditional independence structure, or Markov property, among a set of random variables. Existing work focuses on specific families of distributions (e.g., exponential families) and/or certain structures of graphs, and most of them can only handle variables of a single data type (continuous or discrete). In this work, we characterize the conditional independence structure in general distributions for all data types (i.e., continuous, discrete, and mixed-type) with a Generalized Precision Matrix (GPM). Besides, we also allow general functional relations among variables, thus giving rise to a Markov network structure learning algorithm in one of the most general settings. To deal with the computational challenge of the problem, especially for large graphs, we unify all cases under the same umbrella of a regularized score matching framework. We validate the theoretical results and demonstrate the scalability empirically in various settings.

Submitted 18 May, 2023; originally announced May 2023.

Comments: ICLR 2023

arXiv:2303.06186 [pdf, other]

The impacts of remote work on travel: insights from nearly three years of monthly surveys

Authors: Nicholas S. Caros, Xiaotong Guo, Yunhan Zheng, Jinhua Zhao

Abstract: Remote work has expanded dramatically since 2020, upending longstanding travel patterns and behavior. More fundamentally, the flexibility for remote workers to choose when and where to work has created much stronger connections between travel behavior and organizational behavior. This paper uses a large and comprehensive monthly longitudinal survey over nearly three years to identify new trends in work location choice, mode choice and departure time of remote workers. The travel behavior of remote workers is found to be highly associated with employer characteristics, task characteristics, employer remote work policies, coordination between colleagues and attitudes towards remote work. Approximately one third of all remote work hours are shown to take place outside of the home, accounting for over one third of all commuting trips. These commutes to "third places" are shorter, less likely to occur during peak periods, and more likely to use sustainable travel modes than commutes to an employer's primary workplace. Hybrid work arrangements are also associated with a greater number of non-work trips than fully remote and fully in-person arrangements. Implications of this research for policy makers, shared mobility providers and land use planning are discussed.

Submitted 10 March, 2023; originally announced March 2023.

arXiv:2303.06012 [pdf, other]

Examining the interactions between working from home, travel behavior and change in car ownership due to the impact of COVID-19

Authors: Yunhan Zheng, Nicholas Caros, Jim Aloisi, Jinhua Zhao

Abstract: COVID-19 has disrupted society and changed how people learn, work and live. The availability of vaccines in the spring of 2021, however, led to a gradual return of many pre-pandemic activities in Massachusetts in the fall of 2021. Leveraging data that were collected using a map-based survey tool in the Greater Boston area in the fall of 2021, this study explores changes in travel behavior due to COVID-19 and investigates the underlying factors contributing to these changes. First, a structural equation modeling technique is developed to capture the interactions between various travel choices, including working from home, travel mode use and change in car ownership. Moreover, attitudinal factors such as risk perceptions and attitudes towards WFH are incorporated into the framework to explain behavior changes. Second, a discrete choice modeling approach is taken to study shifts in commuting mode choices in the fall of 2021. The results show that in the fall of 2021, people became more likely to use their cars to commute, and for those who bought cars during the pandemic, they tended to work on-site more. Our findings can provide planners and policymakers with information upon which to base travel demand management decisions in the post-pandemic era.

Submitted 10 March, 2023; originally announced March 2023.

arXiv:2302.11756 [pdf, ps, other]

Learning Manifold Dimensions with Conditional Variational Autoencoders

Authors: Yijia Zheng, Tong He, Yixuan Qiu, David Wipf

Abstract: Although the variational autoencoder (VAE) and its conditional extension (CVAE) are capable of state-of-the-art results across multiple domains, their precise behavior is still not fully understood, particularly in the context of data (like images) that lie on or near a low-dimensional manifold. For example, while prior work has suggested that the globally optimal VAE solution can learn the correct manifold dimension, a necessary (but not sufficient) condition for producing samples from the true data distribution, this has never been rigorously proven. Moreover, it remains unclear how such considerations would change when various types of conditioning variables are introduced, or when the data support is extended to a union of manifolds (e.g., as is likely the case for MNIST digits and related). In this work, we address these points by first proving that VAE global minima are indeed capable of recovering the correct manifold dimension. We then extend this result to more general CVAEs, demonstrating practical scenarios whereby the conditioning variables allow the model to adaptively learn manifolds of varying dimension across samples. Our analyses, which have practical implications for various CVAE design choices, are also supported by numerical results on both synthetic and real-world datasets.

Submitted 13 June, 2023; v1 submitted 22 February, 2023; originally announced February 2023.

Comments: Published in NeurIPS 2022

arXiv:2211.09295 [pdf, other]

Testing for context-dependent changes in neural encoding in naturalistic experiments

Authors: Yenho Chen, Carl W. Harris, Xiaoyu Ma, Zheng Li, Francisco Pereira, Charles Y. Zheng

Abstract: We propose a decoding-based approach to detect context effects on neural codes in longitudinal neural recording data. The approach is agnostic to how information is encoded in neural activity, and can control for a variety of possible confounding factors present in the data. We demonstrate our approach by determining whether it is possible to decode location encoding from prefrontal cortex in the mouse and, further, testing whether the encoding changes due to task engagement.

Submitted 16 November, 2022; originally announced November 2022.

Comments: 39 pages, 13 figures

arXiv:2210.08053 [pdf, other]

Flexible Spatio-Temporal Hawkes Process Models for Earthquake Occurrences

Authors: Junhyeon Kwon, Yingcai Zheng, Mikyoung Jun

Abstract: Hawkes process is one of the most commonly used models for investigating the self-exciting nature of earthquake occurrences. However, seismicity patterns have complicated characteristics due to heterogeneous geology and stresses, for which existing methods with Hawkes process cannot fully capture. This study introduces novel nonparametric Hawkes process models that are flexible in three distinct ways. First, we incorporate the spatial inhomogeneity of the self-excitation earthquake productivity. Second, we consider the anisotropy in aftershock occurrences. Third, we reflect the space-time interactions between aftershocks with a non-separable spatio-temporal triggering structure. For model estimation, we extend the model-independent stochastic declustering (MISD) algorithm and suggest substituting its histogram-based estimators with kernel methods. We demonstrate the utility of the proposed methods by applying them to the seismicity data in regions with active seismic activities.

Submitted 14 February, 2023; v1 submitted 14 October, 2022; originally announced October 2022.

Comments: 53 pages

MSC Class: 62P12

arXiv:2209.01172 [pdf, ps, other]

An Interpretable and Efficient Infinite-Order Vector Autoregressive Model for High-Dimensional Time Series

Authors: Yao Zheng

Abstract: As a special infinite-order vector autoregressive (VAR) model, the vector autoregressive moving average (VARMA) model can capture much richer temporal patterns than the widely used finite-order VAR model. However, its practicality has long been hindered by its non-identifiability, computational intractability, and difficulty of interpretation, especially for high-dimensional time series. This paper proposes a novel sparse infinite-order VAR model for high-dimensional time series, which avoids all above drawbacks while inheriting essential temporal patterns of the VARMA model. As another attractive feature, the temporal and cross-sectional structures of the VARMA-type dynamics captured by this model can be interpreted separately, since they are characterized by different sets of parameters. This separation naturally motivates the sparsity assumption on the parameters determining the cross-sectional dependence. As a result, greater statistical efficiency and interpretability can be achieved with little loss of temporal information. We introduce two $\ell_1$-regularized estimation methods for the proposed model, which can be efficiently implemented via block coordinate descent algorithms, and derive the corresponding nonasymptotic error bounds. A consistent model order selection method based on the Bayesian information criteria is also developed. The merit of the proposed approach is supported by simulation studies and a real-world macroeconomic data analysis.

Submitted 24 February, 2024; v1 submitted 2 September, 2022; originally announced September 2022.

arXiv:2206.07751 [pdf, other]

On the Identifiability of Nonlinear ICA: Sparsity and Beyond

Authors: Yujia Zheng, Ignavier Ng, Kun Zhang

Abstract: Nonlinear independent component analysis (ICA) aims to recover the underlying independent latent sources from their observable nonlinear mixtures. How to make the nonlinear ICA model identifiable up to certain trivial indeterminacies is a long-standing problem in unsupervised learning. Recent breakthroughs reformulate the standard independence assumption of sources as conditional independence given some auxiliary variables (e.g., class labels and/or domain/time indexes) as weak supervision or inductive bias. However, nonlinear ICA with unconditional priors cannot benefit from such developments. We explore an alternative path and consider only assumptions on the mixing process, such as Structural Sparsity. We show that under specific instantiations of such constraints, the independent latent sources can be identified from their nonlinear mixtures up to a permutation and a component-wise transformation, thus achieving nontrivial identifiability of nonlinear ICA without auxiliary variables. We provide estimation methods and validate the theoretical results experimentally. The results on image data suggest that our conditions may hold in a number of practical data generating processes.

Submitted 25 February, 2024; v1 submitted 15 June, 2022; originally announced June 2022.

Comments: NeurIPS 2022

arXiv:2205.00756 [pdf, other]

VICE: Variational Interpretable Concept Embeddings

Authors: Lukas Muttenthaler, Charles Y. Zheng, Patrick McClure, Robert A. Vandermeulen, Martin N. Hebart, Francisco Pereira

Abstract: A central goal in the cognitive sciences is the development of numerical models for mental representations of object concepts. This paper introduces Variational Interpretable Concept Embeddings (VICE), an approximate Bayesian method for embedding object concepts in a vector space using data collected from humans in a triplet odd-one-out task. VICE uses variational inference to obtain sparse, non-negative representations of object concepts with uncertainty estimates for the embedding values. These estimates are used to automatically select the dimensions that best explain the data. We derive a PAC learning bound for VICE that can be used to estimate generalization performance or determine a sufficient sample size for experimental design. VICE rivals or outperforms its predecessor, SPoSE, at predicting human behavior in the triplet odd-one-out task. Furthermore, VICE's object representations are more reproducible and consistent across random initializations, highlighting the unique advantage of using VICE for deriving interpretable embeddings from human behavior.

Submitted 6 October, 2022; v1 submitted 2 May, 2022; originally announced May 2022.

Comments: Accepted at NeurIPS 2022

arXiv:2204.05109 [pdf, other]

doi 10.1016/j.physa.2022.127837

Temporal and spatial evolution of the distribution related to the number of COVID-19 pandemic

Authors: Peng Liu, Yanyan Zheng

Abstract: This work systematically conducts a data analysis based on the numbers of both cumulative and daily confirmed COVID-19 cases and deaths in a time span through April 2020 to June 2022 for over 200 countries around the world. Such research feature aims to reveal the temporal and spatial evolution of the country-level distribution observed in COVID-19 pandemic, and obtains some interesting results as follows. (1) The distributions of the numbers for cumulative confirmed cases and deaths obey power-law in early stages of COVID-19 and stretched exponential function in subsequent course. (2) The distributions of the numbers for daily confirmed cases and deaths obey power-law in early and late stages of COVID-19 and stretched exponential function in middle stages. The crossover region between power-law and stretched exponential behaviour seems to depend on the evolution of "infection" event and "death" event. Such observation implies a kind of important symmetry related to the dynamics process of COVID-19 spreading. (3) The distributions of the normalized numbers for each metric show a temporal scaling behaviour in 2-year period, and are well described by stretched exponential function. The observation of power-law and stretched exponential behaviour in such country-level distributions suggests underlying intrinsic dynamics of a virus spreading process in human interconnected society. And thus it is important for understanding and mathematically modeling the COVID-19 pandemic.

Submitted 23 August, 2022; v1 submitted 8 April, 2022; originally announced April 2022.

Journal ref: Physica A 603, 127837 (2022)

arXiv:2204.04876 [pdf, other]

Lyapunov-Guided Representation of Recurrent Neural Network Performance

Authors: Ryan Vogt, Yang Zheng, Eli Shlizerman

Abstract: Recurrent Neural Networks (RNN) are ubiquitous computing systems for sequences and multivariate time series data. While several robust architectures of RNN are known, it is unclear how to relate RNN initialization, architecture, and other hyperparameters with accuracy for a given task. In this work, we propose to treat RNN as dynamical systems and to correlate hyperparameters with accuracy through Lyapunov spectral analysis, a methodology specifically designed for nonlinear dynamical systems. To address the fact that RNN features go beyond the existing Lyapunov spectral analysis, we propose to infer relevant features from the Lyapunov spectrum with an Autoencoder and an embedding of its latent representation (AeLLE). Our studies of various RNN architectures show that AeLLE successfully correlates RNN Lyapunov spectrum with accuracy. Furthermore, the latent representation learned by AeLLE is generalizable to novel inputs from the same task and is formed early in the process of RNN training. The latter property allows for the prediction of the accuracy to which RNN would converge when training is complete. We conclude that representation of RNN through Lyapunov spectrum along with AeLLE provides a novel method for organization and interpretation of variants of RNN architectures.

Submitted 27 December, 2023; v1 submitted 11 April, 2022; originally announced April 2022.

Comments: 26 pages, 7 figures, 4 tables

arXiv:2203.10750 [pdf, other]

WeSinger: Data-augmented Singing Voice Synthesis with Auxiliary Losses

Authors: Zewang Zhang, Yibin Zheng, Xinhui Li, Li Lu

Abstract: In this paper, we develop a new multi-singer Chinese neural singing voice synthesis (SVS) system named WeSinger. To improve the accuracy and naturalness of synthesized singing voice, we design several specific modules and techniques: 1) A deep bi-directional LSTM-based duration model with multi-scale rhythm loss and post-processing step; 2) A Transformer-alike acoustic model with progressive pitch-weighted decoder loss; 3) a 24 kHz pitch-aware LPCNet neural vocoder to produce high-quality singing waveforms; 4) A novel data augmentation method with multi-singer pre-training for stronger robustness and naturalness. To our knowledge, WeSinger is the first SVS system to adopt 24 kHz LPCNet and multi-singer pre-training simultaneously. Both quantitative and qualitative evaluation results demonstrate the effectiveness of WeSinger in terms of accuracy and naturalness, and WeSinger achieves state-of-the-art performance on the recent public Chinese singing corpus Opencpop (https://wenet.org.cn/opencpop/). Some synthesized singing samples are available online (https://zzw922cn.github.io/wesinger/).

Submitted 25 June, 2022; v1 submitted 21 March, 2022; originally announced March 2022.

Comments: accepted at InterSpeech2022

arXiv:2201.13324 [pdf, other]

Guided Semi-Supervised Non-negative Matrix Factorization on Legal Documents

Authors: Pengyu Li, Christine Tseng, Yaxuan Zheng, Joyce A. Chew, Longxiu Huang, Benjamin Jarman, Deanna Needell

Abstract: Classification and topic modeling are popular techniques in machine learning that extract information from large-scale datasets. By incorporating a priori information such as labels or important features, methods have been developed to perform classification and topic modeling tasks; however, most methods that can perform both do not allow for guidance of the topics or features. In this paper, we propose a method, namely Guided Semi-Supervised Non-negative Matrix Factorization (GSSNMF), that performs both classification and topic modeling by incorporating supervision from both pre-assigned document class labels and user-designed seed words. We test the performance of this method through its application to legal documents provided by the California Innocence Project, a nonprofit that works to free innocent convicted persons and reform the justice system. The results show that our proposed method improves both classification accuracy and topic coherence in comparison to past methods like Semi-Supervised Non-negative Matrix Factorization (SSNMF) and Guided Non-negative Matrix Factorization (Guided NMF).

Submitted 31 January, 2022; originally announced January 2022.

Comments: 14 pages, 4 figures

arXiv:2201.05666 [pdf, other]

Reliable Causal Discovery with Improved Exact Search and Weaker Assumptions

Authors: Ignavier Ng, Yujia Zheng, Jiji Zhang, Kun Zhang

Abstract: Many of the causal discovery methods rely on the faithfulness assumption to guarantee asymptotic correctness. However, the assumption can be approximately violated in many ways, leading to sub-optimal solutions. Although there is a line of research in Bayesian network structure learning that focuses on weakening the assumption, such as exact search methods with well-defined score functions, they do not scale well to large graphs. In this work, we introduce several strategies to improve the scalability of exact score-based methods in the linear Gaussian setting. In particular, we develop a super-structure estimation method based on the support of inverse covariance matrix which requires assumptions that are strictly weaker than faithfulness, and apply it to restrict the search space of exact search. We also propose a local search strategy that performs exact search on the local clusters formed by each variable and its neighbors within two hops in the super-structure. Numerical experiments validate the efficacy of the proposed procedure, and demonstrate that it scales up to hundreds of nodes with a high accuracy.

Submitted 14 January, 2022; originally announced January 2022.

Comments: NeurIPS 2021. The code is available at https://github.com/ignavierng/local-astar

arXiv:2112.04857 [pdf, other]

A New Measure of Model Redundancy for Compressed Convolutional Neural Networks

Authors: Feiqing Huang, Yuefeng Si, Yao Zheng, Guodong Li

Abstract: While recently many designs have been proposed to improve the model efficiency of convolutional neural networks (CNNs) on a fixed resource budget, theoretical understanding of these designs is still conspicuously lacking. This paper aims to provide a new framework for answering the question: Is there still any remaining model redundancy in a compressed CNN? We begin by developing a general statistical formulation of CNNs and compressed CNNs via the tensor decomposition, such that the weights across layers can be summarized into a single tensor. Then, through a rigorous sample complexity analysis, we reveal an important discrepancy between the derived sample complexity and the naive parameter counting, which serves as a direct indicator of the model redundancy. Motivated by this finding, we introduce a new model redundancy measure for compressed CNNs, called the $K/R$ ratio, which further allows for nonlinear activations. The usefulness of this new measure is supported by ablation studies on popular block designs and datasets.

Submitted 9 December, 2021; originally announced December 2021.

arXiv:2111.10103 [pdf, other]

Uncertainty-aware Low-Rank Q-Matrix Estimation for Deep Reinforcement Learning

Authors: Tong Sang, Hongyao Tang, Jianye Hao, Yan Zheng, Zhaopeng Meng

Abstract: Value estimation is one key problem in Reinforcement Learning. Albeit many successes have been achieved by Deep Reinforcement Learning (DRL) in different fields, the underlying structure and learning dynamics of value function, especially with complex function approximation, are not fully understood. In this paper, we report that decreasing rank of $Q$-matrix widely exists during learning process across a series of continuous control tasks for different popular algorithms. We hypothesize that the low-rank phenomenon indicates the common learning dynamics of $Q$-matrix from stochastic high dimensional space to smooth low dimensional space. Moreover, we reveal a positive correlation between value matrix rank and value estimation uncertainty. Inspired by above evidence, we propose a novel Uncertainty-Aware Low-rank Q-matrix Estimation (UA-LQE) algorithm as a general framework to facilitate the learning of value function. Through quantifying the uncertainty of state-action value estimation, we selectively erase the entries of highly uncertain values in state-action value matrix and conduct low-rank matrix reconstruction for them to recover their values. Such a reconstruction exploits the underlying structure of value matrix to improve the value approximation, thus leading to a more efficient learning process of value function. In the experiments, we evaluate the efficacy of UA-LQE in several representative OpenAI MuJoCo continuous control tasks.

Submitted 19 November, 2021; originally announced November 2021.

Comments: This paper is accepted by The 3rd International Conference on Distributed Artificial Intelligence (DAI 2021, Shanghai, China)

arXiv:2109.12422 [pdf, other]

Equality of opportunity in travel behavior prediction with deep neural networks and discrete choice models

Authors: Yunhan Zheng, Shenhao Wang, Jinhua Zhao

Abstract: Although researchers increasingly adopt machine learning to model travel behavior, they predominantly focus on prediction accuracy, ignoring the ethical challenges embedded in machine learning algorithms. This study introduces an important missing dimension - computational fairness - to travel behavior analysis. We first operationalize computational fairness by equality of opportunity, then differentiate between the bias inherent in data and the bias introduced by modeling. We then demonstrate the prediction disparities in travel behavior modeling using the 2017 National Household Travel Survey (NHTS) and the 2018-2019 My Daily Travel Survey in Chicago. Empirically, deep neural network (DNN) and discrete choice models (DCM) reveal consistent prediction disparities across multiple social groups: both over-predict the false negative rate of frequent driving for the ethnic minorities, the low-income and the disabled populations, and falsely predict a higher travel burden of the socially disadvantaged groups and the rural populations than reality. Comparing DNN with DCM, we find that DNN can outperform DCM in prediction disparities because of DNN's smaller misspecification error. To mitigate prediction disparities, this study introduces an absolute correlation regularization method, which is evaluated with synthetic and real-world data. The results demonstrate the prevalence of prediction disparities in travel behavior modeling, and the disparities still persist regarding a variety of model specifics such as the number of DNN layers, batch size and weight initialization. Since these prediction disparities can exacerbate social inequity if prediction results without fairness adjustment are used for transportation policy making, we advocate for careful consideration of the fairness problem in travel behavior modeling, and the use of bias mitigation algorithms for fair transport decisions.

Submitted 25 September, 2021; originally announced September 2021.

arXiv:2106.10364 [pdf, other]

Bayesian decision theory for tree-based adaptive screening tests with an application to youth delinquency

Authors: Chelsea Krantsevich, P. Richard Hahn, Yi Zheng, Charles Katz

Abstract: Crime prevention strategies based on early intervention depend on accurate risk assessment instruments for identifying high risk youth. It is important in this context that the instruments be convenient to administer, which means, in particular, that they should also be reasonably brief; adaptive screening tests are useful for this purpose. Adaptive tests constructed using classification and regression trees are becoming a popular alternative to traditional Item Response Theory (IRT) approaches for adaptive testing. However, tree-based adaptive tests lack a principled criterion for terminating the test. This paper develops a Bayesian decision theory framework for measuring the trade-off between brevity and accuracy, when considering tree-based adaptive screening tests of different lengths. We also present a novel method for designing tree-based adaptive tests, motivated by this framework. The framework and associated adaptive test method are demonstrated through an application to youth delinquency risk assessment in Honduras; it is shown that an adaptive test requiring a subject to answer fewer than 10 questions can identify high risk youth nearly as accurately as an unabridged survey containing 173 items.

Submitted 27 June, 2022; v1 submitted 18 June, 2021; originally announced June 2021.

Comments: 22 pages, 10 figures

arXiv:2106.05260 [pdf, other]

Sirius: Visualization of Mixed Features as a Mutual Information Network Graph

Authors: Jane L. Adams, Todd F. Deluca, Christopher M. Danforth, Peter S. Dodds, Yuhang Zheng, Konstantinos Anastasakis, Boyoon Choi, Allison Min, Michael M. Bessey

Abstract: Data scientists across disciplines are increasingly in need of exploratory analysis tools for data sets with a high volume of features of mixed data type (quantitative continuous and discrete categorical). We introduce Sirius, a novel visualization package for researchers to explore feature relationships among mixed data types using mutual information. The visualization of feature relationships aids data scientists in finding meaningful dependence among features prior to the development of predictive modeling pipelines, which can inform downstream analysis such as feature selection, feature extraction, and early detection of potential proxy variables. Using an information theoretic approach, Sirius supports network visualization of heterogeneous data sets (consisting of continuous and discrete data types), and provides a user interface for exploring feature pairs with locally significant mutual information scores. Mutual information algorithm and bivariate chart types are assigned on a data type pairing basis (continuous-continuous, discrete-discrete, and discrete-continuous). We show how this tool can be used for tasks such as hypothesis confirmation, identification of predictive features, suggestions for feature extraction, or early warning of data abnormalities. The accompanying website for this paper can be accessed at https://sirius.universalities.com/. All code and supplemental materials can be accessed at https://osf.io/pdm9r/.

Submitted 13 August, 2022; v1 submitted 9 June, 2021; originally announced June 2021.

ACM Class: H.5.2; J.0

arXiv:2106.05165 [pdf, other]

A Lyapunov-Based Methodology for Constrained Optimization with Bandit Feedback

Authors: Semih Cayci, Yilin Zheng, Atilla Eryilmaz

Abstract: In a wide variety of applications including online advertising, contractual hiring, and wireless scheduling, the controller is constrained by a stringent budget constraint on the available resources, which are consumed in a random amount by each action, and a stochastic feasibility constraint that may impose important operational limitations on decision-making. In this work, we consider a general model to address such problems, where each action returns a random reward, cost, and penalty from an unknown joint distribution, and the decision-maker aims to maximize the total reward under a budget constraint $B$ on the total cost and a stochastic constraint on the time-average penalty. We propose a novel low-complexity algorithm based on Lyapunov optimization methodology, named ${\tt LyOn}$, and prove that for $K$ arms it achieves $O(\sqrt{K B\log B})$ regret and zero constraint-violation when $B$ is sufficiently large. The low computational cost and sharp performance bounds of ${\tt LyOn}$ suggest that Lyapunov-based algorithm design methodology can be effective in solving constrained bandit optimization problems.

Submitted 23 January, 2022; v1 submitted 9 June, 2021; originally announced June 2021.

arXiv:2106.02878 [pdf, other]

doi 10.1016/j.physa.2021.126557

A Generative Node-attribute Network Model for Detecting Generalized Structure

Authors: Wei Liu, Zhenhai Chang, Caiyan Jia, Yimei Zheng

Abstract: Exploring meaningful structural regularities embedded in networks is a key to understanding and analyzing the structure and function of a network. The node-attribute information can help improve such understanding and analysis. However, most of the existing methods focus on detecting traditional communities, i.e., groupings of nodes with dense internal connections and sparse external ones. In this paper, based on the connectivity behavior of nodes and homogeneity of attributes, we propose a principle model (named GNAN), which can generate both topology information and attribute information. The new model can detect not only community structure, but also a range of other types of structure in networks, such as bipartite structure, core-periphery structure, and their mixture structure, which are collectively referred to as generalized structure. The proposed model that combines topological information and node-attribute information can detect communities more accurately than the model that only uses topology information. The dependency between attributes and communities can be automatically learned by our model and thus we can ignore the attributes that do not contain useful information. The model parameters are inferred by using the expectation-maximization algorithm. And a case study is provided to show the ability of our model in the semantic interpretability of communities. Experiments on both synthetic and real-world networks show that the new model is competitive with other state-of-the-art models.

Submitted 5 June, 2021; originally announced June 2021.

arXiv:2105.13745 [pdf, other]

Robust Regularization with Adversarial Labelling of Perturbed Samples

Authors: Xiaohui Guo, Richong Zhang, Yaowei Zheng, Yongyi Mao

Abstract: Recent researches have suggested that the predictive accuracy of neural network may contend with its adversarial robustness. This presents challenges in designing effective regularization schemes that also provide strong adversarial robustness. Revisiting Vicinal Risk Minimization (VRM) as a unifying regularization principle, we propose Adversarial Labelling of Perturbed Samples (ALPS) as a regularization scheme that aims at improving the generalization ability and adversarial robustness of the trained model. ALPS trains neural networks with synthetic samples formed by perturbing each authentic input sample towards another one along with an adversarially assigned label. The ALPS regularization objective is formulated as a min-max problem, in which the outer problem is minimizing an upper-bound of the VRM loss, and the inner problem is L$_1$-ball constrained adversarial labelling on perturbed sample. The analytic solution to the induced inner maximization problem is elegantly derived, which enables computational efficiency. Experiments on the SVHN, CIFAR-10, CIFAR-100 and Tiny-ImageNet datasets show that the ALPS has a state-of-the-art regularization performance while also serving as an effective adversarial training scheme.

Submitted 28 May, 2021; originally announced May 2021.

Comments: Accepted to IJCAI2021

arXiv:2104.02665 [pdf, other]

A new weighting method when not all the events are selected as cases in a nested case-control study

Authors: Qian M. Zhou, Xuan Wang, Yingye Zheng, Tianxi Cai

Abstract: Nested case-control (NCC) is a sampling method widely used for developing and evaluating risk models with expensive biomarkers on large prospective cohort studies. The biomarker values are typically obtained on a sub-cohort, consisting of all the events and a subset of non-events. However, when the number of events is not small, it might not be affordable to measure the biomarkers on all of them. Due to the costs and limited availability of bio-specimens, only a subset of events is selected to the sub-cohort as cases. For these "untypical" NCC studies, we propose a new weighting method for the inverse probability weighted (IPW) estimation. We also design a perturbation method to estimate the variance of the IPW estimator with our new weights. It accounts for between-subject correlations induced by the sampling processes for both cases and controls through perturbing their sampling indicator variables, and thus, captures all the variations. Furthermore, we demonstrate, analytically and numerically, that when cases consist of only a subset of events, our new weight produces more efficient IPW estimators than the weight proposed in Samuelsen (1997) for a standard NCC design. We illustrate the estimating procedure with a study that aims to evaluate a biomarker-based risk prediction model using the Framingham cohort study.

Submitted 6 April, 2021; originally announced April 2021.

Comments: 27 pages, 3 figures, 5 tables

arXiv:2101.04276 [pdf, other]

High-Dimensional Low-Rank Tensor Autoregressive Time Series Modeling

Authors: Di Wang, Yao Zheng, Guodong Li

Abstract: Modern technological advances have enabled an unprecedented amount of structured data with complex temporal dependence, urging the need for new methods to efficiently model and forecast high-dimensional tensor-valued time series. This paper provides a new modeling framework to accomplish this task via autoregression (AR). By considering a low-rank Tucker decomposition for the transition tensor, the proposed tensor AR can flexibly capture the underlying low-dimensional tensor dynamics, providing both substantial dimension reduction and meaningful multi-dimensional dynamic factor interpretations. For this model, we first study several nuclear-norm-regularized estimation methods and derive their non-asymptotic properties under the approximate low-rank setting. In particular, by leveraging the special balanced structure of the transition tensor, a novel convex regularization approach based on the sum of nuclear norms of square matricizations is proposed to efficiently encourage low-rankness of the coefficient tensor. To further improve the estimation efficiency under exact low-rankness, a non-convex estimator is proposed with a gradient descent algorithm, and its computational and statistical convergence guarantees are established. Simulation studies and an empirical analysis of tensor-valued time series data from multi-category import-export networks demonstrate the advantages of the proposed approach.

Submitted 27 September, 2023; v1 submitted 11 January, 2021; originally announced January 2021.

Comments: Accepted by Journal of Econometrics

arXiv:2012.13940 [pdf, other]

A Doubly Stochastic Simulator with Applications in Arrivals Modeling and Simulation

Authors: Yufeng Zheng, Zeyu Zheng, Tingyu Zhu

Abstract: We propose a framework that integrates classical Monte Carlo simulators and Wasserstein generative adversarial networks to model, estimate, and simulate a broad class of arrival processes with general non-stationary and multi-dimensional random arrival rates. Classical Monte Carlo simulators have advantages at capturing the interpretable "physics" of a stochastic object, whereas neural-network-based simulators have advantages at capturing less-interpretable complicated dependence within a high-dimensional distribution. We propose a doubly stochastic simulator that integrates a stochastic generative neural network and a classical Monte Carlo Poisson simulator, to utilize both advantages. Such integration brings challenges to both theoretical reliability and computational tractability for the estimation of the simulator given real data, where the estimation is done through minimizing the Wasserstein distance between the distribution of the simulation output and the distribution of real data. Regarding theoretical properties, we prove consistency and convergence rate for the estimated simulator under a non-parametric smoothness assumption. Regarding computational efficiency and tractability for the estimation procedure, we address a challenge in gradient evaluation that arise from the discontinuity in the Monte Carlo Poisson simulator. Numerical experiments with synthetic and real data sets are implemented to illustrate the performance of the proposed framework.

Submitted 9 June, 2023; v1 submitted 27 December, 2020; originally announced December 2020.

Comments: We appreciate a lot the comments and suggestions from anonymous reviewers and editors. This is updated version, and with title changed from "Doubly Stochastic Generative Arrivals Modeling" to "A Doubly Stochastic Simulator with Applications in Arrivals Modeling and Simulation"

arXiv:2012.10980 [pdf]

Measurement bias: a structural perspective

Authors: Yijie Li, Wei Fan, Miao Zhang, Lili Liu, Jiangbo Bao, Yingjie Zheng

Abstract: The causal structure for measurement bias (MB) remains controversial. Aided by the Directed Acyclic Graph (DAG), this paper proposes a new structure for measuring one singleton variable whose MB arises in the selection of an imperfect I/O device-like measurement system. For effect estimation, however, an extra source of MB arises from any redundant association between a measured exposure and a measured outcome. The misclassification will be bidirectionally differential for a common outcome, unidirectionally differential for a causal relation, and non-differential for a common cause between the measured exposure and the measured outcome or a null effect. The measured exposure can actually affect the measured outcome, or vice versa. Reverse causality is a concept defined at the level of measurement. Our new DAGs have clarified the structures and mechanisms of MB.

Submitted 23 December, 2020; v1 submitted 20 December, 2020; originally announced December 2020.

arXiv:2010.09077 [pdf, other]

A Spatial-Temporal Graph Based Hybrid Infectious Disease Model with Application to COVID-19

Authors: Yunling Zheng, Zhijian Li, Jack Xin, Guofa Zhou

Abstract: As the COVID-19 pandemic evolves, reliable prediction plays an important role for policy making. The classical infectious disease model SEIR (susceptible-exposed-infectious-recovered) is a compact yet simplistic temporal model. The data-driven machine learning models such as RNN (recurrent neural networks) can suffer in case of limited time series data such as COVID-19. In this paper, we combine SEIR and RNN on a graph structure to develop a hybrid spatio-temporal model to achieve both accuracy and efficiency in training and forecasting. We introduce two features on the graph structure: node feature (local temporal infection trend) and edge feature (geographic neighbor effect). For node feature, we derive a discrete recursion (called I-equation) from SEIR so that gradient descend method applies readily to its optimization. For edge feature, we design an RNN model to capture the neighboring effect and regularize the landscape of loss function so that local minima are effective and robust for prediction. The resulting hybrid model (called IeRNN) improves the prediction accuracy on state-level COVID-19 new case data from the US, out-performing standard temporal models (RNN, SEIR, and ARIMA) in 1-day and 7-day ahead forecasting. Our model accommodates various degrees of reopening and provides potential outcomes for policymakers.

Submitted 18 October, 2020; originally announced October 2020.

arXiv:2010.04925 [pdf, other]

Regularizing Neural Networks via Adversarial Model Perturbation

Authors: Yaowei Zheng, Richong Zhang, Yongyi Mao

Abstract: Effective regularization techniques are highly desired in deep learning for alleviating overfitting and improving generalization. This work proposes a new regularization scheme, based on the understanding that the flat local minima of the empirical risk cause the model to generalize better. This scheme is referred to as adversarial model perturbation (AMP), where instead of directly minimizing the empirical risk, an alternative "AMP loss" is minimized via SGD. Specifically, the AMP loss is obtained from the empirical risk by applying the "worst" norm-bounded perturbation on each point in the parameter space. Comparing with most existing regularization schemes, AMP has strong theoretical justifications, in that minimizing the AMP loss can be shown theoretically to favour flat local minima of the empirical risk. Extensive experiments on various modern deep architectures establish AMP as a new state of the art among regularization schemes. Our code is available at https://github.com/hiyouga/AMP-Regularizer.

Submitted 7 May, 2021; v1 submitted 10 October, 2020; originally announced October 2020.

Comments: 16 pages, 13 figures, accepted to CVPR2021

arXiv:2009.10989 [pdf, other]

Towards a Flexible Embedding Learning Framework

Authors: Chin-Chia Michael Yeh, Dhruv Gelda, Zhongfang Zhuang, Yan Zheng, Liang Gou, Wei Zhang

Abstract: Representation learning is a fundamental building block for analyzing entities in a database. While the existing embedding learning methods are effective in various data mining problems, their applicability is often limited because these methods have pre-determined assumptions on the type of semantics captured by the learned embeddings, and the assumptions may not well align with specific downstream tasks. In this work, we propose an embedding learning framework that 1) uses an input format that is agnostic to input data type, 2) is flexible in terms of the relationships that can be embedded into the learned representations, and 3) provides an intuitive pathway to incorporate domain knowledge into the embedding learning process. Our proposed framework utilizes a set of entity-relation-matrices as the input, which quantifies the affinities among different entities in the database. Moreover, a sampling mechanism is carefully designed to establish a direct connection between the input and the information captured by the output embeddings. To complete the representation learning toolbox, we also outline a simple yet effective post-processing technique to properly visualize the learned embeddings. Our empirical results demonstrate that the proposed framework, in conjunction with a set of relevant entity-relation-matrices, outperforms the existing state-of-the-art approaches in various data mining tasks.

Submitted 23 September, 2020; originally announced September 2020.

Comments: 10 pages

arXiv:2009.02623 [pdf, other]

Information Theoretic Counterfactual Learning from Missing-Not-At-Random Feedback

Authors: Zifeng Wang, Xi Chen, Rui Wen, Shao-Lun Huang, Ercan E. Kuruoglu, Yefeng Zheng

Abstract: Counterfactual learning for dealing with missing-not-at-random data (MNAR) is an intriguing topic in the recommendation literature since MNAR data are ubiquitous in modern recommender systems. Missing-at-random (MAR) data, namely randomized controlled trials (RCTs), are usually required by most previous counterfactual learning methods for debiasing learning. However, the execution of RCTs is extraordinarily expensive in practice. To circumvent the use of RCTs, we build an information-theoretic counterfactual variational information bottleneck (CVIB), as an alternative for debiasing learning without RCTs. By separating the task-aware mutual information term in the original information bottleneck Lagrangian into factual and counterfactual parts, we derive a contrastive information loss and an additional output confidence penalty, which facilitates balanced learning between the factual and counterfactual domains. Empirical evaluation on real-world datasets shows that our CVIB significantly enhances both shallow and deep models, which sheds light on counterfactual learning in recommendation that goes beyond RCTs.

Submitted 17 October, 2020; v1 submitted 5 September, 2020; originally announced September 2020.

arXiv:2009.02152 [pdf, other]

doi 10.1016/j.cities.2020.102869

Evaluating the effect of city lock-down on controlling COVID-19 propagation through deep learning and network science models

Authors: Xiaoqi Zhang, Zheng Ji, Yanqiao Zheng, Xinyue Ye, Dong Li

Abstract: The special epistemic characteristics of the COVID-19, such as the long incubation period and the infection through asymptomatic cases, put severe challenge to the containment of its outbreak. By the end of March 2020, China has successfully controlled the within-spreading of COVID-19 at a high cost of locking down most of its major cities, including the epicenter, Wuhan. Since the low accuracy of outbreak data before the mid of Feb. 2020 forms a major technical concern on those studies based on statistic inference from the early outbreak. We apply the supervised learning techniques to identify and train NP-Net-SIR model which turns out robust under poor data quality condition. By the trained model parameters, we analyze the connection between population flow and the cross-regional infection connection strength, based on which a set of counterfactual analysis is carried out to study the necessity of lock-down and substitutability between lock-down and the other containment measures. Our findings support the existence of non-lock-down-typed measures that can reach the same containment consequence as the lock-down, and provide useful guideline for the design of a more flexible containment strategy.

Submitted 4 September, 2020; originally announced September 2020.

Comments: 27 pages, 9 figures

Journal ref: [J]. Cities, 2020: 102869

arXiv:2008.06246 [pdf, other]

Graph Polish: A Novel Graph Generation Paradigm for Molecular Optimization

Authors: Chaojie Ji, Yijia Zheng, Ruxin Wang, Yunpeng Cai, Hongyan Wu

Abstract: Molecular optimization, which transforms a given input molecule X into another Y with desirable properties, is essential in molecular drug discovery. The traditional translating approaches, generating the molecular graphs from scratch by adding some substructures piece by piece, prone to error because of the large set of candidate substructures in a large number of steps to the final target. In this study, we present a novel molecular optimization paradigm, Graph Polish, which changes molecular optimization from the traditional "two-language translating" task into a "single-language polishing" task. The key to this optimization paradigm is to find an optimization center subject to the conditions that the preserved areas around it ought to be maximized and thereafter the removed and added regions should be minimized. We then propose an effective and efficient learning framework T&S polish to capture the long-term dependencies in the optimization steps. The T component automatically identifies and annotates the optimization centers and the preservation, removal and addition of some parts of the molecule, and the S component learns these behaviors and applies these actions to a new molecule. Furthermore, the proposed paradigm can offer an intuitive interpretation for each molecular optimization result. Experiments with multiple optimization tasks are conducted on four benchmark datasets. The proposed T&S polish approach achieves significant advantage over the five state-of-the-art baseline methods on all the tasks.
In addition, extensive studies are conducted to validate the effectiveness, explainability and time saving of the novel optimization paradigm.

Submitted 14 August, 2020; originally announced August 2020.

arXiv:2007.10929 [pdf, other]

A Recurrent Neural Network and Differential Equation Based Spatiotemporal Infectious Disease Model with Application to COVID-19

Authors: Zhijian Li, Yunling Zheng, Jack Xin, Guofa Zhou

Abstract: The outbreaks of Coronavirus Disease 2019 (COVID-19) have impacted the world significantly. Modeling the trend of infection and real-time forecasting of cases can help decision making and control of the disease spread. However, data-driven methods such as recurrent neural networks (RNN) can perform poorly due to limited daily samples in time. In this work, we develop an integrated spatiotemporal model based on the epidemic differential equations (SIR) and RNN. The former after simplification and discretization is a compact model of temporal infection trend of a region while the latter models the effect of nearest neighboring regions. The latter captures latent spatial information. We trained and tested our model on COVID-19 data in Italy, and show that it out-performs existing temporal models (fully connected NN, SIR, ARIMA) in 1-day, 3-day, and 1-week ahead forecasting especially in the regime of limited training data.

Submitted 17 September, 2020; v1 submitted 14 July, 2020; originally announced July 2020.

arXiv:2007.05120

Development and Validation of a Novel Prognostic Model for Predicting AMD Progression Using Longitudinal Fundus Images

Authors: Joshua Bridge, Simon P. Harding, Yalin Zheng

Abstract: Prognostic models aim to predict the future course of a disease or condition and are a vital component of personalized medicine. Statistical models make use of longitudinal data to capture the temporal aspect of disease progression; however, these models require prior feature extraction. Deep learning avoids explicit feature extraction, meaning we can develop models for images where features are either unknown or impossible to quantify accurately. Previous prognostic models using deep learning with imaging data require annotation during training or only utilize a single time point. We propose a novel deep learning method to predict the progression of diseases using longitudinal imaging data with uneven time intervals, which requires no prior feature extraction. Given previous images from a patient, our method aims to predict whether the patient will progress to the next stage of the disease. The proposed method uses InceptionV3 to produce feature vectors for each image. To account for uneven intervals, a novel interval scaling is proposed. Finally, a Recurrent Neural Network is used to prognosticate the disease. We demonstrate our method on a longitudinal dataset of color fundus images from 4903 eyes with age-related macular degeneration (AMD), taken from the Age-Related Eye Disease Study, to predict progression to late AMD. Our method attains a testing sensitivity of 0.878, a specificity of 0.887, and an area under the receiver operating characteristic curve of 0.950. We compare our method to previous methods, and our model shows superior performance. Class activation maps display how the network reaches the final decision.

Submitted 9 July, 2020; originally announced July 2020.

Showing 1–50 of 95 results for author: Zheng, Y