Search | arXiv e-print repository

Subgroup Identification with Latent Factor Structure

Authors: Yong He, Dong Liu, Fuxin Wang, Mingjuan Zhang, Wen-Xin Zhou

Abstract: Subgroup analysis has attracted growing attention due to its ability to identify meaningful subgroups from a heterogeneous population and thereby improving predictive power. However, in many scenarios such as social science and biology, the covariates are possibly highly correlated due to the existence of common factors, which brings great challenges for group identification and is neglected in th… ▽ More Subgroup analysis has attracted growing attention due to its ability to identify meaningful subgroups from a heterogeneous population and thereby improving predictive power. However, in many scenarios such as social science and biology, the covariates are possibly highly correlated due to the existence of common factors, which brings great challenges for group identification and is neglected in the existing literature. In this paper, we aim to fill this gap in the ``diverging dimension" regime and propose a center-augmented subgroup identification method under the Factor Augmented (sparse) Linear Model framework, which bridge dimension reduction and sparse regression together. The proposed method is flexible to the possibly high cross-sectional dependence among covariates and inherits the computational advantage with complexity $O(nK)$, in contrast to the $O(n^2)$ complexity of the conventional pairwise fusion penalty method in the literature, where $n$ is the sample size and $K$ is the number of subgroups. We also investigate the asymptotic properties of its oracle estimators under conditions on the minimal distance between group centroids. To implement the proposed approach, we introduce a Difference of Convex functions based Alternating Direction Method of Multipliers (DC-ADMM) algorithm and demonstrate its convergence to a local minimizer in finite steps. We illustrate the superiority of the proposed method through extensive numerical experiments and a real macroeconomic data example. An \texttt{R} package \texttt{SILFS} implementing the method is also available on CRAN. △ Less

Submitted 30 June, 2024; originally announced July 2024.

arXiv:2405.20678 [pdf, ps, other]

No-Regret Learning for Fair Multi-Agent Social Welfare Optimization

Authors: Mengxiao Zhang, Ramiro Deo-Campo Vuong, Haipeng Luo

Abstract: We consider the problem of online multi-agent Nash social welfare (NSW) maximization. While previous works of Hossain et al. [2021], Jones et al. [2023] study similar problems in stochastic multi-agent multi-armed bandits and show that $\sqrt{T}$-regret is possible after $T$ rounds, their fairness measure is the product of all agents' rewards, instead of their NSW (that is, their geometric mean).… ▽ More We consider the problem of online multi-agent Nash social welfare (NSW) maximization. While previous works of Hossain et al. [2021], Jones et al. [2023] study similar problems in stochastic multi-agent multi-armed bandits and show that $\sqrt{T}$-regret is possible after $T$ rounds, their fairness measure is the product of all agents' rewards, instead of their NSW (that is, their geometric mean). Given the fundamental role of NSW in the fairness literature, it is more than natural to ask whether no-regret fair learning with NSW as the objective is possible. In this work, we provide a complete answer to this question in various settings. Specifically, in stochastic $N$-agent $K$-armed bandits, we develop an algorithm with $\widetilde{\mathcal{O}}\left(K^{\frac{2}{N}}T^{\frac{N-1}{N}}\right)$ regret and prove that the dependence on $T$ is tight, making it a sharp contrast to the $\sqrt{T}$-regret bounds of Hossain et al. [2021], Jones et al. [2023]. We then consider a more challenging version of the problem with adversarial rewards. Somewhat surprisingly, despite NSW being a concave function, we prove that no algorithm can achieve sublinear regret. To circumvent such negative results, we further consider a setting with full-information feedback and design two algorithms with $\sqrt{T}$-regret: the first one has no dependence on $N$ at all and is applicable to not just NSW but a broad class of welfare functions, while the second one has better dependence on $K$ and is preferable when $N$ is small. Finally, we also show that logarithmic regret is possible whenever there exists one agent who is indifferent about different arms. △ Less

Submitted 31 May, 2024; originally announced May 2024.

arXiv:2405.20677 [pdf, other]

Provably Efficient Interactive-Grounded Learning with Personalized Reward

Authors: Mengxiao Zhang, Yuheng Zhang, Haipeng Luo, Paul Mineiro

Abstract: Interactive-Grounded Learning (IGL) [Xie et al., 2021] is a powerful framework in which a learner aims at maximizing unobservable rewards through interacting with an environment and observing reward-dependent feedback on the taken actions. To deal with personalized rewards that are ubiquitous in applications such as recommendation systems, Maghakian et al. [2022] study a version of IGL with contex… ▽ More Interactive-Grounded Learning (IGL) [Xie et al., 2021] is a powerful framework in which a learner aims at maximizing unobservable rewards through interacting with an environment and observing reward-dependent feedback on the taken actions. To deal with personalized rewards that are ubiquitous in applications such as recommendation systems, Maghakian et al. [2022] study a version of IGL with context-dependent feedback, but their algorithm does not come with theoretical guarantees. In this work, we consider the same problem and provide the first provably efficient algorithms with sublinear regret under realizability. Our analysis reveals that the step-function estimator of prior work can deviate uncontrollably due to finite-sample effects. Our solution is a novel Lipschitz reward estimator which underestimates the true reward and enjoys favorable generalization performances. Building on this estimator, we propose two algorithms, one based on explore-then-exploit and the other based on inverse-gap weighting. We apply IGL to learning from image feedback and learning from text feedback, which are reward-free settings that arise in practice. Experimental results showcase the importance of using our Lipschitz reward estimator and the overall effectiveness of our algorithms. △ Less

Submitted 31 May, 2024; originally announced May 2024.

arXiv:2405.15038 [pdf, other]

Preferential Latent Space Models for Networks with Textual Edges

Authors: Maoyu Zhang, Biao Cai, Dong Li, Xiaoyue Niu, **gfei Zhang

Abstract: Many real-world networks contain rich textual information in the edges, such as email networks where an edge between two nodes is an email exchange. Other examples include co-author networks and social media networks. The useful textual information carried in the edges is often discarded in most network analyses, resulting in an incomplete view of the relationships between nodes. In this work, we… ▽ More Many real-world networks contain rich textual information in the edges, such as email networks where an edge between two nodes is an email exchange. Other examples include co-author networks and social media networks. The useful textual information carried in the edges is often discarded in most network analyses, resulting in an incomplete view of the relationships between nodes. In this work, we propose to represent the text document between each pair of nodes as a vector counting the appearances of keywords extracted from the corpus, and introduce a new and flexible preferential latent space network model that can offer direct insights on how contents of the textual exchanges modulate the relationships between nodes. We establish identifiability conditions for the proposed model and tackle model estimation with a computationally efficient projected gradient descent algorithm. We further derive the non-asymptotic error bound of the estimator from each step of the algorithm. The efficacy of our proposed method is demonstrated through simulations and an analysis of the Enron email network. △ Less

Submitted 23 May, 2024; originally announced May 2024.

Comments: 30 pages

MSC Class: G.3; F.2 ACM Class: G.3

arXiv:2405.01425 [pdf, other]

In-and-Out: Algorithmic Diffusion for Sampling Convex Bodies

Authors: Yunbum Kook, Santosh S. Vempala, Matthew S. Zhang

Abstract: We present a new random walk for uniformly sampling high-dimensional convex bodies. It achieves state-of-the-art runtime complexity with stronger guarantees on the output than previously known, namely in Rényi divergence (which implies TV, $\mathcal{W}_2$, KL, $χ^2$). The proof departs from known approaches for polytime algorithms for the problem -- we utilize a stochastic diffusion perspective to… ▽ More We present a new random walk for uniformly sampling high-dimensional convex bodies. It achieves state-of-the-art runtime complexity with stronger guarantees on the output than previously known, namely in Rényi divergence (which implies TV, $\mathcal{W}_2$, KL, $χ^2$). The proof departs from known approaches for polytime algorithms for the problem -- we utilize a stochastic diffusion perspective to show contraction to the target distribution with the rate of convergence determined by functional isoperimetric constants of the stationary density. △ Less

Submitted 2 May, 2024; originally announced May 2024.

Comments: 32 pages

arXiv:2404.06055 [pdf, ps, other]

Online/Offline Learning to Enable Robust Beamforming: Limited Feedback Meets Deep Generative Models

Authors: Ying Li, Zhidi Lin, Kai Li, Michael Minyi Zhang

Abstract: Robust beamforming is a pivotal technique in massive multiple-input multiple-output (MIMO) systems as it mitigates interference among user equipment (UE). One current risk-neutral approach to robust beamforming is the stochastic weighted minimum mean square error method (WMMSE). However, this method necessitates statistical channel information, which is typically inaccessible, particularly in fift… ▽ More Robust beamforming is a pivotal technique in massive multiple-input multiple-output (MIMO) systems as it mitigates interference among user equipment (UE). One current risk-neutral approach to robust beamforming is the stochastic weighted minimum mean square error method (WMMSE). However, this method necessitates statistical channel information, which is typically inaccessible, particularly in fifth-generation new radio frequency division duplex cellular systems with limited feedback. To tackle this challenge, we propose a novel approach that leverages a channel variational auto-encoder (CVAE) to simulate channel behaviors using limited feedback, eliminating the need for specific distribution assumptions present in existing methods. To seamlessly integrate model learning into practical wireless communication systems, this paper introduces two learning strategies to prepare the CVAE model for practical deployment. Firstly, motivated by the digital twin technology, we advocate employing a high-performance channel simulator to generate training data, enabling pretraining of the proposed CVAE while ensuring non-disruption to the practical wireless communication system. Moreover, we present an alternative online method for CVAE learning, where online training data is sourced based on channel estimations using Type II codebook. Numerical results demonstrate the effectiveness of these strategies, highlighting their exceptional performance in channel generation and robust beamforming applications. △ Less

Submitted 9 April, 2024; originally announced April 2024.

arXiv:2404.01697 [pdf, other]

Preventing Model Collapse in Gaussian Process Latent Variable Models

Authors: Ying Li, Zhidi Lin, Feng Yin, Michael Minyi Zhang

Abstract: Gaussian process latent variable models (GPLVMs) are a versatile family of unsupervised learning models commonly used for dimensionality reduction. However, common challenges in modeling data with GPLVMs include inadequate kernel flexibility and improper selection of the projection noise, leading to a type of model collapse characterized by vague latent representations that do not reflect the unde… ▽ More Gaussian process latent variable models (GPLVMs) are a versatile family of unsupervised learning models commonly used for dimensionality reduction. However, common challenges in modeling data with GPLVMs include inadequate kernel flexibility and improper selection of the projection noise, leading to a type of model collapse characterized by vague latent representations that do not reflect the underlying data structure. This paper addresses these issues by, first, theoretically examining the impact of projection variance on model collapse through the lens of a linear GPLVM. Second, we tackle model collapse due to inadequate kernel flexibility by integrating the spectral mixture (SM) kernel and a differentiable random Fourier feature (RFF) kernel approximation, which ensures computational scalability and efficiency through off-the-shelf automatic differentiation tools for learning the kernel hyperparameters, projection variance, and latent representations within the variational inference framework. The proposed GPLVM, named advisedRFLVM, is evaluated across diverse datasets and consistently outperforms various salient competing models, including state-of-the-art variational autoencoders (VAEs) and other GPLVM variants, in terms of informative latent representations and missing data imputation. △ Less

Submitted 18 June, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

Comments: International Conference on Machine Learning (ICML), 2024

arXiv:2402.14840 [pdf, other]

RJUA-MedDQA: A Multimodal Benchmark for Medical Document Question Answering and Clinical Reasoning

Authors: Congyun **, Ming Zhang, Xiaowei Ma, Li Yujiao, Yingbo Wang, Yabo Jia, Yuliang Du, Tao Sun, Haowen Wang, Cong Fan, **jie Gu, Chenfei Chi, Xiangguo Lv, Fangzhou Li, Wei Xue, Yiran Huang

Abstract: Recent advancements in Large Language Models (LLMs) and Large Multi-modal Models (LMMs) have shown potential in various medical applications, such as Intelligent Medical Diagnosis. Although impressive results have been achieved, we find that existing benchmarks do not reflect the complexity of real medical reports and specialized in-depth reasoning capabilities. In this work, we introduced RJUA-Me… ▽ More Recent advancements in Large Language Models (LLMs) and Large Multi-modal Models (LMMs) have shown potential in various medical applications, such as Intelligent Medical Diagnosis. Although impressive results have been achieved, we find that existing benchmarks do not reflect the complexity of real medical reports and specialized in-depth reasoning capabilities. In this work, we introduced RJUA-MedDQA, a comprehensive benchmark in the field of medical specialization, which poses several challenges: comprehensively interpreting imgage content across diverse challenging layouts, possessing numerical reasoning ability to identify abnormal indicators and demonstrating clinical reasoning ability to provide statements of disease diagnosis, status and advice based on medical contexts. We carefully design the data generation pipeline and proposed the Efficient Structural Restoration Annotation (ESRA) Method, aimed at restoring textual and tabular content in medical report images. This method substantially enhances annotation efficiency, doubling the productivity of each annotator, and yields a 26.8% improvement in accuracy. We conduct extensive evaluations, including few-shot assessments of 5 LMMs which are capable of solving Chinese medical QA tasks. To further investigate the limitations and potential of current LMMs, we conduct comparative experiments on a set of strong LLMs by using image-text generated by ESRA method. We report the performance of baselines and offer several observations: (1) The overall performance of existing LMMs is still limited; however LMMs more robust to low-quality and diverse-structured images compared to LLMs. (3) Reasoning across context and image content present significant challenges. We hope this benchmark helps the community make progress on these challenging tasks in multi-modal medical document understanding and facilitate its application in healthcare. △ Less

Submitted 19 February, 2024; originally announced February 2024.

Comments: 15 pages, 13 figures

arXiv:2402.07355 [pdf, ps, other]

Sampling from the Mean-Field Stationary Distribution

Authors: Yunbum Kook, Matthew S. Zhang, Sinho Chewi, Murat A. Erdogdu, Mufan Bill Li

Abstract: We study the complexity of sampling from the stationary distribution of a mean-field SDE, or equivalently, the complexity of minimizing a functional over the space of probability measures which includes an interaction term. Our main insight is to decouple the two key aspects of this problem: (1) approximation of the mean-field SDE via a finite-particle system, via uniform-in-time propagation of ch… ▽ More We study the complexity of sampling from the stationary distribution of a mean-field SDE, or equivalently, the complexity of minimizing a functional over the space of probability measures which includes an interaction term. Our main insight is to decouple the two key aspects of this problem: (1) approximation of the mean-field SDE via a finite-particle system, via uniform-in-time propagation of chaos, and (2) sampling from the finite-particle stationary distribution, via standard log-concave samplers. Our approach is conceptually simpler and its flexibility allows for incorporating the state-of-the-art for both algorithms and theory. This leads to improved guarantees in numerous settings, including better guarantees for optimizing certain two-layer neural networks in the mean-field regime. △ Less

Submitted 18 February, 2024; v1 submitted 11 February, 2024; originally announced February 2024.

arXiv:2402.03933 [pdf]

Development of a Evaluation Tool for Age-Appropriate Software in Aging Environments: A Delphi Study

Authors: Zhenggang Bai, Yougxiang Fang, Hongtu Chen, Xinru Chen, Ning An, Min Zhang, Guoxin Rui, **g **

Abstract: Objective: We aimed to develop a dependable reliable tool for assessing software ageappropriateness. Methods: We conducted a systematic review to get the indicators of technology ageappropriateness from studies from January 2000 to April 2023.This study engaged 25 experts from the fields of anthropology, sociology,and social technology research across, three rounds of Delphi consultations were con… ▽ More Objective: We aimed to develop a dependable reliable tool for assessing software ageappropriateness. Methods: We conducted a systematic review to get the indicators of technology ageappropriateness from studies from January 2000 to April 2023.This study engaged 25 experts from the fields of anthropology, sociology,and social technology research across, three rounds of Delphi consultations were conducted. Experts were asked to screen, assess, add and provide feedback on the preliminary indicators identified in the initial indicator pool. Result: We found 76 criterias for evaluating quality criteria was extracted, grouped into 11 distinct domains. After completing three rounds of Delphi consultations,experts drew upon their personal experiences,theoretical frameworks,and industry insights to arrive at a three-dimensional structure for the evaluation tooluser experience,product quality,and social promotion.These metrics were further distilled into a 16-item scale, and a corresponding questionnaire was formulated.The developed tool exhibited strong internal reliability(Cronbach's Alpha is 0.867)and content validity(S-CVI is 0.93). Conclusion: This tool represents a straightforward,objective,and reliable mechanism for evaluating software's appropriateness across age groups. Moreover,it offers valuable insights and practical guidance for designing and develo** of high-quality age-appropriate software,and assisst age groups to select software they like. △ Less

Submitted 4 February, 2024; originally announced February 2024.

arXiv:2402.03008 [pdf, other]

Diffusive Gibbs Sampling

Authors: Wenlin Chen, Mingtian Zhang, Brooks Paige, José Miguel Hernández-Lobato, David Barber

Abstract: The inadequate mixing of conventional Markov Chain Monte Carlo (MCMC) methods for multi-modal distributions presents a significant challenge in practical applications such as Bayesian inference and molecular dynamics. Addressing this, we propose Diffusive Gibbs Sampling (DiGS), an innovative family of sampling methods designed for effective sampling from distributions characterized by distant and… ▽ More The inadequate mixing of conventional Markov Chain Monte Carlo (MCMC) methods for multi-modal distributions presents a significant challenge in practical applications such as Bayesian inference and molecular dynamics. Addressing this, we propose Diffusive Gibbs Sampling (DiGS), an innovative family of sampling methods designed for effective sampling from distributions characterized by distant and disconnected modes. DiGS integrates recent developments in diffusion models, leveraging Gaussian convolution to create an auxiliary noisy distribution that bridges isolated modes in the original space and applying Gibbs sampling to alternately draw samples from both spaces. A novel Metropolis-within-Gibbs scheme is proposed to enhance mixing in the denoising sampling step. DiGS exhibits a better mixing property for sampling multi-modal distributions than state-of-the-art methods such as parallel tempering, attaining substantially improved performance across various tasks, including mixtures of Gaussians, Bayesian neural networks and molecular dynamics. △ Less

Submitted 29 May, 2024; v1 submitted 5 February, 2024; originally announced February 2024.

Comments: Accepted for publication at ICML 2024. Code available: https://github.com/Wenlin-Chen/DiGS

arXiv:2402.01055 [pdf, other]

Multiclass Learning from Noisy Labels for Non-decomposable Performance Measures

Authors: Mingyuan Zhang, Shivani Agarwal

Abstract: There has been much interest in recent years in learning good classifiers from data with noisy labels. Most work on learning from noisy labels has focused on standard loss-based performance measures. However, many machine learning problems require using non-decomposable performance measures which cannot be expressed as the expectation or sum of a loss on individual examples; these include for exam… ▽ More There has been much interest in recent years in learning good classifiers from data with noisy labels. Most work on learning from noisy labels has focused on standard loss-based performance measures. However, many machine learning problems require using non-decomposable performance measures which cannot be expressed as the expectation or sum of a loss on individual examples; these include for example the H-mean, Q-mean and G-mean in class imbalance settings, and the Micro $F_1$ in information retrieval. In this paper, we design algorithms to learn from noisy labels for two broad classes of multiclass non-decomposable performance measures, namely, monotonic convex and ratio-of-linear, which encompass all the above examples. Our work builds on the Frank-Wolfe and Bisection based methods of Narasimhan et al. (2015). In both cases, we develop noise-corrected versions of the algorithms under the widely studied family of class-conditional noise models. We provide regret (excess risk) bounds for our algorithms, establishing that even though they are trained on noisy data, they are Bayes consistent in the sense that their performance converges to the optimal performance w.r.t. the clean (non-noisy) distribution. Our experiments demonstrate the effectiveness of our algorithms in handling label noise. △ Less

Submitted 23 April, 2024; v1 submitted 1 February, 2024; originally announced February 2024.

arXiv:2401.04900 [pdf, other]

SPT: Spectral Transformer for Red Giant Stars Age and Mass Estimation

Authors: Mengmeng Zhang, Fan Wu, Yude Bu, Shanshan Li, Zhen** Yi, Meng Liu, Xiaoming Kong

Abstract: The age and mass of red giants are essential for understanding the structure and evolution of the Milky Way. Traditional isochrone methods for these estimations are inherently limited due to overlap** isochrones in the Hertzsprung-Russell diagram, while asteroseismology, though more precise, requires high-precision, long-term observations. In response to these challenges, we developed a novel fr… ▽ More The age and mass of red giants are essential for understanding the structure and evolution of the Milky Way. Traditional isochrone methods for these estimations are inherently limited due to overlap** isochrones in the Hertzsprung-Russell diagram, while asteroseismology, though more precise, requires high-precision, long-term observations. In response to these challenges, we developed a novel framework, Spectral Transformer (SPT), to predict the age and mass of red giants aligned with asteroseismology from their spectra. A key component of SPT, the Multi-head Hadamard Self-Attention mechanism, designed specifically for spectra, can capture complex relationships across different wavelength. Further, we introduced a Mahalanobis distance-based loss function to address scale imbalance and interaction mode loss, and incorporated Monte Carlo dropout for quantitative analysis of prediction uncertainty.Trained and tested on 3,880 red giant spectra from LAMOST, the SPT achieved remarkable age and mass estimations with average percentage errors of 17.64% and 6.61%, respectively, and provided uncertainties for each corresponding prediction. The results significantly outperform those of traditional machine learning algorithms and demonstrate a high level of consistency with asteroseismology methods and isochrone fitting techniques. In the future, our work will leverage datasets from the Chinese Space Station Telescope and the Large Synoptic Survey Telescope to enhance the precision of the model and broaden its applicability in the field of astronomy and astrophysics. △ Less

Submitted 9 January, 2024; originally announced January 2024.

Comments: Accepted by A&A

arXiv:2401.04693 [pdf, other]

Co-Clustering Multi-View Data Using the Latent Block Model

Authors: Joshua Tobin, Michaela Black, James Ng, Debbie Rankin, Jonathan Wallace, Catherine Hughes, Leane Hoey, Adrian Moore, **ling Wang, Geraldine Horigan, Paul Carlin, Helene McNulty, Anne M Molloy, Mimi Zhang

Abstract: The Latent Block Model (LBM) is a prominent model-based co-clustering method, returning parametric representations of each block cluster and allowing the use of well-grounded model selection methods. The LBM, while adapted in literature to handle different feature types, cannot be applied to datasets consisting of multiple disjoint sets of features, termed views, for a common set of observations.… ▽ More The Latent Block Model (LBM) is a prominent model-based co-clustering method, returning parametric representations of each block cluster and allowing the use of well-grounded model selection methods. The LBM, while adapted in literature to handle different feature types, cannot be applied to datasets consisting of multiple disjoint sets of features, termed views, for a common set of observations. In this work, we introduce the multi-view LBM, extending the LBM method to multi-view data, where each view marginally follows an LBM. In the case of two views, the dependence between them is captured by a cluster membership matrix, and we aim to learn the structure of this matrix. We develop a likelihood-based approach in which parameter estimation uses a stochastic EM algorithm integrating a Gibbs sampler, and an ICL criterion is derived to determine the number of row and column clusters in each view. To motivate the application of multi-view methods, we extend recent work develo** hypothesis tests for the null hypothesis that clusters of observations in each view are independent of each other. The testing procedure is integrated into the model estimation strategy. Furthermore, we introduce a penalty scheme to generate sparse row clusterings. We verify the performance of the developed algorithm using synthetic datasets, and provide guidance for optimal parameter selection. Finally, the multi-view co-clustering method is applied to a complex genomics dataset, and is shown to provide new insights for high-dimension multi-view problems. △ Less

Submitted 9 January, 2024; originally announced January 2024.

arXiv:2312.08670 [pdf, other]

Temporal-Spatial Entropy Balancing for Causal Continuous Treatment-Effect Estimation

Authors: Tao Hu, Honglong Zhang, Fan Zeng, Min Du, XiangKun Du, Yue Zheng, Quanqi Li, Mengran Zhang, Dan Yang, Jihao Wu

Abstract: In the field of intracity freight transportation, changes in order volume are significantly influenced by temporal and spatial factors. When building subsidy and pricing strategies, predicting the causal effects of these strategies on order volume is crucial. In the process of calculating causal effects, confounding variables can have an impact. Traditional methods to control confounding variables… ▽ More In the field of intracity freight transportation, changes in order volume are significantly influenced by temporal and spatial factors. When building subsidy and pricing strategies, predicting the causal effects of these strategies on order volume is crucial. In the process of calculating causal effects, confounding variables can have an impact. Traditional methods to control confounding variables handle data from a holistic perspective, which cannot ensure the precision of causal effects in specific temporal and spatial dimensions. However, temporal and spatial dimensions are extremely critical in the logistics field, and this limitation may directly affect the precision of subsidy and pricing strategies. To address these issues, this study proposes a technique based on flexible temporal-spatial grid partitioning. Furthermore, based on the flexible grid partitioning technique, we further propose a continuous entropy balancing method in the temporal-spatial domain, which named TS-EBCT (Temporal-Spatial Entropy Balancing for Causal Continue Treatments). The method proposed in this paper has been tested on two simulation datasets and two real datasets, all of which have achieved excellent performance. In fact, after applying the TS-EBCT method to the intracity freight transportation field, the prediction accuracy of the causal effect has been significantly improved. It brings good business benefits to the company's subsidy and pricing strategies. △ Less

Submitted 18 December, 2023; v1 submitted 14 December, 2023; originally announced December 2023.

Comments: 10 pages;

arXiv:2311.18489 [pdf, other]

Sex-Specific Variances in Anatomy and Blood Flow of the Left Main Coronary Bifurcation: Implications for Coronary Artery Disease Risk

Authors: Ramtin Gharleghi, Mingzi Zhang, Dona Adikari, Lucy McGrath-Cadell, Robert M. Graham, Jolanda Wentzel, Mark Webster, Chris Ellis, Sze-Yuan Ooi, Susann Beier

Abstract: Studies have shown marked sex disparities in Coronary Artery Diseases (CAD) epidemiology, yet the underlying mechanisms remain unclear. We explored sex disparities in the coronary anatomy and the resulting haemodynamics in patients with suspected, but no significant CAD. Left Main (LM) bifurcations were reconstructed from CTCA images of 127 cases (42 males and 85 females, aged 38 to 81). Detailed… ▽ More Studies have shown marked sex disparities in Coronary Artery Diseases (CAD) epidemiology, yet the underlying mechanisms remain unclear. We explored sex disparities in the coronary anatomy and the resulting haemodynamics in patients with suspected, but no significant CAD. Left Main (LM) bifurcations were reconstructed from CTCA images of 127 cases (42 males and 85 females, aged 38 to 81). Detailed shape parameters were measured for comparison, including bifurcation angles, curvature, and diameters, before solving the haemodynamic metrics using CFD. The severity and location of the normalised vascular area exposed to physiologically adverse haemodynamics were statistically compared between sexes for all branches. We found significant differences between sexes in potentially adverse haemodynamics. Females were more likely than males to exhibit adversely low Time Averaged Endothelial Shear Stress along the inner wall of a bifurcation (16.8% vs. 10.7%). Males had a higher percentage of areas exposed to both adversely high Relative Residence Time (6.1% vs 4.2%, p=0.001) and high Oscillatory Shear Index (4.6% vs 2.3%, p<0.001). However, the OSI values were generally small and should be interpreted cautiously. Males had larger arteries (M vs F, LM: 4.0mm vs 3.3mm, LAD: 3.6mm 3.0mm, LCX:3.5mm vs 2.9mm), and females exhibited higher curvatures in all three branches (M vs F, LM: 0.40 vs 0.46, LAD: 0.45 vs 0.51, LCx: 0.47 vs 0.55, p<0.001) and larger inflow angle of the LM trunk (M: 12.9° vs F: 18.5°, p=0.025). Haemodynamic differences were found between male and female patients, which may contribute, at least in part, to differences in CAD risk. This work may facilitate a better understanding of sex differences in the clinical presentation of CAD, contributing to improved sex-specific screening, especially relevant for women with CAD who currently have worse predictive outcomes. △ Less

Submitted 30 November, 2023; originally announced November 2023.

Comments: 14 pages, 5 figures

arXiv:2311.08690 [pdf, other]

Enabling CMF Estimation in Data-Constrained Scenarios: A Semantic-Encoding Knowledge Mining Model

Authors: Yanlin Qi, Jia Li, Michael Zhang

Abstract: Precise estimation of Crash Modification Factors (CMFs) is central to evaluating the effectiveness of various road safety treatments and prioritizing infrastructure investment accordingly. While customized study for each countermeasure scenario is desired, the conventional CMF estimation approaches rely heavily on the availability of crash data at given sites. This not only makes the estimation co… ▽ More Precise estimation of Crash Modification Factors (CMFs) is central to evaluating the effectiveness of various road safety treatments and prioritizing infrastructure investment accordingly. While customized study for each countermeasure scenario is desired, the conventional CMF estimation approaches rely heavily on the availability of crash data at given sites. This not only makes the estimation costly, but the results are also less transferable, since the intrinsic similarities between different safety countermeasure scenarios are not fully explored. Aiming to fill this gap, this study introduces a novel knowledge-mining framework for CMF prediction. This framework delves into the connections of existing countermeasures and reduces the reliance of CMF estimation on crash data availability and manual data collection. Specifically, it draws inspiration from human comprehension processes and introduces advanced Natural Language Processing (NLP) techniques to extract intricate variations and patterns from existing CMF knowledge. It effectively encodes unstructured countermeasure scenarios into machine-readable representations and models the complex relationships between scenarios and CMF values. This new data-driven framework provides a cost-effective and adaptable solution that complements the case-specific approaches for CMF estimation, which is particularly beneficial when availability of crash data or time imposes constraints. Experimental validation using real-world CMF Clearinghouse data demonstrates the effectiveness of this new approach, which shows significant accuracy improvements compared to baseline methods. This approach provides insights into new possibilities of harnessing accumulated transportation knowledge in various applications. △ Less

Submitted 14 November, 2023; originally announced November 2023.

Comments: 39 pages, 9 figures

arXiv:2311.00564 [pdf, other]

Online Student-$t$ Processes with an Overall-local Scale Structure for Modelling Non-stationary Data

Authors: Taole Sha, Michael Minyi Zhang

Abstract: Time-dependent data often exhibit characteristics, such as non-stationarity and heavy-tailed errors, that would be inappropriate to model with the typical assumptions used in popular models. Thus, more flexible approaches are required to be able to accommodate such issues. To this end, we propose a Bayesian mixture of student-$t$ processes with an overall-local scale structure for the covariance.… ▽ More Time-dependent data often exhibit characteristics, such as non-stationarity and heavy-tailed errors, that would be inappropriate to model with the typical assumptions used in popular models. Thus, more flexible approaches are required to be able to accommodate such issues. To this end, we propose a Bayesian mixture of student-$t$ processes with an overall-local scale structure for the covariance. Moreover, we use a sequential Monte Carlo (SMC) sampler in order to perform online inference as data arrive in real-time. We demonstrate the superiority of our proposed approach compared to typical Gaussian process-based models on real-world data sets in order to prove the necessity of using mixtures of student-$t$ processes. △ Less

Submitted 1 November, 2023; originally announced November 2023.

Comments: 9 pages,5 figures

MSC Class: 62F15

arXiv:2310.13548 [pdf, other]

Towards Understanding Sycophancy in Language Models

Authors: Mrinank Sharma, Meg Tong, Tomasz Korbak, David Duvenaud, Amanda Askell, Samuel R. Bowman, Newton Cheng, Esin Durmus, Zac Hatfield-Dodds, Scott R. Johnston, Shauna Kravec, Timothy Maxwell, Sam McCandlish, Kamal Ndousse, Oliver Rausch, Nicholas Schiefer, Da Yan, Miranda Zhang, Ethan Perez

Abstract: Human feedback is commonly utilized to finetune AI assistants. But human feedback may also encourage model responses that match user beliefs over truthful ones, a behaviour known as sycophancy. We investigate the prevalence of sycophancy in models whose finetuning procedure made use of human feedback, and the potential role of human preference judgments in such behavior. We first demonstrate that… ▽ More Human feedback is commonly utilized to finetune AI assistants. But human feedback may also encourage model responses that match user beliefs over truthful ones, a behaviour known as sycophancy. We investigate the prevalence of sycophancy in models whose finetuning procedure made use of human feedback, and the potential role of human preference judgments in such behavior. We first demonstrate that five state-of-the-art AI assistants consistently exhibit sycophancy across four varied free-form text-generation tasks. To understand if human preferences drive this broadly observed behavior, we analyze existing human preference data. We find that when a response matches a user's views, it is more likely to be preferred. Moreover, both humans and preference models (PMs) prefer convincingly-written sycophantic responses over correct ones a non-negligible fraction of the time. Optimizing model outputs against PMs also sometimes sacrifices truthfulness in favor of sycophancy. Overall, our results indicate that sycophancy is a general behavior of state-of-the-art AI assistants, likely driven in part by human preference judgments favoring sycophantic responses. △ Less

Submitted 27 October, 2023; v1 submitted 20 October, 2023; originally announced October 2023.

Comments: 32 pages, 20 figures

ACM Class: I.2.6

arXiv:2310.03243 [pdf, other]

Sparse Deep Learning for Time Series Data: Theory and Applications

Authors: Mingxuan Zhang, Yan Sun, Faming Liang

Abstract: Sparse deep learning has become a popular technique for improving the performance of deep neural networks in areas such as uncertainty quantification, variable selection, and large-scale network compression. However, most existing research has focused on problems where the observations are independent and identically distributed (i.i.d.), and there has been little work on the problems where the ob… ▽ More Sparse deep learning has become a popular technique for improving the performance of deep neural networks in areas such as uncertainty quantification, variable selection, and large-scale network compression. However, most existing research has focused on problems where the observations are independent and identically distributed (i.i.d.), and there has been little work on the problems where the observations are dependent, such as time series data and sequential data in natural language processing. This paper aims to address this gap by studying the theory for sparse deep learning with dependent data. We show that sparse recurrent neural networks (RNNs) can be consistently estimated, and their predictions are asymptotically normally distributed under appropriate assumptions, enabling the prediction uncertainty to be correctly quantified. Our numerical results show that sparse deep learning outperforms state-of-the-art methods, such as conformal predictions, in prediction uncertainty quantification for time series data. Furthermore, our results indicate that the proposed method can consistently identify the autoregressive order for time series data and outperform existing methods in large-scale model compression. Our proposed method has important practical implications in fields such as finance, healthcare, and energy, where both accurate point estimates and prediction uncertainty quantification are of concern. △ Less

Submitted 4 October, 2023; originally announced October 2023.

arXiv:2309.06782 [pdf, other]

Improved particle-flow event reconstruction with scalable neural networks for current and future particle detectors

Authors: Joosep Pata, Eric Wulff, Farouk Mokhtar, David Southwick, Mengke Zhang, Maria Girone, Javier Duarte

Abstract: Efficient and accurate algorithms are necessary to reconstruct particles in the highly granular detectors anticipated at the High-Luminosity Large Hadron Collider and the Future Circular Collider. We study scalable machine learning models for event reconstruction in electron-positron collisions based on a full detector simulation. Particle-flow reconstruction can be formulated as a supervised lear… ▽ More Efficient and accurate algorithms are necessary to reconstruct particles in the highly granular detectors anticipated at the High-Luminosity Large Hadron Collider and the Future Circular Collider. We study scalable machine learning models for event reconstruction in electron-positron collisions based on a full detector simulation. Particle-flow reconstruction can be formulated as a supervised learning task using tracks and calorimeter clusters. We compare a graph neural network and kernel-based transformer and demonstrate that we can avoid quadratic operations while achieving realistic reconstruction. We show that hyperparameter tuning significantly improves the performance of the models. The best graph neural network model shows improvement in the jet transverse momentum resolution by up to 50% compared to the rule-based algorithm. The resulting model is portable across Nvidia, AMD and Habana hardware. Accurate and fast machine-learning based reconstruction can significantly improve future measurements at colliders. △ Less

Submitted 8 March, 2024; v1 submitted 13 September, 2023; originally announced September 2023.

Comments: 21 pages, 10 figures

arXiv:2309.05925 [pdf, other]

On Regularized Sparse Logistic Regression

Authors: Mengyuan Zhang, Kai Liu

Abstract: Sparse logistic regression is for classification and feature selection simultaneously. Although many studies have been done to solve $\ell_1$-regularized logistic regression, there is no equivalently abundant work on solving sparse logistic regression with nonconvex regularization term. In this paper, we propose a unified framework to solve $\ell_1$-regularized logistic regression, which can be na… ▽ More Sparse logistic regression is for classification and feature selection simultaneously. Although many studies have been done to solve $\ell_1$-regularized logistic regression, there is no equivalently abundant work on solving sparse logistic regression with nonconvex regularization term. In this paper, we propose a unified framework to solve $\ell_1$-regularized logistic regression, which can be naturally extended to nonconvex regularization term, as long as certain requirement is satisfied. In addition, we also utilize a different line search criteria to guarantee monotone convergence for various regularization terms. Empirical experiments on binary classification tasks with real-world datasets demonstrate our proposed algorithms are capable of performing classification and feature selection effectively at a lower computational cost. △ Less

Submitted 11 October, 2023; v1 submitted 11 September, 2023; originally announced September 2023.

Comments: Accepted to ICDM2023

arXiv:2309.02425 [pdf, ps, other]

On the Minimax Regret in Online Ranking with Top-k Feedback

Authors: Mingyuan Zhang, Ambuj Tewari

Abstract: In online ranking, a learning algorithm sequentially ranks a set of items and receives feedback on its ranking in the form of relevance scores. Since obtaining relevance scores typically involves human annotation, it is of great interest to consider a partial feedback setting where feedback is restricted to the top-$k$ items in the rankings. Chaudhuri and Tewari [2017] developed a framework to ana… ▽ More In online ranking, a learning algorithm sequentially ranks a set of items and receives feedback on its ranking in the form of relevance scores. Since obtaining relevance scores typically involves human annotation, it is of great interest to consider a partial feedback setting where feedback is restricted to the top-$k$ items in the rankings. Chaudhuri and Tewari [2017] developed a framework to analyze online ranking algorithms with top $k$ feedback. A key element in their work was the use of techniques from partial monitoring. In this paper, we further investigate online ranking with top $k$ feedback and solve some open problems posed by Chaudhuri and Tewari [2017]. We provide a full characterization of minimax regret rates with the top $k$ feedback model for all $k$ and for the following ranking performance measures: Pairwise Loss, Discounted Cumulative Gain, and Precision@n. In addition, we give an efficient algorithm that achieves the minimax regret rate for Precision@n. △ Less

Submitted 12 April, 2024; v1 submitted 5 September, 2023; originally announced September 2023.

arXiv:2308.14048 [pdf, other]

A Bayesian Non-parametric Approach to Generative Models: Integrating Variational Autoencoder and Generative Adversarial Networks using Wasserstein and Maximum Mean Discrepancy

Authors: Forough Fazeli-Asl, Michael Minyi Zhang

Abstract: Generative models have emerged as a promising technique for producing high-quality images that are indistinguishable from real images. Generative adversarial networks (GANs) and variational autoencoders (VAEs) are two of the most prominent and widely studied generative models. GANs have demonstrated excellent performance in generating sharp realistic images and VAEs have shown strong abilities to… ▽ More Generative models have emerged as a promising technique for producing high-quality images that are indistinguishable from real images. Generative adversarial networks (GANs) and variational autoencoders (VAEs) are two of the most prominent and widely studied generative models. GANs have demonstrated excellent performance in generating sharp realistic images and VAEs have shown strong abilities to generate diverse images. However, GANs suffer from ignoring a large portion of the possible output space which does not represent the full diversity of the target distribution, and VAEs tend to produce blurry images. To fully capitalize on the strengths of both models while mitigating their weaknesses, we employ a Bayesian non-parametric (BNP) approach to merge GANs and VAEs. Our procedure incorporates both Wasserstein and maximum mean discrepancy (MMD) measures in the loss function to enable effective learning of the latent space and generate diverse and high-quality samples. By fusing the discriminative power of GANs with the reconstruction capabilities of VAEs, our novel model achieves superior performance in various generative tasks, such as anomaly detection and data augmentation. Furthermore, we enhance the model's capability by employing an extra generator in the code space, which enables us to explore areas of the code space that the VAE might have overlooked. With a BNP perspective, we can model the data distribution using an infinite-dimensional space, which provides greater flexibility in the model and reduces the risk of overfitting. By utilizing this framework, we can enhance the performance of both GANs and VAEs to create a more robust generative model suitable for various applications. △ Less

Submitted 27 August, 2023; originally announced August 2023.

arXiv:2306.08352 [pdf, other]

Bayesian Non-linear Latent Variable Modeling via Random Fourier Features

Authors: Michael Minyi Zhang, Gregory W. Gundersen, Barbara E. Engelhardt

Abstract: The Gaussian process latent variable model (GPLVM) is a popular probabilistic method used for nonlinear dimension reduction, matrix factorization, and state-space modeling. Inference for GPLVMs is computationally tractable only when the data likelihood is Gaussian. Moreover, inference for GPLVMs has typically been restricted to obtaining maximum a posteriori point estimates, which can lead to over… ▽ More The Gaussian process latent variable model (GPLVM) is a popular probabilistic method used for nonlinear dimension reduction, matrix factorization, and state-space modeling. Inference for GPLVMs is computationally tractable only when the data likelihood is Gaussian. Moreover, inference for GPLVMs has typically been restricted to obtaining maximum a posteriori point estimates, which can lead to overfitting, or variational approximations, which mischaracterize the posterior uncertainty. Here, we present a method to perform Markov chain Monte Carlo (MCMC) inference for generalized Bayesian nonlinear latent variable modeling. The crucial insight necessary to generalize GPLVMs to arbitrary observation models is that we approximate the kernel function in the Gaussian process map**s with random Fourier features; this allows us to compute the gradient of the posterior in closed form with respect to the latent variables. We show that we can generalize GPLVMs to non-Gaussian observations, such as Poisson, negative binomial, and multinomial distributions, using our random feature latent variable model (RFLVM). Our generalized RFLVMs perform on par with state-of-the-art latent variable models on a wide range of applications, including motion capture, images, and text data for the purpose of estimating the latent structure and imputing the missing data of these complex data sets. △ Less

Submitted 14 June, 2023; originally announced June 2023.

arXiv:2306.03266 [pdf, other]

Extending the Design Space of Graph Neural Networks by Rethinking Folklore Weisfeiler-Lehman

Authors: Jiarui Feng, Lecheng Kong, Hao Liu, Dacheng Tao, Fuhai Li, Muhan Zhang, Yixin Chen

Abstract: Message passing neural networks (MPNNs) have emerged as the most popular framework of graph neural networks (GNNs) in recent years. However, their expressive power is limited by the 1-dimensional Weisfeiler-Lehman (1-WL) test. Some works are inspired by $k$-WL/FWL (Folklore WL) and design the corresponding neural versions. Despite the high expressive power, there are serious limitations in this li… ▽ More Message passing neural networks (MPNNs) have emerged as the most popular framework of graph neural networks (GNNs) in recent years. However, their expressive power is limited by the 1-dimensional Weisfeiler-Lehman (1-WL) test. Some works are inspired by $k$-WL/FWL (Folklore WL) and design the corresponding neural versions. Despite the high expressive power, there are serious limitations in this line of research. In particular, (1) $k$-WL/FWL requires at least $O(n^k)$ space complexity, which is impractical for large graphs even when $k=3$; (2) The design space of $k$-WL/FWL is rigid, with the only adjustable hyper-parameter being $k$. To tackle the first limitation, we propose an extension, $(k,t)$-FWL. We theoretically prove that even if we fix the space complexity to $O(n^k)$ (for any $k\geq 2$) in $(k,t)$-FWL, we can construct an expressiveness hierarchy up to solving the graph isomorphism problem. To tackle the second problem, we propose $k$-FWL+, which considers any equivariant set as neighbors instead of all nodes, thereby greatly expanding the design space of $k$-FWL. Combining these two modifications results in a flexible and powerful framework $(k,t)$-FWL+. We demonstrate $(k,t)$-FWL+ can implement most existing models with matching expressiveness. We then introduce an instance of $(k,t)$-FWL+ called Neighborhood$^2$-FWL (N$^2$-FWL), which is practically and theoretically sound. We prove that N$^2$-FWL is no less powerful than 3-WL, and can encode many substructures while only requiring $O(n^2)$ space. Finally, we design its neural version named N$^2$-GNN and evaluate its performance on various tasks. N$^2$-GNN achieves record-breaking results on ZINC-Subset (0.059), outperforming previous SOTA results by 10.6%. Moreover, N$^2$-GNN achieves new SOTA results on the BREC dataset (71.8%) among all existing high-expressive GNN methods. △ Less

Submitted 14 January, 2024; v1 submitted 5 June, 2023; originally announced June 2023.

Comments: Accepted to NeurIPS 2023

arXiv:2306.00353 [pdf, other]

Constructing Semantics-Aware Adversarial Examples with Probabilistic Perspective

Authors: Andi Zhang, Mingtian Zhang, Damon Wischik

Abstract: We propose a probabilistic perspective on adversarial examples. This perspective allows us to view geometric restrictions on adversarial examples as distributions, enabling a seamless shift towards data-driven, semantic constraints. Building on this foundation, we present a method for creating semantics-aware adversarial examples in a principle way. Leveraging the advanced generalization capabilit… ▽ More We propose a probabilistic perspective on adversarial examples. This perspective allows us to view geometric restrictions on adversarial examples as distributions, enabling a seamless shift towards data-driven, semantic constraints. Building on this foundation, we present a method for creating semantics-aware adversarial examples in a principle way. Leveraging the advanced generalization capabilities of contemporary probabilistic generative models, our method produces adversarial perturbations that maintain the original image's semantics. Moreover, it offers users the flexibility to inject their own understanding of semantics into the adversarial examples. Our empirical findings indicate that the proposed methods achieve enhanced transferability and higher success rates in circumventing adversarial defense mechanisms, while maintaining a low detection rate by human observers. △ Less

Submitted 11 February, 2024; v1 submitted 1 June, 2023; originally announced June 2023.

Comments: 16 pages, 9 figures

arXiv:2305.11650 [pdf, other]

Moment Matching Denoising Gibbs Sampling

Authors: Mingtian Zhang, Alex Hawkins-Hooker, Brooks Paige, David Barber

Abstract: Energy-Based Models (EBMs) offer a versatile framework for modeling complex data distributions. However, training and sampling from EBMs continue to pose significant challenges. The widely-used Denoising Score Matching (DSM) method for scalable EBM training suffers from inconsistency issues, causing the energy model to learn a `noisy' data distribution. In this work, we propose an efficient sampli… ▽ More Energy-Based Models (EBMs) offer a versatile framework for modeling complex data distributions. However, training and sampling from EBMs continue to pose significant challenges. The widely-used Denoising Score Matching (DSM) method for scalable EBM training suffers from inconsistency issues, causing the energy model to learn a `noisy' data distribution. In this work, we propose an efficient sampling framework: (pseudo)-Gibbs sampling with moment matching, which enables effective sampling from the underlying clean model when given a `noisy' model that has been well-trained via DSM. We explore the benefits of our approach compared to related methods and demonstrate how to scale the method to high-dimensional datasets. △ Less

Submitted 19 March, 2024; v1 submitted 19 May, 2023; originally announced May 2023.

arXiv:2305.07813 [pdf, other]

Fast robust location and scatter estimation: a depth-based method

Authors: Maoyu Zhang, Yan Song, Wenlin Dai

Abstract: The minimum covariance determinant (MCD) estimator is ubiquitous in multivariate analysis, the critical step of which is to select a subset of a given size with the lowest sample covariance determinant. The concentration step (C-step) is a common tool for subset-seeking; however, it becomes computationally demanding for high-dimensional data. To alleviate the challenge, we propose a depth-based al… ▽ More The minimum covariance determinant (MCD) estimator is ubiquitous in multivariate analysis, the critical step of which is to select a subset of a given size with the lowest sample covariance determinant. The concentration step (C-step) is a common tool for subset-seeking; however, it becomes computationally demanding for high-dimensional data. To alleviate the challenge, we propose a depth-based algorithm, termed as \texttt{FDB}, which replaces the optimal subset with the trimmed region induced by statistical depth. We show that the depth-based region is consistent with the MCD-based subset under a specific class of depth notions, for instance, the projection depth. With the two suggested depths, the \texttt{FDB} estimator is not only computationally more efficient but also reaches the same level of robustness as the MCD estimator. Extensive simulation studies are conducted to assess the empirical performance of our estimators. We also validate the computational efficiency and robustness of our estimators under several typical tasks such as principal component analysis, linear discriminant analysis, image denoise and outlier detection on real-life datasets. A R package \textit{FDB} and potential extensions are available in the Supplementary Materials. △ Less

Submitted 12 May, 2023; originally announced May 2023.

arXiv:2303.12677 [pdf, other]

Learning Brain Connectivity in Social Cognition with Dynamic Network Regression

Authors: Maoyu Zhang, Biao Cai, Wenlin Dai, Dehan Kong, Hongyu Zhao, **gfei Zhang

Abstract: Dynamic networks have been increasingly used to characterize brain connectivity that varies during resting and task states. In such characterizations, a connectivity network is typically measured at each time point for a subject over a common set of nodes representing brain regions, together with rich subject-level information. A common approach to analyzing such data is an edge-based method that… ▽ More Dynamic networks have been increasingly used to characterize brain connectivity that varies during resting and task states. In such characterizations, a connectivity network is typically measured at each time point for a subject over a common set of nodes representing brain regions, together with rich subject-level information. A common approach to analyzing such data is an edge-based method that models the connectivity between each pair of nodes separately. However, such approach may have limited performance when the noise level is high and the number of subjects is limited, as it does not take advantage of the inherent network structure. To better understand if and how the subject-level covariates affect the dynamic brain connectivity, we introduce a semi-parametric dynamic network response regression that relates a dynamic brain connectivity network to a vector of subject-level covariates. A key advantage of our method is to exploit the structure of dynamic imaging coefficients in the form of high-order tensors. We develop an efficient estimation algorithm and evaluate the efficacy of our approach through simulation studies. Finally, we present our results on the analysis of a task-related study on social cognition in the Human Connectome Project, where we identify known sex-specific effects on brain connectivity that cannot be inferred using alternative methods. △ Less

Submitted 22 March, 2023; originally announced March 2023.

arXiv:2303.02637 [pdf, other]

A Semi-Bayesian Nonparametric Estimator of the Maximum Mean Discrepancy Measure: Applications in Goodness-of-Fit Testing and Generative Adversarial Networks

Authors: Forough Fazeli-Asl, Michael Minyi Zhang, Lizhen Lin

Abstract: A classic inferential statistical problem is the goodness-of-fit (GOF) test. Such a test can be challenging when the hypothesized parametric model has an intractable likelihood and its distributional form is not available. Bayesian methods for GOF can be appealing due to their ability to incorporate expert knowledge through prior distributions. However, standard Bayesian methods for this test of… ▽ More A classic inferential statistical problem is the goodness-of-fit (GOF) test. Such a test can be challenging when the hypothesized parametric model has an intractable likelihood and its distributional form is not available. Bayesian methods for GOF can be appealing due to their ability to incorporate expert knowledge through prior distributions. However, standard Bayesian methods for this test often require strong distributional assumptions on the data and their relevant parameters. To address this issue, we propose a semi-Bayesian nonparametric (semi-BNP) procedure in the context of the maximum mean discrepancy (MMD) measure that can be applied to the GOF test. Our method introduces a novel Bayesian estimator for the MMD, enabling the development of a measure-based hypothesis test for intractable models. Through extensive experiments, we demonstrate that our proposed test outperforms frequentist MMD-based methods by achieving a lower false rejection and acceptance rate of the null hypothesis. Furthermore, we showcase the versatility of our approach by embedding the proposed estimator within a generative adversarial network (GAN) framework. It facilitates a robust BNP learning approach as another significant application of our method. With our BNP procedure, this new GAN approach can enhance sample diversity and improve inferential accuracy compared to traditional techniques. △ Less

Submitted 10 November, 2023; v1 submitted 5 March, 2023; originally announced March 2023.

Comments: Typos corrected, Secondary (simulation and theoretical) results added, Additional discussion added, references added

arXiv:2302.08049 [pdf, ps, other]

Improved Discretization Analysis for Underdamped Langevin Monte Carlo

Authors: Matthew Zhang, Sinho Chewi, Mufan Bill Li, Krishnakumar Balasubramanian, Murat A. Erdogdu

Abstract: Underdamped Langevin Monte Carlo (ULMC) is an algorithm used to sample from unnormalized densities by leveraging the momentum of a particle moving in a potential well. We provide a novel analysis of ULMC, motivated by two central questions: (1) Can we obtain improved sampling guarantees beyond strong log-concavity? (2) Can we achieve acceleration for sampling? For (1), prior results for ULMC onl… ▽ More Underdamped Langevin Monte Carlo (ULMC) is an algorithm used to sample from unnormalized densities by leveraging the momentum of a particle moving in a potential well. We provide a novel analysis of ULMC, motivated by two central questions: (1) Can we obtain improved sampling guarantees beyond strong log-concavity? (2) Can we achieve acceleration for sampling? For (1), prior results for ULMC only hold under a log-Sobolev inequality together with a restrictive Hessian smoothness condition. Here, we relax these assumptions by removing the Hessian smoothness condition and by considering distributions satisfying a Poincaré inequality. Our analysis achieves the state of art dimension dependence, and is also flexible enough to handle weakly smooth potentials. As a byproduct, we also obtain the first KL divergence guarantees for ULMC without Hessian smoothness under strong log-concavity, which is based on a new result on the log-Sobolev constant along the underdamped Langevin diffusion. For (2), the recent breakthrough of Cao, Lu, and Wang (2020) established the first accelerated result for sampling in continuous time via PDE methods. Our discretization analysis translates their result into an algorithmic guarantee, which indeed enjoys better condition number dependence than prior works on ULMC, although we leave open the question of full acceleration in discrete time. Both (1) and (2) necessitate Rényi discretization bounds, which are more challenging than the typically used Wasserstein coupling arguments. We address this using a flexible discretization analysis based on Girsanov's theorem that easily extends to more general settings. △ Less

Submitted 15 February, 2023; originally announced February 2023.

arXiv:2212.03905 [pdf, other]

Multi-Rate VAE: Train Once, Get the Full Rate-Distortion Curve

Authors: Juhan Bae, Michael R. Zhang, Michael Ruan, Eric Wang, So Hasegawa, Jimmy Ba, Roger Grosse

Abstract: Variational autoencoders (VAEs) are powerful tools for learning latent representations of data used in a wide range of applications. In practice, VAEs usually require multiple training rounds to choose the amount of information the latent variable should retain. This trade-off between the reconstruction error (distortion) and the KL divergence (rate) is typically parameterized by a hyperparameter… ▽ More Variational autoencoders (VAEs) are powerful tools for learning latent representations of data used in a wide range of applications. In practice, VAEs usually require multiple training rounds to choose the amount of information the latent variable should retain. This trade-off between the reconstruction error (distortion) and the KL divergence (rate) is typically parameterized by a hyperparameter $β$. In this paper, we introduce Multi-Rate VAE (MR-VAE), a computationally efficient framework for learning optimal parameters corresponding to various $β$ in a single training run. The key idea is to explicitly formulate a response function that maps $β$ to the optimal parameters using hypernetworks. MR-VAEs construct a compact response hypernetwork where the pre-activations are conditionally gated based on $β$. We justify the proposed architecture by analyzing linear VAEs and showing that it can represent response functions exactly for linear VAEs. With the learned hypernetwork, MR-VAEs can construct the rate-distortion curve without additional training and can be deployed with significantly less hyperparameter tuning. Empirically, our approach is competitive and often exceeds the performance of multiple $β$-VAEs training with minimal computation and memory overheads. △ Less

Submitted 16 August, 2023; v1 submitted 7 December, 2022; originally announced December 2022.

Comments: 22 pages, 9 figures

arXiv:2211.00407 [pdf, other]

Missing data interpolation in integrative multi-cohort analysis with disparate covariate information

Authors: Ekaterina Smirnova, Yongqi Zhong, Rasha Alsaadawi, Xu Ning, Amii Kress, Jordan Kuiper, Mingyu Zhang, Kristen Lyall, Sheenas Martenies, Akram Alshawabkeh, Catherine Bulka, Carlos Camargo, Jaeun Choi, Elena Colicino, Anne Dunlop, Michael Elliott, Assiamira Ferrara, Tebeb Gebrestadik, Jiang Gui, Kylie Harrall, Tina Hartert, Barry Lester, Andrew Manigault, Justin Manjourides, Yu Ni , et al. (4 additional authors not shown)

Abstract: Integrative analysis of datasets generated by multiple cohorts is a widely-used approach for increasing sample size, precision of population estimators, and generalizability of analysis results in epidemiological studies. However, often each individual cohort dataset does not have all variables of interest for an integrative analysis collected as a part of an original study. Such cohort-level miss… ▽ More Integrative analysis of datasets generated by multiple cohorts is a widely-used approach for increasing sample size, precision of population estimators, and generalizability of analysis results in epidemiological studies. However, often each individual cohort dataset does not have all variables of interest for an integrative analysis collected as a part of an original study. Such cohort-level missingness poses methodological challenges to the integrative analysis since missing variables have traditionally: (1) been removed from the data for complete case analysis; or (2) been completed by missing data interpolation techniques using data with the same covariate distribution from other studies. In most integrative-analysis studies, neither approach is optimal as it leads to either loosing the majority of study covariates or challenges in specifying the cohorts following the same distributions. We propose a novel approach to identify the studies with same distributions that could be used for completing the cohort-level missing information. Our methodology relies on (1) identifying sub-groups of cohorts with similar covariate distributions using cohort identity random forest prediction models followed by clustering; and then (2) applying a recursive pairwise distribution test for high dimensional data to these sub-groups. Extensive simulation studies show that cohorts with the same distribution are correctly grouped together in almost all simulation settings. Our methods' application to two ECHO-wide Cohort Studies reveals that the cohorts grouped together reflect the similarities in study design. The methods are implemented in R software package relate. △ Less

Submitted 1 November, 2022; originally announced November 2022.

arXiv:2210.13596 [pdf, other]

Fast Community Detection in Dynamic and Heterogeneous Networks

Authors: Maoyu Zhang, **gfei Zhang, Wenlin Dai

Abstract: Dynamic heterogeneous networks describe the temporal evolution of interactions among nodes and edges of different types. While there is a rich literature on finding communities in dynamic networks, the application of these methods to dynamic heterogeneous networks can be inappropriate, due to the involvement of different types of nodes and edges and the need to treat them differently. In this pape… ▽ More Dynamic heterogeneous networks describe the temporal evolution of interactions among nodes and edges of different types. While there is a rich literature on finding communities in dynamic networks, the application of these methods to dynamic heterogeneous networks can be inappropriate, due to the involvement of different types of nodes and edges and the need to treat them differently. In this paper, we propose a statistical framework for detecting common communities in dynamic and heterogeneous networks. Under this framework, we develop a fast community detection method called DHNet that can efficiently estimate the community label as well as the number of communities. An attractive feature of DHNet is that it does not require the number of communities to be known a priori, a common assumption in community detection methods. While DHNet does not require any parametric assumptions on the underlying network model, we show that the identified label is consistent under a time-varying heterogeneous stochastic block model with a temporal correlation structure and edge sparsity. We further illustrate the utility of DHNet through simulations and an application to review data from Yelp, where DHNet shows improvements both in terms of accuracy and interpretability over existing solutions. △ Less

Submitted 29 October, 2022; v1 submitted 24 October, 2022; originally announced October 2022.

arXiv:2210.03612 [pdf, ps, other]

1st ICLR International Workshop on Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data (PAIR^2Struct)

Authors: Hao Wang, Wanyu Lin, Hao He, Di Wang, Chengzhi Mao, Muhan Zhang

Abstract: Recent years have seen advances on principles and guidance relating to accountable and ethical use of artificial intelligence (AI) spring up around the globe. Specifically, Data Privacy, Accountability, Interpretability, Robustness, and Reasoning have been broadly recognized as fundamental principles of using machine learning (ML) technologies on decision-critical and/or privacy-sensitive applicat… ▽ More Recent years have seen advances on principles and guidance relating to accountable and ethical use of artificial intelligence (AI) spring up around the globe. Specifically, Data Privacy, Accountability, Interpretability, Robustness, and Reasoning have been broadly recognized as fundamental principles of using machine learning (ML) technologies on decision-critical and/or privacy-sensitive applications. On the other hand, in tremendous real-world applications, data itself can be well represented as various structured formalisms, such as graph-structured data (e.g., networks), grid-structured data (e.g., images), sequential data (e.g., text), etc. By exploiting the inherently structured knowledge, one can design plausible approaches to identify and use more relevant variables to make reliable decisions, thereby facilitating real-world deployments. △ Less

Submitted 7 October, 2022; originally announced October 2022.

arXiv:2210.01376 [pdf, ps, other]

Improved High-Probability Regret for Adversarial Bandits with Time-Varying Feedback Graphs

Authors: Haipeng Luo, Hanghang Tong, Mengxiao Zhang, Yuheng Zhang

Abstract: We study high-probability regret bounds for adversarial $K$-armed bandits with time-varying feedback graphs over $T$ rounds. For general strongly observable graphs, we develop an algorithm that achieves the optimal regret $\widetilde{\mathcal{O}}((\sum_{t=1}^Tα_t)^{1/2}+\max_{t\in[T]}α_t)$ with high probability, where $α_t$ is the independence number of the feedback graph at round $t$. Compared to… ▽ More We study high-probability regret bounds for adversarial $K$-armed bandits with time-varying feedback graphs over $T$ rounds. For general strongly observable graphs, we develop an algorithm that achieves the optimal regret $\widetilde{\mathcal{O}}((\sum_{t=1}^Tα_t)^{1/2}+\max_{t\in[T]}α_t)$ with high probability, where $α_t$ is the independence number of the feedback graph at round $t$. Compared to the best existing result [Neu, 2015] which only considers graphs with self-loops for all nodes, our result not only holds more generally, but importantly also removes any $\text{poly}(K)$ dependence that can be prohibitively large for applications such as contextual bandits. Furthermore, we also develop the first algorithm that achieves the optimal high-probability regret bound for weakly observable graphs, which even improves the best expected regret bound of [Alon et al., 2015] by removing the $\mathcal{O}(\sqrt{KT})$ term with a refined analysis. Our algorithms are based on the online mirror descent framework, but importantly with an innovative combination of several techniques. Notably, while earlier works use optimistic biased loss estimators for achieving high-probability bounds, we find it important to use a pessimistic one for nodes without self-loop in a strongly observable graph. △ Less

Submitted 29 January, 2023; v1 submitted 4 October, 2022; originally announced October 2022.

arXiv:2210.00847 [pdf, other]

Review of Clustering Methods for Functional Data

Authors: Mimi Zhang, Andrew Parnell

Abstract: Functional data clustering is to identify heterogeneous morphological patterns in the continuous functions underlying the discrete measurements/observations. Application of functional data clustering has appeared in many publications across various fields of sciences, including but not limited to biology, (bio)chemistry, engineering, environmental science, medical science, psychology, social scien… ▽ More Functional data clustering is to identify heterogeneous morphological patterns in the continuous functions underlying the discrete measurements/observations. Application of functional data clustering has appeared in many publications across various fields of sciences, including but not limited to biology, (bio)chemistry, engineering, environmental science, medical science, psychology, social science, etc. The phenomenal growth of the application of functional data clustering indicates the urgent need for a systematic approach to develop efficient clustering methods and scalable algorithmic implementations. On the other hand, there is abundant literature on the cluster analysis of time series, trajectory data, spatio-temporal data, etc., which are all related to functional data. Therefore, an overarching structure of existing functional data clustering methods will enable the cross-pollination of ideas across various research fields. We here conduct a comprehensive review of original clustering methods for functional data. We propose a systematic taxonomy that explores the connections and differences among the existing functional data clustering methods and relates them to the conventional multivariate clustering methods. The structure of the taxonomy is built on three main attributes of a functional data clustering method and therefore is more reliable than existing categorizations. The review aims to bridge the gap between the functional data analysis community and the clustering community and to generate new principles for functional data clustering. △ Less

Submitted 3 October, 2022; originally announced October 2022.

arXiv:2209.08858 [pdf, other]

Rethinking Knowledge Graph Evaluation Under the Open-World Assumption

Authors: Haotong Yang, Zhouchen Lin, Muhan Zhang

Abstract: Most knowledge graphs (KGs) are incomplete, which motivates one important research topic on automatically complementing knowledge graphs. However, evaluation of knowledge graph completion (KGC) models often ignores the incompleteness -- facts in the test set are ranked against all unknown triplets which may contain a large number of missing facts not included in the KG yet. Treating all unknown tr… ▽ More Most knowledge graphs (KGs) are incomplete, which motivates one important research topic on automatically complementing knowledge graphs. However, evaluation of knowledge graph completion (KGC) models often ignores the incompleteness -- facts in the test set are ranked against all unknown triplets which may contain a large number of missing facts not included in the KG yet. Treating all unknown triplets as false is called the closed-world assumption. This closed-world assumption might negatively affect the fairness and consistency of the evaluation metrics. In this paper, we study KGC evaluation under a more realistic setting, namely the open-world assumption, where unknown triplets are considered to include many missing facts not included in the training or test sets. For the currently most used metrics such as mean reciprocal rank (MRR) and Hits@K, we point out that their behavior may be unexpected under the open-world assumption. Specifically, with not many missing facts, their numbers show a logarithmic trend with respect to the true strength of the model, and thus, the metric increase could be insignificant in terms of reflecting the true model improvement. Further, considering the variance, we show that the degradation in the reported numbers may result in incorrect comparisons between different models, where stronger models may have lower metric numbers. We validate the phenomenon both theoretically and experimentally. Finally, we suggest possible causes and solutions for this problem. Our code and data are available at https://github.com/GraphPKU/Open-World-KG . △ Less

Submitted 19 September, 2022; originally announced September 2022.

Comments: Accepted at NeurIPS 2022

arXiv:2209.07396 [pdf, other]

Towards Healing the Blindness of Score Matching

Authors: Mingtian Zhang, Oscar Key, Peter Hayes, David Barber, Brooks Paige, François-Xavier Briol

Abstract: Score-based divergences have been widely used in machine learning and statistics applications. Despite their empirical success, a blindness problem has been observed when using these for multi-modal distributions. In this work, we discuss the blindness problem and propose a new family of divergences that can mitigate the blindness problem. We illustrate our proposed divergence in the context of de… ▽ More Score-based divergences have been widely used in machine learning and statistics applications. Despite their empirical success, a blindness problem has been observed when using these for multi-modal distributions. In this work, we discuss the blindness problem and propose a new family of divergences that can mitigate the blindness problem. We illustrate our proposed divergence in the context of density estimation and report improved performance compared to traditional approaches. △ Less

Submitted 15 October, 2022; v1 submitted 15 September, 2022; originally announced September 2022.

arXiv:2209.03617 [pdf, other]

Model-free Subsampling Method Based on Uniform Designs

Authors: Mei Zhang, Yongdao Zhou, Zheng Zhou, Aijun Zhang

Abstract: Subsampling or subdata selection is a useful approach in large-scale statistical learning. Most existing studies focus on model-based subsampling methods which significantly depend on the model assumption. In this paper, we consider the model-free subsampling strategy for generating subdata from the original full data. In order to measure the goodness of representation of a subdata with respect to… ▽ More Subsampling or subdata selection is a useful approach in large-scale statistical learning. Most existing studies focus on model-based subsampling methods which significantly depend on the model assumption. In this paper, we consider the model-free subsampling strategy for generating subdata from the original full data. In order to measure the goodness of representation of a subdata with respect to the original data, we propose a criterion, generalized empirical F-discrepancy (GEFD), and study its theoretical properties in connection with the classical generalized L2-discrepancy in the theory of uniform designs. These properties allow us to develop a kind of low-GEFD data-driven subsampling method based on the existing uniform designs. By simulation examples and a real case study, we show that the proposed subsampling method is superior to the random sampling method. Moreover, our method keeps robust under diverse model specifications while other popular subsampling methods are under-performing. In practice, such a model-free property is more appealing than the model-based subsampling methods, where the latter may have poor performance when the model is misspecified, as demonstrated in our simulation studies. △ Less

Submitted 8 September, 2022; originally announced September 2022.

arXiv:2208.00114 [pdf, other]

Outcome Adaptive Propensity Score Methods for Handling Censoring and High-Dimensionality: Application to Insurance Claims

Authors: Youfei Yu, Jiacong Du, Min Zhang, Zhenke Wu, Andrew M. Ryan, Bhramar Mukherjee

Abstract: Propensity scores are commonly used to reduce the confounding bias in non-randomized observational studies for estimating the average treatment effect. An important assumption underlying this approach is that all confounders that are associated with both the treatment and the outcome of interest are measured and included in the propensity score model. In the absence of strong prior knowledge about… ▽ More Propensity scores are commonly used to reduce the confounding bias in non-randomized observational studies for estimating the average treatment effect. An important assumption underlying this approach is that all confounders that are associated with both the treatment and the outcome of interest are measured and included in the propensity score model. In the absence of strong prior knowledge about potential confounders, researchers may agnostically want to adjust for a high-dimensional set of pre-treatment variables. As such, variable selection procedure is needed for propensity score estimation. In addition, recent studies show that including variables related to treatment only in the propensity score model may inflate the variance of the treatment effect estimates, while including variables that are predictive of only the outcome can improve efficiency. In this paper, we propose a flexible approach to incorporating outcome-covariate relationship in the propensity score model by including the predicted binary outcome probability (OP) as a covariate. Our approach can be easily adapted to an ensemble of variable selection methods, including regularization methods and modern machine learning tools based on classification and regression trees. We evaluate our method to estimate the treatment effects on a binary outcome, which is possibly censored, among multiple treatment groups. Simulation studies indicate that incorporating OP for estimating the propensity scores can improve statistical efficiency and protect against model misspecification. The proposed methods are applied to a cohort of advanced stage prostate cancer patients identified from a private insurance claims database for comparing the adverse effects of four commonly used drugs for treating castration-resistant prostate cancer. △ Less

Submitted 29 July, 2022; originally announced August 2022.

Comments: 22 pages, 6 figures, 2 tables

arXiv:2207.08556 [pdf, other]

A Certifiable Security Patch for Object Tracking in Self-Driving Systems via Historical Deviation Modeling

Authors: Xudong Pan, Qifan Xiao, Mi Zhang, Min Yang

Abstract: Self-driving cars (SDC) commonly implement the perception pipeline to detect the surrounding obstacles and track their moving trajectories, which lays the ground for the subsequent driving decision making process. Although the security of obstacle detection in SDC is intensively studied, not until very recently the attackers start to exploit the vulnerability of the tracking module. Compared with… ▽ More Self-driving cars (SDC) commonly implement the perception pipeline to detect the surrounding obstacles and track their moving trajectories, which lays the ground for the subsequent driving decision making process. Although the security of obstacle detection in SDC is intensively studied, not until very recently the attackers start to exploit the vulnerability of the tracking module. Compared with solely attacking the object detectors, this new attack strategy influences the driving decision more effectively with less attack budgets. However, little is known on whether the revealed vulnerability remains effective in end-to-end self-driving systems and, if so, how to mitigate the threat. In this paper, we present the first systematic research on the security of object tracking in SDC. Through a comprehensive case study on the full perception pipeline of a popular open-sourced self-driving system, Baidu's Apollo, we prove the mainstream multi-object tracker (MOT) based on Kalman Filter (KF) is unsafe even with an enabled multi-sensor fusion mechanism. Our root cause analysis reveals, the vulnerability is innate to the design of KF-based MOT, which shall error-handle the prediction results from the object detectors yet the adopted KF algorithm is prone to trust the observation more when its deviation from the prediction is larger. To address this design flaw, we propose a simple yet effective security patch for KF-based MOT, the core of which is an adaptive strategy to balance the focus of KF on observations and predictions according to the anomaly index of the observation-prediction deviation, and has certified effectiveness against a generalized hijacking attack model. Extensive evaluation on $4$ KF-based existing MOT implementations (including 2D and 3D, academic and Apollo ones) validate the defense effectiveness and the trivial performance overhead of our approach. △ Less

Submitted 18 July, 2022; originally announced July 2022.

arXiv:2206.14371 [pdf, other]

Matryoshka: Stealing Functionality of Private ML Data by Hiding Models in Model

Authors: Xudong Pan, Yifan Yan, Shengyao Zhang, Mi Zhang, Min Yang

Abstract: In this paper, we present a novel insider attack called Matryoshka, which employs an irrelevant scheduled-to-publish DNN model as a carrier model for covert transmission of multiple secret models which memorize the functionality of private ML data stored in local data centers. Instead of treating the parameters of the carrier model as bit strings and applying conventional steganography, we devise… ▽ More In this paper, we present a novel insider attack called Matryoshka, which employs an irrelevant scheduled-to-publish DNN model as a carrier model for covert transmission of multiple secret models which memorize the functionality of private ML data stored in local data centers. Instead of treating the parameters of the carrier model as bit strings and applying conventional steganography, we devise a novel parameter sharing approach which exploits the learning capacity of the carrier model for information hiding. Matryoshka simultaneously achieves: (i) High Capacity -- With almost no utility loss of the carrier model, Matryoshka can hide a 26x larger secret model or 8 secret models of diverse architectures spanning different application domains in the carrier model, neither of which can be done with existing steganography techniques; (ii) Decoding Efficiency -- once downloading the published carrier model, an outside colluder can exclusively decode the hidden models from the carrier model with only several integer secrets and the knowledge of the hidden model architecture; (iii) Effectiveness -- Moreover, almost all the recovered models have similar performance as if it were trained independently on the private data; (iv) Robustness -- Information redundancy is naturally implemented to achieve resilience against common post-processing techniques on the carrier before its publishing; (v) Covertness -- A model inspector with different levels of prior knowledge could hardly differentiate a carrier model from a normal model. △ Less

Submitted 28 June, 2022; originally announced June 2022.

Comments: A preprint work

arXiv:2206.04349 [pdf, other]

doi 10.1016/j.neucom.2020.10.117

Deep radiomic signature with immune cell markers predicts the survival of glioma patients

Authors: Ahmad Chaddad, Paul Daniel Mingli Zhang, Saima Rathore, Paul Sargos, Christian Desrosiers, Tamim Niazi

Abstract: Imaging biomarkers offer a non-invasive way to predict the response of immunotherapy prior to treatment. In this work, we propose a novel type of deep radiomic features (DRFs) computed from a convolutional neural network (CNN), which capture tumor characteristics related to immune cell markers and overall survival. Our study uses four MRI sequences (T1-weighted, T1-weighted post-contrast, T2-weigh… ▽ More Imaging biomarkers offer a non-invasive way to predict the response of immunotherapy prior to treatment. In this work, we propose a novel type of deep radiomic features (DRFs) computed from a convolutional neural network (CNN), which capture tumor characteristics related to immune cell markers and overall survival. Our study uses four MRI sequences (T1-weighted, T1-weighted post-contrast, T2-weighted and FLAIR) with corresponding immune cell markers of 151 patients with brain tumor. The proposed method extracts a total of 180 DRFs by aggregating the activation maps of a pre-trained 3D-CNN within labeled tumor regions of MRI scans. These features offer a compact, yet powerful representation of regional texture encoding tissue heterogeneity. A comprehensive set of experiments is performed to assess the relationship between the proposed DRFs and immune cell markers, and measure their association with overall survival. Results show a high correlation between DRFs and various markers, as well as significant differences between patients grouped based on these markers. Moreover, combining DRFs, clinical features and immune cell markers as input to a random forest classifier helps discriminate between short and long survival outcomes, with AUC of 72\% and p=2.36$\times$10$^{-5}$. These results demonstrate the usefulness of proposed DRFs as non-invasive biomarker for predicting treatment response in patients with brain tumors. △ Less

Submitted 9 June, 2022; originally announced June 2022.

Journal ref: Neurocomputing, Volume 469, 16 January 2022, Pages 366-375

arXiv:2206.03955 [pdf, other]

Out-of-Distribution Detection with Class Ratio Estimation

Authors: Mingtian Zhang, Andi Zhang, Tim Z. Xiao, Yitong Sun, Steven McDonagh

Abstract: Density-based Out-of-distribution (OOD) detection has recently been shown unreliable for the task of detecting OOD images. Various density ratio based approaches achieve good empirical performance, however methods typically lack a principled probabilistic modelling explanation. In this work, we propose to unify density ratio based methods under a novel framework that builds energy-based models and… ▽ More Density-based Out-of-distribution (OOD) detection has recently been shown unreliable for the task of detecting OOD images. Various density ratio based approaches achieve good empirical performance, however methods typically lack a principled probabilistic modelling explanation. In this work, we propose to unify density ratio based methods under a novel framework that builds energy-based models and employs differing base distributions. Under our framework, the density ratio can be viewed as the unnormalized density of an implicit semantic distribution. Further, we propose to directly estimate the density ratio of a data sample through class ratio estimation. We report competitive results on OOD image problems in comparison with recent work that alternatively requires training of deep generative models for the task. Our approach enables a simple and yet effective path towards solving the OOD detection problem. △ Less

Submitted 8 June, 2022; originally announced June 2022.

arXiv:2206.01897 [pdf, other]

doi 10.1109/ISBI48211.2021.9434053

Modeling of Textures to Predict Immune Cell Status and Survival of Brain Tumour Patients

Authors: Ahmad Chaddad, Mingli Zhang, Lama Hassan, Tamim Niazi

Abstract: Radiomics has shown a capability for different types of cancers such as glioma to predict the clinical outcome. It can have a non-invasive means of evaluating the immunotherapy response prior to treatment. However, the use of deep convolutional neural networks (CNNs)-based radiomics requires large training image sets. To avoid this problem, we investigate a new imaging features that model distribu… ▽ More Radiomics has shown a capability for different types of cancers such as glioma to predict the clinical outcome. It can have a non-invasive means of evaluating the immunotherapy response prior to treatment. However, the use of deep convolutional neural networks (CNNs)-based radiomics requires large training image sets. To avoid this problem, we investigate a new imaging features that model distribution with a Gaussian mixture model (GMM) of learned 3D CNN features. Using these deep radiomic features (DRFs), we aim to predict the immune marker status (low versus high) and overall survival for glioma patients. We extract the DRFs by aggregating the activation maps of a pre-trained 3D-CNN within labeled tumor regions of MRI scans that corresponded immune markers of 151 patients. Our experiments are performed to assess the relationship between the proposed DRFs, three immune cell markers (Macrophage M1, Neutrophils and T Cells Follicular Helper), and measure their association with overall survival. Using the random forest (RF) model, DRFs was able to predict the immune marker status with area under the ROC curve (AUC) of 78.67, 83.93 and 75.67\% for Macrophage M1, Neutrophils and T Cells Follicular Helper, respectively. Combined the immune markers with DRFs and clinical variables, Kaplan-Meier estimator and Log-rank test achieved the most significant difference between predicted groups of patients (short-term versus long-term survival) with p\,=\,4.31$\times$10$^{-7}$ compared to p\,=\,0.03 for Immune cell markers, p\,=\,0.07 for clinical variables , and p\,=\,1.45$\times$10$^{-5}$ for DRFs. Our findings indicate that the proposed features (DRFs) used in RF models may significantly consider prognosticating patients with brain tumour prior to surgery through regularly acquired imaging data. △ Less

Submitted 3 June, 2022; originally announced June 2022.

arXiv:2206.01182 [pdf, other]

An optimal transport approach for selecting a representative subsample with application in efficient kernel density estimation

Authors: **gyi Zhang, Cheng Meng, Jun Yu, Mengrui Zhang, Wenxuan Zhong, ** Ma

Abstract: Subsampling methods aim to select a subsample as a surrogate for the observed sample. Such methods have been used pervasively in large-scale data analytics, active learning, and privacy-preserving analysis in recent decades. Instead of model-based methods, in this paper, we study model-free subsampling methods, which aim to identify a subsample that is not confined by model assumptions. Existing m… ▽ More Subsampling methods aim to select a subsample as a surrogate for the observed sample. Such methods have been used pervasively in large-scale data analytics, active learning, and privacy-preserving analysis in recent decades. Instead of model-based methods, in this paper, we study model-free subsampling methods, which aim to identify a subsample that is not confined by model assumptions. Existing model-free subsampling methods are usually built upon clustering techniques or kernel tricks. Most of these methods suffer from either a large computational burden or a theoretical weakness. In particular, the theoretical weakness is that the empirical distribution of the selected subsample may not necessarily converge to the population distribution. Such computational and theoretical limitations hinder the broad applicability of model-free subsampling methods in practice. We propose a novel model-free subsampling method by utilizing optimal transport techniques. Moreover, we develop an efficient subsampling algorithm that is adaptive to the unknown probability density function. Theoretically, we show the selected subsample can be used for efficient density estimation by deriving the convergence rate for the proposed subsample kernel density estimator. We also provide the optimal bandwidth for the proposed estimator. Numerical studies on synthetic and real-world datasets demonstrate the performance of the proposed method is superior. △ Less

Submitted 31 May, 2022; originally announced June 2022.

arXiv:2205.14539 [pdf, other]

Improving VAE-based Representation Learning

Authors: Mingtian Zhang, Tim Z. Xiao, Brooks Paige, David Barber

Abstract: Latent variable models like the Variational Auto-Encoder (VAE) are commonly used to learn representations of images. However, for downstream tasks like semantic classification, the representations learned by VAE are less competitive than other non-latent variable models. This has led to some speculations that latent variable models may be fundamentally unsuitable for representation learning. In th… ▽ More Latent variable models like the Variational Auto-Encoder (VAE) are commonly used to learn representations of images. However, for downstream tasks like semantic classification, the representations learned by VAE are less competitive than other non-latent variable models. This has led to some speculations that latent variable models may be fundamentally unsuitable for representation learning. In this work, we study what properties are required for good representations and how different VAE structure choices could affect the learned properties. We show that by using a decoder that prefers to learn local features, the remaining global features can be well captured by the latent, which significantly improves performance of a downstream classification task. We further apply the proposed model to semi-supervised learning tasks and demonstrate improvements in data efficiency. △ Less

Submitted 28 May, 2022; originally announced May 2022.

arXiv:2205.11640 [pdf, other]

Generalization Gap in Amortized Inference

Authors: Mingtian Zhang, Peter Hayes, David Barber

Abstract: The ability of likelihood-based probabilistic models to generalize to unseen data is central to many machine learning applications such as lossless compression. In this work, we study the generalization of a popular class of probabilistic model - the Variational Auto-Encoder (VAE). We discuss the two generalization gaps that affect VAEs and show that overfitting is usually dominated by amortized i… ▽ More The ability of likelihood-based probabilistic models to generalize to unseen data is central to many machine learning applications such as lossless compression. In this work, we study the generalization of a popular class of probabilistic model - the Variational Auto-Encoder (VAE). We discuss the two generalization gaps that affect VAEs and show that overfitting is usually dominated by amortized inference. Based on this observation, we propose a new training objective that improves the generalization of amortized inference. We demonstrate how our method can improve performance in the context of image modeling and lossless compression. △ Less

Submitted 15 October, 2022; v1 submitted 23 May, 2022; originally announced May 2022.

Showing 1–50 of 202 results for author: Zhang, M