Showing 1–50 of 202 results for author: Zhang, M

Searching in archive stat.
  1. arXiv:2407.00882  [pdf, other]

    stat.ME

    Subgroup Identification with Latent Factor Structure

    Authors: Yong He, Dong Liu, Fuxin Wang, Mingjuan Zhang, Wen-Xin Zhou

    Abstract: Subgroup analysis has attracted growing attention due to its ability to identify meaningful subgroups from a heterogeneous population and thereby improving predictive power. However, in many scenarios such as social science and biology, the covariates are possibly highly correlated due to the existence of common factors, which brings great challenges for group identification and is neglected in th…

    Submitted 30 June, 2024; originally announced July 2024.

  2. arXiv:2405.20678  [pdf, ps, other]

    cs.LG cs.GT cs.MA stat.ML

    No-Regret Learning for Fair Multi-Agent Social Welfare Optimization

    Authors: Mengxiao Zhang, Ramiro Deo-Campo Vuong, Haipeng Luo

    Abstract: We consider the problem of online multi-agent Nash social welfare (NSW) maximization. While previous works of Hossain et al. [2021], Jones et al. [2023] study similar problems in stochastic multi-agent multi-armed bandits and show that $\sqrt{T}$-regret is possible after $T$ rounds, their fairness measure is the product of all agents' rewards, instead of their NSW (that is, their geometric mean).…

    Submitted 31 May, 2024; originally announced May 2024.

  3. arXiv:2405.20677  [pdf, other]

    cs.LG stat.ML

    Provably Efficient Interactive-Grounded Learning with Personalized Reward

    Authors: Mengxiao Zhang, Yuheng Zhang, Haipeng Luo, Paul Mineiro

    Abstract: Interactive-Grounded Learning (IGL) [Xie et al., 2021] is a powerful framework in which a learner aims at maximizing unobservable rewards through interacting with an environment and observing reward-dependent feedback on the taken actions. To deal with personalized rewards that are ubiquitous in applications such as recommendation systems, Maghakian et al. [2022] study a version of IGL with contex…

    Submitted 31 May, 2024; originally announced May 2024.

  4. arXiv:2405.15038  [pdf, other]

    stat.ME

    Preferential Latent Space Models for Networks with Textual Edges

    Authors: Maoyu Zhang, Biao Cai, Dong Li, Xiaoyue Niu, Jingfei Zhang

    Abstract: Many real-world networks contain rich textual information in the edges, such as email networks where an edge between two nodes is an email exchange. Other examples include co-author networks and social media networks. The useful textual information carried in the edges is often discarded in most network analyses, resulting in an incomplete view of the relationships between nodes. In this work, we…

    Submitted 23 May, 2024; originally announced May 2024.

    Comments: 30 pages

    MSC Class: G.3; F.2 ACM Class: G.3

  5. arXiv:2405.01425  [pdf, other]

    cs.DS cs.LG math.ST stat.ML

    In-and-Out: Algorithmic Diffusion for Sampling Convex Bodies

    Authors: Yunbum Kook, Santosh S. Vempala, Matthew S. Zhang

    Abstract: We present a new random walk for uniformly sampling high-dimensional convex bodies. It achieves state-of-the-art runtime complexity with stronger guarantees on the output than previously known, namely in Rényi divergence (which implies TV, $\mathcal{W}_2$, KL, $χ^2$). The proof departs from known approaches for polytime algorithms for the problem -- we utilize a stochastic diffusion perspective to…

    Submitted 2 May, 2024; originally announced May 2024.

    Comments: 32 pages

  6. arXiv:2404.06055  [pdf, ps, other]

    stat.AP

    Online/Offline Learning to Enable Robust Beamforming: Limited Feedback Meets Deep Generative Models

    Authors: Ying Li, Zhidi Lin, Kai Li, Michael Minyi Zhang

    Abstract: Robust beamforming is a pivotal technique in massive multiple-input multiple-output (MIMO) systems as it mitigates interference among user equipment (UE). One current risk-neutral approach to robust beamforming is the stochastic weighted minimum mean square error method (WMMSE). However, this method necessitates statistical channel information, which is typically inaccessible, particularly in fift…

    Submitted 9 April, 2024; originally announced April 2024.

  7. arXiv:2404.01697  [pdf, other]

    stat.ML cs.LG

    Preventing Model Collapse in Gaussian Process Latent Variable Models

    Authors: Ying Li, Zhidi Lin, Feng Yin, Michael Minyi Zhang

    Abstract: Gaussian process latent variable models (GPLVMs) are a versatile family of unsupervised learning models commonly used for dimensionality reduction. However, common challenges in modeling data with GPLVMs include inadequate kernel flexibility and improper selection of the projection noise, leading to a type of model collapse characterized by vague latent representations that do not reflect the unde…

    Submitted 18 June, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

    Comments: International Conference on Machine Learning (ICML), 2024

  8. arXiv:2402.14840  [pdf, other]

    cs.CL cs.AI stat.AP

    RJUA-MedDQA: A Multimodal Benchmark for Medical Document Question Answering and Clinical Reasoning

    Authors: Congyun Jin, Ming Zhang, Xiaowei Ma, Li Yujiao, Yingbo Wang, Yabo Jia, Yuliang Du, Tao Sun, Haowen Wang, Cong Fan, Jinjie Gu, Chenfei Chi, Xiangguo Lv, Fangzhou Li, Wei Xue, Yiran Huang

    Abstract: Recent advancements in Large Language Models (LLMs) and Large Multi-modal Models (LMMs) have shown potential in various medical applications, such as Intelligent Medical Diagnosis. Although impressive results have been achieved, we find that existing benchmarks do not reflect the complexity of real medical reports and specialized in-depth reasoning capabilities. In this work, we introduced RJUA-Me…

    Submitted 19 February, 2024; originally announced February 2024.

    Comments: 15 pages, 13 figures

  9. arXiv:2402.07355  [pdf, ps, other]

    math.ST cs.LG stat.ML

    Sampling from the Mean-Field Stationary Distribution

    Authors: Yunbum Kook, Matthew S. Zhang, Sinho Chewi, Murat A. Erdogdu, Mufan Bill Li

    Abstract: We study the complexity of sampling from the stationary distribution of a mean-field SDE, or equivalently, the complexity of minimizing a functional over the space of probability measures which includes an interaction term. Our main insight is to decouple the two key aspects of this problem: (1) approximation of the mean-field SDE via a finite-particle system, via uniform-in-time propagation of ch…

    Submitted 18 February, 2024; v1 submitted 11 February, 2024; originally announced February 2024.

  10. arXiv:2402.03933  [pdf]

    cs.SE stat.AP

    Development of a Evaluation Tool for Age-Appropriate Software in Aging Environments: A Delphi Study

    Authors: Zhenggang Bai, Yougxiang Fang, Hongtu Chen, Xinru Chen, Ning An, Min Zhang, Guoxin Rui, Jing Jin

    Abstract: Objective: We aimed to develop a dependable, reliable tool for assessing software age-appropriateness. Methods: We conducted a systematic review to get the indicators of technology age-appropriateness from studies from January 2000 to April 2023. This study engaged 25 experts from the fields of anthropology, sociology, and social technology research across, three rounds of Delphi consultations were con…

    Submitted 4 February, 2024; originally announced February 2024.

  11. arXiv:2402.03008  [pdf, other]

    stat.ML cs.LG stat.CO

    Diffusive Gibbs Sampling

    Authors: Wenlin Chen, Mingtian Zhang, Brooks Paige, José Miguel Hernández-Lobato, David Barber

    Abstract: The inadequate mixing of conventional Markov Chain Monte Carlo (MCMC) methods for multi-modal distributions presents a significant challenge in practical applications such as Bayesian inference and molecular dynamics. Addressing this, we propose Diffusive Gibbs Sampling (DiGS), an innovative family of sampling methods designed for effective sampling from distributions characterized by distant and…

    Submitted 29 May, 2024; v1 submitted 5 February, 2024; originally announced February 2024.

    Comments: Accepted for publication at ICML 2024. Code available: https://github.com/Wenlin-Chen/DiGS

  12. arXiv:2402.01055  [pdf, other]

    cs.LG stat.ML

    Multiclass Learning from Noisy Labels for Non-decomposable Performance Measures

    Authors: Mingyuan Zhang, Shivani Agarwal

    Abstract: There has been much interest in recent years in learning good classifiers from data with noisy labels. Most work on learning from noisy labels has focused on standard loss-based performance measures. However, many machine learning problems require using non-decomposable performance measures which cannot be expressed as the expectation or sum of a loss on individual examples; these include for exam…

    Submitted 23 April, 2024; v1 submitted 1 February, 2024; originally announced February 2024.

  13. arXiv:2401.04900  [pdf, other]

    astro-ph.SR astro-ph.IM cs.LG stat.ML

    SPT: Spectral Transformer for Red Giant Stars Age and Mass Estimation

    Authors: Mengmeng Zhang, Fan Wu, Yude Bu, Shanshan Li, Zhenping Yi, Meng Liu, Xiaoming Kong

    Abstract: The age and mass of red giants are essential for understanding the structure and evolution of the Milky Way. Traditional isochrone methods for these estimations are inherently limited due to overlapping isochrones in the Hertzsprung-Russell diagram, while asteroseismology, though more precise, requires high-precision, long-term observations. In response to these challenges, we developed a novel fr…

    Submitted 9 January, 2024; originally announced January 2024.

    Comments: Accepted by A&A

  14. arXiv:2401.04693  [pdf, other]

    stat.ME

    Co-Clustering Multi-View Data Using the Latent Block Model

    Authors: Joshua Tobin, Michaela Black, James Ng, Debbie Rankin, Jonathan Wallace, Catherine Hughes, Leane Hoey, Adrian Moore, Jinling Wang, Geraldine Horigan, Paul Carlin, Helene McNulty, Anne M Molloy, Mimi Zhang

    Abstract: The Latent Block Model (LBM) is a prominent model-based co-clustering method, returning parametric representations of each block cluster and allowing the use of well-grounded model selection methods. The LBM, while adapted in literature to handle different feature types, cannot be applied to datasets consisting of multiple disjoint sets of features, termed views, for a common set of observations.…

    Submitted 9 January, 2024; originally announced January 2024.

  15. arXiv:2312.08670  [pdf, other]

    stat.ME cs.AI cs.LG

    Temporal-Spatial Entropy Balancing for Causal Continuous Treatment-Effect Estimation

    Authors: Tao Hu, Honglong Zhang, Fan Zeng, Min Du, XiangKun Du, Yue Zheng, Quanqi Li, Mengran Zhang, Dan Yang, Jihao Wu

    Abstract: In the field of intracity freight transportation, changes in order volume are significantly influenced by temporal and spatial factors. When building subsidy and pricing strategies, predicting the causal effects of these strategies on order volume is crucial. In the process of calculating causal effects, confounding variables can have an impact. Traditional methods to control confounding variables…

    Submitted 18 December, 2023; v1 submitted 14 December, 2023; originally announced December 2023.

    Comments: 10 pages;

  16. arXiv:2311.18489  [pdf, other]

    physics.med-ph stat.AP

    Sex-Specific Variances in Anatomy and Blood Flow of the Left Main Coronary Bifurcation: Implications for Coronary Artery Disease Risk

    Authors: Ramtin Gharleghi, Mingzi Zhang, Dona Adikari, Lucy McGrath-Cadell, Robert M. Graham, Jolanda Wentzel, Mark Webster, Chris Ellis, Sze-Yuan Ooi, Susann Beier

    Abstract: Studies have shown marked sex disparities in Coronary Artery Diseases (CAD) epidemiology, yet the underlying mechanisms remain unclear. We explored sex disparities in the coronary anatomy and the resulting haemodynamics in patients with suspected, but no significant CAD. Left Main (LM) bifurcations were reconstructed from CTCA images of 127 cases (42 males and 85 females, aged 38 to 81). Detailed…

    Submitted 30 November, 2023; originally announced November 2023.

    Comments: 14 pages, 5 figures

  17. arXiv:2311.08690  [pdf, other]

    cs.LG cs.CY stat.ME

    Enabling CMF Estimation in Data-Constrained Scenarios: A Semantic-Encoding Knowledge Mining Model

    Authors: Yanlin Qi, Jia Li, Michael Zhang

    Abstract: Precise estimation of Crash Modification Factors (CMFs) is central to evaluating the effectiveness of various road safety treatments and prioritizing infrastructure investment accordingly. While customized study for each countermeasure scenario is desired, the conventional CMF estimation approaches rely heavily on the availability of crash data at given sites. This not only makes the estimation co…

    Submitted 14 November, 2023; originally announced November 2023.

    Comments: 39 pages, 9 figures

  18. arXiv:2311.00564  [pdf, other]

    stat.ML cs.LG

    Online Student-$t$ Processes with an Overall-local Scale Structure for Modelling Non-stationary Data

    Authors: Taole Sha, Michael Minyi Zhang

    Abstract: Time-dependent data often exhibit characteristics, such as non-stationarity and heavy-tailed errors, that would be inappropriate to model with the typical assumptions used in popular models. Thus, more flexible approaches are required to be able to accommodate such issues. To this end, we propose a Bayesian mixture of student-$t$ processes with an overall-local scale structure for the covariance.…

    Submitted 1 November, 2023; originally announced November 2023.

    Comments: 9 pages, 5 figures

    MSC Class: 62F15

  19. arXiv:2310.13548  [pdf, other]

    cs.CL cs.AI cs.LG stat.ML

    Towards Understanding Sycophancy in Language Models

    Authors: Mrinank Sharma, Meg Tong, Tomasz Korbak, David Duvenaud, Amanda Askell, Samuel R. Bowman, Newton Cheng, Esin Durmus, Zac Hatfield-Dodds, Scott R. Johnston, Shauna Kravec, Timothy Maxwell, Sam McCandlish, Kamal Ndousse, Oliver Rausch, Nicholas Schiefer, Da Yan, Miranda Zhang, Ethan Perez

    Abstract: Human feedback is commonly utilized to finetune AI assistants. But human feedback may also encourage model responses that match user beliefs over truthful ones, a behaviour known as sycophancy. We investigate the prevalence of sycophancy in models whose finetuning procedure made use of human feedback, and the potential role of human preference judgments in such behavior. We first demonstrate that…

    Submitted 27 October, 2023; v1 submitted 20 October, 2023; originally announced October 2023.

    Comments: 32 pages, 20 figures

    ACM Class: I.2.6

  20. arXiv:2310.03243  [pdf, other]

    stat.ML cs.AI cs.LG

    Sparse Deep Learning for Time Series Data: Theory and Applications

    Authors: Mingxuan Zhang, Yan Sun, Faming Liang

    Abstract: Sparse deep learning has become a popular technique for improving the performance of deep neural networks in areas such as uncertainty quantification, variable selection, and large-scale network compression. However, most existing research has focused on problems where the observations are independent and identically distributed (i.i.d.), and there has been little work on the problems where the ob…

    Submitted 4 October, 2023; originally announced October 2023.

  21. arXiv:2309.06782  [pdf, other]

    physics.data-an cs.LG hep-ex physics.ins-det stat.ML

    Improved particle-flow event reconstruction with scalable neural networks for current and future particle detectors

    Authors: Joosep Pata, Eric Wulff, Farouk Mokhtar, David Southwick, Mengke Zhang, Maria Girone, Javier Duarte

    Abstract: Efficient and accurate algorithms are necessary to reconstruct particles in the highly granular detectors anticipated at the High-Luminosity Large Hadron Collider and the Future Circular Collider. We study scalable machine learning models for event reconstruction in electron-positron collisions based on a full detector simulation. Particle-flow reconstruction can be formulated as a supervised lear…

    Submitted 8 March, 2024; v1 submitted 13 September, 2023; originally announced September 2023.

    Comments: 21 pages, 10 figures

  22. arXiv:2309.05925  [pdf, other]

    cs.LG cs.AI stat.ML

    On Regularized Sparse Logistic Regression

    Authors: Mengyuan Zhang, Kai Liu

    Abstract: Sparse logistic regression is for classification and feature selection simultaneously. Although many studies have been done to solve $\ell_1$-regularized logistic regression, there is no equivalently abundant work on solving sparse logistic regression with nonconvex regularization term. In this paper, we propose a unified framework to solve $\ell_1$-regularized logistic regression, which can be na…

    Submitted 11 October, 2023; v1 submitted 11 September, 2023; originally announced September 2023.

    Comments: Accepted to ICDM2023

  23. arXiv:2309.02425  [pdf, ps, other]

    cs.LG stat.ML

    On the Minimax Regret in Online Ranking with Top-k Feedback

    Authors: Mingyuan Zhang, Ambuj Tewari

    Abstract: In online ranking, a learning algorithm sequentially ranks a set of items and receives feedback on its ranking in the form of relevance scores. Since obtaining relevance scores typically involves human annotation, it is of great interest to consider a partial feedback setting where feedback is restricted to the top-$k$ items in the rankings. Chaudhuri and Tewari [2017] developed a framework to ana…

    Submitted 12 April, 2024; v1 submitted 5 September, 2023; originally announced September 2023.

  24. arXiv:2308.14048  [pdf, other]

    stat.ML cs.LG stat.AP stat.CO stat.ME

    A Bayesian Non-parametric Approach to Generative Models: Integrating Variational Autoencoder and Generative Adversarial Networks using Wasserstein and Maximum Mean Discrepancy

    Authors: Forough Fazeli-Asl, Michael Minyi Zhang

    Abstract: Generative models have emerged as a promising technique for producing high-quality images that are indistinguishable from real images. Generative adversarial networks (GANs) and variational autoencoders (VAEs) are two of the most prominent and widely studied generative models. GANs have demonstrated excellent performance in generating sharp realistic images and VAEs have shown strong abilities to…

    Submitted 27 August, 2023; originally announced August 2023.

  25. arXiv:2306.08352  [pdf, other]

    stat.ML cs.AI cs.LG

    Bayesian Non-linear Latent Variable Modeling via Random Fourier Features

    Authors: Michael Minyi Zhang, Gregory W. Gundersen, Barbara E. Engelhardt

    Abstract: The Gaussian process latent variable model (GPLVM) is a popular probabilistic method used for nonlinear dimension reduction, matrix factorization, and state-space modeling. Inference for GPLVMs is computationally tractable only when the data likelihood is Gaussian. Moreover, inference for GPLVMs has typically been restricted to obtaining maximum a posteriori point estimates, which can lead to over…

    Submitted 14 June, 2023; originally announced June 2023.

  26. arXiv:2306.03266  [pdf, other]

    cs.LG stat.ML

    Extending the Design Space of Graph Neural Networks by Rethinking Folklore Weisfeiler-Lehman

    Authors: Jiarui Feng, Lecheng Kong, Hao Liu, Dacheng Tao, Fuhai Li, Muhan Zhang, Yixin Chen

    Abstract: Message passing neural networks (MPNNs) have emerged as the most popular framework of graph neural networks (GNNs) in recent years. However, their expressive power is limited by the 1-dimensional Weisfeiler-Lehman (1-WL) test. Some works are inspired by $k$-WL/FWL (Folklore WL) and design the corresponding neural versions. Despite the high expressive power, there are serious limitations in this li…

    Submitted 14 January, 2024; v1 submitted 5 June, 2023; originally announced June 2023.

    Comments: Accepted to NeurIPS 2023

  27. arXiv:2306.00353  [pdf, other]

    stat.ML cs.CR cs.LG

    Constructing Semantics-Aware Adversarial Examples with Probabilistic Perspective

    Authors: Andi Zhang, Mingtian Zhang, Damon Wischik

    Abstract: We propose a probabilistic perspective on adversarial examples. This perspective allows us to view geometric restrictions on adversarial examples as distributions, enabling a seamless shift towards data-driven, semantic constraints. Building on this foundation, we present a method for creating semantics-aware adversarial examples in a principled way. Leveraging the advanced generalization capabilit…

    Submitted 11 February, 2024; v1 submitted 1 June, 2023; originally announced June 2023.

    Comments: 16 pages, 9 figures

  28. arXiv:2305.11650  [pdf, other]

    stat.ML cs.LG

    Moment Matching Denoising Gibbs Sampling

    Authors: Mingtian Zhang, Alex Hawkins-Hooker, Brooks Paige, David Barber

    Abstract: Energy-Based Models (EBMs) offer a versatile framework for modeling complex data distributions. However, training and sampling from EBMs continue to pose significant challenges. The widely-used Denoising Score Matching (DSM) method for scalable EBM training suffers from inconsistency issues, causing the energy model to learn a `noisy' data distribution. In this work, we propose an efficient sampli…

    Submitted 19 March, 2024; v1 submitted 19 May, 2023; originally announced May 2023.

  29. arXiv:2305.07813  [pdf, other]

    stat.ME stat.CO

    Fast robust location and scatter estimation: a depth-based method

    Authors: Maoyu Zhang, Yan Song, Wenlin Dai

    Abstract: The minimum covariance determinant (MCD) estimator is ubiquitous in multivariate analysis, the critical step of which is to select a subset of a given size with the lowest sample covariance determinant. The concentration step (C-step) is a common tool for subset-seeking; however, it becomes computationally demanding for high-dimensional data. To alleviate the challenge, we propose a depth-based al…

    Submitted 12 May, 2023; originally announced May 2023.

  30. arXiv:2303.12677  [pdf, other]

    stat.ME stat.AP

    Learning Brain Connectivity in Social Cognition with Dynamic Network Regression

    Authors: Maoyu Zhang, Biao Cai, Wenlin Dai, Dehan Kong, Hongyu Zhao, Jingfei Zhang

    Abstract: Dynamic networks have been increasingly used to characterize brain connectivity that varies during resting and task states. In such characterizations, a connectivity network is typically measured at each time point for a subject over a common set of nodes representing brain regions, together with rich subject-level information. A common approach to analyzing such data is an edge-based method that…

    Submitted 22 March, 2023; originally announced March 2023.

  31. arXiv:2303.02637  [pdf, other]

    stat.ML cs.LG stat.AP stat.ME

    A Semi-Bayesian Nonparametric Estimator of the Maximum Mean Discrepancy Measure: Applications in Goodness-of-Fit Testing and Generative Adversarial Networks

    Authors: Forough Fazeli-Asl, Michael Minyi Zhang, Lizhen Lin

    Abstract: A classic inferential statistical problem is the goodness-of-fit (GOF) test. Such a test can be challenging when the hypothesized parametric model has an intractable likelihood and its distributional form is not available. Bayesian methods for GOF can be appealing due to their ability to incorporate expert knowledge through prior distributions. However, standard Bayesian methods for this test of…

    Submitted 10 November, 2023; v1 submitted 5 March, 2023; originally announced March 2023.

    Comments: Typos corrected, Secondary (simulation and theoretical) results added, Additional discussion added, references added

  32. arXiv:2302.08049  [pdf, ps, other]

    math.ST stat.ML

    Improved Discretization Analysis for Underdamped Langevin Monte Carlo

    Authors: Matthew Zhang, Sinho Chewi, Mufan Bill Li, Krishnakumar Balasubramanian, Murat A. Erdogdu

    Abstract: Underdamped Langevin Monte Carlo (ULMC) is an algorithm used to sample from unnormalized densities by leveraging the momentum of a particle moving in a potential well. We provide a novel analysis of ULMC, motivated by two central questions: (1) Can we obtain improved sampling guarantees beyond strong log-concavity? (2) Can we achieve acceleration for sampling? For (1), prior results for ULMC onl…

    Submitted 15 February, 2023; originally announced February 2023.

  33. arXiv:2212.03905  [pdf, other]

    cs.LG cs.AI stat.ML

    Multi-Rate VAE: Train Once, Get the Full Rate-Distortion Curve

    Authors: Juhan Bae, Michael R. Zhang, Michael Ruan, Eric Wang, So Hasegawa, Jimmy Ba, Roger Grosse

    Abstract: Variational autoencoders (VAEs) are powerful tools for learning latent representations of data used in a wide range of applications. In practice, VAEs usually require multiple training rounds to choose the amount of information the latent variable should retain. This trade-off between the reconstruction error (distortion) and the KL divergence (rate) is typically parameterized by a hyperparameter…

    Submitted 16 August, 2023; v1 submitted 7 December, 2022; originally announced December 2022.

    Comments: 22 pages, 9 figures

  34. arXiv:2211.00407  [pdf, other]

    stat.ME

    Missing data interpolation in integrative multi-cohort analysis with disparate covariate information

    Authors: Ekaterina Smirnova, Yongqi Zhong, Rasha Alsaadawi, Xu Ning, Amii Kress, Jordan Kuiper, Mingyu Zhang, Kristen Lyall, Sheenas Martenies, Akram Alshawabkeh, Catherine Bulka, Carlos Camargo, Jaeun Choi, Elena Colicino, Anne Dunlop, Michael Elliott, Assiamira Ferrara, Tebeb Gebrestadik, Jiang Gui, Kylie Harrall, Tina Hartert, Barry Lester, Andrew Manigault, Justin Manjourides, Yu Ni, et al. (4 additional authors not shown)

    Abstract: Integrative analysis of datasets generated by multiple cohorts is a widely-used approach for increasing sample size, precision of population estimators, and generalizability of analysis results in epidemiological studies. However, often each individual cohort dataset does not have all variables of interest for an integrative analysis collected as a part of an original study. Such cohort-level miss…

    Submitted 1 November, 2022; originally announced November 2022.

  35. arXiv:2210.13596  [pdf, other]

    stat.CO stat.AP

    Fast Community Detection in Dynamic and Heterogeneous Networks

    Authors: Maoyu Zhang, Jingfei Zhang, Wenlin Dai

    Abstract: Dynamic heterogeneous networks describe the temporal evolution of interactions among nodes and edges of different types. While there is a rich literature on finding communities in dynamic networks, the application of these methods to dynamic heterogeneous networks can be inappropriate, due to the involvement of different types of nodes and edges and the need to treat them differently. In this pape…

    Submitted 29 October, 2022; v1 submitted 24 October, 2022; originally announced October 2022.

  36. arXiv:2210.03612  [pdf, ps, other]

    stat.ML cs.AI cs.CR cs.CV cs.LG

    1st ICLR International Workshop on Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data (PAIR^2Struct)

    Authors: Hao Wang, Wanyu Lin, Hao He, Di Wang, Chengzhi Mao, Muhan Zhang

    Abstract: Recent years have seen advances on principles and guidance relating to accountable and ethical use of artificial intelligence (AI) spring up around the globe. Specifically, Data Privacy, Accountability, Interpretability, Robustness, and Reasoning have been broadly recognized as fundamental principles of using machine learning (ML) technologies on decision-critical and/or privacy-sensitive applicat…

    Submitted 7 October, 2022; originally announced October 2022.

  37. arXiv:2210.01376  [pdf, ps, other]

    cs.LG stat.ML

    Improved High-Probability Regret for Adversarial Bandits with Time-Varying Feedback Graphs

    Authors: Haipeng Luo, Hanghang Tong, Mengxiao Zhang, Yuheng Zhang

    Abstract: We study high-probability regret bounds for adversarial $K$-armed bandits with time-varying feedback graphs over $T$ rounds. For general strongly observable graphs, we develop an algorithm that achieves the optimal regret $\widetilde{\mathcal{O}}((\sum_{t=1}^Tα_t)^{1/2}+\max_{t\in[T]}α_t)$ with high probability, where $α_t$ is the independence number of the feedback graph at round $t$. Compared to…

    Submitted 29 January, 2023; v1 submitted 4 October, 2022; originally announced October 2022.

  38. arXiv:2210.00847  [pdf, other]

    stat.ME stat.ML

    Review of Clustering Methods for Functional Data

    Authors: Mimi Zhang, Andrew Parnell

    Abstract: Functional data clustering is to identify heterogeneous morphological patterns in the continuous functions underlying the discrete measurements/observations. Application of functional data clustering has appeared in many publications across various fields of sciences, including but not limited to biology, (bio)chemistry, engineering, environmental science, medical science, psychology, social scien…

    Submitted 3 October, 2022; originally announced October 2022.

  39. arXiv:2209.08858  [pdf, other]

    cs.AI cs.LG stat.ML

    Rethinking Knowledge Graph Evaluation Under the Open-World Assumption

    Authors: Haotong Yang, Zhouchen Lin, Muhan Zhang

    Abstract: Most knowledge graphs (KGs) are incomplete, which motivates one important research topic on automatically complementing knowledge graphs. However, evaluation of knowledge graph completion (KGC) models often ignores the incompleteness -- facts in the test set are ranked against all unknown triplets which may contain a large number of missing facts not included in the KG yet. Treating all unknown tr…

    Submitted 19 September, 2022; originally announced September 2022.

    Comments: Accepted at NeurIPS 2022

  40. arXiv:2209.07396  [pdf, other]

    stat.ML cs.LG

    Towards Healing the Blindness of Score Matching

    Authors: Mingtian Zhang, Oscar Key, Peter Hayes, David Barber, Brooks Paige, François-Xavier Briol

    Abstract: Score-based divergences have been widely used in machine learning and statistics applications. Despite their empirical success, a blindness problem has been observed when using these for multi-modal distributions. In this work, we discuss the blindness problem and propose a new family of divergences that can mitigate the blindness problem. We illustrate our proposed divergence in the context of de…

    Submitted 15 October, 2022; v1 submitted 15 September, 2022; originally announced September 2022.

  41. arXiv:2209.03617  [pdf, other]

    stat.ME stat.ML

    Model-free Subsampling Method Based on Uniform Designs

    Authors: Mei Zhang, Yongdao Zhou, Zheng Zhou, Aijun Zhang

    Abstract: Subsampling or subdata selection is a useful approach in large-scale statistical learning. Most existing studies focus on model-based subsampling methods which significantly depend on the model assumption. In this paper, we consider the model-free subsampling strategy for generating subdata from the original full data. In order to measure the goodness of representation of a subdata with respect to…

    Submitted 8 September, 2022; originally announced September 2022.

  42. arXiv:2208.00114  [pdf, other]

    stat.ME stat.AP

    Outcome Adaptive Propensity Score Methods for Handling Censoring and High-Dimensionality: Application to Insurance Claims

    Authors: Youfei Yu, Jiacong Du, Min Zhang, Zhenke Wu, Andrew M. Ryan, Bhramar Mukherjee

    Abstract: Propensity scores are commonly used to reduce the confounding bias in non-randomized observational studies for estimating the average treatment effect. An important assumption underlying this approach is that all confounders that are associated with both the treatment and the outcome of interest are measured and included in the propensity score model. In the absence of strong prior knowledge about…

    Submitted 29 July, 2022; originally announced August 2022.

    Comments: 22 pages, 6 figures, 2 tables

  43. arXiv:2207.08556  [pdf, other]

    cs.CR stat.ML

    A Certifiable Security Patch for Object Tracking in Self-Driving Systems via Historical Deviation Modeling

    Authors: Xudong Pan, Qifan Xiao, Mi Zhang, Min Yang

    Abstract: Self-driving cars (SDC) commonly implement the perception pipeline to detect the surrounding obstacles and track their moving trajectories, which lays the ground for the subsequent driving decision making process. Although the security of obstacle detection in SDC is intensively studied, not until very recently the attackers start to exploit the vulnerability of the tracking module. Compared with…

    Submitted 18 July, 2022; originally announced July 2022.

  44. arXiv:2206.14371  [pdf, other]

    stat.ML cs.AI cs.CR cs.LG

    Matryoshka: Stealing Functionality of Private ML Data by Hiding Models in Model

    Authors: Xudong Pan, Yifan Yan, Shengyao Zhang, Mi Zhang, Min Yang

    Abstract: In this paper, we present a novel insider attack called Matryoshka, which employs an irrelevant scheduled-to-publish DNN model as a carrier model for covert transmission of multiple secret models which memorize the functionality of private ML data stored in local data centers. Instead of treating the parameters of the carrier model as bit strings and applying conventional steganography, we devise…

    Submitted 28 June, 2022; originally announced June 2022.

    Comments: A preprint work

  45. arXiv:2206.04349  [pdf, other]

    cs.CV cs.AI q-bio.GN q-bio.QM stat.ME

    Deep radiomic signature with immune cell markers predicts the survival of glioma patients

    Authors: Ahmad Chaddad, Paul Daniel, Mingli Zhang, Saima Rathore, Paul Sargos, Christian Desrosiers, Tamim Niazi

    Abstract: Imaging biomarkers offer a non-invasive way to predict the response of immunotherapy prior to treatment. In this work, we propose a novel type of deep radiomic features (DRFs) computed from a convolutional neural network (CNN), which capture tumor characteristics related to immune cell markers and overall survival. Our study uses four MRI sequences (T1-weighted, T1-weighted post-contrast, T2-weigh…

    Submitted 9 June, 2022; originally announced June 2022.

    Journal ref: Neurocomputing, Volume 469, 16 January 2022, Pages 366-375

  46. arXiv:2206.03955  [pdf, other]

    stat.ML cs.CV cs.LG

    Out-of-Distribution Detection with Class Ratio Estimation

    Authors: Mingtian Zhang, Andi Zhang, Tim Z. Xiao, Yitong Sun, Steven McDonagh

    Abstract: Density-based Out-of-distribution (OOD) detection has recently been shown unreliable for the task of detecting OOD images. Various density ratio based approaches achieve good empirical performance, however methods typically lack a principled probabilistic modelling explanation. In this work, we propose to unify density ratio based methods under a novel framework that builds energy-based models and…

    Submitted 8 June, 2022; originally announced June 2022.

  47. arXiv:2206.01897  [pdf, other]

    eess.IV cs.AI cs.CV q-bio.GN q-bio.QM stat.ME

    Modeling of Textures to Predict Immune Cell Status and Survival of Brain Tumour Patients

    Authors: Ahmad Chaddad, Mingli Zhang, Lama Hassan, Tamim Niazi

    Abstract: Radiomics has shown a capability for different types of cancers such as glioma to predict the clinical outcome. It can have a non-invasive means of evaluating the immunotherapy response prior to treatment. However, the use of deep convolutional neural networks (CNNs)-based radiomics requires large training image sets. To avoid this problem, we investigate a new imaging features that model distribu…

    Submitted 3 June, 2022; originally announced June 2022.

  48. arXiv:2206.01182  [pdf, other]

    stat.ML math.ST

    An optimal transport approach for selecting a representative subsample with application in efficient kernel density estimation

    Authors: Jingyi Zhang, Cheng Meng, Jun Yu, Mengrui Zhang, Wenxuan Zhong, Ping Ma

    Abstract: Subsampling methods aim to select a subsample as a surrogate for the observed sample. Such methods have been used pervasively in large-scale data analytics, active learning, and privacy-preserving analysis in recent decades. Instead of model-based methods, in this paper, we study model-free subsampling methods, which aim to identify a subsample that is not confined by model assumptions. Existing m…

    Submitted 31 May, 2022; originally announced June 2022.

  49. arXiv:2205.14539  [pdf, other]

    stat.ML cs.LG

    Improving VAE-based Representation Learning

    Authors: Mingtian Zhang, Tim Z. Xiao, Brooks Paige, David Barber

    Abstract: Latent variable models like the Variational Auto-Encoder (VAE) are commonly used to learn representations of images. However, for downstream tasks like semantic classification, the representations learned by VAE are less competitive than other non-latent variable models. This has led to some speculations that latent variable models may be fundamentally unsuitable for representation learning. In th…

    Submitted 28 May, 2022; originally announced May 2022.

  50. arXiv:2205.11640  [pdf, other]

    stat.ML cs.LG

    Generalization Gap in Amortized Inference

    Authors: Mingtian Zhang, Peter Hayes, David Barber

    Abstract: The ability of likelihood-based probabilistic models to generalize to unseen data is central to many machine learning applications such as lossless compression. In this work, we study the generalization of a popular class of probabilistic model - the Variational Auto-Encoder (VAE). We discuss the two generalization gaps that affect VAEs and show that overfitting is usually dominated by amortized i…

    Submitted 15 October, 2022; v1 submitted 23 May, 2022; originally announced May 2022.