Showing 1–50 of 173 results for author: Priebe, C

  1. arXiv:2406.11938  [pdf, other]

    cs.AI cs.MA

    Tracking the perspectives of interacting language models

    Authors: Hayden Helm, Brandon Duderstadt, Youngser Park, Carey E. Priebe

    Abstract: Large language models (LLMs) are capable of producing high quality information at unprecedented rates. As these models continue to entrench themselves in society, the content they produce will become increasingly pervasive in databases that are, in turn, incorporated into the pre-training data, fine-tuning data, retrieval data, etc. of other language models. In this paper we formalize the idea of… ▽ More

    Submitted 17 June, 2024; originally announced June 2024.

  2. arXiv:2406.06573  [pdf, other]

    cs.CL cs.LG

    MedFuzz: Exploring the Robustness of Large Language Models in Medical Question Answering

    Authors: Robert Osazuwa Ness, Katie Matton, Hayden Helm, Sheng Zhang, Junaid Bajwa, Carey E. Priebe, Eric Horvitz

    Abstract: Large language models (LLM) have achieved impressive performance on medical question-answering benchmarks. However, high benchmark accuracy does not imply that the performance generalizes to real-world clinical settings. Medical question-answering benchmarks rely on assumptions consistent with quantifying LLM performance but that may not hold in the open world of the clinic. Yet LLMs learn broad k… ▽ More

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: 9 pages, 2 figures, 2 algorithms, appendix

    ACM Class: I.2.7

  3. arXiv:2405.12797  [pdf, other]

    cs.SI stat.ML

    Refined Graph Encoder Embedding via Self-Training and Latent Community Recovery

    Authors: Cencheng Shen, Jonathan Larson, Ha Trinh, Carey E. Priebe

    Abstract: This paper introduces a refined graph encoder embedding method, enhancing the original graph encoder embedding using linear transformation, self-training, and hidden community recovery within observed communities. We provide the theoretical rationale for the refinement procedure, demonstrating how and why our proposed method can effectively identify useful hidden communities via stochastic block m… ▽ More

    Submitted 21 May, 2024; originally announced May 2024.

    Comments: 12 pages main + 4 pages appendix

  4. arXiv:2405.11111  [pdf, other]

    stat.ME

    Euclidean mirrors and first-order changepoints in network time series

    Authors: Tianyi Chen, Zachary Lubberts, Avanti Athreya, Youngser Park, Carey E. Priebe

    Abstract: We describe a model for a network time series whose evolution is governed by an underlying stochastic process, known as the latent position process, in which network evolution can be represented in Euclidean space by a curve, called the Euclidean mirror. We define the notion of a first-order changepoint for a time series of networks, and construct a family of latent position process networks with… ▽ More

    Submitted 17 May, 2024; originally announced May 2024.

  5. arXiv:2405.03225  [pdf, other]

    stat.ME

    Consistent response prediction for multilayer networks on unknown manifolds

    Authors: Aranyak Acharyya, Jesús Arroyo Relión, Michael Clayton, Marta Zlatic, Youngser Park, Carey E. Priebe

    Abstract: Our paper deals with a collection of networks on a common set of nodes, where some of the networks are associated with responses. Assuming that the networks correspond to points on a one-dimensional manifold in a higher dimensional ambient space, we propose an algorithm to consistently predict the response at an unlabeled network. Our model involves a specific multiple random network model, namely… ▽ More

    Submitted 6 May, 2024; originally announced May 2024.

  6. arXiv:2402.04436  [pdf, other]

    stat.ML cs.LG

    Continuous Multidimensional Scaling

    Authors: Michael W. Trosset, Carey E. Priebe

    Abstract: Multidimensional scaling (MDS) is the act of embedding proximity information about a set of $n$ objects in $d$-dimensional Euclidean space. As originally conceived by the psychometric community, MDS was concerned with embedding a fixed set of proximities associated with a fixed set of objects. Modern concerns, e.g., that arise in developing asymptotic theories for statistical inference on random g… ▽ More

    Submitted 8 February, 2024; v1 submitted 6 February, 2024; originally announced February 2024.

    Comments: 15 pages. Modified a sentence in the Abstract for greater clarity

    MSC Class: 62H99

  7. arXiv:2402.04403  [pdf, other]

    cs.DC cs.LG

    Edge-Parallel Graph Encoder Embedding

    Authors: Ariel Lubonja, Cencheng Shen, Carey Priebe, Randal Burns

    Abstract: New algorithms for embedding graphs have reduced the asymptotic complexity of finding low-dimensional representations. One-Hot Graph Encoder Embedding (GEE) uses a single, linear pass over edges and produces an embedding that converges asymptotically to the spectral embedding. The scaling and performance benefits of this approach have been limited by a serial implementation in an interpreted langu… ▽ More

    Submitted 6 February, 2024; originally announced February 2024.

    Comments: 4 pages, 4 figures

  8. arXiv:2309.08913  [pdf, ps, other]

    cs.AI cs.CL cs.CY cs.LG

    A Statistical Turing Test for Generative Models

    Authors: Hayden Helm, Carey E. Priebe, Weiwei Yang

    Abstract: The emergence of human-like abilities of AI systems for content generation in domains such as text, audio, and vision has prompted the development of classifiers to determine whether content originated from a human or a machine. Implicit in these efforts is an assumption that the generation properties of a human are different from that of the machine. In this work, we provide a framework in the la… ▽ More

    Submitted 16 September, 2023; originally announced September 2023.

  9. arXiv:2308.13451  [pdf, other]

    stat.ML cs.LG math.CO stat.AP stat.ME

    Gotta match 'em all: Solution diversification in graph matching matched filters

    Authors: Zhirui Li, Ben Johnson, Daniel L. Sussman, Carey E. Priebe, Vince Lyzinski

    Abstract: We present a novel approach for finding multiple noisily embedded template graphs in a very large background graph. Our method builds upon the graph-matching-matched-filter technique proposed in Sussman et al., with the discovery of multiple diverse matchings being achieved by iteratively penalizing a suitable node-pair similarity matrix in the matched filter algorithm. In addition, we propose alg… ▽ More

    Submitted 4 July, 2024; v1 submitted 25 August, 2023; originally announced August 2023.

    Comments: 27 pages, 12 figures, 3 tables

  10. arXiv:2307.15017  [pdf, other]

    cs.CR cs.LG

    Samplable Anonymous Aggregation for Private Federated Data Analysis

    Authors: Kunal Talwar, Shan Wang, Audra McMillan, Vojta Jina, Vitaly Feldman, Bailey Basile, Aine Cahill, Yi Sheng Chan, Mike Chatzidakis, Junye Chen, Oliver Chick, Mona Chitnis, Suman Ganta, Yusuf Goren, Filip Granqvist, Kristine Guo, Frederic Jacobs, Omid Javidbakht, Albert Liu, Richard Low, Dan Mascenik, Steve Myers, David Park, Wonhee Park, Gianni Parsa, et al. (11 additional authors not shown)

    Abstract: We revisit the problem of designing scalable protocols for private statistics and private federated learning when each device holds its private data. Our first contribution is to propose a simple primitive that allows for efficient implementation of several commonly used algorithms, and allows for privacy accounting that is close to that in the central setting without requiring the strong trust as… ▽ More

    Submitted 27 July, 2023; originally announced July 2023.

    Comments: 24 pages

  11. arXiv:2305.06465  [pdf, other]

    stat.ME

    Occam Factor for Random Graphs: Erdös-Rényi, Independent Edge, and Rank-1 Stochastic Blockmodel

    Authors: Tianyu Wang, Zachary M. Pisano, Carey E. Priebe

    Abstract: We investigate the evidence/flexibility (i.e., "Occam") paradigm and demonstrate the theoretical and empirical consistency of Bayesian evidence for the task of determining an appropriate generative model for network data. This model selection framework involves determining a collection of candidate models, equipping each of these models' parameters with prior distributions derived via the encompas… ▽ More

    Submitted 7 May, 2024; v1 submitted 10 May, 2023; originally announced May 2023.

  12. arXiv:2305.05126  [pdf, other]

    cs.LG cs.AI stat.ME

    Comparing Foundation Models using Data Kernels

    Authors: Brandon Duderstadt, Hayden S. Helm, Carey E. Priebe

    Abstract: Recent advances in self-supervised learning and neural network scaling have enabled the creation of large models, known as foundation models, which can be easily adapted to a wide range of downstream tasks. The current paradigm for comparing foundation models involves evaluating them with aggregate metrics on various benchmark datasets. This method of model comparison is heavily dependent on the c… ▽ More

    Submitted 7 January, 2024; v1 submitted 8 May, 2023; originally announced May 2023.

  13. Semisupervised regression in latent structure networks on unknown manifolds

    Authors: Aranyak Acharyya, Joshua Agterberg, Michael W. Trosset, Youngser Park, Carey E. Priebe

    Abstract: Random graphs are increasingly becoming objects of interest for modeling networks in a wide range of applications. Latent position random graph models posit that each node is associated with a latent position vector, and that these vectors follow some geometric structure in the latent space. In this paper, we consider random dot product graphs, in which an edge is formed between two nodes with pro… ▽ More

    Submitted 3 May, 2023; originally announced May 2023.

    Journal ref: Applied Network Science 8 (2023) 75

  14. Discovering Communication Pattern Shifts in Large-Scale Labeled Networks using Encoder Embedding and Vertex Dynamics

    Authors: Cencheng Shen, Jonathan Larson, Ha Trinh, Xihan Qin, Youngser Park, Carey E. Priebe

    Abstract: Analyzing large-scale time-series network data, such as social media and email communications, poses a significant challenge in understanding social dynamics, detecting anomalies, and predicting trends. In particular, the scalability of graph analysis is a critical hurdle impeding progress in large-scale downstream inference. To address this challenge, we introduce a temporal encoder embedding met… ▽ More

    Submitted 29 November, 2023; v1 submitted 3 May, 2023; originally announced May 2023.

    Comments: 10 pages + 2 pages appendix, 8 figures

    Journal ref: IEEE Transactions on Network Science and Engineering 11(2), 2100-2109, 2024

  15. arXiv:2304.09132  [pdf, other]

    stat.ME

    Independence testing for inhomogeneous random graphs

    Authors: Yukun Song, Carey E. Priebe, Minh Tang

    Abstract: Testing for independence between graphs is a problem that arises naturally in social network analysis and neuroscience. In this paper, we address independence testing for inhomogeneous Erdős-Rényi random graphs on the same vertex set. We first formulate a notion of pairwise correlations between the edges of these graphs and derive a necessary condition for their detectability. We next show that th… ▽ More

    Submitted 18 April, 2023; originally announced April 2023.

    Comments: 24 pages, 2 figures

  16. Synergistic Graph Fusion via Encoder Embedding

    Authors: Cencheng Shen, Carey E. Priebe, Jonathan Larson, Ha Trinh

    Abstract: In this paper, we introduce a method called graph fusion embedding, designed for multi-graph embedding with shared vertex sets. Under the framework of supervised learning, our method exhibits a remarkable and highly desirable synergistic effect: for sufficiently large vertex size, the accuracy of vertex classification consistently benefits from the incorporation of additional graphs. We establish… ▽ More

    Submitted 5 June, 2024; v1 submitted 31 March, 2023; originally announced March 2023.

    Comments: 19 pages main + 11 pages appendix

    Journal ref: Information Sciences 678, 120912, 2024

  17. arXiv:2303.04871  [pdf, other]

    stat.AP

    Discovering a change point and piecewise linear structure in a time series of organoid networks via the iso-mirror

    Authors: Tianyi Chen, Youngser Park, Ali Saad-Eldin, Zachary Lubberts, Avanti Athreya, Benjamin D. Pedigo, Joshua T. Vogelstein, Francesca Puppo, Gabriel A. Silva, Alysson R. Muotri, Weiwei Yang, Christopher M. White, Carey E. Priebe

    Abstract: Recent advancements have been made in the development of cell-based in-vitro neuronal networks, or organoids. In order to better understand the network structure of these organoids, a super-selective algorithm has been proposed for inferring the effective connectivity networks from multi-electrode array data. In this paper, we apply a novel statistical method called spectral mirror estimation to t… ▽ More

    Submitted 12 April, 2023; v1 submitted 8 March, 2023; originally announced March 2023.

  18. arXiv:2302.14186  [pdf, other]

    eess.SP cs.LG stat.AP stat.ME stat.ML

    Approximately optimal domain adaptation with Fisher's Linear Discriminant

    Authors: Hayden S. Helm, Ashwin De Silva, Joshua T. Vogelstein, Carey E. Priebe, Weiwei Yang

    Abstract: We propose a class of models based on Fisher's Linear Discriminant (FLD) in the context of domain adaptation. The class is the convex combination of two hypotheses: i) an average hypothesis representing previously seen source tasks and ii) a hypothesis trained on a new target task. For a particular generative setting we derive the optimal convex combination of the two models under 0-1 loss, propos… ▽ More

    Submitted 1 March, 2024; v1 submitted 27 February, 2023; originally announced February 2023.

  19. arXiv:2301.11290  [pdf, other]

    cs.SI cs.LG stat.ML

    Graph Encoder Ensemble for Simultaneous Vertex Embedding and Community Detection

    Authors: Cencheng Shen, Youngser Park, Carey E. Priebe

    Abstract: In this paper, we introduce a novel and computationally efficient method for vertex embedding, community detection, and community size determination. Our approach leverages a normalized one-hot graph encoder and a rank-based cluster size measure. Through extensive simulations, we demonstrate the excellent numerical performance of our proposed graph encoder ensemble algorithm.

    Submitted 18 November, 2023; v1 submitted 18 January, 2023; originally announced January 2023.

    Comments: 8 pages

    Journal ref: in Proceedings of 2023 2nd International Conference on Algorithms, Data Mining, and Information Technology, 13-18, ACM, 2023

  20. arXiv:2210.15083  [pdf, other]

    stat.ML cs.LG

    Deep Learning is Provably Robust to Symmetric Label Noise

    Authors: Carey E. Priebe, Ningyuan Huang, Soledad Villar, Cong Mu, Li Chen

    Abstract: Deep neural networks (DNNs) are capable of perfectly fitting the training data, including memorizing noisy data. It is commonly believed that memorization hurts generalization. Therefore, many recent works propose mitigation strategies to avoid noisy data or correct memorization. In this work, we step back and ask the question: Can deep learning be robust against massive label noise without any mi… ▽ More

    Submitted 26 October, 2022; originally announced October 2022.

  21. arXiv:2210.14378  [pdf, other]

    cs.CL cs.LG

    Bilingual Lexicon Induction for Low-Resource Languages using Graph Matching via Optimal Transport

    Authors: Kelly Marchisio, Ali Saad-Eldin, Kevin Duh, Carey Priebe, Philipp Koehn

    Abstract: Bilingual lexicons form a critical component of various natural language processing applications, including unsupervised and semisupervised machine translation and crosslingual information retrieval. We improve bilingual lexicon induction performance across 40 language pairs with a graph-matching method based on optimal transport. The method is especially strong with low amounts of supervision.

    Submitted 25 October, 2022; originally announced October 2022.

    Comments: EMNLP 2022 Camera-Ready

  22. arXiv:2209.12054  [pdf, other]

    stat.ML cs.LG

    From Local to Global: Spectral-Inspired Graph Neural Networks

    Authors: Ningyuan Huang, Soledad Villar, Carey E. Priebe, Da Zheng, Chengyue Huang, Lin Yang, Vladimir Braverman

    Abstract: Graph Neural Networks (GNNs) are powerful deep learning methods for Non-Euclidean data. Popular GNNs are message-passing algorithms (MPNNs) that aggregate and combine signals in a local graph neighborhood. However, shallow MPNNs tend to miss long-range signals and perform poorly on some heterophilous graphs, while deep MPNNs can suffer from issues like over-smoothing or over-squashing. To mitigate… ▽ More

    Submitted 4 November, 2022; v1 submitted 24 September, 2022; originally announced September 2022.

    Comments: Accepted for publication at the NeurIPS 2022 GLFrontiers Workshop

  23. arXiv:2208.13921  [pdf, other]

    cs.SI math.ST stat.CO stat.ML

    Dynamic Network Sampling for Community Detection

    Authors: Cong Mu, Youngser Park, Carey E. Priebe

    Abstract: We propose a dynamic network sampling scheme to optimize block recovery for stochastic blockmodel (SBM) in the case where it is prohibitively expensive to observe the entire graph. Theoretically, we provide justification of our proposed Chernoff-optimal dynamic sampling scheme via the Chernoff information. Practically, we evaluate the performance, in terms of block recovery, of our method on sever… ▽ More

    Submitted 16 December, 2022; v1 submitted 29 August, 2022; originally announced August 2022.

    Comments: 18 pages, 8 figures

  24. arXiv:2208.10967  [pdf, other]

    cs.LG cs.AI cs.CV stat.ML

    The Value of Out-of-Distribution Data

    Authors: Ashwin De Silva, Rahul Ramesh, Carey E. Priebe, Pratik Chaudhari, Joshua T. Vogelstein

    Abstract: We expect the generalization error to improve with more samples from a similar task, and to deteriorate with more samples from an out-of-distribution (OOD) task. In this work, we show a counter-intuitive phenomenon: the generalization error of a task can be a non-monotonic function of the number of OOD samples. As the number of OOD samples increases, the generalization error on the target task imp… ▽ More

    Submitted 13 July, 2023; v1 submitted 23 August, 2022; originally announced August 2022.

    Comments: Previous versions of this work have been presented at the Out-of-Distribution Generalization in Computer Vision (OOD-CV) Workshop (ECCV 2022) and the Workshop on Distribution Shifts (NeurIPS 2022)

    Journal ref: Proceedings of the 40th International Conference on Machine Learning, PMLR 202:7366-7389, 2023

  25. arXiv:2208.03211  [pdf, other]

    cs.LG cs.AI cs.NE

    Why do networks have inhibitory/negative connections?

    Authors: Qingyang Wang, Michael A. Powell, Ali Geisa, Eric Bridgeford, Carey E. Priebe, Joshua T. Vogelstein

    Abstract: Why do brains have inhibitory connections? Why do deep networks have negative weights? We propose an answer from the perspective of representation capacity. We believe representing functions is the primary role of both (i) the brain in natural intelligence, and (ii) deep networks in artificial intelligence. Our answer to why there are inhibitory/negative weights is: to learn more functions. We pro… ▽ More

    Submitted 17 August, 2023; v1 submitted 5 August, 2022; originally announced August 2022.

    Comments: ICCV2023 camera-ready

  26. arXiv:2205.14299  [pdf, other]

    cs.LG cs.CV

    Deep Learning with Label Noise: A Hierarchical Approach

    Authors: Li Chen, Ningyuan Huang, Cong Mu, Hayden S. Helm, Kate Lytvynets, Weiwei Yang, Carey E. Priebe

    Abstract: Deep neural networks are susceptible to label noise. Existing methods to improve robustness, such as meta-learning and regularization, usually require significant change to the network architecture or careful tuning of the optimization procedure. In this work, we propose a simple hierarchical approach that incorporates a label hierarchy when training the deep learning models. Our approach requires… ▽ More

    Submitted 27 May, 2022; originally announced May 2022.

    Comments: 8 pages, 7 figures

  27. arXiv:2205.06877  [pdf, other]

    stat.ME stat.AP

    Euclidean mirrors and dynamics in network time series

    Authors: Avanti Athreya, Zachary Lubberts, Youngser Park, Carey E Priebe

    Abstract: Analyzing changes in network evolution is central to statistical network inference, as underscored by recent challenges of predicting and distinguishing pandemic-induced transformations in organizational and communication networks. We consider a joint network model in which each node has an associated time-varying low-dimensional latent vector of feature data, and connection probabilities are func… ▽ More

    Submitted 30 May, 2024; v1 submitted 13 May, 2022; originally announced May 2022.

    Comments: 36 pages, 26 pages of supplementary material and proofs, 12 figures

    MSC Class: 62G05; 62H12

  28. arXiv:2203.09275  [pdf, other]

    cs.CV

    ART-SS: An Adaptive Rejection Technique for Semi-Supervised restoration for adverse weather-affected images

    Authors: Rajeev Yasarla, Carey E. Priebe, Vishal Patel

    Abstract: In recent years, convolutional neural network-based single image adverse weather removal methods have achieved significant performance improvements on many benchmark datasets. However, these methods require large amounts of clean-weather degraded image pairs for training, which is often difficult to obtain in practice. Although various weather degradation synthesis methods exist in the literature,… ▽ More

    Submitted 17 March, 2022; originally announced March 2022.

  29. arXiv:2203.00516  [pdf, other]

    eess.SP cs.LG stat.ME

    Mental State Classification Using Multi-graph Features

    Authors: Guodong Chen, Hayden S. Helm, Kate Lytvynets, Weiwei Yang, Carey E. Priebe

    Abstract: We consider the problem of extracting features from passive, multi-channel electroencephalogram (EEG) devices for downstream inference tasks related to high-level mental states such as stress and cognitive load. Our proposed method leverages recently developed multi-graph tools and applies them to the time series of graphs implied by the statistical dependence structure (e.g., correlation) amongst… ▽ More

    Submitted 25 February, 2022; originally announced March 2022.

  30. arXiv:2201.07372  [pdf, other]

    cs.LG cs.AI

    Prospective Learning: Principled Extrapolation to the Future

    Authors: Ashwin De Silva, Rahul Ramesh, Lyle Ungar, Marshall Hussain Shuler, Noah J. Cowan, Michael Platt, Chen Li, Leyla Isik, Seung-Eon Roh, Adam Charles, Archana Venkataraman, Brian Caffo, Javier J. How, Justus M Kebschull, John W. Krakauer, Maxim Bichuch, Kaleab Alemayehu Kinfu, Eva Yezerets, Dinesh Jayaraman, Jong M. Shin, Soledad Villar, Ian Phillips, Carey E. Priebe, Thomas Hartung, Michael I. Miller, et al. (18 additional authors not shown)

    Abstract: Learning is a process which can update decision rules, based on past experience, such that future performance improves. Traditionally, machine learning is often evaluated under the assumption that the future will be identical to the past in distribution or change adversarially. But these assumptions can be either too optimistic or pessimistic for many problems in the real world. Real world scenari… ▽ More

    Submitted 13 July, 2023; v1 submitted 18 January, 2022; originally announced January 2022.

    Comments: Accepted at the 2nd Conference on Lifelong Learning Agents (CoLLAs), 2023

  31. arXiv:2111.05366  [pdf, other]

    stat.ML cs.LG math.CO

    Graph Matching via Optimal Transport

    Authors: Ali Saad-Eldin, Benjamin D. Pedigo, Carey E. Priebe, Joshua T. Vogelstein

    Abstract: The graph matching problem seeks to find an alignment between the nodes of two graphs that minimizes the number of adjacency disagreements. Solving the graph matching is increasingly important due to its applications in operations research, computer vision, neuroscience, and more. However, current state-of-the-art algorithms are inefficient in matching very large graphs, though they produce good… ▽ More

    Submitted 9 November, 2021; originally announced November 2021.

  32. arXiv:2109.14501  [pdf, other]

    stat.ML cs.AI cs.LG

    Towards a theory of out-of-distribution learning

    Authors: Jayanta Dey, Ali Geisa, Ronak Mehta, Tyler M. Tomita, Hayden S. Helm, Haoyin Xu, Eric Eaton, Jeffery Dick, Carey E. Priebe, Joshua T. Vogelstein

    Abstract: Learning is a process wherein a learning agent enhances its performance through exposure of experience or data. Throughout this journey, the agent may encounter diverse learning environments. For example, data may be presented to the learner all at once, in multiple batches, or sequentially. Furthermore, the distribution of each data sample could be either identical and independent (iid) or non-iid… ▽ More

    Submitted 7 June, 2024; v1 submitted 29 September, 2021; originally announced September 2021.

  33. One-Hot Graph Encoder Embedding

    Authors: Cencheng Shen, Qizhe Wang, Carey E. Priebe

    Abstract: In this paper we propose a lightning fast graph embedding method called one-hot graph encoder embedding. It has a linear computational complexity and the capacity to process billions of edges within minutes on standard PC -- making it an ideal candidate for huge graph processing. It is applicable to either adjacency matrix or graph Laplacian, and can be viewed as a transformation of the spectral e… ▽ More

    Submitted 1 December, 2022; v1 submitted 27 September, 2021; originally announced September 2021.

    Comments: 7 pages main + 7 pages appendix

    Journal ref: IEEE Transactions on Pattern Analysis and Machine Intelligence 45(6), 7933 - 7938, 2023

  34. arXiv:2109.12640  [pdf, other]

    cs.CL

    An Analysis of Euclidean vs. Graph-Based Framing for Bilingual Lexicon Induction from Word Embedding Spaces

    Authors: Kelly Marchisio, Youngser Park, Ali Saad-Eldin, Anton Alyakin, Kevin Duh, Carey Priebe, Philipp Koehn

    Abstract: Much recent work in bilingual lexicon induction (BLI) views word embeddings as vectors in Euclidean space. As such, BLI is typically solved by finding a linear transformation that maps embeddings to a common space. Alternatively, word embeddings may be understood as nodes in a weighted graph. This framing allows us to examine a node's graph neighborhood without assuming a linear transform, and exp… ▽ More

    Submitted 26 September, 2021; originally announced September 2021.

    Comments: EMNLP Findings 2021 Camera-Ready

  35. arXiv:2108.13637  [pdf, other]

    cs.LG cs.AI q-bio.NC stat.ML

    When are Deep Networks really better than Decision Forests at small sample sizes, and how?

    Authors: Haoyin Xu, Kaleab A. Kinfu, Will LeVine, Sambit Panda, Jayanta Dey, Michael Ainsworth, Yu-Chung Peng, Madi Kusmanov, Florian Engert, Christopher M. White, Joshua T. Vogelstein, Carey E. Priebe

    Abstract: Deep networks and decision forests (such as random forests and gradient boosted trees) are the leading machine learning methods for structured and tabular data, respectively. Many papers have empirically compared large numbers of classifiers on one or two different domains (e.g., on 100 different tabular data settings). However, a careful conceptual and empirical comparison of these two strategies… ▽ More

    Submitted 2 November, 2021; v1 submitted 31 August, 2021; originally announced August 2021.

  36. arXiv:2107.11403  [pdf, other]

    stat.ME cs.DM math.ST

    Quantifying Network Similarity using Graph Cumulants

    Authors: Gecia Bravo-Hermsdorff, Lee M. Gunderson, Pierre-André Maugis, Carey E. Priebe

    Abstract: How might one test the hypothesis that networks were sampled from the same distribution? Here, we compare two statistical tests that use subgraph counts to address this question. The first uses the empirical subgraph densities themselves as estimates of those of the underlying distribution. The second test uses a new approach that converts these subgraph densities into estimates of the \textit{gra… ▽ More

    Submitted 18 July, 2023; v1 submitted 23 July, 2021; originally announced July 2021.

    Comments: Shared first authorship. Title changed from "A principled (and practical) test for network comparison" to "Quantifying Network Similarity using Graph Cumulants". Updated version accepted for publication in Journal of Machine Learning Research (JMLR), 2023

    Journal ref: Journal of Machine Learning Research (JMLR), 2023

  37. arXiv:2106.12621  [pdf, other]

    cs.LG cs.IR stat.ME

    Leveraging semantically similar queries for ranking via combining representations

    Authors: Hayden S. Helm, Marah Abdin, Benjamin D. Pedigo, Shweti Mahajan, Vince Lyzinski, Youngser Park, Amitabh Basu, Piali Choudhury, Christopher M. White, Weiwei Yang, Carey E. Priebe

    Abstract: In modern ranking problems, different and disparate representations of the items to be ranked are often available. It is sensible, then, to try to combine these representations to improve ranking. Indeed, learning to rank via combining representations is both principled and practical for learning a ranking function for a particular query. In extremely data-scarce settings, however, the amount of l… ▽ More

    Submitted 23 June, 2021; originally announced June 2021.

  38. arXiv:2105.13346  [pdf, other]

    math.ST math.OC math.SP stat.ME

    Entrywise Estimation of Singular Vectors of Low-Rank Matrices with Heteroskedasticity and Dependence

    Authors: Joshua Agterberg, Zachary Lubberts, Carey Priebe

    Abstract: We propose an estimator for the singular vectors of high-dimensional low-rank matrices corrupted by additive subgaussian noise, where the noise matrix is allowed to have dependence within rows and heteroskedasticity between them. We prove finite-sample $\ell_{2,\infty}$ bounds and a Berry-Esseen theorem for the individual entries of the estimator, and we apply these results to high-dimensional mix… ▽ More

    Submitted 13 September, 2022; v1 submitted 27 May, 2021; originally announced May 2021.

    Journal ref: IEEE Transactions on Information Theory, vol. 68, no. 7, pp. 4618 - 4650, July 2022

  39. arXiv:2105.01566  [pdf, other]

    stat.ME

    Occam Factor for Gaussian Models With Unknown Variance Structure

    Authors: Zachary M. Pisano, Daniel Q. Naiman, Carey E. Priebe

    Abstract: We discuss model selection to determine whether the variance-covariance matrix of a multivariate Gaussian model with known mean should be considered to be a constant diagonal, a non-constant diagonal, or an arbitrary positive definite matrix. Of particular interest is the relationship between Bayesian evidence and the flexibility penalty due to Priebe and Rougier. For the case of an exponential fa… ▽ More

    Submitted 4 May, 2021; originally announced May 2021.

    Comments: 46 pages, 1 figure

    MSC Class: Primary 62F07; secondary 62F10

  40. arXiv:2104.00641  [pdf]

    stat.ML cs.LG

    Dynamic Silos: Increased Modularity in Intra-organizational Communication Networks during the Covid-19 Pandemic

    Authors: Tiona Zuzul, Emily Cox Pahnke, Jonathan Larson, Patrick Bourke, Nicholas Caurvina, Neha Parikh Shah, Fereshteh Amini, Jeffrey Weston, Youngser Park, Joshua Vogelstein, Christopher White, Carey E. Priebe

    Abstract: Workplace communications around the world were drastically altered by Covid-19, related work-from-home orders, and the rise of remote work. To understand these shifts, we analyzed aggregated, anonymized metadata from over 360 billion emails within 4,361 organizations worldwide. By comparing month-to-month and year-over-year metrics, we examined changes in network community structures over 24 month… ▽ More

    Submitted 28 July, 2023; v1 submitted 1 April, 2021; originally announced April 2021.

    Comments: 48 pages, 15 figures

  41. arXiv:2103.14726  [pdf, other]

    cs.SI stat.ME

    Random line graphs and edge-attributed network inference

    Authors: Zachary Lubberts, Avanti Athreya, Youngser Park, Carey E. Priebe

    Abstract: We extend the latent position random graph model to the line graph of a random graph, which is formed by creating a vertex for each edge in the original random graph, and connecting each pair of edges incident to a common vertex in the original graph. We prove concentration inequalities for the spectrum of a line graph, as well as limiting distribution results for the largest eigenvalue and the em…

    Submitted 23 February, 2024; v1 submitted 26 March, 2021; originally announced March 2021.

    Comments: 44 pages total, including supplementary material; 5 figures

    MSC Class: 05C80; 15A52; 62F12

  42. The Phantom Alignment Strength Conjecture: Practical use of graph matching alignment strength to indicate a meaningful graph match

    Authors: Donniell E. Fishkind, Felix Parker, Hamilton Sawczuk, Lingyao Meng, Eric Bridgeford, Avanti Athreya, Carey E. Priebe, Vince Lyzinski

    Abstract: The alignment strength of a graph matching is a quantity that gives the practitioner a measure of the correlation of the two graphs, and it can also give the practitioner a sense for whether the graph matching algorithm found the true matching. Unfortunately, when a graph matching algorithm fails to find the truth because of weak signal, there may be "phantom alignment strength" from meaningless m…

    Submitted 23 August, 2021; v1 submitted 28 February, 2021; originally announced March 2021.

  43. arXiv:2102.10263  [pdf, other]

    stat.ML cs.LG stat.ME

    Inducing a hierarchy for multi-class classification problems

    Authors: Hayden S. Helm, Weiwei Yang, Sujeeth Bharadwaj, Kate Lytvynets, Oriana Riva, Christopher White, Ali Geisa, Carey E. Priebe

    Abstract: In applications where categorical labels follow a natural hierarchy, classification methods that exploit the label structure often outperform those that do not. Unfortunately, the majority of classification datasets do not come pre-equipped with a hierarchical structure and classical flat classifiers must be employed. In this paper, we investigate a class of methods that induce a hierarchy that c…

    Submitted 20 February, 2021; originally announced February 2021.

  44. arXiv:2101.12430  [pdf, other]

    cs.LG cs.IR cs.SI stat.ML

    Subgraph nomination: Query by Example Subgraph Retrieval in Networks

    Authors: Al-Fahad M. Al-Qadhi, Carey E. Priebe, Hayden S. Helm, Vince Lyzinski

    Abstract: This paper introduces the subgraph nomination inference task, in which example subgraphs of interest are used to query a network for similarly interesting subgraphs. This type of problem appears time and again in real world problems connected to, for example, user recommendation systems and structural retrieval tasks in social and biological/connectomic networks. We formally define the subgraph no…

    Submitted 19 December, 2022; v1 submitted 29 January, 2021; originally announced January 2021.

    Comments: 37 pages, 11 figures

  45. arXiv:2012.09828  [pdf, other]

    math.ST

    Nonparametric Two-Sample Hypothesis Testing for Random Graphs with Negative and Repeated Eigenvalues

    Authors: Joshua Agterberg, Minh Tang, Carey Priebe

    Abstract: We propose a nonparametric two-sample test statistic for low-rank, conditionally independent edge random graphs whose edge probability matrices have negative eigenvalues and arbitrarily close eigenvalues. Our proposed test statistic involves using the maximum mean discrepancy applied to suitably rotated rows of a graph embedding, where the rotation is estimated using optimal transport. We show tha…

    Submitted 18 December, 2020; v1 submitted 17 December, 2020; originally announced December 2020.

  46. arXiv:2011.14990  [pdf, other]

    q-bio.NC stat.ME

    Discovery of Multi-Level Network Differences Across Populations of Heterogeneous Connectomes

    Authors: Vivek Gopalakrishnan, Jaewon Chung, Eric Bridgeford, Benjamin D. Pedigo, Jesús Arroyo, Lucy Upchurch, G. Allan Johnson, Nian Wang, Youngser Park, Carey E. Priebe, Joshua T. Vogelstein

    Abstract: A connectome is a map of the structural and/or functional connections in the brain. This information-rich representation has the potential to transform our understanding of the relationship between patterns in brain connectivity and neurological processes, disorders, and diseases. However, existing computational techniques used to analyze connectomes are oftentimes insufficient for interrogating m…

    Submitted 13 April, 2022; v1 submitted 30 November, 2020; originally announced November 2020.

    Comments: 29 pages, 12 figures

  47. arXiv:2011.06557  [pdf, other]

    stat.ML cs.LG stat.ME

    A partition-based similarity for classification distributions

    Authors: Hayden S. Helm, Ronak D. Mehta, Brandon Duderstadt, Weiwei Yang, Christoper M. White, Ali Geisa, Joshua T. Vogelstein, Carey E. Priebe

    Abstract: Herein we define a measure of similarity between classification distributions that is both principled from the perspective of statistical pattern recognition and useful from the perspective of machine learning practitioners. In particular, we propose a novel similarity on classification distributions, dubbed task similarity, that quantifies how an optimally-transformed optimal representation for a…

    Submitted 12 November, 2020; originally announced November 2020.

  48. arXiv:2010.14622  [pdf, other]

    cs.SI stat.ME

    Vertex nomination between graphs via spectral embedding and quadratic programming

    Authors: Runbing Zheng, Vince Lyzinski, Carey E. Priebe, Minh Tang

    Abstract: Given a network and a subset of interesting vertices whose identities are only partially known, the vertex nomination problem seeks to rank the remaining vertices in such a way that the interesting vertices are ranked at the top of the list. An important variant of this problem is vertex nomination in the multi-graphs setting. Given two graphs $G_1, G_2$ with common vertices and a vertex of intere…

    Submitted 27 March, 2022; v1 submitted 24 October, 2020; originally announced October 2020.

  49. A Simple Spectral Failure Mode for Graph Convolutional Networks

    Authors: Carey E. Priebe, Cencheng Shen, Ningyuan Huang, Tianyi Chen

    Abstract: Neural networks have achieved remarkable successes in machine learning tasks. This has recently been extended to graph learning using neural networks. However, there is limited theoretical work in understanding how and when they perform well, especially relative to established statistical learning techniques such as spectral embedding. In this short paper, we present a simple generative model wher…

    Submitted 11 August, 2021; v1 submitted 25 October, 2020; originally announced October 2020.

    Journal ref: IEEE Transactions on Pattern Analysis and Machine Intelligence 44(11), 8689-8693, 2022

  50. arXiv:2008.10055  [pdf, other]

    stat.ME

    Multiple Network Embedding for Anomaly Detection in Time Series of Graphs

    Authors: Guodong Chen, Jesús Arroyo, Avanti Athreya, Joshua Cape, Joshua T. Vogelstein, Youngser Park, Chris White, Jonathan Larson, Weiwei Yang, Carey E. Priebe

    Abstract: This paper considers the graph signal processing problem of anomaly detection in time series of graphs. We examine two related, complementary inference tasks: the detection of anomalous graphs within a time series, and the detection of temporally anomalous vertices. We approach these tasks via the adaptation of statistically principled methods for joint graph inference, specifically \emph{multiple…

    Submitted 10 March, 2024; v1 submitted 23 August, 2020; originally announced August 2020.

    Comments: 51 pages, 17 figures