Search | arXiv e-print repository

Harmonizing Program Induction with Rate-Distortion Theory

Authors: Hanqi Zhou, David G. Nagy, Charley M. Wu

Abstract: Many aspects of human learning have been proposed as a process of constructing mental programs: from acquiring symbolic number representations to intuitive theories about the world. In parallel, there is a long tradition of using information processing to model human cognition through Rate Distortion Theory (RDT). Yet, it is still poorly understood how to apply RDT when mental representations take the form of programs. In this work, we adapt RDT by proposing a three-way trade-off among rate (description length), distortion (error), and computational costs (search budget). We use simulations on a melody task to study the implications of this trade-off, and show that constructing a shared program library across tasks provides global benefits. However, this comes at the cost of sensitivity to curricula, which is also characteristic of human learners. Finally, we use methods from partial information decomposition to generate training curricula that induce more effective libraries and better generalization.

Submitted 8 May, 2024; originally announced May 2024.

Comments: CogSci 2024

arXiv:2404.08472 [pdf, other]

TSLANet: Rethinking Transformers for Time Series Representation Learning

Authors: Emadeldeen Eldele, Mohamed Ragab, Zhenghua Chen, Min Wu, Xiaoli Li

Abstract: Time series data, characterized by its intrinsic long and short-range dependencies, poses a unique challenge across analytical applications. While Transformer-based models excel at capturing long-range dependencies, they face limitations in noise sensitivity, computational efficiency, and overfitting with smaller datasets. In response, we introduce a novel Time Series Lightweight Adaptive Network (TSLANet), as a universal convolutional model for diverse time series tasks. Specifically, we propose an Adaptive Spectral Block, harnessing Fourier analysis to enhance feature representation and to capture both long-term and short-term interactions while mitigating noise via adaptive thresholding. Additionally, we introduce an Interactive Convolution Block and leverage self-supervised learning to refine the capacity of TSLANet for decoding complex temporal patterns and improve its robustness on different datasets. Our comprehensive experiments demonstrate that TSLANet outperforms state-of-the-art models in various tasks spanning classification, forecasting, and anomaly detection, showcasing its resilience and adaptability across a spectrum of noise levels and data sizes. The code is available at https://github.com/emadeldeen24/TSLANet.

Submitted 6 May, 2024; v1 submitted 12 April, 2024; originally announced April 2024.

Comments: Accepted in ICML 2024

arXiv:2403.19994 [pdf, other]

Supervised Bayesian joint graphical model for simultaneous network estimation and subgroup identification

Authors: Xing Qin, Xu Liu, Shuangge Ma, Mengyun Wu

Abstract: Heterogeneity is a fundamental characteristic of cancer. To accommodate heterogeneity, subgroup identification has been extensively studied and broadly categorized into unsupervised and supervised analysis. Compared to unsupervised analysis, supervised approaches potentially hold greater clinical implications. Under the unsupervised analysis framework, several methods focusing on network-based subgroup identification have been developed, offering more comprehensive insights than those restricted to mean, variance, and other simplistic distributions by incorporating the interconnections among variables. However, research on supervised network-based subgroup identification remains limited. In this study, we develop a novel supervised Bayesian graphical model for jointly identifying multiple heterogeneous networks and subgroups. In the proposed model, heterogeneity is not only reflected in molecular data but also associated with a clinical outcome, and a novel similarity prior is introduced to effectively accommodate similarities among the networks of different subgroups, significantly facilitating clinically meaningful biological network construction and subgroup identification. The consistency properties of the estimates are rigorously established, and an efficient algorithm is developed. Extensive simulation studies and a real-world application to TCGA data are conducted, which demonstrate the advantages of the proposed approach in terms of both subgroup and network identification.

Submitted 29 March, 2024; originally announced March 2024.

arXiv:2403.13179 [pdf, other]

Predictive, scalable and interpretable knowledge tracing on structured domains

Authors: Hanqi Zhou, Robert Bamler, Charley M. Wu, Álvaro Tejero-Cantero

Abstract: Intelligent tutoring systems optimize the selection and timing of learning materials to enhance understanding and long-term retention. This requires estimates of both the learner's progress ("knowledge tracing"; KT), and the prerequisite structure of the learning domain ("knowledge mapping"). While recent deep learning models achieve high KT accuracy, they do so at the expense of the interpretability of psychologically-inspired models. In this work, we present a solution to this trade-off. PSI-KT is a hierarchical generative approach that explicitly models how both individual cognitive traits and the prerequisite structure of knowledge influence learning dynamics, thus achieving interpretability by design. Moreover, by using scalable Bayesian inference, PSI-KT targets the real-world need for efficient personalization even with a growing body of learners and learning histories. Evaluated on three datasets from online learning platforms, PSI-KT achieves superior multi-step predictive accuracy and scalable inference in continual-learning settings, all while providing interpretable representations of learner-specific traits and the prerequisite structure of knowledge that causally supports learning. In sum, predictive, scalable and interpretable knowledge tracing with solid knowledge mapping lays a key foundation for effective personalized learning to make education accessible to a broad, global audience.

Submitted 19 March, 2024; originally announced March 2024.

arXiv:2403.07724 [pdf, other]

Balancing Fairness and Accuracy in Data-Restricted Binary Classification

Authors: Zachary McBride Lazri, Danial Dervovic, Antigoni Polychroniadou, Ivan Brugere, Dana Dachman-Soled, Min Wu

Abstract: Applications that deal with sensitive information may have restrictions placed on the data available to a machine learning (ML) classifier. For example, in some applications, a classifier may not have direct access to sensitive attributes, affecting its ability to produce accurate and fair decisions. This paper proposes a framework that models the trade-off between accuracy and fairness under four practical scenarios that dictate the type of data available for analysis. Prior works examine this trade-off by analyzing the outputs of a scoring function that has been trained to implicitly learn the underlying distribution of the feature vector, class label, and sensitive attribute of a dataset. In contrast, our framework directly analyzes the behavior of the optimal Bayesian classifier on this underlying distribution by constructing a discrete approximation of it from the dataset itself. This approach enables us to formulate multiple convex optimization problems, which allow us to answer the question: How is the accuracy of a Bayesian classifier affected in different data-restricting scenarios when constrained to be fair? Analysis is performed on a set of fairness definitions that include group and individual fairness. Experiments on three datasets demonstrate the utility of the proposed framework as a tool for quantifying the trade-offs among different fairness notions and their distributional dependencies.

Submitted 12 March, 2024; originally announced March 2024.

arXiv:2402.12785 [pdf, other]

Stochastic Graph Heat Modelling for Diffusion-based Connectivity Retrieval

Authors: Stephan Goerttler, Fei He, Min Wu

Abstract: Heat diffusion describes the process by which heat flows from areas with higher temperatures to ones with lower temperatures. This concept was previously adapted to graph structures, whereby heat flows between nodes of a graph depending on the graph topology. Here, we combine the graph heat equation with the stochastic heat equation, which ultimately yields a model for multivariate time signals on a graph. We show theoretically how the model can be used to directly compute the diffusion-based connectivity structure from multivariate signals. Unlike other connectivity measures, our heat model-based approach is inherently multivariate and yields an absolute scaling factor, namely the graph thermal diffusivity, which captures the extent of heat-like graph propagation in the data. On two datasets, we show how the graph thermal diffusivity can be used to characterise Alzheimer's disease (AD). We find that the graph thermal diffusivity is lower for AD patients than healthy controls and correlates with mini mental state examination (MMSE) scores, suggesting structural impairment in patients in line with previous findings.

Submitted 30 April, 2024; v1 submitted 20 February, 2024; originally announced February 2024.

Comments: 4 pages, 1 figure, conference paper

arXiv:2402.01929 [pdf, other]

Sample, estimate, aggregate: A recipe for causal discovery foundation models

Authors: Menghua Wu, Yujia Bao, Regina Barzilay, Tommi Jaakkola

Abstract: Causal discovery, the task of inferring causal structure from data, promises to accelerate scientific research, inform policy making, and more. However, causal discovery algorithms over larger sets of variables tend to be brittle against misspecification or when data are limited. To mitigate these challenges, we train a supervised model that learns to predict a larger causal graph from the outputs of classical causal discovery algorithms run over subsets of variables, along with other statistical hints like inverse covariance. Our approach is enabled by the observation that typical errors in the outputs of classical methods remain comparable across datasets. Theoretically, we show that this model is well-specified, in the sense that it can recover a causal graph consistent with graphs over subsets. Empirically, we train the model to be robust to erroneous estimates using diverse synthetic data. Experiments on real and synthetic data demonstrate that this model maintains high accuracy in the face of misspecification or distribution shift, and can be adapted at low cost to different discovery algorithms or choice of statistics.

Submitted 23 May, 2024; v1 submitted 2 February, 2024; originally announced February 2024.

Comments: Preprint. Under review

arXiv:2310.01575 [pdf, other]

Derivation of outcome-dependent dietary patterns for low-income women obtained from survey data using a Supervised Weighted Overfitted Latent Class Analysis

Authors: Stephanie M. Wu, Matthew R. Williams, Terrance D. Savitsky, Briana J. K. Stephenson

Abstract: Poor diet quality is a key modifiable risk factor for hypertension and disproportionately impacts low-income women. Analyzing diet-driven hypertensive outcomes in this demographic is challenging due to the complexity of dietary data and selection bias when the data come from surveys, a main data source for understanding diet-disease relationships in understudied populations. Supervised Bayesian model-based clustering methods summarize dietary data into latent patterns that holistically capture relationships among foods and a known health outcome but do not sufficiently account for complex survey design. This leads to biased estimation and inference and lack of generalizability of the patterns. To address this, we propose a supervised weighted overfitted latent class analysis (SWOLCA) based on a Bayesian pseudo-likelihood approach that integrates sampling weights into an exposure-outcome model for discrete data. Our model adjusts for stratification, clustering, and informative sampling, and handles modifying effects via interaction terms within a Markov chain Monte Carlo Gibbs sampling algorithm. Simulation studies confirm that the SWOLCA model exhibits good performance in terms of bias, precision, and coverage. Using data from the National Health and Nutrition Examination Survey (2015-2018), we demonstrate the utility of our model by characterizing dietary patterns associated with hypertensive outcomes among low-income women in the United States.

Submitted 28 June, 2024; v1 submitted 2 October, 2023; originally announced October 2023.

Comments: 16 pages, 8 tables, 7 figures

arXiv:2309.15585 [pdf, other]

Identification of Influencing Factors on Self-reported Count Data with Multiple Potential Inflated Values

Authors: Yang Li, Mingcong Wu, Mengyun Wu, Shuangge Ma

Abstract: The Online Chauffeured Service Demand (OCSD) research is an exploratory market study of designated driver services in China. Researchers are interested in the influencing factors of chauffeured service adoption and usage and have collected relevant data using a self-reported questionnaire. As self-reported count measure data is typically inflated, there exist challenges to its validity, which may bias estimation and increase error in empirical research. Motivated by the analysis of self-reported data with multiple inflated values, we propose a novel approach to simultaneously achieve data-driven inflated value selection and identification of important influencing factors. In particular, the regularization technique is applied to the mixing proportions of inflated values and the regression parameters to obtain shrinkage estimates. We analyze the OCSD data with the proposed approach, deriving insights into the determinants impacting service demand. The proper interpretations and implications contribute to service promotion and related policy optimization. Extensive simulation studies and consistent asymptotic properties further establish the effectiveness of the proposed approach.

Submitted 27 September, 2023; originally announced September 2023.

Comments: 18 pages, 8 figures, references added

arXiv:2309.03354 [pdf, other]

Ensemble linear interpolators: The role of ensembling

Authors: Mingqi Wu, Qiang Sun

Abstract: Interpolators are unstable. For example, the minimum $\ell_2$ norm least square interpolator exhibits unbounded test errors when dealing with noisy data. In this paper, we study how ensemble stabilizes and thus improves the generalization performance, measured by the out-of-sample prediction risk, of an individual interpolator. We focus on bagged linear interpolators, as bagging is a popular randomization-based ensemble method that can be implemented in parallel. We introduce the multiplier-bootstrap-based bagged least square estimator, which can then be formulated as an average of the sketched least square estimators. The proposed multiplier bootstrap encompasses the classical bootstrap with replacement as a special case, along with a more intriguing variant which we call the Bernoulli bootstrap. Focusing on the proportional regime where the sample size scales proportionally with the feature dimensionality, we investigate the out-of-sample prediction risks of the sketched and bagged least square estimators in both underparameterized and overparameterized regimes. Our results reveal the statistical roles of sketching and bagging. In particular, sketching modifies the aspect ratio and shifts the interpolation threshold of the minimum $\ell_2$ norm estimator. However, the risk of the sketched estimator continues to be unbounded around the interpolation threshold due to excessive variance. In stark contrast, bagging effectively mitigates this variance, leading to a bounded limiting out-of-sample prediction risk. To further understand this stability improvement property, we establish that bagging acts as a form of implicit regularization, substantiated by the equivalence of the bagged estimator with its explicitly regularized counterpart. We also discuss several extensions.

Submitted 6 September, 2023; originally announced September 2023.

Comments: 30-page main text including figures and tables, 50-page appendix

arXiv:2307.15268 [pdf, other]

Multivariate Differential Association Analysis

Authors: Hoseung Song, Michael C. Wu

Abstract: Identifying how dependence relationships vary across different conditions plays a significant role in many scientific investigations. For example, it is important for the comparison of biological systems to see if relationships between genomic features differ between cases and controls. In this paper, we seek to evaluate whether the relationships between two sets of variables are different across two conditions. Specifically, we assess: do two sets of high-dimensional variables have similar dependence relationships across two conditions? We propose a new kernel-based test to capture the differential dependence. Specifically, the new test determines whether two measures that detect dependence relationships are similar or not under two conditions. We introduce the asymptotic permutation null distribution of the test statistic and it is shown to work well under finite samples such that the test is computationally efficient, making it easily applicable to analyze large data sets. We demonstrate through numerical studies that our proposed test has high power for detecting differential linear and non-linear relationships. The proposed method is implemented in an R package kerDAA.

Submitted 27 July, 2023; originally announced July 2023.

arXiv:2306.05566 [pdf, other]

Data-Adaptive Probabilistic Likelihood Approximation for Ordinary Differential Equations

Authors: Mohan Wu, Martin Lysy

Abstract: Estimating the parameters of ordinary differential equations (ODEs) is of fundamental importance in many scientific applications. While ODEs are typically approximated with deterministic algorithms, new research on probabilistic solvers indicates that they produce more reliable parameter estimates by better accounting for numerical errors. However, many ODE systems are highly sensitive to their parameter values. This produces deep local maxima in the likelihood function -- a problem which existing probabilistic solvers have yet to resolve. Here we present a novel probabilistic ODE likelihood approximation, DALTON, which can dramatically reduce parameter sensitivity by learning from noisy ODE measurements in a data-adaptive manner. Our approximation scales linearly in both ODE variables and time discretization points, and is applicable to ODEs with both partially-unobserved components and non-Gaussian measurement models. Several examples demonstrate that DALTON produces more accurate parameter estimates via numerical optimization than existing probabilistic ODE solvers, and even in some cases than the exact ODE likelihood itself.

Submitted 6 December, 2023; v1 submitted 8 June, 2023; originally announced June 2023.

Comments: 8 pages, 7 figures

arXiv:2306.00265 [pdf, other]

Doubly Robust Self-Training

Authors: Banghua Zhu, Mingyu Ding, Philip Jacobson, Ming Wu, Wei Zhan, Michael Jordan, Jiantao Jiao

Abstract: Self-training is an important technique for solving semi-supervised learning problems. It leverages unlabeled data by generating pseudo-labels and combining them with a limited labeled dataset for training. The effectiveness of self-training heavily relies on the accuracy of these pseudo-labels. In this paper, we introduce doubly robust self-training, a novel semi-supervised algorithm that provably balances between two extremes. When the pseudo-labels are entirely incorrect, our method reduces to a training process solely using labeled data. Conversely, when the pseudo-labels are completely accurate, our method transforms into a training process utilizing all pseudo-labeled data and labeled data, thus increasing the effective sample size. Through empirical evaluations on both the ImageNet dataset for image classification and the nuScenes autonomous driving dataset for 3D object detection, we demonstrate the superiority of the doubly robust loss over the standard self-training baseline.

Submitted 2 November, 2023; v1 submitted 31 May, 2023; originally announced June 2023.

arXiv:2303.04996 [pdf]

Bayesian estimation methods for survey data with potential applications to health disparities research

Authors: Stephanie M. Wu, Briana Joy K. Stephenson

Abstract: Understanding how and why certain communities bear a disproportionate burden of disease is challenging due to the scarcity of data on these communities. Surveys provide a useful avenue for accessing hard-to-reach populations, as many surveys specifically oversample understudied and vulnerable populations. When survey data is used for analysis, it is important to account for the complex survey design that gave rise to the data, in order to avoid biased conclusions. The field of Bayesian survey statistics aims to account for such survey design while leveraging the advantages of Bayesian models, which can flexibly handle sparsity through borrowing of information and provide a coherent inferential framework to easily obtain variances for complex models and data types. For these reasons, Bayesian survey methods seem uniquely well-poised for health disparities research, where heterogeneity and sparsity are frequent considerations. This review discusses three main approaches found in the Bayesian survey methodology literature: 1) multilevel regression and post-stratification, 2) weighted pseudolikelihood-based methods, and 3) synthetic population generation. We discuss advantages and disadvantages of each approach, examine recent applications and extensions, and consider how these approaches may be leveraged to improve research in population health equity.

Submitted 26 July, 2023; v1 submitted 8 March, 2023; originally announced March 2023.

Comments: 36 pages, 1 figure, 4 tables. Under review at WIREs Computational Statistics

arXiv:2303.03582 [pdf, other]

Statistical inferences for complex dependence of multimodal imaging data

Authors: Jinyuan Chang, Jing He, Jian Kang, Mingcong Wu

Abstract: Statistical analysis of multimodal imaging data is a challenging task, since the data involves high-dimensionality, strong spatial correlations and complex data structures. In this paper, we propose rigorous statistical testing procedures for making inferences on the complex dependence of multimodal imaging data. Motivated by the analysis of multi-task fMRI data in the Human Connectome Project (HCP) study, we particularly address three hypothesis testing problems: (a) testing independence among imaging modalities over brain regions, (b) testing independence between brain regions within imaging modalities, and (c) testing independence between brain regions across different modalities. Considering a general form for all the three tests, we develop a global testing procedure and a multiple testing procedure controlling the false discovery rate. We study theoretical properties of the proposed tests and develop a computationally efficient distributed algorithm. The proposed methods and theory are general and relevant for many statistical problems of testing independence structure among the components of high-dimensional random vectors with arbitrary dependence structures. We also illustrate our proposed methods via extensive simulations and analysis of five task fMRI contrast maps in the HCP study.

Submitted 6 March, 2023; originally announced March 2023.

arXiv:2210.16835 [pdf, other]

Variance reduced Shapley value estimation for trustworthy data valuation

Authors: Mengmeng Wu, Ruoxi Jia, Changle Lin, Wei Huang, Xiangyu Chang

Abstract: Data valuation, especially quantifying data value in algorithmic prediction and decision-making, is a fundamental problem in data trading scenarios. The most widely used method is to define the data Shapley and approximate it by means of the permutation sampling algorithm. To make up for the large estimation variance of the permutation sampling that hinders the development of the data marketplace, we propose a more robust data valuation method using stratified sampling, named variance reduced data Shapley (VRDS for short). We theoretically show how to stratify, how many samples are taken at each stratum, and the sample complexity analysis of VRDS. Finally, the effectiveness of VRDS is illustrated in different types of datasets and data removal applications.

Submitted 22 May, 2023; v1 submitted 30 October, 2022; originally announced October 2022.

arXiv:2206.10763 [pdf, other]

doi 10.1038/s41597-022-01808-2

Simulated redistricting plans for the analysis and evaluation of redistricting in the United States

Authors: Cory McCartan, Christopher T. Kenny, Tyler Simko, George Garcia III, Kevin Wang, Melissa Wu, Shiro Kuriwaki, Kosuke Imai

Abstract: This article introduces the 50stateSimulations, a collection of simulated congressional districting plans and underlying code developed by the Algorithm-Assisted Redistricting Methodology (ALARM) Project. The 50stateSimulations allow for the evaluation of enacted and other congressional redistricting plans in the United States. While the use of redistricting simulation algorithms has become standard in academic research and court cases, any simulation analysis requires non-trivial efforts to combine multiple data sets, identify state-specific redistricting criteria, implement complex simulation algorithms, and summarize and visualize simulation outputs. We have developed a complete workflow that facilitates this entire process of simulation-based redistricting analysis for the congressional districts of all 50 states. The resulting 50stateSimulations include ensembles of simulated 2020 congressional redistricting plans and necessary replication data. We also provide the underlying code, which serves as a template for customized analyses. All data and code are free and publicly available. This article details the design, creation, and validation of the data.

Submitted 20 October, 2022; v1 submitted 21 June, 2022; originally announced June 2022.

Comments: 11 pages, 3 figures

Journal ref: Sci Data (2022) 9, 689

arXiv:2206.09120 [pdf, other]

Pursuit of a Discriminative Representation for Multiple Subspaces via Sequential Games

Authors: Druv Pai, Michael Psenka, Chih-Yuan Chiu, Manxi Wu, Edgar Dobriban, Yi Ma

Abstract: We consider the problem of learning discriminative representations for data in a high-dimensional space with distribution supported on or around multiple low-dimensional linear subspaces. That is, we wish to compute a linear injective map of the data such that the features lie on multiple orthogonal subspaces. Instead of treating this learning problem using multiple PCAs, we cast it as a sequential game using the closed-loop transcription (CTRL) framework recently proposed for learning discriminative and generative representations for general low-dimensional submanifolds. We prove that the equilibrium solutions to the game indeed give correct representations. Our approach unifies classical methods of learning subspaces with modern deep learning practice, by showing that subspace learning problems may be provably solved using the modern toolkit of representation learning. In addition, our work provides the first theoretical justification for the CTRL framework, in the important case of linear subspaces. We support our theoretical findings with compelling empirical evidence. We also generalize the sequential game formulation to more general representation learning problems. Our code, including methods for easy reproduction of experimental results, is publicly available on GitHub.

Submitted 5 October, 2022; v1 submitted 18 June, 2022; originally announced June 2022.

Comments: main body is 16 pages and has 5 figures; appendix is 17 pages and has 6 figures

arXiv:2205.09735 [pdf, other]

Foundation Posteriors for Approximate Probabilistic Inference

Authors: Mike Wu, Noah Goodman

Abstract: Probabilistic programs provide an expressive representation language for generative models. Given a probabilistic program, we are interested in the task of posterior inference: estimating a latent variable given a set of observed variables. Existing techniques for inference in probabilistic programs often require choosing many hyper-parameters, are computationally expensive, and/or only work for restricted classes of programs. Here we formulate inference as masked language modeling: given a program, we generate a supervised dataset of variables and assignments, and randomly mask a subset of the assignments. We then train a neural network to unmask the random values, defining an approximate posterior distribution. By optimizing a single neural network across a range of programs we amortize the cost of training, yielding a "foundation" posterior able to do zero-shot inference for new programs. The foundation posterior can also be fine-tuned for a particular program and dataset by optimizing a variational inference objective. We show the efficacy of the approach, zero-shot and fine-tuned, on a benchmark of STAN programs.

Submitted 31 August, 2022; v1 submitted 19 May, 2022; originally announced May 2022.

Comments: 9 pages without appendix

arXiv:2204.00443 [pdf, other]

doi 10.1103/PhysRevD.106.023521

Differentiating small-scale subhalo distributions in CDM and WDM models using persistent homology

Authors: Jessi Cisewski-Kehe, Brittany Terese Fasy, Wojciech Hellwing, Mark R. Lovell, Pawel Drozda, Mike Wu

Abstract: The spatial distribution of galaxies at sufficiently small scales will encode information about the identity of the dark matter. We develop a novel description of the halo distribution using persistent homology summaries, in which collections of points are decomposed into clusters, loops and voids. We apply these methods, together with a set of hypothesis tests, to dark matter haloes in MW-analog environment regions of the cold dark matter (CDM) and warm dark matter (WDM) Copernicus Complexio $N$-body cosmological simulations. The results of the hypothesis tests find statistically significant differences (p-values $\leq$ 0.001) between the CDM and WDM structures, and the functional summaries of persistence diagrams detect differences at scales that are distinct from the comparison spatial point process functional summaries considered (including the two-point correlation function). The differences between the models are driven most strongly at filtration scales $\sim 100$ kpc, where CDM generates larger numbers of unconnected halo clusters while WDM instead generates loops. This study was conducted on dark matter haloes generally; future work will involve applying the same methods to realistic galaxy catalogues.

Submitted 1 April, 2022; originally announced April 2022.

Comments: 17 pages, 11 figures

arXiv:2109.10621 [pdf, ps, other]

Two-level Bayesian interaction analysis for survival data incorporating pathway information

Authors: Xing Qin, Shuangge Ma, Mengyun Wu

Abstract: Genetic interactions play an important role in the progression of complex diseases, providing explanation of variations in disease phenotype missed by main genetic effects. Comparatively, there are fewer investigations on prognostic survival time, given its challenging characteristics such as censoring. In recent biomedical research, two-level analysis of both genes and their involved pathways has received much attention and been demonstrated to be more effective than single-level analysis; however, such analysis is limited to main effects. Pathways are not isolated and their interactions have also been suggested to have important contributions to the prognosis of complex diseases. In this article, we develop a novel two-level Bayesian interaction analysis approach for survival data. This approach is the first to conduct the analysis of lower-level gene-gene interactions and higher-level pathway-pathway interactions simultaneously. Significantly advancing from existing Bayesian studies based on the Markov Chain Monte Carlo (MCMC) technique, we propose a variational inference framework based on the accelerated failure time model with favourable priors to account for two-level selection as well as censoring. The computational efficiency is highly desirable for high-dimensional interaction analysis. We examine the performance of the proposed approach using extensive simulation. Application to TCGA melanoma and lung adenocarcinoma data leads to biologically sensible findings with satisfactory prediction accuracy and selection stability.

Submitted 22 September, 2021; originally announced September 2021.

arXiv:2108.11579 [pdf, other]

Modeling Item Response Theory with Stochastic Variational Inference

Authors: Mike Wu, Richard L. Davis, Benjamin W. Domingue, Chris Piech, Noah Goodman

Abstract: Item Response Theory (IRT) is a ubiquitous model for understanding human behaviors and attitudes based on their responses to questions. Large modern datasets offer opportunities to capture more nuances in human behavior, potentially improving psychometric modeling leading to improved scientific understanding and public policy. However, while larger datasets allow for more flexible approaches, many contemporary algorithms for fitting IRT models may also have massive computational demands that forbid real-world application. To address this bottleneck, we introduce a variational Bayesian inference algorithm for IRT, and show that it is fast and scalable without sacrificing accuracy. Applying this method to five large-scale item response datasets from cognitive science and education yields higher log likelihoods and higher accuracy in imputing missing data than alternative inference algorithms. Using this new inference approach we then generalize IRT with expressive Bayesian models of responses, leveraging recent advances in deep learning to capture nonlinear item characteristic curves (ICC) with neural networks. Using an eighth-grade mathematics test from TIMSS, we show our nonlinear IRT models can capture interesting asymmetric ICCs. The algorithm implementation is open-source, and easily usable.

Submitted 28 July, 2022; v1 submitted 26 August, 2021; originally announced August 2021.

Comments: version two includes added experiments; 33 pages of content; 6 pages appendix; figures at the bottom. arXiv admin note: text overlap with arXiv:2002.00276
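
For readers unfamiliar with item characteristic curves, the sketch below shows the classical logistic (two-parameter) ICC and the log likelihood of a response pattern. The abstract does not say which IRT variant the paper fits, so the 2PL form is an assumption here, and the names `icc_2pl`, `response_log_likelihood`, and the toy `items` are hypothetical. It does not implement the variational inference algorithm or the neural ICCs described above.

```python
import math

def icc_2pl(ability, discrimination, difficulty):
    """Two-parameter logistic ICC: probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-discrimination * (ability - difficulty)))

def response_log_likelihood(responses, ability, items):
    """Log likelihood of a response pattern for one person with a given ability."""
    total = 0.0
    for answered_correctly, (discrimination, difficulty) in zip(responses, items):
        p = icc_2pl(ability, discrimination, difficulty)
        total += math.log(p if answered_correctly else 1.0 - p)
    return total

# Hypothetical items as (discrimination, difficulty) pairs: easy, medium, hard.
items = [(1.0, -1.0), (1.5, 0.0), (0.8, 1.0)]
responses = [True, True, False]

ll_low = response_log_likelihood(responses, -2.0, items)
ll_mid = response_log_likelihood(responses, 0.5, items)
```

A person who answers the easy and medium items correctly but misses the hard one is better explained by a moderate ability than by a very low one, so `ll_mid` exceeds `ll_low`. The logistic ICC is symmetric around the item difficulty, which is the restriction the asymmetric curves in the abstract relax.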

arXiv:2104.04244 [pdf, other]

How rotational invariance of common kernels prevents generalization in high dimensions

Authors: Konstantin Donhauser, Mingqi Wu, Fanny Yang

Abstract: Kernel ridge regression is well-known to achieve minimax optimal rates in low-dimensional settings. However, its behavior in high dimensions is much less understood. Recent work establishes consistency for kernel regression under certain assumptions on the ground truth function and the distribution of the input data. In this paper, we show that the rotational invariance property of commonly studied kernels (such as RBF, inner product kernels and fully-connected NTK of any depth) induces a bias towards low-degree polynomials in high dimensions. Our result implies a lower bound on the generalization error for a wide range of distributions and various choices of the scaling for kernels with different eigenvalue decays. This lower bound suggests that general consistency results for kernel ridge regression in high dimensions require a more refined analysis that depends on the structure of the kernel beyond its eigenvalue decay.

Submitted 9 April, 2021; originally announced April 2021.
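
The rotational invariance property that the abstract above refers to can be checked numerically for the RBF kernel. The sketch below is only an illustration in two dimensions, with hypothetical helper names (`rbf_kernel`, `rotate_2d`) and arbitrary test points; it says nothing about the paper's high-dimensional lower bound.

```python
import math

def rbf_kernel(x, y, bandwidth=1.0):
    """RBF kernel exp(-||x - y||^2 / (2 * bandwidth^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))

def rotate_2d(point, angle):
    """Rotate a 2D point counter-clockwise by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return (c * point[0] - s * point[1], s * point[0] + c * point[1])

x, y = (1.0, 2.0), (-0.5, 0.3)
angle = 0.7

k_original = rbf_kernel(x, y)
k_rotated = rbf_kernel(rotate_2d(x, angle), rotate_2d(y, angle))
```

Because a rotation preserves Euclidean distances, `k_original` and `k_rotated` agree up to floating-point error. The same holds for any kernel that depends on its inputs only through norms and inner products.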

arXiv:2103.08450 [pdf, other]

Modeling Multivariate Cyber Risks: Deep Learning Dating Extreme Value Theory

Authors: Mingyue Zhang Wu, **zhu Luo, Xing Fang, Maochao Xu, Peng Zhao

Abstract: Modeling cyber risks has been an important but challenging task in the domain of cyber security. This is mainly because of the high dimensionality and heavy tails of risk patterns. Those obstacles have hindered the development of statistical modeling of multivariate cyber risks. In this work, we propose a novel approach for modeling multivariate cyber risks which relies on deep learning and extreme value theory. The proposed model not only enjoys highly accurate point predictions via deep learning but also provides satisfactory high-quantile predictions via extreme value theory. The simulation study shows that the proposed model can model multivariate cyber risks very well and provide satisfactory prediction performance. The empirical evidence based on real honeypot attack data also shows that the proposed model has very satisfactory prediction performance.

Submitted 15 March, 2021; originally announced March 2021.

Comments: 25 pages

arXiv:2010.10960 [pdf, ps, other]

Gene-gene interaction analysis incorporating network information via a structured Bayesian approach

Authors: Xing Qin, Shuangge Ma, Mengyun Wu

Abstract: Increasing evidence has shown that gene-gene interactions have important effects on biological processes of human diseases. Due to the high dimensionality of genetic measurements, existing interaction analysis methods usually suffer from a lack of sufficient information and are still unsatisfactory. Biological networks have been massively accumulated, allowing researchers to identify biomarkers from a system perspective by utilizing network selection (consisting of functionally related biomarkers) as well as network structures. In the main-effect analysis, network information has been widely incorporated, leading to biologically more meaningful and more accurate estimates. However, there is still a big gap in the context of interaction analysis. In this study, we develop a novel structured Bayesian interaction analysis approach, effectively incorporating the network information. This study is among the first to identify gene-gene interactions with the assistance of network selection for phenotype prediction, while simultaneously accommodating the underlying network structures. It innovatively respects the multiple hierarchies among main effects, interactions, and networks. Bayesian method is adopted, which has been shown to have multiple advantages over some other techniques. An efficient variational inference algorithm is developed to explore the posterior distribution. Extensive simulation studies demonstrate the practical superiority of the proposed approach. The analysis of TCGA data on melanoma and lung cancer leads to biologically sensible findings with satisfactory prediction accuracy and selection stability.

Submitted 8 January, 2021; v1 submitted 21 October, 2020; originally announced October 2020.

arXiv:2010.02038 [pdf, other]

A Simple Framework for Uncertainty in Contrastive Learning

Authors: Mike Wu, Noah Goodman

Abstract: Contrastive approaches to representation learning have recently shown great promise. In contrast to generative approaches, these contrastive models learn a deterministic encoder with no notion of uncertainty or confidence. In this paper, we introduce a simple approach based on "contrasting distributions" that learns to assign uncertainty for pretrained contrastive representations. In particular, we train a deep network from a representation to a distribution in representation space, whose variance can be used as a measure of confidence. In our experiments, we show that this deep uncertainty model can be used (1) to visually interpret model behavior, (2) to detect new noise in the input to deployed models, (3) to detect anomalies, where we outperform 10 baseline methods across 11 tasks with improvements of up to 14% absolute, and (4) to classify out-of-distribution examples where our fully unsupervised model is competitive with supervised methods.

Submitted 5 October, 2020; originally announced October 2020.

Comments: 8 pages main text

arXiv:2010.02037 [pdf, other]

Conditional Negative Sampling for Contrastive Learning of Visual Representations

Authors: Mike Wu, Milan Mosse, Chengxu Zhuang, Daniel Yamins, Noah Goodman

Abstract: Recent methods for learning unsupervised visual representations, dubbed contrastive learning, optimize the noise-contrastive estimation (NCE) bound on mutual information between two views of an image. NCE uses randomly sampled negative examples to normalize the objective. In this paper, we show that choosing difficult negatives, or those more similar to the current instance, can yield stronger representations. To do this, we introduce a family of mutual information estimators that sample negatives conditionally -- in a "ring" around each positive. We prove that these estimators lower-bound mutual information, with higher bias but lower variance than NCE. Experimentally, we find our approach, applied on top of existing models (IR, CMC, and MoCo), improves accuracy by 2-5 percentage points in each case, measured by linear evaluation on four standard image datasets. Moreover, we find continued benefits when transferring features to a variety of new image distributions from the Meta-Dataset collection and to a variety of downstream tasks such as object detection, instance segmentation, and keypoint detection.

Submitted 5 October, 2020; originally announced October 2020.

Comments: 8 pages, 4 pages supplement

arXiv:2010.02004 [pdf, ps, other]

doi 10.18653/v1/2020.findings-emnlp.266

Assessing Robustness of Text Classification through Maximal Safe Radius Computation

Authors: Emanuele La Malfa, Min Wu, Luca Laurenti, Benjie Wang, Anthony Hartshorn, Marta Kwiatkowska

Abstract: Neural network NLP models are vulnerable to small modifications of the input that maintain the original meaning but result in a different prediction. In this paper, we focus on robustness of text classification against word substitutions, aiming to provide guarantees that the model prediction does not change if a word is replaced with a plausible alternative, such as a synonym. As a measure of robustness, we adopt the notion of the maximal safe radius for a given input text, which is the minimum distance in the embedding space to the decision boundary. Since computing the exact maximal safe radius is not feasible in practice, we instead approximate it by computing a lower and upper bound. For the upper bound computation, we employ Monte Carlo Tree Search in conjunction with syntactic filtering to analyse the effect of single and multiple word substitutions. The lower bound computation is achieved through an adaptation of the linear bounding techniques implemented in tools CNN-Cert and POPQORN, respectively for convolutional and recurrent network models. We evaluate the methods on sentiment analysis and news classification models for four datasets (IMDB, SST, AG News and NEWS) and a range of embeddings, and provide an analysis of robustness trends. We also apply our framework to interpretability analysis and compare it with LIME.

Submitted 7 October, 2020; v1 submitted 1 October, 2020; originally announced October 2020.

Comments: 12 pages + appendix

Journal ref: EMNLP-Findings2020

arXiv:2007.09868 [pdf, other]

Attention Sequence to Sequence Model for Machine Remaining Useful Life Prediction

Authors: Mohamed Ragab, Zhenghua Chen, Min Wu, Chee-Keong Kwoh, Ruqiang Yan, Xiaoli Li

Abstract: Accurate estimation of remaining useful life (RUL) of industrial equipment can enable advanced maintenance schedules, increase equipment availability and reduce operational costs. However, existing deep learning methods for RUL prediction are not completely successful due to the following two reasons. First, relying on a single objective function to estimate the RUL will limit the learned representations and thus affect the prediction accuracy. Second, while longer sequences are more informative for modelling the sensor dynamics of equipment, existing methods are less effective at dealing with very long sequences, as they mainly focus on the latest information. To address these two problems, we develop a novel attention-based sequence to sequence with auxiliary task (ATS2S) model. In particular, our model jointly optimizes both reconstruction loss to empower our model with predictive capabilities (by predicting next input sequence given current input sequence) and RUL prediction loss to minimize the difference between the predicted RUL and actual RUL. Furthermore, to better handle longer sequences, we employ the attention mechanism to focus on all the important input information during the training process. Finally, we propose a new dual-latent feature representation to integrate the encoder features and decoder hidden states, to capture rich semantic information in data. We conduct extensive experiments on four real datasets to evaluate the efficacy of the proposed method. Experimental results show that our proposed method can achieve superior performance over 13 state-of-the-art methods consistently.

Submitted 19 July, 2020; originally announced July 2020.

arXiv:2007.04484 [pdf, other]

Transparency Tools for Fairness in AI (Luskin)

Authors: Mingliang Chen, Aria Shahverdi, Sarah Anderson, Se Yong Park, Justin Zhang, Dana Dachman-Soled, Kristin Lauter, Min Wu

Abstract: We propose new tools for policy-makers to use when assessing and correcting fairness and bias in AI algorithms. The three tools are: (1) A new definition of fairness called "controlled fairness" with respect to choices of protected features and filters. The definition provides a simple test of fairness of an algorithm with respect to a dataset. This notion of fairness is suitable in cases where fairness is prioritized over accuracy, such as in cases where there is no "ground truth" data, only data labeled with past decisions (which may have been biased). (2) Algorithms for retraining a given classifier to achieve "controlled fairness" with respect to a choice of features and filters. Two algorithms are presented, implemented and tested. These algorithms require training two different models in two stages. We experiment with combinations of various types of models for the first and second stage and report on which combinations perform best in terms of fairness and accuracy. (3) Algorithms for adjusting model parameters to achieve a notion of fairness called "classification parity". This notion of fairness is suitable in cases where accuracy is prioritized. Two algorithms are presented, one which assumes that protected features are accessible to the model during testing, and one which assumes protected features are not accessible during testing. We evaluate our tools on three different publicly available datasets. We find that the tools are useful for understanding various dimensions of bias, and that in practice the algorithms are effective in starkly reducing a given observed bias when tested on new data.

Submitted 8 July, 2020; originally announced July 2020.

arXiv:2007.03208 [pdf, other]

A Topological Approach to Inferring the Intrinsic Dimension of Convex Sensing Data

Authors: Min-Chun Wu, Vladimir Itskov

Abstract: We consider a common measurement paradigm, where an unknown subset of an affine space is measured by unknown continuous quasi-convex functions. Given the measurement data, can one determine the dimension of this space? In this paper, we develop a method for inferring the intrinsic dimension of the data from measurements by quasi-convex functions, under natural generic assumptions. The dimension inference problem depends only on discrete data of the ordering of the measured points of space, induced by the sensor functions. We introduce a construction of a filtration of Dowker complexes, associated to measurements by quasi-convex functions. Topological features of these complexes are then used to infer the intrinsic dimension. We prove convergence theorems that guarantee obtaining the correct intrinsic dimension in the limit of large data, under natural generic assumptions. We also illustrate the usability of this method in simulations.

Submitted 7 July, 2020; originally announced July 2020.

MSC Class: 62R40 (Primary) 55N31; 92B99 (Secondary)

arXiv:2006.10667 [pdf, other]

Towards Threshold Invariant Fair Classification

Authors: Mingliang Chen, Min Wu

Abstract: Effective machine learning models can automatically learn useful information from a large quantity of data and provide decisions with high accuracy. These models may, however, lead to unfair predictions in a certain sense among the population groups of interest, where the grouping is based on such sensitive attributes as race and gender. Various fairness definitions, such as demographic parity and equalized odds, were proposed in prior art to ensure that decisions guided by the machine learning models are equitable. Unfortunately, the "fair" model trained with these fairness definitions is threshold sensitive, i.e., the condition of fairness may no longer hold true when tuning the decision threshold. This paper introduces the notion of threshold invariant fairness, which enforces equitable performances across different groups independent of the decision threshold. To achieve this goal, this paper proposes to equalize the risk distributions among the groups via two approximation methods. Experimental results demonstrate that the proposed methodology is effective in alleviating the threshold sensitivity in machine learning models designed to achieve fairness.

Submitted 18 June, 2020; originally announced June 2020.

Comments: Accepted to UAI 2020
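
The threshold sensitivity described in the abstract above can be seen in a small sketch. The score lists and function names (`positive_rate`, `demographic_parity_gap`) are hypothetical and chosen only to make the effect visible; the paper's risk-distribution equalization methods are not shown.

```python
def positive_rate(scores, threshold):
    """Fraction of individuals whose score meets the decision threshold."""
    return sum(s >= threshold for s in scores) / len(scores)

def demographic_parity_gap(scores_a, scores_b, threshold):
    """Absolute difference in positive-decision rates between two groups."""
    return abs(positive_rate(scores_a, threshold) - positive_rate(scores_b, threshold))

# Hypothetical risk scores: same median, but group A is far more spread out.
group_a = [0.2, 0.4, 0.6, 0.8]
group_b = [0.45, 0.48, 0.52, 0.55]

gap_at_half = demographic_parity_gap(group_a, group_b, 0.5)
gap_at_high = demographic_parity_gap(group_a, group_b, 0.7)
```

At a threshold of 0.5 both groups receive positive decisions at the same rate, so demographic parity holds; raising the threshold to 0.7 opens a gap because the two score distributions differ in shape. Parity at every threshold would require the score distributions themselves to match, which is the intuition behind equalizing risk distributions.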

arXiv:2005.13149 [pdf, other]

On Mutual Information in Contrastive Learning for Visual Representations

Authors: Mike Wu, Chengxu Zhuang, Milan Mosse, Daniel Yamins, Noah Goodman

Abstract: In recent years, several unsupervised, "contrastive" learning algorithms in vision have been shown to learn representations that perform remarkably well on transfer tasks. We show that this family of algorithms maximizes a lower bound on the mutual information between two or more "views" of an image where typical views come from a composition of image augmentations. Our bound generalizes the InfoNCE objective to support negative sampling from a restricted region of "difficult" contrasts. We find that the choices of negative samples and views are critical to the success of these algorithms. Reformulating previous learning objectives in terms of mutual information also simplifies and stabilizes them. In practice, our new objectives yield representations that outperform those learned with previous approaches for transfer to classification, bounding box detection, instance segmentation, and keypoint detection. The mutual information framework provides a unifying comparison of approaches to contrastive learning and uncovers the choices that impact representation learning.

Submitted 5 June, 2020; v1 submitted 27 May, 2020; originally announced May 2020.

Comments: 8 pages content; 15 pages supplement with proofs
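
As a point of reference for the InfoNCE objective mentioned in the abstract above, here is a minimal sketch of the standard loss for a single anchor, together with the usual bound of log K minus the loss for K candidates. The function names and similarity scores are hypothetical, and the sketch does not implement the paper's generalized bound with restricted negative sampling.

```python
import math

def info_nce_loss(positive_score, negative_scores):
    """Standard InfoNCE loss for one anchor: negative log softmax of the positive."""
    scores = [positive_score] + list(negative_scores)
    shift = max(scores)  # subtract the max for a numerically stable log-sum-exp
    log_denominator = shift + math.log(sum(math.exp(s - shift) for s in scores))
    return log_denominator - positive_score

def mutual_information_lower_bound(positive_score, negative_scores):
    """Single-sample estimate of the bound: log(K) - loss, with K candidates in total."""
    k = 1 + len(negative_scores)
    return math.log(k) - info_nce_loss(positive_score, negative_scores)

uninformative = info_nce_loss(0.0, [0.0, 0.0, 0.0])   # all candidates look alike
easy_negatives = info_nce_loss(2.0, [-1.0, -1.0])     # negatives far from the anchor
hard_negatives = info_nce_loss(2.0, [1.5, 1.5])       # negatives close to the anchor
```

When every candidate scores the same, the loss equals log K and the bound is zero. Negatives with scores close to the positive raise the loss, which is why the choice of negatives affects what the encoder is pushed to learn.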

arXiv:2005.09195 [pdf, other]

Riemannian Proximal Policy Optimization

Authors: Shijun Wang, Baocheng Zhu, Chen Li, Mingzhe Wu, James Zhang, Wei Chu, Yuan Qi

Abstract: In this paper, we propose a general Riemannian proximal optimization algorithm with guaranteed convergence to solve Markov decision process (MDP) problems. To model policy functions in MDP, we employ a Gaussian mixture model (GMM) and formulate it as a nonconvex optimization problem in the Riemannian space of positive semidefinite matrices. For two given policy functions, we also provide a lower bound on policy improvement by using bounds derived from the Wasserstein distance of GMMs. Preliminary experiments show the efficacy of our proposed Riemannian proximal policy optimization algorithm.

Submitted 18 May, 2020; originally announced May 2020.

Comments: 12 pages, 1 figures

arXiv:2005.08189 [pdf, other]

doi 10.1145/3441450

Multi-View Collaborative Network Embedding

Authors: Sezin Kircali Ata, Yuan Fang, Min Wu, Jiaqi Shi, Chee Keong Kwoh, Xiaoli Li

Abstract: Real-world networks often exist with multiple views, where each view describes one type of interaction among a common set of nodes. For example, on a video-sharing network, while two user nodes are linked if they have common favorite videos in one view, they can also be linked in another view if they share common subscribers. Unlike traditional single-view networks, multiple views maintain different semantics to complement each other. In this paper, we propose MANE, a multi-view network embedding approach to learn low-dimensional representations. Similar to existing studies, MANE hinges on diversity and collaboration - while diversity enables views to maintain their individual semantics, collaboration enables views to work together. However, we also discover a novel form of second-order collaboration that has not been explored previously, and further unify it into our framework to attain superior node representations. Furthermore, as each view often has varying importance w.r.t. different nodes, we propose MANE+, an attention-based extension of MANE to model node-wise view importance. Finally, we conduct comprehensive experiments on three public, real-world multi-view networks, and the results demonstrate that our models consistently outperform state-of-the-art approaches.

Submitted 17 December, 2020; v1 submitted 17 May, 2020; originally announced May 2020.

Comments: Accepted for publication in the ACM Transactions on Knowledge Discovery from Data, TKDD

Journal ref: ACM Trans. Knowl. Discov. Data 15, 3, Article 39 (April 2021), 18 pages

arXiv:2004.14774 [pdf, other]

IROS 2019 Lifelong Robotic Vision Challenge -- Lifelong Object Recognition Report

Authors: Qi She, Fan Feng, Qi Liu, Rosa H. M. Chan, Xinyue Hao, Chuanlin Lan, Qihan Yang, Vincenzo Lomonaco, German I. Parisi, Heechul Bae, Eoin Brophy, Baoquan Chen, Gabriele Graffieti, Vidit Goel, Hyonyoung Han, Sathursan Kanagarajah, Somesh Kumar, Siew-Kei Lam, Tin Lun Lam, Liang Ma, Davide Maltoni, Lorenzo Pellegrini, Duvindu Piyasena, Shiliang Pu, Debdoot Sheet , et al. (11 additional authors not shown)

Abstract: This report summarizes IROS 2019-Lifelong Robotic Vision Competition (Lifelong Object Recognition Challenge) with methods and results from the top 8 finalists (out of over 150 teams). The competition dataset (L)ifel(O)ng (R)obotic V(IS)ion (OpenLORIS) - Object Recognition (OpenLORIS-object) is designed for driving lifelong/continual learning research and application in robotic vision domain, with everyday objects in home, office, campus, and mall scenarios. The dataset explicitly quantifies the variants of illumination, object occlusion, object size, camera-object distance/angles, and clutter information. Rules are designed to quantify the learning capability of the robotic vision system when faced with the objects appearing in the dynamic environments in the contest. Individual reports, dataset information, rules, and released source code can be found at the project homepage: "https://lifelong-robotic-vision.github.io/competition/".

Submitted 26 April, 2020; originally announced April 2020.

Comments: 9 pages, 11 figures, 3 tables, accepted into IEEE Robotics and Automation Magazine. arXiv admin note: text overlap with arXiv:1911.06487

arXiv:2003.09527 [pdf, other]

Predicting Real-Time Locational Marginal Prices: A GAN-Based Video Prediction Approach

Authors: Zhongxia Zhang, Meng Wu

Abstract: In this paper, we propose an unsupervised data-driven approach to predict real-time locational marginal prices (RTLMPs). The proposed approach is built upon a general data structure for organizing system-wide heterogeneous market data streams into the format of market data images and videos. Leveraging this general data structure, the system-wide RTLMP prediction problem is formulated as a video prediction problem. A video prediction model based on generative adversarial networks (GAN) is proposed to learn the spatio-temporal correlations among historical RTLMPs and predict system-wide RTLMPs for the next hour. An autoregressive moving average (ARMA) calibration method is adopted to improve the prediction accuracy. The proposed RTLMP prediction method takes public market data as inputs, without requiring any confidential information on system topology, model parameters, or market operating details. Case studies using public market data from ISO New England (ISO-NE) and Southwest Power Pool (SPP) demonstrate that the proposed method is able to learn spatio-temporal correlations among RTLMPs and perform accurate RTLMP prediction.

Submitted 20 March, 2020; originally announced March 2020.

arXiv:2002.00276 [pdf, other]

Variational Item Response Theory: Fast, Accurate, and Expressive

Authors: Mike Wu, Richard L. Davis, Benjamin W. Domingue, Chris Piech, Noah Goodman

Abstract: Item Response Theory (IRT) is a ubiquitous model for understanding humans based on their responses to questions, used in fields as diverse as education, medicine and psychology. Large modern datasets offer opportunities to capture more nuances in human behavior, potentially improving test scoring and better informing public policy. Yet larger datasets pose a difficult speed / accuracy challenge to contemporary algorithms for fitting IRT models. We introduce a variational Bayesian inference algorithm for IRT, and show that it is fast and scaleable without sacrificing accuracy. Using this inference approach we then extend classic IRT with expressive Bayesian models of responses. Applying this method to five large-scale item response datasets from cognitive science and education yields higher log likelihoods and improvements in imputing missing data. The algorithm implementation is open-source, and easily usable.

Submitted 16 March, 2020; v1 submitted 1 February, 2020; originally announced February 2020.

Comments: 10 pages of content

arXiv:1912.08370 [pdf, other]

Multidimensional molecular changes-environment interaction analysis for disease outcomes

Authors: Yaqing Xu, Mengyun Wu, Shuangge Ma

Abstract: For the outcomes and phenotypes of complex diseases, multiple types of molecular (genetic, genomic, epigenetic, etc.) changes, environmental risk factors, and their interactions have been found to have important contributions. In each of the existing studies, only the interactions between one type of molecular changes and environmental risk factors have been analyzed. In recent biomedical studies, multidimensional profiling, under which data on multiple types of molecular changes is collected on the same subjects, is becoming popular. A myriad of recent studies have shown that collectively analyzing multiple types of molecular changes is not only biologically sensible but also leads to improved estimation and prediction. In this study, we conduct M-E interaction analysis, with M standing for multidimensional molecular changes and E standing for environmental risk factors, which can accommodate multiple types of molecular measurements and sufficiently account for their overlapping information (attributable to regulations) as well as independent information. The proposed approach is based on the penalization technique, has a solid statistical ground, and can be effectively realized. Extensive simulation shows that it outperforms multiple closely relevant alternatives. In the analysis of TCGA (The Cancer Genome Atlas) data on lung adenocarcinoma and cutaneous melanoma, sensible findings with superior stability and prediction are made.

Submitted 17 December, 2019; originally announced December 2019.

arXiv:1912.05075 [pdf, other]

Multimodal Generative Models for Compositional Representation Learning

Authors: Mike Wu, Noah Goodman

Abstract: As deep neural networks become more adept at traditional tasks, many of the most exciting new challenges concern multimodality---observations that combine diverse types, such as image and text. In this paper, we introduce a family of multimodal deep generative models derived from variational bounds on the evidence (data marginal likelihood). As part of our derivation we find that many previous multimodal variational autoencoders used objectives that do not correctly bound the joint marginal likelihood across modalities. We further generalize our objective to work with several types of deep generative model (VAE, GAN, and flow-based), and allow use of different model types for different modalities. We benchmark our models across many image, label, and text datasets, and find that our multimodal VAEs excel with and without weak supervision. Additional improvements come from use of GAN image models with VAE language models. Finally, we investigate the effect of language on learned image representations through a variety of downstream tasks, such as compositionality, bounding box prediction, and visual relation prediction. We find evidence that these image representations are more abstract and compositional than equivalent representations learned from only visual data.

Submitted 10 December, 2019; originally announced December 2019.

Comments: 24 pages content; 7 pages appendix

arXiv:1911.10558 [pdf, other]

Fast Polynomial Kernel Classification for Massive Data

Authors: **shan Zeng, Minrun Wu, Shao-Bo Lin, Ding-Xuan Zhou

Abstract: In the era of big data, it is desired to develop efficient machine learning algorithms to tackle massive data challenges such as storage bottleneck, algorithmic scalability, and interpretability. In this paper, we develop a novel efficient classification algorithm, called fast polynomial kernel classification (FPC), to conquer the scalability and storage challenges. Our main tools are a suitably selected feature mapping based on polynomial kernels and an alternating direction method of multipliers (ADMM) algorithm for a related non-smooth convex optimization problem. Fast learning rates as well as feasibility verifications including the efficiency of an ADMM solver with convergence guarantees and the selection of center points are established to justify theoretical behaviors of FPC. Our theoretical assertions are verified by a series of simulations and real data applications. Numerical results demonstrate that FPC significantly reduces the computational burden and storage memory of existing learning schemes such as support vector machines, Nyström and random feature methods, without sacrificing their generalization abilities much.

Submitted 11 November, 2022; v1 submitted 24 November, 2019; originally announced November 2019.

Comments: arXiv admin note: text overlap with arXiv:1402.4735 by other authors

arXiv:1911.09821 [pdf, other]

Learning Feature Interactions with Lorentzian Factorization Machine

Authors: Canran Xu, Ming Wu

Abstract: Learning representations for feature interactions to model user behaviors is critical for recommendation system and click-through rate (CTR) predictions. Recent advances in this area are empowered by deep learning methods which could learn sophisticated feature interactions and achieve the state-of-the-art result in an end-to-end manner. These approaches require large number of training parameters integrated with the low-level representations, and thus are memory and computational inefficient. In this paper, we propose a new model named "LorentzFM" that can learn feature interactions embedded in a hyperbolic space in which the violation of triangle inequality for Lorentz distances is available. To this end, the learned representation is benefited by the peculiar geometric properties of hyperbolic triangles, and results in a significant reduction in the number of parameters (20% to 80%) because all the top deep learning layers are not required. With such a lightweight architecture, LorentzFM achieves comparable and even materially better results than the deep learning methods such as DeepFM, xDeepFM and Deep & Cross in both recommendation and CTR prediction tasks.

Submitted 21 November, 2019; originally announced November 2019.

Comments: 8 pages, 5 figures, accepted to AAAI-2020

arXiv:1909.12903 [pdf, ps, other]

PINE: Universal Deep Embedding for Graph Nodes via Partial Permutation Invariant Set Functions

Authors: Shupeng Gui, Xiangliang Zhang, Pan Zhong, Shuang Qiu, Mingrui Wu, Jie** Ye, Zhengdao Wang, Ji Liu

Abstract: Graph node embedding aims at learning a vector representation for all nodes given a graph. It is a central problem in many machine learning tasks (e.g., node classification, recommendation, community detection). The key problem in graph node embedding lies in how to define the dependence to neighbors. Existing approaches specify (either explicitly or implicitly) certain dependencies on neighbors, which may lead to loss of subtle but important structural information within the graph and other dependencies among neighbors. This intrigues us to ask the question: can we design a model to give the maximal flexibility of dependencies to each node's neighborhood? In this paper, we propose a novel graph node embedding (named PINE) via a novel notion of partial permutation invariant set function, to capture any possible dependence. Our method 1) can learn an arbitrary form of the representation function from the neighborhood, without losing any potential dependence structures, and 2) is applicable to both homogeneous and heterogeneous graph embedding, the latter of which is challenged by the diversity of node types. Furthermore, we provide theoretical guarantee for the representation capability of our method for general homogeneous and heterogeneous graphs. Empirical evaluation results on benchmark data sets show that our proposed PINE method outperforms the state-of-the-art approaches on producing node vectors for various learning tasks of both homogeneous and heterogeneous graphs.

Submitted 25 September, 2019; originally announced September 2019.

Comments: 24 pages, 4 figures, 3 tables. arXiv admin note: text overlap with arXiv:1805.11182

arXiv:1908.06951 [pdf, ps, other]

Gradient Boosting Machine: A Survey

Authors: Zhiyuan He, Danchen Lin, Thomas Lau, Mike Wu

Abstract: In this survey, we discuss several different types of gradient boosting algorithms and illustrate their mathematical frameworks in detail: 1. introduction of gradient boosting leads to 2. objective function optimization, 3. loss function estimations, and 4. model constructions. 5. application of boosting in ranking.

Submitted 19 August, 2019; originally announced August 2019.

arXiv:1908.05611 [pdf, other]

GraphSW: a training protocol based on stage-wise training for GNN-based Recommender Model

Authors: Chang-You Tai, Meng-Ru Wu, Yun-Wei Chu, Shao-Yu Chu

Abstract: Recently, researchers utilize Knowledge Graph (KG) as side information in recommendation system to address cold start and sparsity issue and improve the recommendation performance. Existing KG-aware recommendation models use the features of neighboring entities and structural information to update the embedding of the currently located entity. Although the fruitful information is beneficial to the following task, the cost of exploring the entire graph is massive and impractical. In order to reduce the computational cost and maintain the pattern of extracting features, KG-aware recommendation models usually utilize a fixed-size and random set of neighbors rather than the complete information in the KG. Nonetheless, there are two critical issues in these approaches: First of all, fixed-size and randomly selected neighbors restrict the view of the graph. In addition, as the order of graph feature increases, the growth of parameter dimensionality of the model may make the training process hard to converge. To solve the aforementioned limitations, we propose GraphSW, a strategy based on a stage-wise training framework which only accesses a subset of the entities in the KG in every stage. During the following stages, the learned embedding from previous stages is provided to the network in the next stage and the model can learn the information gradually from the KG. We apply stage-wise training on two SOTA recommendation models, RippleNet and Knowledge Graph Convolutional Networks (KGCN). Moreover, we evaluate the performance on six real world datasets, Last.FM 2011, Book-Crossing, movie, LFM-1b 2015, Amazon-book and Yelp 2018. The result of our experiments shows that the proposed strategy can help both models to collect more information from the KG and improve the performance. Furthermore, it is observed that GraphSW can assist KGCN to converge effectively in high-order graph feature.

Submitted 19 August, 2019; v1 submitted 13 August, 2019; originally announced August 2019.

arXiv:1908.05254 [pdf, other]

Optimizing for Interpretability in Deep Neural Networks with Tree Regularization

Authors: Mike Wu, Sonali Parbhoo, Michael C. Hughes, Volker Roth, Finale Doshi-Velez

Abstract: Deep models have advanced prediction in many domains, but their lack of interpretability remains a key barrier to the adoption in many real world applications. There exists a large body of work aiming to help humans understand these black box functions to varying levels of granularity -- for example, through distillation, gradients, or adversarial examples. These methods however, all tackle interpretability as a separate process after training. In this work, we take a different approach and explicitly regularize deep models so that they are well-approximated by processes that humans can step-through in little time. Specifically, we train several families of deep neural networks to resemble compact, axis-aligned decision trees without significant compromises in accuracy. The resulting axis-aligned decision functions uniquely make tree regularized models easy for humans to interpret. Moreover, for situations in which a single, global tree is a poor estimator, we introduce a regional tree regularizer that encourages the deep model to resemble a compact, axis-aligned decision tree in predefined, human-interpretable contexts. Using intuitive toy examples as well as medical tasks for patients in critical care and with HIV, we demonstrate that this new family of tree regularizers yield models that are easier for humans to simulate than simpler L1 or L2 penalties without sacrificing predictive power.

Submitted 14 August, 2019; originally announced August 2019.

Comments: arXiv admin note: substantial text overlap with arXiv:1908.04494, arXiv:1711.06178

arXiv:1908.04494 [pdf, other]

Regional Tree Regularization for Interpretability in Black Box Models

Authors: Mike Wu, Sonali Parbhoo, Michael Hughes, Ryan Kindle, Leo Celi, Maurizio Zazzi, Volker Roth, Finale Doshi-Velez

Abstract: The lack of interpretability remains a barrier to the adoption of deep neural networks. Recently, tree regularization has been proposed to encourage deep neural networks to resemble compact, axis-aligned decision trees without significant compromises in accuracy. However, it may be unreasonable to expect that a single tree can predict well across all possible inputs. In this work, we propose regional tree regularization, which encourages a deep model to be well-approximated by several separate decision trees specific to predefined regions of the input space. Practitioners can define regions based on domain knowledge of contexts where different decision-making logic is needed. Across many datasets, our approach delivers more accurate predictions than simply training separate decision trees for each region, while producing simpler explanations than other neural net regularization schemes without sacrificing predictive power. Two healthcare case studies in critical care and HIV demonstrate how experts can improve understanding of deep models via our approach.

Submitted 16 March, 2020; v1 submitted 13 August, 2019; originally announced August 2019.

Comments: AAAI 2020 (Oral)

arXiv:1908.00654 [pdf]

Teasing out the overall survival benefit with adjustment for treatment switching to other therapies

Authors: Yuqing Xu, Mei**g Wu, Weili He, Qiming Liao, Yabing Mai

Abstract: In oncology clinical trials, characterizing the long-term overall survival (OS) benefit for an experimental drug or treatment regimen (experimental group) is often unobservable if some patients in the control group switch to drugs in the experimental group and/or other cancer treatments after disease progression. A key question often raised by payers and reimbursement agencies is how to estimate the true benefit of the experimental drug group on overall survival that would have been estimated if there were no treatment switches. Several commonly used statistical methods are available to estimate overall survival benefit while adjusting for treatment switching, ranging from naive exclusion or censoring approaches to more advanced methods including inverse probability of censoring weighting (IPCW), iterative parameter estimation (IPE) algorithm or rank-preserving structural failure time models (RPSFTM). However, many clinical trials now have patients switching to different treatment regimens other than the test drugs, and the existing methods cannot handle more complicated scenarios. To address this challenge, we propose two additional methods: stratified RPSFTM and random-forest-based prediction. A simulation study is conducted to assess the properties of the existing methods along with the two newly proposed approaches.

Submitted 1 August, 2019; originally announced August 2019.

arXiv:1905.09916 [pdf, other]

Generative Grading: Near Human-level Accuracy for Automated Feedback on Richly Structured Problems

Authors: Ali Malik, Mike Wu, Vrinda Vasavada, **peng Song, Madison Coots, John Mitchell, Noah Goodman, Chris Piech

Abstract: Access to high-quality education at scale is limited by the difficulty of providing student feedback on open-ended assignments in structured domains like computer programming, graphics, and short response questions. This problem has proven to be exceptionally difficult: for humans, it requires large amounts of manual work, and for computers, until recently, achieving anything near human-level accuracy has been unattainable. In this paper, we present generative grading: a novel computational approach for providing feedback at scale that is capable of accurately grading student work and providing nuanced, interpretable feedback. Our approach uses generative descriptions of student cognition, written as probabilistic programs, to synthesise millions of labelled example solutions to a problem; we then learn to infer feedback for real student solutions based on this cognitive model. We apply our methods to three settings. In block-based coding, we achieve a 50% improvement upon the previous best results for feedback, achieving super-human accuracy. In two other widely different domains -- graphical tasks and short text answers -- we achieve major improvement over the previous state of the art by about 4x and 1.5x respectively, approaching human accuracy. In a real classroom, we ran an experiment where we used our system to augment human graders, yielding doubled grading accuracy while halving grading time.

Submitted 23 March, 2021; v1 submitted 23 May, 2019; originally announced May 2019.

Comments: 10 pages of content

arXiv:1904.08030 [pdf, other]

Multi-Interest Network with Dynamic Routing for Recommendation at Tmall

Authors: Chao Li, Zhiyuan Liu, Mengmeng Wu, Yuchi Xu, Pipei Huang, Huan Zhao, Guoliang Kang, Qiwei Chen, Wei Li, Dik Lun Lee

Abstract: Industrial recommender systems usually consist of the matching stage and the ranking stage, in order to handle the billion-scale of users and items. The matching stage retrieves candidate items relevant to user interests, while the ranking stage sorts candidate items by user interests. Thus, the most critical ability is to model and represent user interests for either stage. Most of the existing deep learning-based models represent one user as a single vector which is insufficient to capture the varying nature of user's interests. In this paper, we approach this problem from a different view, to represent one user with multiple vectors encoding the different aspects of the user's interests. We propose the Multi-Interest Network with Dynamic routing (MIND) for dealing with user's diverse interests in the matching stage. Specifically, we design a multi-interest extractor layer based on capsule routing mechanism, which is applicable for clustering historical behaviors and extracting diverse interests. Furthermore, we develop a technique named label-aware attention to help learn a user representation with multiple vectors. Through extensive experiments on several public benchmarks and one large-scale industrial dataset from Tmall, we demonstrate that MIND can achieve superior performance than state-of-the-art methods for recommendation. Currently, MIND has been deployed for handling major online traffic at the homepage on Mobile Tmall App.

Submitted 16 April, 2019; originally announced April 2019.

Showing 1–50 of 75 results for author: Wu, M