Showing 1–50 of 126 results for author: Carin, L

Searching in archive stat.
  1. arXiv:2405.17248  [pdf, other]

    stat.ML cs.LG

    Transformer In-Context Learning for Categorical Data

    Authors: Aaron T. Wang, Ricardo Henao, Lawrence Carin

    Abstract: Recent research has sought to understand Transformers through the lens of in-context learning with functional data. We extend that line of work with the goal of moving closer to language models, considering categorical outcomes, nonlinear underlying models, and nonlinear attention. The contextual data are of the form $\textsf{C}=(x_1,c_1,\dots,x_N,c_{N})$ where each $c_i\in\{0,\dots,C-1\}$ is draw…

    Submitted 27 May, 2024; originally announced May 2024.

  2. arXiv:2312.01167  [pdf, other]

    cs.CV cs.LG stat.ML

    Meta-Learned Attribute Self-Interaction Network for Continual and Generalized Zero-Shot Learning

    Authors: Vinay K Verma, Nikhil Mehta, Kevin J Liang, Aakansha Mishra, Lawrence Carin

    Abstract: Zero-shot learning (ZSL) is a promising approach to generalizing a model to categories unseen during training by leveraging class attributes, but challenges remain. Recently, methods using generative models to combat bias towards classes seen during training have pushed state of the art, but these generative models can be slow or computationally expensive to train. Also, these generative models as…

    Submitted 2 December, 2023; originally announced December 2023.

    Comments: Accepted in IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) 2024. arXiv admin note: substantial text overlap with arXiv:2102.11856

  3. arXiv:2202.12932  [pdf, other]

    stat.ML cs.LG

    Capturing Actionable Dynamics with Structured Latent Ordinary Differential Equations

    Authors: Paidamoyo Chapfuwa, Sherri Rose, Lawrence Carin, Edward Meeds, Ricardo Henao

    Abstract: End-to-end learning of dynamical systems with black-box models, such as neural ordinary differential equations (ODEs), provides a flexible framework for learning dynamics from data without prescribing a mathematical model for the dynamics. Unfortunately, this flexibility comes at the cost of understanding the dynamical system, for which ODEs are used ubiquitously. Further, experimental data are co…

    Submitted 16 June, 2022; v1 submitted 25 February, 2022; originally announced February 2022.

    Comments: Accepted for the 38th Conference on Uncertainty in Artificial Intelligence (UAI 2022). Github code can be found at https://github.com/paidamoyo/structured_latent_ODEs

  4. arXiv:2111.02947  [pdf, other]

    cs.LG stat.ML

    Variational Inference with Holder Bounds

    Authors: Junya Chen, Danni Lu, Zidi Xiu, Ke Bai, Lawrence Carin, Chenyang Tao

    Abstract: The recent introduction of thermodynamic integration techniques has provided a new framework for understanding and improving variational inference (VI). In this work, we present a careful analysis of the thermodynamic variational objective (TVO), bridging the gap between existing variational objectives and shedding new insights to advance the field. In particular, we elucidate how the TVO naturall…

    Submitted 13 November, 2021; v1 submitted 4 November, 2021; originally announced November 2021.

  5. arXiv:2107.04661  [pdf, other]

    cs.LG cs.AI stat.ML

    Hölder Bounds for Sensitivity Analysis in Causal Reasoning

    Authors: Serge Assaad, Shuxi Zeng, Henry Pfister, Fan Li, Lawrence Carin

    Abstract: We examine interval estimation of the effect of a treatment T on an outcome Y given the existence of an unobserved confounder U. Using Hölder's inequality, we derive a set of bounds on the confounding bias |E[Y|T=t]-E[Y|do(T=t)]| based on the degree of unmeasured confounding (i.e., the strength of the connection U->T, and the strength of U->Y). These bounds are tight either when U is independent o…

    Submitted 9 July, 2021; originally announced July 2021.

    Comments: Workshop on the Neglected Assumptions in Causal Inference at the International Conference on Machine Learning (ICML), 2021

  6. arXiv:2107.01152  [pdf, other]

    stat.ML cs.AI cs.CV cs.IT cs.LG

    Simpler, Faster, Stronger: Breaking The log-K Curse On Contrastive Learners With FlatNCE

    Authors: Junya Chen, Zhe Gan, Xuan Li, Qing Guo, Liqun Chen, Shuyang Gao, Tagyoung Chung, Yi Xu, Belinda Zeng, Wenlian Lu, Fan Li, Lawrence Carin, Chenyang Tao

    Abstract: InfoNCE-based contrastive representation learners, such as SimCLR, have been tremendously successful in recent years. However, these contrastive schemes are notoriously resource demanding, as their effectiveness breaks down with small-batch training (i.e., the log-K curse, where K is the batch size). In this work, we reveal mathematically why contrastive learners fail in the small-batch-size reg…

    Submitted 2 July, 2021; originally announced July 2021.

  7. arXiv:2107.01131  [pdf, other]

    stat.ML cs.AI cs.IT cs.LG

    Tight Mutual Information Estimation With Contrastive Fenchel-Legendre Optimization

    Authors: Qing Guo, Junya Chen, Dong Wang, Yuewei Yang, Xinwei Deng, Lawrence Carin, Fan Li, Jing Huang, Chenyang Tao

    Abstract: Successful applications of InfoNCE and its variants have popularized the use of contrastive variational mutual information (MI) estimators in machine learning. While featuring superior stability, these estimators crucially depend on costly large-batch training, and they sacrifice bound tightness for variance reduction. To overcome these limitations, we revisit the mathematics of popular variationa…

    Submitted 24 October, 2022; v1 submitted 2 July, 2021; originally announced July 2021.

  8. arXiv:2104.13417  [pdf, other]

    cs.CV cs.LG stat.ML

    Towards Fair Federated Learning with Zero-Shot Data Augmentation

    Authors: Weituo Hao, Mostafa El-Khamy, Jungwon Lee, Jianyi Zhang, Kevin J Liang, Changyou Chen, Lawrence Carin

    Abstract: Federated learning has emerged as an important distributed learning paradigm, where a server aggregates a global model from many client-trained models while having no access to the client data. Although it is recognized that statistical heterogeneity of the client local data yields slower global model convergence, it is less commonly recognized that it also yields a biased federated global model w…

    Submitted 27 April, 2021; originally announced April 2021.

    Comments: Accepted by IEEE CVPR Workshop on Fair, Data Efficient And Trusted Computer Vision

  9. arXiv:2103.04032  [pdf, other]

    cs.LG cs.CV stat.ML

    CAM-GAN: Continual Adaptation Modules for Generative Adversarial Networks

    Authors: Sakshi Varshney, Vinay Kumar Verma, Srijith P K, Lawrence Carin, Piyush Rai

    Abstract: We present a continual learning approach for generative adversarial networks (GANs), by designing and leveraging parameter-efficient feature map transformations. Our approach is based on learning a set of global and task-specific parameters. The global parameters are fixed across tasks whereas the task-specific parameters act as local adapters for each task, and help in efficiently obtaining task-…

    Submitted 30 July, 2021; v1 submitted 6 March, 2021; originally announced March 2021.

    Comments: Under Submission

  10. arXiv:2102.11856  [pdf, other]

    cs.CV cs.AI cs.LG stat.ML

    Meta-Learned Attribute Self-Gating for Continual Generalized Zero-Shot Learning

    Authors: Vinay Kumar Verma, Kevin Liang, Nikhil Mehta, Lawrence Carin

    Abstract: Zero-shot learning (ZSL) has been shown to be a promising approach to generalizing a model to categories unseen during training by leveraging class attributes, but challenges still remain. Recently, methods using generative models to combat bias towards classes seen during training have pushed the state of the art of ZSL, but these generative models can be slow or computationally expensive to trai…

    Submitted 23 February, 2021; originally announced February 2021.

    Comments: Under Review

  11. arXiv:2012.05644  [pdf, other]

    cs.LG cs.SI stat.ML

    Learning Graphons via Structured Gromov-Wasserstein Barycenters

    Authors: Hongteng Xu, Dixin Luo, Lawrence Carin, Hongyuan Zha

    Abstract: We propose a novel and principled method to learn a nonparametric graph model called graphon, which is defined in an infinite-dimensional space and represents arbitrary-size graphs. Based on the weak regularity lemma from the theory of graphons, we leverage a step function to approximate a graphon. We show that the cut distance of graphons can be relaxed to the Gromov-Wasserstein distance of their…

    Submitted 17 December, 2020; v1 submitted 10 December, 2020; originally announced December 2020.

    Journal ref: AAAI 2021

  12. arXiv:2011.00593  [pdf, other]

    cs.CL stat.ML

    MixKD: Towards Efficient Distillation of Large-scale Language Models

    Authors: Kevin J Liang, Weituo Hao, Dinghan Shen, Yufan Zhou, Weizhu Chen, Changyou Chen, Lawrence Carin

    Abstract: Large-scale language models have recently demonstrated impressive empirical performance. Nevertheless, the improved results are attained at the price of bigger models, more power consumption, and slower inference, which hinder their applicability to low-resource (both memory and computation) platforms. Knowledge distillation (KD) has been demonstrated as an effective framework for compressing such…

    Submitted 17 March, 2021; v1 submitted 1 November, 2020; originally announced November 2020.

    Comments: ICLR 2021 Camera Ready

  13. arXiv:2010.12618  [pdf, other]

    stat.ML cs.LG

    Counterfactual Representation Learning with Balancing Weights

    Authors: Serge Assaad, Shuxi Zeng, Chenyang Tao, Shounak Datta, Nikhil Mehta, Ricardo Henao, Fan Li, Lawrence Carin

    Abstract: A key to causal inference with observational data is achieving balance in predictive features associated with each treatment type. Recent literature has explored representation learning to achieve this goal. In this work, we discuss the pitfalls of these strategies - such as a steep trade-off between achieving balance and predictive power - and present a remedy via the integration of balancing wei…

    Submitted 23 February, 2021; v1 submitted 23 October, 2020; originally announced October 2020.

    Comments: Accepted to International Conference on Artificial Intelligence and Statistics (AISTATS 2021)

  14. arXiv:2010.07866  [pdf, other]

    stat.ML cs.LG

    Double Robust Representation Learning for Counterfactual Prediction

    Authors: Shuxi Zeng, Serge Assaad, Chenyang Tao, Shounak Datta, Lawrence Carin, Fan Li

    Abstract: Causal inference, or counterfactual prediction, is central to decision making in healthcare, policy and social sciences. To de-bias causal estimators with high-dimensional data in observational studies, recent advances suggest the importance of combining machine learning models for both the propensity score and the outcome function. We propose a novel scalable method to learn double-robust represe…

    Submitted 16 October, 2020; v1 submitted 15 October, 2020; originally announced October 2020.

    Comments: 18 pages, 5 figures, 2 Tables

  15. arXiv:2008.05687  [pdf, other]

    cs.LG stat.ML

    WAFFLe: Weight Anonymized Factorization for Federated Learning

    Authors: Weituo Hao, Nikhil Mehta, Kevin J Liang, Pengyu Cheng, Mostafa El-Khamy, Lawrence Carin

    Abstract: In domains where data are sensitive or private, there is great value in methods that can learn in a distributed manner without the data ever leaving the local devices. In light of this need, federated learning has emerged as a popular training paradigm. However, many federated learning approaches trade transmitting data for communicating updated weight parameters for each local device. Therefore,…

    Submitted 13 August, 2020; originally announced August 2020.

  16. arXiv:2007.06178  [pdf, other]

    cs.LG cs.CV stat.ML

    Bridging Maximum Likelihood and Adversarial Learning via $\alpha$-Divergence

    Authors: Miaoyun Zhao, Yulai Cong, Shuyang Dai, Lawrence Carin

    Abstract: Maximum likelihood (ML) and adversarial learning are two popular approaches for training generative models, and from many perspectives these techniques are complementary. ML learning encourages the capture of all data modes, and it is typically characterized by stable training. However, ML learning tends to distribute probability mass diffusely over the data space, e.g., yielding blurry syntheti…

    Submitted 13 July, 2020; originally announced July 2020.

    Comments: AAAI 2020

  17. arXiv:2006.12013  [pdf, other]

    cs.LG stat.ML

    CLUB: A Contrastive Log-ratio Upper Bound of Mutual Information

    Authors: Pengyu Cheng, Weituo Hao, Shuyang Dai, Jiachang Liu, Zhe Gan, Lawrence Carin

    Abstract: Mutual information (MI) minimization has gained considerable interest in various machine learning tasks. However, estimating and minimizing MI in high-dimensional spaces remains a challenging problem, especially when only samples, rather than distribution forms, are accessible. Previous works mainly focus on MI lower bound approximation, which is not applicable to MI minimization problems. In thi…

    Submitted 23 July, 2020; v1 submitted 22 June, 2020; originally announced June 2020.

    Comments: Accepted by the 37th International Conference on Machine Learning (ICML 2020)

  18. arXiv:2006.08873  [pdf, other]

    stat.ML cs.LG

    GO Hessian for Expectation-Based Objectives

    Authors: Yulai Cong, Miaoyun Zhao, Jianqiao Li, Junya Chen, Lawrence Carin

    Abstract: An unbiased low-variance gradient estimator, termed GO gradient, was proposed recently for expectation-based objectives $\mathbb{E}_{q_{\boldsymbol{\gamma}}(\boldsymbol{y})} [f(\boldsymbol{y})]$, where the random variable (RV) $\boldsymbol{y}$ may be drawn from a stochastic computation graph with continuous (non-reparameterizable) internal nodes and continuous/discrete leaves. Upgrading the GO gradient,…

    Submitted 15 June, 2020; originally announced June 2020.

  19. Enabling Counterfactual Survival Analysis with Balanced Representations

    Authors: Paidamoyo Chapfuwa, Serge Assaad, Shuxi Zeng, Michael J. Pencina, Lawrence Carin, Ricardo Henao

    Abstract: Balanced representation learning methods have been applied successfully to counterfactual inference from observational data. However, approaches that account for survival outcomes are relatively limited. Survival data are frequently encountered across diverse medical applications, e.g., drug development, risk profiling, and clinical trials, and such data are also relevant in fields like manufactur…

    Submitted 3 March, 2021; v1 submitted 13 June, 2020; originally announced June 2020.

    Comments: Accepted at ACM Conference on Health, Inference, and Learning (ACM CHIL 2021). Code at https://github.com/paidamoyo/counterfactual_survival_analysis

  20. arXiv:2006.07487  [pdf, other]

    stat.ML cs.LG

    Scalable Control Variates for Monte Carlo Methods via Stochastic Optimization

    Authors: Shijing Si, Chris. J. Oates, Andrew B. Duncan, Lawrence Carin, François-Xavier Briol

    Abstract: Control variates are a well-established tool to reduce the variance of Monte Carlo estimators. However, for large-scale problems including high-dimensional and large-sample settings, their advantages can be outweighed by a substantial computational cost. This paper considers control variates based on Stein operators, presenting a framework that encompasses and generalizes existing approaches that…

    Submitted 21 July, 2021; v1 submitted 12 June, 2020; originally announced June 2020.

    Comments: Accepted by MCQMC2020

    MSC Class: G.3

  21. arXiv:2006.03160  [pdf, other]

    cs.LG stat.ML

    Hierarchical Optimal Transport for Robust Multi-View Learning

    Authors: Dixin Luo, Hongteng Xu, Lawrence Carin

    Abstract: Traditional multi-view learning methods often rely on two assumptions: ($i$) the samples in different views are well-aligned, and ($ii$) their representations in latent space obey the same distribution. Unfortunately, these two assumptions may be questionable in practice, which limits the application of multi-view learning. In this work, we propose a hierarchical optimal transport (HOT) method to…

    Submitted 8 June, 2020; v1 submitted 4 June, 2020; originally announced June 2020.

  22. arXiv:2006.03089  [pdf, other]

    cs.LG stat.ML

    Towards Understanding Fast Adversarial Training

    Authors: Bai Li, Shiqi Wang, Suman Jana, Lawrence Carin

    Abstract: Current neural-network-based classifiers are susceptible to adversarial examples. The most empirically successful approach to defending against such adversarial examples is adversarial training, which incorporates a strong self-attack during training to enhance its robustness. This approach, however, is computationally expensive and hence is hard to scale up. A recent work, called fast adversarial…

    Submitted 4 June, 2020; originally announced June 2020.

  23. arXiv:2006.00693  [pdf, other]

    cs.LG stat.ML

    Improving Disentangled Text Representation Learning with Information-Theoretic Guidance

    Authors: Pengyu Cheng, Martin Renqiang Min, Dinghan Shen, Christopher Malon, Yizhe Zhang, Yitong Li, Lawrence Carin

    Abstract: Learning disentangled representations of natural language is essential for many NLP tasks, e.g., conditional text generation, style transfer, personalized dialogue systems, etc. Similar problems have been studied extensively for other forms of data, such as images and videos. However, the discrete nature of natural language makes the disentangling of textual representations more challenging (e.g.,…

    Submitted 12 January, 2022; v1 submitted 31 May, 2020; originally announced June 2020.

    Comments: Accepted by the 58th Annual Meeting of the Association for Computational Linguistics (ACL2020)

  24. arXiv:2005.00054  [pdf, other]

    cs.LG stat.ML

    APo-VAE: Text Generation in Hyperbolic Space

    Authors: Shuyang Dai, Zhe Gan, Yu Cheng, Chenyang Tao, Lawrence Carin, Jingjing Liu

    Abstract: Natural language often exhibits inherent hierarchical structure ingrained with complex syntax and semantics. However, most state-of-the-art deep generative models learn embeddings only in Euclidean vector space, without accounting for this structural property of language. In this paper, we investigate text generation in a hyperbolic latent space to learn continuous hierarchical representations. An…

    Submitted 14 July, 2021; v1 submitted 30 April, 2020; originally announced May 2020.

  25. arXiv:2004.14861  [pdf, other]

    cs.CR cs.LG stat.ML

    Perturbing Across the Feature Hierarchy to Improve Standard and Strict Blackbox Attack Transferability

    Authors: Nathan Inkawhich, Kevin J Liang, Binghui Wang, Matthew Inkawhich, Lawrence Carin, Yiran Chen

    Abstract: We consider the blackbox transfer-based targeted adversarial attack threat model in the realm of deep neural network (DNN) image classifiers. Rather than focusing on crossing decision boundaries at the output layer of the source model, our method perturbs representations throughout the extracted feature hierarchy to resemble other classes. We design a flexible attack framework that allows for mult…

    Submitted 29 April, 2020; originally announced April 2020.

  26. arXiv:2004.12519  [pdf, other]

    cs.LG stat.ML

    Transferable Perturbations of Deep Feature Distributions

    Authors: Nathan Inkawhich, Kevin J Liang, Lawrence Carin, Yiran Chen

    Abstract: Almost all current adversarial attacks of CNN classifiers rely on information derived from the output layer of the network. This work presents a new adversarial attack based on the modeling and exploitation of class-wise and layer-wise deep feature distributions. We achieve state-of-the-art targeted blackbox transfer-based attack results for undefended ImageNet models. Further, we place a priority…

    Submitted 26 April, 2020; originally announced April 2020.

    Comments: Published as a conference paper at ICLR 2020

  27. arXiv:2004.10098  [pdf, other]

    cs.LG stat.ML

    Continual Learning using a Bayesian Nonparametric Dictionary of Weight Factors

    Authors: Nikhil Mehta, Kevin J Liang, Vinay K Verma, Lawrence Carin

    Abstract: Naively trained neural networks tend to experience catastrophic forgetting in sequential task settings, where data from previous tasks are unavailable. A number of methods, using various model expansion strategies, have been proposed recently as possible solutions. However, determining how much to expand the model is left to the practitioner, and often a constant schedule is chosen for simplicity,…

    Submitted 27 April, 2021; v1 submitted 21 April, 2020; originally announced April 2020.

    Comments: Proceedings of the 24th International Conference on Artificial Intelligence and Statistics (AISTATS) 2021. Post-conference updates: fixed typo in equation (11) and updated references

  28. arXiv:2003.05733  [pdf, other]

    cs.LG stat.ML

    Towards Practical Lottery Ticket Hypothesis for Adversarial Training

    Authors: Bai Li, Shiqi Wang, Yunhan Jia, Yantao Lu, Zhenyu Zhong, Lawrence Carin, Suman Jana

    Abstract: Recent research has proposed the lottery ticket hypothesis, suggesting that for a deep neural network, there exist trainable sub-networks performing equally or better than the original model with commensurate training steps. While this discovery is insightful, finding proper sub-networks requires iterative training and pruning. The high cost incurred limits the applications of the lottery ticket h…

    Submitted 5 March, 2020; originally announced March 2020.

  29. arXiv:2003.00355  [pdf, other]

    stat.ML cs.LG

    Survival Cluster Analysis

    Authors: Paidamoyo Chapfuwa, Chunyuan Li, Nikhil Mehta, Lawrence Carin, Ricardo Henao

    Abstract: Conventional survival analysis approaches estimate risk scores or individualized time-to-event distributions conditioned on covariates. In practice, there is often great population-level phenotypic heterogeneity, resulting from (unknown) subpopulations with diverse risk profiles or survival distributions. As a result, there is an unmet need in survival analysis for identifying subpopulations with…

    Submitted 29 February, 2020; originally announced March 2020.

    Comments: Accepted at ACM CHIL 2020. Code: https://github.com/paidamoyo/survival_cluster_analysis

  30. arXiv:2002.02913  [pdf, other]

    cs.LG stat.ML

    Learning Autoencoders with Relational Regularization

    Authors: Hongteng Xu, Dixin Luo, Ricardo Henao, Svati Shah, Lawrence Carin

    Abstract: A new algorithmic framework is proposed for learning autoencoders of data distributions. We minimize the discrepancy between the model and target distributions, with a relational regularization on the learnable latent prior. This regularization penalizes the fused Gromov-Wasserstein (FGW) distance between the latent prior and its corresponding posterior, allowing one to flexibly learn a str…

    Submitted 25 June, 2020; v1 submitted 7 February, 2020; originally announced February 2020.

    Journal ref: International Conference on Machine Learning 2020

  31. arXiv:1911.08709  [pdf, other]

    cs.LG stat.ML

    Graph-Driven Generative Models for Heterogeneous Multi-Task Learning

    Authors: Wenlin Wang, Hongteng Xu, Zhe Gan, Bai Li, Guoyin Wang, Liqun Chen, Qian Yang, Wenqi Wang, Lawrence Carin

    Abstract: We propose a novel graph-driven generative model that unifies multiple heterogeneous learning tasks into the same framework. The proposed model is based on the fact that heterogeneous learning tasks, which correspond to different generative processes, often rely on data with a shared graph structure. Accordingly, our model combines a graph convolutional network (GCN) with multiple variational aut…

    Submitted 20 November, 2019; originally announced November 2019.

    Comments: Accepted by AAAI-2020

  32. arXiv:1911.06156  [pdf, other]

    cs.CL cs.LG stat.ML

    Syntax-Infused Transformer and BERT models for Machine Translation and Natural Language Understanding

    Authors: Dhanasekar Sundararaman, Vivek Subramanian, Guoyin Wang, Shijing Si, Dinghan Shen, Dong Wang, Lawrence Carin

    Abstract: Attention-based models have shown significant improvement over traditional algorithms in several NLP tasks. The Transformer, for instance, is an illustrative example that generates abstract representations of tokens inputted to an encoder based on their relationships to all tokens in a sequence. Recent studies have shown that although such models are capable of learning syntactic features purely b…

    Submitted 9 November, 2019; originally announced November 2019.

  33. arXiv:1910.12735  [pdf, other]

    cs.IR cs.LG stat.ML

    Learning to Recommend from Sparse Data via Generative User Feedback

    Authors: Wenlin Wang, Hongteng Xu, Ruiyi Zhang, Wenqi Wang, Piyush Rai, Lawrence Carin

    Abstract: Traditional collaborative filtering (CF) based recommender systems tend to perform poorly when the user-item interactions/ratings are highly scarce. To address this, we propose a learning framework that improves collaborative filtering with a synthetic feedback loop (CF-SFL) to simulate the user feedback. The proposed framework consists of a "recommender" and a "virtual user". The "recommender" is…

    Submitted 16 December, 2020; v1 submitted 20 October, 2019; originally announced October 2019.

    Comments: To appear in AAAI-2021

  34. arXiv:1910.09057  [pdf, other]

    cs.LG cs.CV stat.ML

    Zero-Shot Recognition via Optimal Transport

    Authors: Wenlin Wang, Hongteng Xu, Guoyin Wang, Wenqi Wang, Lawrence Carin

    Abstract: We propose an optimal transport (OT) framework for generalized zero-shot learning (GZSL), seeking to distinguish samples for both seen and unseen classes, with the assistance of auxiliary attributes. The discrepancy between features and attributes is minimized by solving an optimal transport problem. Specifically, we build a conditional generative model to generate features from seen-class attributes…

    Submitted 26 December, 2020; v1 submitted 20 October, 2019; originally announced October 2019.

    Comments: To appear in WACV 2021

  35. arXiv:1910.04233  [pdf, other]

    stat.ML cs.LG cs.NE

    Kernel-Based Approaches for Sequence Modeling: Connections to Neural Methods

    Authors: Kevin J Liang, Guoyin Wang, Yitong Li, Ricardo Henao, Lawrence Carin

    Abstract: We investigate time-dependent data analysis from the perspective of recurrent kernel machines, from which models with hidden units and gated memory cells arise naturally. By considering dynamic gating of the memory cell, a model closely related to the long short-term memory (LSTM) recurrent neural network is derived. Extending this setup to $n$-gram filters, the convolutional neural network (CNN),…

    Submitted 9 October, 2019; originally announced October 2019.

  36. arXiv:1910.02187  [pdf, other]

    cs.LG cs.CL cs.SI stat.ML

    Dynamic Embedding on Textual Networks via a Gaussian Process

    Authors: Pengyu Cheng, Yitong Li, Xinyuan Zhang, Liqun Cheng, David Carlson, Lawrence Carin

    Abstract: Textual network embedding aims to learn low-dimensional representations of text-annotated nodes in a graph. Prior work in this area has typically focused on fixed graph structures; however, real-world networks are often dynamic. We address this challenge with a novel end-to-end node-embedding model, called Dynamic Embedding for Textual Networks with a Gaussian Process (DetGP). After training, DetG…

    Submitted 27 November, 2019; v1 submitted 4 October, 2019; originally announced October 2019.

    Comments: Accepted for presentation at the Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI-20)

  37. arXiv:1910.02176  [pdf, other]

    cs.LG stat.ML

    Straight-Through Estimator as Projected Wasserstein Gradient Flow

    Authors: Pengyu Cheng, Chang Liu, Chunyuan Li, Dinghan Shen, Ricardo Henao, Lawrence Carin

    Abstract: The Straight-Through (ST) estimator is a widely used technique for back-propagating gradients through discrete random variables. However, this effective method lacks theoretical justification. In this paper, we show that ST can be interpreted as the simulation of the projected Wasserstein gradient flow (pWGF). Based on this understanding, a theoretical foundation is established to justify the conv…

    Submitted 4 October, 2019; originally announced October 2019.

    Comments: Accepted at the NeurIPS 2018 Bayesian Deep Learning Workshop

  38. arXiv:1910.02096  [pdf, other]

    cs.LG stat.ML

    Fused Gromov-Wasserstein Alignment for Hawkes Processes

    Authors: Dixin Luo, Hongteng Xu, Lawrence Carin

    Abstract: We propose a novel fused Gromov-Wasserstein alignment method to jointly learn the Hawkes processes in different event spaces, and align their event types. Given two Hawkes processes, we use fused Gromov-Wasserstein discrepancy to measure their dissimilarity, which considers both the Wasserstein discrepancy based on their base intensities and the Gromov-Wasserstein discrepancy based on their infect…

    Submitted 4 October, 2019; originally announced October 2019.

    Comments: The workshop on learning with temporal point processes in NeurIPS 2019 (WTPP19)

  39. arXiv:1909.13456  [pdf, other]

    cs.LG cs.CL stat.ML

    Improving Textual Network Learning with Variational Homophilic Embeddings

    Authors: Wenlin Wang, Chenyang Tao, Zhe Gan, Guoyin Wang, Liqun Chen, Xinyuan Zhang, Ruiyi Zhang, Qian Yang, Ricardo Henao, Lawrence Carin

    Abstract: The performance of many network learning applications crucially hinges on the success of network embedding algorithms, which aim to encode rich network information into low-dimensional vertex-based vector representations. This paper considers a novel variational formulation of network embeddings, with special focus on textual networks. Different from most existing methods that optimize a discrimin…

    Submitted 30 September, 2019; originally announced September 2019.

    Comments: Accepted to NeurIPS 2019

  40. arXiv:1909.06695  [pdf, other]

    cs.CL cs.LG stat.ML

    Ouroboros: On Accelerating Training of Transformer-Based Language Models

    Authors: Qian Yang, Zhouyuan Huo, Wenlin Wang, Heng Huang, Lawrence Carin

    Abstract: Language models are essential for natural language processing (NLP) tasks, such as machine translation and text summarization. Remarkable performance has been demonstrated recently across many NLP domains via a Transformer-based language model with over a billion parameters, verifying the benefits of model size. Model parallelism is required if a model is too large to fit in a single computing dev…

    Submitted 14 September, 2019; originally announced September 2019.

    Comments: To appear in the proceedings of Neural Information Processing Systems Conference (2019)

  41. arXiv:1909.05288  [pdf, other]

    cs.LG stat.ML

    Contrastively Smoothed Class Alignment for Unsupervised Domain Adaptation

    Authors: Shuyang Dai, Yu Cheng, Yizhe Zhang, Zhe Gan, Jingjing Liu, Lawrence Carin

    Abstract: Recent unsupervised approaches to domain adaptation primarily focus on minimizing the gap between the source and the target domains through refining the feature generator, in order to learn a better alignment between the two domains. This minimization can be achieved via a domain classifier to detect target-domain features that are divergent from source-domain features. However, by optimizing via…

    Submitted 6 October, 2020; v1 submitted 11 September, 2019; originally announced September 2019.

  42. arXiv:1906.08397  [pdf, other]

    stat.ML cs.LG

    Adversarial Self-Paced Learning for Mixture Models of Hawkes Processes

    Authors: Dixin Luo, Hongteng Xu, Lawrence Carin

    Abstract: We propose a novel adversarial learning strategy for mixture models of Hawkes processes, leveraging data augmentation techniques of Hawkes process in the framework of self-paced learning. Instead of learning a mixture model directly from a set of event sequences drawn from different Hawkes processes, the proposed method learns the target model iteratively, which generates "easy" sequences and uses…

    Submitted 19 June, 2019; originally announced June 2019.

  43. arXiv:1906.05492  [pdf, other]

    stat.AP cs.LG

    Interpretable ICD Code Embeddings with Self- and Mutual-Attention Mechanisms

    Authors: Dixin Luo, Hongteng Xu, Lawrence Carin

    Abstract: We propose a novel and interpretable embedding method to represent the international statistical classification codes of diseases and related health problems (i.e., ICD codes). This method considers a self-attention mechanism within the disease domain and a mutual-attention mechanism jointly between diseases and procedures. This framework captures the clinical relationships between the disease cod…

    Submitted 13 June, 2019; originally announced June 2019.

  44. arXiv:1906.04281  [pdf, other]

    cs.LG cs.IR stat.ML

    Towards Amortized Ranking-Critical Training for Collaborative Filtering

    Authors: Sam Lobel, Chunyuan Li, Jianfeng Gao, Lawrence Carin

    Abstract: Collaborative filtering is widely used in modern recommender systems. Recent research shows that variational autoencoders (VAEs) yield state-of-the-art performance by integrating flexible representations from deep neural networks into latent variable models, mitigating limitations of traditional linear factor models. VAEs are typically trained by maximizing the likelihood (MLE) of users interactin…

    Submitted 10 February, 2020; v1 submitted 10 June, 2019; originally announced June 2019.

    Comments: The first two authors contributed equally to this manuscript. Code: https://github.com/samlobel/RaCT_CF

  45. arXiv:1906.02181  [pdf, other]

    stat.ML cs.CL cs.LG

    Syntax-Infused Variational Autoencoder for Text Generation

    Authors: Xinyuan Zhang, Yi Yang, Siyang Yuan, Dinghan Shen, Lawrence Carin

    Abstract: We present a syntax-infused variational autoencoder (SIVAE), that integrates sentences with their syntactic trees to improve the grammar of generated sentences. Distinct from existing VAE-based text generative models, SIVAE contains two separate latent spaces, for sentences and syntactic trees. The evidence lower bound objective is redesigned correspondingly, by optimizing a joint distribution tha…

    Submitted 5 June, 2019; originally announced June 2019.

    Comments: Accepted by ACL 2019

  46. Survival Function Matching for Calibrated Time-to-Event Predictions

    Authors: Paidamoyo Chapfuwa, Chenyang Tao, Lawrence Carin, Ricardo Henao

    Abstract: Models for predicting the time of a future event are crucial for risk assessment, across a diverse range of applications. Existing time-to-event (survival) models have focused primarily on preserving pairwise ordering of estimated event times, or relative risk. Model calibration is relatively under explored, despite its critical importance in time-to-event applications. We present a survival funct…

    Submitted 21 May, 2019; originally announced May 2019.

  47. arXiv:1905.07645  [pdf, other]

    cs.LG cs.SI stat.ML

    Scalable Gromov-Wasserstein Learning for Graph Partitioning and Matching

    Authors: Hongteng Xu, Dixin Luo, Lawrence Carin

    Abstract: We propose a scalable Gromov-Wasserstein learning (S-GWL) method and establish a novel and theoretically-supported paradigm for large-scale graph analysis. The proposed method is based on the fact that Gromov-Wasserstein discrepancy is a pseudometric on graphs. Given two graphs, the optimal transport associated with their Gromov-Wasserstein discrepancy provides the correspondence between their nod…

    Submitted 9 October, 2019; v1 submitted 18 May, 2019; originally announced May 2019.

    Comments: 33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada

  48. arXiv:1905.06455  [pdf, other]

    cs.LG cs.CR stat.ML

    On Norm-Agnostic Robustness of Adversarial Training

    Authors: Bai Li, Changyou Chen, Wenlin Wang, Lawrence Carin

    Abstract: Adversarial examples are carefully perturbed inputs for fooling machine learning models. A well-acknowledged defense method against such examples is adversarial training, where adversarial examples are injected into training data to increase robustness. In this paper, we propose a new attack to unveil an undesired property of the state-of-the-art adversarial training, that is it fails to obtain r…

    Submitted 15 May, 2019; originally announced May 2019.

    Comments: 4 pages, 2 figures, presented at the ICML 2019 Workshop on Uncertainty and Robustness in Deep Learning. arXiv admin note: text overlap with arXiv:1809.03113

  49. arXiv:1905.05738  [pdf, other]

    cs.LG cs.SI stat.ML

    Stochastic Blockmodels meet Graph Neural Networks

    Authors: Nikhil Mehta, Lawrence Carin, Piyush Rai

    Abstract: Stochastic blockmodels (SBM) and their variants, $e.g.$, mixed-membership and overlapping stochastic blockmodels, are latent variable based generative models for graphs. They have proven to be successful for various tasks, such as discovering the community structure and link prediction on graph-structured data. Recently, graph neural networks, $e.g.$, graph convolutional networks, have also emerge…

    Submitted 14 May, 2019; originally announced May 2019.

  50. arXiv:1903.10145  [pdf, other]

    cs.LG cs.AI cs.CL cs.CV stat.ML

    Cyclical Annealing Schedule: A Simple Approach to Mitigating KL Vanishing

    Authors: Hao Fu, Chunyuan Li, Xiaodong Liu, Jianfeng Gao, Asli Celikyilmaz, Lawrence Carin

    Abstract: Variational autoencoders (VAEs) with an auto-regressive decoder have been applied for many natural language processing (NLP) tasks. The VAE objective consists of two terms, (i) reconstruction and (ii) KL regularization, balanced by a weighting hyper-parameter β. One notorious training difficulty is that the KL term tends to vanish. In this paper we study scheduling schemes for β, and show that KL…

    Submitted 10 June, 2019; v1 submitted 25 March, 2019; originally announced March 2019.

    Comments: Published in NAACL 2019; The first two authors contribute equally; Code: https://github.com/haofuml/cyclical_annealing