Showing 1–50 of 202 results for author: Carin, L

  1. arXiv:2405.17248

    stat.ML cs.LG

    Transformer In-Context Learning for Categorical Data

    Authors: Aaron T. Wang, Ricardo Henao, Lawrence Carin

    Abstract: Recent research has sought to understand Transformers through the lens of in-context learning with functional data. We extend that line of work with the goal of moving closer to language models, considering categorical outcomes, nonlinear underlying models, and nonlinear attention. The contextual data are of the form $\textsf{C}=(x_1,c_1,\dots,x_N,c_{N})$ where each $c_i\in\{0,\dots,C-1\}$ is draw…

    Submitted 27 May, 2024; originally announced May 2024.

  2. arXiv:2312.01167

    cs.CV cs.LG stat.ML

    Meta-Learned Attribute Self-Interaction Network for Continual and Generalized Zero-Shot Learning

    Authors: Vinay K Verma, Nikhil Mehta, Kevin J Liang, Aakansha Mishra, Lawrence Carin

    Abstract: Zero-shot learning (ZSL) is a promising approach to generalizing a model to categories unseen during training by leveraging class attributes, but challenges remain. Recently, methods using generative models to combat bias towards classes seen during training have pushed state of the art, but these generative models can be slow or computationally expensive to train. Also, these generative models as…

    Submitted 2 December, 2023; originally announced December 2023.

    Comments: Accepted in IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) 2024. arXiv admin note: substantial text overlap with arXiv:2102.11856

  3. arXiv:2303.05581

    cs.CL cs.AI

    Open World Classification with Adaptive Negative Samples

    Authors: Ke Bai, Guoyin Wang, Jiwei Li, Sunghyun Park, Sungjin Lee, Puyang Xu, Ricardo Henao, Lawrence Carin

    Abstract: Open world classification is a task in natural language processing with key practical relevance and impact. Since the open or unknown category data only manifests in the inference phase, finding a model with a suitable decision boundary accommodating for the identification of known classes and discrimination of the open category is challenging. The performance of existing models is limited b…

    Submitted 9 March, 2023; originally announced March 2023.

    Comments: Accepted by EMNLP 2021 (Main Track, Long Paper)

  4. arXiv:2210.12818

    cs.CV

    Pushing the Efficiency Limit Using Structured Sparse Convolutions

    Authors: Vinay Kumar Verma, Nikhil Mehta, Shijing Si, Ricardo Henao, Lawrence Carin

    Abstract: Weight pruning is among the most popular approaches for compressing deep convolutional neural networks. Recent work suggests that in a randomly initialized deep neural network, there exist sparse subnetworks that achieve performance comparable to the original network. Unfortunately, finding these subnetworks involves iterative stages of training and pruning, which can be computationally expensive.…

    Submitted 23 October, 2022; originally announced October 2022.

    Comments: Accepted at the IEEE Winter Conference on Applications of Computer Vision, WACV 2023

  5. arXiv:2210.09132

    cs.CL

    Pseudo-OOD training for robust language models

    Authors: Dhanasekar Sundararaman, Nikhil Mehta, Lawrence Carin

    Abstract: While pre-trained large-scale deep models have garnered attention as an important topic for many downstream natural language processing (NLP) tasks, such models often make unreliable predictions on out-of-distribution (OOD) inputs. As such, OOD detection is a key component of a reliable machine-learning model for any industry-scale application. Common approaches often assume access to additional O…

    Submitted 17 October, 2022; originally announced October 2022.

    Comments: Work in progress

  6. arXiv:2209.09923

    cs.LG cs.IR

    Collaborative Anomaly Detection

    Authors: Ke Bai, Aonan Zhang, Zhizhong Li, Ricardo Henao, Chong Wang, Lawrence Carin

    Abstract: In recommendation systems, items are likely to be exposed to various users and we would like to learn about the familiarity of a new user with an existing item. This can be formulated as an anomaly detection (AD) problem distinguishing between "common users" (nominal) and "fresh users" (anomalous). Considering the sheer volume of items and the sparsity of user-item paired data, independently apply…

    Submitted 20 September, 2022; originally announced September 2022.

  7. arXiv:2205.03559

    cs.CL cs.LG

    Improving Downstream Task Performance by Treating Numbers as Entities

    Authors: Dhanasekar Sundararaman, Vivek Subramanian, Guoyin Wang, Liyan Xu, Lawrence Carin

    Abstract: Numbers are essential components of text, like any other word tokens, from which natural language processing (NLP) models are built and deployed. Though numbers are typically not accounted for distinctly in most NLP tasks, there is still an underlying amount of numeracy already exhibited by NLP models. In this work, we attempt to tap this potential of state-of-the-art NLP models and transfer their…

    Submitted 18 September, 2022; v1 submitted 7 May, 2022; originally announced May 2022.

    Comments: Accepted to CIKM 2022

  8. arXiv:2203.09424

    cs.CL

    elBERto: Self-supervised Commonsense Learning for Question Answering

    Authors: Xunlin Zhan, Yuan Li, Xiao Dong, Xiaodan Liang, Zhiting Hu, Lawrence Carin

    Abstract: Commonsense question answering requires reasoning about everyday situations and causes and effects implicit in context. Typically, existing approaches first retrieve external evidence and then perform commonsense reasoning using this evidence. In this paper, we propose a Self-supervised Bidirectional Encoder Representation Learning of Commonsense (elBERto) framework, which is compatible with off-…

    Submitted 4 October, 2022; v1 submitted 17 March, 2022; originally announced March 2022.

  9. arXiv:2202.12932

    stat.ML cs.LG

    Capturing Actionable Dynamics with Structured Latent Ordinary Differential Equations

    Authors: Paidamoyo Chapfuwa, Sherri Rose, Lawrence Carin, Edward Meeds, Ricardo Henao

    Abstract: End-to-end learning of dynamical systems with black-box models, such as neural ordinary differential equations (ODEs), provides a flexible framework for learning dynamics from data without prescribing a mathematical model for the dynamics. Unfortunately, this flexibility comes at the cost of understanding the dynamical system, for which ODEs are used ubiquitously. Further, experimental data are co…

    Submitted 16 June, 2022; v1 submitted 25 February, 2022; originally announced February 2022.

    Comments: Accepted for the 38th Conference on Uncertainty in Artificial Intelligence (UAI 2022). Github code can be found at https://github.com/paidamoyo/structured_latent_ODEs

  10. Explainable multiple abnormality classification of chest CT volumes

    Authors: Rachel Lea Draelos, Lawrence Carin

    Abstract: Understanding model predictions is critical in healthcare, to facilitate rapid verification of model correctness and to guard against use of models that exploit confounding variables. We introduce the challenging new task of explainable multiple abnormality classification in volumetric medical images, in which a model must indicate the regions used to predict each abnormality. To solve this task,…

    Submitted 20 August, 2022; v1 submitted 23 November, 2021; originally announced November 2021.

    Comments: Published in Artificial Intelligence in Medicine, 2022. 8 figures, 7 tables

    Journal ref: Artificial Intelligence in Medicine, August 2022

  11. arXiv:2111.02949

    cs.LG cs.DC

    Finite-Time Consensus Learning for Decentralized Optimization with Nonlinear Gossiping

    Authors: Junya Chen, Sijia Wang, Lawrence Carin, Chenyang Tao

    Abstract: Distributed learning has become an integral tool for scaling up machine learning and addressing the growing need for data privacy. Although more robust to the network topology, decentralized learning schemes have not gained the same level of popularity as their centralized counterparts for being less competitive performance-wise. In this work, we attribute this issue to the lack of synchronization…

    Submitted 13 November, 2021; v1 submitted 4 November, 2021; originally announced November 2021.

  12. arXiv:2111.02947

    cs.LG stat.ML

    Variational Inference with Holder Bounds

    Authors: Junya Chen, Danni Lu, Zidi Xiu, Ke Bai, Lawrence Carin, Chenyang Tao

    Abstract: The recent introduction of thermodynamic integration techniques has provided a new framework for understanding and improving variational inference (VI). In this work, we present a careful analysis of the thermodynamic variational objective (TVO), bridging the gap between existing variational objectives and shedding new insights to advance the field. In particular, we elucidate how the TVO naturall…

    Submitted 13 November, 2021; v1 submitted 4 November, 2021; originally announced November 2021.

  13. arXiv:2107.04661

    cs.LG cs.AI stat.ML

    Hölder Bounds for Sensitivity Analysis in Causal Reasoning

    Authors: Serge Assaad, Shuxi Zeng, Henry Pfister, Fan Li, Lawrence Carin

    Abstract: We examine interval estimation of the effect of a treatment T on an outcome Y given the existence of an unobserved confounder U. Using Hölder's inequality, we derive a set of bounds on the confounding bias |E[Y|T=t]-E[Y|do(T=t)]| based on the degree of unmeasured confounding (i.e., the strength of the connection U->T, and the strength of U->Y). These bounds are tight either when U is independent o…

    Submitted 9 July, 2021; originally announced July 2021.

    Comments: Workshop on the Neglected Assumptions in Causal Inference at the International Conference on Machine Learning (ICML), 2021

  14. arXiv:2107.01983

    cs.LG cs.AI

    Gradient Importance Learning for Incomplete Observations

    Authors: Qitong Gao, Dong Wang, Joshua D. Amason, Siyang Yuan, Chenyang Tao, Ricardo Henao, Majda Hadziahmetovic, Lawrence Carin, Miroslav Pajic

    Abstract: Though recent works have developed methods that can generate estimates (or imputations) of the missing entries in a dataset to facilitate downstream analysis, most depend on assumptions that may not align with real-world applications and could suffer from poor performance in subsequent tasks such as classification. This is particularly true if the data have large missingness rates or a small sampl…

    Submitted 1 March, 2022; v1 submitted 5 July, 2021; originally announced July 2021.

  15. arXiv:2107.01152

    stat.ML cs.AI cs.CV cs.IT cs.LG

    Simpler, Faster, Stronger: Breaking The log-K Curse On Contrastive Learners With FlatNCE

    Authors: Junya Chen, Zhe Gan, Xuan Li, Qing Guo, Liqun Chen, Shuyang Gao, Tagyoung Chung, Yi Xu, Belinda Zeng, Wenlian Lu, Fan Li, Lawrence Carin, Chenyang Tao

    Abstract: InfoNCE-based contrastive representation learners, such as SimCLR, have been tremendously successful in recent years. However, these contrastive schemes are notoriously resource demanding, as their effectiveness breaks down with small-batch training (i.e., the log-K curse, where K is the batch size). In this work, we reveal mathematically why contrastive learners fail in the small-batch-size reg…

    Submitted 2 July, 2021; originally announced July 2021.

  16. arXiv:2107.01131

    stat.ML cs.AI cs.IT cs.LG

    Tight Mutual Information Estimation With Contrastive Fenchel-Legendre Optimization

    Authors: Qing Guo, Junya Chen, Dong Wang, Yuewei Yang, Xinwei Deng, Lawrence Carin, Fan Li, Jing Huang, Chenyang Tao

    Abstract: Successful applications of InfoNCE and its variants have popularized the use of contrastive variational mutual information (MI) estimators in machine learning. While featuring superior stability, these estimators crucially depend on costly large-batch training, and they sacrifice bound tightness for variance reduction. To overcome these limitations, we revisit the mathematics of popular variationa…

    Submitted 24 October, 2022; v1 submitted 2 July, 2021; originally announced July 2021.

  17. arXiv:2104.13417

    cs.CV cs.LG stat.ML

    Towards Fair Federated Learning with Zero-Shot Data Augmentation

    Authors: Weituo Hao, Mostafa El-Khamy, Jungwon Lee, Jianyi Zhang, Kevin J Liang, Changyou Chen, Lawrence Carin

    Abstract: Federated learning has emerged as an important distributed learning paradigm, where a server aggregates a global model from many client-trained models while having no access to the client data. Although it is recognized that statistical heterogeneity of the client local data yields slower global model convergence, it is less commonly recognized that it also yields a biased federated global model w…

    Submitted 27 April, 2021; originally announced April 2021.

    Comments: Accepted by IEEE CVPR Workshop on Fair, Data Efficient And Trusted Computer Vision

  18. arXiv:2104.02652

    cs.CV cs.LG

    Malignancy Prediction and Lesion Identification from Clinical Dermatological Images

    Authors: Meng Xia, Meenal K. Kheterpal, Samantha C. Wong, Christine Park, William Ratliff, Lawrence Carin, Ricardo Henao

    Abstract: We consider machine-learning-based malignancy prediction and lesion identification from clinical dermatological images, which can be indistinctly acquired via smartphone or dermoscopy capture. Additionally, we do not assume that images contain single lesions, thus the framework supports both focal and wide-field images. Specifically, we propose a two-stage approach in which we first identify all le…

    Submitted 2 April, 2021; originally announced April 2021.

  19. arXiv:2103.13558

    cs.LG cs.AI cs.CV

    Efficient Feature Transformations for Discriminative and Generative Continual Learning

    Authors: Vinay Kumar Verma, Kevin J Liang, Nikhil Mehta, Piyush Rai, Lawrence Carin

    Abstract: As neural networks are increasingly being applied to real-world applications, mechanisms to address distributional shift and sequential task learning without forgetting are critical. Methods incorporating network expansion have shown promise by naturally adding model capacity for learning new tasks while simultaneously avoiding catastrophic forgetting. However, the growth in the number of addition…

    Submitted 24 March, 2021; originally announced March 2021.

    Comments: Accepted in CVPR 2021

  20. arXiv:2103.09420

    eess.AS cs.SD

    Improving Zero-shot Voice Style Transfer via Disentangled Representation Learning

    Authors: Siyang Yuan, Pengyu Cheng, Ruiyi Zhang, Weituo Hao, Zhe Gan, Lawrence Carin

    Abstract: Voice style transfer, also called voice conversion, seeks to modify one speaker's voice to generate speech as if it came from another (target) speaker. Previous works have made progress on voice conversion with parallel training data and pre-known speakers. However, zero-shot voice style transfer, which learns from non-parallel data and generates voices for previously unseen speakers, remains a ch…

    Submitted 16 March, 2021; originally announced March 2021.

    Comments: To appear in ICLR 2021

  21. arXiv:2103.06413

    cs.CL cs.LG

    FairFil: Contrastive Neural Debiasing Method for Pretrained Text Encoders

    Authors: Pengyu Cheng, Weituo Hao, Siyang Yuan, Shijing Si, Lawrence Carin

    Abstract: Pretrained text encoders, such as BERT, have been applied increasingly in various natural language processing (NLP) tasks, and have recently demonstrated significant performance gains. However, recent studies have demonstrated the existence of social bias in these pretrained NLP models. Although prior works have made progress on word-level debiasing, improved sentence-level fairness of pretrained…

    Submitted 10 March, 2021; originally announced March 2021.

    Comments: Accepted by the 9th International Conference on Learning Representations (ICLR 2021)

  22. arXiv:2103.04032

    cs.LG cs.CV stat.ML

    CAM-GAN: Continual Adaptation Modules for Generative Adversarial Networks

    Authors: Sakshi Varshney, Vinay Kumar Verma, Srijith P K, Lawrence Carin, Piyush Rai

    Abstract: We present a continual learning approach for generative adversarial networks (GANs), by designing and leveraging parameter-efficient feature map transformations. Our approach is based on learning a set of global and task-specific parameters. The global parameters are fixed across tasks whereas the task-specific parameters act as local adapters for each task, and help in efficiently obtaining task-…

    Submitted 30 July, 2021; v1 submitted 6 March, 2021; originally announced March 2021.

    Comments: Under Submission

  23. arXiv:2102.11856

    cs.CV cs.AI cs.LG stat.ML

    Meta-Learned Attribute Self-Gating for Continual Generalized Zero-Shot Learning

    Authors: Vinay Kumar Verma, Kevin Liang, Nikhil Mehta, Lawrence Carin

    Abstract: Zero-shot learning (ZSL) has been shown to be a promising approach to generalizing a model to categories unseen during training by leveraging class attributes, but challenges still remain. Recently, methods using generative models to combat bias towards classes seen during training have pushed the state of the art of ZSL, but these generative models can be slow or computationally expensive to trai…

    Submitted 23 February, 2021; originally announced February 2021.

    Comments: Under Review

  24. FLOP: Federated Learning on Medical Datasets using Partial Networks

    Authors: Qian Yang, Jianyi Zhang, Weituo Hao, Gregory Spell, Lawrence Carin

    Abstract: The outbreak of COVID-19 Disease due to the novel coronavirus has caused a shortage of medical resources. To aid and accelerate the diagnosis process, automatic diagnosis of COVID-19 via deep learning models has recently been explored by researchers across the world. While different data-driven deep learning models have been developed to mitigate the diagnosis of COVID-19, the data itself is still…

    Submitted 22 June, 2021; v1 submitted 9 February, 2021; originally announced February 2021.

    Comments: To appear in KDD 2021

  25. arXiv:2101.06804

    cs.CL

    What Makes Good In-Context Examples for GPT-$3$?

    Authors: Jiachang Liu, Dinghan Shen, Yizhe Zhang, Bill Dolan, Lawrence Carin, Weizhu Chen

    Abstract: GPT-$3$ has attracted lots of attention due to its superior performance across a wide range of NLP tasks, especially with its powerful and versatile in-context few-shot learning ability. Despite its success, we found that the empirical results of GPT-$3$ depend heavily on the choice of in-context examples. In this work, we investigate whether there are more effective strategies for judiciously sel…

    Submitted 17 January, 2021; originally announced January 2021.

  26. arXiv:2101.00355

    cs.LG cs.AI

    Reinforcement Learning for Flexibility Design Problems

    Authors: Yehua Wei, Lei Zhang, Ruiyi Zhang, Shijing Si, Hao Zhang, Lawrence Carin

    Abstract: Flexibility design problems are a class of problems that appear in strategic decision-making across industries, where the objective is to design a (e.g., manufacturing) network that affords flexibility and adaptivity. The underlying combinatorial nature and stochastic objectives make flexibility design problems challenging for standard optimization methods. In this paper, we develop a reinforcem…

    Submitted 18 January, 2021; v1 submitted 1 January, 2021; originally announced January 2021.

  27. arXiv:2012.08674

    cs.LG cs.CV

    Wasserstein Contrastive Representation Distillation

    Authors: Liqun Chen, Dong Wang, Zhe Gan, Jingjing Liu, Ricardo Henao, Lawrence Carin

    Abstract: The primary goal of knowledge distillation (KD) is to encapsulate the information of a model learned from a teacher network into a student network, with the latter being more compact than the former. Existing work, e.g., using Kullback-Leibler divergence for distillation, may fail to capture important structural knowledge in the teacher network and often lacks the ability for feature generalizatio…

    Submitted 28 March, 2021; v1 submitted 15 December, 2020; originally announced December 2020.

    Comments: Accepted by CVPR 2021

  28. arXiv:2012.05644

    cs.LG cs.SI stat.ML

    Learning Graphons via Structured Gromov-Wasserstein Barycenters

    Authors: Hongteng Xu, Dixin Luo, Lawrence Carin, Hongyuan Zha

    Abstract: We propose a novel and principled method to learn a nonparametric graph model called graphon, which is defined in an infinite-dimensional space and represents arbitrary-size graphs. Based on the weak regularity lemma from the theory of graphons, we leverage a step function to approximate a graphon. We show that the cut distance of graphons can be relaxed to the Gromov-Wasserstein distance of their…

    Submitted 17 December, 2020; v1 submitted 10 December, 2020; originally announced December 2020.

    Journal ref: AAAI 2021

  29. arXiv:2012.03369

    cs.CV cs.AI

    Proactive Pseudo-Intervention: Causally Informed Contrastive Learning For Interpretable Vision Models

    Authors: Dong Wang, Yuewei Yang, Chenyang Tao, Zhe Gan, Liqun Chen, Fanjie Kong, Ricardo Henao, Lawrence Carin

    Abstract: Deep neural networks excel at comprehending complex visual signals, delivering on par or even superior performance to that of human experts. However, ad-hoc visual explanations of model decisions often reveal an alarming level of reliance on exploiting non-causal visual cues that strongly correlate with the target label in training data. As such, deep neural nets suffer compromised generalization…

    Submitted 29 April, 2021; v1 submitted 6 December, 2020; originally announced December 2020.

  30. arXiv:2011.12454

    cs.CV

    Supercharging Imbalanced Data Learning With Energy-based Contrastive Representation Transfer

    Authors: Zidi Xiu, Junya Chen, Ricardo Henao, Benjamin Goldstein, Lawrence Carin, Chenyang Tao

    Abstract: Dealing with severe class imbalance poses a major challenge for real-world applications, especially when the accurate classification and generalization of minority classes is of primary interest. In computer vision, learning from long tailed datasets is a recurring theme, especially for natural image datasets. While existing solutions mostly appeal to sampling or weighting adjustments to alleviate…

    Submitted 19 November, 2021; v1 submitted 24 November, 2020; originally announced November 2020.

  31. arXiv:2011.08891

    eess.IV cs.CV cs.LG

    Use HiResCAM instead of Grad-CAM for faithful explanations of convolutional neural networks

    Authors: Rachel Lea Draelos, Lawrence Carin

    Abstract: Explanation methods facilitate the development of models that learn meaningful concepts and avoid exploiting spurious correlations. We illustrate a previously unrecognized limitation of the popular neural network explanation method Grad-CAM: as a side effect of the gradient averaging step, Grad-CAM sometimes highlights locations the model did not actually use. To solve this problem, we propose HiR…

    Submitted 20 November, 2021; v1 submitted 17 November, 2020; originally announced November 2020.

    Comments: 20 pages, 8 figures, 2 tables

  32. arXiv:2011.04794

    cs.IT

    Estimating Total Correlation with Mutual Information Estimators

    Authors: Ke Bai, Pengyu Cheng, Weituo Hao, Ricardo Henao, Lawrence Carin

    Abstract: Total correlation (TC) is a fundamental concept in information theory that measures statistical dependency among multiple random variables. Recently, TC has shown noticeable effectiveness as a regularizer in many learning tasks, where the correlation among multiple latent embeddings must be jointly minimized or maximized. However, calculating precise TC values is challenging, especially whe…

    Submitted 22 February, 2023; v1 submitted 9 November, 2020; originally announced November 2020.

    Comments: Accepted by AISTATS 2023

  33. arXiv:2011.00593

    cs.CL stat.ML

    MixKD: Towards Efficient Distillation of Large-scale Language Models

    Authors: Kevin J Liang, Weituo Hao, Dinghan Shen, Yufan Zhou, Weizhu Chen, Changyou Chen, Lawrence Carin

    Abstract: Large-scale language models have recently demonstrated impressive empirical performance. Nevertheless, the improved results are attained at the price of bigger models, more power consumption, and slower inference, which hinder their applicability to low-resource (both memory and computation) platforms. Knowledge distillation (KD) has been demonstrated as an effective framework for compressing such…

    Submitted 17 March, 2021; v1 submitted 1 November, 2020; originally announced November 2020.

    Comments: ICLR 2021 Camera Ready

  34. arXiv:2010.12618

    stat.ML cs.LG

    Counterfactual Representation Learning with Balancing Weights

    Authors: Serge Assaad, Shuxi Zeng, Chenyang Tao, Shounak Datta, Nikhil Mehta, Ricardo Henao, Fan Li, Lawrence Carin

    Abstract: A key to causal inference with observational data is achieving balance in predictive features associated with each treatment type. Recent literature has explored representation learning to achieve this goal. In this work, we discuss the pitfalls of these strategies - such as a steep trade-off between achieving balance and predictive power - and present a remedy via the integration of balancing wei…

    Submitted 23 February, 2021; v1 submitted 23 October, 2020; originally announced October 2020.

    Comments: Accepted to International Conference on Artificial Intelligence and Statistics (AISTATS 2021)

  35. arXiv:2010.07866

    stat.ML cs.LG

    Double Robust Representation Learning for Counterfactual Prediction

    Authors: Shuxi Zeng, Serge Assaad, Chenyang Tao, Shounak Datta, Lawrence Carin, Fan Li

    Abstract: Causal inference, or counterfactual prediction, is central to decision making in healthcare, policy and social sciences. To de-bias causal estimators with high-dimensional data in observational studies, recent advances suggest the importance of combining machine learning models for both the propensity score and the outcome function. We propose a novel scalable method to learn double-robust represe…

    Submitted 16 October, 2020; v1 submitted 15 October, 2020; originally announced October 2020.

    Comments: 18 pages, 5 figures, 2 Tables

  36. arXiv:2010.07488

    cs.LG cs.CV eess.IV

    RetiNerveNet: Using Recursive Deep Learning to Estimate Pointwise 24-2 Visual Field Data based on Retinal Structure

    Authors: Shounak Datta, Eduardo B. Mariottoni, David Dov, Alessandro A. Jammal, Lawrence Carin, Felipe A. Medeiros

    Abstract: Glaucoma is the leading cause of irreversible blindness in the world, affecting over 70 million people. The cumbersome Standard Automated Perimetry (SAP) test is most frequently used to detect visual loss due to glaucoma. Due to the SAP test's innate difficulty and its high test-retest variability, we propose the RetiNerveNet, a deep convolutional recursive neural network for obtaining estimates o…

    Submitted 19 June, 2021; v1 submitted 14 October, 2020; originally announced October 2020.

  37. arXiv:2010.05994

    cs.CL cs.LG

    Improving Text Generation with Student-Forcing Optimal Transport

    Authors: Guoyin Wang, Chunyuan Li, Jianqiao Li, Hao Fu, Yuh-Chen Lin, Liqun Chen, Yizhe Zhang, Chenyang Tao, Ruiyi Zhang, Wenlin Wang, Dinghan Shen, Qian Yang, Lawrence Carin

    Abstract: Neural language models are often trained with maximum likelihood estimation (MLE), where the next word is generated conditioned on the ground-truth word tokens. During testing, however, the model is instead conditioned on previously generated tokens, resulting in what is termed exposure bias. To reduce this gap between training and testing, we propose using optimal transport (OT) to match the sequ…

    Submitted 12 October, 2020; originally announced October 2020.

    Comments: To appear at EMNLP 2020

  38. Background Adaptive Faster R-CNN for Semi-Supervised Convolutional Object Detection of Threats in X-Ray Images

    Authors: John B. Sigman, Gregory P. Spell, Kevin J Liang, Lawrence Carin

    Abstract: Recently, progress has been made in the supervised training of Convolutional Object Detectors (e.g. Faster R-CNN) for threat recognition in carry-on luggage using X-ray images. This is part of the Transportation Security Administration's (TSA's) mission to protect air travelers in the United States. While more training data with threats may reliably improve performance for this class of deep algor…

    Submitted 2 October, 2020; originally announced October 2020.

    Journal ref: Proc. SPIE 11404, Anomaly Detection and Imaging with X-Rays (ADIX) V, 1140404 (26 May 2020)

  39. arXiv:2008.06597

    cs.CV

    Weakly supervised cross-domain alignment with optimal transport

    Authors: Siyang Yuan, Ke Bai, Liqun Chen, Yizhe Zhang, Chenyang Tao, Chunyuan Li, Guoyin Wang, Ricardo Henao, Lawrence Carin

    Abstract: Cross-domain alignment between image objects and text sequences is key to many visual-language tasks, and it poses a fundamental challenge to both computer vision and natural language processing. This paper investigates a novel approach for the identification and optimization of fine-grained semantic similarities between image and text entities, under a weakly-supervised setup, improving performan…

    Submitted 14 August, 2020; originally announced August 2020.

    Comments: Accepted to BMVC 2020 (Oral)

  40. arXiv:2008.05687

    cs.LG stat.ML

    WAFFLe: Weight Anonymized Factorization for Federated Learning

    Authors: Weituo Hao, Nikhil Mehta, Kevin J Liang, Pengyu Cheng, Mostafa El-Khamy, Lawrence Carin

    Abstract: In domains where data are sensitive or private, there is great value in methods that can learn in a distributed manner without the data ever leaving the local devices. In light of this need, federated learning has emerged as a popular training paradigm. However, many federated learning approaches trade transmitting data for communicating updated weight parameters for each local device. Therefore,…

    Submitted 13 August, 2020; originally announced August 2020.

  41. arXiv:2007.06178

    cs.LG cs.CV stat.ML

    Bridging Maximum Likelihood and Adversarial Learning via $α$-Divergence

    Authors: Miaoyun Zhao, Yulai Cong, Shuyang Dai, Lawrence Carin

    Abstract: Maximum likelihood (ML) and adversarial learning are two popular approaches for training generative models, and from many perspectives these techniques are complementary. ML learning encourages the capture of all data modes, and it is typically characterized by stable training. However, ML learning tends to distribute probability mass diffusely over the data space, e.g., yielding blurry syntheti…

    Submitted 13 July, 2020; originally announced July 2020.

    Comments: AAAI 2020

  42. arXiv:2006.14744

    cs.CL cs.CV cs.LG

    Graph Optimal Transport for Cross-Domain Alignment

    Authors: Liqun Chen, Zhe Gan, Yu Cheng, Linjie Li, Lawrence Carin, Jingjing Liu

    Abstract: Cross-domain alignment between two sets of entities (e.g., objects in an image, words in a sentence) is fundamental to both computer vision and natural language processing. Existing methods mainly focus on designing advanced attention mechanisms to simulate soft alignment, with no training signals to explicitly encourage alignment. The learned attention matrices are also dense and lack interpreta…

    Submitted 24 July, 2020; v1 submitted 25 June, 2020; originally announced June 2020.

    Journal ref: ICML 2020

  43. arXiv:2006.12013  [pdf, other]

    cs.LG stat.ML

    CLUB: A Contrastive Log-ratio Upper Bound of Mutual Information

    Authors: Pengyu Cheng, Weituo Hao, Shuyang Dai, Jiachang Liu, Zhe Gan, Lawrence Carin

    Abstract: Mutual information (MI) minimization has gained considerable interest in various machine learning tasks. However, estimating and minimizing MI in high-dimensional spaces remains a challenging problem, especially when only samples, rather than distribution forms, are accessible. Previous works mainly focus on MI lower bound approximation, which is not applicable to MI minimization problems. In thi…

    Submitted 23 July, 2020; v1 submitted 22 June, 2020; originally announced June 2020.

    Comments: Accepted by the 37th International Conference on Machine Learning (ICML 2020)

  44. arXiv:2006.11991  [pdf, other]

    cs.CL cs.LG

    Students Need More Attention: BERT-based Attention Model for Small Data with Application to Automatic Patient Message Triage

    Authors: Shijing Si, Rui Wang, Jedrek Wosik, Hao Zhang, David Dov, Guoyin Wang, Ricardo Henao, Lawrence Carin

    Abstract: Small and imbalanced datasets commonly seen in healthcare represent a challenge when training classifiers based on deep learning models. So motivated, we propose a novel framework based on BioBERT (Bidirectional Encoder Representations from Transformers for Biomedical Text Mining). Specifically, (i) we introduce Label Embeddings for Self-Attention in each layer of BERT, which we call LESA-BERT, and…

    Submitted 21 June, 2020; originally announced June 2020.

    Comments: 20 pages, Machine Learning for Healthcare 2020 (To appear)

  45. arXiv:2006.08873  [pdf, other]

    stat.ML cs.LG

    GO Hessian for Expectation-Based Objectives

    Authors: Yulai Cong, Miaoyun Zhao, Jianqiao Li, Junya Chen, Lawrence Carin

    Abstract: An unbiased low-variance gradient estimator, termed GO gradient, was proposed recently for expectation-based objectives $\mathbb{E}_{q_{\boldsymbol{\gamma}}(\boldsymbol{y})} [f(\boldsymbol{y})]$, where the random variable (RV) $\boldsymbol{y}$ may be drawn from a stochastic computation graph with continuous (non-reparameterizable) internal nodes and continuous/discrete leaves. Upgrading the GO gradient,…

    Submitted 15 June, 2020; originally announced June 2020.

  46. Enabling Counterfactual Survival Analysis with Balanced Representations

    Authors: Paidamoyo Chapfuwa, Serge Assaad, Shuxi Zeng, Michael J. Pencina, Lawrence Carin, Ricardo Henao

    Abstract: Balanced representation learning methods have been applied successfully to counterfactual inference from observational data. However, approaches that account for survival outcomes are relatively limited. Survival data are frequently encountered across diverse medical applications, e.g., drug development, risk profiling, and clinical trials, and such data are also relevant in fields like manufactur…

    Submitted 3 March, 2021; v1 submitted 13 June, 2020; originally announced June 2020.

    Comments: Accepted at ACM Conference on Health, Inference, and Learning (ACM CHIL 2021). Code at https://github.com/paidamoyo/counterfactual_survival_analysis

  47. arXiv:2006.07543  [pdf, other]

    cs.CV cs.LG

    GAN Memory with No Forgetting

    Authors: Yulai Cong, Miaoyun Zhao, Jianqiao Li, Sijia Wang, Lawrence Carin

    Abstract: As a fundamental issue in lifelong learning, catastrophic forgetting is directly caused by inaccessible historical data; accordingly, if the data (information) were memorized perfectly, no forgetting should be expected. Motivated by that, we propose a GAN memory for lifelong learning, which is capable of remembering a stream of datasets via generative processes, with \emph{no} forgetting. Our GAN…

    Submitted 12 November, 2020; v1 submitted 12 June, 2020; originally announced June 2020.

    Comments: NeurIPS 2020

  48. arXiv:2006.07487  [pdf, other]

    stat.ML cs.LG

    Scalable Control Variates for Monte Carlo Methods via Stochastic Optimization

    Authors: Shijing Si, Chris J. Oates, Andrew B. Duncan, Lawrence Carin, François-Xavier Briol

    Abstract: Control variates are a well-established tool to reduce the variance of Monte Carlo estimators. However, for large-scale problems including high-dimensional and large-sample settings, their advantages can be outweighed by a substantial computational cost. This paper considers control variates based on Stein operators, presenting a framework that encompasses and generalizes existing approaches that…

    Submitted 21 July, 2021; v1 submitted 12 June, 2020; originally announced June 2020.

    Comments: Accepted by MCQMC 2020

    MSC Class: G.3

  49. arXiv:2006.03160  [pdf, other]

    cs.LG stat.ML

    Hierarchical Optimal Transport for Robust Multi-View Learning

    Authors: Dixin Luo, Hongteng Xu, Lawrence Carin

    Abstract: Traditional multi-view learning methods often rely on two assumptions: ($i$) the samples in different views are well-aligned, and ($ii$) their representations in latent space obey the same distribution. Unfortunately, these two assumptions may be questionable in practice, which limits the application of multi-view learning. In this work, we propose a hierarchical optimal transport (HOT) method to…

    Submitted 8 June, 2020; v1 submitted 4 June, 2020; originally announced June 2020.

  50. arXiv:2006.03089  [pdf, other]

    cs.LG stat.ML

    Towards Understanding Fast Adversarial Training

    Authors: Bai Li, Shiqi Wang, Suman Jana, Lawrence Carin

    Abstract: Current neural-network-based classifiers are susceptible to adversarial examples. The most empirically successful approach to defending against such adversarial examples is adversarial training, which incorporates a strong self-attack during training to enhance its robustness. This approach, however, is computationally expensive and hence is hard to scale up. A recent work, called fast adversarial…

    Submitted 4 June, 2020; originally announced June 2020.