-
SASSL: Enhancing Self-Supervised Learning via Neural Style Transfer
Authors:
Renan A. Rojas-Gomez,
Karan Singhal,
Ali Etemad,
Alex Bijamov,
Warren R. Morningstar,
Philip Andrew Mansfield
Abstract:
Existing data augmentation in self-supervised learning, while diverse, fails to preserve the inherent structure of natural images. This results in distorted augmented samples with compromised semantic information, ultimately impacting downstream performance. To overcome this, we propose SASSL: Style Augmentations for Self Supervised Learning, a novel augmentation technique based on Neural Style Transfer. SASSL decouples semantic and stylistic attributes in images and applies transformations exclusively to the style while preserving content, generating diverse samples that better retain semantics. Our technique boosts top-1 classification accuracy on ImageNet by up to 2% compared to established self-supervised methods like MoCo, SimCLR, and BYOL, while achieving superior transfer learning performance across various datasets.
Submitted 3 February, 2024; v1 submitted 2 December, 2023;
originally announced December 2023.
-
Plex: Towards Reliability using Pretrained Large Model Extensions
Authors:
Dustin Tran,
Jeremiah Liu,
Michael W. Dusenberry,
Du Phan,
Mark Collier,
Jie Ren,
Kehang Han,
Zi Wang,
Zelda Mariet,
Huiyi Hu,
Neil Band,
Tim G. J. Rudner,
Karan Singhal,
Zachary Nado,
Joost van Amersfoort,
Andreas Kirsch,
Rodolphe Jenatton,
Nithum Thain,
Honglin Yuan,
Kelly Buchanan,
Kevin Murphy,
D. Sculley,
Yarin Gal,
Zoubin Ghahramani,
Jasper Snoek
, et al. (1 additional author not shown)
Abstract:
A recent trend in artificial intelligence is the use of pretrained models for language and vision tasks, which have achieved extraordinary performance but also puzzling failures. Probing these models' abilities in diverse ways is therefore critical to the field. In this paper, we explore the reliability of models, where we define a reliable model as one that not only achieves strong predictive performance but also performs well consistently over many decision-making tasks involving uncertainty (e.g., selective prediction, open set recognition), robust generalization (e.g., accuracy and proper scoring rules such as log-likelihood on in- and out-of-distribution datasets), and adaptation (e.g., active learning, few-shot uncertainty). We devise 10 types of tasks over 40 datasets in order to evaluate different aspects of reliability on both vision and language domains. To improve reliability, we developed ViT-Plex and T5-Plex, pretrained large model extensions for vision and language modalities, respectively. Plex greatly improves the state-of-the-art across reliability tasks, and simplifies the traditional protocol as it improves the out-of-the-box performance and does not require designing scores or tuning the model for each task. We demonstrate scaling effects over model sizes up to 1B parameters and pretraining dataset sizes up to 4B examples. We also demonstrate Plex's capabilities on challenging tasks including zero-shot open set recognition, active learning, and uncertainty in conversational language understanding.
Submitted 15 July, 2022;
originally announced July 2022.
-
What Do We Mean by Generalization in Federated Learning?
Authors:
Honglin Yuan,
Warren Morningstar,
Lin Ning,
Karan Singhal
Abstract:
Federated learning data is drawn from a distribution of distributions: clients are drawn from a meta-distribution, and their data are drawn from local data distributions. Thus generalization studies in federated learning should separate performance gaps from unseen client data (out-of-sample gap) from performance gaps from unseen client distributions (participation gap). In this work, we propose a framework for disentangling these performance gaps. Using this framework, we observe and explain differences in behavior across natural and synthetic federated datasets, indicating that dataset synthesis strategy can be important for realistic simulations of generalization in federated learning. We propose a semantic synthesis strategy that enables realistic simulation without naturally-partitioned data. Informed by our findings, we call out community suggestions for future federated learning works.
Submitted 16 March, 2022; v1 submitted 27 October, 2021;
originally announced October 2021.
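The two gaps named in the abstract above can be illustrated with a minimal sketch. It assumes the gaps are computed as differences of mean per-client losses over three splits (training data of participating clients, held-out data of participating clients, and data of clients that never participated). The function name, argument names, and the choice of a plain mean are hypothetical and are not taken from the paper.

```python
from statistics import mean


def generalization_gaps(part_train_loss, part_heldout_loss, unpart_loss):
    """Split a federated generalization gap into two parts (illustrative only).

    Each argument is a list of per-client average losses (an assumption):
      part_train_loss   -- participating clients, training data
      part_heldout_loss -- participating clients, unseen (held-out) data
      unpart_loss       -- clients that did not participate in training
    """
    train = mean(part_train_loss)
    heldout = mean(part_heldout_loss)
    unpart = mean(unpart_loss)
    return {
        # Gap from unseen data of clients that were seen during training.
        "out_of_sample_gap": heldout - train,
        # Gap from unseen client distributions.
        "participation_gap": unpart - heldout,
    }


# Made-up per-client losses, for illustration only.
gaps = generalization_gaps([0.25, 0.5], [0.5, 0.75], [1.0, 1.0])
print(gaps)
```

Reporting the two differences separately, rather than one train-versus-test number, is what keeps the effect of unseen data apart from the effect of unseen clients.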
-
Learning Network of Multivariate Hawkes Processes: A Time Series Approach
Authors:
Jalal Etesami,
Negar Kiyavash,
Kun Zhang,
Kushagra Singhal
Abstract:
Learning the influence structure of multiple time series data is of great interest to many disciplines. This paper studies the problem of recovering the causal structure in a network of multivariate linear Hawkes processes. In such processes, the occurrence of an event in one process affects the probability of occurrence of new events in some other processes. Thus, a natural notion of causality exists between such processes, captured by the support of the excitation matrix. We show that the resulting causal influence network is equivalent to the Directed Information graph (DIG) of the processes, which encodes the causal factorization of the joint distribution of the processes. Furthermore, we present an algorithm for learning the support of the excitation matrix (or equivalently the DIG). The performance of the algorithm is evaluated on synthesized multivariate Hawkes networks as well as on stock market and MemeTracker real-world datasets.
Submitted 14 March, 2016;
originally announced March 2016.
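As a small illustration of "the support of the excitation matrix" in the abstract above, the sketch below reads a directed influence graph off the nonzero entries of an already-known excitation matrix. It is not the paper's learning algorithm, which estimates the support from event data. The function name, the tolerance parameter, and the index convention (entry [i][j] nonzero means process j excites process i) are assumptions made for illustration.

```python
def causal_edges(excitation, tol=0.0):
    """Return directed edges (source, target) from the support of a matrix.

    Assumed convention: excitation[i][j] != 0 means events in process j
    raise the event rate of process i, giving the edge j -> i.
    """
    edges = []
    for i, row in enumerate(excitation):
        for j, weight in enumerate(row):
            if abs(weight) > tol:
                edges.append((j, i))
    return edges


# Made-up 2x2 excitation matrix: process 0 excites itself and process 1.
A = [[0.5, 0.0],
     [0.3, 0.0]]
edges = causal_edges(A)
print(edges)
```

With estimated rather than exact matrices, a positive `tol` would be needed to decide which small entries count as zero; choosing it is outside the scope of this sketch.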