Showing 1–2 of 2 results for author: Ioannou, Y A

Search v0.5.6 released 2020-02-24

arXiv:2111.12170 [pdf, other]

cs.LG cs.AI cs.CV

Domain-Agnostic Clustering with Self-Distillation

Authors: Mohammed Adnan, Yani A. Ioannou, Chuan-Yung Tsai, Graham W. Taylor

Abstract: Recent advancements in self-supervised learning have reduced the gap between supervised and unsupervised representation learning. However, most self-supervised and deep clustering techniques rely heavily on data augmentation, rendering them ineffective for many learning tasks where insufficient domain knowledge exists for performing augmentation. We propose a new self-distillation based algorithm for domain-agnostic clustering. Our method builds upon the existing deep clustering frameworks and requires no separate student model. The proposed method outperforms existing domain agnostic (augmentation-free) algorithms on CIFAR-10. We empirically demonstrate that knowledge distillation can improve unsupervised representation learning by extracting richer 'dark knowledge' from the model than using predicted labels alone. Preliminary experiments also suggest that self-distillation improves the convergence of DeepCluster-v2.

Submitted 20 December, 2021; v1 submitted 23 November, 2021; originally announced November 2021.

Comments: NeurIPS 2021 Workshop: Self-Supervised Learning - Theory and Practice

arXiv:2010.03533 [pdf, other]

cs.LG cs.CV

Gradient Flow in Sparse Neural Networks and How Lottery Tickets Win

Authors: Utku Evci, Yani A. Ioannou, Cem Keskin, Yann Dauphin

Abstract: Sparse Neural Networks (NNs) can match the generalization of dense NNs using a fraction of the compute/storage for inference, and also have the potential to enable efficient training. However, naively training unstructured sparse NNs from random initialization results in significantly worse generalization, with the notable exceptions of Lottery Tickets (LTs) and Dynamic Sparse Training (DST). Through our analysis of gradient flow during training we attempt to answer: (1) why training unstructured sparse networks from random initialization performs poorly; and (2) what makes LTs and DST the exceptions? We show that sparse NNs have poor gradient flow at initialization and demonstrate the importance of using sparsity-aware initialization. Furthermore, we find that DST methods significantly improve gradient flow during training over traditional sparse training methods. Finally, we show that LTs do not improve gradient flow; rather, their success lies in re-learning the pruning solution they are derived from. However, this comes at the cost of learning novel solutions.

Submitted 15 March, 2022; v1 submitted 7 October, 2020; originally announced October 2020.

Comments: Published in AAAI 2022. Code can be found at https://github.com/google-research/rigl/tree/master/rigl/rigl_tf2

MSC Class: 68T07
