Search | arXiv e-print repository

arXiv:2305.19207

Group Invariant Global Pooling

Authors: Kamil Bujel, Yonatan Gideoni, Chaitanya K. Joshi, Pietro Liò

Abstract: Much work has been devoted to devising architectures that build group-equivariant representations, while invariance is often induced using simple global pooling mechanisms. Little work has been done on creating expressive layers that are invariant to given symmetries, despite the success of permutation invariant pooling in various molecular tasks. In this work, we present Group Invariant Global Pooling (GIGP), an invariant pooling layer that is provably sufficiently expressive to represent a large class of invariant functions. We validate GIGP on rotated MNIST and QM9, showing improvements for the latter while attaining identical results for the former. By making the pooling process group orbit-aware, this invariant aggregation method leads to improved performance, while performing well-principled group aggregation.

Submitted 30 May, 2023; originally announced May 2023.

arXiv:2303.07991

Finding the Needle in a Haystack: Unsupervised Rationale Extraction from Long Text Classifiers

Authors: Kamil Bujel, Andrew Caines, Helen Yannakoudakis, Marek Rei

Abstract: Long-sequence transformers are designed to improve the representation of longer texts by language models and their performance on downstream document-level tasks. However, not much is understood about the quality of token-level predictions in long-form models. We investigate the performance of such architectures in the context of document classification with unsupervised rationale extraction. We find standard soft attention methods to perform significantly worse when combined with the Longformer language model. We propose a compositional soft attention architecture that applies RoBERTa sentence-wise to extract plausible rationales at the token-level. We find this method to significantly outperform Longformer-driven baselines on sentiment classification datasets, while also exhibiting significantly lower runtimes.

Submitted 14 March, 2023; originally announced March 2023.

arXiv:2103.14465

Zero-shot Sequence Labeling for Transformer-based Sentence Classifiers

Authors: Kamil Bujel, Helen Yannakoudakis, Marek Rei

Abstract: We investigate how sentence-level transformers can be modified into effective sequence labelers at the token level without any direct supervision. Existing approaches to zero-shot sequence labeling do not perform well when applied on transformer-based architectures. As transformers contain multiple layers of multi-head self-attention, information in the sentence gets distributed between many tokens, negatively affecting zero-shot token-level performance. We find that a soft attention module which explicitly encourages sharpness of attention weights can significantly outperform existing methods.

Submitted 8 June, 2021; v1 submitted 26 March, 2021; originally announced March 2021.

arXiv:1812.02300

Solving High Volume Capacitated Vehicle Routing Problem with Time Windows using Recursive-DBSCAN clustering algorithm

Authors: Kamil Bujel, Feiko Lai, Michal Szczecinski, Winnie So, Miguel Fernandez

Abstract: This paper introduces a new approach to improve the performance of the Capacitated Vehicle Routing Problem with Time Windows (CVRPTW) solvers for a high number of nodes. It proposes to cluster nodes together using Recursive-DBSCAN - an algorithm that recursively applies DBSCAN until clusters below the preset maximum number of nodes are obtained. That approach leads to 61% decrease in runtimes of the CVRPTW solver as benchmarked against Google Optimization Tools, while the difference of total distance and number of vehicles used by found solutions is below 7%. The improvement of runtimes with the Recursive-DBSCAN method is because of splitting the node-set into constituent clusters, which limits the number of solutions checked by the solver, consequently reducing the runtime. The proposed method consumes less memory and is able to find solutions for problems up to 5000 nodes, while the baseline Google Optimisation Tools solves problems up to 2000 nodes.

Submitted 23 March, 2019; v1 submitted 5 December, 2018; originally announced December 2018.

Comments: Draft

Showing 1–4 of 4 results for author: Bujel, K