Showing 1–2 of 2 results for author: Behjati, M

Search v0.5.6 released 2020-02-24

arXiv:2310.17284 [pdf, other]

cs.CL

Learning to Abstract with Nonparametric Variational Information Bottleneck

Authors: Melika Behjati, Fabio Fehr, James Henderson

Abstract: Learned representations at the level of characters, sub-words, words and sentences, have each contributed to advances in understanding different NLP tasks and linguistic phenomena. However, learning textual embeddings is costly as they are tokenization specific and require different models to be trained for each level of abstraction. We introduce a novel language representation model which can learn to compress to different levels of abstraction at different layers of the same model. We apply Nonparametric Variational Information Bottleneck (NVIB) to stacked Transformer self-attention layers in the encoder, which encourages an information-theoretic compression of the representations through the model. We find that the layers within the model correspond to increasing levels of abstraction and that their representations are more linguistically informed. Finally, we show that NVIB compression results in a model which is more robust to adversarial perturbations.

Submitted 26 October, 2023; originally announced October 2023.

Comments: Accepted to Findings of EMNLP 2023

arXiv:2102.01223 [pdf, other]

cs.CL cs.LG

Inducing Meaningful Units from Character Sequences with Dynamic Capacity Slot Attention

Authors: Melika Behjati, James Henderson

Abstract: Characters do not convey meaning, but sequences of characters do. We propose an unsupervised distributional method to learn the abstract meaningful units in a sequence of characters. Rather than segmenting the sequence, our Dynamic Capacity Slot Attention model discovers continuous representations of the objects in the sequence, extending an architecture for object discovery in images. We train our model on different languages and evaluate the quality of the obtained representations with forward and reverse probing classifiers. These experiments show that our model succeeds in discovering units which are similar to those proposed previously in form, content and level of abstraction, and which show promise for capturing meaningful information at a higher level of abstraction.

Submitted 16 January, 2024; v1 submitted 1 February, 2021; originally announced February 2021.

Comments: Accepted to TMLR 2023
