
Showing 1–9 of 9 results for author: Modarressi, A

Searching in archive cs.
  1. arXiv:2404.11672

    cs.CL

    MemLLM: Finetuning LLMs to Use An Explicit Read-Write Memory

    Authors: Ali Modarressi, Abdullatif Köksal, Ayyoob Imani, Mohsen Fayyaz, Hinrich Schütze

    Abstract: While current large language models (LLMs) demonstrate some capabilities in knowledge-intensive tasks, they are limited by relying on their parameters as an implicit storage mechanism. As a result, they struggle with infrequent knowledge and temporal degradation. In addition, the uninterpretable nature of parametric memorization makes it challenging to understand and prevent hallucination. Paramet…

    Submitted 17 April, 2024; originally announced April 2024.

  2. arXiv:2306.02873

    cs.CL

    DecompX: Explaining Transformers Decisions by Propagating Token Decomposition

    Authors: Ali Modarressi, Mohsen Fayyaz, Ehsan Aghazadeh, Yadollah Yaghoobzadeh, Mohammad Taher Pilehvar

    Abstract: An emerging solution for explaining Transformer-based models is to use vector-based analysis on how the representations are formed. However, providing a faithful vector-based explanation for a multi-layer model could be challenging in three aspects: (1) Incorporating all components into the analysis, (2) Aggregating the layer dynamics to determine the information flow and mixture throughout the en…

    Submitted 5 June, 2023; originally announced June 2023.

    Comments: Accepted to ACL 2023 (main conference)

  3. arXiv:2305.14322

    cs.CL

    RET-LLM: Towards a General Read-Write Memory for Large Language Models

    Authors: Ali Modarressi, Ayyoob Imani, Mohsen Fayyaz, Hinrich Schütze

    Abstract: Large language models (LLMs) have significantly advanced the field of natural language processing (NLP) through their extensive parameters and comprehensive data utilization. However, existing LLMs lack a dedicated memory unit, limiting their ability to explicitly store and retrieve knowledge for various tasks. In this paper, we propose RET-LLM, a novel framework that equips LLMs with a general wri…

    Submitted 23 May, 2023; originally announced May 2023.

  4. arXiv:2302.02852

    cs.CL

    Guide the Learner: Controlling Product of Experts Debiasing Method Based on Token Attribution Similarities

    Authors: Ali Modarressi, Hossein Amirkhani, Mohammad Taher Pilehvar

    Abstract: Several proposals have been put forward in recent years for improving out-of-distribution (OOD) performance through mitigating dataset biases. A popular workaround is to train a robust model by re-weighting training examples based on a secondary biased model. Here, the underlying assumption is that the biased model resorts to shortcut features. Hence, those training examples that are correctly pre…

    Submitted 6 February, 2023; originally announced February 2023.

    Comments: Accepted to EACL 2023 (main conference)

  5. arXiv:2211.05610

    cs.CL

    BERT on a Data Diet: Finding Important Examples by Gradient-Based Pruning

    Authors: Mohsen Fayyaz, Ehsan Aghazadeh, Ali Modarressi, Mohammad Taher Pilehvar, Yadollah Yaghoobzadeh, Samira Ebrahimi Kahou

    Abstract: Current pre-trained language models rely on large datasets for achieving state-of-the-art performance. However, past research has shown that not all examples in a dataset are equally important during training. In fact, it is sometimes possible to prune a considerable fraction of the training set while maintaining the test performance. Established on standard vision benchmarks, two gradient-based s…

    Submitted 28 November, 2022; v1 submitted 10 November, 2022; originally announced November 2022.

    Comments: ENLSP @ NeurIPS 2022

  6. arXiv:2205.03286

    cs.CL

    GlobEnc: Quantifying Global Token Attribution by Incorporating the Whole Encoder Layer in Transformers

    Authors: Ali Modarressi, Mohsen Fayyaz, Yadollah Yaghoobzadeh, Mohammad Taher Pilehvar

    Abstract: There has been a growing interest in interpreting the underlying dynamics of Transformers. While self-attention patterns were initially deemed as the primary option, recent studies have shown that integrating other components can yield more accurate explanations. This paper introduces a novel token attribution analysis method that incorporates all the components in the encoder block and aggregates…

    Submitted 6 May, 2022; originally announced May 2022.

    Comments: Accepted to NAACL 2022 (main conference)

  7. arXiv:2203.08991

    cs.CL

    AdapLeR: Speeding up Inference by Adaptive Length Reduction

    Authors: Ali Modarressi, Hosein Mohebbi, Mohammad Taher Pilehvar

    Abstract: Pre-trained language models have shown stellar performance in various downstream tasks. But, this usually comes at the cost of high latency and computation, hindering their usage in resource-limited settings. In this work, we propose a novel approach for reducing the computational cost of BERT with minimal loss in downstream performance. Our method dynamically eliminates less contributing tokens t…

    Submitted 16 March, 2022; originally announced March 2022.

    Comments: Accepted to ACL 2022 (main conference)

  8. arXiv:2109.05958

    cs.CL cs.AI

    Not All Models Localize Linguistic Knowledge in the Same Place: A Layer-wise Probing on BERToids' Representations

    Authors: Mohsen Fayyaz, Ehsan Aghazadeh, Ali Modarressi, Hosein Mohebbi, Mohammad Taher Pilehvar

    Abstract: Most of the recent works on probing representations have focused on BERT, with the presumption that the findings might be similar to the other models. In this work, we extend the probing studies to two other models in the family, namely ELECTRA and XLNet, showing that variations in the pre-training objectives or architectural choices can result in different behaviors in encoding linguistic informa…

    Submitted 15 September, 2021; v1 submitted 13 September, 2021; originally announced September 2021.

    Comments: Accepted to BlackboxNLP Workshop at EMNLP 2021

  9. arXiv:2104.01477

    cs.CL

    Exploring the Role of BERT Token Representations to Explain Sentence Probing Results

    Authors: Hosein Mohebbi, Ali Modarressi, Mohammad Taher Pilehvar

    Abstract: Several studies have been carried out on revealing linguistic features captured by BERT. This is usually achieved by training a diagnostic classifier on the representations obtained from different layers of BERT. The subsequent classification accuracy is then interpreted as the ability of the model in encoding the corresponding linguistic property. Despite providing insights, these studies have le…

    Submitted 11 September, 2021; v1 submitted 3 April, 2021; originally announced April 2021.

    Comments: Accepted to EMNLP 2021 (main conference)