Search | arXiv e-print repository

doi 10.1007/s40314-024-02836-x

Properties of core-EP matrices and binary relationships

Authors: Ehsan Kheirandish, Abbas Salemi, Néstor Thome

Abstract: In this paper, various properties of core-EP matrices are investigated. We introduce the MPDMP matrix associated with a square matrix $A$ and, by means of it, obtain some properties and equivalent conditions for core-EP matrices. Properties of the MPD, DMP, and CMP inverses are also studied, and we prove that in the class of core-EP matrices the DMP, MPD, and Drazin inverses coincide. Moreover, DMP and MPD binary relation orders are introduced, and the relationship between these orders and other binary relation orders is considered.

Submitted 6 July, 2024; v1 submitted 2 July, 2024; originally announced July 2024.

Comments: 20 pages

MSC Class: 15A09; 15A45
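The abstract names the DMP, MPD, and Drazin inverses without defining them. The sketch below assumes the definitions commonly used in the literature, DMP $A^{D,\dagger} = A^D A A^\dagger$ and MPD $A^{\dagger,D} = A^\dagger A A^D$, and computes the Drazin inverse through the identity $A^D = A^k (A^{2k+1})^\dagger A^k$, valid for any $k \ge \operatorname{ind}(A)$. The function names are hypothetical, this is not the authors' code, and it does not test the core-EP property itself; the idempotent example only shows that the three inverses can differ in general.

```python
import numpy as np
from numpy.linalg import matrix_power, pinv


def drazin(A: np.ndarray) -> np.ndarray:
    """Drazin inverse via A^D = A^k (A^(2k+1))^+ A^k, valid for any k >= ind(A).

    k = n (the matrix order) always satisfies k >= ind(A); large powers can
    hurt accuracy for big or badly conditioned matrices.
    """
    k = A.shape[0]
    Ak = matrix_power(A, k)
    return Ak @ pinv(matrix_power(A, 2 * k + 1)) @ Ak


def dmp(A: np.ndarray) -> np.ndarray:
    """DMP inverse: A^D A A^+."""
    return drazin(A) @ A @ pinv(A)


def mpd(A: np.ndarray) -> np.ndarray:
    """MPD inverse: A^+ A A^D."""
    return pinv(A) @ A @ drazin(A)


# Idempotent example: the Drazin, DMP and MPD inverses all differ here.
A = np.array([[1.0, 1.0], [0.0, 0.0]])
print(np.round(drazin(A), 6))  # approx. [[1, 1], [0, 0]]
print(np.round(dmp(A), 6))     # approx. [[1, 0], [0, 0]]
print(np.round(mpd(A), 6))     # approx. [[0.5, 0.5], [0.5, 0.5]]
```

For a symmetric (hence EP) matrix such as $\operatorname{diag}(2, 0)$, all three functions return the Moore-Penrose inverse.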

arXiv:2405.00175

Towards a Search Engine for Machines: Unified Ranking for Multiple Retrieval-Augmented Large Language Models

Authors: Alireza Salemi, Hamed Zamani

Abstract: This paper introduces uRAG, a framework with a unified retrieval engine that serves multiple downstream retrieval-augmented generation (RAG) systems. Each RAG system consumes the retrieval results for a unique purpose, such as open-domain question answering, fact verification, entity linking, or relation extraction. We introduce a generic training guideline that standardizes the communication between the search engine and the downstream RAG systems that engage in optimizing the retrieval model. This lays the groundwork for building a large-scale experimentation ecosystem consisting of 18 RAG systems that engage in training and 18 unknown RAG systems that use uRAG as new users of the search engine. Using this experimentation ecosystem, we answer a number of fundamental research questions that improve our understanding of the promises and challenges in developing search engines for machines.

Submitted 30 April, 2024; originally announced May 2024.

arXiv:2404.13781

Evaluating Retrieval Quality in Retrieval-Augmented Generation

Authors: Alireza Salemi, Hamed Zamani

Abstract: Evaluating retrieval-augmented generation (RAG) presents challenges, particularly for the retrieval models within these systems. Traditional end-to-end evaluation methods are computationally expensive. Furthermore, evaluating the retrieval model's performance with query-document relevance labels shows only a small correlation with the RAG system's downstream performance. We propose a novel evaluation approach, eRAG, in which each document in the retrieval list is individually utilized by the large language model within the RAG system. The output generated for each document is then evaluated against the downstream task's ground-truth labels, so that the downstream performance for each document serves as its relevance label. We employ various downstream task metrics to obtain document-level annotations and aggregate them using set-based or ranking metrics. Extensive experiments on a wide range of datasets demonstrate that eRAG achieves a higher correlation with downstream RAG performance than baseline methods, with improvements in Kendall's $\tau$ correlation ranging from 0.168 to 0.494. Additionally, eRAG offers significant computational advantages, improving runtime and consuming up to 50 times less GPU memory than end-to-end evaluation.

Submitted 21 April, 2024; originally announced April 2024.
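The eRAG labeling step (one generation per document, scored against the task's ground truth, then aggregated) can be sketched as follows. `generate` and `downstream_metric` are hypothetical callables standing in for the RAG system's LLM and the task metric; precision@k and reciprocal rank are our own picks for a set-based and a ranking aggregate, not necessarily the metrics used in the paper.

```python
from typing import Callable, Sequence


def erag_labels(
    query: str,
    docs: Sequence[str],
    generate: Callable[[str, Sequence[str]], str],   # hypothetical LLM wrapper
    downstream_metric: Callable[[str, str], float],  # hypothetical task metric
    gold: str,
) -> list[float]:
    """Per-document relevance: run the generator on each document alone and
    score its output against the downstream ground truth."""
    return [downstream_metric(generate(query, [doc]), gold) for doc in docs]


def precision_at_k(labels: Sequence[float], k: int) -> float:
    """Set-based aggregate: mean label over the top-k retrieved documents."""
    return sum(labels[:k]) / k if k > 0 else 0.0


def reciprocal_rank(labels: Sequence[float], threshold: float = 0.5) -> float:
    """Ranking aggregate: 1/rank of the first document whose label exceeds threshold."""
    for rank, label in enumerate(labels, start=1):
        if label > threshold:
            return 1.0 / rank
    return 0.0
```

With a toy generator that answers "Paris" only when the single document mentions Paris, and exact match as the metric, the ranking `["Berlin is in Germany.", "Paris is the capital of France.", "Lyon is in France."]` yields labels `[0.0, 1.0, 0.0]`, a precision@2 of 0.5, and a reciprocal rank of 0.5.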

arXiv:2404.05970

Optimization Methods for Personalizing Large Language Models through Retrieval Augmentation

Authors: Alireza Salemi, Surya Kallumadi, Hamed Zamani

Abstract: This paper studies retrieval-augmented approaches for personalizing large language models (LLMs), which potentially have a substantial impact on various applications and domains. We make the first attempt to optimize the retrieval models that deliver a limited number of personal documents to large language models for the purpose of personalized generation. We develop two optimization algorithms that solicit feedback from the downstream personalized generation tasks for retrieval optimization: one based on reinforcement learning, whose reward function is defined using any arbitrary metric for personalized generation, and another based on knowledge distillation from the downstream LLM to the retrieval model. This paper also introduces a pre- and post-generation retriever selection model that decides which retriever to choose for each LLM input. Extensive experiments on diverse tasks from the language model personalization (LaMP) benchmark reveal statistically significant improvements in six out of seven datasets.

Submitted 8 April, 2024; originally announced April 2024.

arXiv:2401.06466

PersianMind: A Cross-Lingual Persian-English Large Language Model

Authors: Pedram Rostami, Ali Salemi, Mohammad Javad Dousti

Abstract: Large language models demonstrate remarkable proficiency in various linguistic tasks and have extensive knowledge across various domains. Although they perform best in English, their ability in other languages is notable too. In contrast, open-source models such as LLaMa are primarily trained on English datasets, resulting in poor performance in non-English languages. In this paper, we introduce PersianMind, an open-source bilingual large language model that demonstrates performance comparable to closed-source GPT-3.5-turbo in the Persian language. By expanding LLaMa2's vocabulary with 10,000 Persian tokens and training it on a dataset comprising nearly 2 billion Persian tokens, we show that our approach preserves the model's English knowledge and employs transfer learning to excel at transferring task knowledge from one language to another.

Submitted 12 January, 2024; originally announced January 2024.

arXiv:2306.16478

Pre-Training Multi-Modal Dense Retrievers for Outside-Knowledge Visual Question Answering

Authors: Alireza Salemi, Mahta Rafiee, Hamed Zamani

Abstract: This paper studies a category of visual question answering tasks in which accessing external knowledge is necessary for answering the questions. This category is called outside-knowledge visual question answering (OK-VQA). A major step in developing OK-VQA systems is to retrieve relevant documents for the given multi-modal query. The current state-of-the-art asymmetric dense retrieval model for this task uses an architecture with a multi-modal query encoder and a uni-modal document encoder. Such an architecture requires a large amount of training data for effective performance. We propose an automatic data generation pipeline for pre-training passage retrieval models for OK-VQA tasks. The proposed approach leads to a 26.9% improvement in Precision@5 compared to the current state-of-the-art asymmetric architecture. Additionally, the proposed pre-training approach performs well in zero-shot retrieval scenarios.

Submitted 28 June, 2023; originally announced June 2023.

arXiv:2304.13649

A Symmetric Dual Encoding Dense Retrieval Framework for Knowledge-Intensive Visual Question Answering

Authors: Alireza Salemi, Juan Altmayer Pizzorno, Hamed Zamani

Abstract: Knowledge-Intensive Visual Question Answering (KI-VQA) refers to answering a question about an image whose answer does not lie in the image. This paper presents a new pipeline for KI-VQA tasks, consisting of a retriever and a reader. First, we introduce DEDR, a symmetric dual encoding dense retrieval framework in which documents and queries are encoded into a shared embedding space using uni-modal (textual) and multi-modal encoders. We introduce an iterative knowledge distillation approach that bridges the gap between the representation spaces in these two encoders. Extensive evaluation on two well-established KI-VQA datasets, i.e., OK-VQA and FVQA, suggests that DEDR outperforms state-of-the-art baselines by 11.6% and 30.9% on OK-VQA and FVQA, respectively. Utilizing the passages retrieved by DEDR, we further introduce MM-FiD, an encoder-decoder multi-modal fusion-in-decoder model, for generating a textual answer for KI-VQA tasks. MM-FiD encodes the question, the image, and each retrieved passage separately and uses all passages jointly in its decoder. Compared to competitive baselines in the literature, this approach leads to 5.5% and 8.5% improvements in terms of question answering accuracy on OK-VQA and FVQA, respectively.

Submitted 26 April, 2023; originally announced April 2023.

arXiv:2304.11406

LaMP: When Large Language Models Meet Personalization

Authors: Alireza Salemi, Sheshera Mysore, Michael Bendersky, Hamed Zamani

Abstract: This paper highlights the importance of personalization in large language models and introduces the LaMP benchmark, a novel benchmark for training and evaluating language models for producing personalized outputs. LaMP offers a comprehensive evaluation framework with diverse language tasks and multiple entries for each user profile. It consists of seven personalized tasks, spanning three text classification and four text generation tasks. We additionally propose two retrieval augmentation approaches that retrieve personal items from each user profile for personalizing language model outputs. To this end, we study various retrieval models, including term matching, semantic matching, and time-aware methods. Extensive experiments on LaMP for zero-shot and fine-tuned language models demonstrate the efficacy of the proposed retrieval augmentation approaches and highlight the impact of personalization in various natural language tasks.

Submitted 4 June, 2024; v1 submitted 22 April, 2023; originally announced April 2023.
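As an illustration of the term-matching family of retrievers mentioned in the LaMP abstract, the sketch below ranks a user's profile items against the model input with Okapi BM25 and keeps the top `top_k` items for the prompt. BM25 is our stand-in for "term matching"; the whitespace tokenizer, the `k1`/`b` defaults, and the function name are assumptions, not the benchmark's configuration.

```python
import math
from collections import Counter


def bm25_top_k(query: str, profile: list[str], top_k: int = 2,
               k1: float = 1.2, b: float = 0.75) -> list[str]:
    """Rank a user's profile items against the input with Okapi BM25."""
    if not profile:
        return []
    docs = [item.lower().split() for item in profile]  # naive whitespace tokens
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs
    df = Counter(term for d in docs for term in set(d))  # document frequencies
    scored = []
    for idx, doc in enumerate(docs):
        tf = Counter(doc)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(doc) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scored.append((score, idx))
    # Highest score first; ties keep the original profile order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [profile[idx] for _, idx in scored[:top_k]]
```

A time-aware variant could instead sort the same items by timestamp; semantic matching would replace the BM25 score with an embedding similarity.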

arXiv:2304.01282

PEACH: Pre-Training Sequence-to-Sequence Multilingual Models for Translation with Semi-Supervised Pseudo-Parallel Document Generation

Authors: Alireza Salemi, Amirhossein Abaskohi, Sara Tavakoli, Yadollah Yaghoobzadeh, Azadeh Shakery

Abstract: Multilingual pre-training significantly improves many multilingual NLP tasks, including machine translation. Most existing methods are based on some variant of masked language modeling and text-denoising objectives on monolingual data. Multilingual pre-training on monolingual data ignores the availability of parallel data in many language pairs. Some other works also integrate the available human-generated parallel translation data into their pre-training. This kind of parallel data is definitely helpful, but it is limited even in high-resource language pairs. This paper introduces a novel semi-supervised method, SPDG, that generates high-quality pseudo-parallel data for multilingual pre-training. First, a denoising model is pre-trained on monolingual data to reorder, add, remove, and substitute words, enhancing the quality of the pre-training documents. Then, we generate different pseudo-translations for each pre-training document using dictionaries for word-by-word translation and applying the pre-trained denoising model. The resulting pseudo-parallel data is then used to pre-train our multilingual sequence-to-sequence model, PEACH. Our experiments show that PEACH outperforms existing approaches used in training mT5 and mBART on various translation tasks, including supervised, zero-shot, and few-shot scenarios. Moreover, PEACH's ability to transfer knowledge between similar languages makes it particularly useful for low-resource languages.
Our results demonstrate that with high-quality dictionaries for generating accurate pseudo-parallel data, PEACH can be valuable for low-resource languages.

Submitted 14 April, 2023; v1 submitted 3 April, 2023; originally announced April 2023.

Comments: 15 pages, 5 figures, 16 tables, 1 algorithm, LoResMT@EACL 2023

Journal ref: https://aclanthology.org/2023.loresmt-1.3
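The dictionary-based part of SPDG, which produces several word-by-word pseudo-translations per document, can be illustrated in isolation. This is a minimal sketch assuming a toy dictionary that maps each source word to a list of candidate translations; the dictionary, the function name, and the `limit` parameter are hypothetical, and the pre-trained denoising model that SPDG applies afterwards is deliberately omitted.

```python
from itertools import islice, product


def pseudo_translations(sentence: str, dictionary: dict[str, list[str]],
                        limit: int = 4) -> list[str]:
    """Word-by-word pseudo-translations of `sentence`.

    Each token is replaced by one of its dictionary translations; tokens
    missing from the dictionary are copied unchanged. Different choices of
    translation yield different pseudo-translations (at most `limit`).
    """
    options = [dictionary.get(tok.lower(), [tok]) for tok in sentence.split()]
    return [" ".join(choice) for choice in islice(product(*options), limit)]


toy_dict = {"the": ["le"], "cat": ["chat", "matou"], "sleeps": ["dort"]}
print(pseudo_translations("The cat sleeps", toy_dict))
# ['le chat dort', 'le matou dort']
```

Word-by-word output ignores reordering and agreement, which is why a denoising pass over the pseudo-translation is useful.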

arXiv:2207.13477

An improved bound on Legendre approximation

Authors: M. Hamzehnejad, M. M. Hosseini, A. Salemi

Abstract: In this paper, new relations between the derivatives of the Legendre polynomials are obtained, and by these relations, new upper bounds for the Legendre coefficients of differentiable functions are presented. These upper bounds are sharp and cover more categories of differentiable functions. Moreover, new and sharper bounds for the approximation error of the partial sums of Legendre polynomials are provided. Numerical examples are given to validate our theoretical results.

Submitted 27 July, 2022; originally announced July 2022.

Comments: The paper contains 15 pages

MSC Class: 41A25
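The quantities bounded in this abstract, the Legendre coefficients $a_n = \frac{2n+1}{2}\int_{-1}^{1} f(x) P_n(x)\,dx$ and the error of the partial sum $\sum_{k=0}^{n} a_k P_k$, can be computed numerically for experiments of this kind. The sketch below uses NumPy's Gauss-Legendre nodes; the quadrature size, grid size, and function names are our assumptions, and it only measures errors, it does not reproduce the paper's bounds.

```python
import numpy as np
from numpy.polynomial import legendre


def legendre_coefficients(f, n_max: int, quad_points: int = 64) -> np.ndarray:
    """a_n = (2n + 1)/2 * integral_{-1}^{1} f(x) P_n(x) dx for n = 0..n_max,
    approximated with Gauss-Legendre quadrature."""
    x, w = legendre.leggauss(quad_points)
    fx = f(x)
    coeffs = np.empty(n_max + 1)
    for n in range(n_max + 1):
        p_n = legendre.legval(x, [0.0] * n + [1.0])  # P_n at the quadrature nodes
        coeffs[n] = (2 * n + 1) / 2 * np.sum(w * fx * p_n)
    return coeffs


def partial_sum_error(f, n_max: int, grid_size: int = 2001) -> float:
    """Max-norm error of the degree-n_max Legendre partial sum on a uniform grid."""
    grid = np.linspace(-1.0, 1.0, grid_size)
    coeffs = legendre_coefficients(f, n_max)
    return float(np.max(np.abs(f(grid) - legendre.legval(grid, coeffs))))
```

For example, $x^2 = \tfrac{1}{3}P_0 + \tfrac{2}{3}P_2$, so `legendre_coefficients(lambda x: x**2, 3)` returns approximately `[1/3, 0, 2/3, 0]`.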

arXiv:2112.13430

IoT Analytics and Blockchain

Authors: Abbas Saleminezhadl, Manuel Remmele, Ravikumar Chaudhari, Rasha Kashef

Abstract: The Internet of Things (IoT) is revolutionizing human life with the idea of interconnecting everyday devices (Things) and making them smart. By establishing a communication network between devices, an IoT system helps automate tasks and make them efficient and powerful. The sensors and the physical world, connected over a network, generate a massive amount of data. Collecting and sharing this data poses a critical threat of it being stolen and manipulated over the network. These data security and privacy issues in IoT systems raise concerns about maintaining the authenticity of IoT data. Blockchain, a tamper-resistant ledger, has emerged as a viable alternative for providing security features. Blockchain technologies with decentralized structures can help resolve IoT structural issues and protect against a single point of failure. While providing robust security features, Blockchain also faces various critical challenges in adapting to the IoT environment. This paper presents a survey of state-of-the-art Blockchain technologies focusing on IoT applications. IoT applications built on Blockchain protocols and data structures are outlined, along with possible advancements and modifications.

Submitted 26 December, 2021; originally announced December 2021.
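The "tamper-resistant ledger" property rests on hash linking: each block stores the digest of its predecessor, so editing any stored record invalidates the chain. A minimal sketch of just that idea, with hypothetical sensor payloads; it has no consensus protocol, signatures, or networking, so it is not a blockchain implementation, only the data-structure idea behind one.

```python
import hashlib
import json
from dataclasses import dataclass

GENESIS_PREV = "0" * 64  # placeholder predecessor digest for the first block


@dataclass
class Block:
    index: int
    payload: dict
    prev_hash: str
    digest: str = ""

    def compute_digest(self) -> str:
        """SHA-256 over the block's contents, including its predecessor's digest."""
        body = json.dumps(
            {"index": self.index, "payload": self.payload, "prev_hash": self.prev_hash},
            sort_keys=True,
        )
        return hashlib.sha256(body.encode("utf-8")).hexdigest()


def append_block(chain: list[Block], payload: dict) -> None:
    """Link a new block to the current tip of the chain."""
    prev = chain[-1].digest if chain else GENESIS_PREV
    block = Block(len(chain), payload, prev)
    block.digest = block.compute_digest()
    chain.append(block)


def chain_is_valid(chain: list[Block]) -> bool:
    """Recompute every digest and check every back-link."""
    for i, block in enumerate(chain):
        expected_prev = chain[i - 1].digest if i > 0 else GENESIS_PREV
        if block.prev_hash != expected_prev or block.digest != block.compute_digest():
            return False
    return True
```

Changing a stored reading, for instance `chain[0].payload["temp_c"] = 99.0`, makes `chain_is_valid` return `False`.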

arXiv:2109.04098

ARMAN: Pre-training with Semantically Selecting and Reordering of Sentences for Persian Abstractive Summarization

Authors: Alireza Salemi, Emad Kebriaei, Ghazal Neisi Minaei, Azadeh Shakery

Abstract: Abstractive text summarization is one of the areas influenced by the emergence of pre-trained language models. Current pre-training works in abstractive summarization reward summaries that share more words with the main text and pay less attention to the semantic similarity between generated sentences and the original document. We propose ARMAN, a Transformer-based encoder-decoder model pre-trained with three novel objectives to address this issue. In ARMAN, salient sentences from a document are selected according to a modified semantic score, masked, and used to form a pseudo summary. To summarize more accurately and more like human writing, we apply a modified sentence reordering. We evaluated our proposed models on six downstream Persian summarization tasks. Experimental results show that our proposed model achieves state-of-the-art performance on all six summarization tasks as measured by ROUGE and BERTScore. Our models also outperform prior works in textual entailment, question paraphrasing, and multiple-choice question answering. Finally, we conducted a human evaluation and show that using the semantic score significantly improves summarization results.

Submitted 9 September, 2021; originally announced September 2021.

arXiv:2104.04770

UTNLP at SemEval-2021 Task 5: A Comparative Analysis of Toxic Span Detection using Attention-based, Named Entity Recognition, and Ensemble Models

Authors: Alireza Salemi, Nazanin Sabri, Emad Kebriaei, Behnam Bahrak, Azadeh Shakery

Abstract: Detecting which parts of a sentence contribute to that sentence's toxicity, rather than providing a sentence-level verdict of hatefulness, would increase the interpretability of models and allow human moderators to better understand the outputs of the system. This paper presents the methodology and results of our team, UTNLP, in the SemEval-2021 shared task 5 on toxic spans detection. We test multiple models and contextual embeddings and report the best-performing setting. The experiments start with keyword-based models and are followed by attention-based, named entity-based, transformer-based, and ensemble models. Our best approach, an ensemble model, achieves an F1 of 0.684 in the competition's evaluation phase.

Submitted 10 April, 2021; originally announced April 2021.
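Toxic-span systems are scored on character offsets rather than sentence labels. The sketch below computes a per-post F1 over predicted versus gold offset sets and averages it over posts, which is our understanding of how the task's F1 is defined; the empty-gold convention in particular is an assumption flagged in the code.

```python
def span_f1(pred: set[int], gold: set[int]) -> float:
    """F1 over toxic character offsets for one post.

    Assumed convention: an empty gold set scores 1.0 only if the
    prediction is also empty.
    """
    if not gold:
        return 1.0 if not pred else 0.0
    tp = len(pred & gold)  # offsets both predicted and gold
    if tp == 0:
        return 0.0
    precision = tp / len(pred)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)


def mean_span_f1(preds: list[set[int]], golds: list[set[int]]) -> float:
    """System score: average of per-post F1 values."""
    return sum(span_f1(p, g) for p, g in zip(preds, golds)) / len(golds)
```

For instance, predicting offsets {1, 2, 3} when the gold span is {2, 3, 4} gives precision and recall of 2/3, hence F1 = 2/3.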

arXiv:1910.05933

doi 10.1007/s13042-020-01193-5

DISCERN: Diversity-based Selection of Centroids for k-Estimation and Rapid Non-stochastic Clustering

Authors: Ali Hassani, Amir Iranmanesh, Mahdi Eftekhari, Abbas Salemi

Abstract: One of the applications of center-based clustering algorithms such as K-Means is partitioning data points into K clusters. In some examples, the feature space relates to the underlying problem we are trying to solve, and sometimes we can obtain a suitable feature space. Nevertheless, while K-Means is one of the most efficient offline clustering algorithms, it is not equipped to estimate the number of clusters, which is useful in some practical cases. Other practical methods that can do so are simply too complex, as they require at least one run of K-Means for each possible K. To address this issue, we propose a K-Means initialization similar to K-Means++ that estimates K from the feature space while finding suitable initial centroids for K-Means in a deterministic manner. We then compare the proposed method, DISCERN, with a few of the most practical K-estimation methods, and also compare the clustering results of K-Means when initialized randomly, with K-Means++, and with DISCERN. The results show improvements in both the estimation and the final clustering performance.

Submitted 22 September, 2020; v1 submitted 14 October, 2019; originally announced October 2019.

Comments: Int. J. Mach. Learn. & Cyber. (2020)
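DISCERN is described as an initialization "similar to K-Means++". For reference, here is a sketch of standard K-Means++ seeding, the randomized baseline the abstract compares against; this is explicitly not DISCERN, which is deterministic and also estimates K. The function name is hypothetical.

```python
import numpy as np


def kmeans_pp_init(X: np.ndarray, k: int, rng: np.random.Generator) -> np.ndarray:
    """K-Means++ seeding: each new center is sampled with probability
    proportional to its squared distance from the nearest center chosen so far."""
    n = X.shape[0]
    centers = [X[rng.integers(n)]]  # first center: uniform at random
    for _ in range(1, k):
        C = np.asarray(centers)
        # Squared distance from every point to its closest existing center.
        d2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2).min(axis=1)
        total = d2.sum()
        if total == 0:  # every point already coincides with a center
            break
        centers.append(X[rng.choice(n, p=d2 / total)])
    return np.asarray(centers)
```

Usage: `kmeans_pp_init(X, 3, np.random.default_rng(0))` returns up to three rows of `X` to pass to K-Means as initial centroids. The random draw is precisely what a deterministic scheme avoids.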

arXiv:0807.5094

The structure of strong linear preservers of gw-majorization on $\mathbf{M}_{n,m}$

Authors: A. Armandnejad, A. Salemi

Abstract: Let $\mathbf{M}_{n,m}$ be the set of all $n \times m$ matrices with entries in $\mathbb{F}$, where $\mathbb{F}$ is the field of real or complex numbers. A matrix $R \in \mathbf{M}_n$ with the property $Re = e$, where $e$ is the all-ones vector, is said to be a g-row stochastic (generalized row stochastic) matrix. For $A, B \in \mathbf{M}_{n,m}$, $B$ is said to be gw-majorized by $A$ if there exists an $n \times n$ g-row stochastic matrix $R$ such that $B = RA$. In this paper we characterize all linear operators that strongly preserve gw-majorization on $\mathbf{M}_{n,m}$ and all linear operators that strongly preserve matrix majorization on $\mathbf{M}_n$.

Submitted 31 July, 2008; originally announced July 2008.

Comments: 7 pages

MSC Class: 15A03; 15A04; 15A51
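The definition of gw-majorization reduces to a linear system: $Re = e$ and $RA = B$ together read $R\,[A \mid e] = [B \mid e]$, which is solvable exactly when every row of $[B \mid e]$ lies in the row space of $[A \mid e]$. A minimal NumPy sketch that searches for such a witness $R$ (the function name and tolerance are our assumptions; this is a direct check of the definition, not the paper's characterization of preservers):

```python
import numpy as np


def gw_majorization_witness(A: np.ndarray, B: np.ndarray, atol: float = 1e-9):
    """Return an n x n matrix R with R e = e and R A = B, or None if none exists.

    Stacking the two conditions gives R [A | e] = [B | e]; this linear system
    has a solution exactly when every row of [B | e] lies in the row space
    of [A | e].
    """
    n = A.shape[0]
    e = np.ones((n, 1))
    M = np.hstack([A, e])
    N = np.hstack([B, e])
    # R M = N  <=>  M^T R^T = N^T; lstsq returns an exact solution whenever one exists.
    Rt, *_ = np.linalg.lstsq(M.T, N.T, rcond=None)
    R = Rt.T
    return R if np.allclose(R @ M, N, atol=atol) else None
```

Because a g-row stochastic matrix need not be entrywise nonnegative, `B = [[2, -1], [0, 1]]` is gw-majorized by the $2 \times 2$ identity, while `B = [[1, 1], [0, 0]]` is not.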

Showing 1–15 of 15 results for author: Salemi, A