Showing 1–18 of 18 results for author: Asgari, E

  1. arXiv:2406.05279

    cs.CL cs.AI

    SuperPos-Prompt: Enhancing Soft Prompt Tuning of Language Models with Superposition of Multi Token Embeddings

    Authors: MohammadAli SadraeiJavaeri, Ehsaneddin Asgari, Alice Carolyn McHardy, Hamid Reza Rabiee

    Abstract: Soft prompt tuning techniques have recently gained traction as an effective strategy for the parameter-efficient tuning of pretrained language models, particularly minimizing the required adjustment of model parameters. Despite their growing use, achieving optimal tuning with soft prompts, especially for smaller datasets, remains a substantial challenge. This study makes two contributions in this…

    Submitted 7 June, 2024; originally announced June 2024.

  2. arXiv:2404.06644

    cs.CL cs.AI

    Khayyam Challenge (PersianMMLU): Is Your LLM Truly Wise to The Persian Language?

    Authors: Omid Ghahroodi, Marzia Nouri, Mohammad Vali Sanian, Alireza Sahebi, Doratossadat Dastgheib, Ehsaneddin Asgari, Mahdieh Soleymani Baghshah, Mohammad Hossein Rohban

    Abstract: Evaluating Large Language Models (LLMs) is challenging due to their generative nature, necessitating precise evaluation methodologies. Additionally, non-English LLM evaluation lags behind English, resulting in the absence or weakness of LLMs for many languages. In response to this necessity, we introduce Khayyam Challenge (also known as PersianMMLU), a meticulously curated collection comprising 20…

    Submitted 9 April, 2024; originally announced April 2024.

  3. arXiv:2402.02369

    cs.CV cs.CL cs.MM

    M$^3$Face: A Unified Multi-Modal Multilingual Framework for Human Face Generation and Editing

    Authors: Mohammadreza Mofayezi, Reza Alipour, Mohammad Ali Kakavand, Ehsaneddin Asgari

    Abstract: Human face generation and editing represent an essential task in the era of computer vision and the digital world. Recent studies have shown remarkable progress in multi-modal face generation and editing, for instance, using face segmentation to guide image generation. However, it may be challenging for some users to create these conditioning modalities manually. Thus, we introduce M3Face, a unifi…

    Submitted 4 February, 2024; originally announced February 2024.

  4. arXiv:2312.03361

    cs.CL

    KhabarChin: Automatic Detection of Important News in the Persian Language

    Authors: Hamed Hematian Hemati, Arash Lagzian, Moein Salimi Sartakhti, Hamid Beigy, Ehsaneddin Asgari

    Abstract: Being aware of important news is crucial for staying informed and making well-informed decisions efficiently. Natural Language Processing (NLP) approaches can significantly automate this process. This paper introduces the detection of important news, in a previously unexplored area, and presents a new benchmarking dataset (Khabarchin) for detecting important news in the Persian language. We define…

    Submitted 6 December, 2023; originally announced December 2023.

    Comments: 8 pages, 2 figures

  5. arXiv:2305.08487

    cs.CL

    Taxi1500: A Multilingual Dataset for Text Classification in 1500 Languages

    Authors: Chunlan Ma, Ayyoob ImaniGooghari, Haotian Ye, Renhao Pei, Ehsaneddin Asgari, Hinrich Schütze

    Abstract: While natural language processing tools have been developed extensively for some of the world's languages, a significant portion of the world's over 7000 languages are still neglected. One reason for this is that evaluation datasets do not yet cover a wide range of languages, including low-resource and endangered ones. We aim to address this issue by creating a text classification dataset encompas…

    Submitted 4 June, 2024; v1 submitted 15 May, 2023; originally announced May 2023.

  6. arXiv:2301.13771

    cs.CL

    The Touché23-ValueEval Dataset for Identifying Human Values behind Arguments

    Authors: Nailia Mirzakhmedova, Johannes Kiesel, Milad Alshomary, Maximilian Heinrich, Nicolas Handke, Xiaoni Cai, Barriere Valentin, Doratossadat Dastgheib, Omid Ghahroodi, Mohammad Ali Sadraei, Ehsaneddin Asgari, Lea Kawaletz, Henning Wachsmuth, Benno Stein

    Abstract: We present the Touché23-ValueEval Dataset for Identifying Human Values behind Arguments. To investigate approaches for the automated detection of human values behind arguments, we collected 9324 arguments from 6 diverse sources, covering religious texts, political discussions, free-text arguments, newspaper editorials, and online democracy platforms. Each argument was annotated by 3 crowdworkers f…

    Submitted 31 January, 2023; originally announced January 2023.

  7. arXiv:2206.01444

    cs.LG cs.PF

    XPASC: Measuring Generalization in Weak Supervision by Explainability and Association

    Authors: Luisa März, Ehsaneddin Asgari, Fabienne Braune, Franziska Zimmermann, Benjamin Roth

    Abstract: Weak supervision is leveraged in a wide range of domains and tasks due to its ability to create massive amounts of labeled data, requiring only little manual effort. Standard approaches use labeling functions to specify signals that are relevant for the labeling. It has been conjectured that weakly supervised models over-rely on those signals and as a result suffer from overfitting. To verify this…

    Submitted 22 November, 2022; v1 submitted 3 June, 2022; originally announced June 2022.

    Comments: 26 pages, 20 Figures, 5 Tables

  8. arXiv:2204.07430

    cs.SE cs.AI cs.LO

    Stateless and Rule-Based Verification For Compliance Checking Applications

    Authors: Mohammad Reza Besharati, Mohammad Izadi, Ehsaneddin Asgari

    Abstract: Underlying computational model has an important role in any computation. The state and transition (such as in automata) and rule and value (such as in Lisp and logic programming) are two comparable and counterpart computational models. Both of deductive and model checking verification techniques are relying on a notion of state and as a result, their underlying computational models are state depen…

    Submitted 28 April, 2022; v1 submitted 14 April, 2022; originally announced April 2022.

  9. arXiv:2109.07994

    cs.LG cs.CL

    KnowMAN: Weakly Supervised Multinomial Adversarial Networks

    Authors: Luisa März, Ehsaneddin Asgari, Fabienne Braune, Franziska Zimmermann, Benjamin Roth

    Abstract: The absence of labeled data for training neural models is often addressed by leveraging knowledge about the specific task, resulting in heuristic but noisy labels. The knowledge is captured in labeling functions, which detect certain regularities or patterns in the training samples and annotate corresponding labels for training. This process of weakly supervised training may result in an over-reli…

    Submitted 16 September, 2021; originally announced September 2021.

    Comments: 9 pages, 3 figures, 2 tables, accepted to EMNLP 2021

  10. arXiv:2012.11657

    cs.CL

    Subword Sampling for Low Resource Word Alignment

    Authors: Ehsaneddin Asgari, Masoud Jalili Sabet, Philipp Dufter, Christopher Ringlstetter, Hinrich Schütze

    Abstract: Annotation projection is an important area in NLP that can greatly contribute to creating language resources for low-resource languages. Word alignment plays a key role in this setting. However, most of the existing word alignment methods are designed for a high resource setting in machine translation where millions of parallel sentences are available. This amount reduces to a few thousands of sen…

    Submitted 15 June, 2021; v1 submitted 21 December, 2020; originally announced December 2020.

  11. arXiv:2005.07979

    cs.CL cs.AI

    Unsupervised Embedding-based Detection of Lexical Semantic Changes

    Authors: Ehsaneddin Asgari, Christoph Ringlstetter, Hinrich Schütze

    Abstract: This paper describes EmbLexChange, a system introduced by the "Life-Language" team for SemEval-2020 Task 1, on unsupervised detection of lexical-semantic changes. EmbLexChange is defined as the divergence between the embedding based profiles of word w (calculated with respect to a set of reference words) in the source and the target domains (source and target domains can be simply two time frames…

    Submitted 16 May, 2020; originally announced May 2020.

  12. arXiv:1904.09678

    cs.CL

    UniSent: Universal Adaptable Sentiment Lexica for 1000+ Languages

    Authors: Ehsaneddin Asgari, Fabienne Braune, Benjamin Roth, Christoph Ringlstetter, Mohammad R. K. Mofrad

    Abstract: In this paper, we introduce UniSent universal sentiment lexica for $1000+$ languages. Sentiment lexica are vital for sentiment analysis in absence of document-level annotations, a very common scenario for low-resource languages. To the best of our knowledge, UniSent is the largest sentiment resource to date in terms of the number of covered languages, including many low resource ones. In this work…

    Submitted 28 November, 2019; v1 submitted 21 April, 2019; originally announced April 2019.

  13. arXiv:1810.03466

    cs.LG cs.AI q-fin.RM

    Wide and Deep Learning for Peer-to-Peer Lending

    Authors: Kaveh Bastani, Elham Asgari, Hamed Namavari

    Abstract: This paper proposes a two-stage scoring approach to help lenders decide their fund allocations in the peer-to-peer (P2P) lending market. The existing scoring approaches focus on only either probability of default (PD) prediction, known as credit scoring, or profitability prediction, known as profit scoring, to identify the best loans for investment. Credit scoring fails to deliver the main need of…

    Submitted 8 October, 2018; v1 submitted 4 October, 2018; originally announced October 2018.

  14. arXiv:1704.08914

    cs.CL cs.AI cs.LG

    Past, Present, Future: A Computational Investigation of the Typology of Tense in 1000 Languages

    Authors: Ehsaneddin Asgari, Hinrich Schütze

    Abstract: We present SuperPivot, an analysis method for low-resource languages that occur in a superparallel corpus, i.e., in a corpus that contains an order of magnitude more languages than parallel corpora currently in use. We show that SuperPivot performs well for the crosslingual analysis of the linguistic phenomenon of tense. We produce analysis results for more than 1000 languages, conducting - to the…

    Submitted 14 September, 2017; v1 submitted 28 April, 2017; originally announced April 2017.

    Journal ref: Extended version of EMNLP 2017

  15. arXiv:1610.00479

    cs.CL

    Nonsymbolic Text Representation

    Authors: Hinrich Schuetze, Heike Adel, Ehsaneddin Asgari

    Abstract: We introduce the first generic text representation model that is completely nonsymbolic, i.e., it does not require the availability of a segmentation or tokenization method that attempts to identify words or other symbolic units in text. This applies to training the parameters of the model on a training corpus as well as to applying it when computing the representation of a new text. We show that…

    Submitted 1 May, 2017; v1 submitted 3 October, 2016; originally announced October 2016.

  16. arXiv:1604.08561

    cs.CL

    Comparing Fifty Natural Languages and Twelve Genetic Languages Using Word Embedding Language Divergence (WELD) as a Quantitative Measure of Language Distance

    Authors: Ehsaneddin Asgari, Mohammad R. K. Mofrad

    Abstract: We introduce a new measure of distance between languages based on word embedding, called word embedding language divergence (WELD). WELD is defined as divergence between unified similarity distribution of words between languages. Using such a measure, we perform language comparison for fifty natural languages and twelve genetic languages. Our natural language dataset is a collection of sentence-al…

    Submitted 28 April, 2016; originally announced April 2016.

  17. arXiv:1512.00397

    q-bio.GN cs.AI

    A New Approach for Scalable Analysis of Microbial Communities

    Authors: Ehsaneddin Asgari, Kiavash Garakani, Mohammad R. K. Mofrad

    Abstract: Microbial communities play important roles in the function and maintenance of various biosystems, ranging from human body to the environment. Current methods for analysis of microbial communities are typically based on taxonomic phylogenetic alignment using 16S rRNA metagenomic or Whole Genome Sequencing data. In typical characterizations of microbial communities, studies deal with billions of mic…

    Submitted 1 December, 2015; originally announced December 2015.

  18. arXiv:1503.05140

    q-bio.QM cs.AI cs.LG q-bio.GN

    ProtVec: A Continuous Distributed Representation of Biological Sequences

    Authors: Ehsaneddin Asgari, Mohammad R. K. Mofrad

    Abstract: We introduce a new representation and feature extraction method for biological sequences. Named bio-vectors (BioVec) to refer to biological sequences in general with protein-vectors (ProtVec) for proteins (amino-acid sequences) and gene-vectors (GeneVec) for gene sequences, this representation can be widely used in applications of deep learning in proteomics and genomics. In the present paper, we…

    Submitted 26 May, 2016; v1 submitted 17 March, 2015; originally announced March 2015.

    Journal ref: PLoS ONE 10(11): e0141287, 2015