
Showing 1–18 of 18 results for author: Şahin, G G

  1. arXiv:2403.03167  [pdf, other]

    cs.CL

    PARADISE: Evaluating Implicit Planning Skills of Language Models with Procedural Warnings and Tips Dataset

    Authors: Arda Uzunoglu, Abdalfatah Rashid Safa, Gözde Gül Şahin

    Abstract: Recently, there has been growing interest within the community regarding whether large language models are capable of planning or executing plans. However, most prior studies use LLMs to generate high-level plans for simplified scenarios lacking linguistic complexity and domain diversity, limiting analysis of their planning abilities. These setups constrain evaluation methods (e.g., predefined act…

    Submitted 6 June, 2024; v1 submitted 5 March, 2024; originally announced March 2024.

    Comments: 9 pages, ACL 2024 Findings

  2. arXiv:2312.08722  [pdf, other]

    cs.AI

    Quantifying Divergence for Human-AI Collaboration and Cognitive Trust

    Authors: Müge Kural, Ali Gebeşçe, Tilek Chubakov, Gözde Gül Şahin

    Abstract: Predicting the collaboration likelihood and measuring cognitive trust to AI systems is more important than ever. To do that, previous research mostly focus solely on the model features (e.g., accuracy, confidence) and ignore the human factor. To address that, we propose several decision-making similarity measures based on divergence metrics (e.g., KL, JSD) calculated over the labels acquired from…

    Submitted 18 January, 2024; v1 submitted 14 December, 2023; originally announced December 2023.

  3. arXiv:2309.11346  [pdf, other]

    cs.CL cs.LG

    GECTurk: Grammatical Error Correction and Detection Dataset for Turkish

    Authors: Atakan Kara, Farrin Marouf Sofian, Andrew Bond, Gözde Gül Şahin

    Abstract: Grammatical Error Detection and Correction (GEC) tools have proven useful for native speakers and second language learners. Developing such tools requires a large amount of parallel, annotated data, which is unavailable for most languages. Synthetic data generation is a common practice to overcome the scarcity of such data. However, it is not straightforward for morphologically rich languages like…

    Submitted 20 September, 2023; originally announced September 2023.

    Comments: Accepted at Findings of IJCNLP-AACL 2023

  4. arXiv:2309.06698  [pdf, other]

    cs.CL

    Benchmarking Procedural Language Understanding for Low-Resource Languages: A Case Study on Turkish

    Authors: Arda Uzunoglu, Gözde Gül Şahin

    Abstract: Understanding procedural natural language (e.g., step-by-step instructions) is a crucial step to execution and planning. However, while there are ample corpora and downstream tasks available in English, the field lacks such resources for most languages. To address this gap, we conduct a case study on Turkish procedural texts. We first expand the number of tutorials in Turkish wikiHow from 2,000 to…

    Submitted 6 March, 2024; v1 submitted 12 September, 2023; originally announced September 2023.

    Comments: 9 pages

  5. arXiv:2307.14632  [pdf, other]

    cs.CL cs.AI

    Metric-Based In-context Learning: A Case Study in Text Simplification

    Authors: Subha Vadlamannati, Gözde Gül Şahin

    Abstract: In-context learning (ICL) for large language models has proven to be a powerful approach for many natural language processing tasks. However, determining the best method to select examples for ICL is nontrivial as the results can vary greatly depending on the quality, quantity, and order of examples used. In this paper, we conduct a case study on text simplification (TS) to investigate how to sele…

    Submitted 27 July, 2023; originally announced July 2023.

    Comments: Accepted to INLG

  6. arXiv:2304.12836  [pdf, other]

    cs.CL

    Lessons Learned from a Citizen Science Project for Natural Language Processing

    Authors: Jan-Christoph Klie, Ji-Ung Lee, Kevin Stowe, Gözde Gül Şahin, Nafise Sadat Moosavi, Luke Bates, Dominic Petrak, Richard Eckart de Castilho, Iryna Gurevych

    Abstract: Many Natural Language Processing (NLP) systems use annotated corpora for training and evaluation. However, labeled data is often costly to obtain and scaling annotation projects is difficult, which is why annotation tasks are often outsourced to paid crowdworkers. Citizen Science is an alternative to crowdsourcing that is relatively unexplored in the context of NLP. To investigate whether and how…

    Submitted 25 April, 2023; originally announced April 2023.

    Comments: Accepted to EACL 2023. Code will be published on github: https://github.com/UKPLab/eacl2023-citizen-science-lessons-learned

  7. arXiv:2211.01736  [pdf, other]

    cs.CL cs.AI cs.LG

    Transformers on Multilingual Clause-Level Morphology

    Authors: Emre Can Acikgoz, Tilek Chubakov, Müge Kural, Gözde Gül Şahin, Deniz Yuret

    Abstract: This paper describes our winning systems in MRL: The 1st Shared Task on Multilingual Clause-level Morphology (EMNLP 2022 Workshop) designed by KUIS AI NLP team. We present our work for all three parts of the shared task: inflection, reinflection, and analysis. We mainly explore transformers with two approaches: (i) training models from scratch in combination with data augmentation, and (ii) transf…

    Submitted 13 November, 2022; v1 submitted 3 November, 2022; originally announced November 2022.

  8. arXiv:2203.13693  [pdf, other]

    cs.CL cs.IR

    UKP-SQUARE: An Online Platform for Question Answering Research

    Authors: Tim Baumgärtner, Kexin Wang, Rachneet Sachdeva, Max Eichler, Gregor Geigle, Clifton Poth, Hannah Sterz, Haritz Puerto, Leonardo F. R. Ribeiro, Jonas Pfeiffer, Nils Reimers, Gözde Gül Şahin, Iryna Gurevych

    Abstract: Recent advances in NLP and information retrieval have given rise to a diverse set of question answering tasks that are of different formats (e.g., extractive, abstractive), require different model architectures (e.g., generative, discriminative), and setups (e.g., with or without retrieval). Despite having a large number of powerful, specialized QA pipelines (which we refer to as Skills) that cons…

    Submitted 28 March, 2022; v1 submitted 25 March, 2022; originally announced March 2022.

    Comments: Accepted at ACL 2022 Demo Track

  9. arXiv:2112.01922  [pdf, other]

    cs.CL cs.LG

    MetaQA: Combining Expert Agents for Multi-Skill Question Answering

    Authors: Haritz Puerto, Gözde Gül Şahin, Iryna Gurevych

    Abstract: The recent explosion of question answering (QA) datasets and models has increased the interest in the generalization of models across multiple domains and formats by either training on multiple datasets or by combining multiple models. Despite the promising results of multi-dataset models, some domains or QA formats may require specific architectures, and thus the adaptability of these models migh…

    Submitted 6 February, 2023; v1 submitted 3 December, 2021; originally announced December 2021.

    Comments: Accepted at EACL 2023

  10. arXiv:2111.14574  [pdf, other]

    math.ST cs.LG

    On the rate of convergence of a classifier based on a Transformer encoder

    Authors: Iryna Gurevych, Michael Kohler, Gözde Gül Sahin

    Abstract: Pattern recognition based on a high-dimensional predictor is considered. A classifier is defined which is based on a Transformer encoder. The rate of convergence of the misclassification probability of the classifier towards the optimal misclassification probability is analyzed. It is shown that this classifier is able to circumvent the curse of dimensionality provided the aposteriori probability…

    Submitted 29 November, 2021; originally announced November 2021.

  11. arXiv:2111.09618  [pdf, other]

    cs.CL cs.AI

    To Augment or Not to Augment? A Comparative Study on Text Augmentation Techniques for Low-Resource NLP

    Authors: Gözde Gül Şahin

    Abstract: Data-hungry deep neural networks have established themselves as the standard for many NLP tasks including the traditional sequence tagging ones. Despite their state-of-the-art performance on high-resource languages, they still fall behind of their statistical counter-parts in low-resource scenarios. One methodology to counter attack this problem is text augmentation, i.e., generating new synthetic…

    Submitted 18 November, 2021; originally announced November 2021.

    Comments: Accepted to Computational Linguistics

  12. arXiv:2004.13161  [pdf, other]

    cs.CL cs.AI

    PuzzLing Machines: A Challenge on Learning From Small Data

    Authors: Gözde Gül Şahin, Yova Kementchedjhieva, Phillip Rust, Iryna Gurevych

    Abstract: Deep neural models have repeatedly proved excellent at memorizing surface patterns from large datasets for various ML and NLP benchmarks. They struggle to achieve human-like thinking, however, because they lack the skill of iterative reasoning upon knowledge. To expose this problem in a new light, we introduce a challenge on learning from small data, PuzzLing Machines, which consists of Rosetta St…

    Submitted 27 April, 2020; originally announced April 2020.

    Comments: Accepted to ACL 2020

  13. arXiv:1912.05274  [pdf, other]

    cs.CL cs.LG

    Two Birds with One Stone: Investigating Invertible Neural Networks for Inverse Problems in Morphology

    Authors: Gözde Gül Şahin, Iryna Gurevych

    Abstract: Most problems in natural language processing can be approximated as inverse problems such as analysis and generation at variety of levels from morphological (e.g., cat+Plural <-> cats) to semantic (e.g., (call + 1 2) <-> "Calculate one plus two."). Although the tasks in both directions are closely related, general approach in the field has been to design separate models specific for each task. How…

    Submitted 11 December, 2019; originally announced December 2019.

    Comments: AAAI 2020

  14. arXiv:1907.11438  [pdf, other]

    cs.CL

    LINSPECTOR WEB: A Multilingual Probing Suite for Word Representations

    Authors: Max Eichler, Gözde Gül Şahin, Iryna Gurevych

    Abstract: We present LINSPECTOR WEB, an open source multilingual inspector to analyze word representations. Our system provides researchers working in low-resource settings with an easily accessible web based probing tool to gain quick insights into their word embeddings especially outside of the English language. To do this we employ 16 simple linguistic probing tasks such as gender, case marking, and tens…

    Submitted 15 October, 2019; v1 submitted 26 July, 2019; originally announced July 2019.

    Comments: Accepted at EMNLP 2019 System Demonstrations

  15. arXiv:1903.11508  [pdf, other]

    cs.CL cs.CR cs.CV cs.LG

    Text Processing Like Humans Do: Visually Attacking and Shielding NLP Systems

    Authors: Steffen Eger, Gözde Gül Şahin, Andreas Rücklé, Ji-Ung Lee, Claudia Schulz, Mohsen Mesgar, Krishnkant Swarnkar, Edwin Simpson, Iryna Gurevych

    Abstract: Visual modifications to text are often used to obfuscate offensive comments in social media (e.g., "!d10t") or as a writing style ("1337" in "leet speak"), among other scenarios. We consider this as a new type of adversarial attack in NLP, a setting to which humans are very robust, as our experiments with both simple and more difficult visual input perturbations demonstrate. We then investigate th…

    Submitted 10 June, 2020; v1 submitted 27 March, 2019; originally announced March 2019.

    Comments: Accepted as long paper at NAACL-2019; fixed one ungrammatical sentence

  16. arXiv:1903.09460  [pdf, other]

    cs.CL

    Data Augmentation via Dependency Tree Morphing for Low-Resource Languages

    Authors: Gözde Gül Şahin, Mark Steedman

    Abstract: Neural NLP systems achieve high scores in the presence of sizable training dataset. Lack of such datasets leads to poor system performances in the case low-resource languages. We present two simple text augmentation techniques using dependency trees, inspired from image processing. We crop sentences by removing dependency links, and we rotate sentences by moving the tree fragments around the root.…

    Submitted 22 March, 2019; originally announced March 2019.

  17. arXiv:1903.09442  [pdf, other]

    cs.CL

    LINSPECTOR: Multilingual Probing Tasks for Word Representations

    Authors: Gözde Gül Şahin, Clara Vania, Ilia Kuznetsov, Iryna Gurevych

    Abstract: Despite an ever growing number of word representation models introduced for a large number of languages, there is a lack of a standardized technique to provide insights into what is captured by these models. Such insights would help the community to get an estimate of the downstream task performance, as well as to design more informed neural architectures, while avoiding extensive experimentation…

    Submitted 11 December, 2019; v1 submitted 22 March, 2019; originally announced March 2019.

    Comments: Demo is available from: https://linspector.ukp.informatik.tu-darmstadt.de/

  18. arXiv:1805.11937  [pdf, other]

    cs.CL

    Character-Level Models versus Morphology in Semantic Role Labeling

    Authors: Gözde Gül Şahin, Mark Steedman

    Abstract: Character-level models have become a popular approach specially for their accessibility and ability to handle unseen data. However, little is known on their ability to reveal the underlying morphological structure of a word, which is a crucial skill for high-level semantic analysis tasks, such as semantic role labeling (SRL). In this work, we train various types of SRL models that use word, charac…

    Submitted 30 May, 2018; originally announced May 2018.

    Comments: Accepted for publication at the 56th Annual Meeting of the Association for Computational Linguistics (ACL 2018)