Showing 1–4 of 4 results for author: Pylypenko, D

  1. arXiv:2308.13170  [pdf, other]

    cs.CL

    Measuring Spurious Correlation in Classification: 'Clever Hans' in Translationese

    Authors: Angana Borah, Daria Pylypenko, Cristina España-Bonet, Josef van Genabith

    Abstract: Recent work has shown evidence of 'Clever Hans' behavior in high-performance neural translationese classifiers, where BERT-based classifiers capitalize on spurious correlations, in particular topic information, between data and target classification labels, rather than genuine translationese signals. Translationese signals are subtle (especially for professional translation) and compete with many…

    Submitted 11 June, 2024; v1 submitted 25 August, 2023; originally announced August 2023.

    Comments: Accepted to RANLP 2023 (oral)

  2. arXiv:2210.13391  [pdf, other]

    cs.CL

    Explaining Translationese: why are Neural Classifiers Better and what do they Learn?

    Authors: Kwabena Amponsah-Kaakyire, Daria Pylypenko, Josef van Genabith, Cristina España-Bonet

    Abstract: Recent work has shown that neural feature- and representation-learning, e.g. BERT, achieves superior performance over traditional manual feature engineering based approaches, with e.g. SVMs, in translationese classification tasks. Previous research did not show (i) whether the difference is because of the features, the classifiers or both, and (ii) what the neural classifiers actually learn. T…

    Submitted 24 October, 2022; originally announced October 2022.

    Comments: 16 pages, 7 figures, 4 tables. The first 2 authors contributed equally. Accepted to BlackboxNLP 2022 (at EMNLP 2022)

  3. arXiv:2109.07604  [pdf, other]

    cs.CL

    Comparing Feature-Engineering and Feature-Learning Approaches for Multilingual Translationese Classification

    Authors: Daria Pylypenko, Kwabena Amponsah-Kaakyire, Koel Dutta Chowdhury, Josef van Genabith, Cristina España-Bonet

    Abstract: Traditional hand-crafted linguistically-informed features have often been used for distinguishing between translated and original non-translated texts. By contrast, to date, neural architectures without manual feature engineering have been less explored for this task. In this work, we (i) compare the traditional feature-engineering-based approach to the feature-learning-based one and (ii) analyse…

    Submitted 15 September, 2021; originally announced September 2021.

    Comments: 9 pages, 5 pages appendix, 2 figures, 7 tables. The first 3 authors contributed equally. Accepted to EMNLP 2021, Main Conference

  4. arXiv:2103.17250  [pdf, other]

    cs.CL cs.LG

    Leveraging Neural Machine Translation for Word Alignment

    Authors: Vilém Zouhar, Daria Pylypenko

    Abstract: The most common tools for word-alignment rely on a large amount of parallel sentences, which are then usually processed according to one of the IBM model algorithms. The training data is, however, the same as for machine translation (MT) systems, especially for neural MT (NMT), which itself is able to produce word-alignments using the trained attention heads. This is convenient because word-alignm…

    Submitted 31 March, 2021; originally announced March 2021.

    Comments: 16 pages (without references). To be published in PBML 116