
Showing 1–11 of 11 results for author: Rikters, M

Searching in archive cs.
  1. arXiv:2304.05041

    cs.SI cs.AI cs.CL

    What Food Do We Tweet about on a Rainy Day?

    Authors: Maija Kāle, Matīss Rikters

    Abstract: Food choice is a complex phenomenon shaped by factors such as taste, ambience, culture or weather. In this paper, we explore food-related tweeting in different weather conditions. We inspect a Latvian food tweet dataset spanning the past decade in conjunction with a weather observation dataset consisting of average temperature, precipitation, and other phenomena. We find which weather conditions l…

    Submitted 11 April, 2023; originally announced April 2023.

    Journal ref: Published in the proceedings of The 29th Annual Conference of the Association for Natural Language Processing (NLP2023)

  2. How Masterly Are People at Playing with Their Vocabulary? Analysis of the Wordle Game for Latvian

    Authors: Matīss Rikters, Sanita Reinsone

    Abstract: In this paper, we describe the adaptation of a simple word-guessing game that occupied the hearts and minds of people around the world. There are versions for all three Baltic countries, and even several versions for each. We specifically pay attention to the Latvian version and look into how people form their guesses given any already uncovered hints. The paper analyses guess patterns, easy and difficu…

    Submitted 4 October, 2022; originally announced October 2022.

    Journal ref: In Proceedings of the 10th Conference Human Language Technologies - The Baltic Perspective (Baltic HLT 2022)

  3. arXiv:2109.02995

    cs.CL

    Revisiting Context Choices for Context-aware Machine Translation

    Authors: Matīss Rikters, Toshiaki Nakazawa

    Abstract: One of the most popular methods for context-aware machine translation (MT) is to use separate encoders for the source sentence and context as multiple sources for one target sentence. Recent work has cast doubt on whether these models actually learn useful signals from the context, or whether the improvements in automatic evaluation metrics are just a side effect. We show that multi-source transformer models i…

    Submitted 7 September, 2021; originally announced September 2021.

    Journal ref: Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

  4. arXiv:2106.04903

    cs.CL

    Fragmented and Valuable: Following Sentiment Changes in Food Tweets

    Authors: Maija Kāle, Matīss Rikters

    Abstract: We analysed sentiment and frequencies related to smell, taste and temperature expressed by food tweets in the Latvian language. To get a better understanding of the role of smell, taste and temperature in the mental map of food associations, we looked at such categories as 'tasty' and 'healthy', which turned out to be mutually exclusive. By analysing the occurrence frequency of words associated wi…

    Submitted 9 June, 2021; originally announced June 2021.

    Journal ref: Published in Smell, Taste, and Temperature Interfaces CHI 2021 workshop

  5. arXiv:2012.06143

    cs.CL

    Document-aligned Japanese-English Conversation Parallel Corpus

    Authors: Matīss Rikters, Ryokan Ri, Tong Li, Toshiaki Nakazawa

    Abstract: Sentence-level (SL) machine translation (MT) has reached acceptable quality for many high-resourced languages, but document-level (DL) MT has not, as it is difficult 1) to train with little DL data and 2) to evaluate, since the main methods and data sets focus on SL evaluation. To address the first issue, we present a document-aligned Japanese-English conversation corpus, including balanced, high…

    Submitted 11 December, 2020; originally announced December 2020.

    Comments: Published in proceedings of the Fifth Conference on Machine Translation, 2020

    Journal ref: Proceedings of the Fifth Conference on Machine Translation (2020), pages 637-643

  6. arXiv:2008.01940

    cs.CL

    Designing the Business Conversation Corpus

    Authors: Matīss Rikters, Ryokan Ri, Tong Li, Toshiaki Nakazawa

    Abstract: While machine translation of written text has progressed considerably in the past several years thanks to the increasing availability of parallel corpora and corpus-based training technologies, automatic translation of spoken text and dialogues remains challenging even for modern systems. In this paper, we aim to boost the machine translation quality of conversational texts by introducing a newl…

    Submitted 5 August, 2020; originally announced August 2020.

    Journal ref: Published in proceedings of the 6th Workshop on Asian Translation, 2019

  7. arXiv:2007.05194

    cs.CL

    What Can We Learn From Almost a Decade of Food Tweets

    Authors: Uga Sproģis, Matīss Rikters

    Abstract: We present the Latvian Twitter Eater Corpus, a set of tweets in the narrow domain of food, drinks, eating and drinking. The corpus has been collected over a time span of more than 8 years and includes over 2 million tweets accompanied by additional useful data. We also separate two sub-corpora of question and answer tweets and sentiment-annotated tweets. We analyse contents of the corpus and demo…

    Submitted 1 September, 2020; v1 submitted 10 July, 2020; originally announced July 2020.

    Journal ref: In Proceedings of the 9th Conference Human Language Technologies - The Baltic Perspective (Baltic HLT 2020)

  8. Impact of Corpora Quality on Neural Machine Translation

    Authors: Matīss Rikters

    Abstract: Large parallel corpora that are automatically obtained from the web, documents or elsewhere often exhibit many corrupted parts that are bound to negatively affect the quality of the systems and models that learn from these corpora. This paper describes frequent problems found in data and how such data affects neural machine translation systems, as well as how to identify and deal with them. The soluti…

    Submitted 19 October, 2018; originally announced October 2018.

    Journal ref: Published in the proceedings of the 8th International Baltic Human Language Technologies Conference (Baltic HLT 2018), held in Tartu, Estonia, on 27-29 September 2018

  9. arXiv:1808.02733

    cs.CL

    Debugging Neural Machine Translations

    Authors: Matīss Rikters

    Abstract: In this paper, we describe a tool for debugging the output and attention weights of neural machine translation (NMT) systems and for improved estimation of confidence in the output based on the attention. The purpose of the tool is to help researchers and developers find weak and faulty example translations that their NMT systems produce without the need for reference translations. Our tool al…

    Submitted 8 August, 2018; originally announced August 2018.

    Journal ref: Baltic DB&IS 2018 Joint Proceedings of the Conference Forum, Trakai, Lithuania, 2018

  10. arXiv:1710.06313

    cs.CL

    Paying Attention to Multi-Word Expressions in Neural Machine Translation

    Authors: Matīss Rikters, Ondřej Bojar

    Abstract: Processing of multi-word expressions (MWEs) is a known problem for any natural language processing task. Even neural machine translation (NMT) struggles to overcome it. This paper presents results of experiments investigating NMT attention allocation to the MWEs and improving automated translation of sentences that contain MWEs in English->Latvian and English->Czech NMT systems. Two improvement…

    Submitted 4 May, 2019; v1 submitted 17 October, 2017; originally announced October 2017.

    Journal ref: Published in Machine Translation Summit XVI, Nagoya, Japan, September 2017

  11. arXiv:1710.03743

    cs.CL

    Confidence through Attention

    Authors: Matīss Rikters, Mark Fishel

    Abstract: Attention distributions of the generated translations are a useful by-product of attention-based recurrent neural network translation models and can be treated as soft alignments between the input and output tokens. In this work, we use attention distributions as a confidence metric for output translations. We present two strategies of using the attention distributions: filtering out bad translati…

    Submitted 10 October, 2017; originally announced October 2017.

    Journal ref: Machine Translation Summit XVI, Nagoya, Japan, September 2017