Showing 1–16 of 16 results for author: Morishita, M

Searching in archive cs.
  1. arXiv:2405.09017  [pdf, other]

    cs.CL

    A Japanese-Chinese Parallel Corpus Using Crowdsourcing for Web Mining

    Authors: Masaaki Nagata, Makoto Morishita, Katsuki Chousa, Norihito Yasuda

    Abstract: Using crowdsourcing, we collected more than 10,000 URL pairs (parallel top page pairs) of bilingual websites that contain parallel documents and created a Japanese-Chinese parallel corpus of 4.6M sentence pairs from these websites. We used a Japanese-Chinese bilingual dictionary of 160K word pairs for document and sentence alignment. We then used high-quality 1.2M Japanese-Chinese sentence pairs t…

    Submitted 14 May, 2024; originally announced May 2024.

    Comments: Work in progress

  2. arXiv:2404.15752  [pdf]

    cs.PF

    Performance Evaluation of CMOS Annealing with Support Vector Machine

    Authors: Ryoga Fukuhara, Makoto Morishita, Takahiro Katagiri, Masatoshi Kawai, Toru Nagai, Tetsuya Hoshino

    Abstract: In this paper, support vector machine (SVM) performance was assessed utilizing a quantum-inspired complementary metal-oxide semiconductor (CMOS) annealer. The primary focus during performance evaluation was the accuracy rate in binary classification problems. A comparative analysis was conducted between SVM running on a CPU (classical computation) and executed on a quantum-inspired annealer. The p…

    Submitted 24 April, 2024; originally announced April 2024.

  3. arXiv:2404.09002  [pdf, other]

    cs.CL

    WikiSplit++: Easy Data Refinement for Split and Rephrase

    Authors: Hayato Tsukagoshi, Tsutomu Hirao, Makoto Morishita, Katsuki Chousa, Ryohei Sasano, Koichi Takeda

    Abstract: The task of Split and Rephrase, which splits a complex sentence into multiple simple sentences with the same meaning, improves readability and enhances the performance of downstream tasks in natural language processing (NLP). However, while Split and Rephrase can be improved using a text-to-text generation approach that applies encoder-decoder models fine-tuned with a large-scale dataset, it still…

    Submitted 13 April, 2024; originally announced April 2024.

    Comments: Accepted at LREC-COLING 2024

  4. arXiv:2402.09344  [pdf, other]

    cs.CL

    Generating Diverse Translation with Perturbed kNN-MT

    Authors: Yuto Nishida, Makoto Morishita, Hidetaka Kamigaito, Taro Watanabe

    Abstract: Generating multiple translation candidates would enable users to choose the one that satisfies their needs. Although there has been work on diversified generation, there exists room for improving the diversity mainly because the previous methods do not address the overcorrection problem -- the model underestimates a prediction that is largely different from the training data, even if that predicti…

    Submitted 14 February, 2024; originally announced February 2024.

    Comments: Accepted to EACL 2024 SRW

  5. arXiv:2311.11690  [pdf, other]

    cs.PL cs.AI cs.CL cs.SE

    Refactoring Programs Using Large Language Models with Few-Shot Examples

    Authors: Atsushi Shirafuji, Yusuke Oda, Jun Suzuki, Makoto Morishita, Yutaka Watanobe

    Abstract: A less complex and more straightforward program is a crucial factor that enhances its maintainability and makes writing secure and bug-free programs easier. However, due to its heavy workload and the risks of breaking the working programs, programmers are reluctant to do code refactoring, and thus, it also causes the loss of potential learning experiences. To mitigate this, we demonstrate the appl…

    Submitted 20 November, 2023; originally announced November 2023.

    Comments: 10 pages, 10 figures, accepted to the 30th Asia-Pacific Software Engineering Conference (APSEC 2023)

  6. Chat Translation Error Detection for Assisting Cross-lingual Communications

    Authors: Yunmeng Li, Jun Suzuki, Makoto Morishita, Kaori Abe, Ryoko Tokuhisa, Ana Brassard, Kentaro Inui

    Abstract: In this paper, we describe the development of a communication support system that detects erroneous translations to facilitate crosslingual communications due to the limitations of current machine chat translation methods. We trained an error detector as the baseline of the system and constructed a new Japanese-English bilingual chat corpus, BPersona-chat, which comprises multiturn colloquial chat…

    Submitted 2 August, 2023; originally announced August 2023.

    Journal ref: Proceedings of the 3rd Workshop on Evaluation and Comparison of NLP Systems, pages 88-95, November 2022, Online. Association for Computational Linguistics

  7. arXiv:2306.14583  [pdf, ps, other]

    cs.CL cs.AI cs.SE

    Exploring the Robustness of Large Language Models for Solving Programming Problems

    Authors: Atsushi Shirafuji, Yutaka Watanobe, Takumi Ito, Makoto Morishita, Yuki Nakamura, Yusuke Oda, Jun Suzuki

    Abstract: Using large language models (LLMs) for source code has recently gained attention. LLMs, such as Transformer-based models like Codex and ChatGPT, have been shown to be highly capable of solving a wide range of programming problems. However, the extent to which LLMs understand problem descriptions and generate programs accordingly or just retrieve source code from the most relevant problem in traini…

    Submitted 26 June, 2023; originally announced June 2023.

  8. arXiv:2210.15861  [pdf, other]

    cs.CL

    Domain Adaptation of Machine Translation with Crowdworkers

    Authors: Makoto Morishita, Jun Suzuki, Masaaki Nagata

    Abstract: Although a machine translation model trained with a large in-domain parallel corpus achieves remarkable results, it still works poorly when no in-domain data are available. This situation restricts the applicability of machine translation when the target domain's data are limited. However, there is great demand for high-quality domain-specific machine translation models for many domains. We propos…

    Submitted 27 October, 2022; originally announced October 2022.

    Comments: Accepted at EMNLP 2022 Industry Track

  9. arXiv:2202.12607  [pdf, ps, other]

    cs.CL

    JParaCrawl v3.0: A Large-scale English-Japanese Parallel Corpus

    Authors: Makoto Morishita, Katsuki Chousa, Jun Suzuki, Masaaki Nagata

    Abstract: Most current machine translation models are mainly trained with parallel corpora, and their translation accuracy largely depends on the quality and quantity of the corpora. Although there are billions of parallel sentences for a few language pairs, effectively dealing with most language pairs is difficult due to a lack of publicly available parallel corpora. This paper creates a large parallel cor…

    Submitted 28 February, 2022; v1 submitted 25 February, 2022; originally announced February 2022.

    Comments: 7 pages

  10. arXiv:2106.05450  [pdf, other]

    cs.CL

    Input Augmentation Improves Constrained Beam Search for Neural Machine Translation: NTT at WAT 2021

    Authors: Katsuki Chousa, Makoto Morishita

    Abstract: This paper describes our systems that were submitted to the restricted translation task at WAT 2021. In this task, the systems are required to output translated sentences that contain all given word constraints. Our system combined input augmentation and constrained beam search algorithms. Through experiments, we found that this combination significantly improves translation accuracy and can save…

    Submitted 9 June, 2021; originally announced June 2021.

    Comments: 9 pages, 4 figures, WAT 2021 Restricted Translation Task

  11. arXiv:2011.02121  [pdf, other]

    cs.CL

    PheMT: A Phenomenon-wise Dataset for Machine Translation Robustness on User-Generated Contents

    Authors: Ryo Fujii, Masato Mita, Kaori Abe, Kazuaki Hanawa, Makoto Morishita, Jun Suzuki, Kentaro Inui

    Abstract: Neural Machine Translation (NMT) has shown drastic improvement in its quality when translating clean input, such as text from the news domain. However, existing studies suggest that NMT still struggles with certain kinds of input with considerable noise, such as User-Generated Contents (UGC) on the Internet. To make better use of NMT for cross-cultural communication, one of the most promising dire…

    Submitted 3 November, 2020; originally announced November 2020.

    Comments: 15 pages, 4 figures, accepted at COLING 2020

  12. arXiv:2003.10784  [pdf, other]

    cs.NI cs.LG stat.AP stat.ML

    Recovery command generation towards automatic recovery in ICT systems by Seq2Seq learning

    Authors: Hiroki Ikeuchi, Akio Watanabe, Tsutomu Hirao, Makoto Morishita, Masaaki Nishino, Yoichi Matsuo, Keishiro Watanabe

    Abstract: With the increase in scale and complexity of ICT systems, their operation increasingly requires automatic recovery from failures. Although it has become possible to automatically detect anomalies and analyze root causes of failures with current methods, making decisions on what commands should be executed to recover from failures still depends on manual operation, which is quite time-consuming. To…

    Submitted 24 March, 2020; originally announced March 2020.

    Comments: accepted for IEEE/IFIP Network Operations and Management Symposium 2020 (NOMS2020)

  13. arXiv:1911.10668  [pdf, other]

    cs.CL

    JParaCrawl: A Large Scale Web-Based English-Japanese Parallel Corpus

    Authors: Makoto Morishita, Jun Suzuki, Masaaki Nagata

    Abstract: Recent machine translation algorithms mainly rely on parallel corpora. However, since the availability of parallel corpora remains limited, only some resource-rich language pairs can benefit from them. We constructed a parallel corpus for English-Japanese, for which the amount of publicly available parallel corpora is still limited. We constructed the parallel corpus by broadly crawling the web an…

    Submitted 15 March, 2020; v1 submitted 24 November, 2019; originally announced November 2019.

    Comments: http://www.kecl.ntt.co.jp/icl/lirg/jparacrawl/ LREC 2020, Camera Ready

  14. arXiv:1907.03927  [pdf, other]

    cs.CL

    NTT's Machine Translation Systems for WMT19 Robustness Task

    Authors: Soichiro Murakami, Makoto Morishita, Tsutomu Hirao, Masaaki Nagata

    Abstract: This paper describes NTT's submission to the WMT19 robustness task. This task mainly focuses on translating noisy text (e.g., posts on Twitter), which presents different difficulties from typical translation tasks such as news. Our submission combined techniques including utilization of a synthetic corpus, domain adaptation, and a placeholder mechanism, which significantly improved over the previo…

    Submitted 8 July, 2019; originally announced July 2019.

    Comments: submitted to WMT 2019

  15. arXiv:1706.05765  [pdf, other]

    cs.CL

    An Empirical Study of Mini-Batch Creation Strategies for Neural Machine Translation

    Authors: Makoto Morishita, Yusuke Oda, Graham Neubig, Koichiro Yoshino, Katsuhito Sudoh, Satoshi Nakamura

    Abstract: Training of neural machine translation (NMT) models usually uses mini-batches for efficiency purposes. During the mini-batched training process, it is necessary to pad shorter sentences in a mini-batch to be equal in length to the longest sentence therein for efficient computation. Previous work has noted that sorting the corpus based on the sentence length before making mini-batches reduces the a…

    Submitted 18 June, 2017; originally announced June 2017.

    Comments: 8 pages, accepted to the First Workshop on Neural Machine Translation

  16. arXiv:1510.05203  [pdf, other]

    cs.CL

    Neural Reranking Improves Subjective Quality of Machine Translation: NAIST at WAT2015

    Authors: Graham Neubig, Makoto Morishita, Satoshi Nakamura

    Abstract: This year, the Nara Institute of Science and Technology (NAIST)'s submission to the 2015 Workshop on Asian Translation was based on syntax-based statistical machine translation, with the addition of a reranking component using neural attentional machine translation models. Experiments re-confirmed results from previous work stating that neural MT reranking provides a large gain in objective evalua…

    Submitted 18 October, 2015; originally announced October 2015.

    Comments: 7 pages, 1 figure

    Journal ref: Proceedings of the 2nd Workshop on Asian Translation (WAT), pp. 35-41, 2015