Showing 1–8 of 8 results for author: Yada, S

  1. arXiv:2407.03963

    cs.CL cs.AI

    LLM-jp: A Cross-organizational Project for the Research and Development of Fully Open Japanese LLMs

    Authors: LLM-jp, Akiko Aizawa, Eiji Aramaki, Bowen Chen, Fei Cheng, Hiroyuki Deguchi, Rintaro Enomoto, Kazuki Fujii, Kensuke Fukumoto, Takuya Fukushima, Namgi Han, Yuto Harada, Chikara Hashimoto, Tatsuya Hiraoka, Shohei Hisada, Sosuke Hosokawa, Lu Jie, Keisuke Kamata, Teruhito Kanazawa, Hiroki Kanezashi, Hiroshi Kataoka, Satoru Katsumata, Daisuke Kawahara, Seiya Kawano, et al. (57 additional authors not shown)

    Abstract: This paper introduces LLM-jp, a cross-organizational project for the research and development of Japanese large language models (LLMs). LLM-jp aims to develop open-source and strong Japanese LLMs, and as of this writing, more than 1,500 participants from academia and industry are working together for this purpose. This paper presents the background of the establishment of LLM-jp, summaries of its…

    Submitted 4 July, 2024; originally announced July 2024.

  2. arXiv:2403.18336

    cs.CL cs.LG

    A Dataset for Pharmacovigilance in German, French, and Japanese: Annotating Adverse Drug Reactions across Languages

    Authors: Lisa Raithel, Hui-Syuan Yeh, Shuntaro Yada, Cyril Grouin, Thomas Lavergne, Aurélie Névéol, Patrick Paroubek, Philippe Thomas, Tomohiro Nishiyama, Sebastian Möller, Eiji Aramaki, Yuji Matsumoto, Roland Roller, Pierre Zweigenbaum

    Abstract: User-generated data sources have gained significance in uncovering Adverse Drug Reactions (ADRs), with an increasing number of discussions occurring in the digital world. However, the existing clinical corpora predominantly revolve around scientific articles in English. This work presents a multilingual corpus of texts concerning ADRs gathered from diverse sources, including patient fora, social m…

    Submitted 27 March, 2024; originally announced March 2024.

    Comments: Accepted at LREC-COLING 2024

  3. arXiv:2306.14379

    cs.HC

    HeaRT: Health Record Timeliner to visualise patients' medical history from health record text

    Authors: Shuntaro Yada, Eiji Aramaki

    Abstract: Electronic health records (EHRs), which contain patients' medical histories, tend to be written in freely formatted (unstructured) text because they are complicated by their nature. Quickly understanding a patient's history is challenging and critical because writing styles vary among doctors, which may even cause clinical incidents. This paper proposes a Health Record Timeliner system (HeaRT), wh…

    Submitted 25 June, 2023; originally announced June 2023.

    Comments: Full evaluation results at: https://github.com/shuntaroy/heart-evaluation

  4. arXiv:2111.04261

    cs.CL cs.AI

    JaMIE: A Pipeline Japanese Medical Information Extraction System

    Authors: Fei Cheng, Shuntaro Yada, Ribeka Tanaka, Eiji Aramaki, Sadao Kurohashi

    Abstract: We present an open-access natural language processing toolkit for Japanese medical information extraction. We first propose a novel relation annotation schema for investigating the medical and temporal relations between medical entities in Japanese medical reports. We experiment with the practical annotation scenarios by separately annotating two different types of reports. We design a pipeline sy…

    Submitted 7 November, 2021; originally announced November 2021.

    Comments: 8 pages

  5. arXiv:2104.10493

    cs.CL

    End-to-end Biomedical Entity Linking with Span-based Dictionary Matching

    Authors: Shogo Ujiie, Hayate Iso, Shuntaro Yada, Shoko Wakamiya, Eiji Aramaki

    Abstract: Disease name recognition and normalization, which is generally called biomedical entity linking, is a fundamental process in biomedical text mining. Recently, neural joint learning of both tasks has been proposed to utilize the mutual benefits. While this approach achieves high performance, disease concepts that do not appear in the training dataset cannot be accurately predicted. This study intro…

    Submitted 21 April, 2021; originally announced April 2021.

  6. arXiv:2101.00036

    cs.CL

    KART: Parameterization of Privacy Leakage Scenarios from Pre-trained Language Models

    Authors: Yuta Nakamura, Shouhei Hanaoka, Yukihiro Nomura, Naoto Hayashi, Osamu Abe, Shuntaro Yada, Shoko Wakamiya, Eiji Aramaki

    Abstract: For the safe sharing of pre-trained language models, no guidelines exist at present owing to the difficulty in estimating the upper bound of the risk of privacy leakage. One problem is that previous studies have assessed the risk for different real-world privacy leakage scenarios and attack methods, which reduces the portability of the findings. To tackle this problem, we represent complex real-world…

    Submitted 17 March, 2022; v1 submitted 31 December, 2020; originally announced January 2021.

  7. Syndromic surveillance using search query logs and user location information from smartphones against COVID-19 clusters in Japan

    Authors: Shohei Hisada, Taichi Murayama, Kota Tsubouchi, Sumio Fujita, Shuntaro Yada, Shoko Wakamiya, Eiji Aramaki

    Abstract: [Background] Two clusters of coronavirus disease 2019 (COVID-19) were confirmed in Hokkaido, Japan in February 2020. To capture the clusters, this study employs Web search query logs and user location information from smartphones. [Material and Methods] First, we anonymously identified smartphone users who used a Web search engine (Yahoo! JAPAN Search) for COVID-19 or its symptoms via its comp…

    Submitted 21 April, 2020; originally announced April 2020.

  8. arXiv:2004.08145

    cs.SI cs.IR

    NAIST COVID: Multilingual COVID-19 Twitter and Weibo Dataset

    Authors: Zhiwei Gao, Shuntaro Yada, Shoko Wakamiya, Eiji Aramaki

    Abstract: Since the outbreak of coronavirus disease 2019 (COVID-19) in late 2019, it has affected over 200 countries and billions of people worldwide. This has affected the social life of people owing to enforced measures such as "social distancing" and "stay at home." This has resulted in an increasing interaction through social media. Given that social media can bring us valuable information about COVID-1…

    Submitted 17 April, 2020; originally announced April 2020.