Showing 1–8 of 8 results for author: Tsai, R T

  1. arXiv:2403.12024

    cs.CL

    Enhancing Taiwanese Hokkien Dual Translation by Exploring and Standardizing of Four Writing Systems

    Authors: Bo-Han Lu, Yi-Hsuan Lin, En-Shiun Annie Lee, Richard Tzong-Han Tsai

    Abstract: Machine translation focuses mainly on high-resource languages (HRLs), while low-resource languages (LRLs) like Taiwanese Hokkien are relatively under-explored. The study aims to address this gap by developing a dual translation model between Taiwanese Hokkien and both Traditional Mandarin Chinese and English. We employ a pre-trained LLaMA 2-7B model specialized in Traditional Mandarin Chinese to l…

    Submitted 14 May, 2024; v1 submitted 18 March, 2024; originally announced March 2024.

    Comments: Accepted by LREC-COLING 2024 as a long oral paper

  2. arXiv:2402.01685

    cs.CL cs.AI cs.DB

    SMUTF: Schema Matching Using Generative Tags and Hybrid Features

    Authors: Yu Zhang, Mei Di, Haozheng Luo, Chenwei Xu, Richard Tzong-Han Tsai

    Abstract: We introduce SMUTF, a unique approach for large-scale tabular data schema matching (SM), which assumes that supervised learning does not affect performance in open-domain tasks, thereby enabling effective cross-domain matching. This system uniquely combines rule-based feature engineering, pre-trained language models, and generative large language models. In an innovative adaptation inspired by the…

    Submitted 6 February, 2024; v1 submitted 22 January, 2024; originally announced February 2024.

  3. arXiv:2310.04799

    cs.CL

    Chat Vector: A Simple Approach to Equip LLMs with Instruction Following and Model Alignment in New Languages

    Authors: Shih-Cheng Huang, Pin-Zu Li, Yu-Chi Hsu, Kuang-Ming Chen, Yu Tung Lin, Shih-Kai Hsiao, Richard Tzong-Han Tsai, Hung-yi Lee

    Abstract: Recently, the development of open-source large language models (LLMs) has advanced rapidly. Nevertheless, due to data constraints, the capabilities of most open-source LLMs are primarily focused on English. To address this issue, we introduce the concept of $\textit{chat vector}$ to equip pre-trained language models with instruction following and human value alignment via simple model arithmetic.…

    Submitted 7 June, 2024; v1 submitted 7 October, 2023; originally announced October 2023.

    Comments: ACL 2024 camera-ready version

  4. arXiv:2308.15118

    cs.CL

    Large Language Models on the Chessboard: A Study on ChatGPT's Formal Language Comprehension and Complex Reasoning Skills

    Authors: Mu-Tien Kuo, Chih-Chung Hsueh, Richard Tzong-Han Tsai

    Abstract: While large language models have made strides in natural language processing, their proficiency in complex reasoning tasks requiring formal language comprehension, such as chess, remains less investigated. This paper probes the performance of ChatGPT, a sophisticated language model by OpenAI in tackling such complex reasoning tasks, using chess as a case study. Through robust metrics examining bot…

    Submitted 29 August, 2023; originally announced August 2023.

  5. arXiv:2301.08937

    cs.CL cs.AI

    Exploring Methods for Building Dialects-Mandarin Code-Mixing Corpora: A Case Study in Taiwanese Hokkien

    Authors: Sin-En Lu, Bo-Han Lu, Chao-Yi Lu, Richard Tzong-Han Tsai

    Abstract: In natural language processing (NLP), code-mixing (CM) is a challenging task, especially when the mixed languages include dialects. In Southeast Asian countries such as Singapore, Indonesia, and Malaysia, Hokkien-Mandarin is the most widespread code-mixed language pair among Chinese immigrants, and it is also common in Taiwan. However, dialects such as Hokkien often have a scarcity of resources an…

    Submitted 21 January, 2023; originally announced January 2023.

    Comments: Accepted to Findings of EMNLP 2022

  6. arXiv:2206.07860

    cs.SD cs.LG eess.AS

    EPG2S: Speech Generation and Speech Enhancement based on Electropalatography and Audio Signals using Multimodal Learning

    Authors: Li-Chin Chen, Po-Hsun Chen, Richard Tzong-Han Tsai, Yu Tsao

    Abstract: Speech generation and enhancement based on articulatory movements facilitate communication when the scope of verbal communication is absent, e.g., in patients who have lost the ability to speak. Although various techniques have been proposed to this end, electropalatography (EPG), which is a monitoring technique that records contact between the tongue and hard palate during speech, has not been ad…

    Submitted 15 June, 2022; originally announced June 2022.

    Comments: Accepted by IEEE Signal Processing Letters

    Journal ref: IEEE Signal Processing Letters, vol. 29, p. 2582-2586, 2022

  7. Revised JNLPBA Corpus: A Revised Version of Biomedical NER Corpus for Relation Extraction Task

    Authors: Ming-Siang Huang, Po-Ting Lai, Richard Tzong-Han Tsai, Wen-Lian Hsu

    Abstract: The advancement of biomedical named entity recognition (BNER) and biomedical relation extraction (BRE) researches promotes the development of text mining in biological domains. As a cornerstone of BRE, robust BNER system is required to identify the mentioned NEs in plain texts for further relation extraction stage. However, the current BNER corpora, which play important roles in these tasks, paid…

    Submitted 29 January, 2019; originally announced January 2019.

    Comments: 17 pages

    Journal ref: Briefings in Bioinformatics, 2020, bbaa054

  8. Textual Analysis for Studying Chinese Historical Documents and Literary Novels

    Authors: Chao-Lin Liu, Guan-Tao Jin, Hongsu Wang, Qing-Feng Liu, Wen-Huei Cheng, Wei-Yun Chiu, Richard Tzong-Han Tsai, Yu-Chun Wang

    Abstract: We analyzed historical and literary documents in Chinese to gain insights into research issues, and overview our studies which utilized four different sources of text materials in this paper. We investigated the history of concepts and transliterated words in China with the Database for the Study of Modern China Thought and Literature, which contains historical documents about China between 1830 a…

    Submitted 11 October, 2015; originally announced October 2015.

    Comments: 11 pages, 7 figures, 2 tables, The Fourth ASE International Conference on Social Informatics