
Showing 1–20 of 20 results for author: Cho, W I

  1. arXiv:2404.11539  [pdf, other]

    cs.CL

    Evaluating Span Extraction in Generative Paradigm: A Reflection on Aspect-Based Sentiment Analysis

    Authors: Soyoung Yang, Won Ik Cho

    Abstract: In the era of rapid evolution of generative language models within the realm of natural language processing, there is an imperative call to revisit and reformulate evaluation methodologies, especially in the domain of aspect-based sentiment analysis (ABSA). This paper addresses the emerging challenges introduced by the generative paradigm, which has moderately blurred traditional boundaries betwee…

    Submitted 17 April, 2024; originally announced April 2024.

    Comments: 10 pages

  2. arXiv:2404.09041  [pdf, other]

    cs.CY

    Three Disclaimers for Safe Disclosure: A Cardwriter for Reporting the Use of Generative AI in Writing Process

    Authors: Won Ik Cho, Eunjung Cho, Hyeonji Shin

    Abstract: Generative artificial intelligence (AI) and large language models (LLMs) are increasingly being used in the academic writing process. This is despite the current lack of unified framework for reporting the use of machine assistance. In this work, we propose "Cardwriter", an intuitive interface that produces a short report for authors to declare their use of generative AI in their writing process.…

    Submitted 13 April, 2024; originally announced April 2024.

    Comments: 6 pages; an implementation version of PaperCard project

  3. arXiv:2310.04824  [pdf, other]

    cs.CY

    PaperCard for Reporting Machine Assistance in Academic Writing

    Authors: Won Ik Cho, Eunjung Cho, Kyunghyun Cho

    Abstract: Academic writing process has benefited from various technological developments over the years including search engines, automatic translators, and editing tools that review grammar and spelling mistakes. They have enabled human writers to become more efficient in writing academic papers, for example by helping with finding relevant literature more effectively and polishing texts. While these devel…

    Submitted 7 October, 2023; originally announced October 2023.

    Comments: Accepted at EAAMO'23 as a poster presentation

  4. arXiv:2304.00350  [pdf, other]

    cs.CL

    When Crowd Meets Persona: Creating a Large-Scale Open-Domain Persona Dialogue Corpus

    Authors: Won Ik Cho, Yoon Kyung Lee, Seoyeon Bae, Jihwan Kim, Sangah Park, Moosung Kim, Sowon Hahn, Nam Soo Kim

    Abstract: Building a natural language dataset requires caution since word semantics is vulnerable to subtle text change or the definition of the annotated concept. Such a tendency can be seen in generative tasks like question-answering and dialogue generation and also in tasks that create a categorization-based corpus, like topic classification or sentiment analysis. Open-domain conversations involve two or…

    Submitted 1 April, 2023; originally announced April 2023.

    Comments: Presented at HCOMP 2022 as Works-in-Progress

  5. arXiv:2204.02633  [pdf]

    cs.CL cs.AI

    DAGAM: Data Augmentation with Generation And Modification

    Authors: Byeong-Cheol Jo, Tak-Sung Heo, Yeongjoon Park, Yongmin Yoo, Won Ik Cho, Kyungsun Kim

    Abstract: Text classification is a representative downstream task of natural language processing, and has exhibited excellent performance since the advent of pre-trained language models based on Transformer architecture. However, in pre-trained language models, under-fitting often occurs due to the size of the model being very large compared to the amount of available training data. Along with significant i…

    Submitted 6 April, 2022; originally announced April 2022.

  6. arXiv:2202.12459  [pdf, other]

    cs.CL

    APEACH: Attacking Pejorative Expressions with Analysis on Crowd-Generated Hate Speech Evaluation Datasets

    Authors: Kichang Yang, Wonjun Jang, Won Ik Cho

    Abstract: In hate speech detection, developing training and evaluation datasets across various domains is the critical issue. Whereas, major approaches crawl social media texts and hire crowd-workers to annotate the data. Following this convention often restricts the scope of pejorative expressions to a single domain lacking generalization. Sometimes domain overlap between training corpus and evaluation set…

    Submitted 25 October, 2022; v1 submitted 24 February, 2022; originally announced February 2022.

    Comments: Findings of EMNLP 2022

  7. arXiv:2107.02875  [pdf, other]

    cs.CL

    Kosp2e: Korean Speech to English Translation Corpus

    Authors: Won Ik Cho, Seok Min Kim, Hyunchang Cho, Nam Soo Kim

    Abstract: Most speech-to-text (S2T) translation studies use English speech as a source, which makes it difficult for non-English speakers to take advantage of the S2T technologies. For some languages, this problem was tackled through corpus construction, but the farther linguistically from English or the more under-resourced, this deficiency and underrepresentedness becomes more significant. In this paper,…

    Submitted 6 July, 2021; originally announced July 2021.

    Comments: Interspeech 2021 Camera-ready

  8. arXiv:2105.09680  [pdf, other]

    cs.CL

    KLUE: Korean Language Understanding Evaluation

    Authors: Sungjoon Park, Jihyung Moon, Sungdong Kim, Won Ik Cho, Jiyoon Han, Jangwon Park, Chisung Song, Junseong Kim, Yongsook Song, Taehwan Oh, Joohong Lee, Juhyun Oh, Sungwon Lyu, Younghoon Jeong, Inkwon Lee, Sangwoo Seo, Dongjun Lee, Hyunwoo Kim, Myeonghwa Lee, Seongbo Jang, Seungwon Do, Sunkyoung Kim, Kyungtae Lim, Jongwon Lee, Kyumin Park, et al. (6 additional authors not shown)

    Abstract: We introduce Korean Language Understanding Evaluation (KLUE) benchmark. KLUE is a collection of 8 Korean natural language understanding (NLU) tasks, including Topic Classification, Semantic Textual Similarity, Natural Language Inference, Named Entity Recognition, Relation Extraction, Dependency Parsing, Machine Reading Comprehension, and Dialogue State Tracking. We build all of the tasks from scrat…

    Submitted 2 November, 2021; v1 submitted 20 May, 2021; originally announced May 2021.

    Comments: 76 pages, 10 figures, 36 tables

  9. arXiv:2103.13439  [pdf, other]

    cs.CL

    StyleKQC: A Style-Variant Paraphrase Corpus for Korean Questions and Commands

    Authors: Won Ik Cho, Sangwhan Moon, Jong In Kim, Seok Min Kim, Nam Soo Kim

    Abstract: Paraphrasing is often performed with less concern for controlled style conversion. Especially for questions and commands, style-variant paraphrasing can be crucial in tone and manner, which also matters with industrial applications such as dialog systems. In this paper, we attack this issue with a corpus construction scheme that simultaneously considers the core content and style of directives, na…

    Submitted 27 April, 2022; v1 submitted 24 March, 2021; originally announced March 2021.

    Comments: LREC 2022 Camera-ready

  10. Open Korean Corpora: A Practical Report

    Authors: Won Ik Cho, Sangwhan Moon, Youngsook Song

    Abstract: Korean is often referred to as a low-resource language in the research community. While this claim is partially true, it is also because the availability of resources is inadequately advertised and curated. This work curates and reviews a list of Korean corpora, first describing institution-level resource development, then further iterate through a list of current open datasets for different types…

    Submitted 16 May, 2023; v1 submitted 31 December, 2020; originally announced December 2020.

    Comments: Published in NLP-OSS @EMNLP2020; May 2023 version added with new datasets

  11. TutorNet: Towards Flexible Knowledge Distillation for End-to-End Speech Recognition

    Authors: Ji Won Yoon, Hyeonseung Lee, Hyung Yong Kim, Won Ik Cho, Nam Soo Kim

    Abstract: In recent years, there has been a great deal of research in developing end-to-end speech recognition models, which enable simplifying the traditional pipeline and achieving promising results. Despite their remarkable performance improvements, end-to-end models typically require expensive computational cost to show successful performance. To reduce this computational burden, knowledge distillation…

    Submitted 16 September, 2021; v1 submitted 3 August, 2020; originally announced August 2020.

    Comments: Accepted by IEEE/ACM Transactions on Audio, Speech and Language Processing

  12. arXiv:2005.12503  [pdf, other]

    cs.CL cs.CY

    BEEP! Korean Corpus of Online News Comments for Toxic Speech Detection

    Authors: Jihyung Moon, Won Ik Cho, Junbum Lee

    Abstract: Toxic comments in online platforms are an unavoidable social issue under the cloak of anonymity. Hate speech detection has been actively done for languages such as English, German, or Italian, where manually labeled corpus has been released. In this work, we first present 9.4K manually labeled entertainment news comments for identifying Korean toxic speech, collected from a widely used online news…

    Submitted 25 May, 2020; originally announced May 2020.

    Comments: To be published in SocialNLP@ACL 2020

  13. arXiv:2005.08213  [pdf, other]

    cs.CL cs.SD eess.AS

    Speech to Text Adaptation: Towards an Efficient Cross-Modal Distillation

    Authors: Won Ik Cho, Donghyun Kwak, Ji Won Yoon, Nam Soo Kim

    Abstract: Speech is one of the most effective means of communication and is full of information that helps the transmission of utterer's thoughts. However, mainly due to the cumbersome processing of acoustic features, phoneme or word posterior probability has frequently been discarded in understanding the natural language. Thus, some recent spoken language understanding (SLU) modules have utilized end-to-en…

    Submitted 8 August, 2020; v1 submitted 17 May, 2020; originally announced May 2020.

    Comments: Interspeech 2020 Camera-ready

  14. arXiv:1912.00342  [pdf, other]

    cs.CL

    Machines Getting with the Program: Understanding Intent Arguments of Non-Canonical Directives

    Authors: Won Ik Cho, Young Ki Moon, Sangwhan Moon, Seok Min Kim, Nam Soo Kim

    Abstract: Modern dialog managers face the challenge of having to fulfill human-level conversational skills as part of common user expectations, including but not limited to discourse with no clear objective. Along with these requirements, agents are expected to extrapolate intent from the user's dialogue even when subjected to non-canonical forms of speech. This depends on the agent's comprehension of parap…

    Submitted 7 October, 2020; v1 submitted 1 December, 2019; originally announced December 2019.

    Comments: Findings of ACL: EMNLP 2020

  15. arXiv:1910.09275  [pdf, other]

    cs.CL eess.AS

    Text Matters but Speech Influences: A Computational Analysis of Syntactic Ambiguity Resolution

    Authors: Won Ik Cho, Jeonghwa Cho, Woo Hyun Kang, Nam Soo Kim

    Abstract: Analyzing how human beings resolve syntactic ambiguity has long been an issue of interest in the field of linguistics. It is, at the same time, one of the most challenging issues for spoken language understanding (SLU) systems as well. As syntactic ambiguity is intertwined with issues regarding prosody and semantics, the computational approach toward speech intention identification is expected to…

    Submitted 21 May, 2020; v1 submitted 21 October, 2019; originally announced October 2019.

    Comments: CogSci 2020 Camera-ready

  16. arXiv:1905.13656  [pdf, ps, other]

    cs.CL

    Investigating an Effective Character-level Embedding in Korean Sentence Classification

    Authors: Won Ik Cho, Seok Min Kim, Nam Soo Kim

    Abstract: Different from the writing systems of many Romance and Germanic languages, some languages or language families show complex conjunct forms in character composition. For such cases where the conjuncts consist of the components representing consonant(s) and vowel, various character encoding schemes can be adopted beyond merely making up a one-hot vector. However, there has been little work done on i…

    Submitted 18 September, 2019; v1 submitted 31 May, 2019; originally announced May 2019.

    Comments: PACLIC 33 Camera-ready

  17. arXiv:1905.11684  [pdf, other]

    cs.CL

    On Measuring Gender Bias in Translation of Gender-neutral Pronouns

    Authors: Won Ik Cho, Ji Won Kim, Seok Min Kim, Nam Soo Kim

    Abstract: Ethics regarding social bias has recently thrown striking issues in natural language processing. Especially for gender-related topics, the need for a system that reduces the model bias has grown in areas such as image captioning, content recommendation, and automated employment. However, detection and evaluation of gender bias in the machine translation systems are not yet thoroughly investigated,…

    Submitted 28 May, 2019; originally announced May 2019.

    Comments: Accepted to 1st ACL Workshop on Gender Bias for Natural Language Processing (GeBNLP 2019)

  18. Speech Intention Understanding in a Head-final Language: A Disambiguation Utilizing Intonation-dependency

    Authors: Won Ik Cho, Hyeon Seung Lee, Ji Won Yoon, Seok Min Kim, Nam Soo Kim

    Abstract: For a large portion of real-life utterances, the intention cannot be solely decided by either their semantic or syntactic characteristics. Although not all the sociolinguistic and pragmatic information can be digitized, at least phonetic features are indispensable in understanding the spoken language. Especially in head-final languages such as Korean, sentence-final prosody has great importance in…

    Submitted 26 June, 2022; v1 submitted 10 November, 2018; originally announced November 2018.

    Comments: 14 pages, 2 figures, 7 tables; Identical to the previous revision. The latest version of this manuscript is recently accepted at ACM TALLIP, with the modified title, authors, and contents (see the DOI below). Please refer to THIS version only when relevant to the analysis with speech data, and refer to the journal version to cite the protocol and dataset

  19. Giving Space to Your Message: Assistive Word Segmentation for the Electronic Typing of Digital Minorities

    Authors: Won Ik Cho, Sung Jun Cheon, Woo Hyun Kang, Ji Won Kim, Nam Soo Kim

    Abstract: For readability and disambiguation of the written text, appropriate word segmentation is recommended for documentation, and it also holds for the digitized texts. If the language is agglutinative while far from scriptio continua, for instance in the Korean language, the problem becomes more significant. However, some device users these days find it challenging to communicate via key stroking, not…

    Submitted 4 May, 2021; v1 submitted 31 October, 2018; originally announced October 2018.

    Comments: DIS 2021 Camera-ready

  20. arXiv:1810.04631  [pdf, ps, other]

    cs.CL

    Extracting Arguments from Korean Question and Command: An Annotated Corpus for Structured Paraphrasing

    Authors: Won Ik Cho, Young Ki Moon, Woo Hyun Kang, Nam Soo Kim

    Abstract: Intention identification is a core issue in dialog management. However, due to the non-canonicality of the spoken language, it is difficult to extract the content automatically from the conversation-style utterances. This is much more challenging for languages like Korean and Japanese since the agglutination between morphemes make it difficult for the machines to parse the sentence and understand…

    Submitted 9 July, 2019; v1 submitted 10 October, 2018; originally announced October 2018.

    Comments: 5 pages and 2 tables, Annotation guideline for Seoul Korean sentences