Showing 1–18 of 18 results for author: Kim, Z

Searching in archive cs.
  1. arXiv:2406.18675  [pdf, other]

    cs.HC cs.AI cs.CL

    Human-AI Collaborative Taxonomy Construction: A Case Study in Profession-Specific Writing Assistants

    Authors: Minhwa Lee, Zae Myung Kim, Vivek A. Khetan, Dongyeop Kang

    Abstract: Large Language Models (LLMs) have assisted humans in several writing tasks, including text revision and story generation. However, their effectiveness in supporting domain-specific writing, particularly in business contexts, is relatively less explored. Our formative study with industry professionals revealed the limitations in current LLMs' understanding of the nuances in such domain-specific wri…

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: Accepted to CHI 2024 In2Writing Workshop

  2. arXiv:2403.01078  [pdf, other]

    cs.LG cs.AI physics.bio-ph q-bio.GN

    $Γ$-VAE: Curvature regularized variational autoencoders for uncovering emergent low dimensional geometric structure in high dimensional data

    Authors: Jason Z. Kim, Nicolas Perrin-Gilbert, Erkan Narmanli, Paul Klein, Christopher R. Myers, Itai Cohen, Joshua J. Waterfall, James P. Sethna

    Abstract: Natural systems with emergent behaviors often organize along low-dimensional subsets of high-dimensional spaces. For example, despite the tens of thousands of genes in the human genome, the principled study of genomics is fruitful because biological processes rely on coordinated organization that results in lower dimensional phenotypes. To uncover this organization, many nonlinear dimensionality r…

    Submitted 1 March, 2024; originally announced March 2024.

    Comments: 8 pages, 4 figures

  3. arXiv:2402.10586  [pdf, other]

    cs.CL cs.AI

    Threads of Subtlety: Detecting Machine-Generated Texts Through Discourse Motifs

    Authors: Zae Myung Kim, Kwang Hee Lee, Preston Zhu, Vipul Raheja, Dongyeop Kang

    Abstract: With the advent of large language models (LLM), the line between human-crafted and machine-generated texts has become increasingly blurred. This paper delves into the inquiry of identifying discernible and unique linguistic properties in texts that were written by humans, particularly uncovering the underlying discourse structures of texts beyond their surface structures. Introducing a novel metho…

    Submitted 6 June, 2024; v1 submitted 16 February, 2024; originally announced February 2024.

    Comments: 26 pages, accepted at ACL 2024 (Main)

  4. arXiv:2401.14698  [pdf, other]

    cs.CL cs.AI

    Under the Surface: Tracking the Artifactuality of LLM-Generated Data

    Authors: Debarati Das, Karin De Langis, Anna Martin-Boyle, Jaehyung Kim, Minhwa Lee, Zae Myung Kim, Shirley Anugrah Hayati, Risako Owan, Bin Hu, Ritik Parkar, Ryan Koo, Jonginn Park, Aahan Tyagi, Libby Ferland, Sanjali Roy, Vincent Liu, Dongyeop Kang

    Abstract: This work delves into the expanding role of large language models (LLMs) in generating artificial data. LLMs are increasingly employed to create a variety of outputs, including annotations, preferences, instruction prompts, simulated dialogues, and free text. As these forms of LLM-generated data often intersect in their application, they exert mutual influence on each other and raise significant c…

    Submitted 30 January, 2024; v1 submitted 26 January, 2024; originally announced January 2024.

    Comments: Core Authors: Debarati Das, Karin De Langis, Anna Martin-Boyle, Jaehyung Kim, Minhwa Lee and Zae Myung Kim | Project lead : Debarati Das | PI : Dongyeop Kang

  5. arXiv:2309.17012  [pdf, other]

    cs.CL cs.AI cs.LG

    Benchmarking Cognitive Biases in Large Language Models as Evaluators

    Authors: Ryan Koo, Minhwa Lee, Vipul Raheja, Jong Inn Park, Zae Myung Kim, Dongyeop Kang

    Abstract: Large Language Models (LLMs) have recently been shown to be effective as automatic evaluators with simple prompting and in-context learning. In this work, we assemble 15 LLMs of four different size ranges and evaluate their output responses by preference ranking from the other LLMs as evaluators, such as System Star is better than System Square. We then evaluate the quality of ranking outputs intr…

    Submitted 29 September, 2023; originally announced September 2023.

    Comments: Under review at ICLR 2024. 26 pages, 8 figures, 7 tables

  6. arXiv:2306.04043  [pdf, other]

    cs.CL

    An Analysis of Reader Engagement in Literary Fiction through Eye Tracking and Linguistic Features

    Authors: Rose Neis, Karin de Langis, Zae Myung Kim, Dongyeop Kang

    Abstract: Capturing readers' engagement in fiction is a challenging but important aspect of narrative understanding. In this study, we collected 23 readers' reactions to 2 short stories through eye tracking, sentence-level annotations, and an overall engagement scale survey. We analyzed the significance of various qualities of the text in predicting how engaging a reader is likely to find it. As enjoyment o…

    Submitted 6 June, 2023; originally announced June 2023.

    Comments: 9 pages, 4 figures

  7. arXiv:2305.14671  [pdf, other]

    cs.CL

    A Survey of Diffusion Models in Natural Language Processing

    Authors: Hao Zou, Zae Myung Kim, Dongyeop Kang

    Abstract: This survey paper provides a comprehensive review of the use of diffusion models in natural language processing (NLP). Diffusion models are a class of mathematical models that aim to capture the diffusion of information or signals across a network or manifold. In NLP, diffusion models have been used in a variety of applications, such as natural language generation, sentiment analysis, topic modeli…

    Submitted 14 June, 2023; v1 submitted 23 May, 2023; originally announced May 2023.

    Comments: We changed the title of the paper due to a conflict with a previous paper

  8. arXiv:2305.13826  [pdf, other]

    cs.CL

    "Is the Pope Catholic?" Applying Chain-of-Thought Reasoning to Understanding Conversational Implicatures

    Authors: Zae Myung Kim, David E. Taylor, Dongyeop Kang

    Abstract: Conversational implicatures are pragmatic inferences that require listeners to deduce the intended meaning conveyed by a speaker from their explicit utterances. Although such inferential reasoning is fundamental to human communication, recent research indicates that large language models struggle to comprehend these implicatures as effectively as the average human. This paper demonstrates that by…

    Submitted 23 May, 2023; originally announced May 2023.

  9. arXiv:2212.01350  [pdf, other]

    cs.CL

    Improving Iterative Text Revision by Learning Where to Edit from Other Revision Tasks

    Authors: Zae Myung Kim, Wanyu Du, Vipul Raheja, Dhruv Kumar, Dongyeop Kang

    Abstract: Iterative text revision improves text quality by fixing grammatical errors, rephrasing for better readability or contextual appropriateness, or reorganizing sentence structures throughout a document. Most recent research has focused on understanding and classifying different types of edits in the iterative revision process from human-written text instead of building accurate and robust systems for…

    Submitted 2 December, 2022; originally announced December 2022.

    Comments: 14 pages, accepted at EMNLP 2022 conference as a full paper

  10. arXiv:2206.01326  [pdf, other]

    cs.CV cs.CY cs.LG

    Improving Fairness in Large-Scale Object Recognition by CrowdSourced Demographic Information

    Authors: Zu Kim, André Araujo, Bingyi Cao, Cam Askew, Jack Sim, Mike Green, N'Mah Fodiatu Yilla, Tobias Weyand

    Abstract: There has been increasing awareness of ethical issues in machine learning, and fairness has become an important research topic. Most fairness efforts in computer vision have been focused on human sensing applications and preventing discrimination by people's physical attributes such as race, skin color or age by increasing visual representation for particular demographic groups. We argue that ML f…

    Submitted 2 June, 2022; originally announced June 2022.

  11. Read, Revise, Repeat: A System Demonstration for Human-in-the-loop Iterative Text Revision

    Authors: Wanyu Du, Zae Myung Kim, Vipul Raheja, Dhruv Kumar, Dongyeop Kang

    Abstract: Revision is an essential part of the human writing process. It tends to be strategic, adaptive, and, more importantly, iterative in nature. Despite the success of large language models on text revision tasks, they are limited to non-iterative, one-shot revisions. Examining and evaluating the capability of large language models for making continuous revisions and collaborating with human writers is…

    Submitted 23 September, 2022; v1 submitted 7 April, 2022; originally announced April 2022.

    Comments: Accepted by The First Workshop on Intelligent and Interactive Writing Assistants at ACL2022

  12. Understanding Iterative Revision from Human-Written Text

    Authors: Wanyu Du, Vipul Raheja, Dhruv Kumar, Zae Myung Kim, Melissa Lopez, Dongyeop Kang

    Abstract: Writing is, by nature, a strategic, adaptive, and more importantly, an iterative process. A crucial part of writing is editing and revising the text. Previous works on text revision have focused on defining edit intention taxonomies within a single domain or developing computational models with a single level of edit granularity, such as sentence-level edits, which differ from human's revision cyc…

    Submitted 15 March, 2022; v1 submitted 7 March, 2022; originally announced March 2022.

    Comments: To appear in ACL2022

  13. arXiv:2112.09047  [pdf, other]

    physics.soc-ph cs.DL

    Citation inequity and gendered citation practices in contemporary physics

    Authors: Erin G. Teich, Jason Z. Kim, Christopher W. Lynn, Samantha C. Simon, Andrei A. Klishin, Karol P. Szymula, Pragya Srivastava, Lee C. Bassett, Perry Zurn, Jordan D. Dworkin, Dani S. Bassett

    Abstract: The historical and contemporary under-attribution of women's contributions to scientific scholarship is well-known and well-studied, with effects that are felt today in myriad ways by women scientists. One measure of this under-attribution is the so-called citation gap between men and women: the under-citation of papers authored by women relative to expected rates coupled with a corresponding over…

    Submitted 16 December, 2021; originally announced December 2021.

  14. arXiv:2110.08631  [pdf, other]

    cs.NE nlin.CD

    Learning Continuous Chaotic Attractors with a Reservoir Computer

    Authors: Lindsay M. Smith, Jason Z. Kim, Zhixin Lu, Dani S. Bassett

    Abstract: Neural systems are well known for their ability to learn and store information as memories. Even more impressive is their ability to abstract these memories to create complex internal representations, enabling advanced functions such as the spatial manipulation of mental representations. While recurrent neural networks (RNNs) are capable of representing complex information, the exact mechanisms of…

    Submitted 16 October, 2021; originally announced October 2021.

    Comments: 9 pages

  15. arXiv:2108.08874  [pdf, other]

    cs.CV

    Towards A Fairer Landmark Recognition Dataset

    Authors: Zu Kim, André Araujo, Bingyi Cao, Cam Askew, Jack Sim, Mike Green, N'Mah Fodiatu Yilla, Tobias Weyand

    Abstract: We introduce a new landmark recognition dataset, which is created with a focus on fair worldwide representation. While previous work proposes to collect as many images as possible from web repositories, we instead argue that such approaches can lead to biased data. To create a more comprehensive and equitable dataset, we start by defining the fair relevance of a landmark to the world population. T…

    Submitted 6 June, 2022; v1 submitted 19 August, 2021; originally announced August 2021.

    Comments: Please cite the full detailed version of the paper instead: Improving Fairness in Large-Scale Object Recognition by CrowdSourced Demographic Information arXiv:2206.01326

  16. arXiv:2105.14940  [pdf, other]

    cs.CL cs.AI cs.LG

    Do Multilingual Neural Machine Translation Models Contain Language Pair Specific Attention Heads?

    Authors: Zae Myung Kim, Laurent Besacier, Vassilina Nikoulina, Didier Schwab

    Abstract: Recent studies on the analysis of the multilingual representations focus on identifying whether there is an emergence of language-independent representations, or whether a multilingual model partitions its weights among different languages. While most of such work has been conducted in a "black-box" manner, this paper aims to analyze individual components of a multilingual neural translation (NMT)…

    Submitted 31 May, 2021; originally announced May 2021.

    Comments: 10 pages, accepted at Findings of ACL 2021 (short)

  17. arXiv:2008.02878  [pdf, ps, other]

    cs.CL cs.LG

    A Multilingual Neural Machine Translation Model for Biomedical Data

    Authors: Alexandre Bérard, Zae Myung Kim, Vassilina Nikoulina, Eunjeong L. Park, Matthias Gallé

    Abstract: We release a multilingual neural machine translation model, which can be used to translate text in the biomedical domain. The model can translate from 5 languages (French, German, Italian, Korean and Spanish) into English. It is trained with large amounts of generic and biomedical data, using domain tags. Our benchmarks show that it performs near state-of-the-art both on news (generic domain) and…

    Submitted 6 August, 2020; originally announced August 2020.

    Comments: https://github.com/naver/covid19-nmt

  18. arXiv:2005.01186  [pdf, other]

    cond-mat.dis-nn cs.NE nlin.CD

    Teaching Recurrent Neural Networks to Modify Chaotic Memories by Example

    Authors: Jason Z. Kim, Zhixin Lu, Erfan Nozari, George J. Pappas, Danielle S. Bassett

    Abstract: The ability to store and manipulate information is a hallmark of computational systems. Whereas computers are carefully engineered to represent and perform mathematical operations on structured data, neurobiological systems perform analogous functions despite flexible organization and unstructured sensory input. Recent efforts have made progress in modeling the representation and recall of informa…

    Submitted 3 May, 2020; originally announced May 2020.

    Comments: 7 main text figures, 3 supplementary figures