Showing 1–25 of 25 results for author: Park, C Y

Searching in archive cs.
  1. arXiv:2406.15951  [pdf, other]

    cs.CL

    Modular Pluralism: Pluralistic Alignment via Multi-LLM Collaboration

    Authors: Shangbin Feng, Taylor Sorensen, Yuhan Liu, Jillian Fisher, Chan Young Park, Yejin Choi, Yulia Tsvetkov

    Abstract: While existing alignment paradigms have been integral in developing large language models (LLMs), LLMs often learn an averaged human preference and struggle to model diverse preferences across cultures, demographics, and communities. We propose Modular Pluralism, a modular framework based on multi-LLM collaboration for pluralistic alignment: it "plugs into" a base LLM a pool of smaller but special…

    Submitted 22 June, 2024; originally announced June 2024.

  2. arXiv:2406.12904  [pdf, other]

    cs.LG physics.comp-ph physics.optics

    Meent: Differentiable Electromagnetic Simulator for Machine Learning

    Authors: Yongha Kim, Anthony W. Jung, Sanmun Kim, Kevin Octavian, Doyoung Heo, Chae** Park, Jeongmin Shin, Sunghyun Nam, Chanhyung Park, Juho Park, Sangjun Han, **myoung Lee, Seolho Kim, Min Seok Jang, Chan Y. Park

    Abstract: Electromagnetic (EM) simulation plays a crucial role in analyzing and designing devices with sub-wavelength scale structures such as solar cells, semiconductor devices, image sensors, future displays and integrated photonic devices. Specifically, optics problems such as estimating semiconductor device structures and designing nanophotonic devices provide intriguing research topics with far-reachin…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: under review

  3. arXiv:2404.06664  [pdf, other]

    cs.CL cs.AI cs.HC

    CulturalTeaming: AI-Assisted Interactive Red-Teaming for Challenging LLMs' (Lack of) Multicultural Knowledge

    Authors: Yu Ying Chiu, Liwei Jiang, Maria Antoniak, Chan Young Park, Shuyue Stella Li, Mehar Bhatia, Sahithya Ravi, Yulia Tsvetkov, Vered Shwartz, Yejin Choi

    Abstract: Frontier large language models (LLMs) are developed by researchers and practitioners with skewed cultural backgrounds and on datasets with skewed sources. However, LLMs' (lack of) multicultural knowledge cannot be effectively assessed with current methods for developing benchmarks. Existing multicultural evaluations primarily rely on expensive and restricted human annotations or potentially outdat…

    Submitted 9 April, 2024; originally announced April 2024.

    Comments: Preprint (under review)

  4. arXiv:2311.09741  [pdf, other]

    cs.CL cs.LG

    P^3SUM: Preserving Author's Perspective in News Summarization with Diffusion Language Models

    Authors: Yuhan Liu, Shangbin Feng, Xiaochuang Han, Vidhisha Balachandran, Chan Young Park, Sachin Kumar, Yulia Tsvetkov

    Abstract: In this work, we take a first step towards designing summarization systems that are faithful to the author's intent, not only the semantic content of the article. Focusing on a case study of preserving political perspectives in news summarization, we find that existing approaches alter the political opinions and stances of news articles in more than 50% of summaries, misrepresenting the intent and…

    Submitted 4 April, 2024; v1 submitted 16 November, 2023; originally announced November 2023.

  5. arXiv:2311.07115  [pdf, other]

    cs.CL

    Gen-Z: Generative Zero-Shot Text Classification with Contextualized Label Descriptions

    Authors: Sachin Kumar, Chan Young Park, Yulia Tsvetkov

    Abstract: Language model (LM) prompting--a popular paradigm for solving NLP tasks--has been shown to be susceptible to miscalibration and brittleness to slight prompt variations, caused by its discriminative prompting approach, i.e., predicting the label given the input. To address these issues, we propose Gen-Z--a generative prompting framework for zero-shot text classification. GEN-Z is generative, as it…

    Submitted 13 November, 2023; originally announced November 2023.

  6. arXiv:2305.14326  [pdf, other]

    cs.CL

    TalkUp: Paving the Way for Understanding Empowering Language

    Authors: Lucille Njoo, Chan Young Park, Octavia Stappart, Marvin Thielk, Yi Chu, Yulia Tsvetkov

    Abstract: Empowering language is important in many real-world contexts, from education to workplace dynamics to healthcare. Though language technologies are growing more prevalent in these contexts, empowerment has seldom been studied in NLP, and moreover, it is inherently challenging to operationalize because of its implicit nature. This work builds from linguistic and social psychology literature to explo…

    Submitted 23 October, 2023; v1 submitted 23 May, 2023; originally announced May 2023.

    Comments: Findings of EMNLP 2023

  7. arXiv:2305.10731  [pdf, other]

    cs.CL

    Analyzing Norm Violations in Live-Stream Chat

    Authors: Jihyung Moon, Dong-Ho Lee, Hyundong Cho, Woojeong Jin, Chan Young Park, Minwoo Kim, Jonathan May, Jay Pujara, Sungjoon Park

    Abstract: Toxic language, such as hate speech, can deter users from participating in online communities and enjoying popular platforms. Previous approaches to detecting toxic language and norm violations have been primarily concerned with conversations from online forums and social media, such as Reddit and Twitter. These approaches are less effective when applied to conversations on live-streaming platform…

    Submitted 7 October, 2023; v1 submitted 18 May, 2023; originally announced May 2023.

    Comments: 17 pages, 8 figures, 15 tables

  8. arXiv:2305.08283  [pdf, other]

    cs.CL

    From Pretraining Data to Language Models to Downstream Tasks: Tracking the Trails of Political Biases Leading to Unfair NLP Models

    Authors: Shangbin Feng, Chan Young Park, Yuhan Liu, Yulia Tsvetkov

    Abstract: Language models (LMs) are pretrained on diverse data sources, including news, discussion forums, books, and online encyclopedias. A significant portion of this data includes opinions and perspectives which, on one hand, celebrate democracy and diversity of ideas, and on the other hand are inherently socially biased. Our work develops new methods to (1) measure political biases in LMs trained on su…

    Submitted 5 July, 2023; v1 submitted 14 May, 2023; originally announced May 2023.

    Comments: ACL 2023

  9. arXiv:2211.14981  [pdf, other]

    cs.HC

    The Grind for Good Data: Understanding ML Practitioners' Struggles and Aspirations in Making Good Data

    Authors: Inha Cha, Juhyun Oh, Cheul Young Park, Jiyoon Han, Hwalsuk Lee

    Abstract: We thought data to be simply given, but reality tells otherwise; it is costly, situation-dependent, and muddled with dilemmas, constantly requiring human intervention. The ML community's focus on quality data is increasing in the same vein, as good data is vital for successful ML systems. Nonetheless, few works have investigated the dataset builders and the specifics of what they do and struggle t…

    Submitted 27 November, 2022; originally announced November 2022.

  10. arXiv:2205.12633  [pdf, other]

    cs.CV eess.IV

    NTIRE 2022 Challenge on High Dynamic Range Imaging: Methods and Results

    Authors: Eduardo Pérez-Pellitero, Sibi Catley-Chandar, Richard Shaw, Aleš Leonardis, Radu Timofte, Zexin Zhang, Cen Liu, Yunbo Peng, Yue Lin, Gaocheng Yu, ** Zhang, Zhe Ma, Hongbin Wang, Xiangyu Chen, Xintao Wang, Haiwei Wu, Lin Liu, Chao Dong, Jiantao Zhou, Qingsen Yan, Song Zhang, Weiye Chen, Yuhang Liu, Zhen Zhang, Yanning Zhang, et al. (68 additional authors not shown)

    Abstract: This paper reviews the challenge on constrained high dynamic range (HDR) imaging that was part of the New Trends in Image Restoration and Enhancement (NTIRE) workshop, held in conjunction with CVPR 2022. This manuscript focuses on the competition set-up, datasets, the proposed methods and their results. The challenge aims at estimating an HDR image from multiple respective low dynamic range (LDR)…

    Submitted 25 May, 2022; originally announced May 2022.

    Comments: CVPR Workshops 2022. 15 pages, 21 figures, 2 tables

    Journal ref: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2022

  11. arXiv:2205.12382  [pdf, other]

    cs.CL

    Challenges and Opportunities in Information Manipulation Detection: An Examination of Wartime Russian Media

    Authors: Chan Young Park, Julia Mendelsohn, Anjalie Field, Yulia Tsvetkov

    Abstract: NLP research on public opinion manipulation campaigns has primarily focused on detecting overt strategies such as fake news and disinformation. However, information manipulation in the ongoing Russia-Ukraine war exemplifies how governments and media also employ more nuanced strategies. We release a new dataset, VoynaSlov, containing 38M+ posts from Russian media outlets on Twitter and VKontakte, a…

    Submitted 24 October, 2022; v1 submitted 24 May, 2022; originally announced May 2022.

    Comments: Findings of EMNLP 2022

  12. arXiv:2205.01931  [pdf, other]

    cs.CV cs.LG

    Mapping the landscape of histomorphological cancer phenotypes using self-supervised learning on unlabeled, unannotated pathology slides

    Authors: Adalberto Claudio Quiros, Nicolas Coudray, Anna Yeaton, Xinyu Yang, Bo**g Liu, Hortense Le, Luis Chiriboga, Afreen Karimkhan, Navneet Narula, David A. Moore, Christopher Y. Park, Harvey Pass, Andre L. Moreira, John Le Quesne, Aristotelis Tsirigos, Ke Yuan

    Abstract: Definitive cancer diagnosis and management depend upon the extraction of information from microscopy images by pathologists. These images contain complex information requiring time-consuming expert human interpretation that is prone to human bias. Supervised deep learning approaches have proven powerful for classification tasks, but they are inherently limited by the cost and quality of annotation…

    Submitted 1 September, 2023; v1 submitted 4 May, 2022; originally announced May 2022.

  13. arXiv:2110.04419  [pdf, other]

    cs.CL

    Detecting Community Sensitive Norm Violations in Online Conversations

    Authors: Chan Young Park, Julia Mendelsohn, Karthik Radhakrishnan, Kinjal Jain, Tushar Kanakagiri, David Jurgens, Yulia Tsvetkov

    Abstract: Online platforms and communities establish their own norms that govern what behavior is acceptable within the community. Substantial effort in NLP has focused on identifying unacceptable behaviors and, recently, on forecasting them before they occur. However, these efforts have largely focused on toxicity as the sole form of community norm violation. Such focus has overlooked the much larger set o…

    Submitted 8 October, 2021; originally announced October 2021.

    Comments: Findings of EMNLP 2021

  14. arXiv:2105.11366  [pdf, other]

    cs.LG

    GMAC: A Distributional Perspective on Actor-Critic Framework

    Authors: Daniel Wontae Nam, Younghoon Kim, Chan Y. Park

    Abstract: In this paper, we devise a distributional framework on actor-critic as a solution to distributional instability, action type restriction, and conflation between samples and statistics. We propose a new method that minimizes the Cramér distance with the multi-step Bellman target distribution generated from a novel Sample-Replacement algorithm denoted SR($λ$), which learns the correct value distribu…

    Submitted 15 July, 2021; v1 submitted 24 May, 2021; originally announced May 2021.

    Journal ref: Proceedings of the 38th International Conference on Machine Learning, PMLR 139:7927-7936, 2021

  15. Controlled Analyses of Social Biases in Wikipedia Bios

    Authors: Anjalie Field, Chan Young Park, Kevin Z. Lin, Yulia Tsvetkov

    Abstract: Social biases on Wikipedia, a widely-read global platform, could greatly influence public opinion. While prior research has examined man/woman gender bias in biography articles, possible influences of other demographic attributes limit conclusions. In this work, we present a methodology for analyzing Wikipedia pages about people that isolates dimensions of interest (e.g., gender), from other attri…

    Submitted 9 February, 2022; v1 submitted 31 December, 2020; originally announced January 2021.

    Comments: Accepted to the Web Conference 2022 (WWW '22)

  16. arXiv:2010.10820  [pdf, other]

    cs.CL

    Multilingual Contextual Affective Analysis of LGBT People Portrayals in Wikipedia

    Authors: Chan Young Park, Xinru Yan, Anjalie Field, Yulia Tsvetkov

    Abstract: Specific lexical choices in narrative text reflect both the writer's attitudes towards people in the narrative and influence the audience's reactions. Prior work has examined descriptions of people in English using contextual affective analysis, a natural language processing (NLP) technique that seeks to analyze how people are portrayed along dimensions of power, agency, and sentiment. Our work pr…

    Submitted 8 April, 2021; v1 submitted 21 October, 2020; originally announced October 2020.

    Comments: ICWSM 2021

  17. arXiv:2008.01354  [pdf, other]

    cs.CL

    NLPDove at SemEval-2020 Task 12: Improving Offensive Language Detection with Cross-lingual Transfer

    Authors: Hwijeen Ahn, Jimin Sun, Chan Young Park, Jungyun Seo

    Abstract: This paper describes our approach to the task of identifying offensive languages in a multilingual setting. We investigate two data augmentation strategies: using additional semi-supervised labels with different thresholds and cross-lingual transfer with data selection. Leveraging the semi-supervised dataset resulted in performance improvements compared to the baseline trained solely with the manu…

    Submitted 4 August, 2020; originally announced August 2020.

    Comments: To be published in SemEval-2020

  18. arXiv:2006.09336  [pdf, other]

    cs.CL

    Cross-Cultural Similarity Features for Cross-Lingual Transfer Learning of Pragmatically Motivated Tasks

    Authors: Jimin Sun, Hwijeen Ahn, Chan Young Park, Yulia Tsvetkov, David R. Mortensen

    Abstract: Much work in cross-lingual transfer learning explored how to select better transfer languages for multilingual tasks, primarily focusing on typological and genealogical similarities between languages. We hypothesize that these measures of linguistic proximity are not enough when working with pragmatically-motivated tasks, such as sentiment analysis. As an alternative, we introduce three linguistic…

    Submitted 8 April, 2021; v1 submitted 16 June, 2020; originally announced June 2020.

    Comments: EACL 2021

  19. K-EmoCon, a multimodal sensor dataset for continuous emotion recognition in naturalistic conversations

    Authors: Cheul Young Park, Narae Cha, Soowon Kang, Auk Kim, Ahsan Habib Khandoker, Leontios Hadjileontiadis, Alice Oh, Yong Jeong, Uichin Lee

    Abstract: Recognizing emotions during social interactions has many potential applications with the popularization of low-cost mobile sensors, but a challenge remains with the lack of naturalistic affective interaction data. Most existing emotion datasets do not support studying idiosyncratic emotions arising in the wild as they were collected in constrained environments. Therefore, studying emotions in the…

    Submitted 19 May, 2020; v1 submitted 8 May, 2020; originally announced May 2020.

    Comments: 20 pages, 4 figures, for associated dataset, see https://doi.org/10.5281/zenodo.3814370

    Journal ref: Sci Data 7, (2020) 293

  20. arXiv:1910.12748  [pdf, other]

    cs.LG stat.ML

    A Study of Machine Learning Models in Predicting the Intention of Adolescents to Smoke Cigarettes

    Authors: Seung Joon Nam, Han Min Kim, Thomas Kang, Cheol Young Park

    Abstract: The use of electronic cigarette (e-cigarette) is increasing among adolescents. This is problematic since consuming nicotine at an early age can cause harmful effects in developing teenager's brain and health. Additionally, the use of e-cigarette has a possibility of leading to the use of cigarettes, which is more severe. There were many researches about e-cigarette and cigarette that mostly focuse…

    Submitted 31 October, 2019; v1 submitted 28 October, 2019; originally announced October 2019.

  21. arXiv:1904.12958  [pdf]

    cs.AI physics.soc-ph q-bio.PE

    Predictive Situation Awareness for Ebola Virus Disease using a Collective Intelligence Multi-Model Integration Platform: Bayes Cloud

    Authors: Cheol Young Park, Shou Matsumoto, Jubyung Ha, YoungWon Park

    Abstract: The humanity has been facing a plethora of challenges associated with infectious diseases, which kill more than 6 million people a year. Although continuous efforts have been applied to relieve the potential damages from such misfortunate events, it is unquestionable that there are many persisting challenges yet to overcome. One related issue we particularly address here is the assessment and pred…

    Submitted 4 May, 2019; v1 submitted 29 April, 2019; originally announced April 2019.

  22. arXiv:1806.02457  [pdf]

    cs.AI

    Reference Model of Multi-Entity Bayesian Networks for Predictive Situation Awareness

    Authors: Cheol Young Park, Kathryn Blackmond Laskey

    Abstract: During the past quarter-century, situation awareness (SAW) has become a critical research theme, because of its importance. Since the concept of SAW was first introduced during World War I, various versions of SAW have been researched and introduced. Predictive Situation Awareness (PSAW) focuses on the ability to predict aspects of a temporally evolving situation over time. PSAW requires a formal…

    Submitted 7 June, 2018; v1 submitted 6 June, 2018; originally announced June 2018.

  23. arXiv:1806.02455  [pdf]

    cs.LG stat.ML

    MEBN-RM: A Mapping between Multi-Entity Bayesian Network and Relational Model

    Authors: Cheol Young Park, Kathryn Blackmond Laskey

    Abstract: Multi-Entity Bayesian Network (MEBN) is a knowledge representation formalism combining Bayesian Networks (BN) with First-Order Logic (FOL). MEBN has sufficient expressive power for general-purpose knowledge representation and reasoning. Developing a MEBN model to support a given application is a challenge, requiring definition of entities, relationships, random variables, conditional dependence re…

    Submitted 7 June, 2018; v1 submitted 6 June, 2018; originally announced June 2018.

    Journal ref: Applied Sciences 2019, 9

  24. arXiv:1806.02421  [pdf]

    cs.LG cs.AI stat.ML

    Human-aided Multi-Entity Bayesian Networks Learning from Relational Data

    Authors: Cheol Young Park, Kathryn Blackmond Laskey

    Abstract: An Artificial Intelligence (AI) system is an autonomous system which emulates human mental and physical activities such as Observe, Orient, Decide, and Act, called the OODA process. An AI system performing the OODA process requires a semantically rich representation to handle a complex real world situation and ability to reason under uncertainty about the situation. Multi-Entity Bayesian Networks…

    Submitted 6 June, 2018; originally announced June 2018.

  25. Gaussian Mixture Reduction for Time-Constrained Approximate Inference in Hybrid Bayesian Networks

    Authors: Cheol Young Park, Kathryn Blackmond Laskey, Paulo C. G. Costa, Shou Matsumoto

    Abstract: Hybrid Bayesian Networks (HBNs), which contain both discrete and continuous variables, arise naturally in many application areas (e.g., image understanding, data fusion, medical diagnosis, fraud detection). This paper concerns inference in an important subclass of HBNs, the conditional Gaussian (CG) networks, in which all continuous random variables have Gaussian distributions and all children of…

    Submitted 6 June, 2018; originally announced June 2018.

    Journal ref: Appl. Sci. 2019, 9, 2055