Showing 1–50 of 262 results for author: Oh, J

Searching in archive cs.
  1. arXiv:2407.00693  [pdf, other]

    cs.AI cs.CL cs.LG

    BAPO: Base-Anchored Preference Optimization for Personalized Alignment in Large Language Models

    Authors: Gihun Lee, Minchan Jeong, Yujin Kim, Hojung Jung, Jaehoon Oh, Sangmook Kim, Se-Young Yun

    Abstract: While learning to align Large Language Models (LLMs) with human preferences has shown remarkable success, aligning these models to meet the diverse user preferences presents further challenges in preserving previous knowledge. This paper examines the impact of personalized preference optimization on LLMs, revealing that the extent of knowledge loss varies significantly with preference heterogeneit… ▽ More

    Submitted 30 June, 2024; originally announced July 2024.

    Comments: under review

  2. arXiv:2406.19648  [pdf]

    cs.HC cs.AI cs.CL

    Designing and Evaluating Multi-Chatbot Interface for Human-AI Communication: Preliminary Findings from a Persuasion Task

    Authors: Sion Yoon, Tae Eun Kim, Yoo Jung Oh

    Abstract: The dynamics of human-AI communication have been reshaped by language models such as ChatGPT. However, extant research has primarily focused on dyadic communication, leaving much to be explored regarding the dynamics of human-AI communication in group settings. The availability of multiple language model chatbots presents a unique opportunity for scholars to better understand the interaction betwe… ▽ More

    Submitted 28 June, 2024; originally announced June 2024.

  3. arXiv:2406.15225  [pdf, other]

    cs.AI cs.RO eess.SP

    Deep UAV Path Planning with Assured Connectivity in Dense Urban Setting

    Authors: Jiyong Oh, Syed M. Raza, Lusungu J. Mwasinga, Moonseong Kim, Hyunseung Choo

    Abstract: Unmanned Aerial Vehicle (UAV) services with 5G connectivity is an emerging field with numerous applications. Operator-controlled UAV flights and manual static flight configurations are major limitations for the wide adoption of scalability of UAV services. Several services depend on excellent UAV connectivity with a cellular network and maintaining it is challenging in predetermined flight paths. T… ▽ More

    Submitted 21 June, 2024; originally announced June 2024.

    Comments: 5 pages, 4 figures, Published in the 2024 IEEE Network Operations and Management Symposium (NOMS 2024)

  4. arXiv:2406.10815  [pdf, other]

    cs.LG cs.AI cs.CV stat.ML

    On the Effectiveness of Supervision in Asymmetric Non-Contrastive Learning

    Authors: Jeongheon Oh, Kibok Lee

    Abstract: Supervised contrastive representation learning has been shown to be effective in various transfer learning scenarios. However, while asymmetric non-contrastive learning (ANCL) often outperforms its contrastive learning counterpart in self-supervised representation learning, the extension of ANCL to supervised scenarios is less explored. To bridge the gap, we study ANCL for supervised representatio… ▽ More

    Submitted 16 June, 2024; originally announced June 2024.

    Comments: ICML 2024

  5. arXiv:2406.06448  [pdf, other]

    cs.HC

    How is the Pilot Doing: VTOL Pilot Workload Estimation by Multimodal Machine Learning on Psycho-physiological Signals

    Authors: Jong Hoon Park, Lawrence Chen, Ian Higgins, Zhaobo Zheng, Shashank Mehrotra, Kevin Salubre, Mohammadreza Mousaei, Steven Willits, Blain Levedahl, Timothy Buker, Eliot Xing, Teruhisa Misu, Sebastian Scherer, Jean Oh

    Abstract: Vertical take-off and landing (VTOL) aircraft do not require a prolonged runway, thus allowing them to land almost anywhere. In recent years, their flexibility has made them popular in development, research, and operation. When compared to traditional fixed-wing aircraft and rotorcraft, VTOLs bring unique challenges as they combine many maneuvers from both types of aircraft. Pilot workload is a cr… ▽ More

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: 8 pages, 7 figures

  6. arXiv:2405.19806  [pdf, other]

    cs.LG

    Preference Alignment with Flow Matching

    Authors: Minu Kim, Yongsik Lee, Sehyeok Kang, Jihwan Oh, Song Chong, Seyoung Yun

    Abstract: We present Preference Flow Matching (PFM), a new framework for preference-based reinforcement learning (PbRL) that streamlines the integration of preferences into an arbitrary class of pre-trained models. Existing PbRL methods require fine-tuning pre-trained models, which presents challenges such as scalability, inefficiency, and the need for model modifications, especially with black-box APIs lik… ▽ More

    Submitted 30 May, 2024; originally announced May 2024.

  7. Maximizing Patch Coverage for Testing of Highly-Configurable Software without Exploding Build Times

    Authors: Necip Fazıl Yıldıran, Jeho Oh, Julia Lawall, Paul Gazzillo

    Abstract: The Linux kernel is highly-configurable, with a build system that takes a configuration file as input and automatically tailors the source code accordingly. Configurability, however, complicates testing, because different configuration options lead to the inclusion of different code fragments. With thousands of patches received per month, Linux kernel maintainers employ extensive automated continu… ▽ More

    Submitted 27 April, 2024; originally announced April 2024.

  8. arXiv:2404.16032  [pdf, other]

    cs.LG

    Studying Large Language Model Behaviors Under Realistic Knowledge Conflicts

    Authors: Evgenii Kortukov, Alexander Rubinstein, Elisa Nguyen, Seong Joon Oh

    Abstract: Retrieval-augmented generation (RAG) mitigates many problems of fully parametric language models, such as temporal degradation, hallucinations, and lack of grounding. In RAG, the model's knowledge can be updated from documents provided in context. This leads to cases of conflict between the model's parametric knowledge and the contextual information, where the model may not always update its knowl… ▽ More

    Submitted 24 April, 2024; originally announced April 2024.

  9. arXiv:2404.12980  [pdf, other]

    cs.HC

    Ring-a-Pose: A Ring for Continuous Hand Pose Tracking

    Authors: Tianhong Catherine Yu, Guilin Hu, Ruidong Zhang, Hyunchul Lim, Saif Mahmud, Chi-Jung Lee, Ke Li, Devansh Agarwal, Shuyang Nie, Jinseok Oh, François Guimbretière, Cheng Zhang

    Abstract: We present Ring-a-Pose, a single untethered ring that tracks continuous 3D hand poses. Located in the center of the hand, the ring emits an inaudible acoustic signal that each hand pose reflects differently. Ring-a-Pose imposes minimal obtrusions on the hand, unlike multi-ring or glove systems. It is not affected by the choice of clothing that may cover wrist-worn systems. In a series of three use… ▽ More

    Submitted 19 April, 2024; originally announced April 2024.

  10. arXiv:2404.11358  [pdf, other]

    cs.CV

    DeblurGS: Gaussian Splatting for Camera Motion Blur

    Authors: Jeongtaek Oh, Jaeyoung Chung, Dongwoo Lee, Kyoung Mu Lee

    Abstract: Although significant progress has been made in reconstructing sharp 3D scenes from motion-blurred images, a transition to real-world applications remains challenging. The primary obstacle stems from the severe blur which leads to inaccuracies in the acquisition of initial camera poses through Structure-from-Motion, a critical aspect often overlooked by previous approaches. To address this challeng… ▽ More

    Submitted 17 April, 2024; v1 submitted 17 April, 2024; originally announced April 2024.

  11. arXiv:2404.01954  [pdf, other]

    cs.CL cs.AI

    HyperCLOVA X Technical Report

    Authors: Kang Min Yoo, Jaegeun Han, Sookyo In, Heewon Jeon, Jisu Jeong, Jaewook Kang, Hyunwook Kim, Kyung-Min Kim, Munhyong Kim, Sungju Kim, Donghyun Kwak, Hanock Kwak, Se Jung Kwon, Bado Lee, Dongsoo Lee, Gichang Lee, Jooho Lee, Baeseong Park, Seongjin Shin, Joonsang Yu, Seolki Baek, Sumin Byeon, Eungsup Cho, Dooseok Choe, Jeesung Han , et al. (371 additional authors not shown)

    Abstract: We introduce HyperCLOVA X, a family of large language models (LLMs) tailored to the Korean language and culture, along with competitive capabilities in English, math, and coding. HyperCLOVA X was trained on a balanced mix of Korean, English, and code data, followed by instruction-tuning with high-quality human-annotated datasets while abiding by strict safety guidelines reflecting our commitment t… ▽ More

    Submitted 13 April, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

    Comments: 44 pages; updated authors list and fixed author names

  12. arXiv:2404.01805  [pdf, other]

    cs.LG

    Improved Text Emotion Prediction Using Combined Valence and Arousal Ordinal Classification

    Authors: Michael Mitsios, Georgios Vamvoukakis, Georgia Maniati, Nikolaos Ellinas, Georgios Dimitriou, Konstantinos Markopoulos, Panos Kakoulidis, Alexandra Vioni, Myrsini Christidou, Junkwang Oh, Gunu Jho, Inchul Hwang, Georgios Vardaxoglou, Aimilios Chalamandaris, Pirros Tsiakoulis, Spyros Raptis

    Abstract: Emotion detection in textual data has received growing interest in recent years, as it is pivotal for developing empathetic human-computer interaction systems. This paper introduces a method for categorizing emotions from text, which acknowledges and differentiates between the diversified similarities and distinctions of various emotions. Initially, we establish a baseline by training a transforme… ▽ More

    Submitted 2 April, 2024; originally announced April 2024.

  13. arXiv:2404.01692  [pdf, other]

    cs.CV

    Beyond Image Super-Resolution for Image Recognition with Task-Driven Perceptual Loss

    Authors: Jaeha Kim, Junghun Oh, Kyoung Mu Lee

    Abstract: In real-world scenarios, image recognition tasks, such as semantic segmentation and object detection, often pose greater challenges due to the lack of information available within low-resolution (LR) content. Image super-resolution (SR) is one of the promising solutions for addressing the challenges. However, due to the ill-posed property of SR, it is challenging for typical SR methods to restore… ▽ More

    Submitted 4 April, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

    Comments: Accepted at CVPR 2024

  14. arXiv:2403.19060  [pdf, other]

    cs.RO cs.AI cs.HC cs.LG

    Towards Human-Centered Construction Robotics: An RL-Driven Companion Robot For Contextually Assisting Carpentry Workers

    Authors: Yuning Wu, Jiaying Wei, Jean Oh, Daniel Cardoso Llach

    Abstract: In the dynamic construction industry, traditional robotic integration has primarily focused on automating specific tasks, often overlooking the complexity and variability of human aspects in construction workflows. This paper introduces a human-centered approach with a "work companion rover" designed to assist construction workers within their existing practices, aiming to enhance safety and workf… ▽ More

    Submitted 28 March, 2024; v1 submitted 27 March, 2024; originally announced March 2024.

    Comments: 8 pages, 9 figures. This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

  15. arXiv:2403.09933  [pdf, other]

    cs.RO

    Design and Control Co-Optimization for Automated Design Iteration of Dexterous Anthropomorphic Soft Robotic Hands

    Authors: Pragna Mannam, Xingyu Liu, Ding Zhao, Jean Oh, Nancy Pollard

    Abstract: We automate soft robotic hand design iteration by co-optimizing design and control policy for dexterous manipulation skills in simulation. Our design iteration pipeline combines genetic algorithms and policy transfer to learn control policies for nearly 400 hand designs, testing grasp quality under external force disturbances. We validate the optimized designs in the real world through teleoperati… ▽ More

    Submitted 25 June, 2024; v1 submitted 14 March, 2024; originally announced March 2024.

    Journal ref: IEEE-RAS International Conference on Soft Robotics (RoboSoft) 2024

  16. arXiv:2403.07968  [pdf, other]

    cs.LG cs.AI

    Do Deep Neural Network Solutions Form a Star Domain?

    Authors: Ankit Sonthalia, Alexander Rubinstein, Ehsan Abbasnejad, Seong Joon Oh

    Abstract: It has recently been conjectured that neural network solution sets reachable via stochastic gradient descent (SGD) are convex, considering permutation invariances (Entezari et al., 2022). This means that a linear path can connect two independent solutions with low loss, given the weights of one of the models are appropriately permuted. However, current methods to test this theory often require ver… ▽ More

    Submitted 9 June, 2024; v1 submitted 12 March, 2024; originally announced March 2024.

  17. arXiv:2403.06471  [pdf, other]

    cs.CV

    Toward Robust Canine Cardiac Diagnosis: Deep Prototype Alignment Network-Based Few-Shot Segmentation in Veterinary Medicine

    Authors: Jun-Young Oh, In-Gyu Lee, Tae-Eui Kam, Ji-Hoon Jeong

    Abstract: In the cutting-edge domain of medical artificial intelligence (AI), remarkable advances have been achieved in areas such as diagnosis, prediction, and therapeutic interventions. Despite these advances, the technology for image segmentation faces the significant barrier of having to produce extensively annotated datasets. To address this challenge, few-shot segmentation (FSS) has been recognized as… ▽ More

    Submitted 11 March, 2024; originally announced March 2024.

  18. arXiv:2403.06342  [pdf, other]

    math.NA cs.LG

    Separable Physics-informed Neural Networks for Solving the BGK Model of the Boltzmann Equation

    Authors: Jaemin Oh, Seung Yeon Cho, Seok-Bae Yun, Eunbyung Park, Youngjoon Hong

    Abstract: In this study, we introduce a method based on Separable Physics-Informed Neural Networks (SPINNs) for effectively solving the BGK model of the Boltzmann equation. While the mesh-free nature of PINNs offers significant advantages in handling high-dimensional partial differential equations (PDEs), challenges arise when applying quadrature rules for accurate integral evaluation in the BGK operator, w… ▽ More

    Submitted 10 March, 2024; originally announced March 2024.

    MSC Class: 68T20; 35R09

  19. arXiv:2403.05973  [pdf, other]

    cs.CL cs.AI cs.LG

    Calibrating Large Language Models Using Their Generations Only

    Authors: Dennis Ulmer, Martin Gubri, Hwaran Lee, Sangdoo Yun, Seong Joon Oh

    Abstract: As large language models (LLMs) are increasingly deployed in user-facing applications, building trust and maintaining safety by accurately quantifying a model's confidence in its prediction becomes even more important. However, finding effective ways to calibrate LLMs - especially when the only interface to the models is their generated text - remains a challenge. We propose APRICOT (auxiliary pre… ▽ More

    Submitted 9 March, 2024; originally announced March 2024.

  20. arXiv:2403.05530  [pdf, other]

    cs.CL cs.AI

    Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

    Authors: Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love , et al. (1092 additional authors not shown)

    Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February… ▽ More

    Submitted 14 June, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

  21. arXiv:2403.05266  [pdf, other]

    cs.CL cs.AI cs.LG

    ERBench: An Entity-Relationship based Automatically Verifiable Hallucination Benchmark for Large Language Models

    Authors: Jio Oh, Soyeon Kim, Junseok Seo, Jindong Wang, Ruochen Xu, Xing Xie, Steven Euijong Whang

    Abstract: Large language models (LLMs) have achieved unprecedented performance in various applications, yet their evaluation remains a critical issue. Existing hallucination benchmarks are either static or lack adjustable complexity for thorough analysis. We contend that utilizing existing relational databases is a promising approach for constructing benchmarks due to their accurate knowledge description vi… ▽ More

    Submitted 8 March, 2024; originally announced March 2024.

  22. arXiv:2403.03642  [pdf, other]

    eess.IV cs.CV cs.LG

    Generative Active Learning with Variational Autoencoder for Radiology Data Generation in Veterinary Medicine

    Authors: In-Gyu Lee, Jun-Young Oh, Hee-Jung Yu, Jae-Hwan Kim, Ki-Dong Eom, Ji-Hoon Jeong

    Abstract: Recently, with increasing interest in pet healthcare, the demand for computer-aided diagnosis (CAD) systems in veterinary medicine has increased. The development of veterinary CAD has stagnated due to a lack of sufficient radiology data. To overcome the challenge, we propose a generative active learning framework based on a variational autoencoder. This approach aims to alleviate the scarcity of r… ▽ More

    Submitted 6 March, 2024; originally announced March 2024.

  23. arXiv:2403.02639  [pdf, other]

    cs.CV cs.LG

    False Positive Sampling-based Data Augmentation for Enhanced 3D Object Detection Accuracy

    Authors: Jiyong Oh, Junhaeng Lee, Woongchan Byun, Minsang Kong, Sang Hun Lee

    Abstract: Recent studies have focused on enhancing the performance of 3D object detection models. Among various approaches, ground-truth sampling has been proposed as an augmentation technique to address the challenges posed by limited ground-truth data. However, an inherent issue with ground-truth sampling is its tendency to increase false positives. Therefore, this study aims to overcome the limitations o… ▽ More

    Submitted 19 May, 2024; v1 submitted 4 March, 2024; originally announced March 2024.

  24. arXiv:2402.19460  [pdf, other]

    cs.LG stat.ML

    Benchmarking Uncertainty Disentanglement: Specialized Uncertainties for Specialized Tasks

    Authors: Bálint Mucsányi, Michael Kirchhof, Seong Joon Oh

    Abstract: Uncertainty quantification, once a singular task, has evolved into a spectrum of tasks, including abstained prediction, out-of-distribution detection, and aleatoric uncertainty quantification. The latest goal is disentanglement: the construction of multiple estimators that are each tailored to one and only one task. Hence, there is a plethora of recent advances with different intentions - that oft… ▽ More

    Submitted 29 February, 2024; originally announced February 2024.

    Comments: 43 pages

  25. arXiv:2402.18045  [pdf, other]

    cs.CL

    Multi-FAct: Assessing Multilingual LLMs' Multi-Regional Knowledge using FActScore

    Authors: Sheikh Shafayat, Eunsu Kim, Juhyun Oh, Alice Oh

    Abstract: Large Language Models (LLMs) are prone to factuality hallucination, generating text that contradicts established knowledge. While extensive research has addressed this in English, little is known about multilingual LLMs. This paper systematically evaluates multilingual LLMs' factual accuracy across languages and geographic regions. We introduce a novel pipeline for multilingual factuality evaluati… ▽ More

    Submitted 1 March, 2024; v1 submitted 27 February, 2024; originally announced February 2024.

  26. arXiv:2402.16569  [pdf, other]

    cs.CV cs.LG

    Pretrained Visual Uncertainties

    Authors: Michael Kirchhof, Mark Collier, Seong Joon Oh, Enkelejda Kasneci

    Abstract: Accurate uncertainty estimation is vital to trustworthy machine learning, yet uncertainties typically have to be learned for each task anew. This work introduces the first pretrained uncertainty modules for vision models. Similar to standard pretraining this enables the zero-shot transfer of uncertainties learned on a large pretraining dataset to specialized downstream datasets. We enable our larg… ▽ More

    Submitted 27 February, 2024; v1 submitted 26 February, 2024; originally announced February 2024.

  27. arXiv:2402.13442  [pdf, other]

    cs.RO

    CoFRIDA: Self-Supervised Fine-Tuning for Human-Robot Co-Painting

    Authors: Peter Schaldenbrand, Gaurav Parmar, Jun-Yan Zhu, James McCann, Jean Oh

    Abstract: Prior robot painting and drawing work, such as FRIDA, has focused on decreasing the sim-to-real gap and expanding input modalities for users, but the interaction with these systems generally exists only in the input stages. To support interactive, human-robot collaborative painting, we introduce the Collaborative FRIDA (CoFRIDA) robot painting framework, which can co-paint by modifying and engagin… ▽ More

    Submitted 20 February, 2024; originally announced February 2024.

  28. arXiv:2402.12991  [pdf, other]

    cs.LG cs.AI cs.CL cs.CR

    TRAP: Targeted Random Adversarial Prompt Honeypot for Black-Box Identification

    Authors: Martin Gubri, Dennis Ulmer, Hwaran Lee, Sangdoo Yun, Seong Joon Oh

    Abstract: Large Language Model (LLM) services and models often come with legal rules on who can use them and how they must use them. Assessing the compliance of the released LLMs is crucial, as these rules protect the interests of the LLM contributor and prevent misuse. In this context, we describe the novel fingerprinting problem of Black-box Identity Verification (BBIV). The goal is to determine whether a… ▽ More

    Submitted 6 June, 2024; v1 submitted 20 February, 2024; originally announced February 2024.

    Comments: Accepted at ACL 2024 (findings)

  29. arXiv:2402.06204  [pdf, other]

    cs.CL cs.AI

    The Generative AI Paradox on Evaluation: What It Can Solve, It May Not Evaluate

    Authors: Juhyun Oh, Eunsu Kim, Inha Cha, Alice Oh

    Abstract: This paper explores the assumption that Large Language Models (LLMs) skilled in generation tasks are equally adept as evaluators. We assess the performance of three LLMs and one open-source LM in Question-Answering (QA) and evaluation tasks using the TriviaQA (Joshi et al., 2017) dataset. Results indicate a significant disparity, with LLMs exhibiting lower performance in evaluation tasks compared… ▽ More

    Submitted 9 February, 2024; originally announced February 2024.

  30. arXiv:2402.01520  [pdf, ps, other]

    cs.SD cs.LG eess.AS

    Low-Resource Cross-Domain Singing Voice Synthesis via Reduced Self-Supervised Speech Representations

    Authors: Panos Kakoulidis, Nikolaos Ellinas, Georgios Vamvoukakis, Myrsini Christidou, Alexandra Vioni, Georgia Maniati, Junkwang Oh, Gunu Jho, Inchul Hwang, Pirros Tsiakoulis, Aimilios Chalamandaris

    Abstract: In this paper, we propose a singing voice synthesis model, Karaoker-SSL, that is trained only on text and speech data as a typical multi-speaker acoustic model. It is a low-resource pipeline that does not utilize any singing data end-to-end, since its vocoder is also trained on speech data. Karaoker-SSL is conditioned by self-supervised speech representations in an unsupervised manner. We preproce… ▽ More

    Submitted 2 February, 2024; originally announced February 2024.

    Comments: Accepted to IEEE ICASSP SASB 2024

  31. arXiv:2401.16743  [pdf, ps, other]

    cs.IT

    Multi-Group Multicasting Systems Using Multiple RISs

    Authors: Hyeongtaek Lee, Seungsik Moon, Youngjoo Lee, Jaeky Oh, Jaehoon Chung, Junil Choi

    Abstract: In this paper, practical utilization of multiple distributed reconfigurable intelligent surfaces (RISs), which are able to conduct group-specific operations, for multi-group multicasting systems is investigated. To tackle the inter-group interference issue in the multi-group multicasting systems, the block diagonalization (BD)-based beamforming is considered first. Without any inter-group interfer… ▽ More

    Submitted 29 January, 2024; originally announced January 2024.

    Comments: Accepted to IEEE Transactions on Wireless Communications

  32. arXiv:2401.14107  [pdf, other]

    cs.LG eess.SP

    Learning under Label Noise through Few-Shot Human-in-the-Loop Refinement

    Authors: Aaqib Saeed, Dimitris Spathis, Jungwoo Oh, Edward Choi, Ali Etemad

    Abstract: Wearable technologies enable continuous monitoring of various health metrics, such as physical activity, heart rate, sleep, and stress levels. A key challenge with wearable data is obtaining quality labels. Unlike modalities like video where the videos themselves can be effectively used to label objects or events, wearable data do not contain obvious cues about the physical manifestation of the us… ▽ More

    Submitted 25 January, 2024; originally announced January 2024.

  33. arXiv:2401.09382  [pdf, other]

    cs.RO

    POE: Acoustic Soft Robotic Proprioception for Omnidirectional End-effectors

    Authors: Uksang Yoo, Ziven Lopez, Jeffrey Ichnowski, Jean Oh

    Abstract: Soft robotic shape estimation and proprioception are challenging because of soft robots' complex deformation behaviors and infinite degrees of freedom. A soft robot's continuously deforming body makes it difficult to integrate rigid sensors and to reliably estimate its shape. In this work, we present Proprioceptive Omnidirectional End-effector (POE), which has six embedded microphones across the t… ▽ More

    Submitted 17 January, 2024; originally announced January 2024.

  34. arXiv:2401.08053  [pdf, other]

    cs.CV

    SCoFT: Self-Contrastive Fine-Tuning for Equitable Image Generation

    Authors: Zhixuan Liu, Peter Schaldenbrand, Beverley-Claire Okogwu, Wenxuan Peng, Youngsik Yun, Andrew Hundt, Jihie Kim, Jean Oh

    Abstract: Accurate representation in media is known to improve the well-being of the people who consume it. Generative image models trained on large web-crawled datasets such as LAION are known to produce images with harmful stereotypes and misrepresentations of cultures. We improve inclusive representation in generated images by (1) engaging with communities to collect a culturally representative dataset t… ▽ More

    Submitted 15 January, 2024; originally announced January 2024.

  35. arXiv:2401.03707  [pdf, other]

    cs.CV

    FMA-Net: Flow-Guided Dynamic Filtering and Iterative Feature Refinement with Multi-Attention for Joint Video Super-Resolution and Deblurring

    Authors: Geunhyuk Youk, Jihyong Oh, Munchurl Kim

    Abstract: We present a joint learning scheme of video super-resolution and deblurring, called VSRDB, to restore clean high-resolution (HR) videos from blurry low-resolution (LR) ones. This joint restoration problem has drawn much less attention compared to single restoration problems. In this paper, we propose a novel flow-guided dynamic filtering (FGDF) and iterative feature refinement with multi-attention… ▽ More

    Submitted 27 March, 2024; v1 submitted 8 January, 2024; originally announced January 2024.

    Comments: CVPR2024 (camera-ready version). The last two authors are co-corresponding authors. Please visit our project page at https://kaist-viclab.github.io/fmanet-site

  36. arXiv:2312.13528  [pdf, other]

    cs.CV

    DyBluRF: Dynamic Deblurring Neural Radiance Fields for Blurry Monocular Video

    Authors: Minh-Quan Viet Bui, Jongmin Park, Jihyong Oh, Munchurl Kim

    Abstract: Neural Radiance Fields (NeRF), initially developed for static scenes, have inspired many video novel view synthesis techniques. However, the challenge for video view synthesis arises from motion blur, a consequence of object or camera movement during exposure, which hinders the precise synthesis of sharp spatio-temporal views. In response, we propose a novel dynamic deblurring NeRF framework for b… ▽ More

    Submitted 29 March, 2024; v1 submitted 20 December, 2023; originally announced December 2023.

    Comments: The first two authors contributed equally to this work (equal contribution). The last two authors advised equally to this work. Please visit our project page at https://kaist-viclab.github.io/dyblurf-site/

  37. arXiv:2312.11805  [pdf, other]

    cs.CL cs.AI cs.CV

    Gemini: A Family of Highly Capable Multimodal Models

    Authors: Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Millican, David Silver, Melvin Johnson, Ioannis Antonoglou, Julian Schrittwieser, Amelia Glaese, Jilin Chen, Emily Pitler, Timothy Lillicrap, Angeliki Lazaridou, Orhan Firat, James Molloy, Michael Isard, Paul R. Barham, Tom Hennigan, Benjamin Lee , et al. (1325 additional authors not shown)

    Abstract: This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultr… ▽ More

    Submitted 17 June, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

  38. arXiv:2312.02530  [pdf, other]

    cs.LG cs.AI

    MEMTO: Memory-guided Transformer for Multivariate Time Series Anomaly Detection

    Authors: Junho Song, Keonwoo Kim, Jeonglyul Oh, Sungzoon Cho

    Abstract: Detecting anomalies in real-world multivariate time series data is challenging due to complex temporal dependencies and inter-variable correlations. Recently, reconstruction-based deep models have been widely used to solve the problem. However, these methods still suffer from an over-generalization issue and fail to deliver consistently high performance. To address this issue, we propose the MEMTO… ▽ More

    Submitted 5 December, 2023; originally announced December 2023.

  39. arXiv:2312.01638  [pdf, other]

    eess.IV cs.CV

    J-Net: Improved U-Net for Terahertz Image Super-Resolution

    Authors: Woon-Ha Yeo, Seung-Hwan Jung, Seung Jae Oh, Inhee Maeng, Eui Su Lee, Han-Cheol Ryu

    Abstract: Terahertz (THz) waves are electromagnetic waves in the 0.1 to 10 THz frequency range, and THz imaging is utilized in a range of applications, including security inspections, biomedical fields, and the non-destructive examination of materials. However, THz images have low resolution due to the long wavelength of THz waves. Therefore, improving the resolution of THz images is one of the current hot… ▽ More

    Submitted 4 December, 2023; originally announced December 2023.

  40. arXiv:2311.16176  [pdf, other]

    cs.LG cs.AI cs.CV

    Mitigating Biases with Diverse Ensembles and Diffusion Models

    Authors: Luca Scimeca, Alexander Rubinstein, Damien Teney, Seong Joon Oh, Armand Mihai Nicolicioiu, Yoshua Bengio

    Abstract: Spurious correlations in the data, where multiple cues are predictive of the target labels, often lead to a phenomenon known as shortcut learning, where a model relies on erroneous, easy-to-learn cues while ignoring reliable ones. In this work, we propose an ensemble diversification framework exploiting Diffusion Probabilistic Models (DPMs) to mitigate this form of bias. We show that at particular… ▽ More

    Submitted 6 March, 2024; v1 submitted 23 November, 2023; originally announced November 2023.

    Comments: arXiv admin note: text overlap with arXiv:2310.02230

  41. arXiv:2311.13398  [pdf, other]

    cs.CV cs.GR

    Depth-Regularized Optimization for 3D Gaussian Splatting in Few-Shot Images

    Authors: Jaeyoung Chung, Jeongtaek Oh, Kyoung Mu Lee

    Abstract: In this paper, we present a method to optimize Gaussian splatting with a limited number of images while avoiding overfitting. Representing a 3D scene by combining numerous Gaussian splats has yielded outstanding visual quality. However, it tends to overfit the training views when only a small number of images are available. To address this issue, we introduce a dense depth map as a geometry guide… ▽ More

    Submitted 4 January, 2024; v1 submitted 22 November, 2023; originally announced November 2023.

    Comments: 10 pages, 5 figures; Project page: robot0321.github.io/DepthRegGS

  42. arXiv:2311.13267  [pdf, other]

    cs.LG cs.AI cs.CV

    FedFN: Feature Normalization for Alleviating Data Heterogeneity Problem in Federated Learning

    Authors: Seongyoon Kim, Gihun Lee, Jaehoon Oh, Se-Young Yun

    Abstract: Federated Learning (FL) is a collaborative method for training models while preserving data privacy in decentralized settings. However, FL encounters challenges related to data heterogeneity, which can result in performance degradation. In our study, we observe that as data heterogeneity increases, feature representation in the FedAVG model deteriorates more significantly compared to classifier we…

    Submitted 22 November, 2023; originally announced November 2023.

    Comments: NeurIPS Workshop: "Federated Learning in the Age of Foundation Models" 2023

  43. arXiv:2311.12077  [pdf, other]

    cs.CV

    Efficient Model Agnostic Approach for Implicit Neural Representation Based Arbitrary-Scale Image Super-Resolution

    Authors: Young Jae Oh, Jihun Kim, Tae Hyun Kim

    Abstract: Single image super-resolution (SISR) has experienced significant advancements, primarily driven by deep convolutional networks. Traditional networks, however, are limited to upscaling images to a fixed scale, leading to the utilization of implicit neural functions for generating arbitrarily scaled images. Nevertheless, these methodologies have imposed substantial computational demands as they invo…

    Submitted 20 November, 2023; originally announced November 2023.

  44. arXiv:2310.20477  [pdf, other]

    cs.HC cs.LG

    Exploring Practitioner Perspectives On Training Data Attribution Explanations

    Authors: Elisa Nguyen, Evgenii Kortukov, Jean Y. Song, Seong Joon Oh

    Abstract: Explainable AI (XAI) aims to provide insight into opaque model reasoning to humans and as such is an interdisciplinary field by nature. In this paper, we interviewed 10 practitioners to understand the possible usability of training data attribution (TDA) explanations and to explore the design space of such an approach. We confirmed that training data quality is often the most important factor for…

    Submitted 22 November, 2023; v1 submitted 31 October, 2023; originally announced October 2023.

    Comments: Accepted to NeurIPS XAI in Action workshop 2023

  45. arXiv:2310.18652  [pdf, other]

    cs.CL cs.AI cs.CV

    EHRXQA: A Multi-Modal Question Answering Dataset for Electronic Health Records with Chest X-ray Images

    Authors: Seongsu Bae, Daeun Kyung, Jaehee Ryu, Eunbyeol Cho, Gyubok Lee, Sunjun Kweon, Jungwoo Oh, Lei Ji, Eric I-Chao Chang, Tackeun Kim, Edward Choi

    Abstract: Electronic Health Records (EHRs), which contain patients' medical histories in various multi-modal formats, often overlook the potential for joint reasoning across imaging and table modalities underexplored in current EHR Question Answering (QA) systems. In this paper, we introduce EHRXQA, a novel multi-modal question answering dataset combining structured EHRs and chest X-ray images. To develop o…

    Submitted 25 December, 2023; v1 submitted 28 October, 2023; originally announced October 2023.

    Comments: Accepted at NeurIPS 2023 Datasets and Benchmarks Track (10 pages for main text, 4 pages for references, 39 pages for supplementary materials)

  46. arXiv:2310.18586  [pdf, other]

    cs.LG stat.ML

    Optimal Transport for Kernel Gaussian Mixture Models

    Authors: Jung Hun Oh, Rena Elkin, Anish Kumar Simhal, Jiening Zhu, Joseph O Deasy, Allen Tannenbaum

    Abstract: The Wasserstein distance from optimal mass transport (OMT) is a powerful mathematical tool with numerous applications that provides a natural measure of the distance between two probability distributions. Several methods to incorporate OMT into widely used probabilistic models, such as Gaussian or Gaussian mixture, have been developed to enhance the capability of modeling complex multimodal densit…

    Submitted 28 October, 2023; originally announced October 2023.

    Comments: 17 pages, 5 figures, 2 tables

  47. arXiv:2310.10073  [pdf, other]

    cs.CV

    Expression Domain Translation Network for Cross-domain Head Reenactment

    Authors: Taewoong Kang, Jeongsik Oh, Jaeseong Lee, Sunghyun Park, Jaegul Choo

    Abstract: Despite the remarkable advancements in head reenactment, the existing methods face challenges in cross-domain head reenactment, which aims to transfer human motions to domains outside the human, including cartoon characters. It is still difficult to extract motion from out-of-domain images due to the distinct appearances, such as large eyes. Recently, previous work introduced a large-scale anime d…

    Submitted 6 November, 2023; v1 submitted 16 October, 2023; originally announced October 2023.

    Comments: Project page with videos: https://keh0t0.github.io/research/EDTN/

  48. arXiv:2310.08864  [pdf, other]

    cs.RO

    Open X-Embodiment: Robotic Learning Datasets and RT-X Models

    Authors: Open X-Embodiment Collaboration, Abby O'Neill, Abdul Rehman, Abhinav Gupta, Abhiram Maddukuri, Abhishek Gupta, Abhishek Padalkar, Abraham Lee, Acorn Pooley, Agrim Gupta, Ajay Mandlekar, Ajinkya Jain, Albert Tung, Alex Bewley, Alex Herzog, Alex Irpan, Alexander Khazatsky, Anant Rai, Anchit Gupta, Andrew Wang, Andrey Kolobov, Anikait Singh, Animesh Garg, Aniruddha Kembhavi, Annie Xie, et al. (267 additional authors not shown)

    Abstract: Large, high-capacity models trained on diverse datasets have shown remarkable successes on efficiently tackling downstream applications. In domains from NLP to Computer Vision, this has led to a consolidation of pretrained models, with general pretrained backbones serving as a starting point for many applications. Can such a consolidation happen in robotics? Conventionally, robotic learning method…

    Submitted 1 June, 2024; v1 submitted 13 October, 2023; originally announced October 2023.

    Comments: Project website: https://robotics-transformer-x.github.io

  49. arXiv:2310.08215  [pdf, other]

    cs.LG cs.AI

    Trustworthy Machine Learning

    Authors: Bálint Mucsányi, Michael Kirchhof, Elisa Nguyen, Alexander Rubinstein, Seong Joon Oh

    Abstract: As machine learning technology gets applied to actual products and solutions, new challenges have emerged. Models unexpectedly fail to generalize to small changes in the distribution, tend to be confident on novel data they have never seen, or cannot communicate the rationale behind their decisions effectively with the end users. Collectively, we face a trustworthiness issue with the current machi…

    Submitted 12 October, 2023; originally announced October 2023.

    Comments: 373 pages, textbook at the University of Tübingen

    ACM Class: I.2.0

  50. arXiv:2309.13144  [pdf, other]

    cs.RO

    SoRTS: Learned Tree Search for Long Horizon Social Robot Navigation

    Authors: Ingrid Navarro, Jay Patrikar, Joao P. A. Dantas, Rohan Baijal, Ian Higgins, Sebastian Scherer, Jean Oh

    Abstract: The fast-growing demand for fully autonomous robots in shared spaces calls for the development of trustworthy agents that can safely and seamlessly navigate in crowded environments. Recent models for motion prediction show promise in characterizing social interactions in such environments. Still, adapting them for navigation is challenging as they often suffer from generalization failures. Prompte…

    Submitted 16 February, 2024; v1 submitted 22 September, 2023; originally announced September 2023.

    Comments: arXiv admin note: substantial text overlap with arXiv:2304.01428