Search | arXiv e-print repository

Diffusion-EDFs: Bi-equivariant Denoising Generative Modeling on SE(3) for Visual Robotic Manipulation

Authors: Hyunwoo Ryu, Jiwoo Kim, Hyunseok An, Junwoo Chang, Joohwan Seo, Taehan Kim, Yubin Kim, Chaewon Hwang, Jongeun Choi, Roberto Horowitz

Abstract: Diffusion generative modeling has become a promising approach for learning robotic manipulation tasks from stochastic human demonstrations. In this paper, we present Diffusion-EDFs, a novel SE(3)-equivariant diffusion-based approach for visual robotic manipulation tasks. We show that our proposed method achieves remarkable data efficiency, requiring only 5 to 10 human demonstrations for effective end-to-end training in less than an hour. Furthermore, our benchmark experiments demonstrate that our approach has superior generalizability and robustness compared to state-of-the-art methods. Lastly, we validate our methods with real hardware experiments. Project Website: https://sites.google.com/view/diffusion-edfs/home

Submitted 28 November, 2023; v1 submitted 5 September, 2023; originally announced September 2023.

Comments: 31 pages, 13 figures

arXiv:2309.02405 [pdf, other]

Generating Realistic Images from In-the-wild Sounds

Authors: Taegyeong Lee, Jeonghun Kang, Hyeonyu Kim, Taehwan Kim

Abstract: Representing wild sounds as images is an important but challenging task due to the lack of paired datasets between sound and images and the significant differences in the characteristics of these two modalities. Previous studies have focused on generating images from sound in limited categories or music. In this paper, we propose a novel approach to generate images from in-the-wild sounds. First, we convert sound into text using audio captioning. Second, we propose audio attention and sentence attention to represent the rich characteristics of sound and visualize the sound. Lastly, we propose a direct sound optimization with CLIPscore and AudioCLIP and generate images with a diffusion-based model. Experiments show that our model is able to generate high-quality images from wild sounds and outperforms baselines in both quantitative and qualitative evaluations on wild audio datasets.

Submitted 5 September, 2023; originally announced September 2023.

Comments: Accepted to ICCV 2023

arXiv:2309.01961 [pdf, other]

NICE: CVPR 2023 Challenge on Zero-shot Image Captioning

Authors: Taehoon Kim, Pyunghwan Ahn, Sangyun Kim, Sihaeng Lee, Mark Marsden, Alessandra Sala, Seung Hwan Kim, Bohyung Han, Kyoung Mu Lee, Honglak Lee, Kyounghoon Bae, Xiangyu Wu, Yi Gao, Hailiang Zhang, Yang Yang, Weili Guo, Jianfeng Lu, Youngtaek Oh, Jae Won Cho, Dong-Jin Kim, In So Kweon, Junmo Kim, Wooyoung Kang, Won Young Jhoo, Byungseok Roh, et al. (17 additional authors not shown)

Abstract: In this report, we introduce the NICE (New frontiers for zero-shot Image Captioning Evaluation) project and share the results and outcomes of the 2023 challenge. This project is designed to challenge the computer vision community to develop robust image captioning models that advance the state-of-the-art both in terms of accuracy and fairness. Through the challenge, the image captioning models were tested using a new evaluation dataset that includes a large variety of visual concepts from many domains. There was no specific training data provided for the challenge, and therefore the challenge entries were required to adapt to new types of image descriptions that had not been seen during training. This report includes information on the newly proposed NICE dataset, evaluation methods, challenge results, and technical details of top-ranking entries. We expect that the outcomes of the challenge will contribute to the improvement of AI models on various vision-language tasks.

Submitted 10 September, 2023; v1 submitted 5 September, 2023; originally announced September 2023.

Comments: Tech report, project page https://nice.lgresearch.ai/

arXiv:2308.15224 [pdf, other]

doi 10.1145/3586183.3606770

Papeos: Augmenting Research Papers with Talk Videos

Authors: Tae Soo Kim, Matt Latzke, Jonathan Bragg, Amy X. Zhang, Joseph Chee Chang

Abstract: Research consumption has been traditionally limited to the reading of academic papers, a static, dense, and formally written format. Alternatively, pre-recorded conference presentation videos, which are more dynamic, concise, and colloquial, have recently become more widely available but potentially under-utilized. In this work, we explore the design space and benefits for combining academic papers and talk videos to leverage their complementary nature to provide a rich and fluid research consumption experience. Based on formative and co-design studies, we present Papeos, a novel reading and authoring interface that allows authors to augment their papers by segmenting and localizing talk videos alongside relevant paper passages with automatically generated suggestions. With Papeos, readers can visually skim a paper through clip thumbnails, and fluidly switch between consuming dense text in the paper or visual summaries in the video. In a comparative lab study (n=16), Papeos reduced mental load, scaffolded navigation, and facilitated more comprehensive reading of papers.

Submitted 29 August, 2023; originally announced August 2023.

Comments: Accepted to UIST 2023

arXiv:2308.15013 [pdf, ps, other]

Some identities on degenerate harmonic and degenerate higher-order harmonic numbers

Authors: Taekyun Kim, Dae San Kim

Abstract: The harmonic numbers and higher-order harmonic numbers appear frequently in several areas which are related to combinatorial identities, many expressions involving special functions in analytic number theory, and analysis of algorithms. The aim of this paper is to study the degenerate harmonic and degenerate higher-order harmonic numbers, which are respectively degenerate versions of the harmonic and higher-order harmonic numbers, in connection with the degenerate zeta and degenerate Hurwitz zeta function. Here the degenerate zeta and degenerate Hurwitz zeta function are respectively degenerate versions of the Riemann zeta and Hurwitz zeta function. We show that several infinite sums involving the degenerate higher-order harmonic numbers can be expressed in terms of the degenerate zeta function. Furthermore, we demonstrate that an infinite sum involving finite sums of products of the degenerate harmonic numbers can be represented by using the degenerate Hurwitz zeta function.

Submitted 29 August, 2023; originally announced August 2023.

Comments: 11 pages

MSC Class: 11B83; 11S40

arXiv:2308.11974 [pdf, other]

Blending-NeRF: Text-Driven Localized Editing in Neural Radiance Fields

Authors: Hyeonseop Song, Seokhun Choi, Hoseok Do, Chul Lee, Taehyeong Kim

Abstract: Text-driven localized editing of 3D objects is particularly difficult as locally mixing the original 3D object with the intended new object and style effects without distorting the object's form is not a straightforward process. To address this issue, we propose a novel NeRF-based model, Blending-NeRF, which consists of two NeRF networks: pretrained NeRF and editable NeRF. Additionally, we introduce new blending operations that allow Blending-NeRF to properly edit target regions which are localized by text. By using a pretrained vision-language aligned model, CLIP, we guide Blending-NeRF to add new objects with varying colors and densities, modify textures, and remove parts of the original object. Our extensive experiments demonstrate that Blending-NeRF produces naturally and locally edited 3D objects from various text prompts. Our project page is available at https://seokhunchoi.github.io/Blending-NeRF/

Submitted 11 September, 2023; v1 submitted 23 August, 2023; originally announced August 2023.

Comments: Accepted to ICCV 2023. The first two authors contributed equally to this work

arXiv:2308.11444 [pdf, other]

Adaptive Graduated Non-Convexity for Pose Graph Optimization

Authors: Seungwon Choi, Wonseok Kang, Jiseong Chung, Jaehyun Kim, Tae-wan Kim

Abstract: We present a novel approach to robust pose graph optimization based on Graduated Non-Convexity (GNC). Unlike traditional GNC-based methods, the proposed approach employs an adaptive shape function using B-spline to optimize the shape of the robust kernel. This aims to reduce GNC iterations, boosting computational speed without compromising accuracy. When integrated with the open-source riSAM algorithm, the method demonstrates enhanced efficiency across diverse datasets. Accompanying open-source code aims to encourage further research in this area. https://github.com/SNU-DLLAB/AGNC-PGO

Submitted 23 September, 2023; v1 submitted 22 August, 2023; originally announced August 2023.

Comments: 4 pages, 3 figures. Accepted for the workshop on Robotic Perception and Mapping (ROPEM): Frontier Vision & Learning Techniques, organized at the 2023 International Conference on Intelligent Robots and Systems (IROS)

arXiv:2308.09486 [pdf, ps, other]

New approach to $λ$-Stirling numbers

Authors: Dae San Kim, Hye Kyung Kim, Taekyun Kim

Abstract: The aim of this paper is to study the $λ$-Stirling numbers of both kinds, which are $λ$-analogues of Stirling numbers of both kinds. Those numbers have nice combinatorial interpretations when $λ$ is a positive integer. If $λ=1$, then the $λ$-Stirling numbers of both kinds reduce to the Stirling numbers of both kinds. We derive new types of generating functions of the $λ$-Stirling numbers of both kinds which are related to the reciprocals of the generalized rising factorials. Furthermore, some related identities are also derived from those generating functions. In addition, all the corresponding results to the $λ$-Stirling numbers of both kinds are obtained also for the $λ$-analogues of r-Stirling numbers of both kinds which are generalizations of those numbers.

Submitted 18 August, 2023; originally announced August 2023.

Comments: 10 pages

MSC Class: 11B73; 05A18

arXiv:2308.08336 [pdf, other]

Photoemission Evidence of a Novel Charge Order in Kagome Metal FeGe

Authors: Zhisheng Zhao, Tongrui Li, Peng Li, Xueliang Wu, Jianghao Yao, Ziyuan Chen, Shengtao Cui, Zhe Sun, Yichen Yang, Zhicheng Jiang, Zhengtai Liu, Alex Louat, Timur Kim, Cephise Cacho, Aifeng Wang, Yilin Wang, Dawei Shen, Juan Jiang, Donglai Feng

Abstract: A charge order has been discovered to emerge deep into the antiferromagnetic phase of the kagome metal FeGe. To study its origin, the evolution of the low-lying electronic structure across the charge order phase transition is investigated with angle-resolved photoemission spectroscopy. We do not find signatures of nesting between Fermi surface sections or van-Hove singularities in zero-frequency joint density of states, and there are no obvious energy gaps at the Fermi level, which exclude the nesting mechanism for the charge order formation in FeGe. However, two obvious changes in the band structure have been detected, i.e., one electron-like band around the K point and another one around the A point move upward in energy position when the charge order forms. These features can be well reproduced by our density-functional theory calculations, where the charge order is primarily driven by magnetic energy saving via large dimerizations of a quarter of Ge1-sites (in the kagome plane) along the c-axis. Our results provide strong support for this novel charge order formation mechanism in FeGe, in contrast to the conventional nesting mechanism.

Submitted 16 August, 2023; originally announced August 2023.

Comments: 6 pages, 4 figures

arXiv:2308.08246 [pdf]

doi 10.1038/s41535-023-00573-8

Orbital-selective charge-density wave in TaTe$_4$

Authors: R. Z. Xu, X. Du, J. S. Zhou, X. Gu, Q. Q. Zhang, Y. D. Li, W. X. Zhao, F. W. Zheng, M. Arita, K. Shimada, T. K. Kim, C. Cacho, Y. F. Guo, Z. K. Liu, Y. L. Chen, L. X. Yang

Abstract: TaTe$_4$, a metallic charge-density wave (CDW) material discovered decades ago, has attracted renewed attention due to its rich interesting properties such as pressure-induced superconductivity and candidate non-trivial topological phase. Here, using high-resolution angle-resolved photoemission spectroscopy and ab-initio calculation, we systematically investigate the electronic structure of TaTe$_4$. At 26 K, we observe a CDW gap as large as 290 meV, which persists up to 500 K. The CDW-modulated band structure shows a complex reconstruction that closely correlates with the lattice distortion. Inside the CDW gap, there exist highly dispersive energy bands contributing to the remnant Fermi surface and metallic behavior in the CDW state. Interestingly, our ab-initio calculation reveals that the large CDW gap mainly opens in the electronic states with out-of-plane orbital components, while the in-gap metallic states originate from in-plane orbitals, suggesting an orbital texture that couples with the CDW order. Our results shed light on the interplay between electron, lattice, and orbital in quasi-one-dimensional CDW materials.

Submitted 16 August, 2023; originally announced August 2023.

Comments: to appear in npj Quantum Materials

Journal ref: npj Quantum Materials 8, 44 (2023)

arXiv:2308.02313 [pdf, other]

doi 10.1103/PhysRevLett.131.236502

The fate of quasiparticles at high-temperature

Authors: A. Hunter, S. Beck, E. Cappelli, F. Margot, M. Straub, Y. Alexanian, G. Gatti, M. D. Watson, T. K. Kim, C. Cacho, N. C. Plumb, M. Shi, M. Radović, D. A. Sokolov, A. P. Mackenzie, M. Zingl, J. Mravlje, A. Georges, F. Baumberger, A. Tamai

Abstract: We study the temperature evolution of quasiparticles in the correlated metal Sr$_2$RuO$_4$. Our angle resolved photoemission data show that quasiparticles persist up to temperatures above 200 K, far beyond the Fermi liquid regime. Extracting the quasiparticle self-energy we demonstrate that the quasiparticle residue $Z$ increases with increasing temperature. Quasiparticles eventually disappear on approaching the bad metal state of Sr$_2$RuO$_4$ not by losing weight but via excessive broadening from super-Planckian scattering. We further show that the Fermi surface of Sr$_2$RuO$_4$ - defined as the loci where the spectral function peaks - deflates with increasing temperature. These findings are in semi-quantitative agreement with dynamical mean field theory calculations.

Submitted 4 August, 2023; originally announced August 2023.

Comments: Supplemental Material available upon request

Journal ref: Phys. Rev. Lett. 131, 236502 (2023)

arXiv:2308.00901 [pdf, ps, other]

Higher-order degenerate harmonic numbers related to degenerate Riemann zeta function

Authors: Taekyun Kim, Dae San Kim

Abstract: Recently, Kim-Kim investigated the degenerate harmonic numbers and the degenerate hyperharmonic numbers as degenerate versions of the harmonic numbers and the hyperharmonic numbers, respectively. The aim of this paper is to study the higher-order degenerate harmonic numbers and the higher-order degenerate hyperharmonic numbers as higher-order versions for the degenerate harmonic numbers and the degenerate hyperharmonic numbers, respectively. In addition, we study the higher-order alternating degenerate hyperharmonic numbers as an `alternating version' of the higher-order degenerate hyperharmonic numbers. In more detail, we find generating functions of them, explicit expressions for them and some relations among them for those three kinds of numbers.

Submitted 1 August, 2023; originally announced August 2023.

Comments: 9 pages

MSC Class: 11B83

arXiv:2308.00846 [pdf, other]

Pathfinding Future PIM Architectures by Demystifying a Commercial PIM Technology

Authors: Bongjoon Hyun, Taehun Kim, Dongjae Lee, Minsoo Rhu

Abstract: Processing-in-memory (PIM) has been explored for decades by computer architects, yet it has never seen the light of day in real-world products due to its high design overheads and lack of a killer application. With the advent of critical memory-intensive workloads, several commercial PIM technologies have been introduced to the market ranging from domain-specific PIM architectures to more general-purpose PIM architectures. In this work, we take a deep dive into UPMEM's commercial PIM technology, a general-purpose PIM-enabled parallel architecture that is highly programmable. Our first key contribution is the development of a flexible simulation framework for PIM. The simulator we developed (aka PIMulator) enables the compilation of UPMEM-PIM source codes into its compiled machine-level instructions, which are subsequently consumed by our cycle-level performance simulator. Using PIMulator, we demystify UPMEM's PIM design through a detailed characterization study. Building on top of our characterization, we conduct a series of case studies to pathfind important architectural features that we deem will be critical for future PIM architectures to support.

Submitted 6 March, 2024; v1 submitted 1 August, 2023; originally announced August 2023.

Comments: Published at the 30th IEEE International Symposium on High-Performance Computer Architecture (HPCA-30), 2024

arXiv:2307.16425 [pdf, other]

All-In-One Metrical And Functional Structure Analysis With Neighborhood Attentions on Demixed Audio

Authors: Taejun Kim, Juhan Nam

Abstract: Music is characterized by complex hierarchical structures. Developing a comprehensive model to capture these structures has been a significant challenge in the field of Music Information Retrieval (MIR). Prior research has mainly focused on addressing individual tasks for specific hierarchical levels, rather than providing a unified approach. In this paper, we introduce a versatile, all-in-one model that jointly performs beat and downbeat tracking as well as functional structure segmentation and labeling. The model leverages source-separated spectrograms as inputs and employs dilated neighborhood attentions to capture temporal long-term dependencies, along with non-dilated attentions for local instrumental dependencies. Consequently, the proposed model achieves state-of-the-art performance in all four tasks on the Harmonix Set while maintaining a relatively lower number of parameters compared to recent state-of-the-art models. Furthermore, our ablation study demonstrates that the concurrent learning of beats, downbeats, and segments can lead to enhanced performance, with each task mutually benefiting from the others.

Submitted 31 July, 2023; originally announced July 2023.

Comments: This paper has been accepted for publication at the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), 2023

arXiv:2307.14659 [pdf, other]

LLDiffusion: Learning Degradation Representations in Diffusion Models for Low-Light Image Enhancement

Authors: Tao Wang, Kaihao Zhang, Ziqian Shao, Wenhan Luo, Bjorn Stenger, Tae-Kyun Kim, Wei Liu, Hongdong Li

Abstract: Current deep learning methods for low-light image enhancement (LLIE) typically rely on pixel-wise mapping learned from paired data. However, these methods often overlook the importance of considering degradation representations, which can lead to sub-optimal outcomes. In this paper, we address this limitation by proposing a degradation-aware learning scheme for LLIE using diffusion models, which effectively integrates degradation and image priors into the diffusion process, resulting in improved image enhancement. Our proposed degradation-aware learning scheme is based on the understanding that degradation representations play a crucial role in accurately modeling and capturing the specific degradation patterns present in low-light images. To this end, first, a joint learning framework for both image generation and image enhancement is presented to learn the degradation representations. Second, to leverage the learned degradation representations, we develop a Low-Light Diffusion model (LLDiffusion) with a well-designed dynamic diffusion module. This module takes into account both the color map and the latent degradation representations to guide the diffusion process. By incorporating these conditioning factors, the proposed LLDiffusion can effectively enhance low-light images, considering both the inherent degradation patterns and the desired color fidelity. Finally, we evaluate our proposed method on several well-known benchmark datasets, including synthetic and real-world unpaired datasets. Extensive experiments on public benchmarks demonstrate that our LLDiffusion outperforms state-of-the-art LLIE methods both quantitatively and qualitatively. The source code and pre-trained models are available at https://github.com/TaoWangzj/LLDiffusion.

Submitted 27 July, 2023; originally announced July 2023.

Comments: 16 pages, 9 figures

arXiv:2307.13991 [pdf, other]

METAVerse: Meta-Learning Traversability Cost Map for Off-Road Navigation

Authors: Junwon Seo, Taekyung Kim, Seongyong Ahn, Kiho Kwak

Abstract: Autonomous navigation in off-road conditions requires an accurate estimation of terrain traversability. However, traversability estimation in unstructured environments is subject to high uncertainty due to the variability of numerous factors that influence vehicle-terrain interaction. Consequently, it is challenging to obtain a generalizable model that can accurately predict traversability in a variety of environments. This paper presents METAVerse, a meta-learning framework for learning a global model that accurately and reliably predicts terrain traversability across diverse environments. We train the traversability prediction network to generate a dense and continuous-valued cost map from a sparse LiDAR point cloud, leveraging vehicle-terrain interaction feedback in a self-supervised manner. Meta-learning is utilized to train a global model with driving data collected from multiple environments, effectively minimizing estimation uncertainty. During deployment, online adaptation is performed to rapidly adapt the network to the local environment by exploiting recent interaction experiences. To conduct a comprehensive evaluation, we collect driving data from various terrains and demonstrate that our method can obtain a global model that minimizes uncertainty. Moreover, by integrating our model with a model predictive controller, we demonstrate that the reduced uncertainty results in safe and stable navigation in unstructured and unknown terrains.

Submitted 4 March, 2024; v1 submitted 26 July, 2023; originally announced July 2023.

Comments: Our video can be found at https://youtu.be/4rIAMM1ZKMo

arXiv:2307.09254 [pdf, other]

PAC Neural Prediction Set Learning to Quantify the Uncertainty of Generative Language Models

Authors: Sangdon Park, Taesoo Kim

Abstract: Uncertainty learning and quantification of models are crucial tasks to enhance the trustworthiness of the models. Importantly, the recent surge of generative language models (GLMs) emphasizes the need for reliable uncertainty quantification due to concerns about generating hallucinated facts. In this paper, we propose to learn neural prediction set models that come with the probably approximately correct (PAC) guarantee for quantifying the uncertainty of GLMs. Unlike existing prediction set models, which are parameterized by a scalar value, we propose to parameterize prediction sets via neural networks, which achieves more precise uncertainty quantification but still satisfies the PAC guarantee. We demonstrate the efficacy of our method on four types of language datasets and six types of models by showing that our method improves the quantified uncertainty by $63\%$ on average, compared to a standard baseline method.

Submitted 18 July, 2023; originally announced July 2023.

arXiv:2307.08150 [pdf, other]

Efficient Treatment Effect Estimation with Out-of-bag Post-stratification

Authors: Taebin Kim, Lili Wang, Randy Lai, Sangho Yoon

Abstract: Post-stratification is often used to estimate treatment effects with higher efficiency. However, the majority of existing post-stratification frameworks depend on prior knowledge of the distributions of covariates and assume that the units are classified into post-strata without error. We propose a novel method to determine a proper stratification rule by mapping the covariates into a post-stratification factor (PSF) using predictive regression models. Inspired by the bootstrap aggregating (bagging) method, we utilize the out-of-bag delete-D jackknife to estimate strata boundaries, strata weights, and the variance of the point estimate. Confidence intervals are constructed with these estimators to take into account the additional variability coming from uncertainty in the strata boundaries and weights. Extensive simulations show that our proposed method consistently improves the efficiency of the estimates when the regression models are predictive and tends to be more robust than the regression imputation method.

Submitted 12 September, 2023; v1 submitted 16 July, 2023; originally announced July 2023.

arXiv:2307.08100 [pdf, other]

FourierHandFlow: Neural 4D Hand Representation Using Fourier Query Flow

Authors: Jihyun Lee, Junbong Jang, Donghwan Kim, Minhyuk Sung, Tae-Kyun Kim

Abstract: Recent 4D shape representations model continuous temporal evolution of implicit shapes by (1) learning query flows without leveraging shape and articulation priors or (2) decoding shape occupancies separately for each time value. Thus, they do not effectively capture implicit correspondences between articulated shapes or regularize jittery temporal deformations. In this work, we present FourierHandFlow, which is a spatio-temporally continuous representation for human hands that combines a 3D occupancy field with articulation-aware query flows represented as Fourier series. Given an input RGB sequence, we aim to learn a fixed number of Fourier coefficients for each query flow to guarantee smooth and continuous temporal shape dynamics. To effectively model spatio-temporal deformations of articulated hands, we compose our 4D representation based on two types of Fourier query flow: (1) pose flow that models query dynamics influenced by hand articulation changes via implicit linear blend skinning and (2) shape flow that models query-wise displacement flow. In the experiments, our method achieves state-of-the-art results on video-based 4D reconstruction while being computationally more efficient than the existing 3D/4D implicit shape representations. We additionally show our results on motion inter- and extrapolation and texture transfer using the learned correspondences of implicit shapes. To the best of our knowledge, FourierHandFlow is the first neural 4D continuous hand representation learned from RGB videos. The code will be publicly accessible.

Submitted 16 July, 2023; originally announced July 2023.

Comments: 16 pages, 6 figures, under review

arXiv:2307.07684 [pdf, other]

doi 10.1038/s41467-023-39457-7

Unveiling phase diagram of the lightly doped high-Tc cuprate superconductors with disorder removed

Authors: Kifu Kurokawa, Shunsuke Isono, Yoshimitsu Kohama, So Kunisada, Shiro Sakai, Ryotaro Sekine, Makoto Okubo, Matthew D. Watson, Timur K. Kim, Cephise Cacho, Shik Shin, Takami Tohyama, Kazuyasu Tokiwa, Takeshi Kondo

Abstract: The currently established electronic phase diagram of cuprates is based on a study of single- and double-layered compounds. These CuO$_2$ planes, however, are directly contacted with dopant layers, thus inevitably disordered with an inhomogeneous electronic state. Here, we solve this issue by investigating a 6-layered Ba$_2$Ca$_5$Cu$_6$O$_{12}$(F,O)$_2$ with inner CuO$_2$ layers, which are clean with the extremely low disorder, by angle-resolved photoemission spectroscopy (ARPES) and quantum oscillation measurements. We find a tiny Fermi pocket with a doping level less than 1% to exhibit well-defined quasiparticle peaks which surprisingly lack the polaronic feature. This provides the first evidence that the slightest amount of carriers is enough to turn a Mott insulating state into a metallic state with long-lived quasiparticles. By tuning hole carriers, we also find an unexpected phase transition from the superconducting to metallic states at 4%. Our results are distinct from the nodal liquid state with polaronic features proposed as an anomaly of the heavily underdoped cuprates.

Submitted 14 July, 2023; originally announced July 2023.

Journal ref: Nature Communications 14, 4064 (2023)

arXiv:2307.06398 [pdf, other]

Trainability, Expressivity and Interpretability in Gated Neural ODEs

Authors: Timothy Doyeon Kim, Tankut Can, Kamesh Krishnamurthy

Abstract: Understanding how the dynamics in biological and artificial neural networks implement the computations required for a task is a salient open question in machine learning and neuroscience. In particular, computations requiring complex memory storage and retrieval pose a significant challenge for these networks to implement or learn. Recently, a family of models described by neural ordinary differential equations (nODEs) has emerged as powerful dynamical neural network models capable of capturing complex dynamics. Here, we extend nODEs by endowing them with adaptive timescales using gating interactions. We refer to these as gated neural ODEs (gnODEs). Using a task that requires memory of continuous quantities, we demonstrate the inductive bias of the gnODEs to learn (approximate) continuous attractors. We further show how reduced-dimensional gnODEs retain their modeling power while greatly improving interpretability, even allowing explicit visualization of the structure of learned attractors. We introduce a novel measure of expressivity which probes the capacity of a neural network to generate complex trajectories. Using this measure, we explore how the phase-space dimension of the nODEs and the complexity of the function modeling the flow field contribute to expressivity. We see that a more complex function for modeling the flow field allows a lower-dimensional nODE to capture a given target dynamics. Finally, we demonstrate the benefit of gating in nODEs on several real-world tasks.

Submitted 12 July, 2023; originally announced July 2023.

arXiv:2307.04135 [pdf]

Dzyaloshinskii-Moriya torque-driven resonance in antiferromagnetic α-Fe2O3

Authors: Qiyao Liu, Taeheon Kim, Kyusup Lee, Dongsheng Yang, Dushyant Kumar, Fanrui Hu, Hyunsoo Yang

Abstract: We examine the high-frequency optical mode of α-Fe2O3 and report that Dzyaloshinskii-Moriya (DM) interaction generates a new type of torque on the magnetic resonance. Using a continuous-wave terahertz interferometer, we measure the optical mode spectra, where the asymmetric absorption with a large amplitude and broad linewidth is observed near the magnetic transition point, Morin temperature (TM ~ 254.3 K). Based on the spin wave model, the spectral anomaly is attributed to the DM interaction-induced torque, enabling us to extract a DM interaction field strength of 4 T. Our work opens a new avenue to characterize the spin resonance behaviors at an antiferromagnetic singular point for next-generation and high-frequency spin-based information technologies.

Submitted 9 July, 2023; originally announced July 2023.

Comments: 4 figures

arXiv:2307.03067 [pdf, other]

DeepOnto: A Python Package for Ontology Engineering with Deep Learning

Authors: Yuan He, Jiaoyan Chen, Hang Dong, Ian Horrocks, Carlo Allocca, Taehun Kim, Brahmananda Sapkota

Abstract: Integrating deep learning techniques, particularly language models (LMs), with knowledge representation techniques like ontologies has raised widespread attention, urging the need of a platform that supports both paradigms. Although packages such as OWL API and Jena offer robust support for basic ontology processing features, they lack the capability to transform various types of information within ontologies into formats suitable for downstream deep learning-based applications. Moreover, widely-used ontology APIs are primarily Java-based while deep learning frameworks like PyTorch and Tensorflow are mainly for Python programming. To address the needs, we present DeepOnto, a Python package designed for ontology engineering with deep learning. The package encompasses a core ontology processing module founded on the widely-recognised and reliable OWL API, encapsulating its fundamental features in a more "Pythonic" manner and extending its capabilities to incorporate other essential components including reasoning, verbalisation, normalisation, taxonomy, projection, and more. Building on this module, DeepOnto offers a suite of tools, resources, and algorithms that support various ontology engineering tasks, such as ontology alignment and completion, by harnessing deep learning methods, primarily pre-trained LMs. In this paper, we also demonstrate the practical utility of DeepOnto through two use-cases: the Digital Health Coaching in Samsung Research UK and the Bio-ML track of the Ontology Alignment Evaluation Initiative (OAEI).

Submitted 8 March, 2024; v1 submitted 6 July, 2023; originally announced July 2023.

Comments: Accepted by the Semantic Web Journal

arXiv:2307.01193 [pdf, other]

Squeezing Large-Scale Diffusion Models for Mobile

Authors: Jiwoong Choi, Minkyu Kim, Daehyun Ahn, Taesu Kim, Yulhwa Kim, Dongwon Jo, Hyesung Jeon, Jae-Joon Kim, Hyungjun Kim

Abstract: The emergence of diffusion models has greatly broadened the scope of high-fidelity image synthesis, resulting in notable advancements in both practical implementation and academic research. With the active adoption of the model in various real-world applications, the need for on-device deployment has grown considerably. However, deploying large diffusion models such as Stable Diffusion with more than one billion parameters to mobile devices poses distinctive challenges due to the limited computational and memory resources, which may vary according to the device. In this paper, we present the challenges and solutions for deploying Stable Diffusion on mobile devices with the TensorFlow Lite framework, which supports both iOS and Android devices. The resulting Mobile Stable Diffusion achieves an inference latency of less than 7 seconds for a 512x512 image generation on Android devices with mobile GPUs.

Submitted 3 July, 2023; originally announced July 2023.

Comments: 7 pages, 8 figures, ICML 2023 Workshop on Challenges in Deployable Generative AI

arXiv:2306.16692 [pdf, other]

Performance Evaluation of Transport Protocols and Roadmap to a High-Performance Transport Design for Immersive Applications

Authors: Inayat Ali, Seungwoo Hong, Pyung-koo Park, Tae Yeon Kim

Abstract: Immersive technologies such as virtual reality (VR), augmented reality (AR), and holograms will change users' digital experience. These immersive technologies have a multitude of applications, including telesurgeries, teleconferencing, Internet shopping, computer games, etc. Holographic-type communication (HTC) is a type of augmented reality media that provides an immersive experience to Internet users. However, HTC has different characteristics and network requirements, and the existing network architecture and transport protocols may not be able to cope with the stringent network requirements of HTC. Therefore, in this paper, we provide an in-depth and critical study of the transport protocols for HTC. We also discuss the characteristics and the network requirements for HTC. Based on the performance evaluation of the existing transport protocols, we propose a roadmap to design new high-performance transport protocols for immersive applications.

Submitted 30 June, 2023; v1 submitted 29 June, 2023; originally announced June 2023.

Comments: Accepted in The 14th International Conference on Ubiquitous and Future Networks (ICUFN 2023), Paris, France, July 4-7, 2023

arXiv:2306.16683 [pdf, other]

The Seoul National University AGN Monitoring Project IV: H$α$ reverberation mapping of 6 AGNs and the H$α$ Size-Luminosity Relation

Authors: Hojin Cho, Jong-Hak Woo, Shu Wang, Donghoon Son, Jaejin Shin, Suvendu Rakshit, Aaron J. Barth, Vardha N. Bennert, Elena Gallo, Edmund Hodges-Kluck, Tommaso Treu, Hyun-Jin Bae, Wanjin Cho, Adi Foord, Jaehyuk Geum, Yashashree Jadhav, Yiseul Jeon, Kyle M. Kabasares, Daeun Kang, Wonseok Kang, Changseok Kim, Donghwa Kim, Minjin Kim, Taewoo Kim, Huynh Anh N. Le, et al. (7 additional authors not shown)

Abstract: The broad line region (BLR) size-luminosity relation has paramount importance for estimating the mass of black holes in active galactic nuclei (AGNs). Traditionally, the size of the H$β$ BLR is often estimated from the optical continuum luminosity at 5100 Å, while the size of the H$α$ BLR and its correlation with the luminosity is much less constrained. As a part of the Seoul National University AGN Monitoring Project (SAMP) which provides six-year photometric and spectroscopic monitoring data, we present our measurements of the H$α$ lags of 6 high-luminosity AGNs. Combined with the measurements for 42 AGNs from the literature, we derive the size-luminosity relations of H$α$ BLR against broad H$α$ and 5100 Å continuum luminosities. We find the slope of the relations to be $0.61\pm0.04$ and $0.59\pm0.04$, respectively, which are consistent with the H$β$ size-luminosity relation. Moreover, we find a linear relation between the 5100 Å continuum luminosity and the broad H$α$ luminosity across 7 orders of magnitude. Using these results, we propose a new virial mass estimator based on the H$α$ broad emission line, finding that the previous mass estimates based on the scaling relations in the literature are overestimated by up to 0.7 dex at masses lower than $10^7$ M$_{\odot}$.

Submitted 29 June, 2023; originally announced June 2023.

Comments: Accepted for publication in ApJ (Jun. 25th, 2023). 21 pages, 12 figures

arXiv:2306.15139 [pdf]

doi 10.1038/s41467-022-29447-6

Non-invasive digital etching of van der Waals semiconductors

Authors: Jian Zhou, Chunchen Zhang, Li Shi, Xiaoqing Chen, Tae-Soo Kim, Minseung Gyeon, Jian Chen, Jinlan Wang, Linwei Yu, Xinran Wang, Kibum Kang, Emanuele Orgiu, Paolo Samorì, Kenji Watanabe, Takashi Taniguchi, Kazuhito Tsukagoshi, Peng Wang, Yi Shi, Songlin Li

Abstract: The capability to finely tailor material thickness with simultaneous atomic precision and non-invasivity would be useful for constructing quantum platforms and post-Moore microelectronics. However, it remains challenging to attain synchronized controls over tailoring selectivity and precision. Here we report a protocol that allows for non-invasive and atomically digital etching of van der Waals transition-metal dichalcogenides through selective alloying via low-temperature thermal diffusion and subsequent wet etching. The mechanism of selective alloying between sacrifice metal atoms and defective or pristine dichalcogenides is analyzed with high-resolution scanning transmission electron microscopy. Also, the non-invasive nature and atomic level precision of our etching technique are corroborated by consistent spectral, crystallographic and electrical characterization measurements. The low-temperature charge mobility of as-etched MoS$_2$ reaches up to $1200\,$cm$^{2}\cdot$V$^{-1}\cdot$s$^{-1}$, comparable to that of exfoliated pristine counterparts. The entire protocol represents a highly precise and non-invasive tailoring route for material manipulation.

Submitted 26 June, 2023; originally announced June 2023.

Comments: 46 pages, 4 figures, with SI

Journal ref: Nature Communications, 13, 1844 (2022)

arXiv:2306.14601 [pdf, other]

Safe Navigation in Unstructured Environments by Minimizing Uncertainty in Control and Perception

Authors: Junwon Seo, Jungwi Mun, Taekyung Kim

Abstract: Uncertainty in control and perception poses challenges for autonomous vehicle navigation in unstructured environments, leading to navigation failures and potential vehicle damage. This paper introduces a framework that minimizes control and perception uncertainty to ensure safe and reliable navigation. The framework consists of two uncertainty-aware models: a learning-based vehicle dynamics model and a self-supervised traversability estimation model. We train a vehicle dynamics model that can quantify the epistemic uncertainty of the model to perform active exploration, resulting in the efficient collection of training data and effective avoidance of uncertain state-action spaces. In addition, we employ meta-learning to train a traversability cost prediction network. The model can be trained with driving data from a variety of types of terrain, and it can online-adapt based on interaction experiences to reduce the aleatoric uncertainty. Integrating the dynamics model and traversability cost prediction model with a sampling-based model predictive controller allows for optimizing trajectories that avoid uncertain terrains and state-action spaces. Experimental results demonstrate that the proposed method reduces uncertainty in prediction and improves stability in autonomous vehicle navigation in unstructured environments.

Submitted 26 June, 2023; originally announced June 2023.

Comments: RSS 2023 Workshop on Inference and Decision Making for Autonomous Vehicles (IDMAV)

arXiv:2306.14513 [pdf, other]

doi 10.1007/s40042-023-00878-8

Microscopic conductivity of passive films on ferritic stainless steel for hydrogen fuel cells

Authors: Taemin Ahn, Tae-Hwan Kim

Abstract: Hydrogen fuel cells offer a clean and sustainable energy conversion solution. The bipolar separator plate, a critical component in fuel cells, plays a vital role in preventing reactant gas cross-contamination and facilitating efficient ion transport in a fuel cell. High chromium ferritic stainless steel with an artificially formed thin chromium oxide passive film has recently gained attention due to its superior electrical conductivity and corrosion resistance, making it a suitable material for separators. In this study, we investigate the microscopic electrical conductivity of the intrinsic passive oxide film on such ferritic stainless steel. Through advanced surface characterization techniques such as current sensing atomic force microscopy and scanning tunneling microscopy/spectroscopy, we discover highly conductive regions within the film that vary depending on location. These findings provide valuable insights into the behavior of the passive oxide film in fuel cells. By understanding the microscopic electrical properties, we can enhance the design and performance of separator materials in hydrogen fuel cells. Ultimately, this research contributes to a broader understanding of separator materials and supports the wider application of hydrogen fuel cells.

Submitted 26 June, 2023; originally announced June 2023.

Comments: 6 pages, 4 figures. The following article has been accepted by Journal of the Korean Physical Society. After it is published, it will be found at the following DOI

Journal ref: Journal of the Korean Physical Society 83, 289-295 (2023)

arXiv:2306.14501 [pdf, other]

doi 10.1063/5.0158595

Atomic-Scale Tailoring of Chemisorbed Atomic Oxygen on Epitaxial Graphene for Graphene-Based Electronic Devices

Authors: Tae Soo Kim, Taemin Ahn, Tae-Hwan Kim, Hee Cheul Choi, Han Woong Yeom

Abstract: Graphene, with its unique band structure, mechanical stability, and high charge mobility, holds great promise for next-generation electronics. Nevertheless, its zero band gap challenges the control of current flow through electrical gating, consequently limiting its practical applications. Recent research indicates that atomic oxygen can oxidize epitaxial graphene in a vacuum without causing unwanted damage. In this study, we have investigated the effects of chemisorbed atomic oxygen on the electronic properties of epitaxial graphene, using scanning tunneling microscopy (STM). Our findings reveal that oxygen atoms effectively modify the electronic states of graphene, resulting in a band gap at its Dirac point. Furthermore, we demonstrate that it is possible to selectively induce desorption or hopping of oxygen atoms with atomic precision by applying appropriate bias sweeps with an STM tip. These results suggest the potential for atomic-scale tailoring of graphene oxide, enabling the development of graphene-based atomic-scale electronic devices.

Submitted 26 June, 2023; originally announced June 2023.

Comments: 5 pages, 3 figures. The following article has been accepted by Applied Physics Letters. After it is published, it will be found at the following DOI

Journal ref: Appl. Phys. Lett. 123, 023502 (2023)

arXiv:2306.14055 [pdf, other]

Transforming a Quadruped into a Guide Robot for the Visually Impaired: Formalizing Wayfinding, Interaction Modeling, and Safety Mechanism

Authors: J. Taery Kim, Wenhao Yu, Yash Kothari, Jie Tan, Greg Turk, Sehoon Ha

Abstract: This paper explores the principles for transforming a quadrupedal robot into a guide robot for individuals with visual impairments. A guide robot has great potential to resolve the limited availability of guide animals that are accessible to only two to three percent of the potential blind or visually impaired (BVI) users. To build a successful guide robot, our paper explores three key topics: (1) formalizing the navigation mechanism of a guide dog and a human, (2) developing a data-driven model of their interaction, and (3) improving user safety. First, we formalize the wayfinding task of the human-guide robot team using Markov Decision Processes based on the literature and interviews. Then we collect real human-robot interaction data from three visually impaired and six sighted people and develop an interaction model called the "Delayed Harness" to effectively simulate the navigation behaviors of the team. Additionally, we introduce an action shielding mechanism to enhance user safety by predicting and filtering out dangerous actions. We evaluate the developed interaction model and the safety mechanism in simulation, which greatly reduce the prediction errors and the number of collisions, respectively. We also demonstrate the integrated system on a quadrupedal robot with a rigid harness, by guiding users over 100+ m trajectories.

Submitted 24 June, 2023; originally announced June 2023.

Comments: 16 pages, 8 figures

Journal ref: Proceedings of The 7th Conference on Robot Learning, PMLR 229:2288-2303, 2023

arXiv:2306.13913 [pdf, other]

Temporal Analysis of Misinformation on Parler

Authors: Eliana Norton, Thaïs Thomas, Akaash Kolluri, Torie Hyunsik Kim, Dhiraj Murthy

Abstract: Social media platforms have facilitated the rapid spread of dis- and mis-information. Parler, a US-based fringe social media platform that positions itself as a champion of free-speech, has had substantial information integrity issues. In this study, we seek to characterize temporal misinformation trends on Parler. Comparing a dataset of 189 million posts and comments from Parler against 1591 rated claims (false, barely true, half true, mostly true, pants on fire, true) from Politifact, we identified 231,881 accuracy-labeled posts on Parler. We used BERT-Topic to thematically analyze the Politifact claims, and then compared trends in these categories to real world events to contextualize their distribution. We identified three distinct categories of misinformation circulating on Parler: COVID-19, the 2020 presidential election, and the Black Lives Matter movement. Our results are significant, with a surprising 69.2% of posts in our dataset found to be 'false' and 7.6% 'barely true'. We also found that the number of Parler posts ('parleys') containing misinformation increased around major events (e.g., George Floyd's murder).

Submitted 24 June, 2023; originally announced June 2023.

Comments: 15 pages, 4 figures

arXiv:2306.12009 [pdf, ps, other]

Some numbers and polynomials related to degenerate harmonic and degenerate hyperharmonic numbers

Authors: Dae San Kim, Taekyun Kim

Abstract: Recently, the degenerate harmonic and the degenerate hyperharmonic numbers are introduced respectively as degenerate versions of the harmonic and the hyperharmonic numbers. The aim of this paper is to introduce the degenerate harmonic-Fubini polynomials and numbers related to the degenerate harmonic numbers and to study their properties, explicit expressions and some identities. In addition, as generalizations of those polynomials and numbers, we also introduce the degenerate hyperharmonic-Fubini polynomials and numbers related to the degenerate hyperharmonic numbers and derive similar results to the degenerate harmonic-Fubini polynomials and numbers.

Submitted 21 June, 2023; originally announced June 2023.

Comments: 12 pages

MSC Class: 11B73; 11B83

arXiv:2306.11526 [pdf, other]

Understanding Contrastive Learning Through the Lens of Margins

Authors: Daniel Rho, TaeSoo Kim, Sooill Park, Jaehyun Park, JaeHan Park

Abstract: Contrastive learning, along with its variations, has been a highly effective self-supervised learning method across diverse domains. Contrastive learning measures the distance between representations using cosine similarity and uses cross-entropy for representation learning. Within the same framework of cosine-similarity-based representation learning, margins have played a significant role in enhancing face and speaker recognition tasks. Interestingly, despite the shared reliance on the same similarity metrics and objective functions, contrastive learning has not actively adopted margins. Furthermore, decision-boundary-based explanations are the only ones that have been used to explain the effect of margins in contrastive learning. In this work, we propose a new perspective to understand the role of margins based on gradient analysis. Based on the new perspective, we analyze how margins affect gradients of contrastive learning and separate the effect into more elemental levels. We separately analyze each and provide possible directions for improving contrastive learning. Our experimental results demonstrate that emphasizing positive samples and scaling gradients depending on positive sample angles and logits are the keys to improving the generalization performance of contrastive learning in both seen and unseen datasets, and other factors can only marginally improve performance.

Submitted 10 October, 2023; v1 submitted 20 June, 2023; originally announced June 2023.

arXiv:2306.11339 [pdf, other]

Masking Augmentation for Supervised Learning

Authors: Byeongho Heo, Taekyung Kim, Sangdoo Yun, Dongyoon Han

Abstract: Pre-training using random masking has emerged as a novel trend in training techniques. However, supervised learning faces a challenge in adopting masking augmentations, primarily due to unstable training. In this paper, we propose a novel way to involve masking augmentations dubbed Masked Sub-model (MaskSub). MaskSub consists of the main-model and sub-model; while the former enjoys conventional training recipes, the latter leverages the benefit of strong masking augmentations in training. MaskSub addresses the challenge by mitigating adverse effects through a relaxed loss function similar to a self-distillation loss. Our analysis shows that MaskSub improves performance, with the training loss converging even faster than regular training, which suggests our method facilitates training. We further validate MaskSub across diverse training recipes and models, including DeiT-III, MAE fine-tuning, CLIP fine-tuning, ResNet, and Swin Transformer. Our results show that MaskSub consistently provides significant performance gains across all the cases. MaskSub provides a practical and effective solution for introducing additional regularization under various training recipes. Code available at https://github.com/naver-ai/augsub

Submitted 26 February, 2024; v1 submitted 20 June, 2023; originally announced June 2023.

Comments: 17 pages, 3 figures

arXiv:2306.08839 [pdf, other]

Knowledge Assembly: Semi-Supervised Multi-Task Learning from Multiple Datasets with Disjoint Labels

Authors: Federica Spinola, Philipp Benz, Minhyeong Yu, Tae-hoon Kim

Abstract: In real-world scenarios we often need to perform multiple tasks simultaneously. Multi-Task Learning (MTL) is an adequate method to do so, but usually requires datasets labeled for all tasks. We propose a method that can leverage datasets labeled for only some of the tasks in the MTL framework. Our work, Knowledge Assembly (KA), learns multiple tasks from disjoint datasets by leveraging the unlabeled data in a semi-supervised manner, using model augmentation for pseudo-supervision. Whilst KA can be implemented on any existing MTL networks, we test our method on jointly learning person re-identification (reID) and pedestrian attribute recognition (PAR). We surpass the single task fully-supervised performance by $4.2\%$ points for reID and $0.9\%$ points for PAR.

Submitted 15 June, 2023; originally announced June 2023.

Comments: Accepted at CVPRW'23

arXiv:2306.05837 [pdf, other]

doi 10.1364/OE.497721

Micromotion compensation of trapped ions by qubit transition and direct scanning of dc voltages

Authors: Woojun Lee, Daun Chung, Jiyong Kang, Honggi Jeon, Changhyun Jung, Dong-Il "Dan" Cho, Taehyun Kim

Abstract: Excess micromotion is detrimental to accurate qubit control of trapped ions, thus measuring and minimizing it is crucial. In this paper, we present a simple approach for measuring and suppressing excess micromotion of trapped ions by leveraging the existing laser-driven qubit transition scheme combined with direct scanning of dc voltages. The compensation voltage is deduced by analyzing the Bessel expansion of a scanned qubit transition rate. The method provides a fair level of sensitivity for practical quantum computing applications, while demanding minimal deviation of trap condition. By accomplishing compensation of excess micromotion in the qubit momentum-excitation direction, the scheme offers an additional avenue for excess micromotion compensation, complementing existing compensation schemes.

Submitted 2 December, 2023; v1 submitted 9 June, 2023; originally announced June 2023.

Comments: 11 pages, 6 figures

Journal ref: Opt. Express 31 (2023) 33787-33798

arXiv:2306.04401 [pdf, other]

doi 10.1063/5.0147344

Systematic investigation of wear-induced cold welding in ultrahigh vacuum piezoelectric motors with non-metallic coatings

Authors: Taemin Ahn, Sungmin Song, Ungdon Ham, Tae-Hwan Kim

Abstract: Piezoelectric motors are widely used in various applications where both precision positioning and miniaturization are required. Either inertial or quasi-static motors are commonly employed because of their high accuracy, which demands consistent sliding friction between moving sliders and their static counterparts for reliable operation. In general, slider wear is unavoidable after long-term use. Especially, the wear often leads to more serious cold welding in vacuum, which also refers to friction welding induced by direct contact between similar metal surfaces. Non-metallic coatings can prevent such unwanted cold welding in ultrahigh vacuum (UHV) applications. However, the practical reliability of available coatings under UHV conditions still remains to be elucidated. Here, we systematically investigate the practical reliability of commonly used UHV-compatible lubricant coatings for piezoelectric motors in vacuum. We demonstrate that polytetrafluoroethylene (PTFE) shows the most reliable long-term operation in vacuum, while other coatings eventually lead to wear-induced cold welding and motor failure. Our finding provides a simple and effective way to improve the long-term performance of UHV piezoelectric motors by coating the slider surface with PTFE.

Submitted 7 June, 2023; originally announced June 2023.

Comments: 4 pages, 3 figures. The following article has been accepted by Review of Scientific Instruments. After it is published, it will be found at https://doi.org/10.1063/5.0147344

Journal ref: Rev. Sci. Instrum. 94, 063702 (2023)

arXiv:2306.04139 [pdf, other]

A Comprehensive Survey on Generative Diffusion Models for Structured Data

Authors: Heejoon Koo, To Eun Kim

Abstract: In recent years, generative diffusion models have achieved a rapid paradigm shift in deep generative models by showing groundbreaking performance across various applications. Meanwhile, structured data, encompassing tabular and time series data, has received comparatively limited attention from the deep learning research community, despite its omnipresence and extensive applications. Thus, there is still a lack of literature and reviews on structured data modelling via diffusion models, compared to other data modalities such as visual and textual data. To address this gap, we present a comprehensive review of recently proposed diffusion models in the field of structured data. First, this survey provides a concise overview of the score-based diffusion model theory, subsequently proceeding to the technical descriptions of the majority of pioneering works that used structured data in both data-driven general tasks and domain-specific applications. Thereafter, we analyse and discuss the limitations and challenges shown in existing works and suggest potential research directions. We hope this review serves as a catalyst for the research community, promoting developments in generative diffusion models for structured data.

Submitted 8 July, 2023; v1 submitted 7 June, 2023; originally announced June 2023.

Comments: 20 pages, 1 figure, 2 tables

arXiv:2306.03361 [pdf, other]

WHAT, WHEN, and HOW to Ground: Designing User Persona-Aware Conversational Agents for Engaging Dialogue

Authors: Deuksin Kwon, Sunwoo Lee, Ki Hyun Kim, Seojin Lee, Taeyoon Kim, Eric Davis

Abstract: This paper presents a method for building a personalized open-domain dialogue system to address the WWH (WHAT, WHEN, and HOW) problem for natural response generation in a commercial setting, where personalized dialogue responses are heavily interleaved with casual response turns. The proposed approach involves weighted dataset blending, negative persona information augmentation methods, and the design of personalized conversation datasets to address the challenges of WWH in personalized, open-domain dialogue systems. Our work effectively balances dialogue fluency and tendency to ground, while also introducing a response-type label to improve the controllability and explainability of the grounded responses. The combination of these methods leads to more fluent conversations, as evidenced by subjective human evaluations as well as objective evaluations.

Submitted 3 July, 2023; v1 submitted 5 June, 2023; originally announced June 2023.

Comments: Accepted in ACL 2023 Industry Track

ACM Class: I.2.1; I.2.7

arXiv:2306.02272 [pdf, other]

OWQ: Outlier-Aware Weight Quantization for Efficient Fine-Tuning and Inference of Large Language Models

Authors: Changhun Lee, Jungyu Jin, Taesu Kim, Hyungjun Kim, Eunhyeok Park

Abstract: Large language models (LLMs) with hundreds of billions of parameters require powerful server-grade GPUs for inference, limiting their practical deployment. To address this challenge, we introduce the outlier-aware weight quantization (OWQ) method, which aims to minimize LLM's footprint through low-precision representation. OWQ prioritizes a small subset of structured weights sensitive to quantization, storing them in high-precision, while applying highly tuned quantization to the remaining dense weights. This sensitivity-aware mixed-precision scheme reduces the quantization error notably, and extensive experiments demonstrate that 3.1-bit models using OWQ perform comparably to 4-bit models optimized by OPTQ. Furthermore, OWQ incorporates a parameter-efficient fine-tuning for task-specific adaptation, called weak column tuning (WCT), enabling accurate task-specific LLM adaptation with minimal memory overhead in the optimized format. OWQ represents a notable advancement in the flexibility, efficiency, and practicality of LLM optimization literature. The source code is available at https://github.com/xvyaward/owq

Submitted 23 January, 2024; v1 submitted 4 June, 2023; originally announced June 2023.

Comments: Accepted at AAAI 2024 (oral presentation)

arXiv:2306.01395 [pdf, other]

Masked Autoencoder for Unsupervised Video Summarization

Authors: Minho Shim, Taeoh Kim, Jinhyung Kim, Dongyoon Wee

Abstract: Summarizing a video requires a diverse understanding of the video, ranging from recognizing scenes to evaluating how much each frame is essential enough to be selected as a summary. Self-supervised learning (SSL) is acknowledged for its robustness and flexibility to multiple downstream tasks, but the video SSL has not shown its value for dense understanding tasks like video summarization. We claim an unsupervised autoencoder with sufficient self-supervised learning does not need any extra downstream architecture design or fine-tuning weights to be utilized as a video summarization model. The proposed method to evaluate the importance score of each frame takes advantage of the reconstruction score of the autoencoder's decoder. We evaluate the method in major unsupervised video summarization benchmarks to show its effectiveness under various experimental settings.

Submitted 2 June, 2023; originally announced June 2023.

arXiv:2306.00847 [pdf, ps, other]

On a Kurzweil type theorem via ubiquity

Authors: Taehyeong Kim

Abstract: Kurzweil's theorem ('55) is concerned with zero-one laws for well approximable targets in inhomogeneous Diophantine approximation under the badly approximable assumption. In this article, we prove the divergent part of a Kurzweil type theorem via a suitable construction of ubiquitous systems when the badly approximable assumption is relaxed. Moreover, we also discuss some counterparts of Kurzweil's theorem.

Submitted 27 January, 2024; v1 submitted 1 June, 2023; originally announced June 2023.

Comments: 11 pages; Theorem 1.7 and its proof are revised

arXiv:2305.19854 [pdf, other]

doi 10.1103/PhysRevD.108.014010

Next-to-leading BFKL evolution for dijets with large rapidity separation at different LHC energies

Authors: Anatolii Iu. Egorov, Victor T. Kim

Abstract: The calculations based on the next-to-leading logarithm (NLL) approximation for the Balitsky-Fadin-Kuraev-Lipatov (BFKL) evolution are presented for the Mueller-Navelet (MN) dijet production cross section, as well as for their ratios at different collision energies. The MN dijet denotes the pair of jets selected with $p_{\perp} > p_{\perp\min}$ and with maximal rapidity separation in the event. The NLL BFKL predictions for the MN cross sections are given for the $pp$ collisions at $\sqrt{s}=2.76$, $8$, and $13$ TeV, for $p_{\perp\min} = 20$ and $35$ GeV. The results are in agreement with the measurement by the CMS experiment in $pp$ collisions at $\sqrt{s}=2.76$ TeV and $p_{\perp\min} = 35$ GeV within the theoretical and experimental uncertainties. The predictions of the NLL BFKL calculation of ratios of the MN cross sections at different collision energies and $p_{\perp\min}$ are also presented.

Submitted 13 July, 2023; v1 submitted 31 May, 2023; originally announced May 2023.

Comments: 8 pages, 5 figures

Journal ref: Phys. Rev. D 108, 014010 (2023)

arXiv:2305.17863 [pdf, other]

GridFormer: Residual Dense Transformer with Grid Structure for Image Restoration in Adverse Weather Conditions

Authors: Tao Wang, Kaihao Zhang, Ziqian Shao, Wenhan Luo, Bjorn Stenger, Tong Lu, Tae-Kyun Kim, Wei Liu, Hongdong Li

Abstract: Image restoration in adverse weather conditions is a difficult task in computer vision. In this paper, we propose a novel transformer-based framework called GridFormer which serves as a backbone for image restoration under adverse weather conditions. GridFormer is designed in a grid structure using a residual dense transformer block, and it introduces two core designs. First, it uses an enhanced attention mechanism in the transformer layer. The mechanism includes stages of the sampler and compact self-attention to improve efficiency, and a local enhancement stage to strengthen local information. Second, we introduce a residual dense transformer block (RDTB) as the final GridFormer layer. This design further improves the network's ability to learn effective features from both preceding and current local features. The GridFormer framework achieves state-of-the-art results on five diverse image restoration tasks in adverse weather conditions, including image deraining, dehazing, deraining & dehazing, desnowing, and multi-weather restoration. The source code and pre-trained models are available at https://github.com/TaoWangzj/GridFormer.

Submitted 21 June, 2024; v1 submitted 28 May, 2023; originally announced May 2023.

Comments: 20 pages, 15 figures, accepted by IJCV

arXiv:2305.17701 [pdf, other]

KoSBi: A Dataset for Mitigating Social Bias Risks Towards Safer Large Language Model Application

Authors: Hwaran Lee, Seokhee Hong, Joonsuk Park, Takyoung Kim, Gunhee Kim, Jung-Woo Ha

Abstract: Large language models (LLMs) learn not only natural text generation abilities but also social biases against different demographic groups from real-world data. This poses a critical risk when deploying LLM-based applications. Existing research and resources are not readily applicable in South Korea due to the differences in language and culture, both of which significantly affect the biases and targeted demographic groups. This limitation requires localized social bias datasets to ensure the safe and effective deployment of LLMs. To this end, we present KoSBi, a new social bias dataset of 34k pairs of contexts and sentences in Korean covering 72 demographic groups in 15 categories. We find that through filtering-based moderation, social biases in generated content can be reduced by 16.47%p on average for HyperCLOVA (30B and 82B), and GPT-3.

Submitted 29 May, 2023; v1 submitted 28 May, 2023; originally announced May 2023.

Comments: 17 pages, 8 figures, 12 tables, ACL 2023

arXiv:2305.17696 [pdf, other]

SQuARe: A Large-Scale Dataset of Sensitive Questions and Acceptable Responses Created Through Human-Machine Collaboration

Authors: Hwaran Lee, Seokhee Hong, Joonsuk Park, Takyoung Kim, Meeyoung Cha, Yejin Choi, Byoung Pil Kim, Gunhee Kim, Eun-Ju Lee, Yong Lim, Alice Oh, Sangchul Park, Jung-Woo Ha

Abstract: The potential social harms that large language models pose, such as generating offensive content and reinforcing biases, are steeply rising. Existing works focus on coping with this concern while interacting with ill-intentioned users, such as those who explicitly make hate speech or elicit harmful responses. However, discussions on sensitive issues can become toxic even if the users are well-intentioned. For safer models in such scenarios, we present the Sensitive Questions and Acceptable Response (SQuARe) dataset, a large-scale Korean dataset of 49k sensitive questions with 42k acceptable and 46k non-acceptable responses. The dataset was constructed leveraging HyperCLOVA in a human-in-the-loop manner based on real news headlines. Experiments show that acceptable response generation significantly improves for HyperCLOVA and GPT-3, demonstrating the efficacy of this dataset.

Submitted 28 May, 2023; originally announced May 2023.

Comments: 19 pages, 10 figures, ACL 2023

arXiv:2305.16866 [pdf]

Automation of Trimming Die Design Inspection by Zigzag Process Between AI and CAD Domains

Authors: Jinsub Lee, Tae-Hyun Kim, Sang-Hwan Jeon, Sung-Hyun Park, Sang-Hi Kim, Eun-Ho Lee, Jee-Hyong Lee

Abstract: Quality control in the manufacturing industry has improved with the use of artificial intelligence (AI). However, the manual inspection of trimming die designs, which is time-consuming and prone to errors, is still done by engineers. This study introduces an automatic design inspection system for automobile trimming dies by integrating AI modules and computer-aided design (CAD) software. The AI modules replace engineers' judgment, and the CAD software carries out operations requested by the AI modules. The inspection process involves a zigzag interaction between the AI modules and CAD software, enabling one-click operation without expert intervention. The AI modules are CAD-independent and data-efficient, making them adaptable to other CAD software. They achieve high performance even with limited training data, with an average length measurement error of only 2.4%. The inspection time is reduced to approximately one-fifth of the time required for manual inspection by experts.

Submitted 26 May, 2023; originally announced May 2023.

arXiv:2305.16575 [pdf, other]

doi 10.3390/universe9050242

Measurements of the Cross-Section for the $t\bar{t}$ + Heavy-Flavor Production at the LHC

Authors: Jorgen D'Hondt, Tae Jeong Kim

Abstract: At the LHC, the process of a Higgs boson decaying into bottom or charm quarks produced in association with a pair of top quarks, $t\bar{t}H$, allows for an empirical exploration of the heavy-flavor quark Yukawa couplings to the Higgs boson. Accordingly, the cross-sections for the $t\bar{t}$ + heavy-flavor production without the appearance of the Higgs boson have been measured at the LHC in various phase spaces using data samples collected in $pp$ collisions at $\sqrt{s}$ = 7, 8 and 13 TeV with the ATLAS and CMS experiments. Flavor ratios of cross-sections of $t\bar{t}$ + heavy-flavors to $t\bar{t}$ + additional jets processes are also measured. In this paper, the measured cross-sections and ratios are reviewed and the prospects with more data are presented.

Submitted 25 May, 2023; originally announced May 2023.

Comments: 20 pages, 13 figures

Journal ref: Universe 2023, 9(5), 242

arXiv:2305.14345 [pdf, other]

NCHO: Unsupervised Learning for Neural 3D Composition of Humans and Objects

Authors: Taeksoo Kim, Shunsuke Saito, Hanbyul Joo

Abstract: Deep generative models have been recently extended to synthesizing 3D digital humans. However, previous approaches treat clothed humans as a single chunk of geometry without considering the compositionality of clothing and accessories. As a result, individual items cannot be naturally composed into novel identities, leading to limited expressiveness and controllability of generative 3D avatars. While several methods attempt to address this by leveraging synthetic data, the interaction between humans and objects is not authentic due to the domain gap, and manual asset creation is difficult to scale for a wide variety of objects. In this work, we present a novel framework for learning a compositional generative model of humans and objects (backpacks, coats, scarves, and more) from real-world 3D scans. Our compositional model is interaction-aware, meaning the spatial relationship between humans and objects, and the mutual shape change by physical contact is fully incorporated. The key challenge is that, since humans and objects are in contact, their 3D scans are merged into a single piece. To decompose them without manual annotations, we propose to leverage two sets of 3D scans of a single person with and without objects. Our approach learns to decompose objects and naturally compose them back into a generative human model in an unsupervised manner. Despite our simple setup requiring only the capture of a single subject with objects, our experiments demonstrate the strong generalization of our model by enabling the natural composition of objects to diverse identities in various poses and the composition of multiple objects, which is unseen in training data. https://taeksuu.github.io/ncho/

Submitted 29 May, 2023; v1 submitted 23 May, 2023; originally announced May 2023.

Comments: The project page is available at https://taeksuu.github.io/ncho/

Showing 201–250 of 2,005 results for author: Kim, T