Search | arXiv e-print repository

arXiv:2406.01963 [pdf]

Diamond molecular balance: Revolutionizing high-resolution mass spectrometry from MDa to TDa at room temperature

Authors: Donggeun Lee, Seung-Woo Jeon, Chang-Hwan Yi, Yang-Hee Kim, Yeeun Choi, Sang-Hun Lee, **woong Cha, Seung-Bo Shim, Junho Suh, Il-Young Kim, Dongyeon Daniel Kang, Hojoong Jung, Cherlhyun Jeong, Jae-pyoung Ahn, Hee Chul Park, Sang-Wook Han, Chulki Kim

Abstract: The significance of mass spectrometry lies in its unparalleled ability to accurately identify and quantify molecules in complex samples, providing invaluable insights into molecular structures and interactions. Here, we leverage diamond nanostructures as highly sensitive mass sensors by utilizing a self-excitation mechanism under an electron beam in a conventional scanning electron microscope (SEM). The diamond molecular balance (DMB) exhibits exceptional mass resolution of a few MDa and an extensive dynamic range from MDa to TDa, positioning itself as a forefront molecular balance operating at room temperature. Notably, the DMB measures the mass of a single bacteriophage T4, achieving a mass resolution of 4.7 MDa for an analyte at 184 MDa, while precisely determining their positional information on the device. These findings highlight the groundbreaking potential of the DMB as a revolutionary tool for mass analysis at room temperature.

Submitted 6 June, 2024; v1 submitted 4 June, 2024; originally announced June 2024.

Comments: 15 pages, 4 figures

arXiv:2406.00909 [pdf, other]

A model of umbral oscillations inherited from subphotospheric fast-body modes

Authors: Juhyung Kang, Jongchul Chae, Kyuhyoun Cho, Soosang Kang, Eun-Kyung Lim

Abstract: Recently, complex horizontal patterns of umbral oscillations have been reported, but their physical nature and origin are still not fully understood. Here we show that the two-dimensional patterns of umbral oscillations of slow waves are inherited from the subphotospheric fast-body modes. Using a simple analytic model, we successfully reproduced the temporal evolution of oscillation patterns with a finite number of fast-body modes. In this model, the radial apparent propagation of the pattern is associated with the appropriate combination of the amplitudes in radial modes. We also find that the oscillation patterns are dependent on the oscillation period. This result indicates that there is a cutoff radial mode, which is a unique characteristic of the model of fast-body modes. In principle, both internal and external sources can excite these fast-body modes and produce horizontal patterns of umbral oscillations.

Submitted 2 June, 2024; originally announced June 2024.

Comments: 10 pages, 8 figures, accepted for publication in A&A

arXiv:2405.20671 [pdf, other]

Position Coupling: Leveraging Task Structure for Improved Length Generalization of Transformers

Authors: Hanseul Cho, Jaeyoung Cha, Pranjal Awasthi, Srinadh Bhojanapalli, Anupam Gupta, Chulhee Yun

Abstract: Even for simple arithmetic tasks like integer addition, it is challenging for Transformers to generalize to longer sequences than those encountered during training. To tackle this problem, we propose position coupling, a simple yet effective method that directly embeds the structure of the tasks into the positional encoding of a (decoder-only) Transformer. Taking a departure from the vanilla absolute position mechanism assigning unique position IDs to each of the tokens, we assign the same position IDs to two or more "relevant" tokens; for integer addition tasks, we regard digits of the same significance as in the same position. On the empirical side, we show that with the proposed position coupling, a small (1-layer) Transformer trained on 1 to 30-digit additions can generalize up to 200-digit additions (6.67x of the trained length). On the theoretical side, we prove that a 1-layer Transformer with coupled positions can solve the addition task involving exponentially many digits, whereas any 1-layer Transformer without positional information cannot entirely solve it. We also demonstrate that position coupling can be applied to other algorithmic tasks such as addition with multiple summands, Nx2 multiplication, copy/reverse, and a two-dimensional task.

Submitted 31 May, 2024; originally announced May 2024.

Comments: 73 pages, 20 figures, 90 tables
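
The position-coupling idea in the abstract above (digits of the same significance share one position ID) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function name `coupled_position_ids`, the `a+b=c` token layout, the `start` offset, and the choice of ID 0 for the `+` and `=` separators are all assumptions made for illustration.

```python
from typing import List, Tuple


def coupled_position_ids(a: str, b: str, c: str, start: int = 1) -> Tuple[List[str], List[int]]:
    """Tokenize 'a+b=c' character by character and assign coupled position IDs.

    Digits of equal significance (ones, tens, ...) in both operands and in the
    result receive the same ID. The least-significant digit gets `start`.
    Assumption: separator tokens '+' and '=' get the reserved ID 0.
    """
    tokens: List[str] = []
    ids: List[int] = []
    separators = ["+", "=", None]  # token that follows each of a, b, c
    for operand, sep in zip((a, b, c), separators):
        n = len(operand)
        for i, digit in enumerate(operand):
            tokens.append(digit)
            # Index from the right so that significance determines the ID.
            ids.append(start + (n - 1 - i))
        if sep is not None:
            tokens.append(sep)
            ids.append(0)
    return tokens, ids


tokens, ids = coupled_position_ids("95", "7", "102")
print(list(zip(tokens, ids)))
```

In this sketch the ones digits `5`, `7`, and `2` all share one ID, while the leading `1` of the result gets an ID that neither operand uses. A vanilla absolute scheme would instead number the eight tokens consecutively.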

arXiv:2405.17095 [pdf, other]

Large-amplitude transverse MHD waves prevailing in the H$α$ chromosphere of a solar quiet region revealed by MiHI integrated field spectral observations

Authors: Jongchul Chae, Michiel van Noort, Maria S. Madjarska, Kyeore Lee, Juhyung Kang, Kyuhyoun Cho

Abstract: The investigation of plasma motions in the solar chromosphere is crucial for understanding the transport of mechanical energy from the interior of the Sun to the outer atmosphere and into interplanetary space. We report the finding of large-amplitude oscillatory transverse motions prevailing in the non-spicular Halpha chromosphere of a small quiet region near the solar disk center. The observation was carried out on 2018 August 25 with the Microlensed Hyperspectral Imager (MiHI) installed as an extension to the spectrograph at the Swedish Solar Telescope (SST). MiHI produced high-resolution Stokes spectra of the Halpha line over a two-dimensional array of points (sampled every 0.066 arcsec on the image plane) every 1.33 s for about 17 min. We extracted the Doppler-shift-insensitive intensity data of the line core by applying a bisector fit to Stokes I line profiles. From our time-distance analysis of the intensity data, we find a variety of transverse motions with velocity amplitudes of up to 40 km/s in fan fibrils and tiny filaments. In particular, in the fan fibrils, large-amplitude transverse MHD waves were seen to occur with a mean velocity amplitude of 25 km/s and a mean period of 5.8 min, propagating at a speed of 40 km/s. These waves are nonlinear and display group behavior. We estimate the wave energy flux in the upper chromosphere at 3 x 10^6 erg cm^-2 s^-1. Our results contribute to the advancement of our understanding of the properties of transverse MHD waves in the solar chromosphere.

Submitted 27 May, 2024; originally announced May 2024.

Comments: accepted for publication in A&A

arXiv:2404.17638 [pdf, other]

A self-consistent Hartree theory for lattice-relaxed magic-angle twisted bilayer graphene

Authors: Mohammed M. Al Ezzi, Liangtao Peng, Zhengyu Liu, Jonah Huang Zi Chao, Gayani N. Pallewela, Darryl Foo, Shaffique Adam

Abstract: For twisted bilayer graphene close to magic angle, we show that the effects of lattice relaxation and the Hartree interaction both become simultaneously important. Including both effects in a continuum theory reveals a Lifshitz transition to a Fermi surface topology that supports both a ``heavy fermion" pocket and an ultraflat band ($\approx 8~{\rm meV}$) that is pinned to the Fermi energy for a large range of fillings. We provide analytical and numerical results to understand the narrow ``magic angle range" that supports this pinned ultraflat band and make predictions for its experimental observation. We believe that the bands presented here are accurate at high temperature and provide a good starting point to understand the myriad of complex behaviour observed in this system.

Submitted 26 April, 2024; originally announced April 2024.

arXiv:2404.16301 [pdf, other]

Style Adaptation for Domain-adaptive Semantic Segmentation

Authors: Ting Li, Jianshu Chao, Deyu An

Abstract: Unsupervised Domain Adaptation (UDA) refers to the method that utilizes annotated source domain data and unlabeled target domain data to train a model capable of generalizing to the target domain data. Domain discrepancy leads to a significant decrease in the performance of general network models trained on the source domain data when applied to the target domain. We introduce a straightforward approach to mitigate the domain discrepancy, which necessitates no additional parameter calculations and seamlessly integrates with self-training-based UDA methods. Through the transfer of the target domain style to the source domain in the latent feature space, the model is trained to prioritize the target domain style during the decision-making process. We tackle the problem at both the image-level and shallow feature map level by transferring the style information from the target domain to the source domain data. As a result, we obtain a model that exhibits superior performance on the target domain. Our method yields remarkable enhancements in the state-of-the-art performance for synthetic-to-real UDA tasks. For example, our proposed method attains a noteworthy UDA performance of 76.93 mIoU on the GTA->Cityscapes dataset, representing a notable improvement of +1.03 percentage points over the previous state-of-the-art results.

Submitted 24 April, 2024; originally announced April 2024.

arXiv:2404.11885 [pdf, other]

Quantitative bordism over acyclic groups and Cheeger-Gromov $ρ$-invariants

Authors: Jae Choon Cha, Geunho Lim

Abstract: We prove a bordism version of Gromov's linearity conjecture over a large family of acyclic groups, for manifolds with arbitrary dimension. Every group embeds into one of these acyclic groups, and thus it follows that the conjecture is true if one allows to enlarge a given group. Our result holds in both PL and smooth categories, and for both oriented and unoriented cases. In the PL case, our results hold without assuming bounded local geometry. As an application, we prove that there is a universal linear bound for the Cheeger-Gromov $L^2$ $ρ$-invariants of PL $(4k-1)$-manifolds associated with arbitrary regular covers. We also show that the minimum number of simplices in a PL triangulation of $(4k-1)$-manifolds with a fixed simple homotopy type is unbounded if the fundamental group has nontrivial torsion. The proof of our main results builds on quantitative algebraic and geometric techniques over the simplicial classifying spaces of groups.

Submitted 3 May, 2024; v1 submitted 18 April, 2024; originally announced April 2024.

Comments: 25 pages, 3 figures; expositions revised in v2

MSC Class: 53C23; 57Q20; 55U10

arXiv:2404.01954 [pdf, other]

HyperCLOVA X Technical Report

Authors: Kang Min Yoo, Jaegeun Han, Sookyo In, Heewon Jeon, Jisu Jeong, Jaewook Kang, Hyunwook Kim, Kyung-Min Kim, Munhyong Kim, Sungju Kim, Donghyun Kwak, Hanock Kwak, Se Jung Kwon, Bado Lee, Dongsoo Lee, Gichang Lee, Jooho Lee, Baeseong Park, Seong** Shin, Joonsang Yu, Seolki Baek, Sumin Byeon, Eungsup Cho, Dooseok Choe, Jeesung Han, et al. (371 additional authors not shown)

Abstract: We introduce HyperCLOVA X, a family of large language models (LLMs) tailored to the Korean language and culture, along with competitive capabilities in English, math, and coding. HyperCLOVA X was trained on a balanced mix of Korean, English, and code data, followed by instruction-tuning with high-quality human-annotated datasets while abiding by strict safety guidelines reflecting our commitment to responsible AI. The model is evaluated across various benchmarks, including comprehensive reasoning, knowledge, commonsense, factuality, coding, math, chatting, instruction-following, and harmlessness, in both Korean and English. HyperCLOVA X exhibits strong reasoning capabilities in Korean backed by a deep understanding of the language and cultural nuances. Further analysis of the inherent bilingual nature and its extension to multilingualism highlights the model's cross-lingual proficiency and strong generalization ability to untargeted languages, including machine translation between several language pairs and cross-lingual inference tasks. We believe that HyperCLOVA X can provide helpful guidance for regions or countries in developing their sovereign LLMs.

Submitted 13 April, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

Comments: 44 pages; updated authors list and fixed author names

arXiv:2404.00562 [pdf, other]

Text2HOI: Text-guided 3D Motion Generation for Hand-Object Interaction

Authors: Junuk Cha, Jihyeon Kim, Jae Shin Yoon, Seungryul Baek

Abstract: This paper introduces the first text-guided work for generating the sequence of hand-object interaction in 3D. The main challenge arises from the lack of labeled data where existing ground-truth datasets are nowhere near generalizable in interaction type and object category, which inhibits the modeling of diverse 3D hand-object interaction with the correct physical implication (e.g., contacts and semantics) from text prompts. To address this challenge, we propose to decompose the interaction generation task into two subtasks: hand-object contact generation; and hand-object motion generation. For contact generation, a VAE-based network takes as input a text and an object mesh, and generates the probability of contacts between the surfaces of hands and the object during the interaction. The network learns a variety of local geometry structure of diverse objects that is independent of the objects' category, and thus, it is applicable to general objects. For motion generation, a Transformer-based diffusion model utilizes this 3D contact map as a strong prior for generating physically plausible hand-object motion as a function of text prompts by learning from the augmented labeled dataset; where we annotate text labels from many existing 3D hand and object motion data. Finally, we further introduce a hand refiner module that minimizes the distance between the object surface and hand joints to improve the temporal stability of the object-hand contacts and to suppress the penetration artifacts.
In the experiments, we demonstrate that our method can generate more realistic and diverse interactions compared to other baseline methods. We also show that our method is applicable to unseen objects. We will release our model and newly labeled data as a strong foundation for future research. Codes and data are available in: https://github.com/JunukCha/Text2HOI.

Submitted 1 April, 2024; v1 submitted 31 March, 2024; originally announced April 2024.

Comments: Accepted to CVPR 2024

arXiv:2403.15485 [pdf, other]

MOGAM: A Multimodal Object-oriented Graph Attention Model for Depression Detection

Authors: Junyeop Cha, Seoyun Kim, Dongjae Kim, Eunil Park

Abstract: Early detection plays a crucial role in the treatment of depression. Therefore, numerous studies have focused on social media platforms, where individuals express their emotions, aiming to achieve early detection of depression. However, the majority of existing approaches often rely on specific features, leading to limited scalability across different types of social media datasets, such as text, images, or videos. To overcome this limitation, we introduce a Multimodal Object-Oriented Graph Attention Model (MOGAM), which can be applied to diverse types of data, offering a more scalable and versatile solution. Furthermore, to ensure that our model can capture authentic symptoms of depression, we only include vlogs from users with a clinical diagnosis. To leverage the diverse features of vlogs, we adopt a multimodal approach and collect additional metadata such as the title, description, and duration of the vlogs. To effectively aggregate these multimodal features, we employed a cross-attention mechanism. MOGAM achieved an accuracy of 0.871 and an F1-score of 0.888. Moreover, to validate the scalability of MOGAM, we evaluated its performance with a benchmark dataset and achieved comparable results with prior studies (0.61 F1-score). In conclusion, we believe that the proposed model, MOGAM, is an effective solution for detecting depression in social media, offering potential benefits in the early detection and treatment of this mental health condition.

Submitted 21 March, 2024; originally announced March 2024.

Comments: 12 pages, 3 figures, 4 tables

arXiv:2403.13680 [pdf, other]

Step-Calibrated Diffusion for Biomedical Optical Image Restoration

Authors: Yiwei Lyu, Sung Jik Cha, Cheng Jiang, Asadur Chowdury, Xinhai Hou, Edward Harake, Akhil Kondepudi, Christian Freudiger, Honglak Lee, Todd C. Hollon

Abstract: High-quality, high-resolution medical imaging is essential for clinical care. Raman-based biomedical optical imaging uses non-ionizing infrared radiation to evaluate human tissues in real time and is used for early cancer detection, brain tumor diagnosis, and intraoperative tissue analysis. Unfortunately, optical imaging is vulnerable to image degradation due to laser scattering and absorption, which can result in diagnostic errors and misguided treatment. Restoration of optical images is a challenging computer vision task because the sources of image degradation are multi-factorial, stochastic, and tissue-dependent, preventing a straightforward method to obtain paired low-quality/high-quality data. Here, we present Restorative Step-Calibrated Diffusion (RSCD), an unpaired image restoration method that views the image restoration problem as completing the finishing steps of a diffusion-based image generation task. RSCD uses a step calibrator model to dynamically determine the severity of image degradation and the number of steps required to complete the reverse diffusion process for image restoration. RSCD outperforms other widely used unpaired image restoration methods on both image quality and perceptual evaluation metrics for restoring optical images. Medical imaging experts consistently prefer images restored using RSCD in blinded comparison experiments and report minimal to no hallucinations. Finally, we show that RSCD improves performance on downstream clinical imaging tasks, including automated brain tumor diagnosis and deep tissue imaging.
Our code is available at https://github.com/MLNeurosurg/restorative_step-calibrated_diffusion.

Submitted 16 May, 2024; v1 submitted 20 March, 2024; originally announced March 2024.

arXiv:2403.13294 [pdf, other]

Map-Aware Human Pose Prediction for Robot Follow-Ahead

Authors: Qingyuan Jiang, Burak Susam, Jun-Jee Chao, Volkan Isler

Abstract: In the robot follow-ahead task, a mobile robot is tasked to maintain its relative position in front of a moving human actor while keeping the actor in sight. To accomplish this task, it is important that the robot understand the full 3D pose of the human (since the head orientation can be different than the torso) and predict future human poses so as to plan accordingly. This prediction task is especially tricky in a complex environment with junctions and multiple corridors. In this work, we address the problem of forecasting the full 3D trajectory of a human in such environments. Our main insight is to show that one can first predict the 2D trajectory and then estimate the full 3D trajectory by conditioning the estimator on the predicted 2D trajectory. With this approach, we achieve results comparable or better than the state-of-the-art methods three times faster. As part of our contribution, we present a new dataset where, in contrast to existing datasets, the human motion is in a much larger area than a single room. We also present a complete robot system that integrates our human pose forecasting network on the mobile robot to enable real-time robot follow-ahead and present results from real-world experiments in multiple buildings on campus. Our project page, including supplementary material and videos, can be found at: https://qingyuan-jiang.github.io/iros2024_poseForecasting/

Submitted 20 March, 2024; originally announced March 2024.

arXiv:2403.11979 [pdf]

Enlightening the blind spot of the Michaelis-Menten rate law: The role of relaxation dynamics in molecular complex formation

Authors: Junghun Chae, Roktaek Lim, Thomas L. P. Martin, Cheol-Min Ghim, Pan-Jun Kim

Abstract: The century-long Michaelis-Menten rate law and its modifications in the modeling of biochemical rate processes stand on the assumption that the concentration of the complex of interacting molecules, at each moment, rapidly approaches an equilibrium (quasi-steady state) compared to the pace of molecular concentration changes. Yet, in the case of actively time-varying molecular concentrations with transient or oscillatory dynamics, the deviation of the complex profile from the quasi-steady state becomes relevant. A recent theoretical approach, known as the effective time-delay scheme (ETS), suggests that the delay by the relaxation time of molecular complex formation contributes to the substantial breakdown of the quasi-steady state assumption. Here, we systematically expand this ETS and inquire into the comprehensive roles of relaxation dynamics in complex formation. Through the modeling of rhythmic protein-protein and protein-DNA interactions and the mammalian circadian clock, our analysis reveals the effect of the relaxation dynamics beyond the time delay, which extends to the dampening of changes in the complex concentration with a reduction in the oscillation amplitude against the quasi-steady state. Interestingly, the combined effect of the time delay and amplitude reduction shapes both qualitative and quantitative oscillatory patterns such as the emergence and variability of the mammalian circadian rhythms.
These findings highlight the drawback of the routine assumption of quasi-steady states and enhance the mechanistic understanding of rich time-varying biomolecular activities.

Submitted 7 June, 2024; v1 submitted 18 March, 2024; originally announced March 2024.
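
The qualitative point of the abstract above, that a complex concentration driven by an oscillating input lags its quasi-steady-state value and oscillates with a smaller amplitude, can be seen in a toy Python sketch. This is not the paper's ETS or any of its models. It assumes a hypothetical single binding reaction with rate constants `k_on` and `k_off`, a fixed total enzyme `e_total`, and a prescribed sinusoidal free-substrate level, integrated with a plain forward-Euler step.

```python
import math


def simulate_complex(k_on=1.0, k_off=1.0, e_total=1.0,
                     period=2.0, n_periods=10, dt=1e-3):
    """Compare a relaxing complex C(t) with its quasi-steady-state value.

    Toy model (assumption): dC/dt = k_on * (e_total - C) * S(t) - k_off * C,
    with S(t) = 1 + 0.5 * sin(2*pi*t/period) prescribed externally.
    Returns (simulated, quasi_steady) samples over the final period only.
    """
    def substrate(t):
        return 1.0 + 0.5 * math.sin(2.0 * math.pi * t / period)

    def c_qss(t):
        # Michaelis-Menten-type quasi-steady state with K = k_off / k_on.
        s = substrate(t)
        return e_total * s / (k_off / k_on + s)

    steps_per_period = int(round(period / dt))
    c = c_qss(0.0)  # start on the quasi-steady-state curve
    sim, qss = [], []
    for step in range(n_periods * steps_per_period):
        t = step * dt
        if step >= (n_periods - 1) * steps_per_period:
            sim.append(c)
            qss.append(c_qss(t))
        # Forward-Euler update of the binding kinetics.
        c += dt * (k_on * (e_total - c) * substrate(t) - k_off * c)
    return sim, qss


sim, qss = simulate_complex()
amp_sim = max(sim) - min(sim)
amp_qss = max(qss) - min(qss)
lag_steps = sim.index(max(sim)) - qss.index(max(qss))
print(f"amplitude: simulated {amp_sim:.3f} vs quasi-steady {amp_qss:.3f}; peak lag {lag_steps} steps")
```

With these illustrative parameters the relaxation rate is comparable to the driving frequency, so both the delay and the amplitude reduction are visible. Making `k_on` and `k_off` much larger brings the simulated curve back toward the quasi-steady-state one.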

arXiv:2403.09490 [pdf, other]

Hyper-CL: Conditioning Sentence Representations with Hypernetworks

Authors: Young Hyun Yoo, Jii Cha, Changhyeon Kim, Taeuk Kim

Abstract: While the introduction of contrastive learning frameworks in sentence representation learning has significantly contributed to advancements in the field, it still remains unclear whether state-of-the-art sentence embeddings can capture the fine-grained semantics of sentences, particularly when conditioned on specific perspectives. In this paper, we introduce Hyper-CL, an efficient methodology that integrates hypernetworks with contrastive learning to compute conditioned sentence representations. In our proposed approach, the hypernetwork is responsible for transforming pre-computed condition embeddings into corresponding projection layers. This enables the same sentence embeddings to be projected differently according to various conditions. Evaluation on two representative conditioning benchmarks, namely conditional semantic text similarity and knowledge graph completion, demonstrates that Hyper-CL is effective in flexibly conditioning sentence representations, showcasing its computational efficiency at the same time. We also provide a comprehensive analysis of the inner workings of our approach, leading to a better interpretation of its mechanisms.

Submitted 6 June, 2024; v1 submitted 14 March, 2024; originally announced March 2024.

Comments: ACL 2024
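
The mechanism described in the abstract above (a hypernetwork turns a condition embedding into a projection layer, which is then applied to a fixed sentence embedding) can be sketched in dependency-free Python. This is only an illustration of the general hypernetwork pattern, not the Hyper-CL architecture. The single linear hypernetwork layer, the dimensions, the random initialization, and the names `make_hypernetwork` and `project` are all hypothetical.

```python
import random
from typing import Callable, List

Vector = List[float]
Matrix = List[List[float]]


def make_hypernetwork(cond_dim: int, sent_dim: int, proj_dim: int,
                      seed: int = 0) -> Callable[[Vector], Matrix]:
    """Build a toy hypernetwork: condition embedding -> projection matrix.

    Assumption: one linear layer without bias that outputs the flattened
    (proj_dim x sent_dim) projection weights.
    """
    rng = random.Random(seed)
    weights = [[rng.gauss(0.0, 0.1) for _ in range(cond_dim)]
               for _ in range(proj_dim * sent_dim)]

    def hypernet(condition: Vector) -> Matrix:
        flat = [sum(w * c for w, c in zip(row, condition)) for row in weights]
        # Reshape the flat output into proj_dim rows of length sent_dim.
        return [flat[r * sent_dim:(r + 1) * sent_dim] for r in range(proj_dim)]

    return hypernet


def project(matrix: Matrix, sentence: Vector) -> Vector:
    """Apply the generated projection layer to a sentence embedding."""
    return [sum(m * s for m, s in zip(row, sentence)) for row in matrix]


hypernet = make_hypernetwork(cond_dim=2, sent_dim=3, proj_dim=2)
sentence = [0.5, -1.0, 2.0]                    # one pre-computed sentence embedding
view_a = project(hypernet([1.0, 0.0]), sentence)  # projected under condition A
view_b = project(hypernet([0.0, 1.0]), sentence)  # projected under condition B
print(view_a, view_b)
```

The sentence embedding is computed once and reused, and only the small generated projection changes per condition. That reuse is the kind of saving the abstract refers to as computational efficiency.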

arXiv:2403.05346 [pdf, other]

VLM-PL: Advanced Pseudo Labeling Approach for Class Incremental Object Detection via Vision-Language Model

Authors: Junsu Kim, Yunhoe Ku, Jihyeon Kim, Junuk Cha, Seungryul Baek

Abstract: In the field of Class Incremental Object Detection (CIOD), creating models that can continuously learn like humans is a major challenge. Pseudo-labeling methods, although initially powerful, struggle with multi-scenario incremental learning due to their tendency to forget past knowledge. To overcome this, we introduce a new approach called Vision-Language Model assisted Pseudo-Labeling (VLM-PL). This technique uses Vision-Language Model (VLM) to verify the correctness of pseudo ground-truths (GTs) without requiring additional model training. VLM-PL starts by deriving pseudo GTs from a pre-trained detector. Then, we generate custom queries for each pseudo GT using carefully designed prompt templates that combine image and text features. This allows the VLM to classify the correctness through its responses. Furthermore, VLM-PL integrates refined pseudo and real GTs from upcoming training, effectively combining new and old knowledge. Extensive experiments conducted on the Pascal VOC and MS COCO datasets not only highlight VLM-PL's exceptional performance in multi-scenario but also illuminate its effectiveness in dual-scenario by achieving state-of-the-art results in both.

Submitted 8 May, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

Comments: Accepted to CVPRW2024 (CLvision). The camera-ready version of the manuscript

arXiv:2403.04261 [pdf]

Advancing Biomedical Text Mining with Community Challenges

Authors: Hui Zong, Rongrong Wu, Jiaxue Cha, Erman Wu, Jiakun Li, Liang Tao, Zuofeng Li, Buzhou Tang, Bairong Shen

Abstract: The field of biomedical research has witnessed a significant increase in the accumulation of vast amounts of textual data from various sources such as scientific literatures, electronic health records, clinical trial reports, and social media. However, manually processing and analyzing these extensive and complex resources is time-consuming and inefficient. To address this challenge, biomedical text mining, also known as biomedical natural language processing, has garnered great attention. Community challenge evaluation competitions have played an important role in promoting technology innovation and interdisciplinary collaboration in biomedical text mining research. These challenges provide platforms for researchers to develop state-of-the-art solutions for data mining and information processing in biomedical research. In this article, we review the recent advances in community challenges specific to Chinese biomedical text mining. Firstly, we collect the information of these evaluation tasks, such as data sources and task types. Secondly, we conduct systematic summary and comparative analysis, including named entity recognition, entity normalization, attribute extraction, relation extraction, event extraction, text classification, text similarity, knowledge graph construction, question answering, text generation, and large language model evaluation. Then, we summarize the potential clinical applications of these community challenge tasks from translational informatics perspective.
Finally, we discuss the contributions and limitations of these community challenges, while highlighting future directions in the era of large language models.

Submitted 7 March, 2024; originally announced March 2024.

arXiv:2402.15079 [pdf, other]

Optimal mesh generation for a non-iterative grid-converged solution of flow through a blade passage using deep reinforcement learning

Authors: Innyoung Kim, Jonghyun Chae, Donghyun You

Abstract: An automatic mesh generation method for optimal computational fluid dynamics (CFD) analysis of a blade passage is developed using deep reinforcement learning (DRL). Unlike conventional automation techniques, which require repetitive tuning of meshing parameters for each new geometry and flow condition, the method developed herein trains a mesh generator to determine optimal parameters across varying configurations in a non-iterative manner. Initially, parameters controlling mesh shape are optimized to maximize geometric mesh quality, as measured by the ratio of determinants of Jacobian matrices and skewness. Subsequently, resolution-controlling parameters are optimized by incorporating CFD results. Multi-agent reinforcement learning is employed, enabling 256 agents to construct meshes and perform CFD analyses across randomly assigned flow configurations in parallel, aiming for maximum simulation accuracy and computational efficiency within a multi-objective optimization framework. After training, the mesh generator is capable of producing meshes that yield converged solutions at desired computational costs for new configurations in a single simulation, thereby eliminating the need for iterative CFD procedures for grid convergence. The robustness and effectiveness of the method are investigated across various blade passage configurations, accommodating a range of blade geometries, including high-pressure and low-pressure turbine blades, axial compressor blades, and impulse rotor blades. Furthermore, the method is capable of identifying the optimal mesh resolution for diverse flow conditions, including complex phenomena like boundary layers, shock waves, and flow separation. The optimality is confirmed by comparing the accuracy and the efficiency achieved in a single attempt with those from the conventional iterative optimization method.

Submitted 22 February, 2024; originally announced February 2024.

Comments: 65 pages, 24 figures, and 1 table

arXiv:2402.14395 [pdf, other]

Semantic Image Synthesis with Unconditional Generator

Authors: Jungwoo Chae, Hyunin Cho, Sooyeon Go, Kyungmook Choi, Youngjung Uh

Abstract: Semantic image synthesis (SIS) aims to generate realistic images that match given semantic masks. Although recent methods achieve high-quality results and precise spatial control, they require a massive semantic segmentation dataset for training the models. Instead, we propose to employ a pre-trained unconditional generator and rearrange its feature maps according to proxy masks. The proxy masks are prepared from the feature maps of random samples in the generator by simple clustering. The feature rearranger learns to rearrange original feature maps to match the shape of the proxy masks that are either from the original sample itself or from random samples. Then we introduce a semantic mapper that produces the proxy masks from various input conditions including semantic masks. Our method is versatile across various applications such as free-form spatial editing of real images, sketch-to-photo, and even scribble-to-photo. Experiments validate advantages of our method on a range of datasets: human faces, animal faces, and buildings.

Submitted 22 February, 2024; originally announced February 2024.

Comments: NeurIPS 2023, Project Page: https://hhyunn2.github.io/SIS_UncondG/

arXiv:2402.11221 [pdf, other]

MOB-Net: Limb-modularized Uncertainty Torque Learning of Humanoids for Sensorless External Torque Estimation

Authors: Daegyu Lim, Myeong-Ju Kim, Junhyeok Cha, Jaeheung Park

Abstract: A momentum observer (MOB) can estimate external joint torque without requiring additional sensors, such as force/torque or joint torque sensors. However, the estimation performance of MOB deteriorates due to model uncertainty, which encompasses modeling errors and joint friction. Moreover, the estimation error is significant when MOB is applied to high-dimensional floating-base humanoids, which prevents the estimated external joint torque from being used for force control or collision detection in a real humanoid robot. In this paper, a pure external joint torque estimation method named MOB-Net is proposed for humanoids. MOB-Net learns the model uncertainty torque and calibrates the estimated signal of MOB. The external joint torque can be estimated in the generalized coordinate including whole-body and virtual joints of the floating-base robot with only internal sensors (an IMU on the pelvis and encoders in the joints). Our method substantially reduces the estimation errors of MOB, and the robust performance of MOB-Net for unseen data is validated through extensive simulations, real robot experiments, and ablation studies. Finally, various collision handling scenarios are presented using the estimated external joint torque from MOB-Net: contact wrench feedback control for locomotion, collision detection, and collision reaction for safety.

Submitted 17 February, 2024; originally announced February 2024.

Comments: submitted to IJRR

arXiv:2402.08483 [pdf, other]

High Resolution Imaging Spectroscopy of a Tiny Sigmoidal Mini-filament Eruption

Authors: Jiasheng Wang, Jeongwoo Lee, Jongchul Chae, Yan Xu, Wenda Cao, Haimin Wang

Abstract: Minifilament (MF) eruption producing small jets and micro-flares is regarded as an important source for coronal heating and the solar wind transients through studies mostly based on coronal observations in the extreme ultraviolet (EUV) and X-ray wavelengths. In this study, we focus on the chromospheric plasma diagnostics of a tiny minifilament in the quiet Sun located at [71'', 450''] on 2021-08-07 at 19:11 UT, observed as part of the ninth encounter of the PSP campaign. The main data are the high-cadence, high-resolution spectroscopy from the Fast Imaging Solar Spectrograph (FISS) and high-resolution magnetograms from the Near InfraRed Imaging Spectropolarimeter (NIRIS) on the 1.6 m Goode Solar Telescope (GST) at Big Bear Solar Observatory (BBSO). The minifilament, with size $\sim 1'' \times 5''$, and a micro-flare are detected in both the H$α$ line center and SDO/AIA 193 and 304 Å images. On the NIRIS magnetogram, we found that the cancellation of a magnetic bipole in the footpoints of the minifilament triggered its eruption in a sigmoidal shape. By inversion of the H$α$ and Ca II spectra under the embedded cloud model, we found a temperature increase of 3,800 K in the brightening region, accompanied by an 18 km s$^{-1}$ increase in the average rising speed of the MF. This cool plasma is also found in the EUV images. We estimate the kinetic energy change of the rising filament as $1.5 \times 10^{25}$ ergs, and the thermal energy accumulation in the MF as $1.4 \times 10^{25}$ ergs. From the photospheric magnetograms, we find the magnetic energy change is $1.6 \times 10^{26}$ ergs across the PIL of converging opposite magnetic elements, which amounts to the energy release in the chromosphere in this smallest two-ribbon flare ever observed.

Submitted 13 February, 2024; originally announced February 2024.

arXiv:2402.07743 [pdf, other]

Beyond Sparsity: Local Projections Inference with High-Dimensional Covariates

Authors: Jooyoung Cha

Abstract: Impulse response analysis studies how the economy responds to shocks, such as changes in interest rates, and helps policymakers manage these effects. While Vector Autoregression Models (VARs) with structural assumptions have traditionally dominated the estimation of impulse responses, local projections, the projection of future responses on the current shock, have recently gained attention for their robustness and interpretability. Including many lags as controls is proposed as a means of robustness, and including a richer set of controls helps in its interpretation as a causal parameter. In both cases, an extensive number of controls leads to the consideration of high-dimensional techniques. While methods like LASSO exist, they mostly rely on sparsity assumptions (most of the parameters are exactly zero), which has limitations in dense data generation processes. This paper proposes a novel approach that incorporates high-dimensional covariates in local projections without relying on sparsity constraints. Adopting the Orthogonal Greedy Algorithm with a high-dimensional AIC (OGA+HDAIC) model selection method, this approach offers advantages including robustness in both sparse and dense scenarios, improved interpretability by prioritizing cross-sectional explanatory power, and more reliable causal inference in local projections.

Submitted 12 February, 2024; originally announced February 2024.

arXiv:2402.06239 [pdf, other]

Neutral pion masses within a hot and magnetized medium in a lattice-improved soft-wall AdS/QCD model

Authors: Nanxiang Wen, Xuanmin Cao, Jingyi Chao, Hui Liu

Abstract: We investigate chiral phase transitions and the screening masses, pole masses, and thermal widths of the neutral pion with finite temperature $T$ and magnetic field $B$ in a lattice-improved AdS/QCD model, which is constructed by fitting the lattice results of the pseudo-critical temperatures $T_{\text{pc}}(B)$. Specifically, we find that the chiral condensate $σ$ undergoes a crossover phase transition demonstrating distinct magnetic catalysis and inverse magnetic catalysis effects in very low and high-temperature regions with fixed finite $B$, respectively. For the screening masses, we find that the longitudinal component decreases with $B$ at very low and high temperatures and increases with $B$ near $T_{\text{pc}}$. The transverse component always increases with $B$ at fixed $T$. However, both the longitudinal and transverse screening masses increase with $T$ at fixed $B$. Furthermore, we find that the pole mass decreases with increasing $B$ or $T$. Besides, it is interesting to note that the thermal width shows similar behavior to the longitudinal screening masses in the very high temperature region.

Submitted 9 February, 2024; originally announced February 2024.

Comments: 17 pages, 9 figures

arXiv:2402.05350 [pdf, other]

Descanning: From Scanned to the Original Images with a Color Correction Diffusion Model

Authors: Junghun Cha, Ali Haider, Seoyun Yang, Hoeyeong Jin, Subin Yang, A. F. M. Shahab Uddin, Jaehyoung Kim, Soo Ye Kim, Sung-Ho Bae

Abstract: A significant volume of analog information, i.e., documents and images, has been digitized in the form of scanned copies for storing, sharing, and/or analyzing in the digital world. However, the quality of such content is severely degraded by various distortions caused by printing, storing, and scanning processes in the physical world. Although restoring high-quality content from scanned copies has become an indispensable task for many products, it has not been systematically explored, and to the best of our knowledge, no public datasets are available. In this paper, we define this problem as Descanning and introduce a new high-quality and large-scale dataset named DESCAN-18K. It contains 18K pairs of original and scanned images collected in the wild containing multiple complex degradations. In order to eliminate such complex degradations, we propose a new image restoration model called DescanDiffusion consisting of a color encoder that corrects the global color degradation and a conditional denoising diffusion probabilistic model (DDPM) that removes local degradations. To further improve the generalization ability of DescanDiffusion, we also design a synthetic data generation scheme by reproducing prominent degradations in scanned images. We demonstrate that our DescanDiffusion outperforms other baselines including commercial restoration products, objectively and subjectively, via comprehensive experiments and analyses.

Submitted 7 February, 2024; originally announced February 2024.

Comments: Accepted to AAAI 2024

arXiv:2402.04629 [pdf, other]

Iterated satellite operators on the knot concordance group

Authors: Jae Choon Cha, Taehee Kim

Abstract: We show that for a winding number zero satellite operator $P$ on the knot concordance group, if the axis of $P$ has nontrivial self-pairing under the Blanchfield form of the pattern, then the image of the iteration $P^n$ generates an infinite rank subgroup for each $n$. Furthermore, the graded quotients of the filtration of the knot concordance group associated with $P$ have infinite rank at all levels. This gives an affirmative answer to a question of Hedden and Pinzón-Caicedo in many cases. We also show that under the same hypotheses, $P^n$ is not a homomorphism on the knot concordance group for each $n$. We use amenable $L^2$-signatures to prove these results.

Submitted 7 February, 2024; originally announced February 2024.

Comments: 32 pages, 3 figures

MSC Class: 57K10; 57N70

arXiv:2402.04287 [pdf]

Association between Prefrontal fNIRS signals during Cognitive tasks and College scholastic ability test (CSAT) scores: Analysis using a quantum annealing approach

Authors: Yeaju Kim, Junggu Choi, Bora Kim, Yongwan Park, Jihyun Cha, Jongkwan Choi, Sanghoon Han

Abstract: Academic achievement is a critical measure of intellectual ability, prompting extensive research into cognitive tasks as potential predictors. Neuroimaging technologies, such as functional near-infrared spectroscopy (fNIRS), offer insights into brain hemodynamics, allowing understanding of the link between cognitive performance and academic achievement. Herein, we explored the association between cognitive tasks and academic achievement by analyzing prefrontal fNIRS signals. A novel quantum annealer (QA) feature selection algorithm was applied to fNIRS data to identify cognitive tasks correlated with CSAT scores. Twelve features (signal mean, median, variance, peak, number of peaks, sum of peaks, slope, minimum, kurtosis, skewness, standard deviation, and root mean square) were extracted from fNIRS signals at two time windows (10- and 60-second) to compare results from various feature variable conditions. The feature selection results from the QA-based and XGBoost regressor algorithms were compared to validate the former's performance. In a three-step validation process using multiple linear regression models, correlation coefficients between the feature variables and the CSAT scores, model fitness (adjusted $R^2$), and model prediction error (RMSE) values were calculated. The quantum annealer demonstrated comparable performance to classical machine learning models, and specific cognitive tasks, including verbal fluency, recognition, and the Corsi block tapping task, were correlated with academic achievement. Group analyses revealed stronger associations between Tower of London and N-back tasks with higher CSAT scores. Quantum annealing algorithms have significant potential in feature selection using fNIRS data, and represent a novel research approach. Future studies should explore predictors of academic achievement and cognitive ability.

Submitted 5 February, 2024; originally announced February 2024.

Comments: 42 pages, 11 tables

arXiv:2402.03841 [pdf, other]

Momentum-space Langevin dynamics of holographic Wilsonian RG flow: self-interacting massive scalar field theory

Authors: Ji-seong Chae, Jae-Hyuk Oh

Abstract: We explore the mathematical relationship between the holographic Wilsonian renormalization group (HWRG) and stochastic quantization (SQ), motivated by the similarity of the monotonicity in RG flow with Langevin dynamics of non-equilibrium thermodynamics. We look at scalar field theory in AdS space with its generic mass, self-interaction, and boundary deformation in momentum space. Identifying the stochastic time $t$ with the radial coordinate $r$ in AdS, we establish maps between the fictitious time evolution of the stochastic multi-point correlation function and the radial evolution of the multi-trace deformation, which, respectively, express the relaxation process of Langevin dynamics and holographic RG flow. We especially consider marginal multi-trace deformation on the AdS boundary, which is successfully captured by a Langevin dynamics of SQ.

Submitted 27 March, 2024; v1 submitted 6 February, 2024; originally announced February 2024.

Comments: 43+1 pages, 3 figures, typos are corrected

arXiv:2402.03694 [pdf, other]

ServeFlow: A Fast-Slow Model Architecture for Network Traffic Analysis

Authors: Shinan Liu, Ted Shaowang, Gerry Wan, Jeewon Chae, Jonatas Marques, Sanjay Krishnan, Nick Feamster

Abstract: Network traffic analysis increasingly uses complex machine learning models as the internet consolidates and traffic gets more encrypted. However, over high-bandwidth networks, flows can easily arrive faster than model inference rates. The temporal nature of network flows limits simple scale-out approaches leveraged in other high-traffic machine learning applications. Accordingly, this paper presents ServeFlow, a solution for machine-learning model serving aimed at network traffic analysis tasks, which carefully selects the number of packets to collect and the models to apply for individual flows to achieve a balance between minimal latency, high service rate, and high accuracy. We identify that on the same task, inference time across models can differ by 2.7x-136.3x, while the median inter-packet waiting time is often 6-8 orders of magnitude higher than the inference time. ServeFlow is able to make inferences on 76.3% of flows in under 16 ms, which is a speed-up of 40.5x on the median end-to-end serving latency while increasing the service rate and maintaining similar accuracy. Even with thousands of features per flow, it achieves a service rate of over 48.5k new flows per second on a 16-core CPU commodity server, which matches the order of magnitude of flow rates observed on city-level network backbones.

Submitted 5 February, 2024; originally announced February 2024.

arXiv:2402.03116 [pdf, other]

Feature-Action Design Patterns for Storytelling Visualizations with Time Series Data

Authors: Saiful Khan, Scott Jones, Benjamin Bach, Jaehoon Cha, Min Chen, Julie Meikle, Jonathan C Roberts, Jeyan Thiyagalingam, Jo Wood, Panagiotis D. Ritsos

Abstract: We present a method to create storytelling visualization with time series data. Many personal decisions nowadays rely on regular access to dynamic data, as we have seen during the COVID-19 pandemic. It is thus desirable to construct storytelling visualization for dynamic data that is selected by an individual for a specific context. Because of the need to tell data-dependent stories, predefined storyboards based on known data cannot accommodate dynamic data easily nor scale up to many different individuals and contexts. Motivated initially by the need to communicate time series data during the COVID-19 pandemic, we developed a novel computer-assisted method for meta-authoring of stories, which enables the design of storyboards that include feature-action patterns in anticipation of potential features that may appear in dynamically arriving or selected data. In addition to meta-storyboards involving COVID-19 data, we also present storyboards for telling stories about progress in a machine learning workflow. Our approach is complementary to traditional methods for authoring storytelling visualization, and provides an efficient means to construct data-dependent storyboards for different data-streams of similar contexts.

Submitted 5 February, 2024; originally announced February 2024.

arXiv:2401.07464 [pdf, other]

Quantum Privacy Aggregation of Teacher Ensembles (QPATE) for Privacy-preserving Quantum Machine Learning

Authors: William Watkins, Heehwan Wang, Sangyoon Bae, Huan-Hsin Tseng, Jiook Cha, Samuel Yen-Chi Chen, Shinjae Yoo

Abstract: The utility of machine learning has rapidly expanded in the last two decades and presents an ethical challenge. Papernot et al. developed a technique, known as Private Aggregation of Teacher Ensembles (PATE), to enable federated learning in which multiple teacher models are trained on disjoint datasets. This study is the first to apply PATE to an ensemble of quantum neural networks (QNN) to pave a new way of ensuring privacy in quantum machine learning (QML) models.

Submitted 14 January, 2024; originally announced January 2024.

arXiv:2401.06415 [pdf, other]

3D Reconstruction of Interacting Multi-Person in Clothing from a Single Image

Authors: Junuk Cha, Hansol Lee, Jaewon Kim, Nhat Nguyen Bao Truong, Jae Shin Yoon, Seungryul Baek

Abstract: This paper introduces a novel pipeline to reconstruct the geometry of interacting multi-person in clothing on a globally coherent scene space from a single image. The main challenge arises from occlusion: a part of a human body is not visible from a single view due to occlusion by others or the self, which introduces missing geometry and physical implausibility (e.g., penetration). We overcome this challenge by utilizing two human priors for complete 3D geometry and surface contacts. For the geometry prior, an encoder learns to regress the image of a person with missing body parts to the latent vectors; a decoder decodes these vectors to produce 3D features of the associated geometry; and an implicit network combines these features with a surface normal map to reconstruct complete and detailed 3D humans. For the contact prior, we develop an image-space contact detector that outputs a probability distribution of surface contacts between people in 3D. We use these priors to globally refine the body poses, enabling the penetration-free and accurate reconstruction of interacting multi-person in clothing on the scene space. The results demonstrate that our method is complete, globally coherent, and physically plausible compared to existing methods.

Submitted 2 April, 2024; v1 submitted 12 January, 2024; originally announced January 2024.

Comments: Accepted to WACV 2024

arXiv:2401.02656 [pdf, other]

GTA: Guided Transfer of Spatial Attention from Object-Centric Representations

Authors: SeokHyun Seo, Jinwoo Hong, JungWoo Chae, Kyungyul Kim, Sangheum Hwang

Abstract: Utilizing well-trained representations in transfer learning often results in superior performance and faster convergence compared to training from scratch. However, even if such good representations are transferred, a model can easily overfit the limited training dataset and lose the valuable properties of the transferred representations. This phenomenon is more severe in ViT due to its low inductive bias. Through experimental analysis using attention maps in ViT, we observe that the rich representations deteriorate when trained on a small dataset. Motivated by this finding, we propose a novel and simple regularization method for ViT called Guided Transfer of spatial Attention (GTA). Our proposed method regularizes the self-attention maps between the source and target models. A target model can fully exploit the knowledge related to object localization properties through this explicit regularization. Our experimental results show that the proposed GTA consistently improves the accuracy across five benchmark datasets, especially when the amount of training data is small.

Submitted 5 January, 2024; originally announced January 2024.

arXiv:2401.00971 [pdf, other]

Efficient Multi-domain Text Recognition Deep Neural Network Parameterization with Residual Adapters

Authors: Jiayou Chao, Wei Zhu

Abstract: Recent advancements in deep neural networks have markedly enhanced the performance of computer vision tasks, yet the specialized nature of these networks often necessitates extensive data and high computational power. Addressing these requirements, this study presents a novel neural network model adept at optical character recognition (OCR) across diverse domains, leveraging the strengths of multi-task learning to improve efficiency and generalization. The model is designed to achieve rapid adaptation to new domains, maintain a compact size conducive to reduced computational resource demand, ensure high accuracy, retain knowledge from previous learning experiences, and allow for domain-specific performance improvements without the need to retrain entirely. Rigorous evaluation on open datasets has validated the model's ability to significantly lower the number of trainable parameters without sacrificing performance, indicating its potential as a scalable and adaptable solution in the field of computer vision, particularly for applications in optical text recognition.

Submitted 1 January, 2024; originally announced January 2024.

arXiv:2312.16842 [pdf, other]

Dynamic Appearance Modeling of Clothed 3D Human Avatars using a Single Camera

Authors: Hansol Lee, Junuk Cha, Yunhoe Ku, Jae Shin Yoon, Seungryul Baek

Abstract: The appearance of a human in clothing is driven not only by the pose but also by its temporal context, i.e., motion. However, such context has been largely neglected by existing monocular human modeling methods whose neural networks often struggle to learn a video of a person with large dynamics due to the motion ambiguity, i.e., there exist numerous geometric configurations of clothes that are dependent on the context of motion even for the same pose. In this paper, we introduce a method for high-quality modeling of clothed 3D human avatars using a video of a person with dynamic movements. The main challenge comes from the lack of 3D ground truth data of geometry and its temporal correspondences. We address this challenge by introducing a novel compositional human modeling framework that takes advantage of both explicit and implicit human modeling. For explicit modeling, a neural network learns to generate point-wise shape residuals and appearance features of a 3D body model by comparing its 2D rendering results and the original images. This explicit model allows for the reconstruction of discriminative 3D motion features from UV space by encoding their temporal correspondences. For implicit modeling, an implicit network combines the appearance and 3D motion features to decode high-fidelity clothed 3D human avatars with motion-dependent geometry and texture. The experiments show that our method can generate a large variation of secondary motion in a physically plausible way.

Submitted 28 December, 2023; originally announced December 2023.

arXiv:2312.12889 [pdf]

Singular Hall response from a correlated ferromagnetic flat nodal-line semimetal

Authors: Woohyun Cho, Yoon-Gu Kang, Jaehun Cha, Dong Hyun David Lee, Do Hoon Kiem, Jaewhan Oh, Jongho Park, Changyoung Kim, Yongsoo Yang, Yeong Kwan Kim, Myung Joon Han, Heejun Yang

Abstract: Topological quantum phases have been largely understood in weakly correlated systems, which have identified various quantum phenomena such as spin Hall effect, protected transport of helical fermions, and topological superconductivity. Robust ferromagnetic order in correlated topological materials particularly attracts attention, as it can provide a versatile platform for novel quantum devices. Here, we report singular Hall response arising from a unique band structure of flat topological nodal lines in combination with electron correlation in an itinerant, van der Waals ferromagnetic semimetal, Fe3GaTe2, with a high Curie temperature of Tc=360 K. High anomalous Hall conductivity violating the conventional scaling, resistivity upturn at low temperature, and a large Sommerfeld coefficient are observed in Fe3GaTe2, which implies heavy fermion features in this ferromagnetic topological material. Our circular dichroism in angle-resolved photoemission spectroscopy and theoretical calculations support the original electronic features in the material. Thus, low-dimensional Fe3GaTe2 with electronic correlation, topology, and room-temperature ferromagnetic order appears to be a promising candidate for robust quantum devices.

Submitted 20 December, 2023; originally announced December 2023.

arXiv:2312.06742 [pdf, other]

Honeybee: Locality-enhanced Projector for Multimodal LLM

Authors: Junbum Cha, Wooyoung Kang, Jonghwan Mun, Byungseok Roh

Abstract: In Multimodal Large Language Models (MLLMs), a visual projector plays a crucial role in bridging pre-trained vision encoders with LLMs, enabling profound visual understanding while harnessing the LLMs' robust capabilities. Despite the importance of the visual projector, it has been relatively less explored. In this study, we first identify two essential projector properties: (i) flexibility in managing the number of visual tokens, crucial for MLLMs' overall efficiency, and (ii) preservation of local context from visual features, vital for spatial understanding. Based on these findings, we propose a novel projector design that is both flexible and locality-enhanced, effectively satisfying the two desirable properties. Additionally, we present comprehensive strategies to effectively utilize multiple and multifaceted instruction datasets. Through extensive experiments, we examine the impact of individual design choices. Finally, our proposed MLLM, Honeybee, remarkably outperforms previous state-of-the-art methods across various benchmarks, including MME, MMBench, SEED-Bench, and LLaVA-Bench, achieving significantly higher efficiency. Code and models are available at https://github.com/kakaobrain/honeybee.

Submitted 31 March, 2024; v1 submitted 11 December, 2023; originally announced December 2023.

Comments: CVPR 2024 camera-ready

arXiv:2312.05928 [pdf, other]

AesFA: An Aesthetic Feature-Aware Arbitrary Neural Style Transfer

Authors: Joonwoo Kwon, Sooyoung Kim, Yuewei Lin, Shinjae Yoo, Jiook Cha

Abstract: Neural style transfer (NST) has evolved significantly in recent years. Yet, despite its rapid progress and advancement, existing NST methods either struggle to transfer aesthetic information from a style effectively or suffer from high computational costs and inefficiencies in feature disentanglement due to using pre-trained models. This work proposes a lightweight but effective model, AesFA -- Aesthetic Feature-Aware NST. The primary idea is to decompose the image via its frequencies to better disentangle aesthetic styles from the reference image while training the entire model in an end-to-end manner to exclude pre-trained models at inference completely. To improve the network's ability to extract more distinct representations and further enhance the stylization quality, this work introduces a new aesthetic feature: contrastive loss. Extensive experiments and ablations show the approach not only outperforms recent NST methods in terms of stylization quality, but it also achieves faster inference. Codes are available at https://github.com/Sooyyoungg/AesFA.

Submitted 22 February, 2024; v1 submitted 10 December, 2023; originally announced December 2023.

Comments: Accepted by AAAI 2024

arXiv:2312.02103 [pdf, other]

Learning Pseudo-Labeler beyond Noun Concepts for Open-Vocabulary Object Detection

Authors: Sunghun Kang, Junbum Cha, Jonghwan Mun, Byungseok Roh, Chang D. Yoo

Abstract: Open-vocabulary object detection (OVOD) has recently gained significant attention as a crucial step toward achieving human-like visual intelligence. Existing OVOD methods extend target vocabulary from pre-defined categories to open-world by transferring knowledge of arbitrary concepts from vision-language pre-training models to the detectors. While previous methods have shown remarkable successes, they suffer from indirect supervision or limited transferable concepts. In this paper, we propose a simple yet effective method to directly learn region-text alignment for arbitrary concepts. Specifically, the proposed method aims to learn arbitrary image-to-text mapping for pseudo-labeling of arbitrary concepts, named Pseudo-Labeling for Arbitrary Concepts (PLAC). The proposed method shows competitive performance on the standard OVOD benchmark for noun concepts and a large improvement on referring expression comprehension benchmark for arbitrary concepts.

Submitted 4 December, 2023; originally announced December 2023.

arXiv:2312.00548 [pdf, other]

Domain Adaptive Imitation Learning with Visual Observation

Authors: Sungho Choi, Seungyul Han, Woojun Kim, Jongseong Chae, Whiyoung Jung, Youngchul Sung

Abstract: In this paper, we consider domain-adaptive imitation learning with visual observation, where an agent in a target domain learns to perform a task by observing expert demonstrations in a source domain. Domain adaptive imitation learning arises in practical scenarios where a robot, receiving visual sensory data, needs to mimic movements by visually observing other robots from different angles or observing robots of different shapes. To overcome the domain shift in cross-domain imitation learning with visual observation, we propose a novel framework for extracting domain-independent behavioral features from input observations that can be used to train the learner, based on dual feature extraction and image reconstruction. Empirical results demonstrate that our approach outperforms previous algorithms for imitation learning from visual observation with domain shift.

Submitted 1 December, 2023; originally announced December 2023.

Comments: Accepted to NeurIPS 2023

arXiv:2311.04783 [pdf, other]

VioLA: Aligning Videos to 2D LiDAR Scans

Authors: Jun-Jee Chao, Selim Engin, Nikhil Chavan-Dafle, Bhoram Lee, Volkan Isler

Abstract: We study the problem of aligning a video that captures a local portion of an environment to the 2D LiDAR scan of the entire environment. We introduce a method (VioLA) that starts with building a semantic map of the local scene from the image sequence, then extracts points at a fixed height for registering to the LiDAR map. Due to reconstruction errors or partial coverage of the camera scan, the reconstructed semantic map may not contain sufficient information for registration. To address this problem, VioLA makes use of a pre-trained text-to-image inpainting model paired with a depth completion model for filling in the missing scene content in a geometrically consistent fashion to support pose registration. We evaluate VioLA on two real-world RGB-D benchmarks, as well as a self-captured dataset of a large office scene. Notably, our proposed scene completion module improves the pose registration performance by up to 20%.

Submitted 8 November, 2023; originally announced November 2023.

Comments: 8 pages

arXiv:2309.11314 [pdf]

1D-confined crystallization routes for tungsten phosphides

Authors: Gangtae Jin, Christian D. Multunas, James L. Hart, Mehrdad T. Kiani, Quynh P. Sam, Han Wang, Yeryun Cheon, Khoan Duong, David J. Hynek, Hyeuk Jin Han, Ravishankar Sundararaman, Judy J. Cha

Abstract: Topological materials confined in one-dimension (1D) can transform computing technologies, such as 1D topological semimetals for nanoscale interconnects and 1D topological superconductors for fault-tolerant quantum computing. As such, understanding crystallization of 1D-confined topological materials is critical. Here, we demonstrate 1D-confined crystallization routes during template-assisted nanowire synthesis where we observe diameter-dependent phase selectivity for topological metal tungsten phosphides. A phase bifurcation occurs to produce tungsten monophosphide and tungsten diphosphide at the cross-over nanowire diameter of ~ 35 nm. Four-dimensional scanning transmission electron microscopy was used to identify the two phases and to map crystallographic orientations of grains at a few nm resolution. The 1D-confined phase selectivity is attributed to the minimization of the total surface energy, which depends on the nanowire diameter and chemical potentials of precursors. Theoretical calculations were carried out to construct the diameter-dependent phase diagram, which agrees with experimental observations. Our findings suggest a new crystallization route to stabilize topological materials confined in 1D.

Submitted 20 September, 2023; originally announced September 2023.

Comments: 5 figures

arXiv:2309.06406 [pdf]

In operando cryo-STEM of pulse-induced charge density wave switching in TaS$_2$

Authors: James L Hart, Saif Siddique, Noah Schnitzer, Stephen D. Funni, Lena F. Kourkoutis, Judy J. Cha

Abstract: The charge density wave (CDW) material 1T-TaS$_2$ exhibits a pulse-induced insulator-to-metal transition, which shows promise for next-generation electronics such as memristive memory and neuromorphic hardware. However, the rational design of TaS$_2$ devices is hindered by a poor understanding of the switching mechanism, the pulse-induced phase, and the influence of material defects. Here, we operate a 2-terminal TaS$_2$ device within a scanning transmission electron microscope (STEM) at cryogenic temperature, and directly visualize the changing CDW structure with nanoscale spatial resolution and down to 300 μs temporal resolution. We show that the pulse-induced transition is driven by Joule heating, and that the pulse-induced state corresponds to nearly commensurate and incommensurate CDW phases, depending on the applied voltage amplitude. With our in operando cryo-STEM experiments, we directly correlate the CDW structure with the device resistance, and show that dislocations significantly impact device performance. This work resolves fundamental questions of resistive switching in TaS$_2$ devices critical for engineering reliable and scalable TaS$_2$ electronics.

Submitted 12 September, 2023; originally announced September 2023.

arXiv:2309.04138 [pdf, other]

doi 10.1109/IROS55552.2023.10342530

Proprioceptive External Torque Learning for Floating Base Robot and its Applications to Humanoid Locomotion

Authors: Daegyu Lim, Myeong-Ju Kim, Junhyeok Cha, Donghyeon Kim, Jaeheung Park

Abstract: The estimation of external joint torque and contact wrench is essential for achieving stable locomotion of humanoids and safety-oriented robots. Although the contact wrench on the foot of humanoids can be measured using a force-torque sensor (FTS), FTS increases the cost, inertia, complexity, and failure possibility of the system. This paper introduces a method for learning external joint torque solely using proprioceptive sensors (encoders and IMUs) for a floating base robot. For learning, the GRU network is used and random walking data is collected. Real robot experiments demonstrate that the network can estimate the external torque and contact wrench with significantly smaller errors compared to the model-based method, momentum observer (MOB) with friction modeling. The study also validates that the estimated contact wrench can be utilized for zero moment point (ZMP) feedback control, enabling stable walking. Moreover, even when the robot's feet and the inertia of the upper body are changed, the trained network shows consistent performance with a model-based calibration. This result demonstrates the possibility of removing FTS on the robot, which reduces the disadvantages of hardware sensors. The summary video is available at https://youtu.be/gT1D4tOiKpo.

Submitted 8 September, 2023; originally announced September 2023.

Comments: Accepted by 2023 IROS conference

arXiv:2308.11916 [pdf, other]

Semantic-Aware Implicit Template Learning via Part Deformation Consistency

Authors: Sihyeon Kim, Minseok Joo, Jaewon Lee, Juyeon Ko, Juhan Cha, Hyunwoo J. Kim

Abstract: Learning implicit templates as neural fields has recently shown impressive performance in unsupervised shape correspondence. Despite the success, we observe current approaches, which solely rely on geometric information, often learn suboptimal deformation across generic object shapes, which have high structural variability. In this paper, we highlight the importance of part deformation consistency and propose a semantic-aware implicit template learning framework to enable semantically plausible deformation. By leveraging semantic prior from a self-supervised feature extractor, we suggest local conditioning with novel semantic-aware deformation code and deformation consistency regularizations regarding part deformation, global deformation, and global scaling. Our extensive experiments demonstrate the superiority of the proposed method over baselines in various tasks: keypoint transfer, part label transfer, and texture transfer. More interestingly, our framework shows a larger performance gain under more challenging settings. We also provide qualitative analyses to validate the effectiveness of semantic-aware deformation. The code is available at https://github.com/mlvlab/PDC.

Submitted 23 August, 2023; originally announced August 2023.

Comments: ICCV camera-ready version

arXiv:2307.12644 [pdf, other]

Remote Bio-Sensing: Open Source Benchmark Framework for Fair Evaluation of rPPG

Authors: Dae-Yeol Kim, Eunsu Goh, KwangKee Lee, JongEui Chae, JongHyeon Mun, Junyeong Na, Chae-bong Sohn, Do-Yup Kim

Abstract: rPPG (Remote photoplethysmography) is a technology that measures and analyzes BVP (Blood Volume Pulse) by using the light absorption characteristics of hemoglobin captured through a camera. Analyzing the measured BVP can derive various physiological signals such as heart rate, stress level, and blood pressure, which can be applied to various applications such as telemedicine, remote patient monitoring, and early prediction of cardiovascular disease. rPPG is rapidly evolving and attracting great attention from both academia and industry by providing great usability and convenience as it can measure biosignals using a camera-equipped device without medical or wearable devices. Despite extensive efforts and advances in this field, serious challenges remain, including issues related to skin color, camera characteristics, ambient lighting, and other sources of noise and artifacts, which degrade accuracy performance. We argue that fair and evaluable benchmarking is urgently required to overcome these challenges and make meaningful progress from both academic and commercial perspectives. In most existing work, models are trained, tested, and validated only on limited datasets. Even worse, some studies lack available code or reproducibility, making it difficult to fairly evaluate and compare performance.
Therefore, the purpose of this study is to provide a benchmarking framework to evaluate various rPPG techniques across a wide range of datasets for fair evaluation and comparison, including both conventional non-deep neural network (non-DNN) and deep neural network (DNN) methods. GitHub URL: https://github.com/remotebiosensing/rppg

Submitted 18 August, 2023; v1 submitted 24 July, 2023; originally announced July 2023.

Comments: 20 pages, 10 figures

MSC Class: 68T45; 68T07 ACM Class: I.4.9; I.5.4; I.2

arXiv:2307.05916 [pdf, other]

SwiFT: Swin 4D fMRI Transformer

Authors: Peter Yongho Kim, Junbeom Kwon, Sunghwan Joo, Sangyoon Bae, Donggyu Lee, Yoonho Jung, Shinjae Yoo, Jiook Cha, Taesup Moon

Abstract: Modeling spatiotemporal brain dynamics from high-dimensional data, such as functional Magnetic Resonance Imaging (fMRI), is a formidable task in neuroscience. Existing approaches for fMRI analysis utilize hand-crafted features, but the process of feature extraction risks losing essential information in fMRI scans. To address this challenge, we present SwiFT (Swin 4D fMRI Transformer), a Swin Transformer architecture that can learn brain dynamics directly from fMRI volumes in a memory and computation-efficient manner. SwiFT achieves this by implementing a 4D window multi-head self-attention mechanism and absolute positional embeddings. We evaluate SwiFT using multiple large-scale resting-state fMRI datasets, including the Human Connectome Project (HCP), Adolescent Brain Cognitive Development (ABCD), and UK Biobank (UKB) datasets, to predict sex, age, and cognitive intelligence. Our experimental outcomes reveal that SwiFT consistently outperforms recent state-of-the-art models. Furthermore, by leveraging its end-to-end learning capability, we show that contrastive loss-based self-supervised pre-training of SwiFT can enhance performance on downstream tasks. Additionally, we employ an explainable AI method to identify the brain regions associated with sex classification. To our knowledge, SwiFT is the first Swin Transformer architecture to process dimensional spatiotemporal brain functional data in an end-to-end fashion.
Our work holds substantial potential in facilitating scalable learning of functional brain imaging in neuroscience research by reducing the hurdles associated with applying Transformer models to high-dimensional fMRI.

Submitted 31 October, 2023; v1 submitted 12 July, 2023; originally announced July 2023.

Comments: NeurIPS 2023

arXiv:2306.12946 [pdf, other]

Conceptual Design and Analysis of No-Insulation High-Temperature Superconductor Tubular Wave Energy Converter

Authors: Kyoungmo Koo, Wonseok Jang, Jeonghwan Park, Jaemyung Cha, Seungyong Hahn

Abstract: So far, a number of wave energy converters (WEC) have been proposed to increase efficiency and economic feasibility. Particularly, tubular WEC with permanent magnets and coil winding packs is mostly used to convert the wave energy. Due to the demand for high magnetic flux density in WEC, research has been conducted on high-temperature superconductors (HTS) WEC. In this paper, the conceptual design of no-insulation (NI) HTS tubular WEC and its optimization process are proposed. Using NI technology, it has become possible to design WEC with high volumetric efficiency and cost-effectiveness. Furthermore, the design is analyzed in the aspect of electromagnetism, mechanical force, and cryogen. The performance of the proposed WEC is evaluated as a response to various waveforms and their amplitudes. A rectifying circuit of WEC connected in parallel with load resistance is used for the output power study.

Submitted 22 June, 2023; originally announced June 2023.

arXiv:2306.12654 [pdf, other]

Novelty Accommodating Multi-Agent Planning in High Fidelity Simulated Open World

Authors: James Chao, Wiktor Piotrowski, Mitch Manzanares, Douglas S. Lange

Abstract: Autonomous agents acting in real-world environments often need to reason with unknown novelties interfering with their plan execution. Novelty is an unexpected phenomenon that can alter the core characteristics, composition, and dynamics of the environment. Novelty can occur at any time in any sufficiently complex environment without any prior notice or explanation. Previous studies show that novelty has catastrophic impact on agent performance. Intelligent agents reason with an internal model of the world to understand the intricacies of their environment and to successfully execute their plans. The introduction of novelty into the environment usually renders their internal model inaccurate and the generated plans no longer applicable. Novelty is particularly prevalent in the real world where domain-specific and even predicted novelty-specific approaches are used to mitigate the novelty's impact. In this work, we demonstrate that a domain-independent AI agent designed to detect, characterize, and accommodate novelty in smaller-scope physics-based games such as Angry Birds and Cartpole can be adapted to successfully perform and reason with novelty in realistic high-fidelity simulator of the military domain.

Submitted 21 June, 2023; originally announced June 2023.

arXiv:2306.10841 [pdf, other]

Blockchain-Enabled Federated Learning: A Reference Architecture Design, Implementation, and Verification

Authors: Eunsu Goh, Dae-Yeol Kim, Kwangkee Lee, Suyeong Oh, Jong-Eui Chae, Do-Yup Kim

Abstract: This paper presents a novel reference architecture for blockchain-enabled federated learning (BCFL), a state-of-the-art approach that amalgamates the strengths of federated learning and blockchain technology. We define smart contract functions, stakeholders and their roles, and the use of interplanetary file system (IPFS) as key components of BCFL and conduct a comprehensive analysis. In traditional centralized federated learning, the selection of local nodes and the collection of learning results for each round are merged under the control of a central server. In contrast, in BCFL, all these processes are monitored and managed via smart contracts. Additionally, we propose an extension architecture to support both cross-device and cross-silo federated learning scenarios. Furthermore, we implement and verify the architecture in a practical real-world Ethereum development environment. Our BCFL reference architecture provides significant flexibility and extensibility, accommodating the integration of various additional elements, as per specific requirements and use cases, thereby rendering it an adaptable solution for a wide range of BCFL applications. As a prominent example of extensibility, decentralized identifiers (DIDs) have been employed as an authentication method to introduce practical utilization within BCFL. This study not only bridges a crucial gap between research and practical deployment but also lays a solid foundation for future explorations in the realm of BCFL.
The pivotal contribution of this study is the successful implementation and verification of a realistic BCFL reference architecture. We intend to make the source code publicly accessible shortly, fostering further advancements and adaptations within the community.

Submitted 22 November, 2023; v1 submitted 19 June, 2023; originally announced June 2023.

Comments: 14 pages, 15 figures, 3 tables

MSC Class: 68T01 (Primary) 68M14; 94A60 (Secondary) ACM Class: I.2.6; I.2.11

arXiv:2306.10167 [pdf]

Wafer-scale fabrication of 2D nanostructures via thermomechanical nanomolding

Authors: Mehrdad T Kiani, Quynh P Sam, Yeon Sik Jung, Hyeuk Jin Han, Judy J Cha

Abstract: With shrinking dimensions in integrated circuits, sensors, and functional devices, there is a pressing need to develop nanofabrication techniques with simultaneous control of morphology, microstructure, and material composition over wafer length scales. Current techniques are largely unable to meet all these conditions, suffering from poor control of morphology and defect structure or requiring extensive optimization or post-processing to achieve desired nanostructures. Recently, thermomechanical nanomolding (TMNM) has been shown to yield single-crystalline, high aspect ratio nanowires of metals, alloys, and intermetallics over wafer-scale distances. Here, we extend TMNM for wafer-scale fabrication of 2D nanostructures. Using Cu, we successfully nanomold Cu nanoribbons with widths < 50 nm, depths ~ 0.5-1 microns and lengths ~ 7 mm into Si trenches at conditions compatible with back end of line processing. Through SEM cross-section imaging and 4D-STEM grain orientation maps, we show that the grain size of the bulk feedstock is transferred to the nanomolded structures up to and including single crystal Cu. Based on the retained microstructures of molded 2D Cu, we discuss the deformation mechanism during molding for 2D TMNM.

Submitted 16 June, 2023; originally announced June 2023.

Comments: 4 figures

arXiv:2306.03040 [pdf, other]

Learning Similarity among Users for Personalized Session-Based Recommendation from hierarchical structure of User-Session-Item

Authors: Jisoo Cha, Haemin Jeong, Wooju Kim

Abstract: The task of the session-based recommendation is to predict the next interaction of the user based on the anonymized user's behavior pattern. And personalized version of this system is a promising research field due to its availability to deal with user information. However, there's a problem that the user's preferences and historical sessions were not considered in the typical session-based recommendation since it concentrates only on user-item interaction. In addition, the existing personalized session-based recommendation model has a limited capability in that it only considers the preference of the current user without considering those of similar users. It means there can be the loss of information included within the hierarchical data structure of the user-session-item. To tackle this problem, we propose USP-SBR (abbr. of User Similarity Powered - Session Based Recommender). To model global historical sessions of users, we propose UserGraph that has two types of nodes - ItemNode and UserNode. We then connect the nodes with three types of edges. The first type of edges connects ItemNode as chronological order, and the second connects ItemNode to UserNode, and the last connects UserNode to ItemNode. With these user embeddings, we propose additional contrastive loss, that makes users with similar intention be close to each other in the vector space. We apply graph neural network on these UserGraph and update nodes. Experimental results on two real-world datasets demonstrate that our method outperforms some state-of-the-art approaches.

Submitted 5 June, 2023; originally announced June 2023.

Comments: 7 pages, 5 figures

MSC Class: 68P20

Showing 1–50 of 320 results for author: Chao, J