Search | arXiv e-print repository

Scalable Training of Graph Foundation Models for Atomistic Materials Modeling: A Case Study with HydraGNN

Authors: Massimiliano Lupo Pasini, Jong Youl Choi, Kshitij Mehta, Pei Zhang, David Rogers, Jonghyun Bae, Khaled Z. Ibrahim, Ashwin M. Aji, Karl W. Schulz, Jorda Polo, Prasanna Balaprakash

Abstract: We present our work on develo** and training scalable graph foundation models (GFM) using HydraGNN, a multi-headed graph convolutional neural network architecture. HydraGNN expands the boundaries of graph neural network (GNN) in both training scale and data diversity. It abstracts over message passing algorithms, allowing both reproduction of and comparison across algorithmic innovations that de… ▽ More We present our work on develo** and training scalable graph foundation models (GFM) using HydraGNN, a multi-headed graph convolutional neural network architecture. HydraGNN expands the boundaries of graph neural network (GNN) in both training scale and data diversity. It abstracts over message passing algorithms, allowing both reproduction of and comparison across algorithmic innovations that define convolution in GNNs. This work discusses a series of optimizations that have allowed scaling up the GFM training to tens of thousands of GPUs on datasets that consist of hundreds of millions of graphs. Our GFMs use multi-task learning (MTL) to simultaneously learn graph-level and node-level properties of atomistic structures, such as the total energy and atomic forces. Using over 150 million atomistic structures for training, we illustrate the performance of our approach along with the lessons learned on two United States Department of Energy (US-DOE) supercomputers, namely the Perlmutter petascale system at the National Energy Research Scientific Computing Center and the Frontier exascale system at Oak Ridge National Laboratory. The HydraGNN architecture enables the GFM to achieve near-linear strong scaling performance using more than 2,000 GPUs on Perlmutter and 16,000 GPUs on Frontier. Hyperparameter optimization (HPO) was performed on over 64,000 GPUs on Frontier to select GFM architectures with high accuracy. Early stop** was applied on each GFM architecture for energy awareness in performing such an extreme-scale task. The training of an ensemble of highest-ranked GFM architectures continued until convergence to establish uncertainty quantification (UQ) capabilities with ensemble learning. Our contribution opens the door for rapidly develo**, training, and deploying GFMs using large-scale computational resources to enable AI-accelerated materials discovery and design. △ Less

Submitted 28 June, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

Comments: 16 pages, 13 figures

MSC Class: 68T07; 68T09 ACM Class: C.2.4; I.2.11

arXiv:2405.16301 [pdf, other]

Active Learning for Finely-Categorized Image-Text Retrieval by Selecting Hard Negative Unpaired Samples

Authors: Dae Ung Jo, Kyuewang Lee, JaeHo Chung, ** Young Choi

Abstract: Securing a sufficient amount of paired data is important to train an image-text retrieval (ITR) model, but collecting paired data is very expensive. To address this issue, in this paper, we propose an active learning algorithm for ITR that can collect paired data cost-efficiently. Previous studies assume that image-text pairs are given and their category labels are asked to the annotator. However,… ▽ More Securing a sufficient amount of paired data is important to train an image-text retrieval (ITR) model, but collecting paired data is very expensive. To address this issue, in this paper, we propose an active learning algorithm for ITR that can collect paired data cost-efficiently. Previous studies assume that image-text pairs are given and their category labels are asked to the annotator. However, in the recent ITR studies, the importance of category label is decreased since a retrieval model can be trained with only image-text pairs. For this reason, we set up an active learning scenario where unpaired images (or texts) are given and the annotator provides corresponding texts (or images) to make paired data. The key idea of the proposed AL algorithm is to select unpaired images (or texts) that can be hard negative samples for existing texts (or images). To this end, we introduce a novel scoring function to choose hard negative samples. We validate the effectiveness of the proposed method on Flickr30K and MS-COCO datasets. △ Less

Submitted 25 May, 2024; originally announced May 2024.

arXiv:2405.15780 [pdf, other]

Sequence Length Scaling in Vision Transformers for Scientific Images on Frontier

Authors: Aristeidis Tsaris, Chengming Zhang, Xiao Wang, Junqi Yin, Siyan Liu, Moetasim Ashfaq, Ming Fan, Jong Youl Choi, Mohamed Wahib, Dan Lu, Prasanna Balaprakash, Feiyi Wang

Abstract: Vision Transformers (ViTs) are pivotal for foundational models in scientific imagery, including Earth science applications, due to their capability to process large sequence lengths. While transformers for text has inspired scaling sequence lengths in ViTs, yet adapting these for ViTs introduces unique challenges. We develop distributed sequence parallelism for ViTs, enabling them to handle up to… ▽ More Vision Transformers (ViTs) are pivotal for foundational models in scientific imagery, including Earth science applications, due to their capability to process large sequence lengths. While transformers for text has inspired scaling sequence lengths in ViTs, yet adapting these for ViTs introduces unique challenges. We develop distributed sequence parallelism for ViTs, enabling them to handle up to 1M tokens. Our approach, leveraging DeepSpeed-Ulysses and Long-Sequence-Segmentation with model sharding, is the first to apply sequence parallelism in ViT training, achieving a 94% batch scaling efficiency on 2,048 AMD-MI250X GPUs. Evaluating sequence parallelism in ViTs, particularly in models up to 10B parameters, highlighted substantial bottlenecks. We countered these with hybrid sequence, pipeline, tensor parallelism, and flash attention strategies, to scale beyond single GPU memory limits. Our method significantly enhances climate modeling accuracy by 20% in temperature predictions, marking the first training of a transformer model on a full-attention matrix over 188K sequence length. △ Less

Submitted 17 April, 2024; originally announced May 2024.

arXiv:2405.03958 [pdf, other]

Simple Drop-in LoRA Conditioning on Attention Layers Will Improve Your Diffusion Model

Authors: Joo Young Choi, Jaesung R. Park, Inkyu Park, Jaewoong Cho, Albert No, Ernest K. Ryu

Abstract: Current state-of-the-art diffusion models employ U-Net architectures containing convolutional and (qkv) self-attention layers. The U-Net processes images while being conditioned on the time embedding input for each sampling step and the class or caption embedding input corresponding to the desired conditional generation. Such conditioning involves scale-and-shift operations to the convolutional la… ▽ More Current state-of-the-art diffusion models employ U-Net architectures containing convolutional and (qkv) self-attention layers. The U-Net processes images while being conditioned on the time embedding input for each sampling step and the class or caption embedding input corresponding to the desired conditional generation. Such conditioning involves scale-and-shift operations to the convolutional layers but does not directly affect the attention layers. While these standard architectural choices are certainly effective, not conditioning the attention layers feels arbitrary and potentially suboptimal. In this work, we show that simply adding LoRA conditioning to the attention layers without changing or tuning the other parts of the U-Net architecture improves the image generation quality. For example, a drop-in addition of LoRA conditioning to EDM diffusion model yields FID scores of 1.91/1.75 for unconditional and class-conditional CIFAR-10 generation, improving upon the baseline of 1.97/1.79. △ Less

Submitted 6 May, 2024; originally announced May 2024.

arXiv:2403.06225 [pdf, other]

MoST: Motion Style Transformer between Diverse Action Contents

Authors: Boeun Kim, Jungho Kim, Hyung ** Chang, ** Young Choi

Abstract: While existing motion style transfer methods are effective between two motions with identical content, their performance significantly diminishes when transferring style between motions with different contents. This challenge lies in the lack of clear separation between content and style of a motion. To tackle this challenge, we propose a novel motion style transformer that effectively disentangle… ▽ More While existing motion style transfer methods are effective between two motions with identical content, their performance significantly diminishes when transferring style between motions with different contents. This challenge lies in the lack of clear separation between content and style of a motion. To tackle this challenge, we propose a novel motion style transformer that effectively disentangles style from content and generates a plausible motion with transferred style from a source motion. Our distinctive approach to achieving the goal of disentanglement is twofold: (1) a new architecture for motion style transformer with `part-attentive style modulator across body parts' and `Siamese encoders that encode style and content features separately'; (2) style disentanglement loss. Our method outperforms existing methods and demonstrates exceptionally high quality, particularly in motion pairs with different contents, without the need for heuristic post-processing. Codes are available at https://github.com/Boeun-Kim/MoST. △ Less

Submitted 20 March, 2024; v1 submitted 10 March, 2024; originally announced March 2024.

Comments: Accepted by CVPR 2024

arXiv:2401.03846 [pdf, other]

UFO: Unidentified Foreground Object Detection in 3D Point Cloud

Authors: Hyunjun Choi, Hawook Jeong, ** Young Choi

Abstract: In this paper, we raise a new issue on Unidentified Foreground Object (UFO) detection in 3D point clouds, which is a crucial technology in autonomous driving in the wild. UFO detection is challenging in that existing 3D object detectors encounter extremely hard challenges in both 3D localization and Out-of-Distribution (OOD) detection. To tackle these challenges, we suggest a new UFO detection fra… ▽ More In this paper, we raise a new issue on Unidentified Foreground Object (UFO) detection in 3D point clouds, which is a crucial technology in autonomous driving in the wild. UFO detection is challenging in that existing 3D object detectors encounter extremely hard challenges in both 3D localization and Out-of-Distribution (OOD) detection. To tackle these challenges, we suggest a new UFO detection framework including three tasks: evaluation protocol, methodology, and benchmark. The evaluation includes a new approach to measure the performance on our goal, i.e. both localization and OOD detection of UFOs. The methodology includes practical techniques to enhance the performance of our goal. The benchmark is composed of the KITTI Misc benchmark and our additional synthetic benchmark for modeling a more diverse range of UFOs. The proposed framework consistently enhances performance by a large margin across all four baseline detectors: SECOND, PointPillars, PV-RCNN, and PartA2, giving insight for future work on UFO detection in the wild. △ Less

Submitted 8 January, 2024; originally announced January 2024.

Comments: Under review

arXiv:2312.16388 [pdf, other]

doi 10.1609/aaai.v38i3.28059

Gaussian Mixture Proposals with Pull-Push Learning Scheme to Capture Diverse Events for Weakly Supervised Temporal Video Grounding

Authors: Sunoh Kim, Jungchan Cho, Joonsang Yu, YoungJoon Yoo, ** Young Choi

Abstract: In the weakly supervised temporal video grounding study, previous methods use predetermined single Gaussian proposals which lack the ability to express diverse events described by the sentence query. To enhance the expression ability of a proposal, we propose a Gaussian mixture proposal (GMP) that can depict arbitrary shapes by learning importance, centroid, and range of every Gaussian in the mixt… ▽ More In the weakly supervised temporal video grounding study, previous methods use predetermined single Gaussian proposals which lack the ability to express diverse events described by the sentence query. To enhance the expression ability of a proposal, we propose a Gaussian mixture proposal (GMP) that can depict arbitrary shapes by learning importance, centroid, and range of every Gaussian in the mixture. In learning GMP, each Gaussian is not trained in a feature space but is implemented over a temporal location. Thus the conventional feature-based learning for Gaussian mixture model is not valid for our case. In our special setting, to learn moderately coupled Gaussian mixture capturing diverse events, we newly propose a pull-push learning scheme using pulling and pushing losses, each of which plays an opposite role to the other. The effects of components in our scheme are verified in-depth with extensive ablation studies and the overall scheme achieves state-of-the-art performance. Our code is available at https://github.com/sunoh-kim/pps. △ Less

Submitted 26 December, 2023; originally announced December 2023.

Comments: Accepted in AAAI 2024

arXiv:2311.05407 [pdf]

Data Distillation for Neural Network Potentials toward Foundational Dataset

Authors: Gang Seob Jung, Sangkeun Lee, Jong Youl Choi

Abstract: Machine learning (ML) techniques and atomistic modeling have rapidly transformed materials design and discovery. Specifically, generative models can swiftly propose promising materials for targeted applications. However, the predicted properties of materials through the generative models often do not match with calculated properties through ab initio calculations. This discrepancy can arise becaus… ▽ More Machine learning (ML) techniques and atomistic modeling have rapidly transformed materials design and discovery. Specifically, generative models can swiftly propose promising materials for targeted applications. However, the predicted properties of materials through the generative models often do not match with calculated properties through ab initio calculations. This discrepancy can arise because the generated coordinates are not fully relaxed, whereas the many properties are derived from relaxed structures. Neural network-based potentials (NNPs) can expedite the process by providing relaxed structures from the initially generated ones. Nevertheless, acquiring data to train NNPs for this purpose can be extremely challenging as it needs to encompass previously unknown structures. This study utilized extended ensemble molecular dynamics (MD) to secure a broad range of liquid- and solid-phase configurations in one of the metallic systems, nickel. Then, we could significantly reduce them through active learning without losing much accuracy. We found that the NNP trained from the distilled data could predict different energy-minimized closed-pack crystal structures even though those structures were not explicitly part of the initial data. Furthermore, the data can be translated to other metallic systems (aluminum and niobium), without repeating the sampling and distillation processes. Our approach to data acquisition and distillation has demonstrated the potential to expedite NNP development and enhance materials design and discovery by integrating generative models. △ Less

Submitted 9 November, 2023; originally announced November 2023.

arXiv:2308.01030 [pdf, other]

Three Factors to Improve Out-of-Distribution Detection

Authors: Hyunjun Choi, JaeHo Chung, Hawook Jeong, ** Young Choi

Abstract: In the problem of out-of-distribution (OOD) detection, the usage of auxiliary data as outlier data for fine-tuning has demonstrated encouraging performance. However, previous methods have suffered from a trade-off between classification accuracy (ACC) and OOD detection performance (AUROC, FPR, AUPR). To improve this trade-off, we make three contributions: (i) Incorporating a self-knowledge distill… ▽ More In the problem of out-of-distribution (OOD) detection, the usage of auxiliary data as outlier data for fine-tuning has demonstrated encouraging performance. However, previous methods have suffered from a trade-off between classification accuracy (ACC) and OOD detection performance (AUROC, FPR, AUPR). To improve this trade-off, we make three contributions: (i) Incorporating a self-knowledge distillation loss can enhance the accuracy of the network; (ii) Sampling semi-hard outlier data for training can improve OOD detection performance with minimal impact on accuracy; (iii) The introduction of our novel supervised contrastive learning can simultaneously improve OOD detection performance and the accuracy of the network. By incorporating all three factors, our approach enhances both accuracy and OOD detection performance by addressing the trade-off between classification and OOD detection. Our method achieves improvements over previous approaches in both performance metrics. △ Less

Submitted 2 August, 2023; originally announced August 2023.

Comments: Under review

arXiv:2306.10485 [pdf, other]

Balanced Energy Regularization Loss for Out-of-distribution Detection

Authors: Hyunjun Choi, Hawook Jeong, ** Young Choi

Abstract: In the field of out-of-distribution (OOD) detection, a previous method that use auxiliary data as OOD data has shown promising performance. However, the method provides an equal loss to all auxiliary data to differentiate them from inliers. However, based on our observation, in various tasks, there is a general imbalance in the distribution of the auxiliary OOD data across classes. We propose a ba… ▽ More In the field of out-of-distribution (OOD) detection, a previous method that use auxiliary data as OOD data has shown promising performance. However, the method provides an equal loss to all auxiliary data to differentiate them from inliers. However, based on our observation, in various tasks, there is a general imbalance in the distribution of the auxiliary OOD data across classes. We propose a balanced energy regularization loss that is simple but generally effective for a variety of tasks. Our balanced energy regularization loss utilizes class-wise different prior probabilities for auxiliary data to address the class imbalance in OOD data. The main concept is to regularize auxiliary samples from majority classes, more heavily than those from minority classes. Our approach performs better for OOD detection in semantic segmentation, long-tailed image classification, and image classification than the prior energy regularization loss. Furthermore, our approach achieves state-of-the-art performance in two tasks: OOD detection in semantic segmentation and long-tailed image classification. Code is available at https://github.com/hyunjunChhoi/Balanced_Energy. △ Less

Submitted 18 June, 2023; originally announced June 2023.

Comments: Accepted by CVPR2023

arXiv:2305.16618 [pdf, other]

Confidence-Based Feature Imputation for Graphs with Partially Known Features

Authors: Daeho Um, Jiwoong Park, Seulki Park, ** Young Choi

Abstract: This paper investigates a missing feature imputation problem for graph learning tasks. Several methods have previously addressed learning tasks on graphs with missing features. However, in cases of high rates of missing features, they were unable to avoid significant performance degradation. To overcome this limitation, we introduce a novel concept of channel-wise confidence in a node feature, whi… ▽ More This paper investigates a missing feature imputation problem for graph learning tasks. Several methods have previously addressed learning tasks on graphs with missing features. However, in cases of high rates of missing features, they were unable to avoid significant performance degradation. To overcome this limitation, we introduce a novel concept of channel-wise confidence in a node feature, which is assigned to each imputed channel feature of a node for reflecting certainty of the imputation. We then design pseudo-confidence using the channel-wise shortest path distance between a missing-feature node and its nearest known-feature node to replace unavailable true confidence in an actual learning process. Based on the pseudo-confidence, we propose a novel feature imputation scheme that performs channel-wise inter-node diffusion and node-wise inter-channel propagation. The scheme can endure even at an exceedingly high missing rate (e.g., 99.5\%) and it achieves state-of-the-art accuracy for both semi-supervised node classification and link prediction on various datasets containing a high rate of missing features. Codes are available at https://github.com/daehoum1/pcfi. △ Less

Submitted 28 May, 2023; v1 submitted 26 May, 2023; originally announced May 2023.

Comments: Accepted to ICLR 2023. 28 pages

arXiv:2304.13995 [pdf, other]

Rotation and Translation Invariant Representation Learning with Implicit Neural Representations

Authors: Sehyun Kwon, Joo Young Choi, Ernest K. Ryu

Abstract: In many computer vision applications, images are acquired with arbitrary or random rotations and translations, and in such setups, it is desirable to obtain semantic representations disentangled from the image orientation. Examples of such applications include semiconductor wafer defect inspection, plankton microscope images, and inference on single-particle cryo-electron microscopy (cryo-EM) micr… ▽ More In many computer vision applications, images are acquired with arbitrary or random rotations and translations, and in such setups, it is desirable to obtain semantic representations disentangled from the image orientation. Examples of such applications include semiconductor wafer defect inspection, plankton microscope images, and inference on single-particle cryo-electron microscopy (cryo-EM) micro-graphs. In this work, we propose Invariant Representation Learning with Implicit Neural Representation (IRL-INR), which uses an implicit neural representation (INR) with a hypernetwork to obtain semantic representations disentangled from the orientation of the image. We show that IRL-INR can effectively learn disentangled semantic representations on more complex images compared to those considered in prior works and show that these semantic representations synergize well with SCAN to produce state-of-the-art unsupervised clustering results. △ Less

Submitted 12 June, 2023; v1 submitted 27 April, 2023; originally announced April 2023.

arXiv:2304.10727 [pdf, other]

RoCOCO: Robustness Benchmark of MS-COCO to Stress-test Image-Text Matching Models

Authors: Seulki Park, Daeho Um, Hajung Yoon, Sanghyuk Chun, Sangdoo Yun, ** Young Choi

Abstract: In this paper, we propose a robustness benchmark for image-text matching models to assess their vulnerabilities. To this end, we insert adversarial texts and images into the search pool (i.e., gallery set) and evaluate models with the adversarial data. Specifically, we replace a word in the text to change the meaning of the text and mix images with different images to create perceptible changes in… ▽ More In this paper, we propose a robustness benchmark for image-text matching models to assess their vulnerabilities. To this end, we insert adversarial texts and images into the search pool (i.e., gallery set) and evaluate models with the adversarial data. Specifically, we replace a word in the text to change the meaning of the text and mix images with different images to create perceptible changes in pixels. We assume that such explicit alterations would not deceive a robust model, as they should understand the holistic meaning of texts and images simultaneously. However, in our evaluations on the proposed benchmark, many state-of-the-art models show significant performance degradation, e.g., Recall@1: 81.9% $\rightarrow$ 64.5% in BLIP, 66.1% $\rightarrow$ 37.5% in VSE$\infty$, where the models favor adversarial texts/images over the original ones. This reveals the current vision-language models may not account for subtle changes or understand the overall context of texts and images. Our findings can provide insights for improving the robustness of the vision-language models and devising more diverse stress-test methods in cross-modal retrieval task. Source code and dataset will be available at https://github.com/pseulki/rococo. △ Less

Submitted 14 July, 2023; v1 submitted 20 April, 2023; originally announced April 2023.

arXiv:2304.05067 [pdf]

Audio Bank: A High-Level Acoustic Signal Representation for Audio Event Recognition

Authors: Tushar Sandhan, Sukanya Sonowal, ** Young Choi

Abstract: Automatic audio event recognition plays a pivotal role in making human robot interaction more closer and has a wide applicability in industrial automation, control and surveillance systems. Audio event is composed of intricate phonic patterns which are harmonically entangled. Audio recognition is dominated by low and mid-level features, which have demonstrated their recognition capability but they… ▽ More Automatic audio event recognition plays a pivotal role in making human robot interaction more closer and has a wide applicability in industrial automation, control and surveillance systems. Audio event is composed of intricate phonic patterns which are harmonically entangled. Audio recognition is dominated by low and mid-level features, which have demonstrated their recognition capability but they have high computational cost and low semantic meaning. In this paper, we propose a new computationally efficient framework for audio recognition. Audio Bank, a new high-level representation of audio, is comprised of distinctive audio detectors representing each audio class in frequency-temporal space. Dimensionality of the resulting feature vector is reduced using non-negative matrix factorization preserving its discriminability and rich semantic information. The high audio recognition performance using several classifiers (SVM, neural network, Gaussian process classification and k-nearest neighbors) shows the effectiveness of the proposed method. △ Less

Submitted 11 April, 2023; originally announced April 2023.

Comments: 6 pages, 9 figures, published in IEEE International Conf ICCAS 2014 (Best paper award)

arXiv:2211.10967 [pdf, other]

Font Representation Learning via Paired-glyph Matching

Authors: Junho Cho, Kyuewang Lee, ** Young Choi

Abstract: Fonts can convey profound meanings of words in various forms of glyphs. Without typography knowledge, manually selecting an appropriate font or designing a new font is a tedious and painful task. To allow users to explore vast font styles and create new font styles, font retrieval and font style transfer methods have been proposed. These tasks increase the need for learning high-quality font repre… ▽ More Fonts can convey profound meanings of words in various forms of glyphs. Without typography knowledge, manually selecting an appropriate font or designing a new font is a tedious and painful task. To allow users to explore vast font styles and create new font styles, font retrieval and font style transfer methods have been proposed. These tasks increase the need for learning high-quality font representations. Therefore, we propose a novel font representation learning scheme to embed font styles into the latent space. For the discriminative representation of a font from others, we propose a paired-glyph matching-based font representation learning model that attracts the representations of glyphs in the same font to one another, but pushes away those of other fonts. Through evaluations on font retrieval with query glyphs on new fonts, we show our font representation learning scheme achieves better generalization performance than the existing font representation learning techniques. Finally on the downstream font style transfer and generation tasks, we confirm the benefits of transfer learning with the proposed method. The source code is available at https://github.com/junhocho/paired-glyph-matching. △ Less

Submitted 20 November, 2022; originally announced November 2022.

Comments: Accepted to BMVC2022

arXiv:2210.14480 [pdf, other]

Meta-node: A Concise Approach to Effectively Learn Complex Relationships in Heterogeneous Graphs

Authors: Jiwoong Park, Jisu Jeong, Kyungmin Kim, ** Young Choi

Abstract: Existing message passing neural networks for heterogeneous graphs rely on the concepts of meta-paths or meta-graphs due to the intrinsic nature of heterogeneous graphs. However, the meta-paths and meta-graphs need to be pre-configured before learning and are highly dependent on expert knowledge to construct them. To tackle this challenge, we propose a novel concept of meta-node for message passing… ▽ More Existing message passing neural networks for heterogeneous graphs rely on the concepts of meta-paths or meta-graphs due to the intrinsic nature of heterogeneous graphs. However, the meta-paths and meta-graphs need to be pre-configured before learning and are highly dependent on expert knowledge to construct them. To tackle this challenge, we propose a novel concept of meta-node for message passing that can learn enriched relational knowledge from complex heterogeneous graphs without any meta-paths and meta-graphs by explicitly modeling the relations among the same type of nodes. Unlike meta-paths and meta-graphs, meta-nodes do not require any pre-processing steps that require expert knowledge. Going one step further, we propose a meta-node message passing scheme and apply our method to a contrastive learning model. In the experiments on node clustering and classification tasks, the proposed meta-node message passing method outperforms state-of-the-arts that depend on meta-paths. Our results demonstrate that effective heterogeneous graph learning is possible without the need for meta-paths that are frequently used in this field. △ Less

Submitted 26 October, 2022; originally announced October 2022.

arXiv:2207.11333 [pdf, other]

Scalable training of graph convolutional neural networks for fast and accurate predictions of HOMO-LUMO gap in molecules

Authors: Jong Youl Choi, Pei Zhang, Kshitij Mehta, Andrew Blanchard, Massimiliano Lupo Pasini

Abstract: Graph Convolutional Neural Network (GCNN) is a popular class of deep learning (DL) models in material science to predict material properties from the graph representation of molecular structures. Training an accurate and comprehensive GCNN surrogate for molecular design requires large-scale graph datasets and is usually a time-consuming process. Recent advances in GPUs and distributed computing op… ▽ More Graph Convolutional Neural Network (GCNN) is a popular class of deep learning (DL) models in material science to predict material properties from the graph representation of molecular structures. Training an accurate and comprehensive GCNN surrogate for molecular design requires large-scale graph datasets and is usually a time-consuming process. Recent advances in GPUs and distributed computing open a path to reduce the computational cost for GCNN training effectively. However, efficient utilization of high performance computing (HPC) resources for training requires simultaneously optimizing large-scale data management and scalable stochastic batched optimization techniques. In this work, we focus on building GCNN models on HPC systems to predict material properties of millions of molecules. We use HydraGNN, our in-house library for large-scale GCNN training, leveraging distributed data parallelism in PyTorch. We use ADIOS, a high-performance data management framework for efficient storage and reading of large molecular graph data. We perform parallel training on two open-source large-scale graph datasets to build a GCNN predictor for an important quantum property known as the HOMO-LUMO gap. We measure the scalability, accuracy, and convergence of our approach on two DOE supercomputers: the Summit supercomputer at the Oak Ridge Leadership Computing Facility (OLCF) and the Perlmutter system at the National Energy Research Scientific Computing Center (NERSC). We present our experimental results with HydraGNN showing i) reduction of data loading time up to 4.2 times compared with a conventional method and ii) linear scaling performance for training up to 1,024 GPUs on both Summit and Perlmutter. △ Less

Submitted 22 July, 2022; originally announced July 2022.

Comments: 19 pages, 9 figures

MSC Class: 68Q85; 68M14; 68W15; 68W15 ACM Class: I.2.11

arXiv:2207.06101 [pdf, other]

Global-local Motion Transformer for Unsupervised Skeleton-based Action Learning

Authors: Boeun Kim, Hyung ** Chang, Jungho Kim, ** Young Choi

Abstract: We propose a new transformer model for the task of unsupervised learning of skeleton motion sequences. The existing transformer model utilized for unsupervised skeleton-based action learning is learned the instantaneous velocity of each joint from adjacent frames without global motion information. Thus, the model has difficulties in learning the attention globally over whole-body motions and tempo… ▽ More We propose a new transformer model for the task of unsupervised learning of skeleton motion sequences. The existing transformer model utilized for unsupervised skeleton-based action learning is learned the instantaneous velocity of each joint from adjacent frames without global motion information. Thus, the model has difficulties in learning the attention globally over whole-body motions and temporally distant joints. In addition, person-to-person interactions have not been considered in the model. To tackle the learning of whole-body motion, long-range temporal dynamics, and person-to-person interactions, we design a global and local attention mechanism, where, global body motions and local joint motions pay attention to each other. In addition, we propose a novel pretraining strategy, multi-interval pose displacement prediction, to learn both global and local attention in diverse time ranges. The proposed model successfully learns local dynamics of the joints and captures global context from the motion sequences. Our model outperforms state-of-the-art models by notable margins in the representative benchmarks. Codes are available at https://github.com/Boeun-Kim/GL-Transformer. △ Less

Submitted 13 July, 2022; originally announced July 2022.

Comments: Accepted by ECCV 2022

arXiv:2204.05499 [pdf, ps, other]

doi 10.1109/AVSS52988.2021.9663815

Position-aware Location Regression Network for Temporal Video Grounding

Authors: Sunoh Kim, Kimin Yun, ** Young Choi

Abstract: The key to successful grounding for video surveillance is to understand a semantic phrase corresponding to important actors and objects. Conventional methods ignore comprehensive contexts for the phrase or require heavy computation for multiple phrases. To understand comprehensive contexts with only one semantic phrase, we propose Position-aware Location Regression Network (PLRN) which exploits po… ▽ More The key to successful grounding for video surveillance is to understand a semantic phrase corresponding to important actors and objects. Conventional methods ignore comprehensive contexts for the phrase or require heavy computation for multiple phrases. To understand comprehensive contexts with only one semantic phrase, we propose Position-aware Location Regression Network (PLRN) which exploits position-aware features of a query and a video. Specifically, PLRN first encodes both the video and query using positional information of words and video segments. Then, a semantic phrase feature is extracted from an encoded query with attention. The semantic phrase feature and encoded video are merged and made into a context-aware feature by reflecting local and global contexts. Finally, PLRN predicts start, end, center, and width values of a grounding boundary. Our experiments show that PLRN achieves competitive performance over existing methods with less computation time and memory. △ Less

Submitted 11 April, 2022; originally announced April 2022.

Comments: Accepted in AVSS 2021

arXiv:2203.07837 [pdf, other]

Pose-MUM : Reinforcing Key Points Relationship for Semi-Supervised Human Pose Estimation

Authors: JongMok Kim, Hwijun Lee, Jaeseung Lim, Jongkeun Na, Nojun Kwak, ** Young Choi

Abstract: A well-designed strong-weak augmentation strategy and the stable teacher to generate reliable pseudo labels are essential in the teacher-student framework of semi-supervised learning (SSL). Considering these in mind, to suit the semi-supervised human pose estimation (SSHPE) task, we propose a novel approach referred to as Pose-MUM that modifies Mix/UnMix (MUM) augmentation. Like MUM in the dense p… ▽ More A well-designed strong-weak augmentation strategy and the stable teacher to generate reliable pseudo labels are essential in the teacher-student framework of semi-supervised learning (SSL). Considering these in mind, to suit the semi-supervised human pose estimation (SSHPE) task, we propose a novel approach referred to as Pose-MUM that modifies Mix/UnMix (MUM) augmentation. Like MUM in the dense prediction task, the proposed Pose-MUM makes strong-weak augmentation for pose estimation and leads the network to learn the relationship between each human key point much better than the conventional methods by adding the mixing process in intermediate layers in a stochastic manner. In addition, we employ the exponential-moving-average-normalization (EMAN) teacher, which is stable and well-suited to the SSL framework and furthermore boosts the performance. Extensive experiments on MS-COCO dataset show the superiority of our proposed method by consistently improving the performance over the previous methods following SSHPE benchmark. △ Less

Submitted 15 March, 2022; originally announced March 2022.

arXiv:2202.02981 [pdf, other]

Neural Tangent Kernel Analysis of Deep Narrow Neural Networks

Authors: Jongmin Lee, Joo Young Choi, Ernest K. Ryu, Albert No

Abstract: The tremendous recent progress in analyzing the training dynamics of overparameterized neural networks has primarily focused on wide networks and therefore does not sufficiently address the role of depth in deep learning. In this work, we present the first trainability guarantee of infinitely deep but narrow neural networks. We study the infinite-depth limit of a multilayer perceptron (MLP) with a… ▽ More The tremendous recent progress in analyzing the training dynamics of overparameterized neural networks has primarily focused on wide networks and therefore does not sufficiently address the role of depth in deep learning. In this work, we present the first trainability guarantee of infinitely deep but narrow neural networks. We study the infinite-depth limit of a multilayer perceptron (MLP) with a specific initialization and establish a trainability guarantee using the NTK theory. We then extend the analysis to an infinitely deep convolutional neural network (CNN) and perform brief experiments. △ Less

Submitted 27 June, 2022; v1 submitted 7 February, 2022; originally announced February 2022.

Journal ref: Published in International Conference on Machine Learning, 2022

arXiv:2202.01954 [pdf, other]

doi 10.1088/2632-2153/ac6a51

Multi-task graph neural networks for simultaneous prediction of global and atomic properties in ferromagnetic systems

Authors: Massimiliano Lupo Pasini, Pei Zhang, Samuel Temple Reeve, Jong Youl Choi

Abstract: We introduce a multi-tasking graph convolutional neural network, HydraGNN, to simultaneously predict both global and atomic physical properties and demonstrate with ferromagnetic materials. We train HydraGNN on an open-source ab initio density functional theory (DFT) dataset for iron-platinum (FePt) with a fixed body centered tetragonal (BCT) lattice structure and fixed volume to simultaneously pr… ▽ More We introduce a multi-tasking graph convolutional neural network, HydraGNN, to simultaneously predict both global and atomic physical properties and demonstrate with ferromagnetic materials. We train HydraGNN on an open-source ab initio density functional theory (DFT) dataset for iron-platinum (FePt) with a fixed body centered tetragonal (BCT) lattice structure and fixed volume to simultaneously predict the mixing enthalpy (a global feature of the system), the atomic charge transfer, and the atomic magnetic moment across configurations that span the entire compositional range. By taking advantage of underlying physical correlations between material properties, multi-task learning (MTL) with HydraGNN provides effective training even with modest amounts of data. Moreover, this is achieved with just one architecture instead of three, as required by single-task learning (STL). The first convolutional layers of the HydraGNN architecture are shared by all learning tasks and extract features common to all material properties. The following layers discriminate the features of the different properties, the results of which are fed to the separate heads of the final layer to produce predictions. Numerical results show that HydraGNN effectively captures the relation between the configurational entropy and the material properties over the entire compositional range. Overall, the accuracy of simultaneous MTL predictions is comparable to the accuracy of the STL predictions. In addition, the computational cost of training HydraGNN for MTL is much lower than the original DFT calculations and also lower than training separate STL models for each property. △ Less

Submitted 3 February, 2022; originally announced February 2022.

Comments: 13 pages, 6 figures

Journal ref: Mach. Learn.: Sci. Technol. 3 025007 (2022)

arXiv:2112.00412 [pdf, other]

The Majority Can Help The Minority: Context-rich Minority Oversampling for Long-tailed Classification

Authors: Seulki Park, Youngkyu Hong, Byeongho Heo, Sangdoo Yun, ** Young Choi

Abstract: The problem of class imbalanced data is that the generalization performance of the classifier deteriorates due to the lack of data from minority classes. In this paper, we propose a novel minority over-sampling method to augment diversified minority samples by leveraging the rich context of the majority classes as background images. To diversify the minority samples, our key idea is to paste an im… ▽ More The problem of class imbalanced data is that the generalization performance of the classifier deteriorates due to the lack of data from minority classes. In this paper, we propose a novel minority over-sampling method to augment diversified minority samples by leveraging the rich context of the majority classes as background images. To diversify the minority samples, our key idea is to paste an image from a minority class onto rich-context images from a majority class, using them as background images. Our method is simple and can be easily combined with the existing long-tailed recognition methods. We empirically prove the effectiveness of the proposed oversampling method through extensive experiments and ablation studies. Without any architectural changes or complex algorithms, our method achieves state-of-the-art performance on various long-tailed classification benchmarks. Our code is made available at https://github.com/naver-ai/cmo. △ Less

Submitted 27 March, 2022; v1 submitted 1 December, 2021; originally announced December 2021.

Comments: Accepted by CVPR 2022, 14 pages

arXiv:2110.02444 [pdf, other]

Influence-Balanced Loss for Imbalanced Visual Classification

Authors: Seulki Park, Jongin Lim, Younghan Jeon, ** Young Choi

Abstract: In this paper, we propose a balancing training method to address problems in imbalanced data learning. To this end, we derive a new loss used in the balancing training phase that alleviates the influence of samples that cause an overfitted decision boundary. The proposed loss efficiently improves the performance of any type of imbalance learning methods. In experiments on multiple benchmark data s… ▽ More In this paper, we propose a balancing training method to address problems in imbalanced data learning. To this end, we derive a new loss used in the balancing training phase that alleviates the influence of samples that cause an overfitted decision boundary. The proposed loss efficiently improves the performance of any type of imbalance learning methods. In experiments on multiple benchmark data sets, we demonstrate the validity of our method and reveal that the proposed loss outperforms the state-of-the-art cost-sensitive loss methods. Furthermore, since our loss is not restricted to a specific task, model, or training method, it can be easily used in combination with other recent re-sampling, meta-learning, and cost-sensitive learning methods for class-imbalance problems. △ Less

Submitted 5 October, 2021; originally announced October 2021.

Comments: Published in ICCV 2021

Journal ref: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2021, pp. 735-744

arXiv:2106.07217 [pdf, other]

Influential Rank: A New Perspective of Post-training for Robust Model against Noisy Labels

Authors: Seulki Park, Hwanjun Song, Daeho Um, Dae Ung Jo, Sangdoo Yun, ** Young Choi

Abstract: Deep neural network can easily overfit to even noisy labels due to its high capacity, which degrades the generalization performance of a model. To overcome this issue, we propose a new approach for learning from noisy labels (LNL) via post-training, which can significantly improve the generalization performance of any pre-trained model on noisy label data. To this end, we rather exploit the overfi… ▽ More Deep neural network can easily overfit to even noisy labels due to its high capacity, which degrades the generalization performance of a model. To overcome this issue, we propose a new approach for learning from noisy labels (LNL) via post-training, which can significantly improve the generalization performance of any pre-trained model on noisy label data. To this end, we rather exploit the overfitting property of a trained model to identify mislabeled samples. Specifically, our post-training approach gradually removes samples with high influence on the decision boundary and refines the decision boundary to improve generalization performance. Our post-training approach creates great synergies when combined with the existing LNL methods. Experimental results on various real-world and synthetic benchmark datasets demonstrate the validity of our approach in diverse realistic scenarios. △ Less

Submitted 19 April, 2023; v1 submitted 14 June, 2021; originally announced June 2021.

Comments: 15 pages

arXiv:2105.12764 [pdf, other]

Scalable Multigrid-based Hierarchical Scientific Data Refactoring on GPUs

Authors: Jieyang Chen, Lipeng Wan, Xin Liang, Ben Whitney, Qing Liu, Qian Gong, David Pugmire, Nicholas Thompson, Jong Youl Choi, Matthew Wolf, Todd Munson, Ian Foster, Scott Klasky

Abstract: Rapid growth in scientific data and a widening gap between computational speed and I/O bandwidth makes it increasingly infeasible to store and share all data produced by scientific simulations. Instead, we need methods for reducing data volumes: ideally, methods that can scale data volumes adaptively so as to enable negotiation of performance and fidelity tradeoffs in different situations. Multigr… ▽ More Rapid growth in scientific data and a widening gap between computational speed and I/O bandwidth makes it increasingly infeasible to store and share all data produced by scientific simulations. Instead, we need methods for reducing data volumes: ideally, methods that can scale data volumes adaptively so as to enable negotiation of performance and fidelity tradeoffs in different situations. Multigrid-based hierarchical data representations hold promise as a solution to this problem, allowing for flexible conversion between different fidelities so that, for example, data can be created at high fidelity and then transferred or stored at lower fidelity via logically simple and mathematically sound operations. However, the effective use of such representations has been hindered until now by the relatively high costs of creating, accessing, reducing, and otherwise operating on such representations. We describe here highly optimized data refactoring kernels for GPU accelerators that enable efficient creation and manipulation of data in multigrid-based hierarchical forms. We demonstrate that our optimized design can achieve up to 264 TB/s aggregated data refactoring throughput -- 92% of theoretical peak -- on 1024 nodes of the Summit supercomputer. We showcase our optimized design by applying it to a large-scale scientific visualization workflow and the MGARD lossy compression software. △ Less

Submitted 26 May, 2021; originally announced May 2021.

Comments: arXiv admin note: text overlap with arXiv:2007.04457

arXiv:2103.16046 [pdf, other]

Unsupervised Hyperbolic Representation Learning via Message Passing Auto-Encoders

Authors: Jiwoong Park, Junho Cho, Hyung ** Chang, ** Young Choi

Abstract: Most of the existing literature regarding hyperbolic embedding concentrate upon supervised learning, whereas the use of unsupervised hyperbolic embedding is less well explored. In this paper, we analyze how unsupervised tasks can benefit from learned representations in hyperbolic space. To explore how well the hierarchical structure of unlabeled data can be represented in hyperbolic spaces, we des… ▽ More Most of the existing literature regarding hyperbolic embedding concentrate upon supervised learning, whereas the use of unsupervised hyperbolic embedding is less well explored. In this paper, we analyze how unsupervised tasks can benefit from learned representations in hyperbolic space. To explore how well the hierarchical structure of unlabeled data can be represented in hyperbolic spaces, we design a novel hyperbolic message passing auto-encoder whose overall auto-encoding is performed in hyperbolic space. The proposed model conducts auto-encoding the networks via fully utilizing hyperbolic geometry in message passing. Through extensive quantitative and qualitative analyses, we validate the properties and benefits of the unsupervised hyperbolic representations. Codes are available at https://github.com/junhocho/HGCAE. △ Less

Submitted 29 March, 2021; originally announced March 2021.

arXiv:2011.06388

An ensemble of Density based Geometric One-Class Classifier and Genetic Algorithm

Authors: Do Gyun Kim, ** Young Choi

Abstract: One of the most rising issues in recent machine learning research is One-Class Classification which considers data set composed of only one class and outliers. It is more reasonable than traditional Multi-Class Classification in dealing with some problematic data set or special cases. Generally, classification accuracy and interpretability for user are considered as trade-off in OCC methods. Class… ▽ More One of the most rising issues in recent machine learning research is One-Class Classification which considers data set composed of only one class and outliers. It is more reasonable than traditional Multi-Class Classification in dealing with some problematic data set or special cases. Generally, classification accuracy and interpretability for user are considered as trade-off in OCC methods. Classifier based on Hyper-Rectangle (H-RTGL) is a sort of classifier that can be a remedy for such trade-off and uses H-RTGL formulated by conjunction of geometric rules called interval. This interval can be basis of interpretability since it can be easily understood by user. However, existing H-RTGL based OCC classifiers have limitations that (i) most of them cannot reflect density of target class and (ii) that considering density has primitive interval generation method, and (iii) there exists no systematic procedure for hyperparameter of H-RTGL based OCC classifier, which influences classification performance of classifier. Based on these remarks, we suggest One-Class Hyper-Rectangle Descriptor based on density (1-HRD_d) with more elaborate interval generation method including parametric and nonparametric approaches. In addition, we designed Genetic Algorithm (GA) that consists of chromosome structure and genetic operators for systematic generation of 1-HRD_d by optimization of hyperparameter. Our work is validated through a numerical experiment using actual data set with comparison of existing OCC algorithms along with other H-RTGL based classifiers. △ Less

Submitted 25 February, 2021; v1 submitted 2 October, 2020; originally announced November 2020.

Comments: This manuscript contains a wrong definitions and equations

arXiv:2008.13742 [pdf, other]

Chimbuko: A Workflow-Level Scalable Performance Trace Analysis Tool

Authors: Sungsoo Ha, Wonyong Jeong, Gyorgy Matyasfalvi, Cong Xie, Kevin Huck, Jong Youl Choi, Abid Malik, Li Tang, Hubertus Van Dam, Line Pouchard, Wei Xu, Shinjae Yoo, Nicholas D'Imperio, Kerstin Kleese Van Dam

Abstract: Because of the limits input/output systems currently impose on high-performance computing systems, a new generation of workflows that include online data reduction and analysis is emerging. Diagnosing their performance requires sophisticated performance analysis capabilities due to the complexity of execution patterns and underlying hardware, and no tool could handle the voluminous performance tra… ▽ More Because of the limits input/output systems currently impose on high-performance computing systems, a new generation of workflows that include online data reduction and analysis is emerging. Diagnosing their performance requires sophisticated performance analysis capabilities due to the complexity of execution patterns and underlying hardware, and no tool could handle the voluminous performance trace data needed to detect potential problems. This work introduces Chimbuko, a performance analysis framework that provides real-time, distributed, in situ anomaly detection. Data volumes are reduced for human-level processing without losing necessary details. Chimbuko supports online performance monitoring via a visualization module that presents the overall workflow anomaly distribution, call stacks, and timelines. Chimbuko also supports the capture and reduction of performance provenance. To the best of our knowledge, Chimbuko is the first online, distributed, and scalable workflow-level performance trace analysis framework, and we demonstrate the tool's usefulness on Oak Ridge National Laboratory's Summit system. △ Less

Submitted 31 August, 2020; originally announced August 2020.

arXiv:2006.10222 [pdf, other]

Class-Attentive Diffusion Network for Semi-Supervised Classification

Authors: Jongin Lim, Daeho Um, Hyung ** Chang, Dae Ung Jo, ** Young Choi

Abstract: Recently, graph neural networks for semi-supervised classification have been widely studied. However, existing methods only use the information of limited neighbors and do not deal with the inter-class connections in graphs. In this paper, we propose Adaptive aggregation with Class-Attentive Diffusion (AdaCAD), a new aggregation scheme that adaptively aggregates nodes probably of the same class am… ▽ More Recently, graph neural networks for semi-supervised classification have been widely studied. However, existing methods only use the information of limited neighbors and do not deal with the inter-class connections in graphs. In this paper, we propose Adaptive aggregation with Class-Attentive Diffusion (AdaCAD), a new aggregation scheme that adaptively aggregates nodes probably of the same class among K-hop neighbors. To this end, we first propose a novel stochastic process, called Class-Attentive Diffusion (CAD), that strengthens attention to intra-class nodes and attenuates attention to inter-class nodes. In contrast to the existing diffusion methods with a transition matrix determined solely by the graph structure, CAD considers both the node features and the graph structure with the design of our class-attentive transition matrix that utilizes a classifier. Then, we further propose an adaptive update scheme that leverages different reflection ratios of the diffusion result for each node depending on the local class-context. As the main advantage, AdaCAD alleviates the problem of undesired mixing of inter-class features caused by discrepancies between node labels and the graph topology. Built on AdaCAD, we construct a simple model called Class-Attentive Diffusion Network (CAD-Net). Extensive experiments on seven benchmark datasets consistently demonstrate the efficacy of the proposed method and our CAD-Net significantly outperforms the state-of-the-art methods. Code is available at https://github.com/l**0429/CAD-Net. △ Less

Submitted 29 December, 2020; v1 submitted 17 June, 2020; originally announced June 2020.

Comments: Accepted to AAAI 2021

arXiv:2002.06048 [pdf, other]

AutoLR: Layer-wise Pruning and Auto-tuning of Learning Rates in Fine-tuning of Deep Networks

Authors: Youngmin Ro, ** Young Choi

Abstract: Existing fine-tuning methods use a single learning rate over all layers. In this paper, first, we discuss that trends of layer-wise weight variations by fine-tuning using a single learning rate do not match the well-known notion that lower-level layers extract general features and higher-level layers extract specific features. Based on our discussion, we propose an algorithm that improves fine-tun… ▽ More Existing fine-tuning methods use a single learning rate over all layers. In this paper, first, we discuss that trends of layer-wise weight variations by fine-tuning using a single learning rate do not match the well-known notion that lower-level layers extract general features and higher-level layers extract specific features. Based on our discussion, we propose an algorithm that improves fine-tuning performance and reduces network complexity through layer-wise pruning and auto-tuning of layer-wise learning rates. The proposed algorithm has verified the effectiveness by achieving state-of-the-art performance on the image retrieval benchmark datasets (CUB-200, Cars-196, Stanford online product, and Inshop). Code is available at https://github.com/youngminPIL/AutoLR. △ Less

Submitted 3 January, 2021; v1 submitted 14 February, 2020; originally announced February 2020.

Comments: Accepted to AAAI 2021

arXiv:2002.02868 [pdf, ps, other]

Differentiable Forward and Backward Fixed-Point Iteration Layers

Authors: Younghan Jeon, Minsik Lee, ** Young Choi

Abstract: Recently, several studies proposed methods to utilize some classes of optimization problems in designing deep neural networks to encode constraints that conventional layers cannot capture. However, these methods are still in their infancy and require special treatments, such as analyzing the KKT condition, for deriving the backpropagation formula. In this paper, we propose a new layer formulation… ▽ More Recently, several studies proposed methods to utilize some classes of optimization problems in designing deep neural networks to encode constraints that conventional layers cannot capture. However, these methods are still in their infancy and require special treatments, such as analyzing the KKT condition, for deriving the backpropagation formula. In this paper, we propose a new layer formulation called the fixed-point iteration (FPI) layer that facilitates the use of more complicated operations in deep networks. The backward FPI layer is also proposed for backpropagation, which is motivated by the recurrent back-propagation (RBP) algorithm. But in contrast to RBP, the backward FPI layer yields the gradient by a small network module without an explicit calculation of the Jacobian. In actual applications, both the forward and backward FPI layers can be treated as nodes in the computational graphs. All components in the proposed method are implemented at a high level of abstraction, which allows efficient higher-order differentiations on the nodes. In addition, we present two practical methods of the FPI layer, FPI_NN and FPI_GD, where the update operations of FPI are a small neural network module and a single gradient descent step based on a learnable cost function, respectively. FPI\_NN is intuitive, simple, and fast to train, while FPI_GD can be used for efficient training of energy networks that have been recently studied. While RBP and its related studies have not been applied to practical examples, our experiments show the FPI layer can be successfully applied to real-world problems such as image denoising, optical flow, and multi-label classification. △ Less

Submitted 14 June, 2020; v1 submitted 7 February, 2020; originally announced February 2020.

arXiv:1908.04015 [pdf, other]

Variational Autoencoded Regression: High Dimensional Regression of Visual Data on Complex Manifold

Authors: YoungJoon Yoo, Sangdoo Yun, Hyung ** Chang, Yiannis Demiris, ** Young Choi

Abstract: This paper proposes a new high dimensional regression method by merging Gaussian process regression into a variational autoencoder framework. In contrast to other regression methods, the proposed method focuses on the case where output responses are on a complex high dimensional manifold, such as images. Our contributions are summarized as follows: (i) A new regression method estimating high dimen… ▽ More This paper proposes a new high dimensional regression method by merging Gaussian process regression into a variational autoencoder framework. In contrast to other regression methods, the proposed method focuses on the case where output responses are on a complex high dimensional manifold, such as images. Our contributions are summarized as follows: (i) A new regression method estimating high dimensional image responses, which is not handled by existing regression algorithms, is proposed. (ii) The proposed regression method introduces a strategy to learn the latent space as well as the encoder and decoder so that the result of the regressed response in the latent space coincide with the corresponding response in the data space. (iii) The proposed regression is embedded into a generative model, and the whole procedure is developed by the variational autoencoder framework. We demonstrate the robustness and effectiveness of our method through a number of experiments on various visual data regression problems. △ Less

Submitted 12 August, 2019; originally announced August 2019.

Comments: Published in CVPR 2017

arXiv:1908.02441 [pdf, other]

Symmetric Graph Convolutional Autoencoder for Unsupervised Graph Representation Learning

Authors: Jiwoong Park, Minsik Lee, Hyung ** Chang, Kyuewang Lee, ** Young Choi

Abstract: We propose a symmetric graph convolutional autoencoder which produces a low-dimensional latent representation from a graph. In contrast to the existing graph autoencoders with asymmetric decoder parts, the proposed autoencoder has a newly designed decoder which builds a completely symmetric autoencoder form. For the reconstruction of node features, the decoder is designed based on Laplacian sharpe… ▽ More We propose a symmetric graph convolutional autoencoder which produces a low-dimensional latent representation from a graph. In contrast to the existing graph autoencoders with asymmetric decoder parts, the proposed autoencoder has a newly designed decoder which builds a completely symmetric autoencoder form. For the reconstruction of node features, the decoder is designed based on Laplacian sharpening as the counterpart of Laplacian smoothing of the encoder, which allows utilizing the graph structure in the whole processes of the proposed autoencoder architecture. In order to prevent the numerical instability of the network caused by the Laplacian sharpening introduction, we further propose a new numerically stable form of the Laplacian sharpening by incorporating the signed graphs. In addition, a new cost function which finds a latent representation and a latent affinity matrix simultaneously is devised to boost the performance of image clustering tasks. The experimental results on clustering, link prediction and visualization tasks strongly support that the proposed model is stable and outperforms various state-of-the-art algorithms. △ Less

Submitted 7 August, 2019; originally announced August 2019.

Comments: 10 pages, 3 figures, ICCV 2019 accepted

arXiv:1905.12867 [pdf, other]

Cross-modal Variational Auto-encoder with Distributed Latent Spaces and Associators

Authors: Dae Ung Jo, ByeongJu Lee, Jongwon Choi, Haanju Yoo, ** Young Choi

Abstract: In this paper, we propose a novel structure for a cross-modal data association, which is inspired by the recent research on the associative learning structure of the brain. We formulate the cross-modal association in Bayesian inference framework realized by a deep neural network with multiple variational auto-encoders and variational associators. The variational associators transfer the latent spa… ▽ More In this paper, we propose a novel structure for a cross-modal data association, which is inspired by the recent research on the associative learning structure of the brain. We formulate the cross-modal association in Bayesian inference framework realized by a deep neural network with multiple variational auto-encoders and variational associators. The variational associators transfer the latent spaces between auto-encoders that represent different modalities. The proposed structure successfully associates even heterogeneous modal data and easily incorporates the additional modality to the entire network via the proposed cross-modal associator. Furthermore, the proposed structure can be trained with only a small amount of paired data since auto-encoders can be trained by unsupervised manner. Through experiments, the effectiveness of the proposed structure is validated on various datasets including visual and auditory data. △ Less

Submitted 30 May, 2019; originally announced May 2019.

Comments: 10 pages, 6 figures

arXiv:1905.10079 [pdf, other]

Neuro-Optimization: Learning Objective Functions Using Neural Networks

Authors: Younghan Jeon, Minsik Lee, ** Young Choi

Abstract: Mathematical optimization is widely used in various research fields. With a carefully-designed objective function, mathematical optimization can be quite helpful in solving many problems. However, objective functions are usually hand-crafted and designing a good one can be quite challenging. In this paper, we propose a novel framework to learn the objective function based on a neural net-work. The… ▽ More Mathematical optimization is widely used in various research fields. With a carefully-designed objective function, mathematical optimization can be quite helpful in solving many problems. However, objective functions are usually hand-crafted and designing a good one can be quite challenging. In this paper, we propose a novel framework to learn the objective function based on a neural net-work. The basic idea is to consider the neural network as an objective function, and the input as an optimization variable. For the learning of objective function from the training data, two processes are conducted: In the inner process, the optimization variable (the input of the network) are optimized to minimize the objective function (the network output), while fixing the network weights. In the outer process, on the other hand, the weights are optimized based on how close the final solution of the inner process is to the desired solution. After learning the objective function, the solution for the test set is obtained in the same manner of the inner process. The potential and applicability of our approach are demonstrated by the experiments on toy examples and a computer vision task, optical flow. △ Less

Submitted 24 May, 2019; originally announced May 2019.

Comments: 10 pages,6 figures

arXiv:1904.08833 [pdf, other]

A Passivity-based Nonlinear Admittance Control with Application to Powered Upper-limb Control under Unknown Environmental Interactions

Authors: Min Jun Kim, Woongyong Lee, Jae Yeon Choi, Goobong Chung, Kyung-Lyong Han, Il Seop Choi, Christian Ott, Wan Kyun Chung

Abstract: This paper presents an admittance controller based on the passivity theory for a powered upper-limb exoskeleton robot which is governed by the nonlinear equation of motion. Passivity allows us to include a human operator and environmental interaction in the control loop. The robot interacts with the human operator via F/T sensor and interacts with the environment mainly via end-effectors. Although… ▽ More This paper presents an admittance controller based on the passivity theory for a powered upper-limb exoskeleton robot which is governed by the nonlinear equation of motion. Passivity allows us to include a human operator and environmental interaction in the control loop. The robot interacts with the human operator via F/T sensor and interacts with the environment mainly via end-effectors. Although the environmental interaction cannot be detected by any sensors (hence unknown), passivity allows us to have natural interaction. An analysis shows that the behavior of the actual system mimics that of a nominal model as the control gain goes to infinity, which implies that the proposed approach is an admittance controller. However, because the control gain cannot grow infinitely in practice, the performance limitation according to the achievable control gain is also analyzed. The result of this analysis indicates that the performance in the sense of infinite norm increases linearly with the control gain. In the experiments, the proposed properties were verified using 1 degree-of-freedom testbench, and an actual powered upper-limb exoskeleton was used to lift and maneuver the unknown payload. △ Less

Submitted 18 April, 2019; originally announced April 2019.

Comments: Accepted in IEEE/ASME Transactions on Mechatronics (T-MECH)

arXiv:1904.01866 [pdf, other]

A Comprehensive Overhaul of Feature Distillation

Authors: Byeongho Heo, Jeesoo Kim, Sangdoo Yun, Hyo** Park, Nojun Kwak, ** Young Choi

Abstract: We investigate the design aspects of feature distillation methods achieving network compression and propose a novel feature distillation method in which the distillation loss is designed to make a synergy among various aspects: teacher transform, student transform, distillation feature position and distance function. Our proposed distillation loss includes a feature transform with a newly designed… ▽ More We investigate the design aspects of feature distillation methods achieving network compression and propose a novel feature distillation method in which the distillation loss is designed to make a synergy among various aspects: teacher transform, student transform, distillation feature position and distance function. Our proposed distillation loss includes a feature transform with a newly designed margin ReLU, a new distillation feature position, and a partial L2 distance function to skip redundant information giving adverse effects to the compression of student. In ImageNet, our proposed method achieves 21.65% of top-1 error with ResNet50, which outperforms the performance of the teacher network, ResNet152. Our proposed method is evaluated on various tasks such as image classification, object detection and semantic segmentation and achieves a significant performance improvement in all tasks. The code is available at https://sites.google.com/view/byeongho-heo/overhaul △ Less

Submitted 9 August, 2019; v1 submitted 3 April, 2019; originally announced April 2019.

Comments: Accepted at ICCV 2019

arXiv:1901.06882 [pdf, other]

doi 10.1109/WACV.2019.00014

Skeleton-based Action Recognition of People Handling Objects

Authors: Sunoh Kim, Kimin Yun, Jongyoul Park, ** Young Choi

Abstract: In visual surveillance systems, it is necessary to recognize the behavior of people handling objects such as a phone, a cup, or a plastic bag. In this paper, to address this problem, we propose a new framework for recognizing object-related human actions by graph convolutional networks using human and object poses. In this framework, we construct skeletal graphs of reliable human poses by selectiv… ▽ More In visual surveillance systems, it is necessary to recognize the behavior of people handling objects such as a phone, a cup, or a plastic bag. In this paper, to address this problem, we propose a new framework for recognizing object-related human actions by graph convolutional networks using human and object poses. In this framework, we construct skeletal graphs of reliable human poses by selectively sampling the informative frames in a video, which include human joints with high confidence scores obtained in pose estimation. The skeletal graphs generated from the sampled frames represent human poses related to the object position in both the spatial and temporal domains, and these graphs are used as inputs to the graph convolutional networks. Through experiments over an open benchmark and our own data sets, we verify the validity of our framework in that our method outperforms the state-of-the-art method for skeleton-based action recognition. △ Less

Submitted 21 January, 2019; originally announced January 2019.

Comments: Accepted in WACV 2019

arXiv:1901.06140 [pdf, other]

Backbone Can Not be Trained at Once: Rolling Back to Pre-trained Network for Person Re-Identification

Authors: Youngmin Ro, Jongwon Choi, Dae Ung Jo, Byeongho Heo, Jongin Lim, ** Young Choi

Abstract: In person re-identification (ReID) task, because of its shortage of trainable dataset, it is common to utilize fine-tuning method using a classification network pre-trained on a large dataset. However, it is relatively difficult to sufficiently fine-tune the low-level layers of the network due to the gradient vanishing problem. In this work, we propose a novel fine-tuning strategy that allows low-… ▽ More In person re-identification (ReID) task, because of its shortage of trainable dataset, it is common to utilize fine-tuning method using a classification network pre-trained on a large dataset. However, it is relatively difficult to sufficiently fine-tune the low-level layers of the network due to the gradient vanishing problem. In this work, we propose a novel fine-tuning strategy that allows low-level layers to be sufficiently trained by rolling back the weights of high-level layers to their initial pre-trained weights. Our strategy alleviates the problem of gradient vanishing in low-level layers and robustly trains the low-level layers to fit the ReID dataset, thereby increasing the performance of ReID tasks. The improved performance of the proposed strategy is validated via several experiments. Furthermore, without any add-ons such as pose estimation or segmentation, our strategy exhibits state-of-the-art performance using only vanilla deep convolutional neural network architecture. △ Less

Submitted 18 January, 2019; originally announced January 2019.

Comments: Accepted to AAAI 2019

arXiv:1811.03233 [pdf, other]

Knowledge Transfer via Distillation of Activation Boundaries Formed by Hidden Neurons

Authors: Byeongho Heo, Minsik Lee, Sangdoo Yun, ** Young Choi

Abstract: An activation boundary for a neuron refers to a separating hyperplane that determines whether the neuron is activated or deactivated. It has been long considered in neural networks that the activations of neurons, rather than their exact output values, play the most important role in forming classification friendly partitions of the hidden feature space. However, as far as we know, this aspect of… ▽ More An activation boundary for a neuron refers to a separating hyperplane that determines whether the neuron is activated or deactivated. It has been long considered in neural networks that the activations of neurons, rather than their exact output values, play the most important role in forming classification friendly partitions of the hidden feature space. However, as far as we know, this aspect of neural networks has not been considered in the literature of knowledge transfer. In this paper, we propose a knowledge transfer method via distillation of activation boundaries formed by hidden neurons. For the distillation, we propose an activation transfer loss that has the minimum value when the boundaries generated by the student coincide with those by the teacher. Since the activation transfer loss is not differentiable, we design a piecewise differentiable loss approximating the activation transfer loss. By the proposed method, the student learns a separating boundary between activation region and deactivation region formed by each neuron in the teacher. Through the experiments in various aspects of knowledge transfer, it is verified that the proposed method outperforms the current state-of-the-art. △ Less

Submitted 14 December, 2018; v1 submitted 7 November, 2018; originally announced November 2018.

Comments: Accepted to AAAI 2019

arXiv:1805.05532 [pdf, other]

Knowledge Distillation with Adversarial Samples Supporting Decision Boundary

Authors: Byeongho Heo, Minsik Lee, Sangdoo Yun, ** Young Choi

Abstract: Many recent works on knowledge distillation have provided ways to transfer the knowledge of a trained network for improving the learning process of a new one, but finding a good technique for knowledge distillation is still an open problem. In this paper, we provide a new perspective based on a decision boundary, which is one of the most important component of a classifier. The generalization perf… ▽ More Many recent works on knowledge distillation have provided ways to transfer the knowledge of a trained network for improving the learning process of a new one, but finding a good technique for knowledge distillation is still an open problem. In this paper, we provide a new perspective based on a decision boundary, which is one of the most important component of a classifier. The generalization performance of a classifier is closely related to the adequacy of its decision boundary, so a good classifier bears a good decision boundary. Therefore, transferring information closely related to the decision boundary can be a good attempt for knowledge distillation. To realize this goal, we utilize an adversarial attack to discover samples supporting a decision boundary. Based on this idea, to transfer more accurate information about the decision boundary, the proposed algorithm trains a student classifier based on the adversarial samples supporting the decision boundary. Experiments show that the proposed method indeed improves knowledge distillation and achieves the state-of-the-arts performance. △ Less

Submitted 14 December, 2018; v1 submitted 14 May, 2018; originally announced May 2018.

Comments: Accepted to AAAI 2019

arXiv:1803.10537 [pdf, other]

doi 10.1109/CVPR.2018.00057

Context-aware Deep Feature Compression for High-speed Visual Tracking

Authors: Jongwon Choi, Hyung ** Chang, Tobias Fischer, Sangdoo Yun, Kyuewang Lee, Jiyeoup Jeong, Yiannis Demiris, ** Young Choi

Abstract: We propose a new context-aware correlation filter based tracking framework to achieve both high computational speed and state-of-the-art performance among real-time trackers. The major contribution to the high computational speed lies in the proposed deep feature compression that is achieved by a context-aware scheme utilizing multiple expert auto-encoders; a context in our framework refers to the… ▽ More We propose a new context-aware correlation filter based tracking framework to achieve both high computational speed and state-of-the-art performance among real-time trackers. The major contribution to the high computational speed lies in the proposed deep feature compression that is achieved by a context-aware scheme utilizing multiple expert auto-encoders; a context in our framework refers to the coarse category of the tracking target according to appearance patterns. In the pre-training phase, one expert auto-encoder is trained per category. In the tracking phase, the best expert auto-encoder is selected for a given target, and only this auto-encoder is used. To achieve high tracking performance with the compressed feature map, we introduce extrinsic denoising processes and a new orthogonality loss term for pre-training and fine-tuning of the expert auto-encoders. We validate the proposed context-aware framework through a number of experiments, where our method achieves a comparable performance to state-of-the-art trackers which cannot run in real-time, while running at a significantly fast speed of over 100 fps. △ Less

Submitted 28 March, 2018; originally announced March 2018.

Comments: 9 pages, 6 figures, Accepted in CVPR2018 (IEEE conference on Computer Vision and Pattern Recognition)

Journal ref: IEEE Conference on Computer Vision and Pattern Recognition (CVPR2018)

arXiv:1706.00522 [pdf, other]

doi 10.1007/978-3-319-67630-2_2

On the Scalability of Data Reduction Techniques in Current and Upcoming HPC Systems from an Application Perspective

Authors: Axel Huebl, Rene Widera, Felix Schmitt, Alexander Matthes, Norbert Podhorszki, Jong Youl Choi, Scott Klasky, Michael Bussmann

Abstract: We implement and benchmark parallel I/O methods for the fully-manycore driven particle-in-cell code PIConGPU. Identifying throughput and overall I/O size as a major challenge for applications on today's and future HPC systems, we present a scaling law characterizing performance bottlenecks in state-of-the-art approaches for data reduction. Consequently, we propose, implement and verify multi-threa… ▽ More We implement and benchmark parallel I/O methods for the fully-manycore driven particle-in-cell code PIConGPU. Identifying throughput and overall I/O size as a major challenge for applications on today's and future HPC systems, we present a scaling law characterizing performance bottlenecks in state-of-the-art approaches for data reduction. Consequently, we propose, implement and verify multi-threaded data-transformations for the I/O library ADIOS as a feasible way to trade underutilized host-side compute potential on heterogeneous systems for reduced I/O latency. △ Less

Submitted 1 June, 2017; originally announced June 2017.

Comments: 15 pages, 5 figures, accepted for DRBSD-1 in conjunction with ISC'17

ACM Class: D.4.8; B.4.3; I.6.6

Journal ref: J.M. Kunkel et al. (Eds.): ISC High Performance Workshops 2017, LNCS 10524, pp. 15-29, 2017

arXiv:1505.03532 [pdf, other]

Towards Real-Time Detection and Tracking of Spatio-Temporal Features: Blob-Filaments in Fusion Plasma

Authors: Lingfei Wu, Kesheng Wu, Alex Sim, Michael Churchill, Jong Y. Choi, Andreas Stathopoulos, Cs Chang, Scott Klasky

Abstract: A novel algorithm and implementation of real-time identification and tracking of blob-filaments in fusion reactor data is presented. Similar spatio-temporal features are important in many other applications, for example, ignition kernels in combustion and tumor cells in a medical image. This work presents an approach for extracting these features by dividing the overall task into three steps: loca… ▽ More A novel algorithm and implementation of real-time identification and tracking of blob-filaments in fusion reactor data is presented. Similar spatio-temporal features are important in many other applications, for example, ignition kernels in combustion and tumor cells in a medical image. This work presents an approach for extracting these features by dividing the overall task into three steps: local identification of feature cells, grou** feature cells into extended feature, and tracking movement of feature through overlap** in space. Through our extensive work in parallelization, we demonstrate that this approach can effectively make use of a large number of compute nodes to detect and track blob-filaments in real time in fusion plasma. On a set of 30GB fusion simulation data, we observed linear speedup on 1024 processes and completed blob detection in less than three milliseconds using Edison, a Cray XC30 system at NERSC. △ Less

Submitted 2 July, 2016; v1 submitted 13 May, 2015; originally announced May 2015.

Comments: 14 pages, 40 figures

Showing 1–45 of 45 results for author: Choi, J Y