Search | arXiv e-print repository

A Framework for Quantum Finite-State Languages with Density Map**

Authors: SeungYeop Baik, Sicheol Sung, Yo-Sub Han

Abstract: A quantum finite-state automaton (QFA) is a theoretical model designed to simulate the evolution of a quantum system with finite memory in response to sequential input strings. We define the language of a QFA as the set of strings that lead the QFA to an accepting state when processed from its initial state. QFAs exemplify how quantum computing can achieve greater efficiency compared to classical… ▽ More A quantum finite-state automaton (QFA) is a theoretical model designed to simulate the evolution of a quantum system with finite memory in response to sequential input strings. We define the language of a QFA as the set of strings that lead the QFA to an accepting state when processed from its initial state. QFAs exemplify how quantum computing can achieve greater efficiency compared to classical computing. While being one of the simplest quantum models, QFAs are still notably challenging to construct from scratch due to the preliminary knowledge of quantum mechanics required for superimposing unitary constraints on the automata. Furthermore, even when QFAs are correctly assembled, the limitations of a current quantum computer may cause fluctuations in the simulation results depending on how an assembled QFA is translated into a quantum circuit. We present a framework that provides a simple and intuitive way to build QFAs and maximize the simulation accuracy. Our framework relies on two methods: First, it offers a predefined construction for foundational types of QFAs that recognize special languages MOD and EQU. They play a role of basic building blocks for more complex QFAs. In other words, one can obtain more complex QFAs from these foundational automata using standard language operations. Second, we improve the simulation accuracy by converting these QFAs into quantum circuits such that the resulting circuits perform well on noisy quantum computers. Our framework is available at https://github.com/sybaik1/qfa-toolkit. △ Less

Submitted 2 July, 2024; originally announced July 2024.

Comments: 14 pages, 5 figures

arXiv:2402.03399 [pdf, other]

Rethinking RGB Color Representation for Image Restoration Models

Authors: Jaerin Lee, JoonKyu Park, Sungyong Baik, Kyoung Mu Lee

Abstract: Image restoration models are typically trained with a pixel-wise distance loss defined over the RGB color representation space, which is well known to be a source of blurry and unrealistic textures in the restored images. The reason, we believe, is that the three-channel RGB space is insufficient for supervising the restoration models. To this end, we augment the representation to hold structural… ▽ More Image restoration models are typically trained with a pixel-wise distance loss defined over the RGB color representation space, which is well known to be a source of blurry and unrealistic textures in the restored images. The reason, we believe, is that the three-channel RGB space is insufficient for supervising the restoration models. To this end, we augment the representation to hold structural information of local neighborhoods at each pixel while kee** the color information and pixel-grainedness unharmed. The result is a new representation space, dubbed augmented RGB ($a$RGB) space. Substituting the underlying representation space for the per-pixel losses facilitates the training of image restoration models, thereby improving the performance without affecting the evaluation phase. Notably, when combined with auxiliary objectives such as adversarial or perceptual losses, our $a$RGB space consistently improves overall metrics by reconstructing both color and local structures, overcoming the conventional perception-distortion trade-off. △ Less

Submitted 5 February, 2024; originally announced February 2024.

Comments: 31 pages (11 pages main manuscript + 20 pages appendices), 22 figures

arXiv:2401.08719 [pdf, other]

CodeComplex: A Time-Complexity Dataset for Bilingual Source Codes

Authors: Seung-Yeop Baik, Mingi Jeon, Joonghyuk Hahn, Jungin Kim, Yo-Sub Han, Sang-Ki Ko

Abstract: Analyzing the worst-case time complexity of a code is a crucial task in computer science and software engineering for ensuring the efficiency, reliability, and robustness of software systems. However, it is well-known that the problem of determining the worst-case time complexity of a given code written in general-purpose programming language is theoretically undecidable by the famous Halting prob… ▽ More Analyzing the worst-case time complexity of a code is a crucial task in computer science and software engineering for ensuring the efficiency, reliability, and robustness of software systems. However, it is well-known that the problem of determining the worst-case time complexity of a given code written in general-purpose programming language is theoretically undecidable by the famous Halting problem proven by Alan Turing. Thus, we move towards more realistic scenarios where the inputs and outputs of a program exist. This allows us to discern the correctness of given codes, challenging to analyze their time complexity exhaustively. In response to this challenge, we introduce CodeComplex, a novel source code dataset where each code is manually annotated with a corresponding worst-case time complexity. CodeComplex comprises 4,900 Java codes and an equivalent number of Python codes, all sourced from programming competitions and annotated with complexity labels by a panel of algorithmic experts. To the best of our knowledge, CodeComplex stands as the most extensive code dataset tailored for predicting complexity. Subsequently, we present the outcomes of our experiments employing various baseline models, leveraging state-of-the-art neural models in code comprehension like CodeBERT, GraphCodeBERT, UniXcoder, PLBART, CodeT5, CodeT5+, and ChatGPT. We analyze how the dataset impacts the model's learning in predicting time complexity. △ Less

Submitted 16 January, 2024; originally announced January 2024.

arXiv:2306.09020 [pdf, other]

Distributionally Robust Stratified Sampling for Stochastic Simulations with Multiple Uncertain Input Models

Authors: Seung Min Baik, Eunshin Byon, Young Myoung Ko

Abstract: This paper presents a robust version of the stratified sampling method when multiple uncertain input models are considered for stochastic simulation. Various variance reduction techniques have demonstrated their superior performance in accelerating simulation processes. Nevertheless, they often use a single input model and further assume that the input model is exactly known and fixed. We consider… ▽ More This paper presents a robust version of the stratified sampling method when multiple uncertain input models are considered for stochastic simulation. Various variance reduction techniques have demonstrated their superior performance in accelerating simulation processes. Nevertheless, they often use a single input model and further assume that the input model is exactly known and fixed. We consider more general cases in which it is necessary to assess a simulation's response to a variety of input models, such as when evaluating the reliability of wind turbines under nonstationary wind conditions or the operation of a service system when the distribution of customer inter-arrival time is heterogeneous at different times. Moreover, the estimation variance may be considerably impacted by uncertainty in input models. To address such nonstationary and uncertain input models, we offer a distributionally robust (DR) stratified sampling approach with the goal of minimizing the maximum of worst-case estimator variances among plausible but uncertain input models. Specifically, we devise a bi-level optimization framework for formulating DR stochastic problems with different ambiguity set designs, based on the $L_2$-norm, 1-Wasserstein distance, parametric family of distributions, and distribution moments. In order to cope with the non-convexity of objective function, we present a solution approach that uses Bayesian optimization. Numerical experiments and the wind turbine case study demonstrate the robustness of the proposed approach. △ Less

Submitted 15 June, 2023; originally announced June 2023.

arXiv:2212.08328 [pdf, other]

MEIL-NeRF: Memory-Efficient Incremental Learning of Neural Radiance Fields

Authors: Jaeyoung Chung, Kanggeon Lee, Sungyong Baik, Kyoung Mu Lee

Abstract: Hinged on the representation power of neural networks, neural radiance fields (NeRF) have recently emerged as one of the promising and widely applicable methods for 3D object and scene representation. However, NeRF faces challenges in practical applications, such as large-scale scenes and edge devices with a limited amount of memory, where data needs to be processed sequentially. Under such increm… ▽ More Hinged on the representation power of neural networks, neural radiance fields (NeRF) have recently emerged as one of the promising and widely applicable methods for 3D object and scene representation. However, NeRF faces challenges in practical applications, such as large-scale scenes and edge devices with a limited amount of memory, where data needs to be processed sequentially. Under such incremental learning scenarios, neural networks are known to suffer catastrophic forgetting: easily forgetting previously seen data after training with new data. We observe that previous incremental learning algorithms are limited by either low performance or memory scalability issues. As such, we develop a Memory-Efficient Incremental Learning algorithm for NeRF (MEIL-NeRF). MEIL-NeRF takes inspiration from NeRF itself in that a neural network can serve as a memory that provides the pixel RGB values, given rays as queries. Upon the motivation, our framework learns which rays to query NeRF to extract previous pixel values. The extracted pixel values are then used to train NeRF in a self-distillation manner to prevent catastrophic forgetting. As a result, MEIL-NeRF demonstrates constant memory consumption and competitive performance. △ Less

Submitted 31 December, 2022; v1 submitted 16 December, 2022; originally announced December 2022.

Comments: 18 pages. For the project page, see https://robot0321.github.io/meil-nerf/index.html

arXiv:2207.10354 [pdf, other]

Learning from Data with Noisy Labels Using Temporal Self-Ensemble

Authors: Jun Ho Lee, Jae Soon Baik, Tae Hwan Hwang, Jun Won Choi

Abstract: There are inevitably many mislabeled data in real-world datasets. Because deep neural networks (DNNs) have an enormous capacity to memorize noisy labels, a robust training scheme is required to prevent labeling errors from degrading the generalization performance of DNNs. Current state-of-the-art methods present a co-training scheme that trains dual networks using samples associated with small los… ▽ More There are inevitably many mislabeled data in real-world datasets. Because deep neural networks (DNNs) have an enormous capacity to memorize noisy labels, a robust training scheme is required to prevent labeling errors from degrading the generalization performance of DNNs. Current state-of-the-art methods present a co-training scheme that trains dual networks using samples associated with small losses. In practice, however, training two networks simultaneously can burden computing resources. In this study, we propose a simple yet effective robust training scheme that operates by training only a single network. During training, the proposed method generates temporal self-ensemble by sampling intermediate network parameters from the weight trajectory formed by stochastic gradient descent optimization. The loss sum evaluated with these self-ensembles is used to identify incorrectly labeled samples. In parallel, our method generates multi-view predictions by transforming an input data into various forms and considers their agreement to identify incorrectly labeled samples. By combining the aforementioned metrics, we present the proposed {\it self-ensemble-based robust training} (SRT) method, which can filter the samples with noisy labels to reduce their influence on training. Experiments on widely-used public datasets demonstrate that the proposed method achieves a state-of-the-art performance in some categories without training the dual networks. △ Less

Submitted 21 July, 2022; originally announced July 2022.

arXiv:2207.10345 [pdf, other]

CADyQ: Content-Aware Dynamic Quantization for Image Super-Resolution

Authors: Cheeun Hong, Sungyong Baik, Heewon Kim, Seungjun Nah, Kyoung Mu Lee

Abstract: Despite breakthrough advances in image super-resolution (SR) with convolutional neural networks (CNNs), SR has yet to enjoy ubiquitous applications due to the high computational complexity of SR networks. Quantization is one of the promising approaches to solve this problem. However, existing methods fail to quantize SR models with a bit-width lower than 8 bits, suffering from severe accuracy loss… ▽ More Despite breakthrough advances in image super-resolution (SR) with convolutional neural networks (CNNs), SR has yet to enjoy ubiquitous applications due to the high computational complexity of SR networks. Quantization is one of the promising approaches to solve this problem. However, existing methods fail to quantize SR models with a bit-width lower than 8 bits, suffering from severe accuracy loss due to fixed bit-width quantization applied everywhere. In this work, to achieve high average bit-reduction with less accuracy loss, we propose a novel Content-Aware Dynamic Quantization (CADyQ) method for SR networks that allocates optimal bits to local regions and layers adaptively based on the local contents of an input image. To this end, a trainable bit selector module is introduced to determine the proper bit-width and quantization level for each layer and a given local image patch. This module is governed by the quantization sensitivity that is estimated by using both the average magnitude of image gradient of the patch and the standard deviation of the input feature of the layer. The proposed quantization pipeline has been tested on various SR networks and evaluated on several standard benchmarks extensively. Significant reduction in computational complexity and the elevated restoration accuracy clearly demonstrate the effectiveness of the proposed CADyQ framework for SR. Codes are available at https://github.com/Cheeun/CADyQ. △ Less

Submitted 30 October, 2022; v1 submitted 21 July, 2022; originally announced July 2022.

Comments: ECCV 2022

arXiv:2207.02182 [pdf, other]

ST-CoNAL: Consistency-Based Acquisition Criterion Using Temporal Self-Ensemble for Active Learning

Authors: Jae Soon Baik, In Young Yoon, Jun Won Choi

Abstract: Modern deep learning has achieved great success in various fields. However, it requires the labeling of huge amounts of data, which is expensive and labor-intensive. Active learning (AL), which identifies the most informative samples to be labeled, is becoming increasingly important to maximize the efficiency of the training process. The existing AL methods mostly use only a single final fixed mod… ▽ More Modern deep learning has achieved great success in various fields. However, it requires the labeling of huge amounts of data, which is expensive and labor-intensive. Active learning (AL), which identifies the most informative samples to be labeled, is becoming increasingly important to maximize the efficiency of the training process. The existing AL methods mostly use only a single final fixed model for acquiring the samples to be labeled. This strategy may not be good enough in that the structural uncertainty of a model for given training data is not considered to acquire the samples. In this study, we propose a novel acquisition criterion based on temporal self-ensemble generated by conventional stochastic gradient descent (SGD) optimization. These self-ensemble models are obtained by capturing the intermediate network weights obtained through SGD iterations. Our acquisition function relies on a consistency measure between the student and teacher models. The student models are given a fixed number of temporal self-ensemble models, and the teacher model is constructed by averaging the weights of the student models. Using the proposed acquisition criterion, we present an AL algorithm, namely student-teacher consistency-based AL (ST-CoNAL). Experiments conducted for image classification tasks on CIFAR-10, CIFAR-100, Caltech-256, and Tiny ImageNet datasets demonstrate that the proposed ST-CoNAL achieves significantly better performance than the existing acquisition methods. Furthermore, extensive experiments show the robustness and effectiveness of our methods. △ Less

Submitted 16 October, 2022; v1 submitted 5 July, 2022; originally announced July 2022.

arXiv:2207.02173 [pdf, other]

doi 10.1016/j.patcog.2023.110107

DBN-Mix: Training Dual Branch Network Using Bilateral Mixup Augmentation for Long-Tailed Visual Recognition

Authors: Jae Soon Baik, In Young Yoon, Jun Won Choi

Abstract: There is growing interest in the challenging visual perception task of learning from long-tailed class distributions. The extreme class imbalance in the training dataset biases the model to prefer recognizing majority class data over minority class data. Furthermore, the lack of diversity in minority class samples makes it difficult to find a good representation. In this paper, we propose an effec… ▽ More There is growing interest in the challenging visual perception task of learning from long-tailed class distributions. The extreme class imbalance in the training dataset biases the model to prefer recognizing majority class data over minority class data. Furthermore, the lack of diversity in minority class samples makes it difficult to find a good representation. In this paper, we propose an effective data augmentation method, referred to as bilateral mixup augmentation, which can improve the performance of long-tailed visual recognition. The bilateral mixup augmentation combines two samples generated by a uniform sampler and a re-balanced sampler and augments the training dataset to enhance the representation learning for minority classes. We also reduce the classifier bias using class-wise temperature scaling, which scales the logits differently per class in the training phase. We apply both ideas to the dual-branch network (DBN) framework, presenting a new model, named dual-branch network with bilateral mixup (DBN-Mix). Experiments on popular long-tailed visual recognition datasets show that DBN-Mix improves performance significantly over baseline and that the proposed method achieves state-of-the-art performance in some categories of benchmarks. △ Less

Submitted 20 August, 2022; v1 submitted 5 July, 2022; originally announced July 2022.

arXiv:2204.06788 [pdf, other]

Pyramidal Attention for Saliency Detection

Authors: Tanveer Hussain, Abbas Anwar, Saeed Anwar, Lars Petersson, Sung Wook Baik

Abstract: Salient object detection (SOD) extracts meaningful contents from an input image. RGB-based SOD methods lack the complementary depth clues; hence, providing limited performance for complex scenarios. Similarly, RGB-D models process RGB and depth inputs, but the depth data availability during testing may hinder the model's practical applicability. This paper exploits only RGB images, estimates depth… ▽ More Salient object detection (SOD) extracts meaningful contents from an input image. RGB-based SOD methods lack the complementary depth clues; hence, providing limited performance for complex scenarios. Similarly, RGB-D models process RGB and depth inputs, but the depth data availability during testing may hinder the model's practical applicability. This paper exploits only RGB images, estimates depth from RGB, and leverages the intermediate depth features. We employ a pyramidal attention structure to extract multi-level convolutional-transformer features to process initial stage representations and further enhance the subsequent ones. At each stage, the backbone transformer model produces global receptive fields and computing in parallel to attain fine-grained global predictions refined by our residual convolutional attention decoder for optimal saliency prediction. We report significantly improved performance against 21 and 40 state-of-the-art SOD methods on eight RGB and RGB-D datasets, respectively. Consequently, we present a new SOD perspective of generating RGB-D SOD without acquiring depth data during training and testing and assist RGB methods with depth clues for improved performance. The code and trained models are available at https://github.com/tanveer-hussain/EfficientSOD2 △ Less

Submitted 14 April, 2022; originally announced April 2022.

Comments: Accepted at CVPRW 2022. (2022 IEEE CVPR Workshop on Fair, Data Efficient and Trusted Computer Vision)

arXiv:2112.01155 [pdf, other]

Batch Normalization Tells You Which Filter is Important

Authors: Junghun Oh, Heewon Kim, Sungyong Baik, Cheeun Hong, Kyoung Mu Lee

Abstract: The goal of filter pruning is to search for unimportant filters to remove in order to make convolutional neural networks (CNNs) efficient without sacrificing the performance in the process. The challenge lies in finding information that can help determine how important or relevant each filter is with respect to the final output of neural networks. In this work, we share our observation that the ba… ▽ More The goal of filter pruning is to search for unimportant filters to remove in order to make convolutional neural networks (CNNs) efficient without sacrificing the performance in the process. The challenge lies in finding information that can help determine how important or relevant each filter is with respect to the final output of neural networks. In this work, we share our observation that the batch normalization (BN) parameters of pre-trained CNNs can be used to estimate the feature distribution of activation outputs, without processing of training data. Upon observation, we propose a simple yet effective filter pruning method by evaluating the importance of each filter based on the BN parameters of pre-trained CNNs. The experimental results on CIFAR-10 and ImageNet demonstrate that the proposed method can achieve outstanding performance with and without fine-tuning in terms of the trade-off between the accuracy drop and the reduction in computational complexity and number of parameters of pruned networks. △ Less

Submitted 23 April, 2022; v1 submitted 2 December, 2021; originally announced December 2021.

arXiv:2110.03909 [pdf, other]

Meta-Learning with Task-Adaptive Loss Function for Few-Shot Learning

Authors: Sungyong Baik, Janghoon Choi, Heewon Kim, Dohee Cho, Jaesik Min, Kyoung Mu Lee

Abstract: In few-shot learning scenarios, the challenge is to generalize and perform well on new unseen examples when only very few labeled examples are available for each task. Model-agnostic meta-learning (MAML) has gained the popularity as one of the representative few-shot learning methods for its flexibility and applicability to diverse problems. However, MAML and its variants often resort to a simple… ▽ More In few-shot learning scenarios, the challenge is to generalize and perform well on new unseen examples when only very few labeled examples are available for each task. Model-agnostic meta-learning (MAML) has gained the popularity as one of the representative few-shot learning methods for its flexibility and applicability to diverse problems. However, MAML and its variants often resort to a simple loss function without any auxiliary loss function or regularization terms that can help achieve better generalization. The problem lies in that each application and task may require different auxiliary loss function, especially when tasks are diverse and distinct. Instead of attempting to hand-design an auxiliary loss function for each application and task, we introduce a new meta-learning framework with a loss function that adapts to each task. Our proposed framework, named Meta-Learning with Task-Adaptive Loss Function (MeTAL), demonstrates the effectiveness and the flexibility across various domains, such as few-shot classification and few-shot regression. △ Less

Submitted 17 October, 2021; v1 submitted 8 October, 2021; originally announced October 2021.

Comments: ICCV 2021 (Oral). Code at https://github.com/baiksung/MeTAL

arXiv:2102.06407 [pdf, other]

Densely Deformable Efficient Salient Object Detection Network

Authors: Tanveer Hussain, Saeed Anwar, Amin Ullah, Khan Muhammad, Sung Wook Baik

Abstract: Salient Object Detection (SOD) domain using RGB-D data has lately emerged with some current models' adequately precise results. However, they have restrained generalization abilities and intensive computational complexity. In this paper, inspired by the best background/foreground separation abilities of deformable convolutions, we employ them in our Densely Deformable Network (DDNet) to achieve ef… ▽ More Salient Object Detection (SOD) domain using RGB-D data has lately emerged with some current models' adequately precise results. However, they have restrained generalization abilities and intensive computational complexity. In this paper, inspired by the best background/foreground separation abilities of deformable convolutions, we employ them in our Densely Deformable Network (DDNet) to achieve efficient SOD. The salient regions from densely deformable convolutions are further refined using transposed convolutions to optimally generate the saliency maps. Quantitative and qualitative evaluations using the recent SOD dataset against 22 competing techniques show our method's efficiency and effectiveness. We also offer evaluation using our own created cross-dataset, surveillance-SOD (S-SOD), to check the trained models' validity in terms of their applicability in diverse scenarios. The results indicate that the current models have limited generalization potentials, demanding further research in this direction. Our code and new dataset will be publicly available at https://github.com/tanveer-hussain/EfficientSOD △ Less

Submitted 12 February, 2021; originally announced February 2021.

arXiv:2101.06147 [pdf]

Estimation of the Frequency of Occurrence of Italian Phonemes in Text

Authors: Javi Arango, Alec DeCaprio, Sunwoo Baik, Luca De Nardis, Stefanie Shattuck-Hufnagel, Maria Gabriella Di Benedetto

Abstract: The purpose of this project was to derive a reliable estimate of the frequency of occurrence of the 30 phonemes - plus consonant geminated counterparts - of the Italian language, based on four selected written texts. Since no comparable dataset was found in previous literature, the present analysis may serve as a reference in future studies. Four textual sources were considered: Come si fa una tes… ▽ More The purpose of this project was to derive a reliable estimate of the frequency of occurrence of the 30 phonemes - plus consonant geminated counterparts - of the Italian language, based on four selected written texts. Since no comparable dataset was found in previous literature, the present analysis may serve as a reference in future studies. Four textual sources were considered: Come si fa una tesi di laurea: le materie umanistiche by Umberto Eco, I promessi sposi by Alessandro Manzoni, a recent article in Corriere della Sera (a popular daily Italian newspaper), and In altre parole by Jhumpa Lahiri. The sources were chosen to represent varied genres, subject matter, time periods, and writing styles. Results of the analysis, which also included an analysis of variance, showed that, for all four sources, the frequencies of occurrence reached relatively stable values after about 6,000 phonemes (approx. 1,250 words), varying by <0.025%. Estimated frequencies are provided for each single source and as an average across sources. △ Less

Submitted 18 January, 2021; v1 submitted 14 January, 2021; originally announced January 2021.

Comments: submitted to Speech Communication

arXiv:2012.11230 [pdf, other]

DAQ: Channel-Wise Distribution-Aware Quantization for Deep Image Super-Resolution Networks

Authors: Cheeun Hong, Heewon Kim, Sungyong Baik, Junghun Oh, Kyoung Mu Lee

Abstract: Quantizing deep convolutional neural networks for image super-resolution substantially reduces their computational costs. However, existing works either suffer from a severe performance drop in ultra-low precision of 4 or lower bit-widths, or require a heavy fine-tuning process to recover the performance. To our knowledge, this vulnerability to low precisions relies on two statistical observations… ▽ More Quantizing deep convolutional neural networks for image super-resolution substantially reduces their computational costs. However, existing works either suffer from a severe performance drop in ultra-low precision of 4 or lower bit-widths, or require a heavy fine-tuning process to recover the performance. To our knowledge, this vulnerability to low precisions relies on two statistical observations of feature map values. First, distribution of feature map values varies significantly per channel and per input image. Second, feature maps have outliers that can dominate the quantization error. Based on these observations, we propose a novel distribution-aware quantization scheme (DAQ) which facilitates accurate training-free quantization in ultra-low precision. A simple function of DAQ determines dynamic range of feature maps and weights with low computational burden. Furthermore, our method enables mixed-precision quantization by calculating the relative sensitivity of each channel, without any training process involved. Nonetheless, quantization-aware training is also applicable for auxiliary performance gain. Our new method outperforms recent training-free and even training-based quantization methods to the state-of-the-art image super-resolution networks in ultra-low precision. △ Less

Submitted 6 July, 2022; v1 submitted 21 December, 2020; originally announced December 2020.

Comments: WACV 2022

arXiv:2012.11225 [pdf, other]

Searching for Controllable Image Restoration Networks

Authors: Heewon Kim, Sungyong Baik, Myungsub Choi, Janghoon Choi, Kyoung Mu Lee

Abstract: Diverse user preferences over images have recently led to a great amount of interest in controlling the imagery effects for image restoration tasks. However, existing methods require separate inference through the entire network per each output, which hinders users from readily comparing multiple imagery effects due to long latency. To this end, we propose a novel framework based on a neural archi… ▽ More Diverse user preferences over images have recently led to a great amount of interest in controlling the imagery effects for image restoration tasks. However, existing methods require separate inference through the entire network per each output, which hinders users from readily comparing multiple imagery effects due to long latency. To this end, we propose a novel framework based on a neural architecture search technique that enables efficient generation of multiple imagery effects via two stages of pruning: task-agnostic and task-specific pruning. Specifically, task-specific pruning learns to adaptively remove the irrelevant network parameters for each task, while task-agnostic pruning learns to find an efficient architecture by sharing the early layers of the network across different tasks. Since the shared layers allow for feature reuse, only a single inference of the task-agnostic layers is needed to generate multiple imagery effects from the input image. Using the proposed task-agnostic and task-specific pruning schemes together significantly reduces the FLOPs and the actual latency of inference compared to the baseline. We reduce 95.7% of the FLOPs when generating 27 imagery effects, and make the GPU latency 73.0% faster on 4K-resolution images. △ Less

Submitted 21 December, 2020; originally announced December 2020.

arXiv:2011.00209 [pdf, other]

Meta-Learning with Adaptive Hyperparameters

Authors: Sungyong Baik, Myungsub Choi, Janghoon Choi, Heewon Kim, Kyoung Mu Lee

Abstract: Despite its popularity, several recent works question the effectiveness of MAML when test tasks are different from training tasks, thus suggesting various task-conditioned methodology to improve the initialization. Instead of searching for better task-aware initialization, we focus on a complementary factor in MAML framework, inner-loop optimization (or fast adaptation). Consequently, we propose a… ▽ More Despite its popularity, several recent works question the effectiveness of MAML when test tasks are different from training tasks, thus suggesting various task-conditioned methodology to improve the initialization. Instead of searching for better task-aware initialization, we focus on a complementary factor in MAML framework, inner-loop optimization (or fast adaptation). Consequently, we propose a new weight update rule that greatly enhances the fast adaptation process. Specifically, we introduce a small meta-network that can adaptively generate per-step hyperparameters: learning rate and weight decay coefficients. The experimental results validate that the Adaptive Learning of hyperparameters for Fast Adaptation (ALFA) is the equally important ingredient that was often neglected in the recent few-shot learning approaches. Surprisingly, fast adaptation from random initialization with ALFA can already outperform MAML. △ Less

Submitted 8 December, 2020; v1 submitted 31 October, 2020; originally announced November 2020.

Comments: NeurIPS 2020. Code at https://github.com/baiksung/alfa. Typo fix in the updated version

arXiv:2008.09310 [pdf, other]

Domain Adaptation of Learned Features for Visual Localization

Authors: Sungyong Baik, Hyo ** Kim, Tianwei Shen, Eddy Ilg, Kyoung Mu Lee, Chris Sweeney

Abstract: We tackle the problem of visual localization under changing conditions, such as time of day, weather, and seasons. Recent learned local features based on deep neural networks have shown superior performance over classical hand-crafted local features. However, in a real-world scenario, there often exists a large domain gap between training and target images, which can significantly degrade the loca… ▽ More We tackle the problem of visual localization under changing conditions, such as time of day, weather, and seasons. Recent learned local features based on deep neural networks have shown superior performance over classical hand-crafted local features. However, in a real-world scenario, there often exists a large domain gap between training and target images, which can significantly degrade the localization accuracy. While existing methods utilize a large amount of data to tackle the problem, we present a novel and practical approach, where only a few examples are needed to reduce the domain gap. In particular, we propose a few-shot domain adaptation framework for learned local features that deals with varying conditions in visual localization. The experimental results demonstrate the superior performance over baselines, while using a scarce number of training examples from the target domain. △ Less

Submitted 21 August, 2020; originally announced August 2020.

Comments: BMVC 2020

arXiv:2004.00779 [pdf, other]

Scene-Adaptive Video Frame Interpolation via Meta-Learning

Authors: Myungsub Choi, Janghoon Choi, Sungyong Baik, Tae Hyun Kim, Kyoung Mu Lee

Abstract: Video frame interpolation is a challenging problem because there are different scenarios for each video depending on the variety of foreground and background motion, frame rate, and occlusion. It is therefore difficult for a single network with fixed parameters to generalize across different videos. Ideally, one could have a different network for each scenario, but this is computationally infeasib… ▽ More Video frame interpolation is a challenging problem because there are different scenarios for each video depending on the variety of foreground and background motion, frame rate, and occlusion. It is therefore difficult for a single network with fixed parameters to generalize across different videos. Ideally, one could have a different network for each scenario, but this is computationally infeasible for practical applications. In this work, we propose to adapt the model to each video by making use of additional information that is readily available at test time and yet has not been exploited in previous works. We first show the benefits of `test-time adaptation' through simple fine-tuning of a network, then we greatly improve its efficiency by incorporating meta-learning. We obtain significant performance gains with only a single gradient update without any additional parameters. Finally, we show that our meta-learning framework can be easily employed to any video frame interpolation network and can consistently improve its performance on multiple benchmark datasets. △ Less

Submitted 1 April, 2020; originally announced April 2020.

Comments: CVPR 2020

arXiv:1912.09870 [pdf, ps, other]

doi 10.1080/24725854.2023.2183531

QoS-aware energy-efficient workload routing and server speed control policy in data centers: a robust queueing theoretic approach

Authors: Seung Min Baik, Young Myoung Ko

Abstract: Operating cloud service infrastructures requires high energy efficiency while ensuring a satisfactory service level. Motivated by data centers, we consider a workload routing and server speed control policy applicable to the system operating under fluctuating demands. Dynamic control algorithms are generally more energy-efficient than static ones. However, they often require frequent information e… ▽ More Operating cloud service infrastructures requires high energy efficiency while ensuring a satisfactory service level. Motivated by data centers, we consider a workload routing and server speed control policy applicable to the system operating under fluctuating demands. Dynamic control algorithms are generally more energy-efficient than static ones. However, they often require frequent information exchanges between routers and servers, making the data centers' management hesitate to deploy these algorithms. This study presents a static routing and server speed control policy that could achieve energy efficiency similar to a dynamic algorithm and eliminate the necessity of frequent communication among resources. We take a robust queueing theoretic approach to response time constraints for the quality of service (QoS) conditions. Each server is modeled as a G/G/1 processor sharing queue, and the concept of uncertainty sets defines the domain of stochastic primitives. We derive an approximative upper bound of sojourn times from uncertainty sets and develop an approximative sojourn time quantile estimation method for QoS. Numerical experiments confirm the proposed static policy offers competitive solutions compared with the dynamic algorithm. △ Less

Submitted 3 March, 2023; v1 submitted 20 December, 2019; originally announced December 2019.

Journal ref: IISE Transactions, 2023

arXiv:1909.08219 [pdf, other]

Performance of Recommender Systems: Based on Content Navigator and Collaborative Filtering

Authors: Keum Gang Cha, Soo-Ryeon Lee, Jung-Woo Lee, Seung Bin Baik

Abstract: In the world of big data, many people find it difficult to access the information they need quickly and accurately. In order to overcome this, research on the system that recommends information accurately to users is continuously conducted. Collaborative Filtering is one of the famous algorithms among the most used in the industry. However, collaborative filtering is difficult to use in online sys… ▽ More In the world of big data, many people find it difficult to access the information they need quickly and accurately. In order to overcome this, research on the system that recommends information accurately to users is continuously conducted. Collaborative Filtering is one of the famous algorithms among the most used in the industry. However, collaborative filtering is difficult to use in online systems because user recommendation is highly volatile in recommendation quality and requires computation using large matrices. To overcome this problem, this paper proposes a method similar to database queries and a clustering method (Contents Navigator) originating from a complex network. △ Less

Submitted 18 September, 2019; originally announced September 2019.

arXiv:1906.05895 [pdf, other]

Learning to Forget for Meta-Learning

Authors: Sungyong Baik, Seokil Hong, Kyoung Mu Lee

Abstract: Few-shot learning is a challenging problem where the goal is to achieve generalization from only few examples. Model-agnostic meta-learning (MAML) tackles the problem by formulating prior knowledge as a common initialization across tasks, which is then used to quickly adapt to unseen tasks. However, forcibly sharing an initialization can lead to conflicts among tasks and the compromised (undesired… ▽ More Few-shot learning is a challenging problem where the goal is to achieve generalization from only few examples. Model-agnostic meta-learning (MAML) tackles the problem by formulating prior knowledge as a common initialization across tasks, which is then used to quickly adapt to unseen tasks. However, forcibly sharing an initialization can lead to conflicts among tasks and the compromised (undesired by tasks) location on optimization landscape, thereby hindering the task adaptation. Further, we observe that the degree of conflict differs among not only tasks but also layers of a neural network. Thus, we propose task-and-layer-wise attenuation on the compromised initialization to reduce its influence. As the attenuation dynamically controls (or selectively forgets) the influence of prior knowledge for a given task and each layer, we name our method as L2F (Learn to Forget). The experimental results demonstrate that the proposed method provides faster adaptation and greatly improves the performance. Furthermore, L2F can be easily applied and improve other state-of-the-art MAML-based frameworks, illustrating its simplicity and generalizability. △ Less

Submitted 15 June, 2020; v1 submitted 13 June, 2019; originally announced June 2019.

Comments: CVPR 2020. Code at https://github.com/baiksung/L2F

arXiv:1809.06965 [pdf, other]

A Study on Deep Learning Based Sauvegrain Method for Measurement of Puberty Bone Age

Authors: Seung Bin Baik, Keum Gang Cha

Abstract: This study applies a technique to expand the number of images to a level that allows deep learning. And the applicability of the Sauvegrain method through deep learning with relatively few elbow X-rays is studied. The study was composed of processes similar to the physicians' bone age assessment procedures. The selected reference images were learned without being included in the evaluation data, a… ▽ More This study applies a technique to expand the number of images to a level that allows deep learning. And the applicability of the Sauvegrain method through deep learning with relatively few elbow X-rays is studied. The study was composed of processes similar to the physicians' bone age assessment procedures. The selected reference images were learned without being included in the evaluation data, and at the same time, the data was extended to accommodate the number of cases. In addition, we adjusted the X-ray images to better images using U-Net and selected the ROI with RPN + so as to be able to perform bone age estimation through CNN. The mean absolute error of the Sauvegrain method based on deep learning is 2.8 months and the Mean Absolute Percentage Error (MAPE) is 0.018. This result shows that X - ray analysis using the Sauvegrain method shows higher accuracy than that of the age group of puberty even in the deep learning base. This means that deep learning of the Suvegrain method can be measured at a level similar to that of an expert, based on the extended X-ray image with the image data extension technique. Finally, we applied the Sauvegrain method to deep learning for accurate measurement of bone age at puberty. As a result, the present study is based on deep learning, and compared with the evaluation results of experts, it is possible to overcome limitations of the method of measuring bone age based on machine learning which was in TW3 or Greulich & Pyle due to lack of X- I confirmed the fact. And we also presented the Sauvegrain method, which is applicable to adolescents as well. △ Less

Submitted 18 September, 2018; originally announced September 2018.

Comments: 5 pages, 6 figures, 1 table

arXiv:1610.01382 [pdf]

Divide-and-Conquer based Ensemble to Spot Emotions in Speech using MFCC and Random Forest

Authors: Abdul Malik Badshah, Jamil Ahmad, Mi Young Lee, Sung Wook Baik

Abstract: Besides spoken words, speech signals also carry information about speaker gender, age, and emotional state which can be used in a variety of speech analysis applications. In this paper, a divide and conquer strategy for ensemble classification has been proposed to recognize emotions in speech. Intrinsic hierarchy in emotions has been utilized to construct an emotions tree, which assisted in breaki… ▽ More Besides spoken words, speech signals also carry information about speaker gender, age, and emotional state which can be used in a variety of speech analysis applications. In this paper, a divide and conquer strategy for ensemble classification has been proposed to recognize emotions in speech. Intrinsic hierarchy in emotions has been utilized to construct an emotions tree, which assisted in breaking down the emotion recognition task into smaller sub tasks. The proposed framework generates predictions in three phases. Firstly, emotions are detected in the input speech signal by classifying it as neutral or emotional. If the speech is classified as emotional, then in the second phase, it is further classified into positive and negative classes. Finally, individual positive or negative emotions are identified based on the outcomes of the previous stages. Several experiments have been performed on a widely used benchmark dataset. The proposed method was able to achieve improved recognition rates as compared to several other approaches. △ Less

Submitted 5 October, 2016; originally announced October 2016.

Comments: 8 pages, conference paper, The 2nd International Integrated Conference & Concert on Convergence (2016)

arXiv:1601.01577 [pdf]

Gender Identification using MFCC for Telephone Applications - A Comparative Study

Authors: Jamil Ahmad, Mustansar Fiaz, Soon-il Kwon, Maleerat Sodanil, Bay Vo, Sung Wook Baik

Abstract: Gender recognition is an essential component of automatic speech recognition and interactive voice response systems. Determining gender of the speaker reduces the computational burden of such systems for any further processing. Typical methods for gender recognition from speech largely depend on features extraction and classification processes. The purpose of this study is to evaluate the performa… ▽ More Gender recognition is an essential component of automatic speech recognition and interactive voice response systems. Determining gender of the speaker reduces the computational burden of such systems for any further processing. Typical methods for gender recognition from speech largely depend on features extraction and classification processes. The purpose of this study is to evaluate the performance of various state-of-the-art classification methods along with tuning their parameters for hel** selection of the optimal classification methods for gender recognition tasks. Five classification schemes including k-nearest neighbor, naïve Bayes, multilayer perceptron, random forest, and support vector machine are comprehensively evaluated for determination of gender from telephonic speech using the Mel-frequency cepstral coefficients. Different experiments were performed to determine the effects of training data sizes, length of the speech streams, and parameter tuning on classification performance. Results suggest that SVM is the best classifier among all the five schemes for gender recognition. △ Less

Submitted 7 January, 2016; originally announced January 2016.

Journal ref: International Journal of Computer Science and Electronics Engineering 3.5 (2015): 351-355

arXiv:1510.02177 [pdf]

Ontology-based Secure Retrieval of Semantically Significant Visual Contents

Authors: Khan Muhammad, Irfan Mehmood, Mi Young Lee, Su Mi Ji, Sung Wook Baik

Abstract: Image classification is an enthusiastic research field where large amount of image data is classified into various classes based on their visual contents. Researchers have presented various low-level features-based techniques for classifying images into different categories. However, efficient and effective classification and retrieval is still a challenging problem due to complex nature of visual… ▽ More Image classification is an enthusiastic research field where large amount of image data is classified into various classes based on their visual contents. Researchers have presented various low-level features-based techniques for classifying images into different categories. However, efficient and effective classification and retrieval is still a challenging problem due to complex nature of visual contents. In addition, the traditional information retrieval techniques are vulnerable to security risks, making it easy for attackers to retrieve personal visual contents such as patients records and law enforcement agencies databases. Therefore, we propose a novel ontology-based framework using image steganography for secure image classification and information retrieval. The proposed framework uses domain-specific ontology for map** the low-level image features to high-level concepts of ontologies which consequently results in efficient classification. Furthermore, the proposed method utilizes image steganography for hiding the image semantics as a secret message inside them, making the information retrieval process secure from third parties. The proposed framework minimizes the computational complexity of traditional techniques, increasing its suitability for secure and real-time visual contents retrieval from personalized image databases. Experimental results confirm the efficiency, effectiveness, and security of the proposed framework as compared with other state-of-the-art systems. △ Less

Submitted 7 October, 2015; originally announced October 2015.

Comments: A short paper of 11 pages for secure visual contents retrieval.The original version can be accessed at this link: http://www.kingpc.or.kr/inc_html/index.html

Journal ref: Khan Muhammad, Irfan Mehmood, Mi Young Lee, Su Mi Ji, Sung Wook Baik, "Ontology-based Secure Retrieval of Semantically Significant Visual Contents," JOURNAL OF KOREAN INSTITUTE OF NEXT GENERATION COMPUTING, vol. 11, pp. 87-96, 2015

arXiv:1506.02100 [pdf]

doi 10.1007/s11042-015-2671-9

A novel magic LSB substitution method (M-LSB-SM) using multi-level encryption and achromatic component of an image

Authors: Khan Muhammad, Muhammad Sajjad, Irfan Mehmood, Seungmin Rho, Sung Wook Baik

Abstract: Image Steganography is a thriving research area of information security where secret data is embedded in images to hide its existence while getting the minimum possible statistical detectability. This paper proposes a novel magic least significant bit substitution method (M-LSB-SM) for RGB images. The proposed method is based on the achromatic component (I-plane) of the hue-saturation-intensity (H… ▽ More Image Steganography is a thriving research area of information security where secret data is embedded in images to hide its existence while getting the minimum possible statistical detectability. This paper proposes a novel magic least significant bit substitution method (M-LSB-SM) for RGB images. The proposed method is based on the achromatic component (I-plane) of the hue-saturation-intensity (HSI) color model and multi-level encryption (MLE) in the spatial domain. The input image is transposed and converted into an HSI color space. The I-plane is divided into four sub-images of equal size, rotating each sub-image with a different angle using a secret key. The secret information is divided into four blocks, which are then encrypted using an MLE algorithm (MLEA). Each sub-block of the message is embedded into one of the rotated sub-images based on a specific pattern using magic LSB substitution. Experimental results validate that the proposed method not only enhances the visual quality of stego images but also provides good imperceptibility and multiple security levels as compared to several existing prominent methods. △ Less

Submitted 5 June, 2015; originally announced June 2015.

Comments: This paper has been published in Multimedia Tools and Applications Journal with impact factor=1.058. The readers can study the formatted paper using the following link: http://link.springer.com/article/10.1007/s11042-015-2671-9. Please use sci-hub.org for downloading this paper if you are unable to access it freely or email us at [email protected]

Journal ref: Multimedia Tools and Applications, pp. 1-27, 2015

arXiv:1502.07041 [pdf]

Describing Colors, Textures and Shapes for Content Based Image Retrieval - A Survey

Authors: Jamil Ahmad, Muhammad Sajjad, Irfan Mehmood, Seungmin Rho, Sung Wook Baik

Abstract: Visual media has always been the most enjoyed way of communication. From the advent of television to the modern day hand held computers, we have witnessed the exponential growth of images around us. Undoubtedly it's a fact that they carry a lot of information in them which needs be utilized in an effective manner. Hence intense need has been felt to efficiently index and store large image collecti… ▽ More Visual media has always been the most enjoyed way of communication. From the advent of television to the modern day hand held computers, we have witnessed the exponential growth of images around us. Undoubtedly it's a fact that they carry a lot of information in them which needs be utilized in an effective manner. Hence intense need has been felt to efficiently index and store large image collections for effective and on- demand retrieval. For this purpose low-level features extracted from the image contents like color, texture and shape has been used. Content based image retrieval systems employing these features has proven very successful. Image retrieval has promising applications in numerous fields and hence has motivated researchers all over the world. New and improved ways to represent visual content are being developed each day. Tremendous amount of research has been carried out in the last decade. In this paper we will present a detailed overview of some of the powerful color, texture and shape descriptors for content based image retrieval. A comparative analysis will also be carried out for providing an insight into outstanding challenges in this field. △ Less

Submitted 24 February, 2015; originally announced February 2015.

Journal ref: (2014), Journal of Platform Technology 2(4): 34-48

Showing 1–28 of 28 results for author: Baik, S