Search | arXiv e-print repository

Stochastic Restarting to Overcome Overfitting in Neural Networks with Noisy Labels

Authors: Youngkyoung Bae, Yeongwoo Song, Hawoong Jeong

Abstract: Despite its prevalence, giving up and starting over may seem wasteful in many situations such as searching for a target or training deep neural networks (DNNs). Our study, though, demonstrates that restarting from a checkpoint can significantly improve generalization performance when training DNNs with noisy labels. In the presence of noisy labels, DNNs initially learn the general patterns of the data but then gradually overfit to the noisy labels. To combat this overfitting phenomenon, we developed a method based on stochastic restarting, which has been actively explored in the statistical physics field for finding targets efficiently. By approximating the dynamics of stochastic gradient descent into Langevin dynamics, we theoretically show that restarting can provide great improvements as the batch size and the proportion of corrupted data increase. We then empirically validate our theory, confirming the significant improvements achieved by restarting. An important aspect of our method is its ease of implementation and compatibility with other methods, while still yielding notably improved performance. We envision it as a valuable tool that can complement existing methods for handling noisy labels.

Submitted 1 June, 2024; originally announced June 2024.

Comments: 21 pages, 10 figures

arXiv:2405.00642 [pdf, other]

From Empirical Observations to Universality: Dynamics of Deep Learning with Inputs Built on Gaussian mixture

Authors: Jaeyong Bae, Hawoong Jeong

Abstract: This study broadens the scope of theoretical frameworks in deep learning by delving into the dynamics of neural networks with inputs that demonstrate the structural characteristics of a Gaussian Mixture (GM). We analyzed how the dynamics of neural networks under GM-structured inputs diverge from the predictions of conventional theories based on simple Gaussian structures. A revelation of our work is the observed convergence of neural network dynamics towards conventional theory even with standardized GM inputs, highlighting an unexpected universality. We found that standardization, especially in conjunction with certain nonlinear functions, plays a critical role in this phenomenon. Consequently, despite the complex and varied nature of GM distributions, we demonstrate that neural networks exhibit asymptotic behaviors in line with predictions under simple Gaussian frameworks.

Submitted 1 May, 2024; originally announced May 2024.

Comments: 19 pages, 9 figures

arXiv:2403.01204 [pdf, ps, other]

Stochastic gradient descent for streaming linear and rectified linear systems with Massart noise

Authors: Halyun Jeong, Deanna Needell, Elizaveta Rebrova

Abstract: We propose SGD-exp, a stochastic gradient descent approach for linear and ReLU regressions under Massart noise (adversarial semi-random corruption model) for the fully streaming setting. We show novel nearly linear convergence guarantees of SGD-exp to the true parameter with up to $50\%$ Massart corruption rate, and with any corruption rate in the case of symmetric oblivious corruptions. This is the first convergence guarantee result for robust ReLU regression in the streaming setting, and it shows the improved convergence rate over previous robust methods for $L_1$ linear regression due to a choice of an exponentially decaying step size, known for its efficiency in practice. Our analysis is based on the drift analysis of a discrete stochastic process, which could also be interesting on its own.

Submitted 2 March, 2024; originally announced March 2024.

Comments: Submitted to a journal

MSC Class: 65F10; 60-XX

arXiv:2402.10482 [pdf, other]

Understanding Self-Distillation and Partial Label Learning in Multi-Class Classification with Label Noise

Authors: Hyeonsu Jeong, Hye Won Chung

Abstract: Self-distillation (SD) is the process of training a student model using the outputs of a teacher model, with both models sharing the same architecture. Our study theoretically examines SD in multi-class classification with cross-entropy loss, exploring both multi-round SD and SD with refined teacher outputs, inspired by partial label learning (PLL). By deriving a closed-form solution for the student model's outputs, we discover that SD essentially functions as label averaging among instances with high feature correlations. Initially beneficial, this averaging helps the model focus on feature clusters correlated with a given instance for predicting the label. However, it leads to diminishing performance with increasing distillation rounds. Additionally, we demonstrate SD's effectiveness in label noise scenarios and identify the label corruption condition and minimum number of distillation rounds needed to achieve 100% classification accuracy. Our study also reveals that one-step distillation with refined teacher outputs surpasses the efficacy of multi-step SD using the teacher's direct output in high noise rate regimes.

Submitted 16 February, 2024; originally announced February 2024.

arXiv:2310.01107 [pdf, other]

Ground-A-Video: Zero-shot Grounded Video Editing using Text-to-image Diffusion Models

Authors: Hyeonho Jeong, Jong Chul Ye

Abstract: Recent endeavors in video editing have showcased promising results in single-attribute editing or style transfer tasks, either by training text-to-video (T2V) models on text-video data or adopting training-free methods. However, when confronted with the complexities of multi-attribute editing scenarios, they exhibit shortcomings such as omitting or overlooking intended attribute changes, modifying the wrong elements of the input video, and failing to preserve regions of the input video that should remain intact. To address this, here we present a novel grounding-guided video-to-video translation framework called Ground-A-Video for multi-attribute video editing. Ground-A-Video attains temporally consistent multi-attribute editing of input videos in a training-free manner without aforementioned shortcomings. Central to our method is the introduction of Cross-Frame Gated Attention which incorporates groundings information into the latent representations in a temporally consistent fashion, along with Modulated Cross-Attention and optical flow guided inverted latents smoothing. Extensive experiments and applications demonstrate that Ground-A-Video's zero-shot capacity outperforms other baseline methods in terms of edit-accuracy and frame consistency. Further results and code are available at our project page (http://ground-a-video.github.io).

Submitted 24 February, 2024; v1 submitted 2 October, 2023; originally announced October 2023.

Comments: Accepted to ICLR 2024, Project Page: http://ground-a-video.github.io

arXiv:2308.16058 [pdf, other]

A Classification of Observation-Driven State-Space Count Models for Panel Data

Authors: Jae Youn Ahn, Himchan Jeong, Yang Lu, Mario V. Wüthrich

Abstract: State-space models are widely used in many applications. In the domain of count data, one such example is the model proposed by Harvey and Fernandes (1989). Unlike many of its parameter-driven alternatives, this model is observation-driven, leading to closed-form expressions for the predictive density. In this paper, we demonstrate the need to extend the model of Harvey and Fernandes (1989) by showing that their model is not variance stationary. Our extension can accommodate for a wide range of variance processes that are either increasing, decreasing, or stationary, while keeping the tractability of the original model. Simulation and numerical studies are included to illustrate the performance of our method.

Submitted 30 August, 2023; originally announced August 2023.

Comments: 28 pages, 2 figures

MSC Class: 62M10 ACM Class: G.3

arXiv:2304.10123 [pdf, other]

Linear Convergence of Reshuffling Kaczmarz Methods With Sparse Constraints

Authors: Halyun Jeong, Deanna Needell

Abstract: The Kaczmarz method (KZ) and its variants, which are types of stochastic gradient descent (SGD) methods, have been extensively studied due to their simplicity and efficiency in solving linear equation systems. The iterative thresholding (IHT) method has gained popularity in various research fields, including compressed sensing or sparse linear regression, machine learning with additional structure, and optimization with nonconvex constraints. Recently, a hybrid method called Kaczmarz-based IHT (KZIHT) has been proposed, combining the benefits of both approaches, but its theoretical guarantees are missing. In this paper, we provide the first theoretical convergence guarantees for KZIHT by showing that it converges linearly to the solution of a system with sparsity constraints up to optimal statistical bias when the reshuffling data sampling scheme is used. We also propose the Kaczmarz with periodic thresholding (KZPT) method, which generalizes KZIHT by applying the thresholding operation for every certain number of KZ iterations and by employing two different types of step sizes. We establish a linear convergence guarantee for KZPT for randomly subsampled bounded orthonormal systems (BOS) and mean-zero isotropic sub-Gaussian random matrices, which are most commonly used models in compressed sensing, dimension reduction, matrix sketching, and many inverse problems in neural networks. Our analysis shows that KZPT with an optimal thresholding period outperforms KZIHT. To support our theory, we include several numerical experiments.

Submitted 20 April, 2023; originally announced April 2023.

Comments: Submitted to a journal

MSC Class: 65F10; 65F22; 90C26

arXiv:2302.03900 [pdf, other]

Zero-shot Generation of Coherent Storybook from Plain Text Story using Diffusion Models

Authors: Hyeonho Jeong, Gihyun Kwon, Jong Chul Ye

Abstract: Recent advancements in large scale text-to-image models have opened new possibilities for guiding the creation of images through human-devised natural language. However, while prior literature has primarily focused on the generation of individual images, it is essential to consider the capability of these models to ensure coherency within a sequence of images to fulfill the demands of real-world applications such as storytelling. To address this, here we present a novel neural pipeline for generating a coherent storybook from the plain text of a story. Specifically, we leverage a combination of a pre-trained Large Language Model and a text-guided Latent Diffusion Model to generate coherent images. While previous story synthesis frameworks typically require a large-scale text-to-image model trained on expensive image-caption pairs to maintain the coherency, we employ simple textual inversion techniques along with detector-based semantic image editing which allows zero-shot generation of the coherent storybook. Experimental results show that our proposed method outperforms state-of-the-art image editing baselines.

Submitted 8 February, 2023; originally announced February 2023.

arXiv:2301.00006 [pdf, other]

Recovering Top-Two Answers and Confusion Probability in Multi-Choice Crowdsourcing

Authors: Hyeonsu Jeong, Hye Won Chung

Abstract: Crowdsourcing has emerged as an effective platform for labeling large amounts of data in a cost- and time-efficient manner. Most previous work has focused on designing an efficient algorithm to recover only the ground-truth labels of the data. In this paper, we consider multi-choice crowdsourcing tasks with the goal of recovering not only the ground truth, but also the most confusing answer and the confusion probability. The most confusing answer provides useful information about the task by revealing the most plausible answer other than the ground truth and how plausible it is. To theoretically analyze such scenarios, we propose a model in which there are the top two plausible answers for each task, distinguished from the rest of the choices. Task difficulty is quantified by the probability of confusion between the top two, and worker reliability is quantified by the probability of giving an answer among the top two. Under this model, we propose a two-stage inference algorithm to infer both the top two answers and the confusion probability. We show that our algorithm achieves the minimax optimal convergence rate. We conduct both synthetic and real data experiments and demonstrate that our algorithm outperforms other recent algorithms. We also show the applicability of our algorithms in inferring the difficulty of tasks and in training neural networks with top-two soft labels.

Submitted 31 May, 2023; v1 submitted 29 December, 2022; originally announced January 2023.

Comments: ICML 2023

arXiv:2212.01168 [pdf, other]

Towards Cross Domain Generalization of Hamiltonian Representation via Meta Learning

Authors: Yeongwoo Song, Hawoong Jeong

Abstract: Recent advances in deep learning for physics have focused on discovering shared representations of target systems by incorporating physics priors or inductive biases into neural networks. While effective, these methods are limited to the system domain, where the type of system remains consistent and thus cannot ensure the adaptation to new, or unseen physical systems governed by different laws. For instance, a neural network trained on a mass-spring system cannot guarantee accurate predictions for the behavior of a two-body system or any other system with different physical laws. In this work, we take a significant leap forward by targeting cross domain generalization within the field of Hamiltonian dynamics. We model our system with a graph neural network (GNN) and employ a meta learning algorithm to enable the model to gain experience over a distribution of systems and make it adapt to new physics. Our approach aims to learn a unified Hamiltonian representation that is generalizable across multiple system domains, thereby overcoming the limitations of system-specific models. We demonstrate that the meta-trained model captures the generalized Hamiltonian representation that is consistent across different physical domains. Overall, through the use of meta learning, we offer a framework that achieves cross domain generalization, providing a step towards a unified model for understanding a wide array of dynamical systems via deep learning.

Submitted 27 April, 2024; v1 submitted 2 December, 2022; originally announced December 2022.

Comments: Conference paper at ICLR 2024

arXiv:2210.05816 [pdf, other]

Finding and Listing Front-door Adjustment Sets

Authors: Hyunchai Jeong, Jin Tian, Elias Bareinboim

Abstract: Identifying the effects of new interventions from data is a significant challenge found across a wide range of the empirical sciences. A well-known strategy for identifying such effects is Pearl's front-door (FD) criterion (Pearl, 1995). The definition of the FD criterion is declarative, only allowing one to decide whether a specific set satisfies the criterion. In this paper, we present algorithms for finding and enumerating possible sets satisfying the FD criterion in a given causal diagram. These results are useful in facilitating the practical applications of the FD criterion for causal effects estimation and helping scientists to select estimands with desired properties, e.g., based on cost, feasibility of measurement, or statistical power.

Submitted 14 October, 2022; v1 submitted 11 October, 2022; originally announced October 2022.

Comments: Pages: 18 (main paper 10, references 2, appendix 6), Figures: 9 (main paper 7, appendix 2), to be published in Proceedings of the 36th Annual Conference on Neural Information Processing Systems

arXiv:2110.09657 [pdf, ps, other]

A simple Bayesian state-space model for the collective risk model

Authors: Jae Youn Ahn, Himchan Jeong, Yang Lu

Abstract: The collective risk model (CRM) for frequency and severity is an important tool for retail insurance ratemaking, macro-level catastrophic risk forecasting, as well as operational risk in banking regulation. This model, which is initially designed for cross-sectional data, has recently been adapted to a longitudinal context to conduct both a priori and a posteriori ratemaking, through the introduction of random effects. However, so far, the random effect(s) is usually assumed static due to computational concerns, leading to predictive premiums that omit the seniority of the claims. In this paper, we propose a new CRM model with bivariate dynamic random effect process. The model is based on Bayesian state-space models. It is associated with the simple predictive mean and closed form expression for the likelihood function, while also allowing for the dependence between the frequency and severity components. Real data application to auto insurance is proposed to show the performance of our method.

Submitted 18 October, 2021; originally announced October 2021.

arXiv:2109.10431 [pdf, other]

Fairness without Imputation: A Decision Tree Approach for Fair Prediction with Missing Values

Authors: Haewon Jeong, Hao Wang, Flavio P. Calmon

Abstract: We investigate the fairness concerns of training a machine learning model using data with missing values. Even though there are a number of fairness intervention methods in the literature, most of them require a complete training set as input. In practice, data can have missing values, and data missing patterns can depend on group attributes (e.g. gender or race). Simply applying off-the-shelf fair learning algorithms to an imputed dataset may lead to an unfair model. In this paper, we first theoretically analyze different sources of discrimination risks when training with an imputed dataset. Then, we propose an integrated approach based on decision trees that does not require a separate process of imputation and learning. Instead, we train a tree with missing incorporated as attribute (MIA), which does not require explicit imputation, and we optimize a fairness-regularized objective function. We demonstrate that our approach outperforms existing fairness intervention methods applied to an imputed dataset, through several experiments on real-world datasets.

Submitted 13 April, 2022; v1 submitted 21 September, 2021; originally announced September 2021.

arXiv:2109.07956 [pdf, other]

On the ordering of credibility factors

Authors: Jae Youn Ahn, Himchan Jeong, Yang Lu

Abstract: Traditional credibility analysis of risks in insurance is based on the random effects model, where the heterogeneity across the policyholders is assumed to be time-invariant. One popular extension is the dynamic random effects (or state-space) model. However, while the latter allows for time-varying heterogeneity, its application to the credibility analysis should be conducted with care due to the possibility of negative credibilities per period [see Pinquet (2020a)]. Another important but under-explored topic is the ordering of the credibility factors in a monotonous manner -- recent claims ought to have larger weights than the old ones. This paper shows that the ordering of the covariance structure of the random effects in the dynamic random effects model does not necessarily imply that of the credibility factors. Subsequently, we show that the state-space model, with AR(1)-type autocorrelation function, guarantees the ordering of the credibility factors. Simulation experiments and a case study with a real dataset are conducted to show the relevance in insurance applications.

Submitted 16 September, 2021; originally announced September 2021.

arXiv:2102.04008 [pdf, other]

Discovering conservation laws from trajectories via machine learning

Authors: Seungwoong Ha, Hawoong Jeong

Abstract: Invariants and conservation laws convey critical information about the underlying dynamics of a system, yet it is generally infeasible to find them from large-scale data without any prior knowledge or human insight. We propose ConservNet to achieve this goal, a neural network that spontaneously discovers a conserved quantity from grouped data where the members of each group share invariants, similar to a general experimental setting where trajectories from different trials are observed. As a neural network trained with a novel and intuitive loss function called noise-variance loss, ConservNet learns the hidden invariants in each group of multi-dimensional observables in a data-driven, end-to-end manner. Our model successfully discovers underlying invariants from the simulated systems having invariants as well as a real-world double pendulum trajectory. Since the model is robust to various noises and data conditions compared to baseline, our approach is directly applicable to experimental data for discovering hidden conservation laws and further, general relationships between variables.

Submitted 30 June, 2021; v1 submitted 8 February, 2021; originally announced February 2021.

Comments: 12 pages, 9 figures

arXiv:2102.03065 [pdf, other]

Co-Mixup: Saliency Guided Joint Mixup with Supermodular Diversity

Authors: Jang-Hyun Kim, Wonho Choo, Hosan Jeong, Hyun Oh Song

Abstract: While deep neural networks show great performance on fitting to the training distribution, improving the networks' generalization performance to the test distribution and robustness to the sensitivity to input perturbations still remain as a challenge. Although a number of mixup based augmentation strategies have been proposed to partially address them, it remains unclear as to how to best utilize the supervisory signal within each input data for mixup from the optimization perspective. We propose a new perspective on batch mixup and formulate the optimal construction of a batch of mixup data maximizing the data saliency measure of each individual mixup data and encouraging the supermodular diversity among the constructed mixup data. This leads to a novel discrete optimization problem minimizing the difference between submodular functions. We also propose an efficient modular approximation based iterative submodular minimization algorithm for efficient mixup computation per each minibatch suitable for minibatch based neural network training. Our experiments show the proposed method achieves the state of the art generalization, calibration, and weakly supervised localization results compared to other mixup methods. The source code is available at https://github.com/snu-mllab/Co-Mixup.

Submitted 5 February, 2021; originally announced February 2021.

Comments: Published at ICLR 2021 (Oral)

arXiv:2006.12777 [pdf, other]

Clinical Risk Prediction with Temporal Probabilistic Asymmetric Multi-Task Learning

Authors: A. Tuan Nguyen, Hyewon Jeong, Eunho Yang, Sung Ju Hwang

Abstract: Although recent multi-task learning methods have shown to be effective in improving the generalization of deep neural networks, they should be used with caution for safety-critical applications, such as clinical risk prediction. This is because even if they achieve improved task-average performance, they may still yield degraded performance on individual tasks, which may be critical (e.g., prediction of mortality risk). Existing asymmetric multi-task learning methods tackle this negative transfer problem by performing knowledge transfer from tasks with low loss to tasks with high loss. However, using loss as a measure of reliability is risky since it could be a result of overfitting. In the case of time-series prediction tasks, knowledge learned for one task (e.g., predicting the sepsis onset) at a specific timestep may be useful for learning another task (e.g., prediction of mortality) at a later timestep, but lack of loss at each timestep makes it difficult to measure the reliability at each timestep. To capture such dynamically changing asymmetric relationships between tasks in time-series data, we propose a novel temporal asymmetric multi-task learning model that performs knowledge transfer from certain tasks/timesteps to relevant uncertain tasks, based on feature-level uncertainty. We validate our model on multiple clinical risk prediction tasks against various deep learning models for time-series prediction, which our model significantly outperforms, without any sign of negative transfer. Further qualitative analysis of learned knowledge graphs by clinicians shows that they are helpful in analyzing the predictions of the model. Our final code is available at https://github.com/anhtuan5696/TPAMTL.

Submitted 18 February, 2021; v1 submitted 23 June, 2020; originally announced June 2020.

Comments: AAAI 2021. The first two authors contributed equally to this work. 10 pages, 4 figures, 4 tables

arXiv:2006.10190 [pdf, other]

Learning to Track Dynamic Targets in Partially Known Environments

Authors: Heejin Jeong, Hamed Hassani, Manfred Morari, Daniel D. Lee, George J. Pappas

Abstract: We solve active target tracking, one of the essential tasks in autonomous systems, using a deep reinforcement learning (RL) approach. In this problem, an autonomous agent is tasked with acquiring information about targets of interests using its onboard sensors. The classical challenges in this problem are system model dependence and the difficulty of computing information-theoretic cost functions for a long planning horizon. RL provides solutions for these challenges as the length of its effective planning horizon does not affect the computational complexity, and it drops the strong dependency of an algorithm on system models. In particular, we introduce Active Tracking Target Network (ATTN), a unified RL policy that is capable of solving major sub-tasks of active target tracking -- in-sight tracking, navigation, and exploration. The policy shows robust behavior for tracking agile and anomalous targets with a partially known target model. Additionally, the same policy is able to navigate in obstacle environments to reach distant targets as well as explore the environment when targets are positioned in unexpected locations.

Submitted 17 June, 2020; originally announced June 2020.

Comments: IEEE Transactions on Robotics (under review); Demo video: https://youtu.be/0ZFyOWJ2ulo ; Source code: https://github.com/coco66/ttenv

arXiv:2006.06151 [pdf, other]

On a Multi-Year Microlevel Collective Risk Model

Authors: Rosy Oh, Himchan Jeong, Jae Youn Ahn, Emiliano A. Valdez

Abstract: For a typical insurance portfolio, the claims process for a short period, typically one year, is characterized by observing frequency of claims together with the associated claims severities. The collective risk model describes this portfolio as a random sum of the aggregation of the claim amounts. In the classical framework, for simplicity, the claim frequency and claim severities are assumed to be mutually independent. However, there is a growing interest in relaxing this independence assumption which is more realistic and useful for the practical insurance ratemaking. While the common thread has been capturing the dependence between frequency and aggregate severity within a single period, the work of Oh et al. (2020a) provides an interesting extension to the addition of capturing dependence among individual severities. In this paper, we extend these works within a framework where we have a portfolio of microlevel frequencies and severities for multiple years. This allows us to develop a factor copula model framework that captures various types of dependence between claim frequencies and claim severities over multiple years. It is therefore a clear extension of earlier works on one-year dependent frequency-severity models and on random effects model for capturing serial dependence of claims. We focus on the results using a family of elliptical copulas to model the dependence. The paper further describes how to calibrate the proposed model using illustrative claims data arising from a Singapore insurance company. The estimated results provide strong evidence of all forms of dependencies captured by our model.

Submitted 10 June, 2020; originally announced June 2020.

arXiv:2006.05419 [pdf, other]

Cost-effective Interactive Attention Learning with Neural Attention Processes

Authors: Jay Heo, Junhyeon Park, Hyewon Jeong, Kwang Joon Kim, Juho Lee, Eunho Yang, Sung Ju Hwang

Abstract: We propose a novel interactive learning framework which we refer to as Interactive Attention Learning (IAL), in which the human supervisors interactively manipulate the allocated attentions, to correct the model's behavior by updating the attention-generating network. However, such a model is prone to overfitting due to scarcity of human annotations, and requires costly retraining. Moreover, it is almost infeasible for the human annotators to examine attentions on tons of instances and features. We tackle these challenges by proposing a sample-efficient attention mechanism and a cost-effective reranking algorithm for instances and features. First, we propose Neural Attention Process (NAP), which is an attention generator that can update its behavior by incorporating new attention-level supervisions without any retraining. Secondly, we propose an algorithm which prioritizes the instances and the features by their negative impacts, such that the model can yield large improvements with minimal human feedback. We validate IAL on various time-series datasets from multiple domains (healthcare, real-estate, and computer vision) on which it significantly outperforms baselines with conventional attention mechanisms, or without cost-effective reranking, with substantially less retraining and human-model interaction cost.

Submitted 9 June, 2020; originally announced June 2020.

arXiv:2004.08032 [pdf, other]

A non-convex regularization approach for stable estimation of loss development factors

Authors: Himchan Jeong, Hyunwoong Chang, Emiliano A. Valdez

Abstract: In this article, we apply non-convex regularization methods in order to obtain stable estimation of loss development factors in insurance claims reserving. Among the non-convex regularization methods, we focus on the use of the log-adjusted absolute deviation (LAAD) penalty and provide discussion on optimization of LAAD penalized regression model, which we prove to converge with a coordinate descent algorithm under mild conditions. This has the advantage of obtaining a consistent estimator for the regression coefficients while allowing for the variable selection, which is linked to the stable estimation of loss development factors. We calibrate our proposed model using a multi-line insurance dataset from a property and casualty insurer where we observed reported aggregate loss along accident years and development periods. When compared to other regression models, our LAAD penalized regression model provides very promising results.

Submitted 6 December, 2020; v1 submitted 16 April, 2020; originally announced April 2020.

Comments: 23 pages, 11 Tables, 6 Figures

MSC Class: 62P05

arXiv:2003.04166 [pdf, other]

doi 10.1103/PhysRevLett.125.140604

Learning entropy production via neural networks

Authors: Dong-Kyum Kim, Youngkyoung Bae, Sangyun Lee, Hawoong Jeong

Abstract: This Letter presents a neural estimator for entropy production, or NEEP, that estimates entropy production (EP) from trajectories of relevant variables without detailed information on the system dynamics. For steady state, we rigorously prove that the estimator, which can be built up from different choices of deep neural networks, provides stochastic EP by optimizing the objective function proposed here. We verify the NEEP with the stochastic processes of the bead-spring and discrete flashing ratchet models, and also demonstrate that our method is applicable to high-dimensional data and can provide coarse-grained EP for Markov systems with unobservable states.

Submitted 11 September, 2020; v1 submitted 9 March, 2020; originally announced March 2020.

Comments: 6+8 pages, 4+8 figures

Journal ref: Phys. Rev. Lett. 125, 140604 (2020)

arXiv:2001.10631 [pdf, ps, other]

Sub-Gaussian Matrices on Sets: Optimal Tail Dependence and Applications

Authors: Halyun Jeong, Xiaowei Li, Yaniv Plan, Özgür Yılmaz

Abstract: Random linear mappings are widely used in modern signal processing, compressed sensing and machine learning. These mappings may be used to embed the data into a significantly lower dimension while at the same time preserving useful information. This is done by approximately preserving the distances between data points, which are assumed to belong to $\mathbb{R}^n$. Thus, the performance of these mappings is usually captured by how close they are to an isometry on the data. Gaussian linear mappings have been the object of much study, while the sub-Gaussian setting is not yet fully understood. In the latter case, the performance depends on the sub-Gaussian norm of the rows. In many applications, e.g., compressed sensing, this norm may be large, or even growing with dimension, and thus it is important to characterize this dependence. We study when a sub-Gaussian matrix can become a near isometry on a set, show that previous best known dependence on the sub-Gaussian norm was sub-optimal, and present the optimal dependence. Our result not only answers a remaining question posed by Liaw, Mehrabian, Plan and Vershynin in 2017, but also generalizes their work. We also develop a new Bernstein type inequality for sub-exponential random variables, and a new Hanson-Wright inequality for quadratic forms of sub-Gaussian random variables, in both cases improving the bounds in the sub-Gaussian regime under moment constraints. Finally, we illustrate popular applications such as Johnson-Lindenstrauss embeddings, null space property for 0-1 matrices, randomized sketches and blind demodulation, whose theoretical guarantees can be improved by our results (in the sub-Gaussian case).

Submitted 20 January, 2021; v1 submitted 28 January, 2020; originally announced January 2020.

arXiv:1912.05827 [pdf, other]

An Efficient Explorative Sampling Considering the Generative Boundaries of Deep Generative Neural Networks

Authors: Giyoung Jeon, Haedong Jeong, Jaesik Choi

Abstract: Deep generative neural networks (DGNNs) have achieved realistic and high-quality data generation. In particular, the adversarial training scheme has been applied to many DGNNs and has exhibited powerful performance. Despite recent advances in generative networks, identifying the image generation mechanism still remains challenging. In this paper, we present an explorative sampling algorithm to analyze the generation mechanism of DGNNs. Our method efficiently obtains samples with identical attributes from a query image from the perspective of the trained model. We define generative boundaries which determine the activation of nodes in the internal layer and probe inside the model with this information. To handle a large number of boundaries, we obtain the essential set of boundaries using optimization. By gathering samples within the region surrounded by generative boundaries, we can empirically reveal the characteristics of the internal layers of DGNNs. We also demonstrate that our algorithm can find more homogeneous, model-specific samples compared to the variations of the ε-based sampling method.

Submitted 12 December, 2019; originally announced December 2019.

Comments: AAAI 2020

arXiv:1910.10754 [pdf, other]

Learning Q-network for Active Information Acquisition

Authors: Heejin Jeong, Brent Schlotfeldt, Hamed Hassani, Manfred Morari, Daniel D. Lee, George J. Pappas

Abstract: In this paper, we propose a novel Reinforcement Learning approach for solving the Active Information Acquisition problem, which requires an agent to choose a sequence of actions in order to acquire information about a process of interest using on-board sensors. The classic challenges in the information acquisition problem are the dependence of a planning algorithm on known models and the difficulty of computing information-theoretic cost functions over arbitrary distributions. In contrast, the proposed framework of reinforcement learning does not require any knowledge on models and alleviates the problems during an extended training stage. It results in policies that are efficient to execute online and applicable for real-time control of robotic systems. Furthermore, the state-of-the-art planning methods are typically restricted to short horizons, which may become problematic with local minima. Reinforcement learning naturally handles the issue of planning horizon in information problems as it maximizes a discounted sum of rewards over a long finite or infinite time horizon. We discuss the potential benefits of the proposed framework and compare the performance of the novel algorithm to an existing information acquisition method for multi-target tracking scenarios.

Submitted 23 October, 2019; originally announced October 2019.

Comments: IROS 2019, Video https://youtu.be/0ZFyOWJ2ulo

Journal ref: IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2019

arXiv:physics/0702148 [pdf, ps, other]

doi 10.1140/epjb/e2007-00033-7

Reliability of rank order in sampled networks

Authors: Pan-Jun Kim, Hawoong Jeong

Abstract: In complex scale-free networks, ranking the individual nodes based upon their importance has useful applications, such as the identification of hubs for epidemic control, or bottlenecks for controlling traffic congestion. However, in most real situations, only limited sub-structures of entire networks are available, and therefore the reliability of the order relationships in sampled networks requires investigation. With a set of randomly sampled nodes from the underlying original networks, we rank individual nodes by three centrality measures: degree, betweenness, and closeness. The higher-ranking nodes from the sampled networks provide a relatively better characterisation of their ranks in the original networks than the lower-ranking nodes. A closeness-based order relationship is more reliable than any other quantity, due to the global nature of the closeness measure. In addition, we show that if access to hubs is limited during the sampling process, an increase in the sampling fraction can in fact decrease the sampling accuracy. Finally, an estimation method for assessing sampling accuracy is suggested.

Submitted 16 February, 2007; originally announced February 2007.

Journal ref: Eur. Phys. J. B 55, 109-114 (2007)

arXiv:cond-mat/0505232 [pdf, ps, other]

doi 10.1103/PhysRevE.73.016102

Statistical properties of sampled networks

Authors: Sang Hoon Lee, Pan-Jun Kim, Hawoong Jeong

Abstract: We study the statistical properties of the sampled scale-free networks, deeply related to the proper identification of various real-world networks. We exploit three methods of sampling and investigate the topological properties such as degree and betweenness centrality distribution, average path length, assortativity, and clustering coefficient of sampled networks compared with those of original networks. It is found that the quantities related to those properties in sampled networks appear to be estimated quite differently for each sampling method. We explain why such a biased estimation of quantities would emerge from the sampling procedure and give appropriate criteria for each sampling method to prevent the quantities from being overestimated or underestimated.

Submitted 24 November, 2009; v1 submitted 10 May, 2005; originally announced May 2005.

Comments: 8 pages, 11 figures

Journal ref: Phys. Rev. E 73, 016102 (2006)

Showing 1–27 of 27 results for author: Jeong, H