Showing 1–43 of 43 results for author: Hwang, S J

Searching in archive stat.
  1. arXiv:2407.00256  [pdf, other]

    cs.AI cs.CL cs.LG stat.ML

    One Prompt is not Enough: Automated Construction of a Mixture-of-Expert Prompts

    Authors: Ruochen Wang, Sohyun An, Minhao Cheng, Tianyi Zhou, Sung Ju Hwang, Cho-Jui Hsieh

    Abstract: Large Language Models (LLMs) exhibit strong generalization capabilities to novel tasks when prompted with language instructions and in-context demos. Since this ability sensitively depends on the quality of prompts, various methods have been explored to automate the instruction design. While these methods demonstrated promising results, they also restricted the searched prompt to one instruction.…

    Submitted 28 June, 2024; originally announced July 2024.

    Comments: ICML 2024. code available at https://github.com/ruocwang/mixture-of-prompts

    MSC Class: 68T01

    Journal ref: Proceedings of the 41st International Conference on Machine Learning (ICML), Vienna, Austria, 2024

  2. arXiv:2310.07216  [pdf, other]

    cs.LG stat.ML

    Generative Modeling on Manifolds Through Mixture of Riemannian Diffusion Processes

    Authors: Jaehyeong Jo, Sung Ju Hwang

    Abstract: Learning the distribution of data on Riemannian manifolds is crucial for modeling data from non-Euclidean space, which is required by many applications in diverse scientific fields. Yet, existing generative models on manifolds suffer from expensive divergence computation or rely on approximations of heat kernel. These limitations restrict their applicability to simple geometries and hinder scalabi…

    Submitted 2 June, 2024; v1 submitted 11 October, 2023; originally announced October 2023.

    Comments: ICML 2024

  3. arXiv:2208.12401  [pdf, other]

    cs.LG stat.ML

    Scalable Set Encoding with Universal Mini-Batch Consistency and Unbiased Full Set Gradient Approximation

    Authors: Jeffrey Willette, Seanie Lee, Bruno Andreis, Kenji Kawaguchi, Juho Lee, Sung Ju Hwang

    Abstract: Recent work on mini-batch consistency (MBC) for set functions has brought attention to the need for sequentially processing and aggregating chunks of a partitioned set while guaranteeing the same output for all partitions. However, existing constraints on MBC architectures lead to models with limited expressive power. Additionally, prior work has not addressed how to deal with large sets during tr…

    Submitted 8 June, 2023; v1 submitted 25 August, 2022; originally announced August 2022.

    Comments: ICML 2023

  4. arXiv:2207.01394  [pdf, other]

    cs.CV stat.ML

    BiTAT: Neural Network Binarization with Task-dependent Aggregated Transformation

    Authors: Geon Park, Jaehong Yoon, Haiyang Zhang, Xing Zhang, Sung Ju Hwang, Yonina C. Eldar

    Abstract: Neural network quantization aims to transform high-precision weights and activations of a given neural network into low-precision weights/activations for reduced memory usage and computation, while preserving the performance of the original model. However, extreme quantization (1-bit weight/1-bit activations) of compactly-designed backbone architectures (e.g., MobileNets) often used for edge-devic…

    Submitted 4 July, 2022; originally announced July 2022.

  5. arXiv:2110.06381  [pdf, other]

    stat.ML cs.LG

    Meta Learning Low Rank Covariance Factors for Energy-Based Deterministic Uncertainty

    Authors: Jeffrey Willette, Hae Beom Lee, Juho Lee, Sung Ju Hwang

    Abstract: Numerous recent works utilize bi-Lipschitz regularization of neural network layers to preserve relative distances between data instances in the feature spaces of each layer. This distance sensitivity with respect to the data aids in tasks such as uncertainty calibration and out-of-distribution (OOD) detection. In previous works, features extracted with a distance sensitive model are used to constr…

    Submitted 15 March, 2022; v1 submitted 12 October, 2021; originally announced October 2021.

  6. arXiv:2104.05600  [pdf, other]

    cs.LG cs.CV stat.ML

    PAC Bayesian Performance Guarantees for Deep (Stochastic) Networks in Medical Imaging

    Authors: Anthony Sicilia, Xingchen Zhao, Anastasia Sosnovskikh, Seong Jae Hwang

    Abstract: Application of deep neural networks to medical imaging tasks has in some sense become commonplace. Still, a "thorn in the side" of the deep learning movement is the argument that deep networks are prone to overfitting and are thus unable to generalize well when datasets are small (as is common in medical imaging tasks). One way to bolster confidence is to provide mathematical guarantees, or bounds…

    Submitted 8 July, 2021; v1 submitted 12 April, 2021; originally announced April 2021.

    Comments: MICCAI 2021

  7. arXiv:2008.02956  [pdf, other]

    cs.LG stat.ML

    Bootstrapping Neural Processes

    Authors: Juho Lee, Yoonho Lee, Jungtaek Kim, Eunho Yang, Sung Ju Hwang, Yee Whye Teh

    Abstract: Unlike in the traditional statistical modeling for which a user typically hand-specify a prior, Neural Processes (NPs) implicitly define a broad class of stochastic processes with neural networks. Given a data stream, NP learns a stochastic process that best describes the data. While this "data-driven" way of learning stochastic processes has proven to handle various types of data, NPs still rely…

    Submitted 27 October, 2020; v1 submitted 6 August, 2020; originally announced August 2020.

    Comments: Published in Thirty-fourth Conference on Neural Information Processing Systems (NeurIPS 2020). Code is available at https://github.com/juho-lee/bnp

  8. arXiv:2008.02953  [pdf, other]

    cs.LG stat.ML

    Neural Complexity Measures

    Authors: Yoonho Lee, Juho Lee, Sung Ju Hwang, Eunho Yang, Seungjin Choi

    Abstract: While various complexity measures for deep neural networks exist, specifying an appropriate measure capable of predicting and explaining generalization in deep networks has proven challenging. We propose Neural Complexity (NC), a meta-learning framework for predicting generalization. Our model learns a scalar complexity measure through interactions with many heterogeneous tasks in a data-driven wa…

    Submitted 23 October, 2020; v1 submitted 6 August, 2020; originally announced August 2020.

    Comments: Published in Thirty-fourth Conference on Neural Information Processing Systems (NeurIPS 2020). Code is available at https://github.com/yoonholee/neural-complexity

  9. arXiv:2007.12020  [pdf, other]

    cs.LG cs.AI stat.ML

    Few-shot Visual Reasoning with Meta-analogical Contrastive Learning

    Authors: Youngsung Kim, Jinwoo Shin, Eunho Yang, Sung Ju Hwang

    Abstract: While humans can solve a visual puzzle that requires logical reasoning by observing only few samples, it would require training over large amount of data for state-of-the-art deep reasoning models to obtain similar performance on the same task. In this work, we propose to solve such a few-shot (or low-shot) visual reasoning problem, by resorting to analogical reasoning, which is a unique human abi…

    Submitted 23 July, 2020; originally announced July 2020.

  10. arXiv:2007.11362  [pdf, other]

    cs.LG physics.comp-ph stat.ML

    Time-Reversal Symmetric ODE Network

    Authors: In Huh, Eunho Yang, Sung Ju Hwang, Jinwoo Shin

    Abstract: Time-reversal symmetry, which requires that the dynamics of a system should not change with the reversal of time axis, is a fundamental property that frequently holds in classical and quantum mechanics. In this paper, we propose a novel loss function that measures how well our ordinary differential equation (ODE) networks comply with this time-reversal symmetry; it is formally defined by the discr…

    Submitted 6 January, 2021; v1 submitted 22 July, 2020; originally announced July 2020.

    Comments: 15 pages; accepted to NeurIPS 2020; Code is available at https://github.com/inhuh/trs-oden; v3: references added, typo corrected

  11. arXiv:2007.08844  [pdf, other]

    cs.LG stat.ML

    Distribution Aligning Refinery of Pseudo-label for Imbalanced Semi-supervised Learning

    Authors: Jaehyung Kim, Youngbum Hur, Sejun Park, Eunho Yang, Sung Ju Hwang, Jinwoo Shin

    Abstract: While semi-supervised learning (SSL) has proven to be a promising way for leveraging unlabeled data when labeled data is scarce, the existing SSL algorithms typically assume that training class distributions are balanced. However, these SSL algorithms trained under imbalanced class distributions can severely suffer when generalizing to a balanced testing criterion, since they utilize biased pseudo…

    Submitted 13 September, 2021; v1 submitted 17 July, 2020; originally announced July 2020.

    Comments: 19 pages; NeurIPS 2020

  12. arXiv:2007.07358  [pdf, other]

    cs.LG stat.ML

    Learning to Sample with Local and Global Contexts in Experience Replay Buffer

    Authors: Youngmin Oh, Kimin Lee, Jinwoo Shin, Eunho Yang, Sung Ju Hwang

    Abstract: Experience replay, which enables the agents to remember and reuse experience from the past, has played a significant role in the success of off-policy reinforcement learning (RL). To utilize the experience replay efficiently, the existing sampling methods allow selecting out more meaningful experiences by imposing priorities on them based on certain metrics (e.g. TD-error). However, they may resul…

    Submitted 7 April, 2021; v1 submitted 14 July, 2020; originally announced July 2020.

  13. arXiv:2006.14222  [pdf, other]

    cs.LG stat.ML

    Set Based Stochastic Subsampling

    Authors: Bruno Andreis, Seanie Lee, A. Tuan Nguyen, Juho Lee, Eunho Yang, Sung Ju Hwang

    Abstract: Deep models are designed to operate on huge volumes of high dimensional data such as images. In order to reduce the volume of data these models must process, we propose a set-based two-stage end-to-end neural subsampling model that is jointly optimized with an arbitrary downstream task network (e.g. classifier). In the first stage, we efficiently subsample candidate elements usin…

    Submitted 30 May, 2022; v1 submitted 25 June, 2020; originally announced June 2020.

    Comments: 20 pages

  14. arXiv:2006.12777  [pdf, other]

    cs.LG stat.ML

    Clinical Risk Prediction with Temporal Probabilistic Asymmetric Multi-Task Learning

    Authors: A. Tuan Nguyen, Hyewon Jeong, Eunho Yang, Sung Ju Hwang

    Abstract: Although recent multi-task learning methods have shown to be effective in improving the generalization of deep neural networks, they should be used with caution for safety-critical applications, such as clinical risk prediction. This is because even if they achieve improved task-average performance, they may still yield degraded performance on individual tasks, which may be critical (e.g., predict…

    Submitted 18 February, 2021; v1 submitted 23 June, 2020; originally announced June 2020.

    Comments: AAAI 2021. The first two authors contributed equally to this work. 10 pages, 4 figures, 4 tables

  15. arXiv:2006.12139  [pdf, other]

    cs.LG stat.ML

    Rapid Structural Pruning of Neural Networks with Set-based Task-Adaptive Meta-Pruning

    Authors: Minyoung Song, Jaehong Yoon, Eunho Yang, Sung Ju Hwang

    Abstract: As deep neural networks are growing in size and being increasingly deployed to more resource-limited devices, there has been a recent surge of interest in network pruning methods, which aim to remove less important weights or activations of a given network. A common limitation of most existing pruning techniques, is that they require pre-training of the network at least once before pruning, and th…

    Submitted 22 June, 2020; originally announced June 2020.

  16. arXiv:2006.12135  [pdf, other]

    cs.LG cs.CR cs.CV stat.ML

    Learning to Generate Noise for Multi-Attack Robustness

    Authors: Divyam Madaan, Jinwoo Shin, Sung Ju Hwang

    Abstract: Adversarial learning has emerged as one of the successful techniques to circumvent the susceptibility of existing methods against adversarial perturbations. However, the majority of existing defense methods are tailored to defend against a single category of adversarial perturbation (e.g. $\ell_\infty$-attack). In safety-critical applications, this makes these methods extraneous as the attacker ca…

    Submitted 24 June, 2021; v1 submitted 22 June, 2020; originally announced June 2020.

    Comments: Accepted to ICML 2021. Code available at https://github.com/divyam3897/MNG_AC

  17. arXiv:2006.12097  [pdf, other]

    cs.LG stat.ML

    Federated Semi-Supervised Learning with Inter-Client Consistency & Disjoint Learning

    Authors: Wonyong Jeong, Jaehong Yoon, Eunho Yang, Sung Ju Hwang

    Abstract: While existing federated learning approaches mostly require that clients have fully-labeled data to train on, in realistic settings, data obtained at the client-side often comes without any accompanying labels. Such deficiency of labels may result from either high labeling cost, or difficulty of annotation due to the requirement of expert knowledge. Thus the private data at each client may be eith…

    Submitted 29 March, 2021; v1 submitted 22 June, 2020; originally announced June 2020.

    Journal ref: International Conference on Learning Representations (ICLR 2021), International Workshop on Federated Learning for User Privacy and Data Confidentiality in Conjunction with ICML 2020 (FL-ICML'20)

  18. arXiv:2006.07589  [pdf, other]

    cs.LG cs.CV stat.ML

    Adversarial Self-Supervised Contrastive Learning

    Authors: Minseon Kim, Jihoon Tack, Sung Ju Hwang

    Abstract: Existing adversarial learning approaches mostly use class labels to generate adversarial samples that lead to incorrect predictions, which are then used to augment the training of the model for improved robustness. While some recent works propose semi-supervised adversarial learning methods that utilize unlabeled data, they still require class labels. However, do we really need class labels at all…

    Submitted 26 October, 2020; v1 submitted 13 June, 2020; originally announced June 2020.

    Comments: NeurIPS 2020. Code: https://github.com/Kim-Minseon/RoCL

  19. arXiv:2006.07540  [pdf, other]

    cs.LG stat.ML

    MetaPerturb: Transferable Regularizer for Heterogeneous Tasks and Architectures

    Authors: Jeongun Ryu, Jaewoong Shin, Hae Beom Lee, Sung Ju Hwang

    Abstract: Regularization and transfer learning are two popular techniques to enhance generalization on unseen data, which is a fundamental problem of machine learning. Regularization techniques are versatile, as they are task- and architecture-agnostic, but they do not exploit a large amount of data available. Transfer learning methods learn to transfer knowledge from one domain to another, but may not gene…

    Submitted 15 February, 2022; v1 submitted 12 June, 2020; originally announced June 2020.

    Report number: Advances in Neural Information Processing Systems 33 (NeurIPS 2020)

  20. arXiv:2006.06648  [pdf, other]

    cs.LG stat.ML

    Learning to Extrapolate Knowledge: Transductive Few-shot Out-of-Graph Link Prediction

    Authors: Jinheon Baek, Dong Bok Lee, Sung Ju Hwang

    Abstract: Many practical graph problems, such as knowledge graph construction and drug-drug interaction prediction, require to handle multi-relational graphs. However, handling real-world multi-relational graphs with Graph Neural Networks (GNNs) is often challenging due to their evolving nature, as new entities (nodes) can emerge over time. Moreover, newly emerged entities often have few links, which makes…

    Submitted 29 October, 2020; v1 submitted 11 June, 2020; originally announced June 2020.

    Comments: NeurIPS 2020

  21. arXiv:2006.05419  [pdf, other]

    cs.LG cs.HC stat.ML

    Cost-effective Interactive Attention Learning with Neural Attention Processes

    Authors: Jay Heo, Junhyeon Park, Hyewon Jeong, Kwang Joon Kim, Juho Lee, Eunho Yang, Sung Ju Hwang

    Abstract: We propose a novel interactive learning framework which we refer to as Interactive Attention Learning (IAL), in which the human supervisors interactively manipulate the allocated attentions, to correct the model's behavior by updating the attention-generating network. However, such a model is prone to overfitting due to scarcity of human annotations, and requires costly retraining. Moreover, it is…

    Submitted 9 June, 2020; originally announced June 2020.

  22. arXiv:2004.02863  [pdf, other]

    eess.AS cs.LG cs.SD stat.ML

    Meta-Learning for Short Utterance Speaker Recognition with Imbalance Length Pairs

    Authors: Seong Min Kye, Youngmoon Jung, Hae Beom Lee, Sung Ju Hwang, Hoirin Kim

    Abstract: In practical settings, a speaker recognition system needs to identify a speaker given a short utterance, while the enrollment utterance may be relatively long. However, existing speaker recognition models perform poorly with such short utterances. To solve this problem, we introduce a meta-learning framework for imbalance length pairs. Specifically, we use a Prototypical Networks and train it with…

    Submitted 10 August, 2020; v1 submitted 6 April, 2020; originally announced April 2020.

    Comments: Accepted to Interspeech 2020. The codes are available at https://github.com/seongmin-kye/meta-SR

  23. arXiv:2003.03196  [pdf, other]

    cs.LG stat.ML

    Federated Continual Learning with Weighted Inter-client Transfer

    Authors: Jaehong Yoon, Wonyong Jeong, Giwoong Lee, Eunho Yang, Sung Ju Hwang

    Abstract: There has been a surge of interest in continual learning and federated learning, both of which are important in deep neural networks in real-world scenarios. Yet little research has been done regarding the scenario where each client learns on a sequence of tasks from a private local data stream. This problem of federated continual learning poses new challenges to continual learning, such as utiliz…

    Submitted 14 June, 2021; v1 submitted 6 March, 2020; originally announced March 2020.

    Comments: ICML 2021

  24. arXiv:2002.12017  [pdf, other]

    cs.LG cs.CV stat.ML

    Meta-Learned Confidence for Few-shot Learning

    Authors: Seong Min Kye, Hae Beom Lee, Hoirin Kim, Sung Ju Hwang

    Abstract: Transductive inference is an effective means of tackling the data deficiency problem in few-shot learning settings. A popular transductive inference technique for few-shot metric-based approaches, is to update the prototype of each class with the mean of the most confident query examples, or confidence-weighted average of all the query samples. However, a caveat here is that the model confidence m…

    Submitted 24 June, 2020; v1 submitted 27 February, 2020; originally announced February 2020.

  25. arXiv:1910.05872  [pdf, other]

    cs.LG cs.CV stat.ML

    Self-supervised Label Augmentation via Input Transformations

    Authors: Hankook Lee, Sung Ju Hwang, Jinwoo Shin

    Abstract: Self-supervised learning, which learns by constructing artificial labels given only the input signals, has recently gained considerable attention for learning representations with unlabeled datasets, i.e., learning without any human-annotated supervision. In this paper, we show that such a technique can be used to significantly improve the model accuracy even under fully-labeled datasets. Our sche…

    Submitted 29 June, 2020; v1 submitted 13 October, 2019; originally announced October 2019.

    Comments: Accepted to ICML 2020. Code available at https://github.com/hankook/SLA

  26. arXiv:1909.04311  [pdf, other]

    cs.LG stat.ML

    Learning to Disentangle Robust and Vulnerable Features for Adversarial Detection

    Authors: Byunggill Joe, Sung Ju Hwang, Insik Shin

    Abstract: Although deep neural networks have shown promising performances on various tasks, even achieving human-level performance on some, they are shown to be susceptible to incorrect predictions even with imperceptibly small perturbations to an input. There exists a large number of previous works which proposed to defend against such adversarial attacks either by robust inference or detection of adversar…

    Submitted 10 September, 2019; originally announced September 2019.

    Comments: main: 10 pages; appendix: 5 pages

  27. arXiv:1908.04355  [pdf, other]

    cs.LG cs.CR cs.CV cs.NE stat.ML

    Adversarial Neural Pruning with Latent Vulnerability Suppression

    Authors: Divyam Madaan, Jinwoo Shin, Sung Ju Hwang

    Abstract: Despite the remarkable performance of deep neural networks on various computer vision tasks, they are known to be susceptible to adversarial perturbations, which makes it challenging to deploy them in real-world safety-critical applications. In this paper, we conjecture that the leading cause of adversarial vulnerability is the distortion in the latent feature space, and provide methods to suppres…

    Submitted 2 July, 2020; v1 submitted 12 August, 2019; originally announced August 2019.

    Comments: Accepted to ICML 2020. Code available at https://github.com/divyam3897/ANP_VS

  28. arXiv:1908.01457  [pdf, other]

    cs.LG stat.ML

    Learning to Generalize to Unseen Tasks with Bilevel Optimization

    Authors: Hayeon Lee, Donghyun Na, Hae Beom Lee, Sung Ju Hwang

    Abstract: Recent metric-based meta-learning approaches, which learn a metric space that generalizes well over combinatorial number of different classification tasks sampled from a task distribution, have been shown to be effective for few-shot classification tasks of unseen classes. They are often trained with episodic training where they iteratively train a common metric space that reduces distance between…

    Submitted 5 August, 2019; originally announced August 2019.

    Comments: 9 pages, 3 figures

  29. arXiv:1906.03118  [pdf, other]

    cs.LG stat.ML

    Reliable Estimation of Individual Treatment Effect with Causal Information Bottleneck

    Authors: Sungyub Kim, Yongsu Baek, Sung Ju Hwang, Eunho Yang

    Abstract: Estimating individual level treatment effects (ITE) from observational data is a challenging and important area in causal machine learning and is commonly considered in diverse mission-critical applications. In this paper, we propose an information theoretic approach in order to find more reliable representations for estimating ITE. We leverage the Information Bottleneck (IB) principle, which addr…

    Submitted 7 June, 2019; originally announced June 2019.

  30. arXiv:1906.00150  [pdf, other]

    cs.LG stat.ML

    Why Not to Use Zero Imputation? Correcting Sparsity Bias in Training Neural Networks

    Authors: Joonyoung Yi, Juhyuk Lee, Kwang Joon Kim, Sung Ju Hwang, Eunho Yang

    Abstract: Handling missing data is one of the most fundamental problems in machine learning. Among many approaches, the simplest and most intuitive way is zero imputation, which treats the value of a missing entry simply as zero. However, many studies have experimentally confirmed that zero imputation results in suboptimal performances in training neural networks. Yet, none of the existing work has explaine…

    Submitted 6 February, 2020; v1 submitted 1 June, 2019; originally announced June 2019.

    Comments: 27 pages

    Journal ref: Proceedings of International Conference on Learning Representations (ICLR) 2020

  31. arXiv:1905.12917  [pdf, other]

    cs.LG stat.ML

    Learning to Balance: Bayesian Meta-Learning for Imbalanced and Out-of-distribution Tasks

    Authors: Hae Beom Lee, Hayeon Lee, Donghyun Na, Saehoon Kim, Minseop Park, Eunho Yang, Sung Ju Hwang

    Abstract: While tasks could come with varying the number of instances and classes in realistic settings, the existing meta-learning approaches for few-shot classification assume that the number of instances per task and class is fixed. Due to such restriction, they learn to equally utilize the meta-knowledge across all the tasks, even when the number of instances per task and class largely varies. Moreover,…

    Submitted 12 February, 2022; v1 submitted 30 May, 2019; originally announced May 2019.

  32. arXiv:1905.12914  [pdf, other]

    cs.LG stat.ML

    Meta Dropout: Learning to Perturb Features for Generalization

    Authors: Hae Beom Lee, Taewook Nam, Eunho Yang, Sung Ju Hwang

    Abstract: A machine learning model that generalizes well should obtain low errors on unseen test examples. Thus, if we know how to optimally perturb training examples to account for test examples, we may achieve better generalization performance. However, obtaining such perturbation is not possible in standard machine learning frameworks as the distribution of the test data is unknown. To tackle this challe…

    Submitted 12 February, 2022; v1 submitted 30 May, 2019; originally announced May 2019.

  33. arXiv:1905.05901  [pdf, other]

    cs.LG stat.ML

    Learning What and Where to Transfer

    Authors: Yunhun Jang, Hankook Lee, Sung Ju Hwang, Jinwoo Shin

    Abstract: As the application of deep learning has expanded to real-world problems with insufficient volume of training data, transfer learning recently has gained much attention as means of improving the performance in such small-data regime. However, when existing methods are applied between heterogeneous architectures and tasks, it becomes more important to manage their detailed configurations and often r…

    Submitted 14 May, 2019; originally announced May 2019.

    Comments: Accepted to ICML 2019

  34. arXiv:1903.06164  [pdf, other]

    cs.LG cs.CL stat.ML

    Episodic Memory Reader: Learning What to Remember for Question Answering from Streaming Data

    Authors: Moonsu Han, Minki Kang, Hyunwoo Jung, Sung Ju Hwang

    Abstract: We consider a novel question answering (QA) task where the machine needs to read from large streaming data (long documents or videos) without knowing when the questions will be given, which is difficult to solve with existing QA methods due to their lack of scalability. To tackle this problem, we propose a novel end-to-end deep network model for reading comprehension, which we refer to as Episodic…

    Submitted 9 June, 2019; v1 submitted 14 March, 2019; originally announced March 2019.

    Comments: 18 pages, 20 figures, 1 table

  35. arXiv:1902.09432  [pdf, other]

    cs.LG stat.ML

    Scalable and Order-robust Continual Learning with Additive Parameter Decomposition

    Authors: Jaehong Yoon, Saehoon Kim, Eunho Yang, Sung Ju Hwang

    Abstract: While recent continual learning methods largely alleviate the catastrophic problem on toy-sized datasets, some issues remain to be tackled to apply them to real-world problem domains. First, a continual learning model should effectively handle catastrophic forgetting and be efficient to train even with a large number of tasks. Secondly, it needs to tackle the problem of order-sensitivity, where th…

    Submitted 15 February, 2020; v1 submitted 25 February, 2019; originally announced February 2019.

    Comments: Published in "International Conference on Learning Representations (ICLR)" 2020

    ACM Class: I.2.6; I.2.10

  36. arXiv:1810.07368  [pdf, other]

    cs.LG stat.ML

    Learning the Compositional Spaces for Generalized Zero-shot Learning

    Authors: Hanze Dong, Yanwei Fu, Sung Ju Hwang, Leonid Sigal, Xiangyang Xue

    Abstract: This paper studies the problem of Generalized Zero-shot Learning (G-ZSL), whose goal is to classify instances belonging to both seen and unseen classes at the test time. We propose a novel space decomposition method to solve G-ZSL. Some previous models with space decomposition operations only calibrate the confident prediction of source classes (W-SVM [46]) or take target-class instances as outlie…

    Submitted 30 August, 2021; v1 submitted 16 October, 2018; originally announced October 2018.

    Comments: 10 pages, 2 figures

  37. arXiv:1806.01551  [pdf, other]

    stat.ML cs.LG

    Deep Mixed Effect Model using Gaussian Processes: A Personalized and Reliable Prediction for Healthcare

    Authors: Ingyo Chung, Saehoon Kim, Juho Lee, Kwang Joon Kim, Sung Ju Hwang, Eunho Yang

    Abstract: We present a personalized and reliable prediction model for healthcare, which can provide individually tailored medical services such as diagnosis, disease treatment, and prevention. Our proposed framework targets at making personalized and reliable predictions from time-series data, such as Electronic Health Records (EHR), by modeling two complementary components: i) a shared component that captu…

    Submitted 24 November, 2019; v1 submitted 5 June, 2018; originally announced June 2018.

    Comments: AAAI 2020

  38. arXiv:1805.10896  [pdf, other]

    stat.ML cs.LG

    Adaptive Network Sparsification with Dependent Variational Beta-Bernoulli Dropout

    Authors: Juho Lee, Saehoon Kim, Jaehong Yoon, Hae Beom Lee, Eunho Yang, Sung Ju Hwang

    Abstract: While variational dropout approaches have been shown to be effective for network sparsification, they are still suboptimal in the sense that they set the dropout rate for each neuron without consideration of the input data. With such input-independent dropout, each neuron is evolved to be generic across inputs, which makes it difficult to sparsify networks without accuracy loss. To overcome this l…

    Submitted 3 March, 2019; v1 submitted 28 May, 2018; originally announced May 2018.

  39. arXiv:1805.10002  [pdf, other]

    cs.LG cs.CV cs.NE stat.ML

    Learning to Propagate Labels: Transductive Propagation Network for Few-shot Learning

    Authors: Yanbin Liu, Juho Lee, Minseop Park, Saehoon Kim, Eunho Yang, Sung Ju Hwang, Yi Yang

    Abstract: The goal of few-shot learning is to learn a classifier that generalizes well even when trained with a limited number of training instances per class. The recently introduced meta-learning approaches tackle this problem by learning a generic classifier across a large number of multiclass classification tasks and generalizing the model to a new task. Yet, even with such meta-learning, the low-data p…

    Submitted 8 February, 2019; v1 submitted 25 May, 2018; originally announced May 2018.

    Comments: Accepted in ICLR 2019; code available at https://github.com/csyanbin/TPN

    MSC Class: 68T10; 97R40

    ACM Class: I.5.1; I.2.6

  40. arXiv:1805.09653  [pdf, other]

    stat.ML cs.AI cs.LG

    Uncertainty-Aware Attention for Reliable Interpretation and Prediction

    Authors: Jay Heo, Hae Beom Lee, Saehoon Kim, Juho Lee, Kwang Joon Kim, Eunho Yang, Sung Ju Hwang

    Abstract: Attention mechanism is effective in both focusing the deep learning models on relevant features and interpreting them. However, attentions may be unreliable since the networks that generate them are often trained in a weakly-supervised manner. To overcome this limitation, we introduce the notion of input-dependent uncertainty to the attention mechanism, such that it generates attention for each fe…

    Submitted 24 May, 2018; originally announced May 2018.

  41. arXiv:1804.07351  [pdf, ps, other]

    cs.LG cs.CV stat.ML

    Sampling-free Uncertainty Estimation in Gated Recurrent Units with Exponential Families

    Authors: Seong Jae Hwang, Ronak Mehta, Hyunwoo J. Kim, Vikas Singh

    Abstract: There has recently been a concerted effort to derive mechanisms in vision and machine learning systems to offer uncertainty estimates of the predictions they make. Clearly, there are enormous benefits to a system that is not only accurate but also has a sense for when it is not sure. Existing proposals center around Bayesian interpretations of modern deep architectures -- these are effective but c…

    Submitted 2 September, 2018; v1 submitted 19 April, 2018; originally announced April 2018.

    Comments: Version 2

  42. arXiv:1709.05964  [pdf, other]

    cs.LG stat.ML

    Why Pay More When You Can Pay Less: A Joint Learning Framework for Active Feature Acquisition and Classification

    Authors: Hajin Shim, Sung Ju Hwang, Eunho Yang

    Abstract: We consider the problem of active feature acquisition, where we sequentially select the subset of features in order to achieve the maximum prediction performance in the most cost-effective way. In this work, we formulate this active feature acquisition problem as a reinforcement learning problem, and provide a novel framework for jointly learning both the RL agent and the classifier (environment).…

    Submitted 18 September, 2017; originally announced September 2017.

  43. arXiv:1708.00260  [pdf, other]

    cs.LG stat.ML

    Deep Asymmetric Multi-task Feature Learning

    Authors: Hae Beom Lee, Eunho Yang, Sung Ju Hwang

    Abstract: We propose Deep Asymmetric Multitask Feature Learning (Deep-AMTFL) which can learn deep representations shared across multiple tasks while effectively preventing negative transfer that may happen in the feature sharing process. Specifically, we introduce an asymmetric autoencoder term that allows reliable predictors for the easy tasks to have high contribution to the feature learning while suppres…

    Submitted 30 June, 2018; v1 submitted 1 August, 2017; originally announced August 2017.