Search | arXiv e-print repository

A Variational Approach for Modeling High-dimensional Spatial Generalized Linear Mixed Models

Abstract: Gaussian and discrete non-Gaussian spatial datasets are prevalent across many fields such as public health, ecology, geosciences, and social sciences. Bayesian spatial generalized linear mixed models (SGLMMs) are a flexible class of models designed for these data, but SGLMMs do not scale well, even to moderately large datasets. State-of-the-art scalable SGLMMs (i.e., basis representations or sparse covariance/precision matrices) require posterior sampling via Markov chain Monte Carlo (MCMC), which can be prohibitive for large datasets. While variational Bayes (VB) methods have been extended to SGLMMs, their focus has primarily been on smaller spatial datasets. In this study, we propose two computationally efficient VB approaches for modeling moderate-sized and massive (millions of locations) Gaussian and discrete non-Gaussian spatial data. Our scalable VB method embeds semi-parametric approximations for the latent spatial random processes and parallel computing offered by modern high-performance computing systems. Our approaches deliver nearly identical inferential and predictive performance compared to 'gold standard' methods but achieve computational speedups of up to 1000x. We demonstrate our approaches through a comparative numerical study as well as applications to two real-world datasets. Our proposed VB methodology enables practitioners to model millions of non-Gaussian spatial observations using a standard laptop within a short timeframe.

Submitted 17 March, 2024; v1 submitted 23 February, 2024; originally announced February 2024.

Comments: 34 Pages for the main paper, 72 pages for the supplemental information, 4 tables, 5 figures

arXiv:2401.00104 [pdf, other]

Causal State Distillation for Explainable Reinforcement Learning

Authors: Wenhao Lu, Xufeng Zhao, Thilo Fryen, Jae Hee Lee, Mengdi Li, Sven Magg, Stefan Wermter

Abstract: Reinforcement learning (RL) is a powerful technique for training intelligent agents, but understanding why these agents make specific decisions can be quite challenging. This lack of transparency in RL models has been a long-standing problem, making it difficult for users to grasp the reasons behind an agent's behaviour. Various approaches have been explored to address this problem, with one promising avenue being reward decomposition (RD). RD is appealing as it sidesteps some of the concerns associated with other methods that attempt to rationalize an agent's behaviour in a post-hoc manner. RD works by exposing various facets of the rewards that contribute to the agent's objectives during training. However, RD alone has limitations as it primarily offers insights based on sub-rewards and does not delve into the intricate cause-and-effect relationships that occur within an RL agent's neural model. In this paper, we present an extension of RD that goes beyond sub-rewards to provide more informative explanations. Our approach is centred on a causal learning framework that leverages information-theoretic measures for explanation objectives that encourage three crucial properties of causal factors: causal sufficiency, sparseness, and orthogonality. These properties help us distill the cause-and-effect relationships between the agent's states and actions or rewards, allowing for a deeper understanding of its decision-making processes. Our framework is designed to generate local explanations and can be applied to a wide range of RL tasks with multiple reward channels. Through a series of experiments, we demonstrate that our approach offers more meaningful and insightful explanations for the agent's action selections.

Submitted 1 April, 2024; v1 submitted 29 December, 2023; originally announced January 2024.

Comments: https://lukaswill.github.io/; Accepted as oral by CLeaR 2024

arXiv:2311.10792 [pdf]

Enhancing Data Efficiency and Feature Identification for Lithium-Ion Battery Lifespan Prediction by Deciphering Interpretation of Temporal Patterns and Cyclic Variability Using Attention-Based Models

Authors: Jaewook Lee, Seongmin Heo, Jay H. Lee

Abstract: Accurately predicting the lifespan of lithium-ion batteries is crucial for optimizing operational strategies and mitigating risks. While numerous studies have aimed at predicting battery lifespan, few have examined the interpretability of their models or how such insights could improve predictions. Addressing this gap, we introduce three innovative models that integrate shallow attention layers into a foundational model from our previous work, which combined elements of recurrent and convolutional neural networks. Utilizing a well-known public dataset, we showcase our methodology's effectiveness. Temporal attention is applied to identify critical timesteps and highlight differences among test cell batches, particularly underscoring the significance of the "rest" phase. Furthermore, by applying cyclic attention via self-attention to context vectors, our approach effectively identifies key cycles, enabling us to strategically decrease the input size for quicker predictions. Employing both single- and multi-head attention mechanisms, we have systematically minimized the required input from 100 to 50 and then to 30 cycles, refining this process based on cyclic attention scores. Our refined model exhibits strong regression capabilities, accurately forecasting the initiation of rapid capacity fade with an average deviation of only 58 cycles by analyzing just the initial 30 cycles of easily accessible input data.

Submitted 11 April, 2024; v1 submitted 16 November, 2023; originally announced November 2023.

arXiv:2211.13866 [pdf, ps, other]

Minimal Width for Universal Property of Deep RNN

Authors: Chang hoon Song, Geonho Hwang, Jun ho Lee, Myungjoo Kang

Abstract: A recurrent neural network (RNN) is a widely used deep-learning network for dealing with sequential data. Imitating a dynamical system, an infinite-width RNN can approximate any open dynamical system in a compact domain. In general, deep networks with bounded widths are more effective than wide networks in practice; however, the universal approximation theorem for deep narrow structures has yet to be extensively studied. In this study, we prove the universality of deep narrow RNNs and show that the upper bound of the minimum width for universality can be independent of the length of the data. Specifically, we show that a deep RNN with ReLU activation can approximate any continuous function or $L^p$ function with the widths $d_x+d_y+2$ and $\max\{d_x+1,d_y\}$, respectively, where the target function maps a finite sequence of vectors in $\mathbb{R}^{d_x}$ to a finite sequence of vectors in $\mathbb{R}^{d_y}$. We also compute the additional width required if the activation function is $\tanh$ or more. In addition, we prove the universality of other recurrent networks, such as bidirectional RNNs. Bridging a multi-layer perceptron and an RNN, our theory and proof technique can be an initial step toward further research on deep RNNs.

Submitted 28 March, 2023; v1 submitted 24 November, 2022; originally announced November 2022.

arXiv:2101.11568 [pdf, other]

doi 10.1016/j.jeconom.2022.11.006

Predictive Quantile Regression with Mixed Roots and Increasing Dimensions: The ALQR Approach

Authors: Rui Fan, Ji Hyung Lee, Youngki Shin

Abstract: In this paper we propose the adaptive lasso for predictive quantile regression (ALQR). Reflecting empirical findings, we allow predictors to have various degrees of persistence and exhibit different signal strengths. The number of predictors is allowed to grow with the sample size. We study regularity conditions under which stationary, local unit root, and cointegrated predictors are present simultaneously. We next show the convergence rates, model selection consistency, and asymptotic distributions of ALQR. We apply the proposed method to the out-of-sample quantile prediction problem of stock returns and find that it outperforms the existing alternatives. We also provide numerical evidence from additional Monte Carlo experiments, supporting the theoretical results.

Submitted 3 December, 2022; v1 submitted 27 January, 2021; originally announced January 2021.

Comments: 71 pages, 5 figures, 18 tables

Journal ref: Journal of Econometrics, Vol 237, No 2, Part C, Article 105372, 2023

arXiv:2011.09707 [pdf, other]

Application of Deep Learning-based Interpolation Methods to Nearshore Bathymetry

Authors: Yizhou Qian, Mojtaba Forghani, Jonghyun Harry Lee, Matthew Farthing, Tyler Hesser, Peter Kitanidis, Eric Darve

Abstract: Nearshore bathymetry, the topography of the ocean floor in coastal zones, is vital for predicting the surf zone hydrodynamics and for route planning to avoid subsurface features. Hence, it is increasingly important for a wide variety of applications, including shipping operations, coastal management, and risk assessment. However, direct high resolution surveys of nearshore bathymetry are rarely performed due to budget constraints and logistical restrictions. Another option when only sparse observations are available is to use Gaussian Process regression (GPR), also called Kriging. But GPR has difficulties recognizing patterns with sharp gradients, like those found around sand bars and submerged objects, especially when observations are sparse. In this work, we present several deep learning-based techniques to estimate nearshore bathymetry with sparse, multi-scale measurements. We propose a Deep Neural Network (DNN) to compute posterior estimates of the nearshore bathymetry, as well as a conditional Generative Adversarial Network (cGAN) that samples from the posterior distribution. We train our neural networks based on synthetic data generated from nearshore surveys provided by the U.S. Army Corps of Engineers Field Research Facility (FRF) in Duck, North Carolina. We compare our methods with Kriging on real surveys as well as surveys with artificially added sharp gradients. Results show that direct estimation by DNN gives better predictions than Kriging in this application. We use bootstrapping with DNN for uncertainty quantification. We also propose a method, named DNN-Kriging, that combines deep learning with Kriging and shows further improvement of the posterior estimates.

Submitted 19 November, 2020; originally announced November 2020.

arXiv:2010.16040 [pdf, other]

Deep Hurdle Networks for Zero-Inflated Multi-Target Regression: Application to Multiple Species Abundance Estimation

Authors: Shufeng Kong, Junwen Bai, Jae Hee Lee, Di Chen, Andrew Allyn, Michelle Stuart, Malin Pinsky, Katherine Mills, Carla P. Gomes

Abstract: A key problem in computational sustainability is to understand the distribution of species across landscapes over time. This question gives rise to challenging large-scale prediction problems since (i) hundreds of species have to be simultaneously modeled and (ii) the survey data are usually inflated with zeros due to the absence of species for a large number of sites. The problem of tackling both issues simultaneously, which we refer to as the zero-inflated multi-target regression problem, has not been addressed by previous methods in statistics and machine learning. In this paper, we propose a novel deep model for the zero-inflated multi-target regression problem. To this end, we first model the joint distribution of multiple response variables as a multivariate probit model and then couple the positive outcomes with a multivariate log-normal distribution. By penalizing the difference between the two distributions' covariance matrices, a link between both distributions is established. The whole model is cast as an end-to-end learning framework and we provide an efficient learning algorithm for our model that can be fully implemented on GPUs. We show that our model outperforms the existing state-of-the-art baselines on two challenging real-world species distribution datasets concerning bird and fish populations.

Submitted 29 October, 2020; originally announced October 2020.

Comments: Accepted by IJCAI 2020

arXiv:2009.11253 [pdf, other]

Fuzzy Simplicial Networks: A Topology-Inspired Model to Improve Task Generalization in Few-shot Learning

Authors: Henry Kvinge, Zachary New, Nico Courts, Jung H. Lee, Lauren A. Phillips, Courtney D. Corley, Aaron Tuor, Andrew Avila, Nathan O. Hodas

Abstract: Deep learning has shown great success in settings with massive amounts of data but has struggled when data is limited. Few-shot learning algorithms, which seek to address this limitation, are designed to generalize well to new tasks with limited data. Typically, models are evaluated on unseen classes and datasets that are defined by the same fundamental task as they are trained for (e.g. category membership). One can also ask how well a model can generalize to fundamentally different tasks within a fixed dataset (for example: moving from category membership to tasks that involve detecting object orientation or quantity). To formalize this kind of shift we define a notion of "independence of tasks" and identify three new sets of labels for established computer vision datasets that test a model's ability to generalize to tasks which draw on orthogonal attributes in the data. We use these datasets to investigate the failure modes of metric-based few-shot models. Based on our findings, we introduce a new few-shot model called Fuzzy Simplicial Networks (FSN) which leverages a construction from topology to more flexibly represent each class from limited data. In particular, FSN models can not only form multiple representations for a given class but can also begin to capture the low-dimensional structure which characterizes class manifolds in the encoded space of deep networks. We show that FSN outperforms state-of-the-art models on the challenging tasks we introduce in this paper while remaining competitive on standard few-shot benchmarks.

Submitted 23 September, 2020; originally announced September 2020.

Comments: 17 pages

arXiv:2007.00873 [pdf, other]

Compressed Sensing via Measurement-Conditional Generative Models

Authors: Kyung-Su Kim, Jung Hyun Lee, Eunho Yang

Abstract: A pre-trained generator has been frequently adopted in compressed sensing (CS) due to its ability to effectively estimate signals with the prior of NNs. In order to further refine the NN-based prior, we propose a framework that allows the generator to utilize additional information from a given measurement for prior learning, thereby yielding more accurate prediction for signals. As our framework has a simple form, it is easily applied to existing CS methods using pre-trained generators. We demonstrate through extensive experiments that our framework exhibits uniformly superior performances by large margin and can reduce the reconstruction error up to an order of magnitude for some applications. We also explain the experimental success in theory by showing that our framework can slightly relax the stringent signal presence condition, which is required to guarantee the success of signal recovery.

Submitted 2 November, 2020; v1 submitted 2 July, 2020; originally announced July 2020.

arXiv:2003.03299 [pdf, other]

doi 10.1017/S0266466621000402

Complete Subset Averaging for Quantile Regressions

Authors: Ji Hyung Lee, Youngki Shin

Abstract: We propose a novel conditional quantile prediction method based on complete subset averaging (CSA) for quantile regressions. All models under consideration are potentially misspecified and the dimension of regressors goes to infinity as the sample size increases. Since we average over the complete subsets, the number of models is much larger than the usual model averaging method which adopts sophisticated weighting schemes. We propose to use an equal weight but select the proper size of the complete subset based on the leave-one-out cross-validation method. Building upon the theory of Lu and Su (2015), we investigate the large sample properties of CSA and show the asymptotic optimality in the sense of Li (1987). We check the finite sample performance via Monte Carlo simulations and empirical applications.

Submitted 12 July, 2021; v1 submitted 6 March, 2020; originally announced March 2020.

Comments: 46 pages, 3 figures, 9 tables

arXiv:1909.13360 [pdf, ps, other]

Library network, a possible path to explainable neural networks

Authors: Jung Hoon Lee

Abstract: Deep neural networks (DNNs) may outperform human brains in complex tasks, but the lack of transparency in their decision-making processes makes us question whether we could fully trust DNNs with high stakes problems. As DNNs' operations rely on a massive number of both parallel and sequential linear/nonlinear computations, predicting their mistakes is nearly impossible. Also, a line of studies suggests that DNNs can be easily deceived by adversarial attacks, indicating that their decisions can easily be corrupted by unexpected factors. Such vulnerability must be overcome if we intend to take advantage of DNNs' efficiency in high stakes problems. Here, we propose an algorithm that can help us better understand DNNs' decision-making processes. Our empirical evaluations suggest that this algorithm can effectively trace DNNs' decision processes from one layer to another and detect adversarial attacks.

Submitted 17 March, 2020; v1 submitted 29 September, 2019; originally announced September 2019.

Comments: 15 pages, 5 figures, 1 table

arXiv:1908.05006 [pdf, other]

doi 10.1007/s10618-020-00700-0

Visualizing Image Content to Explain Novel Image Discovery

Authors: Jake H. Lee, Kiri L. Wagstaff

Abstract: The initial analysis of any large data set can be divided into two phases: (1) the identification of common trends or patterns and (2) the identification of anomalies or outliers that deviate from those trends. We focus on the goal of detecting observations with novel content, which can alert us to artifacts in the data set or, potentially, the discovery of previously unknown phenomena. To aid in interpreting and diagnosing the novel aspect of these selected observations, we recommend the use of novelty detection methods that generate explanations. In the context of large image data sets, these explanations should highlight what aspect of a given image is new (color, shape, texture, content) in a human-comprehensible form. We propose DEMUD-VIS, the first method for providing visual explanations of novel image content by employing a convolutional neural network (CNN) to extract image features, a method that uses reconstruction error to detect novel content, and an up-convolutional network to convert CNN feature representations back into image space. We demonstrate this approach on diverse images from ImageNet, freshwater streams, and the surface of Mars.

Submitted 12 September, 2022; v1 submitted 14 August, 2019; originally announced August 2019.

Comments: Published in DMKD

Journal ref: Data Mining and Knowledge Discovery, vol. 34, pp. 1777-1804, 2020

arXiv:1907.10905 [pdf, other]

A Group-Theoretic Framework for Data Augmentation

Authors: Shuxiao Chen, Edgar Dobriban, Jane H Lee

Abstract: Data augmentation is a widely used trick when training deep neural networks: in addition to the original data, properly transformed data are also added to the training set. However, to the best of our knowledge, a clear mathematical framework to explain the performance benefits of data augmentation is not available. In this paper, we develop such a theoretical framework. We show data augmentation is equivalent to an averaging operation over the orbits of a certain group that keeps the data distribution approximately invariant. We prove that it leads to variance reduction. We study empirical risk minimization, and the examples of exponential families, linear regression, and certain two-layer neural networks. We also discuss how data augmentation could be used in problems with symmetry where other approaches are prevalent, such as in cryo-electron microscopy (cryo-EM).

Submitted 6 November, 2020; v1 submitted 25 July, 2019; originally announced July 2019.

Comments: To appear in Journal of Machine Learning Research

arXiv:1901.00188 [pdf, ps, other]

Complementary reinforcement learning towards explainable agents

Authors: Jung Hoon Lee

Abstract: Reinforcement learning (RL) algorithms allow agents to learn skills and strategies to perform complex tasks without detailed instructions or expensive labelled training examples. That is, RL agents can learn, as we learn. Given the importance of learning in our intelligence, RL has been thought to be one of the key components to general artificial intelligence, and recent breakthroughs in deep reinforcement learning suggest that neural networks (NN) are natural platforms for RL agents. However, despite the efficiency and versatility of NN-based RL agents, their decision-making remains incomprehensible, reducing their utilities. To deploy RL into a wider range of applications, it is imperative to develop explainable NN-based RL agents. Here, we propose a method to derive a secondary comprehensible agent from a NN-based RL agent, whose decision-makings are based on simple rules. Our empirical evaluation of this secondary agent's performance supports the possibility of building a comprehensible and transparent agent using a NN-based RL agent.

Submitted 23 January, 2019; v1 submitted 1 January, 2019; originally announced January 2019.

Comments: 14 pages, 5 figures

arXiv:1808.04447 [pdf, other]

Deep Learning Super-Resolution Enables Rapid Simultaneous Morphological and Quantitative Magnetic Resonance Imaging

Authors: Akshay Chaudhari, Zhongnan Fang, Jin Hyung Lee, Garry Gold, Brian Hargreaves

Abstract: Obtaining magnetic resonance images (MRI) with high resolution and generating quantitative image-based biomarkers for assessing tissue biochemistry is crucial in clinical and research applications. However, acquiring quantitative biomarkers requires high signal-to-noise ratio (SNR), which is at odds with high-resolution in MRI, especially in a single rapid sequence. In this paper, we demonstrate how super-resolution can be utilized to maintain adequate SNR for accurate quantification of the T2 relaxation time biomarker, while simultaneously generating high-resolution images. We compare the efficacy of resolution enhancement using metrics such as peak SNR and structural similarity. We assess accuracy of cartilage T2 relaxation times by comparing against a standard reference method. Our evaluation suggests that SR can successfully maintain high-resolution and generate accurate biomarkers for accelerating MRI scans and enhancing the value of clinical and research MRI.

Submitted 7 August, 2018; originally announced August 2018.

Comments: Accepted for the Machine Learning for Medical Image Reconstruction Workshop at MICCAI 2018

arXiv:1806.06253 [pdf]

DynMat, a network that can learn after learning

Authors: Jung H. Lee

Abstract: To survive in the dynamically-evolving world, we accumulate knowledge and improve our skills based on experience. In the process, gaining new knowledge does not disrupt our vigilance to external stimuli. In other words, our learning process is 'accumulative' and 'online' without interruption. However, despite the recent success, artificial neural networks (ANNs) must be trained offline, and they suffer catastrophic interference between old and new learning, indicating that ANNs' conventional learning algorithms may not be suitable for building intelligent agents comparable to our brain. In this study, we propose a novel neural network architecture (DynMat) consisting of dual learning systems, inspired by the complementary learning system (CLS) theory suggesting that the brain relies on short- and long-term learning systems to learn continuously. Our experiments show that 1) DynMat can learn a new class without catastrophic interference and 2) it does not strictly require offline training.

Submitted 2 February, 2019; v1 submitted 16 June, 2018; originally announced June 2018.

Comments: 40 pages and 9 figures

arXiv:0710.5837 [pdf, ps, other]

On estimating covariances between many assets with histories of highly variable length

Authors: Robert B. Gramacy, Joo Hee Lee, Ricardo Silva

Abstract: Quantitative portfolio allocation requires the accurate and tractable estimation of covariances between a large number of assets, whose histories can greatly vary in length. Such data are said to follow a monotone missingness pattern, under which the likelihood has a convenient factorization. Upon further assuming that asset returns are multivariate normally distributed, with histories at least as long as the total asset count, maximum likelihood (ML) estimates are easily obtained by performing repeated ordinary least squares (OLS) regressions, one for each asset. Things get more interesting when there are more assets than historical returns. OLS becomes unstable due to rank-deficient design matrices, which is called a "big p small n" problem. We explore remedies that involve making a change of basis, as in principal components or partial least squares regression, or by applying shrinkage methods like ridge regression or the lasso. This enables the estimation of covariances between large sets of assets with histories of essentially arbitrary length, and offers improvements in accuracy and interpretation. We further extend the method by showing how external factors can be incorporated. This allows for the adaptive use of factors without the restrictive assumptions common in factor models. Our methods are demonstrated on randomly generated data, and then benchmarked by the performance of balanced portfolios using real historical financial returns. An accompanying R package called monomvn, containing code implementing the estimators described herein, has been made freely available on CRAN.

Submitted 24 February, 2009; v1 submitted 31 October, 2007; originally announced October 2007.

Comments: 39 pages, 5 figures, 2 tables, submitted to JCGS

Showing 1–17 of 17 results for author: Lee, J H