-
Mixture of partially linear experts
Authors:
Yeongsan Hwang,
Byungtae Seo,
Sangkon Oh
Abstract:
In the mixture of experts model, a common assumption is the linearity between a response variable and covariates. While this assumption has theoretical and computational benefits, it may lead to suboptimal estimates by overlooking potential nonlinear relationships among the variables. To address this limitation, we propose a partially linear structure that incorporates unspecified functions to capture nonlinear relationships. We establish the identifiability of the proposed model under mild conditions and introduce a practical estimation algorithm. We present the performance of our approach through numerical studies, including simulations and real data analysis.
Submitted 5 May, 2024;
originally announced May 2024.
-
Adaptive Accelerated Failure Time modeling with a Semiparametric Skewed Error Distribution
Authors:
Sangkon Oh,
Hyunjae Lee,
Sangwook Kang,
Byungtae Seo
Abstract:
The accelerated failure time (AFT) model is widely used to analyze relationships between variables in the presence of censored observations. However, this model relies on some assumptions such as the error distribution, which can lead to biased or inefficient estimates if these assumptions are violated. In order to overcome this challenge, we propose a novel approach that incorporates a semiparametric skew-normal scale mixture distribution for the error term in the AFT model. By allowing for more flexibility and robustness, this approach reduces the risk of misspecification and improves the accuracy of parameter estimation. We investigate the identifiability and consistency of the proposed model and develop a practical estimation algorithm. To evaluate the performance of our approach, we conduct extensive simulation studies and real data analyses. The results demonstrate the effectiveness of our method in providing robust and accurate estimates in various scenarios.
Submitted 3 February, 2024;
originally announced February 2024.
-
A2I Transformer: Permutation-equivariant attention network for pairwise and many-body interactions with minimal featurization
Authors:
Ji Woong Yu,
Min Young Ha,
Bumjoon Seo,
Won Bo Lee
Abstract:
The combination of neural network potential (NNP) with molecular simulations plays an important role in an efficient and thorough understanding of a molecular system's potential energy surface (PES). However, grasping the interplay between input features and their local contribution to NNP is increasingly elusive due to heavy featurization. In this work, we suggest an end-to-end model which directly predicts per-atom energy from the coordinates of particles, avoiding expert-guided featurization of the network input. Employing self-attention as the main workhorse, our model is intrinsically equivariant under the permutation operation, resulting in the invariance of the total potential energy. We tested our model against several challenges in molecular simulation problems, including periodic boundary condition (PBC), $n$-body interaction, and binary composition. Our model yielded stable predictions in all tested systems with errors significantly smaller than the potential energy fluctuation acquired from molecular dynamics simulations. Thus, our work provides a minimal baseline model that encodes complex interactions in a condensed phase system to facilitate the data-driven analysis of physicochemical systems.
Submitted 27 October, 2021;
originally announced October 2021.
-
Mixture of Linear Models Co-supervised by Deep Neural Networks
Authors:
Beomseok Seo,
Lin Lin,
Jia Li
Abstract:
Deep neural network (DNN) models have achieved phenomenal success for applications in many domains, ranging from academic research in science and engineering to industry and business. The modeling power of DNN is believed to have come from the complexity and over-parameterization of the model, which on the other hand has been criticized for the lack of interpretation. Although certainly not true for every application, in some applications, especially in economics, social science, healthcare industry, and administrative decision making, scientists or practitioners are resistant to using predictions made by a black-box system for multiple reasons. One reason is that a major purpose of a study can be to make discoveries based upon the prediction function, e.g., to reveal the relationships between measurements. Another reason can be that the training dataset is not large enough to make researchers feel completely sure about a purely data-driven result. Being able to examine and interpret the prediction function will enable researchers to connect the result with existing knowledge or gain insights about new directions to explore. Although classic statistical models are much more explainable, their accuracy often falls considerably below that of DNN. In this paper, we propose an approach to fill the gap between relatively simple explainable models and DNN such that we can more flexibly tune the trade-off between interpretability and accuracy. Our main idea is a mixture of discriminative models that is trained with the guidance from a DNN. Although mixtures of discriminative models have been studied before, our way of generating the mixture is quite different.
Submitted 4 August, 2021;
originally announced August 2021.
-
I-BERT: Inductive Generalization of Transformer to Arbitrary Context Lengths
Authors:
Hyoungwook Nam,
Seung Byum Seo,
Vikram Sharma Mailthody,
Noor Michael,
Lan Li
Abstract:
Self-attention has emerged as a vital component of state-of-the-art sequence-to-sequence models for natural language processing in recent years, brought to the forefront by pre-trained bi-directional Transformer models. Its effectiveness is partly due to its non-sequential architecture, which promotes scalability and parallelism but limits the model to inputs of a bounded length. In particular, such architectures perform poorly on algorithmic tasks, where the model must learn a procedure which generalizes to input lengths unseen in training, a capability we refer to as inductive generalization. Identifying the computational limits of existing self-attention mechanisms, we propose I-BERT, a bi-directional Transformer that replaces positional encodings with a recurrent layer. The model inductively generalizes on a variety of algorithmic tasks where state-of-the-art Transformer models fail to do so. We also test our method on masked language modeling tasks where training and validation sets are partitioned to verify inductive generalization. Out of three algorithmic and two natural language inductive generalization tasks, I-BERT achieves state-of-the-art results on four tasks.
Submitted 19 June, 2020; v1 submitted 17 June, 2020;
originally announced June 2020.
-
Hyperbolic normal stochastic volatility model
Authors:
Jaehyuk Choi,
Chenru Liu,
Byoung Ki Seo
Abstract:
For option pricing models and heavy-tailed distributions, this study proposes a continuous-time stochastic volatility model based on an arithmetic Brownian motion: a one-parameter extension of the normal stochastic alpha-beta-rho (SABR) model. Using two generalized Bougerol's identities in the literature, the study shows that our model has a closed-form Monte-Carlo simulation scheme and that the transition probability for one special case follows Johnson's $S_U$ distribution, a popular heavy-tailed distribution originally proposed without a stochastic process. It is argued that the $S_U$ distribution serves as an analytically superior alternative to the normal SABR model because the two distributions are empirically similar.
Submitted 11 September, 2018;
originally announced September 2018.
-
False Positive Reduction by Actively Mining Negative Samples for Pulmonary Nodule Detection in Chest Radiographs
Authors:
Se** Park,
Woochan Hwang,
Kyu Hwan Jung,
Joon Beom Seo,
Namkug Kim
Abstract:
Generating large quantities of quality labeled data in medical imaging is very time-consuming and expensive. The performance of supervised algorithms for various imaging tasks has improved drastically over the years; however, the availability of data to train these algorithms has become one of the main bottlenecks for implementation. To address this, we propose a semi-supervised learning method where pseudo-negative labels from unlabeled data are used to further refine the performance of a pulmonary nodule detection network in chest radiographs. After training with the proposed network, the false positive rate was reduced to 0.1266 from 0.4864 while maintaining sensitivity at 0.89.
Submitted 26 July, 2018;
originally announced July 2018.
-
Cycle Consistent Adversarial Denoising Network for Multiphase Coronary CT Angiography
Authors:
Eunhee Kang,
Hyun Jung Koo,
Dong Hyun Yang,
Joon Bum Seo,
Jong Chul Ye
Abstract:
In coronary CT angiography, a series of CT images are taken at different levels of radiation dose during the examination. Although this reduces the total radiation dose, the image quality during the low-dose phases is significantly degraded. To address this problem, we propose a novel semi-supervised learning technique that can remove noise from the CT images obtained in the low-dose phases by learning from the CT images in the routine-dose phases. Although a supervised learning approach is not possible due to the differences in the underlying heart structure in the two phases, the images in the two phases are closely related, so we propose a cycle-consistent adversarial denoising network to learn the non-degenerate mapping between the low- and high-dose cardiac phases. Experimental results showed that the proposed method effectively reduces the noise in the low-dose CT image while preserving detailed texture and edge information. Moreover, thanks to the cyclic consistency and identity loss, the proposed network does not create any artificial features that are not present in the input images. Visual grading and quality evaluation also confirm that the proposed method provides significant improvement in diagnostic quality.
Submitted 7 November, 2018; v1 submitted 25 June, 2018;
originally announced June 2018.