Search | arXiv e-print repository

Mixture of partially linear experts

Authors: Yeongsan Hwang, Byungtae Seo, Sangkon Oh

Abstract: In the mixture of experts model, a common assumption is linearity between the response variable and the covariates. While this assumption has theoretical and computational benefits, it may lead to suboptimal estimates by overlooking potential nonlinear relationships among the variables. To address this limitation, we propose a partially linear structure that incorporates unspecified functions to capture nonlinear relationships. We establish the identifiability of the proposed model under mild conditions and introduce a practical estimation algorithm. We demonstrate the performance of our approach through numerical studies, including simulations and real data analysis.
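For orientation, one common way to write a partially linear mixture of experts is shown below. This is an assumed generic form, not necessarily the authors' specification: Gaussian experts, a softmax (multinomial logit) gate, linear effects of covariates $\mathbf{x}$, and unspecified smooth functions $g_k$ of further covariates $\mathbf{z}$. The symbols $K$, $\pi_k$, $\boldsymbol\alpha_k$, $\boldsymbol\beta_k$, $g_k$ and $\sigma_k^2$ are illustrative.

$$f(y \mid \mathbf{x}, \mathbf{z}) = \sum_{k=1}^{K} \pi_k(\mathbf{x})\, \phi\!\left(y;\ \mathbf{x}^\top \boldsymbol\beta_k + g_k(\mathbf{z}),\ \sigma_k^2\right), \qquad \pi_k(\mathbf{x}) = \frac{\exp(\mathbf{x}^\top \boldsymbol\alpha_k)}{\sum_{j=1}^{K} \exp(\mathbf{x}^\top \boldsymbol\alpha_j)}$$

Setting every $g_k \equiv 0$ recovers the usual mixture of linear experts, i.e. the linearity assumption the abstract describes.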

Submitted 5 May, 2024; originally announced May 2024.

arXiv:2402.02128 [pdf, other]

Adaptive Accelerated Failure Time modeling with a Semiparametric Skewed Error Distribution

Authors: Sangkon Oh, Hyunjae Lee, Sangwook Kang, Byungtae Seo

Abstract: The accelerated failure time (AFT) model is widely used to analyze relationships between variables in the presence of censored observations. However, this model relies on assumptions about the error distribution, which can lead to biased or inefficient estimates if these assumptions are violated. To overcome this challenge, we propose a novel approach that incorporates a semiparametric skew-normal scale mixture distribution for the error term in the AFT model. By allowing for more flexibility and robustness, this approach reduces the risk of misspecification and improves the accuracy of parameter estimation. We investigate the identifiability and consistency of the proposed model and develop a practical estimation algorithm. To evaluate the performance of our approach, we conduct extensive simulation studies and real data analyses. The results demonstrate the effectiveness of our method in providing robust and accurate estimates in various scenarios.
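A minimal sketch of a censored AFT log-likelihood, assuming a plain standard skew-normal error with shape `alpha` in place of the paper's semiparametric skew-normal scale mixture (whose exact form the abstract does not give). The model is taken as $\log T = \mathbf{x}^\top\boldsymbol\beta + \sigma\varepsilon$ with right censoring; all function names are hypothetical. The skew-normal CDF uses the identity $F(z;\alpha) = \Phi(z) - 2T(z,\alpha)$, with Owen's $T$ function approximated by Simpson's rule.

```python
import math

def norm_cdf(u):
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))

def owens_t(h, a, n=200):
    # Owen's T(h, a) = (1 / 2pi) * integral_0^a exp(-h^2 (1 + x^2) / 2) / (1 + x^2) dx,
    # approximated with composite Simpson's rule (n must be even).
    f = lambda x: math.exp(-0.5 * h * h * (1.0 + x * x)) / (1.0 + x * x)
    step = a / n
    total = f(0.0) + f(a)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * f(i * step)
    return total * step / 3.0 / (2.0 * math.pi)

def sn_logpdf(z, alpha):
    # Standard skew-normal log density: log(2 * phi(z) * Phi(alpha * z)).
    # May raise for extreme alpha * z where Phi underflows to 0; fine for a sketch.
    return (math.log(2.0) - 0.5 * z * z - 0.5 * math.log(2.0 * math.pi)
            + math.log(norm_cdf(alpha * z)))

def sn_cdf(z, alpha):
    # Skew-normal CDF via Owen's T: F(z; alpha) = Phi(z) - 2 T(z, alpha).
    return norm_cdf(z) - 2.0 * owens_t(z, alpha)

def aft_loglik(beta, sigma, alpha, X, log_t, event):
    """Log-likelihood of log T = x'beta + sigma * eps, eps ~ SN(0, 1, alpha).

    event[i] is 1 for an observed failure and 0 for right censoring. The
    Jacobian term -log(t_i) is omitted since it does not involve parameters.
    """
    ll = 0.0
    for x_i, y_i, d_i in zip(X, log_t, event):
        z = (y_i - sum(b * v for b, v in zip(beta, x_i))) / sigma
        if d_i:
            ll += sn_logpdf(z, alpha) - math.log(sigma)  # density contribution
        else:
            ll += math.log(1.0 - sn_cdf(z, alpha))        # survival contribution
    return ll
```

With `alpha = 0` the error is standard normal and the function reduces to the log-normal AFT likelihood, which is a convenient sanity check before fitting a skewed error.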

Submitted 3 February, 2024; originally announced February 2024.

arXiv:2110.14374 [pdf, other]

A2I Transformer: Permutation-equivariant attention network for pairwise and many-body interactions with minimal featurization

Authors: Ji Woong Yu, Min Young Ha, Bumjoon Seo, Won Bo Lee

Abstract: The combination of neural network potentials (NNPs) with molecular simulations plays an important role in efficiently and thoroughly understanding a molecular system's potential energy surface (PES). However, grasping the interplay between input features and their local contributions to an NNP is becoming increasingly elusive due to heavy featurization. In this work, we suggest an end-to-end model that directly predicts per-atom energy from the coordinates of particles, avoiding expert-guided featurization of the network input. Employing self-attention as the main workhorse, our model is intrinsically equivariant under permutation, which makes the total potential energy invariant. We tested our model against several challenges in molecular simulation problems, including periodic boundary conditions (PBC), $n$-body interactions, and binary composition. Our model yielded stable predictions in all tested systems, with errors significantly smaller than the potential energy fluctuations acquired from molecular dynamics simulations. Thus, our work provides a minimal baseline model that encodes complex interactions in a condensed-phase system to facilitate the data-driven analysis of physicochemical systems.
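The permutation property claimed above can be checked with a toy single-head self-attention layer that has no positional encoding: permuting the atoms permutes the per-atom outputs, so their sum (a stand-in for the total potential energy) is unchanged. This is a pure-Python illustration with assumed random weights and hypothetical per-atom feature vectors, not the A2I architecture, which the abstract does not detail.

```python
import math
import random

def matvec(W, v):
    # Multiply a matrix (list of rows) by a vector.
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def self_attention(H, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention, no positional encoding."""
    Q = [matvec(Wq, h) for h in H]
    K = [matvec(Wk, h) for h in H]
    V = [matvec(Wv, h) for h in H]
    scale = math.sqrt(len(Q[0]))
    out = []
    for q in Q:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in K]
        m = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        out.append([sum(w * v[c] for w, v in zip(weights, V)) / total
                    for c in range(len(V[0]))])
    return out

def energies(H, params):
    """Return (total energy, per-atom energies) from a linear readout."""
    Wq, Wk, Wv, w_out = params
    per_atom = [sum(a * b for a, b in zip(w_out, o))
                for o in self_attention(H, Wq, Wk, Wv)]
    return sum(per_atom), per_atom

if __name__ == "__main__":
    rng = random.Random(0)
    rand_mat = lambda n: [[rng.gauss(0, 1) for _ in range(n)] for _ in range(n)]
    params = (rand_mat(3), rand_mat(3), rand_mat(3), [rng.gauss(0, 1) for _ in range(3)])
    H = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(5)]  # 5 hypothetical atoms
    perm = [3, 0, 4, 1, 2]
    E, _ = energies(H, params)
    E_perm, _ = energies([H[i] for i in perm], params)
    print(E, E_perm)  # equal up to floating-point rounding
```

The invariance comes entirely from the set-like structure of attention: each output row depends only on its own query and the unordered collection of keys and values.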

Submitted 27 October, 2021; originally announced October 2021.

arXiv:2108.04035 [pdf, other]

Mixture of Linear Models Co-supervised by Deep Neural Networks

Authors: Beomseok Seo, Lin Lin, Jia Li

Abstract: Deep neural network (DNN) models have achieved phenomenal success for applications in many domains, ranging from academic research in science and engineering to industry and business. The modeling power of DNNs is believed to come from the complexity and over-parameterization of the model, which, on the other hand, has been criticized for a lack of interpretability. Although certainly not true for every application, in some applications, especially in economics, social science, the healthcare industry, and administrative decision making, scientists or practitioners are reluctant to use predictions made by a black-box system for multiple reasons. One reason is that a major purpose of a study can be to make discoveries based upon the prediction function, e.g., to reveal the relationships between measurements. Another reason can be that the training dataset is not large enough to make researchers feel completely sure about a purely data-driven result. Being able to examine and interpret the prediction function will enable researchers to connect the result with existing knowledge or gain insights about new directions to explore. Although classic statistical models are much more explainable, their accuracy often falls considerably below that of DNNs. In this paper, we propose an approach to fill the gap between relatively simple explainable models and DNNs such that we can more flexibly tune the trade-off between interpretability and accuracy. Our main idea is a mixture of discriminative models that is trained with guidance from a DNN. Although mixtures of discriminative models have been studied before, our way of generating the mixture is quite different.

Submitted 4 August, 2021; originally announced August 2021.

Comments: Submitted to the Journal of Computational and Graphical Statistics on April 19, 2021

arXiv:2006.10220 [pdf, other]

I-BERT: Inductive Generalization of Transformer to Arbitrary Context Lengths

Authors: Hyoungwook Nam, Seung Byum Seo, Vikram Sharma Mailthody, Noor Michael, Lan Li

Abstract: Self-attention has emerged as a vital component of state-of-the-art sequence-to-sequence models for natural language processing in recent years, brought to the forefront by pre-trained bi-directional Transformer models. Its effectiveness is partly due to its non-sequential architecture, which promotes scalability and parallelism but limits the model to inputs of a bounded length. In particular, such architectures perform poorly on algorithmic tasks, where the model must learn a procedure that generalizes to input lengths unseen in training, a capability we refer to as inductive generalization. Identifying the computational limits of existing self-attention mechanisms, we propose I-BERT, a bi-directional Transformer that replaces positional encodings with a recurrent layer. The model inductively generalizes on a variety of algorithmic tasks where state-of-the-art Transformer models fail to do so. We also test our method on masked language modeling tasks where training and validation sets are partitioned to verify inductive generalization. Out of three algorithmic and two natural language inductive generalization tasks, I-BERT achieves state-of-the-art results on four.

Submitted 19 June, 2020; v1 submitted 17 June, 2020; originally announced June 2020.

Comments: Submitted to NeurIPS 2020

arXiv:1809.04035 [pdf, other]

doi 10.1002/fut.21967

Hyperbolic normal stochastic volatility model

Authors: Jaehyuk Choi, Chenru Liu, Byoung Ki Seo

Abstract: For option pricing models and heavy-tailed distributions, this study proposes a continuous-time stochastic volatility model based on an arithmetic Brownian motion: a one-parameter extension of the normal stochastic alpha-beta-rho (SABR) model. Using two generalizations of Bougerol's identity from the literature, the study shows that our model has a closed-form Monte Carlo simulation scheme and that the transition probability for one special case follows Johnson's $S_U$ distribution, a popular heavy-tailed distribution originally proposed without a stochastic process. It is argued that the $S_U$ distribution serves as an analytically superior alternative to the normal SABR model because the two distributions are empirically similar.
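Johnson's $S_U$ distribution is defined by a sinh transform of a standard normal variable, which is what makes exact simulation and a closed-form CDF straightforward. The sketch below uses the textbook parameterization $Z = \gamma + \delta \sinh^{-1}((X-\xi)/\lambda) \sim N(0,1)$; the names `gamma`, `delta`, `xi`, `lam` follow that convention, and the mapping from the paper's SABR-type parameters to them is not given in the abstract and is not attempted here.

```python
import math
import random

def su_sample(gamma, delta, xi, lam, rng=random):
    # If Z ~ N(0, 1), then X = xi + lam * sinh((Z - gamma) / delta) is Johnson S_U.
    z = rng.gauss(0.0, 1.0)
    return xi + lam * math.sinh((z - gamma) / delta)

def su_cdf(x, gamma, delta, xi, lam):
    # Closed-form CDF: Phi(gamma + delta * asinh((x - xi) / lam)).
    u = gamma + delta * math.asinh((x - xi) / lam)
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))

if __name__ == "__main__":
    rng = random.Random(1)
    params = (0.5, 1.5, 0.1, 2.0)  # illustrative gamma, delta, xi, lam
    draws = [su_sample(*params, rng=rng) for _ in range(20000)]
    x0 = 1.0
    empirical = sum(d <= x0 for d in draws) / len(draws)
    print(empirical, su_cdf(x0, *params))  # should agree to about two decimals
```

Because sampling needs only one normal draw and one `sinh`, no discretization of a stochastic differential equation is involved, which is the practical appeal of a closed-form scheme.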

Submitted 11 September, 2018; originally announced September 2018.

Comments: 26 pages, 4 figures, 5 tables

Journal ref: Journal of Futures Markets, 39(2):186-204, 2019

arXiv:1807.10756 [pdf, other]

False Positive Reduction by Actively Mining Negative Samples for Pulmonary Nodule Detection in Chest Radiographs

Authors: Se** Park, Woochan Hwang, Kyu Hwan Jung, Joon Beom Seo, Namkug Kim

Abstract: Generating large quantities of high-quality labeled data in medical imaging is very time-consuming and expensive. The performance of supervised algorithms for various imaging tasks has improved drastically over the years; however, the availability of data to train these algorithms has become one of the main bottlenecks for implementation. To address this, we propose a semi-supervised learning method in which pseudo-negative labels from unlabeled data are used to further refine the performance of a pulmonary nodule detection network in chest radiographs. After training with the proposed method, the false positive rate was reduced from 0.4864 to 0.1266 while sensitivity was maintained at 0.89.
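As a quick reading of the reported rates (simple arithmetic on the two numbers in the abstract, which does not define the false positive rate further), the relative reduction is:

$$\frac{0.4864 - 0.1266}{0.4864} = \frac{0.3598}{0.4864} \approx 0.740, \qquad \frac{0.1266}{0.4864} \approx 0.260$$

That is, the false positive rate falls to roughly 26% of its original value (about 74% lower) at the stated sensitivity of 0.89.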

Submitted 26 July, 2018; originally announced July 2018.

Comments: Presented at the 2nd SIIM C-MIMI (SIIM Conference on Machine Intelligence in Medical Imaging)

arXiv:1806.09748 [pdf, other]

doi 10.1002/mp.13284

Cycle Consistent Adversarial Denoising Network for Multiphase Coronary CT Angiography

Authors: Eunhee Kang, Hyun Jung Koo, Dong Hyun Yang, Joon Bum Seo, Jong Chul Ye

Abstract: In coronary CT angiography, a series of CT images is taken at different radiation dose levels during the examination. Although this reduces the total radiation dose, the image quality during the low-dose phases is significantly degraded. To address this problem, we propose a novel semi-supervised learning technique that removes noise from CT images obtained in the low-dose phases by learning from the CT images in the routine-dose phases. Although a supervised learning approach is not possible because the underlying heart structure differs between the two phases, the images in the two phases are closely related, so we propose a cycle-consistent adversarial denoising network to learn the non-degenerate mapping between the low- and high-dose cardiac phases. Experimental results showed that the proposed method effectively reduces the noise in the low-dose CT images while preserving detailed texture and edge information. Moreover, thanks to the cycle-consistency and identity losses, the proposed network does not create any artificial features that are not present in the input images. Visual grading and quality evaluation also confirm that the proposed method provides a significant improvement in diagnostic quality.
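The abstract attributes the absence of artificial features to cycle consistency and an identity loss. For reference, the standard CycleGAN-style form of those two terms is given below, with $G$ mapping low-dose images $x$ to routine-dose images $y$ and $F$ mapping back. The $\ell_1$ norm and the weights $\lambda$ and $\gamma$ are assumptions, since the abstract does not state the exact losses.

$$\mathcal{L}_{\text{cyc}}(G,F) = \mathbb{E}_{x}\big[\lVert F(G(x)) - x \rVert_1\big] + \mathbb{E}_{y}\big[\lVert G(F(y)) - y \rVert_1\big]$$

$$\mathcal{L}_{\text{id}}(G,F) = \mathbb{E}_{y}\big[\lVert G(y) - y \rVert_1\big] + \mathbb{E}_{x}\big[\lVert F(x) - x \rVert_1\big]$$

$$\mathcal{L} = \mathcal{L}_{\text{GAN}}(G, D_Y) + \mathcal{L}_{\text{GAN}}(F, D_X) + \lambda\, \mathcal{L}_{\text{cyc}}(G,F) + \gamma\, \mathcal{L}_{\text{id}}(G,F)$$

The identity term penalizes $G$ for altering an image that is already routine-dose, which discourages the generator from introducing structures that are not in its input.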

Submitted 7 November, 2018; v1 submitted 25 June, 2018; originally announced June 2018.

Comments: This work has been accepted for publication in Medical Physics

Showing 1–8 of 8 results for author: Seo, B