-
Finite-Time Frequentist Regret Bounds of Multi-Agent Thompson Sampling on Sparse Hypergraphs
Authors:
Tianyuan Jin,
Hao-Lun Hsu,
William Chang,
Pan Xu
Abstract:
We study the multi-agent multi-armed bandit (MAMAB) problem, where $m$ agents are factored into $ρ$ overlapping groups. Each group represents a hyperedge, forming a hypergraph over the agents. At each round of interaction, the learner pulls a joint arm (composed of individual arms for each agent) and receives a reward according to the hypergraph structure. Specifically, we assume there is a local reward for each hyperedge, and the reward of the joint arm is the sum of these local rewards. Previous work introduced the multi-agent Thompson sampling (MATS) algorithm \citep{verstraeten2020multiagent} and derived a Bayesian regret bound. However, it remains an open problem how to derive a frequentist regret bound for Thompson sampling in this multi-agent setting. To address this problem, we propose an efficient variant of MATS, the $ε$-exploring Multi-Agent Thompson Sampling ($ε$-MATS) algorithm, which performs MATS exploration with probability $ε$ and adopts a greedy policy otherwise. We prove that $ε$-MATS achieves a worst-case frequentist regret bound that is sublinear in both the time horizon and the local arm size. We also derive a lower bound for this setting, which implies that our frequentist regret upper bound is optimal up to constant and logarithmic terms when the hypergraph is sufficiently sparse. Thorough experiments on standard MAMAB problems demonstrate the superior performance and the improved computational efficiency of $ε$-MATS compared with existing algorithms in the same setting.
Submitted 24 December, 2023;
originally announced December 2023.
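The ε-exploring rule described in the abstract above (Thompson sampling with probability ε, greedy otherwise) can be illustrated on a much simpler problem. The sketch below is not the ε-MATS algorithm: it assumes a single agent, Bernoulli rewards, and Beta(1, 1) priors, and it ignores the hypergraph structure entirely. The function names epsilon_ts_choose and run_bandit are hypothetical.

```python
import random


def epsilon_ts_choose(successes, failures, epsilon, rng):
    """Choose an arm: Thompson sampling w.p. epsilon, greedy otherwise.

    Assumption: Bernoulli rewards with a Beta(1, 1) prior on each arm.
    """
    n_arms = len(successes)
    if rng.random() < epsilon:
        # Exploration step: draw one sample per arm from its Beta posterior.
        scores = [
            rng.betavariate(successes[a] + 1, failures[a] + 1)
            for a in range(n_arms)
        ]
    else:
        # Greedy step: score each arm by its posterior mean.
        scores = [
            (successes[a] + 1) / (successes[a] + failures[a] + 2)
            for a in range(n_arms)
        ]
    return max(range(n_arms), key=lambda a: scores[a])


def run_bandit(true_means, horizon, epsilon, seed=0):
    """Simulate the rule on a toy bandit and return the pull count per arm."""
    rng = random.Random(seed)
    n_arms = len(true_means)
    successes = [0] * n_arms
    failures = [0] * n_arms
    pulls = [0] * n_arms
    for _ in range(horizon):
        arm = epsilon_ts_choose(successes, failures, epsilon, rng)
        reward = 1 if rng.random() < true_means[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        pulls[arm] += 1
    return pulls


if __name__ == "__main__":
    print(run_bandit([0.2, 0.5, 0.8], horizon=2000, epsilon=0.1))
```

Because posterior sampling happens only on a fraction ε of the rounds, the remaining rounds need no random draws from the posterior, which is one plausible reading of the computational saving the abstract mentions.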
-
Optimal Cooperative Multiplayer Learning Bandits with Noisy Rewards and No Communication
Authors:
William Chang,
Yuanhao Lu
Abstract:
We consider a cooperative multiplayer bandit learning problem where the players are only allowed to agree on a strategy beforehand, but cannot communicate during the learning process. In this problem, each player simultaneously selects an action. Based on the actions selected by all players, the team of players receives a reward. The actions of all the players are commonly observed. However, each player receives a noisy version of the reward which cannot be shared with other players. Since players receive potentially different rewards, there is an asymmetry in the information used to select their actions. In this paper, we provide an algorithm based on upper and lower confidence bounds that the players can use to select their optimal actions despite the asymmetry in the reward information. We show that this algorithm can achieve logarithmic $O(\frac{\log T}{Δ_{\bm{a}}})$ (gap-dependent) regret as well as $O(\sqrt{T\log T})$ (gap-independent) regret. This is asymptotically optimal in $T$. We also show that it performs empirically better than the current state of the art algorithm for this environment.
Submitted 10 November, 2023;
originally announced November 2023.
-
For SALE: State-Action Representation Learning for Deep Reinforcement Learning
Authors:
Scott Fujimoto,
Wei-Di Chang,
Edward J. Smith,
Shixiang Shane Gu,
Doina Precup,
David Meger
Abstract:
In the field of reinforcement learning (RL), representation learning is a proven tool for complex image-based tasks, but is often overlooked for environments with low-level states, such as physical control problems. This paper introduces SALE, a novel approach for learning embeddings that model the nuanced interaction between state and action, enabling effective representation learning from low-level states. We extensively study the design space of these embeddings and highlight important design considerations. We integrate SALE and an adaptation of checkpoints for RL into TD3 to form the TD7 algorithm, which significantly outperforms existing continuous control algorithms. On OpenAI gym benchmark tasks, TD7 has an average performance gain of 276.7% and 50.7% over TD3 at 300k and 5M time steps, respectively, and works in both the online and offline settings.
Submitted 5 November, 2023; v1 submitted 4 June, 2023;
originally announced June 2023.
-
No-Regret Online Reinforcement Learning with Adversarial Losses and Transitions
Authors:
Tiancheng Jin,
Junyan Liu,
Chloé Rouyer,
William Chang,
Chen-Yu Wei,
Haipeng Luo
Abstract:
Existing online learning algorithms for adversarial Markov Decision Processes achieve ${O}(\sqrt{T})$ regret after $T$ rounds of interactions even if the loss functions are chosen arbitrarily by an adversary, with the caveat that the transition function has to be fixed. This is because it has been shown that adversarial transition functions make no-regret learning impossible. Despite such impossibility results, in this work, we develop algorithms that can handle both adversarial losses and adversarial transitions, with regret increasing smoothly in the degree of maliciousness of the adversary. More concretely, we first propose an algorithm that enjoys $\widetilde{O}(\sqrt{T} + C^{\textsf{P}})$ regret, where $C^{\textsf{P}}$ measures how adversarial the transition functions are and can be at most ${O}(T)$. While this algorithm itself requires knowledge of $C^{\textsf{P}}$, we further develop a black-box reduction approach that removes this requirement. Moreover, we show that further refinements of the algorithm not only maintain the same regret bound, but also simultaneously adapt to easier environments (where losses are generated in a certain stochastically constrained manner as in Jin et al. [2021]) and achieve $\widetilde{O}(U + \sqrt{UC^{\textsf{L}}} + C^{\textsf{P}})$ regret, where $U$ is some standard gap-dependent coefficient and $C^{\textsf{L}}$ is the amount of corruption on losses.
Submitted 26 October, 2023; v1 submitted 27 May, 2023;
originally announced May 2023.
-
Fast Computer Model Calibration using Annealed and Transformed Variational Inference
Authors:
Dongkyu Derek Cho,
Won Chang,
Jaewoo Park
Abstract:
Computer models play a crucial role in numerous scientific and engineering domains. To ensure the accuracy of simulations, it is essential to properly calibrate the input parameters of these models through statistical inference. While Bayesian inference is the standard approach for this task, employing Markov Chain Monte Carlo methods often encounters computational hurdles due to the costly evaluation of likelihood functions and slow mixing rates. Although variational inference (VI) can be a fast alternative to traditional Bayesian approaches, VI has limited applicability due to boundary issues and local optima problems. To address these challenges, we propose flexible VI methods based on deep generative models that do not require parametric assumptions on the variational distribution. We embed a surjective transformation in our framework to avoid posterior truncation at the boundary. Additionally, we provide theoretical conditions that guarantee the success of the algorithm. Furthermore, our temperature annealing scheme can prevent being trapped in local optima through a series of intermediate posteriors. We apply our method to infectious disease models and a geophysical model, illustrating that the proposed method can provide fast and accurate inference compared to its competitors.
Submitted 5 March, 2024; v1 submitted 22 November, 2022;
originally announced November 2022.
-
A Bayesian Convolutional Neural Network-based Generalized Linear Model
Authors:
Yeseul Jeon,
Won Chang,
Seonghyun Jeong,
Sanghoon Han,
Jaewoo Park
Abstract:
Convolutional neural networks (CNNs) provide flexible function approximations for a wide variety of applications when the input variables are in the form of images or spatial data. Although CNNs often outperform traditional statistical models in prediction accuracy, statistical inference, such as estimating the effects of covariates and quantifying the prediction uncertainty, is not trivial due to the highly complicated model structure and overparameterization. To address this challenge, we propose a new Bayesian approach by embedding CNNs within the generalized linear models (GLMs) framework. We use extracted nodes from the last hidden layer of CNN with Monte Carlo (MC) dropout as informative covariates in GLM. This improves accuracy in prediction and regression coefficient inference, allowing for the interpretation of coefficients and uncertainty quantification. By fitting ensemble GLMs across multiple realizations from MC dropout, we can account for uncertainties in extracting the features. We apply our methods to biological and epidemiological problems, which have both high-dimensional correlated inputs and vector covariates. Specifically, we consider malaria incidence data, brain tumor image data, and fMRI data. By extracting information from correlated inputs, the proposed method can provide an interpretable Bayesian analysis. The algorithm can be broadly applicable to image regressions or correlated data analysis by enabling accurate Bayesian inference quickly.
Submitted 22 May, 2024; v1 submitted 17 October, 2022;
originally announced October 2022.
-
A Spatio-Temporal Dirichlet Process Mixture Model for Coronavirus Disease-19
Authors:
Jaewoo Park,
Seorim Yi,
Won Chang,
Jorge Mateu
Abstract:
Understanding the spatio-temporal patterns of the coronavirus disease 2019 (COVID-19) is essential to construct public health interventions. Spatially referenced data can provide richer opportunities to understand the mechanism of the disease spread compared to the more often encountered aggregated count data. We propose a spatio-temporal Dirichlet process mixture model to analyze confirmed cases of COVID-19 in an urban environment. Our method can detect unobserved cluster centers of the epidemics, and estimate the space-time range of the clusters that are useful to construct a warning system. Furthermore, our model can measure the impact of different types of landmarks in the city, which provides an intuitive explanation of disease spreading sources from different time points. To efficiently capture the temporal dynamics of the disease patterns, we employ a sequential approach that uses the posterior distribution of the parameters for the previous time step as the prior information for the current time step. This approach enables us to incorporate time dependence into our model in a computationally efficient manner without complicating the model structure. We also develop a model assessment by comparing the data with theoretical densities, and outline the goodness-of-fit of our fitted model.
Submitted 13 July, 2022;
originally announced July 2022.
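The sequential approach described in the abstract above, where the posterior from one time step becomes the prior for the next, can be shown with a toy conjugate model. The sketch below is not the authors' Dirichlet process mixture model: it assumes Poisson counts with a Gamma(shape, rate) prior on the rate, chosen only because the update has a closed form. The function names are hypothetical.

```python
def update_gamma_poisson(shape, rate, counts):
    """Conjugate update: Gamma(shape, rate) prior, Poisson-distributed counts."""
    return shape + sum(counts), rate + len(counts)


def sequential_posterior(shape, rate, counts_by_step):
    """Feed each time step's posterior in as the next step's prior."""
    history = []
    for counts in counts_by_step:
        shape, rate = update_gamma_poisson(shape, rate, counts)
        history.append((shape, rate))
    return history


if __name__ == "__main__":
    # Two time steps of (made-up) case counts, starting from a Gamma(1, 1) prior.
    print(sequential_posterior(1.0, 1.0, [[2, 3], [4]]))
```

In this conjugate toy case the final sequential posterior equals the posterior obtained from all the data at once, so the sequential scheme only saves computation. In a non-conjugate model such as the one in the abstract, the carried-over posterior would typically have to be approximated.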
-
Gradient-based Quadratic Multiform Separation
Authors:
Wen-Teng Chang
Abstract:
Classification, a supervised learning task, is an important topic in machine learning. It aims at categorizing a set of data into classes. Several classification methods are in common use, such as k-nearest neighbors, random forest, and support vector machine. Each of them has its own pros and cons, and none of them is best for all kinds of problems. In this thesis, we focus on Quadratic Multiform Separation (QMS), a classification method recently proposed by Michael Fan et al. (2019). Its fresh concept, rich mathematical structure, and innovative definition of loss function set it apart from the existing classification methods. Inspired by QMS, we propose utilizing a gradient-based optimization method, Adam, to obtain a classifier that minimizes the QMS-specific loss function. In addition, we provide suggestions regarding model tuning through explorations of the relationships between hyperparameters and accuracies. Our empirical results show that QMS performs as well as most classification methods in terms of accuracy. Its performance is almost comparable to that of the gradient boosting algorithms that win major machine learning competitions.
Submitted 26 October, 2021; v1 submitted 25 October, 2021;
originally announced October 2021.
-
Bayesian Model Calibration and Sensitivity Analysis for Oscillating Biological Experiments
Authors:
Youngdeok Hwang,
Hang J. Kim,
Won Chang,
Christian Hong,
Steven N. MacEachern
Abstract:
Understanding the oscillating behaviors that govern organisms' internal biological processes requires interdisciplinary efforts combining both biological and computer experiments, as the latter can complement the former by simulating perturbed conditions with higher resolution. Harmonizing the two types of experiment, however, poses significant statistical challenges due to identifiability issues, numerical instability, and ill behavior in high dimension. This article devises a new Bayesian calibration framework for oscillating biochemical models. The proposed Bayesian model is estimated relying on an advanced Markov chain Monte Carlo (MCMC) technique which can efficiently infer the parameter values that match the simulated and observed oscillatory processes. Also proposed is an approach to sensitivity analysis based on the intervention posterior. This approach measures the influence of individual parameters on the target process by using the obtained MCMC samples as a computational tool. The proposed framework is illustrated with circadian oscillations observed in a filamentous fungus, Neurospora crassa.
Submitted 28 November, 2022; v1 submitted 20 October, 2021;
originally announced October 2021.
-
Fast Multi-Resolution Transformer Fine-tuning for Extreme Multi-label Text Classification
Authors:
Jiong Zhang,
Wei-cheng Chang,
Hsiang-fu Yu,
Inderjit S. Dhillon
Abstract:
Extreme multi-label text classification (XMC) seeks to find relevant labels from an extremely large label collection for a given text input. Many real-world applications can be formulated as XMC problems, such as recommendation systems, document tagging and semantic search. Recently, transformer-based XMC methods, such as X-Transformer and LightXML, have shown significant improvement over other XMC methods. Despite leveraging pre-trained transformer models for text representation, the fine-tuning procedure of transformer models on a large label space still has lengthy computational time even with powerful GPUs. In this paper, we propose a novel recursive approach, XR-Transformer, to accelerate the procedure through recursively fine-tuning transformer models on a series of multi-resolution objectives related to the original XMC objective function. Empirical results show that XR-Transformer takes significantly less training time compared to other transformer-based XMC models while yielding new state-of-the-art results. In particular, on the public Amazon-3M dataset with 3 million labels, XR-Transformer is not only 20x faster than X-Transformer but also improves the Precision@1 from 51% to 54%.
Submitted 28 October, 2021; v1 submitted 1 October, 2021;
originally announced October 2021.
-
Spatially and Robustly Hybrid Mixture Regression Model for Inference of Spatial Dependence
Authors:
Wennan Chang,
Pengtao Dang,
Changlin Wan,
Xiaoyu Lu,
Yue Fang,
Tong Zhao,
Yong Zang,
Bo Li,
Chi Zhang,
Sha Cao
Abstract:
In this paper, we propose a Spatial Robust Mixture Regression model to investigate the relationship between a response variable and a set of explanatory variables over the spatial domain, assuming that the relationships may exhibit complex spatially dynamic patterns that cannot be captured by constant regression coefficients. Our method integrates the robust finite mixture Gaussian regression model with spatial constraints to simultaneously handle spatial nonstationarity, local homogeneity, and outlier contamination. Compared with existing spatial regression models, our proposed model assumes the existence of a few distinct regression models that are estimated based on observations that exhibit similar response-predictor relationships. As such, the proposed model not only accounts for nonstationarity in the spatial trend, but also clusters observations into a few distinct and homogeneous groups. This provides an advantage in interpretation, with a few stationary sub-processes identified that capture the predominant relationships between response and predictor variables. Moreover, the proposed method incorporates robust procedures to handle contamination from both regression outliers and spatial outliers. By doing so, we robustly segment the spatial domain into distinct local regions with similar regression coefficients, and sporadic locations that are purely outliers. A rigorous statistical hypothesis testing procedure has been designed to test the significance of such segmentation. Experimental results on many synthetic and real-world datasets demonstrate the robustness, accuracy, and effectiveness of our proposed method, compared with other robust finite mixture regression, spatial regression and spatial segmentation methods.
Submitted 28 September, 2021; v1 submitted 1 September, 2021;
originally announced September 2021.
-
Label Disentanglement in Partition-based Extreme Multilabel Classification
Authors:
Xuanqing Liu,
Wei-Cheng Chang,
Hsiang-Fu Yu,
Cho-Jui Hsieh,
Inderjit S. Dhillon
Abstract:
Partition-based methods are increasingly used in extreme multi-label classification (XMC) problems due to their scalability to large output spaces (e.g., millions or more). However, existing methods partition the large label space into mutually exclusive clusters, which is sub-optimal when labels have multi-modality and rich semantics. For instance, the label "Apple" can be the fruit or the brand name, which leads to the following research question: can we disentangle these multi-modal labels with non-exclusive clustering tailored for downstream XMC tasks? In this paper, we show that the label assignment problem in partition-based XMC can be formulated as an optimization problem, with the objective of maximizing precision rates. This leads to an efficient algorithm to form flexible and overlapped label clusters, and a method that alternately optimizes the cluster assignments and the model parameters for partition-based XMC. Experimental results on synthetic and real datasets show that our method can successfully disentangle multi-modal labels, leading to state-of-the-art (SOTA) results on four XMC benchmarks.
Submitted 23 June, 2021;
originally announced June 2021.
-
An Interaction Neyman-Scott Point Process Model for Coronavirus Disease-19
Authors:
J. Park,
W. Chang,
B. Choi
Abstract:
With rapid transmission, the coronavirus disease 2019 (COVID-19) has led to over 2 million deaths worldwide, posing significant societal challenges. Understanding the spatial patterns of patient visits and detecting the local spreading events are crucial to controlling disease outbreaks. We analyze highly detailed COVID-19 contact tracing data collected from Seoul, which provides a unique opportunity to understand the mechanism of patient visit occurrence. Analyzing contact tracing data is challenging because patient visits show strong clustering patterns while clusters of events may have complex interaction behavior. To account for such behaviors, we develop a novel interaction Neyman-Scott process that regards the observed patient visit events as offspring generated from a parent spreading event. Inference for such models is complicated since the likelihood involves intractable normalizing functions. To address this issue, we embed an auxiliary variable algorithm into our Markov chain Monte Carlo. We fit our model to several simulated and real data examples under different outbreak scenarios and show that our method can describe spatial patterns of patient visits well. We also provide visualization tools that can inform public health interventions, such as social distancing, for infectious diseases.
Submitted 5 February, 2021;
originally announced February 2021.
-
Fast and accurate learned multiresolution dynamical downscaling for precipitation
Authors:
Jiali Wang,
Zhengchun Liu,
Ian Foster,
Won Chang,
Rajkumar Kettimuthu,
Rao Kotamarthi
Abstract:
This study develops a neural network-based approach for emulating high-resolution modeled precipitation data with comparable statistical properties but at greatly reduced computational cost. The key idea is to use a combination of low- and high-resolution simulations to train a neural network to map from the former to the latter. Specifically, we define two types of CNNs, one that stacks variables directly and one that encodes each variable before stacking, and we train each CNN type both with a conventional loss function, such as mean square error (MSE), and with a conditional generative adversarial network (CGAN), for a total of four CNN variants. We compare the four new CNN-derived high-resolution precipitation results with precipitation generated from original high-resolution simulations, a bilinear interpolator and the state-of-the-art CNN-based super-resolution (SR) technique. Results show that the SR technique produces results similar to those of the bilinear interpolator, with smoother spatial and temporal distributions and smaller data variabilities and extremes than the original high-resolution simulations. While the new CNNs trained by MSE generate better results over some regions than the interpolator and SR technique do, their predictions are still not as close to the original high-resolution simulations. The CNNs trained by CGAN generate more realistic and physically reasonable results, better capturing not only data variability in time and space but also extremes such as intense and long-lasting storms. The newly proposed CNN-based downscaling approach can downscale precipitation from 50 km to 12 km in 14 min for 30 years once the network is trained (training takes 4 hours using 1 GPU), while the conventional dynamical downscaling would take 1 month using 600 CPU cores to generate simulations at the resolution of 12 km over the contiguous United States.
Submitted 17 January, 2021;
originally announced January 2021.
-
Computer Model Calibration with Time Series Data using Deep Learning and Quantile Regression
Authors:
Saumya Bhatnagar,
Won Chang,
Seonjin Kim,
Jiali Wang
Abstract:
Computer models play a key role in many scientific and engineering problems. One major source of uncertainty in computer model experiments is input parameter uncertainty. Computer model calibration is a formal statistical procedure to infer input parameters by combining information from model runs and observational data. The existing standard calibration framework suffers from inferential issues when the model output and observational data are high-dimensional dependent data, such as large time series, due to the difficulty in building an emulator and the non-identifiability between effects from input parameters and data-model discrepancy. To overcome these challenges, we propose a new calibration framework based on a deep neural network (DNN) with long short-term memory layers that directly emulates the inverse relationship between the model output and input parameters. Adopting the 'learning with noise' idea, we train our DNN model to filter out the effects from data-model discrepancy on input parameter inference. We also formulate a new way to construct interval predictions for DNN using quantile regression to quantify the uncertainty in input parameter estimates. Through a simulation study and real data application with WRF-hydro model, we show that our approach can yield accurate point estimates and well calibrated interval estimates for input parameters.
Submitted 8 September, 2020; v1 submitted 29 August, 2020;
originally announced August 2020.
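The abstract above builds interval predictions with quantile regression. The standard objective for quantile regression is the pinball (quantile) loss, sketched below. This is a generic illustration, not the authors' DNN training code, and the function name pinball_loss is hypothetical.

```python
def pinball_loss(y_true, y_pred, tau):
    """Mean pinball loss of predictions y_pred at quantile level tau in (0, 1)."""
    total = 0.0
    for y, q in zip(y_true, y_pred):
        diff = y - q
        # Under-prediction is weighted by tau, over-prediction by (1 - tau).
        total += tau * diff if diff >= 0 else (tau - 1) * diff
    return total / len(y_true)


if __name__ == "__main__":
    data = [1.0, 2.0, 3.0, 4.0, 100.0]
    # Among constant predictions, the tau = 0.5 loss is minimized at the median.
    best = min(data, key=lambda q: pinball_loss(data, [q] * len(data), 0.5))
    print(best)
```

Fitting one output at a low level (say tau = 0.05) and another at a high level (say tau = 0.95) gives the lower and upper ends of a nominal 90% interval. The specific levels here are an assumption for illustration.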
-
Geometric All-Way Boolean Tensor Decomposition
Authors:
Changlin Wan,
Wennan Chang,
Tong Zhao,
Sha Cao,
Chi Zhang
Abstract:
Boolean tensors have been broadly utilized in representing high-dimensional logical data collected on spatial, temporal and/or other relational domains. Boolean Tensor Decomposition (BTD) factorizes a binary tensor into the Boolean sum of multiple rank-1 tensors, which is an NP-hard problem. Existing BTD methods have been limited by their high computational cost in applications to large-scale or higher-order tensors. In this work, we present a computationally efficient BTD algorithm, namely \textit{Geometric Expansion for all-order Tensor Factorization} (GETF), that sequentially identifies the rank-1 basis components for a tensor from a geometric perspective. We conducted rigorous theoretical analysis on the validity as well as the algorithmic efficiency of GETF in decomposing all-order tensors. Experiments on both synthetic and real-world data demonstrated that GETF significantly improves reconstruction accuracy and the extraction of latent structures, and that it is an order of magnitude faster than other state-of-the-art methods.
Submitted 26 October, 2020; v1 submitted 30 July, 2020;
originally announced July 2020.
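The abstract above describes a binary tensor as the Boolean sum of rank-1 tensors. The sketch below shows only that representation for order-3 tensors stored as nested lists. It does not implement GETF or any decomposition algorithm, and the function names are hypothetical.

```python
def rank1_boolean_tensor(a, b, c):
    """Outer product of three 0/1 vectors: T[i][j][k] = a[i] AND b[j] AND c[k]."""
    return [[[ai & bj & ck for ck in c] for bj in b] for ai in a]


def boolean_sum(tensors):
    """Elementwise OR of equally shaped 0/1 tensors, so that 1 + 1 = 1."""
    first = tensors[0]
    n_i, n_j, n_k = len(first), len(first[0]), len(first[0][0])
    return [
        [
            [int(any(t[i][j][k] for t in tensors)) for k in range(n_k)]
            for j in range(n_j)
        ]
        for i in range(n_i)
    ]


if __name__ == "__main__":
    t1 = rank1_boolean_tensor([1, 0], [1, 1], [1, 0])
    t2 = rank1_boolean_tensor([0, 1], [0, 1], [0, 1])
    print(boolean_sum([t1, t2]))
```

The difference from ordinary tensor arithmetic is the OR in boolean_sum: overlapping components do not accumulate counts.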
-
Denoising individual bias for a fairer binary submatrix detection
Authors:
Changlin Wan,
Wennan Chang,
Tong Zhao,
Sha Cao,
Chi Zhang
Abstract:
Low rank representation of binary matrix is powerful in disentangling sparse individual-attribute associations, and has received wide applications. Existing binary matrix factorization (BMF) or co-clustering (CC) methods often assume i.i.d background noise. However, this assumption could be easily violated in real data, where heterogeneous row- or column-wise probability of binary entries results in disparate element-wise background distribution, and paralyzes the rationality of existing methods. We propose a binary data denoising framework, namely BIND, which optimizes the detection of true patterns by estimating the row- or column-wise mixture distribution of patterns and disparate background, and eliminating the binary attributes that are more likely from the background. BIND is supported by thoroughly derived mathematical property of the row- and column-wise mixture distributions. Our experiment on synthetic and real-world data demonstrated BIND effectively removes background noise and drastically increases the fairness and accuracy of state-of-the arts BMF and CC methods.
△ Less
Submitted 9 August, 2020; v1 submitted 30 July, 2020;
originally announced July 2020.
-
Supervised clustering of high dimensional data using regularized mixture modeling
Authors:
Wennan Chang,
Changlin Wan,
Yong Zang,
Chi Zhang,
Sha Cao
Abstract:
Identifying relationships between molecular variations and their clinical presentations has been challenged by the heterogeneous causes of a disease. It is imperative to unveil the relationship between the high-dimensional molecular manifestations and the clinical presentations while taking into account the possible heterogeneity of the study subjects. We proposed a novel supervised clustering algorithm using a penalized mixture regression model, called CSMR, to deal with the challenges in studying the heterogeneous relationships between high-dimensional molecular features and a phenotype. The algorithm was adapted from the classification expectation maximization algorithm and offers a novel supervised solution to the clustering problem, with substantial improvement in both computational efficiency and biological interpretability. Experimental evaluation on simulated benchmark datasets demonstrated that CSMR can accurately identify the subspaces on which subsets of features are explanatory of the response variables, and that it outperformed the baseline methods. Application of CSMR to a drug sensitivity dataset again demonstrated the superior performance of CSMR over the other methods: CSMR is powerful in recapitulating the distinct subgroups hidden in the pool of cell lines with regard to their coping mechanisms for different drugs. CSMR represents a big data analysis tool with the potential to resolve the complexity of translating the clinical manifestations of a disease to the real causes underpinning it. We believe that it will bring new understanding to the molecular basis of a disease and could be of special relevance in the growing field of personalized medicine.
Submitted 19 July, 2020;
originally announced July 2020.
-
Kernel Stein Generative Modeling
Authors:
Wei-Cheng Chang,
Chun-Liang Li,
Youssef Mroueh,
Yiming Yang
Abstract:
We are interested in gradient-based Explicit Generative Modeling, where samples can be derived from iterative gradient updates based on an estimate of the score function of the data distribution. Recent advances in Stochastic Gradient Langevin Dynamics (SGLD) demonstrate impressive results with energy-based models on high-dimensional and complex data distributions. Stein Variational Gradient Descent (SVGD) is a deterministic sampling algorithm that iteratively transports a set of particles to approximate a given distribution, based on functional gradient descent that decreases the KL divergence. SVGD has shown promising results on several Bayesian inference applications. However, applying SVGD to high-dimensional problems is still under-explored. The goal of this work is to study high-dimensional inference with SVGD. We first identify key challenges in practical kernel SVGD inference in high dimensions. We propose noise conditional kernel SVGD (NCK-SVGD), which works in tandem with the recently introduced Noise Conditional Score Network estimator. NCK is crucial for successful inference with SVGD in high dimensions, as it adapts the kernel to the noise level of the score estimate. As we anneal the noise, NCK-SVGD targets the real data distribution. We then extend the annealed SVGD with an entropic regularization. We show that this offers flexible control between sample quality and diversity, and verify it empirically by precision and recall evaluations. NCK-SVGD produces samples comparable to GANs and annealed SGLD on computer vision benchmarks, including MNIST and CIFAR-10.
Submitted 6 July, 2020;
originally announced July 2020.
-
Component-wise Adaptive Trimming For Robust Mixture Regression
Authors:
Wennan Chang,
Xinyu Zhou,
Yong Zang,
Chi Zhang,
Sha Cao
Abstract:
Parameter estimation of a mixture regression model using the expectation maximization (EM) algorithm is highly sensitive to outliers. Here we propose a fast and efficient robust mixture regression algorithm, called the Component-wise Adaptive Trimming (CAT) method. We consider simultaneous outlier detection and robust parameter estimation to minimize the effect of outlier contamination. Robust mixture regression has many important applications, including human cancer genomics data, where the population often displays strong heterogeneity compounded by unwanted technological perturbations. Existing robust mixture regression methods suffer from outliers because they either conduct parameter estimation in the presence of outliers or rely on prior knowledge of the level of outlier contamination. CAT was implemented in the framework of classification expectation maximization, under which a natural definition of outliers can be derived. It implements a least trimmed squares (LTS) approach within each exclusive mixing component, so that the robustness issue is transformed from the mixture case to the simple linear regression case. The high breakdown point of the LTS approach allows us to avoid pre-specifying the trimming parameter. Compared with multiple existing algorithms, CAT is the most competitive one that can handle and adaptively trim off outliers as well as heavy-tailed noise in different scenarios of simulated data and real genomic data. CAT has been implemented in the R package 'RobMixReg', available on CRAN.
Submitted 19 April, 2021; v1 submitted 23 May, 2020;
originally announced May 2020.
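The least trimmed squares (LTS) idea referred to in the abstract above can be sketched for a single regression line as follows. This is a minimal illustration using concentration steps (refit on the `h` points with the smallest squared residuals until the subset stops changing), not the CAT method or the RobMixReg implementation; the function names and the toy data are assumptions, and the iteration is only guaranteed to reach a local optimum.

```python
def ols(points):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def lts_fit(points, h, max_steps=50):
    """Fit a line to the h points with the smallest squared residuals.

    Starts from the full-data OLS fit and repeats: rank all points by
    squared residual, keep the best h, refit. Stops when the kept subset
    no longer changes.
    """
    subset = sorted(points)
    for _ in range(max_steps):
        slope, intercept = ols(subset)
        ranked = sorted(points, key=lambda p: (p[1] - slope * p[0] - intercept) ** 2)
        new_subset = sorted(ranked[:h])
        if new_subset == subset:
            break
        subset = new_subset
    return ols(subset), subset


# Hypothetical data: nine points on y = 2x + 1 plus one gross outlier.
points = [(float(x), 2.0 * x + 1.0) for x in range(9)] + [(9.0, 100.0)]
(slope, intercept), kept = lts_fit(points, h=9)
print(slope, intercept, len(kept))
```

Here `h` is fixed by hand; the abstract's point is that CAT avoids pre-specifying the trimming level, which this sketch does not attempt.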
-
Correlation-aware Unsupervised Change-point Detection via Graph Neural Networks
Authors:
Ruohong Zhang,
Yu Hao,
Donghan Yu,
Wei-Cheng Chang,
Guokun Lai,
Yiming Yang
Abstract:
Change-point detection (CPD) aims to detect abrupt changes over time series data. Intuitively, effective CPD over multivariate time series should require explicit modeling of the dependencies across input variables. However, existing CPD methods either ignore the dependency structures entirely or rely on the (unrealistic) assumption that the correlation structures are static over time. In this paper, we propose a Correlation-aware Dynamics Model for CPD, which explicitly models the correlation structure and dynamics of variables by incorporating graph neural networks into an encoder-decoder framework. Extensive experiments on synthetic and real-world datasets demonstrate the advantageous performance of the proposed model on CPD tasks over strong baselines, as well as its ability to classify the change-points as correlation changes or independent changes. Keywords: Multivariate Time Series, Change-point Detection, Graph Neural Networks
Submitted 13 September, 2020; v1 submitted 24 April, 2020;
originally announced April 2020.
-
Pre-training Tasks for Embedding-based Large-scale Retrieval
Authors:
Wei-Cheng Chang,
Felix X. Yu,
Yin-Wen Chang,
Yiming Yang,
Sanjiv Kumar
Abstract:
We consider the large-scale query-document retrieval problem: given a query (e.g., a question), return the set of relevant documents (e.g., paragraphs containing the answer) from a large document corpus. This problem is often solved in two steps. The retrieval phase first reduces the solution space, returning a subset of candidate documents. The scoring phase then re-ranks the documents. Critically, the retrieval algorithm must not only achieve high recall but also be highly efficient, returning candidates in time sublinear in the number of documents. Unlike the scoring phase, which has seen significant advances recently due to BERT-style pre-training tasks on cross-attention models, the retrieval phase remains less well studied. Most previous works rely on classic Information Retrieval (IR) methods such as BM-25 (token matching + TF-IDF weights). These models only accept sparse handcrafted features and cannot be optimized for different downstream tasks of interest. In this paper, we conduct a comprehensive study of embedding-based retrieval models. We show that the key ingredient in learning a strong embedding-based Transformer model is the set of pre-training tasks. With adequately designed paragraph-level pre-training tasks, Transformer models can remarkably improve over the widely used BM-25 as well as embedding models without Transformers. The paragraph-level pre-training tasks we studied are Inverse Cloze Task (ICT), Body First Selection (BFS), Wiki Link Prediction (WLP), and the combination of all three.
Submitted 10 February, 2020;
originally announced February 2020.
-
Two-stage dimension reduction for noisy high-dimensional images and application to Cryogenic Electron Microscopy
Authors:
Szu-Chi Chung,
Shao-Hsuan Wang,
Po-Yao Niu,
Su-Yun Huang,
Wei-Hau Chang,
I-** Tu
Abstract:
Principal component analysis (PCA) is arguably the most widely used dimension-reduction method for vector-type data. When applied to a sample of images, PCA requires vectorization of the image data, which in turn entails solving an eigenvalue problem for the sample covariance matrix. We propose herein a two-stage dimension reduction (2SDR) method for image reconstruction from high-dimensional noisy image data. The first stage treats the image as a matrix, which is a tensor of order 2, and uses multilinear principal component analysis (MPCA) for matrix rank reduction and image denoising. The second stage vectorizes the reduced-rank matrix and achieves further dimension and noise reduction. Simulation studies demonstrate excellent performance of 2SDR, for which we also develop an asymptotic theory that establishes consistency of its rank selection. Applications to cryo-EM (cryogenic electron microscopy), which has revolutionized structural biology, organic and medical chemistry, and cellular and molecular physiology in the past decade, are also provided and illustrated with benchmark cryo-EM datasets. Connections to other contemporaneous developments in image reconstruction and high-dimensional statistical inference are also discussed.
Submitted 27 February, 2021; v1 submitted 21 November, 2019;
originally announced November 2019.
-
XL-Editor: Post-editing Sentences with XLNet
Authors:
Yong-Siang Shih,
Wei-Cheng Chang,
Yiming Yang
Abstract:
While neural sequence generation models have achieved initial success in many NLP applications, the canonical one-pass decoding procedure with left-to-right (i.e., autoregressive) generation order cannot reflect how humans actually revise a sentence to obtain a refined result. In this work, we propose XL-Editor, a novel training framework that enables state-of-the-art generalized autoregressive pretraining methods, XLNet specifically, to revise a given sentence using the variable-length insertion probability. Concretely, XL-Editor can (1) estimate the probability of inserting a variable-length sequence into a specific position of a given sentence; (2) execute post-editing operations such as insertion, deletion, and replacement based on the estimated variable-length insertion probability; (3) complement existing sequence-to-sequence models to refine the generated sequences. Empirically, we first demonstrate better post-editing capabilities of XL-Editor over XLNet on the text insertion and deletion tasks, which validates the effectiveness of our proposed framework. Furthermore, we extend XL-Editor to the unpaired text style transfer task, where transferring the target style onto a given sentence can be naturally viewed as post-editing the sentence into the target style. XL-Editor achieves significant improvement in style transfer accuracy while maintaining the semantic coherence of the original sentence, showing the broad applicability of our method.
Submitted 19 October, 2019;
originally announced October 2019.
-
Vanishing Nodes: Another Phenomenon That Makes Training Deep Neural Networks Difficult
Authors:
Wen-Yu Chang,
Tsung-Nan Lin
Abstract:
It is well known that the problem of vanishing/exploding gradients is a challenge when training deep networks. In this paper, we describe another phenomenon, called vanishing nodes, that also increases the difficulty of training deep neural networks. As the depth of a neural network increases, the network's hidden nodes exhibit more highly correlated behavior. This results in great similarities between these nodes. The redundancy of hidden nodes thus increases as the network becomes deeper. We call this problem vanishing nodes, and we propose a metric, the vanishing node indicator (VNI), for quantitatively measuring the degree of vanishing nodes. The VNI can be characterized by the network parameters, and it is shown analytically to be proportional to the depth of the network and inversely proportional to the network width. The theoretical results show that the effective number of nodes vanishes to one when the VNI increases to one (its maximal value), and that vanishing/exploding gradients and vanishing nodes are two different challenges that increase the difficulty of training deep neural networks. The numerical results from the experiments suggest that the degree of vanishing nodes becomes more evident during back-propagation training, and that when the VNI is equal to 1, the network cannot learn simple tasks (e.g., the XOR problem) even when the gradients are neither vanishing nor exploding. We refer to gradients of this kind as walking dead gradients, which cannot help the network converge even though their scale is relatively large. Finally, the experiments show that the likelihood of failed training increases as the depth of the network increases. Training becomes much more difficult due to the lack of network representation capability.
Submitted 21 October, 2019;
originally announced October 2019.
-
Orthogonality Constrained Multi-Head Attention For Keyword Spotting
Authors:
Mingu Lee,
**kyu Lee,
Hye ** Jang,
Byeonggeun Kim,
Wonil Chang,
Kyuwoong Hwang
Abstract:
The multi-head attention mechanism is capable of learning various representations from sequential data while paying attention to different subsequences, e.g., word-pieces or syllables in a spoken word. From the subsequences, it retrieves richer information than single-head attention, which only summarizes the whole sequence into one context vector. However, a naive use of multi-head attention does not guarantee such richness, as the attention heads may have positional and representational redundancy. In this paper, we propose a regularization technique for the multi-head attention mechanism in an end-to-end neural keyword spotting system. Adding regularization terms that penalize positional and contextual non-orthogonality between the attention heads encourages the heads to output different representations from separate subsequences, which in turn enables leveraging structured information without explicit sequence models such as hidden Markov models. In addition, intra-head contextual non-orthogonality regularization encourages each attention head to have similar representations across keyword examples, which helps classification by reducing feature variability. The experimental results demonstrate that the proposed regularization technique significantly improves the keyword spotting performance for the keyword "Hey Snapdragon".
Submitted 10 October, 2019;
originally announced October 2019.
-
Fast And Efficient Boolean Matrix Factorization By Geometric Segmentation
Authors:
Changlin Wan,
Wennan Chang,
Tong Zhao,
Mengya Li,
Sha Cao,
Chi Zhang
Abstract:
Boolean matrices have been used to represent digital information in many fields, including bank transactions, crime records, natural language processing, protein-protein interaction, etc. Boolean matrix factorization (BMF) aims to find an approximation of a binary matrix as the Boolean product of two low-rank Boolean matrices, which can generate a vast amount of information about the patterns of relationships between the features and samples. Inspired by binary matrix permutation theories and geometric segmentation, we developed a fast and efficient BMF approach called MEBF (Median Expansion for Boolean Factorization). Overall, MEBF adopts a heuristic approach to locate binary patterns presented as submatrices that are dense in 1's. At each iteration, MEBF permutes the rows and columns such that the permuted matrix is approximately Upper Triangular-Like (UTL) with the so-called Simultaneous Consecutive-ones Property (SC1P). The largest submatrix dense in 1's then lies in the upper triangular area of the permuted matrix, and its location is determined based on a geometric segmentation of a triangle. We compared MEBF with other state-of-the-art approaches on data scenarios with different sparsity and noise levels. MEBF demonstrated superior performance, with lower reconstruction error, higher computational efficiency, and more accurate sparse patterns than popular methods such as ASSO, PANDA and MP. We demonstrated the application of MEBF on both binary and non-binary data sets and revealed its further potential in knowledge retrieval and data denoising.
Submitted 10 February, 2020; v1 submitted 9 September, 2019;
originally announced September 2019.
-
An End-to-End Text-independent Speaker Verification Framework with a Keyword Adversarial Network
Authors:
Sungrack Yun,
Janghoon Cho,
Jungyun Eum,
Wonil Chang,
Kyuwoong Hwang
Abstract:
This paper presents an end-to-end text-independent speaker verification framework that jointly considers the speaker embedding (SE) network and the automatic speech recognition (ASR) network. The SE network learns to output an embedding vector which distinguishes the speaker characteristics of the input utterance, while the ASR network learns to recognize the phonetic context of the input. In training our speaker verification framework, we consider both triplet loss minimization and the adversarial gradient of the ASR network to obtain more discriminative and text-independent speaker embedding vectors. With the triplet loss, the distances between the embedding vectors of the same speaker are minimized while those of different speakers are maximized. Also, with the adversarial gradient of the ASR network, the text-dependency of the speaker embedding vector can be reduced. In the experiments, we evaluated our speaker verification framework using the LibriSpeech and CHiME 2013 datasets, and the evaluation results show that our framework achieves a lower equal error rate and better text-independency compared to the other approaches.
Submitted 6 August, 2019;
originally announced August 2019.
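The triplet loss described in the abstract above (pull same-speaker embeddings together, push different-speaker embeddings apart) can be written in a few lines. The sketch below assumes Euclidean distance and a margin of 1.0; the abstract does not state the distance, the margin, or how triplets are mined, so these choices and the toy vectors are illustrative only.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss: max(0, d(a, p) - d(a, n) + margin).

    The loss is zero once the negative is farther from the anchor than
    the positive by at least `margin`.
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


# Hypothetical 2-D embeddings: anchor and positive share a speaker, negative does not.
anchor = (0.0, 0.0)
hard = triplet_loss(anchor, positive=(3.0, 4.0), negative=(0.0, 1.0))
easy = triplet_loss(anchor, positive=(0.0, 1.0), negative=(3.0, 4.0))
print(hard, easy)
```

In the first call the positive is farther from the anchor than the negative, so the loss is positive; in the second the ordering is already correct by more than the margin and the loss vanishes.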
-
Ice Model Calibration Using Semi-continuous Spatial Data
Authors:
Won Chang,
Bledar A. Konomi,
Georgios Karagiannis,
Yawen Guan,
Murali Haran
Abstract:
Rapid changes in Earth's cryosphere caused by human activity can lead to significant environmental impacts. Computer models provide a useful tool for understanding the behavior and projecting the future of Arctic and Antarctic ice sheets. However, these models are typically subject to large parametric uncertainties due to poorly constrained model input parameters that govern the behavior of simulated ice sheets. Computer model calibration provides a formal statistical framework to infer parameters using observational data, and to quantify the uncertainty in projections due to the uncertainty in these parameters. Calibration of ice sheet models is often challenging because the relevant model output and observational data take the form of semi-continuous spatial data, with a point mass at zero and a right-skewed continuous distribution for positive values. Current calibration approaches cannot handle such data. Here we introduce a hierarchical latent variable model that handles binary spatial patterns and positive continuous spatial patterns as separate components. To overcome challenges due to high-dimensionality we use likelihood-based generalized principal component analysis to impose low-dimensional structures on the latent variables for spatial dependence. We apply our methodology to calibrate a physical model for the Antarctic ice sheet and demonstrate that we can overcome the aforementioned modeling and computational challenges. As a result of our calibration, we obtain improved future ice-volume change projections.
Submitted 31 July, 2019;
originally announced July 2019.
-
Domain-Specific Batch Normalization for Unsupervised Domain Adaptation
Authors:
Woong-Gi Chang,
Tackgeun You,
Seonguk Seo,
Suha Kwak,
Bohyung Han
Abstract:
We propose a novel unsupervised domain adaptation framework based on domain-specific batch normalization in deep neural networks. We aim to adapt to both domains by specializing batch normalization layers in convolutional neural networks while allowing them to share all other model parameters, which is realized by a two-stage algorithm. In the first stage, we estimate pseudo-labels for the examples in the target domain using an external unsupervised domain adaptation algorithm (for example, MSTN or CPUA) that integrates the proposed domain-specific batch normalization. The second stage learns the final models using a multi-task classification loss for the source and target domains. Note that the two domains have separate batch normalization layers in both stages. Our framework can be easily incorporated into domain adaptation techniques based on deep neural networks with batch normalization layers. We also show that our approach can be extended to the problem with multiple source domains. The proposed algorithm is evaluated on multiple benchmark datasets and achieves state-of-the-art accuracy in the standard setting and the multi-source domain adaptation scenario.
Submitted 27 May, 2019;
originally announced June 2019.
-
Random Sampling for Distributed Coded Matrix Multiplication
Authors:
Wei-Ting Chang,
Ravi Tandon
Abstract:
Matrix multiplication is a fundamental building block for large-scale computations arising in various applications, including machine learning. There has been significant recent interest in using coding to speed up distributed matrix multiplication in a way that is robust to stragglers (i.e., machines that may perform slower computations). In many scenarios, instead of exact computation, approximate matrix multiplication, i.e., allowing for a tolerable error, is also sufficient. Such approximate schemes make use of randomization techniques to speed up the computation process. In this paper, we initiate the study of approximate coded matrix multiplication and investigate the joint synergies offered by randomization and coding. Specifically, we propose two coded randomized sampling schemes that use (a) codes to achieve a desired recovery threshold and (b) random sampling to obtain an approximation of the matrix multiplication. Tradeoffs between the recovery threshold and the approximation error obtained through random sampling are investigated for a class of coded matrix multiplication schemes.
Submitted 16 May, 2019;
originally announced May 2019.
-
Taming Pretrained Transformers for Extreme Multi-label Text Classification
Authors:
Wei-Cheng Chang,
Hsiang-Fu Yu,
Kai Zhong,
Yiming Yang,
Inderjit Dhillon
Abstract:
We consider the extreme multi-label text classification (XMC) problem: given an input text, return the most relevant labels from a large label collection. For example, the input text could be a product description on Amazon.com and the labels could be product categories. XMC is an important yet challenging problem in the NLP community. Recently, deep pretrained transformer models have achieved state-of-the-art performance on many NLP tasks including sentence classification, albeit with small label sets. However, naively applying deep transformer models to the XMC problem leads to sub-optimal performance due to the large output space and the label sparsity issue. In this paper, we propose X-Transformer, the first scalable approach to fine-tuning deep transformer models for the XMC problem. The proposed method achieves new state-of-the-art results on four XMC benchmark datasets. In particular, on a Wiki dataset with around 0.5 million labels, the precision@1 of X-Transformer is 77.28%, a substantial improvement over the state-of-the-art XMC approaches Parabel (linear) and AttentionXML (neural), which achieve 68.70% and 76.95% precision@1, respectively. We further apply X-Transformer to a product2query dataset from Amazon and gain a 10.7% relative improvement in precision@1 over Parabel.
Submitted 23 June, 2020; v1 submitted 6 May, 2019;
originally announced May 2019.
-
Implicit Kernel Learning
Authors:
Chun-Liang Li,
Wei-Cheng Chang,
Youssef Mroueh,
Yiming Yang,
Barnabás Póczos
Abstract:
Kernels are powerful and versatile tools in machine learning and statistics. Although the notion of universal kernels and characteristic kernels has been studied, kernel selection still greatly influences the empirical performance. While learning the kernel in a data-driven way has been investigated, in this paper we explore learning the spectral distribution of the kernel via implicit generative models parametrized by deep neural networks. We call our method Implicit Kernel Learning (IKL). The proposed framework is simple to train and inference is performed via sampling random Fourier features. We investigate two applications of the proposed IKL as examples, including generative adversarial networks with MMD (MMD GAN) and standard supervised learning. Empirically, MMD GAN with IKL outperforms vanilla predefined kernels on both image and text generation benchmarks; using IKL with Random Kitchen Sinks also leads to substantial improvement over existing state-of-the-art kernel learning algorithms on popular supervised learning benchmarks. Theory and conditions for using IKL in both applications are also studied, as well as connections to previous state-of-the-art methods.
Submitted 26 February, 2019;
originally announced February 2019.
-
Kernel Change-point Detection with Auxiliary Deep Generative Models
Authors:
Wei-Cheng Chang,
Chun-Liang Li,
Yiming Yang,
Barnabás Póczos
Abstract:
Detecting the emergence of abrupt property changes in time series is a challenging problem. The kernel two-sample test, which makes fewer assumptions on the distributions than traditional parametric approaches, has been studied for this task. However, selecting kernels is non-trivial in practice. Although kernel selection for the two-sample test has been studied, the insufficient samples in the change-point detection (CPD) problem hinder the success of those kernel selection algorithms. In this paper, we propose KL-CPD, a novel kernel learning framework for time series CPD that optimizes a lower bound of test power via an auxiliary generative model. With deep kernel parameterization, KL-CPD endows the kernel two-sample test with a data-driven kernel to detect different types of change-points in real-world applications. The proposed approach significantly outperformed other state-of-the-art methods in our comparative evaluation on benchmark datasets and simulation studies.
Submitted 17 January, 2019;
originally announced January 2019.
-
A Regularized Spatial Market Segmentation Method with Dirichlet Process Gaussian Mixture Prior
Authors:
Won Chang,
Sunghoon Kim,
Heewon Chae
Abstract:
Spatially referenced data are increasingly available thanks to the development of modern GPS technology. They also provide rich opportunities for spatial analytics in the field of marketing science. Our main interest is to propose a new efficient statistical framework to conduct spatial segmentation analysis for restaurants located in a metropolitan area in the U.S. The spatial segmentation problem poses important statistical challenges: selecting the optimal number of underlying structures of market segments, capturing complex and flexible spatial structures, and resolving any possible small n and large p issue, which can be typical in latent class analysis. Existing approaches try to tackle these issues in heuristic ways or seem silent on them. To overcome these challenges, we propose a new statistical framework based on regularized Bayesian spatial mixture regressions with a Dirichlet process, integrating ridge or lasso regularization. Our simulation study demonstrates that the proposed models successfully recover the underlying spatial clustering structures and outperform two existing benchmark models. In the empirical analysis using online customer satisfaction data from Yelp, our models provide interesting insights on segment-level key drivers of customer satisfaction and interpretable relationships between regional demographics and restaurants' characteristics.
Submitted 23 November, 2018;
originally announced November 2018.
-
Computer model calibration based on image warping metrics: an application for sea ice deformation
Authors:
Yawen Guan,
Christian Sampson,
J. Derek Tucker,
Won Chang,
Anirban Mondal,
Murali Haran,
Deborah Sulsky
Abstract:
Arctic sea ice plays an important role in the global climate. Sea ice models governed by physical equations have been used to simulate the state of the ice including characteristics such as ice thickness, concentration, and motion. More recent models also attempt to capture features such as fractures or leads in the ice. These simulated features can be partially misaligned or misshapen when compared to observational data, whether due to numerical approximation or incomplete physics. In order to make realistic forecasts and improve understanding of the underlying processes, it is necessary to calibrate the numerical model to field data. Traditional calibration methods based on generalized least-square metrics are flawed for linear features such as sea ice cracks. We develop a statistical emulation and calibration framework that accounts for feature misalignment and misshapenness, which involves optimally aligning model output with observed features using cutting edge image registration techniques. This work can also have application to other physical models which produce coherent structures.
Submitted 24 January, 2019; v1 submitted 15 October, 2018;
originally announced October 2018.
-
Instance-based entropy fuzzy support vector machine for imbalanced data
Authors:
Poongjin Cho,
Minhyuk Lee,
Woojin Chang
Abstract:
Imbalanced classification has been a major challenge for machine learning because many standard classifiers mainly focus on balanced datasets and tend to have biased results towards the majority class. We modify entropy fuzzy support vector machine (EFSVM) and introduce instance-based entropy fuzzy support vector machine (IEFSVM). Both EFSVM and IEFSVM use the entropy information of k-nearest neighbors to determine the fuzzy membership value for each sample which prioritizes the importance of each sample. IEFSVM considers the diversity of entropy patterns for each sample when increasing the size of neighbors, k, while EFSVM uses single entropy information of the fixed size of neighbors for all samples. By varying k, we can reflect the component change of sample's neighbors from near to far distance in the determination of fuzzy value membership. Numerical experiments on 35 public and 12 real-world imbalanced datasets are performed to validate IEFSVM and area under the receiver operating characteristic curve (AUC) is used to compare its performance with other SVMs and machine learning methods. IEFSVM shows a much higher AUC value for datasets with high imbalance ratio, implying that IEFSVM is effective in dealing with the class imbalance problem.
Submitted 10 July, 2018;
originally announced July 2018.
-
Diagnosing added value of convection-permitting regional models using precipitation event identification and tracking
Authors:
Won Chang,
Jiali Wang,
Julian Marohnic,
Rao Kotamarthi,
Elisabeth J. Moyer
Abstract:
Dynamical downscaling with high-resolution regional climate models may offer the possibility of realistically reproducing precipitation and weather events in climate simulations. As resolutions fall to order kilometers, the use of explicit rather than parametrized convection may offer even greater fidelity. However, these increased model resolutions both allow and require increasingly complex diagnostics for evaluating model fidelity. In this study we use a suite of dynamically downscaled simulations of the summertime U.S. (WRF driven by NCEP) with systematic variations in parameters and treatment of convection as a test case for evaluation of model precipitation. In particular, we use a novel rainstorm identification and tracking algorithm that allocates essentially all rainfall to individual precipitation events (Chang et al. 2016). This approach allows multiple insights, including that, at least in these runs, model wet bias is driven by excessive areal extent of precipitating events. Biases are time-dependent, producing excessive diurnal cycle amplitude. We show that this effect is produced not by new production of events but by excessive enlargement of long-lived precipitation events during daytime, and that in the domain average, precipitation biases appear best represented as additive offsets. Of all model configurations evaluated, convection-permitting simulations most consistently reduced biases in precipitation event characteristics.
Submitted 11 December, 2017;
originally announced December 2017.
-
The Mixing method: low-rank coordinate descent for semidefinite programming with diagonal constraints
Authors:
Po-Wei Wang,
Wei-Cheng Chang,
J. Zico Kolter
Abstract:
In this paper, we propose a low-rank coordinate descent approach to structured semidefinite programming with diagonal constraints. The approach, which we call the Mixing method, is extremely simple to implement, has no free parameters, and typically attains an order of magnitude or better improvement in optimization performance over the current state of the art. We show that the method is strictly decreasing, converges to a critical point, and further that for sufficient rank all non-optimal critical points are unstable. Moreover, we prove that with a step size, the Mixing method converges to the global optimum of the semidefinite program almost surely in a locally linear rate under random initialization. This is the first low-rank semidefinite programming method that has been shown to achieve a global optimum on the spherical manifold without assumption. We apply our algorithm to two related domains: solving the maximum cut semidefinite relaxation, and solving a maximum satisfiability relaxation (we also briefly consider additional applications such as learning word embeddings). In all settings, we demonstrate substantial improvement over the existing state of the art along various dimensions, and in total, this work expands the scope and scale of problems that can be solved using semidefinite programming methods.
Submitted 4 July, 2018; v1 submitted 1 June, 2017;
originally announced June 2017.
-
MMD GAN: Towards Deeper Understanding of Moment Matching Network
Authors:
Chun-Liang Li,
Wei-Cheng Chang,
Yu Cheng,
Yiming Yang,
Barnabás Póczos
Abstract:
Generative moment matching network (GMMN) is a deep generative model that differs from Generative Adversarial Network (GAN) by replacing the discriminator in GAN with a two-sample test based on kernel maximum mean discrepancy (MMD). Although some theoretical guarantees of MMD have been studied, the empirical performance of GMMN is still not as competitive as that of GAN on challenging and large benchmark datasets. The computational efficiency of GMMN is also less desirable in comparison with GAN, partially due to its requirement for a rather large batch size during training. In this paper, we propose to improve both the model expressiveness of GMMN and its computational efficiency by introducing adversarial kernel learning techniques as a replacement for the fixed Gaussian kernel in the original GMMN. The new approach combines the key ideas in both GMMN and GAN, hence we name it MMD GAN. The new distance measure in MMD GAN is a meaningful loss that enjoys the advantage of weak topology and can be optimized via gradient descent with relatively small batch sizes. In our evaluation on multiple benchmark datasets, including MNIST, CIFAR-10, CelebA and LSUN, MMD GAN significantly outperforms GMMN and is competitive with other representative GAN works.
Submitted 27 November, 2017; v1 submitted 23 May, 2017;
originally announced May 2017.
-
Data-driven Random Fourier Features using Stein Effect
Authors:
Wei-Cheng Chang,
Chun-Liang Li,
Yiming Yang,
Barnabas Poczos
Abstract:
Large-scale kernel approximation is an important problem in machine learning research. Approaches using random Fourier features have become increasingly popular [Rahimi and Recht, 2007], where kernel approximation is treated as empirical mean estimation via Monte Carlo (MC) or Quasi-Monte Carlo (QMC) integration [Yang et al., 2014]. A limitation of the current approaches is that all the features receive an equal weight summing to 1. In this paper, we propose a novel shrinkage estimator from "Stein effect", which provides a data-driven weighting strategy for random features and enjoys theoretical justifications in terms of lowering the empirical risk. We further present an efficient randomized algorithm for large-scale applications of the proposed method. Our empirical results on six benchmark data sets demonstrate the advantageous performance of this approach over representative baselines in both kernel approximation and supervised learning tasks.
Submitted 23 May, 2017;
originally announced May 2017.
-
Changes in Spatio-temporal Precipitation Patterns in Changing Climate Conditions
Authors:
Won Chang,
Michael L. Stein,
Jiali Wang,
V. Rao Kotamarthi,
Elisabeth J. Moyer
Abstract:
Climate models robustly imply that some significant change in precipitation patterns will occur. Models consistently project that the intensity of individual precipitation events increases by approximately 6-7%/K, following the increase in atmospheric water content, but that total precipitation increases by a lesser amount (1-2%/K in the global average in transient runs). Some other aspect of precipitation events must then change to compensate for this difference. We develop here a new methodology for identifying individual rainstorms and studying their physical characteristics - including starting location, intensity, spatial extent, duration, and trajectory - that allows identifying that compensating mechanism. We apply this technique to precipitation over the contiguous U.S. from both radar-based data products and high-resolution model runs simulating 80 years of business-as-usual warming. In model studies, we find that the dominant compensating mechanism is a reduction of storm size. In summer, rainstorms become more intense but smaller; in winter, rainstorm shrinkage still dominates, but storms also become less numerous and shorter in duration. These results imply that flood impacts from climate change will be less severe than would be expected from changes in precipitation intensity alone. We show also that projected changes are smaller than model-observation biases, implying that the best means of incorporating them into impact assessments is via "data-driven simulations" that apply model-projected changes to observational data. We therefore develop a simulation algorithm that statistically describes model changes in precipitation characteristics and adjusts data accordingly, and show that, especially for summertime precipitation, it outperforms simulation approaches that do not include spatial information.
Submitted 24 May, 2016; v1 submitted 6 January, 2016;
originally announced January 2016.
-
Improving Ice Sheet Model Calibration Using Paleoclimate and Modern Data
Authors:
Won Chang,
Murali Haran,
Patrick Applegate,
David Pollard
Abstract:
Human-induced climate change may cause significant ice volume loss from the West Antarctic Ice Sheet (WAIS). Projections of ice volume change from ice-sheet models and corresponding future sea-level rise have large uncertainties due to poorly constrained input parameters. In most future applications to date, model calibration has utilized only modern or recent (decadal) observations, leaving input parameters that control the long-term behavior of WAIS largely unconstrained. Many paleo-observations are in the form of localized time series, while modern observations are non-Gaussian spatial data; combining information across these types poses non-trivial statistical challenges. Here we introduce a computationally efficient calibration approach that utilizes both modern and paleo-observations to generate better-constrained ice volume projections. Using fast emulators built upon principal component analysis and a reduced dimension calibration model, we can efficiently handle high-dimensional and non-Gaussian data. We apply our calibration approach to the PSU3D-ICE model which can realistically simulate long-term behavior of WAIS. Our results show that using paleo observations in calibration significantly reduces parametric uncertainty, resulting in sharper projections about the future state of WAIS. One benefit of using paleo observations is found to be that unrealistic simulations with overshoots in past ice retreat and projected future regrowth are eliminated.
Submitted 24 August, 2016; v1 submitted 6 October, 2015;
originally announced October 2015.
-
Calibrating an ice sheet model using high-dimensional binary spatial data
Authors:
Won Chang,
Murali Haran,
Patrick Applegate,
David Pollard
Abstract:
Rapid retreat of ice in the Amundsen Sea sector of West Antarctica may cause drastic sea level rise, posing significant risks to populations in low-lying coastal regions. Calibration of computer models representing the behavior of the West Antarctic Ice Sheet is key for informative projections of future sea level rise. However, both the relevant observations and the model output are high-dimensional binary spatial data; existing computer model calibration methods are unable to handle such data. Here we present a novel calibration method for computer models whose output is in the form of binary spatial data. To mitigate the computational and inferential challenges posed by our approach, we apply a generalized principal component based dimension reduction method. To demonstrate the utility of our method, we calibrate the PSU3D-ICE model by comparing the output from a 499-member perturbed-parameter ensemble with observations from the Amundsen Sea sector of the ice sheet. Our methods help rigorously characterize the parameter uncertainty even in the presence of systematic data-model discrepancies and dependence in the errors. Our method also helps inform environmental risk analyses by contributing to improved projections of sea level rise from the ice sheets.
Submitted 20 May, 2016; v1 submitted 8 January, 2015;
originally announced January 2015.
-
A composite likelihood approach to computer model calibration using high-dimensional spatial data
Authors:
Won Chang,
Murali Haran,
Roman Olson,
Klaus Keller
Abstract:
Computer models are used to model complex processes in various disciplines. Often, a key source of uncertainty in the behavior of complex computer models is uncertainty due to unknown model input parameters. Statistical computer model calibration is the process of inferring model parameter values, along with associated uncertainties, from observations of the physical process and from model outputs at various parameter settings. Observations and model outputs are often in the form of high-dimensional spatial fields, especially in the environmental sciences. Sound statistical inference may be computationally challenging in such situations. Here we introduce a composite likelihood-based approach to perform computer model calibration with high-dimensional spatial data. While composite likelihood has been studied extensively in the context of spatial statistics, computer model calibration using composite likelihood poses several new challenges. We propose a computationally efficient approach for Bayesian computer model calibration using composite likelihood. We also develop a methodology based on asymptotic theory for adjusting the composite likelihood posterior distribution so that it accurately represents posterior uncertainties. We study the application of our new approach in the context of calibration for a climate model.
Submitted 31 July, 2013;
originally announced August 2013.
-
Fast dimension-reduced climate model calibration and the effect of data aggregation
Authors:
Won Chang,
Murali Haran,
Roman Olson,
Klaus Keller
Abstract:
How will the climate system respond to anthropogenic forcings? One approach to this question relies on climate model projections. Current climate projections are considerably uncertain. Characterizing and, if possible, reducing this uncertainty is an area of ongoing research. We consider the problem of making projections of the North Atlantic meridional overturning circulation (AMOC). Uncertainties about climate model parameters play a key role in uncertainties in AMOC projections. When the observational data and the climate model output are high-dimensional spatial data sets, the data are typically aggregated due to computational constraints. The effects of aggregation are unclear because statistically rigorous approaches for model parameter inference have been infeasible for high-resolution data. Here we develop a flexible and computationally efficient approach using principal components and basis expansions to study the effect of spatial data aggregation on parametric and projection uncertainties. Our Bayesian reduced-dimensional calibration approach allows us to study the effect of complicated error structures and data-model discrepancies on our ability to learn about climate model parameters from high-dimensional data. Considering high-dimensional spatial observations reduces the effect of deep uncertainty associated with prior specifications for the data-model discrepancy. Also, using the unaggregated data results in sharper projections based on our climate model. Our computationally efficient approach may be widely applicable to a variety of high-dimensional computer model calibration problems.
Submitted 31 July, 2014; v1 submitted 6 March, 2013;
originally announced March 2013.
-
$γ$-SUP: A clustering algorithm for cryo-electron microscopy images of asymmetric particles
Authors:
Ting-Li Chen,
Dai-Ni Hsieh,
Hung Hung,
I-Ping Tu,
Pei-Shien Wu,
Yi-Ming Wu,
Wei-Hau Chang,
Su-Yun Huang
Abstract:
Cryo-electron microscopy (cryo-EM) has recently emerged as a powerful tool for obtaining three-dimensional (3D) structures of biological macromolecules in native states. A minimum cryo-EM image data set for deriving a meaningful reconstruction is comprised of thousands of randomly orientated projections of identical particles photographed with a small number of electrons. The computation of 3D structure from 2D projections requires clustering, which aims to enhance the signal-to-noise ratio in each view by grouping similarly oriented images. Nevertheless, the prevailing clustering techniques are often compromised by three characteristics of cryo-EM data: high noise content, high dimensionality and a large number of clusters. Moreover, since clustering requires registering images of similar orientation into the same pixel coordinates by 2D alignment, it is desirable that the clustering algorithm can label misaligned images as outliers. Herein, we introduce a clustering algorithm, $γ$-SUP, which models the data with a $q$-Gaussian mixture, adopts the minimum $γ$-divergence for estimation, and then uses a self-updating procedure to obtain the numerical solution. We apply $γ$-SUP to the cryo-EM images of two benchmark macromolecules, RNA polymerase II and ribosome. In the former case, simulated images were chosen to decouple clustering from alignment to demonstrate that $γ$-SUP is more robust to misalignment outliers than the existing clustering methods used in the cryo-EM community. In the latter case, the clustering of real cryo-EM data by our $γ$-SUP method eliminates noise in many views to reveal true structural features of the ribosome at the projection level.
Submitted 25 April, 2014; v1 submitted 9 May, 2012;
originally announced May 2012.