Search | arXiv e-print repository

ZeroQuant(4+2): Redefining LLMs Quantization with a New FP6-Centric Strategy for Diverse Generative Tasks

Authors: Xiaoxia Wu, Haojun Xia, Stephen Youn, Zhen Zheng, Shiyang Chen, Arash Bakhtiari, Michael Wyatt, Reza Yazdani Aminabadi, Yuxiong He, Olatunji Ruwase, Leon Song, Zhewei Yao

Abstract: This study examines 4-bit quantization methods like GPTQ in large language models (LLMs), highlighting GPTQ's overfitting and limited enhancement in Zero-Shot tasks. While prior works merely focus on zero-shot measurement, we extend task scope to more generative categories such as code generation and abstractive summarization, in which we found that INT4 quantization can significantly underperform. However, simply shifting to higher precision formats like FP6 has been particularly challenging, thus overlooked, due to poor performance caused by the lack of sophisticated integration and system acceleration strategies on current AI hardware. Our results show that FP6, even with a coarse-grain quantization scheme, performs robustly across various algorithms and tasks, demonstrating its superiority in accuracy and versatility. Notably, with the FP6 quantization, \codestar-15B model performs comparably to its FP16 counterpart in code generation, and for smaller models like the 406M it closely matches their baselines in summarization. Neither can be achieved by INT4. To better accommodate various AI hardware and achieve the best system performance, we propose a novel 4+2 design for FP6 to achieve similar latency to the state-of-the-art INT4 fine-grain quantization. With our design, FP6 can become a promising solution to the current 4-bit quantization methods used in LLMs.

Submitted 18 December, 2023; v1 submitted 13 December, 2023; originally announced December 2023.

arXiv:2310.15932 [pdf, other]

Online Robust Mean Estimation

Authors: Daniel M. Kane, Ilias Diakonikolas, Hanshen Xiao, Sihan Liu

Abstract: We study the problem of high-dimensional robust mean estimation in an online setting. Specifically, we consider a scenario where $n$ sensors are measuring some common, ongoing phenomenon. At each time step $t=1,2,\ldots,T$, the $i^{th}$ sensor reports its readings $x^{(i)}_t$ for that time step. The algorithm must then commit to its estimate $μ_t$ for the true mean value of the process at time $t$. We assume that most of the sensors observe independent samples from some common distribution $X$, but an $ε$-fraction of them may instead behave maliciously. The algorithm wishes to compute a good approximation $μ$ to the true mean $μ^\ast := \mathbf{E}[X]$. We note that if the algorithm is allowed to wait until time $T$ to report its estimate, this reduces to the well-studied problem of robust mean estimation. However, the requirement that our algorithm produces partial estimates as the data is coming in substantially complicates the situation. We prove two main results about online robust mean estimation in this model. First, if the uncorrupted samples satisfy the standard condition of $(ε,δ)$-stability, we give an efficient online algorithm that outputs estimates $μ_t$, $t \in [T],$ such that with high probability it holds that $\|μ-μ^\ast\|_2 = O(δ\log(T))$, where $μ= (μ_t)_{t \in [T]}$. We note that this error bound is nearly competitive with the best offline algorithms, which would achieve $\ell_2$-error of $O(δ)$. Our second main result shows that with additional assumptions on the input (most notably that $X$ is a product distribution) there are inefficient algorithms whose error does not depend on $T$ at all.

Submitted 24 October, 2023; originally announced October 2023.

Comments: To appear in SODA2024

arXiv:2309.09452 [pdf, other]

doi 10.1016/j.ecolind.2024.111828

Beyond expected values: Making environmental decisions using value of information analysis when measurement outcome matters

Authors: Morenikeji D. Akinlotan, David J. Warne, Kate J. Helmstedt, Sarah A. Vollert, Iadine Chadès, Ryan F. Heneghan, Hui Xiao, Matthew P. Adams

Abstract: In ecological and environmental contexts, management actions must sometimes be chosen urgently. Value of information (VoI) analysis provides a quantitative toolkit for projecting the improved management outcomes expected after making additional measurements. However, traditional VoI analysis reports metrics as expected values (i.e. risk-neutral). This can be problematic because expected values hide uncertainties in projections. The true value of a measurement will only be known after the measurement's outcome is known, leaving large uncertainty in the measurement's value before it is performed. As a result, the expected value metrics produced in traditional VoI analysis may not align with the priorities of a risk-averse decision-maker who wants to avoid low-value measurement outcomes. In the present work, we introduce four new VoI metrics that can address a decision-maker's risk-aversion to different measurement outcomes. We demonstrate the benefits of the new metrics with two ecological case studies for which traditional VoI analysis has been previously applied. Using the new metrics, we also demonstrate a clear mathematical link between the often-separated environmental decision-making disciplines of VoI and optimal design of experiments. This mathematical link has the potential to catalyse future collaborations between ecologists and statisticians to work together to quantitatively address environmental decision-making questions of fundamental importance. Overall, the introduced VoI metrics complement existing metrics to provide decision-makers with a comprehensive view of the value of, and risks associated with, a proposed monitoring or measurement activity. This is critical for improved environmental outcomes when decisions must be urgently made.

Submitted 14 March, 2024; v1 submitted 17 September, 2023; originally announced September 2023.

Comments: 53 pages, 3 figures

Journal ref: Ecological Indicators 160 (2024) 111828

arXiv:2304.09981 [pdf, other]

Interpretable (not just posthoc-explainable) heterogeneous survivor bias-corrected treatment effects for assignment of postdischarge interventions to prevent readmissions

Authors: Hongjing Xia, Joshua C. Chang, Sarah Nowak, Sonya Mahajan, Rohit Mahajan, Ted L. Chang, Carson C. Chow

Abstract: We used survival analysis to quantify the impact of postdischarge evaluation and management (E/M) services in preventing hospital readmission or death. Our approach avoids a specific pitfall of applying machine learning to this problem, which is an inflated estimate of the effect of interventions, due to survivors bias -- where the magnitude of inflation may be conditional on heterogeneous confounders in the population. This bias arises simply because in order to receive an intervention after discharge, a person must not have been readmitted in the intervening period. After deriving an expression for this phantom effect, we controlled for this and other biases within an inherently interpretable Bayesian survival framework. We identified case management services as being the most impactful for reducing readmissions overall.

Submitted 3 August, 2023; v1 submitted 19 April, 2023; originally announced April 2023.

Comments: Submitted

Journal ref: PMLR 219:884-905, 2023

arXiv:2303.05223 [pdf, other]

LEAP: The latent exchangeability prior for borrowing information from historical data

Authors: Ethan M. Alt, Xiuya Chang, Xun Jiang, Qing Liu, May Mo, H. Amy Xia, Joseph G. Ibrahim

Abstract: It is becoming increasingly popular to elicit informative priors on the basis of historical data. Popular existing priors, including the power prior, commensurate prior, and robust meta-analytic prior provide blanket discounting. Thus, if only a subset of participants in the historical data are exchangeable with the current data, these priors may not be appropriate. In order to combat this issue, propensity score (PS) approaches have been proposed. However, PS approaches are only concerned with the covariate distribution, whereas exchangeability is typically assessed with parameters pertaining to the outcome. In this paper, we introduce the latent exchangeability prior (LEAP), where observations in the historical data are classified into exchangeable and non-exchangeable groups. The LEAP discounts the historical data by identifying the most relevant subjects from the historical data. We compare our proposed approach against alternative approaches in simulations and present a case study using our proposed prior to augment a control arm in a phase 3 clinical trial in plaque psoriasis with an unbalanced randomization scheme.

Submitted 9 March, 2023; originally announced March 2023.

arXiv:2208.12814 [pdf, other]

Interpretable (not just posthoc-explainable) medical claims modeling for discharge placement to prevent avoidable all-cause readmissions or death

Authors: Joshua C. Chang, Ted L. Chang, Carson C. Chow, Rohit Mahajan, Sonya Mahajan, Joe Maisog, Shashaank Vattikuti, Hongjing Xia

Abstract: We developed an inherently interpretable multilevel Bayesian framework for representing variation in regression coefficients that mimics the piecewise linearity of ReLU-activated deep neural networks. We used the framework to formulate a survival model for using medical claims to predict hospital readmission and death that focuses on discharge placement, adjusting for confounding in estimating causal local average treatment effects. We trained the model on a 5% sample of Medicare beneficiaries from 2008 and 2011, based on their 2009--2011 inpatient episodes, and then tested the model on 2012 episodes. The model scored an AUROC of approximately 0.76 on predicting all-cause readmissions -- defined using official Centers for Medicare and Medicaid Services (CMS) methodology -- or death within 30-days of discharge, being competitive against XGBoost and a Bayesian deep neural network, demonstrating that one need-not sacrifice interpretability for accuracy. Crucially, as a regression model, we provide what blackboxes cannot -- the exact gold-standard global interpretation of the model, identifying relative risk factors and quantifying the effect of discharge placement. We also show that the posthoc explainer SHAP fails to provide accurate explanations.

Submitted 29 January, 2023; v1 submitted 28 August, 2022; originally announced August 2022.

Comments: In review

arXiv:2110.00928 [pdf, other]

Multi-linear Tensor Autoregressive Models

Authors: Zebang Li, Han Xiao

Abstract: Contemporary time series analysis has seen more and more tensor type data, from many fields. For example, stocks can be grouped according to Size, Book-to-Market ratio, and Operating Profitability, leading to a 3-way tensor observation at each month. We propose an autoregressive model for the tensor-valued time series, with autoregressive terms depending on multi-linear coefficient matrices. Comparing with the traditional approach of vectoring the tensor observations and then applying the vector autoregressive model, the tensor autoregressive model preserves the tensor structure and admits corresponding interpretations. We introduce three estimators based on projection, least squares, and maximum likelihood. Our analysis considers both fixed dimensional and high dimensional settings. For the former we establish the central limit theorems of the estimators, and for the latter we focus on the convergence rates and the model selection. The performance of the model is demonstrated by simulated and real examples.

Submitted 3 October, 2021; originally announced October 2021.

arXiv:2110.00174 [pdf, other]

Empirical Quantitative Analysis of COVID-19 Forecasting Models

Authors: Yun Zhao, Yuqing Wang, Junfeng Liu, Haotian Xia, Zhenni Xu, Qinghang Hong, Zhiyang Zhou, Linda Petzold

Abstract: COVID-19 has been a public health emergency of international concern since early 2020. Reliable forecasting is critical to diminish the impact of this disease. To date, a large number of different forecasting models have been proposed, mainly including statistical models, compartmental models, and deep learning models. However, due to various uncertain factors across different regions such as economics and government policy, no forecasting model appears to be the best for all scenarios. In this paper, we perform quantitative analysis of COVID-19 forecasting of confirmed cases and deaths across different regions in the United States with different forecasting horizons, and evaluate the relative impacts of the following three dimensions on the predictive performance (improvement and variation) through different evaluation metrics: model selection, hyperparameter tuning, and the length of time series required for training. We find that if a dimension brings about higher performance gains, if not well-tuned, it may also lead to harsher performance penalties. Furthermore, model selection is the dominant factor in determining the predictive performance. It is responsible for both the largest improvement and the largest variation in performance in all prediction tasks across different regions. While practitioners may perform more complicated time series analysis in practice, they should be able to achieve reasonable results if they have adequate insight into key decisions like model selection.

Submitted 30 September, 2021; originally announced October 2021.

Comments: ICDM workshop 2021

arXiv:2108.09431 [pdf, other]

Equivariant Variance Estimation for Multiple Change-point Model

Authors: Ning Hao, Yue Selena Niu, Han Xiao

Abstract: The variance of noise plays an important role in many change-point detection procedures and the associated inferences. Most commonly used variance estimators require strong assumptions on the true mean structure or normality of the error distribution, which may not hold in applications. More importantly, the qualities of these estimators have not been discussed systematically in the literature. In this paper, we introduce a framework of equivariant variance estimation for multiple change-point models. In particular, we characterize the set of all equivariant unbiased quadratic variance estimators for a family of change-point model classes, and develop a minimax theory for such estimators.

Submitted 15 November, 2023; v1 submitted 21 August, 2021; originally announced August 2021.

Comments: 44 pages

arXiv:2107.11136 [pdf, other]

High Dimensional Differentially Private Stochastic Optimization with Heavy-tailed Data

Authors: Lijie Hu, Shuo Ni, Hanshen Xiao, Di Wang

Abstract: As one of the most fundamental problems in machine learning, statistics and differential privacy, Differentially Private Stochastic Convex Optimization (DP-SCO) has been extensively studied in recent years. However, most of the previous work can only handle either regular data distribution or irregular data in the low dimensional space case. To better understand the challenges arising from irregular data distribution, in this paper we provide the first study on the problem of DP-SCO with heavy-tailed data in the high dimensional space. In the first part we focus on the problem over some polytope constraint (such as the $\ell_1$-norm ball). We show that if the loss function is smooth and its gradient has bounded second order moment, it is possible to get a (high probability) error bound (excess population risk) of $\tilde{O}(\frac{\log d}{(nε)^\frac{1}{3}})$ in the $ε$-DP model, where $n$ is the sample size and $d$ is the dimensionality of the underlying space. Next, for LASSO, if the data distribution has bounded fourth-order moments, we improve the bound to $\tilde{O}(\frac{\log d}{(nε)^\frac{2}{5}})$ in the $(ε, δ)$-DP model. In the second part of the paper, we study sparse learning with heavy-tailed data. We first revisit the sparse linear model and propose a truncated DP-IHT method whose output could achieve an error of $\tilde{O}(\frac{s^{*2}\log d}{nε})$, where $s^*$ is the sparsity of the underlying parameter. Then we study a more general problem over the sparsity ({\em i.e.,} $\ell_0$-norm) constraint, and show that it is possible to achieve an error of $\tilde{O}(\frac{s^{*\frac{3}{2}}\log d}{nε})$, which is also near optimal up to a factor of $\tilde{O}{(\sqrt{s^*})}$, if the loss function is smooth and strongly convex.

Submitted 9 August, 2021; v1 submitted 23 July, 2021; originally announced July 2021.

arXiv:2106.00612 [pdf, other]

Weak target detection with multi-bit quantization in colocated MIMO radar

Authors: Hang Xiao, Shixing Yang, Wei Yi

Abstract: We consider the weak target detection problem with unknown parameter in colocated multiple-input multiple-output (MIMO) radar. To cope with the sheer amount of data for large-size systems, a multi-bit quantizer is utilized in the sampling process. As a low-complexity alternative to classic generalized likelihood ratio test (GLRT) for quantized data, we propose the multi-bit detector on Rao test with a closed-form test statistic, whose theoretical asymptotic distribution is provided to generalize the actual detection performance. Besides, we refine the design of quantizer by optimized quantization thresholds, which are obtained resorting to the popular particle swarm optimization algorithm (PSOA). The simulation is conducted to demonstrate the performance variations of detectors based on unquantized and quantized data. The numerical results corroborate our theoretical analyses and show that the performance with 3-bit quantization approaches the case without quantization.

Submitted 5 September, 2021; v1 submitted 29 May, 2021; originally announced June 2021.

Comments: 6 pages, 3 figures, conference

arXiv:2105.05532 [pdf, other]

Generalized Autoregressive Moving Average Models with GARCH Errors

Authors: Tingguo Zheng, Han Xiao, Rong Chen

Abstract: One of the important and widely used classes of models for non-Gaussian time series is the generalized autoregressive moving average (GARMA) models, which specifies an ARMA structure for the conditional mean process of the underlying time series. However, in many applications one often encounters conditional heteroskedasticity. In this paper we propose a new class of models, referred to as GARMA-GARCH models, that jointly specify both the conditional mean and conditional variance processes of a general non-Gaussian time series. Under the general modeling framework, we propose three specific models, as examples, for proportional time series, nonnegative time series, and skewed and heavy-tailed financial time series. Maximum likelihood estimator (MLE) and quasi Gaussian MLE (GMLE) are used to estimate the parameters. Simulation studies and three applications are used to demonstrate the properties of the models and the estimation procedures.

Submitted 12 May, 2021; originally announced May 2021.

arXiv:2105.00866 [pdf]

Causal Discovery of Flight Service Process Based on Event Sequence

Authors: Zhiwei Xing, Lin Zhang, Huan Xia, Qian Luo, Zhao-xin Chen

Abstract: The development of the civil aviation industry has continuously increased the requirements for the efficiency of airport ground support services. In the existing ground support research, there has not yet been a process model that directly obtains support from the ground support log to study the causal relationship between service nodes and flight delays. Most ground support studies mainly use machine learning methods to predict flight delays, and the flight support model they are based on is an ideal model. The study did not conduct an in-depth study of the causal mechanism behind the ground support link and did not reveal the true cause of flight delays. Therefore, there is a certain deviation in the prediction of flight delays by machine learning, and there is a certain deviation between the ideal model based on the research and the actual service process. Therefore, it is of practical significance to obtain the process model from the guarantee log and analyze its causality. However, the existing process causal factor discovery methods only do certain research when the assumption of causal sufficiency is established and does not consider the existence of latent variables. Therefore, this article proposes a framework to realize the discovery of process causal factors without assuming causal sufficiency. The optimized fuzzy mining process model is used as the service benchmark model, and the local causal discovery algorithm is used to discover the causal factors. Under this framework, this paper proposes a new Markov blanket discovery algorithm that does not assume causal sufficiency to discover causal factors and uses benchmark data sets for testing. Finally, the actual flight service data is used.

Submitted 28 April, 2021; originally announced May 2021.

arXiv:2011.04418 [pdf, other]

doi 10.1103/PhysRevD.103.024040

Improved deep learning techniques in gravitational-wave data analysis

Authors: Heming Xia, Lijing Shao, Junjie Zhao, Zhoujian Cao

Abstract: In recent years, convolutional neural network (CNN) and other deep learning models have been gradually introduced into the area of gravitational-wave (GW) data processing. Compared with the traditional matched-filtering techniques, CNN has significant advantages in efficiency in GW signal detection tasks. In addition, matched-filtering techniques are based on the template bank of the existing theoretical waveform, which makes it difficult to find GW signals beyond theoretical expectation. In this paper, based on the task of GW detection of binary black holes, we introduce the optimization techniques of deep learning, such as batch normalization and dropout, to CNN models. Detailed studies of model performance are carried out. Through this study, we recommend to use batch normalization and dropout techniques in CNN models in GW signal detection tasks. Furthermore, we investigate the generalization ability of CNN models on different parameter ranges of GW signals. We point out that CNN models are robust to the variation of the parameter range of the GW waveform. This is a major advantage of deep learning models over matched-filtering techniques.

Submitted 23 December, 2020; v1 submitted 9 November, 2020; originally announced November 2020.

Comments: 13 pages, 11 figures; accepted by PRD

Journal ref: Phys. Rev. D 103, 024040 (2021)

arXiv:2010.11082 [pdf, ps, other]

On Differentially Private Stochastic Convex Optimization with Heavy-tailed Data

Authors: Di Wang, Hanshen Xiao, Srini Devadas, Jinhui Xu

Abstract: In this paper, we consider the problem of designing Differentially Private (DP) algorithms for Stochastic Convex Optimization (SCO) on heavy-tailed data. The irregularity of such data violates some key assumptions used in almost all existing DP-SCO and DP-ERM methods, resulting in failure to provide the DP guarantees. To better understand this type of challenges, we provide in this paper a comprehensive study of DP-SCO under various settings. First, we consider the case where the loss function is strongly convex and smooth. For this case, we propose a method based on the sample-and-aggregate framework, which has an excess population risk of $\tilde{O}(\frac{d^3}{nε^4})$ (after omitting other factors), where $n$ is the sample size and $d$ is the dimensionality of the data. Then, we show that with some additional assumptions on the loss functions, it is possible to reduce the \textit{expected} excess population risk to $\tilde{O}(\frac{ d^2}{ nε^2 })$. To lift these additional conditions, we also provide a gradient smoothing and trimming based scheme to achieve excess population risks of $\tilde{O}(\frac{ d^2}{nε^2})$ and $\tilde{O}(\frac{d^\frac{2}{3}}{(nε^2)^\frac{1}{3}})$ for strongly convex and general convex loss functions, respectively, \textit{with high probability}. Experiments suggest that our algorithms can effectively deal with the challenges caused by data irregularity.

Submitted 21 October, 2020; originally announced October 2020.

Comments: Published in ICML 2020

arXiv:2009.07875 [pdf, other]

A Survival Mediation Model with Bayesian Model Averaging

Authors: Jie Zhou, Xun Jiang, H. Amy Xia, Peng Wei, Brian P. Hobbs

Abstract: Determining the extent to which a patient is benefiting from cancer therapy is challenging. Criteria for quantifying the extent of "tumor response" observed within a few cycles of treatment have been established for various types of solid as well as hematologic malignancies. These measures comprise the primary endpoints of phase II trials. Regulatory approvals of new cancer therapies, however, are usually contingent upon the demonstration of superior overall survival with randomized evidence acquired with a phase III trial comparing the novel therapy to an appropriate standard of care treatment. With nearly two thirds of phase III oncology trials failing to achieve statistically significant results, researchers continue to refine and propose new surrogate endpoints. This article presents a Bayesian framework for studying relationships among treatment, patient subgroups, tumor response and survival. Combining classical components of mediation analysis with Bayesian model averaging (BMA), the methodology is robust to model mis-specification among various possible relationships among the observable entities. Posterior inference is demonstrated via application to a randomized controlled phase III trial in metastatic colorectal cancer. Moreover, the article details posterior predictive distributions of survival and statistical metrics for quantifying the extent of direct and indirect, or tumor response mediated, treatment effects.

Submitted 16 September, 2020; originally announced September 2020.

Comments: 25 pages, 3 figures and 3 tables in the main manuscript. Supplementary materials included

arXiv:1912.02955 [pdf, other]

Hybrid Kronecker Product Decomposition and Approximation

Authors: Chencheng Cai, Rong Chen, Han Xiao

Abstract: Discovering the underlying low dimensional structure of high dimensional data has attracted a significant amount of research recently and has been shown to have a wide range of applications. As an effective dimension reduction tool, singular value decomposition is often used to analyze high dimensional matrices, which are traditionally assumed to have a low rank matrix approximation. In this paper, we propose a new approach. We assume a high dimensional matrix can be approximated by a sum of a small number of Kronecker products of matrices with potentially different configurations, named the hybrid Kronecker outer Product Approximation (hKoPA). It provides an extremely flexible way of dimension reduction compared to the low-rank matrix approximation. Challenges arise in estimating a hKoPA when the configurations of component Kronecker products are different or unknown. We propose an estimation procedure when the set of configurations is given and a joint configuration determination and component estimation procedure when the configurations are unknown. Specifically, a least squares backfitting algorithm is used when the configuration is given. When the configuration is unknown, an iterative greedy algorithm is used. Both simulation and real image examples show that the proposed algorithms have promising performance. The hybrid Kronecker product approximation may have potentially wider applications in low dimensional representation of high dimensional data.

Submitted 5 December, 2019; originally announced December 2019.

arXiv:1912.02392 [pdf, other]

KoPA: Automated Kronecker Product Approximation

Authors: Chencheng Cai, Rong Chen, Han Xiao

Abstract: We consider the problem of matrix approximation and denoising induced by the Kronecker product decomposition. Specifically, we propose to approximate a given matrix by the sum of a few Kronecker products of matrices, which we refer to as the Kronecker product approximation (KoPA). Because the Kronecker product is an extension of the outer product from vectors to matrices, KoPA extends the low rank matrix approximation, and includes it as a special case. Compared with the latter, KoPA also offers greater flexibility, since it allows the user to choose the configuration, that is, the dimensions of the two smaller matrices forming the Kronecker product. On the other hand, the configuration to be used is usually unknown, and needs to be determined from the data in order to achieve the optimal balance between accuracy and parsimony. We propose to use extended information criteria to select the configuration. Under the paradigm of high dimensional analysis, we show that the proposed procedure is able to select the true configuration with probability tending to one, under suitable conditions on the signal-to-noise ratio. We demonstrate the superiority of KoPA over the low rank approximations through numerical studies, and several benchmark image examples.

Submitted 26 August, 2020; v1 submitted 5 December, 2019; originally announced December 2019.
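
Illustration (not part of the arXiv listing): the abstract above notes that the Kronecker product extends the outer product and that a "configuration" is the pair of dimensions of the two smaller matrices. The following is a minimal pure-Python sketch of those two statements only. The helper name `kron` and the example matrices are assumptions chosen for illustration; this is not the authors' estimation or configuration-selection procedure.

```python
def kron(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    return [
        [a_ij * b_kl for a_ij in a_row for b_kl in b_row]
        for a_row in a
        for b_row in b
    ]


# Special case: a column vector (2 x 1) times a row vector (1 x 3)
# gives the ordinary outer product, i.e. a rank-one 2 x 3 matrix.
u = [[1], [2]]
v = [[3, 4, 5]]
outer = kron(u, v)

# A different configuration for a 4 x 4 result: two 2 x 2 factors.
a = [[1, 2], [3, 4]]
b = [[0, 1], [1, 0]]
block = kron(a, b)

for row in outer:
    print(row)
for row in block:
    print(row)
```

Both results are built from a single Kronecker product; only the shapes of the factors differ, which is the sense in which the low rank (outer product) approximation is one configuration among several.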

arXiv:1911.11774 [pdf, other]

Matrix Completion using Kronecker Product Approximation

Authors: Chencheng Cai, Rong Chen, Han Xiao

Abstract: A matrix completion problem is to recover the missing entries in a partially observed matrix. Most of the existing matrix completion methods assume a low rank structure of the underlying complete matrix. In this paper, we introduce an alternative and more general form of the underlying complete matrix, which assumes a low Kronecker rank instead of a low regular rank, but includes the latter as a special case. The extra flexibility allows for a much more parsimonious representation of the underlying matrix, but also raises the challenge of determining the proper Kronecker product configuration to be used. We find that the configuration can be identified using the mean squared error criterion as well as a modified cross-validation criterion. We establish the consistency of this procedure under suitable conditions on the signal-to-noise ratio. An aggregation procedure is also proposed to deal with special missing patterns and complex underlying structures. Both numerical and empirical studies are carried out to demonstrate the performance of the new method.

Submitted 13 November, 2020; v1 submitted 26 November, 2019; originally announced November 2019.

arXiv:1911.06683 [pdf, other]

doi 10.1016/j.cma.2020.113097

Enforcing Boundary Conditions on Physical Fields in Bayesian Inversion

Authors: Carlos A. Michelén Ströfer, Xinlei Zhang, Heng Xiao, Olivier Coutier-Delgosha

Abstract: Inverse problems in computational mechanics consist of inferring physical fields that are latent in the model describing some observable fields. For instance, an inverse problem of interest is inferring the Reynolds stress field in the Navier-Stokes equations describing mean fluid velocity and pressure. The physical nature of the latent fields means they have their own set of physical constraints, including boundary conditions. The inherent ill-posedness of inverse problems, however, means that there exist many possible latent fields that do not satisfy their physical constraints while still resulting in a satisfactory agreement in the observation space. These physical constraints must therefore be enforced through the problem formulation. So far there has been no general approach to enforce boundary conditions on latent fields in inverse problems in computational mechanics, with these constraints often simply ignored. In this work we demonstrate how to enforce boundary conditions in Bayesian inversion problems by choice of the statistical model for the latent fields. Specifically, this is done by modifying the covariance kernel to guarantee that all realizations satisfy known values or derivatives at the boundary. As a test case the problem of inferring the eddy viscosity in the Reynolds-averaged Navier-Stokes equations is considered. The results show that enforcing these constraints results in similar improvements in the output fields but with latent fields that behave as expected at the boundaries.

Submitted 15 November, 2019; originally announced November 2019.

arXiv:1911.06671 [pdf, other]

Enforcing Deterministic Constraints on Generative Adversarial Networks for Emulating Physical Systems

Authors: Zeng Yang, Jin-Long Wu, Heng Xiao

Abstract: Generative adversarial networks (GANs) were initially proposed to generate images by learning from a large number of samples. Recently, GANs have been used to emulate complex physical systems such as turbulent flows. However, a critical question must be answered before GANs can be considered trusted emulators for physical systems: do GAN-generated samples conform to the various physical constraints? These include both deterministic constraints (e.g., conservation laws) and statistical constraints (e.g., energy spectrum of turbulent flows). The latter have been studied in a companion paper (Wu et al., Enforcing statistical constraints in generative adversarial networks for modeling chaotic dynamical systems. Journal of Computational Physics. 406, 109209, 2020). In the present work, we enforce deterministic yet imprecise constraints on GANs by incorporating them into the loss function of the generator. We evaluate the performance of physics-constrained GANs on two representative tasks with geometrical constraints (generating points on circles) and differential constraints (generating divergence-free flow velocity fields), respectively. In both cases, the constrained GANs produced samples that conform to the underlying constraints rather accurately, even though the constraints are only enforced up to a specified interval. More importantly, the imposed constraints significantly accelerate the convergence and improve the robustness in the training, indicating that they serve as a physics-based regularization. These improvements are noteworthy, as the convergence and robustness are two well-known obstacles in the training of GANs.

Submitted 21 November, 2020; v1 submitted 15 November, 2019; originally announced November 2019.

arXiv:1909.00225 [pdf, other]

Statistical Robust Chinese Remainder Theorem for Multiple Numbers

Authors: Hanshen Xiao, Nan Du, Zhikang T. Wang, Guoqiang Xiao

Abstract: The generalized Chinese Remainder Theorem (CRT) is a well-known approach to solving ambiguity resolution problems. In this paper, we study robust CRT reconstruction for multiple numbers from a statistical perspective. To the best of our knowledge, this is the first rigorous analysis of the underlying statistical model of CRT-based multiple parameter estimation. To address the problem, two novel approaches are established. One is to directly calculate a conditional maximum a posteriori probability (MAP) estimation of the residue clustering, and the other is based on a generalized wrapped Gaussian mixture model to iteratively search for MAP of both estimands and clustering. Residue error correcting codes are introduced to improve the robustness further. Experimental results show that the statistical schemes achieve much stronger robustness compared to state-of-the-art deterministic schemes, especially in heavy-noise scenarios.

Submitted 31 August, 2019; originally announced September 2019.

arXiv:1908.00618 [pdf, other]

Analyzing Basket Trials under Multisource Exchangeability Assumptions

Authors: Michael J. Kane, Nan Chen, Alexander M. Kaizer, Xun Jiang, H. Amy Xia, Brian P. Hobbs

Abstract: Basket designs are prospective clinical trials that are devised with the hypothesis that the presence of selected molecular features determines a patient's subsequent response to a particular "targeted" treatment strategy. Basket trials are designed to enroll multiple clinical subpopulations to which it is assumed that the therapy in question offers beneficial efficacy in the presence of the targeted molecular profile. The treatment, however, may not offer acceptable efficacy to all subpopulations enrolled. Moreover, for rare disease settings, such as oncology wherein these trials have become popular, marginal measures of statistical evidence are difficult to interpret for sparsely enrolled subpopulations. Consequently, basket trials pose challenges to the traditional paradigm for trial design, which assumes inter-patient exchangeability. The R package basket facilitates the analysis of basket trials by implementing multi-source exchangeability models. By evaluating all possible pairwise exchangeability relationships, this hierarchical modeling framework facilitates Bayesian posterior shrinkage among a collection of discrete and pre-specified subpopulations. Analysis functions are provided to implement posterior inference of the response rates and all possible exchangeability relationships between subpopulations. In addition, the package can identify "poolable" subsets and report their response characteristics. The functionality of the package is demonstrated using data from an oncology study with subpopulations defined by tumor histology.

Submitted 1 August, 2019; originally announced August 2019.

Comments: 18 pages, 4 figures, 3 tables, submitted to the Journal of Open Source Software

MSC Class: 62-04 ACM Class: G.3

arXiv:1907.05496 [pdf]

Online Learning to Estimate Warfarin Dose with Contextual Linear Bandits

Authors: Hai Xiao

Abstract: Warfarin is one of the most commonly used oral blood anticoagulant agents in the world. The proper dose of Warfarin is difficult to establish, not only because it varies substantially among patients, but also because of the adverse, even severe, consequences of taking an incorrect dose. Typical practice is to prescribe an initial dose, after which the doctor closely monitors the patient's response and adjusts accordingly to the correct dosage. The three commonly used strategies for an initial dosage are the fixed-dose approach, the Warfarin Clinical algorithm, and the Pharmacogenetic algorithm developed by the IWPC (International Warfarin Pharmacogenetics Consortium). It is always best to prescribe the correct initial dosage. Motivated by this challenge, this work explores the performance of multi-armed bandit algorithms in predicting the correct dosage of Warfarin in place of a trial-and-error procedure. Real data from the Pharmacogenetics and Pharmacogenomics Knowledge Base (PharmGKB) is used, and with it a series of linear bandit algorithms and variants are developed and evaluated on the Warfarin dataset. All proposed algorithms outperformed the fixed-dose baseline algorithm, and some even matched the Warfarin Clinical Dosing Algorithm. In addition, a few promising future directions are given for further exploration and development.

Submitted 11 July, 2019; originally announced July 2019.

arXiv:1906.08113 [pdf, other]

Wasserstein Adversarial Imitation Learning

Authors: Huang Xiao, Michael Herman, Joerg Wagner, Sebastian Ziesche, Jalal Etesami, Thai Hong Linh

Abstract: Imitation Learning describes the problem of recovering an expert policy from demonstrations. While inverse reinforcement learning approaches are known to be very sample-efficient in terms of expert demonstrations, they usually require problem-dependent reward functions or a (task-)specific reward-function regularization. In this paper, we show a natural connection between inverse reinforcement learning approaches and Optimal Transport, which enables more general reward functions with desirable properties (e.g., smoothness). Based on our observation, we propose a novel approach called Wasserstein Adversarial Imitation Learning. Our approach considers the Kantorovich potentials as a reward function and further leverages regularized optimal transport to enable large-scale applications. In several robotic experiments, our approach outperforms the baselines in terms of average cumulative rewards and shows a significant improvement in sample-efficiency, by requiring just one expert demonstration.

Submitted 19 June, 2019; originally announced June 2019.

arXiv:1905.06841 [pdf, other]

doi 10.1016/j.jcp.2019.109209

Enforcing Statistical Constraints in Generative Adversarial Networks for Modeling Chaotic Dynamical Systems

Authors: Jin-Long Wu, Karthik Kashinath, Adrian Albert, Dragos Chirila, Prabhat, Heng Xiao

Abstract: Simulating complex physical systems often involves solving partial differential equations (PDEs) with some closures due to the presence of multi-scale physics that cannot be fully resolved. Therefore, reliable and accurate closure models for unresolved physics remain an important requirement for many computational physics problems, e.g., turbulence simulation. Recently, several researchers have adopted generative adversarial networks (GANs), a novel paradigm of training machine learning models, to generate solutions of PDE-governed complex systems without having to numerically solve these PDEs. However, GANs are known to be difficult to train and likely to converge to local minima, where the generated samples do not capture the true statistics of the training data. In this work, we present a statistically constrained generative adversarial network by enforcing constraints of covariance from the training data, which results in an improved machine-learning-based emulator to capture the statistics of the training data generated by solving fully resolved PDEs. We show that such a statistical regularization leads to better performance compared to standard GANs, measured by (1) the constrained model's ability to more faithfully emulate certain physical properties of the system and (2) the significantly reduced (by up to 80%) training time to reach the solution. We exemplify this approach on the Rayleigh-Benard convection, a turbulent flow system that is an idealized model of the Earth's atmosphere. With the growth of high-fidelity simulation databases of physical systems, this work suggests great potential for being an alternative to the explicit modeling of closures or parameterizations for unresolved physics, which are known to be a major source of uncertainty in simulating multi-scale physical systems, e.g., turbulence or Earth's climate.

Submitted 13 May, 2019; originally announced May 2019.

arXiv:1904.10639 [pdf, other]

Efficient Simulation Budget Allocation for Subset Selection Using Regression Metamodels

Authors: Fei Gao, Zhongshun Shi, Siyang Gao, Hui Xiao

Abstract: This research considers the ranking and selection (R&S) problem of selecting the optimal subset from a finite set of alternative designs. Given the total simulation budget constraint, we aim to maximize the probability of correctly selecting the top-m designs. In order to improve the selection efficiency, we incorporate the information from across the domain into regression metamodels. In this research, we assume that the mean performance of each design is approximately quadratic. To achieve a better fit of this model, we divide the solution space into adjacent partitions such that the quadratic assumption can be satisfied within each partition. Using the large deviation theory, we propose an approximately optimal simulation budget allocation rule in the presence of partitioned domains. Numerical experiments demonstrate that our approach can enhance the simulation efficiency significantly.

Submitted 24 April, 2019; originally announced April 2019.

arXiv:1904.08249 [pdf, other]

Bonsai -- Diverse and Shallow Trees for Extreme Multi-label Classification

Authors: Sujay Khandagale, Han Xiao, Rohit Babbar

Abstract: Extreme multi-label classification (XMC) refers to supervised multi-label learning involving hundreds of thousands or even millions of labels. In this paper, we develop a suite of algorithms, called Bonsai, which generalizes the notion of label representation in XMC, and partitions the labels in the representation space to learn shallow trees. We show three concrete realizations of this label representation space including: (i) the input space which is spanned by the input features, (ii) the output space spanned by label vectors based on their co-occurrence with other labels, and (iii) the joint space by combining the input and output representations. Furthermore, the constraint-free multi-way partitions learnt iteratively in these spaces lead to shallow trees. By combining the effect of shallow trees and generalized label representation, Bonsai achieves the best of both worlds - fast training which is comparable to state-of-the-art tree-based methods in XMC, and much better prediction accuracy, particularly on tail-labels. On a benchmark Amazon-3M dataset with 3 million labels, Bonsai outperforms a state-of-the-art one-vs-rest method in terms of prediction accuracy, while being approximately 200 times faster to train. The code for Bonsai is available at https://github.com/xmc-aalto/bonsai.

Submitted 10 August, 2019; v1 submitted 17 April, 2019; originally announced April 2019.

arXiv:1902.09347 [pdf, other]

doi 10.1145/3308558.3313658

Efficient Path Prediction for Semi-Supervised and Weakly Supervised Hierarchical Text Classification

Authors: Huiru Xiao, Xin Liu, Yangqiu Song

Abstract: Hierarchical text classification has many real-world applications. However, labeling a large number of documents is costly. In practice, we can use semi-supervised learning or weakly supervised learning (e.g., dataless classification) to reduce the labeling cost. In this paper, we propose a path cost-sensitive learning algorithm to utilize the structural information and further make use of unlabeled and weakly-labeled data. We use a generative model to leverage the large amount of unlabeled data and introduce path constraints into the learning algorithm to incorporate the structural information of the class hierarchy. The posterior probabilities of both unlabeled and weakly labeled data can be incorporated with path-dependent scores. Since we add a structure-sensitive cost to the learning algorithm to constrain the classification to be consistent with the class hierarchy and do not need to reconstruct the feature vectors for different structures, we can significantly reduce the computational cost compared to structural output learning. Experimental results on two hierarchical text classification benchmarks show that our approach is not only effective but also efficient in handling semi-supervised and weakly supervised hierarchical text classification.

Submitted 25 February, 2019; originally announced February 2019.

Comments: Accepted by 2019 World Wide Web Conference (WWW19)

arXiv:1812.08916 [pdf, other]

Autoregressive Models for Matrix-Valued Time Series

Authors: Rong Chen, Han Xiao, Dan Yang

Abstract: In finance, economics and many other fields, observations in a matrix form are often generated over time. For example, a set of key economic indicators are regularly reported in different countries every quarter. The observations at each quarter neatly form a matrix and are observed over many consecutive quarters. Dynamic transport networks with observations generated on the edges can be formed as a matrix observed over time. Although it is natural to turn the matrix observations into a long vector, and then use the standard vector time series models for analysis, it is often the case that the columns and rows of the matrix represent different types of structures that are closely interplayed. In this paper we follow the autoregressive structure for modeling time series and propose a novel matrix autoregressive model in a bilinear form that maintains and utilizes the matrix structure to achieve a greater dimensional reduction, as well as more interpretable results. Probabilistic properties of the models are investigated. Estimation procedures with their theoretical properties are presented and demonstrated with simulated and real examples.

Submitted 24 July, 2019; v1 submitted 20 December, 2018; originally announced December 2018.

MSC Class: 62M10; 62H99
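
Illustration (not part of the arXiv listing): the abstract above contrasts a bilinear matrix autoregression with vectorizing the matrix and fitting a standard vector model. The sketch below assumes the bilinear form is X_t = A X_{t-1} B^T (noise omitted), which the abstract does not spell out, and uses the standard identity vec(A X B^T) = (B ⊗ A) vec(X) to show how the two views relate. All helper names and the example matrices are hypothetical.

```python
def matmul(a, b):
    """Matrix product of two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def kron(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    return [
        [a_ij * b_kl for a_ij in a_row for b_kl in b_row]
        for a_row in a
        for b_row in b
    ]


def vec(a):
    """Stack the columns of a matrix into one flat list (column-major)."""
    return [x for col in zip(*a) for x in col]


def mar_step(A, X, B):
    """One noise-free step of the assumed bilinear form A X B^T."""
    return matmul(matmul(A, X), transpose(B))


A = [[1, 2], [0, 1]]                    # 2 x 2, acts on rows
B = [[1, 0, 1], [0, 2, 0], [1, 1, 0]]   # 3 x 3, acts on columns
X = [[1, 2, 3], [4, 5, 6]]              # 2 x 3 observation

lhs = vec(mar_step(A, X, B))
big = kron(B, A)                        # 6 x 6 coefficient of the vectorized model
rhs = [row[0] for row in matmul(big, [[x] for x in vec(X)])]

m, n = len(X), len(X[0])
print(lhs == rhs)                       # the two formulations agree
print(m * m + n * n, (m * n) ** 2)      # coefficient counts: bilinear vs. unrestricted
```

Under this assumed form, the bilinear model carries m² + n² coefficients while an unrestricted vector autoregression on the same data carries (mn)², which is one way to read the "greater dimensional reduction" mentioned in the abstract.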

arXiv:1812.04808 [pdf, other]

doi 10.1142/S2424922X19500062

Kernel Treelets

Authors: Hedi Xia, Hector D. Ceniceros

Abstract: A new method for hierarchical clustering is presented. It combines treelets, a particular multiscale decomposition of data, with a projection on a reproducing kernel Hilbert space. The proposed approach, called kernel treelets (KT), effectively substitutes the correlation coefficient matrix used in treelets with a symmetric, positive semi-definite matrix efficiently constructed from a kernel function. Unlike most clustering methods, which require data sets to be numeric, KT can be applied to more general data and yield a multi-resolution sequence of bases on the data directly in feature space. The effectiveness and potential of KT in clustering analysis are illustrated with some examples.

Submitted 11 December, 2018; originally announced December 2018.

arXiv:1812.02598 [pdf]

Finding the needle in high-dimensional haystack: A tutorial on canonical correlation analysis

Authors: Hao-Ting Wang, Jonathan Smallwood, Janaina Mourao-Miranda, Cedric Huchuan Xia, Theodore D. Satterthwaite, Danielle S. Bassett, Danilo Bzdok

Abstract: Since the beginning of the 21st century, the size, breadth, and granularity of data in biology and medicine have grown rapidly. In the example of neuroscience, studies with thousands of subjects are becoming more common, which provide extensive phenotyping on the behavioral, neural, and genomic level with hundreds of variables. The complexity of such big data repositories offers new opportunities and poses new challenges to investigate brain, cognition, and disease. Canonical correlation analysis (CCA) is a prototypical family of methods for wrestling with and harvesting insight from such rich datasets. This doubly-multivariate tool can simultaneously consider two variable sets from different modalities to uncover essential hidden associations. Our primer discusses the rationale, promises, and pitfalls of CCA in biomedicine.

Submitted 6 December, 2018; originally announced December 2018.

arXiv:1811.11339 [pdf, other]

Statistical Robust Chinese Remainder Theorem for Multiple Numbers: Wrapped Gaussian Mixture Model

Authors: Nan Du, Zhikang Wang, Hanshen Xiao

Abstract: The generalized Chinese Remainder Theorem (CRT) has been shown to be a powerful approach to solving the ambiguity resolution problem. However, with its close relationship to number theory, study in this area is mainly from a coding theory perspective under deterministic conditions. Nevertheless, it can be proved that even with the best deterministic condition known, the probability of success in robust reconstruction degrades exponentially as the number of estimands increases. In this paper, we present the first rigorous analysis of the underlying statistical model of CRT-based multiple parameter estimation, where a generalized Gaussian mixture with background knowledge on samplings is proposed. To address the problem, two novel approaches are introduced. One is to directly calculate the conditional maximum a posteriori probability (MAP) estimation of residue clustering, and the other is to iteratively search for MAP of both common residues and clustering. Moreover, remainder error-correcting codes are introduced to improve the robustness further. It is shown that this statistically based scheme achieves much stronger robustness compared to state-of-the-art deterministic schemes, especially in low and medium signal-to-noise ratio (SNR) scenarios.

Submitted 27 November, 2018; originally announced November 2018.

arXiv:1808.07449 [pdf, other]

Robust Spatial Extent Inference with a Semiparametric Bootstrap Joint Testing Procedure

Authors: Simon N. Vandekar, Theodore D. Satterthwaite, Cedric H. Xia, Kosha Ruparel, Ruben C. Gur, Raquel E. Gur, Russell T. Shinohara

Abstract: Spatial extent inference (SEI) is widely used across neuroimaging modalities to study brain-phenotype associations that inform our understanding of disease. Recent studies have shown that Gaussian random field (GRF) based tools can have inflated family-wise error rates (FWERs). This has led to fervent discussion as to which preprocessing steps are necessary to control the FWER using GRF-based SEI. The failure of GRF-based methods is due to unrealistic assumptions about the covariance function of the imaging data. The permutation procedure is the most robust SEI tool because it estimates the covariance function from the imaging data. However, the permutation procedure can fail because its assumption of exchangeability is violated in many imaging modalities. Here, we propose the (semi-) parametric bootstrap joint (PBJ; sPBJ) testing procedures that are designed for SEI of multilevel imaging data. The sPBJ procedure uses a robust estimate of the covariance function, which yields consistent estimates of standard errors, even if the covariance model is misspecified. We use our methods to study the association between performance and executive functioning in a working fMRI study. The sPBJ procedure is robust to variance misspecification and maintains nominal FWER in small samples, in contrast to the GRF methods. The sPBJ also has equal or superior power to the PBJ and permutation procedures. We provide an R package (https://github.com/simonvandekar/pbj) to perform inference using the PBJ and sPBJ procedures.

Submitted 22 August, 2018; originally announced August 2018.

arXiv:1807.09751 [pdf, other]

Multi-Perspective Neural Architecture for Recommendation System

Authors: Han Xiao, Yidong Chen, Xiaodong Shi

Abstract: A research trend of leveraging neural architectures for recommendation systems has recently emerged. Though several deep recommender models have been proposed, most methods are too simple to characterize users' complex preferences. In this paper, for a fine-grained analysis, users' ratings are explained from multiple perspectives, and we propose our neural architecture on that basis. Specifically, our model employs several sequential stages to encode the user and item into hidden representations. In each stage, the user and item are represented from multiple perspectives, and in each perspective the representations of user and item attend to each other. Finally, we apply a metric to the output representations of the final stage to approximate the users' ratings. Extensive experiments demonstrate that our method achieves substantial improvements over baselines.

Submitted 12 July, 2018; originally announced July 2018.

arXiv:1804.07933 [pdf, other]

Is feature selection secure against training data poisoning?

Authors: Huang Xiao, Battista Biggio, Gavin Brown, Giorgio Fumera, Claudia Eckert, Fabio Roli

Abstract: Learning in adversarial settings is becoming an important task for application domains where attackers may inject malicious data into the training set to subvert normal operation of data-driven technologies. Feature selection has been widely used in machine learning for security applications to improve generalization and computational efficiency, although it is not clear whether its use may be beneficial or even counterproductive when training data are poisoned by intelligent attackers. In this work, we shed light on this issue by providing a framework to investigate the robustness of popular feature selection methods, including LASSO, ridge regression and the elastic net. Our results on malware detection show that feature selection methods can be significantly compromised under attack (we can reduce LASSO to almost random choices of feature sets by careful insertion of less than 5% poisoned training samples), highlighting the need for specific countermeasures.

Submitted 21 April, 2018; originally announced April 2018.

Journal ref: Proc. of the 32nd ICML, Lille, France, 2015. JMLR: W&CP vol. 37

arXiv:1801.02901 [pdf, other]

Convexification of Neural Graph

Authors: Han Xiao

Abstract: Traditionally, most complex intelligence architectures are extremely non-convex and therefore cannot be handled well by convex optimization. However, this paper decomposes complex structures into three types of nodes: operators, algorithms and functions. Iteratively propagating from node to node along edges, we prove that "regarding the tree-structured neural graph, it is nearly convex in each variable, when the other variables are fixed." In fact, the non-convex properties stem from circles and functions, which can be transformed to be convex with our proposed scale mechanism. Experimentally, we support our theoretical analysis with two practical applications.

Submitted 13 January, 2018; v1 submitted 9 January, 2018; originally announced January 2018.

arXiv:1711.01790 [pdf, ps, other]

Simultaneous Block-Sparse Signal Recovery Using Pattern-Coupled Sparse Bayesian Learning

Authors: Hang Xiao, Zhengli Xing, Linxiao Yang, Jun Fang, Yanlun Wu

Abstract: In this paper, we consider the block-sparse signal recovery problem in the context of multiple measurement vectors (MMV) with common row sparsity patterns. We develop a new method for recovery of common row sparsity MMV signals, where a pattern-coupled hierarchical Gaussian prior model is introduced to characterize both the block-sparsity of the coefficients and the statistical dependency between neighboring coefficients of the common row sparsity MMV signals. Unlike many other methods, the proposed method is able to automatically capture the block-sparse structure of the unknown signal. Our method is developed using an expectation-maximization (EM) framework. Simulation results show that our proposed method offers competitive performance in recovering block-sparse common row sparsity pattern MMV signals.

Submitted 6 November, 2017; originally announced November 2017.

arXiv:1708.07747 [pdf, ps, other]

Fashion-MNIST: a Novel Image Dataset for Benchmarking Machine Learning Algorithms

Authors: Han Xiao, Kashif Rasul, Roland Vollgraf

Abstract: We present Fashion-MNIST, a new dataset comprising 28x28 grayscale images of 70,000 fashion products from 10 categories, with 7,000 images per category. The training set has 60,000 images and the test set has 10,000 images. Fashion-MNIST is intended to serve as a direct drop-in replacement for the original MNIST dataset for benchmarking machine learning algorithms, as it shares the same image size, data format and the structure of training and testing splits. The dataset is freely available at https://github.com/zalandoresearch/fashion-mnist

Submitted 15 September, 2017; v1 submitted 25 August, 2017; originally announced August 2017.

Comments: Dataset is freely available at https://github.com/zalandoresearch/fashion-mnist ; benchmark is available at http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/

Showing 1–39 of 39 results for author: Xiao, H