Search | arXiv e-print repository

Advancing Retail Data Science: Comprehensive Evaluation of Synthetic Data

Authors: Yu Xia, Chi-Hua Wang, Joshua Mabry, Guang Cheng

Abstract: The evaluation of synthetic data generation is crucial, especially in the retail sector where data accuracy is paramount. This paper introduces a comprehensive framework for assessing synthetic retail data, focusing on fidelity, utility, and privacy. Our approach differentiates between continuous and discrete data attributes, providing precise evaluation criteria. Fidelity is measured through stability and generalizability. Stability ensures synthetic data accurately replicates known data distributions, while generalizability confirms its robustness in novel scenarios. Utility is demonstrated through the synthetic data's effectiveness in critical retail tasks such as demand forecasting and dynamic pricing, proving its value in predictive analytics and strategic planning. Privacy is safeguarded using Differential Privacy, ensuring synthetic data maintains a perfect balance between resembling training and holdout datasets without compromising security. Our findings validate that this framework provides reliable and scalable evaluation for synthetic retail data. It ensures high fidelity, utility, and privacy, making it an essential tool for advancing retail data science. This framework meets the evolving needs of the retail industry with precision and confidence, paving the way for future advancements in synthetic data methodologies.

Submitted 18 June, 2024; originally announced June 2024.

arXiv:2406.08180 [pdf, other]

Stochastic Process-based Method for Degree-Degree Correlation of Evolving Networks

Authors: Yue Xiao, Xiaojun Zhang

Abstract: Existing studies on the degree correlation of evolving networks typically rely on differential equations and statistical analysis, resulting in only approximate solutions due to inherent randomness. To address this limitation, we propose an improved Markov chain method for modeling degree correlation in evolving networks. By redesigning the network evolution rules to reflect actual network dynamics more accurately, we achieve a topological structure that closely matches real-world network evolution. Our method models the degree correlation evolution process for both directed and undirected networks and provides theoretical results that are verified through simulations. This work offers the first theoretical solution for the steady-state degree correlation in evolving network models and is applicable to more complex evolution mechanisms and networks with directional attributes. Additionally, it supports the study of dynamic characteristic control based on network structure at any given time, offering a new tool for researchers in the field.

Submitted 12 June, 2024; originally announced June 2024.

arXiv:2406.01864 [pdf, other]

Variance-reduced sampling importance resampling

Authors: Yao Xiao, Kang Fu, Kun Li

Abstract: The sampling importance resampling method is widely utilized in various fields, such as numerical integration and statistical simulation. In this paper, two modified methods are presented by incorporating two variance reduction techniques commonly used in Monte Carlo simulation, namely antithetic sampling and Latin hypercube sampling, into the process of sampling importance resampling method respectively. Theoretical evidence is provided to demonstrate that the proposed methods significantly reduce estimation errors compared to the original approach. Furthermore, the effectiveness and advantages of the proposed methods are validated through both numerical studies and real data analysis.

Submitted 3 June, 2024; originally announced June 2024.
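
For orientation, the baseline sampling importance resampling (SIR) step that the abstract above builds on can be sketched as follows. This is a minimal illustration of plain SIR only, not the paper's antithetic or Latin hypercube variants. The function name `sir`, the Gaussian target and proposal, and the sample sizes are all assumptions chosen for the example.

```python
import numpy as np

def sir(log_target, proposal_sample, proposal_logpdf, n_draws, n_resample, rng):
    """Plain SIR: draw from a proposal, weight by target/proposal, resample."""
    x = proposal_sample(n_draws, rng)
    # Log importance weights; the target may be unnormalized.
    log_w = log_target(x) - proposal_logpdf(x)
    # Subtract the max before exponentiating for numerical stability.
    w = np.exp(log_w - log_w.max())
    w /= w.sum()
    # Resample indices with probability proportional to the weights.
    idx = rng.choice(n_draws, size=n_resample, replace=True, p=w)
    return x[idx]

# Hypothetical example: unnormalized N(1, 0.5^2) target, N(0, 2^2) proposal.
def log_target(x):
    return -0.5 * ((x - 1.0) / 0.5) ** 2

def proposal_sample(n, rng):
    return rng.normal(0.0, 2.0, size=n)

def proposal_logpdf(x):
    return -0.5 * (x / 2.0) ** 2 - np.log(2.0 * np.sqrt(2.0 * np.pi))

rng = np.random.default_rng(0)
samples = sir(log_target, proposal_sample, proposal_logpdf, 20000, 5000, rng)
```

The resampled points should be approximately distributed according to the target, so their sample mean and standard deviation should be close to 1 and 0.5 here.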

arXiv:2405.10469 [pdf, other]

Simulation-Based Benchmarking of Reinforcement Learning Agents for Personalized Retail Promotions

Authors: Yu Xia, Sriram Narayanamoorthy, Zhengyuan Zhou, Joshua Mabry

Abstract: The development of open benchmarking platforms could greatly accelerate the adoption of AI agents in retail. This paper presents comprehensive simulations of customer shopping behaviors for the purpose of benchmarking reinforcement learning (RL) agents that optimize coupon targeting. The difficulty of this learning problem is largely driven by the sparsity of customer purchase events. We trained agents using offline batch data comprising summarized customer purchase histories to help mitigate this effect. Our experiments revealed that contextual bandit and deep RL methods that are less prone to over-fitting the sparse reward distributions significantly outperform static policies. This study offers a practical framework for simulating AI agents that optimize the entire retail customer journey. It aims to inspire the further development of simulation tools for retail AI systems.

Submitted 16 May, 2024; originally announced May 2024.

arXiv:2403.07213 [pdf, other]

Which LLM to Play? Convergence-Aware Online Model Selection with Time-Increasing Bandits

Authors: Yu Xia, Fang Kong, Tong Yu, Liya Guo, Ryan A. Rossi, Sungchul Kim, Shuai Li

Abstract: Web-based applications such as chatbots, search engines and news recommendations continue to grow in scale and complexity with the recent surge in the adoption of LLMs. Online model selection has thus garnered increasing attention due to the need to choose the best model among a diverse set while balancing task reward and exploration cost. Organizations face decisions like whether to employ a costly API-based LLM or a locally finetuned small LLM, weighing cost against performance. Traditional selection methods often evaluate every candidate model before choosing one, which is becoming impractical given the rising costs of training and finetuning LLMs. Moreover, it is undesirable to allocate excessive resources towards exploring poor-performing models. While some recent works leverage online bandit algorithms to manage this exploration-exploitation trade-off in model selection, they tend to overlook the increasing-then-converging trend in model performances as the model is iteratively finetuned, leading to less accurate predictions and suboptimal model selections. In this paper, we propose a time-increasing bandit algorithm TI-UCB, which effectively predicts the increase of model performances due to finetuning and efficiently balances exploration and exploitation in model selection. To further capture the converging points of models, we develop a change detection mechanism by comparing consecutive increase predictions. We theoretically prove that our algorithm achieves a logarithmic regret upper bound in a typical increasing bandit setting, which implies a fast convergence rate. The advantage of our method is also empirically validated through extensive experiments on classification model selection and online selection of LLMs. Our results highlight the importance of utilizing the increasing-then-converging pattern for more efficient and economic model selection in the deployment of LLMs.

Submitted 11 March, 2024; originally announced March 2024.

Comments: Accepted by WWW'24 (Oral)
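
As background for the bandit framing in the abstract above, the standard UCB1 rule for choosing among candidate models can be sketched as below. This is the classical stationary-reward baseline, not TI-UCB: it does not model the increasing-then-converging trend or the change detection the abstract describes. The function names, the Bernoulli reward simulation, and the reward means are assumptions made for illustration.

```python
import math
import random

def ucb1_select(counts, sums, t):
    """Return the index of the model to try at round t (1-based)."""
    # Try every model once before using the confidence bound.
    for i, n in enumerate(counts):
        if n == 0:
            return i
    # Empirical mean plus exploration bonus sqrt(2 ln t / n_i).
    scores = [
        sums[i] / counts[i] + math.sqrt(2.0 * math.log(t) / counts[i])
        for i in range(len(counts))
    ]
    return max(range(len(scores)), key=scores.__getitem__)

def run_ucb1(reward_means, n_rounds, seed=0):
    """Simulate UCB1 on hypothetical models with fixed Bernoulli rewards."""
    rng = random.Random(seed)
    counts = [0] * len(reward_means)
    sums = [0.0] * len(reward_means)
    for t in range(1, n_rounds + 1):
        i = ucb1_select(counts, sums, t)
        reward = 1.0 if rng.random() < reward_means[i] else 0.0
        counts[i] += 1
        sums[i] += reward
    return counts

# Three hypothetical candidate models with fixed success rates.
counts = run_ucb1([0.2, 0.5, 0.8], n_rounds=5000)
```

With fixed reward means the rule concentrates its pulls on the best model over time. The abstract's point is that this stationarity assumption breaks down when models improve as they are finetuned.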

arXiv:2401.05784 [pdf, other]

Covariance Function Estimation for High-Dimensional Functional Time Series with Dual Factor Structures

Authors: Chenlei Leng, Degui Li, Hanlin Shang, Yingcun Xia

Abstract: We propose a flexible dual functional factor model for modelling high-dimensional functional time series. In this model, a high-dimensional fully functional factor parametrisation is imposed on the observed functional processes, whereas a low-dimensional version (via series approximation) is assumed for the latent functional factors. We extend the classic principal component analysis technique for the estimation of a low-rank structure to the estimation of a large covariance matrix of random functions that satisfies a notion of (approximate) functional "low-rank plus sparse" structure; and generalise the matrix shrinkage method to functional shrinkage in order to estimate the sparse structure of functional idiosyncratic components. Under appropriate regularity conditions, we derive the large sample theory of the developed estimators, including the consistency of the estimated factors and functional factor loadings and the convergence rates of the estimated matrices of covariance functions measured by various (functional) matrix norms. Consistent selection of the number of factors and a data-driven rule to choose the shrinkage parameter are discussed. Simulation and empirical studies are provided to demonstrate the finite-sample performance of the developed model and estimation methodology.

Submitted 12 January, 2024; v1 submitted 11 January, 2024; originally announced January 2024.

arXiv:2312.16769 [pdf, other]

Estimation and Inference for High-dimensional Multi-response Growth Curve Model

Authors: Xin Zhou, Yin Xia, Lexin Li

Abstract: A growth curve model (GCM) aims to characterize how an outcome variable evolves, develops and grows as a function of time, along with other predictors. It provides a particularly useful framework to model growth trends in longitudinal data. However, the estimation and inference of GCM with a large number of response variables faces numerous challenges, and remains underdeveloped. In this article, we study the high-dimensional multivariate-response linear GCM, and develop the corresponding estimation and inference procedures. Our proposal is far from a straightforward extension, and involves several innovative components. Specifically, we introduce a Kronecker product structure, which allows us to effectively decompose a very large covariance matrix, and to pool the correlated samples to improve the estimation accuracy. We devise a highly non-trivial multi-step estimation approach to estimate the individual covariance components separately and effectively. We also develop rigorous statistical inference procedures to test both the global effects and the individual effects, and establish the size and power properties, as well as the proper false discovery control. We demonstrate the effectiveness of the new method through both intensive simulations and the analysis of longitudinal neuroimaging data for Alzheimer's disease.

Submitted 27 December, 2023; originally announced December 2023.

arXiv:2312.14095 [pdf, other]

RetailSynth: Synthetic Data Generation for Retail AI Systems Evaluation

Authors: Yu Xia, Ali Arian, Sriram Narayanamoorthy, Joshua Mabry

Abstract: Significant research effort has been devoted in recent years to developing personalized pricing, promotions, and product recommendation algorithms that can leverage rich customer data to learn and earn. Systematic benchmarking and evaluation of these causal learning systems remains a critical challenge, due to the lack of suitable datasets and simulation environments. In this work, we propose a multi-stage model for simulating customer shopping behavior that captures important sources of heterogeneity, including price sensitivity and past experiences. We embedded this model into a working simulation environment -- RetailSynth. RetailSynth was carefully calibrated on publicly available grocery data to create realistic synthetic shopping transactions. Multiple pricing policies were implemented within the simulator and analyzed for impact on revenue, category penetration, and customer retention. Applied researchers can use RetailSynth to validate causal demand models for multi-category retail and to incorporate realistic price sensitivity into emerging benchmarking suites for personalized pricing, promotions, and product recommendations.

Submitted 21 December, 2023; originally announced December 2023.

Comments: 30 pages, 8 figures

arXiv:2311.16771 [pdf, other]

The HR-Calculus: Enabling Information Processing with Quaternion Algebra

Authors: Danilo P. Mandic, Sayed Pouria Talebi, Clive Cheong Took, Yili Xia, Dongpo Xu, Min Xiang, Pauline Bourigault

Abstract: From their inception, quaternions and their division algebra have proven to be advantageous in modelling rotation/orientation in three-dimensional spaces and have seen use from the initial formulation of electromagnetic field theory through to forming the basis of quantum field theory. Despite their impressive versatility in modelling real-world phenomena, adaptive information processing techniques specifically designed for quaternion-valued signals have only recently come to the attention of the machine learning, signal processing, and control communities. The most important development in this direction is the introduction of the HR-calculus, which provides the required mathematical foundation for deriving adaptive information processing techniques directly in the quaternion domain. In this article, the foundations of the HR-calculus are revised and the required tools for deriving adaptive learning techniques suitable for dealing with quaternion-valued signals, such as the gradient operator, chain and product derivative rules, and Taylor series expansion are presented. This serves to establish the most important applications of adaptive information processing in the quaternion domain for both single-node and multi-node formulations. The article is supported by Supplementary Material, which will be referred to as SM.

Submitted 28 November, 2023; originally announced November 2023.

arXiv:2310.17845 [pdf, other]

A Unified and Optimal Multiple Testing Framework based on rho-values

Authors: Bowen Gang, Shenghao Qin, Yin Xia

Abstract: Multiple testing is an important research direction that has gained major attention in recent years. Currently, most multiple testing procedures are designed with p-values or Local false discovery rate (Lfdr) statistics. However, p-values obtained by applying probability integral transform to some well-known test statistics often do not incorporate information from the alternatives, resulting in suboptimal procedures. On the other hand, Lfdr based procedures can be asymptotically optimal but their guarantee on false discovery rate (FDR) control relies on consistent estimation of Lfdr, which is often difficult in practice especially when the incorporation of side information is desirable. In this article, we propose a novel and flexibly constructed class of statistics, called rho-values, which combines the merits of both p-values and Lfdr while enjoying advantages over methods based on these two types of statistics. Specifically, it unifies these two frameworks and operates in two steps, ranking and thresholding. The ranking produced by rho-values mimics that produced by Lfdr statistics, and the strategy for choosing the threshold is similar to that of p-value based procedures. Therefore, the proposed framework guarantees FDR control under weak assumptions; it maintains the integrity of the structural information encoded by the summary statistics and the auxiliary covariates and hence can be asymptotically optimal. We demonstrate the efficacy of the new framework through extensive simulations and two data applications.

Submitted 26 October, 2023; originally announced October 2023.

arXiv:2310.15653 [pdf, other]

Deceptive Fairness Attacks on Graphs via Meta Learning

Authors: Jian Kang, Yinglong Xia, Ross Maciejewski, Jiebo Luo, Hanghang Tong

Abstract: We study deceptive fairness attacks on graphs to answer the following question: How can we achieve poisoning attacks on a graph learning model to exacerbate the bias deceptively? We answer this question via a bi-level optimization problem and propose a meta learning-based framework named FATE. FATE is broadly applicable with respect to various fairness definitions and graph learning models, as well as arbitrary choices of manipulation operations. We further instantiate FATE to attack statistical parity and individual fairness on graph neural networks. We conduct extensive experimental evaluations on real-world datasets in the task of semi-supervised node classification. The experimental results demonstrate that FATE could amplify the bias of graph neural networks with or without fairness consideration while maintaining the utility on the downstream task. We hope this paper provides insights into the adversarial robustness of fair graph learning and can shed light on designing robust and fair graph learning in future studies.

Submitted 24 October, 2023; originally announced October 2023.

Comments: 23 pages, 11 tables

arXiv:2310.08798 [pdf, other]

Alteration Detection of Tensor Dependence Structure via Sparsity-Exploited Reranking Algorithm

Authors: Li Ma, Shenghao Qin, Yin Xia

Abstract: Tensor-valued data arise frequently from a wide variety of scientific applications, and many among them can be translated into an alteration detection problem of tensor dependence structures. In this article, we formulate the problem under the popularly adopted tensor-normal distributions and aim at two-sample correlation/partial correlation comparisons of tensor-valued observations. Through decorrelation and centralization, a separable covariance structure is employed to pool sample information from different tensor modes to enhance the power of the test. Additionally, we propose a novel Sparsity-Exploited Reranking Algorithm (SERA) to further improve the multiple testing efficiency. The algorithm is approached through reranking of the p-values derived from the primary test statistics, by incorporating a carefully constructed auxiliary tensor sequence. Besides the tensor framework, SERA is also generally applicable to a wide range of two-sample large-scale inference problems with sparsity structures, and is of independent interest. The asymptotic properties of the proposed test are derived and the algorithm is shown to control the false discovery at the pre-specified level. We demonstrate the efficacy of the proposed method through intensive simulations and two scientific applications.

Submitted 12 October, 2023; originally announced October 2023.

arXiv:2308.00894 [pdf, other]

User-Controllable Recommendation via Counterfactual Retrospective and Prospective Explanations

Authors: Juntao Tan, Yingqiang Ge, Yan Zhu, Yinglong Xia, Jiebo Luo, Jianchao Ji, Yongfeng Zhang

Abstract: Modern recommender systems utilize users' historical behaviors to generate personalized recommendations. However, these systems often lack user controllability, leading to diminished user satisfaction and trust in the systems. Acknowledging the recent advancements in explainable recommender systems that enhance users' understanding of recommendation mechanisms, we propose leveraging these advancements to improve user controllability. In this paper, we present a user-controllable recommender system that seamlessly integrates explainability and controllability within a unified framework. By providing both retrospective and prospective explanations through counterfactual reasoning, users can customize their control over the system by interacting with these explanations. Furthermore, we introduce and assess two attributes of controllability in recommendation systems: the complexity of controllability and the accuracy of controllability. Experimental evaluations on MovieLens and Yelp datasets substantiate the effectiveness of our proposed framework. Additionally, our experiments demonstrate that offering users control options can potentially enhance recommendation accuracy in the future. Source code and data are available at https://github.com/chrisjtan/ucr.

Submitted 1 August, 2023; originally announced August 2023.

Comments: Accepted for presentation at 26th European Conference on Artificial Intelligence (ECAI2023)

arXiv:2306.08489 [pdf, ps, other]

Analysis and Approximate Inference of Large Random Kronecker Graphs

Authors: Zhenyu Liao, Yuanqian Xia, Chengmei Niu, Yong Xiao

Abstract: Random graph models are playing an increasingly important role in various fields ranging from social networks, telecommunication systems, to physiologic and biological networks. Within this landscape, the random Kronecker graph model emerges as a prominent framework for scrutinizing intricate real-world networks. In this paper, we investigate large random Kronecker graphs, i.e., the number of graph vertices $N$ is large. Built upon recent advances in random matrix theory (RMT) and high-dimensional statistics, we prove that the adjacency of a large random Kronecker graph can be decomposed, in a spectral norm sense, into two parts: a small-rank (of rank $O(\log N)$) signal matrix that is linear in the graph parameters and a zero-mean random noise matrix. Based on this result, we propose a "denoise-and-solve" approach to infer the key graph parameters, with significantly reduced computational complexity. Experiments on both graph inference and classification are presented to evaluate our proposed method. In both tasks, the proposed approach yields performance comparable or superior to widely-used graph inference (e.g., KronFit) and graph neural net baselines, at a time cost that scales linearly with the graph size $N$.

Submitted 5 February, 2024; v1 submitted 14 June, 2023; originally announced June 2023.

Comments: 27 pages, 5 figures, 2 tables
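
To make the object of study in the abstract above concrete, a random Kronecker graph in the usual sense (an edge-probability matrix formed as the K-fold Kronecker power of a small initiator matrix, with independent Bernoulli edges) can be sampled as in the sketch below. The abstract does not spell out its parametrisation, so the 2x2 initiator values, the function names, and the choice of a directed graph with self-loops allowed are assumptions for illustration only. The paper's denoise-and-solve inference is not shown.

```python
import numpy as np

def kronecker_probabilities(initiator, k):
    """Edge-probability matrix: k-fold Kronecker power of the initiator."""
    base = np.array(initiator, dtype=float)
    probs = base.copy()
    for _ in range(k - 1):
        probs = np.kron(probs, base)
    return probs

def sample_kronecker_graph(initiator, k, rng):
    """Draw a directed adjacency matrix with independent Bernoulli entries."""
    probs = kronecker_probabilities(initiator, k)
    return (rng.random(probs.shape) < probs).astype(int)

# Hypothetical 2x2 initiator; N = 2**k vertices.
initiator = [[0.9, 0.5], [0.5, 0.2]]
rng = np.random.default_rng(0)
probs = kronecker_probabilities(initiator, 3)
adjacency = sample_kronecker_graph(initiator, 3, rng)
```

The adjacency matrix equals the probability matrix plus zero-mean noise by construction, which is the split between signal and noise that the abstract's decomposition result refines.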

arXiv:2305.05367 [pdf]

Exploring assessment method of technological advancement based on literature cross-citation

Authors: Shengxuan Tang, Liming Zhang, Shuo Jiang, Ming Cai, Yao Xiao

Abstract: Assessing advancements of technology is essential for creating science and technology policies and making informed investments in the technology market. However, current methods primarily focus on the characteristics of the technologies themselves, making it difficult to accurately assess technologies across various fields and generations. To address this challenge, we propose a novel approach that uses bibliometrics, specifically literature citation networks, to measure changes in knowledge flow throughout the evolution of technology. This method can identify diverse trends in technology development and is an effective tool for evaluating technological advancements. We demonstrate its accuracy and applicability by applying it to mobile communication technology and comparing its quantitative results with other assessment methods. Our work provides critical support for assessing different technical routes and formulating technology policy.

Submitted 9 May, 2023; originally announced May 2023.

Comments: 15 pages, 6 figures

arXiv:2304.12502 [pdf, ps, other]

Causal Semantic Communication for Digital Twins: A Generalizable Imitation Learning Approach

Authors: Christo Kurisummoottil Thomas, Walid Saad, Yong Xiao

Abstract: A digital twin (DT) leverages a virtual representation of the physical world, along with communication (e.g., 6G), computing (e.g., edge computing), and artificial intelligence (AI) technologies to enable many connected intelligence services. In order to handle the large amounts of network data based on digital twins (DTs), wireless systems can exploit the paradigm of semantic communication (SC) for facilitating informed decision-making under strict communication constraints by utilizing AI techniques such as causal reasoning. In this paper, a novel framework called causal semantic communication (CSC) is proposed for DT-based wireless systems. The CSC system is posed as an imitation learning (IL) problem, where the transmitter, with access to optimal network control policies using a DT, teaches the receiver using SC over a bandwidth limited wireless channel how to improve its knowledge to perform optimal control actions. The causal structure in the source data is extracted using novel approaches from the framework of deep end-to-end causal inference, thereby enabling the creation of a semantic representation that is causally invariant, which in turn helps generalize the learned knowledge of the system to unseen scenarios. The CSC decoder at the receiver is designed to extract and estimate semantic information while ensuring high semantic reliability. The receiver control policies, semantic decoder, and causal inference are formulated as a bi-level optimization problem within a variational inference framework. This problem is solved using a novel concept called network state models, inspired by world models in generative AI, that faithfully represents the environment dynamics leading to data generation. Simulation results demonstrate that the proposed CSC system outperforms state-of-the-art SC systems by achieving better semantic reliability and reduced semantic representation.

Submitted 24 April, 2023; originally announced April 2023.

arXiv:2302.14247 [pdf, ps, other]

Sequential edge detection using joint hierarchical Bayesian learning

Authors: Yao Xiao, Anne Gelb, Guohui Song

Abstract: This paper introduces a new sparse Bayesian learning (SBL) algorithm that jointly recovers a temporal sequence of edge maps from noisy and under-sampled Fourier data. The new method is cast in a Bayesian framework and uses a prior that simultaneously incorporates intra-image information to promote sparsity in each individual edge map with inter-image information to promote similarities in any unchanged regions. By treating both the edges as well as the similarity between adjacent images as random variables, there is no need to separately form regions of change. Thus we avoid both additional computational cost as well as any information loss resulting from pre-processing the image. Our numerical examples demonstrate that our new method compares favorably with more standard SBL approaches.

Submitted 27 February, 2023; originally announced February 2023.

MSC Class: 15A29; 62F15; 65F22; 65K10; 68U10

arXiv:2302.11173 [pdf, other]

VI-DGP: A variational inference method with deep generative prior for solving high-dimensional inverse problems

Authors: Yingzhi Xia, Qifeng Liao, **glai Li

Abstract: Solving high-dimensional Bayesian inverse problems (BIPs) with the variational inference (VI) method is promising but still challenging. The main difficulties arise from two aspects. First, VI methods approximate the posterior distribution using a simple and analytic variational distribution, which makes it difficult to estimate complex spatially-varying parameters in practice. Second, VI methods typically rely on gradient-based optimization, which can be computationally expensive or intractable when applied to BIPs involving partial differential equations (PDEs). To address these challenges, we propose a novel approximation method for estimating the high-dimensional posterior distribution. This approach leverages a deep generative model to learn a prior model capable of generating spatially-varying parameters. This enables posterior approximation over the latent variable instead of the complex parameters, thus improving estimation accuracy. Moreover, to accelerate gradient computation, we employ a differentiable physics-constrained surrogate model to replace the adjoint method. The proposed method can be fully implemented in an automatic differentiation manner. Numerical examples demonstrate two types of log-permeability estimation for flow in heterogeneous media. The results show the validity, accuracy, and high efficiency of the proposed method.

Submitted 22 February, 2023; originally announced February 2023.

MSC Class: 35R30; 62F15; 68T07

arXiv:2302.05790 [pdf, other]

Dimension Reduction and MARS

Authors: Yu Liu, Degui Li, Yingcun Xia

Abstract: The multivariate adaptive regression spline (MARS) is one of the popular estimation methods for nonparametric multivariate regressions. However, as MARS is based on marginal splines, to incorporate interactions of covariates, products of the marginal splines must be used, which leads to an unmanageable number of basis functions when the order of interaction is high and results in low estimation efficiency. In this paper, we improve the performance of MARS by using linear combinations of the covariates which achieve sufficient dimension reduction. The special basis functions of MARS facilitate calculation of gradients of the regression function, and estimation of the linear combinations is obtained via eigen-analysis of the outer-product of the gradients. Under some technical conditions, the asymptotic theory is established for the proposed estimation method. Numerical studies including both simulation and empirical applications show its effectiveness in dimension reduction and improvement over MARS and other commonly-used nonparametric methods in regression estimation and prediction.

Submitted 4 July, 2023; v1 submitted 11 February, 2023; originally announced February 2023.

arXiv:2301.10392 [pdf, other]

Statistical Inference and Large-scale Multiple Testing for High-dimensional Regression Models

Authors: T. Tony Cai, Zijian Guo, Yin Xia

Abstract: This paper presents a selective survey of recent developments in statistical inference and multiple testing for high-dimensional regression models, including linear and logistic regression. We examine the construction of confidence intervals and hypothesis tests for various low-dimensional objectives such as regression coefficients and linear and quadratic functionals. The key technique is to generate debiased and desparsified estimators for the targeted low-dimensional objectives and estimate their uncertainty. In addition to covering the motivations for and intuitions behind these statistical methods, we also discuss their optimality and adaptivity in the context of high-dimensional inference. In addition, we review the recent development of statistical inference based on multiple regression models and the advancement of large-scale multiple testing for high-dimensional regression. The R package SIHR has implemented some of the high-dimensional inference methods discussed in this paper.

Submitted 24 January, 2023; originally announced January 2023.

arXiv:2301.05708 [pdf, other]

A domain-decomposed VAE method for Bayesian inverse problems

Authors: Zhihang Xu, Yingzhi Xia, Qifeng Liao

Abstract: Bayesian inverse problems are often computationally challenging when the forward model is governed by complex partial differential equations (PDEs). This is typically caused by expensive forward model evaluations and high-dimensional parameterization of priors. This paper proposes a domain-decomposed variational auto-encoder Markov chain Monte Carlo (DD-VAE-MCMC) method to tackle these challenges simultaneously. Through partitioning the global physical domain into small subdomains, the proposed method first constructs local deterministic generative models based on local historical data, which provide efficient local prior representations. Gaussian process models with active learning address the domain decomposition interface conditions. Then inversions are conducted on each subdomain independently in parallel and in low-dimensional latent parameter spaces. The local inference solutions are post-processed through the Poisson image blending procedure to result in an efficient global inference result. Numerical examples are provided to demonstrate the performance of the proposed method.

Submitted 6 February, 2023; v1 submitted 9 January, 2023; originally announced January 2023.

Comments: arXiv admin note: text overlap with arXiv:2211.04026 by other authors

arXiv:2212.12767 [pdf, other]

Streaming Traffic Flow Prediction Based on Continuous Reinforcement Learning

Authors: Yanan Xiao, Minyu Liu, Zichen Zhang, Lu Jiang, Minghao Yin, Jianan Wang

Abstract: Traffic flow prediction is an important part of smart transportation. The goal is to predict future traffic conditions based on historical data recorded by sensors and the traffic network. As the city continues to build, parts of the transportation network will be added or modified. How to accurately predict expanding and evolving long-term streaming networks is of great significance. To this end, we propose a new simulation-based criterion that considers teaching autonomous agents to mimic sensor patterns, planning their next visit based on the sensor's profile (e.g., traffic, speed, occupancy). The data recorded by the sensor is most accurate when the agent can perfectly simulate the sensor's activity pattern. We propose to formulate the problem as a continuous reinforcement learning task, where the agent is the next flow value predictor, the action is the next time-series flow value in the sensor, and the environment state is a dynamically fused representation of the sensor and transportation network. Actions taken by the agent change the environment, which in turn forces the agent's mode to update, while the agent further explores changes in the dynamic traffic network, which helps the agent predict its next visit more accurately. Therefore, we develop a strategy in which sensors and traffic networks update each other and incorporate temporal context to quantify state representations evolving over time.

Submitted 24 December, 2022; originally announced December 2022.

arXiv:2210.12983 [pdf, other]

doi 10.1109/TSP.2023.3278944

Poisson multi-Bernoulli mixture filter with general target-generated measurements and arbitrary clutter

Authors: Ángel F. García-Fernández, Yuxuan Xia, Lennart Svensson

Abstract: This paper shows that the Poisson multi-Bernoulli mixture (PMBM) density is a multi-target conjugate prior for general target-generated measurement distributions and arbitrary clutter distributions. That is, for this multi-target measurement model and the standard multi-target dynamic model with Poisson birth model, the predicted and filtering densities are PMBMs. We derive the corresponding PMBM filtering recursion. Based on this result, we implement a PMBM filter for point-target measurement models and negative binomial clutter density in which data association hypotheses with high weights are chosen via Gibbs sampling. We also implement an extended target PMBM filter with clutter that is the union of Poisson-distributed clutter and a finite number of independent clutter sources. Simulation results show the benefits of the proposed filters to deal with non-standard clutter.

Submitted 24 May, 2023; v1 submitted 24 October, 2022; originally announced October 2022.

Comments: Matlab code available at https://github.com/Agarciafernandez/MTT and https://github.com/yuhsuansia/Extented-target-PMBM-filter-independent-clutter-sources

Journal ref: Á. F. García-Fernández, Y. Xia, L. Svensson, "Poisson multi-Bernoulli mixture filter with general target-generated measurements and arbitrary clutter", IEEE Transactions on Signal Processing, vol. 71, 2023

arXiv:2210.04714 [pdf, other]

Uncertainty Quantification with Pre-trained Language Models: A Large-Scale Empirical Analysis

Authors: Yuxin Xiao, Paul Pu Liang, Umang Bhatt, Willie Neiswanger, Ruslan Salakhutdinov, Louis-Philippe Morency

Abstract: Pre-trained language models (PLMs) have gained increasing popularity due to their compelling prediction performance in diverse natural language processing (NLP) tasks. When formulating a PLM-based prediction pipeline for NLP tasks, it is also crucial for the pipeline to minimize the calibration error, especially in safety-critical applications. That is, the pipeline should reliably indicate when we can trust its predictions. In particular, there are various considerations behind the pipeline: (1) the choice and (2) the size of PLM, (3) the choice of uncertainty quantifier, (4) the choice of fine-tuning loss, and many more. Although prior work has looked into some of these considerations, it usually draws conclusions based on a limited scope of empirical studies. A holistic analysis of how to compose a well-calibrated PLM-based prediction pipeline is still lacking. To fill this void, we compare a wide range of popular options for each consideration based on three prevalent NLP classification tasks and the setting of domain shift. In response, we recommend the following: (1) use ELECTRA for PLM encoding, (2) use larger PLMs if possible, (3) use Temp Scaling as the uncertainty quantifier, and (4) use Focal Loss for fine-tuning.

Submitted 14 October, 2022; v1 submitted 10 October, 2022; originally announced October 2022.

Comments: Accepted by EMNLP 2022 (Findings)
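
The abstract above recommends temperature scaling as the uncertainty quantifier. As a reading aid, here is a minimal sketch of that general technique, not the authors' pipeline: a single scalar temperature is chosen on held-out data to minimise the negative log-likelihood of the softmax of `logits / temperature`. The function names, the grid search (rather than gradient-based fitting), and the toy data are all assumptions made for illustration.

```python
import numpy as np

def softmax(logits):
    # Subtract the row-wise max for numerical stability.
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def nll(logits, labels, temperature):
    # Mean negative log-likelihood of the true labels after scaling.
    probs = softmax(logits / temperature)
    picked = probs[np.arange(len(labels)), labels]
    return -np.mean(np.log(picked + 1e-12))

def fit_temperature(logits, labels, grid=None):
    # Hypothetical helper: grid search over candidate temperatures.
    if grid is None:
        grid = np.linspace(0.05, 10.0, 400)
    losses = [nll(logits, labels, t) for t in grid]
    return float(grid[int(np.argmin(losses))])

# Toy held-out set: the model is confident in class 0 every time,
# but is right only 7 times out of 10, so it is overconfident.
logits = np.tile([4.0, 0.0], (10, 1))
labels = np.array([0] * 7 + [1] * 3)
T = fit_temperature(logits, labels)
print(T > 1.0)  # a temperature above 1 softens overconfident predictions
```

Because dividing all logits by the same positive scalar does not change their ranking, this adjustment leaves accuracy untouched and changes only the reported confidence.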

arXiv:2209.13281 [pdf, other]

Robust Fused Lasso Penalized Huber Regression with Nonasymptotic Property and Implementation Studies

Authors: Xin Xin, Boyi Xie, Yunhai Xiao

Abstract: For some real-world data, such as genetic data, adjacent genes may have similar functions. Ensuring smoothness between adjacent genes is therefore highly necessary, and in this case the standard lasso penalty is no longer appropriate. On the other hand, in high-dimensional statistics, some datasets are easily contaminated by outliers or contain variables with heavy-tailed distributions, which makes many conventional methods inadequate. To address both issues, in this paper, we propose an adaptive Huber regression for robust estimation and inference, in which the fused lasso penalty is used to encourage the sparsity of the coefficients as well as the sparsity of their differences, i.e., local constancy of the coefficient profile. Theoretically, we establish its nonasymptotic estimation error bounds under the $\ell_2$-norm in the high-dimensional setting. The proposed estimation method is formulated as a convex, nonsmooth and separable optimization problem; hence, the alternating direction method of multipliers can be employed. Finally, we perform simulation studies and real cancer data studies, which illustrate that the proposed estimation method is more robust and predictive.

Submitted 27 September, 2022; v1 submitted 27 September, 2022; originally announced September 2022.
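
The abstract above combines a Huber loss with a fused lasso penalty. The sketch below only evaluates such an objective for a given coefficient vector; it is an illustration under assumptions, not the authors' estimator or their ADMM solver. The averaging of the loss, the symbol names `tau`, `lam1`, `lam2`, and the function names are hypothetical choices, since the abstract does not state the exact scaling.

```python
import numpy as np

def huber(residual, tau):
    # Quadratic near zero, linear in the tails (robust to outliers).
    a = np.abs(residual)
    return np.where(a <= tau, 0.5 * residual**2, tau * a - 0.5 * tau**2)

def fused_huber_objective(beta, X, y, tau, lam1, lam2):
    # Assumed form: mean Huber loss
    #   + lam1 * sum |beta_j|               (sparsity of coefficients)
    #   + lam2 * sum |beta_j - beta_{j-1}|  (sparsity of differences)
    residual = y - X @ beta
    loss = huber(residual, tau).mean()
    sparsity = lam1 * np.abs(beta).sum()
    fusion = lam2 * np.abs(np.diff(beta)).sum()
    return loss + sparsity + fusion

X = np.eye(3)
y = np.zeros(3)
beta = np.array([1.0, 1.0, 0.0])
value = fused_huber_objective(beta, X, y, tau=1.0, lam1=0.1, lam2=0.2)
print(round(float(value), 4))
```

The fusion term charges only for the single jump between the second and third coefficients, which is how the penalty favours locally constant coefficient profiles.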

arXiv:2208.13553 [pdf]

doi 10.1186/s41512-023-00147-z

Methodological concerns about 'concordance-statistic for benefit' as a measure of discrimination in treatment benefit prediction

Authors: Yuan Xia, Paul Gustafson, Mohsen Sadatsafavi

Abstract: Prediction algorithms that quantify the expected benefit of a given treatment conditional on patient characteristics can critically inform medical decisions. Quantifying the performance of treatment benefit prediction algorithms is an active area of research. A recently proposed metric, the concordance statistic for benefit (cfb), evaluates the discriminative ability of a treatment benefit predictor by directly extending the concept of the concordance statistic from a risk model with a binary outcome to a model for treatment benefit. In this work, we scrutinize cfb on multiple fronts. Through numerical examples and theoretical developments, we show that cfb is not a proper scoring rule. We also show that it is sensitive to the unestimable correlation between counterfactual outcomes and to the definition of matched pairs. We argue that measures of statistical dispersion applied to predicted benefits do not suffer from these issues and can be an alternative metric for the discriminatory performance of treatment benefit predictors.

Submitted 15 May, 2023; v1 submitted 29 August, 2022; originally announced August 2022.

Comments: 12 pages, 6 figures

arXiv:2208.08754 [pdf, other]

A Decorrelating and Debiasing Approach to Simultaneous Inference for High-Dimensional Confounded Models

Authors: Yinrui Sun, Li Ma, Yin Xia

Abstract: Motivated by the simultaneous association analysis with the presence of latent confounders, this paper studies the large-scale hypothesis testing problem for the high-dimensional confounded linear models with both non-asymptotic and asymptotic false discovery control. Such a model covers a wide range of practical settings where both the response and the predictors may be confounded. In the presence of the high-dimensional predictors and the unobservable confounders, the simultaneous inference with provable guarantees becomes highly challenging, and the unknown strong dependence among the confounded covariates makes the challenge even more pronounced. This paper first introduces a decorrelating procedure that shrinks the confounding effect and weakens the correlations among the predictors, then performs debiasing under the decorrelated design based on some biased initial estimator. Following that, an asymptotic normality result for the debiased estimator is established and standardized test statistics are then constructed. Furthermore, a simultaneous inference procedure is proposed to identify significant associations, and both the finite-sample and asymptotic false discovery bounds are provided. The non-asymptotic result is general and model-free, and is of independent interest. We also prove that, under a minimal signal strength condition, all associations can be successfully detected with probability tending to one.
Simulation and real data studies are carried out to evaluate the performance of the proposed approach and compare it with other competing methods.

Submitted 22 August, 2023; v1 submitted 18 August, 2022; originally announced August 2022.

arXiv:2207.06156 [pdf, other]

A comparison between PMBM Bayesian track initiation and labelled RFS adaptive birth

Authors: Ángel F. García-Fernández, Yuxuan Xia, Lennart Svensson

Abstract: This paper provides a comparative analysis between the adaptive birth model used in the labelled random finite set literature and the track initiation in the Poisson multi-Bernoulli mixture (PMBM) filter, with point-target models. The PMBM track initiation is obtained via Bayes' rule applied on the predicted PMBM density, and creates one Bernoulli component for each received measurement, representing that this measurement may be clutter or a detection from a new target. Adaptive birth mimics this procedure by creating a Bernoulli component for each measurement using a different rule to determine the probability of existence and a user-defined single-target density. This paper first provides an analysis of the differences that arise in track initiation based on isolated measurements. Then, it shows that adaptive birth underestimates the number of objects present in the surveillance area under common modelling assumptions. Finally, we provide numerical simulations to further illustrate the differences.

Submitted 13 July, 2022; originally announced July 2022.

Comments: Matlab implementations of PMBM filters can be found at https://github.com/Agarciafernandez/MTT and https://github.com/yuhsuansia

Journal ref: Proceedings of the 25th International Conference on Information Fusion, 2022

arXiv:2203.11461 [pdf, other]

Locally Adaptive Algorithms for Multiple Testing with Network Structure, with Application to Genome-Wide Association Studies

Authors: Ziyi Liang, T. Tony Cai, Wenguang Sun, Yin Xia

Abstract: Linkage analysis has provided valuable insights to the GWAS studies, particularly in revealing that SNPs in linkage disequilibrium (LD) can jointly influence disease phenotypes. However, the potential of LD network data has often been overlooked or underutilized in the literature. In this paper, we propose a locally adaptive structure learning algorithm (LASLA) that provides a principled and generic framework for incorporating network data or multiple samples of auxiliary data from related source domains, possibly in different dimensions/structures and from diverse populations. LASLA employs a $p$-value weighting approach, utilizing structural insights to assign data-driven weights to individual test points. Theoretical analysis shows that LASLA can asymptotically control FDR with independent or weakly dependent primary statistics, and achieve higher power when the network data is informative. The efficiency gain of LASLA is illustrated through various synthetic experiments and an application to T2D-associated SNP identification.

Submitted 16 August, 2023; v1 submitted 22 March, 2022; originally announced March 2022.

Comments: 33 pages, 7 figures
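
The abstract above describes a $p$-value weighting approach. The sketch below shows the generic weighted Benjamini-Hochberg idea only: each $p$-value is divided by a positive weight and the usual step-up rule is applied. It is not LASLA itself; how LASLA derives its data-driven weights from network structure is not specified in the abstract, so the weights here are supplied by hand, and the function name and example numbers are hypothetical.

```python
import numpy as np

def weighted_bh(pvals, weights, alpha=0.05):
    # Generic weighted Benjamini-Hochberg step-up procedure.
    p = np.asarray(pvals, dtype=float)
    w = np.asarray(weights, dtype=float)
    w = w * len(w) / w.sum()            # normalise weights to mean 1
    q = np.minimum(p / w, 1.0)          # weighted p-values
    m = len(q)
    order = np.argsort(q)
    thresholds = alpha * np.arange(1, m + 1) / m
    below = q[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])  # largest index passing the rule
        reject[order[: k + 1]] = True
    return reject

pvals = [0.001, 0.02, 0.04, 0.5]
print(weighted_bh(pvals, [1.0, 1.0, 1.0, 1.0]))  # unweighted baseline
print(weighted_bh(pvals, [0.5, 0.5, 2.5, 0.5]))  # up-weight the third test
```

Up-weighting a test that side information marks as promising lowers its effective $p$-value, which is the mechanism by which informative auxiliary data can raise power at the same nominal level.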

arXiv:2201.10043 [pdf, other]

NAPA: Neighborhood-Assisted and Posterior-Adjusted Two-sample Inference

Authors: Li Ma, Yin Xia, Lexin Li

Abstract: Two-sample multiple testing problems of sparse spatial data arise frequently in a variety of scientific applications. In this article, we develop a novel neighborhood-assisted and posterior-adjusted (NAPA) approach to incorporate both the spatial smoothness and sparsity type side information to improve the power of the test while controlling the false discovery of multiple testing. We translate the side information into a set of weights to adjust the $p$-values, where the spatial pattern is encoded by the ordering of the locations, and the sparsity structure is encoded by a set of auxiliary covariates. We establish the theoretical properties of the proposed test, including the guaranteed power improvement over some state-of-the-art alternative tests, and the asymptotic false discovery control. We demonstrate the efficacy of the test through intensive simulations and two neuroimaging applications.

Submitted 31 July, 2023; v1 submitted 24 January, 2022; originally announced January 2022.

arXiv:2112.04243 [pdf]

Hybrid Data-driven Framework for Shale Gas Production Performance Analysis via Game Theory, Machine Learning and Optimization Approaches

Authors: ** Meng, Yujie Zhou, Tianrui Ye, Yitian Xiao

Abstract: A comprehensive and precise analysis of shale gas production performance is crucial for evaluating resource potential, designing field development plans, and making investment decisions. However, quantitative analysis can be challenging because production performance is dominated by a complex interaction among a series of geological and engineering factors. In this study, we propose a hybrid data-driven procedure for analyzing shale gas production performance, which consists of a complete workflow for dominant factor analysis, production forecast, and development optimization. More specifically, game theory and machine learning models are coupled to determine the dominating geological and engineering factors. The Shapley value with definite physical meanings is employed to quantitatively measure the effects of individual factors. A multi-model-fused stacked model is trained for production forecast, on the basis of which derivative-free optimization algorithms are introduced to optimize the development plan. The complete workflow is validated with actual production data collected from the Fuling shale gas field, Sichuan Basin, China. The validation results show that the proposed procedure can draw rigorous conclusions with quantified evidence and thereby provide specific and reliable suggestions for development plan optimization. Compared with traditional and experience-based approaches, the hybrid data-driven procedure is advanced in terms of both efficiency and accuracy.

Submitted 7 June, 2022; v1 submitted 8 December, 2021; originally announced December 2021.

Comments: 37 pages, 15 figures, 6 tables

arXiv:2111.15367 [pdf, other]

A Review on Graph Neural Network Methods in Financial Applications

Authors: Jianian Wang, Sheng Zhang, Yanghua Xiao, Rui Song

Abstract: With multiple components and relations, financial data are often presented as graph data, since it could represent both the individual features and the complicated relations. Due to the complexity and volatility of the financial market, the graph constructed on the financial data is often heterogeneous or time-varying, which imposes challenges on modeling technology. Among the graph modeling technologies, graph neural network (GNN) models are able to handle the complex graph structure and achieve great performance and thus could be used to solve financial tasks. In this work, we provide a comprehensive review of GNN models in recent financial context. We first categorize the commonly-used financial graphs and summarize the feature processing step for each node. Then we summarize the GNN methodology for each graph type, application in each area, and propose some potential research areas.

Submitted 26 April, 2022; v1 submitted 26 November, 2021; originally announced November 2021.

arXiv:2111.10766 [pdf, other]

Semismooth Newton Augmented Lagrangian Algorithm for Adaptive Lasso Penalized Least Squares in Semiparametric Regression

Authors: Meixia Yang, Yunhai Xiao, Peili Li, Hanbing Zhu

Abstract: This paper is concerned with a partially linear semiparametric regression model containing an unknown regression coefficient, an unknown nonparametric function, and an unobservable Gaussian distributed random error. We focus on the case of simultaneous variable selection and estimation with a divergent number of covariates under the assumption that the regression coefficient is sparse. We consider the applications of the least squares to semiparametric regression and particularly present an adaptive lasso penalized least squares (PLS) method to select the regression coefficient. We note that there are many algorithms for PLS in various applications, but they seem to be rarely used in semiparametric regression. This paper focuses on using a semismooth Newton augmented Lagrangian (SSNAL) algorithm to solve the dual of PLS which is the sum of a smooth strongly convex function and an indicator function. At each iteration, there must be a strongly semismooth nonlinear system, which can be solved by semismooth Newton by making full use of the penalized term. We show that the algorithm offers a significant computational advantage, and the semismooth Newton method admits fast local convergence rate. Numerical experiments on simulated and real data demonstrate that the PLS is effective and the SSNAL is progressive.

Submitted 8 February, 2022; v1 submitted 21 November, 2021; originally announced November 2021.

arXiv:2111.03943 [pdf, ps, other]

A Probit Tensor Factorization Model For Relational Learning

Authors: Ye Liu, Rui Song, Wenbin Lu, Yanghua Xiao

Abstract: With the proliferation of knowledge graphs, modeling data with complex multirelational structure has gained increasing attention in the area of statistical relational learning. One of the most important goals of statistical relational learning is link prediction, i.e., predicting whether certain relations exist in the knowledge graph. A large number of models and algorithms have been proposed to perform link prediction, among which the tensor factorization method has proven to achieve state-of-the-art performance in terms of computation efficiency and prediction accuracy. However, a common drawback of the existing tensor factorization models is that the missing relations and non-existing relations are treated in the same way, which results in a loss of information. To address this issue, we propose a binary tensor factorization model with probit link, which not only inherits the computation efficiency from the classic tensor factorization model but also accounts for the binary nature of relational data. Our proposed probit tensor factorization (PTF) model shows advantages in both the prediction accuracy and interpretability.

Submitted 8 November, 2021; v1 submitted 6 November, 2021; originally announced November 2021.

Comments: 30 pages

arXiv:2106.10121 [pdf, other]

ScoreGrad: Multivariate Probabilistic Time Series Forecasting with Continuous Energy-based Generative Models

Authors: Ti** Yan, Hongwei Zhang, Tong Zhou, Yufeng Zhan, Yuanqing Xia

Abstract: Multivariate time series prediction has attracted a lot of attention because of its wide applications such as intelligent transportation and AIOps. Generative models have achieved impressive results in time series modeling because they can model the data distribution and take noise into consideration. However, many existing works cannot be widely used because of constraints on the functional form of the generative models or sensitivity to hyperparameters. In this paper, we propose ScoreGrad, a multivariate probabilistic time series forecasting framework based on continuous energy-based generative models. ScoreGrad is composed of a time series feature extraction module and a conditional stochastic differential equation based score matching module. The prediction can be achieved by iteratively solving the reverse-time SDE. To the best of our knowledge, ScoreGrad is the first continuous energy-based generative model used for time series forecasting. Furthermore, ScoreGrad achieves state-of-the-art results on six real-world datasets. The impact of hyperparameters and sampler types on the performance is also explored. Code is available at https://github.com/yanti**/ScoreGradPred.

Submitted 18 June, 2021; originally announced June 2021.

Comments: 12 pages, 10 figures

arXiv:2106.09179 [pdf, other]

Amortized Auto-Tuning: Cost-Efficient Bayesian Transfer Optimization for Hyperparameter Recommendation

Authors: Yuxin Xiao, Eric P. Xing, Willie Neiswanger

Abstract: With the surge in the number of hyperparameters and training times of modern machine learning models, hyperparameter tuning is becoming increasingly expensive. However, after assessing 40 tuning methods systematically, we find that each faces certain limitations. In particular, methods that speed up tuning via knowledge transfer typically require the final performance of hyperparameters and do not focus on low-fidelity information. As we demonstrate empirically, this common practice is suboptimal and can incur an unnecessary use of resources. It is more cost-efficient to instead leverage low-fidelity tuning observations to measure inter-task similarity and transfer knowledge from existing to new tasks accordingly. However, performing multi-fidelity tuning comes with its own challenges in the transfer setting: the noise in additional observations and the need for performance forecasting. Therefore, we propose and conduct a thorough analysis of a multi-task multi-fidelity Bayesian optimization framework, which leads to the best instantiation--amortized auto-tuning (AT2). We further present an offline-computed 27-task hyperparameter recommendation (HyperRec) database to serve the community. Extensive experiments on HyperRec and other real-world databases illustrate the effectiveness of our AT2 method.

Submitted 7 April, 2022; v1 submitted 16 June, 2021; originally announced June 2021.

arXiv:2105.00393 [pdf, other]

Directional FDR Control for Sub-Gaussian Sparse GLMs

Authors: Chang Cui, **zhu Jia, Yijun Xiao, Huiming Zhang

Abstract: High-dimensional sparse generalized linear models (GLMs) have emerged in the setting where the number of samples and the dimension of variables are large, and the dimension of variables may even grow faster than the number of samples. False discovery rate (FDR) control aims to identify a small number of statistically significant nonzero results after obtaining the sparse penalized estimation of GLMs. Using the CLIME method for precision matrix estimation, we construct the debiased-Lasso estimator and prove its asymptotic normality by minimax-rate oracle inequalities for sparse GLMs. In practice, it is often necessary to accurately judge each regression coefficient's positivity or negativity, which determines whether the predictor variable is positively or negatively related to the response variable conditionally on the remaining variables. Using the debiased estimator, we establish multiple testing procedures. Under mild conditions, we show that the proposed debiased statistics can asymptotically control the directional (sign) FDR and directional false discovery variables at a pre-specified significance level. Moreover, it can be shown that our multiple testing procedure can approximately achieve a statistical power of 1. We also extend our methods to two-sample problems and propose two-sample test statistics. Under suitable conditions, we can asymptotically achieve directional FDR control and directional FDV control at the specified significance level for two-sample problems. Numerical simulations verify the FDR control of our proposed testing procedures, which sometimes outperform the classical knockoff method.

Submitted 2 May, 2021; originally announced May 2021.

Comments: 37 pages

arXiv:2102.03169 [pdf, other]

doi 10.1016/j.jcp.2022.111008

Bayesian multiscale deep generative model for the solution of high-dimensional inverse problems

Authors: Yingzhi Xia, Nicholas Zabaras

Abstract: Estimation of spatially-varying parameters for computationally expensive forward models governed by partial differential equations is addressed. A novel multiscale Bayesian inference approach is introduced based on deep probabilistic generative models. Such generative models provide a flexible representation by inferring on each scale a low-dimensional latent encoding while allowing hierarchical parameter generation from coarse- to fine-scales. Combining the multiscale generative model with Markov Chain Monte Carlo (MCMC), inference across scales is achieved enabling us to efficiently obtain posterior parameter samples at various scales. The estimation of coarse-scale parameters using a low-dimensional latent embedding captures global and notable parameter features using an inexpensive but inaccurate solver. MCMC sampling of the fine-scale parameters is enabled by utilizing the posterior information in the immediate coarser-scale. In this way, the global features are identified in the coarse-scale with inference of low-dimensional variables and inexpensive forward computation, and the local features are refined and corrected in the fine-scale. The developed method is demonstrated with two types of permeability estimation for flow in heterogeneous media. One is a Gaussian random field (GRF) with uncertain length scales, and the other is channelized permeability with the two regions defined by different GRFs. The obtained results indicate that the method allows high-dimensional parameter estimation while exhibiting stability, efficiency and accuracy.

Submitted 10 February, 2021; v1 submitted 4 February, 2021; originally announced February 2021.

arXiv:2011.04464 [pdf, other]

doi 10.1109/TSP.2021.3072006

A Poisson multi-Bernoulli mixture filter for coexisting point and extended targets

Authors: Ángel F. García-Fernández, Jason L. Williams, Lennart Svensson, Yuxuan Xia

Abstract: This paper proposes a Poisson multi-Bernoulli mixture (PMBM) filter for coexisting point and extended targets, i.e., for scenarios where there may be simultaneous point and extended targets. The PMBM filter provides a recursion to compute the multi-target filtering posterior based on probabilistic information on data associations, and single-target predictions and updates. In this paper, we first derive the PMBM filter update for a generalised measurement model, which can include measurements originated from point and extended targets. Second, we propose a single-target space that accommodates both point and extended targets and derive the filtering recursion that propagates Gaussian densities for point targets and gamma Gaussian inverse Wishart densities for extended targets. As a computationally efficient approximation of the PMBM filter, we also develop a Poisson multi-Bernoulli (PMB) filter for coexisting point and extended targets. The resulting filters are analysed via numerical simulations.

Submitted 18 May, 2021; v1 submitted 9 November, 2020; originally announced November 2020.

Comments: Matlab files can be found at https://github.com/Agarciafernandez/Coexisting-point-extended-target-PMBM-filter and https://github.com/yuhsuansia/Coexisting-point-extended-target-PMBM-filter. A relevant multi-object tracking course can be found at https://www.youtube.com/channel/UCa2-fpj6AV8T6JK1uTRuFpw

Journal ref: in IEEE Transactions on Signal Processing, vol. 69, pp. 2600-2610, 2021

arXiv:2010.10698 [pdf, other]

Batch Sequential Adaptive Designs for Global Optimization

Authors: Jianhui Ning, Yao Xiao, Zikang Xiong

Abstract: Compared with fixed-run designs, sequential adaptive designs (SAD) are thought to be more efficient and effective. Efficient global optimization (EGO) is one of the most popular SAD methods for expensive black-box optimization problems. A well-recognized weakness of the original EGO in complex computer experiments is that it is serial, and hence modern parallel computing techniques cannot be utilized to speed up the running of simulator experiments. For multi-point EGO methods, heavy computation and point clustering are the obstacles. In this work, a novel batch SAD method, named "accelerated EGO", is put forward by using a refined sampling/importance resampling (SIR) method to search for points with large expected improvement (EI) values. The computational burden of the new method is much lighter, and point clustering is also avoided. The efficiency of the proposed SAD is validated on nine classic test functions with dimensions from 2 to 12. The empirical results show that the proposed algorithm can indeed parallelize the original EGO, and gains much improvement compared with the other parallel EGO algorithm, especially in the high-dimensional case. Additionally, we apply the new method to the hyper-parameter tuning of Support Vector Machines (SVM). Accelerated EGO obtains cross-validation accuracy comparable with other methods, and the CPU time can be reduced considerably due to the parallel computation and sampling method.

Submitted 20 October, 2020; originally announced October 2020.

Comments: 20 pages, 4 figures, 8 tables

MSC Class: G.3
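For readers unfamiliar with the expected improvement (EI) criterion mentioned in the abstract above, here is a minimal sketch of the standard closed-form EI for minimization under a Gaussian predictive distribution. It is the textbook formula, not the paper's accelerated EGO or its SIR sampler; the function and argument names are illustrative, and the surrogate model that would supply `mu` and `sigma` is assumed to exist elsewhere.

```python
import math


def expected_improvement(mu, sigma, f_best):
    """Closed-form EI at one candidate point, for minimization.

    mu, sigma: predictive mean and standard deviation of the surrogate
               at the candidate point (assumed Gaussian).
    f_best:    best (smallest) objective value observed so far.
    """
    if sigma <= 0.0:
        # No predictive uncertainty: improvement is deterministic.
        return max(f_best - mu, 0.0)
    z = (f_best - mu) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return (f_best - mu) * cdf + sigma * pdf


ei_at_tie = expected_improvement(mu=0.0, sigma=1.0, f_best=0.0)
```

When the predicted mean equals the current best, EI reduces to sigma times the standard normal density at zero, so only the uncertainty term contributes.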

arXiv:2009.11469 [pdf, other]

Revisiting Graph Convolutional Network on Semi-Supervised Node Classification from an Optimization Perspective

Authors: Hongwei Zhang, Ti** Yan, Zenjun Xie, Yuanqing Xia, Yuan Zhang

Abstract: Graph convolutional networks (GCNs) have achieved promising performance on various graph-based tasks. However, they suffer from over-smoothing when stacking more layers. In this paper, we present a quantitative study on this observation and develop novel insights towards deeper GCNs. First, we interpret the current graph convolutional operations from an optimization perspective and argue that over-smoothing is mainly caused by the naive first-order approximation of the solution to the optimization problem. Subsequently, we introduce two metrics to measure over-smoothing on node-level tasks. Specifically, we calculate the fraction of the pairwise distance between connected and disconnected nodes to the overall distance, respectively. Based on our theoretical and empirical analysis, we establish a universal theoretical framework of GCN from an optimization perspective and derive a novel convolutional kernel named GCN+, which has fewer parameters while inherently relieving over-smoothing. Extensive experiments on real-world datasets demonstrate the superior performance of GCN+ over state-of-the-art baseline methods on node classification tasks.

Submitted 24 September, 2020; v1 submitted 23 September, 2020; originally announced September 2020.

arXiv:2009.00538 [pdf, other]

Stochastic Graph Recurrent Neural Network

Authors: Ti** Yan, Hongwei Zhang, Zirui Li, Yuanqing Xia

Abstract: Representation learning over graph-structured data has been widely studied due to its wide application prospects. However, previous methods mainly focus on static graphs, while many real-world graphs evolve over time. Modeling such evolution is important for predicting properties of unseen networks. To resolve this challenge, we propose SGRNN, a novel neural architecture that applies stochastic latent variables to simultaneously capture the evolution in node attributes and topology. Specifically, deterministic states are separated from stochastic states in the iterative process to suppress mutual interference. With semi-implicit variational inference integrated into SGRNN, a non-Gaussian variational distribution is proposed to help further improve the performance. In addition, to alleviate the KL-vanishing problem in SGRNN, a simple and interpretable structure is proposed based on the lower bound of the KL-divergence. Extensive experiments on real-world datasets demonstrate the effectiveness of the proposed model. Code is available at https://github.com/StochasticGRNN/SGRNN.

Submitted 1 September, 2020; originally announced September 2020.

arXiv:2009.00399 [pdf, other]

Accounting for correlated horizontal pleiotropy in two-sample Mendelian randomization using correlated instrumental variants

Authors: Qing Cheng, Baoluo Sun, Yingcun Xia, ** Liu

Abstract: Mendelian randomization (MR) is a powerful approach to examine the causal relationships between health risk factors and outcomes from observational studies. Due to the proliferation of genome-wide association studies (GWASs) and abundant fully accessible GWAS summary statistics, a variety of two-sample MR methods for summary data have been developed to either detect or account for horizontal pleiotropy, primarily based on the assumption that the effects of variants on exposure (γ) and horizontal pleiotropy (α) are independent. This assumption is too strict and can be easily violated because of correlated horizontal pleiotropy (CHP). To account for this CHP, we propose a Bayesian approach, MR-Corr2, that uses the orthogonal projection to reparameterize the bivariate normal distribution for γ and α, and a spike-slab prior to mitigate the impact of CHP. We develop an efficient algorithm with parallel Gibbs sampling. To demonstrate the advantages of MR-Corr2 over existing methods, we conducted comprehensive simulation studies to compare both type-I error control and point estimates in various scenarios. By applying MR-Corr2 to study the relationships between pairs in two sets of complex traits, we did not identify the contradictory causal relationship between HDL-c and CAD. Moreover, the results provide a new perspective on the causal network among complex traits. The developed R package and code to reproduce all the results are available at https://github.com/QingCheng0218/MR.Corr2.

Submitted 1 September, 2020; originally announced September 2020.

arXiv:2009.00236 [pdf, other]

A Survey of Deep Active Learning

Authors: Pengzhen Ren, Yun Xiao, Xiaojun Chang, Po-Yao Huang, Zhihui Li, Brij B. Gupta, Xiaojiang Chen, Xin Wang

Abstract: Active learning (AL) attempts to maximize the performance gain of the model by marking the fewest samples. Deep learning (DL) is greedy for data and requires a large amount of data supply to optimize massive parameters, so that the model learns how to extract high-quality features. In recent years, due to the rapid development of internet technology, we are in an era of information torrents and we have massive amounts of data. In this way, DL has aroused strong interest among researchers and has developed rapidly. Compared with DL, researchers have relatively low interest in AL. This is mainly because, before the rise of DL, traditional machine learning required relatively few labeled samples. Therefore, it was difficult for early AL to reflect the value it deserves. Although DL has made breakthroughs in various fields, most of this success is due to the public availability of a large number of existing annotated datasets. However, the acquisition of a large number of high-quality annotated datasets consumes a lot of manpower, which is not feasible in some fields that require high expertise, especially speech recognition, information extraction, medical imaging, etc. Therefore, AL has gradually received due attention. A natural question is whether AL can be used to reduce the cost of sample annotation while retaining the powerful learning capabilities of DL. Therefore, deep active learning (DAL) has emerged. Although the related research has been quite abundant, a comprehensive survey of DAL is lacking. This article fills this gap: we provide a formal classification method for the existing work, and a comprehensive and systematic overview. In addition, we analyze and summarize the development of DAL from the perspective of application. Finally, we discuss the confusion and problems in DAL, and give some possible development directions for DAL.

Submitted 5 December, 2021; v1 submitted 30 August, 2020; originally announced September 2020.

arXiv:2008.07298 [pdf, other]

WAFFLE: Watermarking in Federated Learning

Authors: Buse Gul Atli, Yuxi Xia, Samuel Marchal, N. Asokan

Abstract: Federated learning is a distributed learning technique where machine learning models are trained on client devices in which the local training data resides. The training is coordinated via a central server which is, typically, controlled by the intended owner of the resulting model. By avoiding the need to transport the training data to the central server, federated learning improves privacy and efficiency. But it raises the risk of model theft by clients because the resulting model is available on every client device. Even if the application software used for local training may attempt to prevent direct access to the model, a malicious client may bypass any such restrictions by reverse engineering the application software. Watermarking is a well-known deterrence method against model theft by providing the means for model owners to demonstrate ownership of their models. Several recent deep neural network (DNN) watermarking techniques use backdooring: training the models with additional mislabeled data. Backdooring requires full access to the training data and control of the training process. This is feasible when a single party trains the model in a centralized manner, but not in a federated learning setting where the training process and training data are distributed among several client devices. In this paper, we present WAFFLE, the first approach to watermark DNN models trained using federated learning. It introduces a retraining step at the server after each aggregation of local models into the global model. We show that WAFFLE efficiently embeds a resilient watermark into models incurring only negligible degradation in test accuracy (-0.17%), and does not require access to training data. We also introduce a novel technique to generate the backdoor used as a watermark. It outperforms prior techniques, imposing no communication, and low computational (+3.2%) overhead.

Submitted 22 July, 2021; v1 submitted 17 August, 2020; originally announced August 2020.

Comments: Will appear in the proceedings of SRDS 2021; 14 pages, 11 figures, 10 tables

arXiv:2007.12349 [pdf, other]

Adversarial Mixture Of Experts with Category Hierarchy Soft Constraint

Authors: Zhuojian Xiao, Yunjiang Jiang, Guoyu Tang, Lin Liu, Sulong Xu, Yun Xiao, Weipeng Yan

Abstract: Product search is the most common way for people to satisfy their shopping needs on e-commerce websites. Products are typically annotated with one of several broad categorical tags, such as "Clothing" or "Electronics", as well as finer-grained categories like "Refrigerator" or "TV", both under "Electronics". These tags are used to construct a hierarchy of query categories. Distributions of features such as price and brand popularity vary wildly across query categories. In addition, feature importance for the purpose of CTR/CVR predictions differs from one category to another. In this work, we leverage the Mixture of Expert (MoE) framework to learn a ranking model that specializes for each query category. In particular, our gate network relies solely on the category ids extracted from the user query. While classical MoEs pick expert towers spontaneously for each input example, we explore two techniques to establish more explicit and transparent connections between the experts and query categories. To help differentiate experts on their domain specialties, we introduce a form of adversarial regularization among the expert outputs, forcing them to disagree with one another. As a result, they tend to approach each prediction problem from different angles, rather than copying one another. This is validated by a much stronger clustering effect of the gate output vectors under different categories. In addition, soft gating constraints based on the categorical hierarchy are imposed to help similar products choose similar gate values and make them more likely to share similar experts. This allows aggregation of training data among smaller sibling categories to overcome data scarcity.

Submitted 2 March, 2021; v1 submitted 24 July, 2020; originally announced July 2020.

arXiv:2007.04649 [pdf, other]

Learning to Reweight with Deep Interactions

Authors: Yang Fan, Yingce Xia, Lijun Wu, Shufang Xie, Weiqing Liu, Jiang Bian, Tao Qin, Xiang-Yang Li

Abstract: Recently, the concept of teaching has been introduced into machine learning, in which a teacher model is used to guide the training of a student model (which will be used in real tasks) through data selection, loss function design, etc. Learning to reweight, which is a specific kind of teaching that reweights training data using a teacher model, receives much attention due to its simplicity and effectiveness. In existing learning to reweight works, the teacher model only utilizes shallow/surface information such as training iteration number and loss/accuracy of the student model from training/validation sets, but ignores the internal states of the student model, which limits the potential of learning to reweight. In this work, we propose an improved data reweighting algorithm, in which the student model provides its internal states to the teacher model, and the teacher model returns adaptive weights of training samples to enhance the training of the student model. The teacher model is jointly trained with the student model using meta gradients propagated from a validation set. Experiments on image classification with clean/noisy labels and neural machine translation empirically demonstrate that our algorithm makes significant improvement over previous methods.

Submitted 12 January, 2021; v1 submitted 9 July, 2020; originally announced July 2020.

Comments: Accepted to AAAI-2021

arXiv:2006.16501 [pdf, other]

Testing and Support Recovery of Correlation Structures for Matrix-Valued Observations with an Application to Stock Market Data

Authors: Xin Chen, Dan Yang, Yan Xu, Yin Xia, Dong Wang, Haipeng Shen

Abstract: Estimation of the covariance matrix of asset returns is crucial to portfolio construction. As suggested by economic theories, the correlation structure among assets differs between emerging markets and developed countries. It is therefore imperative to make rigorous statistical inference on correlation matrix equality between the two groups of countries. However, if the traditional vector-valued approach is undertaken, such inference is either infeasible due to the limited number of countries compared to the relatively abundant assets, or invalid due to violations of the temporal independence assumption. This highlights the necessity of treating the observations as matrix-valued rather than vector-valued. With matrix-valued observations, our problem of interest can be formulated as statistical inference on covariance structures under sub-Gaussian distributions, i.e., testing non-correlation and correlation equality, as well as the corresponding support estimations. We develop procedures that are asymptotically optimal under some regularity conditions. Simulation results demonstrate the computational and statistical advantages of our procedures over certain existing state-of-the-art methods for both normal and non-normal distributions. Application of our procedures to stock market data reveals interesting patterns and validates several economic propositions via rigorous statistical testing.

Submitted 27 September, 2021; v1 submitted 29 June, 2020; originally announced June 2020.

arXiv:2006.10932 [pdf, other]

Convolutional Gaussian Embeddings for Personalized Recommendation with Uncertainty

Authors: Junyang Jiang, Deqing Yang, Yanghua Xiao, Chenlu Shen

Abstract: Most existing embedding-based recommendation models use embeddings (vectors) corresponding to a single fixed point in low-dimensional space to represent users and items. Such embeddings fail to precisely represent the users/items with uncertainty often observed in recommender systems. Addressing this problem, we propose a unified deep recommendation framework employing Gaussian embeddings, which are proven adaptive to uncertain preferences exhibited by some users, resulting in better user representations and recommendation performance. Furthermore, our framework adopts Monte-Carlo sampling and convolutional neural networks to compute the correlation between the objective user and the candidate item, based on which precise recommendations are achieved. Our extensive experiments on two benchmark datasets not only justify that our proposed Gaussian embeddings capture the uncertainty of users very well, but also demonstrate the framework's superior performance over state-of-the-art recommendation models.

Submitted 18 June, 2020; originally announced June 2020.

Journal ref: IJCAI 2019

arXiv:2006.10783 [pdf, other]

doi 10.1103/PhysRevD.102.086013

Quiver Mutations, Seiberg Duality and Machine Learning

Authors: Jiakang Bao, Sebastián Franco, Yang-Hui He, Edward Hirst, Gregg Musiker, Yan Xiao

Abstract: We initiate the study of applications of machine learning to Seiberg duality, focusing on the case of quiver gauge theories, a problem also of interest in mathematics in the context of cluster algebras. Within the general theme of Seiberg duality, we define and explore a variety of interesting questions, broadly divided into the binary determination of whether a pair of theories picked from a series of duality classes are dual to each other, as well as the multi-class determination of the duality class to which a given theory belongs. We study how the performance of machine learning depends on several variables, including number of classes and mutation type (finite or infinite). In addition, we evaluate the relative advantages of Naive Bayes classifiers versus Convolutional Neural Networks. Finally, we also investigate how the results are affected by the inclusion of additional data, such as ranks of gauge/flavor groups and certain variables motivated by the existence of underlying Diophantine equations. In all questions considered, high accuracy and confidence can be achieved.

Submitted 18 June, 2020; originally announced June 2020.

Comments: 57 pages

MSC Class: 13F60; 81T13; 81T30

Journal ref: Phys. Rev. D 102, 086013 (2020)

Showing 1–50 of 102 results for author: Xiao, Y