Fast solution to the fair ranking problem using the Sinkhorn algorithm

Authors: Yuki Uehara, Shunnosuke Ikeda, Naoki Nishimura, Koya Ohashi, Yilin Li, Jie Yang, Deddy Jobson, Xingxia Zha, Takeshi Matsumoto, Noriyoshi Sukegawa, Yuichi Takano

Abstract: In two-sided marketplaces such as online flea markets, recommender systems for providing consumers with personalized item rankings play a key role in promoting transactions between providers and consumers. Meanwhile, two-sided marketplaces face the problem of balancing consumer satisfaction and fairness among items to stimulate activity of item providers. Saito and Joachims (2022) devised an impact-based fair ranking method for maximizing the Nash social welfare based on fair division; however, this method, which requires solving a large-scale constrained nonlinear optimization problem, is very difficult to apply to practical-scale recommender systems. We thus propose a fast solution to the impact-based fair ranking problem. We first transform the fair ranking problem into an unconstrained optimization problem and then design a gradient ascent method that repeatedly executes the Sinkhorn algorithm. Experimental results demonstrate that our algorithm provides fair rankings of high quality and is about 1000 times faster than application of commercial optimization software.

Submitted 10 June, 2024; originally announced June 2024.
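
The abstract above relies on the Sinkhorn algorithm as its inner loop. As a rough illustration only (not the authors' implementation), the sketch below shows plain Sinkhorn scaling: a strictly positive square matrix is alternately rescaled by rows and columns until it is approximately doubly stochastic. Reading entry (i, j) as the probability that item i is shown at position j is an assumption made here for illustration; the function name `sinkhorn` and the random `scores` are hypothetical.

```python
import numpy as np


def sinkhorn(kernel, n_iters=1000, tol=1e-10):
    """Scale a strictly positive square matrix to a doubly stochastic one."""
    kernel = np.asarray(kernel, dtype=float)
    u = np.ones(kernel.shape[0])
    v = np.ones(kernel.shape[1])
    for _ in range(n_iters):
        u = 1.0 / (kernel @ v)      # after this step every row sums to 1
        v = 1.0 / (kernel.T @ u)    # after this step every column sums to 1
        plan = u[:, None] * kernel * v[None, :]
        # Columns are exact here, so only the row sums need checking.
        if np.abs(plan.sum(axis=1) - 1.0).max() < tol:
            break
    return u[:, None] * kernel * v[None, :]


# Hypothetical item-by-position scores, made positive with exp().
rng = np.random.default_rng(0)
scores = rng.normal(size=(4, 4))
plan = sinkhorn(np.exp(scores))
```

Each iteration costs only two matrix-vector products, which is one reason this kind of scaling is attractive as a subroutine inside an outer gradient method.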

arXiv:2406.06213 [pdf, ps, other]

A Statistical Theory of Regularization-Based Continual Learning

Authors: Xuyang Zhao, Huiyuan Wang, Weiran Huang, Wei Lin

Abstract: We provide a statistical analysis of regularization-based continual learning on a sequence of linear regression tasks, with emphasis on how different regularization terms affect the model performance. We first derive the convergence rate for the oracle estimator obtained as if all data were available simultaneously. Next, we consider a family of generalized $\ell_2$-regularization algorithms indexed by matrix-valued hyperparameters, which includes the minimum norm estimator and continual ridge regression as special cases. As more tasks are introduced, we derive an iterative update formula for the estimation error of generalized $\ell_2$-regularized estimators, from which we determine the hyperparameters resulting in the optimal algorithm. Interestingly, the choice of hyperparameters can effectively balance the trade-off between forward and backward knowledge transfer and adjust for data heterogeneity. Moreover, the estimation error of the optimal algorithm is derived explicitly, which is of the same order as that of the oracle estimator. In contrast, our lower bounds for the minimum norm estimator and continual ridge regression show their suboptimality. A byproduct of our theoretical analysis is the equivalence between early stopping and generalized $\ell_2$-regularization in continual learning, which may be of independent interest. Finally, we conduct experiments to complement our theory.

Submitted 10 June, 2024; originally announced June 2024.

Comments: Accepted by ICML 2024
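
To make the continual ridge regression mentioned in the abstract above concrete, here is a minimal sketch under a common formulation, which is an assumption here rather than the paper's exact definition: each new task's estimate minimises the squared loss on that task plus a scalar penalty pulling it toward the previous estimate. The paper's family uses matrix-valued hyperparameters; the scalar `lam` is the simplest special case, and all names are hypothetical.

```python
import numpy as np


def continual_ridge_step(X, y, w_prev, lam):
    """Minimise ||y - X w||^2 + lam * ||w - w_prev||^2 in closed form."""
    d = X.shape[1]
    # Setting the gradient to zero gives (X'X + lam I) w = X'y + lam w_prev.
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y + lam * w_prev)


rng = np.random.default_rng(0)
w_true = np.array([1.0, -2.0, 0.5])
w = np.zeros(3)
for _ in range(5):                      # five tasks sharing one true parameter
    X = rng.normal(size=(50, 3))
    y = X @ w_true                      # noiseless responses, for illustration
    w = continual_ridge_step(X, y, w, lam=1.0)
```

Only the previous estimate is carried between tasks, so no earlier data need to be stored.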

arXiv:2406.04072 [pdf, other]

Variational Prior Replacement in Bayesian Inference and Inversion

Authors: Xuebin Zhao, Andrew Curtis

Abstract: Many scientific investigations require that the values of a set of model parameters are estimated using recorded data. In Bayesian inference, information from both observed data and prior knowledge is combined to update model parameters probabilistically. Prior information represents our belief about the range of values that the variables can take, and their relative probabilities when considered independently of recorded data. Situations arise in which we wish to change prior information: (i) the subjective nature of prior information, (ii) cases in which we wish to test different states of prior information as hypothesis tests, and (iii) information from new studies may emerge so prior information may evolve over time. Estimating the solution to any single inference problem is usually computationally costly, as it typically requires thousands of model samples and their forward simulations. Therefore, recalculating the Bayesian solution every time prior information changes can be extremely expensive. We develop a mathematical formulation that allows prior information to be changed in a solution using variational methods, without performing Bayesian inference on each occasion. In this method, existing prior information is removed from a previously obtained posterior distribution and is replaced by new prior information. We therefore call the methodology variational prior replacement (VPR).
We demonstrate VPR using a 2D seismic full waveform inversion example, where VPR provides almost identical posterior solutions compared to those obtained by solving independent inference problems using different priors. The former can be completed within minutes even on a laptop whereas the latter requires days of computations using high-performance computing resources. We demonstrate the value of the method by comparing the posterior solutions obtained using three different types of prior information.

Submitted 6 June, 2024; originally announced June 2024.

arXiv:2404.19495 [pdf]

Percentage Coefficient (bp) -- Effect Size Analysis (Theory Paper 1)

Authors: Xinshu Zhao, Dianshi Moses Li, Ze Zack Lai, Piper Liping Liu, Song Harris Ao, Fei You

Abstract: Percentage coefficient (bp) has emerged in recent publications as an additional and alternative estimator of effect size for regression analysis. This paper retraces the theory behind the estimator. It's posited that an estimator must first serve the fundamental function of enabling researchers and readers to comprehend an estimand, the target of estimation. It may then serve the instrumental function of enabling researchers and readers to compare two or more estimands. Defined as the regression coefficient when dependent variable (DV) and independent variable (IV) are both on conceptual 0-1 percentage scales, percentage coefficients (bp) feature 1) clearly comprehendible interpretation and 2) equitable scales for comparison. The coefficient (bp) serves the two functions effectively and efficiently. It thus serves needs unserved by other indicators, such as raw coefficient (bw) and standardized beta. Another premise of the functionalist theory is that "effect" is not a monolithic concept. Rather, it is a collection of concepts, each of which measures a component of the conglomerate called "effect", thereby serving a subfunction. Regression coefficient (b), for example, indicates the unit change in DV associated with a one-unit increase in IV, thereby measuring one aspect called unit effect, aka efficiency. Percentage coefficient (bp) indicates the percentage change in DV associated with a whole scale increase in IV.
It is not meant to be an all-encompassing indicator of an all-encompassing concept, but rather a comprehendible and comparable indicator of efficiency, a key aspect of effect.

Submitted 6 May, 2024; v1 submitted 30 April, 2024; originally announced April 2024.
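
The definition in the abstract above (the regression coefficient when DV and IV are both on conceptual 0-1 scales) can be illustrated with a short sketch. It assumes the analyst supplies the conceptual minimum and maximum of each variable; the function name, its parameters, and the toy data are hypothetical and are not taken from the paper.

```python
import numpy as np


def percentage_coefficient(x, y, x_min, x_max, y_min, y_max):
    """Slope of y on x after rescaling both to their conceptual 0-1 ranges."""
    x01 = (np.asarray(x, dtype=float) - x_min) / (x_max - x_min)
    y01 = (np.asarray(y, dtype=float) - y_min) / (y_max - y_min)
    # np.polyfit returns coefficients from highest degree down: slope first.
    slope, _intercept = np.polyfit(x01, y01, deg=1)
    return slope


# Toy data: IV on a conceptual 0-10 scale, DV on a conceptual 0-100 scale.
x = np.array([0, 2, 4, 6, 8, 10])
y = 2 * x + 1
bp = percentage_coefficient(x, y, x_min=0, x_max=10, y_min=0, y_max=100)
```

In this toy case the raw slope is 2 DV units per IV unit, while the rescaled slope is 0.2: moving across the whole IV scale is associated with a change of 20% of the DV scale.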

arXiv:2402.05395 [pdf, other]

Efficient Estimation for Functional Accelerated Failure Time Model

Authors: Changyu Liu, Wen Su, Kin-Yat Liu, Guosheng Yin, Xingqiu Zhao

Abstract: We propose a functional accelerated failure time model to characterize effects of both functional and scalar covariates on the time to event of interest, and provide regularity conditions to guarantee model identifiability. For efficient estimation of model parameters, we develop a sieve maximum likelihood approach where parametric and nonparametric coefficients are bundled with an unknown baseline hazard function in the likelihood function. Not only do the bundled parameters cause immense numerical difficulties, but they also result in new challenges in theoretical development. By developing a general theoretical framework, we overcome the challenges arising from the bundled parameters and derive the convergence rate of the proposed estimator. Furthermore, we prove that the finite-dimensional estimator is $\sqrt{n}$-consistent, asymptotically normal and achieves the semiparametric information bound. The proposed inference procedures are evaluated by extensive simulation studies and illustrated with an application to the sequential organ failure assessment data from the Improving Care of Acute Lung Injury Patients study.

Submitted 7 February, 2024; originally announced February 2024.

arXiv:2402.00388 [pdf, other]

Cumulative Distribution Function based General Temporal Point Processes

Authors: Maolin Wang, Yu Pan, Zenglin Xu, Ruocheng Guo, Xiangyu Zhao, Wanyu Wang, Yiqi Wang, Zitao Liu, Langming Liu

Abstract: Temporal Point Processes (TPPs) hold a pivotal role in modeling event sequences across diverse domains, including social networking and e-commerce, and have significantly contributed to the advancement of recommendation systems and information retrieval strategies. Through the analysis of events such as user interactions and transactions, TPPs offer valuable insights into behavioral patterns, facilitating the prediction of future trends. However, accurately forecasting future events remains a formidable challenge due to the intricate nature of these patterns. The integration of Neural Networks with TPPs has ushered in the development of advanced deep TPP models. While these models excel at processing complex and nonlinear temporal data, they encounter limitations in modeling intensity functions, grapple with computational complexities in integral computations, and struggle to capture long-range temporal dependencies effectively. In this study, we introduce the CuFun model, representing a novel approach to TPPs that revolves around the Cumulative Distribution Function (CDF). CuFun stands out by uniquely employing a monotonic neural network for CDF representation, utilizing past events as a scaling factor. This innovation significantly bolsters the model's adaptability and precision across a wide range of data scenarios.
Our approach addresses several critical issues inherent in traditional TPP modeling: it simplifies log-likelihood calculations, extends applicability beyond predefined density function forms, and adeptly captures long-range temporal patterns. Our contributions encompass the introduction of a pioneering CDF-based TPP model, the development of a methodology for incorporating past event information into future event prediction, and empirical validation of CuFun's effectiveness through extensive experimentation on synthetic and real-world datasets.

Submitted 1 February, 2024; originally announced February 2024.

arXiv:2401.16320 [pdf, ps, other]

A Strategy for Preparing Quantum Squeezed States Using Reinforcement Learning

Authors: X. L. Zhao, Y. M. Zhao, M. Li, T. T. Li, Q. Liu, S. Guo, X. X. Yi

Abstract: We propose a scheme leveraging reinforcement learning to engineer control fields for generating non-classical states. It is exemplified by the application to prepare spin-squeezed states for an open collective spin model where a linear control field is designed to govern the dynamics. The reinforcement learning agent determines the temporal sequence of control pulses, commencing from a coherent spin state in an environment characterized by dissipation and dephasing. Compared to the constant control scenario, this approach provides various control sequences maintaining collective spin squeezing and entanglement. It is observed that denser application of the control pulses enhances the performance of the outcomes. However, there is a minor enhancement in the performance by adding control actions. The proposed strategy demonstrates increased effectiveness for larger systems. Thermal excitations of the reservoir are detrimental to the control outcomes. Feasible experiments are suggested to implement this control proposal based on the comparison with the others. The extensions to continuous control problems and another quantum system are discussed. The replaceability of the reinforcement learning module is also emphasized. This research paves the way for its application in manipulating other quantum systems.

Submitted 14 June, 2024; v1 submitted 29 January, 2024; originally announced January 2024.

arXiv:2401.11940 [pdf, other]

Low-Tubal-Rank Tensor Recovery via Factorized Gradient Descent

Authors: Zhiyu Liu, Zhi Han, Yandong Tang, Xi-Le Zhao, Yao Wang

Abstract: This paper considers the problem of recovering a tensor with an underlying low-tubal-rank structure from a small number of corrupted linear measurements. Traditional approaches tackling such a problem require the computation of tensor Singular Value Decomposition (t-SVD), which is a computationally intensive process, rendering them impractical for dealing with large-scale tensors. Aiming to address this challenge, we propose an efficient and effective low-tubal-rank tensor recovery method based on a factorization procedure akin to the Burer-Monteiro (BM) method. Precisely, our fundamental approach involves decomposing a large tensor into two smaller factor tensors, followed by solving the problem through factorized gradient descent (FGD). This strategy eliminates the need for t-SVD computation, thereby reducing computational costs and storage requirements. We provide rigorous theoretical analysis to ensure the convergence of FGD under both noise-free and noisy situations. Additionally, it is worth noting that our method does not require the precise estimation of the tensor tubal-rank. Even in cases where the tubal-rank is slightly overestimated, our approach continues to demonstrate robust performance. A series of experiments have been carried out to demonstrate that, as compared to other popular ones, our approach exhibits superior performance in multiple scenarios, in terms of the faster computational speed and the smaller convergence error.

Submitted 2 February, 2024; v1 submitted 22 January, 2024; originally announced January 2024.

Comments: 13 pages, 4 figures

arXiv:2401.00104 [pdf, other]

Causal State Distillation for Explainable Reinforcement Learning

Authors: Wenhao Lu, Xufeng Zhao, Thilo Fryen, Jae Hee Lee, Mengdi Li, Sven Magg, Stefan Wermter

Abstract: Reinforcement learning (RL) is a powerful technique for training intelligent agents, but understanding why these agents make specific decisions can be quite challenging. This lack of transparency in RL models has been a long-standing problem, making it difficult for users to grasp the reasons behind an agent's behaviour. Various approaches have been explored to address this problem, with one promising avenue being reward decomposition (RD). RD is appealing as it sidesteps some of the concerns associated with other methods that attempt to rationalize an agent's behaviour in a post-hoc manner. RD works by exposing various facets of the rewards that contribute to the agent's objectives during training. However, RD alone has limitations as it primarily offers insights based on sub-rewards and does not delve into the intricate cause-and-effect relationships that occur within an RL agent's neural model. In this paper, we present an extension of RD that goes beyond sub-rewards to provide more informative explanations. Our approach is centred on a causal learning framework that leverages information-theoretic measures for explanation objectives that encourage three crucial properties of causal factors: causal sufficiency, sparseness, and orthogonality. These properties help us distill the cause-and-effect relationships between the agent's states and actions or rewards, allowing for a deeper understanding of its decision-making processes.
Our framework is designed to generate local explanations and can be applied to a wide range of RL tasks with multiple reward channels. Through a series of experiments, we demonstrate that our approach offers more meaningful and insightful explanations for the agent's action selections.

Submitted 1 April, 2024; v1 submitted 29 December, 2023; originally announced January 2024.

Comments: https://lukaswill.github.io/; Accepted as oral by CLeaR 2024

arXiv:2312.13389 [pdf, other]

Enhancing Trade-offs in Privacy, Utility, and Computational Efficiency through MUltistage Sampling Technique (MUST)

Authors: Xingyuan Zhao, Fang Liu

Abstract: Applying a randomized algorithm to a subset of a dataset rather than the entire dataset is a common approach to amplify its privacy guarantees in the released information. We propose a class of subsampling methods named MUltistage Sampling Technique (MUST) for privacy amplification (PA) in the context of differential privacy (DP). We conduct comprehensive analyses of the PA effects and utility for several 2-stage MUST procedures, namely, MUST.WO, MUST.OW, and MUST.WW that respectively represent sampling with (W), without (O), with (W) replacement from the original dataset in stage I and then sampling without (O), with (W), with (W) replacement in stage II from the subset drawn in stage I. We also provide the privacy composition analysis over repeated applications of MUST via the Fourier accountant algorithm. Our theoretical and empirical results suggest that MUST.OW and MUST.WW have stronger PA in $ε$ than the common one-stage sampling procedures including Poisson sampling, sampling without replacement, and sampling with replacement, while the results on $δ$ vary case by case. We also prove that MUST.WO is equivalent to sampling with replacement in PA. Furthermore, the final subset generated by a MUST procedure is a multiset that may contain multiple copies of the same data points due to sampling with replacement involved, which enhances the computational efficiency of algorithms that require complex function calculations on distinct data points (e.g., gradient descent).
Our utility experiments show that MUST delivers similar or improved utility and stability in the privacy-preserving outputs compared to one-stage subsampling methods at similar privacy loss. MUST can be seamlessly integrated into stochastic optimization algorithms or procedures that involve parallel or simultaneous subsampling (e.g., bagging and subsampling bootstrap) when DP guarantees are necessary.

Submitted 20 December, 2023; originally announced December 2023.
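
The two-stage procedure described in the abstract above can be sketched for the MUST.OW variant: stage I samples without replacement from the original dataset, and stage II samples with replacement from the stage I subset. This is only a sketch of the sampling step, with no privacy accounting; the function name and the stage sizes are hypothetical choices, not values from the paper.

```python
import numpy as np


def must_ow(data, stage1_size, stage2_size, rng):
    """Two-stage subsample: without replacement, then with replacement."""
    data = np.asarray(data)
    # Stage I: distinct points drawn from the original dataset.
    stage1 = rng.choice(data, size=stage1_size, replace=False)
    # Stage II: draws with replacement, so the result is a multiset.
    return rng.choice(stage1, size=stage2_size, replace=True)


rng = np.random.default_rng(0)
data = np.arange(100)
sample = must_ow(data, stage1_size=20, stage2_size=30, rng=rng)
distinct_points = np.unique(sample)
```

Because the final multiset has at most `stage1_size` distinct points, a per-point computation such as a gradient needs to be evaluated only on `distinct_points` and then weighted by multiplicity, which is the efficiency argument the abstract makes.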

arXiv:2310.19519 [pdf, other]

A General Neural Causal Model for Interactive Recommendation

Authors: Jialin Liu, Xinyan Su, Peng Zhou, Xiangyu Zhao, Jun Li

Abstract: Survivor bias in observational data leads the optimization of recommender systems towards local optima. Currently, most solutions re-mine existing human-system collaboration patterns to maximize longer-term satisfaction by reinforcement learning. However, from the causal perspective, mitigating survivor effects requires answering a counterfactual problem, which is generally unidentifiable and inestimable. In this work, we propose a neural causal model to achieve counterfactual inference. Specifically, we first build a learnable structural causal model based on its available graphical representations which qualitatively characterizes the preference transitions. Mitigation of the survivor bias is achieved through counterfactual consistency. To identify the consistency, we use the Gumbel-max function as structural constraints. To estimate the consistency, we apply reinforcement optimizations, and use Gumbel-Softmax as a trade-off to get a differentiable function. Both theoretical and empirical studies demonstrate the effectiveness of our solution.

Submitted 30 October, 2023; originally announced October 2023.

arXiv:2310.04153 [pdf, other]

Fair coins tend to land on the same side they started: Evidence from 350,757 flips

Authors: František Bartoš, Alexandra Sarafoglou, Henrik R. Godmann, Amir Sahrani, David Klein Leunk, Pierre Y. Gui, David Voss, Kaleem Ullah, Malte J. Zoubek, Franziska Nippold, Frederik Aust, Felipe F. Vieira, Chris-Gabriel Islam, Anton J. Zoubek, Sara Shabani, Jonas Petter, Ingeborg B. Roos, Adam Finnemann, Aaron B. Lob, Madlen F. Hoffstadt, Jason Nak, Jill de Ron, Koen Derks, Karoline Huth, Sjoerd Terpstra, et al. (25 additional authors not shown)

Abstract: Many people have flipped coins but few have stopped to ponder the statistical and physical intricacies of the process. In a preregistered study we collected $350{,}757$ coin flips to test the counterintuitive prediction from a physics model of human coin tossing developed by Diaconis, Holmes, and Montgomery (DHM; 2007). The model asserts that when people flip an ordinary coin, it tends to land on the same side it started -- DHM estimated the probability of a same-side outcome to be about 51%. Our data lend strong support to this precise prediction: the coins landed on the same side more often than not, $\text{Pr}(\text{same side}) = 0.508$, 95% credible interval (CI) [$0.506$, $0.509$], $\text{BF}_{\text{same-side bias}} = 2359$. Furthermore, the data revealed considerable between-people variation in the degree of this same-side bias. Our data also confirmed the generic prediction that when people flip an ordinary coin -- with the initial side-up randomly determined -- it is equally likely to land heads or tails: $\text{Pr}(\text{heads}) = 0.500$, 95% CI [$0.498$, $0.502$], $\text{BF}_{\text{heads-tails bias}} = 0.182$. Furthermore, this lack of heads-tails bias does not appear to vary across coins. Additional exploratory analyses revealed that the within-people same-side bias decreased as more coins were flipped, an effect that is consistent with the possibility that practice makes people flip coins in a less wobbly fashion.
Our data therefore provide strong evidence that when some (but not all) people flip a fair coin, it tends to land on the same side it started. Our data provide compelling statistical support for the DHM physics model of coin tossing.

Submitted 2 June, 2024; v1 submitted 6 October, 2023; originally announced October 2023.
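
For a sense of why an estimate as close to one half as 0.508 can still be distinguished from a fair outcome at this sample size, the sketch below computes a textbook normal-approximation interval for a proportion. This is not the paper's analysis: the abstract reports a Bayesian credible interval, and the sketch treats all flips as independent and uses the rounded estimate, both of which are simplifying assumptions. The function name is hypothetical.

```python
import math


def normal_approx_interval(p_hat, n, z=1.96):
    """Approximate 95% interval for a proportion, assuming independent trials."""
    standard_error = math.sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat - z * standard_error, p_hat + z * standard_error


# Rounded same-side proportion and total number of flips from the abstract.
low, high = normal_approx_interval(0.508, 350_757)
```

With roughly 350,000 flips the standard error is below 0.001, so the lower bound stays above 0.5 even though the bias is less than one percentage point.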

arXiv:2309.08910 [pdf, other]

Total-effect Test May Erroneously Reject So-called "Full" or "Complete" Mediation

Authors: Tingxuan Han, Luxi Zhang, Xinshu Zhao, Ke Deng

Abstract: The procedure for establishing mediation, i.e., determining that an independent variable X affects a dependent variable Y through some mediator M, has been under debate. The classic causal steps require that a "total effect" be significant, now also known as statistically acknowledged. It has been shown that the total-effect test can erroneously reject competitive mediation and is superfluous for establishing complementary mediation. Little is known about the last type, indirect-only mediation, aka "full" or "complete" mediation, in which the indirect (ab) path passes the statistical partition test while the direct-and-remainder (d) path fails. This study 1) provides proof that the total-effect test can erroneously reject indirect-only mediation, including both sub-types, assuming least square estimation (LSE) F-test or Sobel test; 2) provides a simulation to duplicate the mathematical proofs and extend the conclusion to LAD-Z test; 3) provides two real-data examples, one for each sub-type, to illustrate the mathematical conclusion; 4) in view of the mathematical findings, proposes to revisit concepts, theories, and techniques of mediation analysis and other causal dissection analyses, and showcase a more comprehensive alternative, process-and-product analysis (PAPA).

Submitted 25 September, 2023; v1 submitted 16 September, 2023; originally announced September 2023.

arXiv:2308.15549 [pdf, ps, other]

Kernel meets sieve: transformed hazards models with sparse longitudinal covariates

Authors: Dayu Sun, Zhuowei Sun, Xingqiu Zhao, Hongyuan Cao

Abstract: We study the transformed hazards model with time-dependent covariates observed intermittently for the censored outcome. Existing work assumes the availability of the whole trajectory of the time-dependent covariates, which is unrealistic. We propose to combine kernel-weighted log-likelihood and sieve maximum log-likelihood estimation to conduct statistical inference. The method is robust and easy to implement. We establish the asymptotic properties of the proposed estimator and contribute to a rigorous theoretical framework for general kernel-weighted sieve M-estimators. Numerical studies corroborate our theoretical results and show that the proposed method performs favorably over existing methods. An application to a COVID-19 study in Wuhan illustrates the practical utility of our method.

Submitted 17 September, 2023; v1 submitted 29 August, 2023; originally announced August 2023.

MSC Class: 62N02 (primary); 62F12; 62E20 (secondary)

arXiv:2308.11978 [pdf, other]

Will More Expressive Graph Neural Networks do Better on Generative Tasks?

Authors: Xiandong Zou, Xiangyu Zhao, Pietro Liò, Yiren Zhao

Abstract: Graph generation poses a significant challenge as it involves predicting a complete graph with multiple nodes and edges based on simply a given label. This task also carries fundamental importance to numerous real-world applications, including de-novo drug and molecular design. In recent years, several successful methods have emerged in the field of graph generation. However, these approaches suffer from two significant shortcomings: (1) the underlying Graph Neural Network (GNN) architectures used in these methods are often underexplored; and (2) these methods are often evaluated on only a limited number of metrics. To fill this gap, we investigate the expressiveness of GNNs under the context of the molecular graph generation task, by replacing the underlying GNNs of graph generative models with more expressive GNNs. Specifically, we analyse the performance of six GNNs in two different generative frameworks -- autoregressive generation models, such as GCPN and GraphAF, and one-shot generation models, such as GraphEBM -- on six different molecular generative objectives on the ZINC-250k dataset. Through our extensive experiments, we demonstrate that advanced GNNs can indeed improve the performance of GCPN, GraphAF, and GraphEBM on molecular generation tasks, but GNN expressiveness is not a necessary condition for a good GNN-based generative model.
Moreover, we show that GCPN and GraphAF with advanced GNNs can achieve state-of-the-art results across 17 other non-GNN-based graph generative approaches, such as variational autoencoders and Bayesian optimisation models, on the proposed molecular generative objectives (DRD2, Median1, Median2), which are important metrics for de-novo molecular design.

Submitted 20 February, 2024; v1 submitted 23 August, 2023; originally announced August 2023.

Comments: 2nd Learning on Graphs Conference (LoG 2023). 26 pages, 5 figures, 11 tables

arXiv:2305.14612 [pdf]

Assessment of Anterior Cruciate Ligament Injury Risk Based on Human Key Points Detection Algorithm

Authors: Ziyu Gong, Xiong Zhao, Chen Yang

Abstract: This paper aims to detect the potential injury risk of the anterior cruciate ligament (ACL) by proposing an ACL potential injury risk assessment algorithm based on key points of the human body detected using computer vision technology. To obtain the key points data of the human body in each frame, OpenPose, an open source computer vision algorithm, was employed. The obtained data underwent preprocessing and were then fed into an ACL potential injury feature extraction model based on the Landing Error Evaluation System (LESS). This model extracted several important parameters, including the knee flexion angle, the trunk flexion on the sagittal plane, trunk flexion angle on the frontal plane, the ankle knee horizontal distance, and the ankle shoulder horizontal distance. Each of these features was assigned a threshold interval, and a segmented evaluation function was utilized to score them accordingly. To calculate the final score of the participant, the score values were input into a weighted scoring model designed based on the Analytic Hierarchy Process (AHP). The AHP based model takes into account the relative importance of each feature in the overall assessment. The results demonstrate that the proposed algorithm effectively detects the potential risk of ACL injury. The proposed algorithm demonstrates its effectiveness in detecting ACL injury risk, offering valuable insights for injury prevention and intervention strategies in sports and related fields.
Code is available at: https://github.com/ZiyuGong-proj/Assessment-of-ACL-Injury-Risk-Based-on-Openpose

Submitted 23 May, 2023; originally announced May 2023.

Comments: 17 pages and 6 figures

arXiv:2304.13646 [pdf, other]

Data-driven Piecewise Affine Decision Rules for Stochastic Programming with Covariate Information

Authors: Yiyang Zhang, Junyi Liu, Xiaobo Zhao

Abstract: Focusing on stochastic programming (SP) with covariate information, this paper proposes an empirical risk minimization (ERM) method embedded within a nonconvex piecewise affine decision rule (PADR), which aims to learn the direct mapping from features to optimal decisions. We establish the nonasymptotic consistency result of our PADR-based ERM model for unconstrained problems and asymptotic consistency result for constrained ones. To solve the nonconvex and nondifferentiable ERM problem, we develop an enhanced stochastic majorization-minimization algorithm and establish the asymptotic convergence to (composite strong) directional stationarity along with complexity analysis. We show that the proposed PADR-based ERM method applies to a broad class of nonconvex SP problems with theoretical consistency guarantees and computational tractability. Our numerical study demonstrates the superior performance of PADR-based ERM methods compared to state-of-the-art approaches under various settings, with significantly lower costs, less computation time, and robustness to feature dimensions and nonlinearity of the underlying dependency.

Submitted 20 December, 2023; v1 submitted 26 April, 2023; originally announced April 2023.

arXiv:2302.14243 [pdf, other]

doi 10.1002/jrsm.1686

metamedian: An R package for meta-analyzing studies reporting medians

Authors: Sean McGrath, XiaoFei Zhao, Omer Ozturk, Stephan Katzenschlager, Russell Steele, Andrea Benedetti

Abstract: When performing an aggregate data meta-analysis of a continuous outcome, researchers often come across primary studies that report the sample median of the outcome. However, standard meta-analytic methods typically cannot be directly applied in this setting. In recent years, there has been substantial development in statistical methods to incorporate primary studies reporting sample medians in meta-analysis, yet there are currently no comprehensive software tools implementing these methods. In this paper, we present the metamedian R package, a freely available and open-source software tool for meta-analyzing primary studies that report sample medians. We summarize the main features of the software and illustrate its application through real data examples involving risk factors for a severe course of COVID-19.

Submitted 27 February, 2023; originally announced February 2023.

Journal ref: Res. Synth. Methods 15 (2024) 332-346

arXiv:2302.03250 [pdf, other]

Network-based Statistics Distinguish Anomic and Broca Aphasia

Authors: Xingpei Zhao, Nicholas Riccardi, Rutvik H. Desai, Dirk-Bart den Ouden, Julius Fridriksson, Yuan Wang

Abstract: Aphasia is a speech-language impairment commonly caused by damage to the left hemisphere. Due to the complexity of speech-language processing, the neural mechanisms that underpin various symptoms between different types of aphasia are still not fully understood. We used the network-based statistic method to identify distinct subnetwork(s) of connections differentiating the resting-state functional networks of the anomic and Broca groups. We identified one such subnetwork that mainly involved the brain regions in the premotor, primary motor, primary auditory, and primary sensory cortices in both hemispheres. The majority of connections in the subnetwork were weaker in the Broca group than the anomic group. The network properties of the subnetwork were examined through complex network measures, which indicated that the regions in the superior temporal gyrus and auditory cortex bilaterally exhibit intensive interaction, and primary motor, premotor and primary sensory cortices in the left hemisphere play an important role in information flow and overall communication efficiency. These findings underlie articulatory difficulties and reduced repetition performance in Broca aphasia, which are rarely observed in anomic aphasia. This research provides novel findings into the resting-state brain network differences between groups of individuals with anomic and Broca aphasia.
We identified a subnetwork of, rather than isolated, connections that statistically differentiate the resting-state brain networks of the two groups, in comparison with standard lesion symptom mapping results that yield isolated connections.

Submitted 17 February, 2023; v1 submitted 6 February, 2023; originally announced February 2023.

arXiv:2211.04586 [pdf, other]

Learning to Price Supply Chain Contracts against a Learning Retailer

Authors: Xuejun Zhao, Ruihao Zhu, William B. Haskell

Abstract: The rise of big data analytics has automated the decision-making of companies and increased supply chain agility. In this paper, we study the supply chain contract design problem faced by a data-driven supplier who needs to respond to the inventory decisions of the downstream retailer. Both the supplier and the retailer are uncertain about the market demand and need to learn about it sequentially. The goal for the supplier is to develop data-driven pricing policies with sublinear regret bounds under a wide range of possible retailer inventory policies for a fixed time horizon. To capture the dynamics induced by the retailer's learning policy, we first make a connection to non-stationary online learning by following the notion of variation budget. The variation budget quantifies the impact of the retailer's learning strategy on the supplier's decision-making. We then propose dynamic pricing policies for the supplier for both discrete and continuous demand. We also note that our proposed pricing policy only requires access to the support of the demand distribution, but critically, does not require the supplier to have any prior knowledge about the retailer's learning policy or the demand realizations. We examine several well-known data-driven policies for the retailer, including sample average approximation, distributionally robust optimization, and parametric approaches, and show that our pricing policies lead to sublinear regret bounds in all these cases.
At the managerial level, we answer affirmatively that there is a pricing policy with a sublinear regret bound under a wide range of retailer's learning policies, even though she faces a learning retailer and an unknown demand distribution. Our work also provides a novel perspective in data-driven operations management where the principal has to learn to react to the learning policies employed by other agents in the system.

Submitted 2 November, 2022; originally announced November 2022.

arXiv:2209.08737 [pdf, ps, other]

Heterogeneous Federated Learning on a Graph

Authors: Huiyuan Wang, Xuyang Zhao, Wei Lin

Abstract: Federated learning, where algorithms are trained across multiple decentralized devices without sharing local data, is increasingly popular in distributed machine learning practice. Typically, a graph structure $G$ exists behind local devices for communication. In this work, we consider parameter estimation in federated learning with data distribution and communication heterogeneity, as well as limited computational capacity of local devices. We encode the distribution heterogeneity by parametrizing distributions on local devices with a set of distinct $p$-dimensional vectors. We then propose to jointly estimate parameters of all devices under the $M$-estimation framework with the fused Lasso regularization, encouraging an equal estimate of parameters on connected devices in $G$. We provide a general result for our estimator depending on $G$, which can be further calibrated to obtain convergence rates for various specific problem setups. Surprisingly, our estimator attains the optimal rate under a certain graph fidelity condition on $G$, as if we could aggregate all samples sharing the same distribution. If the graph fidelity condition is not met, we propose an edge selection procedure via multiple testing to ensure the optimality. To ease the burden of local computation, a decentralized stochastic version of ADMM is provided, with convergence rate $O(T^{-1}\log T)$ where $T$ denotes the number of iterations.
We highlight that our algorithm transmits only parameters along edges of $G$ at each iteration, without requiring a central machine, which preserves privacy. We further extend it to the case where devices are randomly inaccessible during the training process, with a similar algorithmic convergence guarantee. The computational and statistical efficiency of our method is evidenced by simulation experiments and the 2020 US presidential election data set.

Submitted 18 September, 2022; originally announced September 2022.

Comments: 61 pages, 4 figures

arXiv:2208.09107 [pdf]

Spatial Equity of Micromobility Systems: A Comparison of Shared E-scooters and Station-based Bikeshare in Washington DC

Authors: Lin Su, Xiang Yan, Xilei Zhao

Abstract: Many cities around the world have introduced dockless micromobility services in recent years and witnessed their rapid growth. Shared dockless e-scooters have the potential to benefit neighborhoods that lack access to station-based bikeshare services, but they may also exacerbate the existing spatial disparities. While some studies have examined the equity of station-based bikeshare systems, limited knowledge is available regarding dockless e-scooter services. This study uses Washington DC as a case study, a city with both dockless e-scooter and station-based bikeshare systems, to conduct equity analysis of the two types of micromobility options. We develop an analytical framework to examine how dockless e-scooter and station-based bikeshare differ on a set of equity-related outcomes (i.e., availability, accessibility, usage, and idle time) across neighborhoods of different socioeconomic categories. Results reveal that dockless e-scooter services increase accessibility to shared micromobility options for disadvantaged neighborhoods but also widen the access gap across neighborhoods. Compared to bikeshare, shared e-scooters have a higher level of spatial accessibility overall due to greater supply; however, the greater supply largely leads to longer average idle time of shared e-scooters rather than a greater number of trips. Finally, it appears that the bikeshare system's equity program effectively promotes low-income use but e-scooters' equity programs do not.
Our findings suggest that increasing vehicle supply alone would probably not lead to higher micromobility use in disadvantaged neighborhoods. Instead, policymakers should combine a variety of strategies such as promoting the enrollment of equity programs and reducing access barriers (e.g., smartphone and banking requirements) to micromobility services.

Submitted 18 August, 2022; originally announced August 2022.

Comments: 18 pages, 4 figures

arXiv:2208.08855 [pdf, other]

Adaptive Partially-Observed Sequential Change Detection and Isolation

Authors: Xinyu Zhao, Jiuyun Hu, Yajun Mei, Hao Yan

Abstract: High-dimensional data has become popular due to the easy accessibility of sensors in modern industrial applications. However, one specific challenge is that it is often not easy to obtain complete measurements due to limited sensing powers and resource constraints. Furthermore, distinct failure patterns may exist in the systems, and it is necessary to identify the true failure pattern. This work focuses on the online adaptive monitoring of high-dimensional data in resource-constrained environments with multiple potential failure modes. To achieve this, we propose to apply the Shiryaev-Roberts procedure on the failure mode level and utilize the multi-arm bandit to balance the exploration and exploitation. We further discuss the theoretical property of the proposed algorithm to show that the proposed method can correctly isolate the failure mode. Finally, extensive simulations and two case studies demonstrate that the change point detection performance and the failure mode isolation accuracy can be greatly improved.

Submitted 25 August, 2022; v1 submitted 9 August, 2022; originally announced August 2022.

Comments: Accepted in Technometrics

arXiv:2206.04615 [pdf, other]

Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

Authors: Aarohi Srivastava, Abhinav Rastogi, Abhishek Rao, Abu Awal Md Shoeb, Abubakar Abid, Adam Fisch, Adam R. Brown, Adam Santoro, Aditya Gupta, Adrià Garriga-Alonso, Agnieszka Kluska, Aitor Lewkowycz, Akshat Agarwal, Alethea Power, Alex Ray, Alex Warstadt, Alexander W. Kocurek, Ali Safaya, Ali Tazarv, Alice Xiang, Alicia Parrish, Allen Nie, Aman Hussain, Amanda Askell, Amanda Dsouza , et al. (426 additional authors not shown)

Abstract: Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are as yet poorly characterized. In order to inform future research, prepare for disruptive new model capabilities, and ameliorate socially harmful effects, it is vital that we understand the present and near-future capabilities and limitations of language models. To address this challenge, we introduce the Beyond the Imitation Game benchmark (BIG-bench). BIG-bench currently consists of 204 tasks, contributed by 450 authors across 132 institutions. Task topics are diverse, drawing problems from linguistics, childhood development, math, common-sense reasoning, biology, physics, social bias, software development, and beyond. BIG-bench focuses on tasks that are believed to be beyond the capabilities of current language models. We evaluate the behavior of OpenAI's GPT models, Google-internal dense transformer architectures, and Switch-style sparse transformers on BIG-bench, across model sizes spanning millions to hundreds of billions of parameters. In addition, a team of human expert raters performed all tasks in order to provide a strong baseline.
Findings include: model performance and calibration both improve with scale, but are poor in absolute terms (and when compared with rater performance); performance is remarkably similar across model classes, though with benefits from sparsity; tasks that improve gradually and predictably commonly involve a large knowledge or memorization component, whereas tasks that exhibit "breakthrough" behavior at a critical scale often involve multiple steps or components, or brittle metrics; social bias typically increases with scale in settings with ambiguous context, but this can be improved with prompting.

Submitted 12 June, 2023; v1 submitted 9 June, 2022; originally announced June 2022.

Comments: 27 pages, 17 figures + references and appendices, repo: https://github.com/google/BIG-bench

Journal ref: Transactions on Machine Learning Research, May/2022, https://openreview.net/forum?id=uyTL5Bvosj

arXiv:2203.10651 [pdf, other]

Nonstationary Temporal Matrix Factorization for Multivariate Time Series Forecasting

Authors: Xinyu Chen, Chengyuan Zhang, Xi-Le Zhao, Nicolas Saunier, Lijun Sun

Abstract: Modern time series datasets are often high-dimensional, incomplete/sparse, and nonstationary. These properties hinder the development of scalable and efficient solutions for time series forecasting and analysis. To address these challenges, we propose a Nonstationary Temporal Matrix Factorization (NoTMF) model, in which matrix factorization is used to reconstruct the whole time series matrix and a vector autoregressive (VAR) process is imposed on a properly differenced copy of the temporal factor matrix. This approach not only preserves the low-rank property of the data but also offers consistent temporal dynamics. The learning process of NoTMF involves the optimization of two factor matrices and a collection of VAR coefficient matrices. To efficiently solve the optimization problem, we derive an alternating minimization framework, in which subproblems are solved using conjugate gradient and least squares methods. In particular, the use of the conjugate gradient method offers an efficient routine and allows us to apply NoTMF to large-scale problems. Through extensive experiments on the Uber movement speed dataset, we demonstrate the superior accuracy and effectiveness of NoTMF over other baseline models. Our results also confirm the importance of addressing the nonstationarity of real-world time series data such as spatiotemporal traffic flow/speed.

Submitted 15 June, 2022; v1 submitted 20 March, 2022; originally announced March 2022.

Comments: Data and Python codes: https://github.com/xinychen/tracebase

arXiv:2203.04483 [pdf, other]

Error-based Knockoffs Inference for Controlled Feature Selection

Authors: Xuebin Zhao, Hong Chen, Yingjie Wang, Weifu Li, Tieliang Gong, Yulong Wang, Feng Zheng

Abstract: Recently, the scheme of model-X knockoffs was proposed as a promising solution to address controlled feature selection under high-dimensional finite-sample settings. However, the procedure of model-X knockoffs depends heavily on the coefficient-based feature importance and only concerns the control of false discovery rate (FDR). To further improve its adaptivity and flexibility, in this paper, we propose an error-based knockoff inference method by integrating the knockoff features, the error-based feature importance statistics, and the stepdown procedure together. The proposed inference procedure does not require specifying a regression model and can handle feature selection with theoretical guarantees on controlling false discovery proportion (FDP), FDR, or k-familywise error rate (k-FWER). Empirical evaluations demonstrate the competitive performance of our approach on both simulated and real data.

Submitted 8 March, 2022; originally announced March 2022.

arXiv:2112.14870 [pdf, other]

Registration-free localization of defects in 3-D parts from mesh metrology data using functional maps

Authors: Xueqi Zhao, Enrique del Castillo

Abstract: Spectral Laplacian methods, widely used in computer graphics and manifold learning, have been recently proposed for the Statistical Process Control (SPC) of a sequence of manufactured parts, whose 3-dimensional metrology is acquired with non-contact sensors. These techniques provide an intrinsic solution to the SPC problem, that is, a solution exclusively based on measurements on the scanned surfaces or 2-manifolds without making reference to their ambient space. These methods, therefore, avoid the computationally expensive, non-convex registration step needed to align the parts, as required by previous methods for SPC based on 3-dimensional measurements. Once an SPC mechanism triggers an out-of-control alarm, however, an additional problem remains: that of locating where on the surface of the part that triggered the SPC alarm there is a significant shape difference with respect to either an in-control part or its nominal (CAD) design. In the past, only registration-based solutions existed for this problem. In this paper, we present a new registration-free solution to the part localization problem. Our approach uses a functional map between the manifolds to be compared, that is, a map between functions defined on each manifold based on intrinsic differential operators, in particular, the Laplace-Beltrami operator, in order to construct a point to point mapping between the two manifolds and be able to locate defects on the suspected part.
A recursive partitioning algorithm is presented to define a region of interest on the surface of the part where defects are likely to occur, which results in considerable computational advantages. The functional map method involves a very large number of point-to-point comparisons based on noisy measurements, and a statistical thresholding method is presented to filter the false positives in the underlying massive multiple comparisons problem.

Submitted 29 December, 2021; originally announced December 2021.

Comments: 32 pages, 12 figures

MSC Class: 62P30

arXiv:2111.03179 [pdf, other]

Community detection in censored hypergraph

Authors: Mingao Yuan, Bin Zhao, Xiaofeng Zhao

Abstract: Community detection refers to the problem of clustering the nodes of a network (either graph or hypergraph) into groups. Various algorithms are available for community detection and all these methods apply to uncensored networks. In practice, a network may have censored (or missing) values and it is shown that censored values have a non-negligible effect on the structural properties of a network. In this paper, we study community detection in censored $m$-uniform hypergraphs from an information-theoretic point of view. We derive the information-theoretic threshold for exact recovery of the community structure. Besides, we propose a polynomial-time algorithm to exactly recover the community structure up to the threshold. The proposed algorithm consists of a spectral algorithm plus a refinement step. It is also interesting to study whether a single spectral algorithm without refinement achieves the threshold. To this end, we also explore the semi-definite relaxation algorithm and analyze its performance.

Submitted 4 November, 2021; originally announced November 2021.

arXiv:2111.00743 [pdf, other]

Towards the Generalization of Contrastive Self-Supervised Learning

Authors: Weiran Huang, Mingyang Yi, Xuyang Zhao, Zihao Jiang

Abstract: Recently, self-supervised learning has attracted great attention, since it only requires unlabeled data for model training. Contrastive learning is one popular method for self-supervised learning and has achieved promising empirical performance. However, the theoretical understanding of its generalization ability is still limited. To this end, we define a kind of $(σ,δ)$-measure to mathematically quantify the data augmentation, and then provide an upper bound of the downstream classification error rate based on the measure. It reveals that the generalization ability of contrastive self-supervised learning is related to three key factors: alignment of positive samples, divergence of class centers, and concentration of augmented data. The first two factors are properties of learned representations, while the third one is determined by pre-defined data augmentation. We further investigate two canonical contrastive losses, InfoNCE and cross-correlation, to show how they provably achieve the first two factors. Moreover, we conduct experiments to study the third factor, and observe a strong correlation between downstream performance and the concentration of augmented data.

Submitted 2 March, 2023; v1 submitted 1 November, 2021; originally announced November 2021.

Comments: Accepted by ICLR 2023

arXiv:2107.04412 [pdf]

Identifying latent shared mobility preference segments in low-income communities: ride-hailing, fixed-route bus, and mobility-on-demand transit

Authors: Xinyi Wang, Xiang Yan, Xilei Zhao, Zhuoxuan Cao

Abstract: Concepts of Mobility-on-Demand (MOD) and Mobility as a Service (MaaS), which feature the integration of various shared-use mobility options, have gained widespread popularity in recent years. While these concepts promise great benefits to travelers, their heavy reliance on technology raises equity concerns as socially disadvantaged population groups can be left out in an era of on-demand mobility. This paper investigates the potential uptake of MOD transit services (integrated fixed-route and on-demand services) among travelers living in low-income communities. Specifically, we analyze people's latent attitude towards three shared-use mobility services, including ride-hailing services, fixed-route transit, and MOD transit. We conduct a latent class cluster analysis of 825 survey respondents sampled from low-income neighborhoods in Detroit and Ypsilanti, Michigan. We identified three latent segments: shared-mode enthusiast, shared-mode opponent, and fixed-route transit loyalist. People from the shared-mode enthusiast segment often use ride-hailing services and live in areas with poor transit access, and they are likely to be the early adopters of MOD transit services. The shared-mode opponent segment mainly includes vehicle owners who lack interest in shared mobility options. The fixed-route transit loyalist segment includes a considerable share of low-income individuals who face technological barriers to use the MOD transit.
We also find that males, college graduates, car owners, people with a mobile data plan, and people living in poor-transit-access areas have a higher level of preferences for MOD transit services. We conclude with policy recommendations for developing more accessible and equitable MOD transit services.

Submitted 4 May, 2021; originally announced July 2021.

arXiv:2104.08928 [pdf, other]

Group-Sparse Matrix Factorization for Transfer Learning of Word Embeddings

Authors: Kan Xu, Xuanyi Zhao, Hamsa Bastani, Osbert Bastani

Abstract: Unstructured text provides decision-makers with a rich data source in many domains, ranging from product reviews in retail to nursing notes in healthcare. To leverage this information, words are typically translated into word embeddings -- vectors that encode the semantic relationships between words -- through unsupervised learning algorithms such as matrix factorization. However, learning word embeddings from new domains with limited training data can be challenging, because the meaning/usage may be different in the new domain, e.g., the word ``positive'' typically has positive sentiment, but often has negative sentiment in medical notes since it may imply that a patient tested positive for a disease. In practice, we expect that only a small number of domain-specific words may have new meanings. We propose an intuitive two-stage estimator that exploits this structure via a group-sparse penalty to efficiently transfer learn domain-specific word embeddings by combining large-scale text corpora (such as Wikipedia) with limited domain-specific text data. We bound the generalization error of our transfer learning estimator, proving that it can achieve high accuracy with substantially less domain-specific data when only a small number of embeddings are altered between domains. Furthermore, we prove that all local minima identified by our nonconvex objective function are statistically indistinguishable from the global minimum under standard regularization conditions, implying that our estimator can be computed efficiently. Our results provide the first bounds on group-sparse matrix factorization, which may be of independent interest. We empirically evaluate our approach compared to state-of-the-art fine-tuning heuristics from natural language processing.

Submitted 17 February, 2024; v1 submitted 18 April, 2021; originally announced April 2021.

arXiv:2104.05600 [pdf, other]

PAC Bayesian Performance Guarantees for Deep (Stochastic) Networks in Medical Imaging

Authors: Anthony Sicilia, Xingchen Zhao, Anastasia Sosnovskikh, Seong Jae Hwang

Abstract: Application of deep neural networks to medical imaging tasks has in some sense become commonplace. Still, a "thorn in the side" of the deep learning movement is the argument that deep networks are prone to overfitting and are thus unable to generalize well when datasets are small (as is common in medical imaging tasks). One way to bolster confidence is to provide mathematical guarantees, or bounds, on network performance after training which explicitly quantify the possibility of overfitting. In this work, we explore recent advances using the PAC-Bayesian framework to provide bounds on generalization error for large (stochastic) networks. While previous efforts focus on classification in larger natural image datasets (e.g., MNIST and CIFAR-10), we apply these techniques to both classification and segmentation in a smaller medical imaging dataset: the ISIC 2018 challenge set. We observe that the resultant bounds are competitive compared to a simpler baseline, while also being more explainable and alleviating the need for holdout sets.

Submitted 8 July, 2021; v1 submitted 12 April, 2021; originally announced April 2021.

Comments: MICCAI 2021

arXiv:2101.09438 [pdf, other]

An Optimal Reduction of TV-Denoising to Adaptive Online Learning

Authors: Dheeraj Baby, Xuandong Zhao, Yu-Xiang Wang

Abstract: We consider the problem of estimating a function from $n$ noisy samples whose discrete Total Variation (TV) is bounded by $C_n$. We reveal a deep connection to the seemingly disparate problem of Strongly Adaptive online learning (Daniely et al., 2015) and provide an $O(n \log n)$ time algorithm that attains the near minimax optimal rate of $\tilde O (n^{1/3}C_n^{2/3})$ under squared error loss. The resulting algorithm runs online and optimally adapts to the unknown smoothness parameter $C_n$. This leads to a new and more versatile alternative to wavelet-based methods for (1) adaptively estimating TV bounded functions; (2) online forecasting of TV bounded trends in time series.

Submitted 26 January, 2021; v1 submitted 23 January, 2021; originally announced January 2021.

Comments: To appear at AISTATS 2021

arXiv:2101.02379 [pdf, other]

A Registration-free approach for Statistical Process Control of 3D scanned objects via FEM

Authors: Xueqi Zhao, Enrique del Castillo

Abstract: Recent work in on-line Statistical Process Control (SPC) of manufactured 3-dimensional (3-D) objects has been based on the estimation of the spectrum of the Laplace-Beltrami (LB) operator, a differential operator that encodes the geometrical features of a manifold and is widely used in Machine Learning (i.e., Manifold Learning). The resulting spectra are an intrinsic geometrical feature of each part, and thus can be compared between parts without the part-to-part registration (or "part localization") pre-processing or the equal-size meshes that previous approaches for SPC of 3-D parts require. The recent spectral SPC methods, however, are limited to monitoring surface data from objects such that the scanned meshes have no boundaries, holes, or missing portions. In this paper, we extend spectral methods by first considering a more accurate and general estimator of the LB spectrum that is obtained by applying Finite Element Methods (FEM) to the solution of Helmholtz's equation with boundaries. We show that the new spectral FEM approach retains the advantages of not requiring part localization/registration or equal-size datasets scanned from each part, while providing more accurate spectrum estimates, which result in faster detection of out-of-control conditions than earlier methods. The approach can be applied to both mesh and volumetric (solid) scans, and also to partial scans that result in open meshes (surface or volumetric) with boundaries, increasing the practical applicability of the methods. The present work brings SPC methods closer to contemporary research in Computer Graphics and Manifold Learning. MATLAB code that reproduces the examples of this paper is provided in the supplementary materials.

Submitted 7 January, 2021; originally announced January 2021.

arXiv:2012.12772 [pdf, other]

Matrix optimization based Euclidean embedding with outliers

Authors: Qian Zhang, Xinyuan Zhao, Chao Ding

Abstract: Euclidean embedding from noisy observations containing outlier errors is an important and challenging problem in statistics and machine learning. Many existing methods struggle with outliers due to a lack of detection ability. In this paper, we propose a matrix optimization based embedding model that can produce reliable embeddings and identify the outliers jointly. We show that the estimators obtained by the proposed method satisfy a non-asymptotic risk bound, implying that the model provides a high accuracy estimator with high probability when the order of the sample size is roughly the degrees of freedom up to a logarithmic factor. Moreover, we show that under some mild conditions, the proposed model can also identify the outliers without any prior information with high probability. Finally, numerical experiments demonstrate that the matrix optimization-based model can produce configurations of high quality and successfully identify outliers even for large networks.

Submitted 23 December, 2020; originally announced December 2020.

Comments: 29 pages

MSC Class: 49M45; 90C25; 90C33

arXiv:2010.00985 [pdf, other]

Kalman Filtering Attention for User Behavior Modeling in CTR Prediction

Authors: Hu Liu, Jing Lu, Xiwei Zhao, Sulong Xu, Hao Peng, Yutong Liu, Zehua Zhang, Jian Li, Junsheng Jin, Yongjun Bao, Weipeng Yan

Abstract: Click-through rate (CTR) prediction is one of the fundamental tasks for e-commerce search engines. As search becomes more personalized, it is necessary to capture the user interest from rich behavior data. Existing user behavior modeling algorithms develop different attention mechanisms to emphasize query-relevant behaviors and suppress irrelevant ones. Despite being extensively studied, these attentions still suffer from two limitations. First, conventional attentions mostly limit the attention field to a single user's behaviors, which is not suitable in e-commerce, where users often hunt for new demands that are unrelated to any historical behaviors. Second, these attentions are usually biased towards frequent behaviors, which is unreasonable since high frequency does not necessarily indicate great importance. To tackle the two limitations, we propose a novel attention mechanism, termed Kalman Filtering Attention (KFAtt), that considers the weighted pooling in attention as a maximum a posteriori (MAP) estimation. By incorporating a prior, KFAtt resorts to global statistics when few user behaviors are relevant. Moreover, a frequency capping mechanism is incorporated to correct the bias towards frequent behaviors. Offline experiments on both a benchmark and a 10-billion-scale real production dataset, together with an online A/B test, show that KFAtt outperforms all compared state-of-the-art methods. KFAtt has been deployed in the ranking system of a leading e-commerce website, serving the main traffic of hundreds of millions of active users every day.

Submitted 20 October, 2020; v1 submitted 2 October, 2020; originally announced October 2020.

arXiv:2009.09230 [pdf, other]

Simplifying Reinforced Feature Selection via Restructured Choice Strategy of Single Agent

Authors: Xiaosa Zhao, Kunpeng Liu, Wei Fan, Lu Jiang, Xiaowei Zhao, Minghao Yin, Yanjie Fu

Abstract: Feature selection aims to select a subset of features to optimize the performance of downstream predictive tasks. Recently, multi-agent reinforced feature selection (MARFS) has been introduced to automate feature selection, by creating agents for each feature to select or deselect corresponding features. Although MARFS enjoys the automation of the selection process, it suffers not just from the data complexity in terms of contents and dimensionality, but also from computational costs that increase exponentially with the number of agents. This concern leads to a new research question: Can we simplify the selection process of agents in the reinforcement learning context so as to improve the efficiency and reduce the costs of feature selection? To address the question, we develop a single-agent reinforced feature selection approach integrated with a restructured choice strategy. Specifically, the restructured choice strategy comprises five components: (1) we exploit a single agent to handle the selection task of multiple features, instead of using multiple agents; (2) we develop a scanning method to empower the single agent to make multiple selection/deselection decisions in each round of scanning; (3) we exploit the relevance of features to predictive labels to prioritize the scanning order of the agent over multiple features; (4) we propose a convolutional auto-encoder algorithm, integrated with the encoded index information of features, to improve state representation; (5) we design a reward scheme that takes into account both prediction accuracy and feature redundancy to facilitate the exploration process. Finally, we present extensive experimental results to demonstrate the efficiency and effectiveness of the proposed method.

Submitted 19 September, 2020; originally announced September 2020.

arXiv:2006.10337 [pdf, other]

doi 10.1145/3394486.3403319

Category-Specific CNN for Visual-aware CTR Prediction at JD.com

Authors: Hu Liu, Jing Lu, Hao Yang, Xiwei Zhao, Sulong Xu, Hao Peng, Zehua Zhang, Wenjie Niu, Xiaokun Zhu, Yongjun Bao, Weipeng Yan

Abstract: As one of the largest B2C e-commerce platforms in China, JD.com also powers a leading advertising system, serving millions of advertisers with fingertip connection to hundreds of millions of customers. In our system, as well as most e-commerce scenarios, ads are displayed with images. This makes visual-aware Click Through Rate (CTR) prediction of crucial importance to both business effectiveness and user experience. Existing algorithms usually extract visual features using off-the-shelf Convolutional Neural Networks (CNNs) and late fuse the visual and non-visual features for the final predicted CTR. Despite being extensively studied, this field still faces two key challenges. First, although encouraging progress has been made in offline studies, applying CNNs in real systems remains non-trivial, due to the strict requirements for efficient end-to-end training and low-latency online serving. Second, the off-the-shelf CNNs and late fusion architectures are suboptimal. Specifically, off-the-shelf CNNs were designed for classification and thus never take categories as input features. In e-commerce, however, categories are precisely labeled and contain abundant visual priors that help the visual modeling. Unaware of the ad category, these CNNs may extract some unnecessary category-unrelated features, wasting the CNN's limited expressive capacity. To overcome the two challenges, we propose Category-specific CNN (CSCNN) specifically for CTR prediction. CSCNN incorporates the category knowledge early, with a lightweight attention module on each convolutional layer. This enables CSCNN to extract expressive category-specific visual patterns that benefit the CTR prediction. Offline experiments on a benchmark and a 10-billion-scale real production dataset from JD, together with an online A/B test, show that CSCNN outperforms all compared state-of-the-art algorithms.

Submitted 19 June, 2020; v1 submitted 18 June, 2020; originally announced June 2020.

arXiv:2001.06923 [pdf, other]

Exploring Spatio-Temporal and Cross-Type Correlations for Crime Prediction

Authors: Xiangyu Zhao, Jiliang Tang

Abstract: Crime prediction plays an impactful role in enhancing public security and sustainable urban development. With recent advances in data collection and integration technologies, a large amount of urban data with rich crime-related information and fine-grained spatio-temporal logs has been recorded. Such information can improve our understanding of the temporal evolution and spatial factors of urban crimes and can enable more accurate crime prediction. In this paper, we perform crime prediction exploiting the cross-type and spatio-temporal correlations of urban crimes. In particular, we verify the existence of correlations among different types of crime from temporal and spatial perspectives, and propose a coherent framework to mathematically model these correlations for crime prediction. Extensive experimental results on real-world data validate the effectiveness of the proposed framework. Further experiments have been conducted to understand the importance of different correlations in crime prediction.

Submitted 21 January, 2020; v1 submitted 19 January, 2020; originally announced January 2020.

arXiv:2001.01347 [pdf, other]

Elastic Bulk Synchronous Parallel Model for Distributed Deep Learning

Authors: Xing Zhao, Manos Papagelis, Aijun An, Bao Xin Chen, Junfeng Liu, Yonggang Hu

Abstract: The bulk synchronous parallel (BSP) is a celebrated synchronization model for general-purpose parallel computing that has successfully been employed for distributed training of machine learning models. A prevalent shortcoming of the BSP is that it requires workers to wait for the straggler at every iteration. To ameliorate this shortcoming of classic BSP, we propose ELASTICBSP, a model that aims to relax its strict synchronization requirement. The proposed model offers more flexibility and adaptability during the training phase, without sacrificing the accuracy of the trained model. We also propose an efficient method that materializes the model, named ZIPLINE. The algorithm is tunable and can effectively balance the trade-off between quality of convergence and iteration throughput, in order to accommodate different environments or applications. A thorough experimental evaluation demonstrates that our proposed ELASTICBSP model converges faster and to a higher accuracy than the classic BSP. It also achieves accuracy comparable to (if not higher than) that of the other sensible synchronization models.

Submitted 5 January, 2020; originally announced January 2020.

Comments: The paper was accepted in the proceedings of the IEEE International Conference on Data Mining 2019 (ICDM'19), 1504-1509

Journal ref: ICDM 2019, 1504-1509

arXiv:1910.13930 [pdf, other]

Distilling Black-Box Travel Mode Choice Model for Behavioral Interpretation

Authors: Xilei Zhao, Zhengze Zhou, Xiang Yan, Pascal Van Hentenryck

Abstract: Machine learning has proved to be very successful for making predictions in travel behavior modeling. However, most machine-learning models have complex model structures and offer little or no explanation as to how they arrive at these predictions. Interpretations about travel behavior models are essential for decision makers to understand travelers' preferences and plan policy interventions accordingly. Therefore, this paper proposes to apply and extend the model distillation approach, a model-agnostic machine-learning interpretation method, to explain how a black-box travel mode choice model makes predictions for the entire population and subpopulations of interest. Model distillation aims at compressing knowledge from a complex model (teacher) into an understandable and interpretable model (student). In particular, the paper integrates model distillation with market segmentation to generate more insights by accounting for heterogeneity. Furthermore, the paper provides a comprehensive comparison of student models with the benchmark model (decision tree) and the teacher model (gradient boosting trees) to quantify the fidelity and accuracy of the students' interpretations.

Submitted 30 October, 2019; originally announced October 2019.

Comments: 17 pages, 3 figures

arXiv:1910.12800 [pdf, other]

Attenuating Random Noise in Seismic Data by a Deep Learning Approach

Authors: Xing Zhao, Ping Lu, Yanyan Zhang, Jianxiong Chen, Xiaoyang Li

Abstract: In the geophysical field, seismic noise attenuation has been considered a critical and long-standing problem, especially for pre-stack data processing. Here, we propose leveraging deep learning for this task. Rather than directly applying an existing de-noising model from ordinary images to the seismic data, we have designed a particular deep-learning model based on residual neural networks. Named N2N-Seismic, it has a strong ability to recover seismic signals to intact condition while preserving the primary signals. The proposed model, which attenuates noise with great success, has been tested on two different seismic datasets. Several metrics show that our method outperforms conventional approaches in terms of Signal-to-Noise-Ratio, Mean-Squared-Error, Phase Spectrum, etc. Moreover, robustness tests on removing random noise from datasets with strong and weak noise have been conducted extensively to ensure that the proposed model maintains a good level of adaptation when dealing with large variations in noise characteristics and intensities.

Submitted 28 October, 2019; originally announced October 2019.

Comments: 33 pages, 11 figures

arXiv:1910.05640 [pdf, other]

Deep Learning for Predicting Dynamic Uncertain Opinions in Network Data

Authors: Xujiang Zhao, Feng Chen, Jin-Hee Cho

Abstract: Subjective Logic (SL) is a well-known belief model that can explicitly deal with uncertain opinions and infer unknown opinions based on a rich set of operators for fusing multiple opinions. Due to its high simplicity and applicability, SL has been widely applied to a variety of decision-making tasks in cybersecurity, opinion models, trust models, and/or social network analysis. However, SL and its variants have exposed limitations in predicting uncertain opinions in real-world dynamic network data, mainly in three respects: (1) a lack of scalability to deal with a large-scale network; (2) limited capability to handle heterogeneous topological and temporal dependencies among node-level opinions; and (3) a high sensitivity to conflicting evidence that may generate counterintuitive opinions derived from the evidence. In this work, we propose a novel deep learning (DL)-based dynamic opinion inference model, while node-level opinions are still formalized based on SL, meaning that an opinion has a dimension of uncertainty in addition to belief and disbelief in a binomial opinion (i.e., agree or disagree). The proposed DL-based dynamic opinion inference model overcomes the above three limitations by integrating the following techniques: (1) state-of-the-art DL techniques, such as the Graph Convolutional Network (GCN) and the Gated Recurrent Units (GRU), for modeling the topological and temporal heterogeneous dependency information of a given dynamic network; (2) modeling conflicting opinions based on robust statistics; and (3) a highly scalable inference algorithm to predict dynamic, uncertain opinions in linear computation time. We validated the superior performance of our proposed DL-based algorithm (i.e., GCN-GRU-opinion model) via extensive comparative performance analysis based on four real-world datasets.

Submitted 12 October, 2019; originally announced October 2019.

Comments: IEEE Bigdata 2018

Journal ref: 2018 IEEE International Conference on Big Data (Big Data)

arXiv:1910.00185 [pdf, other]

doi 10.1109/BigData47090.2019.9005971

Predicting Alzheimer's Disease by Hierarchical Graph Convolution from Positron Emission Tomography Imaging

Authors: Jiaming Guo, Wei Qiu, Xiang Li, Xuandong Zhao, Ning Guo, Quanzheng Li

Abstract: Imaging-based early diagnosis of Alzheimer's Disease (AD) has become an effective approach, especially by using nuclear medicine imaging techniques such as Positron Emission Tomography (PET). Various studies have found that PET images can be better modeled as signals (e.g. uptake of florbetapir) defined on a network (non-Euclidean) structure which is governed by its underlying graph patterns of pathological progression and metabolic connectivity. In order to effectively apply a deep learning framework to PET image analysis and overcome its limitation to Euclidean grids, we develop a solution for 3D PET image representation and analysis under a generalized, graph-based CNN architecture (PETNet), which analyzes PET signals defined on a group-wise inferred graph structure. Computations in PETNet are defined in the non-Euclidean, graph (network) domain, as it performs feature extraction by convolution operations on spectral-filtered signals on the graph and pooling operations based on hierarchical graph clustering. The effectiveness of PETNet is evaluated on the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset, which shows improved performance over both deep learning and other machine learning-based methods.

Submitted 30 September, 2019; originally announced October 2019.

Comments: Jiaming Guo, Wei Qiu and Xiang Li contribute equally to this work

arXiv:1908.11848 [pdf, other]

Dynamic Stale Synchronous Parallel Distributed Training for Deep Learning

Authors: Xing Zhao, Aijun An, Junfeng Liu, Bao Xin Chen

Abstract: Deep learning is a popular machine learning technique and has been applied to many real-world problems. However, training a deep neural network is very time-consuming, especially on big data. It has become difficult for a single machine to train a large model over large datasets. A popular solution is to distribute and parallelize the training process across multiple machines using the parameter server framework. In this paper, we present a distributed paradigm on the parameter server framework called Dynamic Stale Synchronous Parallel (DSSP), which improves the state-of-the-art Stale Synchronous Parallel (SSP) paradigm by dynamically determining the staleness threshold at run time. Conventionally, to run distributed training in SSP, the user needs to specify a particular staleness threshold as a hyper-parameter. However, a user does not usually know how to set the threshold and thus often finds a threshold value through trial and error, which is time-consuming. Based on workers' recent processing time, our approach DSSP adaptively adjusts the threshold per iteration at run time to reduce the waiting time of faster workers for synchronization of the globally shared parameters, and consequently increases the frequency of parameter updates (increases iteration throughput), which speeds up convergence. We compare DSSP with other paradigms such as Bulk Synchronous Parallel (BSP), Asynchronous Parallel (ASP), and SSP by running deep neural network (DNN) models over GPU clusters in both homogeneous and heterogeneous environments. The results show that in a heterogeneous environment where the cluster consists of mixed models of GPUs, DSSP converges to a higher accuracy much earlier than SSP and BSP and performs similarly to ASP. In a homogeneous distributed cluster, DSSP has more stable and slightly better performance than SSP and ASP, and converges much faster than BSP.

Submitted 16 August, 2019; originally announced August 2019.

Journal ref: 2019 IEEE 39th International Conference on Distributed Computing Systems (ICDCS)

arXiv:1907.03382 [pdf, other]

doi 10.1145/3295500.3356180

Etalumis: Bringing Probabilistic Programming to Scientific Simulators at Scale

Authors: Atılım Güneş Baydin, Lei Shao, Wahid Bhimji, Lukas Heinrich, Lawrence Meadows, Jialin Liu, Andreas Munk, Saeid Naderiparizi, Bradley Gram-Hansen, Gilles Louppe, Mingfei Ma, Xiaohui Zhao, Philip Torr, Victor Lee, Kyle Cranmer, Prabhat, Frank Wood

Abstract: Probabilistic programming languages (PPLs) are receiving widespread attention for performing Bayesian inference in complex generative models. However, applications to science remain limited because of the impracticability of rewriting complex scientific simulators in a PPL, the computational cost of inference, and the lack of scalable implementations. To address these, we present a novel PPL framework that couples directly to existing scientific simulators through a cross-platform probabilistic execution protocol and provides Markov chain Monte Carlo (MCMC) and deep-learning-based inference compilation (IC) engines for tractable inference. To guide IC inference, we perform distributed training of a dynamic 3DCNN–LSTM architecture with a PyTorch-MPI-based framework on 1,024 32-core CPU nodes of the Cori supercomputer with a global minibatch size of 128k, achieving a performance of 450 Tflop/s through enhancements to PyTorch. We demonstrate a Large Hadron Collider (LHC) use-case with the C++ Sherpa simulator and achieve the largest-scale posterior inference in a Turing-complete PPL.

Submitted 27 August, 2019; v1 submitted 7 July, 2019; originally announced July 2019.

Comments: 14 pages, 8 figures

MSC Class: 68T37; 68T05; 62P35 ACM Class: G.3; I.2.6; J.2

Journal ref: Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis (SC19), November 17–22, 2019

arXiv:1907.00111 [pdf, other]

doi 10.1080/00401706.2020.1772114

An Intrinsic Geometrical Approach for Statistical Process Control of Surface and Manifold Data

Authors: Xueqi Zhao, Enrique del Castillo

Abstract: We present a new method for statistical process control (SPC) of a discrete part manufacturing system based on intrinsic geometrical properties of the parts, estimated from three-dimensional sensor data. An intrinsic method has the computational advantage of avoiding the difficult part registration problem, necessary in previous SPC approaches of three-dimensional geometrical data, but inadequate if noncontact sensors are used. The approach estimates the spectrum of the Laplace-Beltrami (LB) operator of the scanned parts and uses a multivariate nonparametric control chart for online process control. Our proposal brings SPC closer to computer vision and computer graphics methods aimed at detecting large differences in shape (but not in size). However, the SPC problem differs in that small changes in either shape or size of the parts need to be detected, keeping a controllable false alarm rate and without completely filtering noise. An online or "Phase II" method and a scheme for starting up in the absence of prior data ("Phase I") are presented. Comparison with earlier approaches that require registration shows the LB spectrum method to be more sensitive in rapidly detecting small changes in shape and size, including the practical case when the sequence of part datasets is in the form of large, unequal size meshes. A post-alarm diagnostic method to investigate the location of defects on the surface of a part is also presented. While we focus in this article on surface (triangulation) data, the methods can also be applied to point cloud and voxel metrology data.

Submitted 10 July, 2020; v1 submitted 28 June, 2019; originally announced July 2019.

arXiv:1905.08152 [pdf, other]

Stochastic Variance Reduction for Deep Q-learning

Authors: Wei-Ye Zhao, Xi-Ya Guan, Yang Liu, Xiaoming Zhao, Jian Peng

Abstract: Recent advances in deep reinforcement learning have achieved human-level performance on a variety of real-world applications. However, the current algorithms still suffer from poor gradient estimation with excessive variance, resulting in unstable training and poor sample efficiency. In our paper, we propose an innovative optimization strategy that utilizes stochastic variance reduced gradient (SVRG) techniques. In extensive experiments on the Atari domain, our method outperforms the deep Q-learning baselines on 18 out of 20 games.

Submitted 20 May, 2019; originally announced May 2019.

Comments: this is the full paper version, its extended abstract has been published

arXiv:1903.10498 [pdf]

doi 10.1177/0962280219889080

Estimating the sample mean and standard deviation from commonly reported quantiles in meta-analysis

Authors: Sean McGrath, XiaoFei Zhao, Russell Steele, Brett D. Thombs, Andrea Benedetti, the DEPRESsion Screening Data Collaboration

Abstract: Researchers increasingly use meta-analysis to synthesize the results of several studies in order to estimate a common effect. When the outcome variable is continuous, standard meta-analytic approaches assume that the primary studies report the sample mean and standard deviation of the outcome. However, when the outcome is skewed, authors sometimes summarize the data by reporting the sample median and one or both of (i) the minimum and maximum values and (ii) the first and third quartiles, but do not report the mean or standard deviation. To include these studies in meta-analysis, several methods have been developed to estimate the sample mean and standard deviation from the reported summary data. A major limitation of these widely used methods is that they assume that the outcome distribution is normal, which is unlikely to be tenable for studies reporting medians. We propose two novel approaches to estimate the sample mean and standard deviation when data are suspected to be non-normal. Our simulation results and empirical assessments show that the proposed methods often perform better than the existing methods when applied to non-normal data.

Submitted 25 March, 2019; originally announced March 2019.

Journal ref: Stat. Methods Med. Res. 29 (2020) 2520-2537

arXiv:1903.06258 [pdf, ps, other]

doi 10.1109/LGRS.2019.2939356

Hyperspectral Image Classification with Deep Metric Learning and Conditional Random Field

Authors: Yi Liang, Xin Zhao, Alan J. X. Guo, Fei Zhu

Abstract: To improve the classification performance in the context of hyperspectral image processing, many works have been developed based on two common strategies, namely the spatial-spectral information integration and the utilization of neural networks. However, both strategies typically require more training data than the classical algorithms, aggravating the shortage of labeled samples. In this letter, we propose a novel framework that organically combines the spectrum-based deep metric learning model and the conditional random field algorithm. The deep metric learning model is supervised by the center loss to produce spectrum-based features that gather more tightly in Euclidean space within classes. The conditional random field with Gaussian edge potentials, which was first proposed for image segmentation tasks, is introduced to give the pixel-wise classification over the hyperspectral image by utilizing both the geographical distances between pixels and the Euclidean distances between the features produced by the deep metric learning model. The proposed framework is trained on spectral pixels at the deep metric learning stage and utilizes half-handcrafted spatial features at the conditional random field stage. This arrangement alleviates the shortage of training data to some extent. Experiments on two real hyperspectral images demonstrate the advantages of the proposed method in terms of both classification accuracy and computation cost.

Submitted 15 July, 2019; v1 submitted 4 March, 2019; originally announced March 2019.

Showing 1–50 of 79 results for author: Zhao, X