Search | arXiv e-print repository

Sharp detection of low-dimensional structure in probability measures via dimensional logarithmic Sobolev inequalities

Authors: Matthew T. C. Li, Tiangang Cui, Fengyi Li, Youssef Marzouk, Olivier Zahm

Abstract: Identifying low-dimensional structure in high-dimensional probability measures is an essential pre-processing step for efficient sampling. We introduce a method for identifying and approximating a target measure $π$ as a perturbation of a given reference measure $μ$ along a few significant directions of $\mathbb{R}^{d}$. The reference measure can be a Gaussian or a nonlinear transformation of a Gaussian, as commonly arising in generative modeling. Our method extends prior work on minimizing majorizations of the Kullback--Leibler divergence to identify optimal approximations within this class of measures. Our main contribution unveils a connection between the \emph{dimensional} logarithmic Sobolev inequality (LSI) and approximations with this ansatz. Specifically, when the target and reference are both Gaussian, we show that minimizing the dimensional LSI is equivalent to minimizing the KL divergence restricted to this ansatz. For general non-Gaussian measures, the dimensional LSI produces majorants that uniformly improve on previous majorants for gradient-based dimension reduction. We further demonstrate the applicability of this analysis to the squared Hellinger distance, where analogous reasoning shows that the dimensional Poincaré inequality offers improved bounds.

Submitted 21 June, 2024; v1 submitted 18 June, 2024; originally announced June 2024.

arXiv:2406.03683 [pdf, other]

Bayesian Power Steering: An Effective Approach for Domain Adaptation of Diffusion Models

Authors: Ding Huang, Ting Li, Jian Huang

Abstract: We propose a Bayesian framework for fine-tuning large diffusion models with a novel network structure called Bayesian Power Steering (BPS). We clarify the meaning behind adaptation from a \textit{large probability space} to a \textit{small probability space} and explore the task of fine-tuning pre-trained models using learnable modules from a Bayesian perspective. BPS extracts task-specific knowledge from a pre-trained model's learned prior distribution. It efficiently leverages large diffusion models, differentially intervening different hidden features with a head-heavy and foot-light configuration. Experiments highlight the superiority of BPS over contemporary methods across a range of tasks even with a limited amount of data. Notably, BPS attains an FID score of 10.49 under the sketch condition on the COCO17 dataset.

Submitted 5 June, 2024; originally announced June 2024.

Comments: 25 pages, 26 figures, and 4 tables

MSC Class: 62G05; 68T07

arXiv:2406.03596 [pdf]

A Multivariate Equivalence Test Based on Mahalanobis Distance with a Data-Driven Margin

Authors: Chao Wang, Yu-Ting Weng, Shaobo Liu, Tengfei Li, Meiyu Shen, Yi Tsong

Abstract: Multivariate equivalence testing is needed in a variety of scenarios for drug development. For example, drug products obtained from natural sources may contain many components for which the individual effects and/or their interactions on clinical efficacy and safety cannot be completely characterized. Such lack of sufficient characterization poses a challenge for both generic drug developers to demonstrate and regulatory authorities to determine the sameness of a proposed generic product to its reference product. Another case is to ensure batch-to-batch consistency of naturally derived products containing a vast number of components, such as botanical products. The equivalence or sameness between products containing many components that cannot be individually evaluated needs to be studied in a holistic manner. A multivariate equivalence test based on Mahalanobis distance may be suitable to evaluate many variables holistically. Existing studies based on such a method assumed either a predetermined constant margin, for which a consensus is difficult to achieve, or a margin derived from the data, where, however, the randomness is ignored during the testing. In this study, we propose a multivariate equivalence test based on Mahalanobis distance with a data-driven margin, with the randomness in the margin considered. Several possible implementations are compared with existing approaches via extensive simulation studies.

Submitted 5 June, 2024; originally announced June 2024.
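
Illustration (not from the paper): the Mahalanobis distance that the abstract above builds on can be sketched for two variables as follows. The function name `mahalanobis_2d` and the example numbers are hypothetical, and the sketch only computes the distance. It does not implement the proposed test or its data-driven margin.

```python
def mahalanobis_2d(diff, cov):
    """Return sqrt(d^T S^{-1} d) for a 2-vector d and a symmetric 2x2 covariance S.

    Hypothetical helper for illustration; a real analysis would use a
    linear-algebra library and an estimated (pooled) covariance matrix.
    """
    (a, b), (c, d) = cov
    det = a * d - b * c
    if a <= 0 or det <= 0:
        raise ValueError("covariance must be positive definite")
    # Closed-form inverse of a 2x2 matrix.
    inv = ((d / det, -b / det), (-c / det, a / det))
    x, y = diff
    quad = x * (inv[0][0] * x + inv[0][1] * y) + y * (inv[1][0] * x + inv[1][1] * y)
    return quad ** 0.5


# Difference between two (made-up) mean vectors, scaled by a made-up covariance.
mean_test = (10.0, 4.0)
mean_ref = (8.0, 4.0)
diff = (mean_test[0] - mean_ref[0], mean_test[1] - mean_ref[1])
distance = mahalanobis_2d(diff, ((4.0, 0.0), (0.0, 1.0)))
print(distance)
```

With an identity covariance the function reduces to the ordinary Euclidean distance, which is a quick sanity check on the formula.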

arXiv:2406.00317 [pdf, other]

Combining Experimental and Historical Data for Policy Evaluation

Authors: Ting Li, Chengchun Shi, Qianglin Wen, Yang Sui, Yongli Qin, Chunbo Lai, Hongtu Zhu

Abstract: This paper studies policy evaluation with multiple data sources, especially in scenarios that involve one experimental dataset with two arms, complemented by a historical dataset generated under a single control arm. We propose novel data integration methods that linearly integrate base policy value estimators constructed based on the experimental and historical data, with weights optimized to minimize the mean square error (MSE) of the resulting combined estimator. We further apply the pessimistic principle to obtain more robust estimators, and extend these developments to sequential decision making. Theoretically, we establish non-asymptotic error bounds for the MSEs of our proposed estimators, and derive their oracle, efficiency and robustness properties across a broad spectrum of reward shift scenarios. Numerical experiments and real-data-based analyses from a ridesharing company demonstrate the superior performance of the proposed estimators.

Submitted 1 June, 2024; originally announced June 2024.
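
Illustration (not from the paper): the idea of linearly combining two estimators with an MSE-minimizing weight can be sketched in a deliberately simplified setting. The assumptions here are mine, not the authors': the two estimators are independent, the experimental one is unbiased, and the variance and bias of the historical one are known. The function names are hypothetical.

```python
def mse_optimal_weight(var_exp, var_hist, bias_hist):
    """Weight w on the experimental estimator that minimizes the MSE of
    w * est_exp + (1 - w) * est_hist.

    Under the stated assumptions the combined MSE is
        w^2 * var_exp + (1 - w)^2 * (var_hist + bias_hist^2),
    which is minimized at w = mse_hist / (var_exp + mse_hist).
    """
    mse_hist = var_hist + bias_hist ** 2
    return mse_hist / (var_exp + mse_hist)


def combined_mse(w, var_exp, var_hist, bias_hist):
    """MSE of the linear combination for a given weight w."""
    return w ** 2 * var_exp + (1 - w) ** 2 * (var_hist + bias_hist ** 2)


# No reward shift (bias 0): equal variances give equal weights.
w_no_shift = mse_optimal_weight(1.0, 1.0, 0.0)
# With a reward shift (bias 1), more weight moves to the experimental data.
w_shift = mse_optimal_weight(1.0, 1.0, 1.0)
print(w_no_shift, combined_mse(w_no_shift, 1.0, 1.0, 0.0))
print(w_shift, combined_mse(w_shift, 1.0, 1.0, 1.0))
```

In practice the bias and variances are unknown and must be estimated, which is where the difficulty addressed by the abstract arises.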

arXiv:2405.20782 [pdf, other]

Universal Exact Compression of Differentially Private Mechanisms

Authors: Yanxiao Liu, Wei-Ning Chen, Ayfer Özgür, Cheuk Ting Li

Abstract: To reduce the communication cost of differential privacy mechanisms, we introduce a novel construction, called Poisson private representation (PPR), designed to compress and simulate any local randomizer while ensuring local differential privacy. Unlike previous simulation-based local differential privacy mechanisms, PPR exactly preserves the joint distribution of the data and the output of the original local randomizer. Hence, the PPR-compressed privacy mechanism retains all desirable statistical properties of the original privacy mechanism such as unbiasedness and Gaussianity. Moreover, PPR achieves a compression size within a logarithmic gap from the theoretical lower bound. Using the PPR, we give a new order-wise trade-off between communication, accuracy, central and local differential privacy for distributed mean estimation. Experiment results on distributed mean estimation show that PPR consistently gives a better trade-off between communication, accuracy and central differential privacy compared to the coordinate subsampled Gaussian mechanism, while also providing local differential privacy.

Submitted 28 May, 2024; originally announced May 2024.

Comments: 30 pages, 3 figures

arXiv:2405.12838 [pdf, ps, other]

Quantum Non-Identical Mean Estimation: Efficient Algorithms and Fundamental Limits

Authors: Jiachen Hu, Tongyang Li, Xinzhao Wang, Yecheng Xue, Chenyi Zhang, Han Zhong

Abstract: We systematically investigate quantum algorithms and lower bounds for mean estimation given query access to non-identically distributed samples. On the one hand, we give quantum mean estimators with quadratic quantum speed-up given samples from different bounded or sub-Gaussian random variables. On the other hand, we prove that, in general, it is impossible for any quantum algorithm to achieve quadratic speed-up over the number of classical samples needed to estimate the mean $μ$, where the samples come from different random variables with mean close to $μ$. Technically, our quantum algorithms reduce bounded and sub-Gaussian random variables to the Bernoulli case, and use an uncomputation trick to overcome the challenge that direct amplitude estimation does not work with non-identical query access. Our quantum query lower bounds are established by simulating non-identical oracles by parallel oracles, and also by an adversarial method with non-identical oracles. Both results pave the way for proving quantum query lower bounds with non-identical oracles in general, which may be of independent interest.

Submitted 21 May, 2024; originally announced May 2024.

Comments: 31 pages, 0 figure. To appear in the 19th Theory of Quantum Computation, Communication and Cryptography (TQC 2024)

arXiv:2403.18951 [pdf, other]

Robust estimations from distribution structures: V. Non-asymptotic

Authors: Tuobang Li

Abstract: Due to the complexity of order statistics, the finite sample behaviour of robust statistics is generally not analytically solvable. While the Monte Carlo method can provide approximate solutions, its convergence rate is typically very slow, making the computational cost to achieve the desired accuracy unaffordable for ordinary users. In this paper, we propose an approach analogous to the Fourier transformation to decompose the finite sample structure of the uniform distribution. By obtaining sets of sequences that are consistent with parametric distributions for the first four sample moments, we can approximate the finite sample behavior of other estimators with significantly reduced computational costs. This article reveals the underlying structure of randomness and presents a novel approach to integrate multiple assumptions.

Submitted 13 June, 2024; v1 submitted 26 February, 2024; originally announced March 2024.

arXiv:2403.12110 [pdf, other]

Robust estimations from distribution structures: I. Mean

Authors: Tuobang Li

Abstract: As the most fundamental problem in statistics, robust location estimation has many prominent solutions, such as the trimmed mean, Winsorized mean, Hodges Lehmann estimator, Huber M estimator, and median of means. Recent studies suggest that their maximum biases concerning the mean can be quite different, but the underlying mechanisms largely remain unclear. This study exploited a semiparametric method to classify distributions by the asymptotic orderliness of quantile combinations with varying breakdown points, showing their interrelations and connections to parametric distributions. Further deductions explain why the Winsorized mean typically has smaller biases compared to the trimmed mean; two sequences of semiparametric robust mean estimators emerge, particularly highlighting the superiority of the median Hodges Lehmann mean. This article sheds light on the understanding of the common nature of probability distributions.

Submitted 13 June, 2024; v1 submitted 17 March, 2024; originally announced March 2024.
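
Illustration (not from the paper): textbook versions of four of the robust location estimators named in the abstract above, sketched with the Python standard library only. The function names, the symmetric trimming count `k`, and the sample data are hypothetical choices for this sketch. The paper's own semiparametric estimators are not reproduced here.

```python
from statistics import mean, median


def trimmed_mean(xs, k):
    """Drop the k smallest and k largest values, then average the rest."""
    s = sorted(xs)
    if 2 * k >= len(s):
        raise ValueError("k is too large for this sample")
    return mean(s[k:len(s) - k])


def winsorized_mean(xs, k):
    """Clamp the k smallest and k largest values to the nearest retained value, then average."""
    s = sorted(xs)
    if 2 * k >= len(s):
        raise ValueError("k is too large for this sample")
    lo, hi = s[k], s[len(s) - k - 1]
    return mean(min(max(x, lo), hi) for x in s)


def hodges_lehmann(xs):
    """Median of all pairwise averages (x_i + x_j) / 2 with i <= j."""
    n = len(xs)
    return median((xs[i] + xs[j]) / 2 for i in range(n) for j in range(i, n))


def median_of_means(xs, blocks):
    """Split the sample into consecutive equal-size blocks, average each, take the median."""
    size = len(xs) // blocks
    if size == 0:
        raise ValueError("more blocks than observations")
    return median(mean(xs[b * size:(b + 1) * size]) for b in range(blocks))


# One gross outlier (100) pulls the plain mean far from the bulk of the data.
sample = [1, 2, 3, 4, 5, 6, 7, 8, 100]
print(mean(sample))                # about 15.1
print(trimmed_mean(sample, 1))     # 5
print(winsorized_mean(sample, 1))  # 5
print(median_of_means(sample, 3))  # 5
print(hodges_lehmann(sample))
```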

arXiv:2402.17287 [pdf, other]

An Interpretable Evaluation of Entropy-based Novelty of Generative Models

Authors: Jingwei Zhang, Cheuk Ting Li, Farzan Farnia

Abstract: The massive developments of generative model frameworks require principled methods for the evaluation of a model's novelty compared to a reference dataset. While the literature has extensively studied the evaluation of the quality, diversity, and generalizability of generative models, the assessment of a model's novelty compared to a reference model has not been adequately explored in the machine learning community. In this work, we focus on the novelty assessment for multi-modal distributions and attempt to address the following differential clustering task: Given samples of a generative model $P_\mathcal{G}$ and a reference model $P_\mathrm{ref}$, how can we discover the sample types expressed by $P_\mathcal{G}$ more frequently than in $P_\mathrm{ref}$? We introduce a spectral approach to the differential clustering task and propose the Kernel-based Entropic Novelty (KEN) score to quantify the mode-based novelty of $P_\mathcal{G}$ with respect to $P_\mathrm{ref}$. We analyze the KEN score for mixture distributions with well-separable components and develop a kernel-based method to compute the KEN score from empirical data. We support the KEN framework by presenting numerical results on synthetic and real image datasets, indicating the framework's effectiveness in detecting novel modes and comparing generative models. The paper's code is available at: www.github.com/buyeah1109/KEN

Submitted 13 June, 2024; v1 submitted 27 February, 2024; originally announced February 2024.

arXiv:2402.05802 [pdf, other]

Unsupervised Discovery of Clinical Disease Signatures Using Probabilistic Independence

Authors: Thomas A. Lasko, John M. Still, Thomas Z. Li, Marco Barbero Mota, William W. Stead, Eric V. Strobl, Bennett A. Landman, Fabien Maldonado

Abstract: Insufficiently precise diagnosis of clinical disease is likely responsible for many treatment failures, even for common conditions and treatments. With a large enough dataset, it may be possible to use unsupervised machine learning to define clinical disease patterns more precisely. We present an approach to learning these patterns by using probabilistic independence to disentangle the imprint on the medical record of causal latent sources of disease. We inferred a broad set of 2000 clinical signatures of latent sources from 9195 variables in 269,099 Electronic Health Records. The learned signatures produced better discrimination than the original variables in a lung cancer prediction task unknown to the inference algorithm, predicting 3-year malignancy in patients with no history of cancer before a solitary lung nodule was discovered. More importantly, the signatures' greater explanatory power identified pre-nodule signatures of apparently undiagnosed cancer in many of those patients.

Submitted 8 February, 2024; originally announced February 2024.

Comments: 29 Pages, 8 figures

ACM Class: I.2.6; I.2.1; J.3

arXiv:2401.16320 [pdf, ps, other]

A Strategy for Preparing Quantum Squeezed States Using Reinforcement Learning

Authors: X. L. Zhao, Y. M. Zhao, M. Li, T. T. Li, Q. Liu, S. Guo, X. X. Yi

Abstract: We propose a scheme leveraging reinforcement learning to engineer control fields for generating non-classical states. It is exemplified by the application to prepare spin-squeezed states for an open collective spin model where a linear control field is designed to govern the dynamics. The reinforcement learning agent determines the temporal sequence of control pulses, commencing from a coherent spin state in an environment characterized by dissipation and dephasing. Compared to the constant control scenario, this approach provides various control sequences maintaining collective spin squeezing and entanglement. It is observed that denser application of the control pulses enhances the performance of the outcomes. However, there is a minor enhancement in the performance by adding control actions. The proposed strategy demonstrates increased effectiveness for larger systems. Thermal excitations of the reservoir are detrimental to the control outcomes. Feasible experiments are suggested to implement this control proposal based on the comparison with the others. The extensions to continuous control problems and another quantum system are discussed. The replaceability of the reinforcement learning module is also emphasized. This research paves the way for its application in manipulating other quantum systems.

Submitted 14 June, 2024; v1 submitted 29 January, 2024; originally announced January 2024.

arXiv:2312.05579 [pdf, other]

Conditional Stochastic Interpolation for Generative Learning

Authors: Ding Huang, Jian Huang, Ting Li, Guohao Shen

Abstract: We propose a conditional stochastic interpolation (CSI) approach to learning conditional distributions. CSI learns probability flow equations or stochastic differential equations that transport a reference distribution to the target conditional distribution. This is achieved by first learning the drift function and the conditional score function based on conditional stochastic interpolation, which are then used to construct a deterministic process governed by an ordinary differential equation or a diffusion process for conditional sampling. In our proposed CSI model, we incorporate an adaptive diffusion term to address the instability issues arising during the training process. We provide explicit forms of the conditional score function and the drift function in terms of conditional expectations under mild conditions, which naturally lead to a nonparametric regression approach to estimating these functions. Furthermore, we establish non-asymptotic error bounds for learning the target conditional distribution via conditional stochastic interpolation in terms of KL divergence, taking into account the neural network approximation error. We illustrate the application of CSI on image generation using a benchmark image dataset.

Submitted 9 December, 2023; originally announced December 2023.

Comments: 44 pages, 4 figures

arXiv:2311.15598 [pdf, other]

Optimal Clustering of Discrete Mixtures: Binomial, Poisson, Block Models, and Multi-layer Networks

Authors: Zhongyuan Lyu, Ting Li, Dong Xia

Abstract: In this paper, we first study the fundamental limit of clustering networks when a multi-layer network is present. Under the mixture multi-layer stochastic block model (MMSBM), we show that the minimax optimal network clustering error rate takes an exponential form and is characterized by the Renyi divergence between the edge probability distributions of the component networks. We propose a novel two-stage network clustering method including a tensor-based initialization algorithm involving both node and sample splitting and a refinement procedure by likelihood-based Lloyd algorithm. Network clustering must be accompanied by node community detection. Our proposed algorithm achieves the minimax optimal network clustering error rate and allows extreme network sparsity under MMSBM. Numerical simulations and real data experiments both validate that our method outperforms existing methods. Oftentimes, the edges of networks carry count-type weights. We then extend our methodology and analysis framework to study the minimax optimal clustering error rate for mixture of discrete distributions including Binomial, Poisson, and multi-layer Poisson networks. The minimax optimal clustering error rates in these discrete mixtures all take the same exponential form characterized by the Renyi divergences. These optimal clustering error rates in discrete mixtures can also be achieved by our proposed two-stage clustering algorithm.

Submitted 27 November, 2023; originally announced November 2023.

arXiv:2311.05248 [pdf, other]

A General Space of Belief Updates for Model Misspecification in Bayesian Networks

Authors: Tian** Li

Abstract: In an ideal setting for Bayesian agents, a perfect description of the rules of the environment (i.e., the objective observation model) is available, allowing them to reason through the Bayesian posterior to update their beliefs in an optimal way. But such an ideal setting hardly ever exists in the natural world, so agents have to make do with reasoning about how they should update their beliefs simultaneously. This introduces a number of related challenges for a number of research areas: (1) For Bayesian statistics, this deviation of the subjective model from the true data-generating mechanism is termed model misspecification in the literature. (2) For neuroscience, it introduces the necessity to model how agents update their beliefs (how they use evidence to update their belief) and how their beliefs change over time. The current paper addresses these two challenges by (a) providing a general class of posteriors/belief updates called cut-posteriors of Bayesian networks that have a much greater expressivity, and (b) parameterizing the space of possible posteriors to make meta-learning (i.e., choosing the belief update from this space in a principled manner) possible. For (a), it is noteworthy that any cut-posterior has local computation only, making computation tractable for human or artificial agents. For (b), a Markov Chain Monte Carlo algorithm to perform such meta-learning will be sketched here, though it is only an illustration and by no means the only meta-learning procedure possible for the space of cut-posteriors.
Operationally, this work gives a general algorithm to take in an arbitrary Bayesian network and output all possible cut-posteriors in the space.

Submitted 9 November, 2023; originally announced November 2023.

Comments: 14 pages, 4 figures

arXiv:2311.02532 [pdf, other]

Optimal Treatment Allocation for Efficient Policy Evaluation in Sequential Decision Making

Authors: Ting Li, Chengchun Shi, Jianing Wang, Fan Zhou, Hongtu Zhu

Abstract: A/B testing is critical for modern technological companies to evaluate the effectiveness of newly developed products against standard baselines. This paper studies optimal designs that aim to maximize the amount of information obtained from online experiments to estimate treatment effects accurately. We propose three optimal allocation strategies in a dynamic setting where treatments are sequentially assigned over time. These strategies are designed to minimize the variance of the treatment effect estimator when data follow a non-Markov decision process or a (time-varying) Markov decision process. We further develop estimation procedures based on existing off-policy evaluation (OPE) methods and conduct extensive experiments in various environments to demonstrate the effectiveness of the proposed methodologies. In theory, we prove the optimality of the proposed treatment allocation design and establish upper bounds for the mean squared errors of the resulting treatment effect estimators.

Submitted 4 November, 2023; originally announced November 2023.

arXiv:2309.08809 [pdf]

Associations Between Sleep Efficiency Variability and Cognition Among Older Adults: Cross-Sectional Accelerometer Study

Authors: Collin Sakal, Tingyou Li, Juan Li, Xinyue Li

Abstract: Objective: We aimed to determine the relationship between day-to-day sleep efficiency variability and cognitive function among older adults using accelerometer data and three cognitive tests. Methods: Older adults aged 65+ with 5 days of accelerometer data from the National Health and Nutrition Examination Survey (NHANES) who completed the Digit Symbol Substitution Test (DSST), the Consortium to Establish a Registry for Alzheimer's Disease Word-Learning subtest (CERAD-WL), and Animal Fluency Test (AFT) were included in this study. Associations between sleep efficiency variability and each cognitive test were examined, adjusting for age, sex, education, household income, marital status, depressive symptoms, diabetes, smoking habits, alcohol consumption, arthritis, heart disease, prior heart attack, prior stroke, activities of daily living, and instrumental activities of daily living. Results: A total of 1074 older adults were included in this study. Greater sleep efficiency variability was univariably associated with worse cognitive function based on the DSST (per 10% increase, Beta -3.34, 95% CI -5.33 to -1.34), CERAD-WL (per 10% increase, Beta -1.00, 95% CI -1.79 to -0.21), and AFT (per 10% increase, Beta -1.02, 95% CI -1.68 to -0.36). In adjusted models, greater sleep efficiency variability remained associated with lower DSST (per 10% increase, Beta -2.01, 95% CI -3.62 to -0.40) and AFT (per 10% increase, Beta -0.84, 95% CI -1.47 to -0.21) scores but not CERAD-WL scores.
Conclusions: Targeting consistency regarding sleep quality may be useful for interventions seeking to preserve cognitive function among older adults.

Submitted 5 November, 2023; v1 submitted 15 September, 2023; originally announced September 2023.

Comments: Revised study design and figures

arXiv:2307.08382 [pdf, other]

doi 10.1016/j.xcrp.2024.101891

Predicting Battery Lifetime Under Varying Usage Conditions from Early Aging Data

Authors: Tingkai Li, Zihao Zhou, Adam Thelen, David Howey, Chao Hu

Abstract: Accurate battery lifetime prediction is important for preventative maintenance, warranties, and improved cell design and manufacturing. However, manufacturing variability and usage-dependent degradation make life prediction challenging. Here, we investigate new features derived from capacity-voltage data in early life to predict the lifetime of cells cycled under widely varying charge rates, discharge rates, and depths of discharge. Features were extracted from regularly scheduled reference performance tests (i.e., low rate full cycles) during cycling. The early-life features capture a cell's state of health and the rate of change of component-level degradation modes, some of which correlate strongly with cell lifetime. Using a newly generated dataset from 225 nickel-manganese-cobalt/graphite Li-ion cells aged under a wide range of conditions, we demonstrate a lifetime prediction of in-distribution cells with 15.1% mean absolute percentage error using no more than the first 15% of data, for most cells. Further testing using a hierarchical Bayesian regression model shows improved performance on extrapolation, achieving 21.8% mean absolute percentage error for out-of-distribution cells. Our approach highlights the importance of using domain knowledge of lithium-ion battery degradation modes to inform feature engineering. Further, we provide the community with a new publicly available battery aging dataset with cells cycled beyond 80% of their rated capacity.

Submitted 20 October, 2023; v1 submitted 17 July, 2023; originally announced July 2023.

Journal ref: Cell Reports Physical Science. 5(4), 101891. 2024

arXiv:2307.01497 [pdf, other]

Accelerated stochastic approximation with state-dependent noise

Authors: Sasila Ilandarideva, Anatoli Juditsky, Guanghui Lan, Tianjiao Li

Abstract: We consider a class of stochastic smooth convex optimization problems under rather general assumptions on the noise in the stochastic gradient observation. As opposed to the classical problem setting in which the variance of noise is assumed to be uniformly bounded, herein we assume that the variance of stochastic gradients is related to the "sub-optimality" of the approximate solutions delivered by the algorithm. Such problems naturally arise in a variety of applications, in particular, in the well-known generalized linear regression problem in statistics. However, to the best of our knowledge, none of the existing stochastic approximation algorithms for solving this class of problems attain optimality in terms of the dependence on accuracy, problem parameters, and mini-batch size. We discuss two non-Euclidean accelerated stochastic approximation routines--stochastic accelerated gradient descent (SAGD) and stochastic gradient extrapolation (SGE)--which carry a particular duality relationship. We show that both SAGD and SGE, under appropriate conditions, achieve the optimal convergence rate, attaining the optimal iteration and sample complexities simultaneously. However, corresponding assumptions for the SGE algorithm are more general; they allow, for instance, for efficient application of the SGE to statistical estimation problems under heavy tail noises and discontinuous score functions. We also discuss the application of the SGE to problems satisfying quadratic growth conditions, and show how it can be used to recover sparse solutions. Finally, we report on some simulation experiments to illustrate numerical performance of our proposed algorithms in high-dimensional settings.

Submitted 13 July, 2023; v1 submitted 4 July, 2023; originally announced July 2023.

arXiv:2306.10405 [pdf, other]

A semi-parametric estimation method for quantile coherence with an application to bivariate financial time series clustering

Authors: Cristian F. Jiménez-Varón, Ying Sun, Ta-Hsin Li

Abstract: In multivariate time series analysis, spectral coherence measures the linear dependency between two time series at different frequencies. However, real data applications often exhibit nonlinear dependency in the frequency domain. Conventional coherence analysis fails to capture such dependency. The quantile coherence, on the other hand, characterizes nonlinear dependency by defining the coherence at a set of quantile levels based on trigonometric quantile regression. This paper introduces a new estimation technique for quantile coherence. The proposed method is semi-parametric, which uses the parametric form of the spectrum of a vector autoregressive (VAR) model to approximate the quantile coherence, combined with nonparametric smoothing across quantiles. At a given quantile level, we compute the quantile autocovariance function (QACF) by performing the Fourier inverse transform of the quantile periodograms. Subsequently, we utilize the multivariate Durbin-Levinson algorithm to estimate the VAR parameters and derive the estimate of the quantile coherence. Finally, we smooth the preliminary estimate of quantile coherence across quantiles using a nonparametric smoother. Numerical results show that the proposed estimation method outperforms nonparametric methods. We show that quantile coherence-based bivariate time series clustering has advantages over the ordinary VAR coherence. For applications, the identified clusters of financial stocks by quantile coherence with a market benchmark are shown to have an intriguing and more informative structure of diversified investment portfolios that may be used by investors to make better decisions.

Submitted 29 February, 2024; v1 submitted 17 June, 2023; originally announced June 2023.

Comments: 39 pages, 11 figures

arXiv:2306.06581 [pdf, other]

Importance Sparsification for Sinkhorn Algorithm

Authors: Mengyu Li, Jun Yu, Tao Li, Cheng Meng

Abstract: Sinkhorn algorithm has been used pervasively to approximate the solution to optimal transport (OT) and unbalanced optimal transport (UOT) problems. However, its practical application is limited due to the high computational complexity. To alleviate the computational burden, we propose a novel importance sparsification method, called Spar-Sink, to efficiently approximate entropy-regularized OT and UOT solutions. Specifically, our method employs natural upper bounds for unknown optimal transport plans to establish effective sampling probabilities, and constructs a sparse kernel matrix to accelerate Sinkhorn iterations, reducing the computational cost of each iteration from $O(n^2)$ to $\widetilde{O}(n)$ for a sample of size $n$. Theoretically, we show the proposed estimators for the regularized OT and UOT problems are consistent under mild regularity conditions. Experiments on various synthetic data demonstrate Spar-Sink outperforms mainstream competitors in terms of both estimation error and speed. A real-world echocardiogram data analysis shows Spar-Sink can effectively estimate and visualize cardiac cycles, from which one can identify heart failure and arrhythmia. To evaluate the numerical accuracy of cardiac cycle prediction, we consider the task of predicting the end-systole time point using the end-diastole one. Results show Spar-Sink performs as well as the classical Sinkhorn algorithm, requiring significantly less computational time.
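
For orientation, the classical dense Sinkhorn iteration that the abstract takes as its baseline can be sketched as follows. This is a minimal illustration, not the Spar-Sink method: it uses the full kernel matrix, so each iteration costs $O(n^2)$. The function name `sinkhorn`, the regularization value `eps`, and the toy histograms are assumptions chosen for the example.

```python
import math

def sinkhorn(a, b, cost, eps=0.5, n_iter=200):
    """Entropy-regularized OT plan between histograms a and b (dense kernel)."""
    n, m = len(a), len(b)
    # Gibbs kernel K_ij = exp(-C_ij / eps); eps is an illustrative choice.
    K = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iter):
        # Row scaling: u = a / (K v)
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        # Column scaling: v = b / (K^T u)
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    # Transport plan P = diag(u) K diag(v)
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]

# Toy problem (hypothetical data): two bins on each side, unit off-diagonal cost.
a = [0.5, 0.5]
b = [0.25, 0.75]
cost = [[0.0, 1.0], [1.0, 0.0]]
plan = sinkhorn(a, b, cost)
row_sums = [sum(row) for row in plan]
col_sums = [sum(plan[i][j] for i in range(2)) for j in range(2)]
```

After convergence the row and column sums of `plan` match `a` and `b`. The two inner sums over the kernel are the per-iteration cost that a sparsified kernel would reduce.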

Submitted 11 June, 2023; originally announced June 2023.

Comments: Accepted by Journal of Machine Learning Research

arXiv:2306.02826 [pdf, ps, other]

Near-Optimal Quantum Coreset Construction Algorithms for Clustering

Authors: Yecheng Xue, Xiaoyu Chen, Tongyang Li, Shaofeng H. -C. Jiang

Abstract: $k$-Clustering in $\mathbb{R}^d$ (e.g., $k$-median and $k$-means) is a fundamental machine learning problem. While near-linear time approximation algorithms were known in the classical setting for a dataset with cardinality $n$, it remains open to find sublinear-time quantum algorithms. We give quantum algorithms that find coresets for $k$-clustering in $\mathbb{R}^d$ with $\tilde{O}(\sqrt{nk}d^{3/2})$ query complexity. Our coreset reduces the input size from $n$ to $\mathrm{poly}(kε^{-1}d)$, so that existing $α$-approximation algorithms for clustering can run on top of it and yield $(1 + ε)α$-approximation. This eventually yields a quadratic speedup for various $k$-clustering approximation algorithms. We complement our algorithm with a nearly matching lower bound, that any quantum algorithm must make $Ω(\sqrt{nk})$ queries in order to achieve even $O(1)$-approximation for $k$-clustering.

Submitted 5 June, 2023; originally announced June 2023.

Comments: 32 pages, 0 figures, 1 table. To appear in the Fortieth International Conference on Machine Learning (ICML 2023)

arXiv:2305.10187 [pdf, other]

Evaluating Dynamic Conditional Quantile Treatment Effects with Applications in Ridesharing

Authors: Ting Li, Chengchun Shi, Zhaohua Lu, Yi Li, Hongtu Zhu

Abstract: Many modern tech companies, such as Google, Uber, and Didi, utilize online experiments (also known as A/B testing) to evaluate new policies against existing ones. While most studies concentrate on average treatment effects, situations with skewed and heavy-tailed outcome distributions may benefit from alternative criteria, such as quantiles. However, assessing dynamic quantile treatment effects (QTE) remains a challenge, particularly when dealing with data from ride-sourcing platforms that involve sequential decision-making across time and space. In this paper, we establish a formal framework to calculate QTE conditional on characteristics independent of the treatment. Under specific model assumptions, we demonstrate that the dynamic conditional QTE (CQTE) equals the sum of individual CQTEs across time, even though the conditional quantile of cumulative rewards may not necessarily equate to the sum of conditional quantiles of individual rewards. This crucial insight significantly streamlines the estimation and inference processes for our target causal estimand. We then introduce two varying coefficient decision process (VCDP) models and devise an innovative method to test the dynamic CQTE. Moreover, we expand our approach to accommodate data from spatiotemporal dependent experiments and examine both conditional quantile direct and indirect effects. To showcase the practical utility of our method, we apply it to three real-world datasets from a ride-sourcing platform. Theoretical findings and comprehensive simulation studies further substantiate our proposal.

Submitted 17 May, 2023; originally announced May 2023.

arXiv:2305.06172 [pdf, other]

Principal Feature Detection via $Φ$-Sobolev Inequalities

Authors: Matthew T. C. Li, Youssef Marzouk, Olivier Zahm

Abstract: We investigate the approximation of high-dimensional target measures as low-dimensional updates of a dominating reference measure. This approximation class replaces the associated density with the composition of: (i) a feature map that identifies the leading principal components or features of the target measure, relative to the reference, and (ii) a low-dimensional profile function. When the reference measure satisfies a subspace $φ$-Sobolev inequality, we construct a computationally tractable approximation that yields certifiable error guarantees with respect to the Amari $α$-divergences. Our construction proceeds in two stages. First, for any feature map and any $α$-divergence, we obtain an analytical expression for the optimal profile function. Second, for linear feature maps, the principal features are obtained from eigenvectors of a matrix involving gradients of the log-density. Neither step requires explicit access to normalizing constants. Notably, by leveraging the $φ$-Sobolev inequalities, we demonstrate that these features universally certify approximation errors across the range of $α$-divergences $α\in (0,1]$. We then propose an application to Bayesian inverse problems and provide an analogous construction with approximation guarantees that hold in expectation over the data. We conclude with an extension of the proposed dimension reduction strategy to nonlinear feature maps.

Submitted 16 January, 2024; v1 submitted 10 May, 2023; originally announced May 2023.

Comments: To appear in Bernoulli, but this version contains both the main file and the supplementary material

arXiv:2305.02500 [pdf]

Identifying the most predictive risk factors for future cognitive impairment among elderly Chinese

Authors: Collin Sakal, Tingyou Li, Juan Li, Xinyue Li

Abstract: Introduction. The societal burden of cognitive impairments in China has prompted researchers to develop clinical prediction models aimed at making risk assessments that enable preventative interventions. However, it is unclear which risk factors best predict future cognitive impairment and if predictive ability is consistent across different socioeconomic groups. Methods. We quantified the ability of demographics, instrumental activities of daily living, activities of daily living, cognitive tests, social factors, psychological factors, diet, exercise and sleep, chronic diseases, and three recently published prediction models to predict future cognitive impairments in the general Chinese population and among male, female, rural, urban, educated, and uneducated elderly. Data were taken from the 2011 and 2014 waves of the Chinese Longitudinal Healthy Longevity Survey (CLHLS). Results. The risk factor groups with the most predictive ability in the general population were demographics (AUC, 0.78, 95% CI, 0.77-0.78), cognitive tests (AUC, 0.72, 95% CI, 0.72-0.73), and instrumental activities of daily living (AUC, 0.71, 95% CI, 0.70-0.71). Demographics, cognitive tests, instrumental activities of daily living, and all three re-created prediction models had significantly higher AUCs when making predictions among women compared to men and among the uneducated compared to the educated. Discussion. This study suggests that demographics, cognitive tests, and instrumental activities of daily living are the most useful risk factors for predicting future cognitive impairment among elderly Chinese. However, the most useful risk factors and existing models have lower predictive power among male, urban, and educated elderly. More efforts are needed to ensure that equally accurate risk assessments can be conducted across different socioeconomic groups in China.

Submitted 3 May, 2023; originally announced May 2023.

Comments: 3 figures, 2 tables

arXiv:2303.11230 [pdf, other]

Fitting Low-rank Models on Egocentrically Sampled Partial Networks

Authors: Angus Chan, Tianxi Li

Abstract: The statistical modeling of random networks has been widely used to uncover interaction mechanisms in complex systems and to predict unobserved links in real-world networks. In many applications, network connections are collected via egocentric sampling: a subset of nodes is sampled first, after which all links involving this subset are recorded; all other information is missing. Compared with the assumption of "uniformly missing at random", egocentrically sampled partial networks require specially designed modeling strategies. Current statistical methods are either computationally infeasible or based on intuitive designs without theoretical justification. Here, we propose an approach to fit general low-rank models for egocentrically sampled networks, which include several popular network models. This method is based on graph spectral properties and is computationally efficient for large-scale networks. It results in consistent recovery of missing subnetworks due to egocentric sampling for sparse networks. To our knowledge, this method offers the first theoretical guarantee for egocentric partial network estimation in the scope of low-rank models. We evaluate the technique on several synthetic and real-world networks and show that it delivers competitive performance in link prediction tasks.
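
The egocentric sampling scheme described in the abstract can be illustrated with a small sketch of the observation pattern alone; it does not implement the authors' estimation method. The helper names `egocentric_mask` and `observe`, and the toy adjacency matrix, are hypothetical.

```python
def egocentric_mask(n, egos):
    """Return an n x n boolean mask: True where entry (i, j) is observed.

    An entry is observed when at least one endpoint is a sampled ego node.
    """
    ego_set = set(egos)
    return [[(i in ego_set) or (j in ego_set) for j in range(n)] for i in range(n)]

def observe(adj, egos):
    """Keep observed adjacency entries; mark all other entries as None (missing)."""
    n = len(adj)
    mask = egocentric_mask(n, egos)
    return [[adj[i][j] if mask[i][j] else None for j in range(n)] for i in range(n)]

# Toy undirected network on 4 nodes (hypothetical data); nodes 0 and 2 are the egos.
adj = [
    [0, 1, 0, 1],
    [1, 0, 1, 1],
    [0, 1, 0, 0],
    [1, 1, 0, 0],
]
partial = observe(adj, egos=[0, 2])
n_observed = sum(x is not None for row in partial for x in row)
```

With $k$ egos out of $n$ nodes, the missing entries form an $(n-k) \times (n-k)$ block among the non-ego nodes, which is the subnetwork a low-rank model would have to recover.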

Submitted 8 March, 2023; originally announced March 2023.

arXiv:2303.10599 [pdf, ps, other]

Convergence Analysis of Stochastic Gradient Descent with MCMC Estimators

Authors: Tianyou Li, Fan Chen, Huajie Chen, Zaiwen Wen

Abstract: Understanding stochastic gradient descent (SGD) and its variants is essential for machine learning. However, most of the preceding analyses are conducted under amenable conditions such as unbiased gradient estimator and bounded objective functions, which do not encompass many sophisticated applications, such as variational Monte Carlo, entropy-regularized reinforcement learning and variational inference. In this paper, we consider the SGD algorithm that employs the Markov Chain Monte Carlo (MCMC) estimator to compute the gradient, called MCMC-SGD. Since MCMC reduces the sampling complexity significantly, it is an asymptotically convergent biased estimator in practice. Moreover, by incorporating a general class of unbounded functions, it is much more difficult to analyze the MCMC sampling error. Therefore, we assume that the function is sub-exponential and use the Bernstein inequality for non-stationary Markov chains to derive error bounds of the MCMC estimator. Consequently, MCMC-SGD is proven to have a first order convergence rate $O(\log K/\sqrt{n K})$ with $K$ iterations and a sample size $n$. It partially explains how MCMC influences the behavior of SGD. Furthermore, we verify the correlated negative curvature condition under reasonable assumptions. It is shown that MCMC-SGD escapes from saddle points and reaches $(ε,ε^{1/4})$ approximate second order stationary points or $ε^{1/2}$-variance points at least $O(ε^{-11/2}\log^{2}(1/ε) )$ steps with high probability. Our analysis unveils the convergence pattern of MCMC-SGD across a broad class of stochastic optimization problems, and interprets the convergence phenomena observed in practical applications.

Submitted 23 March, 2024; v1 submitted 19 March, 2023; originally announced March 2023.

arXiv:2302.10796 [pdf, ps, other]

Provably Efficient Exploration in Quantum Reinforcement Learning with Logarithmic Worst-Case Regret

Authors: Han Zhong, Jiachen Hu, Yecheng Xue, Tongyang Li, Liwei Wang

Abstract: While quantum reinforcement learning (RL) has attracted a surge of attention recently, its theoretical understanding is limited. In particular, it remains elusive how to design provably efficient quantum RL algorithms that can address the exploration-exploitation trade-off. To this end, we propose a novel UCRL-style algorithm that takes advantage of quantum computing for tabular Markov decision processes (MDPs) with $S$ states, $A$ actions, and horizon $H$, and establish an $\mathcal{O}(\mathrm{poly}(S, A, H, \log T))$ worst-case regret for it, where $T$ is the number of episodes. Furthermore, we extend our results to quantum RL with linear function approximation, which is capable of handling problems with large state spaces. Specifically, we develop a quantum algorithm based on value target regression (VTR) for linear mixture MDPs with $d$-dimensional linear representation and prove that it enjoys $\mathcal{O}(\mathrm{poly}(d, H, \log T))$ regret. Our algorithms are variants of UCRL/UCRL-VTR algorithms in classical RL, which also leverage a novel combination of lazy updating mechanisms and quantum estimation subroutines. This is the key to breaking the $Ω(\sqrt{T})$-regret barrier in classical RL. To the best of our knowledge, this is the first work studying the online exploration in quantum RL with provable logarithmic worst-case regret.

Submitted 13 June, 2024; v1 submitted 21 February, 2023; originally announced February 2023.

Comments: ICML 2024

arXiv:2302.04437 [pdf, other]

rMultiNet: An R Package For Multilayer Networks Analysis

Authors: Ting Li, Zhongyuan Lyu, Chenyu Ren, Dong Xia

Abstract: This paper develops an R package rMultiNet to analyze multilayer network data. We provide two general frameworks from recent literature, e.g. mixture multilayer stochastic block model (MMSBM) and mixture multilayer latent space model (MMLSM), to generate the multilayer network. We also provide several methods to reveal the embedding of both nodes and layers followed by further data analysis methods, such as clustering. Three real data examples are processed in the package. The source code of rMultiNet is available at https://github.com/ChenyuzZZ73/rMultiNet.

Submitted 8 February, 2023; originally announced February 2023.

arXiv:2211.09221 [pdf, other]

The non-overlapping statistical approximation to overlapping group lasso

Authors: Mingyu Qi, Tianxi Li

Abstract: Group lasso is a commonly used regularization method in statistical learning in which parameters are eliminated from the model according to predefined groups. However, when the groups overlap, optimizing the group lasso penalized objective can be time-consuming on large-scale problems because of the non-separability induced by the overlapping groups. This bottleneck has seriously limited the application of overlapping group lasso regularization in many modern problems, such as gene pathway selection and graphical model estimation. In this paper, we propose a separable penalty as an approximation of the overlapping group lasso penalty. Thanks to the separability, the computation of regularization based on our penalty is substantially faster than that of the overlapping group lasso, especially for large-scale and high-dimensional problems. We show that the penalty is the tightest separable relaxation of the overlapping group lasso norm within the family of $\ell_{q_1}/\ell_{q_2}$ norms. Moreover, we show that the estimator based on the proposed separable penalty is statistically equivalent to the one based on the overlapping group lasso penalty with respect to their error bounds and the rate-optimal performance under the squared loss. We demonstrate the faster computational time and statistical equivalence of our method compared with the overlapping group lasso in simulation examples and a classification problem of cancer tumors based on gene expression and multiple gene pathways.

Submitted 20 February, 2024; v1 submitted 16 November, 2022; originally announced November 2022.

arXiv:2211.05844 [pdf, ps, other]

Quantile Fourier Transform, Quantile Series, and Nonparametric Estimation of Quantile Spectra

Authors: Ta-Hsin Li

Abstract: A nonparametric method is proposed for estimating the quantile spectra and cross-spectra introduced in Li (2012; 2014) as bivariate functions of frequency and quantile level. The method is based on the quantile discrete Fourier transform (QDFT) defined by trigonometric quantile regression and the quantile series (QSER) defined by the inverse Fourier transform of the QDFT. A nonparametric spectral estimator is constructed from the autocovariance and cross-covariance functions of the QSER using the lag-window (LW) approach. Various quantile smoothing techniques are employed further to reduce the statistical variability of the estimator across quantiles, among which is a new technique called spline quantile regression (SQR). The performance of the proposed estimation method is evaluated through a simulation study.

Submitted 10 November, 2022; originally announced November 2022.

arXiv:2210.14086 [pdf, ps, other]

A Global Wavelet Based Bootstrapped Test of Covariance Stationarity

Authors: Jonathan B. Hill, Tianqi Li

Abstract: We propose a covariance stationarity test for an otherwise dependent and possibly globally non-stationary time series. We work in a generalized version of the new setting in Jin, Wang and Wang (2015), who exploit Walsh (1923) functions in order to compare sub-sample covariances with the full sample counterpart. They impose strict stationarity under the null, only consider linear processes under either hypothesis in order to achieve a parametric estimator for an inverted high dimensional asymptotic covariance matrix, and do not consider any other orthonormal basis. Conversely, we work with a general orthonormal basis under mild conditions that include Haar wavelet and Walsh functions; and we allow for linear or nonlinear processes with possibly non-iid innovations. This is important in macroeconomics and finance where nonlinear feedback and random volatility occur in many settings. We completely sidestep asymptotic covariance matrix estimation and inversion by bootstrapping a max-correlation difference statistic, where the maximum is taken over the correlation lag $h$ and basis generated sub-sample counter $k$ (the number of systematic samples). We achieve a higher feasible rate of increase for the maximum lag and counter $\mathcal{H}_{T}$ and $\mathcal{K}_{T}$. Of particular note, our test is capable of detecting breaks in variance, and distant, or very mild, deviations from stationarity.

Submitted 21 May, 2024; v1 submitted 25 October, 2022; originally announced October 2022.

MSC Class: 62G10; 62M10; 62F40

arXiv:2210.09217 [pdf, other]

Statistical learning methods for neuroimaging data analysis with applications

Authors: Hongtu Zhu, Tengfei Li, Bingxin Zhao

Abstract: The aim of this paper is to provide a comprehensive review of statistical challenges in neuroimaging data analysis from neuroimaging techniques to large-scale neuroimaging studies to statistical learning methods. We briefly review eight popular neuroimaging techniques and their potential applications in neuroscience research and clinical translation. We delineate the four common themes of neuroimaging data and review major image processing analysis methods for processing neuroimaging data at the individual level. We briefly review four large-scale neuroimaging-related studies and a consortium on imaging genomics and discuss four common themes of neuroimaging data analysis at the population level. We review nine major population-based statistical analysis methods and their associated statistical challenges and present recent progress in statistical methodology to address these challenges.

Submitted 17 October, 2022; originally announced October 2022.

Comments: 73 pages, 4 Figures

arXiv:2210.01084 [pdf, other]

A Partially Functional Linear Modeling Framework for Integrating Genetic, Imaging, and Clinical Data

Authors: Ting Li, Yang Yu, J. S. Marron, Hongtu Zhu

Abstract: This paper is motivated by the joint analysis of genetic, imaging, and clinical (GIC) data collected in the Alzheimer's Disease Neuroimaging Initiative (ADNI) study. We propose a regression framework based on partially functional linear regression models to map high-dimensional GIC-related pathways for Alzheimer's Disease (AD). We develop a joint model selection and estimation procedure by embedding imaging data in the reproducing kernel Hilbert space and imposing the L0 penalty for the coefficients of genetic variables. We apply the proposed method to the ADNI dataset to identify important features from tens of thousands of genetic polymorphisms (reduced from millions using a preprocessing step) and study the effects of a certain set of informative genetic variants and the baseline hippocampus surface on thirteen future cognitive scores measuring different aspects of cognitive function. We explore the shared and different heritability patterns of these cognitive scores. Analysis results suggest that both the hippocampal and genetic data have heterogeneous effects on different scores, with the trend that the value of both hippocampi is negatively associated with the severity of cognition deficits. Polygenic effects are observed for all thirteen cognitive scores. The well-known APOE4 genotype only explains a small part of cognitive function. Shared genetic etiology exists, however, greater genetic heterogeneity exists within disease classifications after accounting for the baseline diagnosis status. These analyses are useful in further investigation of functional mechanisms for AD evolution.

Submitted 22 February, 2023; v1 submitted 30 September, 2022; originally announced October 2022.

arXiv:2209.10433 [pdf, ps, other]

Arithmetic Average Density Fusion -- Part II: Unified Derivation for Unlabeled and Labeled RFS Fusion

Authors: Tiancheng Li

Abstract: As a fundamental information fusion approach, the arithmetic average (AA) fusion has recently been investigated for various random finite set (RFS) filter fusion in the context of multi-sensor multi-target tracking. It is not a straightforward extension of the ordinary density-AA fusion to the RFS distribution but has to preserve the form of the fusing multi-target density. In this work, we first propose a statistical concept, probability hypothesis density (PHD) consistency, and explain how it can be achieved by the PHD-AA fusion and lead to more accurate and robust detection and localization of the present targets. This forms a reason, both theoretically sound and technically meaningful, for performing inter-filter PHD AA-fusion/consensus while preserving the form of the fusing RFS filter. Then, we derive and analyze the proper AA fusion formulations for most existing unlabeled/labeled RFS filters based on the (labeled) PHD-AA/consistency. These derivations are theoretically unified, exact, need no approximation, and greatly enable heterogeneous unlabeled and labeled RFS density fusion, which is separately demonstrated in two subsequent companion papers.

Submitted 22 November, 2023; v1 submitted 21 September, 2022; originally announced September 2022.

Comments: 13 pages, 4 figures, 1 table

Journal ref: IEEE Transactions on Aerospace and Electronic Systems, 2024

arXiv:2208.02161 [pdf, other]

A Screening Strategy for Structured Optimization Involving Nonconvex $\ell_{q,p}$ Regularization

Authors: Tiange Li, Xiangyu Yang, Hao Wang

Abstract: In this paper, we develop a simple yet effective screening rule strategy to improve the computational efficiency in solving structured optimization involving nonconvex $\ell_{q,p}$ regularization. Based on an iteratively reweighted $\ell_1$ (IRL1) framework, the proposed screening rule works like a preprocessing module that potentially removes the inactive groups before starting the subproblem solver, thereby reducing the total computational time. This is mainly achieved by heuristically exploiting the dual subproblem information during each iteration. Moreover, we prove that our screening rule can remove all inactive variables in a finite number of iterations of the IRL1 method. Numerical experiments illustrate the efficiency of our screening rule strategy compared with several state-of-the-art algorithms.

Submitted 2 August, 2022; originally announced August 2022.

arXiv:2207.05301 [pdf, other]

Edge Augmentation on Disconnected Graphs via Eigenvalue Elevation

Authors: Tianyi Li

Abstract: The graph-theoretical task of determining the most likely inter-community edges based on disconnected subgraphs' intra-community connectivity is proposed. An algorithm is developed for this edge augmentation task, based on elevating the zero eigenvalues of the graph's spectrum. Upper bounds for eigenvalue elevation amplitude and for the corresponding augmented edge density are derived and are authenticated with simulation on random graphs. The algorithm works consistently across synthetic and real networks, yielding desirable performance at connecting graph components. Edge augmentation reverse-engineers graph partition under different community detection methods (Girvan-Newman method, greedy modularity maximization, label propagation, Louvain method, and fluid community), in most cases producing inter-community edges at >50% frequency.

Submitted 12 July, 2022; originally announced July 2022.

Comments: 6 pages, 3 figures

arXiv:2206.10240 [pdf, other]

Core-Elements for Classical Linear Regression

Authors: Mengyu Li, Jun Yu, Tao Li, Cheng Meng

Abstract: The coresets approach, also called subsampling or subset selection, aims to select a subsample as a surrogate for the observed sample. Such an approach has been used pervasively in large-scale data analysis. Existing coresets methods construct the subsample using a subset of rows from the predictor matrix. Such methods can be significantly inefficient when the predictor matrix is sparse or numerically sparse. To overcome the limitation, we develop a novel element-wise subset selection approach, called core-elements, for large-scale least squares estimation in classical linear regression. We provide a deterministic algorithm to construct the core-elements estimator, only requiring an $O(\mbox{nnz}(\mathbf{X})+rp^2)$ computational cost, where $\mathbf{X}$ is an $n\times p$ predictor matrix, $r$ is the number of elements selected from each column of $\mathbf{X}$, and $\mbox{nnz}(\cdot)$ denotes the number of non-zero elements. Theoretically, we show that the proposed estimator is unbiased and approximately minimizes an upper bound of the estimation variance. We also provide an approximation guarantee by deriving a coresets-like finite sample bound for the proposed estimator. To handle potential outliers in the data, we further combine core-elements with the median-of-means procedure, resulting in an efficient and robust estimator with theoretical consistency guarantees. Numerical studies on various synthetic and open-source datasets demonstrate the proposed method's superior performance compared to mainstream competitors.

Submitted 17 March, 2023; v1 submitted 21 June, 2022; originally announced June 2022.

arXiv:2206.08649 [pdf]

On the probability of invalidating a causal inference due to limited external validity

Authors: Tenglong Li

Abstract: External validity is often questionable in empirical research, especially in randomized experiments due to the trade-off between internal validity and external validity. To quantify the robustness of external validity, one must first conceptualize the gap between a sample that is fully representative of the target population (i.e., the ideal sample) and the observed sample. Drawing on Frank & Min (2007) and Frank et al. (2013), I define this gap as the unobserved sample and intend to quantify its relationship with the null hypothesis statistical testing (NHST) in this study. The probability of invalidating a causal inference due to limited external validity, i.e., the PEV, is the probability of failing to reject the null hypothesis based on the ideal sample provided the null hypothesis has been rejected based on the observed sample. This study illustrates the guideline and the procedure of evaluating external validity with the PEV through an empirical example (i.e., Borman et al. (2008)). Specifically, one would be able to locate the threshold of the unobserved sample statistic that would make the PEV higher than a desired value and use this information to characterize the unobserved sample that would render external validity of the research in question less robust. The PEV is shown to be linked to statistical power when the NHST is thought to be based on the ideal sample.

Submitted 17 June, 2022; originally announced June 2022.

Comments: 57 pages, 3 figures

arXiv:2206.04615 [pdf, other]

Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

Authors: Aarohi Srivastava, Abhinav Rastogi, Abhishek Rao, Abu Awal Md Shoeb, Abubakar Abid, Adam Fisch, Adam R. Brown, Adam Santoro, Aditya Gupta, Adrià Garriga-Alonso, Agnieszka Kluska, Aitor Lewkowycz, Akshat Agarwal, Alethea Power, Alex Ray, Alex Warstadt, Alexander W. Kocurek, Ali Safaya, Ali Tazarv, Alice Xiang, Alicia Parrish, Allen Nie, Aman Hussain, Amanda Askell, Amanda Dsouza , et al. (426 additional authors not shown)

Abstract: Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are as yet poorly characterized. In order to inform future research, prepare for disruptive new model capabilities, and ameliorate socially harmful effects, it is vital that we understand the present and near-future capabilities and limitations of language models. To address this challenge, we introduce the Beyond the Imitation Game benchmark (BIG-bench). BIG-bench currently consists of 204 tasks, contributed by 450 authors across 132 institutions. Task topics are diverse, drawing problems from linguistics, childhood development, math, common-sense reasoning, biology, physics, social bias, software development, and beyond. BIG-bench focuses on tasks that are believed to be beyond the capabilities of current language models. We evaluate the behavior of OpenAI's GPT models, Google-internal dense transformer architectures, and Switch-style sparse transformers on BIG-bench, across model sizes spanning millions to hundreds of billions of parameters. In addition, a team of human expert raters performed all tasks in order to provide a strong baseline. Findings include: model performance and calibration both improve with scale, but are poor in absolute terms (and when compared with rater performance); performance is remarkably similar across model classes, though with benefits from sparsity; tasks that improve gradually and predictably commonly involve a large knowledge or memorization component, whereas tasks that exhibit "breakthrough" behavior at a critical scale often involve multiple steps or components, or brittle metrics; social bias typically increases with scale in settings with ambiguous context, but this can be improved with prompting.

Submitted 12 June, 2023; v1 submitted 9 June, 2022; originally announced June 2022.

Comments: 27 pages, 17 figures + references and appendices, repo: https://github.com/google/BIG-bench

Journal ref: Transactions on Machine Learning Research, May/2022, https://openreview.net/forum?id=uyTL5Bvosj

arXiv:2206.03861 [pdf, ps, other]

Decentralized Online Regularized Learning Over Random Time-Varying Graphs

Authors: Xiwei Zhang, Tao Li, Xiaozheng Fu

Abstract: We study the decentralized online regularized linear regression algorithm over random time-varying graphs. At each time step, every node runs an online estimation algorithm consisting of an innovation term processing its own new measurement, a consensus term taking a weighted sum of estimations of its own and its neighbors with additive and multiplicative communication noises, and a regularization term preventing over-fitting. It is not required that the regression matrices and graphs satisfy special statistical assumptions such as mutual independence, spatio-temporal independence or stationarity. We develop the nonnegative supermartingale inequality of the estimation error, and prove that the estimations of all nodes converge to the unknown true parameter vector almost surely if the algorithm gains, graphs and regression matrices jointly satisfy the sample path spatio-temporal persistence of excitation condition. In particular, this condition holds by choosing appropriate algorithm gains if the graphs are uniformly conditionally jointly connected and conditionally balanced, and the regression models of all nodes are uniformly conditionally spatio-temporally jointly observable, under which the algorithm converges in mean square and almost surely. In addition, we prove that the regret upper bound is $O(T^{1-τ}\ln T)$, where $τ\in (0.5,1)$ is a constant depending on the algorithm gains.

Submitted 21 April, 2024; v1 submitted 7 June, 2022; originally announced June 2022.

arXiv:2206.01341 [pdf, other]

Equipping Black-Box Policies with Model-Based Advice for Stable Nonlinear Control

Authors: Tongxin Li, Ruixiao Yang, Guannan Qu, Yiheng Lin, Steven Low, Adam Wierman

Abstract: Machine-learned black-box policies are ubiquitous for nonlinear control problems. Meanwhile, crude model information is often available for these problems from, e.g., linear approximations of nonlinear dynamics. We study the problem of equipping a black-box control policy with model-based advice for nonlinear control on a single trajectory. We first show a general negative result that a naive convex combination of a black-box policy and a linear model-based policy can lead to instability, even if the two policies are both stabilizing. We then propose an adaptive $λ$-confident policy, with a coefficient $λ$ indicating the confidence in a black-box policy, and prove its stability. With bounded nonlinearity, in addition, we show that the adaptive $λ$-confident policy achieves a bounded competitive ratio when a black-box policy is near-optimal. Finally, we propose an online learning approach to implement the adaptive $λ$-confident policy and verify its efficacy in case studies about the CartPole problem and a real-world electric vehicle (EV) charging problem with data bias due to COVID-19.

Submitted 2 June, 2022; originally announced June 2022.

Comments: 33 pages, 7 figures

arXiv:2205.15059 [pdf, other]

Hilbert Curve Projection Distance for Distribution Comparison

Authors: Tao Li, Cheng Meng, Hongteng Xu, Jun Yu

Abstract: Distribution comparison plays a central role in many machine learning tasks like data classification and generative modeling. In this study, we propose a novel metric, called Hilbert curve projection (HCP) distance, to measure the distance between two probability distributions with low complexity. In particular, we first project two high-dimensional probability distributions using Hilbert curve to obtain a coupling between them, and then calculate the transport distance between these two distributions in the original space, according to the coupling. We show that HCP distance is a proper metric and is well-defined for probability measures with bounded supports. Furthermore, we demonstrate that the modified empirical HCP distance with the $L_p$ cost in the $d$-dimensional space converges to its population counterpart at a rate of no more than $O(n^{-1/2\max\{d,p\}})$. To suppress the curse-of-dimensionality, we also develop two variants of the HCP distance using (learnable) subspace projections. Experiments on both synthetic and real-world data show that our HCP distance works as an effective surrogate of the Wasserstein distance with low complexity and overcomes the drawbacks of the sliced Wasserstein distance.

Submitted 6 February, 2024; v1 submitted 30 May, 2022; originally announced May 2022.

Comments: 33 pages, 11 figures

arXiv:2205.05800 [pdf, other]

Stochastic first-order methods for average-reward Markov decision processes

Authors: Tianjiao Li, Feiyang Wu, Guanghui Lan

Abstract: We study the problem of average-reward Markov decision processes (AMDPs) and develop novel first-order methods with strong theoretical guarantees for both policy evaluation and optimization. Existing on-policy evaluation methods suffer from sub-optimal convergence rates as well as failure in handling insufficiently random policies, e.g., deterministic policies, for lack of exploration. To remedy these issues, we develop a novel variance-reduced temporal difference (VRTD) method with linear function approximation for randomized policies along with sharp convergence guarantees, and an exploratory variance-reduced temporal difference (EVRTD) method for insufficiently random policies with comparable convergence guarantees. We further establish linear convergence rate on the bias of policy evaluation, which is essential for improving the overall sample complexity of policy optimization. On the other hand, compared with intensive research interest in finite sample analysis of policy gradient methods for discounted MDPs, existing studies on policy gradient methods for AMDPs mostly focus on regret bounds under restrictive assumptions on the underlying Markov processes (see, e.g., Abbasi-Yadkori et al., 2019), and they often lack guarantees on the overall sample complexities. Towards this end, we develop an average-reward variant of the stochastic policy mirror descent (SPMD) (Lan, 2022). We establish the first $\widetilde{\mathcal{O}}(ε^{-2})$ sample complexity for solving AMDPs with policy gradient method under both the generative model (with unichain assumption) and Markovian noise model (with ergodic assumption). This bound can be further improved to $\widetilde{\mathcal{O}}(ε^{-1})$ for solving regularized AMDPs. Our theoretical advantages are corroborated by numerical experiments.

Submitted 14 September, 2022; v1 submitted 11 May, 2022; originally announced May 2022.

arXiv:2202.05963 [pdf, other]

Private Adaptive Optimization with Side Information

Authors: Tian Li, Manzil Zaheer, Sashank J. Reddi, Virginia Smith

Abstract: Adaptive optimization methods have become the default solvers for many machine learning tasks. Unfortunately, the benefits of adaptivity may degrade when training with differential privacy, as the noise added to ensure privacy reduces the effectiveness of the adaptive preconditioner. To this end, we propose AdaDPS, a general framework that uses non-sensitive side information to precondition the gradients, allowing the effective use of adaptive methods in private settings. We formally show AdaDPS reduces the amount of noise needed to achieve similar privacy guarantees, thereby improving optimization performance. Empirically, we leverage simple and readily available side information to explore the performance of AdaDPS in practice, comparing to strong baselines in both centralized and federated settings. Our results show that AdaDPS improves accuracy by 7.7% (absolute) on average -- yielding state-of-the-art privacy-utility trade-offs on large-scale text and image benchmarks.

Submitted 24 June, 2022; v1 submitted 11 February, 2022; originally announced February 2022.

Comments: ICML 2022

arXiv:2112.13109 [pdf, other]

Accelerated and instance-optimal policy evaluation with linear function approximation

Authors: Tianjiao Li, Guanghui Lan, Ashwin Pananjady

Abstract: We study the problem of policy evaluation with linear function approximation and present efficient and practical algorithms that come with strong optimality guarantees. We begin by proving lower bounds that establish baselines on both the deterministic error and stochastic error in this problem. In particular, we prove an oracle complexity lower bound on the deterministic error in an instance-dependent norm associated with the stationary distribution of the transition kernel, and use the local asymptotic minimax machinery to prove an instance-dependent lower bound on the stochastic error in the i.i.d. observation model. Existing algorithms fail to match at least one of these lower bounds: To illustrate, we analyze a variance-reduced variant of temporal difference learning, showing in particular that it fails to achieve the oracle complexity lower bound. To remedy this issue, we develop an accelerated, variance-reduced fast temporal difference algorithm (VRFTD) that simultaneously matches both lower bounds and attains a strong notion of instance-optimality. Finally, we extend the VRFTD algorithm to the setting with Markovian observations, and provide instance-dependent convergence results. Our theoretical guarantees of optimality are corroborated by numerical experiments.

Submitted 13 August, 2022; v1 submitted 24 December, 2021; originally announced December 2021.

arXiv:2112.08507 [pdf, other]

Algorithms for Adaptive Experiments that Trade-off Statistical Analysis with Reward: Combining Uniform Random Assignment and Reward Maximization

Authors: Tong Li, Jacob Nogas, Haochen Song, Harsh Kumar, Audrey Durand, Anna Rafferty, Nina Deliu, Sofia S. Villar, Joseph J. Williams

Abstract: Multi-armed bandit algorithms like Thompson Sampling (TS) can be used to conduct adaptive experiments, in which maximizing reward means that data is used to progressively assign participants to more effective arms. Such assignment strategies increase the risk of statistical hypothesis tests identifying a difference between arms when there is not one, and failing to conclude there is a difference in arms when there truly is one. We tackle this by introducing a novel heuristic algorithm, called TS-PostDiff (Posterior Probability of Difference). TS-PostDiff takes a Bayesian approach to mixing TS and Uniform Random (UR): the probability a participant is assigned using UR allocation is the posterior probability that the difference between two arms is 'small' (below a certain threshold), allowing for more UR exploration when there is little or no reward to be gained. We evaluate TS-PostDiff against state-of-the-art strategies. The empirical and simulation results help characterize the trade-offs of these approaches between reward, False Positive Rate (FPR), and statistical power, as well as under which circumstances each is effective. We quantify the advantage of TS-PostDiff in performing well across multiple differences in arm means (effect sizes), showing the benefits of adaptively changing randomization/exploration in TS in a "Statistically Considerate" manner: reducing FPR and increasing statistical power when differences are small or zero and there is less reward to be gained, while exploiting more when differences may be large. This highlights important considerations for future algorithm development and analysis to better balance reward and statistical analysis.

Submitted 23 November, 2022; v1 submitted 15 December, 2021; originally announced December 2021.
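
The mixing rule stated in the TS-PostDiff abstract above (use Uniform Random with probability equal to the posterior probability that the arm difference is below a threshold, otherwise use Thompson Sampling) can be sketched roughly as follows. This is only an illustration, not the authors' implementation: the two-arm Bernoulli reward model, the Beta(1, 1) priors, the Monte Carlo estimate of the posterior probability, and the names `posterior_prob_small_diff` and `ts_postdiff_assign` are all assumptions made for the example.

```python
import random


def posterior_prob_small_diff(a0, b0, a1, b1, threshold, n_draws=500, rng=random):
    """Monte Carlo estimate of P(|p0 - p1| < threshold) under independent
    Beta(a0, b0) and Beta(a1, b1) posteriors (assumed Bernoulli rewards)."""
    hits = 0
    for _ in range(n_draws):
        p0 = rng.betavariate(a0, b0)
        p1 = rng.betavariate(a1, b1)
        if abs(p0 - p1) < threshold:
            hits += 1
    return hits / n_draws


def ts_postdiff_assign(successes, failures, threshold, rng=random):
    """Pick arm 0 or 1 for the next participant (hypothetical helper).

    successes/failures are length-2 lists of observed counts per arm.
    Beta(1, 1) priors are an assumption of this sketch.
    """
    a = [s + 1 for s in successes]
    b = [f + 1 for f in failures]
    # Probability of falling back to Uniform Random allocation.
    phi = posterior_prob_small_diff(a[0], b[0], a[1], b[1], threshold, rng=rng)
    if rng.random() < phi:
        return rng.randrange(2)  # Uniform Random step
    # Thompson Sampling step: draw from each posterior, pick the larger draw.
    draws = [rng.betavariate(a[i], b[i]) for i in range(2)]
    return 0 if draws[0] >= draws[1] else 1
```

With a threshold of zero the rule reduces to plain Thompson Sampling, and with a very large threshold it reduces to Uniform Random allocation, which is one way to sanity-check the mixing behavior.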

arXiv:2111.14069 [pdf, other]

Escape saddle points by a simple gradient-descent based algorithm

Authors: Chenyi Zhang, Tongyang Li

Abstract: Escaping saddle points is a central research topic in nonconvex optimization. In this paper, we propose a simple gradient-based algorithm such that for a smooth function $f\colon\mathbb{R}^n\to\mathbb{R}$, it outputs an $ε$-approximate second-order stationary point in $\tilde{O}(\log n/ε^{1.75})$ iterations. Compared to the previous state-of-the-art algorithms by Jin et al. with $\tilde{O}((\log n)^{4}/ε^{2})$ or $\tilde{O}((\log n)^{6}/ε^{1.75})$ iterations, our algorithm is polynomially better in terms of $\log n$ and matches their complexities in terms of $1/ε$. For the stochastic setting, our algorithm outputs an $ε$-approximate second-order stationary point in $\tilde{O}((\log n)^{2}/ε^{4})$ iterations. Technically, our main contribution is an idea of implementing a robust Hessian power method using only gradients, which can find negative curvature near saddle points and achieve the polynomial speedup in $\log n$ compared to the perturbed gradient descent methods. Finally, we also perform numerical experiments that support our results.

Submitted 28 November, 2021; originally announced November 2021.

Comments: 34 pages, 8 figures, to appear in the 35th Conference on Neural Information Processing Systems (NeurIPS 2021)

arXiv:2111.03721 [pdf, other]

Compressed spectral screening for large-scale differential correlation analysis with application in selecting Glioblastoma gene modules

Authors: Tianxi Li, Xiwei Tang, Ajay Chatrath

Abstract: Differential co-expression analysis has been widely applied by scientists in understanding the biological mechanisms of diseases. However, the unknown differential patterns are often complicated; thus, models based on simplified parametric assumptions can be ineffective in identifying the differences. Meanwhile, the gene expression data involved in such analysis are in extremely high dimensions by nature, whose correlation matrices may not even be computable. Such a large scale seriously limits the application of most well-studied statistical methods. This paper introduces a simple yet powerful approach to the differential correlation analysis problem called compressed spectral screening. By leveraging spectral structures and random sampling techniques, our approach could achieve a highly accurate screening of features with complicated differential patterns while maintaining the scalability to analyze correlation matrices of $10^4$--$10^5$ variables within a few minutes on a standard personal computer. We have applied this screening approach in comparing a TCGA data set about Glioblastoma with normal subjects. Our analysis successfully identifies multiple functional modules of genes that exhibit different co-expression patterns. The findings reveal new insights about Glioblastoma's evolving mechanism. The validity of our approach is also justified by a theoretical analysis, showing that the compressed spectral analysis can achieve variable screening consistency.

Submitted 12 January, 2022; v1 submitted 5 November, 2021; originally announced November 2021.

arXiv:2109.06141 [pdf, other]

On Tilted Losses in Machine Learning: Theory and Applications

Authors: Tian Li, Ahmad Beirami, Maziar Sanjabi, Virginia Smith

Abstract: Exponential tilting is a technique commonly used in fields such as statistics, probability, information theory, and optimization to create parametric distribution shifts. Despite its prevalence in related fields, tilting has not seen widespread use in machine learning. In this work, we aim to bridge this gap by exploring the use of tilting in risk minimization. We study a simple extension to ERM -- tilted empirical risk minimization (TERM) -- which uses exponential tilting to flexibly tune the impact of individual losses. The resulting framework has several useful properties: We show that TERM can increase or decrease the influence of outliers, respectively, to enable fairness or robustness; has variance-reduction properties that can benefit generalization; and can be viewed as a smooth approximation to the tail probability of losses. Our work makes rigorous connections between TERM and related objectives, such as Value-at-Risk, Conditional Value-at-Risk, and distributionally robust optimization (DRO). We develop batch and stochastic first-order optimization methods for solving TERM, provide convergence guarantees for the solvers, and show that the framework can be efficiently solved relative to common alternatives. Finally, we demonstrate that TERM can be used for a multitude of applications in machine learning, such as enforcing fairness between subgroups, mitigating the effect of outliers, and handling class imbalance. Despite the straightforward modification TERM makes to traditional ERM objectives, we find that the framework can consistently outperform ERM and deliver competitive performance with state-of-the-art, problem-specific approaches.

Submitted 1 June, 2023; v1 submitted 13 September, 2021; originally announced September 2021.

Comments: arXiv admin note: substantial text overlap with arXiv:2007.01162

arXiv:2109.00171 [pdf]

A generalized bootstrap procedure of the standard error and confidence interval estimation for inverse probability of treatment weighting

Authors: Tenglong Li, Jordan Lawson

Abstract: The inverse probability of treatment weighting (IPTW) approach is commonly used in propensity score analysis to infer causal effects in regression models. Due to oversized IPTW weights and errors associated with propensity score estimation, the IPTW approach can underestimate the standard error of causal effect. To remediate this, bootstrap standard errors have been recommended to replace the IPTW standard error, but the ordinary bootstrap (OB) procedure might still result in underestimation of the standard error because of its inefficient sampling algorithm and un-stabilized weights. In this paper, we develop a generalized bootstrap (GB) procedure for estimating the standard error of the IPTW approach. Compared with the OB procedure, the GB procedure has much lower risk of underestimating the standard error and is more efficient for both point and standard error estimates. The GB procedure also has smaller risk of standard error underestimation than the ordinary bootstrap procedure with trimmed weights, with comparable efficiencies. We demonstrate the effectiveness of the GB procedure via a simulation study and a dataset from the National Educational Longitudinal Study-1988 (NELS-88).

Submitted 31 August, 2021; originally announced September 2021.

Showing 1–50 of 154 results for author: Li, T