Search | arXiv e-print repository

On Statistical Rates and Provably Efficient Criteria of Latent Diffusion Transformers (DiTs)

Authors: Jerry Yao-Chieh Hu, Weimin Wu, Zhuoru Li, Zhao Song, Han Liu

Abstract: We investigate the statistical and computational limits of latent \textbf{Di}ffusion \textbf{T}ransformers (\textbf{DiT}s) under the low-dimensional linear latent space assumption. Statistically, we study the universal approximation and sample complexity of the DiTs score function, as well as the distribution recovery property of the initial data. Specifically, under mild data assumptions, we deri… ▽ More We investigate the statistical and computational limits of latent \textbf{Di}ffusion \textbf{T}ransformers (\textbf{DiT}s) under the low-dimensional linear latent space assumption. Statistically, we study the universal approximation and sample complexity of the DiTs score function, as well as the distribution recovery property of the initial data. Specifically, under mild data assumptions, we derive an approximation error bound for the score network of latent DiTs, which is sub-linear in the latent space dimension. Additionally, we derive the corresponding sample complexity bound and show that the data distribution generated from the estimated score function converges toward a proximate area of the original one. Computationally, we characterize the hardness of both forward inference and backward computation of latent DiTs, assuming the Strong Exponential Time Hypothesis (SETH). For forward inference, we identify efficient criteria for all possible latent DiTs inference algorithms and showcase our theory by pushing the efficiency toward almost-linear time inference. For backward computation, we leverage the low-rank structure within the gradient computation of DiTs training for possible algorithmic speedup. Specifically, we show that such speedup achieves almost-linear time latent DiTs training by casting the DiTs gradient as a series of chained low-rank approximations with bounded error. Under the low-dimensional assumption, we show that the convergence rate and the computational efficiency are both dominated by the dimension of the subspace, suggesting that latent DiTs have the potential to bypass the challenges associated with the high dimensionality of initial data. △ Less

Submitted 1 July, 2024; originally announced July 2024.

arXiv:2406.19049 [pdf, other]

Accuracy on the wrong line: On the pitfalls of noisy data for out-of-distribution generalisation

Authors: Amartya Sanyal, Yaxi Hu, Yaodong Yu, Yian Ma, Yixin Wang, Bernhard Schölkopf

Abstract: "Accuracy-on-the-line" is a widely observed phenomenon in machine learning, where a model's accuracy on in-distribution (ID) and out-of-distribution (OOD) data is positively correlated across different hyperparameters and data configurations. But when does this useful relationship break down? In this work, we explore its robustness. The key observation is that noisy data and the presence of nuisan… ▽ More "Accuracy-on-the-line" is a widely observed phenomenon in machine learning, where a model's accuracy on in-distribution (ID) and out-of-distribution (OOD) data is positively correlated across different hyperparameters and data configurations. But when does this useful relationship break down? In this work, we explore its robustness. The key observation is that noisy data and the presence of nuisance features can be sufficient to shatter the Accuracy-on-the-line phenomenon. In these cases, ID and OOD accuracy can become negatively correlated, leading to "Accuracy-on-the-wrong-line". This phenomenon can also occur in the presence of spurious (shortcut) features, which tend to overshadow the more complex signal (core, non-spurious) features, resulting in a large nuisance feature space. Moreover, scaling to larger datasets does not mitigate this undesirable behavior and may even exacerbate it. We formally prove a lower bound on Out-of-distribution (OOD) error in a linear classification model, characterizing the conditions on the noise and nuisance features for a large OOD error. We finally demonstrate this phenomenon across both synthetic and real datasets with noisy data and nuisance features. △ Less

Submitted 27 June, 2024; originally announced June 2024.

arXiv:2406.14380 [pdf, other]

Estimating Treatment Effects under Recommender Interference: A Structured Neural Networks Approach

Authors: Ruohan Zhan, Shichao Han, Yuchen Hu, Zhenling Jiang

Abstract: Recommender systems are essential for content-sharing platforms by curating personalized content. To evaluate updates to recommender systems targeting content creators, platforms frequently rely on creator-side randomized experiments. The treatment effect measures the change in outcomes when a new algorithm is implemented compared to the status quo. We show that the standard difference-in-means es… ▽ More Recommender systems are essential for content-sharing platforms by curating personalized content. To evaluate updates to recommender systems targeting content creators, platforms frequently rely on creator-side randomized experiments. The treatment effect measures the change in outcomes when a new algorithm is implemented compared to the status quo. We show that the standard difference-in-means estimator can lead to biased estimates due to recommender interference that arises when treated and control creators compete for exposure. We propose a "recommender choice model" that describes which item gets exposed from a pool containing both treated and control items. By combining a structural choice model with neural networks, this framework directly models the interference pathway while accounting for rich viewer-content heterogeneity. We construct a debiased estimator of the treatment effect and prove it is $\sqrt n$-consistent and asymptotically normal with potentially correlated samples. We validate our estimator's empirical performance with a field experiment on Weixin short-video platform. In addition to the standard creator-side experiment, we conduct a costly double-sided randomization design to obtain a benchmark estimate free from interference bias. We show that the proposed estimator yields results comparable to the benchmark, whereas the standard difference-in-means estimator can exhibit significant bias and even produce reversed signs. △ Less

Submitted 3 July, 2024; v1 submitted 20 June, 2024; originally announced June 2024.

arXiv:2406.03136 [pdf, ps, other]

Computational Limits of Low-Rank Adaptation (LoRA) for Transformer-Based Models

Authors: Jerry Yao-Chieh Hu, Maojiang Su, En-Jui Kuo, Zhao Song, Han Liu

Abstract: We study the computational limits of Low-Rank Adaptation (LoRA) update for finetuning transformer-based models using fine-grained complexity theory. Our key observation is that the existence of low-rank decompositions within the gradient computation of LoRA adaptation leads to possible algorithmic speedup. This allows us to (i) identify a phase transition behavior and (ii) prove the existence of n… ▽ More We study the computational limits of Low-Rank Adaptation (LoRA) update for finetuning transformer-based models using fine-grained complexity theory. Our key observation is that the existence of low-rank decompositions within the gradient computation of LoRA adaptation leads to possible algorithmic speedup. This allows us to (i) identify a phase transition behavior and (ii) prove the existence of nearly linear algorithms by controlling the LoRA update computation term by term, assuming the Strong Exponential Time Hypothesis (SETH). For the former, we identify a sharp transition in the efficiency of all possible rank-$r$ LoRA update algorithms for transformers, based on specific norms resulting from the multiplications of the input sequence $\mathbf{X}$, pretrained weights $\mathbf{W^\star}$, and adapter matrices $α\mathbf{B} \mathbf{A} / r$. Specifically, we derive a shared upper bound threshold for such norms and show that efficient (sub-quadratic) approximation algorithms of LoRA exist only below this threshold. For the latter, we prove the existence of nearly linear approximation algorithms for LoRA adaptation by utilizing the hierarchical low-rank structures of LoRA gradients and approximating the gradients with a series of chained low-rank approximations. To showcase our theory, we consider two practical scenarios: partial (e.g., only $\mathbf{W}_V$ and $\mathbf{W}_Q$) and full adaptations (e.g., $\mathbf{W}_Q$, $\mathbf{W}_V$, and $\mathbf{W}_K$) of weights in attention heads. △ Less

Submitted 5 June, 2024; originally announced June 2024.

arXiv:2406.01575 [pdf, other]

Stochastic Bilevel Optimization with Lower-Level Contextual Markov Decision Processes

Authors: Vinzenz Thoma, Barna Pasztor, Andreas Krause, Giorgia Ramponi, Yifan Hu

Abstract: In various applications, the optimal policy in a strategic decision-making problem depends both on the environmental configuration and exogenous events. For these settings, we introduce Bilevel Optimization with Contextual Markov Decision Processes (BO-CMDP), a stochastic bilevel decision-making model, where the lower level consists of solving a contextual Markov Decision Process (CMDP). BO-CMDP c… ▽ More In various applications, the optimal policy in a strategic decision-making problem depends both on the environmental configuration and exogenous events. For these settings, we introduce Bilevel Optimization with Contextual Markov Decision Processes (BO-CMDP), a stochastic bilevel decision-making model, where the lower level consists of solving a contextual Markov Decision Process (CMDP). BO-CMDP can be viewed as a Stackelberg Game where the leader and a random context beyond the leader's control together decide the setup of (many) MDPs that (potentially multiple) followers best respond to. This framework extends beyond traditional bilevel optimization and finds relevance in diverse fields such as model design for MDPs, tax design, reward sha** and dynamic mechanism design. We propose a stochastic Hyper Policy Gradient Descent (HPGD) algorithm to solve BO-CMDP, and demonstrate its convergence. Notably, HPGD only utilizes observations of the followers' trajectories. Therefore, it allows followers to use any training procedure and the leader to be agnostic of the specific algorithm used, which aligns with various real-world scenarios. We further consider the setting when the leader can influence the training of followers and propose an accelerated algorithm. We empirically demonstrate the performance of our algorithm. △ Less

Submitted 3 June, 2024; originally announced June 2024.

Comments: 54 pages, 18 Figures

arXiv:2405.19463 [pdf, other]

Stochastic Optimization Algorithms for Instrumental Variable Regression with Streaming Data

Authors: Xuxing Chen, Abhishek Roy, Yifan Hu, Krishnakumar Balasubramanian

Abstract: We develop and analyze algorithms for instrumental variable regression by viewing the problem as a conditional stochastic optimization problem. In the context of least-squares instrumental variable regression, our algorithms neither require matrix inversions nor mini-batches and provides a fully online approach for performing instrumental variable regression with streaming data. When the true mode… ▽ More We develop and analyze algorithms for instrumental variable regression by viewing the problem as a conditional stochastic optimization problem. In the context of least-squares instrumental variable regression, our algorithms neither require matrix inversions nor mini-batches and provides a fully online approach for performing instrumental variable regression with streaming data. When the true model is linear, we derive rates of convergence in expectation, that are of order $\mathcal{O}(\log T/T)$ and $\mathcal{O}(1/T^{1-ι})$ for any $ι>0$, respectively under the availability of two-sample and one-sample oracles, respectively, where $T$ is the number of iterations. Importantly, under the availability of the two-sample oracle, our procedure avoids explicitly modeling and estimating the relationship between confounder and the instrumental variables, demonstrating the benefit of the proposed approach over recent works based on reformulating the problem as minimax optimization problems. Numerical experiments are provided to corroborate the theoretical results. △ Less

Submitted 29 May, 2024; originally announced May 2024.

arXiv:2405.16564 [pdf, ps, other]

Contextual Linear Optimization with Bandit Feedback

Authors: Yichun Hu, Nathan Kallus, Xiaojie Mao, Yanchen Wu

Abstract: Contextual linear optimization (CLO) uses predictive observations to reduce uncertainty in random cost coefficients and thereby improve average-cost performance. An example is a stochastic shortest path with random edge costs (e.g., traffic) and predictive features (e.g., lagged traffic, weather). Existing work on CLO assumes the data has fully observed cost coefficient vectors, but in many applic… ▽ More Contextual linear optimization (CLO) uses predictive observations to reduce uncertainty in random cost coefficients and thereby improve average-cost performance. An example is a stochastic shortest path with random edge costs (e.g., traffic) and predictive features (e.g., lagged traffic, weather). Existing work on CLO assumes the data has fully observed cost coefficient vectors, but in many applications, we can only see the realized cost of a historical decision, that is, just one projection of the random cost coefficient vector, to which we refer as bandit feedback. We study a class of algorithms for CLO with bandit feedback, which we term induced empirical risk minimization (IERM), where we fit a predictive model to directly optimize the downstream performance of the policy it induces. We show a fast-rate regret bound for IERM that allows for misspecified model classes and flexible choices of the optimization estimate, and we develop computationally tractable surrogate losses. A byproduct of our theory of independent interest is fast-rate regret bound for IERM with full feedback and misspecified policy class. We compare the performance of different modeling choices numerically using a stochastic shortest path example and provide practical insights from the empirical results. △ Less

Submitted 26 May, 2024; originally announced May 2024.

arXiv:2404.03900 [pdf, other]

Nonparametric Modern Hopfield Models

Authors: Jerry Yao-Chieh Hu, Bo-Yu Chen, Dennis Wu, Feng Ruan, Han Liu

Abstract: We present a nonparametric construction for deep learning compatible modern Hopfield models and utilize this framework to debut an efficient variant. Our key contribution stems from interpreting the memory storage and retrieval processes in modern Hopfield models as a nonparametric regression problem subject to a set of query-memory pairs. Crucially, our framework not only recovers the known resul… ▽ More We present a nonparametric construction for deep learning compatible modern Hopfield models and utilize this framework to debut an efficient variant. Our key contribution stems from interpreting the memory storage and retrieval processes in modern Hopfield models as a nonparametric regression problem subject to a set of query-memory pairs. Crucially, our framework not only recovers the known results from the original dense modern Hopfield model but also fills the void in the literature regarding efficient modern Hopfield models, by introducing \textit{sparse-structured} modern Hopfield models with sub-quadratic complexity. We establish that this sparse model inherits the appealing theoretical properties of its dense analogue -- connection with transformer attention, fixed point convergence and exponential memory capacity -- even without knowing details of the Hopfield energy function. Additionally, we showcase the versatility of our framework by constructing a family of modern Hopfield models as extensions, including linear, random masked, top-$K$ and positive random feature modern Hopfield models. Empirically, we validate the efficacy of our framework in both synthetic and realistic settings. △ Less

Submitted 5 April, 2024; originally announced April 2024.

Comments: 59 pages; Code available at https://github.com/MAGICS-LAB/NonparametricHopfield

arXiv:2404.03830 [pdf, other]

BiSHop: Bi-Directional Cellular Learning for Tabular Data with Generalized Sparse Modern Hopfield Model

Authors: Chenwei Xu, Yu-Chao Huang, Jerry Yao-Chieh Hu, Weijian Li, Ammar Gilani, Hsi-Sheng Goan, Han Liu

Abstract: We introduce the \textbf{B}i-Directional \textbf{S}parse \textbf{Hop}field Network (\textbf{BiSHop}), a novel end-to-end framework for deep tabular learning. BiSHop handles the two major challenges of deep tabular learning: non-rotationally invariant data structure and feature sparsity in tabular data. Our key motivation comes from the recent established connection between associative memory and a… ▽ More We introduce the \textbf{B}i-Directional \textbf{S}parse \textbf{Hop}field Network (\textbf{BiSHop}), a novel end-to-end framework for deep tabular learning. BiSHop handles the two major challenges of deep tabular learning: non-rotationally invariant data structure and feature sparsity in tabular data. Our key motivation comes from the recent established connection between associative memory and attention mechanisms. Consequently, BiSHop uses a dual-component approach, sequentially processing data both column-wise and row-wise through two interconnected directional learning modules. Computationally, these modules house layers of generalized sparse modern Hopfield layers, a sparse extension of the modern Hopfield model with adaptable sparsity. Methodologically, BiSHop facilitates multi-scale representation learning, capturing both intra-feature and inter-feature interactions, with adaptive sparsity at each scale. Empirically, through experiments on diverse real-world datasets, we demonstrate that BiSHop surpasses current SOTA methods with significantly less HPO runs, marking it a robust solution for deep tabular learning. △ Less

Submitted 4 April, 2024; originally announced April 2024.

Comments: 40 page; Code available at https://github.com/MAGICS-LAB/BiSHop

arXiv:2404.03828 [pdf, other]

Outlier-Efficient Hopfield Layers for Large Transformer-Based Models

Authors: Jerry Yao-Chieh Hu, Pei-Hsuan Chang, Robin Luo, Hong-Yu Chen, Weijian Li, Wei-Po Wang, Han Liu

Abstract: We introduce an Outlier-Efficient Modern Hopfield Model (termed $\mathrm{OutEffHop}$) and use it to address the outlier inefficiency problem of {training} gigantic transformer-based models. Our main contribution is a novel associative memory model facilitating \textit{outlier-efficient} associative memory retrievals. Interestingly, this memory model manifests a model-based interpretation of an out… ▽ More We introduce an Outlier-Efficient Modern Hopfield Model (termed $\mathrm{OutEffHop}$) and use it to address the outlier inefficiency problem of {training} gigantic transformer-based models. Our main contribution is a novel associative memory model facilitating \textit{outlier-efficient} associative memory retrievals. Interestingly, this memory model manifests a model-based interpretation of an outlier-efficient attention mechanism (${\rm Softmax}_1$): it is an approximation of the memory retrieval process of $\mathrm{OutEffHop}$. Methodologically, this allows us to introduce novel outlier-efficient Hopfield layers as powerful alternatives to traditional attention mechanisms, with superior post-quantization performance. Theoretically, the Outlier-Efficient Modern Hopfield Model retains and improves the desirable properties of standard modern Hopfield models, including fixed point convergence and exponential storage capacity. Empirically, we demonstrate the efficacy of the proposed model across large-scale transformer-based and Hopfield-based models (including BERT, OPT, ViT, and STanHop-Net), benchmarking against state-of-the-art methods like $\mathtt{Clipped\_Softmax}$ and $\mathtt{Gated\_Attention}$. Notably, $\mathrm{OutEffHop}$ achieves an average reduction of 22+\% in average kurtosis and 26+\% in the maximum infinity norm of model outputs across four models. Code is available at \href{https://github.com/MAGICS-LAB/OutEffHop}{GitHub}; models are on \href{https://huggingface.co/collections/magicslabnu/outeffhop-6610fcede8d2cda23009a98f}{Hugging Face Hub}; future updates are on \href{https://arxiv.longhoe.net/abs/2404.03828}{arXiv}. △ Less

Submitted 26 June, 2024; v1 submitted 4 April, 2024; originally announced April 2024.

Comments: Accepted at ICML 2024; v2 updated to camera-ready version; Code available at https://github.com/MAGICS-LAB/OutEffHop; Models are on Hugging Face: https://huggingface.co/collections/magicslabnu/outeffhop-6610fcede8d2cda23009a98f

arXiv:2404.03827 [pdf, other]

Uniform Memory Retrieval with Larger Capacity for Modern Hopfield Models

Authors: Dennis Wu, Jerry Yao-Chieh Hu, Teng-Yun Hsiao, Han Liu

Abstract: We propose a two-stage memory retrieval dynamics for modern Hopfield models, termed $\mathtt{U\text{-}Hop}$, with enhanced memory capacity. Our key contribution is a learnable feature map $Φ$ which transforms the Hopfield energy function into kernel space. This transformation ensures convergence between the local minima of energy and the fixed points of retrieval dynamics within the kernel space.… ▽ More We propose a two-stage memory retrieval dynamics for modern Hopfield models, termed $\mathtt{U\text{-}Hop}$, with enhanced memory capacity. Our key contribution is a learnable feature map $Φ$ which transforms the Hopfield energy function into kernel space. This transformation ensures convergence between the local minima of energy and the fixed points of retrieval dynamics within the kernel space. Consequently, the kernel norm induced by $Φ$ serves as a novel similarity measure. It utilizes the stored memory patterns as learning data to enhance memory capacity across all modern Hopfield models. Specifically, we accomplish this by constructing a separation loss $\mathcal{L}_Φ$ that separates the local minima of kernelized energy by separating stored memory patterns in kernel space. Methodologically, $\mathtt{U\text{-}Hop}$ memory retrieval process consists of: (Stage I) minimizing separation loss for a more uniform memory (local minimum) distribution, followed by (Stage II) standard Hopfield energy minimization for memory retrieval. This results in a significant reduction of possible metastable states in the Hopfield energy function, thus enhancing memory capacity by preventing memory confusion. Empirically, with real-world datasets, we demonstrate that $\mathtt{U\text{-}Hop}$ outperforms all existing modern Hopfield models and state-of-the-art similarity measures, achieving substantial improvements in both associative memory retrieval and deep learning tasks. Code is available at https://github.com/MAGICS-LAB/UHop ; future updates are on arXiv:2404.03827 △ Less

Submitted 12 June, 2024; v1 submitted 4 April, 2024; originally announced April 2024.

Comments: Accepted at ICML 2024; v2 updated to camera-ready version; Code available at https://github.com/MAGICS-LAB/UHop

arXiv:2403.13041 [pdf, other]

Provable Privacy with Non-Private Pre-Processing

Authors: Yaxi Hu, Amartya Sanyal, Bernhard Schölkopf

Abstract: When analysing Differentially Private (DP) machine learning pipelines, the potential privacy cost of data-dependent pre-processing is frequently overlooked in privacy accounting. In this work, we propose a general framework to evaluate the additional privacy cost incurred by non-private data-dependent pre-processing algorithms. Our framework establishes upper bounds on the overall privacy guarante… ▽ More When analysing Differentially Private (DP) machine learning pipelines, the potential privacy cost of data-dependent pre-processing is frequently overlooked in privacy accounting. In this work, we propose a general framework to evaluate the additional privacy cost incurred by non-private data-dependent pre-processing algorithms. Our framework establishes upper bounds on the overall privacy guarantees by utilising two new technical notions: a variant of DP termed Smooth DP and the bounded sensitivity of the pre-processing algorithms. In addition to the generic framework, we provide explicit overall privacy guarantees for multiple data-dependent pre-processing algorithms, such as data imputation, quantization, deduplication and PCA, when used in combination with several DP algorithms. Notably, this framework is also simple to implement, allowing direct integration into existing DP pipelines. △ Less

Submitted 21 June, 2024; v1 submitted 19 March, 2024; originally announced March 2024.

arXiv:2403.01386 [pdf, other]

Minimax-Regret Sample Selection in Randomized Experiments

Authors: Yuchen Hu, Henry Zhu, Emma Brunskill, Stefan Wager

Abstract: Randomized controlled trials are often run in settings with many subpopulations that may have differential benefits from the treatment being evaluated. We consider the problem of sample selection, i.e., whom to enroll in a randomized trial, such as to optimize welfare in a heterogeneous population. We formalize this problem within the minimax-regret framework, and derive optimal sample-selection s… ▽ More Randomized controlled trials are often run in settings with many subpopulations that may have differential benefits from the treatment being evaluated. We consider the problem of sample selection, i.e., whom to enroll in a randomized trial, such as to optimize welfare in a heterogeneous population. We formalize this problem within the minimax-regret framework, and derive optimal sample-selection schemes under a variety of conditions. Using data from a COVID-19 vaccine trial, we also highlight how different objectives and decision rules can lead to meaningfully different guidance regarding optimal sample allocation. △ Less

Submitted 25 June, 2024; v1 submitted 2 March, 2024; originally announced March 2024.

arXiv:2402.04520 [pdf, ps, other]

On Computational Limits of Modern Hopfield Models: A Fine-Grained Complexity Analysis

Authors: Jerry Yao-Chieh Hu, Thomas Lin, Zhao Song, Han Liu

Abstract: We investigate the computational limits of the memory retrieval dynamics of modern Hopfield models from the fine-grained complexity analysis. Our key contribution is the characterization of a phase transition behavior in the efficiency of all possible modern Hopfield models based on the norm of patterns. Specifically, we establish an upper bound criterion for the norm of input query patterns and m… ▽ More We investigate the computational limits of the memory retrieval dynamics of modern Hopfield models from the fine-grained complexity analysis. Our key contribution is the characterization of a phase transition behavior in the efficiency of all possible modern Hopfield models based on the norm of patterns. Specifically, we establish an upper bound criterion for the norm of input query patterns and memory patterns. Only below this criterion, sub-quadratic (efficient) variants of the modern Hopfield model exist, assuming the Strong Exponential Time Hypothesis (SETH). To showcase our theory, we provide a formal example of efficient constructions of modern Hopfield models using low-rank approximation when the efficient criterion holds. This includes a derivation of a lower bound on the computational time, scaling linearly with $\max\{$# of stored memory patterns, length of input query sequence$\}$. In addition, we prove its memory retrieval error bound and exponential memory capacity. △ Less

Submitted 31 May, 2024; v1 submitted 6 February, 2024; originally announced February 2024.

Comments: Accepted at ICML 2024; v2 corrected typos; v3 added clarifications and references; v4,5 updated to camera-ready version

arXiv:2401.00521 [pdf, other]

Multi-spatial Multi-temporal Air Quality Forecasting with Integrated Monitoring and Reanalysis Data

Authors: Yuxiao Hu, Qian Li, Xiaodan Shi, **yue Yan, Yuntian Chen

Abstract: Accurate air quality forecasting is crucial for public health, environmental monitoring and protection, and urban planning. However, existing methods fail to effectively utilize multi-scale information, both spatially and temporally. Spatially, there is a lack of integration between individual monitoring stations and city-wide scales. Temporally, the periodic nature of air quality variations is of… ▽ More Accurate air quality forecasting is crucial for public health, environmental monitoring and protection, and urban planning. However, existing methods fail to effectively utilize multi-scale information, both spatially and temporally. Spatially, there is a lack of integration between individual monitoring stations and city-wide scales. Temporally, the periodic nature of air quality variations is often overlooked or inadequately considered. To address these limitations, we present a novel Multi-spatial Multi-temporal air quality forecasting method based on Graph Convolutional Networks and Gated Recurrent Units (M2G2), bridging the gap in air quality forecasting across spatial and temporal scales. The proposed framework consists of two modules: Multi-scale Spatial GCN (MS-GCN) for spatial information fusion and Multi-scale Temporal GRU(MT-GRU) for temporal information integration. In the spatial dimension, the MS-GCN module employs a bidirectional learnable structure and a residual structure, enabling comprehensive information exchange between individual monitoring stations and the city-scale graph. Regarding the temporal dimension, the MT-GRU module adaptively combines information from different temporal scales through parallel hidden states. Leveraging meteorological indicators and four air quality indicators, we present comprehensive comparative analyses and ablation experiments, showcasing the higher accuracy of M2G2 in comparison to nine currently available advanced approaches across all aspects. The improvements of M2G2 over the second-best method on RMSE of the 24h/48h/72h are as follows: PM2.5: (7.72%, 6.67%, 10.45%); PM10: (6.43%, 5.68%, 7.73%); NO2: (5.07%, 7.76%, 16.60%); O3: (6.46%, 6.86%, 9.79%). Furthermore, we demonstrate the effectiveness of each module of M2G2 by ablation study. △ Less

Submitted 31 December, 2023; originally announced January 2024.

arXiv:2312.17346 [pdf, other]

STanHop: Sparse Tandem Hopfield Model for Memory-Enhanced Time Series Prediction

Authors: Dennis Wu, Jerry Yao-Chieh Hu, Weijian Li, Bo-Yu Chen, Han Liu

Abstract: We present STanHop-Net (Sparse Tandem Hopfield Network) for multivariate time series prediction with memory-enhanced capabilities. At the heart of our approach is STanHop, a novel Hopfield-based neural network block, which sparsely learns and stores both temporal and cross-series representations in a data-dependent fashion. In essence, STanHop sequentially learn temporal representation and cross-s… ▽ More We present STanHop-Net (Sparse Tandem Hopfield Network) for multivariate time series prediction with memory-enhanced capabilities. At the heart of our approach is STanHop, a novel Hopfield-based neural network block, which sparsely learns and stores both temporal and cross-series representations in a data-dependent fashion. In essence, STanHop sequentially learn temporal representation and cross-series representation using two tandem sparse Hopfield layers. In addition, StanHop incorporates two additional external memory modules: a Plug-and-Play module and a Tune-and-Play module for train-less and task-aware memory-enhancements, respectively. They allow StanHop-Net to swiftly respond to certain sudden events. Methodologically, we construct the StanHop-Net by stacking STanHop blocks in a hierarchical fashion, enabling multi-resolution feature extraction with resolution-specific sparsity. Theoretically, we introduce a sparse extension of the modern Hopfield model (Generalized Sparse Modern Hopfield Model) and show that it endows a tighter memory retrieval error compared to the dense counterpart without sacrificing memory capacity. Empirically, we validate the efficacy of our framework on both synthetic and real-world settings. △ Less

Submitted 28 December, 2023; originally announced December 2023.

arXiv:2312.13482 [pdf, other]

Spatially Adaptive Variable Screening in Presurgical fMRI Data Analysis

Authors: Yifei Hu, Xinge Jessie Jeng

Abstract: Accurate delineation of tumor-adjacent functional brain regions is essential for planning function-preserving neurosurgery. Functional magnetic resonance imaging (fMRI) is increasingly used for presurgical counseling and planning. When analyzing presurgical fMRI data, false negatives are more dangerous to the patients than false positives because patients are more likely to experience significant… ▽ More Accurate delineation of tumor-adjacent functional brain regions is essential for planning function-preserving neurosurgery. Functional magnetic resonance imaging (fMRI) is increasingly used for presurgical counseling and planning. When analyzing presurgical fMRI data, false negatives are more dangerous to the patients than false positives because patients are more likely to experience significant harm from failing to identify functional regions and subsequently resecting critical tissues. In this paper, we propose a novel spatially adaptive variable screening procedure to enable effective control of false negatives while leveraging the spatial structure of fMRI data. Compared to existing statistical methods in fMRI data analysis, the new procedure directly controls false negatives at a desirable level and is completely data-driven. The new method is also substantially different from existing false-negative control procedures which do not take spatial information into account. Numerical examples show that the new method outperforms several state-of-the-art methods in retaining signal voxels, especially the subtle ones at the boundaries of functional regions, while providing cleaner separation of functional regions from background noise. Such results could be valuable to preserve critical tissues in neurosurgery. △ Less

Submitted 20 December, 2023; originally announced December 2023.

arXiv:2312.10796 [pdf, other]

Two sample test for covariance matrices in ultra-high dimension

Authors: Xiucai Ding, Yichen Hu, Zhenggang Wang

Abstract: In this paper, we propose a new test for testing the equality of two population covariance matrices in the ultra-high dimensional setting that the dimension is much larger than the sizes of both of the two samples. Our proposed methodology relies on a data splitting procedure and a comparison of a set of well selected eigenvalues of the sample covariance matrices on the split data sets. Compared t… ▽ More In this paper, we propose a new test for testing the equality of two population covariance matrices in the ultra-high dimensional setting that the dimension is much larger than the sizes of both of the two samples. Our proposed methodology relies on a data splitting procedure and a comparison of a set of well selected eigenvalues of the sample covariance matrices on the split data sets. Compared to the existing methods, our methodology is adaptive in the sense that (i). it does not require specific assumption (e.g., comparable or balancing, etc.) on the sizes of two samples; (ii). it does not need quantitative or structural assumptions of the population covariance matrices; (iii). it does not need the parametric distributions or the detailed knowledge of the moments of the two populations. Theoretically, we establish the asymptotic distributions of the statistics used in our method and conduct the power analysis. We justify that our method is powerful under very weak alternatives. We conduct extensive numerical simulations and show that our method significantly outperforms the existing ones both in terms of size and power. Analysis of two real data sets is also carried out to demonstrate the usefulness and superior performance of our proposed methodology. An $\texttt{R}$ package $\texttt{UHDtst}$ is developed for easy implementation of our proposed methodology. △ Less

Submitted 17 December, 2023; originally announced December 2023.

Comments: 43 pages, 1 figure

arXiv:2312.07067 [pdf, other]

Focus on Hiders: Exploring Hidden Threats for Enhancing Adversarial Training

Authors: Qian Li, Yuxiao Hu, Yinpeng Dong, Dongxiao Zhang, Yuntian Chen

Abstract: Adversarial training is often formulated as a min-max problem, however, concentrating only on the worst adversarial examples causes alternating repetitive confusion of the model, i.e., previously defended or correctly classified samples are not defensible or accurately classifiable in subsequent adversarial training. We characterize such non-ignorable samples as "hiders", which reveal the hidden h… ▽ More Adversarial training is often formulated as a min-max problem, however, concentrating only on the worst adversarial examples causes alternating repetitive confusion of the model, i.e., previously defended or correctly classified samples are not defensible or accurately classifiable in subsequent adversarial training. We characterize such non-ignorable samples as "hiders", which reveal the hidden high-risk regions within the secure area obtained through adversarial training and prevent the model from finding the real worst cases. We demand the model to prevent hiders when defending against adversarial examples for improving accuracy and robustness simultaneously. By rethinking and redefining the min-max optimization problem for adversarial training, we propose a generalized adversarial training algorithm called Hider-Focused Adversarial Training (HFAT). HFAT introduces the iterative evolution optimization strategy to simplify the optimization problem and employs an auxiliary model to reveal hiders, effectively combining the optimization directions of standard adversarial training and prevention hiders. Furthermore, we introduce an adaptive weighting mechanism that facilitates the model in adaptively adjusting its focus between adversarial examples and hiders during different training periods. We demonstrate the effectiveness of our method based on extensive experiments, and ensure that HFAT can provide higher robustness and accuracy. △ Less

Submitted 12 December, 2023; originally announced December 2023.

arXiv:2312.05758 [pdf, other]

CLeaRForecast: Contrastive Learning of High-Purity Representations for Time Series Forecasting

Authors: Jiaxin Gao, Yuxiao Hu, Qinglong Cao, Siqi Dai, Yuntian Chen

Abstract: Time series forecasting (TSF) holds significant importance in modern society, spanning numerous domains. Previous representation learning-based TSF algorithms typically embrace a contrastive learning paradigm featuring segregated trend-periodicity representations. Yet, these methodologies disregard the inherent high-impact noise embedded within time series data, resulting in representation inaccur… ▽ More Time series forecasting (TSF) holds significant importance in modern society, spanning numerous domains. Previous representation learning-based TSF algorithms typically embrace a contrastive learning paradigm featuring segregated trend-periodicity representations. Yet, these methodologies disregard the inherent high-impact noise embedded within time series data, resulting in representation inaccuracies and seriously demoting the forecasting performance. To address this issue, we propose CLeaRForecast, a novel contrastive learning framework to learn high-purity time series representations with proposed sample, feature, and architecture purifying methods. More specifically, to avoid more noise adding caused by the transformations of original samples (series), transformations are respectively applied for trendy and periodic parts to provide better positive samples with obviously less noise. Moreover, we introduce a channel independent training manner to mitigate noise originating from unrelated variables in the multivariate series. By employing a streamlined deep-learning backbone and a comprehensive global contrastive loss function, we prevent noise introduction due to redundant or uneven learning of periodicity and trend. Experimental results show the superior performance of CLeaRForecast in various downstream TSF tasks. △ Less

Submitted 9 December, 2023; originally announced December 2023.

arXiv:2309.12673 [pdf, other]

On Sparse Modern Hopfield Model

Authors: Jerry Yao-Chieh Hu, Donglin Yang, Dennis Wu, Chenwei Xu, Bo-Yu Chen, Han Liu

Abstract: We introduce the sparse modern Hopfield model as a sparse extension of the modern Hopfield model. Like its dense counterpart, the sparse modern Hopfield model equips a memory-retrieval dynamics whose one-step approximation corresponds to the sparse attention mechanism. Theoretically, our key contribution is a principled derivation of a closed-form sparse Hopfield energy using the convex conjugate… ▽ More We introduce the sparse modern Hopfield model as a sparse extension of the modern Hopfield model. Like its dense counterpart, the sparse modern Hopfield model equips a memory-retrieval dynamics whose one-step approximation corresponds to the sparse attention mechanism. Theoretically, our key contribution is a principled derivation of a closed-form sparse Hopfield energy using the convex conjugate of the sparse entropic regularizer. Building upon this, we derive the sparse memory retrieval dynamics from the sparse energy function and show its one-step approximation is equivalent to the sparse-structured attention. Importantly, we provide a sparsity-dependent memory retrieval error bound which is provably tighter than its dense analog. The conditions for the benefits of sparsity to arise are therefore identified and discussed. In addition, we show that the sparse modern Hopfield model maintains the robust theoretical properties of its dense counterpart, including rapid fixed point convergence and exponential memory capacity. Empirically, we use both synthetic and real-world datasets to demonstrate that the sparse Hopfield model outperforms its dense counterpart in many situations. △ Less

Submitted 29 November, 2023; v1 submitted 22 September, 2023; originally announced September 2023.

Comments: 37 pages, accepted at NeurIPS 2023. [v2] updated to match with camera-ready version. Code is available at https://github.com/MAGICS-LAB/SparseModernHopfield

arXiv:2309.08429 [pdf, other]

IHT-Inspired Neural Network for Single-Snapshot DOA Estimation with Sparse Linear Arrays

Authors: Yunqiao Hu, Shunqiao Sun

Abstract: Single-snapshot direction-of-arrival (DOA) estimation using sparse linear arrays (SLAs) has gained significant attention in the field of automotive MIMO radars. This is due to the dynamic nature of automotive settings, where multiple snapshots aren't accessible, and the importance of minimizing hardware costs. Low-rank Hankel matrix completion has been proposed to interpolate the missing elements… ▽ More Single-snapshot direction-of-arrival (DOA) estimation using sparse linear arrays (SLAs) has gained significant attention in the field of automotive MIMO radars. This is due to the dynamic nature of automotive settings, where multiple snapshots aren't accessible, and the importance of minimizing hardware costs. Low-rank Hankel matrix completion has been proposed to interpolate the missing elements in SLAs. However, the solvers of matrix completion, such as iterative hard thresholding (IHT), heavily rely on expert knowledge of hyperparameter tuning and lack task-specificity. Besides, IHT involves truncated-singular value decomposition (t-SVD), which has high computational cost in each iteration. In this paper, we propose an IHT-inspired neural network for single-snapshot DOA estimation with SLAs, termed IHT-Net. We utilize a recurrent neural network structure to parameterize the IHT algorithm. Additionally, we integrate shallow-layer autoencoders to replace t-SVD, reducing computational overhead while generating a novel optimizer through supervised learning. IHT-Net maintains strong interpretability as its network layer operations align with the iterations of the IHT algorithm. The learned optimizer exhibits fast convergence and higher accuracy in the full array signal reconstruction followed by single-snapshot DOA estimation. Numerical results validate the effectiveness of the proposed method. △ Less

Submitted 15 September, 2023; originally announced September 2023.

Comments: 5 pages, 5 figures

arXiv:2309.02236 [pdf, other]

Distributionally Robust Model-based Reinforcement Learning with Large State Spaces

Authors: Shyam Sundhar Ramesh, Pier Giuseppe Sessa, Yifan Hu, Andreas Krause, Ilija Bogunovic

Abstract: Three major challenges in reinforcement learning are the complex dynamical systems with large state spaces, the costly data acquisition processes, and the deviation of real-world dynamics from the training environment deployment. To overcome these issues, we study distributionally robust Markov decision processes with continuous state spaces under the widely used Kullback-Leibler, chi-square, and… ▽ More Three major challenges in reinforcement learning are the complex dynamical systems with large state spaces, the costly data acquisition processes, and the deviation of real-world dynamics from the training environment deployment. To overcome these issues, we study distributionally robust Markov decision processes with continuous state spaces under the widely used Kullback-Leibler, chi-square, and total variation uncertainty sets. We propose a model-based approach that utilizes Gaussian Processes and the maximum variance reduction algorithm to efficiently learn multi-output nominal transition dynamics, leveraging access to a generative model (i.e., simulator). We further demonstrate the statistical sample complexity of the proposed method for different uncertainty sets. These complexity bounds are independent of the number of states and extend beyond linear dynamics, ensuring the effectiveness of our approach in identifying near-optimal distributionally-robust policies. The proposed method can be further combined with other model-free distributionally robust reinforcement learning methods to obtain a near-optimal robust policy. Experimental results demonstrate the robustness of our algorithm to distributional shifts and its superior performance in terms of the number of samples needed. △ Less

Submitted 5 September, 2023; originally announced September 2023.

Journal ref: AISTATS 2024

arXiv:2306.15616 [pdf, other]

Network-Adjusted Covariates for Community Detection

Authors: Yaofang Hu, Wanjie Wang

Abstract: Community detection is a crucial task in network analysis that can be significantly improved by incorporating subject-level information, i.e. covariates. However, current methods often struggle with selecting tuning parameters and analyzing low-degree nodes. In this paper, we introduce a novel method that addresses these challenges by constructing network-adjusted covariates, which leverage the ne… ▽ More Community detection is a crucial task in network analysis that can be significantly improved by incorporating subject-level information, i.e. covariates. However, current methods often struggle with selecting tuning parameters and analyzing low-degree nodes. In this paper, we introduce a novel method that addresses these challenges by constructing network-adjusted covariates, which leverage the network connections and covariates with a unique weight to each node based on the node's degree. Spectral clustering on network-adjusted covariates yields an exact recovery of community labels under certain conditions, which is tuning-free and computationally efficient. We present novel theoretical results about the strong consistency of our method under degree-corrected stochastic blockmodels with covariates, even in the presence of mis-specification and sparse communities with bounded degrees. Additionally, we establish a general lower bound for the community detection problem when both network and covariates are present, and it shows our method is optimal up to a constant factor. Our method outperforms existing approaches in simulations and a LastFM app user network, and provides interpretable community structures in a statistics publication citation network where $30\%$ of nodes are isolated. △ Less

Submitted 11 February, 2024; v1 submitted 27 June, 2023; originally announced June 2023.

Comments: 48 pages

MSC Class: 91D30; 62F12; 91C20

arXiv:2306.06252 [pdf, other]

Feature Programming for Multivariate Time Series Prediction

Authors: Alex Reneau, Jerry Yao-Chieh Hu, Chenwei Xu, Weijian Li, Ammar Gilani, Han Liu

Abstract: We introduce the concept of programmable feature engineering for time series modeling and propose a feature programming framework. This framework generates large amounts of predictive features for noisy multivariate time series while allowing users to incorporate their inductive bias with minimal effort. The key motivation of our framework is to view any multivariate time series as a cumulative su… ▽ More We introduce the concept of programmable feature engineering for time series modeling and propose a feature programming framework. This framework generates large amounts of predictive features for noisy multivariate time series while allowing users to incorporate their inductive bias with minimal effort. The key motivation of our framework is to view any multivariate time series as a cumulative sum of fine-grained trajectory increments, with each increment governed by a novel spin-gas dynamical Ising model. This fine-grained perspective motivates the development of a parsimonious set of operators that summarize multivariate time series in an abstract fashion, serving as the foundation for large-scale automated feature engineering. Numerically, we validate the efficacy of our method on several synthetic and real-world noisy time series datasets. △ Less

Submitted 9 June, 2023; originally announced June 2023.

Comments: 21 pages, accepted to ICML2023. Code is available at https://github.com/SirAlex900/FeatureProgramming

arXiv:2306.03962 [pdf, other]

PILLAR: How to make semi-private learning more effective

Authors: Francesco Pinto, Yaxi Hu, Fanny Yang, Amartya Sanyal

Abstract: In Semi-Supervised Semi-Private (SP) learning, the learner has access to both public unlabelled and private labelled data. We propose a computationally efficient algorithm that, under mild assumptions on the data, provably achieves significantly lower private labelled sample complexity and can be efficiently run on real-world datasets. For this purpose, we leverage the features extracted by networ… ▽ More In Semi-Supervised Semi-Private (SP) learning, the learner has access to both public unlabelled and private labelled data. We propose a computationally efficient algorithm that, under mild assumptions on the data, provably achieves significantly lower private labelled sample complexity and can be efficiently run on real-world datasets. For this purpose, we leverage the features extracted by networks pre-trained on public (labelled or unlabelled) data, whose distribution can significantly differ from the one on which SP learning is performed. To validate its empirical effectiveness, we propose a wide variety of experiments under tight privacy constraints ($ε= 0.1$) and with a focus on low-data regimes. In all of these settings, our algorithm exhibits significantly improved performance over available baselines that use similar amounts of public data. △ Less

Submitted 6 June, 2023; originally announced June 2023.

arXiv:2303.01887 [pdf, other]

Fast Forecasting of Unstable Data Streams for On-Demand Service Platforms

Authors: Yu Jeffrey Hu, Jeroen Rombouts, Ines Wilms

Abstract: On-demand service platforms face a challenging problem of forecasting a large collection of high-frequency regional demand data streams that exhibit instabilities. This paper develops a novel forecast framework that is fast and scalable, and automatically assesses changing environments without human intervention. We empirically test our framework on a large-scale demand data set from a leading on-… ▽ More On-demand service platforms face a challenging problem of forecasting a large collection of high-frequency regional demand data streams that exhibit instabilities. This paper develops a novel forecast framework that is fast and scalable, and automatically assesses changing environments without human intervention. We empirically test our framework on a large-scale demand data set from a leading on-demand delivery platform in Europe, and find strong performance gains from using our framework against several industry benchmarks, across all geographical regions, loss functions, and both pre- and post-Covid periods. We translate forecast gains to economic impacts for this on-demand service platform by computing financial gains and reductions in computing costs. △ Less

Submitted 31 May, 2024; v1 submitted 3 March, 2023; originally announced March 2023.

arXiv:2302.05516 [pdf, other]

Cyclic and Randomized Stepsizes Invoke Heavier Tails in SGD than Constant Stepsize

Authors: Mert Gürbüzbalaban, Yuanhan Hu, Umut Şimşekli, Lingjiong Zhu

Abstract: Cyclic and randomized stepsizes are widely used in the deep learning practice and can often outperform standard stepsize choices such as constant stepsize in SGD. Despite their empirical success, not much is currently known about when and why they can theoretically improve the generalization performance. We consider a general class of Markovian stepsizes for learning, which contain i.i.d. random s… ▽ More Cyclic and randomized stepsizes are widely used in the deep learning practice and can often outperform standard stepsize choices such as constant stepsize in SGD. Despite their empirical success, not much is currently known about when and why they can theoretically improve the generalization performance. We consider a general class of Markovian stepsizes for learning, which contain i.i.d. random stepsize, cyclic stepsize as well as the constant stepsize as special cases, and motivated by the literature which shows that heaviness of the tails (measured by the so-called "tail-index") in the SGD iterates is correlated with generalization, we study tail-index and provide a number of theoretical results that demonstrate how the tail-index varies on the stepsize scheduling. Our results bring a new understanding of the benefits of cyclic and randomized stepsizes compared to constant stepsize in terms of the tail behavior. We illustrate our theory on linear regression experiments and show through deep learning experiments that Markovian stepsizes can achieve even a heavier tail and be a viable alternative to cyclic and i.i.d. randomized stepsize rules. △ Less

Submitted 29 August, 2023; v1 submitted 10 February, 2023; originally announced February 2023.

Comments: To Appear

Journal ref: Transactions of Machine Learning Research, 2023

arXiv:2212.13574 [pdf, other]

Weak Signal Inclusion Under Dependence and Applications in Genome-wide Association Study

Authors: X. Jessie Jeng, Yifei Hu, Quan Sun, Yun Li

Abstract: Motivated by the inquiries of weak signals in underpowered genome-wide association studies (GWASs), we consider the problem of retaining true signals that are not strong enough to be individually separable from a large amount of noise. We address the challenge from the perspective of false negative control and present false negative control (FNC) screening, a data-driven method to efficiently regu… ▽ More Motivated by the inquiries of weak signals in underpowered genome-wide association studies (GWASs), we consider the problem of retaining true signals that are not strong enough to be individually separable from a large amount of noise. We address the challenge from the perspective of false negative control and present false negative control (FNC) screening, a data-driven method to efficiently regulate false negative proportion at a user-specified level. FNC screening is developed in a realistic setting with arbitrary covariance dependence between variables. We calibrate the overall dependence through a parameter whose scale is compatible with the existing phase diagram in high-dimensional sparse inference. Utilizing the new calibration, we asymptotically explicate the joint effect of covariance dependence, signal sparsity, and signal intensity on the proposed method. We interpret the results using a new phase diagram, which shows that FNC screening can efficiently select a set of candidate variables to retain a high proportion of signals even when the signals are not individually separable from noise. Finite sample performance of FNC screening is compared to those of several existing methods in simulation studies. The proposed method outperforms the others in adapting to a user-specified false negative control level. We implement FNC screening to empower a two-stage GWAS procedure, which demonstrates substantial power gain when working with limited sample sizes in real applications. △ Less

Submitted 2 February, 2024; v1 submitted 27 December, 2022; originally announced December 2022.

Comments: arXiv admin note: text overlap with arXiv:2006.15667

arXiv:2212.09996 [pdf, other]

A marginalized three-part interrupted time series regression model for proportional data

Authors: Shangyuan Ye, Maricela Cruz, Yuchen Hu, Yun Yu

Abstract: Interrupted time series (ITS) is often used to evaluate the effectiveness of a health policy intervention that accounts for the temporal dependence of outcomes. When the outcome of interest is a percentage or percentile, the data can be highly skewed, bounded in $[0, 1]$, and have many zeros or ones. A three-part Beta regression model is commonly used to separate zeros, ones, and positive values e… ▽ More Interrupted time series (ITS) is often used to evaluate the effectiveness of a health policy intervention that accounts for the temporal dependence of outcomes. When the outcome of interest is a percentage or percentile, the data can be highly skewed, bounded in $[0, 1]$, and have many zeros or ones. A three-part Beta regression model is commonly used to separate zeros, ones, and positive values explicitly by three submodels. However, incorporating temporal dependence into the three-part Beta regression model is challenging. In this article, we propose a marginalized zero-one-inflated Beta time series model that captures the temporal dependence of outcomes through copula and allows investigators to examine covariate effects on the marginal mean. We investigate its practical performance using simulation studies and apply the model to a real ITS study. △ Less

Submitted 19 December, 2022; originally announced December 2022.

arXiv:2212.02585 [pdf, ps, other]

Identification of Unobservables in Observations

Authors: Yingyao Hu

Abstract: In empirical studies, the data usually don't include all the variables of interest in an economic model. This paper shows the identification of unobserved variables in observations at the population level. When the observables are distinct in each observation, there exists a function map** from the observables to the unobservables. Such a function guarantees the uniqueness of the latent value in… ▽ More In empirical studies, the data usually don't include all the variables of interest in an economic model. This paper shows the identification of unobserved variables in observations at the population level. When the observables are distinct in each observation, there exists a function map** from the observables to the unobservables. Such a function guarantees the uniqueness of the latent value in each observation. The key lies in the identification of the joint distribution of observables and unobservables from the distribution of observables. The joint distribution of observables and unobservables then reveal the latent value in each observation. Three examples of this result are discussed. △ Less

Submitted 5 December, 2022; originally announced December 2022.

arXiv:2212.00570 [pdf, other]

Penalized Overdamped and Underdamped Langevin Monte Carlo Algorithms for Constrained Sampling

Authors: Mert Gürbüzbalaban, Yuanhan Hu, Lingjiong Zhu

Abstract: We consider the constrained sampling problem where the goal is to sample from a target distribution $π(x)\propto e^{-f(x)}$ when $x$ is constrained to lie on a convex body $\mathcal{C}$. Motivated by penalty methods from continuous optimization, we propose penalized Langevin Dynamics (PLD) and penalized underdamped Langevin Monte Carlo (PULMC) methods that convert the constrained sampling problem… ▽ More We consider the constrained sampling problem where the goal is to sample from a target distribution $π(x)\propto e^{-f(x)}$ when $x$ is constrained to lie on a convex body $\mathcal{C}$. Motivated by penalty methods from continuous optimization, we propose penalized Langevin Dynamics (PLD) and penalized underdamped Langevin Monte Carlo (PULMC) methods that convert the constrained sampling problem into an unconstrained sampling problem by introducing a penalty function for constraint violations. When $f$ is smooth and gradients are available, we get $\tilde{\mathcal{O}}(d/\varepsilon^{10})$ iteration complexity for PLD to sample the target up to an $\varepsilon$-error where the error is measured in the TV distance and $\tilde{\mathcal{O}}(\cdot)$ hides logarithmic factors. For PULMC, we improve the result to $\tilde{\mathcal{O}}(\sqrt{d}/\varepsilon^{7})$ when the Hessian of $f$ is Lipschitz and the boundary of $\mathcal{C}$ is sufficiently smooth. To our knowledge, these are the first convergence results for underdamped Langevin Monte Carlo methods in the constrained sampling that handle non-convex $f$ and provide guarantees with the best dimension dependency among existing methods with deterministic gradient. If unbiased stochastic estimates of the gradient of $f$ are available, we propose PSGLD and PSGULMC methods that can handle stochastic gradients and are scaleable to large datasets without requiring Metropolis-Hasting correction steps. For PSGLD and PSGULMC, when $f$ is strongly convex and smooth, we obtain $\tilde{\mathcal{O}}(d/\varepsilon^{18})$ and $\tilde{\mathcal{O}}(d\sqrt{d}/\varepsilon^{39})$ iteration complexity in W2 distance. When $f$ is smooth and can be non-convex, we provide finite-time performance bounds and iteration complexity results. Finally, we illustrate the performance on Bayesian LASSO regression and Bayesian constrained deep learning problems. △ Less

Submitted 14 April, 2024; v1 submitted 29 November, 2022; originally announced December 2022.

arXiv:2211.15762 [pdf, other]

Understanding the Impact of Adversarial Robustness on Accuracy Disparity

Authors: Yuzheng Hu, Fan Wu, Hongyang Zhang, Han Zhao

Abstract: While it has long been empirically observed that adversarial robustness may be at odds with standard accuracy and may have further disparate impacts on different classes, it remains an open question to what extent such observations hold and how the class imbalance plays a role within. In this paper, we attempt to understand this question of accuracy disparity by taking a closer look at linear clas… ▽ More While it has long been empirically observed that adversarial robustness may be at odds with standard accuracy and may have further disparate impacts on different classes, it remains an open question to what extent such observations hold and how the class imbalance plays a role within. In this paper, we attempt to understand this question of accuracy disparity by taking a closer look at linear classifiers under a Gaussian mixture model. We decompose the impact of adversarial robustness into two parts: an inherent effect that will degrade the standard accuracy on all classes due to the robustness constraint, and the other caused by the class imbalance ratio, which will increase the accuracy disparity compared to standard training. Furthermore, we also show that such effects extend beyond the Gaussian mixture model, by generalizing our data model to the general family of stable distributions. More specifically, we demonstrate that while the constraint of adversarial robustness consistently degrades the standard accuracy in the balanced class setting, the class imbalance ratio plays a fundamentally different role in accuracy disparity compared to the Gaussian case, due to the heavy tail of the stable distribution. We additionally perform experiments on both synthetic and real-world datasets to corroborate our theoretical findings. Our empirical results also suggest that the implications may extend to nonlinear models over real-world datasets. Our code is publicly available on GitHub at https://github.com/Accuracy-Disparity/AT-on-AD. △ Less

Submitted 28 May, 2023; v1 submitted 28 November, 2022; originally announced November 2022.

Comments: Accepted at ICML 2023

arXiv:2210.13965 [pdf]

Exploring the impact of weather on Metro demand forecasting using machine learning method

Authors: Yiming Hu, Yangchuan Huang, Shuying Liu, Yuanyang Qi, Danhui Bai

Abstract: Urban rail transit provides significant comprehensive benefits such as large traffic volume and high speed, serving as one of the most important components of urban traffic construction management and congestion solution. Using real passenger flow data of an Asian subway system from April to June of 2018, this work analyzes the space-time distribution of the passenger flow using short-term traffic… ▽ More Urban rail transit provides significant comprehensive benefits such as large traffic volume and high speed, serving as one of the most important components of urban traffic construction management and congestion solution. Using real passenger flow data of an Asian subway system from April to June of 2018, this work analyzes the space-time distribution of the passenger flow using short-term traffic flow prediction. Stations are divided into four types for passenger flow forecasting, and meteorological records are collected for the same period. Then, machine learning methods with different inputs are applied and multivariate regression is performed to evaluate the improvement effect of each weather element on passenger flow forecasting of representative metro stations on hourly basis. Our results show that by inputting weather variables the precision of prediction on weekends enhanced while the performance on weekdays only improved marginally, while the contribution of different elements of weather differ. Also, different categories of stations are affected differently by weather. This study provides a possible method to further improve other prediction models, and attests to the promise of data-driven analytics for optimization of short-term scheduling in transit management. △ Less

Submitted 4 May, 2023; v1 submitted 24 October, 2022; originally announced October 2022.

Comments: 16 pages, 4 figures

arXiv:2210.01300 [pdf, other]

Revealing Unobservables by Deep Learning: Generative Element Extraction Networks (GEEN)

Authors: Yingyao Hu, Yang Liu, Jiaxiong Yao

Abstract: Latent variable models are crucial in scientific research, where a key variable, such as effort, ability, and belief, is unobserved in the sample but needs to be identified. This paper proposes a novel method for estimating realizations of a latent variable $X^*$ in a random sample that contains its multiple measurements. With the key assumption that the measurements are independent conditional on… ▽ More Latent variable models are crucial in scientific research, where a key variable, such as effort, ability, and belief, is unobserved in the sample but needs to be identified. This paper proposes a novel method for estimating realizations of a latent variable $X^*$ in a random sample that contains its multiple measurements. With the key assumption that the measurements are independent conditional on $X^*$, we provide sufficient conditions under which realizations of $X^*$ in the sample are locally unique in a class of deviations, which allows us to identify realizations of $X^*$. To the best of our knowledge, this paper is the first to provide such identification in observation. We then use the Kullback-Leibler distance between the two probability densities with and without the conditional independence as the loss function to train a Generative Element Extraction Networks (GEEN) that maps from the observed measurements to realizations of $X^*$ in the sample. The simulation results imply that this proposed estimator works quite well and the estimated values are highly correlated with realizations of $X^*$. Our estimator can be applied to a large class of latent variable models and we expect it will change how people deal with latent variables. △ Less

Submitted 3 October, 2022; originally announced October 2022.

Comments: 19 pages, 6 figures

arXiv:2209.00197 [pdf, other]

Switchback Experiments under Geometric Mixing

Authors: Yuchen Hu, Stefan Wager

Abstract: The switchback is an experimental design that measures treatment effects by repeatedly turning an intervention on and off for a whole system. Switchback experiments are a robust way to overcome cross-unit spillover effects; however, they are vulnerable to bias from temporal carryovers. In this paper, we consider properties of switchback experiments in Markovian systems that mix at a geometric rate… ▽ More The switchback is an experimental design that measures treatment effects by repeatedly turning an intervention on and off for a whole system. Switchback experiments are a robust way to overcome cross-unit spillover effects; however, they are vulnerable to bias from temporal carryovers. In this paper, we consider properties of switchback experiments in Markovian systems that mix at a geometric rate. We find that, in this setting, standard switchback designs suffer considerably from carryover bias: Their estimation error decays as $T^{-1/3}$ in terms of the experiment horizon $T$, whereas in the absence of carryovers a faster rate of $T^{-1/2}$ would have been possible. We also show, however, that judicious use of burn-in periods can considerably improve the situation, and enables errors that decay as $\log(T)^{1/2}T^{-1/2}$. Our formal results are mirrored in an empirical evaluation. △ Less

Submitted 2 April, 2024; v1 submitted 31 August, 2022; originally announced September 2022.

arXiv:2208.00257

Covariate-Assisted Community Detection on Sparse Networks

Authors: Yaofang Hu, Wanjie Wang

Abstract: Community detection is an important problem when processing network data. Traditionally, this is done by exploiting the connections between nodes, but connections can be too sparse to detect communities in many real datasets. Node covariates can be used to assist community detection; see Binkiewicz et al. (2017); Weng and Feng (2022); Yan and Sarkar (2021); Yang et al. (2013). However, how to comb… ▽ More Community detection is an important problem when processing network data. Traditionally, this is done by exploiting the connections between nodes, but connections can be too sparse to detect communities in many real datasets. Node covariates can be used to assist community detection; see Binkiewicz et al. (2017); Weng and Feng (2022); Yan and Sarkar (2021); Yang et al. (2013). However, how to combine covariates with network connections is challenging, because covariates may be high-dimensional and inconsistent with community labels. To study the relationship between covariates and communities, we propose the degree corrected stochastic block model with node covariates (DCSBM-NC). It allows degree heterogeneity among communities and inconsistent labels between communities and covariates. Based on DCSBM-NC, we design the adjusted neighbor-covariate (ANC) data matrix, which leverages covariate information to assist community detection. We then propose the covariate-assisted spectral clustering on ratios of singular vectors (CA-SCORE) method on the ANC matrix. We prove that CA-SCORE successfully recovers community labels when 1) the network is relatively dense; 2) the covariate class labels match the community labels; 3) the data is a mixture of 1) and 2). CA-SCORE has good performance on synthetic and real datasets. The algorithm is implemented in the R(R Core Team (2021)) package CASCORE. △ Less

Submitted 27 June, 2023; v1 submitted 30 July, 2022; originally announced August 2022.

Comments: The theory and algorithm are developed very differently, and so a new submission is in 2306.15616

MSC Class: 91D30; 62F12; 91C20

arXiv:2206.03985 [pdf, other]

How unfair is private learning ?

Authors: Amartya Sanyal, Yaxi Hu, Fanny Yang

Abstract: As machine learning algorithms are deployed on sensitive data in critical decision making processes, it is becoming increasingly important that they are also private and fair. In this paper, we show that, when the data has a long-tailed structure, it is not possible to build accurate learning algorithms that are both private and results in higher accuracy on minority subpopulations. We further sho… ▽ More As machine learning algorithms are deployed on sensitive data in critical decision making processes, it is becoming increasingly important that they are also private and fair. In this paper, we show that, when the data has a long-tailed structure, it is not possible to build accurate learning algorithms that are both private and results in higher accuracy on minority subpopulations. We further show that relaxing overall accuracy can lead to good fairness even with strict privacy requirements. To corroborate our theoretical results in practice, we provide an extensive set of experimental results using a variety of synthetic, vision (CIFAR10 and CelebA), and tabular (Law School) datasets and learning algorithms. △ Less

Submitted 24 December, 2022; v1 submitted 8 June, 2022; originally announced June 2022.

Comments: Accepted as an Oral paper in UAI '2022, Major update on 23 Dec, 2022

arXiv:2205.14278 [pdf, ps, other]

Generalization Bounds of Nonconvex-(Strongly)-Concave Stochastic Minimax Optimization

Authors: Siqi Zhang, Yifan Hu, Liang Zhang, Niao He

Abstract: This paper takes an initial step to systematically investigate the generalization bounds of algorithms for solving nonconvex-(strongly)-concave (NC-SC/NC-C) stochastic minimax optimization measured by the stationarity of primal functions. We first establish algorithm-agnostic generalization bounds via uniform convergence between the empirical minimax problem and the population minimax problem. The… ▽ More This paper takes an initial step to systematically investigate the generalization bounds of algorithms for solving nonconvex-(strongly)-concave (NC-SC/NC-C) stochastic minimax optimization measured by the stationarity of primal functions. We first establish algorithm-agnostic generalization bounds via uniform convergence between the empirical minimax problem and the population minimax problem. The sample complexities for achieving $ε$-generalization are $\tilde{\mathcal{O}}(dκ^2ε^{-2})$ and $\tilde{\mathcal{O}}(dε^{-4})$ for NC-SC and NC-C settings, respectively, where $d$ is the dimension and $κ$ is the condition number. We further study the algorithm-dependent generalization bounds via stability arguments of algorithms. In particular, we introduce a novel stability notion for minimax problems and build a connection between generalization bounds and the stability notion. As a result, we establish algorithm-dependent generalization bounds for stochastic gradient descent ascent (SGDA) algorithm and the more general sampling-determined algorithms. △ Less

Submitted 6 February, 2023; v1 submitted 27 May, 2022; originally announced May 2022.

arXiv:2205.08676 [pdf, ps, other]

Testing the parametric form of the conditional variance in regressions based on distance covariance

Authors: Yue Hu, Haiqi Li, Falong Tan

Abstract: In this paper, we propose a new test for checking the parametric form of the conditional variance based on distance covariance in nonlinear and nonparametric regression models. Inherit from the nice properties of distance covariance, our test is very easy to implement in practice and less effected by the dimensionality of covariates. The asymptotic properties of the test statistic are investigated… ▽ More In this paper, we propose a new test for checking the parametric form of the conditional variance based on distance covariance in nonlinear and nonparametric regression models. Inherit from the nice properties of distance covariance, our test is very easy to implement in practice and less effected by the dimensionality of covariates. The asymptotic properties of the test statistic are investigated under the null and alternative hypotheses. We show that the proposed test is consistent against any alternative and can detect local alternatives converging to the null hypothesis at the parametric rate 1/root(n) in both the nonlinear and nonparametric settings. As the limiting null distribution of the test statistic is intractable, we propose a residual bootstrap to approximate the limiting null distribution. Simulation studies are presented to assess the finite sample performance of the proposed test. We also apply the proposed test to a real data set for illustration. △ Less

Submitted 17 May, 2022; originally announced May 2022.

arXiv:2205.06689 [pdf, other]

Heavy-Tail Phenomenon in Decentralized SGD

Authors: Mert Gurbuzbalaban, Yuanhan Hu, Umut Simsekli, Kun Yuan, Lingjiong Zhu

Abstract: Recent theoretical studies have shown that heavy-tails can emerge in stochastic optimization due to `multiplicative noise', even under surprisingly simple settings, such as linear regression with Gaussian data. While these studies have uncovered several interesting phenomena, they consider conventional stochastic optimization problems, which exclude decentralized settings that naturally arise in m… ▽ More Recent theoretical studies have shown that heavy-tails can emerge in stochastic optimization due to `multiplicative noise', even under surprisingly simple settings, such as linear regression with Gaussian data. While these studies have uncovered several interesting phenomena, they consider conventional stochastic optimization problems, which exclude decentralized settings that naturally arise in modern machine learning applications. In this paper, we study the emergence of heavy-tails in decentralized stochastic gradient descent (DE-SGD), and investigate the effect of decentralization on the tail behavior. We first show that, when the loss function at each computational node is twice continuously differentiable and strongly convex outside a compact region, the law of the DE-SGD iterates converges to a distribution with polynomially decaying (heavy) tails. To have a more explicit control on the tail exponent, we then consider the case where the loss at each node is a quadratic, and show that the tail-index can be estimated as a function of the step-size, batch-size, and the topological properties of the network of the computational nodes. Then, we provide theoretical and empirical results showing that DE-SGD has heavier tails than centralized SGD. We also compare DE-SGD to disconnected SGD where nodes distribute the data but do not communicate. Our theory uncovers an interesting interplay between the tails and the network structure: we identify two regimes of parameters (stepsize and network size), where DE-SGD can have lighter or heavier tails than disconnected SGD depending on the regime. Finally, to support our theoretical results, we provide numerical experiments conducted on both synthetic data and neural networks. △ Less

Submitted 16 May, 2022; v1 submitted 13 May, 2022; originally announced May 2022.

arXiv:2203.15297 [pdf, other]

Kernel Modulation: A Parameter-Efficient Method for Training Convolutional Neural Networks

Authors: Yuhuang Hu, Shih-Chii Liu

Abstract: Deep Neural Networks, particularly Convolutional Neural Networks (ConvNets), have achieved incredible success in many vision tasks, but they usually require millions of parameters for good accuracy performance. With increasing applications that use ConvNets, updating hundreds of networks for multiple tasks on an embedded device can be costly in terms of memory, bandwidth, and energy. Approaches to… ▽ More Deep Neural Networks, particularly Convolutional Neural Networks (ConvNets), have achieved incredible success in many vision tasks, but they usually require millions of parameters for good accuracy performance. With increasing applications that use ConvNets, updating hundreds of networks for multiple tasks on an embedded device can be costly in terms of memory, bandwidth, and energy. Approaches to reduce this cost include model compression and parameter-efficient models that adapt a subset of network layers for each new task. This work proposes a novel parameter-efficient kernel modulation (KM) method that adapts all parameters of a base network instead of a subset of layers. KM uses lightweight task-specialized kernel modulators that require only an additional 1.4% of the base network parameters. With multiple tasks, only the task-specialized KM weights are communicated and stored on the end-user device. We applied this method in training ConvNets for Transfer Learning and Meta-Learning scenarios. Our results show that KM delivers up to 9% higher accuracy than other parameter-efficient methods on the Transfer Learning benchmark. △ Less

Submitted 29 March, 2022; originally announced March 2022.

Comments: Accepted at 2022 26th International Conference on Pattern Recognition (ICPR)

arXiv:2112.02093 [pdf, other]

Causal-based Time Series Domain Generalization for Vehicle Intention Prediction

Authors: Ye** Hu, Xiaogang Jia, Masayoshi Tomizuka, Wei Zhan

Abstract: Accurately predicting possible behaviors of traffic participants is an essential capability for autonomous vehicles. Since autonomous vehicles need to navigate in dynamically changing environments, they are expected to make accurate predictions regardless of where they are and what driving circumstances they encountered. Therefore, generalization capability to unseen domains is crucial for predict… ▽ More Accurately predicting possible behaviors of traffic participants is an essential capability for autonomous vehicles. Since autonomous vehicles need to navigate in dynamically changing environments, they are expected to make accurate predictions regardless of where they are and what driving circumstances they encountered. Therefore, generalization capability to unseen domains is crucial for prediction models when autonomous vehicles are deployed in the real world. In this paper, we aim to address the domain generalization problem for vehicle intention prediction tasks and a causal-based time series domain generalization (CTSDG) model is proposed. We construct a structural causal model for vehicle intention prediction tasks to learn an invariant representation of input driving data for domain generalization. We further integrate a recurrent latent variable model into our structural causal model to better capture temporal latent dependencies from time-series input data. The effectiveness of our approach is evaluated via real-world driving data. We demonstrate that our proposed method has consistent improvement on prediction accuracy compared to other state-of-the-art domain generalization and behavior prediction methods. △ Less

Submitted 3 December, 2021; originally announced December 2021.

Comments: Accepted by NeurIPS 2021 Workshop on Distribution Shifts

arXiv:2110.12343 [pdf, other]

Off-Policy Evaluation in Partially Observed Markov Decision Processes under Sequential Ignorability

Authors: Yuchen Hu, Stefan Wager

Abstract: We consider off-policy evaluation of dynamic treatment rules under sequential ignorability, given an assumption that the underlying system can be modeled as a partially observed Markov decision process (POMDP). We propose an estimator, partial history importance weighting, and show that it can consistently estimate the stationary mean rewards of a target policy given long enough draws from the beh… ▽ More We consider off-policy evaluation of dynamic treatment rules under sequential ignorability, given an assumption that the underlying system can be modeled as a partially observed Markov decision process (POMDP). We propose an estimator, partial history importance weighting, and show that it can consistently estimate the stationary mean rewards of a target policy given long enough draws from the behavior policy. We provide an upper bound on its error that decays polynomially in the number of observations (i.e., the number of trajectories times their length), with an exponent that depends on the overlap of the target and behavior policies, and on the mixing time of the underlying system. Furthermore, we show that this rate of convergence is minimax given only our assumptions on mixing and overlap. Our results establish that off-policy evaluation in POMDPs is strictly harder than off-policy evaluation in (fully observed) Markov decision processes, but strictly easier than model-free off-policy evaluation. △ Less

Submitted 9 May, 2023; v1 submitted 23 October, 2021; originally announced October 2021.

arXiv:2108.11875 [pdf, other]

A spatio-temporal LSTM model to forecast across multiple temporal and spatial scales

Authors: Yihao Hu, Fearghal O'Donncha, Paulito Palmes, Meredith Burke, Ramon Filgueira, Jon Grant

Abstract: This paper presents a novel spatio-temporal LSTM (SPATIAL) architecture for time series forecasting applied to environmental datasets. The framework was evaluated across multiple sensors and for three different oceanic variables: current speed, temperature, and dissolved oxygen. Network implementation proceeded in two directions that are nominally separated but connected as part of a natural envir… ▽ More This paper presents a novel spatio-temporal LSTM (SPATIAL) architecture for time series forecasting applied to environmental datasets. The framework was evaluated across multiple sensors and for three different oceanic variables: current speed, temperature, and dissolved oxygen. Network implementation proceeded in two directions that are nominally separated but connected as part of a natural environmental system -- across the spatial (between individual sensors) and temporal components of the sensor data. Data from four sensors sampling current speed, and eight measuring both temperature and dissolved oxygen evaluated the framework. Results were compared against RF and XGB baseline models that learned on the temporal signal of each sensor independently by extracting the date-time features together with the past history of data using sliding window matrix. Results demonstrated ability to accurately replicate complex signals and provide comparable performance to state-of-the-art benchmarks. Notably, the novel framework provided a simpler pre-processing and training pipeline that handles missing values via a simple masking layer. Enabling learning across the spatial and temporal directions, this paper addresses two fundamental challenges of ML applications to environmental science: 1) data sparsity and the challenges and costs of collecting measurements of environmental conditions such as ocean dynamics, and 2) environmental datasets are inherently connected in the spatial and temporal directions while classical ML approaches only consider one of these directions. Furthermore, sharing of parameters across all input steps makes SPATIAL a fast, scalable, and easily-parameterized forecasting framework. △ Less

Submitted 26 August, 2021; originally announced August 2021.

arXiv:2108.04851 [pdf, other]

Bayesian Inference using the Proximal Map**: Uncertainty Quantification under Varying Dimensionality

Authors: Maoran Xu, Hua Zhou, Yujie Hu, Leo L. Duan

Abstract: In statistical applications, it is common to encounter parameters supported on a varying or unknown dimensional space. Examples include the fused lasso regression, the matrix recovery under an unknown low rank, etc. Despite the ease of obtaining a point estimate via the optimization, it is much more challenging to quantify their uncertainty -- in the Bayesian framework, a major difficulty is that… ▽ More In statistical applications, it is common to encounter parameters supported on a varying or unknown dimensional space. Examples include the fused lasso regression, the matrix recovery under an unknown low rank, etc. Despite the ease of obtaining a point estimate via the optimization, it is much more challenging to quantify their uncertainty -- in the Bayesian framework, a major difficulty is that if assigning the prior associated with a $p$-dimensional measure, then there is zero posterior probability on any lower-dimensional subset with dimension $d<p$; to avoid this caveat, one needs to choose another dimension-selection prior on $d$, which often involves a highly combinatorial problem. To significantly reduce the modeling burden, we propose a new generative process for the prior: starting from a continuous random variable such as multivariate Gaussian, we transform it into a varying-dimensional space using the proximal map**. This leads to a large class of new Bayesian models that can directly exploit the popular frequentist regularizations and their algorithms, such as the nuclear norm penalty and the alternating direction method of multipliers, while providing a principled and probabilistic uncertainty estimation. We show that this framework is well justified in the geometric measure theory, and enjoys a convenient posterior computation via the standard Hamiltonian Monte Carlo. We demonstrate its use in the analysis of the dynamic flow network data. △ Less

Submitted 2 October, 2022; v1 submitted 10 August, 2021; originally announced August 2021.

Comments: 26 pages, 4 figures

arXiv:2108.00968 [pdf, other]

Robust Semantic Segmentation with Superpixel-Mix

Authors: Gianni Franchi, Nacim Belkhir, Mai Lan Ha, Yufei Hu, Andrei Bursuc, Volker Blanz, Angela Yao

Abstract: Along with predictive performance and runtime speed, reliability is a key requirement for real-world semantic segmentation. Reliability encompasses robustness, predictive uncertainty and reduced bias. To improve reliability, we introduce Superpixel-mix, a new superpixel-based data augmentation method with teacher-student consistency training. Unlike other mixing-based augmentation techniques, mixi… ▽ More Along with predictive performance and runtime speed, reliability is a key requirement for real-world semantic segmentation. Reliability encompasses robustness, predictive uncertainty and reduced bias. To improve reliability, we introduce Superpixel-mix, a new superpixel-based data augmentation method with teacher-student consistency training. Unlike other mixing-based augmentation techniques, mixing superpixels between images is aware of object boundaries, while yielding consistent gains in segmentation accuracy. Our proposed technique achieves state-of-the-art results in semi-supervised semantic segmentation on the Cityscapes dataset. Moreover, Superpixel-mix improves the reliability of semantic segmentation by reducing network uncertainty and bias, as confirmed by competitive results under strong distributions shift (adverse weather, image corruptions) and when facing out-of-distribution data. △ Less

Submitted 21 October, 2021; v1 submitted 2 August, 2021; originally announced August 2021.

Comments: Accepted to BMVC2021

arXiv:2107.10955 [pdf, other]

Learning Linear Polytree Structural Equation Models

Authors: Xingmei Lou, Yu Hu, Xiaodong Li

Abstract: We are interested in the problem of learning the directed acyclic graph (DAG) when data are generated from a linear structural equation model (SEM) and the causal structure can be characterized by a polytree. Under the Gaussian polytree models, we study sufficient conditions on the sample sizes for the well-known Chow-Liu algorithm to exactly recover both the skeleton and the equivalence class of… ▽ More We are interested in the problem of learning the directed acyclic graph (DAG) when data are generated from a linear structural equation model (SEM) and the causal structure can be characterized by a polytree. Under the Gaussian polytree models, we study sufficient conditions on the sample sizes for the well-known Chow-Liu algorithm to exactly recover both the skeleton and the equivalence class of the polytree, which is uniquely represented by a CPDAG. On the other hand, necessary conditions on the required sample sizes for both skeleton and CPDAG recovery are also derived in terms of information-theoretic lower bounds, which match the respective sufficient conditions and thereby give a sharp characterization of the difficulty of these tasks. We also consider the problem of inverse correlation matrix estimation under the linear polytree models, and establish the estimation error bound in terms of the dimension and the total number of v-structures. We also consider an extension of group linear polytree models, in which each node represents a group of variables. Our theoretical findings are illustrated by comprehensive numerical simulations, and experiments on benchmark data also demonstrate the robustness of polytree learning when the true graphical structures can only be approximated by polytrees. △ Less

Submitted 14 May, 2024; v1 submitted 22 July, 2021; originally announced July 2021.

Comments: 35 pages, 5 figures, 4 tables

arXiv:2106.15323 [pdf, other]

doi 10.3758/s13428-023-02092-7

Face Identification Proficiency Test Designed Using Item Response Theory

Authors: Géraldine Jeckeln, Ying Hu, Jacqueline G. Cavazos, Amy N. Yates, Carina A. Hahn, Larry Tang, P. Jonathon Phillips, Alice J. O'Toole

Abstract: Measures of face-identification proficiency are essential to ensure accurate and consistent performance by professional forensic face examiners and others who perform face-identification tasks in applied scenarios. Current proficiency tests rely on static sets of stimulus items, and so, cannot be administered validly to the same individual multiple times. To create a proficiency test, a large numb… ▽ More Measures of face-identification proficiency are essential to ensure accurate and consistent performance by professional forensic face examiners and others who perform face-identification tasks in applied scenarios. Current proficiency tests rely on static sets of stimulus items, and so, cannot be administered validly to the same individual multiple times. To create a proficiency test, a large number of items of "known" difficulty must be assembled. Multiple tests of equal difficulty can be constructed then using subsets of items. We introduce the Triad Identity Matching (TIM) test and evaluate it using Item Response Theory (IRT). Participants view face-image "triads" (N=225) (two images of one identity, one image of a different identity) and select the different identity. In Experiment 1, university students (N=197) showed wide-ranging accuracy on the TIM test, and IRT modeling demonstrated that the TIM items span various difficulty levels. In Experiment 2, we used IRT-based item metrics to partition the test into subsets of specific difficulties. Simulations showed that subsets of the TIM items yielded reliable estimates of subject ability. In Experiments 3a and 3b, we found that the student-derived IRT model reliably evaluated the ability of non-student participants and that ability generalized across different test sessions. In Experiment 3c, we show that TIM test performance correlates with other common face-recognition tests. In summary, the TIM test provides a starting point for develo** a framework that is flexible and calibrated to measure proficiency across various ability levels (e.g., professionals or populations with face-processing deficits). △ Less

Submitted 9 August, 2022; v1 submitted 22 June, 2021; originally announced June 2021.

Comments: 20 pages (including references), 10 figures

arXiv:2104.03802 [pdf, other]

Average Direct and Indirect Causal Effects under Interference

Authors: Yuchen Hu, Shuangning Li, Stefan Wager

Abstract: We propose a definition for the average indirect effect of a binary treatment in the potential outcomes model for causal inference under cross-unit interference. Our definition is analogous to the standard definition of the average direct effect, and can be expressed without needing to compare outcomes across multiple randomized experiments. We show that the proposed indirect effect satisfies a de… ▽ More We propose a definition for the average indirect effect of a binary treatment in the potential outcomes model for causal inference under cross-unit interference. Our definition is analogous to the standard definition of the average direct effect, and can be expressed without needing to compare outcomes across multiple randomized experiments. We show that the proposed indirect effect satisfies a decomposition theorem whereby, in a Bernoulli trial, the sum of the average direct and indirect effects always corresponds to the effect of a policy intervention that infinitesimally increases treatment probabilities. We also consider a number of parametric models for interference, and find that our non-parametric indirect effect remains a natural estimand when re-expressed in the context of these models. △ Less

Submitted 11 January, 2022; v1 submitted 8 April, 2021; originally announced April 2021.

Showing 1–50 of 139 results for author: Hu, Y