Showing 1–50 of 108 results for author: Wu, D

  1. arXiv:2406.18137  [pdf, ps, other]

    stat.ML cs.LG

    Sparse deep neural networks for nonparametric estimation in high-dimensional sparse regression

    Authors: Dongya Wu, Xin Li

    Abstract: Generalization theory has been established for sparse deep neural networks under the high-dimensional regime. Beyond generalization, parameter estimation is also important since it is crucial for variable selection and interpretability of deep neural networks. Current theoretical studies concerning parameter estimation mainly focus on two-layer neural networks, which is due to the fact that the conver…

    Submitted 26 June, 2024; originally announced June 2024.

  2. arXiv:2406.11828  [pdf, other]

    cs.LG stat.ML

    Learning sum of diverse features: computational hardness and efficient gradient-based training for ridge combinations

    Authors: Kazusato Oko, Yujin Song, Taiji Suzuki, Denny Wu

    Abstract: We study the computational and sample complexity of learning a target function $f_*:\mathbb{R}^d\to\mathbb{R}$ with additive structure, that is, $f_*(x) = \frac{1}{\sqrt{M}}\sum_{m=1}^M f_m(\langle x, v_m\rangle)$, where $f_1,f_2,...,f_M:\mathbb{R}\to\mathbb{R}$ are nonlinear link functions of single-index models (ridge functions) with diverse and near-orthogonal index features $\{v_m\}_{m=1}^M$,…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: COLT 2024

  3. arXiv:2406.01581  [pdf, other]

    cs.LG stat.ML

    Neural network learns low-dimensional polynomials with SGD near the information-theoretic limit

    Authors: Jason D. Lee, Kazusato Oko, Taiji Suzuki, Denny Wu

    Abstract: We study the problem of gradient descent learning of a single-index target function $f_*(\boldsymbol{x}) = \textstyle\sigma_*\left(\langle\boldsymbol{x},\boldsymbol{\theta}\rangle\right)$ under isotropic Gaussian data in $\mathbb{R}^d$, where the link function $\sigma_*:\mathbb{R}\to\mathbb{R}$ is an unknown degree $q$ polynomial with information exponent $p$ (defined as the lowest degree in the Hermite expansion)…

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: 34 pages

  4. arXiv:2405.01275  [pdf, other]

    stat.ME

    Variable Selection in Ultra-high Dimensional Feature Space for the Cox Model with Interval-Censored Data

    Authors: Daewoo Pak, Jianrui Zhang, Di Wu, Haolei Weng, Chenxi Li

    Abstract: We develop a set of variable selection methods for the Cox model under interval censoring, in the ultra-high dimensional setting where the dimensionality can grow exponentially with the sample size. The methods select covariates via a penalized nonparametric maximum likelihood estimation with some popular penalty functions, including lasso, adaptive lasso, SCAD, and MCP. We prove that our penalize…

    Submitted 2 May, 2024; originally announced May 2024.

  5. arXiv:2404.03900  [pdf, other]

    stat.ML cs.AI cs.LG cs.NE

    Nonparametric Modern Hopfield Models

    Authors: Jerry Yao-Chieh Hu, Bo-Yu Chen, Dennis Wu, Feng Ruan, Han Liu

    Abstract: We present a nonparametric construction for deep learning compatible modern Hopfield models and utilize this framework to debut an efficient variant. Our key contribution stems from interpreting the memory storage and retrieval processes in modern Hopfield models as a nonparametric regression problem subject to a set of query-memory pairs. Crucially, our framework not only recovers the known resul…

    Submitted 5 April, 2024; originally announced April 2024.

    Comments: 59 pages; Code available at https://github.com/MAGICS-LAB/NonparametricHopfield

  6. arXiv:2404.03827  [pdf, other]

    cs.LG cs.AI stat.ML

    Uniform Memory Retrieval with Larger Capacity for Modern Hopfield Models

    Authors: Dennis Wu, Jerry Yao-Chieh Hu, Teng-Yun Hsiao, Han Liu

    Abstract: We propose a two-stage memory retrieval dynamics for modern Hopfield models, termed $\mathtt{U\text{-}Hop}$, with enhanced memory capacity. Our key contribution is a learnable feature map $\Phi$ which transforms the Hopfield energy function into kernel space. This transformation ensures convergence between the local minima of energy and the fixed points of retrieval dynamics within the kernel space.…

    Submitted 12 June, 2024; v1 submitted 4 April, 2024; originally announced April 2024.

    Comments: Accepted at ICML 2024; v2 updated to camera-ready version; Code available at https://github.com/MAGICS-LAB/UHop

  7. arXiv:2403.02696  [pdf, ps, other]

    math.ST stat.ME

    Low-rank matrix estimation via nonconvex spectral regularized methods in errors-in-variables matrix regression

    Authors: Xin Li, Dongya Wu

    Abstract: High-dimensional matrix regression has been studied in various aspects, such as statistical properties, computational efficiency and application to specific instances including multivariate regression, system identification and matrix compressed sensing. Current studies mainly consider the idealized case that the covariate matrix is obtained without noise, while the more realistic scenario that th…

    Submitted 5 March, 2024; originally announced March 2024.

  8. arXiv:2402.10127  [pdf, other]

    stat.ML cs.LG math.PR math.ST

    Nonlinear spiked covariance matrices and signal propagation in deep neural networks

    Authors: Zhichao Wang, Denny Wu, Zhou Fan

    Abstract: Many recent works have studied the eigenvalue spectrum of the Conjugate Kernel (CK) defined by the nonlinear feature map of a feedforward neural network. However, existing results only establish weak convergence of the empirical eigenvalue distribution, and fall short of providing precise quantitative characterizations of the "spike" eigenvalues and eigenvectors that often capture the low-dimens…

    Submitted 15 February, 2024; originally announced February 2024.

    Comments: 55 pages

  9. arXiv:2402.03726  [pdf, other]

    cs.LG stat.ML

    Learning Granger Causality from Instance-wise Self-attentive Hawkes Processes

    Authors: Dongxia Wu, Tsuyoshi Idé, Aurélie Lozano, Georgios Kollias, Jiří Navrátil, Naoki Abe, Yi-An Ma, Rose Yu

    Abstract: We address the problem of learning Granger causality from asynchronous, interdependent, multi-type event sequences. In particular, we are interested in discovering instance-level causal structures in an unsupervised manner. Instance-level causality identifies causal relationships among individual events, providing more fine-grained information for decision-making. Existing work in the literature e…

    Submitted 29 February, 2024; v1 submitted 6 February, 2024; originally announced February 2024.

  10. arXiv:2401.16832  [pdf, other]

    cs.CY cs.LG stat.ML

    Analysis of Knowledge Tracing performance on synthesised student data

    Authors: Panagiotis Pagonis, Kai Hartung, Di Wu, Munir Georges, Sören Gröttrup

    Abstract: Knowledge Tracing (KT) aims to predict the future performance of students by tracking the development of their knowledge states. Despite all the recent progress made in this field, the application of KT models in education systems is still restricted from the data perspectives: 1) limited access to real life data due to data protection concerns, 2) lack of diversity in public datasets, 3) noises i…

    Submitted 30 January, 2024; originally announced January 2024.

    Comments: Accepted at AI4AI Education workshop 2023 ( https://sme.uni-bamberg.de/ai4ai/ )

  11. arXiv:2401.09719  [pdf, ps, other]

    stat.ME

    Kernel-based multi-marker tests of association based on the accelerated failure time model

    Authors: Chenxi Li, Di Wu, Qing Lu

    Abstract: Kernel-based multi-marker tests for survival outcomes use primarily the Cox model to adjust for covariates. The proportional hazards assumption made by the Cox model could be unrealistic, especially in the long-term follow-up. We develop a suite of novel multi-marker survival tests for genetic association based on the accelerated failure time model, which is a popular alternative to the Cox model…

    Submitted 17 January, 2024; originally announced January 2024.

  12. arXiv:2401.03287  [pdf, ps, other]

    stat.ME

    Advancing Stepped Wedge Cluster Randomized Trials Analysis: Bayesian Hierarchical Penalized Spline Models for Immediate and Time-Varying Intervention Effects

    Authors: Danni Wu, Hyung G. Park, Corita R. Grudzen, Keith S. Goldfeld

    Abstract: Stepped wedge cluster randomized trials (SWCRTs) often face challenges with potential confounding by time trends. Traditional frequentist methods can fail to provide adequate coverage of the intervention's true effect using confidence intervals, whereas Bayesian approaches show potential for better coverage of intervention effects. However, Bayesian methods have seen limited development in SWCRTs.…

    Submitted 1 February, 2024; v1 submitted 6 January, 2024; originally announced January 2024.

  13. arXiv:2312.17346  [pdf, other]

    cs.LG cs.AI cs.CV cs.NE stat.ML

    STanHop: Sparse Tandem Hopfield Model for Memory-Enhanced Time Series Prediction

    Authors: Dennis Wu, Jerry Yao-Chieh Hu, Weijian Li, Bo-Yu Chen, Han Liu

    Abstract: We present STanHop-Net (Sparse Tandem Hopfield Network) for multivariate time series prediction with memory-enhanced capabilities. At the heart of our approach is STanHop, a novel Hopfield-based neural network block, which sparsely learns and stores both temporal and cross-series representations in a data-dependent fashion. In essence, STanHop sequentially learns temporal representation and cross-s…

    Submitted 28 December, 2023; originally announced December 2023.

  14. arXiv:2312.11863  [pdf, other]

    cs.LG cs.AI stat.ML

    Neural Network Approximation for Pessimistic Offline Reinforcement Learning

    Authors: Di Wu, Yuling Jiao, Li Shen, Haizhao Yang, Xiliang Lu

    Abstract: Deep reinforcement learning (RL) has shown remarkable success in specific offline decision-making scenarios, yet its theoretical guarantees are still under development. Existing works on offline RL theory primarily emphasize a few trivial settings, such as linear MDP or general function approximation with strong assumptions and independent data, which lack guidance for practical use. The coupling…

    Submitted 19 December, 2023; originally announced December 2023.

    Comments: Full version of the paper accepted to the 38th Annual AAAI Conference on Artificial Intelligence (AAAI 2024)

  15. arXiv:2311.15031  [pdf, other]

    stat.ME

    Robust and Efficient Semi-supervised Learning for Ising Model

    Authors: Daiqing Wu, Molei Liu

    Abstract: In biomedical studies, it is often desirable to characterize the interactive mode of multiple disease outcomes beyond their marginal risk. The Ising model is one of the most popular choices for this purpose. Nevertheless, learning efficiency of Ising models can be impeded by the scarcity of accurate disease labels, which is a prominent problem in contemporary studies driven by electronic healt…

    Submitted 25 November, 2023; originally announced November 2023.

  16. arXiv:2309.12673  [pdf, other]

    cs.LG cs.AI cs.CV stat.ML

    On Sparse Modern Hopfield Model

    Authors: Jerry Yao-Chieh Hu, Donglin Yang, Dennis Wu, Chenwei Xu, Bo-Yu Chen, Han Liu

    Abstract: We introduce the sparse modern Hopfield model as a sparse extension of the modern Hopfield model. Like its dense counterpart, the sparse modern Hopfield model equips a memory-retrieval dynamics whose one-step approximation corresponds to the sparse attention mechanism. Theoretically, our key contribution is a principled derivation of a closed-form sparse Hopfield energy using the convex conjugate…

    Submitted 29 November, 2023; v1 submitted 22 September, 2023; originally announced September 2023.

    Comments: 37 pages, accepted at NeurIPS 2023. [v2] updated to match with camera-ready version. Code is available at https://github.com/MAGICS-LAB/SparseModernHopfield

  17. arXiv:2309.06459  [pdf, other]

    stat.ME stat.AP

    Sensitivity Analysis for Quantiles of Hidden Biases in Matched Observational Studies

    Authors: Dongxiao Wu, Xinran Li

    Abstract: In matched observational studies, the inferred causal conclusions pretending that matching has taken into account all confounding can be sensitive to unmeasured confounding. In such cases, a sensitivity analysis is often conducted, which investigates whether the observed association between treatment and outcome is due to effects caused by the treatment or it is due to hidden confounding. In gener…

    Submitted 12 September, 2023; originally announced September 2023.

  18. arXiv:2309.03843  [pdf, other]

    stat.ML cs.LG

    Gradient-Based Feature Learning under Structured Data

    Authors: Alireza Mousavi-Hosseini, Denny Wu, Taiji Suzuki, Murat A. Erdogdu

    Abstract: Recent works have demonstrated that the sample complexity of gradient-based learning of single index models, i.e. functions that depend on a 1-dimensional projection of the input data, is governed by their information exponent. However, these results are only concerned with isotropic data, while in practice the input often contains additional structure which can implicitly guide the algorithm. In…

    Submitted 7 September, 2023; originally announced September 2023.

  19. arXiv:2306.13255  [pdf, other]

    cs.LG stat.ML

    Precise Asymptotic Generalization for Multiclass Classification with Overparameterized Linear Models

    Authors: David X. Wu, Anant Sahai

    Abstract: We study the asymptotic generalization of an overparameterized linear model for multiclass classification under the Gaussian covariates bi-level model introduced in Subramanian et al. '22, where the number of data points, features, and classes all grow together. We fully resolve the conjecture posed in Subramanian et al. '22, matching the predicted regimes for generalization. Furthermore, our new…

    Submitted 5 December, 2023; v1 submitted 22 June, 2023; originally announced June 2023.

    Comments: NeurIPS 2023, 56 pages

  20. arXiv:2306.07221  [pdf, ps, other]

    cs.LG stat.ML

    Convergence of mean-field Langevin dynamics: Time and space discretization, stochastic gradient, and variance reduction

    Authors: Taiji Suzuki, Denny Wu, Atsushi Nitanda

    Abstract: The mean-field Langevin dynamics (MFLD) is a nonlinear generalization of the Langevin dynamics that incorporates a distribution-dependent drift, and it naturally arises from the optimization of two-layer neural networks via (noisy) gradient descent. Recent works have shown that MFLD globally minimizes an entropy-regularized convex functional in the space of measures. However, all prior analyses as…

    Submitted 12 June, 2023; originally announced June 2023.

    Comments: 37 pages

  21. arXiv:2303.02957  [pdf, other]

    stat.ML cs.LG math.OC

    Primal and Dual Analysis of Entropic Fictitious Play for Finite-sum Problems

    Authors: Atsushi Nitanda, Kazusato Oko, Denny Wu, Nobuhito Takenouchi, Taiji Suzuki

    Abstract: The entropic fictitious play (EFP) is a recently proposed algorithm that minimizes the sum of a convex functional and entropy in the space of measures -- such an objective naturally arises in the optimization of a two-layer neural network in the mean-field regime. In this work, we provide a concise primal-dual analysis of EFP in the setting where the learning problem exhibits a finite-sum structur…

    Submitted 6 March, 2023; originally announced March 2023.

  22. arXiv:2302.09376  [pdf, other]

    stat.ML cs.LG

    Why is parameter averaging beneficial in SGD? An objective smoothing perspective

    Authors: Atsushi Nitanda, Ryuhei Kikuchi, Shugo Maeda, Denny Wu

    Abstract: It is often observed that stochastic gradient descent (SGD) and its variants implicitly select a solution with good generalization performance; such implicit bias is often characterized in terms of the sharpness of the minima. Kleinberg et al. (2018) connected this bias with the smoothing effect of SGD which eliminates sharp local minima by the convolution using the stochastic gradient noise. We f…

    Submitted 26 May, 2024; v1 submitted 18 February, 2023; originally announced February 2023.

    Comments: 27 pages, AISTATS 2024

  23. arXiv:2210.06819  [pdf, other]

    cs.LG stat.ML

    Mean-field analysis for heavy ball methods: Dropout-stability, connectivity, and global convergence

    Authors: Diyuan Wu, Vyacheslav Kungurtsev, Marco Mondelli

    Abstract: The stochastic heavy ball method (SHB), also known as stochastic gradient descent (SGD) with Polyak's momentum, is widely used in training neural networks. However, despite the remarkable success of this algorithm in practice, its theoretical characterization remains limited. In this paper, we focus on neural networks with two and three layers and provide a rigorous understanding of the properties…

    Submitted 5 February, 2023; v1 submitted 13 October, 2022; originally announced October 2022.

    Comments: 14 pages in main text; 51 pages including bibliography and appendix. Published in Transactions on Machine Learning Research (TMLR), 2023. https://openreview.net/forum?id=gZna3IiGfl

  24. arXiv:2207.01702  [pdf, other]

    stat.ME math.ST

    Statistical inference of random graphs with a surrogate likelihood function

    Authors: Dingbo Wu, Fangzheng Xie

    Abstract: Spectral estimators have been broadly applied to statistical network analysis but they do not incorporate the likelihood information of the network sampling model. This paper proposes a novel surrogate likelihood function for statistical inference of a class of popular network models referred to as random dot product graphs. In contrast to the structurally complicated exact likelihood function, th…

    Submitted 5 July, 2022; v1 submitted 4 July, 2022; originally announced July 2022.

  25. arXiv:2206.12363  [pdf, other]

    quant-ph cond-mat.str-el cs.LG physics.comp-ph stat.ML

    From Tensor Network Quantum States to Tensorial Recurrent Neural Networks

    Authors: Dian Wu, Riccardo Rossi, Filippo Vicentini, Giuseppe Carleo

    Abstract: We show that any matrix product state (MPS) can be exactly represented by a recurrent neural network (RNN) with a linear memory update. We generalize this RNN architecture to 2D lattices using a multilinear memory update. It supports perfect sampling and wave function evaluation in polynomial time, and can represent an area law of entanglement entropy. Numerical evidence shows that it can encode t…

    Submitted 8 March, 2023; v1 submitted 24 June, 2022; originally announced June 2022.

    Comments: 14 pages, 10 figures

    Journal ref: Phys. Rev. Research 5, L032001 (2023)

  26. arXiv:2205.01795  [pdf, other]

    stat.ME

    Bayesian index models for heterogeneous treatment effects

    Authors: Hyung Park, Danni Wu, Eva Petkova, Thaddeus Tarpey, R. Todd Ogden

    Abstract: The general idea of this article is to develop a Bayesian model with a flexible link function connecting an exponential family treatment response to a linear combination of covariates and a treatment indicator and the interaction between the two. Generalized linear models allowing data-driven link functions are often called "single-index models," and are among popular semi-parametric modeling methods.…

    Submitted 3 May, 2022; originally announced May 2022.

    Comments: 9 pages, 1 figure and 1 table

  27. arXiv:2205.01445  [pdf, other]

    stat.ML cs.LG math.ST

    High-dimensional Asymptotics of Feature Learning: How One Gradient Step Improves the Representation

    Authors: Jimmy Ba, Murat A. Erdogdu, Taiji Suzuki, Zhichao Wang, Denny Wu, Greg Yang

    Abstract: We study the first gradient descent step on the first-layer parameters $\boldsymbol{W}$ in a two-layer neural network: $f(\boldsymbol{x}) = \frac{1}{\sqrt{N}}\boldsymbol{a}^\top\sigma(\boldsymbol{W}^\top\boldsymbol{x})$, where $\boldsymbol{W}\in\mathbb{R}^{d\times N}, \boldsymbol{a}\in\mathbb{R}^{N}$ are randomly initialized, and the training objective is the empirical MSE loss:…

    Submitted 3 May, 2022; originally announced May 2022.

    Comments: 71 pages

  28. arXiv:2204.07818  [pdf]

    cs.LG cs.AI stat.ML

    Graph-incorporated Latent Factor Analysis for High-dimensional and Sparse Matrices

    Authors: Di Wu, Yi He, Xin Luo

    Abstract: A high-dimensional and sparse (HiDS) matrix is frequently encountered in a big data-related application like an e-commerce system or a social network services system. To perform highly accurate representation learning on it is of great significance owing to the great desire of extracting latent knowledge and patterns from it. Latent factor analysis (LFA), which represents an HiDS matrix by learnin…

    Submitted 16 April, 2022; originally announced April 2022.

  29. arXiv:2203.16688  [pdf, other]

    math.ST stat.ME

    Eigenvector-Assisted Statistical Inference for Signal-Plus-Noise Matrix Models

    Authors: Fangzheng Xie, Dingbo Wu

    Abstract: In this paper, we develop a generalized Bayesian inference framework for a collection of signal-plus-noise matrix models arising in high-dimensional statistics and many applications. The framework is built upon an asymptotically unbiased estimating equation with the assistance of the leading eigenvectors of the data matrix. The solution to the estimating equation coincides with the maximizer of an…

    Submitted 30 March, 2022; originally announced March 2022.

  30. arXiv:2202.11963  [pdf, ps, other]

    cs.LG stat.ML

    A general framework for adaptive two-index fusion attribute weighted naive Bayes

    Authors: Xiaoliang Zhou, Dongyang Wu, Zitong You, Li Zhang, Ning Ye

    Abstract: Naive Bayes (NB) is one of the essential algorithms in data mining. However, it is rarely used in reality because of the attribute independence assumption. Researchers have proposed many improved NB methods to alleviate this assumption. Among these methods, due to high efficiency and easy implementation, the filter attribute weighted NB methods receive great attention. However, there still exists s…

    Submitted 24 February, 2022; originally announced February 2022.

  31. arXiv:2201.10469  [pdf, other]

    stat.ML cs.LG math.PR

    Convex Analysis of the Mean Field Langevin Dynamics

    Authors: Atsushi Nitanda, Denny Wu, Taiji Suzuki

    Abstract: As an example of the nonlinear Fokker-Planck equation, the mean field Langevin dynamics has recently attracted attention due to its connection to (noisy) gradient descent on infinitely wide neural networks in the mean field regime, and hence the convergence property of the dynamics is of great theoretical interest. In this work, we give a concise and self-contained convergence rate analysis of the mean…

    Submitted 24 February, 2022; v1 submitted 25 January, 2022; originally announced January 2022.

    Comments: AISTATS2022

  32. arXiv:2110.03273  [pdf, other]

    cs.LG stat.ML

    AgFlow: Fast Model Selection of Penalized PCA via Implicit Regularization Effects of Gradient Flow

    Authors: Haiyan Jiang, Haoyi Xiong, Dongrui Wu, Ji Liu, Dejing Dou

    Abstract: Principal component analysis (PCA) has been widely used as an effective technique for feature extraction and dimension reduction. In the High Dimension Low Sample Size (HDLSS) setting, one may prefer modified principal components, with penalized loadings, and automated penalty selection by implementing model selection among these different models with varying penalties. The earlier work [1, 2] has…

    Submitted 7 October, 2021; originally announced October 2021.

    Comments: accepted by Machine Learning

  33. arXiv:2109.03775  [pdf, other]

    cs.LG stat.ML

    FedZKT: Zero-Shot Knowledge Transfer towards Resource-Constrained Federated Learning with Heterogeneous On-Device Models

    Authors: Lan Zhang, Dapeng Wu, Xiaoyong Yuan

    Abstract: Federated learning enables multiple distributed devices to collaboratively learn a shared prediction model without centralizing their on-device data. Most of the current algorithms require comparable individual efforts for local training with the same structure and size of on-device models, which, however, impedes participation from resource-constrained devices. Given the widespread yet heterogene…

    Submitted 5 April, 2022; v1 submitted 8 September, 2021; originally announced September 2021.

    Comments: This paper has been accepted to ICDCS 2022

  34. arXiv:2106.02770  [pdf, other]

    cs.LG stat.ML

    Deep Bayesian Active Learning for Accelerating Stochastic Simulation

    Authors: Dongxia Wu, Ruijia Niu, Matteo Chinazzi, Alessandro Vespignani, Yi-An Ma, Rose Yu

    Abstract: Stochastic simulations such as large-scale, spatiotemporal, age-structured epidemic models are computationally expensive at fine-grained resolution. While deep surrogate models can speed up the simulations, doing so for stochastic simulations and with active learning approaches is an underexplored area. We propose Interactive Neural Process (INP), a deep Bayesian active learning framework for lear…

    Submitted 4 June, 2023; v1 submitted 4 June, 2021; originally announced June 2021.

  35. arXiv:2105.11982  [pdf, other]

    cs.AI cs.LG stat.AP stat.ML

    Quantifying Uncertainty in Deep Spatiotemporal Forecasting

    Authors: Dongxia Wu, Liyao Gao, Xinyue Xiong, Matteo Chinazzi, Alessandro Vespignani, Yi-An Ma, Rose Yu

    Abstract: Deep learning is gaining increasing popularity for spatiotemporal forecasting. However, prior works have mostly focused on point estimates without quantifying the uncertainty of the predictions. In high stakes domains, being able to generate probabilistic forecasts with confidence intervals is critical to risk assessment and decision making. Hence, a systematic study of uncertainty quantification…

    Submitted 12 June, 2021; v1 submitted 25 May, 2021; originally announced May 2021.

    Comments: arXiv admin note: text overlap with arXiv:2102.06684

  36. arXiv:2105.05650  [pdf, other]

    cond-mat.stat-mech cond-mat.dis-nn cs.LG stat.ML

    Unbiased Monte Carlo Cluster Updates with Autoregressive Neural Networks

    Authors: Dian Wu, Riccardo Rossi, Giuseppe Carleo

    Abstract: Efficient sampling of complex high-dimensional probability distributions is a central task in computational science. Machine learning methods like autoregressive neural networks, used with Markov chain Monte Carlo sampling, provide good approximations to such distributions, but suffer from either intrinsic bias or high variance. In this Letter, we propose a way to make this approximation unbiased…

    Submitted 10 November, 2021; v1 submitted 12 May, 2021; originally announced May 2021.

    Comments: 12 pages, 9 figures

    Journal ref: Phys. Rev. Research 3, L042024 (2021)

  37. arXiv:2103.11269  [pdf]

    cs.LG stat.ML

    Development and Validation of a Deep Learning Model for Prediction of Severe Outcomes in Suspected COVID-19 Infection

    Authors: Varun Buch, Aoxiao Zhong, Xiang Li, Marcio Aloisio Bezerra Cavalcanti Rockenbach, Dufan Wu, Hui Ren, Jiahui Guan, Andrew Liteplo, Sayon Dutta, Ittai Dayan, Quanzheng Li

    Abstract: COVID-19 patient triaging with predictive outcome of the patients upon first presentation to the emergency department (ED) is crucial for improving patient prognosis, as well as better hospital resources management and cross-infection control. We trained a deep feature fusion model to predict patient outcomes, where the model inputs were EHR data including demographic information, co-morbidities, vital sig…

    Submitted 28 March, 2021; v1 submitted 20 March, 2021; originally announced March 2021.

    Comments: Varun Buch, Aoxiao Zhong and Xiang Li contribute equally to this work

  38. arXiv:2101.02180  [pdf, other]

    cs.LG math.OC stat.ML

    Maximum a Posteriori Inference of Random Dot Product Graphs via Conic Programming

    Authors: David Wu, David R. Palmer, Daryl R. Deford

    Abstract: We present a convex cone program to infer the latent probability matrix of a random dot product graph (RDPG). The optimization problem maximizes the Bernoulli maximum likelihood function with an added nuclear norm regularization term. The dual problem has a particularly nice form, related to the well-known semidefinite program relaxation of the MaxCut problem. Using the primal-dual optimality cond…

    Submitted 16 December, 2021; v1 submitted 6 January, 2021; originally announced January 2021.

    Comments: submitted for publication in SIAM Journal on Optimization (SIOPT)

    MSC Class: 62F15; 90C35; 65F55; 68R10; 15B48

  39. arXiv:2012.15477  [pdf, other]

    stat.ML cs.LG

    Particle Dual Averaging: Optimization of Mean Field Neural Networks with Global Convergence Rate Analysis

    Authors: Atsushi Nitanda, Denny Wu, Taiji Suzuki

    Abstract: We propose the particle dual averaging (PDA) method, which generalizes the dual averaging method in convex optimization to the optimization over probability distributions with quantitative runtime guarantee. The algorithm consists of an inner loop and outer loop: the inner loop utilizes the Langevin algorithm to approximately solve for a stationary distribution, which is then optimized in the oute…

    Submitted 22 January, 2022; v1 submitted 31 December, 2020; originally announced December 2020.

    Comments: NeurIPS 2021

  40. arXiv:2012.05432  [pdf, other]

    math.ST stat.ME

    Low-rank matrix estimation in multi-response regression with measurement errors: Statistical and computational guarantees

    Authors: Xin Li, Dongya Wu

    Abstract: In this paper, we investigate the matrix estimation problem in the multi-response regression model with measurement errors. A nonconvex error-corrected estimator based on a combination of the amended loss function and the nuclear norm regularizer is proposed to estimate the matrix parameter. Then under the (near) low-rank assumption, we analyse statistical and computational theoretical properties…

    Submitted 15 September, 2022; v1 submitted 9 December, 2020; originally announced December 2020.

  41. arXiv:2011.10254  [pdf, ps, other]

    cs.LG cs.AI stat.ML

    Unbalanced Incomplete Multi-view Clustering via the Scheme of View Evolution: Weak Views are Meat; Strong Views do Eat

    Authors: Xiang Fang, Yuchong Hu, Pan Zhou, Dapeng Oliver Wu

    Abstract: Incomplete multi-view clustering is an important technique to deal with real-world incomplete multi-view data. Previous works assume that all views have the same incompleteness, i.e., balanced incompleteness. However, different views often have distinct incompleteness, i.e., unbalanced incompleteness, which results in strong views (low-incompleteness views) and weak views (high-incompleteness view…

    Submitted 30 April, 2021; v1 submitted 20 November, 2020; originally announced November 2020.

    Comments: Accepted by IEEE Transactions on Emerging Topics in Computational Intelligence

    Journal ref: IEEE Transactions on Emerging Topics in Computational Intelligence 2021

  42. arXiv:2010.00029  [pdf, other]

    cs.LG cond-mat.dis-nn cs.AI cs.CV stat.ML

    RG-Flow: A hierarchical and explainable flow model based on renormalization group and sparse prior

    Authors: Hong-Ye Hu, Dian Wu, Yi-Zhuang You, Bruno Olshausen, Yubei Chen

    Abstract: Flow-based generative models have become an important class of unsupervised learning approaches. In this work, we incorporate the key ideas of renormalization group (RG) and sparse prior distribution to design a hierarchical flow-based generative model, RG-Flow, which can separate information at different scales of images and extract disentangled representations at each scale. We demonstrate our m…

    Submitted 15 August, 2022; v1 submitted 30 September, 2020; originally announced October 2020.

    Comments: 31 pages, 20 figures, 3 tables

    Journal ref: Mach. Learn.: Sci. Technol. 3 035009 (2022)

  43. arXiv:2009.07999  [pdf, other]

    cs.LG cs.AI stat.ML

    Distilled One-Shot Federated Learning

    Authors: Yanlin Zhou, George Pu, Xiyao Ma, Xiaolin Li, Dapeng Wu

    Abstract: Current federated learning algorithms take tens of communication rounds transmitting unwieldy model weights under ideal circumstances and hundreds when data is poorly distributed. Inspired by recent work on dataset distillation and distributed one-shot learning, we propose Distilled One-Shot Federated Learning (DOSFL) to significantly reduce the communication cost while achieving comparable perfor…

    Submitted 6 June, 2021; v1 submitted 16 September, 2020; originally announced September 2020.

  44. arXiv:2009.00909  [pdf, other]

    cs.LG cs.CV stat.ML

    A Survey on Negative Transfer

    Authors: Wen Zhang, Lingfei Deng, Lei Zhang, Dongrui Wu

    Abstract: Transfer learning (TL) utilizes data or knowledge from one or more source domains to facilitate the learning in a target domain. It is particularly useful when the target domain has very few or no labeled data, due to annotation expense, privacy concerns, etc. Unfortunately, the effectiveness of TL is not always guaranteed. Negative transfer (NT), i.e., leveraging source domain data/knowledge unde…

    Submitted 9 August, 2021; v1 submitted 2 September, 2020; originally announced September 2020.

    Report number: 23299266

    Journal ref: IEEE/CAA Journal of Automatica Sinica, 2022, 1-25

  45. arXiv:2007.06503  [pdf, other]

    cs.LG stat.ML

    PRI-VAE: Principle-of-Relevant-Information Variational Autoencoders

    Authors: Yanjun Li, Shujian Yu, Jose C. Principe, Xiaolin Li, Dapeng Wu

    Abstract: Although substantial efforts have been made to learn disentangled representations under the variational autoencoder (VAE) framework, the fundamental properties to the dynamics of learning of most VAE models still remain unknown and under-investigated. In this work, we first propose a novel learning objective, termed the principle-of-relevant-information variational autoencoder (PRI-VAE), to learn…

    Submitted 13 July, 2020; originally announced July 2020.

  46. arXiv:2007.05860  [pdf, other]

    math.OC stat.CO

    Solving Bayesian Risk Optimization via Nested Stochastic Gradient Estimation

    Authors: Sait Cakmak, Di Wu, Enlu Zhou

    Abstract: In this paper, we aim to solve Bayesian Risk Optimization (BRO), which is a recently proposed framework that formulates simulation optimization under input uncertainty. In order to efficiently solve the BRO problem, we derive nested stochastic gradient estimators and propose corresponding stochastic approximation algorithms. We show that our gradient estimators are asymptotically unbiased and cons…

    Submitted 15 July, 2020; v1 submitted 11 July, 2020; originally announced July 2020.

    Comments: The paper is 20 pages with 3 figures. The supplement is an additional 15 pages. The paper is currently under review at IISE Transactions. Updated formatting in v2

  47. arXiv:2007.03746  [pdf, ps, other]

    eess.SP cs.HC cs.LG stat.ML

    Transfer Learning for Motor Imagery Based Brain-Computer Interfaces: A Complete Pipeline

    Authors: Dongrui Wu, Xue Jiang, Ruimin Peng, Wanzeng Kong, Jian Huang, Zhigang Zeng

    Abstract: Transfer learning (TL) has been widely used in motor imagery (MI) based brain-computer interfaces (BCIs) to reduce the calibration effort for a new subject, and demonstrated promising performance. While a closed-loop MI-based BCI system, after electroencephalogram (EEG) signal acquisition and temporal filtering, includes spatial filtering, feature engineering, and classification blocks before send…

    Submitted 22 January, 2021; v1 submitted 3 July, 2020; originally announced July 2020.

    Journal ref: Neural Networks, 153:235-253, 2022

  48. arXiv:2007.00240  [pdf, other]

    cs.LG stat.ML

    Temporal Calibrated Regularization for Robust Noisy Label Learning

    Authors: Dongxian Wu, Yisen Wang, Zhuobin Zheng, Shu-tao Xia

    Abstract: Deep neural networks (DNNs) exhibit great success on many tasks with the help of large-scale well annotated datasets. However, labeling large-scale data can be very costly and error-prone so that it is difficult to guarantee the annotation quality (i.e., having noisy labels). Training on these noisy labeled datasets may adversely deteriorate their generalization performance. Existing methods eithe…

    Submitted 1 July, 2020; originally announced July 2020.

    Comments: Published as a conference paper at IJCNN 2020

  49. arXiv:2006.10732  [pdf, other]

    stat.ML cs.LG

    When Does Preconditioning Help or Hurt Generalization?

    Authors: Shun-ichi Amari, Jimmy Ba, Roger Grosse, Xuechen Li, Atsushi Nitanda, Taiji Suzuki, Denny Wu, Ji Xu

    Abstract: While second order optimizers such as natural gradient descent (NGD) often speed up optimization, their effect on generalization has been called into question. This work presents a more nuanced view on how the \textit{implicit bias} of first- and second-order methods affects the comparison of generalization properties. We provide an exact asymptotic bias-variance decomposition of the generalizatio…

    Submitted 8 December, 2020; v1 submitted 18 June, 2020; originally announced June 2020.

    Comments: 42 pages

  50. arXiv:2006.05800  [pdf, other]

    stat.ML cs.LG math.ST

    On the Optimal Weighted $\ell_2$ Regularization in Overparameterized Linear Regression

    Authors: Denny Wu, Ji Xu

    Abstract: We consider the linear model $\mathbf{y} = \mathbf{X} \boldsymbol{\beta}_\star + \boldsymbol{\varepsilon}$ with $\mathbf{X}\in \mathbb{R}^{n\times p}$ in the overparameterized regime $p>n$. We estimate $\boldsymbol{\beta}_\star$ via generalized (weighted) ridge regression: $\hat{\boldsymbol{\beta}}_\lambda = \left(\mathbf{X}^T\mathbf{X} + \lambda\boldsymbol{\Sigma}_w\right)^\dagger \mathbf{X}^T\mathbf{y}$, where $\boldsymbol{\Sigma}_w$ is the weighting matrix. Under a random d…

    Submitted 2 November, 2020; v1 submitted 10 June, 2020; originally announced June 2020.

    Comments: NeurIPS 2020