
Showing 1–50 of 92 results for author: Song, Z

Searching in archive stat.
  1. arXiv:2407.01079  [pdf, ps, other]

    stat.ML cs.AI cs.LG

    On Statistical Rates and Provably Efficient Criteria of Latent Diffusion Transformers (DiTs)

    Authors: Jerry Yao-Chieh Hu, Weimin Wu, Zhuoru Li, Zhao Song, Han Liu

    Abstract: We investigate the statistical and computational limits of latent Diffusion Transformers (DiTs) under the low-dimensional linear latent space assumption. Statistically, we study the universal approximation and sample complexity of the DiTs score function, as well as the distribution recovery property of the initial data. Specifically, under mild data assumptions, we deri…

    Submitted 1 July, 2024; originally announced July 2024.

  2. arXiv:2406.15762  [pdf, other]

    cs.LG stat.ML

    Rethinking the Diffusion Models for Numerical Tabular Data Imputation from the Perspective of Wasserstein Gradient Flow

    Authors: Zhichao Chen, Haoxuan Li, Fangyikang Wang, Odin Zhang, Hu Xu, Xiaoyu Jiang, Zhihuan Song, Eric H. Wang

    Abstract: Diffusion models (DMs) have gained attention in Missing Data Imputation (MDI), but there remain two long-neglected issues to be addressed: (1) Inaccurate Imputation, which arises from the inherently sample-diversification-pursuing generative process of DMs; (2) Difficult Training, which stems from the intricate design required for the mask matrix in the model training stage. To address these concerns within…

    Submitted 22 June, 2024; originally announced June 2024.

  3. arXiv:2406.03136  [pdf, ps, other]

    cs.LG cs.AI cs.CC stat.ML

    Computational Limits of Low-Rank Adaptation (LoRA) for Transformer-Based Models

    Authors: Jerry Yao-Chieh Hu, Maojiang Su, En-Jui Kuo, Zhao Song, Han Liu

    Abstract: We study the computational limits of Low-Rank Adaptation (LoRA) updates for fine-tuning transformer-based models using fine-grained complexity theory. Our key observation is that the existence of low-rank decompositions within the gradient computation of LoRA adaptation leads to possible algorithmic speedups. This allows us to (i) identify a phase transition behavior and (ii) prove the existence of n…

    Submitted 5 June, 2024; originally announced June 2024.

  4. arXiv:2405.06003  [pdf, ps, other]

    stat.ML cs.LG

    Binary Hypothesis Testing for Softmax Models and Leverage Score Models

    Authors: Yeqi Gao, Yuzhou Gu, Zhao Song

    Abstract: Softmax distributions are widely used in machine learning, including Large Language Models (LLMs) where the attention unit uses softmax distributions. We abstract the attention unit as the softmax model, where given a vector input, the model produces an output drawn from the softmax distribution (which depends on the vector input). We consider the fundamental problem of binary hypothesis testing i…

    Submitted 9 May, 2024; originally announced May 2024.

  5. arXiv:2404.04775  [pdf, other]

    stat.ME

    Bipartite causal inference with interference, time series data, and a random network

    Authors: Zhaoyan Song, Georgia Papadogeorgou

    Abstract: In bipartite causal inference with interference there are two distinct sets of units: those that receive the treatment, termed interventional units, and those on which the outcome is measured, termed outcome units. Which interventional units' treatment can drive which outcome units' outcomes is often depicted in a bipartite network. We study bipartite causal inference with interference from observ…

    Submitted 6 April, 2024; originally announced April 2024.

  6. arXiv:2402.09469  [pdf, other]

    cs.LG stat.ML

    Fourier Circuits in Neural Networks: Unlocking the Potential of Large Language Models in Mathematical Reasoning and Modular Arithmetic

    Authors: Jiuxiang Gu, Chenyang Li, Yingyu Liang, Zhenmei Shi, Zhao Song, Tianyi Zhou

    Abstract: In the evolving landscape of machine learning, a pivotal challenge lies in deciphering the internal representations harnessed by neural networks and Transformers. Building on recent progress toward comprehending how networks execute distinct target functions, our study embarks on an exploration of the underlying reasons behind networks adopting specific computational strategies. We direct our focu…

    Submitted 24 May, 2024; v1 submitted 12 February, 2024; originally announced February 2024.

    Comments: Update Section 5.3; clean up problem setup

  7. arXiv:2402.04520  [pdf, ps, other]

    cs.LG cs.AI stat.ML

    On Computational Limits of Modern Hopfield Models: A Fine-Grained Complexity Analysis

    Authors: Jerry Yao-Chieh Hu, Thomas Lin, Zhao Song, Han Liu

    Abstract: We investigate the computational limits of the memory retrieval dynamics of modern Hopfield models from the fine-grained complexity analysis. Our key contribution is the characterization of a phase transition behavior in the efficiency of all possible modern Hopfield models based on the norm of patterns. Specifically, we establish an upper bound criterion for the norm of input query patterns and m…

    Submitted 31 May, 2024; v1 submitted 6 February, 2024; originally announced February 2024.

    Comments: Accepted at ICML 2024; v2 corrected typos; v3 added clarifications and references; v4,5 updated to camera-ready version

  8. arXiv:2312.05771  [pdf, other]

    cs.LG stat.ML

    Hacking Task Confounder in Meta-Learning

    Authors: Jingyao Wang, Yi Ren, Zeen Song, Jianqi Zhang, Changwen Zheng, Wenwen Qiang

    Abstract: Meta-learning enables rapid generalization to new tasks by learning knowledge from various tasks. It is intuitively assumed that as the training progresses, a model will acquire richer knowledge, leading to better generalization performance. However, our experiments reveal an unexpected result: there is negative knowledge transfer between tasks, affecting generalization performance. To explain thi…

    Submitted 29 May, 2024; v1 submitted 10 December, 2023; originally announced December 2023.

    Comments: Accepted by IJCAI 2024, 9 pages, 5 figures, 4 tables

  9. arXiv:2311.14652  [pdf, other]

    cs.LG cs.CL stat.ML

    One Pass Streaming Algorithm for Super Long Token Attention Approximation in Sublinear Space

    Authors: Raghav Addanki, Chenyang Li, Zhao Song, Chiwun Yang

    Abstract: Attention computation takes both $O(n^2)$ time and $O(n^2)$ space, which means that deploying Large Language Models (LLMs) in streaming applications that involve long contexts requires substantial computational resources. At the recent OpenAI DevDay (Nov 6, 2023), OpenAI released a new model that is able to support 128K-long documents; in our paper, we f…

    Submitted 5 February, 2024; v1 submitted 24 November, 2023; originally announced November 2023.

  10. arXiv:2310.12462  [pdf, other]

    cs.LG cs.CL stat.ML

    Unmasking Transformers: A Theoretical Approach to Data Recovery via Attention Weights

    Authors: Yichuan Deng, Zhao Song, Shenghao Xie, Chiwun Yang

    Abstract: In the realm of deep learning, transformers have emerged as a dominant architecture, particularly in natural language processing tasks. However, with their widespread adoption, concerns regarding the security and privacy of the data processed by these models have arisen. In this paper, we address a pivotal question: Can the data fed into transformers be recovered using their attention weights and…

    Submitted 19 October, 2023; originally announced October 2023.

  11. arXiv:2310.04064  [pdf, ps, other]

    cs.DS cs.CC cs.CL cs.LG stat.ML

    How to Capture Higher-order Correlations? Generalizing Matrix Softmax Attention to Kronecker Computation

    Authors: Josh Alman, Zhao Song

    Abstract: In the classical transformer attention scheme, we are given three $n \times d$ size matrices $Q, K, V$ (the query, key, and value tokens), and the goal is to compute a new $n \times d$ size matrix $D^{-1} \exp(QK^\top) V$ where $D = \mathrm{diag}(\exp(QK^\top) \mathbf{1}_n)$. In this work, we study a generalization of attention which captures triple-wise correlations. This generalization is able to…

    Submitted 6 October, 2023; originally announced October 2023.
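
The classical attention output described in this abstract, $D^{-1}\exp(QK^\top)V$ with $D = \mathrm{diag}(\exp(QK^\top)\mathbf{1}_n)$, can be sketched in NumPy as below. This is a minimal quadratic-time illustration of standard pairwise attention only, not the triple-wise (Kronecker) generalization the paper studies; the function name `softmax_attention` is hypothetical, and no max-subtraction is applied, so large entries of $QK^\top$ may overflow.

```python
import numpy as np

def softmax_attention(Q, K, V):
    """Compute D^{-1} exp(Q K^T) V, where D = diag(exp(Q K^T) 1_n)."""
    A = np.exp(Q @ K.T)                   # n x n unnormalized scores (no overflow guard)
    row_sums = A @ np.ones(A.shape[1])    # diagonal entries of D
    return (A / row_sums[:, None]) @ V    # D^{-1} A V: each row is a convex mix of V's rows

rng = np.random.default_rng(0)
n, d = 4, 3
Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
out = softmax_attention(Q, K, V)
print(out.shape)  # (4, 3)
```

Materializing the $n \times n$ matrix `A` is what makes this direct computation cost quadratic time and memory in $n$.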

  12. arXiv:2309.13482  [pdf, other]

    cs.LG stat.ML

    A Unified Scheme of ResNet and Softmax

    Authors: Zhao Song, Weixin Wang, Junze Yin

    Abstract: Large language models (LLMs) have brought significant changes to human society. Softmax regression and residual neural networks (ResNet) are two important techniques in deep learning: they not only serve as significant theoretical components supporting the functionality of LLMs but also are related to many other machine learning and theoretical computer science fields, including but not limited to…

    Submitted 23 September, 2023; originally announced September 2023.

  13. arXiv:2309.07418  [pdf, other]

    cs.DS cs.LG stat.ML

    A Fast Optimization View: Reformulating Single Layer Attention in LLM Based on Tensor and SVM Trick, and Solving It in Matrix Multiplication Time

    Authors: Yeqi Gao, Zhao Song, Weixin Wang, Junze Yin

    Abstract: Large language models (LLMs) have played a pivotal role in revolutionizing various facets of our daily existence. Solving attention regression is a fundamental task in optimizing LLMs. In this work, we focus on giving a provable guarantee for the one-layer attention network objective function…

    Submitted 14 September, 2023; originally announced September 2023.

  14. arXiv:2308.10722  [pdf, ps, other]

    cs.LG stat.ML

    Clustered Linear Contextual Bandits with Knapsacks

    Authors: Yichuan Deng, Michalis Mamakos, Zhao Song

    Abstract: In this work, we study clustered contextual bandits where rewards and resource consumption are the outcomes of cluster-specific linear models. The arms are divided into clusters, with the cluster memberships being unknown to the algorithm. Pulling an arm in a time period results in a reward and in consumption for each one of multiple resources, and with the total consumption of any resource exceeding…

    Submitted 21 August, 2023; originally announced August 2023.

  15. arXiv:2308.10502  [pdf, other]

    cs.LG cs.CL stat.ML

    GradientCoin: A Peer-to-Peer Decentralized Large Language Models

    Authors: Yeqi Gao, Zhao Song, Junze Yin

    Abstract: Since the Bitcoin electronic cash system was proposed in 2008, Bitcoin has fundamentally changed the economic system over the last decade. Since 2022, large language models (LLMs) such as GPT have outperformed humans in many real-life tasks. However, these large language models have several practical issues. For example, the model is centralized and controlled by a specific unit. One weakness…

    Submitted 21 August, 2023; originally announced August 2023.

  16. arXiv:2308.08358  [pdf, ps, other]

    cs.LG stat.ML

    Convergence of Two-Layer Regression with Nonlinear Units

    Authors: Yichuan Deng, Zhao Song, Shenghao Xie

    Abstract: Large language models (LLMs), such as ChatGPT and GPT4, have shown outstanding performance in many human life tasks. Attention computation plays an important role in training LLMs. The softmax unit and the ReLU unit are key structures in attention computation. Inspired by them, we put forward a softmax ReLU regression problem. Generally speaking, our goal is to find an optimal solution to the regression…

    Submitted 16 August, 2023; originally announced August 2023.

  17. arXiv:2307.08352  [pdf, ps, other]

    cs.LG stat.ML

    Zero-th Order Algorithm for Softmax Attention Optimization

    Authors: Yichuan Deng, Zhihang Li, Sridhar Mahadevan, Zhao Song

    Abstract: Large language models (LLMs) have brought about significant transformations in human society. Among the crucial computations in LLMs, the softmax unit holds great importance. It helps the model generate a probability distribution over potential subsequent words or phrases, given a series of input words. By utilizing this distribution, the model selects the most probable next word or phrase,…

    Submitted 17 July, 2023; originally announced July 2023.

  18. arXiv:2307.07735  [pdf, ps, other]

    math.OC cs.LG stat.ML

    Faster Algorithms for Structured Linear and Kernel Support Vector Machines

    Authors: Yuzhou Gu, Zhao Song, Lichen Zhang

    Abstract: Quadratic programming is a ubiquitous prototype in convex programming. Many combinatorial optimizations on graphs and machine learning problems can be formulated as quadratic programming; for example, Support Vector Machines (SVMs). Linear and kernel SVMs have been among the most popular models in machine learning over the past three decades, prior to the deep learning era. Generally, a quadrati…

    Submitted 13 November, 2023; v1 submitted 15 July, 2023; originally announced July 2023.

    Comments: New results: almost-linear time algorithm for Gaussian kernel SVM and complementary lower bounds. Abstract shortened to meet arxiv requirement

  19. arXiv:2306.06895  [pdf, other]

    cs.LG stat.ML

    MPPN: Multi-Resolution Periodic Pattern Network For Long-Term Time Series Forecasting

    Authors: Xing Wang, Zhendong Wang, Kexin Yang, Junlan Feng, Zhiyan Song, Chao Deng, Lin zhu

    Abstract: Long-term time series forecasting plays an important role in various real-world scenarios. Recent deep learning methods for long-term series forecasting tend to capture the intricate patterns of time series by decomposition-based or sampling-based methods. However, most of the extracted patterns may include unpredictable noise and lack good interpretability. Moreover, the multivariate series forec…

    Submitted 12 June, 2023; originally announced June 2023.

    Comments: 21 pages

  20. arXiv:2306.04933  [pdf, other]

    cs.CL cs.LG stat.ML

    InfoPrompt: Information-Theoretic Soft Prompt Tuning for Natural Language Understanding

    Authors: Junda Wu, Tong Yu, Rui Wang, Zhao Song, Ruiyi Zhang, Handong Zhao, Chaochao Lu, Shuai Li, Ricardo Henao

    Abstract: Soft prompt tuning achieves superior performance across a wide range of few-shot tasks. However, the performance of prompt tuning can be highly sensitive to the initialization of the prompts. We also empirically observe that conventional prompt tuning methods cannot encode and learn sufficient task-relevant information from prompt tokens. In this work, we develop an information-theoretic framewo…

    Submitted 8 June, 2023; originally announced June 2023.

  21. arXiv:2305.17608  [pdf, other]

    cs.LG cs.AI cs.CL math.OC stat.ML

    Reward Collapse in Aligning Large Language Models

    Authors: Ziang Song, Tianle Cai, Jason D. Lee, Weijie J. Su

    Abstract: The extraordinary capabilities of large language models (LLMs) such as ChatGPT and GPT-4 are in part unleashed by aligning them with reward models that are trained on human preferences, which are often represented as rankings of responses to prompts. In this paper, we document the phenomenon of reward collapse, an empirical observation where the prevailing ranking-based approach results i…

    Submitted 27 May, 2023; originally announced May 2023.

  22. arXiv:2305.04143  [pdf, other]

    stat.AP

    Risk Set Matched Difference-in-Differences for the Analysis of Effect Modification in an Observational Study on the Impact of Gun Violence on Health Outcomes

    Authors: Eric R. Cohn, Zirui Song, Jose R. Zubizarreta

    Abstract: Gun violence is a major source of injury and death in the United States. However, relatively little is known about the effects of firearm injuries on survivors and their family members and how these effects vary across subpopulations. To study these questions and, more generally, to address a gap in the causal inference literature, we present a framework for the study of effect modification or het…

    Submitted 31 May, 2024; v1 submitted 6 May, 2023; originally announced May 2023.

  23. arXiv:2305.00660  [pdf, ps, other]

    cs.LG stat.ML

    An Iterative Algorithm for Rescaled Hyperbolic Functions Regression

    Authors: Yeqi Gao, Zhao Song, Junze Yin

    Abstract: Large language models (LLMs) have numerous real-life applications across various domains, such as natural language translation, sentiment analysis, language modeling, chatbots and conversational agents, creative writing, text classification, summarization, and generation. LLMs have shown great promise in improving the accuracy and efficiency of these tasks, and have the potential to revolutionize…

    Submitted 1 May, 2023; originally announced May 2023.

  24. arXiv:2303.16504  [pdf, ps, other]

    cs.LG stat.ML

    An Over-parameterized Exponential Regression

    Authors: Yeqi Gao, Sridhar Mahadevan, Zhao Song

    Abstract: Over the past few years, there has been a significant amount of research focused on studying the ReLU activation function, with the aim of achieving neural network convergence through over-parametrization. However, recent developments in the field of Large Language Models (LLMs) have sparked interest in the use of exponential activation functions, specifically in the attention mechanism. Mathema…

    Submitted 29 March, 2023; originally announced March 2023.

  25. arXiv:2303.06025  [pdf, ps, other]

    stat.ME

    Quantile sheet estimator with shape constraints

    Authors: Zhuolin Song

    Abstract: A quantile sheet is a global estimator for multiple quantile curves. A quantile sheet estimator is proposed to maintain the non-crossing properties for different quantiles. The proposed estimator utilizes SCOP (shape-constrained P-splines) to enforce the non-crossing properties directly in construction. A local GCV parameter tuning algorithm is used for fast estimation results. Data simulation show…

    Submitted 16 February, 2023; originally announced March 2023.

  26. arXiv:2302.13214  [pdf, ps, other]

    cs.LG cs.CC cs.DS stat.ML

    Fast Attention Requires Bounded Entries

    Authors: Josh Alman, Zhao Song

    Abstract: In modern machine learning, inner product attention computation is a fundamental task for training large language models such as Transformer, GPT-1, BERT, GPT-2, GPT-3 and ChatGPT. Formally, in this problem, one is given as input three matrices $Q, K, V \in [-B,B]^{n \times d}$, and the goal is to construct the matrix…

    Submitted 9 May, 2023; v1 submitted 25 February, 2023; originally announced February 2023.

  27. arXiv:2302.11068  [pdf, ps, other]

    cs.LG cs.DS math.OC stat.ML

    Low Rank Matrix Completion via Robust Alternating Minimization in Nearly Linear Time

    Authors: Yuzhou Gu, Zhao Song, Junze Yin, Lichen Zhang

    Abstract: Given a matrix $M\in \mathbb{R}^{m\times n}$, the low rank matrix completion problem asks us to find a rank-$k$ approximation of $M$ as $UV^\top$ for $U\in \mathbb{R}^{m\times k}$ and $V\in \mathbb{R}^{n\times k}$ by only observing a few entries specified by a set of entries $\Omega\subseteq [m]\times [n]$. In particular, we examine an approach that is widely used in practice -- the alternating minimiz…

    Submitted 1 April, 2024; v1 submitted 21 February, 2023; originally announced February 2023.

    Comments: ICLR 2024

  28. arXiv:2302.00248  [pdf, ps, other]

    cs.DS cs.LG stat.ML

    A Nearly-Optimal Bound for Fast Regression with $\ell_\infty$ Guarantee

    Authors: Zhao Song, Mingquan Ye, Junze Yin, Lichen Zhang

    Abstract: Given a matrix $A\in \mathbb{R}^{n\times d}$ and a vector $b\in \mathbb{R}^n$, we consider the regression problem with $\ell_\infty$ guarantees: finding a vector $x'\in \mathbb{R}^d$ such that $\|x'-x^*\|_\infty \leq \frac{\varepsilon}{\sqrt{d}}\cdot \|Ax^*-b\|_2\cdot \|A^\dagger\|$ where $x^*=\arg\min_{x\in \mathbb{R}^d}\|Ax-b\|_2$. One popular approach for solving such $\ell_2$ regression problems is via sk…

    Submitted 1 February, 2023; originally announced February 2023.

    Comments: Abstract shortened to meet arxiv requirement
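
This abstract compares the exact solution $x^*$ with an approximation $x'$ obtained by sketching. As a hedged illustration of the generic sketch-and-solve idea only (a dense Gaussian sketch with an assumed sketch size `s`, not the paper's construction or its $\ell_\infty$ analysis), the code below computes both and reports the $\ell_\infty$ gap next to the scale $\frac{1}{\sqrt{d}}\|Ax^*-b\|_2\|A^\dagger\|$ appearing in the guarantee, taking $\|A^\dagger\|$ as the spectral norm.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, s = 2000, 10, 400  # s = sketch size (an illustrative choice)
A = rng.standard_normal((n, d))
b = A @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n)

# Exact solution x* = argmin_x ||Ax - b||_2.
x_star, *_ = np.linalg.lstsq(A, b, rcond=None)

# Sketch-and-solve: draw a Gaussian sketch S (s x n) and solve the smaller s x d problem.
S = rng.standard_normal((s, n)) / np.sqrt(s)
x_sketch, *_ = np.linalg.lstsq(S @ A, S @ b, rcond=None)

linf_gap = np.max(np.abs(x_sketch - x_star))
scale = np.linalg.norm(A @ x_star - b) * np.linalg.norm(np.linalg.pinv(A), 2) / np.sqrt(d)
print(f"||x' - x*||_inf = {linf_gap:.3e};  ||Ax*-b||_2 ||A^+|| / sqrt(d) = {scale:.3e}")
```

The ratio `linf_gap / scale` plays the role of $\varepsilon$ in the bound for this particular draw.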

  29. arXiv:2211.14227  [pdf, ps, other]

    cs.LG cs.DS stat.ML

    Bypass Exponential Time Preprocessing: Fast Neural Network Training via Weight-Data Correlation Preprocessing

    Authors: Josh Alman, Jiehao Liang, Zhao Song, Ruizhe Zhang, Danyang Zhuo

    Abstract: Over the last decade, deep neural networks have transformed our society, and they are already widely applied in various machine learning applications. State-of-the-art deep neural networks are becoming larger in size every year to deliver increasing model accuracy, and as a result, model training consumes substantial computing resources and will only consume more in the future. Using current training…

    Submitted 25 November, 2022; originally announced November 2022.

  30. arXiv:2211.01588  [pdf, ps, other]

    cs.LG stat.ML

    A Convergence Theory for Federated Average: Beyond Smoothness

    Authors: Xiaoxiao Li, Zhao Song, Runzhou Tao, Guangyi Zhang

    Abstract: Federated learning enables a large number of edge computing devices to jointly learn a model without sharing data. As a leading algorithm in this setting, Federated Average (FedAvg), which runs Stochastic Gradient Descent (SGD) in parallel on local devices and averages the sequences only once in a while, has been widely used due to its simplicity and low communication cost. However, despite recen…

    Submitted 3 November, 2022; originally announced November 2022.

    Comments: BigData 2022
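
The FedAvg procedure summarized in this abstract (local SGD on each device, with occasional averaging by a server) can be sketched on a toy one-dimensional least-squares problem. The client data, learning rate, number of local steps, and number of rounds are illustrative assumptions, `fedavg` is a hypothetical name, and the smooth squared loss used here does not reflect the paper's beyond-smoothness setting.

```python
import random

def fedavg(clients, rounds=50, local_steps=10, lr=0.05, seed=0):
    """Federated Averaging for the 1-D model y ~ w * x with squared loss.

    clients: list of per-device datasets, each a list of (x, y) pairs.
    Each round, every client runs `local_steps` SGD steps starting from the
    global w, then the server replaces w with the average of the client models.
    """
    rng = random.Random(seed)
    w = 0.0
    for _ in range(rounds):
        local_models = []
        for data in clients:
            w_local = w
            for _ in range(local_steps):
                x, y = rng.choice(data)                    # sample one local example
                w_local -= lr * 2 * (w_local * x - y) * x  # SGD step on (w x - y)^2
            local_models.append(w_local)
        w = sum(local_models) / len(local_models)          # server-side averaging
    return w

# Three clients whose data all follow y = 3x; raw data never leaves a client.
clients = [[(x, 3.0 * x) for x in (0.5, 1.0, 1.5)],
           [(x, 3.0 * x) for x in (-1.0, 0.2)],
           [(x, 3.0 * x) for x in (2.0, -0.5, 0.8)]]
print(round(fedavg(clients), 3))  # prints 3.0
```

Only the scalar models `w_local` are communicated, which is the source of FedAvg's low communication cost.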

  31. arXiv:2208.04508  [pdf, ps, other]

    cs.LG cs.DS stat.ML

    Training Overparametrized Neural Networks in Sublinear Time

    Authors: Yichuan Deng, Hang Hu, Zhao Song, Omri Weinstein, Danyang Zhuo

    Abstract: The success of deep learning comes at a tremendous computational and energy cost, and the scalability of training massively overparametrized neural networks is becoming a real barrier to the progress of artificial intelligence (AI). Despite the popularity and low cost-per-iteration of traditional backpropagation via gradient descent, stochastic gradient descent (SGD) has prohibitive convergence rat…

    Submitted 7 February, 2024; v1 submitted 8 August, 2022; originally announced August 2022.

  32. arXiv:2208.03915  [pdf, ps, other]

    cs.LG stat.ML

    Dynamic Maintenance of Kernel Density Estimation Data Structure: From Practice to Theory

    Authors: Jiehao Liang, Zhao Song, Zhaozhuo Xu, Junze Yin, Danyang Zhuo

    Abstract: Kernel density estimation (KDE) stands out as a challenging task in machine learning. The problem is defined in the following way: given a kernel function $f(x,y)$ and a set of points $\{x_1, x_2, \cdots, x_n \} \subset \mathbb{R}^d$, we would like to compute $\frac{1}{n}\sum_{i=1}^{n} f(x_i,y)$ for any query point $y \in \mathbb{R}^d$. Recently, there has been a growing trend of using data struct…

    Submitted 13 February, 2024; v1 submitted 8 August, 2022; originally announced August 2022.
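
The KDE query defined in this abstract, $\frac{1}{n}\sum_{i=1}^{n} f(x_i, y)$, can be answered by brute force in $O(nd)$ time per query, which is the baseline that KDE data structures try to beat. The Gaussian kernel and its bandwidth below are assumptions chosen for illustration, since the abstract leaves $f$ general; the function names are hypothetical.

```python
import math

def gaussian_kernel(x, y, bandwidth=1.0):
    """f(x, y) = exp(-||x - y||^2 / bandwidth^2), one common kernel choice."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / bandwidth ** 2)

def kde_query(points, y, kernel=gaussian_kernel):
    """Exact KDE value (1/n) * sum_i f(x_i, y), computed by brute force."""
    return sum(kernel(x, y) for x in points) / len(points)

points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
print(kde_query(points, (0.0, 0.0)))  # (1 + 2 e^{-1}) / 3
```

A dynamic data structure would additionally support inserting or deleting points $x_i$ without recomputing this sum from scratch.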

  33. arXiv:2208.03635  [pdf, other]

    cs.LG stat.ML

    Federated Adversarial Learning: A Framework with Convergence Analysis

    Authors: Xiaoxiao Li, Zhao Song, Jiaming Yang

    Abstract: Federated learning (FL) is a trending training paradigm to utilize decentralized training data. FL allows clients to update model parameters locally for several epochs, then share them with a global model for aggregation. This training paradigm with multi-local step updating before aggregation exposes unique vulnerabilities to adversarial attacks. Adversarial training is a popular and effective meth…

    Submitted 7 August, 2022; originally announced August 2022.

  34. arXiv:2206.12802  [pdf, other]

    cs.LG cs.DS stat.ML

    Bounding the Width of Neural Networks via Coupled Initialization -- A Worst Case Analysis

    Authors: Alexander Munteanu, Simon Omlor, Zhao Song, David P. Woodruff

    Abstract: A common method in training neural networks is to initialize all the weights to be independent Gaussian vectors. We observe that by instead initializing the weights into independent pairs, where each pair consists of two identical Gaussian vectors, we can significantly improve the convergence analysis. While a similar technique has been studied for random inputs [Daniely, NeurIPS 2020], it has not…

    Submitted 26 June, 2022; originally announced June 2022.

    Comments: ICML 2022
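
The coupled initialization described in this abstract replaces $m$ independent Gaussian weight vectors with $m/2$ independent pairs of identical Gaussian vectors. A minimal sketch of just that sampling step, assuming an even width `m` and standard Gaussian entries; the function name is hypothetical, and any accompanying choices for other layers are not covered by the abstract and are omitted.

```python
import numpy as np

def coupled_init(m, d, seed=0):
    """Return an m x d weight matrix whose rows come in identical Gaussian pairs."""
    if m % 2 != 0:
        raise ValueError("width m must be even to form pairs")
    rng = np.random.default_rng(seed)
    half = rng.standard_normal((m // 2, d))  # m/2 independent Gaussian vectors
    return np.repeat(half, 2, axis=0)        # rows 2i and 2i+1 are copies of half[i]

W = coupled_init(6, 4)
print(W.shape)  # (6, 4)
```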

  35. arXiv:2205.15294  [pdf, ps, other]

    cs.LG cs.GT stat.ML

    Efficient Phi-Regret Minimization in Extensive-Form Games via Online Mirror Descent

    Authors: Yu Bai, Chi Jin, Song Mei, Ziang Song, Tiancheng Yu

    Abstract: A conceptually appealing approach for learning Extensive-Form Games (EFGs) is to convert them to Normal-Form Games (NFGs). This approach enables us to directly translate state-of-the-art techniques and analyses in NFGs to learning EFGs, but typically suffers from computational intractability due to the exponential blow-up of the game size introduced by the conversion. In this paper, we address thi…

    Submitted 27 October, 2022; v1 submitted 30 May, 2022; originally announced May 2022.

  36. arXiv:2205.07223  [pdf, ps, other]

    cs.LG cs.GT stat.ML

    Sample-Efficient Learning of Correlated Equilibria in Extensive-Form Games

    Authors: Ziang Song, Song Mei, Yu Bai

    Abstract: Imperfect-Information Extensive-Form Games (IIEFGs) are a prevalent model for real-world games involving imperfect information and sequential plays. The Extensive-Form Correlated Equilibrium (EFCE) has been proposed as a natural solution concept for multi-player general-sum IIEFGs. However, existing algorithms for finding an EFCE require full feedback from the game, and it remains open how to effic…

    Submitted 15 May, 2022; originally announced May 2022.

  37. arXiv:2204.07596  [pdf, other]

    stat.ML cs.LG

    Perfectly Balanced: Improving Transfer and Robustness of Supervised Contrastive Learning

    Authors: Mayee F. Chen, Daniel Y. Fu, Avanika Narayan, Michael Zhang, Zhao Song, Kayvon Fatahalian, Christopher Ré

    Abstract: An ideal learned representation should display transferability and robustness. Supervised contrastive learning (SupCon) is a promising method for training accurate models, but produces representations that do not capture these properties due to class collapse -- when all points in a class map to the same representation. Recent work suggests that "spreading out" these representations improves them,…

    Submitted 13 July, 2022; v1 submitted 15 April, 2022; originally announced April 2022.

    Comments: ICML 2022 Camera Ready

  38. arXiv:2201.10096  [pdf, ps, other]

    stat.ME stat.CO

    Imputation Maximization Stochastic Approximation with Application to Generalized Linear Mixed Models

    Authors: Zexi Song, Zhiqiang Tan

    Abstract: Generalized linear mixed models are useful in studying hierarchical data with possibly non-Gaussian responses. However, the intractability of likelihood functions poses challenges for estimation. We develop a new method suitable for this problem, called imputation maximization stochastic approximation (IMSA). For each iteration, IMSA first imputes latent variables/random effects, then maximizes ov…

    Submitted 24 January, 2022; originally announced January 2022.

  39. arXiv:2201.05672  [pdf, ps, other]

    econ.EM stat.AP

    Measuring Changes in Disparity Gaps: An Application to Health Insurance

    Authors: Paul Goldsmith-Pinkham, Karen Jiang, Zirui Song, Jacob Wallace

    Abstract: We propose a method for reporting how program evaluations reduce gaps between groups, such as the gender or Black-white gap. We first show that the reduction in disparities between groups can be written as the difference in conditional average treatment effects (CATE) for each group. Then, using a Kitagawa-Oaxaca-Blinder-style decomposition, we highlight how these CATE can be decomposed into unexp…

    Submitted 14 January, 2022; originally announced January 2022.

    Comments: AEA P&P accepted draft

  40. arXiv:2112.07785  [pdf, other]

    stat.ML cs.LG

    Variable Selection and Regularization via Arbitrary Rectangle-range Generalized Elastic Net

    Authors: Yujia Ding, Qidi Peng, Zhengming Song, Hansen Chen

    Abstract: We introduce the arbitrary rectangle-range generalized elastic net penalty method, abbreviated as ARGEN, for performing constrained variable selection and regularization in high-dimensional sparse linear models. As a natural extension of the nonnegative elastic net penalty method, ARGEN is proved to have variable selection consistency and estimation consistency under some conditions. The asymptoti…

    Submitted 14 December, 2021; originally announced December 2021.

    Comments: 25 pages, 2 figures

    MSC Class: 62J07; 62F12 (Primary) 62P05 (Secondary)

  41. arXiv:2112.05120  [pdf, other]

    stat.ML cs.LG

    On Convergence of Federated Averaging Langevin Dynamics

    Authors: Wei Deng, Qian Zhang, Yi-An Ma, Zhao Song, Guang Lin

    Abstract: We propose a federated averaging Langevin algorithm (FA-LD) for uncertainty quantification and mean predictions with distributed clients. In particular, we generalize beyond normal posterior distributions and consider a general class of models. We develop theoretical guarantees for FA-LD for strongly log-concave distributions with non-i.i.d. data and study how the injected noise and the stochastic-…

    Submitted 5 October, 2023; v1 submitted 9 December, 2021; originally announced December 2021.

    Comments: A polished proof without the federated formulation of Langevin diffusion to avoid confusion

  42. arXiv:2111.14674  [pdf, ps, other]

    cs.LG cs.AI cs.DS stat.ML

    Online MAP Inference and Learning for Nonsymmetric Determinantal Point Processes

    Authors: Aravind Reddy, Ryan A. Rossi, Zhao Song, Anup Rao, Tung Mai, Nedim Lipka, Gang Wu, Eunyee Koh, Nesreen Ahmed

    Abstract: In this paper, we introduce the online and streaming MAP inference and learning problems for Non-symmetric Determinantal Point Processes (NDPPs) where data points arrive in an arbitrary order and the algorithms are constrained to use a single pass over the data as well as sub-linear memory. The online setting has an additional requirement of maintaining a valid solution at any point in time. For s…

    Submitted 29 November, 2021; originally announced November 2021.

  43. arXiv:2110.04622  [pdf, other]

    cs.LG cs.DS stat.ML

    Does Preprocessing Help Training Over-parameterized Neural Networks?

    Authors: Zhao Song, Shuo Yang, Ruizhe Zhang

    Abstract: Deep neural networks have achieved impressive performance in many areas. Designing a fast and provable method for training neural networks is a fundamental question in machine learning. The classical training method requires paying $\Omega(mnd)$ cost for both forward computation and backward computation, where $m$ is the width of the neural network, and we are given $n$ training points in $d$-dimensi…

    Submitted 9 October, 2021; originally announced October 2021.

    Comments: NeurIPS 2021

  44. arXiv:2110.04184  [pdf, other]

    cs.LG cs.GT stat.ML

    When Can We Learn General-Sum Markov Games with a Large Number of Players Sample-Efficiently?

    Authors: Ziang Song, Song Mei, Yu Bai

    Abstract: Multi-agent reinforcement learning has made substantial empirical progress in solving games with a large number of players. However, theoretically, the best known sample complexity for finding a Nash equilibrium in general-sum games scales exponentially in the number of players due to the size of the joint action space, and there is a matching exponential lower bound. This paper investigates wha…

    Submitted 31 March, 2022; v1 submitted 8 October, 2021; originally announced October 2021.

  45. arXiv:2108.09420  [pdf, ps, other]

    cs.DS cs.LG stat.ML

    Fast Sketching of Polynomial Kernels of Polynomial Degree

    Authors: Zhao Song, David P. Woodruff, Zheng Yu, Lichen Zhang

    Abstract: Kernel methods are fundamental in machine learning, and faster algorithms for kernel approximation provide direct speedups for many core tasks in machine learning. The polynomial kernel is especially important as other kernels can often be approximated by the polynomial kernel via a Taylor series expansion. Recent techniques in oblivious sketching reduce the dependence in the running time on the d…

    Submitted 20 August, 2021; originally announced August 2021.

    Comments: ICML 2021

  46. arXiv:2106.03012  [pdf, other]

    stat.CO math.NA physics.comp-ph

    On Irreversible Metropolis Sampling Related to Langevin Dynamics

    Authors: Zexi Song, Zhiqiang Tan

    Abstract: There has been considerable interest in designing Markov chain Monte Carlo algorithms by exploiting numerical methods for Langevin dynamics, which includes Hamiltonian dynamics as a deterministic case. A prominent approach is Hamiltonian Monte Carlo (HMC), where a leapfrog discretization of Hamiltonian dynamics is employed. We investigate a recently proposed class of irreversible sampling algorith…

    Submitted 5 June, 2021; originally announced June 2021.

    MSC Class: 65C05; 60J22

  47. arXiv:2105.08285  [pdf, ps, other]

    cs.DS cs.LG stat.ML

    Sublinear Least-Squares Value Iteration via Locality Sensitive Hashing

    Authors: Anshumali Shrivastava, Zhao Song, Zhaozhuo Xu

    Abstract: We present the first provable Least-Squares Value Iteration (LSVI) algorithms that have runtime complexity sublinear in the number of actions. We formulate the value function estimation procedure in value iteration as an approximate maximum inner product search problem and propose a locality sensitive hashing (LSH) [Indyk and Motwani STOC'98, Andoni and Razenshteyn STOC'15, Andoni, Laarhoven, Raze…

    Submitted 8 June, 2021; v1 submitted 18 May, 2021; originally announced May 2021.

  48. arXiv:2105.05001  [pdf, other]

    cs.LG cs.DC stat.ML

    FL-NTK: A Neural Tangent Kernel-based Framework for Federated Learning Convergence Analysis

    Authors: Baihe Huang, Xiaoxiao Li, Zhao Song, Xin Yang

    Abstract: Federated Learning (FL) is an emerging learning scheme that allows different distributed clients to train deep neural networks together without data sharing. Neural networks have become popular due to their unprecedented success. To the best of our knowledge, the theoretical guarantees of FL concerning neural networks with explicit forms and multi-step updates are unexplored. Nevertheless, trainin…

    Submitted 11 May, 2021; originally announced May 2021.

    Comments: ICML 2021

  49. arXiv:2102.01570  [pdf, ps, other]

    cs.LG cs.CR cs.DS stat.ML

    Symmetric Sparse Boolean Matrix Factorization and Applications

    Authors: Sitan Chen, Zhao Song, Runzhou Tao, Ruizhe Zhang

    Abstract: In this work, we study a variant of nonnegative matrix factorization where we wish to find a symmetric factorization of a given input matrix into a sparse, Boolean matrix. Formally speaking, given $\mathbf{M}\in\mathbb{Z}^{m\times m}$, we want to find $\mathbf{W}\in\{0,1\}^{m\times r}$ such that $\| \mathbf{M} - \mathbf{W}\mathbf{W}^\top \|_0$ is minimized among all $\mathbf{W}$ for which each row…

    Submitted 13 January, 2022; v1 submitted 2 February, 2021; originally announced February 2021.

    Comments: 33 pages, to appear in Innovations in Theoretical Computer Science (ITCS 2022), v2: updated refs

  50. arXiv:2011.11877  [pdf, other]

    cs.LG cs.CC cs.CR cs.DS stat.ML

    InstaHide's Sample Complexity When Mixing Two Private Images

    Authors: Baihe Huang, Zhao Song, Runzhou Tao, Junze Yin, Ruizhe Zhang, Danyang Zhuo

    Abstract: Training neural networks usually requires large amounts of sensitive training data, so protecting the privacy of that data has become a critical topic in deep learning research. InstaHide is a state-of-the-art scheme to protect training data privacy with only minor effects on test accuracy, and its security has become a salient question. In this paper, we systematically study recent a…

    Submitted 5 February, 2024; v1 submitted 23 November, 2020; originally announced November 2020.