Showing 1–10 of 10 results for author: Junze

  1. arXiv:2406.19619  [pdf, other]

    stat.ML cs.LG math.ST

    ScoreFusion: fusing score-based generative models via Kullback-Leibler barycenters

    Authors: Hao Liu, Junze Ye, Jose Blanchet, Nian Si

    Abstract: We study the problem of fusing pre-trained (auxiliary) generative models to enhance the training of a target generative model. We propose using KL-divergence weighted barycenters as an optimal fusion mechanism, in which the barycenter weights are optimally trained to minimize a suitable loss for the target population. While computing the optimal KL-barycenter weights can be challenging, we demonst…

    Submitted 27 June, 2024; originally announced June 2024.

    Comments: 40 pages, 6 figures

  2. arXiv:2402.02700  [pdf, ps, other]

    cs.LG stat.ML

    Sample Complexity Characterization for Linear Contextual MDPs

    Authors: Junze Deng, Yuan Cheng, Shaofeng Zou, Yingbin Liang

    Abstract: Contextual Markov decision processes (CMDPs) describe a class of reinforcement learning problems in which the transition kernels and reward functions can change over time with different MDPs indexed by a context variable. While CMDPs serve as an important framework to model many real-world applications with time-varying environments, they are largely unexplored from a theoretical perspective. In thi…

    Submitted 4 February, 2024; originally announced February 2024.

    Comments: Accepted to AISTATS 2024

  3. arXiv:2309.13482  [pdf, other]

    cs.LG stat.ML

    A Unified Scheme of ResNet and Softmax

    Authors: Zhao Song, Weixin Wang, Junze Yin

    Abstract: Large language models (LLMs) have brought significant changes to human society. Softmax regression and residual neural networks (ResNet) are two important techniques in deep learning: they not only serve as significant theoretical components supporting the functionality of LLMs but also are related to many other machine learning and theoretical computer science fields, including but not limited to…

    Submitted 23 September, 2023; originally announced September 2023.

  4. arXiv:2309.07418  [pdf, other]

    cs.DS cs.LG stat.ML

    A Fast Optimization View: Reformulating Single Layer Attention in LLM Based on Tensor and SVM Trick, and Solving It in Matrix Multiplication Time

    Authors: Yeqi Gao, Zhao Song, Weixin Wang, Junze Yin

    Abstract: Large language models (LLMs) have played a pivotal role in revolutionizing various facets of our daily existence. Solving attention regression is a fundamental task in optimizing LLMs. In this work, we focus on giving a provable guarantee for the one-layer attention network objective function…

    Submitted 14 September, 2023; originally announced September 2023.

  5. arXiv:2308.10502  [pdf, other]

    cs.LG cs.CL stat.ML

    GradientCoin: A Peer-to-Peer Decentralized Large Language Models

    Authors: Yeqi Gao, Zhao Song, Junze Yin

    Abstract: Since 2008, after the proposal of a Bitcoin electronic cash system, Bitcoin has fundamentally changed the economic system over the last decade. Since 2022, large language models (LLMs) such as GPT have outperformed humans in many real-life tasks. However, these large language models have several practical issues. For example, the model is centralized and controlled by a specific unit. One weakness…

    Submitted 21 August, 2023; originally announced August 2023.

  6. arXiv:2305.00660  [pdf, ps, other]

    cs.LG stat.ML

    An Iterative Algorithm for Rescaled Hyperbolic Functions Regression

    Authors: Yeqi Gao, Zhao Song, Junze Yin

    Abstract: Large language models (LLMs) have numerous real-life applications across various domains, such as natural language translation, sentiment analysis, language modeling, chatbots and conversational agents, creative writing, text classification, summarization, and generation. LLMs have shown great promise in improving the accuracy and efficiency of these tasks, and have the potential to revolutionize…

    Submitted 1 May, 2023; originally announced May 2023.

  7. arXiv:2302.11068  [pdf, ps, other]

    cs.LG cs.DS math.OC stat.ML

    Low Rank Matrix Completion via Robust Alternating Minimization in Nearly Linear Time

    Authors: Yuzhou Gu, Zhao Song, Junze Yin, Lichen Zhang

    Abstract: Given a matrix $M\in \mathbb{R}^{m\times n}$, the low rank matrix completion problem asks us to find a rank-$k$ approximation of $M$ as $UV^\top$ for $U\in \mathbb{R}^{m\times k}$ and $V\in \mathbb{R}^{n\times k}$ by only observing a few entries specified by a set of entries $\Omega\subseteq [m]\times [n]$. In particular, we examine an approach that is widely used in practice -- the alternating minimiz…

    Submitted 1 April, 2024; v1 submitted 21 February, 2023; originally announced February 2023.

    Comments: ICLR 2024

  8. arXiv:2302.00248  [pdf, ps, other]

    cs.DS cs.LG stat.ML

    A Nearly-Optimal Bound for Fast Regression with $\ell_\infty$ Guarantee

    Authors: Zhao Song, Mingquan Ye, Junze Yin, Lichen Zhang

    Abstract: Given a matrix $A\in \mathbb{R}^{n\times d}$ and a vector $b\in \mathbb{R}^n$, we consider the regression problem with $\ell_\infty$ guarantees: finding a vector $x'\in \mathbb{R}^d$ such that $\|x'-x^*\|_\infty \leq \frac{\epsilon}{\sqrt{d}}\cdot \|Ax^*-b\|_2\cdot \|A^\dagger\|$ where $x^*=\arg\min_{x\in \mathbb{R}^d}\|Ax-b\|_2$. One popular approach for solving such $\ell_2$ regression problem is via sk…

    Submitted 1 February, 2023; originally announced February 2023.

    Comments: Abstract shortened to meet arxiv requirement

  9. arXiv:2208.03915  [pdf, ps, other]

    cs.LG stat.ML

    Dynamic Maintenance of Kernel Density Estimation Data Structure: From Practice to Theory

    Authors: Jiehao Liang, Zhao Song, Zhaozhuo Xu, Junze Yin, Danyang Zhuo

    Abstract: Kernel density estimation (KDE) stands out as a challenging task in machine learning. The problem is defined in the following way: given a kernel function $f(x,y)$ and a set of points $\{x_1, x_2, \cdots, x_n \} \subset \mathbb{R}^d$, we would like to compute $\frac{1}{n}\sum_{i=1}^{n} f(x_i,y)$ for any query point $y \in \mathbb{R}^d$. Recently, there has been a growing trend of using data struct…

    Submitted 13 February, 2024; v1 submitted 8 August, 2022; originally announced August 2022.

  10. arXiv:2011.11877  [pdf, other]

    cs.LG cs.CC cs.CR cs.DS stat.ML

    InstaHide's Sample Complexity When Mixing Two Private Images

    Authors: Baihe Huang, Zhao Song, Runzhou Tao, Junze Yin, Ruizhe Zhang, Danyang Zhuo

    Abstract: Training neural networks usually requires large amounts of sensitive training data, and how to protect the privacy of training data has thus become a critical topic in deep learning research. InstaHide is a state-of-the-art scheme to protect training data privacy with only minor effects on test accuracy, and its security has become a salient question. In this paper, we systematically study recent a…

    Submitted 5 February, 2024; v1 submitted 23 November, 2020; originally announced November 2020.