Showing 1–8 of 8 results for author: Nakada, R

  1. arXiv:2406.03628  [pdf, other]

    stat.ML cs.LG

    Synthetic Oversampling: Theory and A Practical Approach Using LLMs to Address Data Imbalance

    Authors: Ryumei Nakada, Yichen Xu, Lexin Li, Linjun Zhang

    Abstract: Imbalanced data and spurious correlations are common challenges in machine learning and data science. Oversampling, which artificially increases the number of instances in the underrepresented classes, has been widely adopted to tackle these challenges. In this article, we introduce OPAL (OversamPling with Artificial LLM-generated data), a systematic oversamplin…

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: 59 pages, 7 figures

  2. arXiv:2404.16287  [pdf, other]

    stat.ML cs.CR cs.LG math.ST stat.ME

    Differentially Private Federated Learning: Servers Trustworthiness, Estimation, and Statistical Inference

    Authors: Zhe Zhang, Ryumei Nakada, Linjun Zhang

    Abstract: Differentially private federated learning is crucial for maintaining privacy in distributed environments. This paper investigates the challenges of high-dimensional estimation and inference under the constraints of differential privacy. First, we study scenarios involving an untrusted central server, demonstrating the inherent difficulties of accurate estimation in high-dimensional problems. Our f…

    Submitted 24 April, 2024; originally announced April 2024.

    Comments: 56 pages, 3 figures

  3. arXiv:2403.14926  [pdf, other]

    stat.ML cs.LG

    Contrastive Learning on Multimodal Analysis of Electronic Health Records

    Authors: Tianxi Cai, Feiqing Huang, Ryumei Nakada, Linjun Zhang, Doudou Zhou

    Abstract: Electronic health record (EHR) systems contain a wealth of multimodal clinical data including structured data like clinical codes and unstructured data such as clinical notes. However, many existing EHR-focused studies have traditionally either concentrated on an individual modality or merged different modalities in a rather rudimentary fashion. This approach often results in the perception of stru…

    Submitted 21 March, 2024; originally announced March 2024.

    Comments: 34 pages

  4. arXiv:2306.08173  [pdf, other]

    cs.LG cs.CR cs.IT stat.ML

    Safeguarding Data in Multimodal AI: A Differentially Private Approach to CLIP Training

    Authors: Alyssa Huang, Peihan Liu, Ryumei Nakada, Linjun Zhang, Wanrong Zhang

    Abstract: The surge in multimodal AI's success has sparked concerns over data privacy in vision-and-language tasks. While CLIP has revolutionized multimodal learning through joint training on images and text, its potential to unintentionally disclose sensitive information necessitates the integration of privacy-preserving mechanisms. We introduce a differentially private adaptation of the Contrastive Langua…

    Submitted 29 February, 2024; v1 submitted 13 June, 2023; originally announced June 2023.

  5. arXiv:2302.06232  [pdf, other]

    cs.LG stat.ML

    Understanding Multimodal Contrastive Learning and Incorporating Unpaired Data

    Authors: Ryumei Nakada, Halil Ibrahim Gulluk, Zhun Deng, Wenlong Ji, James Zou, Linjun Zhang

    Abstract: Language-supervised vision models have recently attracted great attention in computer vision. A common approach to build such models is to use contrastive learning on paired data across the two modalities, as exemplified by Contrastive Language-Image Pre-Training (CLIP). In this paper, under linear representation settings, (i) we initiate the investigation of a general class of nonlinear loss func…

    Submitted 14 March, 2023; v1 submitted 13 February, 2023; originally announced February 2023.

    Comments: 42 pages, 3 figures, accepted by AISTATS 2023; a link to GitHub repository added, style corrected, acknowledgements section added

  6. arXiv:2110.02473  [pdf, other]

    cs.LG stat.ML

    The Power of Contrast for Feature Learning: A Theoretical Analysis

    Authors: Wenlong Ji, Zhun Deng, Ryumei Nakada, James Zou, Linjun Zhang

    Abstract: Contrastive learning has achieved state-of-the-art performance in various self-supervised learning tasks and even outperforms its supervised counterpart. Despite its empirical success, theoretical understanding of the superiority of contrastive learning is still limited. In this paper, under linear representation settings, (i) we provably show that contrastive learning outperforms the standard aut…

    Submitted 19 December, 2023; v1 submitted 5 October, 2021; originally announced October 2021.

    Comments: 78 pages, accepted by JMLR

  7. arXiv:2103.00500  [pdf, other]

    stat.ML cs.LG math.ST

    Asymptotic Risk of Overparameterized Likelihood Models: Double Descent Theory for Deep Neural Networks

    Authors: Ryumei Nakada, Masaaki Imaizumi

    Abstract: We investigate the asymptotic risk of a general class of overparameterized likelihood models, including deep models. The recent empirical success of large-scale models has motivated several theoretical studies to investigate a scenario wherein both the number of samples, $n$, and parameters, $p$, diverge to infinity and derive an asymptotic risk at the limit. However, these theorems are only valid…

    Submitted 15 March, 2021; v1 submitted 28 February, 2021; originally announced March 2021.

    Comments: 36 pages

  8. arXiv:1907.02177  [pdf, other]

    stat.ML cs.LG

    Adaptive Approximation and Generalization of Deep Neural Network with Intrinsic Dimensionality

    Authors: Ryumei Nakada, Masaaki Imaizumi

    Abstract: In this study, we prove that an intrinsic low dimensionality of covariates is the main factor that determines the performance of deep neural networks (DNNs). DNNs generally provide outstanding empirical performance. Hence, numerous studies have actively investigated the theoretical properties of DNNs to understand their underlying mechanisms. In particular, the behavior of DNNs in terms of high-di…

    Submitted 17 September, 2020; v1 submitted 3 July, 2019; originally announced July 2019.

    Comments: 38 pages

    Journal ref: Journal of Machine Learning Research, 21(174), 2020