Showing 1–16 of 16 results for author: Zhang, H R

  1. arXiv:2406.16253  [pdf, other]

    cs.CL

    LLMs Assist NLP Researchers: Critique Paper (Meta-)Reviewing

    Authors: Jiangshu Du, Yibo Wang, Wenting Zhao, Zhongfen Deng, Shuaiqi Liu, Renze Lou, Henry Peng Zou, Pranav Narayanan Venkit, Nan Zhang, Mukund Srinath, Haoran Ranran Zhang, Vipul Gupta, Yinghui Li, Tao Li, Fei Wang, Qin Liu, Tianlin Liu, Pengzhi Gao, Congying Xia, Chen Xing, Jiayang Cheng, Zhaowei Wang, Ying Su, Raj Sanjay Shah, Ruohao Guo, et al. (15 additional authors not shown)

    Abstract: This work is motivated by two key trends. On one hand, large language models (LLMs) have shown remarkable versatility in various generative tasks such as writing, drawing, and question answering, significantly reducing the time required for many routine tasks. On the other hand, researchers, whose work is not only time-consuming but also highly expertise-demanding, face increasing challenges as th…

    Submitted 25 June, 2024; v1 submitted 23 June, 2024; originally announced June 2024.

  2. arXiv:2311.00164  [pdf, other]

    cs.SI cs.LG

    Graph Neural Networks for Road Safety Modeling: Datasets and Evaluations for Accident Analysis

    Authors: Abhinav Nippani, Dongyue Li, Haotian Ju, Haris N. Koutsopoulos, Hongyang R. Zhang

    Abstract: We consider the problem of traffic accident analysis on a road network based on road network connections and traffic volume. Previous works have designed various deep-learning methods using historical records to predict traffic accident occurrences. However, there is a lack of consensus on how accurate existing methods are, and a fundamental issue is the lack of public accident datasets for compre…

    Submitted 12 February, 2024; v1 submitted 31 October, 2023; originally announced November 2023.

    Comments: 24 pages. Appeared in NeurIPS 2023 Datasets Track

  3. Boosting Multitask Learning on Graphs through Higher-Order Task Affinities

    Authors: Dongyue Li, Haotian Ju, Aneesh Sharma, Hongyang R. Zhang

    Abstract: Predicting node labels on a given graph is a widely studied problem with many applications, including community detection and molecular graph prediction. This paper considers predicting multiple node labeling functions on graphs simultaneously and revisits this problem from a multitask learning perspective. For a concrete example, consider overlapping community detection: each community membership…

    Submitted 14 March, 2024; v1 submitted 24 June, 2023; originally announced June 2023.

    Comments: 16 pages. Appeared in KDD 2023

  4. arXiv:2306.08553  [pdf, other]

    cs.LG cs.DS math.OC stat.ML

    Noise Stability Optimization for Flat Minima with Tight Rates

    Authors: Haotian Ju, Dongyue Li, Hongyang R. Zhang

    Abstract: We consider minimizing a perturbed function $F(W) = \mathbb{E}_{U}[f(W + U)]$, given a function $f: \mathbb{R}^d \rightarrow \mathbb{R}$ and a random sample $U$ from a distribution $\mathcal{P}$ with mean zero. When $\mathcal{P}$ is the isotropic Gaussian, $F(W)$ is roughly equal to $f(W)$ plus a penalty on the trace of $\nabla^2 f(W)$, scaled by the variance of $\mathcal{P}$. This penalty on the…

    Submitted 18 April, 2024; v1 submitted 14 June, 2023; originally announced June 2023.

    Comments: 36 pages, 3 tables

  5. arXiv:2303.14582  [pdf, other]

    cs.LG cs.AI cs.CL stat.ML

    Identification of Negative Transfers in Multitask Learning Using Surrogate Models

    Authors: Dongyue Li, Huy L. Nguyen, Hongyang R. Zhang

    Abstract: Multitask learning is widely used in practice to train a low-resource target task by augmenting it with multiple related source tasks. Yet, naively combining all the source tasks with a target task does not always improve the prediction performance for the target task due to negative transfers. Thus, a critical problem in multitask learning is identifying subsets of source tasks that would benefit…

    Submitted 27 December, 2023; v1 submitted 25 March, 2023; originally announced March 2023.

    Comments: 30 pages. Appeared in TMLR'23

  6. arXiv:2303.09086  [pdf, other]

    cs.SI cs.DS math.OC

    Optimal Intervention on Weighted Networks via Edge Centrality

    Authors: Dongyue Li, Tina Eliassi-Rad, Hongyang R. Zhang

    Abstract: Suppose there is a spreading process such as an infectious disease propagating on a graph. How would we reduce the number of affected nodes in the spreading process? This question appears in recent studies about implementing mobility interventions on mobility networks (Chang et al. (2021)). A practical algorithm to reduce infections on unweighted graphs is to remove edges with the highest edge cen…

    Submitted 16 March, 2023; originally announced March 2023.

    Comments: 15 pages, 4 figures, 3 tables. SIAM International Conference on Data Mining, 2023 (Extended version)

  7. arXiv:2302.04451  [pdf, other]

    cs.LG cs.SI math.ST stat.ML

    Generalization in Graph Neural Networks: Improved PAC-Bayesian Bounds on Graph Diffusion

    Authors: Haotian Ju, Dongyue Li, Aneesh Sharma, Hongyang R. Zhang

    Abstract: Graph neural networks are widely used tools for graph prediction tasks. Motivated by their empirical performance, prior works have developed generalization bounds for graph neural networks, which scale with graph structures in terms of the maximum degree. In this paper, we present generalization bounds that instead scale with the largest singular value of the graph neural network's feature diffusi…

    Submitted 23 October, 2023; v1 submitted 9 February, 2023; originally announced February 2023.

    Comments: 36 pages. Appeared in AISTATS 2023

  8. arXiv:2206.02659  [pdf, other]

    cs.LG cs.CV math.ST stat.ML

    Robust Fine-Tuning of Deep Neural Networks with Hessian-based Generalization Guarantees

    Authors: Haotian Ju, Dongyue Li, Hongyang R. Zhang

    Abstract: We consider fine-tuning a pretrained deep neural network on a target task. We study the generalization properties of fine-tuning to understand the problem of overfitting, which has often been observed (e.g., when the target dataset is small or when the training labels are noisy). Existing generalization measures for deep networks depend on notions such as distance from the initialization (i.e., th…

    Submitted 22 December, 2023; v1 submitted 6 June, 2022; originally announced June 2022.

    Comments: 38 pages. Appeared in ICML 2022

  9. arXiv:2204.09583  [pdf, other]

    cs.LG

    Improved Group Robustness via Classifier Retraining on Independent Splits

    Authors: Thien Hang Nguyen, Hongyang R. Zhang, Huy Le Nguyen

    Abstract: Deep neural networks trained by minimizing the average risk can achieve strong average performance. Still, their performance for a subgroup may degrade if the subgroup is underrepresented in the overall data population. Group distributionally robust optimization (Sagawa et al., 2020a), or group DRO in short, is a widely used baseline for learning models with strong worst-group performance. We note…

    Submitted 28 July, 2023; v1 submitted 20 April, 2022; originally announced April 2022.

  10. arXiv:2203.01517  [pdf, other]

    cs.LG

    Correct-N-Contrast: A Contrastive Approach for Improving Robustness to Spurious Correlations

    Authors: Michael Zhang, Nimit S. Sohoni, Hongyang R. Zhang, Chelsea Finn, Christopher Ré

    Abstract: Spurious correlations pose a major challenge for robust machine learning. Models trained with empirical risk minimization (ERM) may learn to rely on correlations between class labels and spurious attributes, leading to poor performance on data groups without these correlations. This is particularly challenging to address when spurious attribute labels are unavailable. To improve worst-group perfor…

    Submitted 3 March, 2022; originally announced March 2022.

    Comments: 38 pages, 14 figures. Preprint

  11. arXiv:2111.04578  [pdf, other]

    cs.LG cs.CV stat.ML

    Improved Regularization and Robustness for Fine-tuning in Neural Networks

    Authors: Dongyue Li, Hongyang R. Zhang

    Abstract: A widely used algorithm for transfer learning is fine-tuning, where a pre-trained model is fine-tuned on a target task with a small amount of labeled data. When the capacity of the pre-trained model is much larger than the size of the target data set, fine-tuning is prone to overfitting and "memorizing" the training labels. Hence, an important question is to regularize fine-tuning and ensure its r…

    Submitted 8 November, 2021; originally announced November 2021.

    Comments: 22 pages, 6 figures, 11 tables

  12. arXiv:2010.11750  [pdf, other]

    stat.ML cs.LG

    Precise High-Dimensional Asymptotics for Quantifying Heterogeneous Transfers

    Authors: Fan Yang, Hongyang R. Zhang, Sen Wu, Christopher Ré, Weijie J. Su

    Abstract: The problem of learning one task with samples from another task has received much interest recently. In this paper, we ask a fundamental question: when is combining data from two tasks better than learning one task alone? Intuitively, the transfer effect from one task to another task depends on dataset shifts such as sample sizes and covariance matrices. However, quantifying such a transfer effect…

    Submitted 10 August, 2023; v1 submitted 22 October, 2020; originally announced October 2020.

    Comments: 64 pages, 6 figures; We thoroughly revised the paper by adding new results and reorganizing the presentation

  13. arXiv:2007.04596  [pdf, ps, other]

    cs.LG math.OC stat.ML

    Learning Over-Parametrized Two-Layer ReLU Neural Networks beyond NTK

    Authors: Yuanzhi Li, Tengyu Ma, Hongyang R. Zhang

    Abstract: We consider the dynamic of gradient descent for learning a two-layer neural network. We assume the input $x\in\mathbb{R}^d$ is drawn from a Gaussian distribution and the label of $x$ satisfies $f^{\star}(x) = a^{\top}|W^{\star}x|$, where $a\in\mathbb{R}^d$ is a nonnegative vector and $W^{\star} \in\mathbb{R}^{d\times d}$ is an orthonormal matrix. We show that an over-parametrized two-layer neural…

    Submitted 9 July, 2020; originally announced July 2020.

    Comments: Conference on Learning Theory (COLT) 2020

  14. arXiv:2005.00944  [pdf, other]

    cs.LG cs.CL

    Understanding and Improving Information Transfer in Multi-Task Learning

    Authors: Sen Wu, Hongyang R. Zhang, Christopher Ré

    Abstract: We investigate multi-task learning approaches that use a shared feature representation for all tasks. To better understand the transfer of task information, we study an architecture with a shared module for all tasks and a separate output module for each task. We study the theory of this setting on linear and ReLU-activated models. Our key observation is that whether or not tasks' data are well-al…

    Submitted 2 May, 2020; originally announced May 2020.

    Comments: Appeared in ICLR 2020

  15. arXiv:2005.00695  [pdf, other]

    cs.LG cs.AI cs.CV stat.ML

    On the Generalization Effects of Linear Transformations in Data Augmentation

    Authors: Sen Wu, Hongyang R. Zhang, Gregory Valiant, Christopher Ré

    Abstract: Data augmentation is a powerful technique to improve performance in applications such as image and text classification tasks. Yet, there is little rigorous understanding of why and how various augmentations work. In this work, we consider a family of linear transformations and study their effects on the ridge estimator in an over-parametrized linear regression setting. First, we show that transfor…

    Submitted 26 July, 2023; v1 submitted 2 May, 2020; originally announced May 2020.

    Comments: 22 pages. Appeared in ICML 2020

  16. arXiv:1811.00148  [pdf, other]

    cs.LG cs.DS stat.ML

    Recovery Guarantees for Quadratic Tensors with Sparse Observations

    Authors: Hongyang R. Zhang, Vatsal Sharan, Moses Charikar, Yingyu Liang

    Abstract: We consider the tensor completion problem of predicting the missing entries of a tensor. The commonly used CP model has a triple product form, but an alternate family of quadratic models, which are the sum of pairwise products instead of a triple product, have emerged from applications such as recommendation systems. Non-convex methods are the method of choice for learning quadratic models, and th…

    Submitted 29 July, 2023; v1 submitted 31 October, 2018; originally announced November 2018.

    Comments: 16 pages. Appeared in AISTATS 2019