Showing 1–9 of 9 results for author: Synovic, N

  1. arXiv:2406.08205  [pdf, other]

    cs.SE cs.LG

    What do we know about Hugging Face? A systematic literature review and quantitative validation of qualitative claims

    Authors: Jason Jones, Wenxin Jiang, Nicholas Synovic, George K. Thiruvathukal, James C. Davis

    Abstract: Background: Collaborative Software Package Registries (SPRs) are an integral part of the software supply chain. Much engineering work synthesizes SPR packages into applications. Prior research has examined SPRs for traditional software, such as NPM (JavaScript) and PyPI (Python). Pre-Trained Model (PTM) Registries are an emerging class of SPR of increasing importance, because they support the deep…

    Submitted 12 June, 2024; originally announced June 2024.

  2. arXiv:2404.16688  [pdf, other]

    cs.SE

    Reusing Deep Learning Models: Challenges and Directions in Software Engineering

    Authors: James C. Davis, Purvish Jajal, Wenxin Jiang, Taylor R. Schorlemmer, Nicholas Synovic, George K. Thiruvathukal

    Abstract: Deep neural networks (DNNs) achieve state-of-the-art performance in many areas, including computer vision, system configuration, and question-answering. However, DNNs are expensive to develop, both in intellectual effort (e.g., devising new architectures) and computational costs (e.g., training). Reusing DNNs is a promising direction to amortize costs within a company and across the computing indu…

    Submitted 25 April, 2024; originally announced April 2024.

    Comments: Proceedings of the IEEE John Vincent Atanasoff Symposium on Modern Computing (JVA'23) 2023

  3. arXiv:2402.00699  [pdf, other]

    cs.SE cs.AI cs.DB cs.LG

    PeaTMOSS: A Dataset and Initial Analysis of Pre-Trained Models in Open-Source Software

    Authors: Wenxin Jiang, Jerin Yasmin, Jason Jones, Nicholas Synovic, Jiashen Kuo, Nathaniel Bielanski, Yuan Tian, George K. Thiruvathukal, James C. Davis

    Abstract: The development and training of deep learning models have become increasingly costly and complex. Consequently, software engineers are adopting pre-trained models (PTMs) for their downstream applications. The dynamics of the PTM supply chain remain largely unexplored, signaling a clear need for structured datasets that document not only the metadata but also the subsequent applications of these mo…

    Submitted 1 February, 2024; originally announced February 2024.

    Comments: Accepted at MSR'24

  4. arXiv:2310.07782  [pdf, other]

    cs.CV

    An automated approach for improving the inference latency and energy efficiency of pretrained CNNs by removing irrelevant pixels with focused convolutions

    Authors: Caleb Tung, Nicholas Eliopoulos, Purvish Jajal, Gowri Ramshankar, Chen-Yun Yang, Nicholas Synovic, Xuecen Zhang, Vipin Chaudhary, George K. Thiruvathukal, Yung-Hsiang Lu

    Abstract: Computer vision often uses highly accurate Convolutional Neural Networks (CNNs), but these deep learning models are associated with ever-increasing energy and computation requirements. Producing more energy-efficient CNNs often requires model training which can be cost-prohibitive. We propose a novel, automated method to make a pretrained CNN more energy-efficient without re-training. Given a pret…

    Submitted 11 October, 2023; originally announced October 2023.

  5. arXiv:2310.03620  [pdf, other]

    cs.SE cs.AI

    PeaTMOSS: Mining Pre-Trained Models in Open-Source Software

    Authors: Wenxin Jiang, Jason Jones, Jerin Yasmin, Nicholas Synovic, Rajeev Sashti, Sophie Chen, George K. Thiruvathukal, Yuan Tian, James C. Davis

    Abstract: Developing and training deep learning models is expensive, so software engineers have begun to reuse pre-trained deep learning models (PTMs) and fine-tune them for downstream tasks. Despite the wide-spread use of PTMs, we know little about the corresponding software engineering behaviors and challenges. To enable the study of software engineering with PTMs, we present the PeaTMOSS dataset: Pre-T…

    Submitted 5 October, 2023; originally announced October 2023.

  6. arXiv:2303.08934  [pdf, other]

    cs.SE

    PTMTorrent: A Dataset for Mining Open-source Pre-trained Model Packages

    Authors: Wenxin Jiang, Nicholas Synovic, Purvish Jajal, Taylor R. Schorlemmer, Arav Tewari, Bhavesh Pareek, George K. Thiruvathukal, James C. Davis

    Abstract: Due to the cost of developing and training deep learning models from scratch, machine learning engineers have begun to reuse pre-trained models (PTMs) and fine-tune them for downstream tasks. PTM registries known as "model hubs" support engineers in distributing and reusing deep learning models. PTM packages include pre-trained weights, documentation, model architectures, datasets, and metadata. M…

    Submitted 15 March, 2023; originally announced March 2023.

    Comments: 5 pages, 2 figures, Accepted to MSR'23

  7. arXiv:2303.07476  [pdf, other]

    cs.SE cs.AI

    Challenges and Practices of Deep Learning Model Reengineering: A Case Study on Computer Vision

    Authors: Wenxin Jiang, Vishnu Banna, Naveen Vivek, Abhinav Goel, Nicholas Synovic, George K. Thiruvathukal, James C. Davis

    Abstract: Many engineering organizations are reimplementing and extending deep neural networks from the research community. We describe this process as deep learning model reengineering. Deep learning model reengineering - reusing, reproducing, adapting, and enhancing state-of-the-art deep learning approaches - is challenging for reasons including under-documented reference models, changing requirements, an…

    Submitted 25 August, 2023; v1 submitted 13 March, 2023; originally announced March 2023.

    Comments: Under submission to EMSE

  8. arXiv:2303.02552  [pdf, other]

    cs.SE cs.AI cs.LG

    An Empirical Study of Pre-Trained Model Reuse in the Hugging Face Deep Learning Model Registry

    Authors: Wenxin Jiang, Nicholas Synovic, Matt Hyatt, Taylor R. Schorlemmer, Rohan Sethi, Yung-Hsiang Lu, George K. Thiruvathukal, James C. Davis

    Abstract: Deep Neural Networks (DNNs) are being adopted as components in software systems. Creating and specializing DNNs from scratch has grown increasingly difficult as state-of-the-art architectures grow more complex. Following the path of traditional software engineering, machine learning engineers have begun to reuse large-scale pre-trained models (PTMs) and fine-tune these models for downstream tasks.…

    Submitted 4 March, 2023; originally announced March 2023.

    Comments: Proceedings of the ACM/IEEE 45th International Conference on Software Engineering (ICSE) 2023

  9. arXiv:2207.11767  [pdf, other]

    cs.SE

    Snapshot Metrics Are Not Enough: Analyzing Software Repositories with Longitudinal Metrics

    Authors: Nicholas Synovic, Matt Hyatt, Rohan Sethi, Sohini Thota, Shilpika, Allan J. Miller, Wenxin Jiang, Emmanuel S. Amobi, Austin Pinderski, Konstantin Läufer, Nicholas J. Hayward, Neil Klingensmith, James C. Davis, George K. Thiruvathukal

    Abstract: Software metrics capture information about software development processes and products. These metrics support decision-making, e.g., in team management or dependency selection. However, existing metrics tools measure only a snapshot of a software project. Little attention has been given to enabling engineers to reason about metric trends over time -- longitudinal metrics that give insight about pr…

    Submitted 24 July, 2022; originally announced July 2022.

    Comments: Accepted at ASE 2022 Tool Demonstrations