Showing 1–12 of 12 results for author: Wolfe, C R

  1. Better Schedules for Low Precision Training of Deep Neural Networks

    Authors: Cameron R. Wolfe, Anastasios Kyrillidis

    Abstract: Low precision training can significantly reduce the computational overhead of training deep neural networks (DNNs). Though many such techniques exist, cyclic precision training (CPT), which dynamically adjusts precision throughout training according to a cyclic schedule, achieves particularly impressive improvements in training efficiency, while actually improving DNN performance. Existing CPT imp…

    Submitted 4 March, 2024; originally announced March 2024.

    Comments: 20 pages, 8 figures, 1 table, ACML 2023

    ACM Class: I.2.6; I.2.10; I.4.0

    Journal ref: Machine Learning (2024): 1-19

  2. arXiv:2211.04624  [pdf, other]

    cs.LG cs.CV math.OC

    Cold Start Streaming Learning for Deep Networks

    Authors: Cameron R. Wolfe, Anastasios Kyrillidis

    Abstract: The ability to dynamically adapt neural networks to newly-available data without performance deterioration would revolutionize deep learning applications. Streaming learning (i.e., learning from one data example at a time) has the potential to enable such real-time adaptation, but current approaches i) freeze a majority of network parameters during streaming and ii) are dependent upon offline, bas…

    Submitted 8 November, 2022; originally announced November 2022.

    Comments: 52 pages, 7 figures, pre-print

    MSC Class: 68T07 ACM Class: I.2.6; I.2.10; I.4.0

  3. arXiv:2205.12484  [pdf, other]

    cs.CL cs.AI

    GisPy: A Tool for Measuring Gist Inference Score in Text

    Authors: Pedram Hosseini, Christopher R. Wolfe, Mona Diab, David A. Broniatowski

    Abstract: Decision making theories such as Fuzzy-Trace Theory (FTT) suggest that individuals tend to rely on gist, or bottom-line meaning, in the text when making decisions. In this work, we delineate the process of developing GisPy, an open-source tool in Python for measuring the Gist Inference Score (GIS) in text. Evaluation of GisPy on documents in three benchmarks from the news and scientific text domai…

    Submitted 25 May, 2022; originally announced May 2022.

    Comments: Accepted to the 4th Workshop on Narrative Understanding @ NAACL 2022

  4. arXiv:2203.10428  [pdf, other]

    cs.LG cs.AI

    PipeGCN: Efficient Full-Graph Training of Graph Convolutional Networks with Pipelined Feature Communication

    Authors: Cheng Wan, Youjie Li, Cameron R. Wolfe, Anastasios Kyrillidis, Nam Sung Kim, Yingyan Lin

    Abstract: Graph Convolutional Networks (GCNs) is the state-of-the-art method for learning graph-structured data, and training large-scale GCNs requires distributed training across multiple accelerators such that each accelerator is able to hold a partitioned subgraph. However, distributed GCN training incurs prohibitive overhead of communicating node features and feature gradients among partitions for every…

    Submitted 19 March, 2022; originally announced March 2022.

    Comments: ICLR 2022

  5. arXiv:2112.04905  [pdf, other]

    cs.LG cs.AI math.OC stat.ML

    i-SpaSP: Structured Neural Pruning via Sparse Signal Recovery

    Authors: Cameron R. Wolfe, Anastasios Kyrillidis

    Abstract: We propose a novel, structured pruning algorithm for neural networks -- the iterative, Sparse Structured Pruning algorithm, dubbed as i-SpaSP. Inspired by ideas from sparse signal recovery, i-SpaSP operates by iteratively identifying a larger set of important parameter groups (e.g., filters or neurons) within a network that contribute most to the residual between pruned and dense network output, t…

    Submitted 29 March, 2022; v1 submitted 7 December, 2021; originally announced December 2021.

    Comments: 29 pages, 4 figures, 4th Annual Conference on Learning for Dynamics and Control

    MSC Class: 68T07 ACM Class: I.2.6; I.2.10; I.4.0

  6. arXiv:2108.00259  [pdf, other]

    stat.ML cs.AI cs.LG math.OC

    How much pre-training is enough to discover a good subnetwork?

    Authors: Cameron R. Wolfe, Fangshuo Liao, Qihan Wang, Junhyung Lyle Kim, Anastasios Kyrillidis

    Abstract: Neural network pruning is useful for discovering efficient, high-performing subnetworks within pre-trained, dense network architectures. More often than not, it involves a three-step process -- pre-training, pruning, and re-training -- that is computationally expensive, as the dense model must be fully pre-trained. While previous work has revealed through experiments the relationship between the a…

    Submitted 22 August, 2023; v1 submitted 31 July, 2021; originally announced August 2021.

    Comments: 29 pages

    MSC Class: 68T07 ACM Class: I.2.6; I.2.10; I.4.0

  7. arXiv:2107.13054  [pdf, other]

    cs.AI cs.CL cs.CV cs.LG

    Exceeding the Limits of Visual-Linguistic Multi-Task Learning

    Authors: Cameron R. Wolfe, Keld T. Lundgaard

    Abstract: By leveraging large amounts of product data collected across hundreds of live e-commerce websites, we construct 1000 unique classification tasks that share similarly-structured input data, comprised of both text and images. These classification tasks focus on learning the product hierarchy of different e-commerce websites, causing many of them to be correlated. Adopting a multi-modal transformer m…

    Submitted 27 July, 2021; originally announced July 2021.

    Comments: 10 pages, 7 figures

    MSC Class: 68T07 ACM Class: I.2.6; I.2.7; I.2.10

  8. arXiv:2107.00961  [pdf, other]

    cs.LG cs.CV cs.DC math.OC

    ResIST: Layer-Wise Decomposition of ResNets for Distributed Training

    Authors: Chen Dun, Cameron R. Wolfe, Christopher M. Jermaine, Anastasios Kyrillidis

    Abstract: We propose ResIST, a novel distributed training protocol for Residual Networks (ResNets). ResIST randomly decomposes a global ResNet into several shallow sub-ResNets that are trained independently in a distributed manner for several local iterations, before having their updates synchronized and aggregated into the global model. In the next round, new sub-ResNets are randomly generated and the proc…

    Submitted 14 March, 2022; v1 submitted 2 July, 2021; originally announced July 2021.

    Comments: 26 pages, 8 figures, pre-print under review

  9. arXiv:2102.10424  [pdf, other]

    cs.LG cs.AI cs.DC math.OC

    GIST: Distributed Training for Large-Scale Graph Convolutional Networks

    Authors: Cameron R. Wolfe, Jingkang Yang, Arindam Chowdhury, Chen Dun, Artun Bayer, Santiago Segarra, Anastasios Kyrillidis

    Abstract: The graph convolutional network (GCN) is a go-to solution for machine learning on graphs, but its training is notoriously difficult to scale both in terms of graph size and the number of model parameters. Although some work has explored training on large-scale graphs (e.g., GraphSAGE, ClusterGCN, etc.), we pioneer efficient training of large-scale GCN models (i.e., ultra-wide, overparameterized mo…

    Submitted 14 March, 2022; v1 submitted 20 February, 2021; originally announced February 2021.

    Comments: 28 pages, 5 figures, pre-print under review

    ACM Class: I.2.4

  10. arXiv:1912.00772  [pdf, other]

    cs.LG stat.ML

    E-Stitchup: Data Augmentation for Pre-Trained Embeddings

    Authors: Cameron R. Wolfe, Keld T. Lundgaard

    Abstract: In this work, we propose data augmentation methods for embeddings from pre-trained deep learning models that take a weighted combination of a pair of input embeddings, as inspired by Mixup, and combine such augmentation with extra label softening. These methods are shown to significantly increase classification accuracy, reduce training time, and improve confidence calibration of a downstream mode…

    Submitted 6 October, 2020; v1 submitted 27 November, 2019; originally announced December 2019.

    Comments: 11 pages, 7 figures

  11. arXiv:1910.02120  [pdf, other]

    cs.LG stat.ML

    Distributed Learning of Deep Neural Networks using Independent Subnet Training

    Authors: Binhang Yuan, Cameron R. Wolfe, Chen Dun, Yuxin Tang, Anastasios Kyrillidis, Christopher M. Jermaine

    Abstract: Distributed machine learning (ML) can bring more computational resources to bear than single-machine learning, thus enabling reductions in training time. Distributed learning partitions models and data over many machines, allowing model and dataset sizes beyond the available compute power and memory of a single machine. In practice though, distributed ML is challenging when distribution is mandato…

    Submitted 18 April, 2022; v1 submitted 4 October, 2019; originally announced October 2019.

  12. arXiv:1903.10103  [pdf, other]

    cs.NE

    Functional Generative Design of Mechanisms with Recurrent Neural Networks and Novelty Search

    Authors: Cameron R. Wolfe, Cem C. Tutum, Risto Miikkulainen

    Abstract: Consumer-grade 3D printers have made it easier to fabricate aesthetic objects and static assemblies, opening the door to automated design of such objects. However, while static designs are easily produced with 3D printing, functional designs with moving parts are more difficult to generate: the search space is too high-dimensional, the resolution of the 3D-printed parts is not adequate, and it is…

    Submitted 24 March, 2019; originally announced March 2019.

    Comments: 7 pages, GECCO 2019