Showing 1–9 of 9 results for author: DeCoste, D

  1. arXiv:2303.10464  [pdf, other]

    cs.LG cs.CL

    SPDF: Sparse Pre-training and Dense Fine-tuning for Large Language Models

    Authors: Vithursan Thangarasa, Abhay Gupta, William Marshall, Tianda Li, Kevin Leong, Dennis DeCoste, Sean Lie, Shreyas Saxena

    Abstract: The pre-training and fine-tuning paradigm has contributed to a number of breakthroughs in Natural Language Processing (NLP). Instead of directly training on a downstream task, language models are first pre-trained on large datasets with cross-domain knowledge (e.g., Pile, MassiveText, etc.) and then fine-tuned on task-specific data (e.g., natural language generation, text summarization, etc.). Sca…

    Submitted 29 July, 2023; v1 submitted 18 March, 2023; originally announced March 2023.

    Comments: Accepted to Uncertainty in Artificial Intelligence (UAI) 2023 Conference; 13 pages, 4 figures (Main Paper) + 5 pages (Supplementary Material)

  2. arXiv:2206.14098  [pdf, other]

    cs.LG cs.AI cs.CV

    RevBiFPN: The Fully Reversible Bidirectional Feature Pyramid Network

    Authors: Vitaliy Chiley, Vithursan Thangarasa, Abhay Gupta, Anshul Samar, Joel Hestness, Dennis DeCoste

    Abstract: This work introduces RevSilo, the first reversible bidirectional multi-scale feature fusion module. Like other reversible methods, RevSilo eliminates the need to store hidden activations by recomputing them. However, existing reversible methods do not apply to multi-scale feature fusion and are, therefore, not applicable to a large class of networks. Bidirectional multi-scale feature fusion promot…

    Submitted 28 April, 2023; v1 submitted 28 June, 2022; originally announced June 2022.

    Comments: Presented at MLSys 2023. Code available from Cerebras Systems: https://github.com/CerebrasResearch/RevBiFPN

  3. arXiv:2105.13464  [pdf, other]

    cs.LG cs.AI cs.CV

    Training With Data Dependent Dynamic Learning Rates

    Authors: Shreyas Saxena, Nidhi Vyas, Dennis DeCoste

    Abstract: Recently, many first- and second-order variants of SGD have been proposed to facilitate training of Deep Neural Networks (DNNs). A common limitation of these works stems from the fact that they use the same learning rate across all instances present in the dataset. This setting is widely adopted under the assumption that loss functions for each instance are similar in nature, and hence, a common lear…

    Submitted 27 May, 2021; originally announced May 2021.

  4. Memory Efficient 3D U-Net with Reversible Mobile Inverted Bottlenecks for Brain Tumor Segmentation

    Authors: Mihir Pendse, Vithursan Thangarasa, Vitaliy Chiley, Ryan Holmdahl, Joel Hestness, Dennis DeCoste

    Abstract: We propose combining memory saving techniques with traditional U-Net architectures to increase the complexity of the models on the Brain Tumor Segmentation (BraTS) challenge. The BraTS challenge consists of a 3D segmentation of a 240x240x155x4 input image into a set of tumor classes. Because of the large volume and need for 3D convolutional layers, this task is very memory intensive. To address th…

    Submitted 20 April, 2021; v1 submitted 19 April, 2021; originally announced April 2021.

    Comments: 11 pages, 5 figures, Published at MICCAI BrainLes 2020

    Journal ref: Brainlesion: Glioma, Multiple Sclerosis, Stroke and Traumatic Brain Injuries (2021) 388-397

  5. arXiv:2001.02312  [pdf, other]

    cs.LG stat.ML

    Stochastic Weight Averaging in Parallel: Large-Batch Training that Generalizes Well

    Authors: Vipul Gupta, Santiago Akle Serrano, Dennis DeCoste

    Abstract: We propose Stochastic Weight Averaging in Parallel (SWAP), an algorithm to accelerate DNN training. Our algorithm uses large mini-batches to compute an approximate solution quickly and then refines it by averaging the weights of multiple models computed independently and in parallel. The resulting models generalize as well as those trained with small mini-batches but are produced in a substan…

    Submitted 7 January, 2020; originally announced January 2020.

  6. arXiv:1412.6599  [pdf, other]

    cs.LG

    Hot Swapping for Online Adaptation of Optimization Hyperparameters

    Authors: Kevin Bache, Dennis DeCoste, Padhraic Smyth

    Abstract: We describe a general framework for online adaptation of optimization hyperparameters by `hot swapping' their values during learning. We investigate this approach in the context of adaptive learning rate selection using an explore-exploit strategy from the multi-armed bandit literature. Experiments on a benchmark neural network show that the hot swapping approach leads to consistently better solut…

    Submitted 13 April, 2015; v1 submitted 19 December, 2014; originally announced December 2014.

    Comments: Submission to ICLR 2015

    MSC Class: 62L20 ACM Class: G.1.6; I.2.6

  7. arXiv:1410.0736  [pdf, other]

    cs.CV cs.AI cs.LG cs.NE stat.ML

    HD-CNN: Hierarchical Deep Convolutional Neural Network for Large Scale Visual Recognition

    Authors: Zhicheng Yan, Hao Zhang, Robinson Piramuthu, Vignesh Jagadeesh, Dennis DeCoste, Wei Di, Yizhou Yu

    Abstract: In image classification, visual separability between different object categories is highly uneven, and some categories are more difficult to distinguish than others. Such difficult categories demand more dedicated classifiers. However, existing deep convolutional neural networks (CNN) are trained as flat N-way classifiers, and few efforts have been made to leverage the hierarchical structure of ca…

    Submitted 15 May, 2015; v1 submitted 2 October, 2014; originally announced October 2014.

    Comments: Add new results on ImageNet using VGG-16-layer building block net

  8. arXiv:1404.5351  [pdf, other]

    cs.CV

    Fast Approximate Matching of Cell-Phone Videos for Robust Background Subtraction

    Authors: Raffay Hamid, Atish Das Sarma, Dennis DeCoste, Neel Sundaresan

    Abstract: We identify a novel instance of the background subtraction problem that focuses on extracting near-field foreground objects captured using handheld cameras. Given two user-generated videos of a scene, one with and the other without the foreground object(s), our goal is to efficiently generate an output video with only the foreground object(s) present in it. We cast this challenge as a spatio-tempo…

    Submitted 21 April, 2014; originally announced April 2014.

  9. arXiv:1312.4626  [pdf, other]

    stat.ML cs.LG

    Compact Random Feature Maps

    Authors: Raffay Hamid, Ying Xiao, Alex Gittens, Dennis DeCoste

    Abstract: Kernel approximation using randomized feature maps has recently gained a lot of interest. In this work, we identify that previous approaches for polynomial kernel approximation create maps that are rank deficient, and therefore do not utilize the capacity of the projected feature space effectively. To address this challenge, we propose compact random feature maps (CRAFTMaps) to approximate polynom…

    Submitted 16 December, 2013; originally announced December 2013.

    Comments: 9 pages