Showing 1–4 of 4 results for author: Sievert, S

  1. arXiv:2406.10522

    cs.LG cs.AI cs.CL

    Humor in AI: Massive Scale Crowd-Sourced Preferences and Benchmarks for Cartoon Captioning

    Authors: Jifan Zhang, Lalit Jain, Yang Guo, Jiayi Chen, Kuan Lok Zhou, Siddharth Suresh, Andrew Wagenmaker, Scott Sievert, Timothy Rogers, Kevin Jamieson, Robert Mankoff, Robert Nowak

    Abstract: We present a novel multimodal preference dataset for creative tasks, consisting of over 250 million human ratings on more than 2.2 million captions, collected through crowdsourcing rating data for The New Yorker's weekly cartoon caption contest over the past eight years. This unique dataset supports the development and evaluation of multimodal large language models and preference-based fine-tuning…

    Submitted 15 June, 2024; originally announced June 2024.

  2. arXiv:2310.04986

    econ.TH cs.AI physics.class-ph

    A new economic and financial theory of money

    Authors: Michael E. Glinsky, Sharon Sievert

    Abstract: This paper fundamentally reformulates economic and financial theory to include electronic currencies. The valuation of the electronic currencies will be based on macroeconomic theory and the fundamental equation of monetary policy, not the microeconomic theory of discounted cash flows. The view of electronic currency as a transactional equity associated with tangible assets of a sub-economy will b…

    Submitted 10 January, 2024; v1 submitted 7 October, 2023; originally announced October 2023.

    Comments: 51 pages, 35 figures, 158 equations, to be submitted to Journal of Economic Affairs

  3. arXiv:1910.08222

    cs.LG math.OC stat.ML

    Improving the convergence of SGD through adaptive batch sizes

    Authors: Scott Sievert, Shrey Shah

    Abstract: Mini-batch stochastic gradient descent (SGD) and variants thereof approximate the objective function's gradient with a small number of training examples, aka the batch size. Small batch sizes require little computation for each model update but can yield high-variance gradient estimates, which poses some challenges for optimization. Conversely, large batches require more computation but can yield…

    Submitted 27 September, 2023; v1 submitted 17 October, 2019; originally announced October 2019.

  4. arXiv:1806.04090

    stat.ML cs.DC cs.LG

    ATOMO: Communication-efficient Learning via Atomic Sparsification

    Authors: Hongyi Wang, Scott Sievert, Zachary Charles, Shengchao Liu, Stephen Wright, Dimitris Papailiopoulos

    Abstract: Distributed model training suffers from communication overheads due to frequent gradient updates transmitted between compute nodes. To mitigate these overheads, several studies propose the use of sparsified stochastic gradients. We argue that these are facets of a general sparsification method that can operate on any possible atomic decomposition. Notable examples include element-wise, singular va…

    Submitted 8 November, 2018; v1 submitted 11 June, 2018; originally announced June 2018.