
Showing 1–5 of 5 results for author: Marion, M

  1. arXiv:2405.20541

    cs.LG cs.CL

    Perplexed by Perplexity: Perplexity-Based Data Pruning With Small Reference Models

    Authors: Zachary Ankner, Cody Blakeney, Kartik Sreenivasan, Max Marion, Matthew L. Leavitt, Mansheej Paul

    Abstract: In this work, we investigate whether small language models can determine high-quality subsets of large-scale text datasets that improve the performance of larger language models. While existing work has shown that pruning based on the perplexity of a larger model can yield high-quality data, we investigate whether smaller models can be used for perplexity-based pruning and how pruning is affected…

    Submitted 30 May, 2024; originally announced May 2024.

  2. arXiv:2309.04564

    cs.CL cs.LG

    When Less is More: Investigating Data Pruning for Pretraining LLMs at Scale

    Authors: Max Marion, Ahmet Üstün, Luiza Pozzobon, Alex Wang, Marzieh Fadaee, Sara Hooker

    Abstract: Large volumes of text data have contributed significantly to the development of large language models (LLMs) in recent years. This data is typically acquired by scraping the internet, leading to pretraining datasets comprised of noisy web text. To date, efforts to prune these datasets down to a higher quality subset have relied on hand-crafted heuristics encoded as rule-based filters. In this work…

    Submitted 8 September, 2023; originally announced September 2023.

    Comments: 14 pages, 8 figures

  3. arXiv:1904.01839

    math.AP

    Existence of pulses for a reaction-diffusion system of blood coagulation

    Authors: Nicolas Ratto, Martine Marion, Vitaly Volpert

    Abstract: The paper is devoted to the investigation of a reaction-diffusion system of equations describing the process of blood coagulation. Existence of pulse solutions, that is, positive stationary solutions with zero limit at infinity, is studied. It is shown that such solutions exist if and only if the speed of the travelling wave described by the same system is positive. The proof is based on the Leray…

    Submitted 3 April, 2019; originally announced April 2019.

  4. Hydrothermal formation of Clay-Carbonate alteration assemblages in the Nili Fossae region of Mars

    Authors: Adrian J. Brown, Simon J. Hook, Alice M. Baldridge, James K. Crowley, Nathan T. Bridges, Bradley J. Thomson, Giles M. Marion, Carlos R. de Souza Filho, Janice L. Bishop

    Abstract: The Compact Reconnaissance Imaging Spectrometer for Mars (CRISM) has returned observations of the Nili Fossae region indicating the presence of Mg-carbonate in small (<10 km²), relatively bright rock units that are commonly fractured (Ehlmann et al., 2008b). We have analyzed spectra from CRISM images and used co-located HiRISE images in order to further characterize these carbonate-bearing unit…

    Submitted 5 February, 2014; originally announced February 2014.

    Comments: 21 pages, 7 figures, 2 tables

    Journal ref: Earth and Planetary Science Letters (2010) 297, 174-182

  5. arXiv:1310.2624

    math.AP

    Global existence for fully nonlinear reaction-diffusion systems describing multicomponent reactive flows

    Authors: Martine Marion, Roger Temam

    Abstract: We consider combustion problems in the presence of complex chemistry and nonlinear diffusion laws leading to fully nonlinear multispecies reaction-diffusion equations. We establish results of existence of solution and maximum principle, i.e. positivity of the mass fractions, which rely on specific properties of the models. The nonlinear diffusion coefficients are obtained by resolution of the so-c…

    Submitted 9 October, 2013; originally announced October 2013.

    MSC Class: 35
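The first result studies pruning a text corpus by the perplexity that a small reference model assigns to each document. The Python sketch below illustrates only the generic idea and is not the paper's code. It assumes the per-token natural-log probabilities from some reference model are already available. The function names, the `keep_fraction` parameter, and the "low"/"medium"/"high" selection bands are hypothetical choices made for illustration; the truncated abstract does not say which band the authors keep.

```python
import math
from typing import Dict, List, Sequence


def perplexity(token_logprobs: Sequence[float]) -> float:
    """Perplexity of one document: exp of the mean negative log-likelihood (natural log)."""
    if not token_logprobs:
        raise ValueError("document has no tokens")
    mean_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(mean_nll)


def prune_by_perplexity(
    docs: Dict[str, Sequence[float]],
    keep_fraction: float,
    band: str = "low",
) -> List[str]:
    """Return the IDs of the documents kept after ranking by reference-model perplexity.

    Assumption: `docs` maps a document ID to the per-token log-probabilities
    produced by a (hypothetical) small reference model. `band` picks which
    slice of the ranking survives: "low", "medium", or "high" perplexity.
    """
    if not 0.0 < keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in (0, 1]")
    # Sort document IDs by ascending perplexity.
    ranked = sorted(docs, key=lambda d: perplexity(docs[d]))
    k = max(1, round(keep_fraction * len(ranked)))
    if band == "low":
        return ranked[:k]
    if band == "high":
        return ranked[len(ranked) - k:]
    if band == "medium":
        # Keep a window of k documents centred in the ranking.
        start = (len(ranked) - k) // 2
        return ranked[start:start + k]
    raise ValueError(f"unknown band: {band!r}")


if __name__ == "__main__":
    # Toy log-probabilities standing in for a reference model's output.
    scores = {
        "a": [-0.1, -0.2],
        "b": [-2.0, -2.0],
        "c": [-1.0, -1.0],
        "d": [-3.0],
    }
    for band in ("low", "medium", "high"):
        print(band, prune_by_perplexity(scores, 0.5, band))
```

Keeping the selection band as a parameter separates scoring from the selection rule, so a different criterion can be tried without recomputing perplexities.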