
Showing 1–9 of 9 results for author: Krajewski, J

  1. arXiv:2402.07871 [pdf, other]

    cs.LG cs.AI cs.CL

    Scaling Laws for Fine-Grained Mixture of Experts

    Authors: Jakub Krajewski, Jan Ludziejewski, Kamil Adamczewski, Maciej Pióro, Michał Krutul, Szymon Antoniak, Kamil Ciebiera, Krystian Król, Tomasz Odrzygóźdź, Piotr Sankowski, Marek Cygan, Sebastian Jaszczur

    Abstract: Mixture of Experts (MoE) models have emerged as a primary solution for reducing the computational cost of Large Language Models. In this work, we analyze their scaling properties, incorporating an expanded range of variables. Specifically, we introduce a new hyperparameter, granularity, whose adjustment enables precise control over the size of the experts. Building on this, we establish scaling la…

    Submitted 12 February, 2024; originally announced February 2024.

  2. arXiv:2401.04081 [pdf, other]

    cs.LG cs.AI cs.CL

    MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts

    Authors: Maciej Pióro, Kamil Ciebiera, Krystian Król, Jan Ludziejewski, Michał Krutul, Jakub Krajewski, Szymon Antoniak, Piotr Miłoś, Marek Cygan, Sebastian Jaszczur

    Abstract: State Space Models (SSMs) have become serious contenders in the field of sequential modeling, challenging the dominance of Transformers. At the same time, Mixture of Experts (MoE) has significantly improved Transformer-based Large Language Models, including recent state-of-the-art open models. We propose that to unlock the potential of SSMs for scaling, they should be combined with MoE. We showcas…

    Submitted 26 February, 2024; v1 submitted 8 January, 2024; originally announced January 2024.

  3. arXiv:2310.15961 [pdf, other]

    cs.CL cs.LG

    Mixture of Tokens: Efficient LLMs through Cross-Example Aggregation

    Authors: Szymon Antoniak, Sebastian Jaszczur, Michał Krutul, Maciej Pióro, Jakub Krajewski, Jan Ludziejewski, Tomasz Odrzygóźdź, Marek Cygan

    Abstract: Despite the promise of Mixture of Experts (MoE) models in increasing parameter counts of Transformer models while maintaining training and inference costs, their application carries notable drawbacks. The key strategy of these models is to activate, for each processed token, at most a few experts (subsets of an extensive feed-forward layer). But this approach is not without its challenges. The ope…

    Submitted 24 October, 2023; originally announced October 2023.

  4. Scaling slowly rotating asteroids by stellar occultations

    Authors: A. Marciniak, J. Ďurech, A. Choukroun, J. Hanuš, W. Ogłoza, R. Szakáts, L. Molnár, A. Pál, F. Monteiro, E. Frappa, W. Beisker, H. Pavlov, J. Moore, R. Adomavičienė, R. Aikawa, S. Andersson, P. Antonini, Y. Argentin, A. Asai, P. Assoignon, J. Barton, P. Baruffetti, K. L. Bath, R. Behrend, L. Benedyktowicz, et al. (154 additional authors not shown)

    Abstract: As evidenced by recent survey results, the majority of asteroids are slow rotators (P > 12 h), but they lack spin and shape models due to selection bias. This bias is skewing our overall understanding of the spins, shapes, and sizes of asteroids, as well as of their other properties. Also, diameter determinations for large (> 60 km) and medium-sized asteroids (between 30 and 60 km) often vary by over 30% for m…

    Submitted 13 October, 2023; originally announced October 2023.

    Comments: Accepted to Astronomy & Astrophysics. 12 pages + appendices

    Journal ref: A&A 679, A60 (2023)

  5. arXiv:2208.01369 [pdf, other]

    cs.CV eess.IV q-bio.NC

    The Face of Affective Disorders

    Authors: Christian S. Pilz, Benjamin Clemens, Inka C. Hiss, Christoph Weiss, Ulrich Canzler, Jarek Krajewski, Ute Habel, Steffen Leonhardt

    Abstract: We study the statistical properties of facial behaviour altered by the regulation of brain arousal in the clinical domain of psychiatry. The underlying mechanism is linked to the empirical interpretation of the vigilance continuum as a behavioral surrogate measurement for certain states of mind. Referring to the classical scalp-based obtrusive measurements, we name the presented method Opto-Electron…

    Submitted 5 September, 2022; v1 submitted 2 August, 2022; originally announced August 2022.

    Comments: 15 pages. Submitted for peer review to the IEEE Transactions on Affective Computing

    Report number: rev-2.11-2022

  6. Properties of slowly rotating asteroids from the Convex Inversion Thermophysical Model

    Authors: A. Marciniak, J. Ďurech, V. Alí-Lagoa, W. Ogłoza, R. Szakáts, T. G. Müller, L. Molnár, A. Pál, F. Monteiro, P. Arcoverde, R. Behrend, Z. Benkhaldoun, L. Bernasconi, J. Bosch, S. Brincat, L. Brunetto, M. Butkiewicz-Bąk, F. Del Freo, R. Duffard, M. Evangelista-Santana, G. Farroni, S. Fauvaud, M. Fauvaud, M. Ferrais, S. Geier, et al. (51 additional authors not shown)

    Abstract: Results from the TESS mission showed that previous studies strongly underestimated the number of slow rotators, revealing the importance of studying those asteroids. For most slowly rotating asteroids (P > 12 h), no spin and shape model is available because of observation selection effects. This hampers determination of their thermal parameters and accurate sizes. We continue our campaign in minimi…

    Submitted 1 September, 2021; originally announced September 2021.

    Comments: Accepted to Astronomy & Astrophysics. 10 pages + appendices

    Journal ref: A&A 654, A87 (2021)

  7. arXiv:2005.14155 [pdf, other]

    astro-ph.IM astro-ph.EP astro-ph.SR

    Demonstrating high-precision photometry with a CubeSat: ASTERIA observations of 55 Cancri e

    Authors: Mary Knapp, Sara Seager, Brice-Olivier Demory, Akshata Krishnamurthy, Matthew W. Smith, Christopher M. Pong, Vanessa P. Bailey, Amanda Donner, Peter Di Pasquale, Brian Campuzano, Colin Smith, Jason Luu, Alessandra Babuscia, Robert L. Bocchino, Jr., Jessica Loveland, Cody Colley, Tobias Gedenk, Tejas Kulkarni, Kyle Hughes, Mary White, Joel Krajewski, Lorraine Fesq

    Abstract: ASTERIA (Arcsecond Space Telescope Enabling Research In Astrophysics) is a 6U CubeSat space telescope (10 cm × 20 cm × 30 cm, 10 kg). ASTERIA's primary mission objective was demonstrating two key technologies for reducing systematic noise in photometric observations: high-precision pointing control and high-stability thermal control. ASTERIA demonstrated 0.5 arcsecond RMS pointing stability and…

    Submitted 28 May, 2020; originally announced May 2020.

    Comments: 23 pages, 9 figures. Accepted in AJ

  8. Multiple Field-Induced Phase Transitions in a Geometrically-Frustrated Dipolar Magnet - Gd2Ti2O7

    Authors: A. P. Ramirez, B. S. Shastry, A. Hayashi, J. J. Krajewski, D. A. Huse, R. J. Cava

    Abstract: Field-driven phase transitions generally arise from competition between Zeeman energy and exchange or crystal-field anisotropy. Here we present the phase diagram of a frustrated pyrochlore magnet Gd2Ti2O7, where crystal field splitting is small compared to the dipolar energy. We find good agreement between zero-temperature critical fields and those obtained from a mean-field model. Here, dipola…

    Submitted 3 December, 2001; originally announced December 2001.

    Comments: 10 pages, 5 figures; PDF file attached; PACS 75.30.Kz, 75.50.Ee, 75.10.-b

    Journal ref: Phys. Rev. Lett. 89, 067202 (2002).

  9. Neutron Scattering Study of Crystal Field Energy Levels and Field Dependence of the Magnetic Order in Superconducting HoNi2B2C

    Authors: T. E. Grigereit, J. W. Lynn, R. J. Cava, J. J. Krajewski, W. F. Peck, Jr.

    Abstract: Elastic and inelastic neutron scattering measurements have been carried out to investigate the magnetic properties of superconducting (Tc ≈ 8 K) HoNi2B2C. The inelastic measurements reveal that the lowest two crystal field transitions out of the ground state occur at 11.28(3) and 16.00(2) meV, while the transition of 4.70(9) meV between these two levels is observed at elevated temperatures. The temp…

    Submitted 23 May, 1995; originally announced May 1995.

    Comments: RevTex, 7 pages, 11 figures (available upon request); Physica C