
Showing 1–10 of 10 results for author: Odrzygóźdź, T

  1. arXiv:2402.07871  [pdf, other]

    cs.LG cs.AI cs.CL

    Scaling Laws for Fine-Grained Mixture of Experts

    Authors: Jakub Krajewski, Jan Ludziejewski, Kamil Adamczewski, Maciej Pióro, Michał Krutul, Szymon Antoniak, Kamil Ciebiera, Krystian Król, Tomasz Odrzygóźdź, Piotr Sankowski, Marek Cygan, Sebastian Jaszczur

    Abstract: Mixture of Experts (MoE) models have emerged as a primary solution for reducing the computational cost of Large Language Models. In this work, we analyze their scaling properties, incorporating an expanded range of variables. Specifically, we introduce a new hyperparameter, granularity, whose adjustment enables precise control over the size of the experts. Building on this, we establish scaling la…

    Submitted 12 February, 2024; originally announced February 2024.

  2. arXiv:2310.15961  [pdf, other]

    cs.CL cs.LG

    Mixture of Tokens: Efficient LLMs through Cross-Example Aggregation

    Authors: Szymon Antoniak, Sebastian Jaszczur, Michał Krutul, Maciej Pióro, Jakub Krajewski, Jan Ludziejewski, Tomasz Odrzygóźdź, Marek Cygan

    Abstract: Despite the promise of Mixture of Experts (MoE) models in increasing parameter counts of Transformer models while maintaining training and inference costs, their application carries notable drawbacks. The key strategy of these models is to, for each processed token, activate at most a few experts - subsets of an extensive feed-forward layer. But this approach is not without its challenges. The ope…

    Submitted 24 October, 2023; originally announced October 2023.

  3. arXiv:2206.00702  [pdf, other]

    cs.AI cs.LG

    Fast and Precise: Adjusting Planning Horizon with Adaptive Subgoal Search

    Authors: Michał Zawalski, Michał Tyrolski, Konrad Czechowski, Tomasz Odrzygóźdź, Damian Stachura, Piotr Piękos, Yuhuai Wu, Łukasz Kuciński, Piotr Miłoś

    Abstract: Complex reasoning problems contain states that vary in the computational cost required to determine a good action plan. Taking advantage of this property, we propose Adaptive Subgoal Search (AdaSubS), a search method that adaptively adjusts the planning horizon. To this end, AdaSubS generates diverse sets of subgoals at different distances. A verification mechanism is employed to filter out unreac…

    Submitted 25 May, 2024; v1 submitted 1 June, 2022; originally announced June 2022.

    Comments: ICLR 2023 (notable-top-5%) website: https://sites.google.com/view/adaptivesubgoalsearch/

    ACM Class: I.2.8; I.2.6

  4. arXiv:2205.10893  [pdf, other]

    cs.AI

    Thor: Wielding Hammers to Integrate Language Models and Automated Theorem Provers

    Authors: Albert Q. Jiang, Wenda Li, Szymon Tworkowski, Konrad Czechowski, Tomasz Odrzygóźdź, Piotr Miłoś, Yuhuai Wu, Mateja Jamnik

    Abstract: In theorem proving, the task of selecting useful premises from a large library to unlock the proof of a given conjecture is crucially important. This presents a challenge for all theorem provers, especially the ones based on language models, due to their relative inability to reason over huge volumes of premises in text form. This paper introduces Thor, a framework integrating language models and…

    Submitted 22 May, 2022; originally announced May 2022.

  5. arXiv:2108.11204  [pdf, other]

    cs.AI cs.LG

    Subgoal Search For Complex Reasoning Tasks

    Authors: Konrad Czechowski, Tomasz Odrzygóźdź, Marek Zbysiński, Michał Zawalski, Krzysztof Olejnik, Yuhuai Wu, Łukasz Kuciński, Piotr Miłoś

    Abstract: Humans excel in solving complex reasoning tasks through a mental process of moving from one idea to a related one. Inspired by this, we propose the Subgoal Search (kSubS) method. Its key component is a learned subgoal generator that produces a diversity of subgoals that are both achievable and closer to the solution. Using subgoals reduces the search space and induces a high-level search graph suitabl…

    Submitted 3 April, 2024; v1 submitted 25 August, 2021; originally announced August 2021.

    Comments: NeurIPS 2021

  6. arXiv:2104.13903  [pdf, ps, other]

    math.GR math.GT

    Nonplanar isoperimetric inequality for random groups

    Authors: Tomasz Odrzygóźdź

    Abstract: The goal of this note is to generalize the Isoperimetric Inequality for random groups to the class of non-planar diagrams with a bounded number of faces.

    Submitted 25 April, 2021; originally announced April 2021.

  7. arXiv:1906.05417  [pdf, other]

    math.GR math.GT

    Bent walls for random groups in the square and hexagonal model

    Authors: Tomasz Odrzygóźdź

    Abstract: We consider two random group models: the square model and the hexagonal model, defined as the quotient of a free group by a random set of reduced words of length four and six, respectively. Our first main result is that in this model there exists a sharp density threshold for Kazhdan's Property (T) and it equals 1/3. Our second main result is that for densities < 3/8 a random group in the square mo…

    Submitted 22 June, 2019; v1 submitted 12 June, 2019; originally announced June 2019.

    Comments: 41 pages, 24 figures

  8. arXiv:1610.03376  [pdf, ps, other]

    math.GR math.GT

    Cubulating random groups in the square model

    Authors: Tomasz Odrzygóźdź

    Abstract: Our main result is that for densities $<\frac{3}{10}$ a random group in the square model has the Haagerup property and is residually finite. Moreover, we generalize the Isoperimetric Inequality to some class of non-planar diagrams and, using this, we introduce a system of modified hypergraphs providing the structure of a space with walls on the Cayley complex of a random group. Then we show that…

    Submitted 9 October, 2016; originally announced October 2016.

    Comments: 30 pages, 18 figures

  9. arXiv:1405.2773  [pdf, ps, other]

    math.GR math.GT

    The square model for random groups

    Authors: Tomasz Odrzygóźdź

    Abstract: We introduce a new random group model called the square model: we quotient a free group on $n$ generators by a random set of relations, each of which is a reduced word of length four. We prove, as in the Gromov density model, that for densities $> \frac{1}{2}$ a random group in the square model is trivial with overwhelming probability and for densities $<\frac{1}{2}$ a random group is with overwhe…

    Submitted 13 May, 2014; v1 submitted 9 May, 2014; originally announced May 2014.

  10. arXiv:1211.1854  [pdf, ps, other]

    physics.class-ph gr-qc quant-ph

    Half-page derivation of the Thomas precession

    Authors: Andrzej Dragan, Tomasz Odrzygóźdź

    Abstract: Instantaneous derivation of the Thomas precession with only basic vector calculus.

    Submitted 8 November, 2012; originally announced November 2012.