
Showing 1–10 of 10 results for author: Belcak, P

  1. arXiv:2311.10770  [pdf, other]

    cs.CL cs.AI cs.LG cs.NE

    Exponentially Faster Language Modelling

    Authors: Peter Belcak, Roger Wattenhofer

    Abstract: Language models only really need to use an exponential fraction of their neurons for individual inferences. As proof, we present UltraFastBERT, a BERT variant that uses 0.3% of its neurons during inference while performing on par with similar BERT models. UltraFastBERT selectively engages just 12 out of 4095 neurons for each layer inference. This is achieved by replacing feedforward networks with…

    Submitted 21 November, 2023; v1 submitted 15 November, 2023; originally announced November 2023.

  2. arXiv:2308.14711  [pdf, other]

    cs.LG cs.AI cs.PF

    Fast Feedforward Networks

    Authors: Peter Belcak, Roger Wattenhofer

    Abstract: We break the linear link between the layer size and its inference cost by introducing the fast feedforward (FFF) architecture, a log-time alternative to feedforward networks. We demonstrate that FFFs are up to 220x faster than feedforward networks, up to 6x faster than mixture-of-experts networks, and exhibit better training properties than mixtures of experts thanks to noiseless conditional execu…

    Submitted 18 September, 2023; v1 submitted 28 August, 2023; originally announced August 2023.

    Comments: 12 pages, 6 figures, 4 tables

  3. arXiv:2306.01009  [pdf, other]

    cs.CL cs.AI cs.LG

    Examining the Emergence of Deductive Reasoning in Generative Language Models

    Authors: Peter Belcak, Luca A. Lanzendörfer, Roger Wattenhofer

    Abstract: We conduct a preliminary inquiry into the ability of generative transformer models to deductively reason from premises provided. We observe notable differences in the performance of models coming from different training setups and find that the deductive reasoning ability increases with scale. Further, we discover that the performance generally does not decrease with the length of the deductive ch…

    Submitted 31 May, 2023; originally announced June 2023.

    Comments: Accepted to the 1st Natural Language Reasoning and Structured Explanations Workshop (NLRSE@ACL'23). 8 pages, 4 figures, 3 tables

  4. arXiv:2210.16606  [pdf, other]

    cs.LG cs.AI cs.SC

    Neural Combinatorial Logic Circuit Synthesis from Input-Output Examples

    Authors: Peter Belcak, Roger Wattenhofer

    Abstract: We propose a novel, fully explainable neural approach to synthesis of combinatorial logic circuits from input-output examples. The carrying advantage of our method is that it readily extends to inductive scenarios, where the set of examples is incomplete but still indicative of the desired behaviour. Our method can be employed for a virtually arbitrary choice of atoms - from logic gates to FPGA bl…

    Submitted 29 October, 2022; originally announced October 2022.

    Comments: Accepted to the 2nd Workshop on Math-AI (MATH-AI@NeurIPS'22). 10 pages, 1 figure

  5. arXiv:2209.11628  [pdf, other]

    cs.LG cs.CL

    A Neural Model for Regular Grammar Induction

    Authors: Peter Belcák, David Hofer, Roger Wattenhofer

    Abstract: Grammatical inference is a classical problem in computational learning theory and a topic of wider influence in natural language processing. We treat grammars as a model of computation and propose a novel neural approach to induction of regular grammars from positive and negative examples. Our model is fully explainable, its intermediate results are directly interpretable as partial parses, and it…

    Submitted 1 October, 2022; v1 submitted 23 September, 2022; originally announced September 2022.

    Comments: Accepted to the 21st IEEE International Conference on Machine Learning and Applications (ICMLA) 2022, 6 pages, 4 figures

  6. arXiv:2209.10280  [pdf, other]

    cs.LG cs.AI

    Periodic Extrapolative Generalisation in Neural Networks

    Authors: Peter Belcák, Roger Wattenhofer

    Abstract: The learning of the simplest possible computational pattern -- periodicity -- is an open problem in the research of strong generalisation in neural networks. We formalise the problem of extrapolative generalisation for periodic signals and systematically investigate the generalisation abilities of classical, population-based, and recently proposed periodic architectures on a set of benchmarking ta…

    Submitted 21 September, 2022; originally announced September 2022.

    Comments: Accepted to IEEE Symposium on Deep Learning (IEEE DL) 2022, 8 pages, 7 figures

  7. arXiv:2209.09543  [pdf, other]

    cs.LG cs.AI cs.SC

    FACT: Learning Governing Abstractions Behind Integer Sequences

    Authors: Peter Belcák, Ard Kastrati, Flavio Schenker, Roger Wattenhofer

    Abstract: Integer sequences are of central importance to the modeling of concepts admitting complete finitary descriptions. We introduce a novel view on the learning of such concepts and lay down a set of benchmarking tasks aimed at conceptual understanding by machine learning models. These tasks indirectly assess model ability to abstract, and challenge them to reason both interpolatively and extrapolative…

    Submitted 20 September, 2022; originally announced September 2022.

    Comments: Accepted to the 36th Conference on Neural Information Processing Systems (NeurIPS 2022) Track on Datasets and Benchmarks. 37 pages

  8. arXiv:2208.10290  [pdf, other]

    cs.LG cs.DS

    Deterministic Graph-Walking Program Mining

    Authors: Peter Belcak, Roger Wattenhofer

    Abstract: Owing to their versatility, graph structures admit representations of intricate relationships between the separate entities comprising the data. We formalise the notion of connection between two vertex sets in terms of edge and vertex features by introducing graph-walking programs. We give two algorithms for mining of deterministic graph-walking programs that yield programs in the order of increas…

    Submitted 22 August, 2022; originally announced August 2022.

    Comments: Paper accepted for an oral presentation at Advanced Data Mining and Applications (ADMA) 2022. 15 pages, 3 figures

    MSC Class: 68T10; 68T09 ACM Class: I.3; I.5

  9. arXiv:2010.07874   

    cs.PL cs.CL cs.FL

    The LL(finite) strategy for optimal LL(k) parsing

    Authors: Peter Belcak

    Abstract: The LL(finite) parsing strategy for parsing of LL(k) grammars where k needs not to be known is presented. The strategy parses input in linear time, uses arbitrary but always minimal lookahead necessary to disambiguate between alternatives of nonterminals, and it is optimal in the number of lookahead terminal scans performed. Modifications to the algorithm are shown that allow for resolution of gra…

    Submitted 20 January, 2021; v1 submitted 15 October, 2020; originally announced October 2020.

    Comments: An error was found in one of the algorithms for weak LL(k) grammars

  10. arXiv:2008.07871  [pdf, other]

    q-fin.CP cs.MA q-fin.TR

    Fast Agent-Based Simulation Framework with Applications to Reinforcement Learning and the Study of Trading Latency Effects

    Authors: Peter Belcak, Jan-Peter Calliess, Stefan Zohren

    Abstract: We introduce a new software toolbox for agent-based simulation. Facilitating rapid prototyping by offering a user-friendly Python API, its core rests on an efficient C++ implementation to support simulation of large-scale multi-agent systems. Our software environment benefits from a versatile message-driven architecture. Originally developed to support research on financial markets, it offers the…

    Submitted 21 September, 2022; v1 submitted 18 August, 2020; originally announced August 2020.

    Comments: Presented at the International Workshop on Multi-Agent Systems and Agent-Based Simulation (MABS@AAMAS) 2021, 12 pages, 8 figures