-
Harmonizing Program Induction with Rate-Distortion Theory
Authors:
Hanqi Zhou,
David G. Nagy,
Charley M. Wu
Abstract:
Many aspects of human learning have been proposed as a process of constructing mental programs: from acquiring symbolic number representations to intuitive theories about the world. In parallel, there is a long tradition of using information processing to model human cognition through Rate Distortion Theory (RDT). Yet, it is still poorly understood how to apply RDT when mental representations take the form of programs. In this work, we adapt RDT by proposing a three-way trade-off among rate (description length), distortion (error), and computational costs (search budget). We use simulations on a melody task to study the implications of this trade-off, and show that constructing a shared program library across tasks provides global benefits. However, this comes at the cost of sensitivity to curricula, which is also characteristic of human learners. Finally, we use methods from partial information decomposition to generate training curricula that induce more effective libraries and better generalization.
Submitted 8 May, 2024;
originally announced May 2024.
-
Query complexity of Boolean functions on the middle slice of the cube
Authors:
Dániel Gerbner,
Balázs Keszegh,
Dániel T. Nagy,
Kartal Nagy,
Dömötör Pálvölgyi,
Balázs Patkós,
Gábor Wiener
Abstract:
We study the query complexity on slices of Boolean functions. Among other results, we show that there exists a Boolean function for which we need to query all but 7 input bits to compute its value, even if we know beforehand that the numbers of 0's and 1's in the input are the same, i.e., when our input is from the middle slice. This answers a question of Byramji. Our proof is non-constructive, but we also propose a concrete candidate function that might have the above property. Our results are related to certain natural discrepancy-type questions that, somewhat surprisingly, have not been studied before.
Submitted 6 June, 2024; v1 submitted 24 September, 2023;
originally announced September 2023.
-
Modelling continual learning in humans with Hebbian context gating and exponentially decaying task signals
Authors:
Timo Flesch,
David G. Nagy,
Andrew Saxe,
Christopher Summerfield
Abstract:
Humans can learn several tasks in succession with minimal mutual interference but perform more poorly when trained on multiple tasks at once. The opposite is true for standard deep neural networks. Here, we propose novel computational constraints for artificial neural networks, inspired by earlier work on gating in the primate prefrontal cortex, that capture the cost of interleaved training and allow the network to learn two tasks in sequence without forgetting. We augment standard stochastic gradient descent with two algorithmic motifs, so-called "sluggish" task units and a Hebbian training step that strengthens connections between task units and hidden units that encode task-relevant information. We found that the "sluggish" units introduce a switch cost during training, which biases representations under interleaved training towards a joint representation that ignores the contextual cue, while the Hebbian step promotes the formation of a gating scheme from task units to the hidden layer that produces orthogonal representations which are perfectly guarded against interference. Validating the model on previously published human behavioural data revealed that it matches the performance of participants who had been trained on blocked or interleaved curricula, and that these performance differences were driven by misestimation of the true category boundary.
Submitted 5 September, 2022; v1 submitted 22 March, 2022;
originally announced March 2022.
-
Photonic Quantum Policy Learning in OpenAI Gym
Authors:
Dániel Nagy,
Zsolt Tabi,
Péter Hága,
Zsófia Kallus,
Zoltán Zimborás
Abstract:
In recent years, near-term noisy intermediate-scale quantum (NISQ) computing devices have become available. One of the most promising application areas for such NISQ computer prototypes is quantum machine learning. While quantum neural networks are widely studied for supervised learning, quantum reinforcement learning is still an emerging field within this area. To solve a classical continuous control problem, we use a continuous-variable quantum machine learning approach. We introduce proximal policy optimization for photonic variational quantum agents and also study the effect of data re-uploading. We present a performance assessment via an empirical study using Strawberry Fields, a photonic simulator with a Fock backend, and a hybrid training framework connected to an OpenAI Gym environment and TensorFlow. For the restricted CartPole problem, the two variations of photonic policy learning achieve comparable performance levels and faster convergence than a baseline classical neural network with the same number of trainable parameters.
Submitted 29 August, 2021;
originally announced August 2021.
-
Solving large number of non-stiff, low-dimensional ordinary differential equation systems on GPUs and CPUs: performance comparisons of MPGOS, ODEINT and DifferentialEquations.jl
Authors:
Dániel Nagy,
Lambert Plavecz,
Ferenc Hegedűs
Abstract:
In this paper, the performance characteristics of different solution techniques and program packages for solving a large number of independent ordinary differential equation systems are examined. The hardware employed consists of an Intel Core i7-4820K CPU with 30.4 GFLOPS peak double-precision performance per core and an Nvidia GeForce Titan Black GPU that has a total of 1707 GFLOPS peak double-precision performance. The tested systems (Lorenz equation, Keller--Miksis equation and a pressure relief valve model) are non-stiff and have low dimension. Thus, the performance of the codes is not limited by memory bandwidth, and Runge--Kutta type solvers are efficient and suitable choices. The tested program packages are MPGOS, written in C++ and specialised only for GPUs; ODEINT, implemented in C++, which supports execution on both CPUs and GPUs; and DifferentialEquations.jl, written in Julia, which also supports execution on both CPUs and GPUs. Using GPUs, the program package MPGOS is superior. For CPU computations, the ODEINT program package has the best performance.
Submitted 2 November, 2020;
originally announced November 2020.
-
On Covering Numbers, Young Diagrams, and the Local Dimension of Posets
Authors:
Gábor Damásdi,
Stefan Felsner,
António Girão,
Balázs Keszegh,
David Lewis,
Dániel T. Nagy,
Torsten Ueckerdt
Abstract:
We study covering numbers and local covering numbers with respect to difference graphs and complete bipartite graphs. In particular, we show that in every cover of a Young diagram with $\binom{2k}{k}$ steps with generalized rectangles there is a row or a column in the diagram that is used by at least $k+1$ rectangles, and prove that this is best possible. This answers two questions by Kim, Martin, Masařík, Shull, Smith, Uzzell, and Wang (Europ. J. Comb. 2020), namely:
- What is the local complete bipartite cover number of a difference graph?
- Is there a sequence of graphs with constant local difference graph cover number and unbounded local complete bipartite cover number?
We add to the study of these local covering numbers with a lower bound construction and some examples. Following Kim et al., we use the results on local covering numbers to provide lower and upper bounds for the local dimension of partially ordered sets of height 2. We discuss the local dimension of some posets related to Boolean lattices and show that the poset induced by the first two layers of the Boolean lattice has local dimension $(1 + o(1))\log_2\log_2 n$. We conclude with some remarks on covering numbers for digraphs and Ferrers dimension.
Submitted 17 January, 2020;
originally announced January 2020.
-
Adaptive Majority Problems for Restricted Query Graphs and for Weighted Sets
Authors:
Gábor Damásdi,
Dániel Gerbner,
Gyula O. H. Katona,
Balázs Keszegh,
Dániel Lenger,
Abhishek Methuku,
Dániel T. Nagy,
Dömötör Pálvölgyi,
Balázs Patkós,
Máté Vizer,
Gábor Wiener
Abstract:
Suppose that the vertices of a graph $G$ are colored with two colors in an unknown way. The color that occurs on more than half of the vertices is called the majority color (if it exists), and any vertex of this color is called a majority vertex. We study the problem of finding a majority vertex (or showing that none exists) if we can query edges to learn whether their endpoints have the same or different colors. Denote the least number of queries needed in the worst case by $m(G)$. It was shown by Saks and Werman that $m(K_n)=n-b(n)$, where $b(n)$ is the number of 1's in the binary representation of $n$.
In this paper, we initiate the study of the problem for general graphs. The obvious bounds for a connected graph $G$ on $n$ vertices are $n-b(n)\le m(G)\le n-1$. We show that for any tree $T$ on an even number of vertices we have $m(T)=n-1$ and that for any tree $T$ on an odd number of vertices, we have $n-65\le m(T)\le n-2$. Our proof uses results about the weighted version of the problem for $K_n$, which may be of independent interest. We also exhibit a sequence $G_n$ of graphs with $m(G_n)=n-b(n)$ such that $G_n$ has $O(nb(n))$ edges and $n$ vertices.
Submitted 8 May, 2020; v1 submitted 20 March, 2019;
originally announced March 2019.
-
Topological Analysis of Bitcoin's Lightning Network
Authors:
István András Seres,
László Gulyás,
Dániel A. Nagy,
Péter Burcsi
Abstract:
Bitcoin's Lightning Network (LN) is a scalability solution for Bitcoin allowing transactions to be issued with negligible fees and settled instantly at scale. In order to use LN, funds need to be locked in payment channels on the Bitcoin blockchain (Layer-1) for subsequent use in LN (Layer-2). LN comprises many payment channels forming a payment channel network. LN's promise is that relatively few payment channels already enable anyone to efficiently, securely and privately route payments across the whole network. In this paper, we quantify the structural properties of LN and argue that LN's current topological properties can be improved in order to strengthen the security of LN, enabling it to reach its true potential.
Submitted 14 April, 2019; v1 submitted 15 January, 2019;
originally announced January 2019.
-
Episodic memory for continual model learning
Authors:
David G. Nagy,
Gergő Orbán
Abstract:
Both the human brain and artificial learning agents operating in real-world or comparably complex environments are faced with the challenge of online model selection. In principle, this challenge can be overcome: hierarchical Bayesian inference provides a principled method for model selection, and it converges on the same posterior for both off-line (i.e. batch) and online learning. However, maintaining a parameter posterior for each model in parallel has, in general, an even higher memory cost than storing the entire data set and is consequently clearly infeasible. Alternatively, maintaining only a limited set of models in memory could limit memory requirements. However, sufficient statistics for one model will usually be insufficient for fitting a different kind of model, meaning that the agent loses information with each model change. We propose that episodic memory can circumvent the challenge of limited memory-capacity online model selection by retaining a selected subset of data points. We design a method to compute the quantities necessary for model selection even when the data is discarded and only statistics of one (or few) learnt models are available. We demonstrate on a simple model that a limited-sized episodic memory buffer, when the content is optimised to retain data with statistics not matching the current representation, can resolve the fundamental challenge of online model selection.
Submitted 4 December, 2017;
originally announced December 2017.