Search | arXiv e-print repository

arXiv:2405.20278 [pdf, ps, other]

Length independent generalization bounds for deep SSM architectures

Authors: Dániel Rácz, Mihály Petreczky, Bálint Daróczy

Abstract: Many state-of-the-art models trained on long-range sequences, for example S4, S5 or LRU, are made of sequential blocks combining State-Space Models (SSMs) with neural networks. In this paper we provide a PAC bound that holds for these kinds of architectures with stable SSM blocks and does not depend on the length of the input sequence. Imposing stability of the SSM blocks is a standard practice in the literature, and it is known to help performance. Our results provide a theoretical justification for the use of stable SSM blocks as the proposed PAC bound decreases as the degree of stability of the SSM blocks increases.

Submitted 11 July, 2024; v1 submitted 30 May, 2024; originally announced May 2024.

Comments: 20 pages, no figures, accepted at ICML 2024 Next Generation of Sequence Modeling Architectures Workshop

MSC Class: 68 ACM Class: I.2.6

arXiv:2405.10054 [pdf, other]

A finite-sample generalization bound for stable LPV systems

Authors: Daniel Racz, Martin Gonzalez, Mihaly Petreczky, Andras Benczur, Balint Daroczy

Abstract: One of the main theoretical challenges in learning dynamical systems from data is providing upper bounds on the generalization error, that is, the difference between the expected prediction error and the empirical prediction error measured on some finite sample. In machine learning, a popular class of such bounds are the so-called Probably Approximately Correct (PAC) bounds. In this paper, we derive a PAC bound for stable continuous-time linear parameter-varying (LPV) systems. Our bound depends on the H2 norm of the chosen class of the LPV systems, but does not depend on the time interval for which the signals are considered.

Submitted 21 May, 2024; v1 submitted 16 May, 2024; originally announced May 2024.

Comments: 8 pages, 1 figure, under review

MSC Class: 68 ACM Class: I.2.0

arXiv:2310.17378 [pdf, other]

Optimization dependent generalization bound for ReLU networks based on sensitivity in the tangent bundle

Authors: Dániel Rácz, Mihály Petreczky, András Csertán, Bálint Daróczy

Abstract: Recent advances in deep learning have given us some very promising results on the generalization ability of deep neural networks, however literature still lacks a comprehensive theory explaining why heavily over-parametrized models are able to generalize well while fitting the training data. In this paper we propose a PAC type bound on the generalization error of feedforward ReLU networks via estimating the Rademacher complexity of the set of networks available from an initial parameter vector via gradient descent. The key idea is to bound the sensitivity of the network's gradient to perturbation of the input data along the optimization trajectory. The obtained bound does not explicitly depend on the depth of the network. Our results are experimentally verified on the MNIST and CIFAR-10 datasets.

Submitted 4 December, 2023; v1 submitted 26 October, 2023; originally announced October 2023.

Comments: 17 pages, 5 figures, OPT2023: 15th Annual Workshop on Optimization for Machine Learning at the 37th NeurIPS 2023, New Orleans, LA, USA

MSC Class: 68 ACM Class: I.2.6

arXiv:2307.03630 [pdf, ps, other]

PAC bounds of continuous Linear Parameter-Varying systems related to neural ODEs

Authors: Dániel Rácz, Mihály Petreczky, Bálint Daróczy

Abstract: We consider the problem of learning Neural Ordinary Differential Equations (neural ODEs) within the context of Linear Parameter-Varying (LPV) systems in continuous-time. LPV systems contain bilinear systems which are known to be universal approximators for non-linear systems. Moreover, a large class of neural ODEs can be embedded into LPV systems. As our main contribution we provide Probably Approximately Correct (PAC) bounds under stability for LPV systems related to neural ODEs. The resulting bounds have the advantage that they do not depend on the integration interval.

Submitted 7 July, 2023; originally announced July 2023.

Comments: 12 pages

MSC Class: 68 ACM Class: I.2.0

arXiv:2110.13581 [pdf, other]

Gradient representations in ReLU networks as similarity functions

Authors: Dániel Rácz, Bálint Daróczy

Abstract: Feed-forward networks can be interpreted as mappings with linear decision surfaces at the level of the last layer. We investigate how the tangent space of the network can be exploited to refine the decision in case of ReLU (Rectified Linear Unit) activations. We show that a simple Riemannian metric parametrized on the parameters of the network forms a similarity function at least as good as the original network and we suggest a sparse metric to increase the similarity gap.

Submitted 26 October, 2021; originally announced October 2021.

Comments: Accepted at 29th ESANN 2021, 6-8 October 2021, Belgium, 7 pages, 1 figure

arXiv:2102.00949 [pdf, other]

Quantum Inspired Adaptive Boosting

Authors: Bálint Daróczy, Katalin Friedl, László Kabódi, Attila Pereszlényi, Dániel Szabó

Abstract: Building on the quantum ensemble based classifier algorithm of Schuld and Petruccione [arXiv:1704.02146v1], we devise equivalent classical algorithms which show that this quantum ensemble method does not have advantage over classical algorithms. Essentially, we simplify their algorithm until it is intuitive to come up with an equivalent classical version. One of the classical algorithms is extremely simple and runs in constant time for each input to be classified. We further develop the idea and, as the main contribution of the paper, we propose methods inspired by combining the quantum ensemble method with adaptive boosting. The algorithms were tested and found to be comparable to the AdaBoost algorithm on publicly available data sets.

Submitted 1 February, 2021; originally announced February 2021.

Comments: 11 pages, 1 figure

arXiv:2006.06780 [pdf, other]

Tangent Space Sensitivity and Distribution of Linear Regions in ReLU Networks

Authors: Bálint Daróczy

Abstract: Recent articles indicate that deep neural networks are efficient models for various learning problems. However they are often highly sensitive to various changes that cannot be detected by an independent observer. As our understanding of deep neural networks with traditional generalization bounds still remains incomplete, there are several measures which capture the behaviour of the model in case of small changes at a specific state. In this paper we consider adversarial stability in the tangent space and suggest tangent sensitivity in order to characterize stability. We focus on a particular kind of stability with respect to changes in parameters that are induced by individual examples without known labels. We derive several easily computable bounds and empirical measures for feed-forward fully connected ReLU (Rectified Linear Unit) networks and connect tangent sensitivity to the distribution of the activation regions in the input space realized by the network. Our experiments suggest that even simple bounds and measures are associated with the empirical generalization gap.

Submitted 11 June, 2020; originally announced June 2020.

Comments: 14 pages, 4 figures, 2 tables

MSC Class: 68T07 ACM Class: I.2.6

arXiv:1912.09306 [pdf, other]

Tangent Space Separability in Feedforward Neural Networks

Authors: Bálint Daróczy, Rita Aleksziev, András Benczúr

Abstract: Hierarchical neural networks are exponentially more efficient than their corresponding "shallow" counterpart with the same expressive power, but involve huge number of parameters and require tedious amounts of training. By approximating the tangent subspace, we suggest a sparse representation that enables switching to shallow networks, GradNet after a very early training stage. Our experiments show that the proposed approximation of the metric improves and sometimes even surpasses the achievable performance of the original network significantly even after a few epochs of training the original feedforward network.

Submitted 18 December, 2019; originally announced December 2019.

Comments: 10 pages; accepted at Workshop "Beyond First-Order Optimization Methods in Machine Learning", 33rd Conference on Neural Information Processing Systems (NeurIPS 2019). arXiv admin note: substantial text overlap with arXiv:1807.06630

MSC Class: I.2.6; I.5.1 ACM Class: I.2.6; I.5.1

arXiv:1807.06630 [pdf, other]

Expressive power of outer product manifolds on feed-forward neural networks

Authors: Bálint Daróczy, Rita Aleksziev, András Benczúr

Abstract: Hierarchical neural networks are exponentially more efficient than their corresponding "shallow" counterpart with the same expressive power, but involve huge number of parameters and require tedious amounts of training. Our main idea is to mathematically understand and describe the hierarchical structure of feedforward neural networks by reparametrization invariant Riemannian metrics. By computing or approximating the tangent subspace, we better utilize the original network via sparse representations that enables switching to shallow networks after a very early training stage. Our experiments show that the proposed approximation of the metric improves and sometimes even surpasses the achievable performance of the original network significantly even after a few epochs of training the original feedforward network.

Submitted 17 July, 2018; originally announced July 2018.

Comments: 11 pages, 8 figures, under submission

arXiv:1804.05705 [pdf, other]

doi 10.1145/3201064.3201088

And Now for Something Completely Different: Visual Novelty in an Online Network of Designers

Authors: Johannes Wachs, Bálint Daróczy, Anikó Hannák, Katinka Páll, Christoph Riedl

Abstract: Novelty is a key ingredient of innovation but quantifying it is difficult. This is especially true for visual work like graphic design. Using designs shared on an online social network of professional digital designers, we measure visual novelty using statistical learning methods to compare an image's features with those of images that have been created before. We then relate social network position to the novelty of the designers' images. We find that on this professional platform, users with dense local networks tend to produce more novel but generally less successful images, with important exceptions. Namely, users making novel images while embedded in cohesive local networks are more successful.

Submitted 23 April, 2018; v1 submitted 16 April, 2018; originally announced April 2018.

Comments: accepted to 10th International ACM Web Science Conference, 2018, May 27-30, Amsterdam, The Netherlands, 11 pages, 6 figures, 60 references

arXiv:1705.04964 [pdf, other]

doi 10.15476/ELTE.2016.086

Machine learning methods for multimedia information retrieval

Authors: Bálint Zoltán Daróczy

Abstract: In this thesis we examined several multimodal feature extraction and learning methods for retrieval and classification purposes. We reread briefly some theoretical results of learning in Section 2 and reviewed several generative and discriminative models in Section 3 while we described the similarity kernel in Section 4. We examined different aspects of the multimodal image retrieval and classification in Section 5 and suggested methods for identifying quality assessments of Web documents in Section 6. In our last problem we proposed similarity kernel for time-series based classification. The experiments were carried over publicly available datasets and source codes for the most essential parts are either open source or released. Since the used similarity graphs (Section 4.2) are greatly constrained for computational purposes, we would like to continue work with more complex, evolving and capable graphs and apply for different problems such as capturing the rapid change in the distribution (e.g. session based recommendation) or complex graphs of the literature work. The similarity kernel with the proper metrics reaches and in many cases improves over the state-of-the-art. Hence we may conclude generative models based on instance similarities with multiple modes is a generally applicable model for classification and regression tasks ranging over various domains, including but not limited to the ones presented in this thesis. More generally, the Fisher kernel is not only unique in many ways but one of the most powerful kernel functions. Therefore we may exploit the Fisher kernel in the future over widely used generative models, such as Boltzmann Machines [Hinton et al., 1984], a particular subset, the Restricted Boltzmann Machines and Deep Belief Networks [Hinton et al., 2006], Latent Dirichlet Allocation [Blei et al., 2003] or Hidden Markov Models [Baum and Petrie, 1966] to name a few.

Submitted 14 May, 2017; originally announced May 2017.

Comments: doctoral thesis, 2016

arXiv:1705.02972 [pdf, ps, other]

Why Do Men Get More Attention? Exploring Factors Behind Success in an Online Design Community

Authors: Johannes Wachs, Anikó Hannák, András Vörös, Bálint Daróczy

Abstract: Online platforms are an increasingly popular tool for people to produce, promote or sell their work. However recent studies indicate that social disparities and biases present in the real world might transfer to online platforms and could be exacerbated by seemingly harmless design choices on the site (e.g., recommendation systems or publicly visible success measures). In this paper we analyze an exclusive online community of teams of design professionals called Dribbble and investigate apparent differences in outcomes by gender. Overall, we find that men produce more work, and are able to show it to a larger audience thus receiving more likes. Some of this effect can be explained by the fact that women have different skills and design different images. Most importantly however, women and men position themselves differently in the Dribbble community. Our investigation of users' position in the social network shows that women have more clustered and gender homophilous following relations, which leads them to have smaller and more closely knit social networks. Overall, our study demonstrates that looking behind the apparent patterns of gender inequalities in online markets with the help of social networks and product differentiation helps us to better understand gender differences in success and failure.

Submitted 8 May, 2017; originally announced May 2017.

Comments: in The International AAAI Conference on Web and Social Media (ICWSM2017), Montreal, May 2017

Journal ref: ICWSM 2017

arXiv:1611.01974 [pdf, other]

Item-to-item recommendation based on Contextual Fisher Information

Authors: Bálint Daróczy, Frederick Ayala-Gómez, András Benczúr

Abstract: Web recommendation services bear great importance in e-commerce, as they aid the user in navigating through the items that are most relevant to her needs. In a typical Web site, long history of previous activities or purchases by the user is rarely available. Hence in most cases, recommenders propose items that are similar to the most recent ones viewed in the current user session. The corresponding task is called session based item-to-item recommendation. For frequent items, it is easy to present item-to-item recommendations by "people who viewed this, also viewed" lists. However, most of the items belong to the long tail, where previous actions are sparsely available. Another difficulty is the so-called cold start problem, when the item has recently appeared and had no time yet to accumulate sufficient number of transactions. In order to recommend a next item in a session in sparse or cold start situations, we also have to incorporate item similarity models. In this paper we describe a probabilistic similarity model based on Random Fields to approximate item-to-item transition probabilities. We give a generative model for the item interactions based on arbitrary distance measures over the items including explicit, implicit ratings and external metadata. The model may change in time to fit better recent events and recommend the next item based on the updated Fisher Information. Our new model outperforms both simple similarity baseline methods and recent item-to-item recommenders, under several different performance metrics and publicly available data sets. We reach significant gains in particular for recommending a new item following a rare item.

Submitted 8 November, 2016; v1 submitted 7 November, 2016; originally announced November 2016.

Comments: 9 pages, 8 figures, 4 tables

arXiv:1505.03002 [pdf, other]

doi 10.1140/epjb/e2015-60357-1

Statistical analysis of NOMAO customer votes for spots of France

Authors: Robert Palovics, Balint Daroczy, Andras Benczur, Julia Pap, Leonardo Ermann, Samuel Phan, Alexei D. Chepelianskii, Dima L. Shepelyansky

Abstract: We investigate the statistical properties of votes of customers for spots of France collected by the startup company NOMAO. The frequencies of votes per spot and per customer are characterized by a power law distributions which remain stable on a time scale of a decade when the number of votes is varied by almost two orders of magnitude. Using the computer science methods we explore the spectrum and the eigenvalues of a matrix containing user ratings to geolocalized items. Eigenvalues nicely map to large towns and regions but show certain level of instability as we modify the interpretation of the underlying matrix. We evaluate imputation strategies that provide improved prediction performance by reaching geographically smooth eigenvectors. We point on possible links between distribution of votes and the phenomenon of self-organized criticality.

Submitted 12 May, 2015; originally announced May 2015.

Comments: 10 pages, 12 figs

Journal ref: Eur. Phys. J. B. v.88, p.194 (2015)

Showing 1–14 of 14 results for author: Daroczy, B