-
FOCIL: Finetune-and-Freeze for Online Class Incremental Learning by Training Randomly Pruned Sparse Experts
Authors:
Murat Onur Yildirim,
Elif Ceren Gok Yildirim,
Decebal Constantin Mocanu,
Joaquin Vanschoren
Abstract:
Class incremental learning (CIL) in an online continual learning setting strives to acquire knowledge on a series of novel classes from a data stream, using each data point only once for training. This is more realistic compared to offline modes, where it is assumed that all data from novel class(es) is readily available. Current online CIL approaches store a subset of the previous data which creates heavy overhead costs in terms of both memory and computation, as well as privacy issues. In this paper, we propose a new online CIL approach called FOCIL. It fine-tunes the main architecture continually by training a randomly pruned sparse subnetwork for each task. Then, it freezes the trained connections to prevent forgetting. FOCIL also determines the sparsity level and learning rate per task adaptively and ensures (almost) zero forgetting across all tasks without storing any replay data. Experimental results on 10-Task CIFAR100, 20-Task CIFAR100, and 100-Task TinyImagenet, demonstrate that our method outperforms the SOTA by a large margin. The code is publicly available at https://github.com/muratonuryildirim/FOCIL.
Submitted 13 March, 2024;
originally announced March 2024.
-
Continual Learning with Dynamic Sparse Training: Exploring Algorithms for Effective Model Updates
Authors:
Murat Onur Yildirim,
Elif Ceren Gok Yildirim,
Ghada Sokar,
Decebal Constantin Mocanu,
Joaquin Vanschoren
Abstract:
Continual learning (CL) refers to the ability of an intelligent system to sequentially acquire and retain knowledge from a stream of data with as little computational overhead as possible. To this end, regularization, replay, architecture, and parameter isolation approaches have been introduced in the literature. Parameter isolation using a sparse network makes it possible to allocate distinct parts of the neural network to different tasks and also allows parameters to be shared between tasks if they are similar. Dynamic Sparse Training (DST) is a prominent way to find these sparse networks and isolate them for each task. This paper is the first empirical study investigating the effect of different DST components under the CL paradigm to fill a critical research gap and shed light on the optimal configuration of DST for CL, if it exists. We therefore perform a comprehensive study in which we investigate various DST components to find the best topology per task on the well-known CIFAR100 and miniImageNet benchmarks in a task-incremental CL setup, since our primary focus is to evaluate the performance of various DST criteria rather than the process of mask selection. We found that, at a low sparsity level, Erdos-Rényi Kernel (ERK) initialization utilizes the backbone more efficiently and allows increments of tasks to be learned effectively. At a high sparsity level, unless it is extreme, uniform initialization demonstrates more reliable and robust performance. In terms of growth strategy, performance depends on the chosen initialization strategy and the extent of sparsity. Finally, adaptivity within DST components is a promising way toward better continual learners.
Submitted 4 December, 2023; v1 submitted 28 August, 2023;
originally announced August 2023.
-
AdaCL: Adaptive Continual Learning
Authors:
Elif Ceren Gok Yildirim,
Murat Onur Yildirim,
Mert Kilickaya,
Joaquin Vanschoren
Abstract:
Class-Incremental Learning aims to update a deep classifier to learn new categories while maintaining or improving its accuracy on previously observed classes. Common methods to prevent forgetting previously learned classes include regularizing the neural network updates and storing exemplars in memory, which come with hyperparameters such as the learning rate, regularization strength, or the number of exemplars. However, these hyperparameters are usually only tuned at the start and then kept fixed throughout the learning sessions, ignoring the fact that newly encountered tasks may have varying levels of novelty or difficulty. This study investigates the necessity of hyperparameter 'adaptivity' in Class-Incremental Learning: the ability to dynamically adjust hyperparameters such as the learning rate, regularization strength, and memory size according to the properties of the new task at hand. We propose AdaCL, a Bayesian Optimization-based approach to automatically and efficiently determine the optimal values for those parameters with each learning task. We show that adapting hyperparameters on each new task leads to improvement in accuracy, forgetting and memory. Code is available at https://github.com/ElifCerenGokYildirim/AdaCL.
Submitted 1 July, 2024; v1 submitted 23 March, 2023;
originally announced March 2023.
-
An algorithmic approach based on generating trees for enumerating pattern-avoiding inversion sequences
Authors:
Toufik Mansour,
Gökhan Yıldırım
Abstract:
We introduce an algorithmic approach based on the generating tree method for enumerating inversion sequences with various pattern-avoidance restrictions. For a given set of patterns, we propose an algorithm that outputs either an accurate description of the succession rules of the corresponding generating tree or an ansatz. By using this approach, we determine the generating trees for the pattern-classes $I_n(000, 021)$, $I_n(100, 021)$, $I_n(110, 021)$, $I_n(102, 021)$, $I_n(100,012)$, $I_n(011,201)$, $I_n(011,210)$ and $I_n(120,210)$. Then we use the kernel method, obtain generating functions of each class, and find enumerating formulas. Lin and Yan studied the classification of the Wilf-equivalences for inversion sequences avoiding pairs of length-three patterns and showed that there are 48 Wilf classes among 78 pairs. In this paper, we solve six open cases for such pattern classes.
Submitted 27 September, 2023; v1 submitted 10 October, 2022;
originally announced October 2022.
-
Grid Partitioned Attention: Efficient Transformer Approximation with Inductive Bias for High Resolution Detail Generation
Authors:
Nikolay Jetchev,
Gökhan Yildirim,
Christian Bracher,
Roland Vollgraf
Abstract:
Attention is a general reasoning mechanism that can flexibly deal with image information, but its memory requirements have so far made it impractical for high resolution image generation. We present Grid Partitioned Attention (GPA), a new approximate attention algorithm that leverages a sparse inductive bias for higher computational and memory efficiency in image domains: queries attend only to a few keys, and spatially close queries attend to close keys due to correlations. Our paper introduces the new attention layer and analyzes its complexity and how the trade-off between memory usage and model power can be tuned by the hyper-parameters. We show how such attention enables novel deep learning architectures with copying modules that are especially useful for conditional image generation tasks like pose morphing. Our contributions are (i) the algorithm and code of the novel GPA layer, (ii) a novel deep attention-copying architecture, and (iii) new state-of-the-art experimental results in human pose morphing generation benchmarks.
Submitted 8 July, 2021;
originally announced July 2021.
-
Variations on Hammersley's interacting particle process
Authors:
Arda Atalik,
H. S. Melihcan Erol,
Gökhan Yıldırım,
Mustafa Yilmaz
Abstract:
The longest increasing subsequence problem for permutations has been studied extensively in the last fifty years. The interpretation of the longest increasing subsequence as the longest 21-avoiding subsequence in the context of permutation patterns leads to many interesting research directions. We introduce and study the statistical properties of Hammersley-type interacting particle processes related to these generalizations and explore the finer structures of their distributions. We also propose three different interacting particle systems in the plane analogous to the Hammersley process in one dimension and obtain estimates for the asymptotic orders of the mean and variance of the number of particles in the systems.
Submitted 21 June, 2021;
originally announced June 2021.
-
Evaluating Salient Object Detection in Natural Images with Multiple Objects having Multi-level Saliency
Authors:
Gökhan Yildirim,
Debashis Sen,
Mohan Kankanhalli,
Sabine Süsstrunk
Abstract:
Salient object detection is evaluated using binary ground truth with the labels being salient object class and background. In this paper, we corroborate based on three subjective experiments on a novel image dataset that objects in natural images are inherently perceived to have varying levels of importance. Our dataset, named SalMoN (saliency in multi-object natural images), has 588 images containing multiple objects. The subjective experiments performed record spontaneous attention and perception through eye fixation duration, point clicking and rectangle drawing. As object saliency in a multi-object image is inherently multi-level, we propose that salient object detection must be evaluated for the capability to detect all multi-level salient objects apart from the salient object class detection capability. For this purpose, we generate multi-level maps as ground truth corresponding to all the dataset images using the results of the subjective experiments, with the labels being multi-level salient objects and background. We then propose the use of mean absolute error, Kendall's rank correlation and average area under precision-recall curve to evaluate existing salient object detection methods on our multi-level saliency ground truth dataset. Approaches that represent saliency detection on images as local-global hierarchical processing of a graph perform well in our dataset.
Submitted 18 March, 2020;
originally announced March 2020.
-
The longest increasing subsequence in involutions avoiding 3412 and another pattern
Authors:
Toufik Mansour,
Reza Rastegar,
Alexander Roitershtein,
Gökhan Yıldırım
Abstract:
In this note, we study the mean length of the longest increasing subsequence of a uniformly sampled involution that avoids the pattern $3412$ and another pattern.
Submitted 13 June, 2020; v1 submitted 27 January, 2020;
originally announced January 2020.
-
Transform the Set: Memory Attentive Generation of Guided and Unguided Image Collages
Authors:
Nikolay Jetchev,
Urs Bergmann,
Gökhan Yildirim
Abstract:
Cutting and pasting image segments feels intuitive: the choice of source templates gives artists flexibility in recombining existing source material. Formally, this process takes an image set as input and outputs a collage of the set elements. Such selection from sets of source templates does not fit easily in classical convolutional neural models requiring inputs of fixed size. Inspired by advances in attention and set-input machine learning, we present a novel architecture that can generate in one forward pass image collages of source templates using set-structured representations. This paper has the following contributions: (i) a novel framework for image generation called Memory Attentive Generation of Image Collages (MAGIC) which gives artists new ways to create digital collages; (ii) from the machine-learning perspective, we show a novel Generative Adversarial Networks (GAN) architecture that uses Set-Transformer layers and set-pooling to blend sets of random image samples - a hybrid non-parametric approach.
Submitted 28 November, 2019; v1 submitted 16 October, 2019;
originally announced October 2019.
-
Generating High-Resolution Fashion Model Images Wearing Custom Outfits
Authors:
Gökhan Yildirim,
Nikolay Jetchev,
Roland Vollgraf,
Urs Bergmann
Abstract:
Visualizing an outfit is an essential part of shopping for clothes. Due to the combinatorial aspect of combining fashion articles, the available images are limited to a pre-determined set of outfits. In this paper, we broaden these visualizations by generating high-resolution images of fashion models wearing a custom outfit under an input body pose. We show that our approach can not only transfer the style and the pose of one generated outfit to another, but also create realistic images of human bodies and garments.
Submitted 23 August, 2019;
originally announced August 2019.
-
Copy the Old or Paint Anew? An Adversarial Framework for (non-) Parametric Image Stylization
Authors:
Nikolay Jetchev,
Urs Bergmann,
Gokhan Yildirim
Abstract:
Parametric generative deep models are state-of-the-art for photo and non-photo realistic image stylization. However, learning complicated image representations requires compute-intense models parametrized by a huge number of weights, which in turn requires large datasets to make learning successful. Non-parametric exemplar-based generation is a technique that works well to reproduce style from small datasets, but is also compute-intensive. These aspects are a drawback for the practice of digital AI artists: typically one wants to use a small set of stylization images, and needs a fast flexible model in order to experiment with it. With this motivation, our work has these contributions: (i) a novel stylization method called Fully Adversarial Mosaics (FAMOS) that combines the strengths of both parametric and non-parametric approaches; (ii) multiple ablations and image examples that analyze the method and show its capabilities; (iii) source code that will empower artists and machine learning researchers to use and modify FAMOS.
Submitted 22 November, 2018;
originally announced November 2018.
-
Permutations avoiding 312 and another pattern, Chebyshev polynomials and longest increasing subsequences
Authors:
Toufik Mansour,
Gökhan Yıldırım
Abstract:
We study the longest increasing subsequence problem for random permutations avoiding the pattern $312$ and another pattern $τ$ under the uniform probability distribution. We determine the exact and asymptotic formulas for the average length of the longest increasing subsequences for such permutation classes specifically when the pattern $τ$ is monotone increasing or decreasing, or any pattern of length four.
Submitted 24 January, 2020; v1 submitted 16 August, 2018;
originally announced August 2018.
-
Enumerations of bargraphs with respect to corner statistics
Authors:
Toufik Mansour,
Gökhan Yıldırım
Abstract:
We study the enumeration of bargraphs with respect to some corner statistics. We find generating functions for the number of bargraphs that track the corner statistics of interest, the number of cells, and the number of columns. The bargraph representation of set partitions is also considered, and some explicit formulas are obtained for the number of some specific types of corners in such representations.
Submitted 30 January, 2021; v1 submitted 5 August, 2018;
originally announced August 2018.
-
Disentangling Multiple Conditional Inputs in GANs
Authors:
Gökhan Yildirim,
Calvin Seward,
Urs Bergmann
Abstract:
In this paper, we propose a method that disentangles the effects of multiple input conditions in Generative Adversarial Networks (GANs). In particular, we demonstrate our method in controlling color, texture, and shape of a generated garment image for computer-aided fashion design. To disentangle the effect of input attributes, we customize conditional GANs with consistency loss functions. In our experiments, we tune one input at a time and show that we can guide our network to generate novel and realistic images of clothing articles. In addition, we present a fashion design process that estimates the input attributes of an existing garment and modifies them using our generator.
Submitted 20 June, 2018;
originally announced June 2018.
-
Running genetic algorithms on Hadoop for solving high dimensional optimization problems
Authors:
Güngör Yildirim,
İbrahim R Hallac,
Galip Aydin,
Yetkin Tatar
Abstract:
Hadoop is a popular MapReduce framework for developing parallel applications in distributed environments. Several advantages of MapReduce such as programming ease and ability to use commodity hardware make the applicability of soft computing methods for parallel and distributed systems easier than before. In this paper, we present the results of an experimental study on running soft computing algorithms using Hadoop. This study shows how a simple genetic algorithm running on Hadoop can be used to produce solutions for high dimensional optimization problems. In addition, a simple but effective technique, which did not need MapReduce chains, has been proposed.
Submitted 10 February, 2018;
originally announced February 2018.
-
Directed polymers on a disordered tree with a defect subtree
Authors:
Neal Madras,
Gökhan Yıldırım
Abstract:
We study the question of how the competition between $\textit{bulk disorder}$ and a $\textit{localized microscopic defect}$ affects the macroscopic behavior of a system in the directed polymer context at the free energy level. We consider the directed polymer model on a disordered $d$-ary tree and represent the localized microscopic defect by modifying the disorder distribution at each vertex in a single path (branch), or in a subtree, of the tree. The polymer must choose between following the microscopic defect and finding the best branches through the bulk disorder. We describe three possible phases, called the $\textit{fully pinned, partially pinned}$ and $\textit{depinned}$ phases. When the microscopic defect is associated only with a single branch, we compute the free energy and the critical curve of the model, and show that the partially pinned phase does not occur. When the localized microscopic defect is associated with a non-disordered regular subtree of the disordered tree, the picture is more complicated. We prove that all three phases are non-empty below a critical temperature, and that the partially pinned phase disappears above the critical temperature.
Submitted 1 March, 2018; v1 submitted 24 December, 2017;
originally announced December 2017.
-
Longest monotone subsequences and rare regions of pattern-avoiding permutations
Authors:
Neal Madras,
Gökhan Yıldırım
Abstract:
We consider the distributions of the lengths of the longest monotone and alternating subsequences in classes of permutations of size $n$ that avoid a specific pattern or set of patterns, with respect to the uniform distribution on each such class. We obtain exact results for any class that avoids two patterns of length 3, as well as results for some classes that avoid one pattern of length 4 or more. In our results, the longest monotone subsequences have expected length proportional to $n$ for pattern-avoiding classes, in contrast with the $\sqrt n$ behaviour that holds for unrestricted permutations.
In addition, for a pattern $τ$ of length $k$, we scale the plot of a random $τ$-avoiding permutation down to the unit square and study the "rare region," which is the part of the square that is exponentially unlikely to contain any points. We prove that when $τ_1>τ_k$, the complement of the rare region is a closed set that contains the main diagonal of the unit square. For the case $τ_1=k,$ we also show that the lower boundary of the part of the rare region above the main diagonal is a curve that is Lipschitz continuous and strictly increasing on $[0,1]$.
Submitted 20 June, 2017; v1 submitted 22 August, 2016;
originally announced August 2016.
-
Directed polymers in a random environment with a defect line
Authors:
Kenneth S. Alexander,
Gökhan Yıldırım
Abstract:
We study the depinning transition of the $1+1$ dimensional directed polymer in a random environment with a defect line. The random environment consists of i.i.d. potential values assigned to each site of $\mathbb{Z}^2$; sites on the positive axis have the potential enhanced by a deterministic value $u$. We show that for small inverse temperature $β$ the quenched and annealed free energies differ significantly at most in a small neighborhood (of size of order $β$) of the annealed critical point $u_c^a=0$. For the case $u=0$, we show that the difference between quenched and annealed free energies is of order $β^4$ as $β\to 0$, assuming only finiteness of exponential moments of the potential values, improving existing results which required stronger assumptions.
Submitted 20 January, 2015; v1 submitted 26 February, 2014;
originally announced February 2014.
-
Solution of spin-boson systems in one and two-dimensional geometry via the asymptotic iteration method
Authors:
R. Koc,
O. Ozer,
H. Tutunculer,
R. G. Yildirim
Abstract:
We consider solutions of the $2\times 2$ matrix Hamiltonian of physical systems within the context of the asymptotic iteration method. Our technique is based on transforming the associated Hamiltonian into the form of first-order coupled differential equations. We construct a general matrix Hamiltonian which includes a wide class of physical models. The systematic study presented here reproduces a number of earlier results in a natural way as well as leading to new findings. Possible generalizations of the method are also suggested.
Submitted 28 November, 2007; v1 submitted 27 November, 2007;
originally announced November 2007.