Search | arXiv e-print repository

FOCIL: Finetune-and-Freeze for Online Class Incremental Learning by Training Randomly Pruned Sparse Experts

Authors: Murat Onur Yildirim, Elif Ceren Gok Yildirim, Decebal Constantin Mocanu, Joaquin Vanschoren

Abstract: Class incremental learning (CIL) in an online continual learning setting strives to acquire knowledge on a series of novel classes from a data stream, using each data point only once for training. This is more realistic than offline modes, where it is assumed that all data from novel class(es) is readily available. Current online CIL approaches store a subset of the previous data, which creates heavy overhead costs in terms of both memory and computation, as well as privacy issues. In this paper, we propose a new online CIL approach called FOCIL. It fine-tunes the main architecture continually by training a randomly pruned sparse subnetwork for each task. Then, it freezes the trained connections to prevent forgetting. FOCIL also determines the sparsity level and learning rate per task adaptively and ensures (almost) zero forgetting across all tasks without storing any replay data. Experimental results on 10-Task CIFAR100, 20-Task CIFAR100, and 100-Task TinyImageNet demonstrate that our method outperforms the SOTA by a large margin. The code is publicly available at https://github.com/muratonuryildirim/FOCIL.
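The per-task mechanism described in this abstract (train a randomly pruned sparse subnetwork, then freeze its connections) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation, which lives in the linked repository: weights are a flat list, each task samples a fixed fraction of the connections that are still free, and "training" is a handful of random gradient steps. FOCIL's adaptive choice of sparsity and learning rate is not modeled, and the names `allocate_task_mask`, `train_step`, and `run_tasks` are hypothetical.

```python
import random

def allocate_task_mask(free, density, rng):
    """Randomly pick a fraction `density` of the still-free connections for a new task.

    Assumption: a task may only claim connections that no earlier task has frozen.
    """
    k = max(1, int(len(free) * density))
    return set(rng.sample(sorted(free), k))

def train_step(weights, grads, trainable, lr):
    """Apply an SGD update only to connections in `trainable`; all others stay fixed."""
    for i in trainable:
        weights[i] -= lr * grads[i]

def run_tasks(num_weights=100, num_tasks=3, density=0.3, lr=0.1, seed=0):
    """Hypothetical driver: allocate, train, and freeze one sparse subnetwork per task."""
    rng = random.Random(seed)
    weights = [rng.gauss(0.0, 1.0) for _ in range(num_weights)]
    free = set(range(num_weights))
    masks = []
    for _ in range(num_tasks):
        mask = allocate_task_mask(free, density, rng)
        for _ in range(5):  # stand-in for one pass over the task's data stream
            grads = [rng.gauss(0.0, 1.0) for _ in range(num_weights)]
            train_step(weights, grads, mask, lr)
        free -= mask  # freeze: these connections are never updated again
        masks.append(mask)
    return weights, masks, free
```

Because later tasks never write to connections claimed earlier, the weights a previous task relies on cannot drift, which is the intuition behind the "(almost) zero forgetting" claim; the cost is that the pool of free connections shrinks geometrically with a fixed density.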

Submitted 13 March, 2024; originally announced March 2024.

arXiv:2309.07873 [pdf, other]

Optimally Controlling the Timing of Energy Transfer in Elastic Joints: Experimental Validation of the Bi-Stiffness Actuation Concept

Authors: Edmundo Pozo Fortunić, Mehmet C. Yildirim, Dennis Ossadnik, Abdalla Swikir, Saeed Abdolshah, Sami Haddadin

Abstract: Elastic actuation taps into elastic elements' energy storage for dynamic motions beyond rigid actuation. While Series Elastic Actuators (SEA) and Variable Stiffness Actuators (VSA) are highly sophisticated, they do not fully provide control over energy transfer timing. To overcome this problem on the basic system level, the Bi-Stiffness Actuation (BSA) concept was recently proposed. Theoretically, it allows for full link decoupling, while simultaneously being able to lock the spring in the drive train via a switch-and-hold mechanism. Thus, the user would be in full control of the potential energy storage and release timing. In this work, we introduce an initial proof-of-concept of Bi-Stiffness Actuation in the form of a 1-DoF physical prototype, which is implemented using a modular testbed. We present a hybrid system model, as well as the mechatronic implementation of the actuator. We corroborate the feasibility of the concept by conducting a series of hardware experiments using an open-loop control signal obtained by trajectory optimization. Here, we compare the performance of the prototype with a comparable SEA implementation. We show that BSA outperforms SEA 1) in terms of maximum velocity at low final times and 2) in terms of the movement strategy itself: the clutch mechanism allows the BSA to generate consistent launch sequences, while the SEA has to rely on lengthy and possibly dangerous oscillatory swing-up motions. Furthermore, we demonstrate that providing full control authority over the energy transfer timing and link decoupling allows the user to synchronously release both elastic joint and gravitational energy. This facilitates the optimal exploitation of elastic and gravitational potentials in a synergistic manner.

Submitted 14 September, 2023; originally announced September 2023.

Comments: 8 pages, 9 figures. Submitted to IEEE Robotics and Automation Letters

arXiv:2308.14831 [pdf, other]

Continual Learning with Dynamic Sparse Training: Exploring Algorithms for Effective Model Updates

Authors: Murat Onur Yildirim, Elif Ceren Gok Yildirim, Ghada Sokar, Decebal Constantin Mocanu, Joaquin Vanschoren

Abstract: Continual learning (CL) refers to the ability of an intelligent system to sequentially acquire and retain knowledge from a stream of data with as little computational overhead as possible. To this end, regularization, replay, architecture, and parameter isolation approaches have been introduced in the literature. Parameter isolation uses a sparse network to allocate distinct parts of the neural network to different tasks, while also allowing parameters to be shared between tasks if they are similar. Dynamic Sparse Training (DST) is a prominent way to find these sparse networks and isolate them for each task. This paper is the first empirical study investigating the effect of different DST components under the CL paradigm, filling a critical research gap and shedding light on the optimal configuration of DST for CL, if one exists. We therefore perform a comprehensive study of various DST components to find the best topology per task on the well-known CIFAR100 and miniImageNet benchmarks in a task-incremental CL setup, since our primary focus is to evaluate the performance of various DST criteria rather than the process of mask selection. We found that, at a low sparsity level, Erdős-Rényi Kernel (ERK) initialization utilizes the backbone more efficiently and allows increments of tasks to be learned effectively. At a high sparsity level, unless it is extreme, uniform initialization demonstrates more reliable and robust performance. In terms of growth strategy, performance depends on the chosen initialization strategy and the extent of sparsity. Finally, adaptivity within DST components is a promising path toward better continual learners.
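The abstract contrasts ERK with uniform initialization. The sketch below computes per-layer densities using the ERK scaling rule as it is commonly defined in the sparse-training literature (density proportional to (n_out + n_in + k_h + k_w) / (n_out * n_in * k_h * k_w), rescaled to hit a global target, with layers that would exceed density 1 made fully dense). It is an illustrative assumption about the rule, not code from this paper, and `erk_densities` is a hypothetical name. Uniform initialization would simply assign the target density to every layer.

```python
def erk_densities(layer_shapes, density):
    """Per-layer densities under Erdős-Rényi-Kernel (ERK) scaling.

    layer_shapes: list of (n_out, n_in, k_h, k_w); use k_h = k_w = 1 for dense layers.
    density: target fraction of nonzero weights over the whole network.
    """
    if density >= 1.0:
        return [1.0] * len(layer_shapes)
    sizes = [o * i * h * w for (o, i, h, w) in layer_shapes]
    scales = [(o + i + h + w) / (o * i * h * w) for (o, i, h, w) in layer_shapes]
    budget = density * sum(sizes)  # total number of nonzeros allowed
    dense = set()  # layers capped at density 1.0
    while True:
        if len(dense) == len(sizes):
            return [1.0] * len(sizes)
        # Nonzeros left to distribute once fully dense layers take their share.
        remaining = budget - sum(sizes[j] for j in dense)
        denom = sum(scales[j] * sizes[j] for j in range(len(sizes)) if j not in dense)
        eps = remaining / denom
        over = {j for j in range(len(sizes)) if j not in dense and eps * scales[j] > 1.0}
        if not over:
            break
        dense |= over  # cap these layers and redistribute the budget
    return [1.0 if j in dense else eps * scales[j] for j in range(len(sizes))]
```

For a small three-layer network (a 3-to-64 conv, a 64-to-128 conv, and a 128-to-10 linear head) at 20% global density, ERK keeps the small first and last layers fully dense and sparsifies the large middle layer to roughly 17%, which illustrates why ERK tends to use a backbone differently than uniform allocation.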

Submitted 4 December, 2023; v1 submitted 28 August, 2023; originally announced August 2023.

arXiv:2308.08291 [pdf, other]

Robust Bayesian Satisficing

Authors: Artun Saday, Yaşar Cahit Yıldırım, Cem Tekin

Abstract: Distributional shifts pose a significant challenge to achieving robustness in contemporary machine learning. To overcome this challenge, robust satisficing (RS) seeks a robust solution to an unspecified distributional shift while achieving a utility above a desired threshold. This paper focuses on the problem of RS in contextual Bayesian optimization when there is a discrepancy between the true and reference distributions of the context. We propose a novel robust Bayesian satisficing algorithm called RoBOS for noisy black-box optimization. Our algorithm guarantees sublinear lenient regret under certain assumptions on the amount of distribution shift. In addition, we define a weaker notion of regret called robust satisficing regret, in which our algorithm achieves a sublinear upper bound independent of the amount of distribution shift. To demonstrate the effectiveness of our method, we apply it to various learning problems and compare it to other approaches, such as distributionally robust optimization.
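As a reading aid, the generic robust satisficing objective from the operations research literature can be written as below. This is our assumption about the general form, not the paper's exact statement: the symbols (decision x, context c, utility f, threshold τ, reference distribution P_0, distance Δ between distributions) are illustrative, and the paper's notation and choice of Δ may differ.

```latex
\kappa_\tau(x) = \min\Big\{ k \ge 0 \;:\; \mathbb{E}_{c \sim P}\big[f(x, c)\big] \ge \tau - k\,\Delta(P, P_0) \quad \forall P \Big\},
\qquad
x^\star \in \arg\min_{x}\, \kappa_\tau(x)
```

Taking P = P_0 makes Δ vanish, so any x with finite fragility κ must already meet the threshold τ under the reference distribution; a smaller κ means the expected utility degrades more slowly as the true context distribution drifts away from P_0.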

Submitted 16 August, 2023; originally announced August 2023.

arXiv:2303.13113 [pdf, other]

AdaCL: Adaptive Continual Learning

Authors: Elif Ceren Gok Yildirim, Murat Onur Yildirim, Mert Kilickaya, Joaquin Vanschoren

Abstract: Class-Incremental Learning aims to update a deep classifier to learn new categories while maintaining or improving its accuracy on previously observed classes. Common methods to prevent forgetting previously learned classes include regularizing the neural network updates and storing exemplars in memory, which come with hyperparameters such as the learning rate, regularization strength, or the number of exemplars. However, these hyperparameters are usually only tuned at the start and then kept fixed throughout the learning sessions, ignoring the fact that newly encountered tasks may have varying levels of novelty or difficulty. This study investigates the necessity of hyperparameter "adaptivity" in Class-Incremental Learning: the ability to dynamically adjust hyperparameters such as the learning rate, regularization strength, and memory size according to the properties of the new task at hand. We propose AdaCL, a Bayesian Optimization-based approach to automatically and efficiently determine the optimal values for those parameters with each learning task. We show that adapting hyperparameters on each new task leads to improvements in accuracy, forgetting, and memory. Code is available at https://github.com/ElifCerenGokYildirim/AdaCL.

Submitted 1 July, 2024; v1 submitted 23 March, 2023; originally announced March 2023.

Comments: Published in 1st ContinualAI Unconference

arXiv:2212.14741 [pdf, other]

doi 10.1109/IROS47612.2022.9981928

BSA -- Bi-Stiffness Actuation for optimally exploiting intrinsic compliance and inertial coupling effects in elastic joint robots

Authors: Dennis Ossadnik, Mehmet C. Yildirim, Fan Wu, Abdalla Swikir, Hugo T. M. Kussaba, Saeed Abdolshah, Sami Haddadin

Abstract: Compliance in actuation has been exploited to generate highly dynamic maneuvers such as throwing that take advantage of the potential energy stored in joint springs. However, the energy storage and release could not yet be precisely timed. Moreover, for multi-link systems, the natural system dynamics might even work against the actual goal. With the introduction of variable stiffness actuators, this problem has been partially addressed. With a suitable optimal control strategy, the approximate decoupling of the motor from the link can be achieved to maximize the energy transfer into the distal link prior to launch. However, such continuous stiffness variation is complex and typically leads to oscillatory swing-up motions instead of clear launch sequences. To circumvent this issue, we investigate decoupling for speed maximization with a dedicated novel actuator concept denoted Bi-Stiffness Actuation. With this, it is possible to fully decouple the link from the joint mechanism by a switch-and-hold clutch and simultaneously keep the elastic energy stored. We show that with this novel paradigm, it is not only possible to reach the same optimal performance as with power-equivalent variable stiffness actuation, but also to directly control the energy transfer timing. This is a major step forward compared to previous optimal control approaches, which rely on optimizing the full time-series control input.

Submitted 30 December, 2022; originally announced December 2022.

Comments: Accepted version of article that has been published in 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)

Journal ref: 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 3536-3543

arXiv:2110.11446 [pdf, other]

ML with HE: Privacy Preserving Machine Learning Inferences for Genome Studies

Authors: Ş. S. Mağara, C. Yıldırım, F. Yaman, B. Dilekoğlu, F. R. Tutaş, E. Öztürk, K. Kaya, Ö. Taştan, E. Savaş

Abstract: Preserving the privacy and security of big data in the context of cloud computing, while maintaining a reasonable level of processing efficiency, remains an open problem. Genome analysis is one of the most prominent applications epitomizing these concerns. This work proposes a secure multi-label tumor classification method using homomorphic encryption, whereby two different machine learning algorithms, SVM and XGBoost, are used to classify the encrypted genome data of different tumor types.
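To make "classifying encrypted data" concrete, here is a toy sketch of encrypted linear (SVM-style) scoring. The abstract does not say which homomorphic scheme the authors use; this example swaps in the textbook Paillier cryptosystem, which is additively homomorphic, and assumes the client encrypts integer-encoded features while the server holds plaintext integer weights. The tiny primes make it insecure, and every function name here is hypothetical; it illustrates the technique, not the paper's method. XGBoost inference needs comparisons and is not covered.

```python
import math
import random

def keygen(p=1009, q=1013):
    """Toy Paillier keys with g = n + 1 (insecure demo primes)."""
    n = p * q
    lam = math.lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # valid shortcut because g = n + 1
    return n, (lam, mu)

def encrypt(n, m, rng):
    """Enc(m) = g^m * r^n mod n^2; negative m is encoded modulo n."""
    n2 = n * n
    while True:
        r = rng.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m % n, n2) * pow(r, n, n2)) % n2

def decrypt(n, sk, c):
    """Recover m and map residues above n/2 back to negative integers."""
    lam, mu = sk
    n2 = n * n
    m = ((pow(c, lam, n2) - 1) // n) * mu % n
    return m - n if m > n // 2 else m

def encrypted_linear_score(n, enc_x, w, b):
    """Server side: compute Enc(w . x + b) without decrypting x."""
    n2 = n * n
    acc = pow(n + 1, b % n, n2)  # deterministic encryption of the bias (r = 1)
    for c, wi in zip(enc_x, w):
        acc = acc * pow(c, wi % n, n2) % n2  # Enc(x)^w = Enc(w * x)
    return acc
```

Usage: the client calls `encrypt` on each feature and sends the ciphertexts; the server returns `encrypted_linear_score(n, enc_x, w, b)`; only the client can `decrypt` it and read the class from the sign of the score. Multiplying ciphertexts adds plaintexts, which is why a linear decision function maps naturally onto an additive scheme.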

Submitted 1 February, 2022; v1 submitted 21 October, 2021; originally announced October 2021.

arXiv:2105.10756 [pdf]

doi 10.1109/AIVR50618.2020.00035

The Efficacy of a Virtual Reality-Based Mindfulness Intervention

Authors: Caglar Yildirim, Tara O'Grady

Abstract: Mindfulness can be defined as increased awareness of and sustained attentiveness to the present moment. Recently, there has been a growing interest in the applications of mindfulness for empirical research in wellbeing and the use of virtual reality (VR) environments and 3D interfaces as a conduit for mindfulness training. Accordingly, the current experiment investigated whether a brief VR-based mindfulness intervention could induce a greater level of state mindfulness, when compared to an audio-based intervention and a control group. Results indicated that both mindfulness interventions, VR-based and audio-based, induced a greater state of mindfulness than the control group. Participants in the VR-based mindfulness intervention group reported a greater state of mindfulness than those in the guided audio group, indicating that the immersive mindfulness intervention was more robust. Collectively, these results provide empirical support for the efficacy of a brief VR-based mindfulness intervention in inducing a robust state of mindfulness in laboratory settings.

Submitted 22 May, 2021; originally announced May 2021.

Comments: 2020 IEEE International Conference on Artificial Intelligence and Virtual Reality (AIVR)

arXiv:2105.10754 [pdf]

doi 10.1109/GEM.2019.8811554

Effects of VR Gaming and Game Genre on Player Experience

Authors: Michael Carroll, Ethan Osborne, Caglar Yildirim

Abstract: With the increasing availability of modern virtual reality (VR) headsets, the use and applications of VR technology for gaming purposes have become more pervasive than ever. Despite the growing popularity of VR gaming, user studies into how it might affect the player experience (PX) during gameplay are scarce. Accordingly, the current study investigated the effects of VR gaming and game genre on PX. We compared PX metrics for two game genres, strategy (more interactive) and racing (less interactive), across two gaming platforms, VR and traditional desktop gaming. Participants were randomly assigned to one of the gaming platforms, played both a strategy and a racing game on their corresponding platform, and provided PX ratings. Results revealed that, regardless of the game genre, participants in the VR gaming condition experienced a greater sense of presence than did those in the desktop gaming condition. That said, results showed that the two gaming platforms did not significantly differ from one another in PX ratings. As for the effect of game genre, participants provided greater PX ratings for the strategy game than for the racing game, regardless of whether the game was played on a VR headset or desktop computer. Collectively, these results indicate that although VR gaming affords a greater sense of presence in the game environment, this increase in presence does not seem to translate into a more satisfactory PX when playing either a strategy or racing game.

Submitted 22 May, 2021; originally announced May 2021.

Comments: 2019 IEEE Games, Entertainment, Media Conference (GEM)

arXiv:2012.00855 [pdf]

A Review of Deep Learning Approaches to EEG-Based Classification of Cybersickness in Virtual Reality

Authors: Caglar Yildirim

Abstract: Cybersickness is an unpleasant side effect of exposure to a virtual reality (VR) experience and refers to such physiological repercussions as nausea and dizziness triggered in response to VR exposure. Given the debilitating effect of cybersickness on the user experience in VR, academic interest in the automatic detection of cybersickness from physiological measurements has surged in recent years. Electroencephalography (EEG) has been extensively used to capture changes in electrical activity in the brain and to automatically classify cybersickness from brainwaves using a variety of machine learning algorithms. Recent advances in deep learning (DL) algorithms and the increasing availability of computational resources for DL have paved the way for a new area of research into the application of DL frameworks to EEG-based detection of cybersickness. Accordingly, this paper presents a systematic review of the peer-reviewed papers concerned with the application of DL frameworks to the classification of cybersickness from EEG signals. The relevant literature was identified through exhaustive database searches, and the papers were scrutinized with respect to experimental protocols for data collection, data preprocessing, and DL architectures. The review revealed a limited number of studies in this nascent area of research and showed that the DL frameworks reported in these studies (i.e., DNN, CNN, and RNN) could classify cybersickness with an average accuracy rate of 93%. This review provides a summary of the trends and issues in the application of DL frameworks to the EEG-based detection of cybersickness, with some guidelines for future research.

Submitted 1 December, 2020; originally announced December 2020.

Comments: 7 pages

arXiv:2005.06013 [pdf]

Two Dimensions for Organizing Immersive Analytics: Toward a Taxonomy for Facet and Position

Authors: David Saffo, Sara Di Bartolomeo, Caglar Yildirim, Cody Dunne

Abstract: As immersive analytics continues to grow as a discipline, so too should its underlying methodological support. Taxonomies play an important role in information visualization and human-computer interaction. They provide an organization of the techniques used in a particular domain that better enables researchers to describe their work, discover existing methods, and identify gaps in the literature. Existing taxonomies in related fields do not capture or describe the unique paradigms employed in immersive analytics. We conceptualize a taxonomy that organizes immersive analytics according to two dimensions: spatial and visual presentation. Each intersection of this taxonomy represents a unique design paradigm which, when thoroughly explored, can aid in the design and research of new immersive analytic applications.

Submitted 12 May, 2020; originally announced May 2020.

Comments: Immersive Analytics Workshop CHI 2020

arXiv:1409.5834 [pdf, other]

Tight Error Bounds for Structured Prediction

Authors: Amir Globerson, Tim Roughgarden, David Sontag, Cafer Yildirim

Abstract: Structured prediction tasks in machine learning involve the simultaneous prediction of multiple labels. This is typically done by maximizing a score function on the space of labels, which decomposes as a sum of pairwise elements, each depending on two specific labels. Intuitively, the more pairwise terms are used, the better the expected accuracy. However, there is currently no theoretical account of this intuition. This paper takes a significant step in this direction. We formulate the problem as classifying the vertices of a known graph $G=(V,E)$, where the vertices and edges of the graph are labelled and correlate semi-randomly with the ground truth. We show that the prospects for achieving low expected Hamming error depend on the structure of the graph $G$ in interesting ways. For example, if $G$ is a very poor expander, like a path, then large expected Hamming error is inevitable. Our main positive result shows that, for a wide class of graphs including 2D grid graphs common in machine vision applications, there is a polynomial-time algorithm with small and information-theoretically near-optimal expected error. Our results provide a first step toward a theoretical justification for the empirical success of the efficient approximate inference algorithms that are used for structured prediction in models where exact inference is intractable.
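The setting in this abstract (noisy vertex and edge labels on a graph, a pairwise score, and Hamming error) can be illustrated on a tiny 2x2 grid. This sketch is an assumption-laden toy: labels are in {-1, +1}, an edge observation is the noisy product of its endpoint labels, the vertex weight `node_weight` is hypothetical, and MAP inference is done by exhaustive search, which is only feasible for tiny graphs. It is not the paper's polynomial-time algorithm.

```python
import itertools

def score(labels, edges, edge_obs, node_obs, node_weight=0.5):
    """Pairwise score: reward labelings that agree with noisy edge and vertex observations."""
    s = sum(edge_obs[(u, v)] * labels[u] * labels[v] for (u, v) in edges)
    s += node_weight * sum(node_obs[v] * labels[v] for v in range(len(labels)))
    return s

def map_labeling(n, edges, edge_obs, node_obs, node_weight=0.5):
    """Exhaustive argmax over {-1, +1}^n (exponential in n; illustration only)."""
    return max(itertools.product((-1, 1), repeat=n),
               key=lambda y: score(y, edges, edge_obs, node_obs, node_weight))

def hamming_error(pred, truth):
    """Number of vertices whose predicted label differs from the ground truth."""
    return sum(p != t for p, t in zip(pred, truth))
```

Usage: on a 2x2 grid with ground truth (1, 1, -1, -1), correct edge observations, and one corrupted vertex observation, the pairwise edge terms outvote the bad vertex label and `map_labeling` recovers the ground truth with zero Hamming error, whereas reading vertex observations alone would give one error. This mirrors the abstract's intuition that edge structure helps when the graph is well connected.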

Submitted 19 September, 2014; originally announced September 2014.

Showing 1–12 of 12 results for author: Yıldırım, C