Showing 1–19 of 19 results for author: Krawczyk, B

Searching in archive cs.
  1. arXiv:2404.04002  [pdf, other]

    cs.LG

    Continual Learning with Weight Interpolation

    Authors: Jędrzej Kozal, Jan Wasilewski, Bartosz Krawczyk, Michał Woźniak

    Abstract: Continual learning poses a fundamental challenge for modern machine learning systems, requiring models to adapt to new tasks while retaining knowledge from previous ones. Addressing this challenge necessitates the development of efficient algorithms capable of learning from data streams and accumulating knowledge over time. This paper proposes a novel approach to continual learning utilizing the w…

    Submitted 9 April, 2024; v1 submitted 5 April, 2024; originally announced April 2024.

  2. arXiv:2307.04094  [pdf, other]

    cs.LG

    Class-Incremental Mixture of Gaussians for Deep Continual Learning

    Authors: Lukasz Korycki, Bartosz Krawczyk

    Abstract: Continual learning models for stationary data focus on learning and retaining concepts coming to them in a sequential manner. In the most generic class-incremental environment, we have to be ready to deal with classes coming one by one, without any higher-level grouping. This requirement invalidates many previously proposed methods and forces researchers to look for more flexible alternative appro…

    Submitted 9 July, 2023; originally announced July 2023.

    ACM Class: I.5.0; I.5.1

  3. arXiv:2212.07743  [pdf, other]

    cs.LG

    Interpretable ML for Imbalanced Data

    Authors: Damien A. Dablain, Colin Bellinger, Bartosz Krawczyk, David W. Aha, Nitesh V. Chawla

    Abstract: Deep learning models are being increasingly applied to imbalanced data in high stakes fields such as medicine, autonomous driving, and intelligence analysis. Imbalanced data compounds the black-box nature of deep networks because the relationships between classes may be highly skewed and unclear. This can reduce trust by model users and hamper the progress of developers of imbalanced learning algo…

    Submitted 15 December, 2022; originally announced December 2022.

  4. arXiv:2207.06084  [pdf, other]

    cs.LG cs.CY

    Towards A Holistic View of Bias in Machine Learning: Bridging Algorithmic Fairness and Imbalanced Learning

    Authors: Damien Dablain, Bartosz Krawczyk, Nitesh Chawla

    Abstract: Machine learning (ML) is playing an increasingly important role in rendering decisions that affect a broad range of groups in society. ML models inform decisions in criminal justice, the extension of credit in banking, and the hiring practices of corporations. This posits the requirement of model fairness, which holds that automated decisions should be equitable with respect to protected features…

    Submitted 13 July, 2022; originally announced July 2022.

  5. arXiv:2207.06080  [pdf, other]

    cs.LG

    Efficient Augmentation for Imbalanced Deep Learning

    Authors: Damien Dablain, Colin Bellinger, Bartosz Krawczyk, Nitesh Chawla

    Abstract: Deep learning models tend to memorize training data, which hurts their ability to generalize to under-represented classes. We empirically study a convolutional neural network's internal representation of imbalanced image data and measure the generalization gap between a model's feature embeddings in the training and test sets, showing that the gap is wider for minority classes. This insight enable…

    Submitted 17 October, 2022; v1 submitted 13 July, 2022; originally announced July 2022.

  6. A survey on learning from imbalanced data streams: taxonomy, challenges, empirical study, and reproducible experimental framework

    Authors: Gabriel Aguiar, Bartosz Krawczyk, Alberto Cano

    Abstract: Class imbalance poses new challenges when it comes to classifying data streams. Many algorithms recently proposed in the literature tackle this problem using a variety of data-level, algorithm-level, and ensemble approaches. However, there is a lack of standardized and agreed-upon procedures and benchmarks on how to evaluate these algorithms. This work proposes a standardized, exhaustive, and comp…

    Submitted 18 July, 2023; v1 submitted 7 April, 2022; originally announced April 2022.

    Journal ref: Machine Learning, 2023

  7. arXiv:2112.11019  [pdf, other]

    cs.LG

    Mining Drifting Data Streams on a Budget: Combining Active Learning with Self-Labeling

    Authors: Łukasz Korycki, Bartosz Krawczyk

    Abstract: Mining data streams poses a number of challenges, including the continuous and non-stationary nature of data, the massive volume of information to be processed and constraints put on the computational resources. While there is a number of supervised solutions proposed for this problem in the literature, most of them assume that access to the ground truth (in form of class labels) is unlimited and…

    Submitted 21 December, 2021; originally announced December 2021.

    ACM Class: I.5.0; I.5.1

  8. arXiv:2107.14194  [pdf, other]

    cs.LG

    On the combined effect of class imbalance and concept complexity in deep learning

    Authors: Kushankur Ghosh, Colin Bellinger, Roberto Corizzo, Bartosz Krawczyk, Nathalie Japkowicz

    Abstract: Structural concept complexity, class overlap, and data scarcity are some of the most important factors influencing the performance of classifiers under class imbalance conditions. When these effects were uncovered in the early 2000s, understandably, the classifiers on which they were demonstrated belonged to the classical rather than Deep Learning categories of approaches. As Deep Learning is gain…

    Submitted 29 July, 2021; originally announced July 2021.

  9. Imbalanced Big Data Oversampling: Taxonomy, Algorithms, Software, Guidelines and Future Directions

    Authors: William C. Sleeman IV, Bartosz Krawczyk

    Abstract: Learning from imbalanced data is among the most challenging areas in contemporary machine learning. This becomes even more difficult when considered in the context of big data that calls for dedicated architectures capable of high-performance processing. Apache Spark is a highly efficient and popular architecture, but it poses specific challenges for algorithms to be implemented for it. While oversam…

    Submitted 23 July, 2021; originally announced July 2021.

    Comments: 52 pages, 9 tables, 13 figures, 15 algorithms

    ACM Class: I.5.2

    Journal ref: ACM Comput. Surv. (June 2022)

  10. arXiv:2105.02340  [pdf, other]

    cs.CV cs.LG

    DeepSMOTE: Fusing Deep Learning and SMOTE for Imbalanced Data

    Authors: Damien Dablain, Bartosz Krawczyk, Nitesh V. Chawla

    Abstract: Despite over two decades of progress, imbalanced data is still considered a significant challenge for contemporary machine learning models. Modern advances in deep learning have magnified the importance of the imbalanced data problem. The two main approaches to address this issue are based on loss function modifications and instance resampling. Instance sampling is typically based on Generative Ad…

    Submitted 5 May, 2021; originally announced May 2021.

    Comments: 14 pages, 9 figures

  11. arXiv:2104.11861  [pdf, other]

    cs.LG cs.CV

    Class-Incremental Experience Replay for Continual Learning under Concept Drift

    Authors: Łukasz Korycki, Bartosz Krawczyk

    Abstract: Modern machine learning systems need to be able to cope with constantly arriving and changing data. Two main areas of research dealing with such scenarios are continual learning and data stream mining. Continual learning focuses on accumulating knowledge and avoiding forgetting, assuming information once learned should be stored. Data stream mining focuses on adaptation to concept drift and discar…

    Submitted 23 April, 2021; originally announced April 2021.

    Comments: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2021

    ACM Class: I.4.0; I.5.0

  12. arXiv:2104.10228  [pdf, other]

    cs.LG

    Concept Drift Detection from Multi-Class Imbalanced Data Streams

    Authors: Łukasz Korycki, Bartosz Krawczyk

    Abstract: Continual learning from data streams is among the most important topics in contemporary machine learning. One of the biggest challenges in this domain lies in creating algorithms that can continuously adapt to arriving data. However, previously learned knowledge may become outdated, as streams evolve over time. This phenomenon is known as concept drift and must be detected to facilitate efficient…

    Submitted 20 April, 2021; originally announced April 2021.

    Comments: 37th IEEE International Conference on Data Engineering (ICDE), 2021. arXiv admin note: text overlap with arXiv:2009.09497

    ACM Class: I.5.0

  13. arXiv:2010.07340  [pdf, other]

    cs.LG

    Adaptive Deep Forest for Online Learning from Drifting Data Streams

    Authors: Łukasz Korycki, Bartosz Krawczyk

    Abstract: Learning from data streams is among the most vital fields of contemporary data mining. The online analysis of information coming from those potentially unbounded data sources allows for designing reactive up-to-date models capable of adjusting themselves to continuous flows of data. While a plethora of shallow methods have been proposed for simpler low-dimensional streaming problems, almost none o…

    Submitted 14 October, 2020; originally announced October 2020.

    Comments: 28 pages, 8 figures, 6 tables

    ACM Class: I.5.0; I.2.0

  14. arXiv:2009.09497  [pdf, other]

    cs.LG stat.ML

    Adversarial Concept Drift Detection under Poisoning Attacks for Robust Data Stream Mining

    Authors: Łukasz Korycki, Bartosz Krawczyk

    Abstract: Continuous learning from streaming data is among the most challenging topics in contemporary machine learning. In this domain, learning algorithms must not only be able to handle massive volumes of rapidly arriving data, but also adapt themselves to potential emerging changes. The phenomenon of the evolving nature of data streams is known as concept drift. While there is a plethora of methods…

    Submitted 20 September, 2020; originally announced September 2020.

    Comments: 42 pages, 13 figures, 8 tables

    ACM Class: I.5.0; I.2.0

  15. arXiv:2009.09382  [pdf, other]

    cs.LG stat.ML

    Instance exploitation for learning temporary concepts from sparsely labeled drifting data streams

    Authors: Łukasz Korycki, Bartosz Krawczyk

    Abstract: Continual learning from streaming data sources becomes more and more popular due to the increasing number of online tools and systems. Dealing with dynamic and everlasting problems poses new challenges for which traditional batch-based offline algorithms turn out to be insufficient in terms of computational time and predictive performance. One of the most crucial limitations is that we cannot assu…

    Submitted 20 September, 2020; originally announced September 2020.

    Comments: 35 pages, 17 figures, 8 tables

    ACM Class: I.5.0; I.2.0

  16. arXiv:2004.03406  [pdf, other]

    cs.LG stat.ML

    Combined Cleaning and Resampling Algorithm for Multi-Class Imbalanced Data with Label Noise

    Authors: Michał Koziarski, Michał Woźniak, Bartosz Krawczyk

    Abstract: The imbalanced data classification is one of the most crucial tasks facing modern data analysis. Especially when combined with other difficulty factors, such as the presence of noise, overlapping class distributions, and small disjuncts, data imbalance can significantly impact the classification performance. Furthermore, some of the data difficulty factors are known to affect the performance of th…

    Submitted 7 April, 2020; originally announced April 2020.

  17. arXiv:1905.09203  [pdf, other]

    cs.CY

    Uniqueness of Medical Data Mining: How the new technologies and data they generate are transforming medicine

    Authors: Krzysztof J. Cios, Bartosz Krawczyk, Jacquelyne Cios, Kevin J. Staley

    Abstract: The paper describes how the new technologies and data they generate are transforming medicine. It stresses the uniqueness of heterogeneous medical data and the ways of dealing with them. It lists different sources that generate big medical data, their security, legal and ethical issues, as well as machine learning/AI methods of dealing with them. A unique feature of the paper is use of case studie…

    Submitted 22 May, 2019; originally announced May 2019.

  18. arXiv:1811.07155  [pdf, ps, other]

    cs.AI cs.LG

    Monotonic classification: an overview on algorithms, performance measures and data sets

    Authors: José-Ramón Cano, Pedro Antonio Gutiérrez, Bartosz Krawczyk, Michał Woźniak, Salvador García

    Abstract: Currently, knowledge discovery in databases is an essential step to identify valid, novel and useful patterns for decision making. There are many real-world scenarios, such as bankruptcy prediction, option pricing or medical diagnosis, where the classification models to be learned need to fulfil restrictions of monotonicity (i.e. the target class label should not decrease when input attributes val…

    Submitted 17 November, 2018; originally announced November 2018.

  19. arXiv:1804.00516  [pdf, other]

    cs.CV

    Towards Highly Accurate Coral Texture Images Classification Using Deep Convolutional Neural Networks and Data Augmentation

    Authors: Anabel Gómez-Ríos, Siham Tabik, Julián Luengo, ASM Shihavuddin, Bartosz Krawczyk, Francisco Herrera

    Abstract: The recognition of coral species based on underwater texture images poses a significant difficulty for machine learning algorithms, due to the three following challenges embedded in the nature of this data: 1) datasets do not include information about the global structure of the coral; 2) several species of coral have very similar characteristics; and 3) defining the spatial borders between classes…

    Submitted 27 March, 2018; originally announced April 2018.

    Comments: 22 pages, 10 figures