Showing 1–30 of 30 results for author: Džeroski, S

  1. arXiv:2307.07522

    cs.AI cs.LG

    The Future of Fundamental Science Led by Generative Closed-Loop Artificial Intelligence

    Authors: Hector Zenil, Jesper Tegnér, Felipe S. Abrahão, Alexander Lavin, Vipin Kumar, Jeremy G. Frey, Adrian Weller, Larisa Soldatova, Alan R. Bundy, Nicholas R. Jennings, Koichi Takahashi, Lawrence Hunter, Saso Dzeroski, Andrew Briggs, Frederick D. Gregory, Carla P. Gomes, Jon Rowe, James Evans, Hiroaki Kitano, Ross King

    Abstract: Recent advances in machine learning and AI, including Generative AI and LLMs, are disrupting technological innovation, product development, and society as a whole. AI's contribution to technology can come from multiple approaches that require access to large training data sets and clear performance evaluation criteria, ranging from pattern recognition and classification to generative models. Yet,…

    Submitted 29 August, 2023; v1 submitted 9 July, 2023; originally announced July 2023.

    Comments: 35 pages, first draft of the final report from the Alan Turing Institute on AI for Scientific Discovery

  2. Comparing Algorithm Selection Approaches on Black-Box Optimization Problems

    Authors: Ana Kostovska, Anja Jankovic, Diederick Vermetten, Sašo Džeroski, Tome Eftimov, Carola Doerr

    Abstract: Performance complementarity of solvers available to tackle black-box optimization problems gives rise to the important task of algorithm selection (AS). Automated AS approaches can help replace tedious and labor-intensive manual selection, and have already shown promising performance in various optimization domains. Automated AS relies on machine learning (ML) techniques to recommend the best algo…

    Submitted 30 June, 2023; originally announced June 2023.

    Comments: To appear in the Companion Proceedings of GECCO 2023 as a poster paper

  3. Algorithm Instance Footprint: Separating Easily Solvable and Challenging Problem Instances

    Authors: Ana Nikolikj, Sašo Džeroski, Mario Andrés Muñoz, Carola Doerr, Peter Korošec, Tome Eftimov

    Abstract: In black-box optimization, it is essential to understand why an algorithm instance works on a set of problem instances while failing on others and provide explanations of its behavior. We propose a methodology for formulating an algorithm instance footprint that consists of a set of problem instances that are easy to be solved and a set of problem instances that are difficult to be solved, for an…

    Submitted 1 June, 2023; originally announced June 2023.

    Comments: To appear at GECCO 2023

  4. arXiv:2305.08413

    cs.CV eess.IV stat.AP

    Artificial intelligence to advance Earth observation: a perspective

    Authors: Devis Tuia, Konrad Schindler, Begüm Demir, Gustau Camps-Valls, Xiao Xiang Zhu, Mrinalini Kochupillai, Sašo Džeroski, Jan N. van Rijn, Holger H. Hoos, Fabio Del Frate, Mihai Datcu, Jorge-Arnulfo Quiané-Ruiz, Volker Markl, Bertrand Le Saux, Rochelle Schneider

    Abstract: Earth observation (EO) is a prime instrument for monitoring land and ocean processes, studying the dynamics at work, and taking the pulse of our planet. This article gives a bird's eye view of the essential scientific tools and approaches informing and supporting the transition from raw EO data to usable EO-based information. The promises, as well as the current challenges of these developments, a…

    Submitted 15 May, 2023; originally announced May 2023.

  5. arXiv:2305.02251

    cs.AI cs.LG

    Automated Scientific Discovery: From Equation Discovery to Autonomous Discovery Systems

    Authors: Stefan Kramer, Mattia Cerrato, Sašo Džeroski, Ross King

    Abstract: The paper surveys automated scientific discovery, from equation discovery and symbolic regression to autonomous discovery systems and agents. It discusses the individual approaches from a "big picture" perspective and in context, but also discusses open issues and recent topics like the various roles of deep neural networks in this area, aiding in the discovery of human-interpretable knowledge. Fu…

    Submitted 3 May, 2023; originally announced May 2023.

  6. Efficient Generator of Mathematical Expressions for Symbolic Regression

    Authors: Sebastian Mežnar, Sašo Džeroski, Ljupčo Todorovski

    Abstract: We propose an approach to symbolic regression based on a novel variational autoencoder for generating hierarchical structures, HVAE. It combines simple atomic units with shared weights to recursively encode and decode the individual nodes in the hierarchy. Encoding is performed bottom-up and decoding top-down. We empirically show that HVAE can be trained efficiently with small corpora of mathemati…

    Submitted 10 September, 2023; v1 submitted 20 February, 2023; originally announced February 2023.

    Comments: 35 pages, 11 tables, 7 multi-part figures, Machine Learning (Springer) and the journal track of ECML/PKDD 2023

    ACM Class: I.2.0; I.2.6

    Journal ref: Mach Learn (2023)

  7. arXiv:2301.09876

    cs.NE cs.AI

    Using Knowledge Graphs for Performance Prediction of Modular Optimization Algorithms

    Authors: Ana Kostovska, Diederick Vermetten, Sašo Džeroski, Panče Panov, Tome Eftimov, Carola Doerr

    Abstract: Empirical data plays an important role in evolutionary computation research. To make better use of the available data, ontologies have been proposed in the literature to organize their storage in a structured way. However, the full potential of these formal methods to capture our domain knowledge has yet to be demonstrated. In this work, we evaluate a performance prediction model built on top of t…

    Submitted 24 January, 2023; originally announced January 2023.

    Comments: To appear at EvoApps 2023

  8. arXiv:2211.12757

    cs.LG cs.AI cs.CY

    FAIRification of MLC data

    Authors: Ana Kostovska, Jasmin Bogatinovski, Andrej Treven, Sašo Džeroski, Dragi Kocev, Panče Panov

    Abstract: The multi-label classification (MLC) task has increasingly been receiving interest from the machine learning (ML) community, as evidenced by the growing number of papers and methods that appear in the literature. Hence, ensuring proper, correct, robust, and trustworthy benchmarking is of utmost importance for the further development of the field. We believe that this can be achieved by adhering to…

    Submitted 23 November, 2022; originally announced November 2022.

    Comments: This paper was accepted at ECML PKDD 2022

  9. arXiv:2211.11332

    cs.AI cs.SE

    OPTION: OPTImization Algorithm Benchmarking ONtology

    Authors: Ana Kostovska, Diederick Vermetten, Carola Doerr, Saso Džeroski, Panče Panov, Tome Eftimov

    Abstract: Many optimization algorithm benchmarking platforms allow users to share their experimental data to promote reproducible and reusable research. However, different platforms use different data models and formats, which drastically complicates the identification of relevant datasets, their interpretation, and their interoperability. Therefore, a semantically rich, ontology-based, machine-readable dat…

    Submitted 21 November, 2022; originally announced November 2022.

  10. arXiv:2211.11227

    cs.LG

    Explainable Model-specific Algorithm Selection for Multi-Label Classification

    Authors: Ana Kostovska, Carola Doerr, Sašo Džeroski, Dragi Kocev, Panče Panov, Tome Eftimov

    Abstract: Multi-label classification (MLC) is an ML task of predictive modeling in which a data instance can simultaneously belong to multiple classes. MLC is increasingly gaining interest in different application domains such as text mining, computer vision, and bioinformatics. Several MLC algorithms have been proposed in the literature, resulting in a meta-optimization problem that the user needs to addre…

    Submitted 21 November, 2022; originally announced November 2022.

  11. arXiv:2207.09237

    cs.LG cs.AI

    Semi-supervised Predictive Clustering Trees for (Hierarchical) Multi-label Classification

    Authors: Jurica Levatić, Michelangelo Ceci, Dragi Kocev, Sašo Džeroski

    Abstract: Semi-supervised learning (SSL) is a common approach to learning predictive models using not only labeled examples, but also unlabeled examples. While SSL for the simple tasks of classification and regression has received a lot of attention from the research community, this is not properly investigated for complex prediction tasks with structurally dependent variables. This is the case of multi-lab…

    Submitted 30 March, 2024; v1 submitted 19 July, 2022; originally announced July 2022.

  12. arXiv:2204.07431

    cs.NE cs.LG

    The Importance of Landscape Features for Performance Prediction of Modular CMA-ES Variants

    Authors: Ana Kostovska, Diederick Vermetten, Sašo Džeroski, Carola Doerr, Peter Korošec, Tome Eftimov

    Abstract: Selecting the most suitable algorithm and determining its hyperparameters for a given optimization problem is a challenging task. Accurately predicting how well a certain algorithm could solve the problem is hence desirable. Recent studies in single-objective numerical optimization show that supervised machine learning methods can predict algorithm performance using landscape features extracted fr…

    Submitted 15 April, 2022; originally announced April 2022.

  13. Boosting the Performance of Quantum Annealers using Machine Learning

    Authors: Jure Brence, Dragan Mihailović, Viktor Kabanov, Ljupčo Todorovski, Sašo Džeroski, Jaka Vodeb

    Abstract: Noisy intermediate-scale quantum (NISQ) devices are spearheading the second quantum revolution. Of these, quantum annealers are the only ones currently offering real world, commercial applications on as many as 5000 qubits. The size of problems that can be solved by quantum annealers is limited mainly by errors caused by environmental noise and intrinsic imperfections of the processor. We address…

    Submitted 7 March, 2022; v1 submitted 4 March, 2022; originally announced March 2022.

  14. Unsupervised Feature Ranking via Attribute Networks

    Authors: Urh Primožič, Blaž Škrlj, Sašo Džeroski, Matej Petković

    Abstract: The need for learning from unlabeled data is increasing in contemporary machine learning. Methods for unsupervised feature ranking, which identify the most important features in such data are thus gaining attention, and so are their applications in studying high throughput biological experiments or user bases for recommender systems. We propose FRANe (Feature Ranking via Attribute Networks), an un…

    Submitted 25 November, 2021; originally announced November 2021.

    Comments: for online material and python package see https://github.com/FRANe-team/FRANe

    Journal ref: Soares C., Torgo L. (eds) Discovery Science. DS 2021. Lecture Notes in Computer Science, vol 12986. Springer, Cham

  15. arXiv:2108.01407

    cs.LG

    GalaxAI: Machine learning toolbox for interpretable analysis of spacecraft telemetry data

    Authors: Ana Kostovska, Matej Petković, Tomaž Stepišnik, Luke Lucas, Timothy Finn, José Martínez-Heras, Panče Panov, Sašo Džeroski, Alessandro Donati, Nikola Simidjievski, Dragi Kocev

    Abstract: We present GalaxAI - a versatile machine learning toolbox for efficient and interpretable end-to-end analysis of spacecraft telemetry data. GalaxAI employs various machine learning algorithms for multivariate time series analyses, classification, regression and structured output prediction, capable of handling high-throughput heterogeneous data. These methods allow for the construction of robust a…

    Submitted 9 August, 2021; v1 submitted 3 August, 2021; originally announced August 2021.

    Journal ref: 8th IEEE International Conference on Space Mission Challenges for Information Technology (SMC-IT 2021)

  16. arXiv:2106.15411

    cs.LG cs.AI

    Explaining the Performance of Multi-label Classification Methods with Data Set Properties

    Authors: Jasmin Bogatinovski, Ljupčo Todorovski, Sašo Džeroski, Dragi Kocev

    Abstract: Meta learning generalizes the empirical experience with different learning tasks and holds promise for providing important empirical insight into the behaviour of machine learning algorithms. In this paper, we present a comprehensive meta-learning study of data sets and methods for multi-label classification (MLC). MLC is a practically relevant machine learning task where each example is labelled…

    Submitted 28 June, 2021; originally announced June 2021.

  17. OPTION: OPTImization Algorithm Benchmarking ONtology

    Authors: Ana Kostovska, Diederick Vermetten, Carola Doerr, Sašo Džeroski, Panče Panov, Tome Eftimov

    Abstract: Many platforms for benchmarking optimization algorithms offer users the possibility of sharing their experimental data with the purpose of promoting reproducible and reusable research. However, different platforms use different data models and formats, which drastically inhibits identification of relevant data sets, their interpretation, and their interoperability. Consequently, a semantically ric…

    Submitted 24 April, 2021; originally announced April 2021.

    Comments: To appear in the Proceedings of the Genetic and Evolutionary Computation Conference Companion (GECCO 2021), ACM

  18. arXiv:2102.07113

    cs.LG cs.AI cs.CC

    Comprehensive Comparative Study of Multi-Label Classification Methods

    Authors: Jasmin Bogatinovski, Ljupčo Todorovski, Sašo Džeroski, Dragi Kocev

    Abstract: Multi-label classification (MLC) has recently received increasing interest from the machine learning community. Several studies provide reviews of methods and datasets for MLC and a few provide empirical comparisons of MLC methods. However, they are limited in the number of methods and datasets considered. This work provides a comprehensive empirical study of a wide range of MLC methods on a pleth…

    Submitted 16 February, 2021; v1 submitted 14 February, 2021; originally announced February 2021.

  19. arXiv:2101.09577

    cs.LG stat.ML

    ReliefE: Feature Ranking in High-dimensional Spaces via Manifold Embeddings

    Authors: Blaž Škrlj, Sašo Džeroski, Nada Lavrač, Matej Petković

    Abstract: Feature ranking has been widely adopted in machine learning applications such as high-throughput biology and social sciences. The approaches of the popular Relief family of algorithms assign importances to features by iteratively accounting for nearest relevant and irrelevant instances. Despite their high utility, these algorithms can be computationally expensive and not-well suited for high-dimen…

    Submitted 23 January, 2021; originally announced January 2021.

  20. Probabilistic Grammars for Equation Discovery

    Authors: Jure Brence, Ljupčo Todorovski, Sašo Džeroski

    Abstract: Equation discovery, also known as symbolic regression, is a type of automated modeling that discovers scientific laws, expressed in the form of equations, from observed data and expert knowledge. Deterministic grammars, such as context-free grammars, have been used to limit the search spaces in equation discovery by providing hard constraints that specify which equations to consider and which not.…

    Submitted 22 March, 2021; v1 submitted 1 December, 2020; originally announced December 2020.

    Comments: Submitted to Knowledge-Based Systems, Elsevier. 28 pages + 13 pages appendix. 7 figures

    ACM Class: I.2.4; I.2.6; I.1.1; I.1.3; G.3

  21. arXiv:2011.11679

    cs.LG

    Ensemble- and Distance-Based Feature Ranking for Unsupervised Learning

    Authors: Matej Petković, Dragi Kocev, Blaž Škrlj, Sašo Džeroski

    Abstract: In this work, we propose two novel (groups of) methods for unsupervised feature ranking and selection. The first group includes feature ranking scores (Genie3 score, RandomForest score) that are computed from ensembles of predictive clustering trees. The second method is URelief, the unsupervised extension of the Relief family of feature ranking algorithms. Using 26 benchmark data sets and 5 basel…

    Submitted 23 November, 2020; originally announced November 2020.

    Comments: 20 pages, 5 figures

  22. arXiv:2008.03937

    cs.LG stat.ML

    Feature Ranking for Semi-supervised Learning

    Authors: Matej Petković, Sašo Džeroski, Dragi Kocev

    Abstract: The data made available for analysis are becoming more and more complex along several directions: high dimensionality, number of examples and the amount of labels per example. This poses a variety of challenges for the existing machine learning methods: coping with dataset with a large number of examples that are described in a high-dimensional space and not all examples have labels provided. For…

    Submitted 10 August, 2020; originally announced August 2020.

    Comments: Submitted to: Special Issue on Discovery Science of the Machine Learning Journal

  23. arXiv:2002.04464

    cs.LG stat.ML

    Feature Importance Estimation with Self-Attention Networks

    Authors: Blaž Škrlj, Sašo Džeroski, Nada Lavrač, Matej Petkovič

    Abstract: Black-box neural network models are widely used in industry and science, yet are hard to understand and interpret. Recently, the attention mechanism was introduced, offering insights into the inner workings of neural language models. This paper explores the use of attention-based neural networks mechanism for estimating feature importance, as means for explaining the models learned from propositio…

    Submitted 11 February, 2020; originally announced February 2020.

    Comments: Accepted for publication in ECAI 2020

  24. arXiv:1907.00821

    cs.LG eess.SY math.DS stat.ML

    Equation Discovery for Nonlinear System Identification

    Authors: Nikola Simidjievski, Ljupčo Todorovski, Juš Kocijan, Sašo Džeroski

    Abstract: Equation discovery methods enable modelers to combine domain-specific knowledge and system identification to construct models most suitable for a selected modeling task. The method described and evaluated in this paper can be used as a nonlinear system identification method for gray-box modeling. It consists of two interlaced parts of modeling that are computer-aided. The first performs computer-a…

    Submitted 1 July, 2019; originally announced July 2019.

  25. arXiv:1906.09088

    cs.LG math.DS math.OC stat.ML

    Meta-Model Framework for Surrogate-Based Parameter Estimation in Dynamical Systems

    Authors: Žiga Lukšič, Jovan Tanevski, Sašo Džeroski, Ljupčo Todorovski

    Abstract: The central task in modeling complex dynamical systems is parameter estimation. This task involves numerous evaluations of a computationally expensive objective function. Surrogate-based optimization introduces a computationally efficient predictive model that approximates the value of the objective function. The standard approach involves learning a surrogate from training examples that correspon…

    Submitted 18 December, 2019; v1 submitted 21 June, 2019; originally announced June 2019.

  26. Machine learning for predicting thermal power consumption of the Mars Express Spacecraft

    Authors: Matej Petković, Redouane Boumghar, Martin Breskvar, Sašo Džeroski, Dragi Kocev, Jurica Levatić, Luke Lucas, Aljaž Osojnik, Bernard Ženko, Nikola Simidjievski

    Abstract: The thermal subsystem of the Mars Express (MEX) spacecraft keeps the on-board equipment within its pre-defined operating temperatures range. To plan and optimize the scientific operations of MEX, its operators need to estimate in advance, as accurately as possible, the power consumption of the thermal subsystem. The remaining power can then be allocated for scientific purposes. We present a machin…

    Submitted 16 January, 2019; v1 submitted 3 September, 2018; originally announced September 2018.

    Journal ref: IEEE Aerospace and Electronic Systems Magazine. 34(7), 46-60. (2019)

  27. arXiv:1804.06207

    cs.LG stat.ML

    MetaBags: Bagged Meta-Decision Trees for Regression

    Authors: Jihed Khiari, Luis Moreira-Matias, Ammar Shaker, Bernard Zenko, Saso Dzeroski

    Abstract: Ensembles are popular methods for solving practical supervised learning problems. They reduce the risk of having underperforming models in production-grade software. Although critical, methods for learning heterogeneous regression ensembles have not been proposed at large scale, whereas in classical ML literature, stacking, cascading and voting are mostly restricted to classification problems. Reg…

    Submitted 17 April, 2018; originally announced April 2018.

  28. arXiv:1712.03100

    physics.soc-ph cs.SI nlin.CD

    Decoupling approximation robustly reconstructs directed dynamical networks

    Authors: Nikola Simidjievski, Jovan Tanevski, Bernard Zenko, Zoran Levnajic, Ljupco Todorovski, Saso Dzeroski

    Abstract: Methods for reconstructing the topology of complex networks from time-resolved observations of node dynamics are gaining relevance across scientific disciplines. Of biggest practical interest are methods that make no assumptions about properties of the dynamics, and can cope with noisy, short and incomplete trajectories. Ideal reconstruction in such a scenario requires an exhaustive approach of sim…

    Submitted 7 November, 2018; v1 submitted 8 December, 2017; originally announced December 2017.

    Journal ref: New J. Phys. 20 (11), 113003 (2018)

  29. arXiv:1702.06831

    q-bio.QM cs.AI q-bio.NC

    Using Redescription Mining to Relate Clinical and Biological Characteristics of Cognitively Impaired and Alzheimer's Disease Patients

    Authors: Matej Mihelčić, Goran Šimić, Mirjana Babić Leko, Nada Lavrač, Sašo Džeroski, Tomislav Šmuc

    Abstract: We used redescription mining to find interpretable rules revealing associations between those determinants that provide insights about the Alzheimer's disease (AD). We extended the CLUS-RM redescription mining algorithm to a constraint-based redescription mining (CBRM) setting, which enables several modes of targeted exploration of specific, user-constrained associations. Redescription mining enab…

    Submitted 14 November, 2017; v1 submitted 20 February, 2017; originally announced February 2017.

  30. A framework for redescription set construction

    Authors: Matej Mihelčić, Sašo Džeroski, Nada Lavrač, Tomislav Šmuc

    Abstract: Redescription mining is a field of knowledge discovery that aims at finding different descriptions of similar subsets of instances in the data. These descriptions are represented as rules inferred from one or more disjoint sets of attributes, called views. As such, they support knowledge discovery process and help domain experts in formulating new hypotheses or constructing new knowledge bases and…

    Submitted 19 December, 2016; v1 submitted 13 June, 2016; originally announced June 2016.
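Entries 2 and 12 above describe algorithm selection driven by machine-learned performance prediction from problem (landscape) features. The following is a minimal sketch of that general idea only, assuming a k-nearest-neighbour predictor, hypothetical two-dimensional feature vectors, and hypothetical per-algorithm losses (lower is better) for two placeholder algorithms "A" and "B". It is not any of the specific approaches compared in those papers.

```python
import math


def select_algorithm(train_features, train_losses, query, k=1):
    """Recommend the algorithm with the lowest loss predicted by k-NN averaging.

    train_features: one feature vector per training problem (hypothetical data).
    train_losses:   one dict per training problem, algorithm name -> observed loss.
    query:          feature vector of the new problem instance.
    """
    # Indices of the k training problems closest to the query in feature space.
    nearest = sorted(
        range(len(train_features)),
        key=lambda i: math.dist(train_features[i], query),
    )[:k]
    # Predicted loss of each algorithm = mean observed loss over those k problems.
    predicted = {
        alg: sum(train_losses[i][alg] for i in nearest) / len(nearest)
        for alg in train_losses[0]
    }
    # The recommendation is the algorithm with the lowest predicted loss.
    return min(predicted, key=predicted.get), predicted


# Hypothetical training data: three problems, two features, two algorithms.
features = [[0.1, 0.2], [0.9, 0.8], [0.15, 0.25]]
losses = [{"A": 1.0, "B": 3.0}, {"A": 5.0, "B": 2.0}, {"A": 1.5, "B": 2.5}]
best, scores = select_algorithm(features, losses, query=[0.85, 0.9], k=1)
print(best, scores)
```

Here the query lies closest to the second training problem, on which "B" had the lower loss, so "B" is recommended. Real selectors would typically use richer feature sets and a stronger regressor or classifier.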
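Entry 10 defines multi-label classification (MLC) as a task in which a data instance can belong to several classes at once. A common representation uses one binary indicator vector per instance. The sketch below uses made-up labels and predictions to compute two standard MLC evaluation measures, Hamming loss and subset accuracy. The choice of metrics is illustrative and is not taken from the listed papers.

```python
def hamming_loss(y_true, y_pred):
    """Fraction of individual instance-label assignments that are wrong."""
    n_instances, n_labels = len(y_true), len(y_true[0])
    wrong = sum(
        t != p
        for row_t, row_p in zip(y_true, y_pred)
        for t, p in zip(row_t, row_p)
    )
    return wrong / (n_instances * n_labels)


def subset_accuracy(y_true, y_pred):
    """Fraction of instances whose entire label set is predicted exactly."""
    exact = sum(list(row_t) == list(row_p) for row_t, row_p in zip(y_true, y_pred))
    return exact / len(y_true)


# Hypothetical label columns: ["sports", "politics", "tech"]; 1 = label present.
y_true = [[1, 0, 1], [0, 1, 0]]
y_pred = [[1, 0, 0], [0, 1, 0]]
print(hamming_loss(y_true, y_pred))     # one wrong bit out of six
print(subset_accuracy(y_true, y_pred))  # one of two instances fully correct
```

The two measures disagree on purpose. Hamming loss gives partial credit for the first instance, whereas subset accuracy counts it as entirely wrong.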
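Entry 19 (ReliefE) summarises the Relief family as assigning importances to features by iteratively accounting for nearest relevant and irrelevant instances. The sketch below shows the classic two-class Relief update: it finds the nearest hit (same class) and nearest miss (other class), using Manhattan distance on range-normalised numeric features, and visits every instance once. It is not ReliefE, which additionally operates in a manifold embedding, and the toy data are invented.

```python
def relief(X, y):
    """Classic two-class Relief: reward features that differ from the nearest miss,
    penalise features that differ from the nearest hit."""
    n, d = len(X), len(X[0])
    # Normalise per-feature differences by the feature's range so they lie in [0, 1].
    lows = [min(row[j] for row in X) for j in range(d)]
    highs = [max(row[j] for row in X) for j in range(d)]
    span = [(hi_j - lo_j) or 1.0 for lo_j, hi_j in zip(lows, highs)]

    def diff(j, a, b):
        return abs(a[j] - b[j]) / span[j]

    def dist(a, b):
        return sum(diff(j, a, b) for j in range(d))

    w = [0.0] * d
    for i, xi in enumerate(X):
        # Nearest instance of the same class (hit) and of a different class (miss).
        hit = min((k for k in range(n) if k != i and y[k] == y[i]),
                  key=lambda k: dist(xi, X[k]))
        miss = min((k for k in range(n) if y[k] != y[i]),
                   key=lambda k: dist(xi, X[k]))
        for j in range(d):
            w[j] += (diff(j, xi, X[miss]) - diff(j, xi, X[hit])) / n
    return w


# Toy data: feature 0 separates the two classes, feature 1 is noise.
X = [[0.0, 0.3], [0.1, 0.9], [0.9, 0.2], [1.0, 0.8]]
y = [0, 0, 1, 1]
print(relief(X, y))  # feature 0 gets a positive weight, feature 1 a negative one
```

Each update costs a full pass over the data to find the nearest hit and miss. This is the computational cost in high-dimensional or large data that the ReliefE abstract mentions.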
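Entry 20 contrasts deterministic grammars, which only allow or forbid equations, with probabilistic grammars for equation discovery. A probabilistic context-free grammar attaches a probability to each production. Sampling an expression top-down therefore also yields the probability of its derivation, which is the product of the rule probabilities used. The grammar and probabilities below are hypothetical and are not the grammars from that paper. A depth cap discards the rare, very deep derivations.

```python
import random

# Hypothetical toy PCFG: nonterminal -> list of (rule probability, right-hand side).
# Probabilities for each nonterminal sum to 1; symbols without rules are terminals.
GRAMMAR = {
    "E": [(0.4, ["E", "+", "F"]), (0.6, ["F"])],
    "F": [(0.3, ["F", "*", "T"]), (0.7, ["T"])],
    "T": [(0.5, ["x"]), (0.3, ["c"]), (0.2, ["(", "E", ")"])],
}


class TooDeep(Exception):
    """Raised when a derivation exceeds the depth cap."""


def expand(symbol, rng, depth=0, max_depth=30):
    """Expand `symbol` top-down; return (terminal tokens, derivation probability)."""
    if symbol not in GRAMMAR:  # terminal symbol
        return [symbol], 1.0
    if depth > max_depth:
        raise TooDeep(symbol)
    rules = GRAMMAR[symbol]
    # Choose one production according to its probability.
    prob, rhs = rng.choices(rules, weights=[p for p, _ in rules], k=1)[0]
    tokens = []
    for child in rhs:
        child_tokens, child_prob = expand(child, rng, depth + 1, max_depth)
        tokens.extend(child_tokens)
        prob *= child_prob
    return tokens, prob


def sample_expression(seed=0, tries=100):
    """Sample one expression string and the probability of its derivation."""
    rng = random.Random(seed)
    for _ in range(tries):
        try:
            tokens, prob = expand("E", rng)
            return " ".join(tokens), prob
        except TooDeep:
            continue  # discard the over-deep derivation and resample
    raise RuntimeError("no derivation within the depth cap")


for seed in range(3):
    print(sample_expression(seed))
```

In this toy grammar, the single-symbol expression `x` has derivation probability 0.6 × 0.7 × 0.5 = 0.21 (E → F → T → x). Lowering the recursive rule probabilities shifts the distribution towards shorter equations, which is a soft preference rather than a hard constraint.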
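Entries 29 and 30 build on redescription mining. In this setting, two rules over disjoint sets of attributes (views) form a redescription when they describe nearly the same subset of instances. A standard way to score such a pair is the Jaccard index of the two rules' supports. The sketch below uses made-up patient records and thresholds. It is not the CLUS-RM algorithm or the redescription set construction framework from those papers.

```python
def support(rows, rule):
    """Indices of the rows (instances) that satisfy a rule."""
    return {i for i, row in enumerate(rows) if rule(row)}


def jaccard(a, b):
    """Jaccard index |a & b| / |a | b| of two supports (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical data: the same four patients described by two disjoint views.
clinical = [{"age": 72}, {"age": 65}, {"age": 80}, {"age": 75}]
biological = [{"marker": 0.3}, {"marker": 0.7}, {"marker": 0.4}, {"marker": 0.9}]

left = support(clinical, lambda r: r["age"] > 70)         # patients 0, 2, 3
right = support(biological, lambda r: r["marker"] < 0.5)  # patients 0, 2
print(jaccard(left, right))  # 2 shared patients out of 3 in the union
```

A redescription miner searches over many such rule pairs and keeps those with high Jaccard index and acceptable significance. Selecting a useful, non-redundant subset of them is the set-construction problem addressed in entry 30.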