Showing 1–9 of 9 results for author: Junior, S B

  1. arXiv:2406.06596  [pdf, other]

    cs.CL cs.AI cs.DB

    Are Large Language Models the New Interface for Data Pipelines?

    Authors: Sylvio Barbon Junior, Paolo Ceravolo, Sven Groppe, Mustafa Jarrar, Samira Maghool, Florence Sèdes, Soror Sahri, Maurice Van Keulen

    Abstract: A Language Model is a term that encompasses various types of models designed to understand and generate human communication. Large Language Models (LLMs) have gained significant attention due to their ability to process text with human-like fluency and coherence, making them valuable for a wide range of data-related tasks fashioned as pipelines. The capabilities of LLMs in natural language underst…

    Submitted 6 June, 2024; originally announced June 2024.

  2. arXiv:2404.02942  [pdf, other]

    cs.LG cs.AI

    Decision Predicate Graphs: Enhancing Interpretability in Tree Ensembles

    Authors: Leonardo Arrighi, Luca Pennella, Gabriel Marques Tavares, Sylvio Barbon Junior

    Abstract: Understanding the decisions of tree-based ensembles and their relationships is pivotal for machine learning model interpretation. Recent attempts to mitigate the human-in-the-loop interpretation challenge have explored the extraction of the decision structure underlying the model taking advantage of graph simplification and path emphasis. However, while these efforts enhance the visualisation expe…

    Submitted 3 April, 2024; originally announced April 2024.

  3. Tailoring Machine Learning for Process Mining

    Authors: Paolo Ceravolo, Sylvio Barbon Junior, Ernesto Damiani, Wil van der Aalst

    Abstract: Machine learning models are routinely integrated into process mining pipelines to carry out tasks like data transformation, noise reduction, anomaly detection, classification, and prediction. Often, the design of such models is based on some ad-hoc assumptions about the corresponding data distributions, which are not necessarily in accordance with the non-parametric distributions typically observe…

    Submitted 17 June, 2023; originally announced June 2023.

    Comments: 16 pages

    MSC Class: 68 ACM Class: I.2.6

  4. arXiv:2303.17879  [pdf, other]

    cs.AI cs.LG

    CoSMo: a Framework to Instantiate Conditioned Process Simulation Models

    Authors: Rafael S. Oyamada, Gabriel M. Tavares, Sylvio Barbon Junior, Paolo Ceravolo

    Abstract: Process simulation is gaining attention for its ability to assess potential performance improvements and risks associated with business process changes. The existing literature presents various techniques, generally grounded in process models discovered from event log data or built upon deep learning algorithms. These techniques have specific strengths and limitations. Traditional data-driven appr…

    Submitted 25 June, 2024; v1 submitted 31 March, 2023; originally announced March 2023.

  5. arXiv:2206.08537  [pdf, ps, other]

    cs.CV cs.LG

    Large-Margin Representation Learning for Texture Classification

    Authors: Jonathan de Matos, Luiz Eduardo Soares de Oliveira, Alceu de Souza Britto Junior, Alessandro Lameiras Koerich

    Abstract: This paper presents a novel approach combining convolutional layers (CLs) and large-margin metric learning for training supervised models on small datasets for texture classification. The core of such an approach is a loss function that computes the distances between instances of interest and support vectors. The objective is to update the weights of CLs iteratively to learn a representation with…

    Submitted 17 June, 2022; originally announced June 2022.

    Comments: 7 pages

  6. arXiv:2008.00025  [pdf, other]

    cs.LG stat.ML

    Rethinking Default Values: a Low Cost and Efficient Strategy to Define Hyperparameters

    Authors: Rafael Gomes Mantovani, André Luis Debiaso Rossi, Edesio Alcobaça, Jadson Castro Gertrudes, Sylvio Barbon Junior, André Carlos Ponce de Leon Ferreira de Carvalho

    Abstract: Machine Learning (ML) algorithms have been increasingly applied to problems from several different areas. Despite their growing popularity, their predictive performance is usually affected by the values assigned to their hyperparameters (HPs). As a consequence, researchers and practitioners face the challenge of how to set these values. Many users have limited knowledge about ML algorithms and the e…

    Submitted 8 July, 2021; v1 submitted 31 July, 2020; originally announced August 2020.

    Comments: 44 pages, 13 figures

  7. arXiv:1907.09404  [pdf, other]

    cs.CV cs.LG cs.MM

    Deep Learning Approaches for Image Retrieval and Pattern Spotting in Ancient Documents

    Authors: Kelly Lais Wiggers, Alceu de Souza Britto Junior, Alessandro Lameiras Koerich, Laurent Heutte, Luiz Eduardo Soares de Oliveira

    Abstract: This paper describes two approaches for content-based image retrieval and pattern spotting in document images using deep learning. The first approach uses a pre-trained CNN model to cope with the lack of training data, which is fine-tuned to achieve a compact yet discriminant representation of queries and image candidates. The second approach uses a Siamese Convolution Neural Network trained on a…

    Submitted 22 July, 2019; originally announced July 2019.

    Comments: The paper is under consideration at Pattern Recognition Letters

  8. Better Trees: An empirical study on hyperparameter tuning of classification decision tree induction algorithms

    Authors: Rafael Gomes Mantovani, Tomáš Horváth, André L. D. Rossi, Ricardo Cerri, Sylvio Barbon Junior, Joaquin Vanschoren, André Carlos Ponce de Leon Ferreira de Carvalho

    Abstract: Machine learning algorithms often contain many hyperparameters (HPs) whose values affect the predictive performance of the induced models in intricate ways. Due to the high number of possibilities for these HP configurations and their complex interactions, it is common to use optimization techniques to find settings that lead to high predictive performance. However, insights into efficiently explo…

    Submitted 21 December, 2023; v1 submitted 5 December, 2018; originally announced December 2018.

    Comments: 60 pages, 16 figures

  9. arXiv:1805.06368  [pdf, other]

    cs.AI

    Strict Very Fast Decision Tree: a memory conservative algorithm for data stream mining

    Authors: Victor Guilherme Turrisi da Costa, André Carlos Ponce de Leon Ferreira de Carvalho, Sylvio Barbon Junior

    Abstract: Dealing with memory and time constraints is a current challenge when learning from data streams with a massive amount of data. Many algorithms have been proposed to handle these difficulties, among them the Very Fast Decision Tree (VFDT) algorithm. Although the VFDT has been widely used in data stream mining, in recent years several authors have suggested modifications to increase its performa…

    Submitted 17 May, 2018; v1 submitted 16 May, 2018; originally announced May 2018.

    Comments: 7 pages, 26 figures, Under R1 revision in Pattern Recognition Letters