Showing 1–15 of 15 results for author: Baumgartner, R

  1. arXiv:2406.18328  [pdf, ps, other]

    cs.FL cs.LG

    PDFA Distillation via String Probability Queries

    Authors: Robert Baumgartner, Sicco Verwer

    Abstract: Probabilistic deterministic finite automata (PDFA) are discrete event systems modeling conditional probabilities over languages: Given an already seen sequence of tokens they return the probability of tokens of interest to appear next. These types of models have gained interest in the domain of explainable machine learning, where they are used as surrogate models for neural networks trained as lan…

    Submitted 28 June, 2024; v1 submitted 26 June, 2024; originally announced June 2024.

    Comments: LearnAUT 2024

  2. arXiv:2406.07208  [pdf, other]

    cs.FL

    Database-assisted automata learning

    Authors: Hielke Walinga, Robert Baumgartner, Sicco Verwer

    Abstract: This paper presents DAALder (Database-Assisted Automata Learning, with Dutch suffix from leerder), a new algorithm for learning state machines, or automata, specifically deterministic finite-state automata (DFA). When learning state machines from log data originating from software systems, the large amount of log data can pose a challenge. Conventional state merging algorithms cannot efficiently d…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: 8 pages body, 12 pages total, LearnAut 2024 Keywords: Active/Passive state machine learning, Incomplete Minimally Adequate Teacher

  3. arXiv:2402.03447  [pdf, other]

    stat.ML cs.LG stat.ME

    Challenges in Variable Importance Ranking Under Correlation

    Authors: Annie Liang, Thomas Jemielita, Andy Liaw, Vladimir Svetnik, Lingkang Huang, Richard Baumgartner, Jason M. Klusowski

    Abstract: Variable importance plays a pivotal role in interpretable machine learning as it helps measure the impact of factors on the output of the prediction model. Model agnostic methods based on the generation of "null" features via permutation (or related approaches) can be applied. Such analysis is often utilized in pharmaceutical applications due to its ability to interpret black-box models, including…

    Submitted 5 February, 2024; originally announced February 2024.

  4. arXiv:2309.01823  [pdf]

    eess.IV cs.CV

    Multi-dimension unified Swin Transformer for 3D Lesion Segmentation in Multiple Anatomical Locations

    Authors: Shaoyan Pan, Yiqiao Liu, Sarah Halek, Michal Tomaszewski, Shubing Wang, Richard Baumgartner, Jianda Yuan, Gregory Goldmacher, Antong Chen

    Abstract: In oncology research, accurate 3D segmentation of lesions from CT scans is essential for the modeling of lesion growth kinetics. However, following the RECIST criteria, radiologists routinely only delineate each lesion on the axial slice showing the largest transverse area, and delineate a small number of lesions in 3D for research purposes. As a result, we have plenty of unlabeled 3D volumes and…

    Submitted 4 September, 2023; originally announced September 2023.

  5. arXiv:2208.10605  [pdf, other]

    cs.CR cs.CY cs.LG

    SoK: Explainable Machine Learning for Computer Security Applications

    Authors: Azqa Nadeem, Daniël Vos, Clinton Cao, Luca Pajola, Simon Dieck, Robert Baumgartner, Sicco Verwer

    Abstract: Explainable Artificial Intelligence (XAI) aims to improve the transparency of machine learning (ML) pipelines. We systematize the increasingly growing (but fragmented) microcosm of studies that develop and utilize XAI methods for defensive and offensive cybersecurity tasks. We identify 3 cybersecurity stakeholders, i.e., model users, designers, and adversaries, who utilize XAI for 4 distinct objec…

    Submitted 3 March, 2023; v1 submitted 22 August, 2022; originally announced August 2022.

    Comments: 13 pages. Accepted at Euro S&P

  6. arXiv:2207.01516  [pdf, other]

    cs.FL cs.LG

    Learning state machines via efficient hashing of future traces

    Authors: Robert Baumgartner, Sicco Verwer

    Abstract: State machines are popular models to model and visualize discrete systems such as software systems, and to represent regular grammars. Most algorithms that passively learn state machines from data assume all the data to be available from the beginning and they load this data into memory. This makes it hard to apply them to continuously streaming data and results in large memory requirements when d…

    Submitted 4 July, 2022; originally announced July 2022.

  7. arXiv:2206.14255  [pdf, other]

    cs.LG math.ST stat.ML

    Target alignment in truncated kernel ridge regression

    Authors: Arash A. Amini, Richard Baumgartner, Dai Feng

    Abstract: Kernel ridge regression (KRR) has recently attracted renewed interest due to its potential for explaining the transient effects, such as double descent, that emerge during neural network training. In this work, we study how the alignment between the target function and the kernel affects the performance of the KRR. We focus on the truncated KRR (TKRR) which utilizes an additional parameter that co…

    Submitted 28 June, 2022; originally announced June 2022.

  8. arXiv:2108.08752  [pdf, other]

    stat.ML cs.LG

    A Framework for an Assessment of the Kernel-target Alignment in Tree Ensemble Kernel Learning

    Authors: Dai Feng, Richard Baumgartner

    Abstract: Kernels ensuing from tree ensembles such as random forest (RF) or gradient boosted trees (GBT), when used for kernel learning, have been shown to be competitive to their respective tree ensembles (particularly in higher dimensional scenarios). On the other hand, it has been also shown that performance of the kernel algorithms depends on the degree of the kernel-target alignment. However, the kerne…

    Submitted 19 August, 2021; originally announced August 2021.

  9. arXiv:2012.10737  [pdf, other]

    stat.ML cs.LG

    (Decision and regression) tree ensemble based kernels for regression and classification

    Authors: Dai Feng, Richard Baumgartner

    Abstract: Tree based ensembles such as Breiman's random forest (RF) and Gradient Boosted Trees (GBT) can be interpreted as implicit kernel generators, where the ensuing proximity matrix represents the data-driven tree ensemble kernel. Kernel perspective on the RF has been used to develop a principled framework for theoretical investigation of its statistical properties. Recently, it has been shown that the…

    Submitted 19 December, 2020; originally announced December 2020.

    Comments: arXiv admin note: substantial text overlap with arXiv:2009.00089

  10. arXiv:2009.00089  [pdf, other]

    stat.ML cs.LG

    Random Forest (RF) Kernel for Regression, Classification and Survival

    Authors: Dai Feng, Richard Baumgartner

    Abstract: Breiman's random forest (RF) can be interpreted as an implicit kernel generator, where the ensuing proximity matrix represents the data-driven RF kernel. Kernel perspective on the RF has been used to develop a principled framework for theoretical investigation of its statistical properties. However, practical utility of the links between kernels and the RF has not been widely explored and systemati…

    Submitted 31 August, 2020; originally announced September 2020.

  11. arXiv:2003.02943  [pdf]

    eess.IV cs.CV

    A deep learning-facilitated radiomics solution for the prediction of lung lesion shrinkage in non-small cell lung cancer trials

    Authors: Antong Chen, Jennifer Saouaf, Bo Zhou, Randolph Crawford, Jianda Yuan, Junshui Ma, Richard Baumgartner, Shubing Wang, Gregory Goldmacher

    Abstract: Herein we propose a deep learning-based approach for the prediction of lung lesion response based on radiomic features extracted from clinical CT scans of patients in non-small cell lung cancer trials. The approach starts with the classification of lung lesions from the set of primary and metastatic lesions at various anatomic locations. Focusing on the lung lesions, we perform automatic segmentat…

    Submitted 5 March, 2020; originally announced March 2020.

    Comments: Accepted by International Symposium on Biomedical Imaging (ISBI) 2020

  12. Web Data Extraction, Applications and Techniques: A Survey

    Authors: Emilio Ferrara, Pasquale De Meo, Giacomo Fiumara, Robert Baumgartner

    Abstract: Web Data Extraction is an important problem that has been studied by means of different scientific tools and in a broad range of applications. Many approaches to extracting data from the Web have been designed to solve specific problems and operate in ad-hoc domains. Other approaches, instead, heavily reuse techniques and algorithms developed in the field of Information Extraction. This survey a…

    Submitted 9 June, 2014; v1 submitted 1 July, 2012; originally announced July 2012.

    Comments: Knowledge-based Systems

    Journal ref: Knowledge-Based Systems, 70, 301-323. 2014

  13. Intelligent Self-Repairable Web Wrappers

    Authors: Emilio Ferrara, Robert Baumgartner

    Abstract: The amount of information available on the Web grows at an incredibly high rate. Systems and procedures devised to extract these data from Web sources already exist, and different approaches and techniques have been investigated in recent years. On the one hand, reliable solutions should provide robust algorithms of Web data mining which could automatically face possible malfunctioning or fa…

    Submitted 20 June, 2011; originally announced June 2011.

    Comments: 12 pages, 4 figures; Proceedings of the 12th International Conference of the Italian Association for Artificial Intelligence, 2011

    Journal ref: Lecture Notes in Computer Science, 6934:274-285, 2011

  14. arXiv:1103.1254  [pdf, other]

    cs.AI cs.IR

    Design of Automatically Adaptable Web Wrappers

    Authors: Emilio Ferrara, Robert Baumgartner

    Abstract: Nowadays, the huge amount of information distributed through the Web motivates studying techniques to be adopted in order to extract relevant data in an efficient and reliable way. Both academia and enterprises developed several approaches of Web data extraction, for example using techniques of artificial intelligence or machine learning. Some commonly adopted procedures, namely wrappers, ensure a…

    Submitted 7 March, 2011; originally announced March 2011.

    Comments: 7 pages, 2 figures, In Proceedings of the 3rd International Conference on Agents and Artificial Intelligence (ICAART 2011)

    Journal ref: Proceedings of the 3rd International Conference on Agents and Artificial Intelligence, pp 211-216, 2011

  15. Automatic Wrapper Adaptation by Tree Edit Distance Matching

    Authors: Emilio Ferrara, Robert Baumgartner

    Abstract: Information distributed through the Web keeps growing faster day by day, and for this reason, several techniques for extracting Web data have been suggested in recent years. Often, extraction tasks are performed through so-called wrappers, procedures extracting information from Web pages, e.g. implementing logic-based techniques. Many fields of application today require a strong degree of robust…

    Submitted 7 March, 2011; originally announced March 2011.

    Comments: 7 pages, 3 figures, In Proceedings of the 2nd International Workshop on Combinations of Intelligent Methods and Applications (CIMA 2010)

    Journal ref: Combinations of Intelligent Methods and Applications Smart Innovation, Systems and Technologies Volume 8, 2011, pp 41-54