Showing 1–20 of 20 results for author: Baumgartner, R

  1. arXiv:2406.18328  [pdf, ps, other]

    cs.FL cs.LG

    PDFA Distillation via String Probability Queries

    Authors: Robert Baumgartner, Sicco Verwer

    Abstract: Probabilistic deterministic finite automata (PDFA) are discrete event systems modeling conditional probabilities over languages: Given an already seen sequence of tokens they return the probability of tokens of interest to appear next. These types of models have gained interest in the domain of explainable machine learning, where they are used as surrogate models for neural networks trained as lan…

    Submitted 28 June, 2024; v1 submitted 26 June, 2024; originally announced June 2024.

    Comments: LearnAut 2024

  2. arXiv:2406.07208  [pdf, other]

    cs.FL

    Database-assisted automata learning

    Authors: Hielke Walinga, Robert Baumgartner, Sicco Verwer

    Abstract: This paper presents DAALder (Database-Assisted Automata Learning, with Dutch suffix from leerder), a new algorithm for learning state machines, or automata, specifically deterministic finite-state automata (DFA). When learning state machines from log data originating from software systems, the large amount of log data can pose a challenge. Conventional state merging algorithms cannot efficiently d…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: 8 pages body, 12 pages total, LearnAut 2024 Keywords: Active/Passive state machine learning, Incomplete Minimally Adequate Teacher

  3. arXiv:2405.19260  [pdf, other]

    cond-mat.stat-mech cond-mat.str-el hep-th nlin.CD

    Hilbert Space Diffusion in Systems with Approximate Symmetries

    Authors: Rahel L. Baumgartner, Luca V. Delacrétaz, Pranjal Nayak, Julian Sonner

    Abstract: Random matrix theory (RMT) universality is the defining property of quantum mechanical chaotic systems, and can be probed by observables like the spectral form factor (SFF). In this paper, we describe systematic deviations from RMT behaviour at intermediate time scales in systems with approximate symmetries. At early times, the symmetries allow us to organize the Hilbert space into approximately d…

    Submitted 29 May, 2024; originally announced May 2024.

    Comments: 32 pages + appendices, 4 figures

  4. arXiv:2402.03447  [pdf, other]

    stat.ML cs.LG stat.ME

    Challenges in Variable Importance Ranking Under Correlation

    Authors: Annie Liang, Thomas Jemielita, Andy Liaw, Vladimir Svetnik, Lingkang Huang, Richard Baumgartner, Jason M. Klusowski

    Abstract: Variable importance plays a pivotal role in interpretable machine learning as it helps measure the impact of factors on the output of the prediction model. Model agnostic methods based on the generation of "null" features via permutation (or related approaches) can be applied. Such analysis is often utilized in pharmaceutical applications due to its ability to interpret black-box models, including…

    Submitted 5 February, 2024; originally announced February 2024.

  5. arXiv:2309.01823  [pdf]

    eess.IV cs.CV

    Multi-dimension unified Swin Transformer for 3D Lesion Segmentation in Multiple Anatomical Locations

    Authors: Shaoyan Pan, Yiqiao Liu, Sarah Halek, Michal Tomaszewski, Shubing Wang, Richard Baumgartner, Jianda Yuan, Gregory Goldmacher, Antong Chen

    Abstract: In oncology research, accurate 3D segmentation of lesions from CT scans is essential for the modeling of lesion growth kinetics. However, following the RECIST criteria, radiologists routinely only delineate each lesion on the axial slice showing the largest transverse area, and delineate a small number of lesions in 3D for research purposes. As a result, we have plenty of unlabeled 3D volumes and…

    Submitted 4 September, 2023; originally announced September 2023.

  6. arXiv:2208.10605  [pdf, other]

    cs.CR cs.CY cs.LG

    SoK: Explainable Machine Learning for Computer Security Applications

    Authors: Azqa Nadeem, Daniël Vos, Clinton Cao, Luca Pajola, Simon Dieck, Robert Baumgartner, Sicco Verwer

    Abstract: Explainable Artificial Intelligence (XAI) aims to improve the transparency of machine learning (ML) pipelines. We systematize the increasingly growing (but fragmented) microcosm of studies that develop and utilize XAI methods for defensive and offensive cybersecurity tasks. We identify 3 cybersecurity stakeholders, i.e., model users, designers, and adversaries, who utilize XAI for 4 distinct objec…

    Submitted 3 March, 2023; v1 submitted 22 August, 2022; originally announced August 2022.

    Comments: 13 pages. Accepted at Euro S&P

  7. arXiv:2207.01516  [pdf, other]

    cs.FL cs.LG

    Learning state machines via efficient hashing of future traces

    Authors: Robert Baumgartner, Sicco Verwer

    Abstract: State machines are popular models to model and visualize discrete systems such as software systems, and to represent regular grammars. Most algorithms that passively learn state machines from data assume all the data to be available from the beginning and they load this data into memory. This makes it hard to apply them to continuously streaming data and results in large memory requirements when d…

    Submitted 4 July, 2022; originally announced July 2022.

  8. arXiv:2206.14255  [pdf, other]

    cs.LG math.ST stat.ML

    Target alignment in truncated kernel ridge regression

    Authors: Arash A. Amini, Richard Baumgartner, Dai Feng

    Abstract: Kernel ridge regression (KRR) has recently attracted renewed interest due to its potential for explaining the transient effects, such as double descent, that emerge during neural network training. In this work, we study how the alignment between the target function and the kernel affects the performance of the KRR. We focus on the truncated KRR (TKRR) which utilizes an additional parameter that co…

    Submitted 28 June, 2022; originally announced June 2022.

  9. arXiv:2108.08752  [pdf, other]

    stat.ML cs.LG

    A Framework for an Assessment of the Kernel-target Alignment in Tree Ensemble Kernel Learning

    Authors: Dai Feng, Richard Baumgartner

    Abstract: Kernels ensuing from tree ensembles such as random forest (RF) or gradient boosted trees (GBT), when used for kernel learning, have been shown to be competitive to their respective tree ensembles (particularly in higher dimensional scenarios). On the other hand, it has been also shown that performance of the kernel algorithms depends on the degree of the kernel-target alignment. However, the kerne…

    Submitted 19 August, 2021; originally announced August 2021.

  10. arXiv:2106.14109  [pdf]

    stat.CO stat.AP stat.ME

    Parmsurv: a SAS Macro for Flexible Parametric Survival Analysis with Long-Term Predictions

    Authors: Han Fu, Shahrul Mt-Isa, Richard Baumgartner, William Malbecq

    Abstract: Health economic evaluations often require predictions of survival rates beyond the follow-up period. Parametric survival models can be more convenient for economic modelling than the Cox model. The generalized gamma (GG) and generalized F (GF) distributions are extensive families that contain almost all commonly used distributions with various hazard shapes and arbitrary complexity. In this study,…

    Submitted 12 July, 2022; v1 submitted 26 June, 2021; originally announced June 2021.

    Comments: 15 pages, 1 figure, 10 tables, accepted by The Clinical Data Science Conference - PHUSE US Connect 2021

  11. Influence of PEG on the Clustering of Active Janus Colloids

    Authors: Mohammed A. Kalil, Nicky R. Baumgartner, Marola W. Issa, Shawn D. Ryan, Christopher L. Wirth

    Abstract: Micrometer scale colloidal particles that propel in a deterministic fashion in response to local environmental cues are useful analogs to self-propelling entities found in nature. Both natural and synthetic active colloidal systems are often near boundaries or are located in crowded environments. Herein, we describe experiments in which we measured the influence of hydrogen peroxide concentration…

    Submitted 15 January, 2021; originally announced January 2021.

  12. arXiv:2012.10737  [pdf, other]

    stat.ML cs.LG

    (Decision and regression) tree ensemble based kernels for regression and classification

    Authors: Dai Feng, Richard Baumgartner

    Abstract: Tree based ensembles such as Breiman's random forest (RF) and Gradient Boosted Trees (GBT) can be interpreted as implicit kernel generators, where the ensuing proximity matrix represents the data-driven tree ensemble kernel. Kernel perspective on the RF has been used to develop a principled framework for theoretical investigation of its statistical properties. Recently, it has been shown that the…

    Submitted 19 December, 2020; originally announced December 2020.

    Comments: arXiv admin note: substantial text overlap with arXiv:2009.00089

  13. arXiv:2009.00089  [pdf, other]

    stat.ML cs.LG

    Random Forest (RF) Kernel for Regression, Classification and Survival

    Authors: Dai Feng, Richard Baumgartner

    Abstract: Breiman's random forest (RF) can be interpreted as an implicit kernel generator, where the ensuing proximity matrix represents the data-driven RF kernel. Kernel perspective on the RF has been used to develop a principled framework for theoretical investigation of its statistical properties. However, practical utility of the links between kernels and the RF has not been widely explored and systemati…

    Submitted 31 August, 2020; originally announced September 2020.

  14. arXiv:2003.02943  [pdf]

    eess.IV cs.CV

    A deep learning-facilitated radiomics solution for the prediction of lung lesion shrinkage in non-small cell lung cancer trials

    Authors: Antong Chen, Jennifer Saouaf, Bo Zhou, Randolph Crawford, Jianda Yuan, Junshui Ma, Richard Baumgartner, Shubing Wang, Gregory Goldmacher

    Abstract: Herein we propose a deep learning-based approach for the prediction of lung lesion response based on radiomic features extracted from clinical CT scans of patients in non-small cell lung cancer trials. The approach starts with the classification of lung lesions from the set of primary and metastatic lesions at various anatomic locations. Focusing on the lung lesions, we perform automatic segmentat…

    Submitted 5 March, 2020; originally announced March 2020.

    Comments: Accepted by International Symposium on Biomedical Imaging (ISBI) 2020

  15. arXiv:1901.03990  [pdf, other]

    q-bio.NC

    Formation of three-dimensional auditory space

    Authors: Piotr Majdak, Robert Baumgartner, Claudia Jenny

    Abstract: Human listeners need to permanently interact with their three-dimensional (3-D) environment. To this end, they require efficient perceptual mechanisms to form a sufficiently accurate 3-D auditory space. In this chapter, we discuss the formation of the 3-D auditory space from various perspectives. The aim is to show the link between cognition, acoustics, neurophysiology, and psychophysics, when it…

    Submitted 13 January, 2019; originally announced January 2019.

  16. Web Data Extraction, Applications and Techniques: A Survey

    Authors: Emilio Ferrara, Pasquale De Meo, Giacomo Fiumara, Robert Baumgartner

    Abstract: Web Data Extraction is an important problem that has been studied by means of different scientific tools and in a broad range of applications. Many approaches to extracting data from the Web have been designed to solve specific problems and operate in ad-hoc domains. Other approaches, instead, heavily reuse techniques and algorithms developed in the field of Information Extraction. This survey a…

    Submitted 9 June, 2014; v1 submitted 1 July, 2012; originally announced July 2012.

    Comments: Knowledge-based Systems

    Journal ref: Knowledge-Based Systems, 70, 301-323. 2014

  17. Intelligent Self-Repairable Web Wrappers

    Authors: Emilio Ferrara, Robert Baumgartner

    Abstract: The amount of information available on the Web grows at an incredibly high rate. Systems and procedures devised to extract these data from Web sources already exist, and different approaches and techniques have been investigated during the last years. On the one hand, reliable solutions should provide robust algorithms of Web data mining which could automatically face possible malfunctioning or fa…

    Submitted 20 June, 2011; originally announced June 2011.

    Comments: 12 pages, 4 figures; Proceedings of the 12th International Conference of the Italian Association for Artificial Intelligence, 2011

    Journal ref: Lecture Notes in Computer Science, 6934:274-285, 2011

  18. arXiv:1103.1254  [pdf, other]

    cs.AI cs.IR

    Design of Automatically Adaptable Web Wrappers

    Authors: Emilio Ferrara, Robert Baumgartner

    Abstract: Nowadays, the huge amount of information distributed through the Web motivates studying techniques to be adopted in order to extract relevant data in an efficient and reliable way. Both academia and enterprises developed several approaches of Web data extraction, for example using techniques of artificial intelligence or machine learning. Some commonly adopted procedures, namely wrappers, ensure a…

    Submitted 7 March, 2011; originally announced March 2011.

    Comments: 7 pages, 2 figures, In Proceedings of the 3rd International Conference on Agents and Artificial Intelligence (ICAART 2011)

    Journal ref: Proceedings of the 3rd International Conference on Agents and Artificial Intelligence, pp 211-216, 2011

  19. Automatic Wrapper Adaptation by Tree Edit Distance Matching

    Authors: Emilio Ferrara, Robert Baumgartner

    Abstract: Information distributed through the Web keeps growing faster day by day, and for this reason, several techniques for extracting Web data have been suggested during last years. Often, extraction tasks are performed through so called wrappers, procedures extracting information from Web pages, e.g. implementing logic-based techniques. Many fields of application today require a strong degree of robust…

    Submitted 7 March, 2011; originally announced March 2011.

    Comments: 7 pages, 3 figures, In Proceedings of the 2nd International Workshop on Combinations of Intelligent Methods and Applications (CIMA 2010)

    Journal ref: Combinations of Intelligent Methods and Applications, Smart Innovation, Systems and Technologies, Volume 8, 2011, pp 41-54

  20. arXiv:0903.1880  [pdf, other]

    stat.ME stat.AP stat.CO

    SMART: A statistical framework for optimal design matrix generation with application to fMRI

    Authors: Gautam Pendse, Adam Schwarz, Richard Baumgartner, Alexandre Coimbra, David Borsook, Lino Becerra

    Abstract: The general linear model (GLM) is a well established tool for analyzing functional magnetic resonance imaging (fMRI) data. Most fMRI analyses via GLM proceed in a massively univariate fashion where the same design matrix is used for analyzing data from each voxel. A major limitation of this approach is the locally varying nature of signals of interest as well as associated confounds. This local…

    Submitted 11 March, 2009; originally announced March 2009.

    Comments: 68 pages, 34 figures