Search | arXiv e-print repository

Mining Constraints from Reference Process Models for Detecting Best-Practice Violations in Event Log

Authors: Adrian Rebmann, Timotheus Kampik, Carl Corea, Han van der Aa

Abstract: Detecting undesired process behavior is one of the main tasks of process mining and various conformance-checking techniques have been developed to this end. These techniques typically require a normative process model as input, specifically designed for the processes to be analyzed. Such models are rarely available, though, and their creation involves considerable manual effort.However, reference… ▽ More Detecting undesired process behavior is one of the main tasks of process mining and various conformance-checking techniques have been developed to this end. These techniques typically require a normative process model as input, specifically designed for the processes to be analyzed. Such models are rarely available, though, and their creation involves considerable manual effort.However, reference process models serve as best-practice templates for organizational processes in a plethora of domains, containing valuable knowledge about general behavioral relations in well-engineered processes. These general models can thus mitigate the need for dedicated models by providing a basis to check for undesired behavior. Still, finding a perfectly matching reference model for a real-life event log is unrealistic because organizational needs can vary, despite similarities in process execution. Furthermore, event logs may encompass behavior related to different reference models, making traditional conformance checking impractical as it requires aligning process executions to individual models. To still use reference models for conformance checking, we propose a framework for mining declarative best-practice constraints from a reference model collection, automatically selecting constraints that are relevant for a given event log, and checking for best-practice violations. We demonstrate the capability of our framework to detect best-practice violations through an evaluation based on real-world process model collections and event logs. △ Less

Submitted 2 July, 2024; originally announced July 2024.

Comments: Preprint submitted to Information Systems

arXiv:2407.02310 [pdf, other]

Evaluating the Ability of LLMs to Solve Semantics-Aware Process Mining Tasks

Authors: Adrian Rebmann, Fabian David Schmidt, Goran Glavaš, Han van der Aa

Abstract: The process mining community has recently recognized the potential of large language models (LLMs) for tackling various process mining tasks. Initial studies report the capability of LLMs to support process analysis and even, to some extent, that they are able to reason about how processes work. This latter property suggests that LLMs could also be used to tackle process mining tasks that benefit… ▽ More The process mining community has recently recognized the potential of large language models (LLMs) for tackling various process mining tasks. Initial studies report the capability of LLMs to support process analysis and even, to some extent, that they are able to reason about how processes work. This latter property suggests that LLMs could also be used to tackle process mining tasks that benefit from an understanding of process behavior. Examples of such tasks include (semantic) anomaly detection and next activity prediction, which both involve considerations of the meaning of activities and their inter-relations. In this paper, we investigate the capabilities of LLMs to tackle such semantics-aware process mining tasks. Furthermore, whereas most works on the intersection of LLMs and process mining only focus on testing these models out of the box, we provide a more principled investigation of the utility of LLMs for process mining, including their ability to obtain process mining knowledge post-hoc by means of in-context learning and supervised fine-tuning. Concretely, we define three process mining tasks that benefit from an understanding of process semantics and provide extensive benchmarking datasets for each of them. Our evaluation experiments reveal that (1) LLMs fail to solve challenging process mining tasks out of the box and when provided only a handful of in-context examples, (2) but they yield strong performance when fine-tuned for these tasks, consistently surpassing smaller, encoder-based language models. △ Less

Submitted 2 July, 2024; originally announced July 2024.

Comments: Submitted to ICPM

arXiv:2407.01800 [pdf, other]

Normalization and effective learning rates in reinforcement learning

Authors: Clare Lyle, Zeyu Zheng, Khimya Khetarpal, James Martens, Hado van Hasselt, Razvan Pascanu, Will Dabney

Abstract: Normalization layers have recently experienced a renaissance in the deep reinforcement learning and continual learning literature, with several works highlighting diverse benefits such as improving loss landscape conditioning and combatting overestimation bias. However, normalization brings with it a subtle but important side effect: an equivalence between growth in the norm of the network paramet… ▽ More Normalization layers have recently experienced a renaissance in the deep reinforcement learning and continual learning literature, with several works highlighting diverse benefits such as improving loss landscape conditioning and combatting overestimation bias. However, normalization brings with it a subtle but important side effect: an equivalence between growth in the norm of the network parameters and decay in the effective learning rate. This becomes problematic in continual learning settings, where the resulting effective learning rate schedule may decay to near zero too quickly relative to the timescale of the learning problem. We propose to make the learning rate schedule explicit with a simple re-parameterization which we call Normalize-and-Project (NaP), which couples the insertion of normalization layers with weight projection, ensuring that the effective learning rate remains constant throughout training. This technique reveals itself as a powerful analytical tool to better understand learning rate schedules in deep reinforcement learning, and as a means of improving robustness to nonstationarity in synthetic plasticity loss benchmarks along with both the single-task and sequential variants of the Arcade Learning Environment. We also show that our approach can be easily applied to popular architectures such as ResNets and transformers while recovering and in some cases even slightly improving the performance of the base model in common stationary benchmarks. △ Less

Submitted 1 July, 2024; originally announced July 2024.

arXiv:2407.00129 [pdf]

Multimodal Learning and Cognitive Processes in Radiology: MedGaze for Chest X-ray Scanpath Prediction

Authors: Akash Awasthi, Ngan Le, Zhigang Deng, Rishi Agrawal, Carol C. Wu, Hien Van Nguyen

Abstract: Predicting human gaze behavior within computer vision is integral for develo** interactive systems that can anticipate user attention, address fundamental questions in cognitive science, and hold implications for fields like human-computer interaction (HCI) and augmented/virtual reality (AR/VR) systems. Despite methodologies introduced for modeling human eye gaze behavior, applying these models… ▽ More Predicting human gaze behavior within computer vision is integral for develo** interactive systems that can anticipate user attention, address fundamental questions in cognitive science, and hold implications for fields like human-computer interaction (HCI) and augmented/virtual reality (AR/VR) systems. Despite methodologies introduced for modeling human eye gaze behavior, applying these models to medical imaging for scanpath prediction remains unexplored. Our proposed system aims to predict eye gaze sequences from radiology reports and CXR images, potentially streamlining data collection and enhancing AI systems using larger datasets. However, predicting human scanpaths on medical images presents unique challenges due to the diverse nature of abnormal regions. Our model predicts fixation coordinates and durations critical for medical scanpath prediction, outperforming existing models in the computer vision community. Utilizing a two-stage training process and large publicly available datasets, our approach generates static heatmaps and eye gaze videos aligned with radiology reports, facilitating comprehensive analysis. We validate our approach by comparing its performance with state-of-the-art methods and assessing its generalizability among different radiologists, introducing novel strategies to model radiologists' search patterns during CXR image diagnosis. Based on the radiologist's evaluation, MedGaze can generate human-like gaze sequences with a high focus on relevant regions over the CXR images. It sometimes also outperforms humans in terms of redundancy and randomness in the scanpaths. △ Less

Submitted 28 June, 2024; originally announced July 2024.

Comments: Submitted to the Journal

arXiv:2406.19686 [pdf]

Enhancing Radiological Diagnosis: A Collaborative Approach Integrating AI and Human Expertise for Visual Miss Correction

Authors: Akash Awasthi, Ngan Le, Zhigang Deng, Carol C. Wu, Hien Van Nguyen

Abstract: Human-AI collaboration to identify and correct perceptual errors in chest radiographs has not been previously explored. This study aimed to develop a collaborative AI system, CoRaX, which integrates eye gaze data and radiology reports to enhance diagnostic accuracy in chest radiology by pinpointing perceptual errors and refining the decision-making process. Using public datasets REFLACX and EGD-CX… ▽ More Human-AI collaboration to identify and correct perceptual errors in chest radiographs has not been previously explored. This study aimed to develop a collaborative AI system, CoRaX, which integrates eye gaze data and radiology reports to enhance diagnostic accuracy in chest radiology by pinpointing perceptual errors and refining the decision-making process. Using public datasets REFLACX and EGD-CXR, the study retrospectively developed CoRaX, employing a large multimodal model to analyze image embeddings, eye gaze data, and radiology reports. The system's effectiveness was evaluated based on its referral-making process, the quality of referrals, and performance in collaborative diagnostic settings. CoRaX was tested on a simulated error dataset of 271 samples with 28% (93 of 332) missed abnormalities. The system corrected 21% (71 of 332) of these errors, leaving 7% (22 of 312) unresolved. The Referral-Usefulness score, indicating the accuracy of predicted regions for all true referrals, was 0.63 (95% CI 0.59, 0.68). The Total-Usefulness score, reflecting the diagnostic accuracy of CoRaX's interactions with radiologists, showed that 84% (237 of 280) of these interactions had a score above 0.40. In conclusion, CoRaX efficiently collaborates with radiologists to address perceptual errors across various abnormalities, with potential applications in the education and training of novice radiologists. △ Less

Submitted 28 June, 2024; originally announced June 2024.

Comments: Under Review in Journal

arXiv:2406.17462 [pdf, other]

The Tree of Diffusion Life: Evolutionary Embeddings to Understand the Generation Process of Diffusion Models

Authors: Vidya Prasad, Hans van Gorp, Christina Humer, Anna Vilanova, Nicola Pezzotti

Abstract: Diffusion models generate high-quality samples by corrupting data with Gaussian noise and iteratively reconstructing it with deep learning, slowly transforming noisy images into refined outputs. Understanding this data evolution is important for interpretability but is complex due to its high-dimensional evolutionary nature. While traditional dimensionality reduction methods like t-distributed sto… ▽ More Diffusion models generate high-quality samples by corrupting data with Gaussian noise and iteratively reconstructing it with deep learning, slowly transforming noisy images into refined outputs. Understanding this data evolution is important for interpretability but is complex due to its high-dimensional evolutionary nature. While traditional dimensionality reduction methods like t-distributed stochastic neighborhood embedding (t-SNE) aid in understanding high-dimensional spaces, they neglect evolutionary structure preservation. Hence, we propose Tree of Diffusion Life (TDL), a method to understand data evolution in the generative process of diffusion models. TDL samples a diffusion model's generative space via instances with varying prompts and employs image encoders to extract semantic meaning from these samples, projecting them to an intermediate space. It employs a novel evolutionary embedding algorithm that explicitly encodes the iterations while preserving the high-dimensional relations, facilitating the visualization of data evolution. This embedding leverages three metrics: a standard t-SNE loss to group semantically similar elements, a displacement loss to group elements from the same iteration step, and an instance alignment loss to align elements of the same instance across iterations. We present rectilinear and radial layouts to represent iterations, enabling comprehensive exploration. We assess various feature extractors and highlight TDL's potential with prominent diffusion models like GLIDE and Stable Diffusion with different prompt sets. TDL simplifies understanding data evolution within diffusion models, offering valuable insights into their functioning. △ Less

Submitted 25 June, 2024; originally announced June 2024.

arXiv:2406.16785 [pdf, ps, other]

Bisimulation for Impure Simplicial Complexes

Authors: Marta Bílková, Hans van Ditmarsch, Roman Kuznets, Rojo Randrianomentsoa

Abstract: As an alternative to Kripke models, simplicial complexes are a versatile semantic primitive on which to interpret epistemic logic. Given a set of vertices, a simplicial complex is a downward closed set of subsets, called simplexes, of the vertex set. A maximal simplex is called a facet. Impure simplicial complexes represent that some agents (processes) are dead. It is known that impure simplicial… ▽ More As an alternative to Kripke models, simplicial complexes are a versatile semantic primitive on which to interpret epistemic logic. Given a set of vertices, a simplicial complex is a downward closed set of subsets, called simplexes, of the vertex set. A maximal simplex is called a facet. Impure simplicial complexes represent that some agents (processes) are dead. It is known that impure simplicial complexes categorically correspond to so-called partial epistemic (Kripke) models. In this contribution, we define a notion of bisimulation to compare impure simplicial complexes and show that it has the Hennessy-Milner property. These results are for a logical language including atoms that express whether agents are alive or dead. Without these atoms no reasonable standard notion of bisimulation exists, as we amply justify by counterexamples, because such a restricted language is insufficiently expressive. △ Less

Submitted 24 June, 2024; originally announced June 2024.

Comments: Proceedings of Advances in Modal Logic 2024

arXiv:2406.06202 [pdf, other]

Federated learning in food research

Authors: Zuzanna Fendor, Bas H. M. van der Velden, Xinxin Wang, Andrea Jr. Carnoli, Osman Mutlu, Ali Hürriyetoğlu

Abstract: Research in the food domain is at times limited due to data sharing obstacles, such as data ownership, privacy requirements, and regulations. While important, these obstacles can restrict data-driven methods such as machine learning. Federated learning, the approach of training models on locally kept data and only sharing the learned parameters, is a potential technique to alleviate data sharing o… ▽ More Research in the food domain is at times limited due to data sharing obstacles, such as data ownership, privacy requirements, and regulations. While important, these obstacles can restrict data-driven methods such as machine learning. Federated learning, the approach of training models on locally kept data and only sharing the learned parameters, is a potential technique to alleviate data sharing obstacles. This systematic review investigates the use of federated learning within the food domain, structures included papers in a federated learning framework, highlights knowledge gaps, and discusses potential applications. A total of 41 papers were included in the review. The current applications include solutions to water and milk quality assessment, cybersecurity of water processing, pesticide residue risk analysis, weed detection, and fraud detection, focusing on centralized horizontal federated learning. One of the gaps found was the lack of vertical or transfer federated learning and decentralized architectures. △ Less

Submitted 10 June, 2024; originally announced June 2024.

arXiv:2405.16255 [pdf, other]

GeoAdaLer: Geometric Insights into Adaptive Stochastic Gradient Descent Algorithms

Authors: Chinedu Eleh, Masuzyo Mwanza, Ekene Aguegboh, Hans-Werner van Wyk

Abstract: The Adam optimization method has achieved remarkable success in addressing contemporary challenges in stochastic optimization. This method falls within the realm of adaptive sub-gradient techniques, yet the underlying geometric principles guiding its performance have remained shrouded in mystery, and have long confounded researchers. In this paper, we introduce GeoAdaLer (Geometric Adaptive Learne… ▽ More The Adam optimization method has achieved remarkable success in addressing contemporary challenges in stochastic optimization. This method falls within the realm of adaptive sub-gradient techniques, yet the underlying geometric principles guiding its performance have remained shrouded in mystery, and have long confounded researchers. In this paper, we introduce GeoAdaLer (Geometric Adaptive Learner), a novel adaptive learning method for stochastic gradient descent optimization, which draws from the geometric properties of the optimization landscape. Beyond emerging as a formidable contender, the proposed method extends the concept of adaptive learning by introducing a geometrically inclined approach that enhances the interpretability and effectiveness in complex optimization scenarios △ Less

Submitted 25 May, 2024; originally announced May 2024.

arXiv:2405.15997 [pdf, other]

$\textit{UniSaT}$: Unified-Objective Belief Model and Planner to Search for and Track Multiple Objects

Authors: Leonardo Santos, Brady Moon, Sebastian Scherer, Hoa Van Nguyen

Abstract: The problem of path planning for autonomously searching and tracking multiple objects is important to reconnaissance, surveillance, and many other data-gathering applications. Due to the inherent competing objectives of searching for new objects while maintaining tracks for found objects, most current approaches rely on multi-objective planning methods, leaving it up to the user to tune parameters… ▽ More The problem of path planning for autonomously searching and tracking multiple objects is important to reconnaissance, surveillance, and many other data-gathering applications. Due to the inherent competing objectives of searching for new objects while maintaining tracks for found objects, most current approaches rely on multi-objective planning methods, leaving it up to the user to tune parameters to balance between the two objectives, usually based on heuristics or trial and error. In this paper, we introduce $\textit{UniSaT}$ ($\textit{Unified Search and Track}$), a unified-objective formulation for the search and track problem based on Random Finite Sets (RFS). This is done by modeling both the unknown and known objects through a combined generalized labeled multi-Bernoulli (GLMB) filter. For the unseen objects, we can leverage both cardinality and spatial prior distributions, which means $\textit{UniSaT}$ does not rely on knowing the exact count of the expected number of objects in the space. The planner maximizes the mutual information of this unified belief model, creating balanced search and tracking behaviors. We demonstrate our work in a simulated environment and show both qualitative results as well as quantitative improvements over a multi-objective method. △ Less

Submitted 24 May, 2024; originally announced May 2024.

Comments: 13 pages, 5 figures, 1 table

arXiv:2405.15671 [pdf, other]

The Undecidability of Quantified Announcements

Authors: Thomas Ågotnes, Hans van Ditmarsch, Tim French

Abstract: This paper demonstrates the undecidability of a number of logics with quantification over public announcements: arbitrary public announcement logic (APAL), group announcement logic (GAL), and coalition announcement logic (CAL). In APAL we consider the informative consequences of any announcement, in GAL we consider the informative consequences of a group of agents (this group may be a proper subse… ▽ More This paper demonstrates the undecidability of a number of logics with quantification over public announcements: arbitrary public announcement logic (APAL), group announcement logic (GAL), and coalition announcement logic (CAL). In APAL we consider the informative consequences of any announcement, in GAL we consider the informative consequences of a group of agents (this group may be a proper subset of the set of all agents) all of which are simultaneously (and publicly) making known announcements. So this is more restrictive than APAL. Finally, CAL is as GAL except that we now quantify over anything the agents not in that group may announce simultaneously as well. The logic CAL therefore has some features of game logic and of ATL. We show that when there are multiple agents in the language, the satisfiability problem is undecidable for APAL, GAL, and CAL. In the single agent case, the satisfiability problem is decidable for all three logics. This paper corrects an error to the submitted version of Undecidability of Quantified Announcements, identified by Yuta Asami . The nature of the error was in the definition of the formula $cga(X)$ (see Subsection 5.2) which is corrected in this version. △ Less

Submitted 5 June, 2024; v1 submitted 24 May, 2024; originally announced May 2024.

Comments: This paper contains a correction to the 2016 article, The Undecidablity of Quantified Announcements, published in Studia Logica

Journal ref: The undecidability of quantified announcements. Studia Logica, 104(4) pages 597-640, 2016

arXiv:2405.15264 [pdf, other]

Self-Contrastive Weakly Supervised Learning Framework for Prognostic Prediction Using Whole Slide Images

Authors: Saul Fuster, Farbod Khoraminia, Julio Silva-Rodríguez, Umay Kiraz, Geert J. L. H. van Leenders, Trygve Eftestøl, Valery Naranjo, Emiel A. M. Janssen, Tahlita C. M. Zuiverloon, Kjersti Engan

Abstract: We present a pioneering investigation into the application of deep learning techniques to analyze histopathological images for addressing the substantial challenge of automated prognostic prediction. Prognostic prediction poses a unique challenge as the ground truth labels are inherently weak, and the model must anticipate future events that are not directly observable in the image. To address thi… ▽ More We present a pioneering investigation into the application of deep learning techniques to analyze histopathological images for addressing the substantial challenge of automated prognostic prediction. Prognostic prediction poses a unique challenge as the ground truth labels are inherently weak, and the model must anticipate future events that are not directly observable in the image. To address this challenge, we propose a novel three-part framework comprising of a convolutional network based tissue segmentation algorithm for region of interest delineation, a contrastive learning module for feature extraction, and a nested multiple instance learning classification module. Our study explores the significance of various regions of interest within the histopathological slides and exploits diverse learning scenarios. The pipeline is initially validated on artificially generated data and a simpler diagnostic task. Transitioning to prognostic prediction, tasks become more challenging. Employing bladder cancer as use case, our best models yield an AUC of 0.721 and 0.678 for recurrence and treatment outcome prediction respectively. △ Less

Submitted 24 May, 2024; originally announced May 2024.

Comments: https://github.com/Biomedical-Data-Analysis-Laboratory/HistoPrognostics

arXiv:2405.11519 [pdf, other]

MSNER: A Multilingual Speech Dataset for Named Entity Recognition

Authors: Quentin Meeus, Marie-Francine Moens, Hugo Van hamme

Abstract: While extensively explored in text-based tasks, Named Entity Recognition (NER) remains largely neglected in spoken language understanding. Existing resources are limited to a single, English-only dataset. This paper addresses this gap by introducing MSNER, a freely available, multilingual speech corpus annotated with named entities. It provides annotations to the VoxPopuli dataset in four language… ▽ More While extensively explored in text-based tasks, Named Entity Recognition (NER) remains largely neglected in spoken language understanding. Existing resources are limited to a single, English-only dataset. This paper addresses this gap by introducing MSNER, a freely available, multilingual speech corpus annotated with named entities. It provides annotations to the VoxPopuli dataset in four languages (Dutch, French, German, and Spanish). We have also releasing an efficient annotation tool that leverages automatic pre-annotations for faster manual refinement. This results in 590 and 15 hours of silver-annotated speech for training and validation, alongside a 17-hour, manually-annotated evaluation set. We further provide an analysis comparing silver and gold annotations. Finally, we present baseline NER models to stimulate further research on this newly available dataset. △ Less

Submitted 19 May, 2024; originally announced May 2024.

arXiv:2405.08780 [pdf]

Harnessing the power of longitudinal medical imaging for eye disease prognosis using Transformer-based sequence modeling

Authors: Gregory Holste, Mingquan Lin, Ruiwen Zhou, Fei Wang, Lei Liu, Qi Yan, Sarah H. Van Tassel, Kyle Kovacs, Emily Y. Chew, Zhiyong Lu, Zhangyang Wang, Yifan Peng

Abstract: Deep learning has enabled breakthroughs in automated diagnosis from medical imaging, with many successful applications in ophthalmology. However, standard medical image classification approaches only assess disease presence at the time of acquisition, neglecting the common clinical setting of longitudinal imaging. For slow, progressive eye diseases like age-related macular degeneration (AMD) and p… ▽ More Deep learning has enabled breakthroughs in automated diagnosis from medical imaging, with many successful applications in ophthalmology. However, standard medical image classification approaches only assess disease presence at the time of acquisition, neglecting the common clinical setting of longitudinal imaging. For slow, progressive eye diseases like age-related macular degeneration (AMD) and primary open-angle glaucoma (POAG), patients undergo repeated imaging over time to track disease progression and forecasting the future risk of develo** disease is critical to properly plan treatment. Our proposed Longitudinal Transformer for Survival Analysis (LTSA) enables dynamic disease prognosis from longitudinal medical imaging, modeling the time to disease from sequences of fundus photography images captured over long, irregular time periods. Using longitudinal imaging data from the Age-Related Eye Disease Study (AREDS) and Ocular Hypertension Treatment Study (OHTS), LTSA significantly outperformed a single-image baseline in 19/20 head-to-head comparisons on late AMD prognosis and 18/20 comparisons on POAG prognosis. A temporal attention analysis also suggested that, while the most recent image is typically the most influential, prior imaging still provides additional prognostic value. △ Less

Submitted 14 May, 2024; originally announced May 2024.

arXiv:2404.18981 [pdf, other]

Decoding Radiologists' Intentions: A Novel System for Accurate Region Identification in Chest X-ray Image Analysis

Authors: Akash Awasthi, Safwan Ahmad, Bryant Le, Hien Van Nguyen

Abstract: In the realm of chest X-ray (CXR) image analysis, radiologists meticulously examine various regions, documenting their observations in reports. The prevalence of errors in CXR diagnoses, particularly among inexperienced radiologists and hospital residents, underscores the importance of understanding radiologists' intentions and the corresponding regions of interest. This understanding is crucial f… ▽ More In the realm of chest X-ray (CXR) image analysis, radiologists meticulously examine various regions, documenting their observations in reports. The prevalence of errors in CXR diagnoses, particularly among inexperienced radiologists and hospital residents, underscores the importance of understanding radiologists' intentions and the corresponding regions of interest. This understanding is crucial for correcting mistakes by guiding radiologists to the accurate regions of interest, especially in the diagnosis of chest radiograph abnormalities. In response to this imperative, we propose a novel system designed to identify the primary intentions articulated by radiologists in their reports and the corresponding regions of interest in CXR images. This system seeks to elucidate the visual context underlying radiologists' textual findings, with the potential to rectify errors made by less experienced practitioners and direct them to precise regions of interest. Importantly, the proposed system can be instrumental in providing constructive feedback to inexperienced radiologists or junior residents in the hospital, bridging the gap in face-to-face communication. The system represents a valuable tool for enhancing diagnostic accuracy and fostering continuous learning within the medical community. △ Less

Submitted 29 April, 2024; originally announced April 2024.

Comments: Accepted in ISBI 2024

arXiv:2404.18640 [pdf, other]

doi 10.1145/3626772.3657749

Going Beyond Popularity and Positivity Bias: Correcting for Multifactorial Bias in Recommender Systems

Authors: ** Huang, Harrie Oosterhuis, Masoud Mansoury, Herke van Hoof, Maarten de Rijke

Abstract: Two typical forms of bias in user interaction data with recommender systems (RSs) are popularity bias and positivity bias, which manifest themselves as the over-representation of interactions with popular items or items that users prefer, respectively. Debiasing methods aim to mitigate the effect of selection bias on the evaluation and optimization of RSs. However, existing debiasing methods only… ▽ More Two typical forms of bias in user interaction data with recommender systems (RSs) are popularity bias and positivity bias, which manifest themselves as the over-representation of interactions with popular items or items that users prefer, respectively. Debiasing methods aim to mitigate the effect of selection bias on the evaluation and optimization of RSs. However, existing debiasing methods only consider single-factor forms of bias, e.g., only the item (popularity) or only the rating value (positivity). This is in stark contrast with the real world where user selections are generally affected by multiple factors at once. In this work, we consider multifactorial selection bias in RSs. Our focus is on selection bias affected by both item and rating value factors, which is a generalization and combination of popularity and positivity bias. While the concept of multifactorial bias is intuitive, it brings a severe practical challenge as it requires substantially more data for accurate bias estimation. As a solution, we propose smoothing and alternating gradient descent techniques to reduce variance and improve the robustness of its optimization. Our experimental results reveal that, with our proposed techniques, multifactorial bias corrections are more effective and robust than single-factor counterparts on real-world and synthetic datasets. △ Less

Submitted 29 April, 2024; originally announced April 2024.

Comments: SIGIR 2024

arXiv:2404.06267 [pdf, other]

PGTNet: A Process Graph Transformer Network for Remaining Time Prediction of Business Process Instances

Authors: Keyvan Amiri Elyasi, Han van der Aa, Heiner Stuckenschmidt

Abstract: We present PGTNet, an approach that transforms event logs into graph datasets and leverages graph-oriented data for training Process Graph Transformer Networks to predict the remaining time of business process instances. PGTNet consistently outperforms state-of-the-art deep learning approaches across a diverse range of 20 publicly available real-world event logs. Notably, our approach is most prom… ▽ More We present PGTNet, an approach that transforms event logs into graph datasets and leverages graph-oriented data for training Process Graph Transformer Networks to predict the remaining time of business process instances. PGTNet consistently outperforms state-of-the-art deep learning approaches across a diverse range of 20 publicly available real-world event logs. Notably, our approach is most promising for highly complex processes, where existing deep learning approaches encounter difficulties stemming from their limited ability to learn control-flow relationships among process activities and capture long-range dependencies. PGTNet addresses these challenges, while also being able to consider multiple process perspectives during the learning process. △ Less

Submitted 9 April, 2024; originally announced April 2024.

Comments: 16 pages, 4 figures, To be published in: Advanced Information Systems Engineering - 36th International Conference, CAiSE 2024, Limassol, Cyprus, June 03-07, 2024, Proceedings

arXiv:2404.03988 [pdf, other]

Model Selection with Model Zoo via Graph Learning

Authors: Ziyu Li, Hilco van der Wilk, Danning Zhan, Megha Khosla, Alessandro Bozzon, Rihan Hai

Abstract: Pre-trained deep learning (DL) models are increasingly accessible in public repositories, i.e., model zoos. Given a new prediction task, finding the best model to fine-tune can be computationally intensive and costly, especially when the number of pre-trained models is large. Selecting the right pre-trained models is crucial, yet complicated by the diversity of models from various model families (… ▽ More Pre-trained deep learning (DL) models are increasingly accessible in public repositories, i.e., model zoos. Given a new prediction task, finding the best model to fine-tune can be computationally intensive and costly, especially when the number of pre-trained models is large. Selecting the right pre-trained models is crucial, yet complicated by the diversity of models from various model families (like ResNet, Vit, Swin) and the hidden relationships between models and datasets. Existing methods, which utilize basic information from models and datasets to compute scores indicating model performance on target datasets, overlook the intrinsic relationships, limiting their effectiveness in model selection. In this study, we introduce TransferGraph, a novel framework that reformulates model selection as a graph learning problem. TransferGraph constructs a graph using extensive metadata extracted from models and datasets, while capturing their inherent relationships. Through comprehensive experiments across 16 real datasets, both images and texts, we demonstrate TransferGraph's effectiveness in capturing essential model-dataset relationships, yielding up to a 32% improvement in correlation between predicted performance and the actual fine-tuning results compared to the state-of-the-art methods. △ Less

Submitted 5 April, 2024; originally announced April 2024.

Comments: Accepted at 40th IEEE International Conference on Data Engineering (ICDE 2024)

arXiv:2403.15301 [pdf, other]

Planning with a Learned Policy Basis to Optimally Solve Complex Tasks

Authors: Guillermo Infante, David Kuric, Anders Jonsson, Vicenç Gómez, Herke van Hoof

Abstract: Conventional reinforcement learning (RL) methods can successfully solve a wide range of sequential decision problems. However, learning policies that can generalize predictably across multiple tasks in a setting with non-Markovian reward specifications is a challenging problem. We propose to use successor features to learn a policy basis so that each (sub)policy in it solves a well-defined subprob… ▽ More Conventional reinforcement learning (RL) methods can successfully solve a wide range of sequential decision problems. However, learning policies that can generalize predictably across multiple tasks in a setting with non-Markovian reward specifications is a challenging problem. We propose to use successor features to learn a policy basis so that each (sub)policy in it solves a well-defined subproblem. In a task described by a finite state automaton (FSA) that involves the same set of subproblems, the combination of these (sub)policies can then be used to generate an optimal solution without additional learning. In contrast to other methods that combine (sub)policies via planning, our method asymptotically attains global optimality, even in stochastic environments. △ Less

Submitted 3 June, 2024; v1 submitted 22 March, 2024; originally announced March 2024.

arXiv:2402.18762 [pdf, other]

Disentangling the Causes of Plasticity Loss in Neural Networks

Authors: Clare Lyle, Zeyu Zheng, Khimya Khetarpal, Hado van Hasselt, Razvan Pascanu, James Martens, Will Dabney

Abstract: Underpinning the past decades of work on the design, initialization, and optimization of neural networks is a seemingly innocuous assumption: that the network is trained on a \textit{stationary} data distribution. In settings where this assumption is violated, e.g.\ deep reinforcement learning, learning algorithms become unstable and brittle with respect to hyperparameters and even random seeds. O… ▽ More Underpinning the past decades of work on the design, initialization, and optimization of neural networks is a seemingly innocuous assumption: that the network is trained on a \textit{stationary} data distribution. In settings where this assumption is violated, e.g.\ deep reinforcement learning, learning algorithms become unstable and brittle with respect to hyperparameters and even random seeds. One factor driving this instability is the loss of plasticity, meaning that updating the network's predictions in response to new information becomes more difficult as training progresses. While many recent works provide analyses and partial solutions to this phenomenon, a fundamental question remains unanswered: to what extent do known mechanisms of plasticity loss overlap, and how can mitigation strategies be combined to best maintain the trainability of a network? This paper addresses these questions, showing that loss of plasticity can be decomposed into multiple independent mechanisms and that, while intervening on any single mechanism is insufficient to avoid the loss of plasticity in all cases, intervening on multiple mechanisms in conjunction results in highly robust learning algorithms. We show that a combination of layer normalization and weight decay is highly effective at maintaining plasticity in a variety of synthetic nonstationary learning tasks, and further demonstrate its effectiveness on naturally arising nonstationarities, including reinforcement learning in the Arcade Learning Environment. △ Less

Submitted 28 February, 2024; originally announced February 2024.

arXiv:2402.16645 [pdf, other]

Learning Based NMPC Adaptation for Autonomous Driving using Parallelized Digital Twin

Authors: Jean Pierre Allamaa, Panagiotis Patrinos, Herman Van der Auweraer, Tong Duy Son

Abstract: In this work, we address the problem of transferring an autonomous driving (AD) module from one domain to another, in particular from simulation to the real world (Sim2Real). We propose a data-efficient method for online and on-the-fly learning based adaptation for parametrizable control architectures such that the target closed-loop performance is optimized under several uncertainty sources such… ▽ More In this work, we address the problem of transferring an autonomous driving (AD) module from one domain to another, in particular from simulation to the real world (Sim2Real). We propose a data-efficient method for online and on-the-fly learning based adaptation for parametrizable control architectures such that the target closed-loop performance is optimized under several uncertainty sources such as model mismatches, environment changes and task choice. The novelty of the work resides in leveraging black-box optimization enabled by executable digital twins, with data-driven hyper-parameter tuning through derivative-free methods to directly adapt in real-time the AD module. Our proposed method requires a minimal amount of interaction with the real-world in the randomization and online training phase. Specifically, we validate our approach in real-world experiments and show the ability to transfer and safely tune a nonlinear model predictive controller in less than 10 minutes, eliminating the need of day-long manual tuning and hours-long machine learning training phases. Our results show that the online adapted NMPC directly compensates for disturbances, avoids overtuning in simulation and for one specific task, and it generalizes for less than 15cm of tracking accuracy over a multitude of trajectories, and leads to 83% tracking improvement. △ Less

Submitted 26 February, 2024; originally announced February 2024.

arXiv:2402.10291 [pdf, other]

An Evaluation of Real-time Adaptive Sampling Change Point Detection Algorithm using KCUSUM

Authors: Vijayalakshmi Saravanan, Perry Siehien, Shinjae Yoo, Hubertus Van Dam, Thomas Flynn, Christopher Kelly, Khaled Z Ibrahim

Abstract: Detecting abrupt changes in real-time data streams from scientific simulations presents a challenging task, demanding the deployment of accurate and efficient algorithms. Identifying change points in live data stream involves continuous scrutiny of incoming observations for deviations in their statistical characteristics, particularly in high-volume data scenarios. Maintaining a balance between su… ▽ More Detecting abrupt changes in real-time data streams from scientific simulations presents a challenging task, demanding the deployment of accurate and efficient algorithms. Identifying change points in live data stream involves continuous scrutiny of incoming observations for deviations in their statistical characteristics, particularly in high-volume data scenarios. Maintaining a balance between sudden change detection and minimizing false alarms is vital. Many existing algorithms for this purpose rely on known probability distributions, limiting their feasibility. In this study, we introduce the Kernel-based Cumulative Sum (KCUSUM) algorithm, a non-parametric extension of the traditional Cumulative Sum (CUSUM) method, which has gained prominence for its efficacy in online change point detection under less restrictive conditions. KCUSUM splits itself by comparing incoming samples directly with reference samples and computes a statistic grounded in the Maximum Mean Discrepancy (MMD) non-parametric framework. This approach extends KCUSUM's pertinence to scenarios where only reference samples are available, such as atomic trajectories of proteins in vacuum, facilitating the detection of deviations from the reference sample without prior knowledge of the data's underlying distribution. Furthermore, by harnessing MMD's inherent random-walk structure, we can theoretically analyze KCUSUM's performance across various use cases, including metrics like expected delay and mean runtime to false alarms. Finally, we discuss real-world use cases from scientific simulations such as NWChem CODAR and protein folding data, demonstrating KCUSUM's practical effectiveness in online change point detection. △ Less

Submitted 4 April, 2024; v1 submitted 15 February, 2024; originally announced February 2024.

Comments: 16 pages. arXiv admin note: text overlap with arXiv:1903.01661

MSC Class: CCS

arXiv:2402.08255 [pdf, other]

Distal Interference: Exploring the Limits of Model-Based Continual Learning

Authors: Heinrich van Deventer, Anna Sergeevna Bosman

Abstract: Continual learning is the sequential learning of different tasks by a machine learning model. Continual learning is known to be hindered by catastrophic interference or forgetting, i.e. rapid unlearning of earlier learned tasks when new tasks are learned. Despite their practical success, artificial neural networks (ANNs) are prone to catastrophic interference. This study analyses how gradient desc… ▽ More Continual learning is the sequential learning of different tasks by a machine learning model. Continual learning is known to be hindered by catastrophic interference or forgetting, i.e. rapid unlearning of earlier learned tasks when new tasks are learned. Despite their practical success, artificial neural networks (ANNs) are prone to catastrophic interference. This study analyses how gradient descent and overlap** representations between distant input points lead to distal interference and catastrophic interference. Distal interference refers to the phenomenon where training a model on a subset of the domain leads to non-local changes on other subsets of the domain. This study shows that uniformly trainable models without distal interference must be exponentially large. A novel antisymmetric bounded exponential layer B-spline ANN architecture named ABEL-Spline is proposed that can approximate any continuous function, is uniformly trainable, has polynomial computational complexity, and provides some guarantees for distal interference. Experiments are presented to demonstrate the theoretical properties of ABEL-Splines. ABEL-Splines are also evaluated on benchmark regression problems. It is concluded that the weaker distal interference guarantees in ABEL-Splines are insufficient for model-only continual learning. It is conjectured that continual learning with polynomial complexity models requires augmentation of the training data or algorithm. △ Less

Submitted 13 February, 2024; originally announced February 2024.

MSC Class: 68T07 ACM Class: I.5.1

arXiv:2402.07746 [pdf]

Minimally Interactive Segmentation of Soft-Tissue Tumors on CT and MRI using Deep Learning

Authors: Douwe J. Spaanderman, Martijn P. A. Starmans, Gonnie C. M. van Erp, David F. Hanff, Judith H. Sluijter, Anne-Rose W. Schut, Geert J. L. H. van Leenders, Cornelis Verhoef, Dirk J. Grunhagen, Wiro J. Niessen, Jacob J. Visser, Stefan Klein

Abstract: Segmentations are crucial in medical imaging to obtain morphological, volumetric, and radiomics biomarkers. Manual segmentation is accurate but not feasible in the radiologist's clinical workflow, while automatic segmentation generally obtains sub-par performance. We therefore developed a minimally interactive deep learning-based segmentation method for soft-tissue tumors (STTs) on CT and MRI. The… ▽ More Segmentations are crucial in medical imaging to obtain morphological, volumetric, and radiomics biomarkers. Manual segmentation is accurate but not feasible in the radiologist's clinical workflow, while automatic segmentation generally obtains sub-par performance. We therefore developed a minimally interactive deep learning-based segmentation method for soft-tissue tumors (STTs) on CT and MRI. The method requires the user to click six points near the tumor's extreme boundaries. These six points are transformed into a distance map and serve, with the image, as input for a Convolutional Neural Network. For training and validation, a multicenter dataset containing 514 patients and nine STT types in seven anatomical locations was used, resulting in a Dice Similarity Coefficient (DSC) of 0.85$\pm$0.11 (mean $\pm$ standard deviation (SD)) for CT and 0.84$\pm$0.12 for T1-weighted MRI, when compared to manual segmentations made by expert radiologists. Next, the method was externally validated on a dataset including five unseen STT phenotypes in extremities, achieving 0.81$\pm$0.08 for CT, 0.84$\pm$0.09 for T1-weighted MRI, and 0.88\pm0.08 for previously unseen T2-weighted fat-saturated (FS) MRI. In conclusion, our minimally interactive segmentation method effectively segments different types of STTs on CT and MRI, with robust generalization to previously unseen phenotypes and imaging modalities. △ Less

Submitted 12 February, 2024; originally announced February 2024.

arXiv:2402.02239 [pdf, other]

Distributional Reduction: Unifying Dimensionality Reduction and Clustering with Gromov-Wasserstein

Authors: Hugues Van Assel, Cédric Vincent-Cuaz, Nicolas Courty, Rémi Flamary, Pascal Frossard, Titouan Vayer

Abstract: Unsupervised learning aims to capture the underlying structure of potentially large and high-dimensional datasets. Traditionally, this involves using dimensionality reduction (DR) methods to project data onto lower-dimensional spaces or organizing points into meaningful clusters (clustering). In this work, we revisit these approaches under the lens of optimal transport and exhibit relationships wi… ▽ More Unsupervised learning aims to capture the underlying structure of potentially large and high-dimensional datasets. Traditionally, this involves using dimensionality reduction (DR) methods to project data onto lower-dimensional spaces or organizing points into meaningful clusters (clustering). In this work, we revisit these approaches under the lens of optimal transport and exhibit relationships with the Gromov-Wasserstein problem. This unveils a new general framework, called distributional reduction, that recovers DR and clustering as special cases and allows addressing them jointly within a single optimization problem. We empirically demonstrate its relevance to the identification of low-dimensional prototypes representing data at different scales, across multiple image and genomic datasets. △ Less

Submitted 22 May, 2024; v1 submitted 3 February, 2024; originally announced February 2024.

Comments: 38 pages, 15 figures

arXiv:2401.13708 [pdf, other]

Accelerating hyperbolic t-SNE

Authors: Martin Skrodzki, Hunter van Geffen, Nicolas F. Chaves-de-Plaza, Thomas Höllt, Elmar Eisemann, Klaus Hildebrandt

Abstract: The need to understand the structure of hierarchical or high-dimensional data is present in a variety of fields. Hyperbolic spaces have proven to be an important tool for embedding computations and analysis tasks as their non-linear nature lends itself well to tree or graph data. Subsequently, they have also been used in the visualization of high-dimensional data, where they exhibit increased embe… ▽ More The need to understand the structure of hierarchical or high-dimensional data is present in a variety of fields. Hyperbolic spaces have proven to be an important tool for embedding computations and analysis tasks as their non-linear nature lends itself well to tree or graph data. Subsequently, they have also been used in the visualization of high-dimensional data, where they exhibit increased embedding performance. However, none of the existing dimensionality reduction methods for embedding into hyperbolic spaces scale well with the size of the input data. That is because the embeddings are computed via iterative optimization schemes and the computation cost of every iteration is quadratic in the size of the input. Furthermore, due to the non-linear nature of hyperbolic spaces, Euclidean acceleration structures cannot directly be translated to the hyperbolic setting. This paper introduces the first acceleration structure for hyperbolic embeddings, building upon a polar quadtree. We compare our approach with existing methods and demonstrate that it computes embeddings of similar quality in significantly less time. Implementation and scripts for the experiments can be found at https://graphics.tudelft.nl/accelerating-hyperbolic-tsne. △ Less

Submitted 23 January, 2024; originally announced January 2024.

arXiv:2401.10291 [pdf, other]

Detecting Post-Stroke Aphasia Via Brain Responses to Speech in a Deep Learning Framework

Authors: Pieter De Clercq, Corentin Puffay, Jill Kries, Hugo Van Hamme, Maaike Vandermosten, Tom Francart, Jonas Vanthornhout

Abstract: Aphasia, a language disorder primarily caused by a stroke, is traditionally diagnosed using behavioral language tests. However, these tests are time-consuming, require manual interpretation by trained clinicians, suffer from low ecological validity, and diagnosis can be biased by comorbid motor and cognitive problems present in aphasia. In this study, we introduce an automated screening tool for s… ▽ More Aphasia, a language disorder primarily caused by a stroke, is traditionally diagnosed using behavioral language tests. However, these tests are time-consuming, require manual interpretation by trained clinicians, suffer from low ecological validity, and diagnosis can be biased by comorbid motor and cognitive problems present in aphasia. In this study, we introduce an automated screening tool for speech processing impairments in aphasia that relies on time-locked brain responses to speech, known as neural tracking, within a deep learning framework. We modeled electroencephalography (EEG) responses to acoustic, segmentation, and linguistic speech representations of a story using convolutional neural networks trained on a large sample of healthy participants, serving as a model for intact neural tracking of speech. Subsequently, we evaluated our models on an independent sample comprising 26 individuals with aphasia (IWA) and 22 healthy controls. Our results reveal decreased tracking of all speech representations in IWA. Utilizing a support vector machine classifier with neural tracking measures as input, we demonstrate high accuracy in aphasia detection at the individual level (85.42\%) in a time-efficient manner (requiring 9 minutes of EEG data). Given its high robustness, time efficiency, and generalizability to unseen data, our approach holds significant promise for clinical applications. △ Less

Submitted 17 January, 2024; originally announced January 2024.

Comments: Shared first authors: De Clercq & Puffay

arXiv:2401.06451 [pdf, ps, other]

doi 10.1007/978-3-031-63501-4_7

A Logic for Repair and State Recovery in Byzantine Fault-tolerant Multi-agent Systems

Authors: Hans van Ditmarsch, Krisztina Fruzsa, Roman Kuznets, Ulrich Schmid

Abstract: We provide an epistemic logical language and semantics for the modeling and analysis of byzantine fault-tolerant multi-agent systems. This not only facilitates reasoning about the agents' fault status but also supports model updates for implementing repair and state recovery. For each agent, besides the standard knowledge modality our logic provides an additional modality called hope, which is cap… ▽ More We provide an epistemic logical language and semantics for the modeling and analysis of byzantine fault-tolerant multi-agent systems. This not only facilitates reasoning about the agents' fault status but also supports model updates for implementing repair and state recovery. For each agent, besides the standard knowledge modality our logic provides an additional modality called hope, which is capable of expressing that the agent is correct (not faulty), and also dynamic modalities enabling change of the agents' correctness status. These dynamic modalities are interpreted as model updates that come in three flavours: fully public, more private, or involving factual change. We provide complete axiomatizations for all these variants in the form of reduction systems: formulas with dynamic modalities are equivalent to formulas without. Therefore, they have the same expressivity as the logic of knowledge and hope. Multiple examples are provided to demonstrate the utility and flexibility of our logic for modeling a wide range of repair and state recovery techniques that have been implemented in the context of fault-detection, isolation, and recovery (FDIR) approaches in fault-tolerant distributed computing with byzantine agents. △ Less

Submitted 27 June, 2024; v1 submitted 12 January, 2024; originally announced January 2024.

Comments: Extended preprint

Journal ref: Proceedings of IJCAR 2024, LNAI v. 14740 (2024), pp. 114-134

arXiv:2401.05427 [pdf, other]

Slide FFT on a homogeneous mesh in wafer-scale computing

Authors: Maurice H. P. M. van Putten, Leighton Wilson, Adam W. Lavely, Mark Hair

Abstract: Searches for signals at low signal-to-noise ratios frequently involve the Fast Fourier Transform (FFT). For high-throughput searches, we here consider FFT on the homogeneous mesh of Processing Elements (PEs) of a wafer-scale engine (WSE). To minimize memory overhead in the inherently non-local FFT algorithm, we introduce a new synchronous slide operation ({\em Slide}) exploiting the fast interconn… ▽ More Searches for signals at low signal-to-noise ratios frequently involve the Fast Fourier Transform (FFT). For high-throughput searches, we here consider FFT on the homogeneous mesh of Processing Elements (PEs) of a wafer-scale engine (WSE). To minimize memory overhead in the inherently non-local FFT algorithm, we introduce a new synchronous slide operation ({\em Slide}) exploiting the fast interconnect between adjacent PEs. Feasibility of compute-limited performance is demonstrated in linear scaling of Slide execution times with varying array size in preliminary benchmarks on the CS-2 WSE. The proposed implementation appears opportune to accelerate and open the full discovery potential of FFT-based signal processing in multi-messenger astronomy. △ Less

Submitted 4 January, 2024; originally announced January 2024.

Comments: 7 pages, 6 figures

arXiv:2401.00605 [pdf, other]

Distributed Multi-Object Tracking Under Limited Field of View Heterogeneous Sensors with Density Clustering

Authors: Fei Chen, Hoa Van Nguyen, Alex S. Leong, Sabita Panicker, Robin Baker, Damith C. Ranasinghe

Abstract: We consider the problem of tracking multiple, unknown, and time-varying numbers of objects using a distributed network of heterogeneous sensors. In an effort to derive a formulation for practical settings, we consider limited and unknown sensor field-of-views (FoVs), sensors with limited local computational resources and communication channel capacity. The resulting distributed multi-object tracki… ▽ More We consider the problem of tracking multiple, unknown, and time-varying numbers of objects using a distributed network of heterogeneous sensors. In an effort to derive a formulation for practical settings, we consider limited and unknown sensor field-of-views (FoVs), sensors with limited local computational resources and communication channel capacity. The resulting distributed multi-object tracking algorithm involves solving an NP-hard multidimensional assignment problem either optimally for small-size problems or sub-optimally for general practical problems. For general problems, we propose an efficient distributed multi-object tracking algorithm that performs track-to-track fusion using a clustering-based analysis of the state space transformed into a density space to mitigate the complexity of the assignment problem. The proposed algorithm can more efficiently group local track estimates for fusion than existing approaches. To ensure we achieve globally consistent identities for tracks across a network of nodes as objects move between FoVs, we develop a graph-based algorithm to achieve label consensus and minimise track segmentation. Numerical experiments with a synthetic and a real-world trajectory dataset demonstrate that our proposed method is significantly more computationally efficient than state-of-the-art solutions, achieving similar tracking accuracy and bandwidth requirements but with improved label consistency. △ Less

Submitted 31 December, 2023; originally announced January 2024.

arXiv:2312.07261 [pdf, other]

Relocating thermal stimuli to the proximal phalanx may not affect vibrotactile sensitivity on the fingertip

Authors: Huibert A. J. van Riessen, Yasemin Vardar

Abstract: Wearable devices that relocate tactile feedback from fingertips can enable users to interact with their physical world augmented by virtual effects. While studies have shown that relocating same-modality tactile stimuli can influence the one perceived at the fingertip, the interaction of cross-modal tactile stimuli remains unclear. Here, we investigate how thermal cues applied on the index finger'… ▽ More Wearable devices that relocate tactile feedback from fingertips can enable users to interact with their physical world augmented by virtual effects. While studies have shown that relocating same-modality tactile stimuli can influence the one perceived at the fingertip, the interaction of cross-modal tactile stimuli remains unclear. Here, we investigate how thermal cues applied on the index finger's proximal phalanx affect vibrotactile sensitivity at the fingertip of the same finger when employed at varying contact pressures. We designed a novel wearable device that can deliver thermal stimuli at adjustable contact pressures on the proximal phalanx. Utilizing this device, we measured the detection thresholds of fifteen participants for 250 Hz sinusoidal vibration applied on the fingertip while concurrently applying constant cold and warm stimuli at high and low contact pressures to the proximal phalanx. Our results revealed no significant differences in detection thresholds across conditions. These preliminary findings suggest that applying constant thermal stimuli to other skin locations does not affect fingertip vibrotactile sensitivity, possibly due to perceptual adaptation. However, the influence of dynamic multisensory tactile stimuli remains an open question for future research. △ Less

Submitted 12 December, 2023; originally announced December 2023.

Comments: 6 pages, 5 figures, conference

arXiv:2312.01072 [pdf, other]

A Survey of Temporal Credit Assignment in Deep Reinforcement Learning

Authors: Eduardo Pignatelli, Johan Ferret, Matthieu Geist, Thomas Mesnard, Hado van Hasselt, Laura Toni

Abstract: The Credit Assignment Problem (CAP) refers to the longstanding challenge of Reinforcement Learning (RL) agents to associate actions with their long-term consequences. Solving the CAP is a crucial step towards the successful deployment of RL in the real world since most decision problems provide feedback that is noisy, delayed, and with little or no information about the causes. These conditions ma… ▽ More The Credit Assignment Problem (CAP) refers to the longstanding challenge of Reinforcement Learning (RL) agents to associate actions with their long-term consequences. Solving the CAP is a crucial step towards the successful deployment of RL in the real world since most decision problems provide feedback that is noisy, delayed, and with little or no information about the causes. These conditions make it hard to distinguish serendipitous outcomes from those caused by informed decision-making. However, the mathematical nature of credit and the CAP remains poorly understood and defined. In this survey, we review the state of the art of Temporal Credit Assignment (CA) in deep RL. We propose a unifying formalism for credit that enables equitable comparisons of state of the art algorithms and improves our understanding of the trade-offs between the various methods. We cast the CAP as the problem of learning the influence of an action over an outcome from a finite amount of experience. We discuss the challenges posed by delayed effects, transpositions, and a lack of action influence, and analyse how existing methods aim to address them. Finally, we survey the protocols to evaluate a credit assignment method, and suggest ways to diagnoses the sources of struggle for different credit assignment methods. Overall, this survey provides an overview of the field for new-entry practitioners and researchers, it offers a coherent perspective for scholars looking to expedite the starting stages of a new study on the CAP, and it suggests potential directions for future research △ Less

Submitted 2 December, 2023; originally announced December 2023.

Comments: 56 pages, 2 figures, 4 tables

arXiv:2311.17256 [pdf, other]

Pattern retrieval of traffic congestion using graph-based associations of traffic domain-specific features

Authors: Tin T. Nguyen, Simeon C. Calvert, Guopeng Li, Hans van Lint

Abstract: The fast-growing amount of traffic data brings many opportunities for revealing more insightful information about traffic dynamics. However, it also demands an effective database management system in which information retrieval is arguably an important feature. The ability to locate similar patterns in big datasets potentially paves the way for further valuable analyses in traffic management. This… ▽ More The fast-growing amount of traffic data brings many opportunities for revealing more insightful information about traffic dynamics. However, it also demands an effective database management system in which information retrieval is arguably an important feature. The ability to locate similar patterns in big datasets potentially paves the way for further valuable analyses in traffic management. This paper proposes a content-based retrieval system for spatiotemporal patterns of highway traffic congestion. There are two main components in our framework, namely pattern representation and similarity measurement. To effectively interpret retrieval outcomes, the paper proposes a graph-based approach (relation-graph) for the former component, in which fundamental traffic phenomena are encoded as nodes and their spatiotemporal relationships as edges. In the latter component, the similarities between congestion patterns are customizable with various aspects according to user expectations. We evaluated the proposed framework by applying it to a dataset of hundreds of patterns with various complexities (temporally and spatially). The example queries indicate the effectiveness of the proposed method, i.e. the obtained patterns present similar traffic phenomena as in the given examples. In addition, the success of the proposed approach directly derives a new opportunity for semantic retrieval, in which expected patterns are described by adopting the relation-graph notion to associate fundamental traffic phenomena. △ Less

Submitted 28 November, 2023; originally announced November 2023.

Comments: 20 pages, 14 figures

arXiv:2311.06563 [pdf, ps, other]

All instances of MONOTONE 3-SAT-(3,1) are satisfiable

Authors: Hannah Van Santvliet, Ronald de Haan

Abstract: The satisfiability problem is NP-complete but there are subclasses where all the instances are satisfiable. For this, restrictions on the shape of the formula are made. Darman and Döcker show that the subclass MONOTONE $3$-SAT-($k$,1) with $k \geq 5$ proves to be NP-complete and pose the open question whether instances of MONOTONE $3$-SAT-(3,1) are satisfiable. This paper shows that all instances… ▽ More The satisfiability problem is NP-complete but there are subclasses where all the instances are satisfiable. For this, restrictions on the shape of the formula are made. Darman and Döcker show that the subclass MONOTONE $3$-SAT-($k$,1) with $k \geq 5$ proves to be NP-complete and pose the open question whether instances of MONOTONE $3$-SAT-(3,1) are satisfiable. This paper shows that all instances of MONOTONE $3$-SAT-(3,1) are satisfiable using the new concept of a color-structures. △ Less

Submitted 8 December, 2023; v1 submitted 11 November, 2023; originally announced November 2023.

Comments: 14 pages, 10 figures

arXiv:2311.02129 [pdf, other]

Hierarchical Reinforcement Learning for Power Network Topology Control

Authors: Blazej Manczak, Jan Viebahn, Herke van Hoof

Abstract: Learning in high-dimensional action spaces is a key challenge in applying reinforcement learning (RL) to real-world systems. In this paper, we study the possibility of controlling power networks using RL methods. Power networks are critical infrastructures that are complex to control. In particular, the combinatorial nature of the action space poses a challenge to both conventional optimizers and… ▽ More Learning in high-dimensional action spaces is a key challenge in applying reinforcement learning (RL) to real-world systems. In this paper, we study the possibility of controlling power networks using RL methods. Power networks are critical infrastructures that are complex to control. In particular, the combinatorial nature of the action space poses a challenge to both conventional optimizers and learned controllers. Hierarchical reinforcement learning (HRL) represents one approach to address this challenge. More precisely, a HRL framework for power network topology control is proposed. The HRL framework consists of three levels of action abstraction. At the highest level, there is the overall long-term task of power network operation, namely, kee** the power grid state within security constraints at all times, which is decomposed into two temporally extended actions: 'do nothing' versus 'propose a topology change'. At the intermediate level, the action space consists of all controllable substations. Finally, at the lowest level, the action space consists of all configurations of the chosen substation. By employing this HRL framework, several hierarchical power network agents are trained for the IEEE 14-bus network. Whereas at the highest level a purely rule-based policy is still chosen for all agents in this study, at the intermediate level the policy is trained using different state-of-the-art RL algorithms. At the lowest level, either an RL algorithm or a greedy algorithm is used. The performance of the different 3-level agents is compared with standard baseline (RL or greedy) approaches. A key finding is that the 3-level agent that employs RL both at the intermediate and the lowest level outperforms all other agents on the most difficult task. Our code is publicly available. △ Less

Submitted 3 November, 2023; originally announced November 2023.

arXiv:2311.01843 [pdf, other]

Adaptive Assistance with an Active and Soft Back-Support Exosuit to Unknown External Loads via Model-Based Estimates of Internal Lumbosacral Moments

Authors: Alejandro Moya-Esteban, Saivimal Sridar, Mohamed Irfan Mohamed Refai, Herman van der Kooij, Massimo Sartori

Abstract: State of the art controllers for back exoskeletons largely rely on body kinematics. This results in control strategies which cannot provide adaptive support under unknown external loads. We developed a neuromechanical model-based controller (NMBC) for a soft back exosuit, wherein assistive forces were proportional to the active component of lumbosacral joint moments, derived from real-time electro… ▽ More State of the art controllers for back exoskeletons largely rely on body kinematics. This results in control strategies which cannot provide adaptive support under unknown external loads. We developed a neuromechanical model-based controller (NMBC) for a soft back exosuit, wherein assistive forces were proportional to the active component of lumbosacral joint moments, derived from real-time electromyography-driven models. The exosuit provided adaptive assistance forces with no a priori information on the external loading conditions. Across 10 participants, who stoop-lifted 5 and 15 kg boxes, our NMBC was compared to a non-adaptive virtual spring-based control(VSBC), in which exosuit forces were proportional to trunk inclination. Peak cable assistive forces were modulated across weight conditions for NMBC (5kg: 2.13 N/kg; 15kg: 2.82 N/kg) but not for VSBC (5kg: 1.92 N/kg; 15kg: 2.00 N/kg). The proposed NMBC strategy resulted in larger reduction of cumulative compression forces for 5 kg (NMBC: 18.2%; VSBC: 10.7%) and 15 kg conditions (NMBC: 21.3%; VSBC: 10.2%). Our proposed methodology may facilitate the adoption of non-hindering wearable robotics in real-life scenarios. △ Less

Submitted 3 November, 2023; originally announced November 2023.

Comments: 17 pages, 8 figures

arXiv:2310.20384 [pdf, other]

Do large language models solve verbal analogies like children do?

Authors: Claire E. Stevenson, Mathilde ter Veen, Rochelle Choenni, Han L. J. van der Maas, Ekaterina Shutova

Abstract: Analogy-making lies at the heart of human cognition. Adults solve analogies such as \textit{Horse belongs to stable like chicken belongs to ...?} by map** relations (\textit{kept in}) and answering \textit{chicken coop}. In contrast, children often use association, e.g., answering \textit{egg}. This paper investigates whether large language models (LLMs) solve verbal analogies in A:B::C:? form u… ▽ More Analogy-making lies at the heart of human cognition. Adults solve analogies such as \textit{Horse belongs to stable like chicken belongs to ...?} by map** relations (\textit{kept in}) and answering \textit{chicken coop}. In contrast, children often use association, e.g., answering \textit{egg}. This paper investigates whether large language models (LLMs) solve verbal analogies in A:B::C:? form using associations, similar to what children do. We use verbal analogies extracted from an online adaptive learning environment, where 14,002 7-12 year-olds from the Netherlands solved 622 analogies in Dutch. The six tested Dutch monolingual and multilingual LLMs performed around the same level as children, with MGPT performing worst, around the 7-year-old level, and XLM-V and GPT-3 the best, slightly above the 11-year-old level. However, when we control for associative processes this picture changes and each model's performance level drops 1-2 years. Further experiments demonstrate that associative processes often underlie correctly solved analogies. We conclude that the LLMs we tested indeed tend to solve verbal analogies by association with C like children do. △ Less

Submitted 31 October, 2023; originally announced October 2023.

arXiv:2310.14506 [pdf, other]

Label Space Partition Selection for Multi-Object Tracking Using Two-Layer Partitioning

Authors: Ji Youn Lee, Changbeom Shim, Hoa Van Nguyen, Tran Thien Dat Nguyen, Hyun** Choi, Youngho Kim

Abstract: Estimating the trajectories of multi-objects poses a significant challenge due to data association ambiguity, which leads to a substantial increase in computational requirements. To address such problems, a divide-and-conquer manner has been employed with parallel computation. In this strategy, distinguished objects that have unique labels are grouped based on their statistical dependencies, the i… ▽ More Estimating the trajectories of multi-objects poses a significant challenge due to data association ambiguity, which leads to a substantial increase in computational requirements. To address such problems, a divide-and-conquer manner has been employed with parallel computation. In this strategy, distinguished objects that have unique labels are grouped based on their statistical dependencies, the intersection of predicted measurements. Several geometry approaches have been used for label grou** since finding all intersected label pairs is clearly infeasible for large-scale tracking problems. This paper proposes an efficient implementation of label grou** for label-partitioned generalized labeled multi-Bernoulli filter framework using a secondary partitioning technique. This allows for parallel computation in the label graph indexing step, avoiding generating and eliminating duplicate comparisons. Additionally, we compare the performance of the proposed technique with several efficient spatial searching algorithms. The results demonstrate the superior performance of the proposed approach on large-scale data sets, enabling scalable trajectory estimation. △ Less

Submitted 22 October, 2023; originally announced October 2023.

Comments: 6 pages, 4 figures

arXiv:2310.13083 [pdf, other]

How Can Everyday Users Efficiently Teach Robots by Demonstrations?

Authors: Maram Sakr, Zhikai Zhang, Benjamin Li, Haomiao Zhang, H. F. Machiel Van der Loos, Dana Kulic, Elizabeth Croft

Abstract: Learning from Demonstration (LfD) is a framework that allows lay users to easily program robots. However, the efficiency of robot learning and the robot's ability to generalize to task variations hinges upon the quality and quantity of the provided demonstrations. Our objective is to guide human teachers to furnish more effective demonstrations, thus facilitating efficient robot learning. To achie… ▽ More Learning from Demonstration (LfD) is a framework that allows lay users to easily program robots. However, the efficiency of robot learning and the robot's ability to generalize to task variations hinges upon the quality and quantity of the provided demonstrations. Our objective is to guide human teachers to furnish more effective demonstrations, thus facilitating efficient robot learning. To achieve this, we propose to use a measure of uncertainty, namely task-related information entropy, as a criterion for suggesting informative demonstration examples to human teachers to improve their teaching skills. In a conducted experiment (N=24), an augmented reality (AR)-based guidance system was employed to train novice users to produce additional demonstrations from areas with the highest entropy within the workspace. These novice users were trained for a few trials to teach the robot a generalizable task using a limited number of demonstrations. Subsequently, the users' performance after training was assessed first on the same task (retention) and then on a novel task (transfer) without guidance. The results indicated a substantial improvement in robot learning efficiency from the teacher's demonstrations, with an improvement of up to 198% observed on the novel task. Furthermore, the proposed approach was compared to a state-of-the-art heuristic rule and found to improve robot learning efficiency by 210% compared to the heuristic rule. △ Less

Submitted 19 October, 2023; originally announced October 2023.

arXiv:2310.03398 [pdf, other]

Interpolating between Clustering and Dimensionality Reduction with Gromov-Wasserstein

Authors: Hugues Van Assel, Cédric Vincent-Cuaz, Titouan Vayer, Rémi Flamary, Nicolas Courty

Abstract: We present a versatile adaptation of existing dimensionality reduction (DR) objectives, enabling the simultaneous reduction of both sample and feature sizes. Correspondances between input and embedding samples are computed through a semi-relaxed Gromov-Wasserstein optimal transport (OT) problem. When the embedding sample size matches that of the input, our model recovers classical popular DR model… ▽ More We present a versatile adaptation of existing dimensionality reduction (DR) objectives, enabling the simultaneous reduction of both sample and feature sizes. Correspondances between input and embedding samples are computed through a semi-relaxed Gromov-Wasserstein optimal transport (OT) problem. When the embedding sample size matches that of the input, our model recovers classical popular DR models. When the embedding's dimensionality is unconstrained, we show that the OT plan delivers a competitive hard clustering. We emphasize the importance of intermediate stages that blend DR and clustering for summarizing real data and apply our method to visualize datasets of images. △ Less

Submitted 5 October, 2023; originally announced October 2023.

arXiv:2310.02925 [pdf, other]

Optimal Transport with Adaptive Regularisation

Authors: Hugues Van Assel, Titouan Vayer, Remi Flamary, Nicolas Courty

Abstract: Regularising the primal formulation of optimal transport (OT) with a strictly convex term leads to enhanced numerical complexity and a denser transport plan. Many formulations impose a global constraint on the transport plan, for instance by relying on entropic regularisation. As it is more expensive to diffuse mass for outlier points compared to central ones, this typically results in a significa… ▽ More Regularising the primal formulation of optimal transport (OT) with a strictly convex term leads to enhanced numerical complexity and a denser transport plan. Many formulations impose a global constraint on the transport plan, for instance by relying on entropic regularisation. As it is more expensive to diffuse mass for outlier points compared to central ones, this typically results in a significant imbalance in the way mass is spread across the points. This can be detrimental for some applications where a minimum of smoothing is required per point. To remedy this, we introduce OT with Adaptive RegularIsation (OTARI), a new formulation of OT that imposes constraints on the mass going in or/and out of each point. We then showcase the benefits of this approach for domain adaptation. △ Less

Submitted 4 October, 2023; originally announced October 2023.

arXiv:2310.02570 [pdf, other]

Improving severity preservation of healthy-to-pathological voice conversion with global style tokens

Authors: Bence Mark Halpern, Wen-Chin Huang, Lester Phillip Violeta, R. J. J. H. van Son, Tomoki Toda

Abstract: In healthy-to-pathological voice conversion (H2P-VC), healthy speech is converted into pathological while preserving the identity. The paper improves on previous two-stage approach to H2P-VC where (1) speech is created first with the appropriate severity, (2) then the speaker identity of the voice is converted while preserving the severity of the voice. Specifically, we propose improvements to (2)… ▽ More In healthy-to-pathological voice conversion (H2P-VC), healthy speech is converted into pathological while preserving the identity. The paper improves on previous two-stage approach to H2P-VC where (1) speech is created first with the appropriate severity, (2) then the speaker identity of the voice is converted while preserving the severity of the voice. Specifically, we propose improvements to (2) by using phonetic posteriorgrams (PPG) and global style tokens (GST). Furthermore, we present a new dataset that contains parallel recordings of pathological and healthy speakers with the same identity which allows more precise evaluation. Listening tests by expert listeners show that the framework preserves severity of the source sample, while modelling target speaker's voice. We also show that (a) pathology impacts x-vectors but not all speaker information is lost, (b) choosing source speakers based on severity labels alone is insufficient. △ Less

Submitted 4 October, 2023; originally announced October 2023.

Comments: 7 pages, 3 figures, 5 tables. Accepted to IEEE Automatic Speech Recognition and Understanding Workshop 2023

ACM Class: I.2.7

arXiv:2310.01921 [pdf, other]

Characterizing the Inter-Core Qubit Traffic in Large-Scale Quantum Modular Architectures

Authors: Sahar Ben Rached, Isaac Lopez Agudo, Santiago Rodrigo, Medina Bandic, Sebastian Feld, Hans van Someren, Eduard Alarcón, Carmen G. Almudéver, Sergi Abadal

Abstract: Modular quantum processor architectures are envisioned as a promising solution for the scalability of quantum computing systems beyond the Noisy Intermediate Scale Quantum (NISQ) devices era. Based upon interconnecting tens to hundreds of quantum cores via a quantum intranet, this approach unravels the pressing limitations of densely qubit-packed monolithic processors, mainly by mitigating the req… ▽ More Modular quantum processor architectures are envisioned as a promising solution for the scalability of quantum computing systems beyond the Noisy Intermediate Scale Quantum (NISQ) devices era. Based upon interconnecting tens to hundreds of quantum cores via a quantum intranet, this approach unravels the pressing limitations of densely qubit-packed monolithic processors, mainly by mitigating the requirements of qubit control and enhancing qubit isolation, and therefore enables executing large-scale algorithms on quantum computers. In order to optimize such architectures, it is crucial to analyze the quantum state transfers occurring via inter-core communication networks, referred to as inter-core qubit traffic. This would also provide insights to improve the software and hardware stack for multi-core quantum computers. To this aim, we present a pioneering characterization of the spatio-temporal inter-core qubit traffic in large-scale circuits. The programs are executed on an all-to-all connected multi-core architecture that supports up to around 1000 qubits. We characterize the qubit traffic based on multiple performance metrics to assess the computational process and the communication overhead. Based on the showcased results, we conclude on the scalability of the presented algorithms, provide a set of guidelines to improve map** quantum circuits to multi-core processors, and lay the foundations of benchmarking large-scale multi-core architectures. △ Less

Submitted 3 October, 2023; originally announced October 2023.

Comments: 25 pages, 9 figures

arXiv:2310.00989 [pdf, ps, other]

doi 10.4204/EPTCS.390.4

On Two- and Three-valued Semantics for Impure Simplicial Complexes

Authors: Hans van Ditmarsch, Roman Kuznets, Rojo Randrianomentsoa

Abstract: Simplicial complexes are a convenient semantic primitive to reason about processes (agents) communicating with each other in synchronous and asynchronous computation. Impure simplicial complexes distinguish active processes from crashed ones, in other words, agents that are alive from agents that are dead. In order to rule out that dead agents reason about themselves and about other agents, three-… ▽ More Simplicial complexes are a convenient semantic primitive to reason about processes (agents) communicating with each other in synchronous and asynchronous computation. Impure simplicial complexes distinguish active processes from crashed ones, in other words, agents that are alive from agents that are dead. In order to rule out that dead agents reason about themselves and about other agents, three-valued epistemic semantics have been proposed where, in addition to the usual values true and false, the third value stands for undefined: the knowledge of dead agents is undefined and so are the propositional variables describing their local state. Other semantics for impure complexes are two-valued where a dead agent knows everything. Different choices in designing a semantics produce different three-valued semantics, and also different two-valued semantics. In this work, we categorize the available choices by discounting the bad ones, identifying the equivalent ones, and connecting the non-equivalent ones via a translation. The main result of the paper is identifying the main relevant distinction to be the number of truth values and bridging this difference by means of a novel embedding from three- into two-valued semantics. This translation also enables us to highlight quite fundamental modeling differences underpinning various two- and three-valued approaches in this area of combinatorial topology. In particular, pure complexes can be defined as those invariant under the translation. △ Less

Submitted 2 October, 2023; originally announced October 2023.

Comments: In Proceedings GandALF 2023, arXiv:2309.17318

Journal ref: EPTCS 390, 2023, pp. 50-66

arXiv:2310.00229 [pdf, other]

Consciousness-Inspired Spatio-Temporal Abstractions for Better Generalization in Reinforcement Learning

Authors: Mingde Zhao, Safa Alver, Harm van Seijen, Romain Laroche, Doina Precup, Yoshua Bengio

Abstract: Inspired by human conscious planning, we propose Skipper, a model-based reinforcement learning framework utilizing spatio-temporal abstractions to generalize better in novel situations. It automatically decomposes the given task into smaller, more manageable subtasks, and thus enables sparse decision-making and focused computation on the relevant parts of the environment. The decomposition relies… ▽ More Inspired by human conscious planning, we propose Skipper, a model-based reinforcement learning framework utilizing spatio-temporal abstractions to generalize better in novel situations. It automatically decomposes the given task into smaller, more manageable subtasks, and thus enables sparse decision-making and focused computation on the relevant parts of the environment. The decomposition relies on the extraction of an abstracted proxy problem represented as a directed graph, in which vertices and edges are learned end-to-end from hindsight. Our theoretical analyses provide performance guarantees under appropriate assumptions and establish where our approach is expected to be helpful. Generalization-focused experiments validate Skipper's significant advantage in zero-shot generalization, compared to some existing state-of-the-art hierarchical planning methods. △ Less

Submitted 16 March, 2024; v1 submitted 29 September, 2023; originally announced October 2023.

Comments: ICLR 2024 Camera-Ready

arXiv:2309.14092 [pdf, other]

From OCEL to DOCEL -- Datasets and Automated Transformation

Authors: Alexandre Goossens, Adrian Rebmann, Johannes De Smedt, Jan Vanthienen, Han van der Aa

Abstract: Object-centric event data represent processes from the point of view of all the involved object types. This perspective has gained interest in recent years as it supports the analysis of processes that previously could not be adequately captured, due to the lack of a clear case notion as well as an increasing amount of output data that needs to be stored. Although publicly available event logs are… ▽ More Object-centric event data represent processes from the point of view of all the involved object types. This perspective has gained interest in recent years as it supports the analysis of processes that previously could not be adequately captured, due to the lack of a clear case notion as well as an increasing amount of output data that needs to be stored. Although publicly available event logs are crucial artifacts for researchers to develop and evaluate novel process mining techniques, the currently available object-centric event logs have limitations in this regard. Specifically, they mainly focus on control-flow and rarely contain objects with attributes that change over time, even though this is not realistic, as the attribute values of objects can be altered during their lifecycle. This paper addresses this gap by providing two means of establishing object-centric datasets with dynamically evolving attributes. First, we provide event log generators, which allow researchers to generate customized, artificial logs with dynamic attributes in the recently proposed DOCEL format. Second, we propose and evaluate an algorithm to convert OCEL logs into DOCEL logs, which involves the detection of event attributes that capture evolving object information and the creation of dynamic attributes from these. Through these contributions, this paper supports the advancement of object-centric process analysis by providing researchers with new means to obtain relevant data to use during the development of new techniques. △ Less

Submitted 25 September, 2023; originally announced September 2023.

arXiv:2309.13994 [pdf, other]

Unsupervised Accent Adaptation Through Masked Language Model Correction Of Discrete Self-Supervised Speech Units

Authors: Jakob Poncelet, Hugo Van hamme

Abstract: Self-supervised pre-trained speech models have strongly improved speech recognition, yet they are still sensitive to domain shifts and accented or atypical speech. Many of these models rely on quantisation or clustering to learn discrete acoustic units. We propose to correct the discovered discrete units for accented speech back to a standard pronunciation in an unsupervised manner. A masked langu… ▽ More Self-supervised pre-trained speech models have strongly improved speech recognition, yet they are still sensitive to domain shifts and accented or atypical speech. Many of these models rely on quantisation or clustering to learn discrete acoustic units. We propose to correct the discovered discrete units for accented speech back to a standard pronunciation in an unsupervised manner. A masked language model is trained on discrete units from a standard accent and iteratively corrects an accented token sequence by masking unexpected cluster sequences and predicting their common variant. Small accent adapter blocks are inserted in the pre-trained model and fine-tuned by predicting the corrected clusters, which leads to an increased robustness of the pre-trained model towards a target accent, and this without supervision. We are able to improve a state-of-the-art HuBERT Large model on a downstream accented speech recognition task by altering the training regime with the proposed method. △ Less

Submitted 25 September, 2023; originally announced September 2023.

Comments: Submitted to ICASSP2024

arXiv:2309.05477 [pdf, other]

Learning Objective-Specific Active Learning Strategies with Attentive Neural Processes

Authors: Tim Bakker, Herke van Hoof, Max Welling

Abstract: Pool-based active learning (AL) is a promising technology for increasing data-efficiency of machine learning models. However, surveys show that performance of recent AL methods is very sensitive to the choice of dataset and training setting, making them unsuitable for general application. In order to tackle this problem, the field Learning Active Learning (LAL) suggests to learn the active learnin… ▽ More Pool-based active learning (AL) is a promising technology for increasing data-efficiency of machine learning models. However, surveys show that performance of recent AL methods is very sensitive to the choice of dataset and training setting, making them unsuitable for general application. In order to tackle this problem, the field Learning Active Learning (LAL) suggests to learn the active learning strategy itself, allowing it to adapt to the given setting. In this work, we propose a novel LAL method for classification that exploits symmetry and independence properties of the active learning problem with an Attentive Conditional Neural Process model. Our approach is based on learning from a myopic oracle, which gives our model the ability to adapt to non-standard objectives, such as those that do not equally weight the error on all data points. We experimentally verify that our Neural Process model outperforms a variety of baselines in these settings. Finally, our experiments show that our model exhibits a tendency towards improved stability to changing datasets. However, performance is sensitive to choice of classifier and more work is necessary to reduce the performance the gap with the myopic oracle and to improve scalability. We present our work as a proof-of-concept for LAL on nonstandard objectives and hope our analysis and modelling considerations inspire future LAL work. △ Less

Submitted 11 September, 2023; originally announced September 2023.

Comments: Accepted at ECML 2023

arXiv:2309.04607 [pdf]

Linking Symptom Inventories using Semantic Textual Similarity

Authors: Eamonn Kennedy, Shashank Vadlamani, Hannah M Lindsey, Kelly S Peterson, Kristen Dams OConnor, Kenton Murray, Ronak Agarwal, Houshang H Amiri, Raeda K Andersen, Talin Babikian, David A Baron, Erin D Bigler, Karen Caeyenberghs, Lisa Delano-Wood, Seth G Disner, Ekaterina Dobryakova, Blessen C Eapen, Rachel M Edelstein, Carrie Esopenko, Helen M Genova, Elbert Geuze, Naomi J Goodrich-Hunsaker, Jordan Grafman, Asta K Haberg, Cooper B Hodges , et al. (57 additional authors not shown)

Abstract: An extensive library of symptom inventories has been developed over time to measure clinical symptoms, but this variety has led to several long standing issues. Most notably, results drawn from different settings and studies are not comparable, which limits reproducibility. Here, we present an artificial intelligence (AI) approach using semantic textual similarity (STS) to link symptoms and scores… ▽ More An extensive library of symptom inventories has been developed over time to measure clinical symptoms, but this variety has led to several long standing issues. Most notably, results drawn from different settings and studies are not comparable, which limits reproducibility. Here, we present an artificial intelligence (AI) approach using semantic textual similarity (STS) to link symptoms and scores across previously incongruous symptom inventories. We tested the ability of four pre-trained STS models to screen thousands of symptom description pairs for related content - a challenging task typically requiring expert panels. Models were tasked to predict symptom severity across four different inventories for 6,607 participants drawn from 16 international data sources. The STS approach achieved 74.8% accuracy across five tasks, outperforming other models tested. This work suggests that incorporating contextual, semantic information can assist expert decision-making processes, yielding gains for both general and disease-specific clinical assessment. △ Less

Submitted 8 September, 2023; originally announced September 2023.

arXiv:2309.00900 [pdf, other]

Large Process Models: Business Process Management in the Age of Generative AI

Authors: Timotheus Kampik, Christian Warmuth, Adrian Rebmann, Ron Agam, Lukas N. P. Egger, Andreas Gerber, Johannes Hoffart, Jonas Kolk, Philipp Herzig, Gero Decker, Han van der Aa, Artem Polyvyanyy, Stefanie Rinderle-Ma, Ingo Weber, Matthias Weidlich

Abstract: The continued success of Large Language Models (LLMs) and other generative artificial intelligence approaches highlights the advantages that large information corpora can have over rigidly defined symbolic models, but also serves as a proof-point of the challenges that purely statistics-based approaches have in terms of safety and trustworthiness. As a framework for contextualizing the potential,… ▽ More The continued success of Large Language Models (LLMs) and other generative artificial intelligence approaches highlights the advantages that large information corpora can have over rigidly defined symbolic models, but also serves as a proof-point of the challenges that purely statistics-based approaches have in terms of safety and trustworthiness. As a framework for contextualizing the potential, as well as the limitations of LLMs and other foundation model-based technologies, we propose the concept of a Large Process Model (LPM) that combines the correlation power of LLMs with the analytical precision and reliability of knowledge-based systems and automated reasoning approaches. LPMs are envisioned to directly utilize the wealth of process management experience that experts have accumulated, as well as process performance data of organizations with diverse characteristics, e.g., regarding size, region, or industry. In this vision, the proposed LPM would allow organizations to receive context-specific (tailored) process and other business models, analytical deep-dives, and improvement recommendations. As such, they would allow to substantially decrease the time and effort required for business transformation, while also allowing for deeper, more impactful, and more actionable insights than previously possible. We argue that implementing an LPM is feasible, but also highlight limitations and research challenges that need to be solved to implement particular aspects of the LPM vision. △ Less

Submitted 11 September, 2023; v1 submitted 2 September, 2023; originally announced September 2023.

Showing 1–50 of 503 results for author: Van, H