Evaluating the Ability of LLMs to Solve Semantics-Aware Process Mining Tasks

Authors: Adrian Rebmann, Fabian David Schmidt, Goran Glavaš, Han van der Aa

Abstract: The process mining community has recently recognized the potential of large language models (LLMs) for tackling various process mining tasks. Initial studies report the capability of LLMs to support process analysis and even, to some extent, that they are able to reason about how processes work. This latter property suggests that LLMs could also be used to tackle process mining tasks that benefit from an understanding of process behavior. Examples of such tasks include (semantic) anomaly detection and next activity prediction, which both involve considerations of the meaning of activities and their inter-relations. In this paper, we investigate the capabilities of LLMs to tackle such semantics-aware process mining tasks. Furthermore, whereas most works on the intersection of LLMs and process mining only focus on testing these models out of the box, we provide a more principled investigation of the utility of LLMs for process mining, including their ability to obtain process mining knowledge post-hoc by means of in-context learning and supervised fine-tuning. Concretely, we define three process mining tasks that benefit from an understanding of process semantics and provide extensive benchmarking datasets for each of them. Our evaluation experiments reveal that (1) LLMs fail to solve challenging process mining tasks out of the box and when provided only a handful of in-context examples, (2) but they yield strong performance when fine-tuned for these tasks, consistently surpassing smaller, encoder-based language models.

Submitted 2 July, 2024; originally announced July 2024.

Comments: Submitted to ICPM

arXiv:2406.12739 [pdf, other]

Self-Distillation for Model Stacking Unlocks Cross-Lingual NLU in 200+ Languages

Authors: Fabian David Schmidt, Philipp Borchert, Ivan Vulić, Goran Glavaš

Abstract: LLMs have become a go-to solution not just for text generation, but also for natural language understanding (NLU) tasks. Acquiring extensive knowledge through language modeling on web-scale corpora, they excel on English NLU, yet struggle to extend their NLU capabilities to underrepresented languages. In contrast, machine translation models (MT) produce excellent multilingual representations, resulting in strong translation performance even for low-resource languages. MT encoders, however, lack the knowledge necessary for comprehensive NLU that LLMs obtain through language modeling training on immense corpora. In this work, we get the best of both worlds by integrating MT encoders directly into LLM backbones via sample-efficient self-distillation. The resulting MT-LLMs preserve the inherent multilingual representational alignment from the MT encoder, allowing lower-resource languages to tap into the rich knowledge embedded in English-centric LLMs. Merging the MT encoder and LLM in a single model, we mitigate the propagation of translation errors and inference overhead of MT decoding inherent to discrete translation-based cross-lingual transfer (e.g., translate-test). Evaluation spanning three prominent NLU tasks and 127 predominantly low-resource languages renders MT-LLMs highly effective in cross-lingual transfer. MT-LLMs substantially and consistently outperform translate-test based on the same MT model, showing that we truly unlock multilingual language understanding for LLMs.

Submitted 18 June, 2024; originally announced June 2024.

arXiv:2406.12634 [pdf, other]

News Without Borders: Domain Adaptation of Multilingual Sentence Embeddings for Cross-lingual News Recommendation

Authors: Andreea Iana, Fabian David Schmidt, Goran Glavaš, Heiko Paulheim

Abstract: Rapidly growing numbers of multilingual news consumers pose an increasing challenge to news recommender systems in terms of providing customized recommendations. First, existing neural news recommenders, even when powered by multilingual language models (LMs), suffer substantial performance losses in zero-shot cross-lingual transfer (ZS-XLT). Second, the current paradigm of fine-tuning the backbone LM of a neural recommender on task-specific data is computationally expensive and infeasible in few-shot recommendation and cold-start setups, where data is scarce or completely unavailable. In this work, we propose a news-adapted sentence encoder (NaSE), domain-specialized from a pretrained massively multilingual sentence encoder (SE). To this end, we construct and leverage PolyNews and PolyNewsParallel, two multilingual news-specific corpora. With the news-adapted multilingual SE in place, we test the effectiveness of (i.e., question the need for) supervised fine-tuning for news recommendation, and propose a simple and strong baseline based on (i) frozen NaSE embeddings and (ii) late click-behavior fusion. We show that NaSE achieves state-of-the-art performance in ZS-XLT in true cold-start and few-shot news recommendation.

Submitted 18 June, 2024; originally announced June 2024.

ACM Class: I.2.7; H.3.3

arXiv:2404.19319 [pdf, other]

Knowledge Distillation vs. Pretraining from Scratch under a Fixed (Computation) Budget

Authors: Minh Duc Bui, Fabian David Schmidt, Goran Glavaš, Katharina von der Wense

Abstract: Compared to standard language model (LM) pretraining (i.e., from scratch), Knowledge Distillation (KD) entails an additional forward pass through a teacher model that is typically substantially larger than the target student model. As such, KD in LM pretraining materially slows down throughput of pretraining instances vis-a-vis pretraining from scratch. Scaling laws of LM pretraining suggest that smaller models can close the gap to larger counterparts if trained on more data (i.e., processing more tokens), and under a fixed computation budget, smaller models are able to process more data than larger models. We thus hypothesize that KD might, in fact, be suboptimal to pretraining from scratch for obtaining smaller LMs, when appropriately accounting for the compute budget. To test this, we compare pretraining from scratch against several KD strategies for masked language modeling (MLM) in a fair experimental setup, with respect to amount of computation as well as pretraining data. Downstream results on GLUE, however, do not confirm our hypothesis: while pretraining from scratch performs comparably to ordinary KD under a fixed computation budget, more sophisticated KD strategies, namely TinyBERT (Jiao et al., 2020) and MiniLM (Wang et al., 2023), outperform it by a notable margin. We further find that KD yields larger gains over pretraining from scratch when the data must be repeated under the fixed computation budget.

Submitted 30 April, 2024; originally announced April 2024.

Comments: Accepted to the 5th Workshop on Insights from Negative Results in NLP at NAACL 2024

arXiv:2310.10532 [pdf, other]

One For All & All For One: Bypassing Hyperparameter Tuning with Model Averaging For Cross-Lingual Transfer

Authors: Fabian David Schmidt, Ivan Vulić, Goran Glavaš

Abstract: Multilingual language models enable zero-shot cross-lingual transfer (ZS-XLT): fine-tuned on sizable source-language task data, they perform the task in target languages without labeled instances. The effectiveness of ZS-XLT hinges on the linguistic proximity between languages and the amount of pretraining data for a language. Because of this, model selection based on source-language validation is unreliable: it picks model snapshots with suboptimal target-language performance. As a remedy, some work optimizes ZS-XLT by extensively tuning hyperparameters: the follow-up work then routinely struggles to replicate the original results. Other work searches over narrower hyperparameter grids, reporting substantially lower performance. In this work, we therefore propose an unsupervised evaluation protocol for ZS-XLT that decouples performance maximization from hyperparameter tuning. As a robust and more transparent alternative to extensive hyperparameter tuning, we propose to accumulatively average snapshots from different runs into a single model. We run broad ZS-XLT experiments on both higher-level semantic tasks (NLI, extractive QA) and a lower-level token classification task (NER) and find that conventional model selection based on source-language validation quickly plateaus to suboptimal ZS-XLT performance. On the other hand, our accumulative run-by-run averaging of models trained with different hyperparameters boosts ZS-XLT performance and closely correlates with "oracle" ZS-XLT, i.e., model selection based on target-language validation performance.

Submitted 16 October, 2023; originally announced October 2023.

Comments: Accepted to Findings of EMNLP 2023

arXiv:2305.16834 [pdf, other]

Free Lunch: Robust Cross-Lingual Transfer via Model Checkpoint Averaging

Authors: Fabian David Schmidt, Ivan Vulić, Goran Glavaš

Abstract: Massively multilingual language models have displayed strong performance in zero-shot (ZS-XLT) and few-shot (FS-XLT) cross-lingual transfer setups, where models fine-tuned on task data in a source language are transferred without any or with only a few annotated instances to the target language(s). However, current work typically overestimates model performance as fine-tuned models are frequently evaluated at model checkpoints that generalize best to validation instances in the target languages. This effectively violates the main assumptions of "true" ZS-XLT and FS-XLT. Such XLT setups require robust methods that do not depend on labeled target language data for validation and model selection. In this work, aiming to improve the robustness of "true" ZS-XLT and FS-XLT, we propose a simple and effective method that averages different checkpoints (i.e., model snapshots) during task fine-tuning. We conduct exhaustive ZS-XLT and FS-XLT experiments across higher-level semantic tasks (NLI, extractive QA) and lower-level token classification tasks (NER, POS). The results indicate that averaging model checkpoints yields systematic and consistent performance gains across diverse target languages in all tasks. Importantly, it simultaneously substantially desensitizes XLT to varying hyperparameter choices in the absence of target language validation. We also show that checkpoint averaging benefits performance when further combined with run averaging (i.e., averaging the parameters of models fine-tuned over independent runs).

Submitted 26 May, 2023; originally announced May 2023.

Comments: Accepted to appear in the Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics

arXiv:2210.06763 [pdf, other]

doi 10.1093/mnras/stad290

Dust survival rates in clumps passing through the Cas A reverse shock -- II. The impact of magnetic fields

Authors: Florian Kirchschlager, Franziska D. Schmidt, M. J. Barlow, Ilse De Looze, Nina S. Sartorio

Abstract: Dust grains form in the clumpy ejecta of core-collapse supernovae where they are subject to the reverse shock, which is able to disrupt the clumps and destroy the grains. Important dust destruction processes include thermal and kinetic sputtering as well as fragmentation and grain vaporization. In the present study, we focus on the effect of magnetic fields on the destruction processes. We have performed magneto-hydrodynamical simulations using AstroBEAR to model a shock wave interacting with an ejecta clump. The dust transport and destruction fractions are computed using our post-processing code Paperboats in which the acceleration of grains due to the magnetic field and a procedure that allows partial grain vaporization have been newly implemented. For the oxygen-rich supernova remnant Cassiopeia A we found a significantly lower dust survival rate when magnetic fields are aligned perpendicular to the shock direction compared to the non-magnetic case. For a parallel field alignment, the destruction is also enhanced but at a lower level. The survival fractions depend sensitively on the gas density contrast between the clump and the ambient medium and on the grain sizes. For a low-density contrast of $100$, e.g., $5\,$nm silicate grains are completely destroyed while the survival fraction of $1\,μ$m grains is $86\,$per cent. For a high-density contrast of $1000$, $95\,$per cent of the $5\,$nm grains survive while the survival fraction of $1\,μ$m grains is $26\,$per cent. Alternative clump sizes or dust materials (carbon) have non-negligible effects on the survival rate but have a lower impact compared to density contrast, magnetic field strength, and grain size.

Submitted 16 February, 2023; v1 submitted 13 October, 2022; originally announced October 2022.

Comments: Accepted by MNRAS. Author accepted manuscript. Accepted on 23/01/2023. 24 pages, 21 Figures

arXiv:2003.03380 [pdf, other]

doi 10.3847/1538-4357/ab7db8

Silicate grain growth due to ion trapping in oxygen-rich supernova remnants like Cassiopeia A

Authors: Florian Kirchschlager, M. J. Barlow, Franziska D. Schmidt

Abstract: Core-collapse supernovae can condense large masses of dust post-explosion. However, sputtering and grain-grain collisions during the subsequent passage of the dust through the reverse shock can potentially destroy a significant fraction of the newly formed dust before it can reach the interstellar medium. Here we show that in oxygen-rich supernova remnants like Cassiopeia A the penetration and trapping within silicate grains of the same impinging ions of oxygen, silicon and magnesium that are responsible for grain surface sputtering can significantly reduce the net loss of grain material. We model conditions representative of dusty clumps (density contrast $χ=100$) passing through the reverse shock in the oxygen-rich Cassiopeia A remnant and find that, compared to cases where the effect is neglected, as well as facilitating the formation of grains larger than those that had originally condensed, ion trapping increases the surviving masses of silicate dust by factors of up to two to four, depending on initial grain radii. For higher density contrasts ($χ\gtrsim180$), we find that the effect of gas accretion on the surface of dust grains surpasses ion trapping, and the survival rate increases to ${\sim}55 \%$ of the initial dust mass for $χ=256$.

Submitted 10 March, 2020; v1 submitted 6 March, 2020; originally announced March 2020.

Comments: Accepted by ApJ. Author accepted manuscript. Accepted on 06/03/2020. Deposited on 06/03/2020. 11 pages

arXiv:1909.09068 [pdf, other]

Dust destruction by the reverse shock in the clumpy supernova remnant Cassiopeia A based on hydrodynamic simulations

Authors: Florian Kirchschlager, Franziska D. Schmidt, M. J. Barlow, Erica L. Fogerty, Antonia Bevan, Felix D. Priestley

Abstract: Observations of the ejecta of core-collapse supernovae have shown that dust grains form in over-dense gas clumps in the expanding ejecta. The clumps are later subject to the passage of the reverse shock and a significant amount of the newly formed dust material can be destroyed due to the high temperatures and high velocities in the post-shock gas. To determine dust survival rates, we have performed a set of hydrodynamic simulations using the grid-based code AstroBEAR in order to model a shock wave interacting with a clump of gas and dust. Afterwards, dust motions and dust destruction rates are computed using our newly developed external, post-processing code Paperboats, which includes gas and plasma drag, grain charging, kinematic and thermal sputtering as well as grain-grain collisions. We have determined dust survival rates for the oxygen-rich supernova remnant Cassiopeia A as a function of initial grain sizes, dust materials and clump gas densities.

Submitted 19 September, 2019; originally announced September 2019.

Comments: Conference proceeding

arXiv:1908.10875 [pdf, other]

doi 10.1093/mnras/stz2399

Dust survival rates in clumps passing through the Cas A reverse shock I: results for a range of clump densities

Authors: Florian Kirchschlager, Franziska D. Schmidt, M. J. Barlow, Erica L. Fogerty, Antonia Bevan, Felix D. Priestley

Abstract: The reverse shock in the ejecta of core-collapse supernovae is potentially able to destroy newly formed dust material. In order to determine dust survival rates, we have performed a set of hydrodynamic simulations using the grid-based code AstroBEAR in order to model a shock wave interacting with clumpy supernova ejecta. Dust motions and destruction rates were computed using our newly developed external, post-processing code Paperboats, which includes gas drag, grain charging, sputtering and grain-grain collisions. We have determined dust destruction rates for the oxygen-rich supernova remnant Cassiopeia A as a function of initial grain sizes and clump gas density. We found that up to 30 % of the carbon dust mass is able to survive the passage of the reverse shock if the initial grain size distribution is narrow with radii around ~10 - 50 nm for high gas densities, or with radii around ~0.5 - 1.5 $μ$m for low and medium gas densities. Silicate grains with initial radii around 10 - 30 nm show survival rates of up to 40 % for medium and high density contrasts, while silicate material with micron sized distributions is mostly destroyed. For both materials, the surviving dust mass is rearranged into a new size distribution that can be approximated by two components: a power-law distribution of small grains and a log-normal distribution of grains having the same size range as the initial distribution. Our results show that grain-grain collisions and sputtering are synergistic and that grain-grain collisions can play a crucial role in determining the surviving dust budget in supernova remnants.

Submitted 28 August, 2019; originally announced August 2019.

Comments: Accepted by MNRAS. Author accepted manuscript. Accepted on 28/08/2019. Deposited on 28/08/19. 34 pages

Showing 1–10 of 10 results for author: Schmidt, F D