-
Evaluating the Ability of LLMs to Solve Semantics-Aware Process Mining Tasks
Authors:
Adrian Rebmann,
Fabian David Schmidt,
Goran Glavaš,
Han van der Aa
Abstract:
The process mining community has recently recognized the potential of large language models (LLMs) for tackling various process mining tasks. Initial studies report the capability of LLMs to support process analysis and even, to some extent, that they are able to reason about how processes work. This latter property suggests that LLMs could also be used to tackle process mining tasks that benefit from an understanding of process behavior. Examples of such tasks include (semantic) anomaly detection and next activity prediction, which both involve considerations of the meaning of activities and their inter-relations. In this paper, we investigate the capabilities of LLMs to tackle such semantics-aware process mining tasks. Furthermore, whereas most works on the intersection of LLMs and process mining only focus on testing these models out of the box, we provide a more principled investigation of the utility of LLMs for process mining, including their ability to obtain process mining knowledge post-hoc by means of in-context learning and supervised fine-tuning. Concretely, we define three process mining tasks that benefit from an understanding of process semantics and provide extensive benchmarking datasets for each of them. Our evaluation experiments reveal that (1) LLMs fail to solve challenging process mining tasks out of the box and when provided only a handful of in-context examples, (2) but they yield strong performance when fine-tuned for these tasks, consistently surpassing smaller, encoder-based language models.
Submitted 2 July, 2024;
originally announced July 2024.
-
Self-Distillation for Model Stacking Unlocks Cross-Lingual NLU in 200+ Languages
Authors:
Fabian David Schmidt,
Philipp Borchert,
Ivan Vulić,
Goran Glavaš
Abstract:
LLMs have become a go-to solution not just for text generation, but also for natural language understanding (NLU) tasks. Acquiring extensive knowledge through language modeling on web-scale corpora, they excel on English NLU, yet struggle to extend their NLU capabilities to underrepresented languages. In contrast, machine translation models (MT) produce excellent multilingual representations, resulting in strong translation performance even for low-resource languages. MT encoders, however, lack the knowledge necessary for comprehensive NLU that LLMs obtain through language modeling training on immense corpora. In this work, we get the best of both worlds by integrating MT encoders directly into LLM backbones via sample-efficient self-distillation. The resulting MT-LLMs preserve the inherent multilingual representational alignment from the MT encoder, allowing lower-resource languages to tap into the rich knowledge embedded in English-centric LLMs. Merging the MT encoder and LLM in a single model, we mitigate the propagation of translation errors and inference overhead of MT decoding inherent to discrete translation-based cross-lingual transfer (e.g., translate-test). Evaluation spanning three prominent NLU tasks and 127 predominantly low-resource languages renders MT-LLMs highly effective in cross-lingual transfer. MT-LLMs substantially and consistently outperform translate-test based on the same MT model, showing that we truly unlock multilingual language understanding for LLMs.
Submitted 18 June, 2024;
originally announced June 2024.
-
News Without Borders: Domain Adaptation of Multilingual Sentence Embeddings for Cross-lingual News Recommendation
Authors:
Andreea Iana,
Fabian David Schmidt,
Goran Glavaš,
Heiko Paulheim
Abstract:
Rapidly growing numbers of multilingual news consumers pose an increasing challenge to news recommender systems in terms of providing customized recommendations. First, existing neural news recommenders, even when powered by multilingual language models (LMs), suffer substantial performance losses in zero-shot cross-lingual transfer (ZS-XLT). Second, the current paradigm of fine-tuning the backbone LM of a neural recommender on task-specific data is computationally expensive and infeasible in few-shot recommendation and cold-start setups, where data is scarce or completely unavailable. In this work, we propose a news-adapted sentence encoder (NaSE), domain-specialized from a pretrained massively multilingual sentence encoder (SE). To this end, we construct and leverage PolyNews and PolyNewsParallel, two multilingual news-specific corpora. With the news-adapted multilingual SE in place, we test the effectiveness of (i.e., question the need for) supervised fine-tuning for news recommendation, and propose a simple and strong baseline based on (i) frozen NaSE embeddings and (ii) late click-behavior fusion. We show that NaSE achieves state-of-the-art performance in ZS-XLT in true cold-start and few-shot news recommendation.
Submitted 18 June, 2024;
originally announced June 2024.
-
Knowledge Distillation vs. Pretraining from Scratch under a Fixed (Computation) Budget
Authors:
Minh Duc Bui,
Fabian David Schmidt,
Goran Glavaš,
Katharina von der Wense
Abstract:
Compared to standard language model (LM) pretraining (i.e., from scratch), Knowledge Distillation (KD) entails an additional forward pass through a teacher model that is typically substantially larger than the target student model. As such, KD in LM pretraining materially slows down throughput of pretraining instances vis-a-vis pretraining from scratch. Scaling laws of LM pretraining suggest that smaller models can close the gap to larger counterparts if trained on more data (i.e., processing more tokens), and under a fixed computation budget, smaller models are able to process more data than larger models. We thus hypothesize that KD might, in fact, be suboptimal to pretraining from scratch for obtaining smaller LMs, when appropriately accounting for the compute budget. To test this, we compare pretraining from scratch against several KD strategies for masked language modeling (MLM) in a fair experimental setup, with respect to amount of computation as well as pretraining data. Downstream results on GLUE, however, do not confirm our hypothesis: while pretraining from scratch performs comparably to ordinary KD under a fixed computation budget, more sophisticated KD strategies, namely TinyBERT (Jiao et al., 2020) and MiniLM (Wang et al., 2023), outperform it by a notable margin. We further find that KD yields larger gains over pretraining from scratch when the data must be repeated under the fixed computation budget.
Submitted 30 April, 2024;
originally announced April 2024.
-
One For All & All For One: Bypassing Hyperparameter Tuning with Model Averaging For Cross-Lingual Transfer
Authors:
Fabian David Schmidt,
Ivan Vulić,
Goran Glavaš
Abstract:
Multilingual language models enable zero-shot cross-lingual transfer (ZS-XLT): fine-tuned on sizable source-language task data, they perform the task in target languages without labeled instances. The effectiveness of ZS-XLT hinges on the linguistic proximity between languages and the amount of pretraining data for a language. Because of this, model selection based on source-language validation is unreliable: it picks model snapshots with suboptimal target-language performance. As a remedy, some work optimizes ZS-XLT by extensively tuning hyperparameters: the follow-up work then routinely struggles to replicate the original results. Other work searches over narrower hyperparameter grids, reporting substantially lower performance. In this work, we therefore propose an unsupervised evaluation protocol for ZS-XLT that decouples performance maximization from hyperparameter tuning. As a robust and more transparent alternative to extensive hyperparameter tuning, we propose to accumulatively average snapshots from different runs into a single model. We run broad ZS-XLT experiments on both higher-level semantic tasks (NLI, extractive QA) and a lower-level token classification task (NER) and find that conventional model selection based on source-language validation quickly plateaus to suboptimal ZS-XLT performance. On the other hand, our accumulative run-by-run averaging of models trained with different hyperparameters boosts ZS-XLT performance and closely correlates with "oracle" ZS-XLT, i.e., model selection based on target-language validation performance.
Submitted 16 October, 2023;
originally announced October 2023.
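The accumulative run-by-run averaging mentioned in the abstract above can be illustrated with a running mean over parameter sets. This is only a minimal sketch: the abstract does not say which snapshots are taken or in what order runs are merged, so the plain-dictionary parameter format and the function name `accumulate_run_average` are hypothetical choices made for illustration, not the authors' implementation.

```python
from typing import Dict, Iterable, Iterator, List

# Hypothetical parameter format: parameter name -> flat list of weights.
Params = Dict[str, List[float]]


def accumulate_run_average(runs: Iterable[Params]) -> Iterator[Params]:
    """Yield the running (cumulative) average after each additional run."""
    running: Params = {}
    for k, params in enumerate(runs, start=1):
        if k == 1:
            # The first run initializes the average.
            running = {name: list(values) for name, values in params.items()}
        else:
            # Incremental mean update: avg_k = avg_{k-1} + (x_k - avg_{k-1}) / k
            for name, values in params.items():
                running[name] = [
                    avg + (x - avg) / k
                    for avg, x in zip(running[name], values)
                ]
        # Yield a copy so that later updates do not alter earlier results.
        yield {name: list(values) for name, values in running.items()}


# Toy stand-ins for models fine-tuned with different hyperparameters.
runs = [{"w": [2.0, 0.0]}, {"w": [4.0, 2.0]}, {"w": [6.0, 4.0]}]
history = list(accumulate_run_average(runs))
print(history[-1])
```

Each yielded dictionary corresponds to one candidate model that could be evaluated after adding one more run to the average.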
-
Free Lunch: Robust Cross-Lingual Transfer via Model Checkpoint Averaging
Authors:
Fabian David Schmidt,
Ivan Vulić,
Goran Glavaš
Abstract:
Massively multilingual language models have displayed strong performance in zero-shot (ZS-XLT) and few-shot (FS-XLT) cross-lingual transfer setups, where models fine-tuned on task data in a source language are transferred without any or with only a few annotated instances to the target language(s). However, current work typically overestimates model performance as fine-tuned models are frequently evaluated at model checkpoints that generalize best to validation instances in the target languages. This effectively violates the main assumptions of "true" ZS-XLT and FS-XLT. Such XLT setups require robust methods that do not depend on labeled target language data for validation and model selection. In this work, aiming to improve the robustness of "true" ZS-XLT and FS-XLT, we propose a simple and effective method that averages different checkpoints (i.e., model snapshots) during task fine-tuning. We conduct exhaustive ZS-XLT and FS-XLT experiments across higher-level semantic tasks (NLI, extractive QA) and lower-level token classification tasks (NER, POS). The results indicate that averaging model checkpoints yields systematic and consistent performance gains across diverse target languages in all tasks. Importantly, it simultaneously substantially desensitizes XLT to varying hyperparameter choices in the absence of target language validation. We also show that checkpoint averaging benefits performance when further combined with run averaging (i.e., averaging the parameters of models fine-tuned over independent runs).
Submitted 26 May, 2023;
originally announced May 2023.
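The checkpoint averaging described in the abstract above amounts to taking an element-wise mean of parameters across snapshots saved during fine-tuning. The sketch below assumes uniform weighting and a plain-dictionary parameter format; the function name `average_checkpoints` and the toy values are hypothetical, and the abstract does not specify how often checkpoints are saved or which ones are included.

```python
from typing import Dict, List

# Hypothetical parameter format: parameter name -> flat list of weights.
Params = Dict[str, List[float]]


def average_checkpoints(checkpoints: List[Params]) -> Params:
    """Return the uniform element-wise mean of the given checkpoints."""
    if not checkpoints:
        raise ValueError("need at least one checkpoint")
    n = len(checkpoints)
    averaged: Params = {}
    for name in checkpoints[0]:
        # Group the i-th weight of this parameter across all checkpoints.
        columns = zip(*(ckpt[name] for ckpt in checkpoints))
        averaged[name] = [sum(values) / n for values in columns]
    return averaged


# Toy snapshots standing in for checkpoints from a single fine-tuning run.
snapshots = [
    {"w": [1.0, 2.0], "b": [0.0]},
    {"w": [3.0, 4.0], "b": [1.0]},
    {"w": [5.0, 6.0], "b": [2.0]},
]
avg = average_checkpoints(snapshots)
print(avg)
```

The same function could be applied to the final models of independent runs, which corresponds to the run averaging the abstract mentions as a further combination.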
-
Dust survival rates in clumps passing through the Cas A reverse shock -- II. The impact of magnetic fields
Authors:
Florian Kirchschlager,
Franziska D. Schmidt,
M. J. Barlow,
Ilse De Looze,
Nina S. Sartorio
Abstract:
Dust grains form in the clumpy ejecta of core-collapse supernovae where they are subject to the reverse shock, which is able to disrupt the clumps and destroy the grains. Important dust destruction processes include thermal and kinetic sputtering as well as fragmentation and grain vaporization. In the present study, we focus on the effect of magnetic fields on the destruction processes. We have performed magneto-hydrodynamical simulations using AstroBEAR to model a shock wave interacting with an ejecta clump. The dust transport and destruction fractions are computed using our post-processing code Paperboats in which the acceleration of grains due to the magnetic field and a procedure that allows partial grain vaporization have been newly implemented. For the oxygen-rich supernova remnant Cassiopeia A we found a significantly lower dust survival rate when magnetic fields are aligned perpendicular to the shock direction compared to the non-magnetic case. For a parallel field alignment, the destruction is also enhanced but at a lower level. The survival fractions depend sensitively on the gas density contrast between the clump and the ambient medium and on the grain sizes. For a low-density contrast of $100$, e.g., $5\,$nm silicate grains are completely destroyed while the survival fraction of $1\,μ$m grains is $86\,$per cent. For a high-density contrast of $1000$, $95\,$per cent of the $5\,$nm grains survive while the survival fraction of $1\,μ$m grains is $26\,$per cent. Alternative clump sizes or dust materials (carbon) have non-negligible effects on the survival rate but have a lower impact compared to density contrast, magnetic field strength, and grain size.
Submitted 16 February, 2023; v1 submitted 13 October, 2022;
originally announced October 2022.
-
Silicate grain growth due to ion trapping in oxygen-rich supernova remnants like Cassiopeia A
Authors:
Florian Kirchschlager,
M. J. Barlow,
Franziska D. Schmidt
Abstract:
Core-collapse supernovae can condense large masses of dust post-explosion. However, sputtering and grain-grain collisions during the subsequent passage of the dust through the reverse shock can potentially destroy a significant fraction of the newly formed dust before it can reach the interstellar medium. Here we show that in oxygen-rich supernova remnants like Cassiopeia A the penetration and trapping within silicate grains of the same impinging ions of oxygen, silicon and magnesium that are responsible for grain surface sputtering can significantly reduce the net loss of grain material. We model conditions representative of dusty clumps (density contrast $χ=100$) passing through the reverse shock in the oxygen-rich Cassiopeia A remnant and find that, compared to cases where the effect is neglected, as well as facilitating the formation of grains larger than those that had originally condensed, ion trapping increases the surviving masses of silicate dust by factors of up to two to four, depending on initial grain radii. For higher density contrasts ($χ\gtrsim180$), we find that the effect of gas accretion on the surface of dust grains surpasses ion trapping, and the survival rate increases to ${\sim}55 \%$ of the initial dust mass for $χ=256$.
Submitted 10 March, 2020; v1 submitted 6 March, 2020;
originally announced March 2020.
-
Dust destruction by the reverse shock in the clumpy supernova remnant Cassiopeia A based on hydrodynamic simulations
Authors:
Florian Kirchschlager,
Franziska D. Schmidt,
M. J. Barlow,
Erica L. Fogerty,
Antonia Bevan,
Felix D. Priestley
Abstract:
Observations of the ejecta of core-collapse supernovae have shown that dust grains form in over-dense gas clumps in the expanding ejecta. The clumps are later subject to the passage of the reverse shock and a significant amount of the newly formed dust material can be destroyed due to the high temperatures and high velocities in the post-shock gas. To determine dust survival rates, we have performed a set of hydrodynamic simulations using the grid-based code AstroBEAR in order to model a shock wave interacting with a clump of gas and dust. Afterwards, dust motions and dust destruction rates are computed using our newly developed external, post-processing code Paperboats, which includes gas and plasma drag, grain charging, kinematic and thermal sputtering as well as grain-grain collisions. We have determined dust survival rates for the oxygen-rich supernova remnant Cassiopeia A as a function of initial grain sizes, dust materials and clump gas densities.
Submitted 19 September, 2019;
originally announced September 2019.
-
Dust survival rates in clumps passing through the Cas A reverse shock I: results for a range of clump densities
Authors:
Florian Kirchschlager,
Franziska D. Schmidt,
M. J. Barlow,
Erica L. Fogerty,
Antonia Bevan,
Felix D. Priestley
Abstract:
The reverse shock in the ejecta of core-collapse supernovae is potentially able to destroy newly formed dust material. In order to determine dust survival rates, we have performed a set of hydrodynamic simulations using the grid-based code AstroBEAR in order to model a shock wave interacting with clumpy supernova ejecta. Dust motions and destruction rates were computed using our newly developed external, post-processing code Paperboats, which includes gas drag, grain charging, sputtering and grain-grain collisions. We have determined dust destruction rates for the oxygen-rich supernova remnant Cassiopeia A as a function of initial grain sizes and clump gas density. We found that up to 30 % of the carbon dust mass is able to survive the passage of the reverse shock if the initial grain size distribution is narrow with radii around ~10 - 50 nm for high gas densities, or with radii around ~0.5 - 1.5 $μ$m for low and medium gas densities. Silicate grains with initial radii around 10 - 30 nm show survival rates of up to 40 % for medium and high density contrasts, while silicate material with micron sized distributions is mostly destroyed. For both materials, the surviving dust mass is rearranged into a new size distribution that can be approximated by two components: a power-law distribution of small grains and a log-normal distribution of grains having the same size range as the initial distribution. Our results show that grain-grain collisions and sputtering are synergistic and that grain-grain collisions can play a crucial role in determining the surviving dust budget in supernova remnants.
Submitted 28 August, 2019;
originally announced August 2019.