Search | arXiv e-print repository

arXiv:2407.02006 [pdf, other]

The three-loop single-mass heavy flavor corrections to deep-inelastic scattering

Authors: J. Ablinger, A. Behring, J. Blümlein, A. De Freitas, A. von Manteuffel, C. Schneider, K. Schoenwald

Abstract: We report on the status of the calculation of the massive Wilson coefficients and operator matrix elements for deep-inelastic scattering to three-loop order. We discuss both the unpolarized and the polarized case, for which all the single-mass and nearly all two-mass contributions have been calculated. Numerical results on the structure function $F_2(x,Q^2)$ are presented. In the polarized case, we work in the Larin scheme and refer to parton distribution functions in this scheme. Furthermore, results on the three-loop variable flavor number scheme are presented.

Submitted 2 July, 2024; originally announced July 2024.

Comments: 12 pages

Report number: CERN-TH-2024-100, ZU-TH 31/24, RISC Report number 24-04, PoS (LL2024) 047, DESY-24-096

arXiv:2406.18626 [pdf, other]

An LLM-based Knowledge Synthesis and Scientific Reasoning Framework for Biomedical Discovery

Authors: Oskar Wysocki, Magdalena Wysocka, Danilo Carvalho, Alex Teodor Bogatu, Danilo Miranda Gusicuma, Maxime Delmas, Harriet Unsworth, Andre Freitas

Abstract: We present BioLunar, developed using the Lunar framework, as a tool for supporting biological analyses, with a particular emphasis on molecular-level evidence enrichment for biomarker discovery in oncology. The platform integrates Large Language Models (LLMs) to facilitate complex scientific reasoning across distributed evidence spaces, enhancing the capability for harmonizing and reasoning over heterogeneous data sources. Demonstrating its utility in cancer research, BioLunar leverages modular design, reusable data access and data analysis components, and a low-code user interface, enabling researchers of all programming levels to construct LLM-enabled scientific workflows. By facilitating automatic scientific discovery and inference from heterogeneous evidence, BioLunar exemplifies the potential of the integration between LLMs, specialised databases and biomedical tools to support expert-level knowledge synthesis and discovery.

Submitted 26 June, 2024; originally announced June 2024.

Comments: accepted for ACL 2024 System Demonstration Track

arXiv:2406.17837 [pdf, other]

Transformer Normalisation Layers and the Independence of Semantic Subspaces

Authors: Stephen Menary, Samuel Kaski, Andre Freitas

Abstract: Recent works have shown that transformers can solve contextual reasoning tasks by internally executing computational graphs called circuits. Circuits often use attention to logically match information from subspaces of the representation, e.g. using position-in-sequence to identify the previous token. In this work, we consider a semantic subspace to be any independent subspace of the latent representation that can fully determine an attention distribution. We show that Pre-Norm, the placement of the normalisation layer used by state-of-the-art transformers, violates this ability unless the model learns a strict representation structure of orthogonal spheres. This is because it causes linear subspaces to interfere through their common normalisation factor. Theoretically, we analyse circuit stability by modelling this interference as random noise on the $L_2$-norms of the query/key/value vectors, predicting a phenomenon of circuit collapse when sparse-attention shifts to a different token. Empirically, we investigate the sensitivity of real-world models trained for mathematical addition, observing a 1% rate of circuit collapse when the norms are artificially perturbed by $\lesssim$10%. We contrast Pre-Norm with QKV-Norm, which places normalisation after the attention head's linear operators. Theoretically this relaxes the representational constraints. Empirically we observe comparable in-distribution but worse out-of-distribution performance.

Submitted 25 June, 2024; originally announced June 2024.

arXiv:2406.14807 [pdf, other]

Multivariate extreme values for dynamical systems

Authors: Romain Aimino, Ana Cristina Moreira Freitas, Jorge Milhazes Freitas, Mike Todd

Abstract: We establish a theory for multivariate extreme value analysis of dynamical systems. Namely, we provide conditions adapted to the dynamical setting which enable the study of dependence between extreme values of the components of $\mathbb{R}^d$-valued observables evaluated along the orbits of the systems. We study this cross-sectional dependence, which results from the combination of a spatial and a temporal dependence structure. We give several illustrative applications, where concrete systems and dependence sources are introduced and analysed.

Submitted 20 June, 2024; originally announced June 2024.

MSC Class: 37A50; 37A25; 37B20; 60G70; 62H05

arXiv:2406.09898 [pdf, other]

Positive-Unlabelled Learning for Identifying New Candidate Dietary Restriction-related Genes among Ageing-related Genes

Authors: Jorge Paz-Ruza, Alex A. Freitas, Amparo Alonso-Betanzos, Bertha Guijarro-Berdiñas

Abstract: Dietary Restriction (DR) is one of the most popular anti-ageing interventions, prompting exhaustive research into genes associated with its mechanisms. Recently, Machine Learning (ML) has been explored to identify potential DR-related genes among ageing-related genes, aiming to minimize costly wet lab experiments needed to expand our knowledge on DR. However, to train a model from positive (DR-related) and negative (non-DR-related) examples, existing ML methods naively label genes without known DR relation as negative examples, assuming that lack of DR-related annotation for a gene represents evidence of absence of DR-relatedness, rather than absence of evidence; this hinders the reliability of the negative examples (non-DR-related genes) and the method's ability to identify novel DR-related genes. This work introduces a novel gene prioritization method based on the two-step Positive-Unlabelled (PU) Learning paradigm: using a similarity-based, KNN-inspired approach, our method first selects reliable negative examples among the genes without known DR associations. Then, these reliable negatives and all known positives are used to train a classifier that effectively differentiates DR-related and non-DR-related genes, which is finally employed to generate a more reliable ranking of promising genes for novel DR-relatedness. Our method significantly outperforms the existing state-of-the-art non-PU approach for DR-relatedness prediction in three relevant performance metrics. In addition, curation of existing literature finds support for the top-ranked candidate DR-related genes identified by our model.

Submitted 14 June, 2024; originally announced June 2024.

arXiv:2405.17723 [pdf, other]

TableDC: Deep Clustering for Tabular Data

Authors: Hafiz Tayyab Rauf, Andre Freitas, Norman W. Paton

Abstract: Deep clustering (DC), a fusion of deep representation learning and clustering, has recently demonstrated positive results in data science, particularly text processing and computer vision. However, joint optimization of feature learning and data distribution in the multi-dimensional space is domain-specific, so existing DC methods struggle to generalize to other application domains (such as data integration and cleaning). In data management tasks, where high-density embeddings and overlapping clusters dominate, a data management-specific DC algorithm should be able to interact better with the data properties for supporting data cleaning and integration tasks. This paper presents a deep clustering algorithm for tabular data (TableDC) that reflects the properties of data management applications, particularly schema inference, entity resolution, and domain discovery. To address overlapping clusters, TableDC integrates Mahalanobis distance, which considers variance and correlation within the data, offering a similarity method suitable for tables, rows, or columns in high-dimensional latent spaces. TableDC provides flexibility for the final clustering assignment and shows higher tolerance to outliers through its heavy-tailed Cauchy distribution as the similarity kernel. The proposed similarity measure is particularly beneficial where the embeddings of raw data are densely packed and exhibit high degrees of overlap. Data cleaning tasks may involve a large number of clusters, which affects the scalability of existing DC methods. TableDC's self-supervised module efficiently learns data embeddings with a large number of clusters compared to existing benchmarks, which scale in quadratic time. We evaluated TableDC with several existing DC, Standard Clustering (SC), and state-of-the-art bespoke methods over benchmark datasets. TableDC consistently outperforms existing DC, SC, and bespoke methods.

Submitted 27 May, 2024; originally announced May 2024.

arXiv:2405.17312 [pdf, ps, other]

Rigidity results for Serrin's overdetermined problems in Riemannian manifolds

Authors: Maria Andrade, Allan Freitas, Diego A. Marín

Abstract: In this work, we are interested in studying Serrin's overdetermined problems in Riemannian manifolds. For manifolds endowed with a conformal vector field, we prove a Pohozaev-type identity to show a Serrin-type rigidity result using the P-function approach introduced by Weinberger. We proceed with a conformal change to achieve this goal, starting from a geometric Pohozaev identity due to Schoen. Moreover, we obtain a symmetry result for the associated Dirichlet problem by using a generalized normalized wall shear stress bound.

Submitted 27 May, 2024; originally announced May 2024.

Comments: Comments are welcome!

arXiv:2405.01379 [pdf, other]

Verification and Refinement of Natural Language Explanations through LLM-Symbolic Theorem Proving

Authors: Xin Quan, Marco Valentino, Louise A. Dennis, André Freitas

Abstract: Natural language explanations have become a proxy for evaluating explainable and multi-step Natural Language Inference (NLI) models. However, assessing the validity of explanations for NLI is challenging as it typically involves the crowd-sourcing of apposite datasets, a process that is time-consuming and prone to logical errors. To address existing limitations, this paper investigates the verification and refinement of natural language explanations through the integration of Large Language Models (LLMs) and Theorem Provers (TPs). Specifically, we present a neuro-symbolic framework, named Explanation-Refiner, that augments a TP with LLMs to generate and formalise explanatory sentences and suggest potential inference strategies for NLI. In turn, the TP is employed to provide formal guarantees on the logical validity of the explanations and to generate feedback for subsequent improvements. We demonstrate how Explanation-Refiner can be jointly used to evaluate explanatory reasoning, autoformalisation, and error correction mechanisms of state-of-the-art LLMs as well as to automatically enhance the quality of human-annotated explanations of variable complexity in different domains.

Submitted 7 May, 2024; v1 submitted 2 May, 2024; originally announced May 2024.

arXiv:2405.00402 [pdf, other]

Self-Refine Instruction-Tuning for Aligning Reasoning in Language Models

Authors: Leonardo Ranaldi, Andrè Freitas

Abstract: The alignments of reasoning abilities between smaller and larger Language Models are largely conducted via Supervised Fine-Tuning (SFT) using demonstrations generated from robust Large Language Models (LLMs). Although these approaches deliver more performant models, they do not show sufficiently strong generalization ability as the training only relies on the provided demonstrations. In this paper, we propose the Self-refine Instruction-tuning method that elicits Smaller Language Models to self-refine their abilities. Our approach is based on a two-stage process, where reasoning abilities are first transferred between LLMs and Small Language Models (SLMs) via Instruction-tuning on demonstrations provided by LLMs, and then the instructed models Self-refine their abilities through preference optimization strategies. In particular, the second phase operates refinement heuristics based on the Direct Preference Optimization algorithm, where the SLMs are elicited to deliver a series of reasoning paths by automatically sampling the generated responses and providing rewards using ground truths from the LLMs. Results obtained on commonsense and math reasoning tasks show that this approach significantly outperforms Instruction-tuning in both in-domain and out-domain scenarios, aligning the reasoning abilities of Smaller and Larger Language Models.

Submitted 1 May, 2024; originally announced May 2024.

arXiv:2404.18384 [pdf, other]

Exploring the Limits of Fine-grained LLM-based Physics Inference via Premise Removal Interventions

Authors: Jordan Meadows, Tamsin James, Andre Freitas

Abstract: Language models can hallucinate when performing complex and detailed mathematical reasoning. Physics provides a rich domain for assessing mathematical reasoning capabilities where physical context imbues the use of symbols which needs to satisfy complex semantics (e.g., units, tensorial order), leading to instances where inference may be algebraically coherent, yet unphysical. In this work, we assess the ability of Language Models (LMs) to perform fine-grained mathematical and physical reasoning using a curated dataset encompassing multiple notations and Physics subdomains. We improve zero-shot scores using synthetic in-context examples, and demonstrate non-linear degradation of derivation quality with perturbation strength via the progressive omission of supporting premises. We find that the models' mathematical reasoning is not physics-informed in this setting, where physical context is predominantly ignored in favour of reverse-engineering solutions.

Submitted 28 April, 2024; originally announced April 2024.

arXiv:2404.12197 [pdf, ps, other]

A congruence theorem for compact embedded hypersurfaces in $\mathbb{S}^{n+1}_+$

Authors: Allan Freitas, Felippe Guimarães

Abstract: We prove a codimension reduction and congruence theorem for compact $n$-dimensional submanifolds of $\mathbb{S}^{n+p}$ that admit a mean convex isometric embedding into $\mathbb{S}^{n+1}_+$ using a Reilly type formula for space forms.

Submitted 18 April, 2024; originally announced April 2024.

Comments: Comments are welcome!

MSC Class: 53A07; 53C42

arXiv:2404.09369 [pdf, ps, other]

Perelman singular manifolds

Authors: Márcio Batista, Allan Freitas, Márcio Santos

Abstract: On a Riemannian manifold with a smooth function $f: M\to \mathbb{R}$, we consider the linearization of the Perelman scalar curvature $\mathcal{R}$ and its $L^2$-formal adjoint operator $δ\mathcal{R}^*$. A manifold endowed with a metric $g$ whose operator $δ\mathcal{R}^*$ has a nontrivial kernel is called a Perelman singular manifold. In this paper, we present examples and apply general maximum principles to obtain rigidity or nonexistence results in the underlying setting.

Submitted 14 April, 2024; originally announced April 2024.

Comments: Comments are welcome!

MSC Class: 53C21; 53C15; 58JXX

arXiv:2404.04963 [pdf, other]

SemEval-2024 Task 2: Safe Biomedical Natural Language Inference for Clinical Trials

Authors: Mael Jullien, Marco Valentino, André Freitas

Abstract: Large Language Models (LLMs) are at the forefront of NLP achievements but fall short in dealing with shortcut learning, factual inconsistency, and vulnerability to adversarial inputs. These shortcomings are especially critical in medical contexts, where they can misrepresent actual model capabilities. Addressing this, we present SemEval-2024 Task 2: Safe Biomedical Natural Language Inference for Clinical Trials. Our contributions include the refined NLI4CT-P dataset (i.e., Natural Language Inference for Clinical Trials - Perturbed), designed to challenge LLMs with interventional and causal reasoning tasks, along with a comprehensive evaluation of methods and results for participant submissions. A total of 106 participants registered for the task contributing to over 1200 individual submissions and 25 system overview papers. This initiative aims to advance the robustness and applicability of NLI models in healthcare, ensuring safer and more dependable AI assistance in clinical decision-making. We anticipate that the dataset, models, and outcomes of this task can support future research in the field of biomedical NLI. The dataset, competition leaderboard, and website are publicly available.

Submitted 7 April, 2024; originally announced April 2024.

arXiv:2404.02625 [pdf, other]

A Differentiable Integer Linear Programming Solver for Explanation-Based Natural Language Inference

Authors: Mokanarangan Thayaparan, Marco Valentino, André Freitas

Abstract: Integer Linear Programming (ILP) has been proposed as a formalism for encoding precise structural and semantic constraints for Natural Language Inference (NLI). However, traditional ILP frameworks are non-differentiable, posing critical challenges for the integration of continuous language representations based on deep learning. In this paper, we introduce a novel approach, named Diff-Comb Explainer, a neuro-symbolic architecture for explanation-based NLI based on Differentiable BlackBox Combinatorial Solvers (DBCS). Differently from existing neuro-symbolic solvers, Diff-Comb Explainer does not necessitate a continuous relaxation of the semantic constraints, enabling a direct, more precise, and efficient incorporation of neural representations into the ILP formulation. Our experiments demonstrate that Diff-Comb Explainer achieves superior performance when compared to conventional ILP solvers, neuro-symbolic black-box solvers, and Transformer-based encoders. Moreover, a deeper analysis reveals that Diff-Comb Explainer can significantly improve the precision, consistency, and faithfulness of the constructed explanations, opening new opportunities for research on neuro-symbolic architectures for explainable and transparent NLI in complex domains.

Submitted 3 April, 2024; originally announced April 2024.

Comments: Accepted to LREC-COLING 2024 - Camera Ready. arXiv admin note: substantial text overlap with arXiv:2208.03339

arXiv:2404.02622 [pdf, other]

Estimating the Causal Effects of Natural Logic Features in Transformer-Based NLI Models

Authors: Julia Rozanova, Marco Valentino, André Freitas

Abstract: Rigorous evaluation of the causal effects of semantic features on language model predictions can be hard to achieve for natural language reasoning problems. However, this is such a desirable form of analysis from both an interpretability and model evaluation perspective, that it is valuable to investigate specific patterns of reasoning with enough structure and regularity to identify and quantify systematic reasoning failures in widely-used models. In this vein, we pick a portion of the NLI task for which an explicit causal diagram can be systematically constructed: the case where across two sentences (the premise and hypothesis), two related words/terms occur in a shared context. In this work, we apply causal effect estimation strategies to measure the effect of context interventions (whose effect on the entailment label is mediated by the semantic monotonicity characteristic) and interventions on the inserted word-pair (whose effect on the entailment label is mediated by the relation between these words). Extending related work on causal analysis of NLP models in different settings, we perform an extensive interventional study on the NLI task to investigate robustness to irrelevant changes and sensitivity to impactful changes of Transformers. The results strongly bolster the fact that similar benchmark accuracy scores may be observed for models that exhibit very different behaviour. Moreover, our methodology reinforces previously suspected biases from a causal perspective, including biases in favour of upward-monotone contexts and ignoring the effects of negation markers.

Submitted 3 April, 2024; originally announced April 2024.

Comments: Accepted to LREC-COLING 2024 - Camera Ready. arXiv admin note: substantial text overlap with arXiv:2305.08572

arXiv:2403.00513 [pdf, other]

doi 10.1016/j.physletb.2024.138713

The non-first-order-factorizable contributions to the three-loop single-mass operator matrix elements $A_{Qg}^{(3)}$ and $ΔA_{Qg}^{(3)}$

Authors: J. Ablinger, A. Behring, J. Blümlein, A. De Freitas, A. von Manteuffel, C. Schneider, K. Schönwald

Abstract: The non-first-order-factorizable contributions (The terms 'first-order-factorizable contributions' and 'non-first-order-factorizable contributions' have been introduced and discussed in Refs. \cite{Behring:2023rlq,Ablinger:2023ahe}. They describe the factorization behaviour of the difference- or differential equations for a subset of master integrals of a given problem.) to the unpolarized and polarized massive operator matrix elements to three-loop order, $A_{Qg}^{(3)}$ and $ΔA_{Qg}^{(3)}$, are calculated in the single-mass case. For the $_2F_1$-related master integrals of the problem, we use a semi-analytic method based on series expansions and utilize the first-order differential equations for the master integrals which does not need a special basis of the master integrals. Due to the singularity structure of this basis a part of the integrals has to be computed to $O(\varepsilon^5)$ in the dimensional parameter. The solutions have to be matched at a series of thresholds and pseudo-thresholds in the region of the Bjorken variable $x \in ]0,\infty[$ using highly precise series expansions to obtain the imaginary part of the physical amplitude for $x \in ]0,1]$ at a high relative accuracy. We compare the present results both with previous analytic results, the results for fixed Mellin moments, and a prediction in the small-$x$ region. We also derive expansions in the region of small and large values of $x$. With this paper, all three-loop single-mass unpolarized and polarized operator matrix elements are calculated.

Submitted 1 March, 2024; originally announced March 2024.

Report number: DO--TH 23/15. DESY 24--027, RISC Report series 24--02, ZU-TH 13/24, CERN-TH-2024-30

arXiv:2402.10767 [pdf, other]

Inference to the Best Explanation in Large Language Models

Authors: Dhairya Dalal, Marco Valentino, André Freitas, Paul Buitelaar

Abstract: While Large Language Models (LLMs) have found success in real-world applications, their underlying explanatory process is still poorly understood. This paper proposes IBE-Eval, a framework inspired by philosophical accounts on Inference to the Best Explanation (IBE) to advance the interpretation and evaluation of LLMs' explanations. IBE-Eval estimates the plausibility of natural language explanations through a combination of explicit logical and linguistic features including: consistency, parsimony, coherence, and uncertainty. Extensive experiments are conducted on Causal Question Answering (CQA), where IBE-Eval is tasked to select the most plausible causal explanation amongst competing ones generated by LLMs (i.e., GPT 3.5 and Llama 2). The experiments reveal that IBE-Eval can successfully identify the best explanation with up to 77% accuracy ($\approx 27\%$ above random), improving upon a GPT 3.5-as-a-Judge baseline ($\approx+17\%$) while being intrinsically more efficient and interpretable. Additional analyses suggest that, despite model-specific variances, LLM-generated explanations tend to conform to IBE criteria and that IBE-Eval is significantly correlated with human judgment, opening up opportunities for future development of automated explanation verification tools.

Submitted 16 February, 2024; originally announced February 2024.

ACM Class: I.2.7

arXiv:2402.00745 [pdf, other]

Enhancing Ethical Explanations of Large Language Models through Iterative Symbolic Refinement

Authors: Xin Quan, Marco Valentino, Louise A. Dennis, André Freitas

Abstract: An increasing amount of research in Natural Language Inference (NLI) focuses on the application and evaluation of Large Language Models (LLMs) and their reasoning capabilities. Despite their success, however, LLMs are still prone to factual errors and inconsistencies in their explanations, offering limited control and interpretability for inference in complex domains. In this paper, we focus on ethical NLI, investigating how hybrid neuro-symbolic techniques can enhance the logical validity and alignment of ethical explanations produced by LLMs. Specifically, we present an abductive-deductive framework named Logic-Explainer, which integrates LLMs with an external backward-chaining solver to refine step-wise natural language explanations and jointly verify their correctness, reduce incompleteness and minimise redundancy. An extensive empirical analysis demonstrates that Logic-Explainer can improve explanations generated via in-context learning methods and Chain-of-Thought (CoT) on challenging ethical NLI tasks, while, at the same time, producing formal proofs describing and supporting models' reasoning. As ethical NLI requires commonsense reasoning to identify underlying moral violations, our results suggest the effectiveness of neuro-symbolic methods for multi-step NLI more broadly, opening new opportunities to enhance the logical consistency, reliability, and alignment of LLMs.

Submitted 1 February, 2024; originally announced February 2024.

Comments: Camera-ready for EACL 2024

arXiv:2402.00723 [pdf, other]

Improving Semantic Control in Discrete Latent Spaces with Transformer Quantized Variational Autoencoders

Authors: Yingji Zhang, Danilo S. Carvalho, Marco Valentino, Ian Pratt-Hartmann, Andre Freitas

Abstract: Achieving precise semantic control over the latent spaces of Variational AutoEncoders (VAEs) holds significant value for downstream tasks in NLP as the underlying generative mechanisms could be better localised, explained and improved upon. Recent research, however, has struggled to achieve consistent results, primarily due to the inevitable loss of semantic information in the variational bottleneck and limited control over the decoding mechanism. To overcome these challenges, we investigate discrete latent spaces in Vector Quantized Variational AutoEncoders (VQVAEs) to improve semantic control and generation in Transformer-based VAEs. In particular, we propose T5VQVAE, a novel model that leverages the controllability of VQVAEs to guide the self-attention mechanism in T5 at the token-level, exploiting its full generalization capabilities. Experimental results indicate that T5VQVAE outperforms existing state-of-the-art VAE models, including Optimus, in terms of controllability and preservation of semantic information across different tasks such as auto-encoding of sentences and mathematical expressions, text transfer, and inference. Moreover, T5VQVAE exhibits improved inference capabilities, suggesting potential applications for downstream natural language and symbolic reasoning tasks.

Submitted 1 February, 2024; originally announced February 2024.

arXiv:2401.07564 [pdf, other]

Focus topics for the ECFA study on Higgs / Top / EW factories

Authors: Jorge de Blas, Patrick Koppenburg, Jenny List, Fabio Maltoni, Juan Alcaraz Maestre, Juliette Alimena, John Alison, Patrizia Azzi, Paolo Azzurri, Emanuele Bagnaschi, Timothy Barklow, Matthew J. Basso, Josh Bendavid, Martin Beneke, Eli Ben-Haim, Mikael Berggren, Marzia Bordone, Ivanka Bozovic, Valentina Cairo, Nuno Filipe Castro, Marina Cobal, Paula Collins, Mogens Dam, Valerio Dao, Matteo Defranchis, et al. (83 additional authors not shown)

Abstract: In order to stimulate new engagement and trigger some concrete studies in areas where further work would be beneficial towards fully understanding the physics potential of an $e^+e^-$ Higgs / Top / Electroweak factory, we propose to define a set of focus topics. The general reasoning and the proposed topics are described in this document.

Submitted 18 January, 2024; v1 submitted 15 January, 2024; originally announced January 2024.

Comments: v3: fixed spelling of two authors

arXiv:2401.06452 [pdf, other]

Automated Machine Learning for Positive-Unlabelled Learning

Authors: Jack D. Saunders, Alex A. Freitas

Abstract: Positive-Unlabelled (PU) learning is a growing field of machine learning that aims to learn classifiers from data consisting of labelled positive and unlabelled instances, which can be in reality positive or negative, but whose label is unknown. An extensive number of methods have been proposed to address PU learning over the last two decades, so many that selecting an optimal method for a given PU learning task presents a challenge. Our previous work has addressed this by proposing GA-Auto-PU, the first Automated Machine Learning (Auto-ML) system for PU learning. In this work, we propose two new Auto-ML systems for PU learning: BO-Auto-PU, based on a Bayesian Optimisation approach, and EBO-Auto-PU, based on a novel evolutionary/Bayesian optimisation approach. We also present an extensive evaluation of the three Auto-ML systems, comparing them to each other and to well-established PU learning methods across 60 datasets (20 real-world datasets, each with 3 versions in terms of PU learning characteristics).

Submitted 12 January, 2024; originally announced January 2024.

Comments: 36 pages, 4 figures

arXiv:2312.13208 [pdf, other]

LlaMaVAE: Guiding Large Language Model Generation via Continuous Latent Sentence Spaces

Authors: Yingji Zhang, Danilo S. Carvalho, Ian Pratt-Hartmann, André Freitas

Abstract: Deep generative neural networks, such as Variational AutoEncoders (VAEs), offer an opportunity to better understand and control language models from the perspective of sentence-level latent spaces. To combine the controllability of VAE latent spaces with the state-of-the-art performance of recent large language models (LLMs), we present in this work LlaMaVAE, which combines expressive encoder and decoder models (sentenceT5 and LlaMA) with a VAE architecture, aiming to provide better text generation control to LLMs. In addition, to conditionally guide the VAE generation, we investigate a new approach based on flow-based invertible neural networks (INNs) named Invertible CVAE. Experimental results reveal that LlaMaVAE can outperform the previous state-of-the-art VAE language model, Optimus, across various tasks, including language modelling, semantic textual similarity and definition modelling. Qualitative analysis on interpolation and traversal experiments also indicates an increased degree of semantic clustering and geometric consistency, which enables better generation control.

Submitted 20 December, 2023; originally announced December 2023.

arXiv:2311.08579 [pdf, other]

Graph-Induced Syntactic-Semantic Spaces in Transformer-Based Variational AutoEncoders

Authors: Yingji Zhang, Marco Valentino, Danilo S. Carvalho, Ian Pratt-Hartmann, André Freitas

Abstract: The injection of syntactic information in Variational AutoEncoders (VAEs) has been shown to result in an overall improvement of performances and generalisation. An effective strategy to achieve such a goal is to separate the encoding of distributional semantic features and syntactic structures into heterogeneous latent spaces via multi-task learning or dual encoder architectures. However, existing works employing such techniques are limited to LSTM-based VAEs. In this paper, we investigate latent space separation methods for structural syntactic injection in Transformer-based VAE architectures (i.e., Optimus). Specifically, we explore how syntactic structures can be leveraged in the encoding stage through the integration of graph-based and sequential models, and how multiple, specialised latent representations can be injected into the decoder's attention mechanism via low-rank operators. Our empirical evaluation, carried out on natural language sentences and mathematical expressions, reveals that the proposed end-to-end VAE architecture can result in a better overall organisation of the latent space, alleviating the information loss occurring in standard VAE setups, resulting in enhanced performances on language modelling and downstream generation tasks.

Submitted 14 November, 2023; originally announced November 2023.

arXiv:2311.06364 [pdf, other]

Relation Extraction in underexplored biomedical domains: A diversity-optimised sampling and synthetic data generation approach

Authors: Maxime Delmas, Magdalena Wysocka, André Freitas

Abstract: The sparsity of labelled data is an obstacle to the development of Relation Extraction models and the completion of databases in various biomedical areas. While being of high interest in drug-discovery, the natural-products literature, reporting the identification of potential bioactive compounds from organisms, is a concrete example of such an overlooked topic. To mark the start of this new task, we created the first curated evaluation dataset and extracted literature items from the LOTUS database to build training sets. To this end, we developed a new sampler inspired by diversity metrics in ecology, named Greedy Maximum Entropy sampler, or GME-sampler (https://github.com/idiap/gme-sampler). The strategic optimization of both balance and diversity of the selected items in the evaluation set is important given the resource-intensive nature of manual curation. After quantifying the noise in the training set, in the form of discrepancies between the input abstracts text and the expected output labels, we explored different strategies accordingly. Framing the task as an end-to-end Relation Extraction, we evaluated the performance of standard fine-tuning as a generative task and few-shot learning with open Large Language Models (LLaMA 7B-65B). In addition to their evaluation in few-shot settings, we explore the potential of open Large Language Models (Vicuna-13B) as synthetic data generator and propose a new workflow for this purpose.
All evaluated models exhibited substantial improvements when fine-tuned on synthetic abstracts rather than the original noisy data. We provide our best performing (f1-score=59.0) BioGPT-Large model for end-to-end RE of natural-products relationships along with all the generated synthetic data and the evaluation dataset. See more details at https://github.com/idiap/abroad-re.

Submitted 10 November, 2023; originally announced November 2023.

arXiv:2311.05330 [pdf, other]

doi 10.1145/3589335.3651911

A Bayesian framework for measuring association and its application to emotional dynamics in Web discourse

Authors: Henrique S. Xavier, Diogo Cortiz, Mateus Silvestrin, Ana Luísa Freitas, Letícia Yumi Nakao Morello, Fernanda Naomi Pantaleão, Gabriel Gaudencio do Rêgo

Abstract: This paper introduces a Bayesian framework designed to measure the degree of association between categorical random variables. The method is grounded in the formal definition of variable independence and is implemented using Markov Chain Monte Carlo (MCMC) techniques. Unlike commonly employed techniques in Association Rule Learning, this approach enables a clear and precise estimation of confidence intervals and the statistical significance of the measured degree of association. We applied the method to non-exclusive emotions identified by annotators in 4,613 tweets written in Portuguese. This analysis revealed pairs of emotions that exhibit associations and mutually opposed pairs. Moreover, the method identifies hierarchical relations between categories, a feature observed in our data, and is utilized to cluster emotions into basic-level groups.

Submitted 11 March, 2024; v1 submitted 9 November, 2023; originally announced November 2023.

Comments: 9 pages, 2 tables, 4 figures. Accepted for publication at the Beyond Facts workshop of the Web Conference 2024

arXiv:2311.04905 [pdf, other]

doi 10.1109/ICMLA58977.2023.00299

Detecting Relevant Information in High-Volume Chat Logs: Keyphrase Extraction for Grooming and Drug Dealing Forensic Analysis

Authors: Jeovane Honório Alves, Horácio A. C. G. Pedroso, Rafael Honorio Venetikides, Joel E. M. Köster, Luiz Rodrigo Grochocki, Cinthia O. A. Freitas, Jean Paul Barddal

Abstract: The growing use of digital communication platforms has given rise to various criminal activities, such as grooming and drug dealing, which pose significant challenges to law enforcement and forensic experts. This paper presents a supervised keyphrase extraction approach to detect relevant information in high-volume chat logs involving grooming and drug dealing for forensic analysis. The proposed method, JointKPE++, builds upon the JointKPE keyphrase extractor by employing improvements to handle longer texts effectively. We evaluate JointKPE++ using BERT-based pre-trained models on grooming and drug dealing datasets, including BERT, RoBERTa, SpanBERT, and BERTimbau. The results show significant improvements over traditional approaches and demonstrate the potential for JointKPE++ to aid forensic experts in efficiently detecting keyphrases related to criminal activities.

Submitted 14 September, 2023; originally announced November 2023.

Comments: Accepted for presentation at the 22nd IEEE International Conference on Machine Learning and Applications (ICMLA) 2023

arXiv:2311.01230 [pdf, other]

Multi-Operational Mathematical Derivations in Latent Space

Authors: Marco Valentino, Jordan Meadows, Lan Zhang, André Freitas

Abstract: This paper investigates the possibility of approximating multiple mathematical operations in latent space for expression derivation. To this end, we introduce different multi-operational representation paradigms, modelling mathematical operations as explicit geometric transformations. By leveraging a symbolic engine, we construct a large-scale dataset comprising 1.7M derivation steps stemming from 61K premises and 6 operators, analysing the properties of each paradigm when instantiated with state-of-the-art neural encoders. Specifically, we investigate how different encoding mechanisms can approximate expression manipulation in latent space, exploring the trade-off between learning different operators and specialising within single operations, as well as the ability to support multi-step derivations and out-of-distribution generalisation. Our empirical analysis reveals that the multi-operational paradigm is crucial for disentangling different operators, while discriminating the conclusions for a single operation is achievable in the original expression encoder. Moreover, we show that architectural choices can heavily affect the training dynamics, structural organisation, and generalisation of the latent space, resulting in significant variations across paradigms and classes of encoders.

Submitted 3 April, 2024; v1 submitted 2 November, 2023; originally announced November 2023.

Comments: Accepted to NAACL 2024 - Camera Ready

arXiv:2311.00644 [pdf, other]

doi 10.1016/j.nuclphysb.2023.116427

The first-order factorizable contributions to the three-loop massive operator matrix elements $A_{Qg}^{(3)}$ and $ΔA_{Qg}^{(3)}$

Authors: J. Ablinger, A. Behring, J. Blümlein, A. De Freitas, A. von Manteuffel, C. Schneider, K. Schönwald

Abstract: The unpolarized and polarized massive operator matrix elements $A_{Qg}^{(3)}$ and $ΔA_{Qg}^{(3)}$ contain first-order factorizable and non-first-order factorizable contributions in the determining difference or differential equations of their master integrals. We compute their first-order factorizable contributions in the single heavy mass case for all contributing Feynman diagrams. Moreover, we present the complete color-$ζ$ factors for the cases in which also non-first-order factorizable contributions emerge in the master integrals, but cancel in the final result as found by using the method of arbitrary high Mellin moments. Individual contributions depend also on generalized harmonic sums and on nested finite binomial and inverse binomial sums in Mellin $N$-space, and correspondingly, on Kummer-Poincaré and square-root valued alphabets in Bjorken-$x$ space. We present a complete discussion of the possibilities of solving the present problem in $N$-space analytically and we also discuss the limitations in the present case to analytically continue the given $N$-space expressions to $N \in \mathbb{C}$ by strict methods. The representation through generating functions allows a well synchronized representation of the first-order factorizable results over a 17-letter alphabet. We finally obtain representations in terms of iterated integrals over the corresponding alphabet in $x$-space, also containing up to weight {\sf w = 5} special constants, which can be rationalized to Kummer-Poincaré iterated integrals at special arguments.
The analytic $x$-space representation requires separate analyses for the intervals $x \in [0,1/4], [1/4,1/2], [1/2,1]$ and $x > 1$. We also derive the small and large $x$ limits of the first-order factorizable contributions.

Submitted 1 November, 2023; originally announced November 2023.

Comments: 58 pages, 4 Figures

Report number: DO-TH 23/12, DESY 23-142, CERN-TH-2023-164, RISC Report series 23-12, ZU-TH 60/23, MSUHEP-23-025

arXiv:2310.02752 [pdf, ps, other]

Fair Feature Selection: A Comparison of Multi-Objective Genetic Algorithms

Authors: James Brookhouse, Alex Freitas

Abstract: Machine learning classifiers are widely used to make decisions with a major impact on people's lives (e.g. accepting or denying a loan, hiring decisions, etc). In such applications, the learned classifiers need to be both accurate and fair with respect to different groups of people, with different values of variables such as sex and race. This paper focuses on fair feature selection for classification, i.e. methods that select a feature subset aimed at maximising both the accuracy and the fairness of the predictions made by a classifier. More specifically, we compare two recently proposed Genetic Algorithms (GAs) for fair feature selection that are based on two different multi-objective optimisation approaches: (a) a Pareto dominance-based GA; and (b) a lexicographic optimisation-based GA, where maximising accuracy has higher priority than maximising fairness. Both GAs use the same measures of accuracy and fairness, allowing for a controlled comparison. As far as we know, this is the first comparison between the Pareto and lexicographic approaches for fair classification. The results show that, overall, the lexicographic GA outperformed the Pareto GA with respect to accuracy without degradation of the fairness of the learned classifiers. This is an important result because at present nearly all GAs for fair classification are based on the Pareto approach, so these results suggest a promising new direction for research in this area.

Submitted 4 October, 2023; originally announced October 2023.

Comments: 10 pages, 1 figure, 3 tables

arXiv:2310.00978 [pdf, other]

Convergence to decorated Lévy processes in non-Skorohod topologies for dynamical systems

Authors: Ana Cristina Moreira Freitas, Jorge Milhazes Freitas, Ian Melbourne, Mike Todd

Abstract: We present a general framework for weak convergence to decorated Lévy processes in enriched spaces of càdlàg functions for vector-valued processes arising in deterministic systems. Applications include uniformly expanding maps and unbounded observables as well as nonuniformly expanding/hyperbolic maps with bounded observables. The latter includes intermittent maps and dispersing billiards with flat cusps. In many of these examples, convergence fails in all of the Skorohod topologies. Moreover, the enriched space picks up details of excursions that are not recorded by Skorohod or Whitt topologies.

Submitted 9 October, 2023; v1 submitted 2 October, 2023; originally announced October 2023.

Comments: Slight change to hypothesis (5.2)

MSC Class: 37A50; 60F17; 37D25; 37C83

arXiv:2309.10405 [pdf, other]

Gap results and existence of CMC free boundary hypersurfaces in rotational domains

Authors: Allan Freitas, Márcio Santos, J. Sindeaux

Abstract: In this paper, we work with the existence and uniqueness of free boundary constant mean curvature hypersurfaces in rotational domains. These are domains whose boundary is generated by a rotation of a graph. Under some conditions on the function that generates the graph and a gap condition on the umbilicity tensor, we classify the CMC free boundary hypersurfaces as topological disks or annuli. Also, we construct some examples of free boundary minimal surfaces in the rotational ellipsoid that, in particular, satisfy our gap condition.

Submitted 19 September, 2023; originally announced September 2023.

Comments: Comments are welcome!

arXiv:2308.14186 [pdf, other]

Empowering Cross-lingual Abilities of Instruction-tuned Large Language Models by Translation-following demonstrations

Authors: Leonardo Ranaldi, Giulia Pucci, Andre Freitas

Abstract: The language ability of Large Language Models (LLMs) is often unbalanced towards English because of the imbalance in the distribution of the pre-training data. This disparity is demanded in further fine-tuning and affecting the cross-lingual abilities of LLMs. In this paper, we propose to empower Instruction-tuned LLMs (It-LLMs) in languages other than English by building semantic alignment between them. Hence, we propose CrossAlpaca, an It-LLM with cross-lingual instruction-following and Translation-following demonstrations to improve semantic alignment between languages. We validate our approach on the multilingual Question Answering (QA) benchmarks XQUAD and MLQA and adapted versions of MMLU and BBH. Our models, tested over six different languages, outperform the It-LLMs tuned on monolingual data. The final results show that instruction tuning on non-English data is not enough and that semantic alignment can be further improved by Translation-following demonstrations.

Submitted 27 August, 2023; originally announced August 2023.

arXiv:2308.13030 [pdf, other]

doi 10.1007/JHEP01(2024)137

Probing dark sector fermions in Higgs precision studies and direct searches

Authors: Ayres Freitas, Qian Song

Abstract: In this paper, we investigate the discovery prospect of simplified fermionic dark sectors models through Higgs precision measurements at $e^+e^-$ colliders and direct searches at hadron colliders. These models extend the Standard Model with two Majorana or Dirac fermions that are singlets, doublets or triplets under the weak SU(2) group. For all models, we consider two scenarios where the lightest new fermion is either stable, or where it decays into other visible final states. For the Higgs precision observables we primarily focus on $σ(e^+e^-\to ZH)$, which can deviate from the Standard Model through one-loop corrections involving the new fermions. Deviations of 0.5\% or more, which could be observable at future $e^+e^-$ colliders, are found for TeV-scale dark sector masses. By combining the constraints from the oblique parameters, $\text{Br}(H\toγγ)$, and direct production of the new fermions at the LHC, a comprehensive understanding of the discovery potential of these models can be achieved. In both scenarios, there exist some parameter regions where the Higgs precision measurements can provide complementary information to direct LHC searches.

Submitted 26 January, 2024; v1 submitted 24 August, 2023; originally announced August 2023.

Comments: 28 pages, 15 figures; update: add 2 references and 1 table

Journal ref: JHEP 01 (2024), 137

arXiv:2308.03581 [pdf, other]

Towards Controllable Natural Language Inference through Lexical Inference Types

Authors: Yingji Zhang, Danilo S. Carvalho, Ian Pratt-Hartmann, Andre Freitas

Abstract: Explainable natural language inference aims to provide a mechanism to produce explanatory (abductive) inference chains which ground claims to their supporting premises. A recent corpus called EntailmentBank strives to advance this task by explaining the answer to a question using an entailment tree \cite{dalvi2021explaining}. They employ the T5 model to directly generate the tree, which can explain how the answer is inferred. However, it lacks the ability to explain and control the generation of intermediate steps, which is crucial for the multi-hop inference process. In this work, we focus on proposing a controlled natural language inference architecture for multi-premise explanatory inference. To improve control and enable explanatory analysis over the generation, we define lexical inference types based on Abstract Meaning Representation (AMR) graph and modify the architecture of T5 to learn a latent sentence representation (T5 bottleneck) conditioned on said type information. We also deliver a dataset of approximately 5000 annotated explanatory inference steps, with well-grounded lexical-symbolic operations.
Experimental results indicate that the inference typing induced at the T5 bottleneck can help T5 to generate a conclusion under explicit control.

Submitted 7 August, 2023; originally announced August 2023.

arXiv:2308.00425 [pdf, ps, other]

Discourse-Aware Text Simplification: From Complex Sentences to Linked Propositions

Authors: Christina Niklaus, Matthias Cetto, André Freitas, Siegfried Handschuh

Abstract: Sentences that present a complex syntax act as a major stumbling block for downstream Natural Language Processing applications whose predictive quality deteriorates with sentence length and complexity. The task of Text Simplification (TS) may remedy this situation. It aims to modify sentences in order to make them easier to process, using a set of rewriting operations, such as reordering, deletion, or splitting. State-of-the-art syntactic TS approaches suffer from two major drawbacks: first, they follow a very conservative approach in that they tend to retain the input rather than transforming it, and second, they ignore the cohesive nature of texts, where context spread across clauses or sentences is needed to infer the true meaning of a statement. To address these problems, we present a discourse-aware TS approach that splits and rephrases complex English sentences within the semantic context in which they occur. Based on a linguistically grounded transformation stage that uses clausal and phrasal disembedding mechanisms, complex sentences are transformed into shorter utterances with a simple canonical structure that can be easily analyzed by downstream applications. With sentence splitting, we thus address a TS task that has hardly been explored so far. Moreover, we introduce the notion of minimality in this context, as we aim to decompose source sentences into a set of self-contained minimal semantic units.
To avoid breaking down the input into a disjointed sequence of statements that is difficult to interpret because important contextual information is missing, we incorporate the semantic context between the split propositions in the form of hierarchical structures and semantic relationships. In that way, we generate a semantic hierarchy of minimal propositions that leads to a novel representation of complex assertions that puts a semantic layer on top of the simplified sentences.

Submitted 1 August, 2023; originally announced August 2023.

arXiv:2307.09998 [pdf, other]

Generating Mathematical Derivations with Large Language Models

Authors: Jordan Meadows, Marco Valentino, Andre Freitas

Abstract: The derivation of mathematical results in specialised fields, using Large Language Models (LLMs), is an emerging research direction that can help identify models' limitations, and potentially support mathematical discovery. In this paper, we leverage a symbolic engine to generate derivations of equations at scale, and investigate the capabilities of LLMs when deriving goal equations from premises. Specifically, we employ in-context learning for GPT and fine-tune a range of T5 models to compare the robustness and generalisation of pre-training strategies to specialised models. Empirical results show that fine-tuned FLAN-T5-large (MathT5) outperforms GPT models on all static and out-of-distribution test sets in conventional scores. However, an in-depth analysis reveals that the fine-tuned models are more sensitive to perturbations involving unseen symbols and (to a lesser extent) changes to equation structure. In addition, we analyse 1.7K equations, and over 200 derivations, to highlight common reasoning errors such as the inclusion of incorrect, irrelevant, and redundant equations. Finally, we explore the suitability of existing metrics for evaluating mathematical derivations and find evidence that, while they can capture general properties such as sensitivity to perturbations, they fail to highlight fine-grained reasoning errors and essential differences between models.
Overall, this work demonstrates that training models on synthetic data may improve their math capabilities beyond much larger LLMs, but current metrics are not appropriately assessing the quality of generated mathematical text.

Submitted 8 August, 2023; v1 submitted 19 July, 2023; originally announced July 2023.

Comments: 10 pages

arXiv:2307.02983 [pdf, other]

Analytic results on the massive three-loop form factors: quarkonic contributions

Authors: Johannes Blümlein, Abilio De Freitas, Peter Marquard, Narayan Rana, Carsten Schneider

Abstract: The quarkonic contributions to the three-loop heavy-quark form factors for vector, axial-vector, scalar and pseudoscalar currents are described by closed form difference equations for the expansion coefficients in the limit of small virtualities $q^2/m^2$. A part of the contributions can be solved analytically and expressed in terms of harmonic and cyclotomic harmonic polylogarithms and square-root valued iterated integrals. Other contributions obey equations which are not first-order factorizable. For them still infinite series expansions around the singularities of the form factors can be obtained by matching the expansions at intermediate points and using differential equations which are obeyed directly by the form factors and are derived by guessing algorithms. One may determine all expansion coefficients for $q^2 /m^2 \to \infty$ analytically in terms of multiple zeta values. By expanding around the threshold and pseudo-threshold, the corresponding constants are multiple zeta values supplemented by a finite amount of new constants, which can be computed at high precision. For a part of these coefficients, the infinite series in front of these constants may be even resummed into harmonic polylogarithms. In this way, one obtains a deeper analytic description of the massive form factors, beyond their pure numerical evaluation. The calculations of these analytic results are based on sophisticated computer algebra techniques. We also compare our results with numerical results in the literature.

Submitted 6 July, 2023; originally announced July 2023.

Comments: 92 pages, 14 figures

Report number: DESY 23--012, DO--TH 23/02, RISC Report Series 23-08

arXiv:2306.16550 [pdf, other]

Recent 3-Loop Heavy Flavor Corrections to Deep-Inelastic Scattering

Authors: J. Ablinger, A. Behring, J. Blümlein, A. De Freitas, A. Goedicke, A. von Manteuffel, C. Schneider, K. Schönwald

Abstract: We report on recent progress in calculating the three loop QCD corrections of the heavy flavor contributions in deep--inelastic scattering and the massive operator matrix elements of the variable flavor number scheme. Notably we deal with the operator matrix elements $A_{gg,Q}^{(3)}$ and $A_{Qg}^{(3)}$ and technical steps to their calculation. In particular, a new method to obtain the inverse Mellin transform without computing the corresponding $N$--space expressions is discussed.

Submitted 28 June, 2023; originally announced June 2023.

Comments: Proc RADCOR 2023, 7 pages, 1 figure

Report number: DESY-23-089, DO-TH 23/09, CERN-TH-2023-122, ZU-TH 29/23, RISC Report Series 23-09, MSUHEP-23-018

arXiv:2306.08563 [pdf, other]

Microscopic origin of polarization-entangled Stokes-anti-Stokes photons in diamond

Authors: Tiago A. Freitas, Paula Machado, Lucas V. de Carvalho, Diego Sier, Raul Corrêa, Riichiro Saito, Marcelo F. Santos, Carlos H. Monken, Ado Jorio

Abstract: Violation of the Clauser-Horne-Shimony-Holt inequality for the polarization of Stokes-anti-Stokes (SaS) photon pairs near a Raman resonance is demonstrated. The pairs are generated by shining a pulsed laser on a diamond sample, where two photons of the laser are converted into a pair of photons of different frequencies. The generated pairs are collected by standard Bell analyzers and shown to be entangled in polarization, with the degree of entanglement depending on the spectral region and on the orientation of the polarization of the incident light with respect to the crystallographic orientation of the sample. This result opens up the possibility to combine quantum optics and SaS Raman spectroscopy in order to improve materials science and quantum information.

Submitted 14 June, 2023; originally announced June 2023.

Comments: 12 pages, 3 figures

arXiv:2305.19772 [pdf, ps, other]

A note on Serrin's type problem on Riemannian manifolds

Authors: Allan Freitas, Alberto Roncoroni, Márcio Santos

Abstract: In this paper, we deal with Serrin-type problems in Riemannian manifolds. First, we obtain a Heintze-Karcher inequality and a Soap Bubble result, with its respective rigidity, when the ambient space has a Ricci tensor bounded below. After, we approach a Serrin problem in bounded domains of manifolds endowed with a closed conformal vector field. Our primary tool, in this case, is a new Pohozaev identity, which depends on the scalar curvature of the manifold. Applications involve Einstein and constant scalar curvature spaces.

Submitted 6 March, 2024; v1 submitted 31 May, 2023; originally announced May 2023.

Comments: Comments are welcome!

arXiv:2305.17819 [pdf, other]

Large Language Models, scientific knowledge and factuality: A systematic analysis in antibiotic discovery

Authors: Magdalena Wysocka, Oskar Wysocki, Maxime Delmas, Vincent Mutel, Andre Freitas

Abstract: Inferring over and extracting information from Large Language Models (LLMs) trained on a large corpus of scientific literature can potentially drive a new era in biomedical research, reducing the barriers for accessing existing medical evidence. This work examines the potential of LLMs for dialoguing with biomedical background knowledge, using the context of antibiotic discovery. The systematic analysis is applied to ten state-of-the-art models, from models specialised on biomedical scientific corpora to general models such as ChatGPT, GPT-4 and Llama 2 in two prompting-based tasks: chemical compound definition generation and chemical compound-fungus relation determination. The work provides a systematic assessment on the ability of LLMs to encode and express these relations, verifying for fluency, prompt-alignment, semantic coherence, factual knowledge and specificity of generated responses. Results show that while recent models have improved in fluency, factual accuracy is still low and models are biased towards over-represented entities. The ability of LLMs to serve as biomedical knowledge bases is questioned, and the need for additional systematic evaluation frameworks is highlighted. The best performing GPT-4 produced a factual definition for 70% of chemical compounds and 43.6% factual relations to fungi, whereas the best open source model BioGPT-large 30% of the compounds and 30% of the relations for the best-performing prompt.
The results show that while LLMs are currently not fit for purpose to be used as biomedical factual knowledge bases, there is a promising emerging property in the direction of factuality as the models become domain specialised, scale-up in size and level of human feedback.

Submitted 5 December, 2023; v1 submitted 28 May, 2023; originally announced May 2023.

Comments: 28 pages, 3 figures

arXiv:2305.16547 [pdf, other]

Fermionic Electroweak NNLO Corrections to $e^+ e^- \to ZH$ with Polarized Beams and Different Renormalization Schemes

Authors: Ayres Freitas, Qian Song

Abstract: Recently, the next-to-next-to-leading order (NNLO) electroweak corrections with fermion loops to the Higgsstrahlung process were computed. Here we present numerical results for polarized electron/positron beams, as well as for two input parameter schemes known as the $\alpha(0)$ and $G_\mu$ schemes. The size of the NNLO corrections strongly depends on the beam polarization, leading to an increase of the $ZH$ cross-section by 0.76% for $e^+_{\rm L} e^-_{\rm R}$ beams, and a decrease of 0.04% for $e^+_{\rm R} e^-_{\rm L}$ beams. Furthermore, inclusion of the NNLO corrections is found to significantly reduce the discrepancy between the results in the $\alpha(0)$ and $G_\mu$ schemes. Using the remaining difference, together with other methods, the theory uncertainty from missing bosonic electroweak corrections is estimated to be less than 0.3%.

Submitted 29 May, 2023; v1 submitted 25 May, 2023; originally announced May 2023.

Comments: 6 pages, 2 figures and 3 tables

arXiv:2305.13494 [pdf, other]

Deep Clustering for Data Cleaning and Integration

Authors: Hafiz Tayyab Rauf, Andre Freitas, Norman W. Paton

Abstract: Deep Learning (DL) techniques now constitute the state-of-the-art for important problems in areas such as text and image processing, and there have been impactful results that deploy DL in several data management tasks. Deep Clustering (DC) has recently emerged as a sub-discipline of DL, in which data representations are learned in tandem with clustering, with a view to automatically identifying the features of the data that lead to improved clustering results. While DC has been used to good effect in several domains, particularly in image processing, the impact of DC on mainstream data management tasks remains unexplored. In this paper, we address this gap by investigating the impact of DC in data cleaning and integration tasks, specifically schema inference, entity resolution, and domain discovery, tasks that represent clustering from the perspective of tables, rows, and columns, respectively. In this setting, we compare and contrast several DC and non-DC clustering algorithms using standard benchmarks. The results show, among other things, that the most effective DC algorithms consistently outperform non-DC clustering algorithms for data integration tasks. However, we observed a significant correlation between the DC method and embedding approaches for rows, columns, and tables, highlighting that the suitable combination can enhance the efficiency of DC methods.

Submitted 22 September, 2023; v1 submitted 22 May, 2023; originally announced May 2023.

Comments: The following enhancements have been carried out in the updated version of the manuscript: *Evaluated each data integration problem on additional datasets. *Added more DC and SC methods to the evaluation *Discussed algorithmic-specific observations

arXiv:2305.12563 [pdf, other]

A Symbolic Framework for Evaluating Mathematical Reasoning and Generalisation with Transformers

Authors: Jordan Meadows, Marco Valentino, Damien Teney, Andre Freitas

Abstract: This paper proposes a methodology for generating and perturbing detailed derivations of equations at scale, aided by a symbolic engine, to evaluate the generalisability of Transformers to out-of-distribution mathematical reasoning problems. Instantiating the framework in the context of sequence classification tasks, we compare the capabilities of GPT-4, GPT-3.5, and a canon of fine-tuned BERT models, exploring the relationship between specific operators and generalisation failure via the perturbation of reasoning aspects such as symmetry and variable surface forms. Surprisingly, our empirical evaluation reveals that the average in-distribution performance of fine-tuned models surpasses GPT-3.5, and rivals GPT-4. However, perturbations to input reasoning can reduce their performance by up to 80 F1 points. Overall, the results suggest that the in-distribution performance of smaller open-source models may potentially rival GPT by incorporating appropriately structured derivation dependencies during training, and highlight a shared weakness between BERT and GPT involving a relative inability to decode indirect references to mathematical entities. We release the full codebase, constructed datasets, and fine-tuned models to encourage future progress in the field.

Submitted 8 April, 2024; v1 submitted 21 May, 2023; originally announced May 2023.

Comments: NAACL 2024

arXiv:2305.11391 [pdf, other]

A Survey of Safety and Trustworthiness of Large Language Models through the Lens of Verification and Validation

Authors: Xiaowei Huang, Wenjie Ruan, Wei Huang, Gaojie Jin, Yi Dong, Changshun Wu, Saddek Bensalem, Ronghui Mu, Yi Qi, Xingyu Zhao, Kaiwen Cai, Yanghao Zhang, Sihao Wu, Peipei Xu, Dengyu Wu, Andre Freitas, Mustafa A. Mustafa

Abstract: Large Language Models (LLMs) have exploded a new heatwave of AI for their ability to engage end-users in human-level conversations with detailed and articulate answers across many knowledge domains. In response to their fast adoption in many industrial applications, this survey concerns their safety and trustworthiness. First, we review known vulnerabilities and limitations of the LLMs, categorising them into inherent issues, attacks, and unintended bugs. Then, we consider if and how the Verification and Validation (V&V) techniques, which have been widely developed for traditional software and deep learning models such as convolutional neural networks as independent processes to check the alignment of their implementations against the specifications, can be integrated and further extended throughout the lifecycle of the LLMs to provide rigorous analysis to the safety and trustworthiness of LLMs and their applications. Specifically, we consider four complementary techniques: falsification and evaluation, verification, runtime monitoring, and regulations and ethical use. In total, 370+ references are considered to support the quick understanding of the safety and trustworthiness issues from the perspective of V&V. While intensive research has been conducted to identify the safety and trustworthiness issues, rigorous yet practical methods are called for to ensure the alignment of LLMs with safety and trustworthiness requirements.

Submitted 27 August, 2023; v1 submitted 18 May, 2023; originally announced May 2023.

arXiv:2305.08572 [pdf, other]

Estimating the Causal Effects of Natural Logic Features in Neural NLI Models

Authors: Julia Rozanova, Marco Valentino, Andre Freitas

Abstract: Rigorous evaluation of the causal effects of semantic features on language model predictions can be hard to achieve for natural language reasoning problems. However, this is such a desirable form of analysis from both an interpretability and model evaluation perspective, that it is valuable to zone in on specific patterns of reasoning with enough structure and regularity to be able to identify and quantify systematic reasoning failures in widely-used models. In this vein, we pick a portion of the NLI task for which an explicit causal diagram can be systematically constructed: in particular, the case where across two sentences (the premise and hypothesis), two related words/terms occur in a shared context. In this work, we apply causal effect estimation strategies to measure the effect of context interventions (whose effect on the entailment label is mediated by the semantic monotonicity characteristic) and interventions on the inserted word-pair (whose effect on the entailment label is mediated by the relation between these words). Following related work on causal analysis of NLP models in different settings, we adapt the methodology for the NLI task to construct comparative model profiles in terms of robustness to irrelevant changes and sensitivity to impactful changes.

Submitted 15 May, 2023; originally announced May 2023.

arXiv:2305.07303 [pdf, other]

Multi-Relational Hyperbolic Word Embeddings from Natural Language Definitions

Authors: Marco Valentino, Danilo S. Carvalho, André Freitas

Abstract: Natural language definitions possess a recursive, self-explanatory semantic structure that can support representation learning methods able to preserve explicit conceptual relations and constraints in the latent space. This paper presents a multi-relational model that explicitly leverages such a structure to derive word embeddings from definitions. By automatically extracting the relations linking defined and defining terms from dictionaries, we demonstrate how the problem of learning word embeddings can be formalised via a translational framework in Hyperbolic space and used as a proxy to capture the global semantic structure of definitions. An extensive empirical analysis demonstrates that the framework can help imposing the desired structural constraints while preserving the semantic mapping required for controllable and interpretable traversal. Moreover, the experiments reveal the superiority of the Hyperbolic word embeddings over the Euclidean counterparts and demonstrate that the multi-relational approach can obtain competitive results when compared to state-of-the-art neural models, with the advantage of being intrinsically more efficient and interpretable.

Submitted 16 February, 2024; v1 submitted 12 May, 2023; originally announced May 2023.

Comments: Accepted at the 18th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2024), camera-ready

arXiv:2305.04675 [pdf, other]

Predicting nuclear masses with product-unit networks

Authors: Babette Dellen, Uwe Jaekel, Paulo S. A. Freitas, John W. Clark

Abstract: Accurate estimation of nuclear masses and their prediction beyond the experimentally explored domains of the nuclear landscape are crucial to an understanding of the fundamental origin of nuclear properties and to many applications of nuclear science, most notably in quantifying the $r$-process of stellar nucleosynthesis. Neural networks have been applied with some success to the prediction of nuclear masses, but they are known to have shortcomings in application to extrapolation tasks. In this work, we propose and explore a novel type of neural network for mass prediction in which the usual neuron-like processing units are replaced by complex-valued product units that permit multiplicative couplings of inputs to be learned from the input data. This generalized network model is tested on both interpolation and extrapolation data sets drawn from the Atomic Mass Evaluation. Its performance is compared with that of several neural-network architectures, substantiating its suitability for nuclear mass prediction. Additionally, a prediction-uncertainty measure for such complex-valued networks is proposed that serves to identify regions of expected low prediction error.

Submitted 8 May, 2023; originally announced May 2023.

arXiv:2305.03598 [pdf, other]

NLI4CT: Multi-Evidence Natural Language Inference for Clinical Trial Reports

Authors: Maël Jullien, Marco Valentino, Hannah Frost, Paul O'Regan, Donal Landers, André Freitas

Abstract: How can we interpret and retrieve medical evidence to support clinical decisions? Clinical trial reports (CTR) amassed over the years contain indispensable information for the development of personalized medicine. However, it is practically infeasible to manually inspect over 400,000+ clinical trial reports in order to find the best evidence for experimental treatments. Natural Language Inference (NLI) offers a potential solution to this problem, by allowing the scalable computation of textual entailment. However, existing NLI models perform poorly on biomedical corpora, and previously published datasets fail to capture the full complexity of inference over CTRs. In this work, we present a novel resource to advance research on NLI for reasoning on CTRs. The resource includes two main tasks. Firstly, to determine the inference relation between a natural language statement, and a CTR. Secondly, to retrieve supporting facts to justify the predicted relation. We provide NLI4CT, a corpus of 2400 statements and CTRs, annotated for these tasks. Baselines on this corpus expose the limitations of existing NLI models, with 6 state-of-the-art NLI models achieving a maximum F1 score of 0.627. To the best of our knowledge, we are the first to design a task that covers the interpretation of full CTRs. To encourage further work on this challenging dataset, we make the corpus, competition leaderboard, website and code to replicate the baseline experiments available at: https://github.com/ai-systems/nli4ct

Submitted 28 October, 2023; v1 submitted 5 May, 2023; originally announced May 2023.

Comments: EMNLP 2023 Camera-ready, 15 pages

arXiv:2305.02993 [pdf, other]

SemEval-2023 Task 7: Multi-Evidence Natural Language Inference for Clinical Trial Data

Authors: Maël Jullien, Marco Valentino, Hannah Frost, Paul O'Regan, Donal Landers, André Freitas

Abstract: This paper describes the results of SemEval 2023 task 7 -- Multi-Evidence Natural Language Inference for Clinical Trial Data (NLI4CT) -- consisting of 2 tasks, a Natural Language Inference (NLI) task, and an evidence selection task on clinical trial data. The proposed challenges require multi-hop biomedical and numerical reasoning, which are of significant importance to the development of systems capable of large-scale interpretation and retrieval of medical evidence, to provide personalized evidence-based care. Task 1, the entailment task, received 643 submissions from 40 participants, and Task 2, the evidence selection task, received 364 submissions from 23 participants. The tasks are challenging, with the majority of submitted systems failing to significantly outperform the majority class baseline on the entailment task, and we observe significantly better performance on the evidence selection task than on the entailment task. Increasing the number of model parameters leads to a direct increase in performance, far more significant than the effect of biomedical pre-training. Future works could explore the limitations of large models for generalization and numerical inference, and investigate methods to augment clinical datasets to allow for more rigorous testing and to facilitate fine-tuning. We envisage that the dataset, models, and results of this task will be useful to the biomedical NLI and evidence retrieval communities. The dataset, competition leaderboard, and website are publicly available.

Submitted 11 May, 2023; v1 submitted 4 May, 2023; originally announced May 2023.

Showing 1–50 of 399 results for author: Freitas, A