-
Defining Reference Sequences for Nocardia Species by Similarity and Clustering Analyses of 16S rRNA Gene Sequence Data
Authors:
Manal Helal,
Fanrong Kong,
Sharon C. A. Chen,
Michael Bain,
Richard Christen,
Vitali Sintchenko
Abstract:
The intra- and inter-species genetic diversity of bacteria and the absence of 'reference', or the most representative, sequences of individual species present a significant challenge for sequence-based identification. The aims of this study were to determine the utility and compare the performance of several clustering and classification algorithms to identify the species of 364 16S rRNA gene sequences with a defined species in GenBank, and 110 16S rRNA gene sequences with no defined species, all within the genus Nocardia. A total of 364 16S rRNA gene sequences of Nocardia species were studied. In addition, 110 16S rRNA gene sequences assigned only to the Nocardia genus level at the time of submission to GenBank were used for machine learning classification experiments. Different clustering algorithms were compared with a novel algorithm, the linear mapping (LM) of the distance matrix. Principal Components Analysis was used for dimensionality reduction and visualization. Results: The LM algorithm achieved the highest performance and classified the set of 364 16S rRNA sequences into 80 clusters, the majority of which (83.52%) corresponded with the original species. The most representative 16S rRNA sequences for individual Nocardia species were identified as 'centroids' of their respective clusters, from which the distances to all other sequences were minimized; 110 16S rRNA gene sequences with identifications recorded only at the genus level were classified using machine learning methods. Simple kNN machine learning demonstrated the highest performance and classified Nocardia species sequences with an accuracy of 92.7% and a mean frequency of 0.578.
Submitted 29 November, 2023;
originally announced November 2023.
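The abstract above describes the representative sequence of each cluster as the member from which distances to the other members are minimized. A minimal sketch of that idea follows, assuming a precomputed pairwise distance matrix and existing cluster labels. The function name `cluster_medoids`, the toy matrix, and the choice of summed distance as the criterion are illustrative assumptions, not the authors' implementation.

```python
def cluster_medoids(dist, labels):
    """Map each cluster label to the index of its most central member.

    dist   -- square matrix (list of lists) of pairwise distances
    labels -- cluster label for each row/column of dist
    The "most central" member is assumed here to be the one with the
    smallest summed distance to the other members of its cluster.
    """
    members = {}
    for idx, label in enumerate(labels):
        members.setdefault(label, []).append(idx)

    medoids = {}
    for label, idxs in members.items():
        # min() keeps the first index on ties, so the result is deterministic.
        medoids[label] = min(idxs, key=lambda i: sum(dist[i][j] for j in idxs))
    return medoids


# Toy symmetric distance matrix for five hypothetical sequences.
dist = [
    [0.0, 0.1, 0.2, 0.9, 0.8],
    [0.1, 0.0, 0.1, 0.9, 0.9],
    [0.2, 0.1, 0.0, 0.8, 0.9],
    [0.9, 0.9, 0.8, 0.0, 0.3],
    [0.8, 0.9, 0.9, 0.3, 0.0],
]
labels = ["A", "A", "A", "B", "B"]
medoids = cluster_medoids(dist, labels)
print(medoids)
```

Choosing an actual member of the cluster (a medoid) rather than an averaged point keeps the representative a real sequence, which is what a reference sequence needs to be.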
-
Resolving Legalese: A Multilingual Exploration of Negation Scope Resolution in Legal Documents
Authors:
Ramona Christen,
Anastassia Shaitarova,
Matthias Stürmer,
Joel Niklaus
Abstract:
Resolving the scope of a negation within a sentence is a challenging NLP task. The complexity of legal texts and the lack of annotated in-domain negation corpora pose challenges for state-of-the-art (SotA) models when performing negation scope resolution on multilingual legal data. Our experiments demonstrate that models pre-trained without legal data underperform in the task of negation scope resolution. Our experiments, using language models exclusively fine-tuned on domains like literary texts and medical data, yield inferior results compared to the outcomes documented in prior cross-domain experiments. We release a new set of annotated court decisions in German, French, and Italian and use it to improve negation scope resolution in both zero-shot and multilingual settings. We achieve token-level F1-scores of up to 86.7% in our zero-shot cross-lingual experiments, where the models are trained on two languages of our legal datasets and evaluated on the third. Our multilingual experiments, where the models were trained on all available negation data and evaluated on our legal datasets, resulted in F1-scores of up to 91.1%.
Submitted 15 September, 2023;
originally announced September 2023.
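The abstract above reports token-level F1-scores for negation scope resolution. As a rough illustration of how such a score can be computed, the sketch below assumes each token carries a binary label (1 = inside the negation scope, 0 = outside). The function name `token_f1` and the example labels are hypothetical, and the paper's exact evaluation protocol may differ.

```python
def token_f1(gold, pred):
    """Token-level F1 for binary scope labels (1 = in scope, 0 = out of scope)."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same number of tokens")

    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)

    if tp == 0:
        # No correctly predicted scope token: precision or recall is zero.
        return 0.0

    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical six-token sentence: three gold scope tokens, three predicted.
gold = [0, 1, 1, 1, 0, 0]
pred = [0, 1, 1, 0, 1, 0]
score = token_f1(gold, pred)
print(round(score, 3))
```

In this example two scope tokens are predicted correctly, one is missed, and one is spurious, so precision and recall are both 2/3.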
-
Exogenous Data in Forecasting: FARM -- A New Measure for Relevance Evaluation
Authors:
Ramón Christen,
Luca Mazzola,
Alexander Denzler,
Edy Portmann
Abstract:
Evaluating the relevance of an exogenous data series is the first step in improving the prediction capabilities of a forecast algorithm. Inspired by existing metrics for time series similarity, we introduce a new approach named FARM - Forward Aligned Relevance Metric. Our forward method relies on an angular measure that compares changes in subsequent data points to align time-warped series in an efficient way. The proposed algorithm combines local and global measures to provide a balanced relevance metric. As a result, partial, intermediate matches are also considered relevant indicators of the significance of an exogenous data series. As a first validation step, we present the application of our FARM approach to synthetic but representative signals. While demonstrating the improved capabilities with respect to existing approaches, we also discuss existing constraints and limitations of our idea.
Submitted 24 April, 2023; v1 submitted 21 April, 2023;
originally announced April 2023.
-
Towards a Peer-to-Peer Energy Market: an Overview
Authors:
Luca Mazzola,
Alexander Denzler,
Ramon Christen
Abstract:
This work focuses on the electric power market, comparing the status quo with the recent trend towards the increase in distributed self-generation capabilities by prosumers. Starting from the existing tension between the intrinsically hierarchical current structure of the electricity distribution network and the substantially distributed and self-organising nature of self-generation, we explore the limitations imposed by the current conditions. Firstly, we introduce a potential multi-layered architecture for a Peer-to-Peer (P2P) energy market, discussing the fundamental aspects of local production and local consumption as part of a microgrid. Secondly, we analyse the consequent changes for the different users' roles, also in connection with some incentive models related to the decentralisation of power production. To give the reader a full picture, we also scrutinise relevant elements of energy trading, such as Smart Contracts and grid stability. Thirdly, we present an example of a typical P2P settlement, showcasing the role of all the previously analysed aspects. To conclude, we review relevant activities in this domain to show where existing projects are heading and which themes are the most important. As this is a work in progress, many open questions are still on the table and will be addressed in the next stages of the research. Finally, by providing a reference model as a basis for further discussion and improvement, we would like to engage in a dialogue with the different users and the broader community, oriented towards a fairer and more ecologically friendly solution for the electricity market of the future.
Submitted 26 March, 2020; v1 submitted 2 March, 2020;
originally announced March 2020.