-
Deep learning-based method for segmenting epithelial layer of tubules in histopathological images of testicular tissue
Authors:
Azadeh Fakhrzadeh,
Pouya Karimian,
Mahsa Meyari,
Cris L. Luengo Hendriks,
Lena Holm,
Christian Sonne,
Rune Dietz,
Ellinor Spörndly-Nees
Abstract:
There is growing concern that male reproduction is affected by environmental chemicals. One way to determine the adverse effects of environmental pollutants is to use wild animals as monitors and evaluate testicular toxicity using histopathology. Automated methods are necessary tools in the quantitative assessment of histopathology to overcome the subjectivity of manual evaluation and accelerate the process. We propose an automated method to process histology images of testicular tissue. Segmenting the epithelial layer of the seminiferous tubule is a prerequisite for developing automated methods to detect abnormalities in tissue. We suggest an encoder-decoder fully connected convolutional neural network (F-CNN) model to segment the epithelial layer of the seminiferous tubules in histological images. Using ResNet-34 modules in the encoder adds a shortcut mechanism that avoids vanishing gradients and accelerates network convergence. The squeeze & excitation (SE) attention block is integrated into the encoding module, improving the segmentation and localization of the epithelium. We applied the proposed method to the 2-class problem where the epithelial layer of the tubule is the target class. The F-score and IoU of the proposed method are 0.85 and 0.92, respectively. Although the proposed method is trained on a limited training set, it performs well on an independent dataset and outperforms other state-of-the-art methods. The pretrained ResNet-34 in the encoder and the attention block suggested in the decoder result in better segmentation and generalization. The proposed method can be applied to testicular tissue images from any mammalian species and can be used as the first part of a fully automated testicular tissue processing pipeline. The dataset and code are publicly available on GitHub.
Submitted 24 January, 2023;
originally announced January 2023.
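The abstract credits a squeeze & excitation (SE) attention block with improving localization of the epithelium. As a rough illustration of what such a block computes, the sketch below implements the standard SE recalibration (global average pooling, a bottleneck of two fully connected layers with ReLU and sigmoid, then per-channel rescaling) in plain Python. It is not the authors' code: the reduction ratio `r`, the omission of biases, and the random weights `w1`/`w2` are assumptions for illustration only, and a real network would implement this with framework layers.

```python
import math
import random


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def se_block(x, w1, w2):
    """Squeeze-and-excitation recalibration of a C x H x W feature map.

    x  : list of C channels, each an H x W list of floats
    w1 : (C // r) x C weights of the reduction layer (hypothetical values)
    w2 : C x (C // r) weights of the expansion layer (hypothetical values)
    Biases are omitted for brevity.
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    z = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in x]
    # Excitation: FC -> ReLU -> FC -> sigmoid gives per-channel weights in (0, 1).
    hidden = [max(0.0, sum(w * zi for w, zi in zip(row, z))) for row in w1]
    s = [sigmoid(sum(w * h for w, h in zip(row, hidden))) for row in w2]
    # Scale: multiply every spatial position of channel c by s[c].
    y = [[[v * s[c] for v in row] for row in ch] for c, ch in enumerate(x)]
    return y, s


# Toy usage with random placeholder weights (hypothetical, not trained values).
random.seed(0)
C, H, W, r = 4, 3, 3, 2
x = [[[random.random() for _ in range(W)] for _ in range(H)] for _ in range(C)]
w1 = [[random.uniform(-1, 1) for _ in range(C)] for _ in range(C // r)]
w2 = [[random.uniform(-1, 1) for _ in range(C // r)] for _ in range(C)]
y, s = se_block(x, w1, w2)
```

The block leaves the spatial layout untouched and only rescales whole channels, which is why it can be dropped into an encoder stage without changing tensor shapes.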
-
A general-purpose material property data extraction pipeline from large polymer corpora using Natural Language Processing
Authors:
Pranav Shetty,
Arunkumar Chitteth Rajan,
Christopher Kuenneth,
Sonkakshi Gupta,
Lakshmi Prerana Panchumarti,
Lauren Holm,
Chao Zhang,
Rampi Ramprasad
Abstract:
The ever-increasing number of materials science articles makes it hard to infer chemistry-structure-property relations from published literature. We used natural language processing (NLP) methods to automatically extract material property data from the abstracts of polymer literature. As a component of our pipeline, we trained MaterialsBERT, a language model, using 2.4 million materials science abstracts; when used as the text encoder, it outperforms other baseline models on three out of five named entity recognition datasets. Using this pipeline, we obtained ~300,000 material property records from ~130,000 abstracts in 60 hours. The extracted data was analyzed for a diverse range of applications such as fuel cells, supercapacitors, and polymer solar cells to recover non-trivial insights. The data extracted through our pipeline is made available through a web platform at https://polymerscholar.org, which can be used to conveniently locate material property data reported in abstracts. This work demonstrates the feasibility of an automatic pipeline that starts from published literature and ends with a complete set of extracted material property information.
Submitted 26 September, 2022;
originally announced September 2022.
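The abstract describes a pipeline in which a named entity recognition model (with MaterialsBERT as encoder) tags abstracts and the tags are turned into material property records. The abstract does not give the label set or the record-building rules, so the sketch below is only a minimal illustration under stated assumptions: the BIO labels `POLYMER`, `PROP_NAME` and `PROP_VALUE` are hypothetical, the tags are assumed to come from an upstream NER model, and the "attach each value to the most recent material and property name" rule is a naive heuristic, not the authors' method.

```python
def bio_spans(tokens, tags):
    """Collect (label, text) spans from BIO tags such as B-PROP_VALUE / I-PROP_VALUE / O."""
    spans, label, words = [], None, []
    for tok, tag in zip(tokens, tags):
        if tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != label):
            # A new entity starts (a stray I- tag is treated leniently as a start).
            if label:
                spans.append((label, " ".join(words)))
            label, words = tag[2:], [tok]
        elif tag.startswith("I-"):
            words.append(tok)
        else:  # "O" closes any open span
            if label:
                spans.append((label, " ".join(words)))
            label, words = None, []
    if label:
        spans.append((label, " ".join(words)))
    return spans


def assemble_records(spans):
    """Naive heuristic: pair each property value with the latest material and property name."""
    records, material, prop = [], None, None
    for label, text in spans:
        if label == "POLYMER":
            material = text
        elif label == "PROP_NAME":
            prop = text
        elif label == "PROP_VALUE" and material and prop:
            records.append({"material": material, "property": prop, "value": text})
    return records


# Hypothetical tagged sentence, as an upstream NER model might label it.
tokens = ["PMMA", "has", "a", "glass", "transition", "temperature", "of", "105", "°C", "."]
tags = ["B-POLYMER", "O", "O", "B-PROP_NAME", "I-PROP_NAME", "I-PROP_NAME",
        "O", "B-PROP_VALUE", "I-PROP_VALUE", "O"]
records = assemble_records(bio_spans(tokens, tags))
```

A production pipeline would also normalize units and material names before storing records; that step is left out here.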
-
Novel split quality measures for stratified multilabel Cross Validation with application to large and sparse gene ontology datasets
Authors:
Henri Tiittanen,
Liisa Holm,
Petri Törönen
Abstract:
Multilabel learning is an important topic in machine learning research. Evaluating models in multilabel settings requires specific cross validation methods designed for multilabel data. In this article, we show that the most widely used cross validation split quality measures do not behave adequately with multilabel data that has strong class imbalance. We present improved measures and an algorithm, optisplit, for optimising cross validation splits. We present an extensive comparison of various types of cross validation methods, in which we show that optisplit produces more even cross validation splits than the existing methods and that it is fast enough to be used on big Gene Ontology (GO) datasets.
Submitted 11 February, 2022; v1 submitted 3 September, 2021;
originally announced September 2021.
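To make "split quality measure" concrete, the sketch below scores a multilabel cross validation split by the mean absolute deviation between each fold's positive rate and the overall positive rate, averaged over labels and folds. This is a simple illustrative measure chosen for this example; it is neither optisplit nor one of the measures proposed in the paper, and the function name `label_proportion_deviation` is hypothetical. The toy data places both positives of a rare label in the same fold, which still yields a small score.

```python
def label_proportion_deviation(Y, folds):
    """Mean absolute deviation of per-fold positive rates from the overall rate.

    Y     : list of examples, each a set of label ids
    folds : list of folds, each a list of example indices
    Returns the deviation averaged over labels and folds (0 means perfectly even).
    """
    labels = sorted(set().union(*Y))
    total = 0.0
    for j in labels:
        overall = sum(j in y for y in Y) / len(Y)
        for fold in folds:
            rate = sum(j in Y[i] for i in fold) / len(fold)
            total += abs(rate - overall)
    return total / (len(labels) * len(folds))


# Toy data: label 0 is common (50 of 100 examples), label 1 is rare (2 of 100).
Y = [{0} if i % 2 == 0 else set() for i in range(100)]
Y[0].add(1)
Y[1].add(1)
folds = [list(range(50)), list(range(50, 100))]  # both rare positives land in fold 0
score = label_proportion_deviation(Y, folds)
```

Here the rare label is entirely missing from fold 1, yet the whole split scores only about 0.01 because absolute rate differences for rare labels are small. This toy case suggests why measures built on absolute proportions can understate imbalance for rare labels; the paper's own analysis and measures should be consulted for the actual argument.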
-
Π-cyc: A Reference-free SNP Discovery Application using Parallel Graph Search
Authors:
Reda Younsi,
**g Tang,
Liisa Holm
Abstract:
Motivation: Working with a large number of genomes simultaneously is of great interest in population genetics and comparative genomics research. Bubble discovery in a multi-genome coloured de Bruijn graph for de novo genome assembly is a problem that can be translated to cycle enumeration in graph theory. Cycle enumeration algorithms in large and complex de Bruijn graphs are time-consuming. Specialised fast algorithms for efficient bubble search are needed for variant calling applications based on coloured de Bruijn graphs. In coloured de Bruijn graphs, bubble path coverages are used in downstream variant calling analysis. Results: In this paper, we introduce a fast parallel graph search for different K-mer cycle sizes. Coloured path coverages are used for SNP prediction. The graph search method uses a combined multi-node and multi-core design to speed up cycle enumeration. The search algorithm uses an index extracted from the raw assembly of a coloured de Bruijn graph stored in a hash table. The index is distributed across different CPU cores in a shared-memory HPC compute node to build undirected subgraphs, which are then searched independently and simultaneously for specific cycle sizes. The same index can also be split between several HPC compute nodes to take advantage of as many CPU cores as are available to the user. The local neighbourhood parallel search approach reduces the graph's complexity and facilitates cycle search in a multi-colour de Bruijn graph. The search algorithm is incorporated into the Π-cyc application and tested on a number of Schizosaccharomyces pombe genomes. Availability: Π-cyc is open-source software available at www.github.com/2kplus2P
Submitted 18 September, 2018;
originally announced September 2018.
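The abstract reduces bubble discovery to enumerating cycles of given sizes in an undirected de Bruijn graph and spreads the search over CPU cores. The sketch below illustrates that idea under several assumptions: nodes are (k-1)-mers and edges are k-mers, colours and coverages are ignored, and start nodes are split across a thread pool as a stand-in for distributing the index over cores. Python threads will not give a real speedup here; a production tool would use processes or native threads. This is not Π-cyc's algorithm, and the function names are hypothetical. In the toy example, an isolated SNP between two sequences produces one bubble whose two branches each contain k edges, so it appears as an undirected cycle of 2k nodes.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def de_bruijn_undirected(seqs, k):
    """Undirected de Bruijn graph: nodes are (k-1)-mers, one edge per distinct k-mer."""
    adj = defaultdict(set)
    for seq in seqs:
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            u, v = kmer[:-1], kmer[1:]
            if u != v:  # skip self-loops such as AAA
                adj[u].add(v)
                adj[v].add(u)
    return adj


def cycles_from(start, adj, length):
    """Simple cycles with exactly `length` nodes (length >= 3) whose smallest node is `start`."""
    found = []

    def dfs(path, on_path):
        node = path[-1]
        if len(path) == length:
            # Close the cycle; path[1] < path[-1] keeps only one of the two directions.
            if start in adj[node] and path[1] < path[-1]:
                found.append(tuple(path))
            return
        for nxt in adj[node]:
            if nxt > start and nxt not in on_path:
                path.append(nxt)
                on_path.add(nxt)
                dfs(path, on_path)
                path.pop()
                on_path.discard(nxt)

    dfs([start], {start})
    return found


def parallel_cycles(adj, length, workers=4):
    """Split start nodes across workers and merge their independent results."""
    starts = sorted(adj)
    chunks = [starts[i::workers] for i in range(workers)]

    def run(chunk):
        return [c for s in chunk for c in cycles_from(s, adj, length)]

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sorted(c for part in pool.map(run, chunks) for c in part)


# Two toy sequences differing by one SNP (C/G at position 2), k = 3.
graph = de_bruijn_undirected(["AACTGG", "AAGTGG"], 3)
bubbles = parallel_cycles(graph, 2 * 3)
```

Because each worker only needs read access to the shared adjacency index and restricts itself to cycles whose smallest node it owns, the workers never report the same cycle twice and need no coordination beyond merging results.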