-
Levenshtein Distance Embedding with Poisson Regression for DNA Storage
Authors:
Xiang Wei,
Alan J. X. Guo,
Sihan Sun,
Mengyi Wei,
Wei Yu
Abstract:
Efficient computation or approximation of Levenshtein distance, a widely used metric for evaluating sequence similarity, has attracted significant attention with the emergence of DNA storage and other biological applications. Sequence embedding, which maps Levenshtein distance to a conventional distance between embedding vectors, has emerged as a promising solution. In this paper, a novel neural network-based sequence embedding technique using Poisson regression is proposed. We first provide a theoretical analysis of the impact of embedding dimension on model performance and present a criterion for selecting an appropriate embedding dimension. Under this embedding dimension, Poisson regression is introduced by assuming that the Levenshtein distance between sequences of fixed length follows a Poisson distribution, which naturally aligns with the definition of Levenshtein distance. Moreover, from the perspective of the distribution of embedding distances, Poisson regression approximates the negative log-likelihood of the chi-squared distribution and offers advancements in removing the skewness. Through comprehensive experiments on real DNA storage data, we demonstrate the superior performance of the proposed method compared to state-of-the-art approaches.
Submitted 13 December, 2023;
originally announced December 2023.
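As a rough illustration of the two quantities named in the abstract above, the sketch below computes the Levenshtein distance with the standard dynamic-programming recurrence and evaluates a Poisson negative log-likelihood for an observed distance. It is not the authors' model: the function names and the `rate` argument (standing in for whatever distance a trained embedding would predict) are assumptions made for this example.

```python
import math


def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def poisson_nll(target: int, rate: float) -> float:
    """Negative log-likelihood of `target` under Poisson(rate), rate > 0.

    `rate` is a hypothetical predicted distance, not a value from the paper.
    """
    # -log P(k | rate) = rate - k * log(rate) + log(k!)
    return rate - target * math.log(rate) + math.lgamma(target + 1)


# Example: the loss is lower when the predicted rate is near the true distance.
d = levenshtein("ACGTAC", "AGGTC")
loss_near = poisson_nll(d, float(d))
loss_far = poisson_nll(d, 10.0 * d)
```

In a regression setting, the predicted rate would be produced by the embedding distance and the loss averaged over training pairs; those details are left out here.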
-
In Silico Tools in PROTACs design
Authors:
Mengman Wei
Abstract:
PROTACs, a highly promising new therapeutic paradigm, have attracted widespread attention from the academic and pharmaceutical communities in recent years. To date, the design of PROTAC molecules and the validation of their druggability rely primarily on experimental approaches, which makes the development of this kind of drug molecule time-consuming. Computer-aided tools for PROTACs design may offer a potential solution to expedite the design process and enhance its efficiency. This mini review briefly summarizes the recently reported in silico tools for PROTACs drug molecule design.
Submitted 9 July, 2023; v1 submitted 3 July, 2023;
originally announced July 2023.
-
Learning Melanocytic Cell Masks from Adjacent Stained Tissue
Authors:
Mikio Tada,
Ursula E. Lang,
Iwei Yeh,
Elizabeth S. Keiser,
Maria L. Wei,
Michael J. Keiser
Abstract:
Melanoma is one of the most aggressive forms of skin cancer, causing a large proportion of skin cancer deaths. However, melanoma diagnoses by pathologists show low interrater reliability. As melanoma is a cancer of the melanocyte, there is a clear need to develop a melanocytic cell segmentation tool that is agnostic to pathologist variability and automates pixel-level annotation. Gigapixel-level pathologist labeling, however, is impractical. Herein, we propose a means to train deep neural networks for melanocytic cell segmentation from hematoxylin and eosin (H&E) stained sections and paired immunohistochemistry (IHC) of adjacent tissue sections, achieving a mean IOU of 0.64 despite imperfect ground-truth labels.
Submitted 13 March, 2024; v1 submitted 31 October, 2022;
originally announced November 2022.
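For readers unfamiliar with the mean IOU figure quoted in the abstract above, the following minimal sketch shows how intersection over union is commonly computed for binary segmentation masks and then averaged. The flat 0/1 list representation, the function names, and the convention of scoring two empty masks as 1.0 are assumptions for this example, not details taken from the paper.

```python
from typing import Sequence


def iou(pred: Sequence[int], truth: Sequence[int]) -> float:
    """Intersection over union of two binary masks given as flat 0/1 sequences."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    union = sum(1 for p, t in zip(pred, truth) if p or t)
    # Assumed convention: two empty masks agree perfectly.
    return 1.0 if union == 0 else intersection / union


def mean_iou(pairs: Sequence[tuple]) -> float:
    """Average IoU over (pred, truth) mask pairs; `pairs` must be non-empty."""
    return sum(iou(p, t) for p, t in pairs) / len(pairs)


# Example with two tiny 4-pixel masks.
pairs = [
    ([1, 1, 0, 0], [1, 0, 1, 0]),  # 1 shared pixel, 3 in the union
    ([0, 1, 1, 0], [0, 1, 1, 0]),  # identical masks
]
score = mean_iou(pairs)
```

Real pipelines would typically operate on 2D arrays and may average per class or per image; the choice affects the reported number.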