-
AbDiffuser: Full-Atom Generation of in vitro Functioning Antibodies
Authors:
Karolis Martinkus,
Jan Ludwiczak,
Kyunghyun Cho,
Wei-Ching Liang,
Julien Lafrance-Vanasse,
Isidro Hotzel,
Arvind Rajpal,
Yan Wu,
Richard Bonneau,
Vladimir Gligorijevic,
Andreas Loukas
Abstract:
We introduce AbDiffuser, an equivariant and physics-informed diffusion model for the joint generation of antibody 3D structures and sequences. AbDiffuser is built on top of a new representation of protein structure, relies on a novel architecture for aligned proteins, and utilizes strong diffusion priors to improve the denoising process. Our approach improves protein diffusion by taking advantage of domain knowledge and physics-based constraints; handles sequence-length changes; and reduces memory complexity by an order of magnitude, enabling backbone and side chain generation. We validate AbDiffuser in silico and in vitro. Numerical experiments showcase the ability of AbDiffuser to generate antibodies that closely track the sequence and structural properties of a reference set. Laboratory experiments confirm that all 16 HER2 antibodies discovered were expressed at high levels and that 57.1% of the selected designs were tight binders.
Submitted 6 March, 2024; v1 submitted 28 July, 2023;
originally announced August 2023.
-
Protein Design with Guided Discrete Diffusion
Authors:
Nate Gruver,
Samuel Stanton,
Nathan C. Frey,
Tim G. J. Rudner,
Isidro Hotzel,
Julien Lafrance-Vanasse,
Arvind Rajpal,
Kyunghyun Cho,
Andrew Gordon Wilson
Abstract:
A popular approach to protein design is to combine a generative model with a discriminative model for conditional sampling. The generative model samples plausible sequences while the discriminative model guides a search for sequences with high fitness. Given its broad success in conditional sampling, classifier-guided diffusion modeling is a promising foundation for protein design, leading many to develop guided diffusion models for structure with inverse folding to recover sequences. In this work, we propose diffusioN Optimized Sampling (NOS), a guidance method for discrete diffusion models that follows gradients in the hidden states of the denoising network. NOS makes it possible to perform design directly in sequence space, circumventing significant limitations of structure-based methods, including scarce data and challenging inverse design. Moreover, we use NOS to generalize LaMBO, a Bayesian optimization procedure for sequence design that facilitates multiple objectives and edit-based constraints. The resulting method, LaMBO-2, enables discrete diffusions and stronger performance with limited edits through a novel application of saliency maps. We apply LaMBO-2 to a real-world protein design task, optimizing antibodies for higher expression yield and binding affinity to several therapeutic targets under locality and developability constraints, attaining a 99% expression rate and 40% binding rate in exploratory in vitro experiments.
Submitted 12 December, 2023; v1 submitted 31 May, 2023;
originally announced May 2023.
-
High-Resolution Synthetic RGB-D Datasets for Monocular Depth Estimation
Authors:
Aakash Rajpal,
Noshaba Cheema,
Klaus Illgner-Fehns,
Philipp Slusallek,
Sunil Jaiswal
Abstract:
Accurate depth maps are essential in various applications, such as autonomous driving, scene reconstruction, and point-cloud creation. However, monocular depth estimation (MDE) algorithms often fail to provide enough texture and sharpness, and are inconsistent for homogeneous scenes. These algorithms mostly use CNN or vision transformer-based architectures requiring large datasets for supervised training. Yet MDE algorithms trained on available depth datasets do not generalize well and hence fail to perform accurately in diverse real-world scenes. Moreover, the ground-truth depth maps are either lower-resolution or sparse, leading to relatively inconsistent depth maps. In general, acquiring a high-resolution ground-truth dataset with pixel-level precision for accurate depth prediction is an expensive and time-consuming challenge.
In this paper, we generate a high-resolution synthetic depth dataset (HRSD) of dimension 1920 × 1080 from Grand Theft Auto (GTA-V), which contains 100,000 color images and corresponding dense ground-truth depth maps. The generated datasets are diverse and have scenes from indoors to outdoors, from homogeneous surfaces to textures. For experiments and analysis, we train the DPT algorithm, a state-of-the-art transformer-based MDE algorithm, on the proposed synthetic dataset, which significantly increases the accuracy of depth maps on different scenes by 9%. Since the synthetic datasets are of higher resolution, we propose adding a feature extraction module in the transformer encoder and incorporating an attention-based loss, further improving the accuracy by 15%.
Submitted 2 May, 2023;
originally announced May 2023.
-
A Pareto-optimal compositional energy-based model for sampling and optimization of protein sequences
Authors:
Nataša Tagasovska,
Nathan C. Frey,
Andreas Loukas,
Isidro Hötzel,
Julien Lafrance-Vanasse,
Ryan Lewis Kelly,
Yan Wu,
Arvind Rajpal,
Richard Bonneau,
Kyunghyun Cho,
Stephen Ra,
Vladimir Gligorijević
Abstract:
Deep generative models have emerged as a popular machine learning-based approach for inverse design problems in the life sciences. However, these problems often require sampling new designs that satisfy multiple properties of interest in addition to learning the data distribution. This multi-objective optimization becomes more challenging when properties are independent or orthogonal to each other. In this work, we propose a Pareto-compositional energy-based model (pcEBM), a framework that uses multiple gradient descent for sampling new designs that adhere to various constraints in optimizing distinct properties. We demonstrate its ability to learn non-convex Pareto fronts and generate sequences that simultaneously satisfy multiple desired properties across a series of real-world antibody design tasks.
Submitted 19 October, 2022;
originally announced October 2022.
-
A Comprehensive Review on Digital Image Watermarking
Authors:
Shweta Wadhera,
Deepa Kamra,
Ankit Rajpal,
Aruna Jain,
Vishal Jain
Abstract:
The advent of the Internet led to the easy availability of digital data like images, audio, and video. Easy access to multimedia gives rise to issues such as content authentication, security, copyright protection, and ownership identification. Here, we discuss the concept of digital image watermarking with a focus on the technique used in image watermark embedding and extraction of the watermark. A detailed classification, along with the basic characteristics of digital watermarking, namely visual imperceptibility, robustness, capacity, and security, is also presented in this work. Further, we discuss recent application areas of digital watermarking such as healthcare, remote education, electronic voting systems, and the military. The robustness is evaluated by examining the effect of image processing attacks on the signed content and the watermark recoverability. The authors believe that the comprehensive survey presented in this paper will help new researchers to gather knowledge in this domain. Further, the comparative analysis can enkindle ideas to improve upon the already mentioned techniques.
Submitted 7 July, 2022;
originally announced July 2022.
-
Biomarker Gene Identification for Breast Cancer Classification
Authors:
Sheetal Rajpal,
Ankit Rajpal,
Manoj Agarwal,
Naveen Kumar
Abstract:
BACKGROUND: Breast cancer has emerged as one of the most prevalent cancers among women, leading to a high mortality rate. Due to the heterogeneous nature of breast cancer, there is a need to identify differentially expressed genes associated with breast cancer subtypes for its timely diagnosis and treatment. OBJECTIVE: To identify a small gene set for each of the four breast cancer subtypes that could act as its signature, the paper proposes a novel algorithm for gene signature identification. METHODS: The present work uses interpretable AI methods to investigate the predictions made by the deep neural network employed for subtype classification to identify biomarkers using the TCGA breast cancer RNA Sequence data. RESULTS: The proposed algorithm led to the discovery of a set of 43 differentially expressed gene signatures. We achieved a competitive average 10-fold accuracy of 0.91 using a neural network classifier. Further, gene set analysis revealed several relevant pathways, such as GRB7 events in ERBB2 and p53 signaling pathway. Using the Pearson correlation matrix, we noted that the subtype-specific genes are correlated within each subtype. CONCLUSIONS: The proposed technique enables us to find a concise and clinically relevant gene signature set.
Submitted 29 November, 2021; v1 submitted 10 November, 2021;
originally announced November 2021.
-
COV-ELM classifier: An Extreme Learning Machine based identification of COVID-19 using Chest X-Ray Images
Authors:
Sheetal Rajpal,
Manoj Agarwal,
Ankit Rajpal,
Navin Lakhyani,
Arpita Saggar,
Naveen Kumar
Abstract:
Coronaviruses constitute a family of viruses that gives rise to respiratory diseases. As COVID-19 is highly contagious, early diagnosis of COVID-19 is crucial for an effective treatment strategy. However, the RT-PCR test, which is considered to be a gold standard in the diagnosis of COVID-19, suffers from a high false-negative rate. Chest X-ray (CXR) image analysis has emerged as a feasible and effective diagnostic technique towards this objective. In this work, we pose the COVID-19 classification problem as a three-class classification problem to distinguish between COVID-19, normal, and pneumonia classes. We propose a three-stage framework, named COV-ELM. Stage one deals with preprocessing and transformation while stage two deals with feature extraction. These extracted features are passed as an input to the ELM at the third stage, resulting in the identification of COVID-19. The choice of ELM in this work has been motivated by its faster convergence, better generalization capability, and shorter training time in comparison to the conventional gradient-based learning algorithms. As bigger and more diverse datasets become available, ELM can be quickly retrained as compared to its gradient-based competitor models. The proposed model achieved a macro average F1-score of 0.95 and an overall sensitivity of $0.94 \pm 0.02$ at a 95% confidence interval. When compared to state-of-the-art machine learning algorithms, the COV-ELM is found to outperform its competitors in this three-class classification scenario. Further, LIME has been integrated with the proposed COV-ELM model to generate annotated CXR images. The annotations are based on the superpixels that have contributed to distinguishing between the different classes. It was observed that the superpixels correspond to the regions of the human lungs that are clinically observed in COVID-19 and pneumonia cases.
Submitted 28 September, 2021; v1 submitted 16 July, 2020;
originally announced July 2020.
-
Quality assessment of voice converted speech using articulatory features
Authors:
Avni Rajpal,
Nirmesh J. Shah,
Mohammadi Zaki,
Hemant A. Patil
Abstract:
We propose a novel application based on acoustic-to-articulatory inversion towards quality assessment of voice converted speech. The ability of humans to speak effortlessly requires coordinated movements of various articulators, muscles, etc. This effortless movement contributes towards naturalness, intelligibility, and speaker's identity, which is only partially present in voice converted speech. Hence, during voice conversion, the information related to speech production is lost. In this paper, this loss is quantified for male voice by showing an increase in RMSE for voice converted speech, followed by showing a decrease in mutual information. Similar results are obtained in the case of female voice. This observation is extended by showing that articulatory features can be used as an objective measure. The effectiveness of the proposed measure over MCD is illustrated by comparing their correlation with Mean Opinion Score.
Submitted 23 November, 2015; v1 submitted 16 November, 2015;
originally announced November 2015.