Search | arXiv e-print repository

CheMFi: A Multifidelity Dataset of Quantum Chemical Properties of Diverse Molecules

Abstract: Progress in both Machine Learning (ML) and conventional Quantum Chemistry (QC) computational methods has resulted in high accuracy ML models for QC properties ranging from atomization energies to excitation energies. Various datasets such as MD17, MD22, and WS22, which consist of properties calculated at some level of QC method, or fidelity, have been generated to benchmark such ML models. The term fidelity refers to the accuracy of the chosen QC method relative to the actual value of the property. The higher the fidelity, the more accurate the calculated property, albeit at a higher computational cost. Research in multifidelity ML (MFML) methods, where ML models are trained on data from more than one numerical QC method, has shown the effectiveness of such models over single fidelity methods. Much research is progressing in this direction for diverse applications ranging from energy band gaps to excitation energies. A major hurdle for effective research in this field is the lack of a diverse multifidelity dataset for benchmarking. Here, we present a comprehensive multifidelity dataset drawn from the WS22 molecular conformations. We provide the quantum Chemistry MultiFidelity (CheMFi) dataset consisting of five fidelities calculated with the TD-DFT formalism. The fidelities differ in their basis set choice, namely: STO-3G, 3-21G, 6-31G, def2-SVP, and def2-TZVP.
CheMFi offers to the community a variety of QC properties including vertical excitation energies, oscillator strengths, molecular dipole moments, and ground state energies. In addition to the dataset, multifidelity benchmarks are set with state-of-the-art MFML and optimized-MFML.

Submitted 20 June, 2024; originally announced June 2024.

Comments: SI not included

arXiv:2402.08648 [pdf, other]

Generating Universal Adversarial Perturbations for Quantum Classifiers

Authors: Gautham Anil, Vishnu Vinod, Apurva Narayan

Abstract: Quantum Machine Learning (QML) has emerged as a promising field of research, aiming to leverage the capabilities of quantum computing to enhance existing machine learning methodologies. Recent studies have revealed that, like their classical counterparts, QML models based on Parametrized Quantum Circuits (PQCs) are also vulnerable to adversarial attacks. Moreover, the existence of Universal Adversarial Perturbations (UAPs) in the quantum domain has been demonstrated theoretically in the context of quantum classifiers. In this work, we introduce QuGAP: a novel framework for generating UAPs for quantum classifiers. We conceptualize the notion of additive UAPs for PQC-based classifiers and theoretically demonstrate their existence. We then utilize generative models (QuGAP-A) to craft additive UAPs and experimentally show that quantum classifiers are susceptible to such attacks. Moreover, we formulate a new method for generating unitary UAPs (QuGAP-U) using quantum generative models and a novel loss function based on fidelity constraints. We evaluate the performance of the proposed framework and show that our method achieves state-of-the-art misclassification rates, while maintaining high fidelity between legitimate and adversarial samples.

Submitted 13 February, 2024; originally announced February 2024.

Comments: Accepted at AAAI 2024

arXiv:2312.05661 [pdf, ps, other]

doi 10.1088/2632-2153/ad2cef

Optimized Multifidelity Machine Learning for Quantum Chemistry

Authors: Vivin Vinod, Ulrich Kleinekathöfer, Peter Zaspel

Abstract: Machine learning (ML) provides access to fast and accurate quantum chemistry (QC) calculations for various properties of interest such as excitation energies. It is often the case that high accuracy in prediction using an ML model demands a large and costly training set. Various solutions and procedures have been presented to reduce this cost. These include methods such as $Δ$-ML, hierarchical-ML, and multifidelity machine learning (MFML). MFML combines various $Δ$-ML like sub-models for various fidelities according to a fixed scheme derived from the sparse grid combination technique. In this work we implement an optimization procedure to combine multifidelity models in a flexible scheme resulting in optimized MFML (o-MFML) that provides superior prediction capabilities. This hyper-parameter optimization is carried out on a holdout validation set of the property of interest. This work benchmarks the o-MFML method in predicting the atomization energies on the QM7b dataset, and again in the prediction of excitation energies for three molecules of growing size. The results indicate that o-MFML is a strong methodological improvement over MFML and provides lower error of prediction. Even in cases of poor data distributions and lack of clear hierarchies among the fidelities, which were previously identified as issues for multifidelity methods, o-MFML provides an advantage in the prediction of quantum chemical properties.

Submitted 9 December, 2023; originally announced December 2023.

Comments: SI not included
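
The abstract above names $Δ$-ML as one building block of multifidelity models. As a rough illustration only, the sketch below shows the commonly described $Δ$-ML idea: learn the difference between a cheap and an expensive calculation, then add the learned correction to the cheap value. Everything here is an assumption for illustration and is not taken from the paper: the two toy "fidelity" functions, the hand-written Gaussian kernel ridge regression, and the parameter values.

```python
import numpy as np

def gaussian_kernel(a, b, length_scale):
    """Gaussian (RBF) kernel matrix between two 1-D arrays of inputs."""
    diff = a[:, None] - b[None, :]
    return np.exp(-0.5 * (diff / length_scale) ** 2)

def fit_krr(x_train, y_train, length_scale=0.5, reg=1e-8):
    """Fit kernel ridge regression and return a prediction function."""
    kernel = gaussian_kernel(x_train, x_train, length_scale)
    alpha = np.linalg.solve(kernel + reg * np.eye(len(x_train)), y_train)
    return lambda x: gaussian_kernel(x, x_train, length_scale) @ alpha

# Hypothetical stand-ins for a cheap and an expensive QC calculation.
def low_fidelity(x):
    return np.sin(x)

def high_fidelity(x):
    return np.sin(x) + 0.3 * x

# Only a few expensive training points are used.
x_train = np.linspace(0.0, 3.0, 8)
delta_model = fit_krr(x_train, high_fidelity(x_train) - low_fidelity(x_train))

# Delta-ML prediction: cheap value plus the learned correction.
x_test = np.linspace(0.1, 2.9, 50)
prediction = low_fidelity(x_test) + delta_model(x_test)

err_delta = np.mean(np.abs(prediction - high_fidelity(x_test)))
err_low = np.mean(np.abs(low_fidelity(x_test) - high_fidelity(x_test)))
print(f"MAE, low fidelity alone: {err_low:.4f}")
print(f"MAE, Delta-ML corrected: {err_delta:.4f}")
```

The sketch covers a single pair of fidelities. How MFML and o-MFML combine several such sub-models is described in the paper itself and is not reproduced here.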

arXiv:2305.11292 [pdf, other]

doi 10.1021/acs.jctc.3c00882

Multi-Fidelity Machine Learning for Excited State Energies of Molecules

Authors: Vivin Vinod, Sayan Maity, Peter Zaspel, Ulrich Kleinekathöfer

Abstract: The accurate but fast calculation of molecular excited states is still a very challenging topic. For many applications, detailed knowledge of the energy funnel in larger molecular aggregates is of key importance, requiring highly accurate excited state energies. To this end, machine learning techniques can be an extremely useful tool, though the cost of generating highly accurate training datasets remains a severe challenge. To overcome this hurdle, this work proposes the use of multi-fidelity machine learning where very little training data from high accuracies is combined with cheaper and less accurate data to achieve the accuracy of the costlier level. In the present study, the approach is employed to predict the first excited state energies for three molecules of increasing size, namely, benzene, naphthalene, and anthracene. The energies are trained and tested for conformations stemming from classical molecular dynamics simulations and from real-time density functional tight-binding calculations. It can be shown that the multi-fidelity machine learning model can achieve the same accuracy as a machine learning model built only on high cost training data while requiring a much lower computational effort to generate the data. The numerical gain observed in these benchmark test calculations was over a factor of 30 but certainly can be much higher for high accuracy data.

Submitted 18 May, 2023; originally announced May 2023.

arXiv:2303.15528 [pdf, other]

Few-Shot Domain Adaptation for Low Light RAW Image Enhancement

Authors: K. Ram Prabhakar, Vishal Vinod, Nihar Ranjan Sahoo, R. Venkatesh Babu

Abstract: Enhancing practical low light raw images is a difficult task due to severe noise and color distortions from short exposure time and limited illumination. Despite the success of existing Convolutional Neural Network (CNN) based methods, their performance is not adaptable to different camera domains. In addition, such methods also require large datasets with short-exposure and corresponding long-exposure ground truth raw images for each camera domain, which is tedious to compile. To address this issue, we present a novel few-shot domain adaptation method to utilize the existing source camera labeled data with few labeled samples from the target camera to improve the target domain's enhancement quality in extreme low-light imaging. Our experiments show that only ten or fewer labeled samples from the target camera domain are sufficient to achieve similar or better enhancement performance than training a model with a large labeled target camera dataset. To support research in this direction, we also present a new low-light raw image dataset captured with a Nikon camera, comprising short-exposure and their corresponding long-exposure ground truth images.

Submitted 27 March, 2023; originally announced March 2023.

Comments: BMVC 2021 Best Student Paper Award (Runner-Up). Project Page: https://val.cds.iisc.ac.in/HDR/BMVC21/index.html

Journal ref: 32nd British Machine Vision Conference 2021, BMVC 2021, 327

arXiv:2303.13743 [pdf, other]

TEGLO: High Fidelity Canonical Texture Mapping from Single-View Images

Authors: Vishal Vinod, Tanmay Shah, Dmitry Lagun

Abstract: Recent work in Neural Fields (NFs) learns 3D representations from class-specific single view image collections. However, these methods are unable to reconstruct the input data while preserving high-frequency details. Further, they do not disentangle appearance from geometry and hence are not suitable for tasks such as texture transfer and editing. In this work, we propose TEGLO (Textured EG3D-GLO) for learning 3D representations from single view in-the-wild image collections for a given class of objects. We accomplish this by training a conditional Neural Radiance Field (NeRF) without any explicit 3D supervision. We equip our method with editing capabilities by creating a dense correspondence mapping to a 2D canonical space. We demonstrate that such mapping enables texture transfer and texture editing without requiring meshes with shared topology. Our key insight is that by mapping the input image pixels onto the texture space we can achieve near perfect reconstruction (>= 74 dB PSNR at 1024^2 resolution). Our formulation allows for high quality 3D consistent novel view synthesis with high-frequency details at megapixel image resolution.

Submitted 23 March, 2023; originally announced March 2023.
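
To put the reconstruction figure quoted in the abstract above into perspective, the following sketch applies the standard PSNR definition, PSNR = 10 log10(MAX^2 / MSE), and inverts it for 74 dB. The function names are hypothetical, and the choice of pixel intensities normalized to [0, 1] (so MAX = 1) is an assumption; the abstract does not state which normalization was used.

```python
import math

def psnr_db(mse: float, max_value: float = 1.0) -> float:
    """Peak signal-to-noise ratio in decibels for a given mean squared error."""
    if mse <= 0:
        raise ValueError("mse must be positive; PSNR is unbounded for a perfect match")
    return 10.0 * math.log10(max_value ** 2 / mse)

def mse_for_psnr(psnr: float, max_value: float = 1.0) -> float:
    """Mean squared error that corresponds to a given PSNR in decibels."""
    return max_value ** 2 / 10.0 ** (psnr / 10.0)

# Assumption: intensities normalized to [0, 1], so max_value = 1.0.
mse_at_74_db = mse_for_psnr(74.0)
rmse_at_74_db = math.sqrt(mse_at_74_db)
print(f"MSE at 74 dB:  {mse_at_74_db:.3e}")
print(f"RMSE at 74 dB: {rmse_at_74_db:.3e}")
```

Under that assumption, 74 dB corresponds to a mean squared error of roughly 4e-8, or a root-mean-square pixel error of about 2e-4 on a unit intensity scale.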

arXiv:2109.02250 [pdf, other]

Estimating Leaf Water Content using Remotely Sensed Hyperspectral Data

Authors: Vishal Vinod, Rahul Raj, Rohit Pingale, Adinarayana Jagarlapudi

Abstract: Plant water stress may occur due to the limited availability of water to the roots/soil or due to increased transpiration. These factors adversely affect plant physiology and photosynthetic ability to the extent that it has been shown to have inhibitory effects in both growth and yield [18]. Early identification of plant water stress status enables suitable corrective measures to be applied to obtain the expected crop yield. Further, improving crop yield through precision agriculture methods is a key component of climate policy and the UN sustainable development goals [1]. Leaf water content (LWC) is a measure that can be used to estimate water content and identify stressed plants. LWC during the early crop growth stages is an important indicator of plant productivity and yield. The effect of water stress can be instantaneous [15], affecting gaseous exchange or long-term, significantly reducing [9, 18, 22]. It is thus necessary to identify potential plant water stress during the early stages of growth [15] to introduce corrective irrigation and alleviate stress. LWC is also useful for identifying plant genotypes that are tolerant to water stress and salinity by measuring the stability of LWC even under artificially induced water stress [18, 25]. Such experiments generally employ destructive procedures to obtain the LWC, which is time-consuming and labor intensive. Accordingly, this research has developed a non-destructive method to estimate LWC from UAV-based hyperspectral data.

Submitted 6 September, 2021; originally announced September 2021.

Comments: ICCV 2021 CVPPA Workshop Extended Abstract

arXiv:2106.01251 [pdf, other]

Multilingual Medical Question Answering and Information Retrieval for Rural Health Intelligence Access

Authors: Vishal Vinod, Susmit Agrawal, Vipul Gaurav, Pallavi R, Savita Choudhary

Abstract: In rural regions of several developing countries, access to quality healthcare, medical infrastructure, and professional diagnosis is largely unavailable. Many of these regions are gradually gaining access to internet infrastructure, although not with a strong enough connection to allow for sustained communication with a medical practitioner. Several deaths resulting from this lack of medical access, absence of patient's previous health records, and the unavailability of information in indigenous languages can be easily prevented. In this paper, we describe an approach leveraging the phenomenal progress in Machine Learning and NLP (Natural Language Processing) techniques to design a model that is low-resource, multilingual, and a preliminary first-point-of-contact medical assistant. Our contribution includes defining the NLP pipeline required for named-entity-recognition, language-agnostic sentence embedding, natural language translation, information retrieval, question answering, and generative pre-training for final query processing. We obtain promising results for this pipeline and preliminary results for EHR (Electronic Health Record) analysis with text summarization for medical practitioners to peruse for their diagnosis. Through this NLP pipeline, we aim to provide preliminary medical information to the user and do not claim to supplant diagnosis from qualified medical practitioners. Using the input from subject matter experts, we have compiled a large corpus to pre-train and fine-tune our BioBERT based NLP model for the specific tasks.
We expect recent advances in NLP architectures, several of which are efficient and privacy-preserving models, to further the impact of our solution and improve on individual task performance.

Submitted 2 June, 2021; originally announced June 2021.

Journal ref: ICLR 2021 Workshop

Showing 1–8 of 8 results for author: Vinod, V