-
Wasm-iCARE: a portable and privacy-preserving web module to build, validate, and apply absolute risk models
Authors:
Jeya Balaji Balasubramanian,
Parichoy Pal Choudhury,
Srijon Mukhopadhyay,
Thomas Ahearn,
Nilanjan Chatterjee,
Montserrat García-Closas,
Jonas S. Almeida
Abstract:
Objective: Absolute risk models estimate an individual's future disease risk over a specified time interval. Applications utilizing server-side risk tooling, such as the R-based iCARE (R-iCARE), to build, validate, and apply absolute risk models, face serious limitations in portability and privacy due to their need for circulating user data in remote servers for operation. Our objective was to overcome these limitations.
Materials and Methods: We refactored R-iCARE into a Python package (Py-iCARE) then compiled it to WebAssembly (Wasm-iCARE): a portable web module, which operates entirely within the privacy of the user's device.
Results: We showcase the portability and privacy of Wasm-iCARE through two applications: one for researchers to statistically validate risk models, and one to deliver them to end-users. Both applications run entirely on the client side, require no downloads or installations, and keep user data on-device during risk calculation.
Conclusions: Wasm-iCARE fosters accessible and privacy-preserving risk tools, accelerating their validation and delivery.
Submitted 13 October, 2023;
originally announced October 2023.
-
Moving towards FAIR practices in epidemiological research
Authors:
Montserrat Garcia-Closas,
Thomas U. Ahearn,
Mia M. Gaudet,
Amber N. Hurson,
Jeya Balaji Balasubramanian,
Parichoy Pal Choudhury,
Nicole M. Gerlanc,
Bhaumik Patel,
Daniel Russ,
Mustapha Abubakar,
Neal D. Freedman,
Wendy S. W. Wong,
Stephen J. Chanock,
Amy Berrington de Gonzalez,
Jonas S Almeida
Abstract:
Reproducibility and replicability of research findings are central to the scientific integrity of epidemiology. In addition, many research questions require combining data from multiple sources to achieve adequate statistical power. However, barriers related to confidentiality, costs, and incentives often limit the extent and speed of sharing resources, both data and code. Epidemiological practices that follow FAIR principles can address these barriers by making resources (F)indable with the necessary metadata, (A)ccessible to authorized users, and (I)nteroperable with other data, to optimize the (R)e-use of resources with appropriate credit to their creators. We provide an overview of these principles and describe approaches for implementation in epidemiology. Increasing degrees of FAIRness can be achieved by moving data and code from on-site locations to the Cloud, using machine-readable and non-proprietary files, and developing open-source code. Adoption of these practices will improve daily work and collaborative analyses, and facilitate compliance with data sharing policies from funders and scientific journals. Achieving a high degree of FAIRness will require funding, training, organizational support, recognition, and incentives for sharing resources. But these costs are amply outweighed by the benefits of making research more reproducible, impactful, and equitable by facilitating the re-use of precious research resources by the scientific community.
Submitted 13 June, 2022;
originally announced June 2022.
-
Understanding the behavioural difference of PPCA among its homologs in C7 family towards recognition of DXCA
Authors:
Suvankar Ghosh,
Shankar Kumar Ghosh,
Camellia Ray,
Goutam Paul,
Pabitra Pal Choudhury,
Raja Banerjee
Abstract:
Among all the proteins of Periplasmic C type cytochrome A (PPCA) family obtained from cytochrome C7 found in Geobacter sulfurreducens, PPCA protein can interact with Deoxycholate (DXCA), while its other homologs do not, as observed from the crystal structures. Utilizing the concept of 'structure-function relationship', an effort has been initiated towards understanding the driving force for recognition of DXCA exclusively by PPCA among its homologs. Further, a combinatorial analysis of the binding sequences (contiguous sequence of amino acid residues of binding locations) is performed to build graph-theoretic models, which show that PPCA differs from its homologues. Analysis of the results suggests that the underlying impetus of recognition of DXCA by PPCA is embedded in its primary sequence and 3D conformation.
Submitted 17 September, 2015;
originally announced August 2016.
-
Understanding Functional Protein-Protein Interactions Of ABCB11 And ADA In Human And Mouse
Authors:
Antara Sengupta,
Sk. Sarif Hassan,
Pabitra Pal Choudhury
Abstract:
Proteins are macromolecules that rarely act alone; they need to interact with other proteins to function. Numerous factors can regulate the interactions between proteins [4]. In the present study, we aim to understand the protein-protein interactions (PPIs) of two proteins, ABCB11 and ADA, from a quantitative point of view. Another major aim is to study the factors that regulate the PPIs, and thus to distinguish these PPIs with proper quantification across the two species Homo sapiens and Mus musculus, to learn how one protein interacts with different sets of proteins in different species.
Submitted 2 December, 2015;
originally announced December 2015.
-
Understanding of Genetic Code Degeneracy and New Way of Classifying of Protein Family: A Mathematical Approach
Authors:
Jayanta Kumar Das,
Atrayee Majumder,
Pabitra Pal Choudhury
Abstract:
The genetic code is the set of rules by which information encoded in genetic material (DNA or RNA sequences) is translated into proteins (amino acid sequences) by living cells. The code defines a mapping between tri-nucleotide sequences, called codons, and amino acids. Since there are 20 amino acids and 64 possible tri-nucleotide sequences, more than one of these 64 triplets can code for a single amino acid, which gives rise to the problem of degeneracy. This manuscript explains the underlying logic of the degeneracy of the genetic code from a mathematical point of view, using a parameter named Impression. Classification of protein families is also a long-standing problem in the fields of biochemistry and genomics. Proteins belonging to a particular class share some biochemical properties that are of utmost importance for new drug design. Using the same parameter, Impression, together with graph-theoretic properties, we have also devised a new way of classifying a protein family.
Submitted 30 November, 2015;
originally announced December 2015.
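The degeneracy described in the abstract above can be made concrete with a small sketch. It only counts how many codons map to each amino acid under the standard genetic code; it does not reproduce the paper's Impression parameter, whose definition is not given here. The names `BASES`, `AMINO_ACIDS`, `CODON_TABLE`, and `degeneracy` are illustrative choices, not taken from the paper.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order
# (first base varies slowest, third base fastest). "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)

# Map each of the 64 codons to its one-letter amino acid (or stop) symbol.
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Degeneracy: number of distinct codons that encode each symbol.
degeneracy = Counter(CODON_TABLE.values())

if __name__ == "__main__":
    for symbol, count in sorted(degeneracy.items(), key=lambda kv: -kv[1]):
        print(symbol, count)
```

Leucine, serine, and arginine each have six codons, while methionine and tryptophan have one each, which is the many-to-one mapping the abstract refers to.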
-
A Quantitative Understanding of Human Sex Chromosomal Genes
Authors:
Sk. Sarif Hassan,
Pabitra Pal Choudhury,
Antara Sengupta,
Binayak Sahu,
Rojalin Mishra,
Devendra Kumar Yadav,
Saswatee Panda,
Dharamveer Pradhan,
Shrusti Dash,
Gourav Pradhan
Abstract:
In the last few decades, the human allosomes have attracted intensive attention among researchers. The allosomes have now been sequenced, and about 2000 and 78 genes have been found in the human X and Y chromosomes, respectively. The hemizygosity of the human X chromosome in males exposes recessive disease alleles, and this phenomenon has prompted decades of intensive study of X-linked disorders. By contrast, the small size of the human Y chromosome and its prominent long-arm heterochromatic region suggested an absence of function beyond sex determination. The present problem is to determine whether a given sequence of nucleotides, i.e. a DNA sequence, is a human X or Y chromosomal gene, without any biological experimental support. In our view, a proper quantitative understanding of these genes is required to justify or nullify whether a given sequence is a human X or Y chromosomal gene. In this paper, some of the X and Y chromosomal genes have been quantified at the genomic and proteomic levels through fractal geometric and mathematical morphometric analysis. Using the proposed quantitative model, one can make a probable justification or deterministic nullification of whether a given sequence of nucleotides is a probable human X or Y chromosomal gene, without resorting to any biological experiment. Of course, a further biological experiment is essential to validate it as a probable human X or Y chromosomal gene homologue. This study would enable biologists to understand these genes in a more quantitative manner, rather than through their qualitative features alone.
Submitted 1 December, 2013; v1 submitted 23 July, 2012;
originally announced July 2012.
-
Complete Human Mitochondrial Genome Construction Using L-systems
Authors:
Sk. Sarif Hassan,
Pabitra Pal Choudhury,
Amita Pal,
R. L. Brahmachary,
Arunava Goswami
Abstract:
Recently, scientists from the J. Craig Venter Institute reported construction of very long DNA molecules using a variety of experimental procedures adopting a number of working hypotheses. Finding a mathematical rule for generating such long sequences would revolutionize our thinking in various advanced areas of biology, viz. the evolution of long DNA chains in chromosomes and the reasons for the existence of long stretches of non-coding regions, and would usher in automated methods for preparing long DNA chains for chromosome engineering. However, this mathematical principle must have room for editing or correcting DNA sequences locally in those areas of genomes where mutation and/or DNA polymerase has introduced errors over millions of years. In this paper, we report the basics and applications of the L-system (a mathematical principle), which could address all the aforesaid issues. At the end, we present the whole human mitochondrial genome, generated using this mathematical principle on a PC. We claim that we can now make any stretch of DNA, from the 936 bp of an olfactory receptor, with or without introns, through mitochondrial DNA, to the 3 x 10^9 bp sequence of the whole human genome, even with the computational power of a PC.
Submitted 25 February, 2010; v1 submitted 17 February, 2010;
originally announced February 2010.
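For readers unfamiliar with L-systems, the rewriting mechanism the abstract above relies on can be sketched in a few lines. This is a generic deterministic, context-free L-system, not the authors' construction: the function name `expand` and the example rules (Lindenmayer's classic two-symbol "algae" system) are assumptions chosen for illustration, and the rules actually used to generate the mitochondrial genome are not stated here.

```python
def expand(axiom, rules, iterations):
    """Apply the production rules to every symbol in parallel, `iterations` times.

    Symbols without a rule are copied unchanged (treated as constants).
    """
    word = axiom
    for _ in range(iterations):
        word = "".join(rules.get(symbol, symbol) for symbol in word)
    return word


# Illustrative rules only: Lindenmayer's algae system, A -> AB, B -> A.
ALGAE_RULES = {"A": "AB", "B": "A"}

if __name__ == "__main__":
    for n in range(6):
        print(n, expand("A", ALGAE_RULES, n))
```

The same function would work over the alphabet {A, C, G, T} given a suitable rule set; a short axiom and a handful of rules can then produce a sequence whose length grows rapidly with the number of iterations.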