Showing 1–2 of 2 results for author: Basak, S C
-
Adapting the Interrelated Two-way Clustering method for Quantitative Structure-Activity Relationship (QSAR) Modeling of a Diverse Set of Chemical Compounds
Authors:
Subhabrata Majumdar,
Subhash C. Basak,
Gregory D. Grunwald
Abstract:
Interrelated Two-way Clustering (ITC) is an unsupervised clustering method developed to divide samples into two groups in gene expression data obtained through microarrays, simultaneously selecting important genes in the process. It has been found to perform better than conventional clustering methods such as K-means or self-organizing maps in scenarios where the number of samples is much smaller than the number of variables (n << p). In this paper we use the ITC approach to classify a diverse set of 508 chemicals with respect to mutagenicity. A large number of topological indices (TIs), 3-dimensional and quantum chemical descriptors, as well as atom pairs (APs), are used as explanatory variables. ITC is used only for predictor selection, after which ridge regression is employed to build the final predictive model. The proper leave-one-out (LOO) method of cross-validation in this scenario is to hold out each of the 508 compounds in turn before predictor thinning and to compare the predicted values with the experimental data. The ITC-based results obtained here are comparable to those developed earlier.
Submitted 30 May, 2013;
originally announced May 2013.
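The leave-one-out procedure described in the abstract above, where each compound is held out before predictor thinning and ridge regression is fitted on the remaining compounds, could be sketched roughly as follows. This is only an illustration under stated assumptions: the function names (`select_predictors`, `ridge_fit`, `loo_predictions`), the parameters `k` and `lam`, and the synthetic data are all hypothetical, and a simple correlation filter stands in for ITC, which is not implemented here.

```python
import numpy as np

def select_predictors(X, y, k):
    """Stand-in filter (NOT ITC): keep the k columns most correlated with y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum()) + 1e-12
    score = np.abs(Xc.T @ yc) / denom
    return np.argsort(score)[-k:]

def ridge_fit(X, y, lam):
    """Closed-form ridge regression on centered data; returns (weights, intercept)."""
    mu_x, mu_y = X.mean(axis=0), y.mean()
    Xc = X - mu_x
    gram = Xc.T @ Xc + lam * np.eye(X.shape[1])
    w = np.linalg.solve(gram, Xc.T @ (y - mu_y))
    return w, mu_y - mu_x @ w

def loo_predictions(X, y, k=5, lam=1.0):
    """Leave-one-out predictions with predictor selection redone inside every fold."""
    n = X.shape[0]
    preds = np.empty(n)
    for i in range(n):
        train = np.arange(n) != i          # hold out compound i first
        cols = select_predictors(X[train], y[train], k)   # thinning sees training rows only
        w, b = ridge_fit(X[train][:, cols], y[train], lam)
        preds[i] = X[i, cols] @ w + b
    return preds

# Usage on synthetic data (assumed, not the 508-compound set from the paper)
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 50))              # n = 30 samples, p = 50 descriptors
y = 3.0 * X[:, 0] + 0.1 * rng.normal(size=30)
preds = loo_predictions(X, y, k=5, lam=1.0)
q2 = 1.0 - ((y - preds) ** 2).sum() / ((y - y.mean()) ** 2).sum()
print("cross-validated q2:", q2)
```

The point of the design is that the held-out compound never influences which predictors are kept; selecting predictors once on all compounds and cross-validating only the regression step would tend to give an optimistic estimate.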
-
Statistical theory of spectra: Statistical moments as descriptors in the theory of molecular similarity
Authors:
Dorota Bielinska-Waz,
Piotr Waz,
Subhash C. Basak
Abstract:
Statistical moments of intensity distributions are used as molecular descriptors and as a basis for defining similarity distances between two model spectra. Parameters are defined that carry the information derived from comparing the shapes of the spectra and that are related to the number of properties taken into account.
Submitted 29 September, 2005;
originally announced September 2005.
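As a rough illustration of the idea in the abstract above, the sketch below treats a model spectrum as a discrete intensity distribution over energies, computes its first few statistical moments, and compares two spectra moment by moment. The abstract does not give the exact definitions, so everything here is an assumption: the function names (`moment_descriptors`, `moment_distance`), the choice of mean, variance, and standardized higher moments, and the absolute difference used as a distance are hypothetical and not the authors' formulas.

```python
import numpy as np

def moment_descriptors(energies, intensities, max_order=4):
    """Mean, variance, and standardized moments (orders 3..max_order) of a spectrum.

    The intensities are normalized to sum to 1 so the spectrum acts as a
    discrete probability distribution over the energies.
    """
    e = np.asarray(energies, dtype=float)
    p = np.asarray(intensities, dtype=float)
    p = p / p.sum()
    mean = (p * e).sum()
    var = (p * (e - mean) ** 2).sum()
    desc = [mean, var]
    z = (e - mean) / np.sqrt(var)          # standardized energies
    for q in range(3, max_order + 1):
        desc.append((p * z ** q).sum())    # q = 3: skewness, q = 4: kurtosis
    return np.array(desc)

def moment_distance(desc_a, desc_b):
    """Assumed per-moment distance: absolute difference of each descriptor."""
    return np.abs(np.asarray(desc_a) - np.asarray(desc_b))

# Usage: two toy spectra with the same shape, one shifted by one energy unit
a = moment_descriptors([1.0, 2.0, 3.0], [1.0, 2.0, 1.0])
b = moment_descriptors([2.0, 3.0, 4.0], [1.0, 2.0, 1.0])
print(moment_distance(a, b))
```

Because the two toy spectra differ only by a shift, only the first descriptor (the mean) separates them; the shape-related descriptors coincide.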