Exploring the efficacy of molecular fragments of different complexity in computational SAR modeling

Zimmermann, Albrecht; Bringmann, Björn; De Raedt, Luc

arXiv:1501.03015 (cs)

[Submitted on 13 Jan 2015]

Abstract: An important first step in computational SAR modeling is to transform the compounds into a representation that can be processed by predictive modeling techniques. This is typically a feature vector in which each feature indicates the presence or absence of a molecular fragment. While the traditional approach to SAR modeling employed size-restricted fingerprints derived from path fragments, much research in recent years has focused on mining more complex graph-based fragments. Today, there seems to be a growing consensus in the data mining community that these more expressive fragments should be more useful. We question this consensus and show experimentally that fragments of low complexity, i.e. sequences, perform better than equally large sets of more complex ones, an effect we explain by pairwise correlation among fragments and the ability of a fragment set to encode compounds from different classes distinctly. The size restriction on these sets is based on ordering the fragments by class-correlation scores. In addition, we evaluate the effects of using a significance value instead of a length restriction for path fragments and find a significant reduction in the number of features with little loss in performance.
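
The representation described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy compounds, the fragment names, and the helper functions are hypothetical, and the chi-square statistic is used here only as one plausible class-correlation score (the abstract does not say which score is used). The sketch builds binary presence/absence vectors, keeps the k best-scoring fragments, and counts pairs of compounds from different classes that a fragment set fails to encode distinctly.

```python
from itertools import combinations

# Hypothetical toy data: (set of fragments found in the compound, class label).
COMPOUNDS = [
    ({"C-C", "C-O", "C=O"}, 1),
    ({"C-C", "C=O"}, 1),
    ({"C-C", "C-N"}, 0),
    ({"C-C", "C-N", "C-O"}, 0),
]
FRAGMENTS = ["C-C", "C-N", "C-O", "C=O"]


def feature_matrix(compounds, fragments):
    """One 0/1 row per compound: 1 if the fragment occurs in it, else 0."""
    return [[1 if f in frags else 0 for f in fragments] for frags, _ in compounds]


def chi2_score(column, labels):
    """Chi-square statistic of the 2x2 table (fragment present/absent x class).

    Assumption: chi-square stands in for the class-correlation score.
    """
    pairs = list(zip(column, labels))
    a = sum(1 for x, y in pairs if x == 1 and y == 1)
    b = sum(1 for x, y in pairs if x == 1 and y == 0)
    c = sum(1 for x, y in pairs if x == 0 and y == 1)
    d = sum(1 for x, y in pairs if x == 0 and y == 0)
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:  # fragment (or class) is constant: no correlation signal
        return 0.0
    return (a + b + c + d) * (a * d - b * c) ** 2 / denom


def top_k_fragments(compounds, fragments, k):
    """Order fragments by descending score (ties by name) and keep the first k."""
    labels = [y for _, y in compounds]
    matrix = feature_matrix(compounds, fragments)
    scored = [
        (chi2_score([row[i] for row in matrix], labels), f)
        for i, f in enumerate(fragments)
    ]
    scored.sort(key=lambda sf: (-sf[0], sf[1]))
    return [f for _, f in scored[:k]]


def indistinct_pairs(compounds, fragments):
    """Count compound pairs from different classes with identical feature vectors."""
    labels = [y for _, y in compounds]
    matrix = feature_matrix(compounds, fragments)
    return sum(
        1
        for i, j in combinations(range(len(compounds)), 2)
        if labels[i] != labels[j] and matrix[i] == matrix[j]
    )


if __name__ == "__main__":
    best = top_k_fragments(COMPOUNDS, FRAGMENTS, 2)
    print(best, indistinct_pairs(COMPOUNDS, best))
```

On this toy data the two selected fragments separate every pair of compounds from different classes, whereas the low-scoring pair `["C-C", "C-O"]` leaves two such pairs with identical encodings.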

Subjects:	Computational Engineering, Finance, and Science (cs.CE); Machine Learning (cs.LG)
Cite as:	arXiv:1501.03015 [cs.CE]
	(or arXiv:1501.03015v1 [cs.CE] for this version)
	https://doi.org/10.48550/arXiv.1501.03015

Submission history

From: Albrecht Zimmermann
[v1] Tue, 13 Jan 2015 14:24:58 UTC (1,860 KB)
