-
Greedy Biomarker Discovery in the Genome with Applications to Antimicrobial Resistance
Abstract: The Set Covering Machine (SCM) is a greedy learning algorithm that produces sparse classifiers. We extend the SCM for datasets that contain a huge number of features. The whole genetic material of living organisms is an example of such a case, where the number of features exceeds 10^7. Three human pathogens were used to evaluate the performance of the SCM at predicting antimicrobial resistance. Our…
Submitted 22 May, 2015; originally announced May 2015.
Comments: Peer-reviewed and accepted for an oral presentation in the Greed is Great workshop at the International Conference on Machine Learning, Lille, France, 2015
-
Learning interpretable models of phenotypes from whole genome sequences with the Set Covering Machine
Abstract: The increased affordability of whole genome sequencing has motivated its use for phenotypic studies. We address the problem of learning interpretable models for discrete phenotypes from whole genomes. We propose a general approach that relies on the Set Covering Machine and a k-mer representation of the genomes. We show results for the problem of predicting the resistance of Pseudomonas aeruginosa…
Submitted 2 December, 2014; originally announced December 2014.
Comments: Presented at Machine Learning in Computational Biology 2014, Montréal, Québec, Canada