Showing 1–46 of 46 results for author: Moore, J H

  1. arXiv:2406.14864  [pdf, other]

    cs.LG stat.AP stat.ML

    A review of feature selection strategies utilizing graph data structures and knowledge graphs

    Authors: Sisi Shao, Pedro Henrique Ribeiro, Christina Ramirez, Jason H. Moore

    Abstract: Feature selection in Knowledge Graphs (KGs) is increasingly utilized in diverse domains, including biomedical research, Natural Language Processing (NLP), and personalized recommendation systems. This paper delves into the methodologies for feature selection within KGs, emphasizing their roles in enhancing machine learning (ML) model efficacy, hypothesis generation, and interpretability. Through…

    Submitted 21 June, 2024; originally announced June 2024.

  2. arXiv:2406.12006  [pdf, other]

    cs.NE

    Lexidate: Model Evaluation and Selection with Lexicase

    Authors: Jose Guadalupe Hernandez, Anil Kumar Saini, Jason H. Moore

    Abstract: Automated machine learning streamlines the task of finding effective machine learning pipelines by automating model training, evaluation, and selection. Traditional evaluation strategies, like cross-validation (CV), generate one value that averages the accuracy of a pipeline's predictions. This single value, however, may not fully describe the generalizability of the pipeline. Here, we present Lex…

    Submitted 17 June, 2024; originally announced June 2024.

  3. arXiv:2404.18961  [pdf, other]

    cs.LG cs.AI cs.CV

    Unleashing the Power of Multi-Task Learning: A Comprehensive Survey Spanning Traditional, Deep, and Pretrained Foundation Model Eras

    Authors: Jun Yu, Yutong Dai, Xiaokang Liu, ** Huang, Yishan Shen, Ke Zhang, Rong Zhou, Eashan Adhikarla, Wenxuan Ye, Yixin Liu, Zhaoming Kong, Kai Zhang, Yilong Yin, Vinod Namboodiri, Brian D. Davison, Jason H. Moore, Yong Chen

    Abstract: MTL is a learning paradigm that effectively leverages both task-specific and shared information to address multiple related tasks simultaneously. In contrast to STL, MTL offers a suite of benefits that enhance both the training process and the inference efficiency. MTL's key advantages encompass streamlined model architecture, performance enhancement, and cross-domain generalizability. Over the pa…

    Submitted 29 April, 2024; originally announced April 2024.

    Comments: 60 figures, 116 pages, 500+ references

  4. Genetic Programming Theory and Practice: A Fifteen-Year Trajectory

    Authors: Moshe Sipper, Jason H. Moore

    Abstract: The GPTP workshop series, which began in 2003, has served over the years as a focal meeting for genetic programming (GP) researchers. As such, we think it provides an excellent source for studying the development of GP over the past fifteen years. We thus present herein a trajectory of the thematic developments in the field of GP.

    Submitted 1 February, 2024; originally announced February 2024.

    Journal ref: Genetic Programming and Evolvable Machines (2020) 21:169-179

  5. arXiv:2401.11167  [pdf, other]

    cs.NE

    Coevolving Artistic Images Using OMNIREP

    Authors: Moshe Sipper, Jason H. Moore, Ryan J. Urbanowicz

    Abstract: We have recently developed OMNIREP, a coevolutionary algorithm to discover both a representation and an interpreter that solve a particular problem of interest. Herein, we demonstrate that the OMNIREP framework can be successfully applied within the field of evolutionary art. Specifically, we coevolve representations that encode image position, alongside interpreters that transform these positions…

    Submitted 20 January, 2024; originally announced January 2024.

    Journal ref: J. Romero et al. (Eds.), EvoMUSART 2020, LNCS 12103, pp. 165-178, 2020

  6. New Pathways in Coevolutionary Computation

    Authors: Moshe Sipper, Jason H. Moore, Ryan J. Urbanowicz

    Abstract: The simultaneous evolution of two or more species with coupled fitness -- coevolution -- has been put to good use in the field of evolutionary computation. Herein, we present two new forms of coevolutionary algorithms, which we have recently designed and applied with success. OMNIREP is a cooperative coevolutionary algorithm that discovers both a representation and an encoding for solving a partic…

    Submitted 19 January, 2024; originally announced January 2024.

    Comments: arXiv admin note: substantial text overlap with arXiv:2206.13509, arXiv:2206.15409, arXiv:2206.12707

    Journal ref: W. Banzhaf et al. (eds.), Genetic Programming Theory and Practice XVII, Genetic and Evolutionary Computation, 2020

  7. arXiv:2401.02965  [pdf]

    cs.DL

    Perceptual and technical barriers in sharing and formatting metadata accompanying omics studies

    Authors: Yu-Ning Huang, Michael I. Love, Cynthia Flaire Ronkowski, Dhrithi Deshpande, Lynn M. Schriml, Annie Wong-Beringer, Barend Mons, Russell Corbett-Detig, Christopher I Hunter, Jason H. Moore, Lana X. Garmire, T. B. K. Reddy, Winston A. Hide, Atul J. Butte, Mark D. Robinson, Serghei Mangul

    Abstract: Metadata, often termed "data about data," is crucial for organizing, understanding, and managing vast omics datasets. It aids in efficient data discovery, integration, and interpretation, enabling users to access, comprehend, and utilize data effectively. Its significance spans the domains of scientific research, facilitating data reproducibility, reusability, and secondary analysis. However, nume…

    Submitted 22 November, 2023; originally announced January 2024.

  8. arXiv:2302.00731  [pdf, other]

    cs.NE cs.AI

    Faster Convergence with Lexicase Selection in Tree-based Automated Machine Learning

    Authors: Nicholas Matsumoto, Anil Kumar Saini, Pedro Ribeiro, Hyunjun Choi, Alena Orlenko, Leo-Pekka Lyytikäinen, Jari O Laurikka, Terho Lehtimäki, Sandra Batista, Jason H. Moore

    Abstract: In many evolutionary computation systems, parent selection methods can affect, among other things, convergence to a solution. In this paper, we present a study comparing the role of two commonly used parent selection methods in evolving machine learning pipelines in an automated machine learning system called Tree-based Pipeline Optimization Tool (TPOT). Specifically, we demonstrate, using experim…

    Submitted 1 February, 2023; originally announced February 2023.

  9. arXiv:2212.02704  [pdf, other]

    cs.LG

    Benchmarking AutoML algorithms on a collection of synthetic classification problems

    Authors: Pedro Henrique Ribeiro, Patryk Orzechowski, Joost Wagenaar, Jason H. Moore

    Abstract: Automated machine learning (AutoML) algorithms have grown in popularity due to their high performance and flexibility to adapt to different problems and data sets. With the increasing number of AutoML algorithms, deciding which would best suit a given problem becomes increasingly more work. Therefore, it is essential to use complex and challenging benchmarks which would be able to differentiate th…

    Submitted 8 March, 2023; v1 submitted 5 December, 2022; originally announced December 2022.

  10. Applying Autonomous Hybrid Agent-based Computing to Difficult Optimization Problems

    Authors: Mateusz Godzik, Jacek Dajda, Marek Kisiel-Dorohinicki, Aleksander Byrski, Leszek Rutkowski, Patryk Orzechowski, Joost Wagenaar, Jason H. Moore

    Abstract: Evolutionary multi-agent systems (EMASs) are very good at dealing with difficult, multi-dimensional problems; their efficacy was proven theoretically based on analysis of the relevant Markov-Chain based model. Now the research continues on introducing autonomous hybridization into EMAS. This paper focuses on a proposed hybrid version of the EMAS, and covers selection and introduction of a number o…

    Submitted 24 October, 2022; originally announced October 2022.

    ACM Class: I.2.8; I.2.11

    Journal ref: Journal of Computational Science, Volume 64, October 2022, 101858

  11. arXiv:2206.15409  [pdf, other]

    cs.NE

    Automatically Balancing Model Accuracy and Complexity using Solution and Fitness Evolution (SAFE)

    Authors: Moshe Sipper, Jason H. Moore, Ryan J. Urbanowicz

    Abstract: When seeking a predictive model in biomedical data, one often has more than a single objective in mind, e.g., attaining both high accuracy and low complexity (to promote interpretability). We investigate herein whether multiple objectives can be dynamically tuned by our recently proposed coevolutionary algorithm, SAFE (Solution And Fitness Evolution). We find that SAFE is able to automatically tun…

    Submitted 30 June, 2022; originally announced June 2022.

  12. arXiv:2206.13509  [pdf, other]

    cs.NE

    Solution and Fitness Evolution (SAFE): A Study of Multiobjective Problems

    Authors: Moshe Sipper, Jason H. Moore, Ryan J. Urbanowicz

    Abstract: We have recently presented SAFE -- Solution And Fitness Evolution -- a commensalistic coevolutionary algorithm that maintains two coevolving populations: a population of candidate solutions and a population of candidate objective functions. We showed that SAFE was successful at evolving solutions within a robotic maze domain. Herein we present an investigation of SAFE's adaptation and application…

    Submitted 25 June, 2022; originally announced June 2022.

    Comments: arXiv admin note: substantial text overlap with arXiv:2206.12707

    Journal ref: Proceedings of 2019 IEEE Congress on Evolutionary Computation

  13. Solution and Fitness Evolution (SAFE): Coevolving Solutions and Their Objective Functions

    Authors: Moshe Sipper, Jason H. Moore, Ryan J. Urbanowicz

    Abstract: We recently highlighted a fundamental problem recognized to confound algorithmic optimization, namely, conflating the objective with the objective function. Even when the former is well defined, the latter may not be obvious, e.g., in learning a strategy to navigate a maze to find a goal (objective), an effective objective function to evaluate strategies may not be a simple funct…

    Submitted 25 June, 2022; originally announced June 2022.

    Journal ref: EuroGP 2019, LNCS 11451, pages 1-16, 2019

  14. Symbolic-Regression Boosting

    Authors: Moshe Sipper, Jason H Moore

    Abstract: Modifying standard gradient boosting by replacing the embedded weak learner in favor of a strong(er) one, we present SyRBo: Symbolic-Regression Boosting. Experiments over 98 regression datasets show that by adding a small number of boosting stages -- between 2--5 -- to a symbolic regressor, statistically significant improvements can often be attained. We note that coding SyRBo on top of any symbol…

    Submitted 24 June, 2022; originally announced June 2022.

    Journal ref: Genetic Programming and Evolvable Machines, 22, 357-381, 2021

  15. arXiv:2107.14351  [pdf, other]

    cs.NE

    Contemporary Symbolic Regression Methods and their Relative Performance

    Authors: William La Cava, Patryk Orzechowski, Bogdan Burlacu, Fabrício Olivetti de França, Marco Virgolin, Ying **, Michael Kommenda, Jason H. Moore

    Abstract: Many promising approaches to symbolic regression have been presented in recent years, yet progress in the field continues to suffer from a lack of uniform, robust, and transparent benchmarking standards. In this paper, we address this shortcoming by introducing an open-source, reproducible benchmarking platform for symbolic regression. We assess 14 symbolic regression methods and 7 machine learnin…

    Submitted 29 July, 2021; originally announced July 2021.

    Comments: To appear in Neurips 2021 Track on Datasets and Benchmarks. Main text: 10 pages, 3 figures; Appendix: 7 pages, 8 figures. https://openreview.net/forum?id=xVQMrDLyGst

  16. arXiv:2107.10495  [pdf]

    cs.LG

    Benchmarking AutoML Frameworks for Disease Prediction Using Medical Claims

    Authors: Roland Albert A. Romero, Mariefel Nicole Y. Deypalan, Suchit Mehrotra, John Titus Jungao, Natalie E. Sheils, Elisabetta Manduchi, Jason H. Moore

    Abstract: We ascertain and compare the performances of AutoML tools on large, highly imbalanced healthcare datasets. We generated a large dataset using historical administrative claims including demographic information and flags for disease codes in four different time windows prior to 2019. We then trained three AutoML tools on this dataset to predict six different disease outcomes in 2019 and evaluated…

    Submitted 22 July, 2021; originally announced July 2021.

    Comments: 22 pages, 8 figures, 7 tables

  17. arXiv:2107.06475  [pdf, other]

    cs.LG cs.AI cs.CV cs.NE stat.ML

    Generative and reproducible benchmarks for comprehensive evaluation of machine learning classifiers

    Authors: Patryk Orzechowski, Jason H. Moore

    Abstract: Understanding the strengths and weaknesses of machine learning (ML) algorithms is crucial for determining their scope of application. Here, we introduce the DIverse and GENerative ML Benchmark (DIGEN) - a collection of synthetic datasets for comprehensive, reproducible, and interpretable benchmarking of machine learning algorithms for classification of binary outcomes. The DIGEN resource consists of…

    Submitted 13 July, 2021; originally announced July 2021.

    Comments: 12 pages, 3 figures with subfigures

    MSC Class: 68T09 (Primary) 62R07; 68-04; 68-11 (Secondary) ACM Class: I.5.2; I.1.2; I.5.1; I.6.5; I.2.0; G.1.6

  18. arXiv:2105.01196  [pdf, other]

    cs.LG cs.AI cs.DC cs.NE q-bio.GN

    EBIC.JL -- an Efficient Implementation of Evolutionary Biclustering Algorithm in Julia

    Authors: Paweł Renc, Patryk Orzechowski, Aleksander Byrski, Jarosław Wąs, Jason H. Moore

    Abstract: Biclustering is a data mining technique which searches for local patterns in numeric tabular data with main application in bioinformatics. This technique has shown promise in multiple areas, including development of biomarkers for cancer, disease subtype identification, or gene-drug interactions among others. In this paper we introduce EBIC.JL - an implementation of one of the most accurate biclus…

    Submitted 3 May, 2021; originally announced May 2021.

    Comments: 9 pages, 11 figures

    MSC Class: 68W50 ACM Class: D.1.3; G.4; I.2.8; I.2.11; I.5.3; J.3

  19. arXiv:2012.00058  [pdf]

    cs.LG cs.DB

    PMLB v1.0: An open source dataset collection for benchmarking machine learning methods

    Authors: Joseph D. Romano, Trang T. Le, William La Cava, John T. Gregg, Daniel J. Goldberg, Natasha L. Ray, Praneel Chakraborty, Daniel Himmelstein, Weixuan Fu, Jason H. Moore

    Abstract: Motivation: Novel machine learning and statistical modeling studies rely on standardized comparisons to existing methods using well-studied benchmark datasets. Few tools exist that provide rapid access to many of these datasets through a standardized, user-friendly interface that integrates well with popular data science workflows. Results: This release of PMLB provides the largest collection of…

    Submitted 6 April, 2021; v1 submitted 30 November, 2020; originally announced December 2020.

    Comments: 4 pages, 1 figure. *: These authors contributed equally

    ACM Class: H.2.8

  20. arXiv:2008.12829  [pdf, other]

    cs.LG stat.ML

    A Rigorous Machine Learning Analysis Pipeline for Biomedical Binary Classification: Application in Pancreatic Cancer Nested Case-control Studies with Implications for Bias Assessments

    Authors: Ryan J. Urbanowicz, Pranshu Suri, Yuhan Cui, Jason H. Moore, Karen Ruth, Rachael Stolzenberg-Solomon, Shannon M. Lynch

    Abstract: Machine learning (ML) offers a collection of powerful approaches for detecting and modeling associations, often applied to data having a large number of features and/or complex associations. Currently, there are many tools to facilitate implementing custom ML analyses (e.g. scikit-learn). Interest is also increasing in automated ML packages, which can make it easier for non-experts to apply ML and…

    Submitted 8 September, 2020; v1 submitted 28 August, 2020; originally announced August 2020.

    Comments: 22 pages, 12 figures

  21. arXiv:2007.03488  [pdf, other]

    cs.NE cs.PF math.OC stat.AP

    Benchmarking in Optimization: Best Practice and Open Issues

    Authors: Thomas Bartz-Beielstein, Carola Doerr, Daan van den Berg, Jakob Bossek, Sowmya Chandrasekaran, Tome Eftimov, Andreas Fischbach, Pascal Kerschke, William La Cava, Manuel Lopez-Ibanez, Katherine M. Malan, Jason H. Moore, Boris Naujoks, Patryk Orzechowski, Vanessa Volz, Markus Wagner, Thomas Weise

    Abstract: This survey compiles ideas and recommendations from more than a dozen researchers with different backgrounds and from different institutes around the world. Promoting best practice in benchmarking is its main goal. The article discusses eight essential topics in benchmarking: clearly stated goals, well-specified problems, suitable algorithms, adequate performance measures, thoughtful analysis, eff…

    Submitted 16 December, 2020; v1 submitted 7 July, 2020; originally announced July 2020.

    Comments: Version 2

    MSC Class: 68W50 ACM Class: A.1; B.8.0; G.1.6; G.4; I.2.8

  22. arXiv:2006.06730  [pdf, other]

    cs.LG cs.NE stat.ML

    Is deep learning necessary for simple classification tasks?

    Authors: Joseph D. Romano, Trang T. Le, Weixuan Fu, Jason H. Moore

    Abstract: Automated machine learning (AutoML) and deep learning (DL) are two cutting-edge paradigms used to solve a myriad of inductive learning tasks. In spite of their successes, little guidance exists for when to choose one approach over the other in the context of specific real-world problems. Furthermore, relatively few tools exist that allow the integration of both AutoML and DL in the same analysis t…

    Submitted 11 June, 2020; originally announced June 2020.

    Comments: 14 pages, 5 figures, 3 tables

    ACM Class: I.5.2

  23. Genetic programming approaches to learning fair classifiers

    Authors: William La Cava, Jason H. Moore

    Abstract: Society has come to rely on algorithms like classifiers for important decision making, giving rise to the need for ethical guarantees such as fairness. Fairness is typically defined by asking that some statistic of a classifier be approximately equal over protected groups within a population. In this paper, current approaches to fairness are discussed and used to motivate algorithmic proposals tha…

    Submitted 28 April, 2020; originally announced April 2020.

    Comments: 9 pages, 7 figures. GECCO 2020

  24. arXiv:2001.11535  [pdf, other]

    cs.NE

    SGP-DT: Semantic Genetic Programming Based on Dynamic Targets

    Authors: Stefano Ruberto, Valerio Terragni, Jason H. Moore

    Abstract: Semantic GP is a promising approach that introduces semantic awareness during genetic evolution. This paper presents a new Semantic GP approach based on Dynamic Target (SGP-DT) that divides the search problem into multiple GP runs. The evolution in each run is guided by a new (dynamic) target based on the residual errors. To obtain the final solution, SGP-DT combines the solutions of each run usin…

    Submitted 30 January, 2020; originally announced January 2020.

    Comments: 16 pages, European Conference on Genetic Programming (EuroGP 20)

  25. arXiv:1905.09205  [pdf, other]

    cs.LG cs.IR

    Evaluating recommender systems for AI-driven biomedical informatics

    Authors: William La Cava, Heather Williams, Weixuan Fu, Steve Vitale, Durga Srivatsan, Jason H. Moore

    Abstract: Motivation: Many researchers with domain expertise are unable to easily apply machine learning to their bioinformatics data due to a lack of machine learning and/or coding expertise. Methods that have been proposed thus far to automate machine learning mostly require programming experience as well as expert knowledge to tune and apply the algorithms correctly. Here, we study a method of automating…

    Submitted 28 April, 2020; v1 submitted 22 May, 2019; originally announced May 2019.

    Comments: 17 pages, 8 figures. this version fixes link to pennai in abstract

  26. Semantic variation operators for multidimensional genetic programming

    Authors: William La Cava, Jason H. Moore

    Abstract: Multidimensional genetic programming represents candidate solutions as sets of programs, and thereby provides an interesting framework for exploiting building block identification. Towards this goal, we investigate the use of machine learning as a way to bias which components of programs are promoted, and propose two semantic operators to choose where useful building blocks are placed during cross…

    Submitted 17 April, 2019; originally announced April 2019.

    Comments: 9 pages, 8 figures, GECCO 2019

  27. arXiv:1903.12074  [pdf, other]

    cs.CY cs.LG stat.ML

    Interpretation of machine learning predictions for patient outcomes in electronic health records

    Authors: William La Cava, Christopher Bauer, Jason H. Moore, Sarah A Pendergrass

    Abstract: Electronic health records are an increasingly important resource for understanding the interactions between patient health, environment, and clinical decisions. In this paper we report an empirical study of predictive modeling of several patient outcomes using three state-of-the-art machine learning methods. Our primary goal is to validate the models by interpreting the importance of predictors in…

    Submitted 14 March, 2019; originally announced March 2019.

    Comments: 10 pages, 5 figures, submitted to AMIA Symposium

  28. arXiv:1807.09932  [pdf, ps, other]

    q-bio.GN cs.LG stat.ML

    EBIC: an open source software for high-dimensional and big data biclustering analyses

    Authors: Patryk Orzechowski, Jason H. Moore

    Abstract: Motivation: In this paper we present the latest release of EBIC, a next-generation biclustering algorithm for mining genetic data. The major contribution of this paper is adding support for big data, making it possible to efficiently run large genomic data mining analyses. Additional enhancements include integration with R and Bioconductor and an option to remove influence of missing value on the…

    Submitted 25 July, 2018; originally announced July 2018.

    Comments: 2 pages, 1 figure

    MSC Class: 68; 92 ACM Class: I.5.2; I.2.11; I.5.3; J.3

  29. arXiv:1807.00981  [pdf, other]

    cs.NE

    Learning concise representations for regression by evolving networks of trees

    Authors: William La Cava, Tilak Raj Singh, James Taggart, Srinivas Suri, Jason H. Moore

    Abstract: We propose and study a method for learning interpretable representations for the task of regression. Features are represented as networks of multi-type expression trees comprised of activation functions common in neural networks in addition to other elementary functions. Differentiable features are trained via gradient descent, and the performance of features in a linear model is used to weight th…

    Submitted 25 March, 2019; v1 submitted 3 July, 2018; originally announced July 2018.

    Comments: 16 pages, 11 figures (including Appendix), published in ICLR 2019

  30. Gamorithm

    Authors: Moshe Sipper, Jason H. Moore

    Abstract: Examining games from a fresh perspective we present the idea of game-inspired and game-based algorithms, dubbed "gamorithms".

    Submitted 27 August, 2018; v1 submitted 7 June, 2018; originally announced June 2018.

    Comments: IEEE Transactions on Games, 2018

    Journal ref: IEEE Transactions on Games, Volume: 12 , Issue: 1 , March 2020, pp. 115 - 118

  31. Where are we now? A large benchmark study of recent symbolic regression methods

    Authors: Patryk Orzechowski, William La Cava, Jason H. Moore

    Abstract: In this paper we provide a broad benchmarking of recent genetic programming approaches to symbolic regression in the context of state of the art machine learning approaches. We use a set of nearly 100 regression benchmark problems culled from open source repositories across the web. We conduct a rigorous benchmarking of four recent symbolic regression approaches as well as nine machine learning ap…

    Submitted 7 June, 2018; v1 submitted 24 April, 2018; originally announced April 2018.

    Comments: 8 pages, 4 figures. GECCO 2018

  32. arXiv:1801.03039  [pdf, ps, other]

    cs.LG cs.CV cs.IR q-bio.GN

    EBIC: an evolutionary-based parallel biclustering algorithm for pattern discovery

    Authors: Patryk Orzechowski, Moshe Sipper, Xiuzhen Huang, Jason H. Moore

    Abstract: In this paper a novel biclustering algorithm based on artificial intelligence (AI) is introduced. The method called EBIC aims to detect biologically meaningful, order-preserving patterns in complex data. The proposed algorithm is probably the first one capable of discovering with accuracy exceeding 50% multiple complex patterns in real gene expression datasets. It is also one of the very few biclu…

    Submitted 26 July, 2018; v1 submitted 9 January, 2018; originally announced January 2018.

    Comments: 9 pages, 7 figures

    MSC Class: 68; 92 ACM Class: I.5.2; I.2.11; I.5.3; J.3

  33. arXiv:1711.08477  [pdf, other]

    cs.LG

    Benchmarking Relief-Based Feature Selection Methods for Bioinformatics Data Mining

    Authors: Ryan J. Urbanowicz, Randal S. Olson, Peter Schmitt, Melissa Meeker, Jason H. Moore

    Abstract: Modern biomedical data mining requires feature selection methods that can (1) be applied to large scale feature spaces (e.g. `omics' data), (2) function in noisy problems, (3) detect complex patterns of association (e.g. gene-gene interactions), (4) be flexibly adapted to various problem domains and data types (e.g. genetic variants, gene expression, and clinical data) and (5) are computationally…

    Submitted 2 April, 2018; v1 submitted 22 November, 2017; originally announced November 2017.

    Comments: Revised submission to JBI

  34. arXiv:1711.08421  [pdf, ps, other]

    cs.DS cs.LG stat.ML

    Relief-Based Feature Selection: Introduction and Review

    Authors: Ryan J. Urbanowicz, Melissa Meeker, William LaCava, Randal S. Olson, Jason H. Moore

    Abstract: Feature selection plays a critical role in biomedical data mining, driven by increasing feature dimensionality in target problems and growing interest in advanced but computationally expensive methodologies able to model complex associations. Specifically, there is a need for feature selection methods that are computationally efficient, yet sensitive to complex patterns of association, e.g. intera…

    Submitted 2 April, 2018; v1 submitted 22 November, 2017; originally announced November 2017.

    Comments: Submitted revisions for publication based on reviews by the Journal of Biomedical Informatics

  35. arXiv:1709.05394  [pdf, other]

    cs.NE

    A probabilistic and multi-objective analysis of lexicase selection and epsilon-lexicase selection

    Authors: William La Cava, Thomas Helmuth, Lee Spector, Jason H. Moore

    Abstract: Lexicase selection is a parent selection method that considers training cases individually, rather than in aggregate, when performing parent selection. Whereas previous work has demonstrated the ability of lexicase selection to solve difficult problems in program synthesis and symbolic regression, the central goal of this paper is to develop the theoretical underpinnings that explain its performan…

    Submitted 29 April, 2018; v1 submitted 15 September, 2017; originally announced September 2017.

    Comments: 30 pages, 8 figures. To appear in Evolutionary Computation Journal

  36. arXiv:1708.05070  [pdf, other]

    q-bio.QM cs.LG stat.ML

    Data-driven Advice for Applying Machine Learning to Bioinformatics Problems

    Authors: Randal S. Olson, William La Cava, Zairah Mustahsan, Akshay Varik, Jason H. Moore

    Abstract: As the bioinformatics field grows, it must keep pace not only with new data but with new algorithms. Here we contribute a thorough analysis of 13 state-of-the-art, commonly used machine learning algorithms on a set of 165 publicly available classification problems in order to provide data-driven algorithm recommendations to current researchers. We present a number of statistical and visual compari…

    Submitted 7 January, 2018; v1 submitted 8 August, 2017; originally announced August 2017.

    Comments: 12 pages, 5 figures, 4 tables. To be published in the proceedings of PSB 2018. Randal S. Olson and William La Cava contributed equally as co-first authors

  37. Investigating the Parameter Space of Evolutionary Algorithms

    Authors: Moshe Sipper, Weixuan Fu, Karuna Ahuja, Jason H. Moore

    Abstract: The practice of evolutionary algorithms involves the tuning of many parameters. How big should the population be? How many generations should the algorithm run? What is the (tournament selection) tournament size? What probabilities should one assign to crossover and mutation? Through an extensive series of experiments over multiple evolutionary algorithm implementations and problems we show that p…

    Submitted 10 October, 2017; v1 submitted 13 June, 2017; originally announced June 2017.

    Journal ref: BioData Mining, 2018, 11:2

  38. arXiv:1705.00594  [pdf, other]

    cs.AI cs.HC cs.NE

    A System for Accessible Artificial Intelligence

    Authors: Randal S. Olson, Moshe Sipper, William La Cava, Sharon Tartarone, Steven Vitale, Weixuan Fu, Patryk Orzechowski, Ryan J. Urbanowicz, John H. Holmes, Jason H. Moore

    Abstract: While artificial intelligence (AI) has become widespread, many commercial AI systems are not yet accessible to individual researchers nor the general public due to the deep knowledge of the systems required to use them. We believe that AI has matured to the point where it should be an accessible technology for everyone. We present an ongoing project whose ultimate goal is to deliver an open source…

    Submitted 10 August, 2017; v1 submitted 1 May, 2017; originally announced May 2017.

    Comments: 14 pages, 5 figures, submitted to Genetic Programming Theory and Practice 2017 workshop

  39. arXiv:1703.06934  [pdf, other]

    cs.NE cs.LG stat.ML

    Ensemble representation learning: an analysis of fitness and survival for wrapper-based genetic programming methods

    Authors: William La Cava, Jason H. Moore

    Abstract: Recently we proposed a general, ensemble-based feature engineering wrapper (FEW) that was paired with a number of machine learning methods to solve regression problems. Here, we adapt FEW for supervised classification and perform a thorough analysis of fitness and survival methods within this framework. Our tests demonstrate that two fitness metrics, one introduced as an adaptation of the silhouet…

    Submitted 3 August, 2017; v1 submitted 20 March, 2017; originally announced March 2017.

    Comments: Genetic and Evolutionary Computation Conference (GECCO) 2017, Berlin, Germany

  40. arXiv:1703.00512  [pdf, other]

    cs.LG cs.AI

    PMLB: A Large Benchmark Suite for Machine Learning Evaluation and Comparison

    Authors: Randal S. Olson, William La Cava, Patryk Orzechowski, Ryan J. Urbanowicz, Jason H. Moore

    Abstract: The selection, development, or comparison of machine learning methods in data mining can be a difficult task based on the target problem and goals of a particular study. Numerous publicly available real-world and simulated benchmark datasets have emerged from different sources, but their organization and adoption as standards have been inconsistent. As such, selecting and curating specific benchma…

    Submitted 1 March, 2017; originally announced March 2017.

    Comments: 14 pages, 5 figures, submitted for review to JMLR

  41. arXiv:1702.01780  [pdf, other]

    cs.NE cs.LG q-bio.QM stat.ML

    Toward the automated analysis of complex diseases in genome-wide association studies using genetic programming

    Authors: Andrew Sohn, Randal S. Olson, Jason H. Moore

    Abstract: Machine learning has been gaining traction in recent years to meet the demand for tools that can efficiently analyze and make sense of the ever-growing databases of biomedical data in health care systems around the world. However, effectively using machine learning methods requires considerable domain expertise, which can be a barrier of entry for bioinformaticians new to computational data scienc…

    Submitted 6 February, 2017; originally announced February 2017.

    Comments: 9 pages, 4 figures, submitted to GECCO 2017 conference and currently under review

  42. arXiv:1607.08878  [pdf, other]

    cs.NE cs.AI cs.LG

    Identifying and Harnessing the Building Blocks of Machine Learning Pipelines for Sensible Initialization of a Data Science Automation Tool

    Authors: Randal S. Olson, Jason H. Moore

    Abstract: As data science continues to grow in popularity, there will be an increasing need to make data science tools more scalable, flexible, and accessible. In particular, automated machine learning (AutoML) systems seek to automate the process of designing and optimizing machine learning pipelines. In this chapter, we present a genetic programming-based AutoML system called TPOT that optimizes a series…

    Submitted 29 July, 2016; originally announced July 2016.

    Comments: 13 pages, 5 figures, preprint of chapter to appear in GPTP 2016 book

  43. arXiv:1603.08233  [pdf, other]

    cs.CV cs.LG cs.NE

    Evolution of active categorical image classification via saccadic eye movement

    Authors: Randal S. Olson, Jason H. Moore, Christoph Adami

    Abstract: Pattern recognition and classification is a central concern for modern information processing systems. In particular, one key challenge to image and video classification has been that the computational cost of image processing scales linearly with the number of pixels in the image or video. Here we present an intelligent machine (the "active categorical classifier," or ACC) that is inspired by the…

    Submitted 16 June, 2016; v1 submitted 27 March, 2016; originally announced March 2016.

    Comments: 10 pages, 5 figures, to appear in PPSN 2016 conference proceedings

    Journal ref: Lecture Notes in Computer Science 9921 (2016) 581-590

  44. arXiv:1603.06212  [pdf, other]

    cs.NE cs.AI cs.LG

    Evaluation of a Tree-based Pipeline Optimization Tool for Automating Data Science

    Authors: Randal S. Olson, Nathan Bartley, Ryan J. Urbanowicz, Jason H. Moore

    Abstract: As the field of data science continues to grow, there will be an ever-increasing demand for tools that make machine learning accessible to non-experts. In this paper, we introduce the concept of tree-based pipeline optimization for automating one of the most tedious parts of machine learning---pipeline design. We implement an open source Tree-based Pipeline Optimization Tool (TPOT) in Python and d…

    Submitted 20 March, 2016; originally announced March 2016.

    Comments: 8 pages, 5 figures, preprint to appear in GECCO 2016, edits not yet made from reviewer comments

  45. Exploring the coevolution of predator and prey morphology and behavior

    Authors: Randal S. Olson, Arend Hintze, Fred C. Dyer, Jason H. Moore, Christoph Adami

    Abstract: A common idiom in biology education states, "Eyes in the front, the animal hunts. Eyes on the side, the animal hides." In this paper, we explore one possible explanation for why predators tend to have forward-facing, high-acuity visual systems. We do so using an agent-based computational model of evolution, where predators and prey interact and adapt their behavior and morphology to one another ov…

    Submitted 28 February, 2016; originally announced February 2016.

    Comments: 8 pages, 8 figures, submitted to Artificial Life 2016 conference

    Journal ref: Proceedings Artificial Life 15 (C. Gershenson, T. Froese, J.M. Sisqueiros, W. Aguilar, E.J. Izquierdo, H. Sayama, eds.) MIT Press (Cambridge, MA, 2016), pp. 250-258

  46. arXiv:1601.07925  [pdf, other]

    cs.LG cs.NE

    Automating biomedical data science through tree-based pipeline optimization

    Authors: Randal S. Olson, Ryan J. Urbanowicz, Peter C. Andrews, Nicole A. Lavender, La Creis Kidd, Jason H. Moore

    Abstract: Over the past decade, data science and machine learning has grown from a mysterious art form to a staple tool across a variety of fields in academia, business, and government. In this paper, we introduce the concept of tree-based pipeline optimization for automating one of the most tedious parts of machine learning---pipeline design. We implement a Tree-based Pipeline Optimization Tool (TPOT) and…

    Submitted 28 January, 2016; originally announced January 2016.

    Comments: 16 pages, 5 figures, to appear in EvoBIO 2016 proceedings