Showing 1–25 of 25 results for author: Boley, M

  1. arXiv:2402.15691 [pdf, other]

    cs.LG stat.ML

    Orthogonal Gradient Boosting for Simpler Additive Rule Ensembles

    Authors: Fan Yang, Pierre Le Bodic, Michael Kamp, Mario Boley

    Abstract: Gradient boosting of prediction rules is an efficient approach to learn potentially interpretable yet accurate probabilistic models. However, actual interpretability requires to limit the number and size of the generated rules, and existing boosting variants are not designed for this purpose. Though corrective boosting refits all rule weights in each iteration to minimise prediction risk, the incl…

    Submitted 23 February, 2024; originally announced February 2024.

    Comments: 21 pages, 11 figures, accepted at AISTATS 2024

  2. arXiv:2402.10932 [pdf]

    cond-mat.mtrl-sci physics.data-an

    Roadmap on Data-Centric Materials Science

    Authors: Stefan Bauer, Peter Benner, Tristan Bereau, Volker Blum, Mario Boley, Christian Carbogno, C. Richard A. Catlow, Gerhard Dehm, Sebastian Eibl, Ralph Ernstorfer, Ádám Fekete, Lucas Foppa, Peter Fratzl, Christoph Freysoldt, Baptiste Gault, Luca M. Ghiringhelli, Sajal K. Giri, Anton Gladyshev, Pawan Goyal, Jason Hattrick-Simpers, Lara Kabalan, Petr Karpov, Mohammad S. Khorrami, Christoph Koch, Sebastian Kokott, et al. (36 additional authors not shown)

    Abstract: Science is and always has been based on data, but the terms "data-centric" and the "4th paradigm of" materials research indicate a radical change in how information is retrieved, handled and research is performed. It signifies a transformative shift towards managing vast data collections, digital repositories, and innovative data analytics methods. The integration of Artificial Intelligence (AI) a…

    Submitted 1 May, 2024; v1 submitted 1 February, 2024; originally announced February 2024.

    Comments: Review, outlook, roadmap, perspective

  3. Scaling K2 VII: Evidence for a high occurrence rate of hot sub-Neptunes at intermediate ages

    Authors: Jessie L. Christiansen, Jon K. Zink, Kevin K. Hardegree-Ullman, Rachel B. Fernandes, Philip F. Hopkins, Luisa M. Rebull, Kiersten M. Boley, Galen J. Bergsten, Sakhee Bhure

    Abstract: The NASA K2 mission obtained high precision time-series photometry for four young clusters, including the near-twin 600-800 Myr-old Praesepe and Hyades clusters. Hot sub-Neptunes are highly prone to mass-loss mechanisms, given their proximity to the host star and the weakly bound gaseous envelopes, and analyzing this population at young ages can provide strong constraints on planetary evolutio…

    Submitted 30 November, 2023; originally announced November 2023.

    Comments: 14 pages, 6 figures, published in AJ

    Journal ref: AJ 166 248 (2023)

  4. arXiv:2311.15549 [pdf]

    cond-mat.mtrl-sci cs.AI cs.LG

    From Prediction to Action: Critical Role of Performance Estimation for Machine-Learning-Driven Materials Discovery

    Authors: Mario Boley, Felix Luong, Simon Teshuva, Daniel F Schmidt, Lucas Foppa, Matthias Scheffler

    Abstract: Materials discovery driven by statistical property models is an iterative decision process, during which an initial data collection is extended with new data proposed by a model-informed acquisition function--with the goal to maximize a certain "reward" over time, such as the maximum property value discovered so far. While the materials science community achieved much progress in developing proper…

    Submitted 6 December, 2023; v1 submitted 27 November, 2023; originally announced November 2023.

    Comments: Simplified notation

  5. arXiv:2310.18860 [pdf, other]

    stat.ML cs.LG

    Bayes beats Cross Validation: Efficient and Accurate Ridge Regression via Expectation Maximization

    Authors: Shu Yu Tew, Mario Boley, Daniel F. Schmidt

    Abstract: We present a novel method for tuning the regularization hyper-parameter, $λ$, of a ridge regression that is faster to compute than leave-one-out cross-validation (LOOCV) while yielding estimates of the regression parameters of equal, or particularly in the setting of sparse covariates, superior quality to those obtained by minimising the LOOCV risk. The LOOCV risk can suffer from multiple and bad…

    Submitted 2 November, 2023; v1 submitted 28 October, 2023; originally announced October 2023.

  6. arXiv:2307.13726 [pdf, other]

    astro-ph.EP

    Fizzy Super-Earths: Impacts of Magma Composition on the Bulk Density and Structure of Lava Worlds

    Authors: Kiersten M. Boley, Wendy R. Panero, Cayman T. Unterborn, Joseph G. Schulze, Romy Rodríguez Martínez, Ji Wang

    Abstract: Lava worlds are a potential emerging population of Super-Earths that are on close-in orbits around their host stars with likely partially molten mantles. To date, few studies address the impact of magma on the observed properties of a planet. At ambient conditions magma is less dense than solid rock; however, it is also more compressible with increasing pressure. Therefore, it is unclear how large…

    Submitted 25 July, 2023; originally announced July 2023.

    Comments: Accepted to ApJ

  7. arXiv:2307.13034 [pdf, other]

    astro-ph.EP astro-ph.SR

    A Comparison of the Composition of Planets in Single- and Multi-Planet Systems Orbiting M dwarfs

    Authors: Romy Rodríguez Martínez, David V. Martin, B. Scott Gaudi, Joseph G. Schulze, Anusha Pai Asnodkar, Kiersten M. Boley, Sarah Ballard

    Abstract: We investigate and compare the composition of M-dwarf planets in systems with only one known planet ("singles") to those residing in multi-planet systems ("multis") and the fundamental properties of their host stars. We restrict our analysis to planets with directly measured masses and radii, which comprise a total of 70 planets: 30 singles and 40 multis in 19 systems. We compare the bulk densit…

    Submitted 24 July, 2023; originally announced July 2023.

    Comments: 20 pages, 11 figures, 2 tables. Submitted to ApJ and under review. Comments welcome!

  8. arXiv:2305.13389 [pdf, other]

    astro-ph.EP astro-ph.GA

    Scaling K2. VI. Reduced Small Planet Occurrence in High Galactic Amplitude Stars

    Authors: Jon K. Zink, Kevin K. Hardegree-Ullman, Jessie L. Christiansen, Erik A. Petigura, Kiersten M. Boley, Sakhee Bhure, Malena Rice, Samuel W. Yee, Howard Isaacson, Rachel B. Fernandes, Andrew W. Howard, Sarah Blunt, Jack Lubin, Ashley Chontos, Daria Pidhorodetska, Mason G. MacDougall

    Abstract: In this study, we performed a homogeneous analysis of the planets around FGK dwarf stars observed by the Kepler and K2 missions, providing spectroscopic parameters for 310 K2 targets -- including 239 Scaling K2 hosts -- observed with Keck/HIRES. For orbital periods less than 40 days, we found that the distribution of planets as a function of orbital period, stellar effective temperature, and metal…

    Submitted 22 May, 2023; originally announced May 2023.

    Comments: 28 Pages, 12 Figures, 3 Tables; Accepted for Publication AJ

  9. A Reanalysis of the Composition of K2-106b: an Ultra-short Period Super-Mercury Candidate

    Authors: Romy Rodríguez Martínez, B. Scott Gaudi, Joseph G. Schulze, Lorena Acuña, Jared Kolecki, Jennifer A. Johnson, Anusha Pai Asnodkar, Kiersten M. Boley, Magali Deleuil, Olivier Mousis, Wendy R. Panero, Ji Wang

    Abstract: We present a reanalysis of the K2-106 transiting planetary system, with a focus on the composition of K2-106b, an ultra-short period, super-Mercury candidate. We globally model existing photometric and radial velocity data and derive a planetary mass and radius for K2-106b of $M_{p} = 8.53\pm1.02~M_{\oplus}$ and $R_{p} = 1.71^{+0.069}_{-0.057}~R_{\oplus}$, which leads to a density of…

    Submitted 16 August, 2022; originally announced August 2022.

    Comments: 19 pages, 8 figures, submitted to AJ

  10. arXiv:2206.01259 [pdf, other]

    astro-ph.EP astro-ph.SR

    Spectroscopy of TOI-1259B -- an unpolluted white dwarf companion to an inflated warm Saturn

    Authors: Evan Fitzmaurice, David V. Martin, Romy Rodriguez Martinez, Patrick Vallely, Alexander P. Stephan, Kiersten M. Boley, Rick Pogge, Kareem El-Badry, Vedad Kunovac, Amaury H. M. J. Triaud

    Abstract: TOI-1259 consists of a transiting exoplanet orbiting a main sequence star, with a bound outer white dwarf companion. Less than a dozen systems with this architecture are known. We conduct follow-up spectroscopy on the white dwarf TOI-1259B using the Large Binocular Telescope (LBT) to better characterise it. We observe only strong hydrogen lines, making TOI-1259B a DA white dwarf. We see no evidenc…

    Submitted 12 September, 2022; v1 submitted 2 June, 2022; originally announced June 2022.

    Comments: 6 pages, accepted to MNRAS, minor revision of first arXiv version

  11. arXiv:2106.13242 [pdf, other]

    astro-ph.EP astro-ph.GA astro-ph.SR

    Searching For Transiting Planets Around Halo Stars. II. Constraining the Occurrence Rate of Hot Jupiters

    Authors: Kiersten M. Boley, Ji Wang, Joel C. Zinn, Karen A. Collins, Kevin I. Collins, Tianjun Gan, Ting S. Li

    Abstract: Jovian planet formation has been shown to be strongly correlated with host star metallicity, which is thought to be a proxy for disk solids. Observationally, previous works have indicated that jovian planets preferentially form around stars with solar and super solar metallicities. Given these findings, it is challenging to form planets within metal-poor environments, particularly for hot Jupiters…

    Submitted 28 June, 2021; v1 submitted 24 June, 2021; originally announced June 2021.

    Comments: Accepted, ApJ. This entry will be updated with journal reference and DOI when available. Corrected typos on Figure 2

  12. arXiv:2104.01352 [pdf]

    cond-mat.mtrl-sci physics.comp-ph physics.data-an

    Learning Rules for Materials Properties and Functions

    Authors: Mario Boley, Matthias Scheffler

    Abstract: In materials science and engineering, one is typically searching for materials that exhibit exceptional performance for a certain function, and the number of these materials is extremely small. Thus, statistically speaking, we are interested in the identification of *rare phenomena*, and the scientific discovery typically resembles the proverbial hunt for the needle in a haystack.

    Submitted 3 April, 2021; originally announced April 2021.

  13. arXiv:2101.08380 [pdf, other]

    cs.LG

    Better Short than Greedy: Interpretable Models through Optimal Rule Boosting

    Authors: Mario Boley, Simon Teshuva, Pierre Le Bodic, Geoffrey I Webb

    Abstract: Rule ensembles are designed to provide a useful trade-off between predictive accuracy and model interpretability. However, the myopic and random search components of current rule ensemble methods can compromise this goal: they often need more rules than necessary to reach a certain accuracy level or can even outright fail to accurately model a distribution that can actually be described well with…

    Submitted 20 January, 2021; originally announced January 2021.

    Comments: SDM 2021

  14. arXiv:2009.02728 [pdf, other]

    cs.LG cs.AI stat.ML

    Discovering Reliable Causal Rules

    Authors: Kailash Budhathoki, Mario Boley, Jilles Vreeken

    Abstract: We study the problem of deriving policies, or rules, that when enacted on a complex system, cause a desired outcome. Absent the ability to perform controlled experiments, such rules have to be inferred from past observations of the system's behaviour. This is a challenging problem for two reasons: First, observational effects are often unrepresentative of the underlying causal effect because they…

    Submitted 8 September, 2020; v1 submitted 6 September, 2020; originally announced September 2020.

    Comments: Poster presented in NeurIPS 2018 Workshop on Causal Learning

  15. arXiv:2001.00939 [pdf, other]

    cs.LG stat.ML

    Relative Flatness and Generalization

    Authors: Henning Petzka, Michael Kamp, Linara Adilova, Cristian Sminchisescu, Mario Boley

    Abstract: Flatness of the loss curve is conjectured to be connected to the generalization ability of machine learning models, in particular neural networks. While it has been empirically observed that flatness measures consistently correlate strongly with generalization, it is still an open theoretical problem why and under which circumstances flatness is connected to generalization, in particular in light…

    Submitted 4 November, 2021; v1 submitted 3 January, 2020; originally announced January 2020.

    Comments: The first two authors made equal contribution; Accepted for publication at NeurIPS 2021; arXiv admin note: substantial text overlap with arXiv:1912.00058

  16. arXiv:1911.12899 [pdf, other]

    cs.LG cs.DC stat.ML

    Communication-Efficient Distributed Online Learning with Kernels

    Authors: Michael Kamp, Sebastian Bothe, Mario Boley, Michael Mock

    Abstract: We propose an efficient distributed online learning protocol for low-latency real-time services. It extends a previously presented protocol to kernelized online learners that represent their models by a support vector expansion. While such learners often achieve higher predictive performance than their linear counterparts, communicating the support vector expansions becomes inefficient for large n…

    Submitted 28 November, 2019; originally announced November 2019.

    Journal ref: Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2016

  17. arXiv:1911.12896 [pdf, ps, other]

    cs.DC cs.LG stat.ML

    Adaptive Communication Bounds for Distributed Online Learning

    Authors: Michael Kamp, Mario Boley, Michael Mock, Daniel Keren, Assaf Schuster, Izchak Sharfman

    Abstract: We consider distributed online learning protocols that control the exchange of information between local learners in a round-based learning scenario. The learning performance of such a protocol is intuitively optimal if approximately the same loss is incurred as in a hypothetical serial setting. If a protocol accomplishes this, it is inherently impossible to achieve a strong communication bound at…

    Submitted 28 November, 2019; originally announced November 2019.

    Journal ref: Proceedings of the 7th NIPS Workshop on Optimization for Machine Learning, 2014

  18. arXiv:1908.11682 [pdf, other]

    cs.LG cs.DB cs.IT stat.ML

    Discovering Reliable Correlations in Categorical Data

    Authors: Panagiotis Mandros, Mario Boley, Jilles Vreeken

    Abstract: In many scientific tasks we are interested in discovering whether there exist any correlations in our data. This raises many questions, such as how to reliably and interpretably measure correlation between a multivariate set of attributes, how to do so without having to make assumptions on distribution of the data or the type of correlation, and, how to efficiently discover the top-most reliably c…

    Submitted 30 August, 2019; originally announced August 2019.

    Comments: Accepted to the IEEE International Conference on Data Mining 2019 (ICDM'19)

    ACM Class: H.2.8; G.3

  19. arXiv:1810.03530 [pdf, other]

    cs.LG cs.AI cs.DC stat.ML

    Effective Parallelisation for Machine Learning

    Authors: Michael Kamp, Mario Boley, Olana Missura, Thomas Gärtner

    Abstract: We present a novel parallelisation scheme that simplifies the adaptation of learning algorithms to growing amounts of data as well as growing needs for accurate and confident predictions in critical applications. In contrast to other parallelisation techniques, it can be applied to a broad class of learning algorithms without further mathematical derivations and without writing dedicated code, whi…

    Submitted 8 October, 2018; originally announced October 2018.

    Comments: Advances in Neural Information Processing Systems, 2017

  20. arXiv:1809.05467 [pdf, other]

    cs.AI cs.DB cs.IT

    Discovering Reliable Dependencies from Data: Hardness and Improved Algorithms

    Authors: Panagiotis Mandros, Mario Boley, Jilles Vreeken

    Abstract: The reliable fraction of information is an attractive score for quantifying (functional) dependencies in high-dimensional data. In this paper, we systematically explore the algorithmic implications of using this measure for optimization. We show that the problem is NP-hard, which justifies the usage of worst-case exponential-time as well as heuristic search methods. We then substantially improve t…

    Submitted 14 September, 2018; originally announced September 2018.

    Comments: Accepted to Proceedings of the IEEE International Conference on Data Mining (ICDM'18)

    ACM Class: H.2.8; G.3

  21. arXiv:1709.07941 [pdf, other]

    cs.DB cs.AI

    Efficiently Discovering Locally Exceptional yet Globally Representative Subgroups

    Authors: Janis Kalofolias, Mario Boley, Jilles Vreeken

    Abstract: Subgroup discovery is a local pattern mining technique to find interpretable descriptions of sub-populations that stand out on a given target variable. That is, these sub-populations are exceptional with regard to the global distribution. In this paper we argue that in many applications, such as scientific discovery, subgroups are only useful if they are additionally representative of the global d…

    Submitted 22 September, 2017; originally announced September 2017.

    Comments: 10 pages, To appear in ICDM17

  22. arXiv:1705.09391 [pdf, other]

    cs.DB cs.AI cs.IT

    Discovering Reliable Approximate Functional Dependencies

    Authors: Panagiotis Mandros, Mario Boley, Jilles Vreeken

    Abstract: Given a database and a target attribute of interest, how can we tell whether there exists a functional, or approximately functional dependence of the target on any set of other attributes in the data? How can we reliably, without bias to sample size or dimensionality, measure the strength of such a dependence? And, how can we efficiently discover the optimal or $α$-approximate top-$k$ dependencies…

    Submitted 18 June, 2017; v1 submitted 25 May, 2017; originally announced May 2017.

    Comments: Accepted: In Proceedings of the ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), August 13-17, 2017, Halifax, NS, Canada

    ACM Class: H.2.8; G.3

  23. Identifying Consistent Statements about Numerical Data with Dispersion-Corrected Subgroup Discovery

    Authors: Mario Boley, Bryan R. Goldsmith, Luca M. Ghiringhelli, Jilles Vreeken

    Abstract: Existing algorithms for subgroup discovery with numerical targets do not optimize the error or target variable dispersion of the groups they find. This often leads to unreliable or inconsistent statements about the data, rendering practical applications, especially in scientific domains, futile. Therefore, we here extend the optimistic estimator framework for optimal subgroup discovery to a new cl…

    Submitted 23 April, 2017; v1 submitted 26 January, 2017; originally announced January 2017.

    Comments: significance of empirical results tested; additional illustrations; table of used notations

  24. arXiv:1612.04307 [pdf]

    cond-mat.mtrl-sci

    Uncovering structure-property relationships of materials by subgroup discovery

    Authors: B. R. Goldsmith, M. Boley, J. Vreeken, M. Scheffler, L. M. Ghiringhelli

    Abstract: Subgroup discovery (SGD) is presented here as a data-mining approach to help find interpretable local patterns, correlations, and descriptors of a target property in materials-science data. Specifically, we will be concerned with data generated by density-functional theory calculations. At first, we demonstrate that SGD can identify physically meaningful models that classify the crystal structures…

    Submitted 13 December, 2016; originally announced December 2016.

    Journal ref: New J. Phys. 2017, 19, 013031

  25. arXiv:1205.2610 [pdf]

    cs.LG

    Probabilistic Structured Predictors

    Authors: Shankar Vembu, Thomas Gartner, Mario Boley

    Abstract: We consider MAP estimators for structured prediction with exponential family models. In particular, we concentrate on the case that efficient algorithms for uniform sampling from the output space exist. We show that under this assumption (i) exact computation of the partition function remains a hard problem, and (ii) the partition function and the gradient of the log partition function can be appr…

    Submitted 9 May, 2012; originally announced May 2012.

    Comments: Appears in Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence (UAI2009). arXiv admin note: substantial text overlap with arXiv:0912.4473

    Report number: UAI-P-2009-PG-557-564