Showing 1–32 of 32 results for author: Pérez, A

Searching in archive stat.
  1. arXiv:2402.12009  [pdf, other]

    stat.ME math.DS

    Moduli of Continuity in Metric Models and Extension of Liveability Indices

    Authors: R. Arnau, J. M. Calabuig, Álvaro González, Enrique A. Sánchez Pérez

    Abstract: Index spaces serve as valuable metric models for studying properties relevant to various applications, such as social science or economics. These properties are represented by real Lipschitz functions that describe the degree of association with each element within the underlying metric space. After determining the index value within a given sample subset, the classic McShane and Whitney formulas…

    Submitted 19 February, 2024; originally announced February 2024.

  2. arXiv:2401.12485  [pdf, other]

    cs.LG cs.AI quant-ph stat.ML

    Adiabatic Quantum Support Vector Machines

    Authors: Prasanna Date, Dong Jun Woun, Kathleen Hamilton, Eduardo A. Coello Perez, Mayanka Chandra Shekhar, Francisco Rios, John Gounley, In-Saeng Suh, Travis Humble, Georgia Tourassi

    Abstract: Adiabatic quantum computers can solve difficult optimization problems (e.g., the quadratic unconstrained binary optimization problem), and they seem well suited to train machine learning models. In this paper, we describe an adiabatic quantum approach for training support vector machines. We show that the time complexity of our quantum approach is an order of magnitude better than the classical ap…

    Submitted 22 January, 2024; originally announced January 2024.

  3. arXiv:2401.01148  [pdf, ps, other]

    stat.ML cs.LG

    PAC-Bayes-Chernoff bounds for unbounded losses

    Authors: Ioar Casado, Luis A. Ortega, Andrés R. Masegosa, Aritz Pérez

    Abstract: We introduce a new PAC-Bayes oracle bound for unbounded losses. This result can be understood as a PAC-Bayesian version of the Cramér-Chernoff bound. The proof technique relies on controlling the tails of certain random variables involving the Cramér transform of the loss. We highlight several applications of the main theorem. First, we show that our result naturally allows exact optimization of t…

    Submitted 6 February, 2024; v1 submitted 2 January, 2024; originally announced January 2024.

    Comments: Updated Section 5

  4. arXiv:2311.09369  [pdf, other]

    stat.ML cs.CY cs.LG

    Time-dependent Probabilistic Generative Models for Disease Progression

    Authors: Onintze Zaballa, Aritz Pérez, Elisa Gómez-Inhiesto, Teresa Acaiturri-Ayesta, Jose A. Lozano

    Abstract: Electronic health records contain valuable information for monitoring patients' health trajectories over time. Disease progression models have been developed to understand the underlying patterns and dynamics of diseases using these data as sequences. However, analyzing temporal data from EHRs is challenging due to the variability and irregularities present in medical records. We propose a Markovi…

    Submitted 15 November, 2023; originally announced November 2023.

    Comments: Extended Abstract presented at Machine Learning for Health (ML4H) symposium 2023, December 10th, 2023, New Orleans, United States, 17 pages

  5. arXiv:2310.18278  [pdf, other]

    q-bio.BM physics.bio-ph physics.chem-ph stat.ML

    Navigating protein landscapes with a machine-learned transferable coarse-grained model

    Authors: Nicholas E. Charron, Felix Musil, Andrea Guljas, Yaoyi Chen, Klara Bonneau, Aldo S. Pasos-Trejo, Jacopo Venturin, Daria Gusew, Iryna Zaporozhets, Andreas Krämer, Clark Templeton, Atharva Kelkar, Aleksander E. P. Durumeric, Simon Olsson, Adrià Pérez, Maciej Majewski, Brooke E. Husic, Ankit Patel, Gianni De Fabritiis, Frank Noé, Cecilia Clementi

    Abstract: The most popular and universally predictive protein simulation models employ all-atom molecular dynamics (MD), but they come at extreme computational cost. The development of a universal, computationally efficient coarse-grained (CG) model with similar prediction performance has been a long-standing challenge. By combining recent deep learning methods with a large and diverse training set of all-a…

    Submitted 27 October, 2023; originally announced October 2023.

  6. arXiv:2309.13349  [pdf, other]

    cs.NE stat.ML

    Speeding-up Evolutionary Algorithms to solve Black-Box Optimization Problems

    Authors: Judith Echevarrieta, Etor Arza, Aritz Pérez

    Abstract: Population-based evolutionary algorithms are often considered when approaching computationally expensive black-box optimization problems. They employ a selection mechanism to choose the best solutions from a given population after comparing their objective values, which are then used to generate the next population. This iterative process explores the solution space efficiently, leading to improve…

    Submitted 29 January, 2024; v1 submitted 23 September, 2023; originally announced September 2023.

  7. arXiv:2306.09628  [pdf, other]

    cs.CV stat.ML

    Structural Restricted Boltzmann Machine for image denoising and classification

    Authors: Arkaitz Bidaurrazaga, Aritz Pérez, Roberto Santana

    Abstract: Restricted Boltzmann Machines are generative models that consist of a layer of hidden variables connected to another layer of visible units, and they are used to model the distribution over visible variables. In order to gain a higher representability power, many hidden units are commonly used, which, in combination with a large number of visible units, leads to a high number of trainable paramete…

    Submitted 16 June, 2023; originally announced June 2023.

  8. arXiv:2306.06649  [pdf, ps, other]

    stat.ML cs.LG

    Efficient Learning of Minimax Risk Classifiers in High Dimensions

    Authors: Kartheek Bondugula, Santiago Mazuelas, Aritz Pérez

    Abstract: High-dimensional data is common in multiple areas, such as health care and genomics, where the number of features can be tens of thousands. In such scenarios, the large number of features often leads to inefficient learning. Constraint generation methods have recently enabled efficient learning of L1-regularized support vector machines (SVMs). In this paper, we leverage such methods to obtain an e…

    Submitted 11 June, 2023; originally announced June 2023.

    Comments: Accepted for the 39th Conference on Uncertainty in Artificial Intelligence (UAI 2023)

  9. arXiv:2305.07345  [pdf, other]

    cs.PF cs.DS math.OC stat.AP

    On the Fair Comparison of Optimization Algorithms in Different Machines

    Authors: Etor Arza, Josu Ceberio, Ekhiñe Irurozki, Aritz Pérez

    Abstract: An experimental comparison of two or more optimization algorithms requires the same computational resources to be assigned to each algorithm. When a maximum runtime is set as the stopping criterion, all algorithms need to be executed in the same machine if they are to use the same resources. Unfortunately, the implementation code of the algorithms is not always available, which means that running…

    Submitted 7 August, 2023; v1 submitted 12 May, 2023; originally announced May 2023.

    Journal ref: Ann. Appl. Stat. 18(1): 42-62 (March 2024)

  10. arXiv:2211.04806  [pdf, other]

    hep-ph cs.LG hep-ex physics.data-an stat.ML

    Machine-Learned Exclusion Limits without Binning

    Authors: Ernesto Arganda, Andres D. Perez, Martin de los Rios, Rosa María Sandá Seoane

    Abstract: Machine-Learned Likelihoods (MLL) combines machine-learning classification techniques with likelihood-based inference tests to estimate the experimental sensitivity of high-dimensional data sets. We extend the MLL method by including Kernel Density Estimators (KDE) to avoid binning the classifier output to extract the resulting one-dimensional signal and background probability density functions. W…

    Submitted 15 December, 2023; v1 submitted 9 November, 2022; originally announced November 2022.

    Comments: 24 pages, 8 figures, 3 tables, 1 appendix (version published in EPJC). MLL+KDE code available from https://github.com/AndresDanielPerez/2211.04806-ML-Likelihood-with-KDE

    Report number: IFT-UAM/CSIC-22-134

  11. arXiv:2210.06872  [pdf, other]

    stat.ML cs.LG

    Dirichlet process mixture models for non-stationary data streams

    Authors: Ioar Casado, Aritz Pérez

    Abstract: In recent years, we have seen a handful of works on inference algorithms over non-stationary data streams. Given their flexibility, Bayesian non-parametric models are a good candidate for these scenarios. However, reliable streaming inference under the concept drift phenomenon is still an open problem for these models. In this work, we propose a variational inference algorithm for Dirichlet process…

    Submitted 13 October, 2022; originally announced October 2022.

    Comments: 9 pages

  12. arXiv:2203.07889  [pdf, other]

    stat.ML cs.LG stat.ME

    Comparing Two Samples Through Stochastic Dominance: A Graphical Approach

    Authors: Etor Arza, Josu Ceberio, Ekhiñe Irurozki, Aritz Pérez

    Abstract: Non-deterministic measurements are common in real-world scenarios: the performance of a stochastic optimization algorithm or the total reward of a reinforcement learning agent in a chaotic environment are just two examples in which unpredictable outcomes are common. These measures can be modeled as random variables and compared with each other via their expected values or more sophisticated tools…

    Submitted 30 August, 2022; v1 submitted 15 March, 2022; originally announced March 2022.

    Journal ref: Etor Arza, Josu Ceberio, Ekhiñe Irurozki & Aritz Pérez (2022) Comparing Two Samples Through Stochastic Dominance: A Graphical Approach, Journal of Computational and Graphical Statistics

  13. arXiv:2108.01952  [pdf, ps, other]

    stat.ML cs.LG

    MRCpy: A Library for Minimax Risk Classifiers

    Authors: Kartheek Bondugula, Verónica Álvarez, José I. Segovia-Martín, Aritz Pérez, Santiago Mazuelas

    Abstract: Libraries for supervised classification have enabled the widespread usage of machine learning methods. Existing libraries, such as scikit-learn, caret, and mlpack, implement techniques based on the classical empirical risk minimization (ERM) approach. We present a Python library, MRCpy, that implements minimax risk classifiers (MRCs) based on the robust risk minimization (RRM) approach. The libra…

    Submitted 29 May, 2024; v1 submitted 4 August, 2021; originally announced August 2021.

  14. arXiv:2012.03628  [pdf, other]

    stat.ML cs.LG

    K-means for Evolving Data Streams

    Authors: Arkaitz Bidaurrazaga, Aritz Pérez, Marco Capó

    Abstract: Currently the amount of data produced worldwide is increasing beyond measure, thus a high volume of unsupervised data must be processed continuously. One of the main unsupervised data analysis tasks is clustering. In streaming data scenarios, the data is composed of an increasing sequence of batches of samples where the concept drift phenomenon may happen. In this paper, we formally define the Streaming…

    Submitted 17 September, 2021; v1 submitted 7 December, 2020; originally announced December 2020.

    Comments: This is an extended version of a short paper published in ICDM 2021. Please cite the short paper instead of this version. As soon as it is published we will add how to reference it

    MSC Class: 62H30 ACM Class: I.5.3

  15. arXiv:2012.00054  [pdf, other]

    stat.ME

    Empirical best prediction of small area bivariate parameters

    Authors: M. D. Esteban, M. J. Lombardía, E. López-Vizcaíno, D. Morales, A. Pérez

    Abstract: This paper introduces empirical best predictors of small area bivariate parameters, like ratios of sums or sums of ratios, by assuming that the target unit-level vector follows a bivariate nested error regression model. The corresponding mean squared errors are estimated by parametric bootstrap. Several simulation experiments empirically study the behavior of the introduced statistical methodolog…

    Submitted 30 November, 2020; originally announced December 2020.

  16. arXiv:2010.07964  [pdf, ps, other]

    stat.ML cs.LG

    Minimax Classification with 0-1 Loss and Performance Guarantees

    Authors: Santiago Mazuelas, Andrea Zanoni, Aritz Perez

    Abstract: Supervised classification techniques use training samples to find classification rules with small expected 0-1 loss. Conventional methods achieve efficient learning and out-of-sample generalization by minimizing surrogate losses over specific families of rules. This paper presents minimax risk classifiers (MRCs) that do not rely on a choice of surrogate loss and family of rules. MRCs achieve effic…

    Submitted 15 October, 2020; originally announced October 2020.

    Journal ref: Advances in Neural Information Processing Systems, vol. 33, 2020, pp. 302-312

  17. arXiv:2007.11412  [pdf, other]

    physics.comp-ph physics.bio-ph physics.chem-ph q-bio.BM stat.ML

    Coarse Graining Molecular Dynamics with Graph Neural Networks

    Authors: Brooke E. Husic, Nicholas E. Charron, Dominik Lemm, Jiang Wang, Adrià Pérez, Maciej Majewski, Andreas Krämer, Yaoyi Chen, Simon Olsson, Gianni de Fabritiis, Frank Noé, Cecilia Clementi

    Abstract: Coarse graining enables the investigation of molecular dynamics for larger systems and at longer timescales than is possible at atomic resolution. However, a coarse graining model must be formulated such that the conclusions we draw from it are consistent with the conclusions we would draw from a model at a finer level of detail. It has been proven that a force matching scheme defines a thermodyna…

    Submitted 6 November, 2020; v1 submitted 22 July, 2020; originally announced July 2020.

    Comments: 17 pages, 9 figures

  18. arXiv:2007.05447  [pdf, ps, other]

    stat.ML cs.LG

    Generalized Maximum Entropy for Supervised Classification

    Authors: Santiago Mazuelas, Yuan Shen, Aritz Pérez

    Abstract: The maximum entropy principle advocates evaluating events' probabilities using a distribution that maximizes entropy among those that satisfy certain expectations' constraints. This principle can be generalized for arbitrary decision problems where it corresponds to minimax approaches. This paper establishes a framework for supervised classification based on the generalized maximum entropy princi…

    Submitted 15 December, 2021; v1 submitted 10 July, 2020; originally announced July 2020.

  19. arXiv:2007.01811  [pdf, other]

    cs.DC cs.DS stat.ML

    JAMPI: efficient matrix multiplication in Spark using Barrier Execution Mode

    Authors: Tamas Foldi, Chris von Csefalvay, Nicolas A. Perez

    Abstract: The new barrier mode in Apache Spark allows embedding distributed deep learning training as a Spark stage to simplify the distributed training workflow. In Spark, a task in a stage does not depend on any other tasks in the same stage, and hence it can be scheduled independently. However, several algorithms require more sophisticated inter-task communications, similar to the MPI paradigm. By combin…

    Submitted 27 June, 2020; originally announced July 2020.

    Comments: 8 pages, 4 figures

    MSC Class: 68W15 ACM Class: F.2.1

  20. arXiv:2005.05587  [pdf, ps, other]

    cs.LG stat.ML

    Robustness Verification for Classifier Ensembles

    Authors: Dennis Gross, Nils Jansen, Guillermo A. Pérez, Stephan Raaijmakers

    Abstract: We give a formal verification procedure that decides whether a classifier ensemble is robust against arbitrary randomized attacks. Such attacks consist of a set of deterministic attacks and a distribution over this set. The robustness-checking problem consists of assessing, given a set of classifiers and a labelled data set, whether there exists a randomized attack that induces a certain expected…

    Submitted 9 July, 2020; v1 submitted 12 May, 2020; originally announced May 2020.

  21. arXiv:2004.02593  [pdf, ps, other]

    cs.LG stat.ML

    Let's Agree to Degree: Comparing Graph Convolutional Networks in the Message-Passing Framework

    Authors: Floris Geerts, Filip Mazowiecki, Guillermo A. Pérez

    Abstract: In this paper we cast neural networks defined on graphs as message-passing neural networks (MPNNs) in order to study the distinguishing power of different classes of such models. We are interested in whether certain architectures are able to tell vertices apart based on the feature labels given as input with the graph. We consider two variants of MPNNs: anonymous MPNNs whose message functions depe…

    Submitted 6 April, 2020; originally announced April 2020.

    Comments: 22 pages

  22. Kernels of Mallows Models under the Hamming Distance for solving the Quadratic Assignment Problem

    Authors: Etor Arza, Aritz Perez, Ekhine Irurozki, Josu Ceberio

    Abstract: The Quadratic Assignment Problem (QAP) is a well-known permutation-based combinatorial optimization problem with real applications in industrial and logistics environments. Motivated by the challenge that this NP-hard problem represents, it has captured the attention of the optimization community for decades. As a result, a large number of algorithms have been proposed to tackle this problem. Amon…

    Submitted 18 August, 2020; v1 submitted 19 October, 2019; originally announced October 2019.

    Comments: 23 pages

  23. arXiv:1910.08795  [pdf, other]

    stat.ML cs.LG

    Rank aggregation for non-stationary data streams

    Authors: Ekhine Irurozki, Jesus Lobo, Aritz Perez, Javier Del Ser

    Abstract: We consider the problem of learning over non-stationary ranking streams. The rankings can be interpreted as the preferences of a population and the non-stationarity means that the distribution of preferences changes over time. Our goal is to learn, in an online manner, the current distribution of rankings. The bottleneck of this process is a rank aggregation problem. We propose a generalization…

    Submitted 27 October, 2020; v1 submitted 19 October, 2019; originally announced October 2019.

    Comments: 23 pages

  24. arXiv:1902.00693  [pdf, ps, other]

    stat.ML cs.LG

    Supervised classification via minimax probabilistic transformations

    Authors: Santiago Mazuelas, Andrea Zanoni, Aritz Perez

    Abstract: Conventional techniques for supervised classification constrain the classification rules considered and use surrogate losses for classification 0-1 loss. Favored families of classification rules are those that enjoy parametric representations suitable for surrogate loss minimization, and low complexity properties suitable for overfitting control. This paper presents classification techniques based…

    Submitted 30 May, 2019; v1 submitted 2 February, 2019; originally announced February 2019.

  25. arXiv:1901.08552  [pdf, ps, other]

    stat.ML cs.LG

    General Supervision via Probabilistic Transformations

    Authors: Santiago Mazuelas, Aritz Perez

    Abstract: Different types of training data have led to numerous schemes for supervised classification. Current learning techniques are tailored to one specific scheme and cannot handle general ensembles of training data. This paper presents a unifying framework for supervised classification with general ensembles of training data, and proposes the learning methodology of generalized robust risk minimization…

    Submitted 24 January, 2019; originally announced January 2019.

    Journal ref: 24th European Conference on Artificial Intelligence-ECAI 2020, Aug. 2020, pp. 1348-2354

  26. arXiv:1812.01736  [pdf, other]

    physics.comp-ph cs.LG stat.ML

    Machine Learning of coarse-grained Molecular Dynamics Force Fields

    Authors: Jiang Wang, Simon Olsson, Christoph Wehmeyer, Adria Perez, Nicholas E. Charron, Gianni de Fabritiis, Frank Noe, Cecilia Clementi

    Abstract: Atomistic or ab-initio molecular dynamics simulations are widely used to predict thermodynamics and kinetics and relate them to molecular structure. A common approach to go beyond the time- and length-scales accessible with such computationally expensive simulations is the definition of coarse-grained molecular models. Existing coarse-graining approaches define an effective interaction potential t…

    Submitted 3 April, 2019; v1 submitted 4 December, 2018; originally announced December 2018.

  27. arXiv:1811.10791  [pdf, other]

    cs.LG stat.ML

    Accurate, Data-Efficient Learning from Noisy, Choice-Based Labels for Inherent Risk Scoring

    Authors: W. Ronny Huang, Miguel A. Perez

    Abstract: Inherent risk scoring is an important function in anti-money laundering, used for determining the riskiness of an individual during onboarding before fraudulent transactions occur. It is, however, often fraught with two challenges: (1) inconsistent notions of what constitutes high or low risk by experts and (2) the lack of labeled data. This paper explores a new paradigm of data labe…

    Submitted 26 November, 2018; originally announced November 2018.

    Comments: Presented as an oral at the NIPS 2018 Workshop on Challenges and Opportunities for AI in Financial Services: the Impact of Fairness, Explainability, Accuracy, and Privacy (FEAP-AI4Fin 2018). 9 pages, 4 figures

  28. arXiv:1804.10023  [pdf, other]

    stat.ML cs.LG

    Candidate Labeling for Crowd Learning

    Authors: Iker Beñaran-Muñoz, Jerónimo Hernández-González, Aritz Pérez

    Abstract: Crowdsourcing has become very popular among the machine learning community as a way to obtain labels that allow a ground truth to be estimated for a given dataset. In most of the approaches that use crowdsourced labels, annotators are asked to provide, for each presented instance, a single class label. Such a request could be inefficient, that is, considering that the labelers may not be experts,…

    Submitted 8 August, 2018; v1 submitted 26 April, 2018; originally announced April 2018.

    Comments: 7 pages, 3 figures, to be published

    MSC Class: stat.ML

  29. arXiv:1802.02548  [pdf, other]

    cs.LG cs.AI cs.CY physics.ao-ph stat.ML

    Predicting Hurricane Trajectories using a Recurrent Neural Network

    Authors: Sheila Alemany, Jonathan Beltran, Adrian Perez, Sam Ganzfried

    Abstract: Hurricanes are cyclones circulating about a defined center whose closed wind speeds exceed 75 mph originating over tropical and subtropical waters. At landfall, hurricanes can result in severe disasters. The accuracy of predicting their trajectory paths is critical to reduce economic loss and save human lives. Given the complexity and nonlinearity of weather data, a recurrent neural network (RNN)…

    Submitted 12 September, 2018; v1 submitted 1 February, 2018; originally announced February 2018.

  30. arXiv:1801.02949  [pdf, other]

    stat.ML cs.LG

    An efficient K-means clustering algorithm for massive data

    Authors: Marco Capó, Aritz Pérez, Jose A. Lozano

    Abstract: The analysis of continuously larger datasets is a task of major importance in a wide variety of scientific fields. In this sense, cluster analysis algorithms are a key element of exploratory data analysis, due to their ease of implementation and relatively low computational cost. Among these algorithms, the K-means algorithm stands out as the most popular approach, besides its high depende…

    Submitted 9 January, 2018; originally announced January 2018.

  31. arXiv:1711.03654  [pdf, other]

    stat.ML cs.CV cs.LG

    Poverty Prediction with Public Landsat 7 Satellite Imagery and Machine Learning

    Authors: Anthony Perez, Christopher Yeh, George Azzari, Marshall Burke, David Lobell, Stefano Ermon

    Abstract: Obtaining detailed and reliable data about local economic livelihoods in developing countries is expensive, and data are consequently scarce. Previous work has shown that it is possible to measure local-level economic livelihoods using high-resolution satellite imagery. However, such imagery is relatively expensive to acquire, often not updated frequently, and is mainly available for recent years.…

    Submitted 9 November, 2017; originally announced November 2017.

    Comments: Presented at NIPS 2017 Workshop on Machine Learning for the Developing World

  32. arXiv:1605.02989  [pdf, ps, other]

    stat.ML cs.LG

    An efficient K-means algorithm for Massive Data

    Authors: Marco Capó, Aritz Pérez, José Antonio Lozano

    Abstract: Due to the progressive growth of the amount of data available in a wide variety of scientific fields, it has become more difficult to manipulate and analyze such information. Even though datasets have grown in size, the K-means algorithm remains one of the most popular clustering methods, in spite of its dependency on the initial settings and high computational cost, especially in terms of di…

    Submitted 10 May, 2016; originally announced May 2016.

    Comments: 38 pages, 10 figures