Showing 1–11 of 11 results for author: Polo, F M

Searching in archive cs.
  1. arXiv:2405.17202  [pdf, other]

    cs.CL cs.AI cs.LG stat.ML

    Efficient multi-prompt evaluation of LLMs

    Authors: Felipe Maia Polo, Ronald Xu, Lucas Weber, Mírian Silva, Onkar Bhardwaj, Leshem Choshen, Allysson Flavio Melo de Oliveira, Yuekai Sun, Mikhail Yurochkin

    Abstract: Most popular benchmarks for comparing LLMs rely on a limited set of prompt templates, which may not fully capture the LLMs' abilities and can affect the reproducibility of results on leaderboards. Many recent works empirically verify prompt sensitivity and advocate for changes in LLM evaluation. In this paper, we consider the problem of estimating the performance distribution across many prompt va…

    Submitted 7 June, 2024; v1 submitted 27 May, 2024; originally announced May 2024.

  2. arXiv:2405.16236  [pdf, ps, other]

    stat.ML cs.LG

    A statistical framework for weak-to-strong generalization

    Authors: Seamus Somerstep, Felipe Maia Polo, Moulinath Banerjee, Ya'acov Ritov, Mikhail Yurochkin, Yuekai Sun

    Abstract: Modern large language model (LLM) alignment techniques rely on human feedback, but it is unclear whether the techniques fundamentally limit the capabilities of aligned LLMs. In particular, it is unclear whether it is possible to align (stronger) LLMs with superhuman capabilities with (weaker) human feedback without degrading their capabilities. This is an instance of the weak-to-strong generalizat…

    Submitted 25 May, 2024; originally announced May 2024.

  3. arXiv:2402.14992  [pdf, other]

    cs.CL cs.AI cs.LG stat.ML

    tinyBenchmarks: evaluating LLMs with fewer examples

    Authors: Felipe Maia Polo, Lucas Weber, Leshem Choshen, Yuekai Sun, Gongjun Xu, Mikhail Yurochkin

    Abstract: The versatility of large language models (LLMs) led to the creation of diverse benchmarks that thoroughly test a variety of language models' abilities. These benchmarks consist of tens of thousands of examples making evaluation of LLMs very expensive. In this paper, we investigate strategies to reduce the number of evaluations needed to assess the performance of an LLM on several key benchmarks. F…

    Submitted 26 May, 2024; v1 submitted 22 February, 2024; originally announced February 2024.

    Comments: Proceedings of the 41st International Conference on Machine Learning (ICML)

  4. arXiv:2312.04601  [pdf, other]

    stat.ML cs.AI cs.LG stat.ME

    Estimating Fréchet bounds for validating programmatic weak supervision

    Authors: Felipe Maia Polo, Mikhail Yurochkin, Moulinath Banerjee, Subha Maity, Yuekai Sun

    Abstract: We develop methods for estimating Fréchet bounds on (possibly high-dimensional) distribution classes in which some variables are continuous-valued. We establish the statistical correctness of the computed bounds under uncertainty in the marginal constraints and demonstrate the usefulness of our algorithms by evaluating the performance of machine learning (ML) models trained with programmatic weak…

    Submitted 7 December, 2023; originally announced December 2023.

  5. arXiv:2310.01542  [pdf, other]

    cs.LG

    Fusing Models with Complementary Expertise

    Authors: Hongyi Wang, Felipe Maia Polo, Yuekai Sun, Souvik Kundu, Eric Xing, Mikhail Yurochkin

    Abstract: Training AI models that generalize across tasks and domains has long been among the open problems driving AI research. The emergence of Foundation Models made it easier to obtain expert models for a given task, but the heterogeneity of data that may be encountered at test time often means that any single expert is insufficient. We consider the Fusion of Experts (FoE) problem of fusing outputs of e…

    Submitted 9 May, 2024; v1 submitted 2 October, 2023; originally announced October 2023.

    Comments: This paper was published at ICLR 2024

  6. arXiv:2307.02520  [pdf, other]

    stat.ML cs.LG math.ST stat.ME

    Conditional independence testing under misspecified inductive biases

    Authors: Felipe Maia Polo, Yuekai Sun, Moulinath Banerjee

    Abstract: Conditional independence (CI) testing is a fundamental and challenging task in modern statistics and machine learning. Many modern methods for CI testing rely on powerful supervised learning methods to learn regression functions or Bayes predictors as an intermediate step; we refer to this class of tests as regression-based tests. Although these methods are guaranteed to control Type-I error when…

    Submitted 27 October, 2023; v1 submitted 5 July, 2023; originally announced July 2023.

    Comments: NeurIPS 2023 proceedings

  7. arXiv:2205.08340  [pdf, other]

    stat.ML cs.AI cs.LG stat.ME

    A unified framework for dataset shift diagnostics

    Authors: Felipe Maia Polo, Rafael Izbicki, Evanildo Gomes Lacerda Jr, Juan Pablo Ibieta-Jimenez, Renato Vicente

    Abstract: Supervised learning techniques typically assume training data originates from the target population. Yet, in reality, dataset shift frequently arises, which, if not adequately taken into account, may decrease the performance of their predictors. In this work, we propose a novel and flexible framework called DetectShift that quantifies and tests for multiple dataset shifts, encompassing shifts in t…

    Submitted 12 September, 2023; v1 submitted 17 May, 2022; originally announced May 2022.

    Journal ref: Information Sciences (2023): 119612

  8. arXiv:2110.15709  [pdf, other]

    cs.CL cs.LG

    LegalNLP -- Natural Language Processing methods for the Brazilian Legal Language

    Authors: Felipe Maia Polo, Gabriel Caiaffa Floriano Mendonça, Kauê Capellato J. Parreira, Lucka Gianvechio, Peterson Cordeiro, Jonathan Batista Ferreira, Leticia Maria Paz de Lima, Antônio Carlos do Amaral Maia, Renato Vicente

    Abstract: We present and make available pre-trained language models (Phraser, Word2Vec, Doc2Vec, FastText, and BERT) for the Brazilian legal language, a Python package with functions to facilitate their use, and a set of demonstrations/tutorials containing some applications involving them. Given that our material is built upon legal texts coming from several Brazilian courts, this initiative is extremely he…

    Submitted 5 October, 2021; originally announced October 2021.

  9. arXiv:2107.05767  [pdf, other]

    stat.AP cs.CY cs.LG

    Effects of personality traits in predicting grade retention of Brazilian students

    Authors: Carmen Melo Toledo, Guilherme Mendes Bassedon, Jonathan Batista Ferreira, Lucka de Godoy Gianvechio, Carlos Guatimosim, Felipe Maia Polo, Renato Vicente

    Abstract: Student's grade retention is a key issue faced by many education systems, especially those in developing countries. In this paper, we seek to gauge the relevance of students' personality traits in predicting grade retention in Brazil. For that, we used data collected in 2012 and 2017, in the city of Sertaozinho, countryside of the state of Sao Paulo, Brazil. The surveys taken in Sertaozinho includ…

    Submitted 12 July, 2021; originally announced July 2021.

  10. arXiv:2010.01184  [pdf, other]

    stat.ML cs.AI cs.LG stat.ME

    Effective Sample Size, Dimensionality, and Generalization in Covariate Shift Adaptation

    Authors: Felipe Maia Polo, Renato Vicente

    Abstract: In supervised learning, training and test datasets are often sampled from distinct distributions. Domain adaptation techniques are thus required. Covariate shift adaptation yields good generalization performance when domains differ only by the marginal distribution of features. Covariate shift adaptation is usually implemented using importance weighting, which may fail, according to common wisdom,…

    Submitted 8 January, 2022; v1 submitted 2 October, 2020; originally announced October 2020.

    Comments: Neural Comput & Applic (2022)

  11. arXiv:2003.11561  [pdf, other]

    cs.CL cs.LG stat.ML

    Predicting Legal Proceedings Status: Approaches Based on Sequential Text Data

    Authors: Felipe Maia Polo, Itamar Ciochetti, Emerson Bertolo

    Abstract: The objective of this paper is to develop predictive models to classify Brazilian legal proceedings in three possible classes of status: (i) archived proceedings, (ii) active proceedings, and (iii) suspended proceedings. This problem's resolution is intended to assist public and private institutions in managing large portfolios of legal proceedings, providing gains in scale and efficiency. In this…

    Submitted 22 June, 2021; v1 submitted 13 March, 2020; originally announced March 2020.

    Comments: Published at the 18th International Conference on Artificial Intelligence and Law (ICAIL) 2021 as an extended abstract