A Taxonomy of the Biases of the Images created by Generative Artificial Intelligence

Authors: Adriana Fernández de Caleya Vázquez, Eduardo C. Garrido-Merchán

Abstract: Generative artificial intelligence models show an amazing performance creating unique content automatically just by being given a prompt by the user, which is revolutionizing several fields such as marketing and design. Not only are there models whose generated output belongs to the text format but we also find models that are able to automatically generate high quality genuine images and videos given a prompt. Although the performance in image creation seems impressive, it is necessary to slowly assess the content that these models are generating, as users are massively uploading this material to the internet. Critically, it is important to remark that generative AI models are statistical models whose parameter values are estimated by algorithms that maximize the likelihood of the parameters given an image dataset. Consequently, if the image dataset is biased towards certain values for vulnerable variables such as gender or skin color, we might find that the generated content of these models can be harmful for certain groups of people. As this content is generated and uploaded to the internet by users, these biases perpetuate harmful stereotypes for vulnerable groups, polarizing social vision about, for example, what beauty or disability is and means. In this work, we analyze in detail how the content generated by these models can be strongly biased with respect to a plethora of variables, which we organize into a new image generative AI taxonomy.
We also discuss the social, political and economic implications of these biases and possible ways to mitigate them.

Submitted 2 May, 2024; originally announced July 2024.

arXiv:2405.07340 [pdf, other]

Machine Consciousness as Pseudoscience: The Myth of Conscious Machines

Authors: Eduardo C. Garrido-Merchán

Abstract: The hypothesis of conscious machines has been debated since the invention of the notion of artificial intelligence, powered by the assumption that the computational intelligence achieved by a system is the cause of the emergence of phenomenal consciousness in that system as an epiphenomenon or as a consequence of the behavioral or internal complexity of the system surpassing some threshold. As a consequence, a huge amount of literature exploring the possibility of machine consciousness and how to implement it on a computer has been published. Moreover, common folk psychology and transhumanism literature has fed this hypothesis with the popularity of science fiction literature, where intelligent robots are usually anthropomorphized and hence given phenomenal consciousness. However, in this work, we argue that this literature lacks scientific rigour, being impossible to falsify the opposite hypothesis, and illustrate a list of arguments that show how every approach that the machine consciousness literature has published depends on philosophical assumptions that cannot be proven by the scientific method. Concretely, we also show how phenomenal consciousness is not computable, independently of the complexity of the algorithm or model, cannot be objectively measured nor quantitatively defined and it is basically a phenomenon that is subjective and internal to the observer. Given all those arguments we end the work arguing why the idea of conscious machines is nowadays a myth of transhumanism and science fiction culture.

Submitted 12 May, 2024; originally announced May 2024.

Comments: 19 pages

arXiv:2312.03728 [pdf]

Real Customization or Just Marketing: Are Customized Versions of Chat GPT Useful?

Authors: Eduardo C. Garrido-Merchán, Jose L. Arroyo-Barrigüete, Francisco Borrás-Pala, Leandro Escobar-Torres, Carlos Martínez de Ibarreta, Jose María Ortiz-Lozano, Antonio Rua-Vieites

Abstract: Large Language Models (LLMs), as the case of OpenAI ChatGPT-4 Turbo, are revolutionizing several industries, including higher education. In this context, LLMs can be personalized through a fine-tuning process to meet the student demands on every particular subject, like statistics. Recently, OpenAI has launched the possibility to fine-tune their model with a natural language web interface, enabling the possibility to create customized GPT versions deliberately conditioned to meet the demands of a specific task. The objective of this research is to assess the potential of the customized GPTs that have recently been launched by OpenAI. After developing a Business Statistics Virtual Professor (BSVP), tailored for students at the Universidad Pontificia Comillas, its behavior was evaluated and compared with that of ChatGPT-4 Turbo. The results lead to several conclusions. Firstly, a substantial modification in the style of communication was observed. Following the instructions it was trained with, BSVP provided responses in a more relatable and friendly tone, even incorporating a few minor jokes. Secondly, and this is a matter of relevance, when explicitly asked for something like, "I would like to practice a programming exercise similar to those in R practice 4," BSVP was capable of providing a far superior response: having access to contextual documentation, it could fulfill the request, something beyond ChatGPT-4 Turbo's capabilities. On the downside, the response times were generally higher.
Lastly, regarding overall performance, quality, depth, and alignment with the specific content of the course, no statistically significant differences were observed in the responses between BSVP and ChatGPT-4 Turbo. It appears that customized assistants trained with prompts present advantages as virtual aids for students, yet they do not constitute a substantial improvement over ChatGPT-4 Turbo.

Submitted 27 November, 2023; originally announced December 2023.

Comments: 9 pages, 1 figure, 1 table

arXiv:2311.11175 [pdf, ps, other]

Best uses of ChatGPT and Generative AI for computer science research

Authors: Eduardo C. Garrido-Merchan

Abstract: Generative Artificial Intelligence (AI), particularly tools like OpenAI's popular ChatGPT, is reshaping the landscape of computer science research. Used wisely, these tools can boost the productivity of a computer research scientist. This paper provides an exploration of the diverse applications of ChatGPT and other generative AI technologies in computer science academic research, making recommendations about the use of Generative AI to make the role of the computer research scientist more productive, with a focus on writing new research papers. We highlight innovative uses such as brainstorming research ideas, aiding in the drafting and styling of academic papers and assisting in the synthesis of the state-of-the-art section. Further, we delve into using these technologies in understanding interdisciplinary approaches, making complex texts simpler, and recommending suitable academic journals for publication. Significant focus is placed on generative AI's contributions to synthetic data creation, research methodology, and mentorship, as well as in task organization and article quality assessment. The paper also addresses the utility of AI in article review, adapting texts to length constraints, constructing counterarguments, and survey development. Moreover, we explore the capabilities of these tools in disseminating ideas, generating images and audio, text transcription, and engaging with editors. We also describe some non-recommended uses of generative AI for computer science research, mainly because of the limitations of this technology.

Submitted 18 November, 2023; originally announced November 2023.

arXiv:2307.11119 [pdf, other]

From computational ethics to morality: how decision-making algorithms can help us understand the emergence of moral principles, the existence of an optimal behaviour and our ability to discover it

Authors: Eduardo C. Garrido-Merchán, Sara Lumbreras-Sancho

Abstract: This paper adds to the efforts of evolutionary ethics to naturalize morality by providing specific insights derived from a computational ethics view. We propose a stylized model of human decision-making, which is based on Reinforcement Learning, one of the most successful paradigms in Artificial Intelligence. After the main concepts related to Reinforcement Learning have been presented, some particularly useful parallels are drawn that can illuminate evolutionary accounts of ethics. Specifically, we investigate the existence of an optimal policy (or, as we will refer to, objective ethical principles) given the conditions of an agent. In addition, we will show how this policy is learnable by means of trial and error, supporting our hypotheses on two well-known theorems in the context of Reinforcement Learning. We conclude by discussing how the proposed framework can be enlarged to study other potentially interesting areas of human behavior from a formalizable perspective.

Submitted 20 July, 2023; originally announced July 2023.

arXiv:2307.09631 [pdf, other]

Deep Reinforcement Learning for ESG financial portfolio management

Authors: Eduardo C. Garrido-Merchán, Sol Mora-Figueroa-Cruz-Guzmán, María Coronado-Vaca

Abstract: This paper investigates the application of Deep Reinforcement Learning (DRL) for Environment, Social, and Governance (ESG) financial portfolio management, with a specific focus on the potential benefits of ESG score-based market regulation. We leveraged an Advantage Actor-Critic (A2C) agent and conducted our experiments using environments encoded within the OpenAI Gym, adapted from the FinRL platform. The study includes a comparative analysis of DRL agent performance under standard Dow Jones Industrial Average (DJIA) market conditions and a scenario where returns are regulated in line with company ESG scores. In the ESG-regulated market, grants were proportionally allotted to portfolios based on their returns and ESG scores, while taxes were assigned to portfolios below the mean ESG score of the index. The results intriguingly reveal that the DRL agent within the ESG-regulated market outperforms the standard DJIA market setup. Furthermore, we considered the inclusion of ESG variables in the agent state space, and compared this with scenarios where such data were excluded. This comparison adds to the understanding of the role of ESG factors in portfolio management decision-making. We also analyze the behaviour of the DRL agent in IBEX 35 and NASDAQ-100 indexes. Both the A2C and Proximal Policy Optimization (PPO) algorithms were applied to these additional markets, providing a broader perspective on the generalization of our findings.
This work contributes to the evolving field of ESG investing, suggesting that market regulation based on ESG scoring can potentially improve DRL-based portfolio management, with significant implications for sustainable investing strategies.

Submitted 19 June, 2023; originally announced July 2023.

arXiv:2306.02781 [pdf, ps, other]

A survey of Generative AI Applications

Authors: Roberto Gozalo-Brizuela, Eduardo C. Garrido-Merchán

Abstract: Generative AI has experienced remarkable growth in recent years, leading to a wide array of applications across diverse domains. In this paper, we present a comprehensive survey of more than 350 generative AI applications, providing a structured taxonomy and concise descriptions of various unimodal and even multimodal generative AIs. The survey is organized into sections, covering a wide range of unimodal generative AI applications such as text, images, video, gaming and brain information. Our survey aims to serve as a valuable resource for researchers and practitioners to navigate the rapidly expanding landscape of generative AI, facilitating a better understanding of the current state-of-the-art and fostering further innovation in the field.

Submitted 14 June, 2023; v1 submitted 5 June, 2023; originally announced June 2023.

arXiv:2305.03429 [pdf, other]

Simulating H.P. Lovecraft horror literature with the ChatGPT large language model

Authors: Eduardo C. Garrido-Merchán, José Luis Arroyo-Barrigüete, Roberto Gozalo-Brizuela

Abstract: In this paper, we present a novel approach to simulating H.P. Lovecraft's horror literature using the ChatGPT large language model, specifically the GPT-4 architecture. Our study aims to generate text that emulates Lovecraft's unique writing style and themes, while also examining the effectiveness of prompt engineering techniques in guiding the model's output. To achieve this, we curated a prompt containing several specialized literature references and employed advanced prompt engineering methods. We conducted an empirical evaluation of the generated text by administering a survey to a sample of undergraduate students. Utilizing statistical hypothesis testing, we assessed the students' ability to distinguish between genuine Lovecraft works and those generated by our model. Our findings demonstrate that the participants were unable to reliably differentiate between the two, indicating the effectiveness of the GPT-4 model and our prompt engineering techniques in emulating Lovecraft's literary style. In addition to presenting the GPT model's capabilities, this paper provides a comprehensive description of its underlying architecture and offers a comparative analysis with related work that simulates other notable authors and philosophers, such as Dennett. By exploring the potential of large language models in the context of literary emulation, our study contributes to the body of research on the applications and limitations of these models in various creative domains.

Submitted 5 May, 2023; originally announced May 2023.

arXiv:2304.11215 [pdf, other]

ChatGPT: More than a Weapon of Mass Deception, Ethical challenges and responses from the Human-Centered Artificial Intelligence (HCAI) perspective

Authors: Alejo Jose G. Sison, Marco Tulio Daza, Roberto Gozalo-Brizuela, Eduardo C. Garrido-Merchán

Abstract: This article explores the ethical problems arising from the use of ChatGPT as a kind of generative AI and suggests responses based on the Human-Centered Artificial Intelligence (HCAI) framework. The HCAI framework is appropriate because it understands technology above all as a tool to empower, augment, and enhance human agency while referring to human wellbeing as a grand challenge, thus perfectly aligning itself with ethics, the science of human flourishing. Further, HCAI provides objectives, principles, procedures, and structures for reliable, safe, and trustworthy AI which we apply to our ChatGPT assessments. The main danger ChatGPT presents is the propensity to be used as a weapon of mass deception (WMD) and an enabler of criminal activities involving deceit. We review technical specifications to better comprehend its potentials and limitations. We then suggest both technical (watermarking, styleme, detectors, and fact-checkers) and non-technical measures (terms of use, transparency, educator considerations, HITL) to mitigate ChatGPT misuse or abuse and recommend best uses (creative writing, non-creative writing, teaching and learning). We conclude with considerations regarding the role of humans in ensuring the proper use of ChatGPT for individual and social wellbeing.

Submitted 6 April, 2023; originally announced April 2023.

arXiv:2303.13373 [pdf, ps, other]

Fine-tuning ClimateBert transformer with ClimaText for the disclosure analysis of climate-related financial risks

Authors: Eduardo C. Garrido-Merchán, Cristina González-Barthe, María Coronado Vaca

Abstract: In recent years there has been a growing demand from financial agents, especially from particular and institutional investors, for companies to report on climate-related financial risks. A vast amount of information, in text format, can be expected to be disclosed in the short term by firms in order to identify these types of risks in their financial and non-financial reports, particularly in response to the growing regulation that is being passed on the matter. To this end, this paper applies state-of-the-art NLP techniques to achieve the detection of climate change in text corpora. We use transfer learning to fine-tune two transformer models, BERT and ClimateBert (a recently published DistilRoBERTa-based model that has been specifically tailored for climate text classification). These two algorithms are based on the transformer architecture, which enables learning the contextual relationships between words in a text. We carry out the fine-tuning process of both models on the novel ClimaText database, consisting of data collected from Wikipedia, 10K Files Reports and web-based claims. Our text classification model, obtained from the ClimateBert fine-tuning process on ClimaText, outperforms the models created with BERT and the current state-of-the-art transformer in this particular problem. Our study is the first one to implement on the ClimaText database the recently published ClimateBert algorithm.
Based on our results, it can be said that ClimateBert fine-tuned on ClimaText is an outstanding tool within the NLP pre-trained transformer models that may and should be used by investors, institutional agents and companies themselves to monitor the disclosure of climate risk in financial reports. In addition, our transfer learning methodology is cheap in computational terms, thus allowing any organization to perform it.

Submitted 21 March, 2023; originally announced March 2023.

arXiv:2303.01485 [pdf, other]

Bayesian Optimization of ESG Financial Investments

Authors: Eduardo C. Garrido-Merchán, Gabriel González Piris, Maria Coronado Vaca

Abstract: Financial experts and analysts seek to predict the variability of financial markets. In particular, the correct prediction of this variability ensures investors successful investments. However, there has been a big trend in finance in the last years, which are the ESG criteria. Concretely, ESG (Environmental, Social and Governance) criteria have become more significant in finance due to the growing importance of investments being socially responsible, and because of the financial impact companies suffer when not complying with them. Consequently, creating a stock portfolio should not only take into account its performance but also its compliance with ESG criteria. Hence, this paper combines mathematical modelling with ESG and finance. In more detail, we use Bayesian optimization (BO), a sequential state-of-the-art design strategy to optimize black-boxes with unknown analytical and costly-to-compute expressions, to maximize the performance of a stock portfolio under the presence of ESG criteria soft constraints incorporated into the objective function. In an illustrative experiment, we use the Sharpe ratio, which takes into consideration the portfolio returns and their variance; in other words, it balances the trade-off between maximizing returns and minimizing risks. In the present work, ESG criteria have been divided into fourteen independent categories used in a linear combination to estimate a firm's total ESG score.
Most importantly, our presented approach would scale to alternative black-box methods of estimating the performance and ESG compliance of the stock portfolio. In particular, this research has opened the door to many new research lines, as it has proved that a portfolio can be optimized using a BO that takes into consideration financial performance and the accomplishment of ESG criteria.
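
Editor's note: the abstract above describes an objective that combines the Sharpe ratio with ESG soft constraints, where a firm's ESG score is a linear combination of category scores. The following is a minimal sketch of that idea, not the authors' implementation. The penalty form (a linear charge on the shortfall below a target ESG score) and the names `esg_target` and `penalty` are assumptions made for illustration, and a zero risk-free rate is assumed by default.

```python
from statistics import mean, stdev


def sharpe_ratio(returns, risk_free_rate=0.0):
    """Mean excess return divided by the sample standard deviation of excess returns."""
    excess = [r - risk_free_rate for r in returns]
    return mean(excess) / stdev(excess)


def esg_score(category_scores, category_weights):
    """Firm-level ESG score as a linear combination of per-category scores."""
    return sum(w * s for w, s in zip(category_weights, category_scores))


def penalized_objective(weights, asset_returns, asset_esg, esg_target, penalty=1.0):
    """Portfolio Sharpe ratio minus a soft penalty for missing an ESG target.

    weights:       portfolio weight per asset
    asset_returns: one list of per-period returns per asset
    asset_esg:     one ESG score per asset
    esg_target and penalty are hypothetical parameters (assumptions).
    """
    n_periods = len(asset_returns[0])
    # Per-period portfolio return is the weighted sum of asset returns.
    portfolio_returns = [
        sum(w * asset_returns[i][t] for i, w in enumerate(weights))
        for t in range(n_periods)
    ]
    portfolio_esg = sum(w * e for w, e in zip(weights, asset_esg))
    # Soft constraint: only a shortfall below the target is penalized.
    shortfall = max(0.0, esg_target - portfolio_esg)
    return sharpe_ratio(portfolio_returns) - penalty * shortfall
```

A black-box optimizer such as Bayesian optimization would then search over `weights` to maximize `penalized_objective`; that search loop is deliberately left out here.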

Submitted 10 February, 2023; originally announced March 2023.

arXiv:2301.04655 [pdf, other]

ChatGPT is not all you need. A State of the Art Review of large Generative AI models

Authors: Roberto Gozalo-Brizuela, Eduardo C. Garrido-Merchan

Abstract: During the last two years there has been a plethora of large generative models such as ChatGPT or Stable Diffusion that have been published. Concretely, these models are able to perform tasks such as being a general question and answering system or automatically creating artistic images that are revolutionizing several sectors. Consequently, the implications that these generative models have in the industry and society are enormous, as several job positions may be transformed. For example, Generative AI is capable of transforming effectively and creatively texts to images, like the DALLE-2 model; text to 3D images, like the Dreamfusion model; images to text, like the Flamingo model; texts to video, like the Phenaki model; texts to audio, like the AudioLM model; texts to other texts, like ChatGPT; texts to code, like the Codex model; texts to scientific texts, like the Galactica model or even create algorithms like AlphaTensor. This work consists of an attempt to describe in a concise way the main models and sectors that are affected by generative AI and to provide a taxonomy of the main generative models published recently.

Submitted 11 January, 2023; originally announced January 2023.

Comments: 22 pages

arXiv:2212.04589 [pdf, other]

Optimizing Integrated Information with a Prior Guided Random Search Algorithm

Authors: Eduardo C. Garrido-Merchán, Javier Sánchez-Cañizares

Abstract: Integrated information theory (IIT) is a theoretical framework that provides a quantitative measure to estimate when a physical system is conscious, its degree of consciousness, and the complexity of the qualia space that the system is experiencing. Formally, IIT rests on the assumption that if a surrogate physical system can fully embed the phenomenological properties of consciousness, then the system properties must be constrained by the properties of the qualia being experienced. Following this assumption, IIT represents the physical system as a network of interconnected elements that can be thought of as a probabilistic causal graph, $\mathcal{G}$, where each node has an input-output function and all the graph is encoded in a transition probability matrix. Consequently, IIT's quantitative measure of consciousness, $\Phi$, is computed with respect to the transition probability matrix and the present state of the graph. In this paper, we provide a random search algorithm that is able to optimize $\Phi$ in order to investigate, as the number of nodes increases, the structure of the graphs that have higher $\Phi$. We also provide arguments that show the difficulties of applying more complex black-box search algorithms, such as Bayesian optimization or metaheuristics, in this particular problem. Additionally, we suggest specific research lines for these techniques to enhance the search algorithm that guarantees maximal $\Phi$.

Submitted 8 December, 2022; originally announced December 2022.

arXiv:2207.11089 [pdf, other]

Do Artificial Intelligence Systems Understand?

Authors: Eduardo C. Garrido-Merchán, Carlos Blanco

Abstract: Are intelligent machines really intelligent? Is the underlying philosophical concept of intelligence satisfactory for describing how the present systems work? Is understanding a necessary and sufficient condition for intelligence? If a machine could understand, should we attribute subjectivity to it? This paper addresses the problem of deciding whether the so-called "intelligent machines" are capable of understanding, instead of merely processing signs. It deals with the relationship between syntaxis and semantics. The main thesis concerns the inevitability of semantics for any discussion about the possibility of building conscious machines, condensed into the following two tenets: "If a machine is capable of understanding (in the strong sense), then it must be capable of combining rules and intuitions"; "If semantics cannot be reduced to syntaxis, then a machine cannot understand." Our conclusion states that it is not necessary to attribute understanding to a machine in order to explain its exhibited "intelligent" behavior; a merely syntactic and mechanistic approach to intelligence as a task-solving tool suffices to justify the range of operations that it can display in the current state of technological development.

Submitted 22 July, 2022; originally announced July 2022.

arXiv:2206.07438 [pdf, other]

doi 10.1145/3610536

Multi-Objective Hyperparameter Optimization in Machine Learning -- An Overview

Authors: Florian Karl, Tobias Pielok, Julia Moosbauer, Florian Pfisterer, Stefan Coors, Martin Binder, Lennart Schneider, Janek Thomas, Jakob Richter, Michel Lang, Eduardo C. Garrido-Merchán, Juergen Branke, Bernd Bischl

Abstract: Hyperparameter optimization constitutes a large part of typical modern machine learning workflows. This arises from the fact that machine learning methods and corresponding preprocessing steps often only yield optimal performance when hyperparameters are properly tuned. But in many applications, we are not only interested in optimizing ML pipelines solely for predictive accuracy; additional metrics or constraints must be considered when determining an optimal configuration, resulting in a multi-objective optimization problem. This is often neglected in practice, due to a lack of knowledge and readily available software implementations for multi-objective hyperparameter optimization. In this work, we introduce the reader to the basics of multi-objective hyperparameter optimization and motivate its usefulness in applied ML. Furthermore, we provide an extensive survey of existing optimization strategies, both from the domain of evolutionary algorithms and Bayesian optimization. We illustrate the utility of MOO in several specific ML applications, considering objectives such as operating conditions, prediction time, sparseness, fairness, interpretability and robustness.
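
Editor's note: in a multi-objective setting such as the one described above, there is usually no single best configuration, only a set of non-dominated trade-offs (the Pareto front). The following is a minimal sketch of how that set can be extracted from already-evaluated configurations. It assumes every objective is minimized, and the example objective pairs (validation error, prediction time) are hypothetical values chosen for illustration, not results from the paper.

```python
def dominates(a, b):
    """True if `a` is no worse than `b` on every objective and strictly better on at least one.

    Both arguments are tuples of objective values, all to be minimized.
    """
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(points):
    """Return the points that no other point dominates, in their original order."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical (validation error, prediction time in ms) pairs for four configurations.
evaluated = [(0.10, 5.0), (0.12, 2.0), (0.15, 6.0), (0.11, 5.0)]
front = pareto_front(evaluated)
```

This quadratic-time filter is fine for a handful of evaluations; the optimization strategies surveyed in the paper address how to choose which configurations to evaluate in the first place.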

Submitted 6 June, 2024; v1 submitted 15 June, 2022; originally announced June 2022.

Comments: Published at ACM TELO

Journal ref: ACM Transactions on Evolutionary Learning and Optimization 3.4 (2023): 1-50

arXiv:2107.04126 [pdf, other]

Many Objective Bayesian Optimization

Authors: Lucia Asencio Martín, Eduardo C. Garrido-Merchán

Abstract: Some real problems require the evaluation of expensive and noisy objective functions. Moreover, the analytical expression of these objective functions may be unknown. These functions are known as black-boxes, for example, estimating the generalization error of a machine learning algorithm and computing its prediction time in terms of its hyper-parameters. Multi-objective Bayesian optimization (MOBO) is a set of methods that has been successfully applied for the simultaneous optimization of black-boxes. Concretely, BO methods rely on a probabilistic model of the objective functions, typically a Gaussian process. This model generates a predictive distribution of the objectives. However, MOBO methods have problems when the number of objectives in a multi-objective optimization problem is 3 or more, which is the many objective setting. In particular, the BO process is more costly as more objectives are considered, computing the quality of the solution via the hyper-volume is also more costly and, most importantly, we have to evaluate every objective function, wasting expensive computational, economic or other resources. However, as more objectives are involved in the optimization problem, it is highly probable that some of them are redundant and do not add information about the problem solution. A measure that represents how similar GP predictive distributions are is proposed. We also propose a many objective Bayesian optimization algorithm that uses this metric to determine whether two objectives are redundant. The algorithm stops evaluating one of them if the similarity is found, saving resources and not hurting the performance of the multi-objective BO algorithm. We show empirical evidence, in a set of toy, synthetic, benchmark and real experiments, of the effectiveness of the metric and the algorithm.

Submitted 8 July, 2021; originally announced July 2021.

Comments: arXiv admin note: text overlap with arXiv:2101.08061

arXiv:2101.08061 [pdf, other]

A Similarity Measure of Gaussian Process Predictive Distributions

Authors: Lucia Asencio-Martín, Eduardo C. Garrido-Merchán

Abstract: Some scenarios require the computation of a predictive distribution of a new value evaluated on an objective function conditioned on previous observations. We are interested in using a model that makes valid assumptions on the objective function whose values we are trying to predict. Some of these assumptions may be smoothness or stationarity. Gaussian processes (GPs) are probabilistic models that can be interpreted as flexible distributions over functions. They encode the assumptions through covariance functions, making hypotheses about new data through a predictive distribution by being fitted to old observations. We can face the case where several GPs are used to model different objective functions. GPs are non-parametric models whose complexity is cubic in the number of observations. A measure that represents how similar one GP predictive distribution is to another would be useful to stop using one GP when they are modelling functions of the same input space. We are really inferring that two objective functions are correlated, so one GP is enough to model both of them by performing a transformation of the prediction of the other function in case of inverse correlation. We show empirical evidence in a set of synthetic and benchmark experiments that GP predictive distributions can be compared and that one of them is enough to predict two correlated functions in the same input space. This similarity metric could be extremely useful for discarding objectives in Bayesian many-objective optimization.

Submitted 20 January, 2021; originally announced January 2021.

arXiv:2101.04525 [pdf, other]

doi 10.1007/JHEP05(2021)108

A comparison of optimisation algorithms for high-dimensional particle and astrophysics applications

Authors: The DarkMachines High Dimensional Sampling Group, Csaba Balázs, Melissa van Beekveld, Sascha Caron, Barry M. Dillon, Ben Farmer, Andrew Fowlie, Eduardo C. Garrido-Merchán, Will Handley, Luc Hendriks, Guðlaugur Jóhannesson, Adam Leinweber, Judita Mamužić, Gregory D. Martinez, Sydney Otten, Pat Scott, Roberto Ruiz de Austri, Zachary Searle, Bob Stienen, Joaquin Vanschoren, Martin White

Abstract: Optimisation problems are ubiquitous in particle and astrophysics, and involve locating the optimum of a complicated function of many parameters that may be computationally expensive to evaluate. We describe a number of global optimisation algorithms that are not yet widely used in particle astrophysics, benchmark them against random sampling and existing techniques, and perform a detailed comparison of their performance on a range of test functions. These include four analytic test functions of varying dimensionality, and a realistic example derived from a recent global fit of weak-scale supersymmetry. Although the best algorithm to use depends on the function being investigated, we are able to present general conclusions about the relative merits of random sampling, Differential Evolution, Particle Swarm Optimisation, the Covariance Matrix Adaptation Evolution Strategy, Bayesian Optimisation, Grey Wolf Optimisation, and the PyGMO Artificial Bee Colony, Gaussian Particle Filter and Adaptive Memory Programming for Global Optimisation algorithms.

Submitted 1 April, 2021; v1 submitted 12 January, 2021; originally announced January 2021.

Comments: Experimental framework publicly available at http://www.github.com/darkmachines/high-dimensional-sampling

arXiv:2011.14475 [pdf, other]

An Artificial Consciousness Model and its relations with Philosophy of Mind

Authors: Eduardo C. Garrido-Merchán, Martin Molina, Francisco M. Mendoza

Abstract: This work seeks to study the beneficial properties that an autonomous agent can obtain by implementing a cognitive architecture similar to the one of conscious beings. Throughout this document, a conscious model of an autonomous agent based on a global workspace architecture is presented. We describe how this agent is viewed from different perspectives of philosophy of mind, being inspired by their ideas. The goal of this model is to create autonomous agents able to navigate within an environment composed of multiple independent magnitudes, adapting to its surroundings in order to find the best possible position on the basis of its inner preferences. The purpose of the model is to test the effectiveness of many cognitive mechanisms that are incorporated, such as an attention mechanism for magnitude selection, possession of inner feelings and preferences, usage of a memory system to store beliefs and past experiences, and incorporating a global workspace which controls and integrates information processed by all the subsystems of the model. We show in a large experiment set how an autonomous agent can benefit from having a cognitive architecture such as the one described.

Submitted 1 December, 2020; v1 submitted 29 November, 2020; originally announced November 2020.

arXiv:2011.12075 [pdf, other]

Fuzzy Stochastic Timed Petri Nets for Causal properties representation

Authors: Alejandro Sobrino, Eduardo C. Garrido-Merchan, Cristina Puente

Abstract: Imagery is frequently used to model, represent and communicate knowledge. In particular, graphs are one of the most powerful tools, being able to represent relations between objects. Causal relations are frequently represented by directed graphs, with nodes denoting causes and links denoting causal influence. A causal graph is a skeletal picture, showing causal associations and impact between entities. Common methods used for graphically representing causal scenarios are neurons, truth tables, causal Bayesian networks, cognitive maps and Petri Nets. Causality is often defined in terms of precedence (the cause precedes the effect), concurrency (often, an effect is provoked simultaneously by two or more causes), circularity (a cause provokes the effect and the effect reinforces the cause) and imprecision (the presence of the cause favors the effect, but does not necessarily cause it). We will show that, even though the traditional graphical models are able to represent some of the aforementioned properties separately, they fail to represent all of them at once. To address that gap, we will introduce Fuzzy Stochastic Timed Petri Nets as a graphical tool able to represent time, co-occurrence, looping and imprecision in causal flow.

Submitted 24 November, 2020; originally announced November 2020.

arXiv:2011.01150 [pdf, other]

Improved Max-value Entropy Search for Multi-objective Bayesian Optimization with Constraints

Authors: Daniel Fernández-Sánchez, Eduardo C. Garrido-Merchán, Daniel Hernández-Lobato

Abstract: We present MESMOC+, an improved version of Max-value Entropy search for Multi-Objective Bayesian optimization with Constraints (MESMOC). MESMOC+ can be used to solve constrained multi-objective problems when the objectives and the constraints are expensive to evaluate. MESMOC+ works by minimizing the entropy of the solution of the optimization problem in function space, i.e., the Pareto frontier, to guide the search for the optimum. The cost of MESMOC+ is linear in the number of objectives and constraints. Furthermore, it is often significantly smaller than the cost of alternative methods based on minimizing the entropy of the Pareto set. The reason for this is that it is easier to approximate the required computations in MESMOC+. Moreover, MESMOC+'s acquisition function is expressed as the sum of one acquisition per each black-box (objective or constraint). Thus, it can be used in a decoupled evaluation setting in which one chooses not only the next input location to evaluate, but also which black-box to evaluate there. We compare MESMOC+ with related methods in synthetic and real optimization problems. These experiments show that the entropy estimation provided by MESMOC+ is more accurate than that of previous methods. This leads to better optimization results. MESMOC+ is also competitive with other information-based methods for constrained multi-objective Bayesian optimization, but it is significantly faster.

Submitted 2 April, 2021; v1 submitted 2 November, 2020; originally announced November 2020.

arXiv:2005.13012 [pdf, ps, other]

doi 10.47852/bonviewJCCE3202838

Comparing BERT against traditional machine learning text classification

Authors: Santiago González-Carvajal, Eduardo C. Garrido-Merchán

Abstract: The BERT model has arisen as a popular state-of-the-art machine learning model in recent years that is able to cope with multiple NLP tasks such as supervised text classification without human supervision. Its flexibility to cope with any type of corpus while delivering great results has made this approach very popular not only in academia but also in industry. However, many different approaches have been used over the years with success. In this work, we first present BERT and include a short review of classical NLP approaches. Then, we empirically test, with a suite of experiments covering different scenarios, the behaviour of BERT against the traditional TF-IDF vocabulary fed to machine learning algorithms. The purpose of this work is to add empirical evidence to support or refute the use of BERT as a default on NLP tasks. Experiments show the superiority of BERT and its independence of features of the NLP problem, such as the language of the text, adding empirical evidence in favour of using BERT as a default technique in NLP problems.

Submitted 12 January, 2021; v1 submitted 26 May, 2020; originally announced May 2020.

Comments: 12 pages

Journal ref: Journal of Computational and Cognitive Engineering (2023)

arXiv:2004.00601 [pdf, other]

Parallel Predictive Entropy Search for Multi-objective Bayesian Optimization with Constraints

Authors: Eduardo C. Garrido-Merchán, Daniel Hernández-Lobato

Abstract: Real-world problems often involve the optimization of several objectives under multiple constraints. An example is the hyper-parameter tuning problem of machine learning algorithms. In particular, the minimization of the estimation of the generalization error of a deep neural network and at the same time the minimization of its prediction time. We may also consider as a constraint that the deep neural network must be implemented in a chip with an area below some size. Here, both the objectives and the constraint are black boxes, i.e., functions whose analytical expressions are unknown and are expensive to evaluate. Bayesian optimization (BO) methodologies have given state-of-the-art results for the optimization of black-boxes. Nevertheless, most BO methods are sequential and evaluate the objectives and the constraints at just one input location, iteratively. Sometimes, however, we may have resources to evaluate several configurations in parallel. Notwithstanding, no parallel BO method has been proposed to deal with the optimization of multiple objectives under several constraints. If the expensive evaluations can be carried out in parallel (as when a cluster of computers is available), sequential evaluations result in a waste of resources. This article introduces PPESMOC, Parallel Predictive Entropy Search for Multi-objective Bayesian Optimization with Constraints, an information-based batch method for the simultaneous optimization of multiple expensive-to-evaluate black-box functions under the presence of several constraints. Iteratively, PPESMOC selects a batch of input locations at which to evaluate the black-boxes so as to maximally reduce the entropy of the Pareto set of the optimization problem. We present empirical evidence in the form of synthetic, benchmark and real-world experiments that illustrate the effectiveness of PPESMOC.

Submitted 1 July, 2021; v1 submitted 1 April, 2020; originally announced April 2020.

arXiv:2002.01065 [pdf, other]

Fake News Detection by means of Uncertainty Weighted Causal Graphs

Authors: Eduardo C. Garrido-Merchán, Cristina Puente, Rafael Palacios

Abstract: Society is experiencing changes in information consumption, as new information channels such as social networks let people share news that is not necessarily trustworthy. Sometimes, these sources of information produce fake news deliberately with doubtful purposes and the consumers of that information share it with other users thinking that the information is accurate. This transmission of information represents an issue in our society, as it can negatively influence the opinion of people about certain figures, groups or ideas. Hence, it is desirable to design a system that is able to detect and classify information as fake and categorize a source of information as trustworthy or not. Current systems experience difficulties performing this task, as it is complicated to design an automatic procedure that can classify this information independently of the context. In this work, we propose a mechanism to detect fake news through a classifier based on weighted causal graphs. These graphs are specific hybrid models that are built through causal relations retrieved from texts and consider the uncertainty of causal relations. We take advantage of this representation to use the probability distributions of this graph and build a fake news classifier based on the entropy and KL divergence of learned and new information. We believe that the problem of fake news is accurately tackled by this model due to its hybrid nature between a symbolic and quantitative methodology. We describe the methodology of this classifier and add empirical evidence of the usefulness of our proposed approach in the form of synthetic experiments and a real experiment involving lung cancer.

Submitted 2 April, 2020; v1 submitted 3 February, 2020; originally announced February 2020.

arXiv:2002.00429 [pdf, other]

Uncertainty Weighted Causal Graphs

Authors: Eduardo C. Garrido-Merchán, C. Puente, A. Sobrino, J. A. Olivas

Abstract: Causality has traditionally been a scientific way to generate knowledge by relating causes to effects. From an imagery point of view, causal graphs are a helpful tool for representing and inferring new causal information. In previous works, we have automatically generated causal graphs associated with a given concept by analyzing sets of documents and extracting and representing the causal information found in that visual way. The retrieved information shows that causality is frequently imperfect rather than exact, a feature captured by the graph. In this work we will attempt to go a step further, modelling the uncertainty in the graph probabilistically and improving the management of the imprecision in the aforementioned graph.

Submitted 6 February, 2020; v1 submitted 2 February, 2020; originally announced February 2020.

Comments: 12 pages, 7 figures

arXiv:2001.10523 [pdf, other]

Multi-class Gaussian Process Classification with Noisy Inputs

Authors: Carlos Villacampa-Calvo, Bryan Zaldivar, Eduardo C. Garrido-Merchán, Daniel Hernández-Lobato

Abstract: It is a common practice in the machine learning community to assume that the observed data are noise-free in the input attributes. Nevertheless, scenarios with input noise are common in real problems, as measurements are never perfectly accurate. If this input noise is not taken into account, a supervised machine learning method is expected to perform sub-optimally. In this paper, we focus on multi-class classification problems and use Gaussian processes (GPs) as the underlying classifier. Motivated by a data set coming from the astrophysics domain, we hypothesize that the observed data may contain noise in the inputs. Therefore, we devise several multi-class GP classifiers that can account for input noise. Such classifiers can be efficiently trained using variational inference to approximate the posterior distribution of the latent variables of the model. Moreover, in some situations, the amount of noise can be known before-hand. If this is the case, it can be readily introduced in the proposed methods. This prior information is expected to lead to better performance results. We have evaluated the proposed methods by carrying out several experiments, involving synthetic and real data. These include several data sets from the UCI repository, the MNIST data set and a data set coming from astrophysics. The results obtained show that, although the classification error is similar across methods, the predictive distribution of the proposed methods is better, in terms of the test log-likelihood, than the predictive distribution of a classifier based on GPs that ignores input noise.

Submitted 30 December, 2020; v1 submitted 28 January, 2020; originally announced January 2020.

arXiv:1811.03868 [pdf, other]

doi 10.1007/978-3-030-03493-1_30

Suggesting Cooking Recipes Through Simulation and Bayesian Optimization

Authors: Eduardo C. Garrido-Merchán, Alejandro Albarca-Molina

Abstract: Cooking typically involves a plethora of decisions about ingredients and tools that need to be chosen in order to write a good cooking recipe. Cooking can be modelled in an optimization framework, as it involves a search space of ingredients, kitchen tools, cooking times or temperatures. If we model as an objective function the quality of the recipe, several problems arise. No analytical expression can model all the recipes, so no gradients are available. The objective function is subjective, in other words, it contains noise. Moreover, evaluations are expensive both in time and human resources. Bayesian Optimization (BO) emerges as an ideal methodology to tackle problems with these characteristics. In this paper, we propose a methodology to suggest recipe recommendations based on a Machine Learning (ML) model that fits real and simulated data and BO. We provide empirical evidence with two experiments that support the adequacy of the methodology.

Submitted 9 November, 2018; originally announced November 2018.

arXiv:1806.11015 [pdf, other]

doi 10.1007/978-3-030-00374-6_5

Bayesian optimization of the PC algorithm for learning Gaussian Bayesian networks

Authors: Irene Córdoba, Eduardo C. Garrido-Merchán, Daniel Hernández-Lobato, Concha Bielza, Pedro Larrañaga

Abstract: The PC algorithm is a popular method for learning the structure of Gaussian Bayesian networks. It carries out statistical tests to determine absent edges in the network. It is hence governed by two parameters: (i) The type of test, and (ii) its significance level. These parameters are usually set to values recommended by an expert. Nevertheless, such an approach can suffer from human bias, leading to suboptimal reconstruction results. In this paper we consider a more principled approach for choosing these parameters in an automatic way. For this we optimize a reconstruction score evaluated on a set of different Gaussian Bayesian networks. This objective is expensive to evaluate and lacks a closed-form expression, which means that Bayesian optimization (BO) is a natural choice. BO methods use a model to guide the search and are hence able to exploit smoothness properties of the objective surface. We show that the parameters found by a BO method outperform those found by a random search strategy and the expert recommendation. Importantly, we have found that an often overlooked statistical test provides the best over-all reconstruction results.

Submitted 28 June, 2018; originally announced June 2018.

Journal ref: Lecture Notes in Artificial Intelligence (CAEPIA 2018), 11160:44:54, 2018

arXiv:1805.03463 [pdf, other]

doi 10.1016/j.neucom.2019.11.004

Dealing with Categorical and Integer-valued Variables in Bayesian Optimization with Gaussian Processes

Authors: Eduardo C. Garrido-Merchán, Daniel Hernández-Lobato

Abstract: Bayesian Optimization (BO) methods are useful for optimizing functions that are expensive to evaluate, lack an analytical expression and whose evaluations can be contaminated by noise. These methods rely on a probabilistic model of the objective function, typically a Gaussian process (GP), upon which an acquisition function is built. The acquisition function guides the optimization process and measures the expected utility of performing an evaluation of the objective at a new point. GPs assume continuous input variables. When this is not the case, for example when some of the input variables take categorical or integer values, one has to introduce extra approximations. Consider a suggested input location taking values in the real line. Before doing the evaluation of the objective, a common approach is to use a one hot encoding approximation for categorical variables, or to round to the closest integer, in the case of integer-valued variables. We show that this can lead to problems in the optimization process and describe a more principled approach to account for input variables that are categorical or integer-valued. We illustrate in both synthetic and real experiments the utility of our approach, which significantly improves the results of standard BO methods using Gaussian processes on problems with categorical or integer-valued variables.

Submitted 22 May, 2018; v1 submitted 9 May, 2018; originally announced May 2018.

arXiv:1706.03673 [pdf, other]

doi 10.1016/j.neucom.2019.11.004

Dealing with Integer-valued Variables in Bayesian Optimization with Gaussian Processes

Authors: Eduardo C. Garrido-Merchán, Daniel Hernández-Lobato

Abstract: Bayesian optimization (BO) methods are useful for optimizing functions that are expensive to evaluate, lack an analytical expression and whose evaluations can be contaminated by noise. These methods rely on a probabilistic model of the objective function, typically a Gaussian process (GP), upon which an acquisition function is built. This function guides the optimization process and measures the expected utility of performing an evaluation of the objective at a new point. GPs assume continuous input variables. When this is not the case, such as when some of the input variables take integer values, one has to introduce extra approximations. A common approach is to round the suggested variable value to the closest integer before doing the evaluation of the objective. We show that this can lead to problems in the optimization process and describe a more principled approach to account for input variables that are integer-valued. We illustrate in both synthetic and real experiments the utility of our approach, which significantly improves the results of standard BO methods on problems involving integer-valued variables.

Submitted 13 June, 2017; v1 submitted 12 June, 2017; originally announced June 2017.

Comments: 7 pages

arXiv:1609.01051 [pdf, other]

doi 10.1016/j.neucom.2019.06.025

Predictive Entropy Search for Multi-objective Bayesian Optimization with Constraints

Authors: Eduardo C. Garrido-Merchán, Daniel Hernández-Lobato

Abstract: This work presents PESMOC, Predictive Entropy Search for Multi-objective Bayesian Optimization with Constraints, an information-based strategy for the simultaneous optimization of multiple expensive-to-evaluate black-box functions under the presence of several constraints. PESMOC can hence be used to solve a wide range of optimization problems. Iteratively, PESMOC chooses an input location on which to evaluate the objective functions and the constraints so as to maximally reduce the entropy of the Pareto set of the corresponding optimization problem. The constraints considered in PESMOC are assumed to have similar properties to those of the objective functions in typical Bayesian optimization problems. That is, they do not have a known expression (which prevents gradient computation), their evaluation is considered to be very expensive, and the resulting observations may be corrupted by noise. These constraints arise in a plethora of expensive black-box optimization problems. We carry out synthetic experiments to illustrate the effectiveness of PESMOC, where we sample both the objectives and the constraints from a Gaussian process prior. The results obtained show that PESMOC is able to provide better recommendations with a smaller number of evaluations than a strategy based on random search.

Submitted 26 September, 2016; v1 submitted 5 September, 2016; originally announced September 2016.

Comments: 6 pages 2 figures

Journal ref: Neurocomputing, 361:50-68, 2019

Showing 1–31 of 31 results for author: Garrido-Merchán, E C