-
Challenges with the application and adoption of artificial intelligence for drug discovery
Authors:
Ghita Ghislat,
Saiveth Hernandez-Hernandez,
Chayanit Piwajanusorn,
Pedro J. Ballester
Abstract:
Artificial intelligence (AI) is exhibiting tremendous potential to reduce the massive costs and long timescales of drug discovery. There are however important challenges limiting the impact and scope of AI models. Typically, these models are evaluated on benchmarks that are unlikely to anticipate their prospective performance, which inadvertently misguides their development. Indeed, while all the developed models excel in a selected benchmark, only a small proportion of them are ultimately reported to have prospective value (e.g. by discovering potent and innovative drug leads for a therapeutic target). Here we discuss a range of data issues (bias, inconsistency, skewness, irrelevance, small size, high dimensionality), how they challenge AI models and which issue-specific mitigations have been effective. Next, we point out the challenges faced by uncertainty quantification techniques aimed at enhancing these AI models. We also discuss how conceptual errors, unrealistic benchmarks and performance misestimation can confound the evaluation of models and thus their development. Lastly, we explain how human bias, whether from AI experts or drug discovery experts, constitutes another challenge that can be alleviated with prospective studies.
Submitted 6 July, 2024;
originally announced July 2024.
-
Scaffold Splits Overestimate Virtual Screening Performance
Authors:
Qianrong Guo,
Saiveth Hernandez-Hernandez,
Pedro J Ballester
Abstract:
Virtual Screening (VS) of vast compound libraries guided by Artificial Intelligence (AI) models is a highly productive approach to early drug discovery. Data splitting is crucial for better benchmarking of such AI models. Traditional random data splits produce similar molecules between training and test sets, conflicting with the reality of VS libraries, which mostly contain structurally distinct compounds. The scaffold split, which groups molecules by shared core structure, is widely considered to reflect this real-world scenario. However, here we show that the scaffold split also overestimates VS performance. The reason is that molecules with different chemical scaffolds are often similar, which introduces unrealistically high similarities between training molecules and test molecules following a scaffold split. Our study examined three representative AI models on 60 NCI-60 datasets, each with approximately 30,000 to 50,000 molecules tested on a different cancer cell line. Each dataset was split with three methods: scaffold, Butina clustering and the more accurate Uniform Manifold Approximation and Projection (UMAP) clustering. Regardless of the model, performance is much worse with UMAP splits, as shown by the results of the 2100 models trained and evaluated for each algorithm and split. These robust results demonstrate the need for more realistic data splits to tune, compare, and select models for VS. For the same reason, avoiding the scaffold split is also recommended for other molecular property prediction problems. The code to reproduce these results is available at https://github.com/ScaffoldSplitsOverestimateVS
Submitted 30 June, 2024; v1 submitted 2 June, 2024;
originally announced June 2024.
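The splitting idea discussed in the entry above can be illustrated with a minimal sketch. It assumes each molecule already carries a precomputed group label (a scaffold, a Butina cluster id, or a UMAP cluster id); computing those labels is out of scope here. The function name `group_split` and the toy data are hypothetical and are not taken from the authors' repository.

```python
import random
from collections import defaultdict


def group_split(items, groups, test_fraction=0.2, seed=0):
    """Split items so that no group appears in both train and test.

    `groups[i]` is a hypothetical, precomputed label for `items[i]`
    (for example a scaffold or cluster id).
    """
    if len(items) != len(groups):
        raise ValueError("items and groups must have the same length")

    # Collect the indices belonging to each group.
    members = defaultdict(list)
    for index, label in enumerate(groups):
        members[label].append(index)

    # Shuffle the group labels reproducibly, then fill the test set
    # group by group until it reaches the requested size.
    labels = sorted(members)
    random.Random(seed).shuffle(labels)

    target = test_fraction * len(items)
    train, test = [], []
    for label in labels:
        bucket = test if len(test) < target else train
        bucket.extend(items[i] for i in members[label])
    return train, test


# Toy data: six placeholder molecules in three groups.
molecules = ["m1", "m2", "m3", "m4", "m5", "m6"]
group_ids = ["A", "A", "B", "B", "C", "C"]
train_set, test_set = group_split(molecules, group_ids, test_fraction=0.3)
```

A split of this kind only guarantees that group labels do not overlap. As the abstract argues, molecules with different scaffolds can still be similar, so how realistic the benchmark is depends on how the groups are defined.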
-
Artist Area Versus Numerical Cluster
Authors:
Patrice Ballester
Abstract:
Artist area versus numerical cluster. Between land management and the production of new creative space: Poblenou 22@ in Barcelona. Poblenou ("new village" in Catalan) is a district of Barcelona that has undergone more than ten years of urban regeneration based on the new economics of ICT (Information and Communications Technology), used as the source of an attractive new image for locating international companies within a creative project called the 22@ district. The transition produced a territorial gentrification that was thwarted by artists, intellectuals and local people, who adopted as the emblem of their action the rescue of a flagship piece of industrial architecture: Can Ricart, home to many artists. Since then, urban planning has been redesigned at its margins, and the townscape of the area tends to reflect the consensus between politicians, multinational companies and the artists who survived this high-pressure real estate market. The financial risks are assessed in light of the territorial upheavals and the sums committed to rebuilding a neighborhood on the chimerical model of "Silicon Valley". These artists became reference points while trying to fit into the theme of the territory's new creative economy through alliances, orders and cooperation that is shy but regular. The new European Soho is still searching for an identity and a certain lifestyle based on the new balance of power highlighted by the artists in their criticism, actions and communications: cohabitation between old and new residents; the choice between contemporary architecture and heritage protection; the artist as creator, assistant or accomplice of the so-called creative class.
Submitted 6 November, 2023;
originally announced November 2023.
-
Barcelona in the face of globalization, how to think of the city through the organization and evaluation of major events?
Authors:
Patrice Ballester
Abstract:
The event questions men whether it is political, cultural or touristic. It has its own meaning as it starts something while showing a will, a new possibility to create, to meet and to surprise. The event is in fact an "advent that reaches everything" generally integrating itself into a long process or phase of the evolution of societies in terms of its societal structure. However, if the event exists, it is significant and therefore assessable. How to evaluate an Olympic Games or an international exhibition? Can we evaluate the before (from the statement of position, application file), the during (the course of the tourist, cultural or sporting event) and the after (the closing of the festival and its direct and indirect effects on society)? How then to evaluate all the dimensions of the event generally based on major tourist and urban planning operations? We will take as a field of study a city -- capital -- commercial port which has become a sort of archetype in the context of the organization of mega-events with a tourist vocation, such as the Universal or International Exhibitions and the Olympic Games, namely Barcelona and three very important dates for the Catalan capital: 1888, universal exhibition, 1929 universal and international exhibition, 1992 the modern Olympic Games and 2004 the Universal Forum of Cultures of Unesco, international exhibition. The economic interest of organizing an ephemeral giant event is still relevant and will be confirmed in the future with the role of Asian countries within the IOC and the BIE (Bureau international des expositions). The important thing is to develop a rational argument to communicate on the economic merits of the event. Sustainable development and Corporate Social Responsibility (CSR) are predominant elements for ephemeral events that are intended to be sustainable over time for future generations.
Submitted 4 November, 2022;
originally announced December 2022.
-
Barcelona, The Innovation District: between vulnerability and social equity
Authors:
P. Ballester
Abstract:
The municipal actors of the Catalan capital wish to reverse the negative representations of the district of Poblenou, which is considered unattractive and one of the poorest and most marginalized districts of the city, while its identity is tied to the expression "Manchester of southern Europe", recalling its role as a factory for Spain and an emblem of the Industrial Revolution for the Iberian Peninsula. The conceptualization, programming and construction of a new image of the city through this large urban wasteland are made concrete through an urban marketing operation. The specifications allow the construction of innovative buildings with resolutely contemporary architecture, breaking with the regulations in force on the limitation of the height of skyscrapers, in order to make this pericentral territory visible and attractive. Catalan society in the 21st century has produced a fragmented and dominating neo-capitalist space through a thoughtless real estate frenzy and complex land games. Walking through the streets of Poblenou, one is forced to think of the work of Henri Lefebvre on the production of space and the three orders of understanding the design of an urban composition. Here, the city has become both the stake and the site of economic growth based on innovation. Finally, neighborhood life is undergoing a complete transformation, which seems to offer a chance to a territory long left on the sidelines. We present an economic analysis grid of a cluster with graphs and statistics.
Submitted 2 November, 2022;
originally announced November 2022.
-
Stochastic-based Neural Network hardware acceleration for an efficient ligand-based virtual screening
Authors:
Christian F. Frasser,
Carola de Benito,
Vincent Canals,
Miquel Roca,
Pedro J. Ballester,
Josep L. Rossello
Abstract:
Artificial Neural Networks (ANNs) have been popularized in many scientific and technological areas due to their capacity to solve many complex pattern matching problems. That is the case of Virtual Screening, a research area that studies how to identify those molecular compounds with the highest probability of presenting biological activity for a therapeutic target. Due to the vast number of small organic compounds and the thousands of targets for which such large-scale screening can potentially be carried out, there has been increasing interest in the research community in improving both processing speed and energy efficiency in the screening of molecular databases. In this work, we present a classification model describing each molecule with a single energy-based vector and propose a machine-learning system based on the use of ANNs. Different ANNs are studied with respect to their suitability to identify biochemical similarities. Also, a high-performance and energy-efficient hardware acceleration platform based on the use of stochastic computing is proposed for the ANN implementation. This platform is of utility when screening vast libraries of compounds. As a result, the proposed model showed appreciable improvements with respect to previously published works in terms of the main relevant characteristics (accuracy, speed and energy-efficiency).
Submitted 3 June, 2020;
originally announced June 2020.
-
Can we trust deep learning models diagnosis? The impact of domain shift in chest radiograph classification
Authors:
Eduardo H. P. Pooch,
Pedro L. Ballester,
Rodrigo C. Barros
Abstract:
While deep learning models become more widespread, their ability to handle unseen data and generalize to any scenario is yet to be challenged. In medical imaging, there is a high heterogeneity of distributions among images based on the equipment that generates them and their parametrization. This heterogeneity triggers a common issue in machine learning called domain shift, which represents the difference between the training data distribution and the distribution of the data on which a model is deployed. A high domain shift tends to result in poor generalization performance of the models. In this work, we evaluate the extent of domain shift on four of the largest datasets of chest radiographs. We show how training and testing with different datasets (e.g., training on ChestX-ray14 and testing on CheXpert) drastically affects model performance, raising a big question over the reliability of deep learning models trained on public datasets. We also show that models trained on CheXpert and MIMIC-CXR generalize better to other datasets.
Submitted 22 June, 2020; v1 submitted 3 September, 2019;
originally announced September 2019.
-
Unsupervised domain adaptation for medical imaging segmentation with self-ensembling
Authors:
Christian S. Perone,
Pedro Ballester,
Rodrigo C. Barros,
Julien Cohen-Adad
Abstract:
Recent advances in deep learning methods have come to define the state-of-the-art for many medical imaging applications, surpassing even human judgment in several tasks. Those models, however, when trained to reduce the empirical risk on a single domain, fail to generalize when applied to other domains, a very common scenario in medical imaging due to the variability of images and anatomical structures, even across the same imaging modality. In this work, we extend the method of unsupervised domain adaptation using self-ensembling for the semantic segmentation task and explore multiple facets of the method on a small and realistic publicly-available magnetic resonance (MRI) dataset. Through an extensive evaluation, we show that self-ensembling can indeed improve the generalization of the models even when using a small amount of unlabelled data.
Submitted 10 January, 2019; v1 submitted 14 November, 2018;
originally announced November 2018.
-
Persistence of slow dynamics in Tb(OETAP)$_2$ single molecule magnets embedded in conducting polymers
Authors:
Tomas Orlando,
Marta Filibian,
Samuele Sanna,
Nelson Giménez-Agullo,
Cristina Sáenz de Pipaón,
Pau Ballester,
José Ramón Galán-Mascarós,
Pietro Carretta
Abstract:
The spin dynamics of Tb(OETAP)$_2$ single ion magnets was investigated by means of muon spin resonance ($μ$SR) both in the bulk material as well as when the system is embedded into PEDOT:PSS polymer conductor. The characteristic spin fluctuation time is characterized by a high temperature activated trend, with an energy barrier around 320 K, and by a low temperature tunneling regime. When the single ion magnet is embedded into the polymer the energy barrier only slightly decreases and the fluctuation time remains of the same order of magnitude even at low temperature. This finding shows that these single molecule magnets preserve their characteristics which, if combined with those of the conducting polymer, result in a hybrid material of potential interest for organic spintronics.
Submitted 7 May, 2016;
originally announced May 2016.
-
Molecfit: A general tool for telluric absorption correction. I. Method and application to ESO instruments
Authors:
A. Smette,
H. Sana,
S. Noll,
H. Horst,
W. Kausch,
S. Kimeswenger,
M. Barden,
C. Szyszka,
A. M. Jones,
A. Gallenne,
J. Vinther,
P. Ballester,
J. Taylor
Abstract:
Context: The interaction of the light from astronomical objects with the constituents of the Earth's atmosphere leads to the formation of telluric absorption lines in ground-based collected spectra. Correcting for these lines, mostly affecting the red and infrared region of the spectrum, usually relies on observations of specific stars obtained close in time and airmass to the science targets, therefore using precious observing time. Aims: We present molecfit, a tool for correcting for telluric absorption lines based on synthetic modelling of the Earth's atmospheric transmission. Molecfit is versatile and can be used with data obtained with various ground-based telescopes and instruments. Methods: Molecfit combines a publicly available radiative transfer code, a molecular line database, atmospheric profiles, and various kernels to model the instrument line spread function. The atmospheric profiles are created by merging a standard atmospheric profile representative of a given observatory's climate, of local meteorological data, and of dynamically retrieved altitude profiles for temperature, pressure, and humidity. We discuss the various ingredients of the method, its applicability, and its limitations. We also show examples of telluric line correction on spectra obtained with a suite of ESO Very Large Telescope (VLT) instruments. Results: Compared to previous similar tools, molecfit takes the best results for temperature, pressure, and humidity in the atmosphere above the observatory into account. As a result, the standard deviation of the residuals after correction of unsaturated telluric lines is frequently better than 2% of the continuum. Conclusion: Molecfit is able to accurately model and correct for telluric lines over a broad range of wavelengths and spectral resolutions. (Abridged)
Submitted 28 January, 2015;
originally announced January 2015.
-
Automated data reduction workflows for astronomy
Authors:
W. Freudling,
M. Romaniello,
D. M. Bramich,
P. Ballester,
V. Forchi,
C. E. Garcia-Dablo,
S. Moehler,
M. J. Neeser
Abstract:
Data from complex modern astronomical instruments often consist of a large number of different science and calibration files, and their reduction requires a variety of software tools. The execution chain of the tools represents a complex workflow that needs to be tuned and supervised, often by individual researchers that are not necessarily experts for any specific instrument. The efficiency of data reduction can be improved by using automatic workflows to organise data and execute the sequence of data reduction steps. To realize such efficiency gains, we designed a system that allows intuitive representation, execution and modification of the data reduction workflow, and has facilities for inspection and interaction with the data. The European Southern Observatory (ESO) has developed Reflex, an environment to automate data reduction workflows. Reflex is implemented as a package of customized components for the Kepler workflow engine. Kepler provides the graphical user interface to create an executable flowchart-like representation of the data reduction process. Key features of Reflex are a rule-based data organiser, infrastructure to re-use results, thorough book-keeping, data progeny tracking, interactive user interfaces, and a novel concept to exploit information created during data organisation for the workflow execution. Reflex includes novel concepts to increase the efficiency of astronomical data processing. While Reflex is a specific implementation of astronomical scientific workflows within the Kepler workflow engine, the overall design choices and methods can also be applied to other environments for running automated science workflows.
Submitted 21 November, 2013;
originally announced November 2013.
-
Machine learning prediction of cancer cell sensitivity to drugs based on genomic and chemical properties
Authors:
Michael P. Menden,
Francesco Iorio,
Mathew Garnett,
Ultan McDermott,
Cyril Benes,
Pedro J. Ballester,
Julio Saez-Rodriguez
Abstract:
Predicting the response of a specific cancer to a therapy is a major goal in modern oncology that should ultimately lead to a personalised treatment. High-throughput screenings of potentially active compounds against a panel of genomically heterogeneous cancer cell lines have unveiled multiple relationships between genomic alterations and drug responses. Various computational approaches have been proposed to predict sensitivity based on genomic features, while others have used the chemical properties of the drugs to ascertain their effect. In an effort to integrate these complementary approaches, we developed machine learning models to predict the response of cancer cell lines to drug treatment, quantified through IC50 values, based on both the genomic features of the cell lines and the chemical properties of the considered drugs. Models predicted IC50 values in an 8-fold cross-validation and an independent blind test with coefficient of determination R2 of 0.72 and 0.64 respectively. Furthermore, models were able to predict with comparable accuracy (R2 of 0.61) IC50s of cell lines from a tissue not used in the training stage. Our in silico models can be used to optimise the experimental design of drug-cell screenings by estimating a large proportion of missing IC50 values rather than experimentally measuring them. The implications of our results go beyond virtual drug screening design: potentially thousands of drugs could be probed in silico to systematically test their potential efficacy as anti-tumour agents based on their structure, thus providing a computational framework to identify new drug repositioning opportunities and ultimately being useful for personalized medicine by linking the genomic traits of patients to drug sensitivity.
Submitted 18 March, 2013; v1 submitted 3 December, 2012;
originally announced December 2012.
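The entry above reports model quality as a coefficient of determination. As a minimal sketch, assuming the common definition of R2 as one minus the ratio of residual to total sum of squares (the abstract does not state which variant the authors used), the metric can be computed as follows. The function name `r_squared` and the toy values are hypothetical and are not results from the paper.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")

    mean_observed = sum(observed) / len(observed)
    # Residual sum of squares: error of the model's predictions.
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    # Total sum of squares: error of always predicting the mean.
    ss_tot = sum((o - mean_observed) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("R2 is undefined when all observed values are equal")
    return 1.0 - ss_res / ss_tot


# Made-up values standing in for measured and predicted responses.
measured = [1.0, 2.0, 3.0, 4.0]
estimated = [1.1, 1.9, 3.2, 3.8]
score = r_squared(measured, estimated)
```

Under this definition a model that always predicts the mean of the observed values scores 0, and a perfect model scores 1.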
-
Reflex: Scientific Workflows for the ESO Pipelines
Authors:
Pascal Ballester,
Daniel Bramich,
Vincenzo Forchi,
Wolfram Freudling,
Cesar Enrique Garcia-Dabo,
Maurice klein Gebbinck,
Andrea Modigliani,
Martino Romaniello
Abstract:
The recently released Reflex scientific workflow environment supports the interactive execution of ESO VLT data reduction pipelines. Reflex is based upon the Kepler workflow engine, and provides components for organising the data, executing pipeline recipes based on the ESO Common Pipeline Library, invoking Python scripts, and constructing interaction loops. Reflex will greatly enhance the quick validation and reduction of the scientific data. In this paper we summarize the main features of Reflex, and demonstrate as an example its application to the reduction of echelle UVES data.
Submitted 10 February, 2011;
originally announced February 2011.
-
The SINFONI pipeline
Authors:
Andrea Modigliani,
Wolfgang Hummel,
Roberto Abuter,
Paola Amico,
Pascal Ballester,
Richard Davies,
Christophe Dumas,
Mattew Horrobin,
Mark Neeser,
Markus Kissler-Patig,
Michele Peron,
Juha Rehunanen,
Juergen Schreiber,
Thomas Szeifert
Abstract:
The SINFONI data reduction pipeline, as part of the ESO-VLT Data Flow System, provides recipes for Paranal Science Operations, and for Data Flow Operations at Garching headquarters. At Paranal, it is used for the quick-look data evaluation. For Data Flow Operations, it fulfills several functions: creating master calibrations; monitoring instrument health and data quality; and reducing science data for delivery to service mode users. The pipeline is available to the science community for reprocessing data with personalised reduction strategies and parameters. The pipeline recipes can be executed either with EsoRex at the command line level or through the Gasgano graphical user interface. The recipes are implemented with the ESO Common Pipeline Library (CPL).
SINFONI is the Spectrograph for INtegral Field Observations in the Near Infrared (1.1-2.45 um) at the ESO-VLT. SINFONI was developed and built by ESO and MPE in collaboration with NOVA. It consists of the SPIFFI integral field spectrograph and an adaptive optics module which allows diffraction-limited and seeing-limited observations. The image slicer of SPIFFI chops the SINFONI field of view on the sky into 32 slices, which are re-arranged into a pseudo-slit. The latter is dispersed by one of the four possible gratings (J, H, K, H+K). The detector thus sees a spatial dimension (along the pseudo-slit) and a spectral dimension.
We describe in this paper the main data reduction procedures of the SINFONI pipeline, which is based on SPRED - the SPIFFI data reduction software developed by MPE, and the most recent developments after more than a year of SINFONI operations.
Submitted 5 February, 2007; v1 submitted 10 January, 2007;
originally announced January 2007.
-
Calibration Observations of Fomalhaut with the VLTI
Authors:
J. Davis,
A. Richichi,
P. Ballester,
Ph. Gitton,
A. Glindemann,
S. Morel,
M. Schoeller,
M. Wittkowski,
F. Paresce
Abstract:
An investigation of the stability of the transfer function of the European Southern Observatory's Very Large Telescope Interferometer has been carried out through observations of Fomalhaut over a wide range in hour angle. No significant variation in the transfer function was found for the zenith angle range 5-70 degrees. The projected baseline varied between 139.7 m and 49.8 m during the observations and, as an integral part of the determination of the transfer function, a new accurate limb-darkened angular diameter for Fomalhaut has been established. This has led to improved values for the emergent flux, effective temperature, radius and luminosity.
Submitted 26 October, 2004;
originally announced October 2004.