Search | arXiv e-print repository

Structured Generations: Using Hierarchical Clusters to guide Diffusion Models

Authors: Jorge da Silva Goncalves, Laura Manduchi, Moritz Vandenhirtz, Julia Vogt

Abstract: This paper introduces Diffuse-TreeVAE, a deep generative model that integrates hierarchical clustering into the framework of Denoising Diffusion Probabilistic Models (DDPMs). The proposed approach generates new images by sampling from a root embedding of a learned latent tree VAE-based structure; it then propagates through hierarchical paths and uses a second-stage DDPM to refine and generate distinct, high-quality images for each data cluster. The result is a model that not only improves image clarity but also ensures that the generated samples are representative of their respective clusters, addressing the limitations of previous VAE-based methods and advancing the state of clustering-based generative modeling.

Submitted 8 July, 2024; originally announced July 2024.

Comments: 8 pages, 7 figures, Structured Probabilistic Inference & Generative Modeling workshop of ICML 2024

arXiv:2407.02626 [pdf]

The text2term tool to map free-text descriptions of biomedical terms to ontologies

Authors: Rafael S. Gonçalves, Jason Payne, Amelia Tan, Carmen Benitez, Jamie Haddock, Robert Gentleman

Abstract: There is an ongoing need for scalable tools to aid researchers in both retrospective and prospective standardization of discrete entity types -- such as disease names, cell types or chemicals -- that are used in metadata associated with biomedical data. When metadata are not well-structured or precise, the associated data are harder to find and are often burdensome to reuse, analyze or integrate with other datasets due to the upfront curation effort required to make the data usable -- typically through retrospective standardization and cleaning of the (meta)data. With the goal of facilitating the task of standardizing metadata -- either in bulk or in a one-by-one fashion; for example, to support auto-completion of biomedical entities in forms -- we have developed an open-source tool called text2term that maps free-text descriptions of biomedical entities to controlled terms in ontologies. The tool is highly configurable and can be used in multiple ways that cater to different users and expertise levels -- it is available on PyPI and can be used programmatically as any Python package; it can also be used via a command-line interface; or via our hosted, graphical user interface-based Web application (https://text2term.hms.harvard.edu); or by deploying a local instance of our interactive application using Docker.

Submitted 2 July, 2024; originally announced July 2024.

arXiv:2311.17771 [pdf, other]

Supervising the Centroid Baseline for Extractive Multi-Document Summarization

Authors: Simão Gonçalves, Gonçalo Correia, Diogo Pernes, Afonso Mendes

Abstract: The centroid method is a simple approach for extractive multi-document summarization and many improvements to its pipeline have been proposed. We further refine it by adding a beam search process to the sentence selection and also a centroid estimation attention model that leads to improved results. We demonstrate this in several multi-document summarization datasets, including in a multilingual scenario.

Submitted 29 November, 2023; originally announced November 2023.

Comments: Accepted at "The 4th New Frontiers in Summarization (with LLMs) Workshop"

arXiv:2304.08457 [pdf, other]

doi 10.1016/j.chaos.2023.113579

Deep Learning Criminal Networks

Authors: Haroldo V. Ribeiro, Diego D. Lopes, Arthur A. B. Pessa, Alvaro F. Martins, Bruno R. da Cunha, Sebastian Goncalves, Ervin K. Lenzi, Quentin S. Hanley, Matjaz Perc

Abstract: Recent advances in deep learning methods have enabled researchers to develop and apply algorithms for the analysis and modeling of complex networks. These advances have sparked a surge of interest at the interface between network science and machine learning. Despite this, the use of machine learning methods to investigate criminal networks remains surprisingly scarce. Here, we explore the potential of graph convolutional networks to learn patterns among networked criminals and to predict various properties of criminal networks. Using empirical data from political corruption, criminal police intelligence, and criminal financial networks, we develop a series of deep learning models based on the GraphSAGE framework that are able to recover missing criminal partnerships, distinguish among types of associations, predict the amount of money exchanged among criminal agents, and even anticipate partnerships and recidivism of criminals during the growth dynamics of corruption networks, all with impressive accuracy. Our deep learning models significantly outperform previous shallow learning approaches and produce high-quality embeddings for node and edge properties. Moreover, these models inherit all the advantages of the GraphSAGE framework, including the generalization to unseen nodes and scaling up to large graph structures.

Submitted 4 June, 2023; v1 submitted 17 April, 2023; originally announced April 2023.

Comments: 14 two-column pages, 5 figures

Journal ref: Chaos, Solitons & Fractals 172, 113579 (2023)

arXiv:2301.02608 [pdf, other]

doi 10.1038/s41698-024-00539-4

An interpretable machine learning system for colorectal cancer diagnosis from pathology slides

Authors: Pedro C. Neto, Diana Montezuma, Sara P. Oliveira, Domingos Oliveira, João Fraga, Ana Monteiro, João Monteiro, Liliana Ribeiro, Sofia Gonçalves, Stefan Reinhard, Inti Zlobec, Isabel M. Pinto, Jaime S. Cardoso

Abstract: Considering the profound transformation affecting pathology practice, we aimed to develop a scalable artificial intelligence (AI) system to diagnose colorectal cancer from whole-slide images (WSI). For this, we propose a deep learning (DL) system that learns from weak labels, a sampling strategy that reduces the number of training samples by a factor of six without compromising performance, an approach to leverage a small subset of fully annotated samples, and a prototype with explainable predictions, active learning features and parallelisation. Noting some problems in the literature, this study is conducted with one of the largest WSI colorectal samples dataset with approximately 10,500 WSIs. Of these samples, 900 are testing samples. Furthermore, the robustness of the proposed method is assessed with two additional external datasets (TCGA and PAIP) and a dataset of samples collected directly from the proposed prototype. Our proposed method predicts, for the patch-based tiles, a class based on the severity of the dysplasia and uses that information to classify the whole slide. It is trained with an interpretable mixed-supervision scheme to leverage the domain knowledge introduced by pathologists through spatial annotations. The mixed-supervision scheme allowed for an intelligent sampling strategy effectively evaluated in several different scenarios without compromising the performance.
On the internal dataset, the method shows an accuracy of 93.44% and a sensitivity between positive (low-grade and high-grade dysplasia) and non-neoplastic samples of 0.996. Performance on the external test samples varied, with TCGA being the most challenging dataset, with an overall accuracy of 84.91% and a sensitivity of 0.996.

Submitted 30 April, 2024; v1 submitted 6 January, 2023; originally announced January 2023.

Comments: Accepted at npj Precision Oncology. Available at: https://www.nature.com/articles/s41698-024-00539-4

Journal ref: npj Precis. Onc. 8, 56 (2024)

arXiv:2209.03171 [pdf, other]

doi 10.1038/s41598-022-20025-w

Machine Learning Partners in Criminal Networks

Authors: Diego D. Lopes, Bruno R. da Cunha, Alvaro F. Martins, Sebastian Goncalves, Ervin K. Lenzi, Quentin S. Hanley, Matjaz Perc, Haroldo V. Ribeiro

Abstract: Recent research has shown that criminal networks have complex organizational structures, but whether this can be used to predict static and dynamic properties of criminal networks remains little explored. Here, by combining graph representation learning and machine learning methods, we show that structural properties of political corruption, police intelligence, and money laundering networks can be used to recover missing criminal partnerships, distinguish among different types of criminal and legal associations, as well as predict the total amount of money exchanged among criminal agents, all with outstanding accuracy. We also show that our approach can anticipate future criminal associations during the dynamic growth of corruption networks with significant accuracy. Thus, similar to evidence found at crime scenes, we conclude that structural patterns of criminal networks carry crucial information about illegal activities, which allows machine learning methods to predict missing information and even anticipate future criminal behavior.

Submitted 7 September, 2022; originally announced September 2022.

Comments: 10 pages, 4 figures, supplementary information; accepted for publication in Scientific Reports

Journal ref: Sci. Rep. 12, 15746 (2022)

arXiv:2111.12209 [pdf]

Sistema de sensoriamento sem fio aplicavel a deteccao de incendios florestais (Wireless sensing system applicable to forest fire detection)

Authors: Lucas Santos Goncalves, Celso Barbosa Carvalho

Abstract: In this research work, a hardware and software system is developed that uses wireless sensors to monitor environmental variables such as temperature, gas concentration and luminosity, in order to detect the existence of forest fires. LoRa technology was used for wireless sensor networks with a communication range that can reach on average up to 5km in urban areas and 10km in rural areas. The developed system also has an integrated web application (dashboard) that, in real time, collects data from the wireless sensors, which together form the sensor module, also called the device. This data is then presented on a map associated with the positioning of each sensor module. The developed system was tested using practical experiments that used flames, gases and lighting, simulating the occurrence of fires. The tests performed showed the feasibility of the developed hardware/software system in detecting the fires in the simulated scenarios. Therefore, it was found that the research is promising, and may advance in the future to the detection of real fires.

Submitted 23 November, 2021; originally announced November 2021.

Comments: in Portuguese

arXiv:2105.12118 [pdf, other]

Solving the One-dimensional Distance Geometry Problem by Optical Computing

Authors: S. B. Hengeveld, N. Rubiano da Silva, D. S. Gonçalves, P. H. Souto Ribeiro, A. Mucherino

Abstract: The distance geometry problem belongs to a class of hard problems in classical computation that can be understood in terms of a set of inputs processed according to a given transformation, and for which the number of possible outcomes grows exponentially with the number of inputs. It is conjectured that quantum computing schemes can solve problems belonging to this class in a time that grows only at a polynomial rate with the number of inputs. While quantum computers are still being developed, there are some classical optics computation approaches that can perform very well for specific tasks. Here, we present an optical computing approach for the distance geometry problem in one dimension and show that it is very promising in the classical computing regime.

Submitted 24 May, 2021; originally announced May 2021.

Comments: 8 pages, 1 figure
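
To make the exponential growth mentioned in the abstract above concrete, here is a minimal classical brute-force sketch of the one-dimensional distance geometry problem. It is not the optical method of the paper. The function name `solve_dgp_1d` and the input layout are hypothetical, and the sketch assumes that the distance between every pair of consecutive vertices is known, so each vertex sits either to the left or to the right of its predecessor.

```python
from itertools import product


def solve_dgp_1d(n, distances, tol=1e-9):
    """Brute-force 1-D distance geometry (illustrative only).

    `distances` maps a vertex pair (i, j) with i < j to the required
    value of |x_i - x_j|.  Assumption: every consecutive pair (i - 1, i)
    is present, so each vertex can be placed left or right of the
    previous one.  This yields 2**(n - 1) candidate placements.
    """
    solutions = []
    for signs in product((1, -1), repeat=n - 1):
        x = [0.0]  # pin the first vertex at the origin to remove translations
        for i, sign in enumerate(signs, start=1):
            x.append(x[-1] + sign * distances[(i - 1, i)])
        # Keep the placement only if every given distance is satisfied.
        if all(abs(abs(x[i] - x[j]) - d) <= tol
               for (i, j), d in distances.items()):
            solutions.append(x)
    return solutions


# Toy instance: three points with one extra (pruning) distance.
triangle = {(0, 1): 1.0, (1, 2): 2.0, (0, 2): 1.0}
realizations = solve_dgp_1d(3, triangle)
```

On the toy instance, two of the four candidate placements survive, and they are mirror images of each other. Without the extra distance (0, 2), all four candidates would be valid, which is the doubling per vertex that makes exhaustive search impractical for large inputs.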

arXiv:2009.05404 [pdf, ps, other]

doi 10.1007/s00453-021-00835-6

A new algorithm for the $^K$DMDGP subclass of Distance Geometry Problems

Authors: Douglas S. Goncalves, Carlile Lavor, Leo Liberti, Michael Souza

Abstract: The fundamental inverse problem in distance geometry is the one of finding positions from inter-point distances. The Discretizable Molecular Distance Geometry Problem (DMDGP) is a subclass of the Distance Geometry Problem (DGP) whose search space can be discretized and represented by a binary tree, which can be explored by a Branch-and-Prune (BP) algorithm. It turns out that this combinatorial search space possesses many interesting symmetry properties that were studied in the last decade. In this paper, we present a new algorithm for this subclass of the DGP, which exploits DMDGP symmetries more effectively than its predecessors. Computational results show that the speedup, with respect to the classic BP algorithm, is considerable for sparse DMDGP instances related to protein conformation.

Submitted 11 September, 2020; originally announced September 2020.

Comments: This is a full version of the extended abstract accepted at CTW2020

arXiv:1907.02106 [pdf, other]

doi 10.1007/978-3-030-30796-7_26

Use of OWL and Semantic Web Technologies at Pinterest

Authors: Rafael S. Gonçalves, Matthew Horridge, Rui Li, Yu Liu, Mark A. Musen, Csongor I. Nyulas, Evelyn Obamos, Dhananjay Shrouty, David Temple

Abstract: Pinterest is a popular Web application that has over 250 million active users. It is a visual discovery engine for finding ideas for recipes, fashion, weddings, home decoration, and much more. In the last year, the company adopted Semantic Web technologies to create a knowledge graph that aims to represent the vast amount of content and users on Pinterest, to help both content recommendation and ads targeting. In this paper, we present the engineering of an OWL ontology---the Pinterest Taxonomy---that forms the core of Pinterest's knowledge graph, the Pinterest Taste Graph. We describe modeling choices and enhancements to WebProtégé that we used for the creation of the ontology. In two months, eight Pinterest engineers, without prior experience of OWL and WebProtégé, revamped an existing taxonomy of noisy terms into an OWL ontology. We share our experience and present the key aspects of our work that we believe will be useful for others working in this area.

Submitted 3 July, 2019; originally announced July 2019.

arXiv:1905.06480 [pdf]

doi 10.1007/978-3-319-68204-4_10

The CEDAR Workbench: An Ontology-Assisted Environment for Authoring Metadata that Describe Scientific Experiments

Authors: Rafael S. Gonçalves, Martin J. O'Connor, Marcos Martínez-Romero, Attila L. Egyedi, Debra Willrett, John Graybeal, Mark A. Musen

Abstract: The Center for Expanded Data Annotation and Retrieval (CEDAR) aims to revolutionize the way that metadata describing scientific experiments are authored. The software we have developed--the CEDAR Workbench--is a suite of Web-based tools and REST APIs that allows users to construct metadata templates, to fill in templates to generate high-quality metadata, and to share and manage these resources. The CEDAR Workbench provides a versatile, REST-based environment for authoring metadata that are enriched with terms from ontologies. The metadata are available as JSON, JSON-LD, or RDF for easy integration in scientific applications and reusability on the Web. Users can leverage our APIs for validating and submitting metadata to external repositories. The CEDAR Workbench is freely available and open-source.

Submitted 15 May, 2019; originally announced May 2019.

arXiv:1903.08206 [pdf, other]

doi 10.1007/978-3-030-21348-0_10

Aligning Biomedical Metadata with Ontologies Using Clustering and Embeddings

Authors: Rafael S. Gonçalves, Maulik R. Kamdar, Mark A. Musen

Abstract: The metadata about scientific experiments published in online repositories have been shown to suffer from a high degree of representational heterogeneity---there are often many ways to represent the same type of information, such as a geographical location via its latitude and longitude. To harness the potential that metadata have for discovering scientific data, it is crucial that they be represented in a uniform way that can be queried effectively. One step toward uniformly-represented metadata is to normalize the multiple, distinct field names used in metadata (e.g., lat lon, lat and long) to describe the same type of value. To that end, we present a new method based on clustering and embeddings (i.e., vector representations of words) to align metadata field names with ontology terms. We apply our method to biomedical metadata by generating embeddings for terms in biomedical ontologies from the BioPortal repository. We carried out a comparative study between our method and the NCBO Annotator, which revealed that our method yields more and substantially better alignments between metadata and ontology terms.

Submitted 16 May, 2019; v1 submitted 19 March, 2019; originally announced March 2019.

arXiv:1902.08251 [pdf]

doi 10.1145/3308560.3317707

WebProtégé: A Cloud-Based Ontology Editor

Authors: Matthew Horridge, Rafael S. Gonçalves, Csongor I. Nyulas, Tania Tudorache, Mark A. Musen

Abstract: We present WebProtégé, a tool to develop ontologies represented in the Web Ontology Language (OWL). WebProtégé is a cloud-based application that allows users to collaboratively edit OWL ontologies, and it is available for use at https://webprotege.stanford.edu. WebProtégé currently hosts more than 68,000 OWL ontology projects and has over 50,000 user accounts. In this paper, we detail the main new features of the latest version of WebProtégé.

Submitted 5 March, 2019; v1 submitted 21 February, 2019; originally announced February 2019.

arXiv:1810.05962 [pdf, ps, other]

Empirical determination of the optimum attack for fragmentation of modular networks

Authors: Carolina de Abreu, Bruno Requião da Cunha, Sebastián Gonçalves

Abstract: All possible removals of $n=5$ nodes from networks of size $N=100$ are performed in order to find the optimal set of nodes which fragments the original network into the smallest largest connected component. The resulting attacks are ordered according to the size of the largest connected component and compared with the state-of-the-art methods of network attacks. We chose attacks of size $5$ on relatively small networks of size $100$ because the number of all possible attacks, $\binom{100}{5} \approx 10^8$, is at the limit of what can be computed with standard available computers. Besides, we applied the procedure to a series of networks with controlled and varied modularity, comparing the resulting statistics with the effect of removing the same number of vertices according to the most efficient known disruption strategies, i.e., High Betweenness Adaptive attack (HBA), Collective Index attack (CI), and Modular Based Attack (MBA). Results show that modularity has an inverse relation with robustness, with $Q_c \approx 0.7$ being the critical value. For modularities lower than $Q_c$, all heuristic methods give mostly the same results as random attacks, while for bigger $Q$, networks are less robust and highly vulnerable to malicious attacks.

Submitted 13 October, 2018; originally announced October 2018.

Comments: 14 pages, 6 figures
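
The exhaustive search described in the abstract above can be sketched in a few lines. This is a minimal illustration on a toy graph, not the authors' code: the function names `largest_component_size` and `optimal_attack`, the adjacency-dictionary layout, and the seven-node example are all assumptions made here. The sketch also checks the size of the search space quoted in the abstract.

```python
from itertools import combinations
from math import comb


def largest_component_size(adj, removed):
    """Size of the largest connected component after deleting `removed`."""
    seen = set(removed)  # removed nodes are treated as already visited
    best = 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:  # iterative depth-first search over one component
            node = stack.pop()
            size += 1
            for neighbour in adj[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        best = max(best, size)
    return best


def optimal_attack(adj, n_remove):
    """Try every set of `n_remove` nodes; keep the most fragmenting one."""
    best_set, best_size = None, len(adj) + 1
    for candidate in combinations(sorted(adj), n_remove):
        size = largest_component_size(adj, candidate)
        if size < best_size:
            best_set, best_size = candidate, size
    return best_set, best_size


# Toy modular network: two triangles joined through bridge node 3.
toy = {
    0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1},
    3: {0, 4},
    4: {3, 5, 6}, 5: {4, 6}, 6: {4, 5},
}
attack, remaining = optimal_attack(toy, 1)

# Number of candidate attacks for n = 5 and N = 100, as in the abstract.
search_space = comb(100, 5)
```

On the toy graph the best single-node attack removes the bridge node 3, leaving components of size 3. For $N=100$ and $n=5$ the loop would visit 75,287,520 candidate sets, which matches the order of magnitude $10^8$ given in the abstract and explains why larger attacks are out of reach for this approach.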

arXiv:1808.06907 [pdf]

doi 10.1038/sdata.2019.21

The variable quality of metadata about biological samples used in biomedical experiments

Authors: Rafael S. Gonçalves, Mark A. Musen

Abstract: We present an analytical study of the quality of metadata about samples used in biomedical experiments. The metadata under analysis are stored in two well-known databases: BioSample---a repository managed by the National Center for Biotechnology Information (NCBI), and BioSamples---a repository managed by the European Bioinformatics Institute (EBI). We tested whether 11.4M sample metadata records in the two repositories are populated with values that fulfill the stated requirements for such values. Our study revealed multiple anomalies in the metadata. Most metadata field names and their values are not standardized or controlled. Even simple binary or numeric fields are often populated with inadequate values of different data types. By clustering metadata field names, we discovered there are often many distinct ways to represent the same aspect of a sample. Overall, the metadata we analyzed reveal that there is a lack of principled mechanisms to enforce and validate metadata requirements. The significant aberrancies that we found in the metadata are likely to impede search and secondary use of the associated datasets.

Submitted 18 January, 2019; v1 submitted 17 August, 2018; originally announced August 2018.

Comments: arXiv admin note: text overlap with arXiv:1708.01286

arXiv:1803.00985 [pdf, other]

Hybrid Model For Word Prediction Using Naive Bayes and Latent Information

Authors: Henrique X. Goulart, Mauro D. L. Tosi, Daniel Soares Gonçalves, Rodrigo F. Maia, Guilherme A. Wachs-Lopes

Abstract: Historically, the Natural Language Processing area has been given a great deal of attention by many researchers. One of the main motivations behind this interest is related to the word prediction problem, which states that given a set of words in a sentence, one can recommend the next word. In the literature, this problem is solved by methods based on syntactic or semantic analysis. On its own, neither of these analyses can achieve practical results for end-user applications. For instance, Latent Semantic Analysis can handle semantic features of text, but cannot suggest words considering syntactical rules. On the other hand, there are models that treat both methods together and achieve state-of-the-art results, e.g. Deep Learning. These models can demand high computational effort, which can make them infeasible for certain types of applications. With the advance of technology and mathematical models, it is possible to develop faster systems with more accuracy. This work proposes a hybrid word suggestion model, based on Naive Bayes and Latent Semantic Analysis, considering neighbouring words around unfilled gaps. Results show that this model could achieve 44.2% accuracy in the MSR Sentence Completion Challenge.

Submitted 2 March, 2018; originally announced March 2018.

arXiv:1708.01286 [pdf]

Metadata in the BioSample Online Repository are Impaired by Numerous Anomalies

Authors: Rafael S. Gonçalves, Martin J. O'Connor, Marcos Martínez-Romero, John Graybeal, Mark A. Musen

Abstract: The metadata about scientific experiments are crucial for finding, reproducing, and reusing the data that the metadata describe. We present a study of the quality of the metadata stored in BioSample--a repository of metadata about samples used in biomedical experiments managed by the U.S. National Center for Biotechnology Information (NCBI). We tested whether 6.6 million BioSample metadata records are populated with values that fulfill the stated requirements for such values. Our study revealed multiple anomalies in the analyzed metadata. The BioSample metadata field names and their values are not standardized or controlled--15% of the metadata fields use field names not specified in the BioSample data dictionary. Only 9 out of 452 BioSample-specified fields ordinarily require ontology terms as values, and the quality of these controlled fields is better than that of uncontrolled ones, as even simple binary or numeric fields are often populated with inadequate values of different data types (e.g., only 27% of Boolean values are valid). Overall, the metadata in BioSample reveal that there is a lack of principled mechanisms to enforce and validate metadata requirements. The aberrancies in the metadata are likely to impede search and secondary use of the associated datasets.

Submitted 3 August, 2017; originally announced August 2017.

arXiv:1608.02619 [pdf, other]

doi 10.1093/comnet/cnx015

Performance of attack strategies on modular networks

Authors: Bruno Requião da Cunha, Sebastián Gonçalves

Abstract: Vulnerabilities of complex networks have recently become a trending topic in complex systems due to their real-world applications. Most real networks tend to be very fragile to high betweenness adaptive attacks. However, recent contributions have shown the importance of interconnected nodes in the integrity of networks, and module-based attacks have appeared promising when compared to traditional malicious non-adaptive attacks. In the present work we deeply explore the trade-off associated with attack procedures, introducing a generalized robustness measure and presenting an attack performance index that takes into account both the robustness of the network against the attack and the run-time needed to obtain the list of targeted nodes for the attack. Besides, we introduce the concept of a deactivation point, which marks the point at which the network stops functioning properly. We then show empirically that non-adaptive module-based attacks perform better than high degree and betweenness adaptive attacks in networks with well-defined community structures and consequent high modularity.

Submitted 8 August, 2016; originally announced August 2016.

Comments: 14 pages, 4 figures, pre-print

arXiv:1504.06177 [pdf, ps, other]

State of the Art of the Intra-Task Dynamic Voltage and Frequency Scaling Technique

Authors: Rawlinson S. Gonçalves, Raimundo da Silva Barreto

Abstract: In recent years there has been an increasing use of embedded systems because of advances in technology, the reduction of the costs of electronic equipment and mainly the popularity of mobile devices. Many of these systems implement low power consumption policies to extend their autonomy, usually because they have a reduced amount of resources and the great majority of them use electric power from batteries. One way to minimize the power consumption of these devices is through the application of low power consumption techniques. Among the various techniques presented in the literature, intra-task Dynamic Voltage and Frequency Scaling (DVFS) has played an important role. The main aim of DVFS is to allow each task to manage the minimum resources necessary for task execution, thereby reducing the processor power consumption and, at the same time, respecting the task deadlines when considered in a real-time system context. Therefore, this paper aims to apply a systematic literature review with the goal of identifying and presenting the main methods using the intra-task DVFS technique, applied in the context of real-time systems to reduce energy consumption on the processor. Finally, this work will show the advantages and disadvantages of each cataloged methodology.

Submitted 23 April, 2015; originally announced April 2015.

Comments: 94 pages, in Portuguese

arXiv:1502.00353 [pdf, other]

doi 10.1371/journal.pone.0142824

Complex networks vulnerability to module-based attacks

Authors: Bruno Requião da Cunha, Juan Carlos González-Avella, Sebastián Gonçalves

Abstract: In the multidisciplinary field of Network Science, optimization of procedures for efficiently breaking complex networks is attracting much attention from practical points of view. In this contribution we present a module-based method to efficiently break complex networks. The procedure first identifies the communities in which the network can be represented, then it deletes the nodes (edges) that connect different modules in the order of their betweenness centrality ranking. We illustrate the method by applying it to various well-known examples of social, infrastructure, and biological networks. We show that the proposed method always outperforms vertex (edge) attacks which are based on the ranking of node (edge) degree or centrality, with a huge gain in efficiency for some examples. Remarkably, for the US power grid, the present method breaks the original network of 4941 nodes into many fragments smaller than 197 nodes (4% of the original size) by removing a mere 164 nodes (~3%) identified by the procedure. By comparison, any degree or centrality based procedure, deleting the same number of nodes, removes only 22% of the original network, i.e. more than 3800 nodes continue to be connected after that.

Submitted 1 February, 2015; originally announced February 2015.

Comments: 8 pages, 8 figures

arXiv:1208.2609 [pdf, ps, other]

doi 10.1371/journal.pone.0049009

Epidemics scenarios in the "Romantic network"

Authors: Alexsandro M. Carvalho, Sebastian Goncalves

Abstract: The structure of sexual contacts, its contact network and its temporal interactions, plays an important role in the spread of sexually transmitted infections. Unfortunately, that kind of data is very hard to obtain. One of the few exceptions is the "Romantic network" which is a complete structure of a real sexual network of a high school. In terms of topology, unlike other sexual networks classified as scale-free network. Regarding the temporal structure, several studies indicate that relationship timing can have effects on diffusion through networks, as relationship order determines transmission routes. With the aim of checking whether the particular structure, static and dynamic, of the Romantic network is determinant for the propagation of an STI in it, we perform simulations in two scenarios: the static network where all contacts are available and the dynamic case where contacts evolve in time. In the static case, we compare the epidemic results in the Romantic network with some paradigmatic topologies. We further study the behavior of the epidemic on the Romantic network in response to the effect of any individual, belonging to the network, having a contact with an external infected subject, the influence of the degree of the initial infected, and the effect of the variability of contacts per unit time. We also consider the dynamics of formation of pairs in and we study the propagation of the diseases in this dynamic scenario.
Our results suggest that while the Romantic network cannot be labeled as a Watts-Strogatz network, it is, regarding the propagation of an STI, very close to one with high disorder. Our simulations confirm that relationship timing affects, but strongly lowering, the final outbreak size. Besides, they show a clear correlation between the average degree and the outbreak size over time.

Submitted 9 August, 2012; originally announced August 2012.

Comments: 9 pages text, plus references, and 10 figures (with subfigures) Epidemic simulations on a small real network

MSC Class: 81T80; 91D30; 92D25

arXiv:1201.1572 [pdf, ps, other]

doi 10.1103/PhysRevE.85.056103

A dynamical model for competing opinions

Authors: S. R. Souza, S. Goncalves

Abstract: We propose an opinion model based on agents located at the vertices of a regular lattice. Each agent has an independent opinion (among an arbitrary, but fixed, number of choices) and its own degree of conviction. The latter changes every time it interacts with another agent who has a different opinion. The dynamics leads to size distributions of clusters (made up of agents which have the same opinion and are located at contiguous spatial positions) which follow a power law, as long as the range of the interaction between the agents is not too short, i.e. the system self-organizes into a critical state. Short range interactions lead to an exponential cutoff in the size distribution and to spatial correlations which cause agents which have the same opinion to be closely grouped. When the diversity of opinions is restricted to two, non-consensus dynamics is observed, with unequal population fractions, whereas consensus is reached if the agents are also allowed to interact with those which are located far from them.

Submitted 7 January, 2012; originally announced January 2012.

Journal ref: Physical Review E 85, 056103 (2012)

Showing 1–22 of 22 results for author: Goncalves, S