Search | arXiv e-print repository

Refinement of an Epilepsy Dictionary through Human Annotation of Health-related posts on Instagram

Authors: Aehong Min, Xuan Wang, Rion Brattig Correia, Jordan Rozum, Wendy R. Miller, Luis M. Rocha

Abstract: We used a dictionary built from biomedical terminology extracted from various sources such as DrugBank, MedDRA, MedlinePlus, TCMGeneDIT, to tag more than 8 million Instagram posts by users who have mentioned an epilepsy-relevant drug at least once, between 2010 and early 2016. A random sample of 1,771 posts with 2,947 term matches was evaluated by human annotators to identify false-positives. OpenAI's GPT series models were compared against human annotation. Frequent terms with a high false-positive rate were removed from the dictionary. Analysis of the estimated false-positive rates of the annotated terms revealed 8 ambiguous terms (plus synonyms) used in Instagram posts, which were removed from the original dictionary. To study the effect of removing those terms, we constructed knowledge networks using the refined and the original dictionaries and performed an eigenvector-centrality analysis on both networks. We show that the refined dictionary thus produced leads to a significantly different rank of important terms, as measured by their eigenvector-centrality of the knowledge networks. Furthermore, the most important terms obtained after refinement are of greater medical relevance. In addition, we show that OpenAI's GPT series models fare worse than human annotators in this task.

Submitted 14 May, 2024; originally announced May 2024.

arXiv:2405.07072 [pdf, other]

Selecting focused digital cohorts from social media using the metric backbone of biomedical knowledge graphs

Authors: Ziqi Guo, Jack Felag, Jordan C. Rozum, Rion Brattig Correia, Luis M. Rocha

Abstract: The abundance of social media data allows researchers to construct large digital cohorts to study the interplay between human behavior and medical treatment. Identifying the users most relevant to a specific health problem is, however, a challenge in that social media sites vary in the generality of their discourse. While X (formerly Twitter), Instagram, and Facebook cater to wide ranging topics, Reddit subgroups and dedicated patient advocacy forums trade in much more specific, biomedically-relevant discourse. To hone in on relevant users anywhere, we have developed a general framework and applied it to epilepsy discourse in social media as a test case. We analyzed the text from posts by users who mention epilepsy drugs in the general-purpose social media sites X and Instagram, the epilepsy-focused Reddit subgroup (r/Epilepsy), and the Epilepsy Foundation of America (EFA) forums. We curated a medical terms dictionary and used it to generate a knowledge graph (KG) for each online community. For each KG, we computed the metric backbone--the smallest subgraph that preserves all shortest paths in the network. By comparing the subset of users who contribute to the backbone to the subset who do not, we found that epilepsy-focused social media users contribute to the KG backbone in much higher proportion than do general-purpose social media users. Furthermore, using human annotation of Instagram posts, we demonstrated that users who do not contribute to the backbone are more than twice as likely to use dictionary terms in a manner inconsistent with their biomedical meaning. For biomedical research applications, our backbone-based approach thus has several benefits over simple engagement-based approaches: It can retain low-engagement users who nonetheless contribute meaningful biomedical insights. It can filter out very vocal users who contribute no relevant content.

Submitted 11 May, 2024; originally announced May 2024.

arXiv:2405.05229 [pdf, other]

myAURA: Personalized health library for epilepsy management via knowledge graph sparsification and visualization

Authors: Rion Brattig Correia, Jordan C. Rozum, Leonard Cross, Jack Felag, Michael Gallant, Ziqi Guo, Bruce W. Herr II, Aehong Min, Deborah Stungis Rocha, Xuan Wang, Katy Börner, Wendy Miller, Luis M. Rocha

Abstract: Objective: We report the development of the patient-centered myAURA application and suite of methods designed to aid epilepsy patients, caregivers, and researchers in making decisions about care and self-management. Materials and Methods: myAURA rests on the federation of an unprecedented collection of heterogeneous data resources relevant to epilepsy, such as biomedical databases, social media, and electronic health records. A generalizable, open-source methodology was developed to compute a multi-layer knowledge graph linking all this heterogeneous data via the terms of a human-centered biomedical dictionary. Results: The power of the approach is first exemplified in the study of the drug-drug interaction phenomenon. Furthermore, we employ a novel network sparsification methodology using the metric backbone of weighted graphs, which reveals the most important edges for inference, recommendation, and visualization, such as pharmacology factors patients discuss on social media. The network sparsification approach also allows us to extract focused digital cohorts from social media whose discourse is more relevant to epilepsy or other biomedical problems. Finally, we present our patient-centered design and pilot-testing of myAURA, including its user interface, based on focus groups and other stakeholder input. Discussion: The ability to search and explore myAURA's heterogeneous data sources via a sparsified multi-layer knowledge graph, as well as the combination of those layers in a single map, are useful features for integrating relevant information for epilepsy. Conclusion: Our stakeholder-driven, scalable approach to integrate traditional and non-traditional data sources, enables biomedical discovery and data-powered patient self-management in epilepsy, and is generalizable to other chronic conditions.

Submitted 10 May, 2024; v1 submitted 8 May, 2024; originally announced May 2024.

arXiv:2403.12705 [pdf, other]

The ultrametric backbone is the union of all minimum spanning forests

Authors: Jordan C Rozum, Luis M Rocha

Abstract: Minimum spanning trees and forests are powerful sparsification techniques that remove cycles from weighted graphs to minimize total edge weight while preserving node connectivity. They have applications in computer science, network science, and graph theory. Despite their utility and ubiquity, they have several limitations, including that they are only defined for undirected networks, they significantly alter dynamics on networks, and they do not generally preserve important network features such as shortest distances, shortest path distribution, and community structure. In contrast, distance backbones, which are subgraphs formed by all edges that obey a generalized triangle inequality, are well defined in both directed and undirected graphs and preserve those and other important network features. The backbone of a graph is defined with respect to a specified path-length operator that aggregates weights along a path to define its length, thereby associating a cost to indirect connections. The backbone is the union of all shortest paths between each pair of nodes according to the specified operator. One such operator, the max function, computes the length of a path as the largest weight of the edges that compose it (a weakest link criterion). It is the only operator that yields an algebraic structure for computing shortest paths that is consistent with De Morgan's laws. Applying this operator yields the ultrametric backbone of a graph in that (semi-triangular) edges whose weights are larger than the length of an indirect path connecting the same nodes (i.e., those that break the generalized triangle inequality based on max as a path-length operator) are removed. We show that the ultrametric backbone is the union of all minimum spanning forests in undirected graphs and provides a new generalization of minimum spanning trees to directed graphs.

Submitted 22 March, 2024; v1 submitted 19 March, 2024; originally announced March 2024.

Comments: 10 pages, 1 figure. Revision corrects typo in abstract
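
Illustrative note on the entry above: the removal rule stated in the abstract (drop every edge whose weight is larger than the max-based length of some indirect path between the same nodes) can be sketched in a few lines. This is a minimal sketch, not the authors' implementation. The function name `ultrametric_backbone`, the edge-dictionary input format, and the use of a Floyd-Warshall-style loop are assumptions made here for illustration, and weights are assumed to be non-negative.

```python
def ultrametric_backbone(nodes, edges):
    """Return the edges of an undirected graph that survive the max-operator rule.

    nodes: list of hashable node labels (hypothetical input format).
    edges: dict mapping (u, v) tuples to non-negative weights.
    """
    inf = float("inf")
    # dist[u][v] holds the smallest "largest edge weight" over all u-v paths found so far.
    dist = {u: {v: (0 if u == v else inf) for v in nodes} for u in nodes}
    for (u, v), w in edges.items():
        dist[u][v] = min(dist[u][v], w)
        dist[v][u] = min(dist[v][u], w)

    # Floyd-Warshall-style closure, with max replacing addition as the path-length operator.
    for k in nodes:
        for i in nodes:
            for j in nodes:
                via_k = max(dist[i][k], dist[k][j])
                if via_k < dist[i][j]:
                    dist[i][j] = via_k

    # An edge is kept unless some indirect path is strictly shorter under max.
    return {(u, v): w for (u, v), w in edges.items() if w <= dist[u][v]}


# Edge (a, c) has weight 3, but the path a-b-c has max-length 2, so it is removed.
unique_mst = {("a", "b"): 1, ("b", "c"): 2, ("a", "c"): 3}
print(ultrametric_backbone(["a", "b", "c"], unique_mst))

# With a tie (both 2), either of (b, c) or (a, c) completes a minimum spanning tree,
# so both are kept, in line with the "union of all minimum spanning forests" statement.
tied = {("a", "b"): 1, ("b", "c"): 2, ("a", "c"): 2}
print(ultrametric_backbone(["a", "b", "c"], tied))
```

The cubic-time closure is chosen only for brevity; it is practical for small graphs, not for the large networks discussed in these papers.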

arXiv:2402.06297 [pdf, other]

Dynamic Q-planning for Online UAV Path Planning in Unknown and Complex Environments

Authors: Lidia Gianne Souza da Rocha, Kenny Anderson Queiroz Caldas, Marco Henrique Terra, Fabio Ramos, Kelen Cristiane Teixeira Vivaldini

Abstract: Unmanned Aerial Vehicles need an online path planning capability to move in high-risk missions in unknown and complex environments to complete them safely. However, many algorithms reported in the literature may not return reliable trajectories to solve online problems in these scenarios. The Q-Learning algorithm, a Reinforcement Learning Technique, can generate trajectories in real-time and has demonstrated fast and reliable results. This technique, however, has the disadvantage of defining the iteration number. If this value is not well defined, it will take a long time or not return an optimal trajectory. Therefore, we propose a method to dynamically choose the number of iterations to obtain the best performance of Q-Learning. The proposed method is compared to the Q-Learning algorithm with a fixed number of iterations, A*, Rapid-Exploring Random Tree, and Particle Swarm Optimization. As a result, the proposed Q-learning algorithm demonstrates the efficacy and reliability of online path planning with a dynamic number of iterations to carry out online missions in unknown and complex environments.

Submitted 9 February, 2024; originally announced February 2024.

arXiv:2311.14817 [pdf, other]

Semi-metric topology characterizes epidemic spreading on complex networks

Authors: David Soriano Paños, Felipe Xavier Costa, Luis M. Rocha

Abstract: Network sparsification represents an essential tool to extract the core of interactions sustaining both networks dynamics and their connectedness. In the case of infectious diseases, network sparsification methods remove irrelevant connections to unveil the primary subgraph driving the unfolding of epidemic outbreaks in real networks. In this paper, we explore the features determining whether the metric backbone, a subgraph capturing the structure of shortest paths across a network, allows reconstructing epidemic outbreaks. We find that both the relative size of the metric backbone, capturing the fraction of edges kept in such structure, and the distortion of semi-metric edges, quantifying how far those edges not included in the metric backbone are from their associated shortest path, shape the retrieval of Susceptible-Infected (SI) dynamics. We propose a new method to progressively dismantle networks relying on the semi-metric edge distortion, removing first those connections farther from those included in the metric backbone, i.e. those with highest semi-metric distortion values. We apply our method in both synthetic and real networks, finding that semi-metric distortion provides solid ground to preserve spreading dynamics and connectedness while sparsifying networks.

Submitted 24 November, 2023; originally announced November 2023.

Comments: 11 pages, 4 figures. Supplementary Text: 6 pages, 1 table, 5 figures

arXiv:2310.14379 [pdf, ps, other]

Offline Metrics for Evaluating Explanation Goals in Recommender Systems

Authors: André Levi Zanon, Marcelo Garcia Manzato, Leonardo Rocha

Abstract: Explanations are crucial for improving users' transparency, persuasiveness, engagement, and trust in Recommender Systems (RSs). However, evaluating the effectiveness of explanation algorithms regarding those goals remains challenging due to existing offline metrics' limitations. This paper introduces new metrics for the evaluation and validation of explanation algorithms based on the items and properties used to form the sentence of an explanation. Towards validating the metrics, the results of three state-of-the-art post-hoc explanation algorithms were evaluated for six RSs, comparing the offline metrics results with those of an online user study. The findings show the proposed offline metrics can effectively measure the performance of explanation algorithms and highlight a trade-off between the goals of transparency and trust, which are related to popular properties, and the goals of engagement and persuasiveness, which are associated with the diversification of properties displayed to users. Furthermore, the study contributes to the development of more robust evaluation methods for explanation algorithms in RSs.

Submitted 22 October, 2023; originally announced October 2023.

arXiv:2310.03491 [pdf, other]

TPDR: A Novel Two-Step Transformer-based Product and Class Description Match and Retrieval Method

Authors: Washington Cunha, Celso França, Leonardo Rocha, Marcos André Gonçalves

Abstract: There is a niche of companies responsible for intermediating the purchase of large batches of varied products for other companies, for which the main challenge is to perform product description standardization, i.e., matching an item described by a client with a product described in a catalog. The problem is complex since the client's product description may be: (1) potentially noisy; (2) short and uninformative (e.g., missing information about model and size); and (3) cross-language. In this paper, we formalize this problem as a ranking task: given an initial client product specification (query), return the most appropriate standardized descriptions (response). In this paper, we propose TPDR, a two-step Transformer-based Product and Class Description Retrieval method that is able to explore the semantic correspondence between IS and SD, by exploiting attention mechanisms and contrastive learning. First, TPDR employs the transformers as two encoders sharing the embedding vector space: one for encoding the IS and another for the SD, in which corresponding pairs (IS, SD) must be close in the vector space. Closeness is further enforced by a contrastive learning mechanism leveraging a specialized loss function. TPDR also exploits a (second) re-ranking step based on syntactic features that are very important for the exact matching (model, dimension) of certain products that may have been neglected by the transformers. To evaluate our proposal, we consider 11 datasets from a real company, covering different application contexts. Our solution was able to retrieve the correct standardized product before the 5th ranking position in 71% of the cases and its correct category in the first position in 80% of the situations. Moreover, the effectiveness gains over purely syntactic or semantic baselines reach up to 3.7 times, solving cases that none of the approaches in isolation can do by themselves.

Submitted 5 October, 2023; originally announced October 2023.

Comments: 10 pages, 8 figures, 5 tables

arXiv:2307.15614 [pdf, other]

Fast but multi-partisan: Bursts of communication increase opinion diversity in the temporal Deffuant model

Authors: Fatemeh Zarei, Yerali Gandica, Luis Enrique Correa Rocha

Abstract: Human interactions create social networks forming the backbone of societies. Individuals adjust their opinions by exchanging information through social interactions. Two recurrent questions are whether social structures promote opinion polarisation or consensus in societies and whether polarisation can be avoided, particularly on social media. In this paper, we hypothesise that not only network structure but also the timings of social interactions regulate the emergence of opinion clusters. We devise a temporal version of the Deffuant opinion model where pairwise interactions follow temporal patterns and show that burstiness alone is sufficient to refrain from consensus and polarisation by promoting the reinforcement of local opinions. Individuals self-organise into a multi-partisan society due to network clustering, but the diversity of opinion clusters further increases with burstiness, particularly when individuals have low tolerance and prefer to adjust to similar peers. The emergent opinion landscape is well-balanced regarding clusters' size, with a small fraction of individuals converging to extreme opinions. We thus argue that polarisation is more likely to emerge in social media than offline social networks because of the relatively low social clustering observed online. Counter-intuitively, strengthening online social networks by increasing social redundancy may be a venue to reduce polarisation and promote opinion diversity.

Submitted 28 July, 2023; originally announced July 2023.

Comments: 9 pages, 6 figures. Comments (e.g. missing references, suggestions, ...) are welcomed

arXiv:2303.16361 [pdf, other]

Dynamical Modularity in Automata Models of Biochemical Networks

Authors: Thomas Parmer, Luis M. Rocha

Abstract: Given the large size and complexity of most biochemical regulation and signaling networks, there is a non-trivial relationship between the micro-level logic of component interactions and the observed macro-dynamics. Here we address this issue by formalizing the existing concept of pathway modules, which are sequences of state updates that are guaranteed to occur (barring outside interference) in the dynamics of automata networks after the perturbation of a subset of driver nodes. We present a novel algorithm to automatically extract pathway modules from networks and we characterize the interactions that may take place between modules. This methodology uses only the causal logic of individual node variables (micro-dynamics) without the need to compute the dynamical landscape of the networks (macro-dynamics). Specifically, we identify complex modules, which maximize pathway length and require synergy between their components. This allows us to propose a new take on dynamical modularity that partitions complex networks into causal pathways of variables that are guaranteed to transition to specific states given a perturbation to a set of driver nodes. Thus, the same node variable can take part in distinct modules depending on the state it takes. Our measure of dynamical modularity of a network is then inversely proportional to the overlap among complex modules and maximal when complex modules are completely decouplable from one another in the network dynamics. We estimate dynamical modularity for several genetic regulatory networks, including the Drosophila melanogaster segment-polarity network. We discuss how identifying complex modules and the dynamical modularity portrait of networks explains the macro-dynamics of biological networks, such as uncovering the (more or less) decouplable building blocks of emergent computation (or collective behavior) in biochemical regulation and signaling.

Submitted 17 April, 2023; v1 submitted 28 March, 2023; originally announced March 2023.

Comments: 42 pages, 7 figures; updated author information

arXiv:2303.16098 [pdf, other]

Carolina: a General Corpus of Contemporary Brazilian Portuguese with Provenance, Typology and Versioning Information

Authors: Maria Clara Ramos Morales Crespo, Maria Lina de Souza Jeannine Rocha, Mariana Lourenço Sturzeneker, Felipe Ribas Serras, Guilherme Lamartine de Mello, Aline Silva Costa, Mayara Feliciano Palma, Renata Morais Mesquita, Raquel de Paula Guets, Mariana Marques da Silva, Marcelo Finger, Maria Clara Paixão de Sousa, Cristiane Namiuti, Vanessa Martins do Monte

Abstract: This paper presents the first publicly available version of the Carolina Corpus and discusses its future directions. Carolina is a large open corpus of Brazilian Portuguese texts under construction using web-as-corpus methodology enhanced with provenance, typology, versioning, and text integrality. The corpus aims at being used both as a reliable source for research in Linguistics and as an important resource for Computer Science research on language models, contributing towards removing Portuguese from the set of low-resource languages. Here we present the construction of the corpus methodology, comparing it with other existing methodologies, as well as the corpus current state: Carolina's first public version has $653,322,577$ tokens, distributed over $7$ broad types. Each text is annotated with several different metadata categories in its header, which we developed using TEI annotation standards. We also present ongoing derivative works and invite NLP researchers to contribute with their own.

Submitted 28 March, 2023; originally announced March 2023.

Comments: 14 pages, 3 figures, 1 appendix

MSC Class: 68T50 ACM Class: I.2.7

arXiv:2209.01181 [pdf, other]

doi 10.1007/978-3-031-21131-7_11

The distance backbone of directed networks

Authors: Felipe Xavier Costa, Rion Brattig Correia, Luis M. Rocha

Abstract: In weighted graphs the shortest path between two nodes is often reached through an indirect path, out of all possible connections, leading to structural redundancies which play key roles in the dynamics and evolution of complex networks. We have previously developed a parameter-free, algebraically-principled methodology to uncover such redundancy and reveal the distance backbone of weighted graphs, which has been shown to be important in transmission dynamics, inference of important paths, and quantifying the robustness of networks. However, the method was developed for undirected graphs. Here we expand this methodology to weighted directed graphs and study the redundancy and robustness found in nine networks ranging from social, biomedical, and technical systems. We found that similarly to undirected graphs, directed graphs in general also contain a large amount of redundancy, as measured by the size of their (directed) distance backbone. Our methodology adds an additional tool to the principled sparsification of complex networks and the measure of their robustness.

Submitted 2 September, 2022; originally announced September 2022.

Comments: Accepted at the 11th International Conference on Complex Networks and their Applications

arXiv:2208.04358 [pdf, other]

doi 10.1109/TVCG.2022.3209477

LargeNetVis: Visual Exploration of Large Temporal Networks Based on Community Taxonomies

Authors: Claudio D. G. Linhares, Jean R. Ponciano, Diogenes S. Pedro, Luis E. C. Rocha, Agma J. M. Traina, Jorge Poco

Abstract: Temporal (or time-evolving) networks are commonly used to model complex systems and the evolution of their components throughout time. Although these networks can be analyzed by different means, visual analytics stands out as an effective way for a pre-analysis before doing quantitative/statistical analyses to identify patterns, anomalies, and other behaviors in the data, thus leading to new insights and better decision-making. However, the large number of nodes, edges, and/or timestamps in many real-world networks may lead to polluted layouts that make the analysis inefficient or even infeasible. In this paper, we propose LargeNetVis, a web-based visual analytics system designed to assist in analyzing small and large temporal networks. It successfully achieves this goal by leveraging three taxonomies focused on network communities to guide the visual exploration process. The system is composed of four interactive visual components: the first (Taxonomy Matrix) presents a summary of the network characteristics, the second (Global View) gives an overview of the network evolution, the third (a node-link diagram) enables community- and node-level structural analysis, and the fourth (a Temporal Activity Map -- TAM) shows the community- and node-level activity under a temporal perspective.

Submitted 8 August, 2022; originally announced August 2022.

Comments: 11 pages, 9 figures

Journal ref: IEEE Transactions on Visualization and Computer Graphics, 2022

arXiv:2207.10924 [pdf, other]

Evolution of the public opinion on COVID-19 vaccination in Japan

Authors: Yuri Nakayama, Yuka Takedomi, Towa Suda, Takeaki Uno, Takako Hashimoto, Masashi Toyoda, Naoki Yoshinaga, Masaru Kitsuregawa, Luis E. C. Rocha, Ryota Kobayashi

Abstract: Vaccines are promising tools to control the spread of COVID-19. An effective vaccination campaign requires government policies and community engagement, sharing experiences for social support, and voicing concerns to vaccine safety and efficiency. The increasing use of online social platforms allows us to trace large-scale communication and infer public opinion in real-time. We collected more than 100 million vaccine-related tweets posted by 8 million users and used the Latent Dirichlet Allocation model to perform automated topic modeling of tweet texts during the vaccination campaign in Japan. We identified 15 topics grouped into 4 themes on Personal issue, Breaking news, Politics, and Conspiracy and humour. The evolution of the popularity of themes revealed a shift in public opinion, initially sharing the attention over personal issues (individual aspect), collecting information from the news (knowledge acquisition), and government criticisms, towards personal experiences once confidence in the vaccination campaign was established. An interrupted time series regression analysis showed that the Tokyo Olympic Games affected public opinion more than other critical events but not the course of the vaccination. Public opinion on politics was significantly affected by various events, positively shifting the attention in the early stages of the vaccination campaign and negatively later. Tweets about personal issues were mostly retweeted when the vaccination reached the younger population. The associations between the vaccination campaign stages and tweet themes suggest that the public engagement in the social platform contributed to speedup vaccine uptake by reducing anxiety via social learning and support.

Submitted 22 July, 2022; originally announced July 2022.

arXiv:2207.10794 [pdf, other]

doi 10.1186/s12859-023-05394-x

Neuroimaging Feature Extraction using a Neural Network Classifier for Imaging Genetics

Authors: Cédric Beaulac, Sidi Wu, Erin Gibson, Michelle F. Miranda, Jiguo Cao, Leno Rocha, Mirza Faisal Beg, Farouk S. Nathoo

Abstract: A major issue in the association of genes to neuroimaging phenotypes is the high dimension of both genetic data and neuroimaging data. In this article, we tackle the latter problem with an eye toward developing solutions that are relevant for disease prediction. Supported by a vast literature on the predictive power of neural networks, our proposed solution uses neural networks to extract from neuroimaging data features that are relevant for predicting Alzheimer's Disease (AD) for subsequent relation to genetics. Our neuroimaging-genetic pipeline is comprised of image processing, neuroimaging feature extraction and genetic association steps. We propose a neural network classifier for extracting neuroimaging features that are related with disease and a multivariate Bayesian group sparse regression model for genetic association. We compare the predictive power of these features to expert selected features and take a closer look at the SNPs identified with the new neuroimaging features.

Submitted 8 July, 2022; originally announced July 2022.

Comments: Under review

Journal ref: BMC Bioinformatics 24, 271 (2023)

arXiv:2201.07552 [pdf, other]

Small Cohort of Epilepsy Patients Showed Increased Activity on Facebook before Sudden Unexpected Death

Authors: Ian B. Wood, Rion Brattig Correia, Wendy R. Miller, Luis M. Rocha

Abstract: Sudden Unexpected Death in Epilepsy (SUDEP) remains a leading cause of death in people with epilepsy. Despite the constant risk for patients and bereavement to family members, to date the physiological mechanisms of SUDEP remain unknown. Here we explore the potential to identify putative predictive signals of SUDEP from online digital behavioral data using text and sentiment analysis. Specifically, we analyze Facebook timelines of six epilepsy patients deceased due to SUDEP, donated by surviving family members. We find preliminary evidence for behavioral changes detectable by text and sentiment analysis tools. Namely, in the months preceding their SUDEP event patient social media timelines show: i) increase in verbosity; ii) increased use of functional words; and iii) sentiment shifts as measured by different sentiment analysis tools. Combined, these results suggest that social media engagement, as well as its sentiment, may serve as possible early-warning signals for SUDEP in people with epilepsy. While the small sample of patient timelines analyzed in this study prevents generalization, our preliminary investigation demonstrates the potential of social media data as complementary data in larger studies of SUDEP and epilepsy.

Submitted 19 January, 2022; originally announced January 2022.

Comments: Submitted to Epilepsy & Behavior

MSC Class: 62P10 (Primary) 92D50; 68U15; 92D30 (Secondary) ACM Class: J.3; I.5.4

arXiv:2107.13902 [pdf, other]

Developers perception on the severity of test smells: an empirical study

Authors: Denivan Campos, Larissa Rocha, Ivan Machado

Abstract: Unit testing is an essential component of the software development life-cycle. A developer could easily and quickly catch and fix software faults introduced in the source code by creating and running unit tests. Despite their importance, unit tests are subject to bad design or implementation decisions, the so-called test smells. These might decrease software systems quality from various aspects, making it harder to understand, more complex to maintain, and more prone to errors and bugs. Many studies discuss the likely effects of test smells on test code. However, there is a lack of studies that capture developers' perceptions of such issues. This study empirically analyzes how developers perceive the severity of test smells in the test code they develop. Severity refers to the degree to which a test smell may negatively impact the test code. We selected six open-source software projects from GitHub and interviewed their developers to understand whether and how the test smells affected the test code. Although most of the interviewed developers considered the test smells as having a low severity to their code, they indicated that test smells might negatively impact the project, particularly in test code maintainability and evolution. Also, detecting and removing test smells from the test code may be positive for the project.

Submitted 29 July, 2021; originally announced July 2021.

Comments: 14 pages

arXiv:2106.08878 [pdf, other]

doi 10.1007/s40313-021-00828-4

Autonomous Navigation System for a Delivery Drone

Authors: Victor R. F. Miranda, Adriano M. C. Rezende, Thiago L. Rocha, Héctor Azpúrua, Luciano C. A. Pimenta, Gustavo M. Freitas

Abstract: The use of delivery services is an increasing trend worldwide, further enhanced by the COVID pandemic. In this context, drone delivery systems are of great interest as they may allow for faster and cheaper deliveries. This paper presents a navigation system that makes feasible the delivery of parcels with autonomous drones. The system generates a path between a start and a final point and controls the drone to follow this path based on its localization obtained through GPS, 9DoF IMU, and barometer. In the landing phase, information of poses estimated by a marker (ArUco) detection technique using a camera, ultra-wideband (UWB) devices, and the drone's software estimation are merged by utilizing an Extended Kalman Filter algorithm to improve the landing precision. A vector field-based method controls the drone to follow the desired path smoothly, reducing vibrations or harsh movements that could harm the transported parcel. Real experiments validate the delivery strategy and allow to evaluate the performance of the adopted techniques. Preliminary results state the viability of our proposal for autonomous drone delivery.

Submitted 16 June, 2021; originally announced June 2021.

Comments: 12 pages, 15 figures, extended version of a paper published at the XXIII Brazilian Congress of Automatica, entitled "Desenvolvimento de um drone autônomo para tarefas de entrega de carga"

arXiv:2106.06422 [pdf, other]

From Blackboard to the Office: A Look Into How Practitioners Perceive Software Testing Education

Authors: Luana Martins, Vinicius Brito, Daniela Feitosa, Larissa Rocha, Heitor Costa, Ivan Machado

Abstract: The teaching-learning process may require specific pedagogical approaches to establish a relationship with industry practices. Recently, some studies investigated the educators' perspectives and the undergraduate courses curriculum to identify potential weaknesses and solutions for the software testing teaching process. However, it is still unclear how the practitioners evaluate the acquisition of knowledge about software testing in undergraduate courses. This study carried out an expert survey with 68 newly graduated practitioners to determine what the industry expects from them and what they learned in academia. The yielded results indicated that those practitioners learned at a similar rate as others with a long industry experience. Also, they studied less than half of the 35 software testing topics collected in the survey and took industry-backed extracurricular courses to complement their learning. Additionally, our findings point out a set of implications for future research, as the respondents' learning difficulties (e.g., lack of learning sources) and the gap between academic education and industry expectations (e.g., certifications).

Submitted 11 June, 2021; originally announced June 2021.

Comments: Preprint of the manuscript accepted for publication at EASE 2021

arXiv:2105.00500 [pdf, other]

Assessing Exception Handling Testing Practices in Open-Source Libraries

Authors: Luan P. Lima, Lincoln S. Rocha, Carla I. M. Bezerra, Matheus Paixao

Abstract: Modern programming languages (e.g., Java and C#) provide features to separate error-handling code from regular code, seeking to enhance software comprehensibility and maintainability. Nevertheless, the way exception handling (EH) code is structured in such languages may lead to multiple, different, and complex control flows, which may affect the software testability. Previous studies have reported that EH code is typically neglected, not well tested, and its misuse can lead to reliability degradation and catastrophic failures. However, little is known about the relationship between testing practices and EH testing effectiveness. In this exploratory study, we (i) measured the adequacy degree of EH testing concerning code coverage (instruction, branch, and method) criteria; and (ii) evaluated the effectiveness of the EH testing by measuring its capability to detect artificially injected faults (i.e., mutants) using 7 EH mutation operators. Our study was performed using test suites of 27 long-lived Java libraries from open-source ecosystems. Our results show that instructions and branches within $\mathtt{catch}$ blocks and $\mathtt{throw}$ instructions are less covered, with statistical significance than the overall instructions and branches. Nevertheless, most of the studied libraries presented test suites capable of detecting more than 70% of the injected faults. From a total of 12,331 mutants created in this study, the test suites were able to detect 68% of them.

Submitted 2 May, 2021; originally announced May 2021.

Comments: Submitted to Empirical Software Engineering Journal

arXiv:2103.04668 [pdf, other]

doi 10.1093/comnet/cnab021

The distance backbone of complex networks

Authors: Tiago Simas, Rion Brattig Correia, Luis M. Rocha

Abstract: Redundancy needs more precise characterization as it is a major factor in the evolution and robustness of networks of multivariate interactions. We investigate the complexity of such interactions by inferring a connection transitivity that includes all possible measures of path length for weighted graphs. The result, without breaking the graph into smaller components, is a distance backbone subgraph sufficient to compute all shortest paths. This is important for understanding the dynamics of spread and communication phenomena in real-world networks. The general methodology we formally derive yields a principled graph reduction technique and provides a finer characterization of the triangular geometry of all edges -- those that contribute to shortest paths and those that do not but are involved in other network phenomena. We demonstrate that the distance backbone is very small in large networks across domains ranging from air traffic to the human brain connectome, revealing that network robustness to attacks and failures seems to stem from surprisingly vast amounts of redundancy.

Submitted 11 May, 2021; v1 submitted 8 March, 2021; originally announced March 2021.

Comments: To appear in the Journal of Complex Networks

MSC Class: 05C12 (Primary) 05C22; 05C82; 91D30 (Secondary) ACM Class: G.2.2; F.2.2; I.2.4; I.2.1; J.3

Journal ref: Journal of Complex Networks, Volume 9, Issue 6, December 2021, cnab021

arXiv:2003.05613 [pdf, other]

A survey on test practitioners' awareness of test smells

Authors: Nildo Silva Junior, Larissa Rocha, Luana Almeida Martins, Ivan Machado

Abstract: Developing test code may be a time-consuming task that usually requires much effort and cost, especially when it is done manually. Besides, during this process, developers and testers are likely to adopt bad design choices, which may lead to the introduction of the so-called test smells in test code. Test smells are bad solutions to either implement or design test code. As the test code with test smells increases in size, these tests might become more complex, and as a consequence, much harder to understand and evolve correctly. Therefore, test smells may have a negative impact on the quality and maintenance of test code and may also harm the whole software testing activities. In this context, this study aims to understand whether test professionals non-intentionally insert test smells. We carried out an expert survey to analyze the usage frequency of a set of test smells. Sixty professionals from different companies participated in the survey. We selected 14 widely studied smells from the literature, which are also implemented in existing test smell detection tools. The yielded results indicate that experienced professionals introduce test smells during their daily programming tasks, even when they are using standardized practices from their companies, and not only for their personal assumptions. Another relevant evidence was that developers' professional experience cannot be considered as a root-cause for the insertion of test smells in test code.

Submitted 12 March, 2020; originally announced March 2020.

Comments: 14 pages, 2 figures and 3 tables

arXiv:2001.10285 [pdf, other]

doi 10.1146/annurev-biodatasci-030320-040844

Mining social media data for biomedical signals and health-related behavior

Authors: Rion Brattig Correia, Ian B. Wood, Johan Bollen, Luis M. Rocha

Abstract: Social media data has been increasingly used to study biomedical and health-related phenomena. From cohort level discussions of a condition to planetary level analyses of sentiment, social media has provided scientists with unprecedented amounts of data to study human behavior and response associated with a variety of health conditions and medical treatments. Here we review recent work in mining social media for biomedical, epidemiological, and social phenomena information relevant to the multilevel complexity of human health. We pay particular attention to topics where social media data analysis has shown the most progress, including pharmacovigilance, sentiment analysis especially for mental health, and other areas. We also discuss a variety of innovative uses of social media data for health-related applications and important limitations in social media data access and use.

Submitted 28 January, 2020; originally announced January 2020.

Comments: To appear in the Annual Review of Biomedical Data Science

ACM Class: K.4; I.7

Journal ref: Annual Review of Biomedical Data Science, 3:1 (2020)

arXiv:1811.03341 [pdf, other]

Modelling Opinion Dynamics in the Age of Algorithmic Personalisation

Authors: Nicola Perra, Luis E C Rocha

Abstract: Modern technology has drastically changed the way we interact and consume information. For example, online social platforms allow for seamless communication exchanges at an unprecedented scale. However, we are still bounded by cognitive and temporal constraints. Our attention is limited and extremely valuable. Algorithmic personalisation has become a standard approach to tackle the information overload problem. As a result, the exposure to our friends' opinions and our perception about important issues might be distorted. However, the effects of algorithmic gatekeeping on our hyper-connected society are poorly understood. Here, we devise an opinion dynamics model where individuals are connected through a social network and adopt opinions as a function of the view points they are exposed to. We apply various filtering algorithms that select the opinions shown to users i) at random ii) considering time ordering or iii) their current beliefs. Furthermore, we investigate the interplay between such mechanisms and crucial features of real networks. We found that algorithmic filtering might influence opinions' share and distributions, especially in case information is biased towards the current opinion of each user. These effects are reinforced in networks featuring topological and spatial correlations where echo chambers and polarisation emerge. Conversely, heterogeneity in connectivity patterns reduces such tendency. We consider also a scenario where one opinion, through nudging, is centrally pushed to all users. Interestingly, even minimal nudging is able to change the status quo moving it towards the desired view point. Our findings suggest that simple filtering algorithms might be powerful tools to regulate opinion dynamics taking place on social networks.

Submitted 8 November, 2018; originally announced November 2018.

arXiv:1809.04624 [pdf, other]

doi 10.1109/ICIP.2018.8451356

Visual-Quality-Driven Learning for Underwater Vision Enhancement

Authors: Walysson Vital Barbosa, Henrique Grandinetti Barbosa Amaral, Thiago Lages Rocha, Erickson Rangel Nascimento

Abstract: The image processing community has witnessed remarkable advances in enhancing and restoring images. Nevertheless, restoring the visual quality of underwater images remains a great challenge. End-to-end frameworks might fail to enhance the visual quality of underwater images since in several scenarios it is not feasible to provide the ground truth of the scene radiance. In this work, we propose a CNN-based approach that does not require ground truth data since it uses a set of image quality metrics to guide the restoration learning process. The experiments showed that our method improved the visual quality of underwater images preserving their edges and also performed well considering the UCIQE metric.

Submitted 12 September, 2018; originally announced September 2018.

Comments: Accepted for publication and presented in 2018 IEEE International Conference on Image Processing (ICIP)

arXiv:1803.04774 [pdf, other]

doi 10.3389/fphys.2018.01046

CANA: A python package for quantifying control and canalization in Boolean Networks

Authors: Rion Brattig Correia, Alexander J. Gates, Xuan Wang, Luis M. Rocha

Abstract: Logical models offer a simple but powerful means to understand the complex dynamics of biochemical regulation, without the need to estimate kinetic parameters. However, even simple automata components can lead to collective dynamics that are computationally intractable when aggregated into networks. In previous work we demonstrated that automata network models of biochemical regulation are highly canalizing, whereby many variable states and their groupings are redundant (Marques-Pita and Rocha, 2013). The precise charting and measurement of such canalization simplifies these models, making even very large networks amenable to analysis. Moreover, canalization plays an important role in the control, robustness, modularity and criticality of Boolean network dynamics, especially those used to model biochemical regulation (Gates and Rocha, 2016; Gates et al., 2016; Manicka, 2017). Here we describe a new publicly-available Python package that provides the necessary tools to extract, measure, and visualize canalizing redundancy present in Boolean network models. It extracts the pathways most effective in controlling dynamics in these models, including their effective graph and dynamics canalizing map, as well as other tools to uncover minimum sets of control variables.

Submitted 9 May, 2018; v1 submitted 9 March, 2018; originally announced March 2018.

Comments: Submitted to the Systems Biology section of Frontiers in Physiology

MSC Class: 94C (Primary) 93; 92C42 (Secondary) ACM Class: G.4; I.1; J.3

Journal ref: Frontiers in Physiology, 9:1046, 2018

arXiv:1803.03571 [pdf, other]

doi 10.1038/s41746-019-0141-x

City-wide Analysis of Electronic Health Records Reveals Gender and Age Biases in the Administration of Known Drug-Drug Interactions

Authors: Rion Brattig Correia, Luciana P. de Araújo, Mauro M. Mattos, Luis M. Rocha

Abstract: The occurrence of drug-drug-interactions (DDI) from multiple drug dispensations is a serious problem, both for individuals and health-care systems, since patients with complications due to DDI are likely to reenter the system at a costlier level. We present a large-scale longitudinal study (18 months) of the DDI phenomenon at the primary- and secondary-care level using electronic health records (EHR) from the city of Blumenau in Southern Brazil (pop. $\approx 340,000$). We found that 181 distinct drug pairs known to interact were dispensed concomitantly to 12% of the patients in the city's public health-care system. Further, 4% of the patients were dispensed drug pairs that are likely to result in major adverse drug reactions (ADR), with costs estimated to be much larger than previously reported in smaller studies. The large-scale analysis reveals that women have a 60% increased risk of DDI as compared to men; the increase becomes 90% when considering only DDI known to lead to major ADR. Furthermore, DDI risk increases substantially with age; patients aged 70-79 years have a 34% risk of DDI when they are dispensed two or more drugs concomitantly. Interestingly, a statistical null model demonstrates that age- and female-specific risks from increased polypharmacy fail by far to explain the observed DDI risks in those populations, suggesting unknown social or biological causes. We also provide a network visualization of drugs and demographic factors that characterize the DDI phenomenon and demonstrate that accurate DDI prediction can be included in healthcare and public-health management, to reduce DDI-related ADR and costs.

Submitted 2 January, 2020; v1 submitted 9 March, 2018; originally announced March 2018.

MSC Class: J.3; G.3 ACM Class: J.3; G.3

Journal ref: npj Digit. Med. 2, 74 (2019)

arXiv:1708.06877 [pdf, ps, other]

The Reachability of Computer Programs

Authors: Reginaldo I. Silva Filho, Ricardo L. Azevedo da Rocha, Camila Leite Silva, Ricardo H. Gracini Guiraldelli

Abstract: Would it be possible to explain the emergence of new computational ideas using the computation itself? Would it be feasible to describe the discovery process of new algorithmic solutions using only mathematics? This study is the first effort to analyze the nature of such inquiry from the viewpoint of effort to find a new algorithmic solution to a given problem. We define program reachability as a probability function whose argument is a form of the energetic cost (algorithmic entropy) of the problem.

Submitted 22 August, 2017; originally announced August 2017.

ACM Class: E.4

arXiv:1707.03959 [pdf]

Human Sexual Cycles are Driven by Culture and Match Collective Moods

Authors: Ian B. Wood, Pedro Leal Varela, Johan Bollen, Luis M. Rocha, Joana Gonçalves-Sá

Abstract: It is a long-standing question whether human sexual and reproductive cycles are affected predominantly by biology or culture. The literature is mixed with respect to whether biological or cultural factors best explain the reproduction cycle phenomenon, with biological explanations dominating the argument. The biological hypothesis proposes that human reproductive cycles are an adaptation to the seasonal cycles caused by hemisphere positioning, while the cultural hypothesis proposes that conception dates vary mostly due to cultural factors, such as vacation schedule or religious holidays. However, for many countries, common records used to investigate these hypotheses are incomplete or unavailable, biasing existing analysis towards primarily Christian countries in the Northern Hemisphere. Here we show that interest in sex peaks sharply online during major cultural and religious celebrations, regardless of hemisphere location. This online interest, when shifted by nine months, corresponds to documented human birth cycles, even after adjusting for numerous factors such as language, season, and amount of free time due to holidays. We further show that mood, measured independently on Twitter, contains distinct collective emotions associated with those cultural celebrations, and these collective moods correlate with sex search volume outside of these holidays as well. Our results provide converging evidence that the cyclic sexual and reproductive behavior of human populations is mostly driven by culture and that this interest in sex is associated with specific emotions, characteristic of, but not limited to, major cultural and religious celebrations.

Submitted 27 October, 2017; v1 submitted 12 July, 2017; originally announced July 2017.

Comments: Main Paper: 21 pages, 4 figures Supplementary Material: 66 pages, 15 figures, 13 tables

arXiv:1707.02108 [pdf, other]

doi 10.1103/PhysRevE.96.052302

Sampling of Temporal Networks: Methods and Biases

Authors: Luis E C Rocha, Naoki Masuda, Petter Holme

Abstract: Temporal networks have been increasingly used to model a diversity of systems that evolve in time; for example human contact structures over which dynamic processes such as epidemics take place. A fundamental aspect of real-life networks is that they are sampled within temporal and spatial frames. Furthermore, one might wish to subsample networks to reduce their size for better visualization or to perform computationally intensive simulations. The sampling method may affect the network structure and thus caution is necessary to generalize results based on samples. In this paper, we study four sampling strategies applied to a variety of real-life temporal networks. We quantify the biases generated by each sampling strategy on a number of relevant statistics such as link activity, temporal paths and epidemic spread. We find that some biases are common in a variety of networks and statistics, but one strategy, uniform sampling of nodes, shows improved performance in most scenarios. Our results help researchers to better design network data collection protocols and to understand the limitations of sampled temporal network data.

Submitted 7 July, 2017; originally announced July 2017.

Comments: 10 pages, 8 figures, comments welcome

Journal ref: Phys. Rev. E 96, 052302 (2017)

arXiv:1603.04222 [pdf, other]

Multiple seed structure and disconnected networks in respondent-driven sampling

Authors: Jens Malmros, Luis E. C. Rocha

Abstract: Respondent-driven sampling (RDS) is a link-tracing sampling method that is especially suitable for sampling hidden populations. RDS combines an efficient snowball-type sampling scheme with inferential procedures that yield unbiased population estimates under some assumptions about the sampling procedure and population structure. Several seed individuals are typically used to initiate RDS recruitment. However, standard RDS estimation theory assumes that all sampled individuals originate from only one seed. We present an estimator, based on a random walk with teleportation, which accounts for the multiple seed structure of RDS. The new estimator can also be used on populations with disconnected social networks. We numerically evaluate our estimator by simulations on artificial and real networks. Our estimator outperforms previous estimators, especially when the proportion of seeds in the sample is large. We recommend our new estimator to be used in RDS studies, in particular when the number of seeds is large or the social network of the population is disconnected.

Submitted 14 March, 2016; originally announced March 2016.

arXiv:1510.01006 [pdf, other]

Monitoring Potential Drug Interactions and Reactions via Network Analysis of Instagram User Timelines

Authors: Rion Brattig Correia, Lang Li, Luis M. Rocha

Abstract: Much recent research aims to identify evidence for Drug-Drug Interactions (DDI) and Adverse Drug Reactions (ADR) from the biomedical scientific literature. In addition to this "Bibliome", the universe of social media provides a very promising source of large-scale data that can help identify DDI and ADR in ways that have not been hitherto possible. Given the large number of users, analysis of social media data may be useful to identify under-reported, population-level pathology associated with DDI, thus further contributing to improvements in population health. Moreover, tapping into this data allows us to infer drug interactions with natural products--including cannabis--which constitute an array of DDI very poorly explored by biomedical research thus far. Our goal is to determine the potential of Instagram for public health monitoring and surveillance for DDI, ADR, and behavioral pathology at large. Using drug, symptom, and natural product dictionaries for identification of the various types of DDI and ADR evidence, we have collected ~7000 timelines. We report on 1) the development of a monitoring tool to easily observe user-level timelines associated with drug and symptom terms of interest, and 2) population-level behavior via the analysis of co-occurrence networks computed from user timelines at three different scales: monthly, weekly, and daily occurrences. Analysis of these networks further reveals 3) drug and symptom direct and indirect associations with greater support in user timelines, as well as 4) clusters of symptoms and drugs revealed by the collective behavior of the observed population. This demonstrates that Instagram contains much drug- and pathology-specific data for public health monitoring of DDI and ADR, and that complex network analysis provides an important toolbox to extract health-related associations and their support from large-scale social media data.

Submitted 14 January, 2016; v1 submitted 4 October, 2015; originally announced October 2015.

Comments: Pacific Symposium on Biocomputing. 21:492-503

arXiv:1510.00217 [pdf, ps, other]

doi 10.1103/PhysRevE.93.040301

Temporal and structural heterogeneities emerging in adaptive temporal networks

Authors: Takaaki Aoki, Luis E. C. Rocha, Thilo Gross

Abstract: We introduce a model of adaptive temporal networks whose evolution is regulated by an interplay between node activity and dynamic exchange of information through links. We study the model by using a master equation approach. Starting from a homogeneous initial configuration, we show that temporal and structural heterogeneities, characteristic of real-world networks, spontaneously emerge. This theoretically tractable model thus contributes to the understanding of the dynamics of human activity and interaction networks.

Submitted 4 April, 2016; v1 submitted 1 October, 2015; originally announced October 2015.

Journal ref: Physical Review E 93, 040301(R) (2016)

arXiv:1509.04386 [pdf, other]

doi 10.1103/PhysRevE.92.060801

Modularity and the spread of perturbations in complex dynamical systems

Authors: Artemy Kolchinsky, Alexander J. Gates, Luis M. Rocha

Abstract: We propose a method to decompose dynamical systems based on the idea that modules constrain the spread of perturbations. We find partitions of system variables that maximize 'perturbation modularity', defined as the autocovariance of coarse-grained perturbed trajectories. The measure effectively separates the fast intramodular from the slow intermodular dynamics of perturbation spreading (in this respect, it is a generalization of the 'Markov stability' method of network community detection). Our approach captures variation of modular organization across different system states, time scales, and in response to different kinds of perturbations: aspects of modularity which are all relevant to real-world dynamical systems. It offers a principled alternative to detecting communities in networks of statistical dependencies between system variables (e.g., 'relevance networks' or 'functional networks'). Using coupled logistic maps, we demonstrate that the method uncovers hierarchical modular organization planted in a system's coupling matrix. Additionally, in homogeneously-coupled map lattices, it identifies the presence of self-organized modularity that depends on the initial state, dynamical parameters, and type of perturbations. Our approach offers a powerful tool for exploring the modular organization of complex dynamical systems.

Submitted 23 December, 2015; v1 submitted 14 September, 2015; originally announced September 2015.

Journal ref: Physical Review E, 2015

arXiv:1503.05826 [pdf, other]

doi 10.1111/rssa.12180

Respondent-driven sampling bias induced by clustering and community structure in social networks

Authors: Luis Enrique Correa Rocha, Anna Ekeus Thorson, Renaud Lambiotte, Fredrik Liljeros

Abstract: Sampling hidden populations is particularly challenging using standard sampling methods mainly because of the lack of a sampling frame. Respondent-driven sampling (RDS) is an alternative methodology that exploits the social contacts between peers to reach and weight individuals in these hard-to-reach populations. It is a snowball sampling procedure where the weight of the respondents is adjusted for the likelihood of being sampled due to differences in the number of contacts. In RDS, the structure of the social contacts thus defines the sampling process and affects its coverage, for instance by constraining the sampling within a sub-region of the network. In this paper we study the bias induced by network structures such as social triangles, community structure, and heterogeneities in the number of contacts, in the recruitment trees and in the RDS estimator. We simulate different scenarios of network structures and response-rates to study the potential biases one may expect in real settings. We find that the prevalence of the estimated variable is associated with the size of the network community to which the individual belongs. Furthermore, we observe that low-degree nodes may be under-sampled in certain situations if the sample and the network are of similar size. Finally, we also show that low response-rates lead to reasonably accurate average estimates of the prevalence but generate relatively large biases.

Submitted 19 March, 2015; originally announced March 2015.

Comments: 14 pages, 11 figures

Journal ref: J. R. Stat. Soc. A, 180: 99 (2017)

arXiv:1501.03471 [pdf, other]

doi 10.1371/journal.pone.0128193

Computational fact checking from knowledge networks

Authors: Giovanni Luca Ciampaglia, Prashant Shiralkar, Luis M. Rocha, Johan Bollen, Filippo Menczer, Alessandro Flammini

Abstract: Traditional fact checking by expert journalists cannot keep up with the enormous volume of information that is now generated online. Computational fact checking may significantly enhance our ability to evaluate the veracity of dubious information. Here we show that the complexities of human fact checking can be approximated quite well by finding the shortest path between concept nodes under properly defined semantic proximity metrics on knowledge graphs. Framed as a network problem this approach is feasible with efficient computational techniques. We evaluate this approach by examining tens of thousands of claims related to history, entertainment, geography, and biographical information using a public knowledge graph extracted from Wikipedia. Statements independently known to be true consistently receive higher support via our method than do false ones. These findings represent a significant step toward scalable computational fact-checking methods that may one day mitigate the spread of harmful misinformation.

Submitted 14 January, 2015; originally announced January 2015.

arXiv:1412.0744 [pdf, other]

doi 10.1371/journal.pone.0122199

Extraction of Pharmacokinetic Evidence of Drug-drug Interactions from the Literature

Authors: Artemy Kolchinsky, Anália Lourenço, Heng-Yi Wu, Lang Li, Luis M. Rocha

Abstract: Drug-drug interaction (DDI) is a major cause of morbidity and mortality and a subject of intense scientific interest. Biomedical literature mining can aid DDI research by extracting evidence for large numbers of potential interactions from published literature and clinical databases. Though DDI is investigated in domains ranging in scale from intracellular biochemistry to human populations, literature mining has not been used to extract specific types of experimental evidence, which are reported differently for distinct experimental goals. We focus on pharmacokinetic evidence for DDI, essential for identifying causal mechanisms of putative interactions and as input for further pharmacological and pharmaco-epidemiology investigations. We used manually curated corpora of PubMed abstracts and annotated sentences to evaluate the efficacy of literature mining on two tasks: first, identifying PubMed abstracts containing pharmacokinetic evidence of DDIs; second, extracting sentences containing such evidence from abstracts. We implemented a text mining pipeline and evaluated it using several linear classifiers and a variety of feature transforms. The most important textual features in the abstract and sentence classification tasks were analyzed. We also investigated the performance benefits of using features derived from PubMed metadata fields, various publicly available named entity recognizers, and pharmacokinetic dictionaries. Several classifiers performed very well in distinguishing relevant and irrelevant abstracts (reaching F1~=0.93, MCC~=0.74, iAUC~=0.99) and sentences (F1~=0.76, MCC~=0.65, iAUC~=0.83). We found that word bigram features were important for achieving optimal classifier performance and that features derived from Medical Subject Headings (MeSH) terms significantly improved abstract classification. ...

Submitted 18 May, 2015; v1 submitted 1 December, 2014; originally announced December 2014.

Comments: PLOS One (2015)

ACM Class: H.2.8; H.3.1; J.3

arXiv:1406.6873 [pdf, other]

Designing a minimalist socially aware robotic agent for the home

Authors: Matthew R. Francisco, Ian Wood, Selma Šabanović, Luis M. Rocha

Abstract: We present a minimalist social robot that relies on long timeseries of low resolution data such as mechanical vibration, temperature, lighting, sounds and collisions. Our goal is to develop an experimental system for growing socially situated robotic agents whose behavioral repertoire is subsumed by the social order of the space. To get there we are designing robots that use their simple sensors and motion feedback routines to recognize different classes of human activity and then associate to each class a range of appropriate behaviors. We use the Katie Family of robots, built on the iRobot Create platform, an Arduino Uno, and a Raspberry Pi. We describe its sensor abilities and exploratory tests that allow us to develop hypotheses about what objects (sensor data) correspond to something known and observable by a human subject. We use machine learning methods to classify three social scenarios from over a hundred experiments, demonstrating that it is possible to detect social situations with high accuracy, using the low-resolution sensors from our minimalist robot.

Submitted 26 June, 2014; originally announced June 2014.

Comments: 8 pages, 10 figures, To be published in the ALIFE 14 conference proceedings

arXiv:1401.5648 [pdf, other]

doi 10.1088/1367-2630/16/6/063023

Random walk centrality for temporal networks

Authors: Luis Enrique Correa Rocha, Naoki Masuda

Abstract: Nodes can be ranked according to their relative importance within the network. Ranking algorithms based on random walks are particularly useful because they connect topological and diffusive properties of the network. Previous methods based on random walks, such as PageRank, have focused on static structures. However, several realistic networks are indeed dynamic, meaning that their structure changes in time. In this paper, we propose a centrality measure for temporal networks based on random walks which we call TempoRank. While in a static network, the stationary density of the random walk is proportional to the degree or the strength of a node, we find that in temporal networks, the stationary density is proportional to the in-strength of the so-called effective network. The stationary density also depends on the sojourn probability q which regulates the tendency of the walker to stay in the node. We apply our method to human interaction networks and show that although it is important for a node to be connected to another node with many random walkers at the right moment (one of the principles of PageRank), this effect is negligible in practice when the time order of link activation is included.

Submitted 22 January, 2014; originally announced January 2014.

Comments: main text + supplementary material

Journal ref: New Journal of Physics 16 063023 (2014)

arXiv:1312.2459 [pdf, other]

Distance Closures on Complex Networks

Authors: Tiago Simas, Luis M Rocha

Abstract: To expand the toolbox available to network science, we study the isomorphism between distance and Fuzzy (proximity or strength) graphs. Distinct transitive closures in Fuzzy graphs lead to closures of their isomorphic distance graphs with widely different structural properties. For instance, the All Pairs Shortest Paths (APSP) problem, based on the Dijkstra algorithm, is equivalent to a metric closure, which is only one of the possible ways to calculate shortest paths. Understanding and mapping this isomorphism is necessary to analyse models of complex networks based on weighted graphs. Any conclusions derived from such models should take into account the distortions imposed on graph topology when converting proximity/strength into distance graphs, to subsequently compute path length and shortest path measures. We characterise the isomorphism using the max-min and Dombi disjunction/conjunction pairs. This allows us to: (1) study alternative distance closures, such as those based on diffusion, metric, and ultra-metric distances; (2) identify the operators closest to the metric closure of distance graphs (the APSP), but which are logically consistent; and (3) propose a simple method to compute alternative distance closures using existing algorithms for the APSP. In particular, we show that a specific diffusion distance is promising for community detection in complex networks, and is based on desirable axioms for logical inference or approximate reasoning on networks; it also provides a simple algebraic means to compute diffusion processes on networks. Based on these results, we argue that choosing different distance closures can lead to different conclusions about indirect associations on network data, as well as the structure of complex networks, and are thus important to consider.

Submitted 16 October, 2014; v1 submitted 9 December, 2013; originally announced December 2013.

arXiv:1303.3245 [pdf, other]

doi 10.1103/PhysRevE.87.042814

Flow Motifs Reveal Limitations of the Static Framework to Represent Human interactions

Authors: Luis Enrique Correa Rocha, Vincent D Blondel

Abstract: Networks are commonly used to define underlying interaction structures where infections, information, or other quantities may spread. Although the standard approach has been to aggregate all links into a static structure, some studies suggest that the time order in which the links are established may alter the dynamics of spreading. In this paper, we study the impact of the time ordering in the limits of flow on various empirical temporal networks. By using a random walk dynamics, we estimate the flow on links and convert the original undirected network (temporal and static) into a directed flow network. We then introduce the concept of flow motifs and quantify the divergence in the representativity of motifs when using the temporal and static frameworks. We find that the regularity of contacts and persistence of vertices (common in email communication and face-to-face interactions) result in little difference in the limits of flow for both frameworks. On the other hand, in the case of communication within a dating site (and of a sexual network), the flow between vertices changes significantly in the temporal framework such that the static approximation poorly represents the structure of contacts. We have also observed that cliques with 3 and 4 vertices containing only low-flow links are more represented than the same cliques with all high-flow links. The representativity of these low-flow cliques is higher in the temporal framework. Our results suggest that the flow between vertices connected in cliques depends on the topological context in which they are placed and on the time sequence in which the links are established. The structure of the clique alone does not completely characterize the potential of flow between the vertices.

Submitted 13 March, 2013; originally announced March 2013.

arXiv:1301.5831 [pdf, other]

doi 10.1371/journal.pone.0055946

Canalization and control in automata networks: body segmentation in Drosophila melanogaster

Authors: Manuel Marques-Pita, Luis M. Rocha

Abstract: We present schema redescription as a methodology to characterize canalization in automata networks used to model biochemical regulation and signalling. In our formulation, canalization becomes synonymous with redundancy present in the logic of automata. This results in straightforward measures to quantify canalization in an automaton (micro-level), which is in turn integrated into a highly scalable framework to characterize the collective dynamics of large-scale automata networks (macro-level). This way, our approach provides a method to link micro- to macro-level dynamics -- a crux of complexity. Several new results ensue from this methodology: uncovering of dynamical modularity (modules in the dynamics rather than in the structure of networks), identification of minimal conditions and critical nodes to control the convergence to attractors, simulation of dynamical behaviour from incomplete information about initial conditions, and measures of macro-level canalization and robustness to perturbations. We exemplify our methodology with a well-known model of the intra- and inter-cellular genetic regulation of body segmentation in Drosophila melanogaster. We use this model to show that our analysis does not contradict any previous findings. But we also obtain new knowledge about its behaviour: a better understanding of the size of its wild-type attractor basin (larger than previously thought), the identification of novel minimal conditions and critical nodes that control wild-type behaviour, and the resilience of these to stochastic interventions. Our methodology is applicable to any complex network that can be modelled using automata, but we focus on biochemical regulation and signalling, towards a better understanding of the (decentralized) control that orchestrates cellular activity -- with the ultimate goal of explaining how cells and tissues 'compute'.

Submitted 25 January, 2013; v1 submitted 24 January, 2013; originally announced January 2013.

Comments: 77 pages, 21 figures and 4 tables. Supplementary information not included. PLoS ONE (in press)

arXiv:1210.0734 [pdf, other]

Evaluation of linear classifiers on articles containing pharmacokinetic evidence of drug-drug interactions

Authors: Artemy Kolchinsky, Anália Lourenço, Lang Li, Luis M. Rocha

Abstract: Background. Drug-drug interaction (DDI) is a major cause of morbidity and mortality. [...] Biomedical literature mining can aid DDI research by extracting relevant DDI signals from either the published literature or large clinical databases. However, though drug interaction is an ideal area for translational research, the inclusion of literature mining methodologies in DDI workflows is still very preliminary. One area that can benefit from literature mining is the automatic identification of a large number of potential DDIs, whose pharmacological mechanisms and clinical significance can then be studied via in vitro pharmacology and in populo pharmaco-epidemiology. Experiments. We implemented a set of classifiers for identifying published articles relevant to experimental pharmacokinetic DDI evidence. These documents are important for identifying causal mechanisms behind putative drug-drug interactions, an important step in the extraction of large numbers of potential DDIs. We evaluate performance of several linear classifiers on PubMed abstracts, under different feature transformation and dimensionality reduction methods. In addition, we investigate the performance benefits of including various publicly-available named entity recognition features, as well as a set of internally-developed pharmacokinetic dictionaries. Results. We found that several classifiers performed well in distinguishing relevant and irrelevant abstracts. We found that the combination of unigram and bigram textual features gave better performance than unigram features alone, and also that normalization transforms that adjusted for feature frequency and document length improved classification. For some classifiers, such as linear discriminant analysis (LDA), proper dimensionality reduction had a large impact on performance. Finally, the inclusion of NER features and dictionaries was found not to help classification.

Submitted 2 October, 2012; originally announced October 2012.

Comments: Pacific Symposium on Biocomputing, 2013

ACM Class: H.2.8; H.3.1; J.3

Journal ref: Pac Symp Biocomput. 2013:409-20

arXiv:1209.1719 [pdf, other]

Semi-metric networks for recommender systems

Authors: Tiago Simas, Luis M. Rocha

Abstract: Weighted graphs obtained from co-occurrence in user-item relations lead to non-metric topologies. We use this semi-metric behavior to issue recommendations, and discuss its relationship to transitive closure on fuzzy graphs. Finally, we test the performance of this method against other item- and user-based recommender systems on the Movielens benchmark. We show that including highly semi-metric edges in our recommendation algorithms leads to better recommendations.

Submitted 8 September, 2012; originally announced September 2012.

Journal ref: 2012 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology

arXiv:1206.6036 [pdf]

doi 10.1371/journal.pcbi.1002974

Temporal Heterogeneities Increase the Prevalence of Epidemics on Evolving Networks

Authors: Luis Enrique Correa Rocha, Vincent D. Blondel

Abstract: Empirical studies suggest that contact patterns follow heterogeneous inter-event times, meaning that intervals of high activity are followed by periods of inactivity. Combined with birth and death of individuals, these temporal constraints affect the spread of infections in a non-trivial way and are dependent on the particular contact dynamics. We propose a stochastic model to generate temporal networks where vertices make instantaneous contacts following heterogeneous inter-event times, and leave and enter the system at fixed rates. We study how these temporal properties affect the prevalence of an infection and estimate R0, the number of secondary infections, by modeling simulated infections (SIR, SI and SIS) co-evolving with the network structure. We find that heterogeneous contact patterns cause earlier and larger epidemics on the SIR model in comparison to homogeneous scenarios. In case of SI and SIS, the epidemic is faster in the early stages (up to 90% of prevalence) followed by a slowdown in the asymptotic limit in case of heterogeneous patterns. In the presence of birth and death, heterogeneous patterns always cause higher prevalence in comparison to homogeneous scenarios with same average inter-event times. Our results suggest that R0 may be underestimated if temporal heterogeneities are not taken into account in the modeling of epidemics.

Submitted 26 June, 2012; originally announced June 2012.

Comments: 5 figures + supplementary information

Journal ref: PLoS Computational Biology 9(3): e1002974 (2013)

arXiv:1106.3703

Prediction and Modularity in Dynamical Systems

Authors: Artemy Kolchinsky, Luis M. Rocha

Abstract: Identifying and understanding modular organizations is centrally important in the study of complex systems. Several approaches to this problem have been advanced, many framed in information-theoretic terms. Our treatment starts from the complementary point of view of statistical modeling and prediction of dynamical systems. It is known that for finite amounts of training data, simpler models can have greater predictive power than more complex ones. We use the trade-off between model simplicity and predictive accuracy to generate optimal multiscale decompositions of dynamical networks into weakly-coupled, simple modules. State-dependent and causal versions of our method are also proposed.

Submitted 16 January, 2015; v1 submitted 19 June, 2011; originally announced June 2011.

Comments: v1 published in ECAL 2011 (European Conference on Artificial Life). v2 fixes error in causal risk (number of parameters should be based on training distribution)

MSC Class: 62H20; 62M20; 62B10; 60G25; 68T05; 90B15; 05C82 ACM Class: G.3

arXiv:1103.4090

A Linear Classifier Based on Entity Recognition Tools and a Statistical Approach to Method Extraction in the Protein-Protein Interaction Literature

Authors: Anália Lourenço, Michael Conover, Andrew Wong, Azadeh Nematzadeh, Fengxia Pan, Hagit Shatkay, Luis M. Rocha

Abstract: We participated in the Article Classification and the Interaction Method subtasks (ACT and IMT, respectively) of the Protein-Protein Interaction task of the BioCreative III Challenge. For the ACT, we pursued an extensive testing of available Named Entity Recognition and dictionary tools, and used the most promising ones to extend our Variable Trigonometric Threshold linear classifier. For the IMT, we experimented with a primarily statistical approach, as opposed to employing a deeper natural language processing strategy. Finally, we also studied the benefits of integrating the method extraction approach that we have used for the IMT into the ACT pipeline. For the ACT, our linear article classifier leads to a ranking and classification performance significantly higher than all the reported submissions. For the IMT, our results are comparable to those of other systems, which took very different approaches. For the ACT, we show that the use of named entity recognition tools leads to a substantial improvement in the ranking and classification of articles relevant to protein-protein interaction. Thus, we show that our substantially expanded linear classifier is a very competitive classifier in this domain. Moreover, this classifier produces interpretable surfaces that can be understood as "rules" for human understanding of the classification.
In terms of the IMT task, in contrast to other participants, our approach focused on identifying sentences that are likely to bear evidence for the application of a PPI detection method, rather than on classifying a document as relevant to a method. As BioCreative III did not perform an evaluation of the evidence provided by the system, we have conducted a separate assessment; the evaluators agree that our tool is indeed effective in detecting relevant evidence for PPI detection methods.

Submitted 22 April, 2011; v1 submitted 21 March, 2011; originally announced March 2011.

Comments: BMC Bioinformatics. In Press

arXiv:1102.1691

Schema Redescription in Cellular Automata: Revisiting Emergence in Complex Systems

Authors: Manuel Marques-Pita, Luis M. Rocha

Abstract: We present a method to eliminate redundancy in the transition tables of Boolean automata: schema redescription with two symbols. One symbol is used to capture redundancy of individual input variables, and another to capture permutability in sets of input variables: fully characterizing the canalization present in Boolean functions. Two-symbol schemata explain aspects of the behaviour of automata networks that the characterization of their emergent patterns does not capture. We use our method to compare two well-known cellular automata for the density classification task: the human engineered CA GKL, and another obtained via genetic programming (GP). We show that despite having very different collective behaviour, these rules are very similar. Indeed, GKL is a special case of GP. Therefore, we demonstrate that it is more feasible to compare cellular automata via schema redescriptions of their rules, than by looking at their emergent behaviour, leading us to question the tendency in complexity research to pay much more attention to emergent patterns than to local interactions.

Submitted 9 February, 2011; v1 submitted 8 February, 2011; originally announced February 2011.

Comments: paper submitted to the 2011 IEEE Symposium on Artificial Life

Journal ref: The 2011 IEEE Symposium on Artificial Life, at the IEEE Symposium Series on Computational Intelligence 2011. April 11-15, 2011, Paris, France, pp. 233-240

arXiv:1102.1027

doi 10.1007/s12065-011-0052-5

Collective Classification of Textual Documents by Guided Self-Organization in T-Cell Cross-Regulation Dynamics

Authors: Alaa Abi-Haidar, Luis M. Rocha

Abstract: We present and study an agent-based model of T-Cell cross-regulation in the adaptive immune system, which we apply to binary classification. Our method expands an existing analytical model of T-cell cross-regulation (Carneiro et al. in Immunol Rev 216(1):48-68, 2007) that was used to study the self-organizing dynamics of a single population of T-Cells in interaction with an idealized antigen presenting cell capable of presenting a single antigen. With agent-based modeling we are able to study the self-organizing dynamics of multiple populations of distinct T-cells which interact via antigen presenting cells that present hundreds of distinct antigens. Moreover, we show that such self-organizing dynamics can be guided to produce an effective binary classification of antigens, which is competitive with existing machine learning methods when applied to biomedical text classification. More specifically, here we test our model on a dataset of publicly available full-text biomedical articles provided by the BioCreative challenge (Krallinger in The biocreative ii. 5 challenge overview, p 19, 2009). We study the robustness of our model's parameter configurations, and show that it leads to encouraging results comparable to state-of-the-art classifiers. Our results help us understand both T-cell cross-regulation as a general principle of guided self-organization, as well as its applicability to document classification.
Therefore, we show that our bio-inspired algorithm is a promising novel method for biomedical article classification and for binary document classification in general.

Submitted 4 February, 2011; originally announced February 2011.

Journal ref: Evolutionary Intelligence. 2011. Volume 4, Number 2, 69-80

arXiv:0909.4385

doi 10.1088/1367-2630/11/12/123015

The meta book and size-dependent properties of written language

Authors: Sebastian Bernhardsson, Luis Enrique Correa da Rocha, Petter Minnhagen

Abstract: Evidence is given for a systematic text-length dependence of the power-law index gamma of a single book. The estimated gamma values are consistent with a monotonic decrease from 2 to 1 with increasing length of a text. A direct connection to an extended Heaps' law is explored. The infinite book limit is, as a consequence, proposed to be given by gamma = 1 instead of the value gamma = 2 expected if Zipf's law were ubiquitously applicable. In addition, we explore the idea that the systematic text-length dependence can be described by a meta book concept, which is an abstract representation reflecting the word-frequency structure of a text. According to this concept, the word-frequency distribution of a text of a certain length written by a single author has the same characteristics as a text of the same length pulled out from an imaginary complete infinite corpus written by the same author.

Submitted 24 September, 2009; originally announced September 2009.

Comments: 7 pages, 6 figures, 1 table

Journal ref: New J. Phys. 11 (2009) 123015

Showing 1–50 of 57 results for author: Rocha, L